Utilizing Graph Convolutional Networks to Detect and Predict Voting Patterns
Part 0. Executive Summary
Mathematical models are used both to understand the results of past elections and to predict the results of future ones. Many of these models rely on demographic data, polling data, and voting district similarity data, as well as past election results.
This effort shows that a graph convolutional neural network can be fit to past election results, and it demonstrates a novel approach to selecting the structure of the graph architecture. The results point to similarities between voting districts in a past election and offer guidance on how to target voting districts in future elections.
Part 1. Background
In conjunction with the 3 March 2020 presidential primary in the state of California, the city of Long Beach held primary elections for even-numbered City Council districts within the city. The Los Angeles County Registrar-Recorder/County Clerk provided the voting results for each of these city council districts broken down by voting precinct.
The 6th City Council district, in particular, provided interesting results: six candidates attempted either to win the seat outright with more than 50% of the vote or to reach a November 2020 runoff with a top-two finish in the primary. No candidate won the seat outright, so two candidates will advance to the November 2020 runoff. What patterns can be pulled from the March 2020 primary to:
1. Show how various precincts voted similarly to other precincts
2. Provide guidance for the advancing candidates as to what precincts to target to win the November 2020 runoff
3. Derive a corresponding runoff campaign strategy based upon precincts to be targeted and how they are affected by similar precincts
Part 2. Methodology
Shapefiles for the Long Beach 6th City Council District boundaries and the relevant voting precincts were obtained from References 1 and 2. Intersections between the 6th City Council District and the voting precincts were computed and are shown in Figure 1.
Results from the 3 March 2020 primary election for the 6th City Council District in Long Beach were downloaded from the Los Angeles County Registrar-Recorder/County Clerk website (Reference 3).
Some precincts had no reported March 2020 voting results, so they were excluded from this study. These precincts may have had their boundaries redrawn, or they may simply have been renamed. Figure 1 shows the voting precincts contained within the 6th City Council district in the city of Long Beach. Light blue precincts had data from the primary and light gray precincts had no data.
For the fifteen voting precincts that had primary election results, a Graph Convolutional Neural Network (GCN) (Ref. 4) was derived to show the relationships between voting precincts. In most GCNs, the architecture of the graph is described by the adjacency matrix, and that matrix is written down by inspecting the connections within the graph.
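To make the role of the adjacency matrix concrete, here is a minimal sketch of one graph convolution step in the style of Reference 4. The source does not state which layer form was used in this effort, so the symmetric normalization, the ReLU activation, and the toy three-node graph with its features and weights are all assumptions for illustration. Plain Python lists are used to keep the sketch dependency-free.

```python
import math

def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a square adjacency matrix (list of lists)."""
    n = len(adj)
    # Add self-loops so each node keeps its own features during propagation.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Degree of each node in the self-looped graph.
    inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    return [[inv_sqrt[i] * a_hat[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def gcn_layer(adj, features, weights):
    """One propagation step: ReLU(A_norm @ H @ W)."""
    z = matmul(matmul(normalize_adjacency(adj), features), weights)
    return [[max(0.0, v) for v in row] for row in z]

# Hypothetical toy graph: three nodes in a chain (0 - 1 - 2), two features per node.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
features = [[1.0, 0.0],
            [0.0, 1.0],
            [1.0, 1.0]]
weights = [[1.0],
           [1.0]]  # maps two input features to one output value

hidden = gcn_layer(adj, features, weights)
print(hidden)
```

Each output row mixes a node's own features with those of its neighbors, which is why the choice of adjacency matrix determines which precincts can influence one another in the model.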
How do you derive a GCN if you don’t know the graph architecture? In other words, if the adjacency matrix must be written down by inspection, which patterns of voting precinct connectivity is the method then limited to?
For this effort, a novel approach was used to derive candidate graph configurations. The approach used a three-stage process to derive both the best adjacency matrix (indicating connections between voting precincts) and the weights of the neural network to minimize the network error.
Because this approach changes the adjacency matrix along with the neural network weights, a typical backpropagation algorithm could not be used to find the best GCN. Varying connectivity as well as weights makes the design space non-convex and disjoint, so a gradient-based method would be unsuitable. Instead, a genetic algorithm was used to solve for the best GCN over the three stages. The methodology stages were:
1. Utilize a genetic algorithm to find the best GCN matching the voting precinct data by modifying the adjacency matrix entries and the neural network weights.
2. Filter the resulting adjacency matrix by examining each entry and setting it to zero if it is below a threshold value, and to one otherwise. Thus the adjacency matrix for the final GCN design in stage three contains only zeros and ones, as is customary, but was derived from the data.
3. Utilize a genetic algorithm to find the best GCN matching the voting precinct data by modifying only the neural network weights. (This could have been performed using a standard backpropagation algorithm.)
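The three stages above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code used in this effort: the simplified one-layer model in `predict`, the gene bounds, the tournament selection, the uniform crossover, the Gaussian mutation, the 0.5 threshold, and the toy data are all hypothetical choices, since the source does not give these details. Symmetry of the adjacency matrix is not enforced here.

```python
import random

def predict(adj, features, weights):
    """Simplified one-layer graph convolution: row-normalized (A + I) @ X @ w."""
    xw = [sum(x * w for x, w in zip(row, weights)) for row in features]  # X @ w
    out = []
    for i in range(len(adj)):
        row = [a + (1.0 if i == j else 0.0) for j, a in enumerate(adj[i])]
        out.append(sum(r * v for r, v in zip(row, xw)) / sum(row))
    return out

def mse(pred, target):
    """Mean squared error between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

def unpack(genome, n):
    """Split a flat genome into an n x n adjacency matrix and a weight vector."""
    return [genome[i * n:(i + 1) * n] for i in range(n)], genome[n * n:]

def evolve(fitness, low, high, rng, pop_size=40, generations=60, sigma=0.1):
    """Minimize `fitness` with a basic elitist genetic algorithm."""
    size = len(low)
    pop = [[rng.uniform(low[k], high[k]) for k in range(size)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        children = pop[:2]  # elitism: carry the two best genomes forward unchanged
        while len(children) < pop_size:
            # Tournament selection: best of three random genomes, done twice.
            pa = min(rng.sample(pop, 3), key=fitness)
            pb = min(rng.sample(pop, 3), key=fitness)
            # Uniform crossover plus Gaussian mutation, clipped to the gene bounds.
            child = [rng.choice(pair) + rng.gauss(0.0, sigma) for pair in zip(pa, pb)]
            children.append([min(high[k], max(low[k], g)) for k, g in enumerate(child)])
        pop = children
    return min(pop, key=fitness)

def three_stage_fit(features, target, threshold=0.5, seed=0):
    rng = random.Random(seed)
    n, f = len(features), len(features[0])

    # Stage 1: evolve adjacency entries (in [0, 1]) and weights (in [-2, 2]) together.
    def fit_all(genome):
        adj, w = unpack(genome, n)
        return mse(predict(adj, features, w), target)
    best = evolve(fit_all, [0.0] * (n * n) + [-2.0] * f, [1.0] * (n * n) + [2.0] * f, rng)
    soft_adj, _ = unpack(best, n)

    # Stage 2: entries below the threshold become 0, all others become 1.
    adj = [[1 if v >= threshold else 0 for v in row] for row in soft_adj]

    # Stage 3: re-optimize only the weights with the binary adjacency held fixed.
    def fit_weights(w):
        return mse(predict(adj, features, w), target)
    weights = evolve(fit_weights, [-2.0] * f, [2.0] * f, rng)
    return adj, weights, fit_weights(weights)

# Hypothetical toy data: 4 nodes, 2 features each, one target value per node.
features = [[0.2, 0.9], [0.8, 0.1], [0.5, 0.5], [0.9, 0.7]]
target = [0.3, 0.6, 0.5, 0.7]
adj, weights, err = three_stage_fit(features, target)
print(adj, weights, err)
```

Because stage 3 holds the adjacency matrix fixed, the remaining problem is an ordinary weight fit, which is why a gradient-based optimizer could be substituted at that point.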
Code was written in Python using standard modules such as folium, geopandas, and scipy to perform the three-stage GCN derivation methodology described above.
Part 3. Results
Figure 2 shows the number of ballots cast for neither of the November 2020 runoff candidates. It gives a pictorial view of the precincts with a high level of non-runoff votes (in green), a medium level (in gold), and a low level (in red).
These are people who showed up to vote but selected one of the other four candidates in the primary. A total of 1676 votes, cast for the four eliminated candidates, are up for grabs. The precincts where these 1676 votes exist are color coded in Figure 2 in green (most votes available), gold (moderate number of votes available), and red (fewest votes available). There is a large range between the precinct with the most votes available (184 votes) and the one with the fewest (74 votes). The three precincts shown in green, roughly bounded by PCH and the Los Angeles River on the lower left and by Willow and Long Beach Blvd on the upper right, contain 527 of the 1676 available votes. These precincts could be targeted with little travel required between them.
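The arithmetic behind these figures can be checked with a short sketch. The vote totals come from the text; the equal-width banding in `color_band` is an assumption, since the source does not say how the green, gold, and red cut points in Figure 2 were chosen.

```python
TOTAL_AVAILABLE = 1676  # votes cast for the four non-advancing candidates (from the text)
GREEN_CLUSTER = 527     # votes in the three adjacent green precincts (from the text)
MOST, LEAST = 184, 74   # largest and smallest precinct totals (from the text)

share = GREEN_CLUSTER / TOTAL_AVAILABLE
mean_green = GREEN_CLUSTER / 3

def color_band(votes, least=LEAST, most=MOST):
    """Assumed banding: split the range [least, most] into three equal-width bands."""
    width = (most - least) / 3
    if votes >= most - width:
        return "green"
    if votes >= least + width:
        return "gold"
    return "red"

print(f"Green cluster share of available votes: {share:.1%}")  # 31.4%
print(f"Mean available votes per green precinct: {mean_green:.1f}")  # 175.7
```

Under these numbers, the three green precincts hold a little under a third of all available votes.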
Figure 3 shows the graph convolutional network for precinct 3850117A. Shown on its own, this figure is not especially useful without a close look at the specific voting precincts and strengths behind each point and line of the graph.
Figure 4, on the other hand, shows the GCN derived for 3850117A superimposed on the voting precinct map. Here the graph architecture between precincts is obvious, with stronger connections shown in green and weaker ones in red. The strength of a connection shows both the similarity in voting trends between two precincts and the similarities that can be exploited in future election strategies. That is, what plays well in precinct 3850117A also plays well in neighboring precinct 3850133A as well as in the more distant precincts 3850146A, 3850151A, and 3850138A.
Figure 5 shows the GCN that best matches voting precinct 3850133A. Here a significant number of precincts are strongly correlated with the behavior of precinct 3850133A (as shown in green). Again, it would take some effort to track down the most strongly connected precincts and where they are located in the 6th City Council district.
Figure 6 shows the strong connectivity between precinct 3850133A and other precincts as well as where they are geographically located. Since there are a number of strong connections between precinct 3850133A and other precincts, the relationships behind these strong connections should be investigated. Deeper investigation could compare the demographics of the precincts, their zoning laws and restrictions, and other factors.
Finally, Figure 7 shows the best-matched GCN between voting precinct 3850134A and other precincts. In this case, the strong connections between precinct 3850134A and other voting precincts within the 6th City Council district are relatively few. But where are they located?
Figure 8 shows the strong connectivities in a manner that a City Council candidate or campaign manager could visually understand. Voting precinct 3850134A has a strong connection with neighboring precincts 3850133A and 3850021A as well as precinct 3850034A. This strong connectivity information could be used to guide an election strategy to maintain this connectivity and capture additional votes. Furthermore, additional study could be performed on the “weakly” connected precincts to formulate a campaign strategy to capture the votes in these precincts.
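Tracking down a precinct's strongest connections, and coloring them as in the figures, can be sketched as below. The precinct identifiers come from the text, but the connection strengths, the 0.5 cutoff, and the linear red-to-green color ramp are hypothetical values chosen for illustration, not results from this effort.

```python
def strongest_connections(strengths, precinct, threshold=0.5):
    """Return (neighbor, strength) pairs at or above threshold, strongest first."""
    row = strengths[precinct]
    strong = [(other, s) for other, s in row.items() if other != precinct and s >= threshold]
    return sorted(strong, key=lambda pair: pair[1], reverse=True)

def strength_to_hex(strength):
    """Map a strength in [0, 1] to a color from red (weak) to green (strong)."""
    s = min(1.0, max(0.0, strength))
    red, green = round(255 * (1 - s)), round(255 * s)
    return f"#{red:02x}{green:02x}00"

# Hypothetical strengths for one precinct; real values would come from the fitted GCN.
strengths = {
    "3850134A": {
        "3850133A": 0.9,
        "3850021A": 0.8,
        "3850034A": 0.7,
        "3850117A": 0.2,
    }
}

for neighbor, s in strongest_connections(strengths, "3850134A"):
    print(neighbor, s, strength_to_hex(s))
```

A ranked list like this gives a campaign the same information as the map overlay in a form that can be sorted, filtered, or joined with other precinct data.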
Part 4. Commentary
A methodology was presented that allows the derivation of graph architectures and optimal neural network weights to show similarities between voting precincts.
The methodology is a proof of concept and can be expanded into an operational tool. The expansion would include more voting precincts and demographic data for these precincts. Voting predictions for the precincts would be a further expansion that could be used to drive campaign strategies.
Since the methodology derives potential graph architectures, its use on a very large set of voting precincts, such as at a state or national level, would have to be tried in order to determine its feasibility.
1. “City of Long Beach Council Districts”, City of Long Beach Open Data, http://datalb.longbeach.gov/datasets/9719bd7b092b47a1b170a341e3fa48a7_6
2. “2018 Primary Election Precinct Data”, Statewide Database, https://statewidedatabase.org/d10/p18.html.
3. “March 3, 2020 Presidential Primary Election Results”, Los Angeles County Registrar-Recorder/County Clerk, https://lavote.net/home/voting-elections/current-elections/election-results/past-election-results#
4. “Semi-Supervised Classification with Graph Convolutional Networks”, Kipf, T. and Welling, M., arXiv preprint arXiv:1609.02907, ICLR, 2017.