## Abstract

Understanding political phenomena requires measuring the political preferences of society. We introduce a model based on mixtures of spatial voting models that infers the underlying distribution of voters' political preferences using only the voting records of the population and the political positions of the candidates in an election. Beyond offering a cost-effective alternative to surveys, this method projects the political preferences of voters and candidates into a shared latent preference space. This projection allows us to directly compare the preferences of the two groups, which is desirable for political science but difficult with traditional survey methods. After validating the aggregate-level inferences of this model against the results of related work and on simple prediction tasks, we apply the model to better understand the phenomenon of political polarization in the Texas, New York, and Ohio electorates. Taken at face value, inferences drawn from our model indicate that the electorates in these states may be less bimodal than the distribution of candidates, but that the electorates are comparatively more extreme in their variance. We conclude with a discussion of the limitations of our method and potential future directions for research.

## Notes

1. Code and data for analyses are available at https://github.com/anahm/inferring-population-preferences.

## References

Abramowitz, A.I., Saunders, K.L.: Is polarization a myth? J. Polit. **70**(2), 542–555 (2008)

Airoldi, E.M., Blei, D.M., Fienberg, S.E., Xing, E.P.: Mixed membership stochastic blockmodels. In: Advances in Neural Information Processing Systems, pp. 33–40 (2009)

Ansolabehere, S., Palmer, M., Lee, A.: Precinct-level election data (2014). http://hdl.handle.net/1902.1/21919

Ansolabehere, S., Pettigrew, S.: Cumulative CCES Common Content (2006–2012) (2014). doi: 10.7910/DVN/26451

Barberá, P.: Birds of the same feather Tweet together: Bayesian ideal point estimation using Twitter data. Polit. Anal. **23**(1), 76–91 (2015)

Bartels, L.M.: Beyond the running tally: partisan bias in political perceptions. Polit. Behav. **24**(2), 117–150 (2002)

Bonica, A.: Database on ideology, money in politics, and elections: Public version 1.0 (2013). http://data.stanford.edu/dime

Bonica, A.: Mapping the ideological marketplace. Am. J. Polit. Sci. **58**(2), 367–386 (2014)

DiMaggio, P., Evans, J., Bryson, B.: Have american’s social attitudes become more polarized? Am. J. Sociol. **102**, 690–755 (1996)

Ding, W., Ishwar, P., Saligrama, V.: Learning mixed membership mallows models from pairwise comparisons. arXiv:1504.00757 (2015)

Downs, A.: An economic theory of political action in a democracy. J. Polit. Econ. **65**(2), 135–150 (1957)

Enelow, J.M., Hinich, M.J.: The Spatial Theory of Voting: an Introduction. Cambridge University Press, Cambridge (1984)

Fiorina, M.P., Abrams, S.J.: Political polarization in the American public. Ann. Rev. Polit. Sci. **11**, 563–588 (2008)

Fiorina, M.P., Abrams, S.J., Pope, J.C.: Polarization in the American public: misconceptions and misreadings. J. Polit. **70**(2), 556–560 (2008)

Flaxman, S.R., Wang, Y.X., Smola, A.J.: Who supported Obama in 2012? Ecological inference through distribution regression. In: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 289–298. ACM (2015)

Gerber, E.R., Lewis, J.B.: Beyond the median: voter preferences, district heterogeneity, and political representation. J. Polit. Econ. **112**(6), 1364–1383 (2004)

Gerrish, S., Blei, D.M.: Predicting legislative roll calls from text. In: Proceedings of the 28th International Conference on Machine Learning, pp. 489–496 (2011)

Krafft, P., Moore, J., Desmarais, B., Wallach, H.M.: Topic-partitioned multinetwork embeddings. In: Advances in Neural Information Processing Systems, pp. 2807–2815 (2012)

Lee, J.M.: Assessing mass opinion polarization in the U.S. using relative distribution method. Soc. Indic. Res. **124**(2), 571–598 (2014)

Levendusky, M.S., Pope, J.C., Jackman, S.D.: Measuring district-level partisanship with implications for the analysis of U.S. elections. J. Polit. **70**(3), 736–753 (2008)

Levendusky, M.S., Pope, J.C.: Red states vs. blue states going beyond the mean. Publ. Opin. Q. **75**(2), 227–248 (2011)

Lewis, J.B.: Estimating voter preference distributions from individual-level voting data. Polit. Anal. **9**(3), 275–297 (2001)

Lewis, J.B., DeVine, B., Pitcher, L., Martis, K.C.: Digital boundary definitions of United States congressional districts, 1789–2012 (2013). http://cdmaps.polisci.ucla.edu

McCarty, N., Poole, K.T., Rosenthal, H.: Polarized America: The Dance of Ideology and Unequal Riches, vol. 5. MIT Press, Cambridge (2006)

Poole, K.T., Rosenthal, H.: A spatial model for legislative roll call analysis. Am. J. Polit. Sci. **29**(2), 357–384 (1985)

Ruths, D., Pfeffer, J.: Social media for large studies of behavior. Science **346**(6213), 1063–1064 (2014)

Tausanovitch, C., Warshaw, C.: Measuring constituent policy preferences in congress, state legislatures, and cities. J. Polit. **75**(2), 330–342 (2013)

United States Census Bureau: Tigerweb state-based data files: Voting districts - Census 2010 (2010). http://tigerweb.geo.census.gov

Zhang, Y., Friend, A., Traud, A.L., Porter, M.A., Fowler, J.H., Mucha, P.J.: Community structure in congressional cosponsorship networks. Phys. A **387**(7), 1705–1712 (2008)

## Acknowledgements

Special thanks to David Lazer for bringing our attention to Adam Bonica’s work, to David Parkes for suggesting a reformulation of our model that enabled integrating out voter positions, and to Matt Blackwell for encouraging us to think more about validity and identifiability. This work was supported in part by the NSF GRFP under grant #1122374. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect those of the sponsors.

## Appendices

### A Mathematical Definitions of DiMaggio’s Polarization Metrics

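Given the estimates of our model, we measure political polarization in terms of dispersion using the analytical form of the standard deviation of a mixture model:

\[ \sigma_{\text{mix}} = \sqrt{\frac{\sum_{i} n_i \left( \sigma_i^2 + (\mu_i - M_{\mu})^2 \right)}{\sum_{i} n_i}} \]

where \(n_i\) is the total number of voters assigned to component *i*, \(\mu_i\) and \(\sigma_i\) are the mean and standard deviation of component *i*, and \(M_{\mu}\) is the weighted mean of the mixture distribution of voter preferences.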

To measure political polarization in terms of bimodality, we use kurtosis. Kurtosis is the fourth central moment of the mixture distribution divided by the square of the variance of the mixture distribution. We use the following analytical form:

\[ \mathrm{Kurt}[X] = \frac{E[(X - M_{\mu})^4]}{\left( E[(X - M_{\mu})^2] \right)^2} \]

where *X* is a random variable drawn from the mixture distribution, so the numerator is the fourth central moment of the mixture distribution and the denominator is the square of its variance. The *z*-th central moment of the mixture distribution has the analytical form

\[ E[(X - M_{\mu})^z] = \sum_{i} w_i \sum_{j=0}^{z} \binom{z}{j} (\mu_i - M_{\mu})^{z-j} E[(Y_i - \mu_i)^j] \]

where \(Y_i\) is a random variable drawn from component *i* of the mixture distribution, \(\mu_i\) is the mean of component *i*, \(w_i\) is the weight of each component, and \(E[(Y_i - \mu_i)^j]\) is the *j*-th central moment of the *i*th component distribution. In our analysis, we weight each component in the mixture distribution by the proportion of the population assigned to that component.
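The two polarization metrics in this appendix can be computed directly from fitted component parameters. The following is a minimal sketch rather than the authors' implementation: it assumes Normal components, uses the standard binomial expansion for the central moments of a mixture, and every function name and example value is hypothetical.

```python
import math


def normal_central_moment(sigma, j):
    """j-th central moment of a Normal: 0 for odd j, sigma^j * (j-1)!! for even j."""
    if j % 2 == 1:
        return 0.0
    double_factorial = 1
    for k in range(j - 1, 0, -2):
        double_factorial *= k
    return sigma ** j * double_factorial


def mixture_central_moment(counts, means, sigmas, z):
    """z-th central moment of a Normal mixture weighted by voter counts n_i."""
    total = float(sum(counts))
    weights = [n / total for n in counts]
    # Weighted mean of the mixture (M_mu in the text).
    m_mu = sum(w * mu for w, mu in zip(weights, means))
    moment = 0.0
    for w, mu, sigma in zip(weights, means, sigmas):
        for j in range(z + 1):
            moment += (w * math.comb(z, j) * (mu - m_mu) ** (z - j)
                       * normal_central_moment(sigma, j))
    return moment


def mixture_std(counts, means, sigmas):
    """Dispersion metric: standard deviation of the mixture."""
    return math.sqrt(mixture_central_moment(counts, means, sigmas, 2))


def mixture_kurtosis(counts, means, sigmas):
    """Bimodality metric: fourth central moment over squared variance."""
    m2 = mixture_central_moment(counts, means, sigmas, 2)
    m4 = mixture_central_moment(counts, means, sigmas, 4)
    return m4 / m2 ** 2


if __name__ == "__main__":
    # Hypothetical fit with K = 4 components (values invented for illustration).
    counts = [1200, 800, 1500, 500]
    means = [-1.0, -0.2, 0.4, 1.3]
    sigmas = [0.3, 0.5, 0.4, 0.6]
    print("std:", mixture_std(counts, means, sigmas))
    print("kurtosis:", mixture_kurtosis(counts, means, sigmas))
```

Under this definition a single Normal component has kurtosis 3, and lower values indicate a flatter or more bimodal distribution, which is how kurtosis serves as a bimodality measure here.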

### B Additional Results

In Sect. 5, we presented the results of our method assuming the underlying component distribution is Normal and the number of clusters (*K*) is 4. This section tests the robustness of these assumptions and presents our results when varying the underlying component distribution and the number of clusters.

### B.1 Varying the Underlying Component Distribution

We test the inference procedure of our model with Uniform and Laplace component distributions in addition to Normal ones. For Laplace components, the location parameter takes the same Normal prior defined for the mean of the Normal component, and the scale parameter takes the same Inverse Gamma prior defined for the standard deviation of the Normal component. For Uniform components, both the minimum and the distance between the minimum and the maximum take the same Normal prior defined for the mean of the Normal component. The priors for the Normal component parameters are given in Sect. 4. For each alternative component distribution, the inferred distributions are shown in Fig. 4, the derived polarization metrics in Table 2, and the prediction comparisons in Fig. 6.
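Changing the component family also changes the component-level central moments that enter the moment-based polarization metrics. As a rough sketch (not taken from the authors' code), the helpers below give the standard closed-form central moments for a Laplace component parameterized by location and scale and for a Uniform component parameterized by its minimum and the distance to its maximum. All function and parameter names are hypothetical.

```python
import math


def laplace_central_moment(scale, j):
    """j-th central moment of Laplace(location, scale): 0 if j is odd, else j! * scale^j."""
    if j % 2 == 1:
        return 0.0
    return math.factorial(j) * scale ** j


def uniform_central_moment(width, j):
    """j-th central moment of Uniform(minimum, minimum + width):
    0 if j is odd, else (width / 2)^j / (j + 1)."""
    if j % 2 == 1:
        return 0.0
    return (width / 2.0) ** j / (j + 1)


def uniform_mean(minimum, width):
    """Mean of a Uniform component given its minimum and its width."""
    return minimum + width / 2.0


def component_kurtosis(central_moment, spread):
    """Kurtosis of a single component: fourth central moment over squared variance."""
    return central_moment(spread, 4) / central_moment(spread, 2) ** 2


if __name__ == "__main__":
    # Spread values are arbitrary; the kurtosis of each family does not depend on them.
    print("Laplace kurtosis:", component_kurtosis(laplace_central_moment, 0.7))
    print("Uniform kurtosis:", component_kurtosis(uniform_central_moment, 2.5))
```

A single Laplace component is heavier-tailed than a Normal (kurtosis 6 versus 3) and a single Uniform is flatter (kurtosis 1.8), so the choice of family can shift the kurtosis of the fitted mixture even when locations and spreads are similar.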

Figure 5 compares the results derived from the alternative component distributions with the alternative data sources described in Sect. 5.1. We find significant positive correlations between our district-level point estimates and all of the alternative data sources. Assuming Laplace component distributions, our results have a correlation of 0.3216 with the responses selecting ideology given a discrete scale (left column in Fig. 5), 0.2514 with the responses selecting ideology along a continuous scale (middle column), and 0.5323 with the MRP estimates (right column). All of these correlations were significant, with p-values less than 0.01. Assuming Uniform component distributions, our results have a correlation of 0.3652 with the responses selecting ideology given a discrete scale, 0.2404 with the responses selecting ideology along a continuous scale, and 0.6331 with the MRP estimates. Again, all of these correlations were significant, with p-values less than 0.01.
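A comparison of this kind can be illustrated with a Pearson correlation between two sets of district-level estimates. The sketch below is only an illustration: the text does not state which significance test was used, so a two-sided permutation test stands in as an assumption, and the function names and example numbers are invented.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def permutation_p_value(xs, ys, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the observed |r| (assumed test, see lead-in)."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Invented district-level values, for illustration only.
    model_estimates = [-0.8, -0.5, -0.3, -0.1, 0.0, 0.2, 0.4, 0.6, 0.9, 1.1]
    survey_means = [-0.6, -0.7, -0.1, -0.2, 0.1, 0.0, 0.5, 0.3, 0.7, 1.0]
    print("r:", pearson_r(model_estimates, survey_means))
    print("p:", permutation_p_value(model_estimates, survey_means))
```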

### B.2 Varying the Number of Clusters

We also varied the number of clusters (*K*) used in the model. The main results section of the paper presented results assuming \(K=4\); here we include the inferred distributions in Fig. 7, the derived polarization metrics in Table 3, and the prediction comparisons in Fig. 9 for \(K=2\) and \(K=8\), assuming Normal underlying precinct distributions. Due to time constraints, we were only able to generate these results for the Texas and New York congressional elections.

Figure 8 compares the results derived from the alternative numbers of clusters with the alternative data sources described in Sect. 5.1. We find significant positive correlations between our district-level point estimates and all of the alternative data sources. When our model assumes 2 clusters rather than 4, its results have a correlation of 0.2845 with the responses selecting ideology given a discrete scale (left column in Fig. 8), 0.2646 with the responses selecting ideology along a continuous scale (middle column), and 0.7160 with the MRP estimates (right column). All of these correlations were significant, with p-values less than 0.01. When our model assumes 8 clusters, its results have a correlation of 0.2925 with the responses selecting ideology given a discrete scale, 0.1867 with the responses selecting ideology along a continuous scale, and 0.6861 with the MRP estimates. Again, all of these correlations were significant, with p-values less than 0.01.

## Copyright information

© 2016 Springer International Publishing AG

## About this paper

### Cite this paper

Nahm, A., Pentland, A., Krafft, P. (2016). Inferring Population Preferences via Mixtures of Spatial Voting Models. In: Spiro, E., Ahn, YY. (eds) Social Informatics. SocInfo 2016. Lecture Notes in Computer Science, vol 10046. Springer, Cham. https://doi.org/10.1007/978-3-319-47880-7_18

DOI: https://doi.org/10.1007/978-3-319-47880-7_18

Publisher Name: Springer, Cham

Print ISBN: 978-3-319-47879-1

Online ISBN: 978-3-319-47880-7

eBook Packages: Computer Science (R0)