Comparing public communication in democracies and autocracies: automated text analyses of speeches by heads of government


Renewed efforts at empirically distinguishing between different forms of political regimes leave out the cultural dimension. In this article, we demonstrate how modern computational tools can be used to fill this gap. We employ web-scraping techniques to generate a data set of speeches by heads of government in European democracies and autocratic regimes around the globe. Our data set includes 4740 speeches delivered between 1999 and 2019 by 40 political leaders of 27 countries. By scaling the results of a dictionary application, we show how liberal or illiberal, in comparative terms, these leaders present themselves to their national and international audiences. To gauge whether our liberalness scale reveals meaningful distinctions, we perform a series of validity tests: criterion validity, qualitative hand-coding, unsupervised topic modeling, and network analysis. All tests suggest that our liberalness scale does capture meaningful differences between political regimes despite the large heterogeneity of our data.


  1. Boese (2019) provides a thorough comparison of V-Dem with Freedom House and Polity and highlights these and other advantages of the most recent V-Dem data set.

  2. Likewise, if public communication by an autocrat continuously emphasizes liberal political norms and values, then, we argue, it undermines the persistence of the illiberal autocratic regime.

  3. With this, we obviously do not claim that political leadership (and its rhetoric) is identical to the political regime (and its institutional practices). Instead, we follow those in the literature who argue that the former is prone to have an impact on the longer-term viability of the latter (Diamond 1999; Linz and Stepan 1996; Merkel 1998; Higley and Burton 2006).

  4. We refer here to a rather broad understanding of political speeches, which occasionally also includes political statements, e.g. from press conferences, and other more spontaneously produced documents.

  5. To formally distinguish between both regime types, we rely on the most recent V-Dem data and their regime typology (Coppedge et al. 2018; Lührmann et al. 2018b, see also Sect. 4.1).

  6. de Vries et al. (2018) illustrate in various tests that, for bag-of-words text models, findings generated from human-translated and machine-translated texts overlap to a high degree. Lucas et al. (2015) provide a showcase of how to preprocess and manage multilingual texts in R.

  7. This is demonstrated by the fact that among liberal democracies economic models with varying economic liberties can be found.

  8. See “Appendix” section for the complete list of our dictionary terms.

  9. Another constraint of the model is that it does not account for differences in rhetoric over time. Yet, this is rather a problem of data availability since we do not have enough speeches in each case for estimates per year.

  10. The replication files, including robustness tests for different pre-processing strategies, are available at:

  11. For this formal distinction between democracy and autocracy we rely on the most recent data of the V-Dem Project (Coppedge et al. 2018; Lührmann et al. 2018a), which classifies as autocracies those regimes that lack de facto multiparty or free and fair elections, or in which Dahl’s institutional prerequisites are not minimally fulfilled (Lührmann et al. 2018b).

  12. Concerning the current Polish prime minister Mateusz Morawiecki and Moldova’s Pavel Filip, we cannot make clear classifications because the confidence intervals of their point estimates cross the zero line (cf. Fig. 1). The same is true for Edi Rama from Albania; yet, in his case the confidence interval overlaps only by a very small margin, suggesting that he is to be seen as an illiberal rather than a liberal speaker.

  13. Available here:

  14. Available here:

  15. For an exploratory study on different language styles among autocrats, see Maerz (2019).

  16. We heavily pre-processed our corpus before applying the unsupervised techniques (removal of stop words, infrequently used terms, punctuation and numbers, as well as stemming and lowercasing) to remove irrelevant words and treat words with similar properties as identical. As recommended by Denny and Spirling (2018) and illustrated in our “Appendix” section, we conducted detailed robustness tests to make sure that none of these preprocessing steps uncontrollably alters the STM results. All operations were done in R (2019, v. 3.5.2) with the STM package (Roberts et al. 2015, v. 1.3.3). The replication files are available here:

  17. Vladimir Putin has the largest share and smallest 95% confidence interval in this topic, cf. Fig. 4 in the “Appendix” section.

  18. Since the model measures relative topic proportions, the rather isolated position of Orbán in this regard is not a consequence of his comparatively large share of speeches in the corpus. Other speakers have similarly isolated positions despite their rather small number of speeches (e.g. Emmanuel Macron on ‘Collective Memory’ in Fig. 5, “Appendix” section, or Kim Jong Un on ‘Juche, Military’ in Fig. 4).

  19. The order of the speakers in these plots is based on their scores on our dictionary scale; the horizontal lines around the point estimates show the 95% confidence intervals for the relative proportions of each speaker.

  20. The way Orbán’s government attacks George Soros, a financier and philanthropist known for his pro-migration and liberal opinions, is a case sui generis in the European Union. Orbán’s most recent election campaign has been broadly understood as an anti-Jewish and anti-Muslim manifesto, for example by Cohen (2018) in the Guardian.

  21. For the specifics of illiberal and autocratic language styles, see Dowell et al. (2015), Windsor et al. (2015, 2017) and Maerz (2019).

  22. Lucas et al. (2015, Appendix E) provide more details about the graph estimation procedure which we have adopted here.

  23. Details can be found in the material available at

  24. Further information also here:


  • Adcock, R., Collier, D.: Measurement validity: a shared standard for qualitative and quantitative research. Am. Polit. Sci. Rev. 95(3), 529–546 (2001)

  • Arat, Z.F.: Democracy and Human Rights in Developing Countries. Lynne Rienner, Boulder (1991)

  • Benoit, K., Nulty, P., Obeng, A., Wang, H., Lauderdale, B., Lowe, W.: Quanteda package (2019). Retrieved from

  • Bocskor, Á.: Anti-immigration discourses in Hungary during the ‘Crisis’ year: The Orbán Government’s ‘National Consultation’ Campaign of 2015. Sociology 52(3), 551–568 (2018)

  • Boese, V.A.: How (not) to measure democracy. Int. Area Stud. Rev. (2019)

  • Bogaards, M.: De-democratization in Hungary: diffusely defective democracy. Democratization (2018).

  • Bollen, K.A.: Issues in the comparative measurement of political democracy. Am. Sociol. Rev. 45(3), 370–390 (1980)

  • Bozóki, A., Hegedűs, D.: An externally constrained hybrid regime: Hungary in the European Union. Democratization (2018).

  • Chang, J., Gerrish, S., Wang, C., Blei, D.M.: Reading tea leaves: how humans interpret topic models. Adv. Neural Inf. Process. Syst. 22, 288–296 (2009)

  • Cianetti, L., Dawson, J., Hanley, S.: Rethinking “democratic backsliding” in Central and Eastern Europe—looking beyond Hungary and Poland. East Eur. Polit. 34(3), 243–256 (2018).

  • Cohen, N.: In Hungary, the exploitation of a mythical enemy is poisoning politics. The Guardian, 31 Mar. 2018. (2018) Retrieved from

  • Coppedge, et al.: V-dem country-year dataset v8, Varieties of Democracy (V-Dem) Project (2018).

  • Dahl, R.A.: Polyarchy: participation and opposition. Yale University Press, New Haven (1971)

  • Dahl, R.A.: Democracy and its critics. Yale University Press, New Haven (1989)

  • Denny, M.J., Spirling, A.: Text preprocessing for unsupervised learning: why it matters, when it misleads, and what to do about it. Polit. Anal. 26, 168–189 (2018)

  • de Vries, E., Schoonvelde, M., Schumacher, G.: No longer lost in translation. Evidence that Google Translate Works for Comparative Bag-of-Words Text Applications, Political Analysis, Online First (2018). Retrieved from

  • Diamond, L.J.: Developing Democracy: Toward Consolidation. The Johns Hopkins University Press, Baltimore (1999)

  • Dowell, N.M., Windsor, L.C., Graesser, A.C.: Computational linguistics analysis of leaders during crises in authoritarian regimes. Dyn. Asymmetric Confl. 9(01–03), 1–12 (2015).

  • Dukalskis, A.: The Authoritarian Public Sphere—Legitimation and Autocratic Power in North Korea, Burma, and China. Routledge, London (2017)

  • Dukalskis, A., Gerschewski, J.: What autocracies say (and what citizens hear): proposing four mechanisms of autocratic legitimation. Contemp. Polit. 23, 251–268 (2017)

  • Easton, D.: A Systems Analysis of Political Life. Wiley, New York (1965)

  • Fauve, A.: Global Astana: nation branding as a legitimization tool for authoritarian regimes. Cent. Asian Surv. 34, 110–124 (2015).

  • Freedom House: Freedom House on Hungary in 2019 (2019). Retrieved from

  • Geddes, B., Frantz, E., Wright, J.: Autocratic breakdown and regime transitions: a new data set. Perspect. Polit. 12(02), 313–331 (2014)

  • Gerschewski, J.: The three pillars of stability: legitimation, repression, and cooptation in autocratic regimes. Democratization 20(1), 13–38 (2013)

  • Greene, D., O’Callaghan, D., Cunningham, P.: How many topics? Stability analysis for topic models. Lecture Notes in Computer Science, 8724 LNAI (PART 1), pp. 498–513 (2014)

  • Grimmer, J., Stewart, B.M.: Text as data: the promise and pitfalls of automatic content analysis methods for political texts. Polit. Anal. 21(03), 267–297 (2013)

  • Hanley, S., Vachudova, M.A.: Understanding the illiberal turn: democratic backsliding in the Czech Republic*. East Eur. Polit. 34(3), 276–296 (2018).

  • Hayes, A.F., Krippendorff, K.: Answering the call for a standard reliability measure for coding data. Commun. Methods Meas. 1(1), 77–89 (2007).

  • Higley, J., Burton, M.: Elite Foundations of Liberal Democracy. Rowman and Littlefield Publishers, Oxford (2006)

  • Inglehart, R., Welzel, C.: Modernization, Cultural Change, and Democracy: The Human Development Sequence. Cambridge University Press, Cambridge (2005)

  • Kirsch, H., Welzel, C.: Democracy misunderstood: authoritarian notions of democracy around the globe. Soc. Forces (2018).

  • Krekó, P., Enyedi, Z.: Orbán’s laboratory of illiberalism. J. Democr. 29(3), 39–51 (2018)

  • Langer, A.I., Sagarzazu, I.: Are all policy decisions equal? Explaining the variation in media coverage of the UK budget. Policy Stud. J. 45(2), 337–358 (2015).

  • Laver, M., Benoit, K., Garry, J.: Extracting policy positions from political texts using words as data. Am. Polit. Sci. Rev. 97(2), 311–331 (2003)

  • Linz, J.J., Stepan, A.C.: Problems of Democratic Transition and Consolidation: Southern Europe, South America, and Post-communist Europe. Johns Hopkins University Press, Baltimore (1996)

  • Lowe, W.: Yoshikoder: cross-platform multilingual content analysis. Java Software Version 0.6.5. (2015). Retrieved from

  • Lowe, W., Benoit, K., Mikhaylov, S., Laver, M.: Scaling policy preferences from coded political texts. Legis. Stud. Q. 36, 123–155 (2011)

  • Lucas, C., Nielsen, R.A., Roberts, M.E., Stewart, B.M., Storer, A., Tingley, D.: Computer-assisted text analysis for comparative politics. Polit. Anal. 23, 254–277 (2015)

  • Lührmann, A., Mechkova, V., Dahlum, S., Maxwell, L., Petrarca, C.S., Sigman, R., Lindberg, S.I.: State of the world 2017: autocratization and exclusion? Democratization (2018a).

  • Lührmann, A., Tannenberg, M., Lindberg, S.I.: Regimes of the world (RoW): opening new avenues for the comparative study of political regimes. Polit. Gov. 6(1), 60 (2018b)

  • Maerz, S.F.: Ma’naviyat in Uzbekistan: an ideological extrication from its Soviet past? J. Polit. Ideol. 23, 205–222 (2018a).

  • Maerz, S.F.: The many faces of authoritarian persistence. A set-theory perspective on the survival strategies of authoritarian regimes. Gov. Oppos. (2018b).

  • Maerz, S.F.: Simulating pluralism: the language of democracy in hegemonic authoritarianism. Polit. Res. Exch. (2019).

  • Maerz, S.F., Puschmann, C.: Text as data and automated content analysis for conflict research: a literature survey. In: Computational Conflict Research (Springer Nature as Part of Their Computational Social Sciences Series) (forthcoming) (2019)

  • Makarychev, A., Yatsyk, A.: Entertain and govern: from Sochi 2014 to FIFA 2018. Probl. Post Communism 65(2), 115–128 (2018)

  • Megoran, N.: Framing Andijon, narrating the nation: Islam Karimov’s account of the events of 13 May 2005. Cent. Asian Surv. 27(1), 15–31 (2008).

  • Merkel, W.: The consolidation of post-autocratic democracies: a multi-level model. Democratization 5(3), 33–67 (1998).

  • Munzert, S., Rubba, C., Meissner, P., Nyhuis, D.: Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining. Wiley, London (2015)

  • Nelson, L.K., Burk, D., Knudsen, M., McCall, L.: The future of coding: a comparison of hand-coding and three types of computer-assisted text analysis methods. Sociol. Methods Res. (2018).

  • Omelicheva, M.Y.: Authoritarian legitimation: assessing discourses of legitimacy in Kazakhstan and Uzbekistan. Cent. Asian Surv. 35(4), 481–500 (2016).

  • R Core Team: R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna (2019). Retrieved from

  • Roberts, M.E., Stewart, B.M., Tingley, D., Lucas, C., Leder-Luis, J., Gadarian, S.K., Rand, D.G.: Structural topic models for open-ended survey responses. Am. J. Polit. Sci. 58(4), 1064–1082 (2014)

  • Roberts, M.E., Stewart, B.M., Tingley, D., Benoit, K.: Package ‘stm’ (2015). Retrieved from

  • Roberts, M.E., Stewart, B.M., Tingley, D.: stm: R package for structural topic models. J. Stat. Softw. 1, 12 (2018)

  • Scheppele, K.L.: The rule of law and the Frankenstate: why governance checklists do not work. Governance 26(4), 559–562 (2013).

  • Schumpeter, J.A.: Capitalism, Socialism and Democracy, 1st edn. Harper Brothers, New York (1942)

  • Stanley, B.: Confrontation by default and confrontation by design: strategic and institutional responses to Poland’s populist coalition government. Democratization 23(2), 263–282 (2016)

  • Vanhanen, T.: Global trends of democratization in the 1990s: a statistical analysis, Berlin (1994)

  • Welzel, C.: Freedom Rising: Human Empowerment and the Quest for Emancipation. Cambridge University Press, Cambridge (2013)

  • Windsor, L., Dowell, N., Graesser, A.: The language of autocrats: leaders’ language in natural disaster crises. Risk Hazards Crisis Public Policy 5(4), 446–467 (2015)

  • Windsor, L., Dowell, N., Windsor, A., Kaltner, J.: Leader language and political survival strategies. Int. Interact. 44(2), 321–336 (2017).

  • Zakaria, F.: The rise of illiberal democracy. Foreign Aff. 76(6), 22–43 (1997)


Author information

Corresponding author

Correspondence to Seraphine F. Maerz.



Dictionary terms

figure a

Testing the effects of text cleaning procedures

Typically, text-as-data approaches based on the bag-of-words assumption rely on heavy preprocessing of the texts. Different options for preprocessing exist: capital letters can be converted to lowercase, stopwords removed, words lemmatized or stemmed, and so on. Preprocessing decisions can, and often do, influence the results of text analysis. It is therefore paramount to test how robust the results are to equally plausible preprocessing strategies.

Based on our theoretical conceptualization of illiberal and liberal rhetoric in the speeches of heads of government, we choose the following combination of preprocessing steps: removing punctuation (P), removing numbers (N), converting to lowercase (L), stemming (S), removing stopwords (W), and removing infrequently used terms (I). Concerning lowercase, we do not expect large effects on the results since this is a rather basic procedure. Because all our 4740 speeches were automatically collected from the Internet using web-scraping techniques, we expect most texts to include numbers and other (foreign) characters unrelated to the original text of the speech (e.g. hyperlinks, frames in foreign languages, etc.). This is why we deemed it necessary to remove punctuation and numbers. We also removed infrequently used terms and an individually compiled list of stopwords (e.g. foreign language letters and strange words which we considered not relevant for the analysis; see footnote 23).
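To make the six steps concrete, the following is a minimal Python sketch of a switchable P, N, L, S, W, I pipeline. It is not the authors' code (they report working in R); the function names, the tiny stopword list, the crude suffix stripper standing in for a real stemmer, and the `min_doc_share` threshold are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical stand-ins: a tiny stopword list and a crude suffix stripper.
# A real analysis would use a full stopword list and a proper stemmer.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "we"}
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(token):
    """Strip one common English suffix; an illustrative placeholder for stemming."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(docs, steps="PNLSWI", min_doc_share=0.0):
    """Apply the selected steps to a list of raw strings and return token lists.

    steps: any subset of the letters P, N, L, S, W, I.
    min_doc_share: with I, drop terms found in fewer than this share of documents.
    """
    tokenised = []
    for text in docs:
        if "L" in steps:
            text = text.lower()
        if "P" in steps:
            text = re.sub(r"[^\w\s]", " ", text)  # punctuation -> space
        if "N" in steps:
            text = re.sub(r"\d+", " ", text)  # digit runs -> space
        tokens = text.split()
        if "W" in steps:
            tokens = [t for t in tokens if t.lower() not in STOPWORDS]
        if "S" in steps:
            tokens = [crude_stem(t) for t in tokens]
        tokenised.append(tokens)
    if "I" in steps:
        # Document frequency: in how many documents does each term occur?
        doc_freq = Counter(t for tokens in tokenised for t in set(tokens))
        cutoff = min_doc_share * len(docs)
        tokenised = [[t for t in tokens if doc_freq[t] >= cutoff] for tokens in tokenised]
    return tokenised
```

Calling `preprocess(docs, steps="PNLSWI", min_doc_share=0.5)` applies all six steps, while passing a shorter string such as `"PN"` applies only a subset, which is what a robustness comparison across specifications requires.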

In the following, we investigate the potential of our preprocessing strategy (PNLSWI) to significantly affect our results. In essence, the diagnostic procedure suggested by Denny and Spirling tests how much documents ‘move’ depending on the applied preprocessing features. It does so by calculating the so-called preText score, which results from comparing pairwise distances between documents across a number of preprocessing specifications (cf. Denny and Spirling 2018). For this, we draw a random sample of documents that have not been preprocessed from our corpus (500, as recommended by Denny and Spirling 2018, p. 185) and perform several diagnostics to measure the potential effects of different (combinations of) preprocessing steps. All operations are done in R with Denny and Spirling’s package preText (see footnote 24).
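The idea of documents ‘moving’ can be illustrated with a simplified Python sketch. It ranks all document pairs by cosine distance under a baseline and under a preprocessed version of the same documents, then reports the average rank shift. This is only a rough analogue of the preText score, not the algorithm implemented in the preText package; `movement_score` and the other names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_distance(tokens_a, tokens_b):
    """Cosine distance between two bag-of-words token lists."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def pair_ranks(tokenised_docs):
    """Rank all document pairs from most similar (rank 0) to least similar."""
    pairs = list(combinations(range(len(tokenised_docs)), 2))
    dist = {p: cosine_distance(tokenised_docs[p[0]], tokenised_docs[p[1]]) for p in pairs}
    # Ties are broken by the pair's indices so the ranking is deterministic.
    ordered = sorted(pairs, key=lambda p: (dist[p], p))
    return {p: rank for rank, p in enumerate(ordered)}


def movement_score(baseline_docs, processed_docs):
    """Mean absolute change in pair rank, divided by the largest shift one pair could make."""
    base, proc = pair_ranks(baseline_docs), pair_ranks(processed_docs)
    if len(base) < 2:
        return 0.0
    shifts = [abs(base[p] - proc[p]) for p in base]
    return sum(shifts) / (len(shifts) * (len(base) - 1))
```

A score of 0 means the preprocessing left the similarity ordering of document pairs untouched; larger values mean that pairs changed places, which is the kind of sensitivity the diagnostic is meant to flag.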

The first of the package’s two core functions produces the plot shown in Fig. 6. It displays the distance scores (preText scores) for a number of combinations of preprocessing steps. Higher scores mean larger effects on the results. The plot shows that our chosen combination of preprocessing features (PNLSWI, marked with a green line) is in the medium range compared to other preprocessing specifications. This indicates that our chosen text preprocessing can be expected to have a comparatively moderate effect on the results.

Fig. 6 Assessing the expected effects of our P–N–L–S–W–I preprocessing procedure on our corpus’ analysis

A second plot, Fig. 7, shows regression coefficients for each individual text cleaning feature. Negative coefficients indicate that a step tends to reduce the unusualness of the results; positive coefficients indicate that applying the step is likely to produce more unusual results for our corpus. The plot shows that removing stopwords has a particularly high potential to reduce the unusualness of our results, while removing punctuation has a high potential to increase it. Yet, as explained above, we assume that this unusualness in the not yet preprocessed documents is caused by the web-scraping procedure and thus consists of frequent foreign language letters, word fragments and punctuation which are not relevant for our analysis of illiberal and liberal language styles. That is why we can safely remove such stopwords and punctuation during preprocessing despite the expected effect.
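How a per-step coefficient can be read is sketched below in Python, under a simplifying assumption: when scores are available for every on/off combination of the steps (a full factorial design), the coefficient on a step's indicator in a main-effects linear regression equals the difference between the mean score of specifications that include the step and the mean score of those that do not. The function names are hypothetical and the sketch does not reproduce the preText package's own regression.

```python
from itertools import product
from statistics import mean

STEPS = "PNLSWI"


def all_specifications(steps=STEPS):
    """Enumerate every on/off combination of the steps as a tuple of 0/1 flags."""
    return list(product((0, 1), repeat=len(steps)))


def main_effects(scores, steps=STEPS):
    """Difference in mean score between specifications with and without each step.

    scores maps each 0/1 specification tuple to its score; it must cover the
    full factorial design for the difference in means to match the OLS coefficient.
    """
    specs = all_specifications(steps)
    missing = [s for s in specs if s not in scores]
    if missing:
        raise ValueError(f"scores missing for {len(missing)} specifications")
    effects = {}
    for j, step in enumerate(steps):
        with_step = mean(scores[s] for s in specs if s[j] == 1)
        without_step = mean(scores[s] for s in specs if s[j] == 0)
        effects[step] = with_step - without_step
    return effects
```

A negative entry in the returned dictionary corresponds to a step that lowers the score on average, a positive entry to a step that raises it, mirroring the reading of the coefficients described above.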

Overall, based on these diagnostics and robustness tests, we conclude that our combined text cleaning procedures have relatively low effects on the results of our corpus analysis and that, given our theoretical reasoning about preprocessing, we can accept the expectedly high effects of removing stopwords for our corpus.

Fig. 7 Assessing the expected effects of single preprocessing features on our corpus’ analysis (Denny and Spirling 2018)

Additional material for a 14-topic STM

See Figs. 8, 9 and 10.

Fig. 8 The coefficients and confidence intervals in each plot show the estimated proportions of selected topics in all speeches per speaker, PART I

Fig. 9 The coefficients and confidence intervals in each plot show the estimated proportions of selected topics in all speeches per speaker, PART II

Fig. 10 The coefficients and confidence intervals in each plot show the estimated proportions of selected topics in all speeches per speaker, PART III

About this article

Cite this article

Maerz, S.F., Schneider, C.Q. Comparing public communication in democracies and autocracies: automated text analyses of speeches by heads of government. Qual Quant 54, 517–545 (2020).

  • Political speeches of autocrats and democrats
  • Democratic backsliding
  • Qualities of democracy
  • Text-as-data
  • Dictionary
  • Topic models