
1 Introduction

Language is an important form of data in politics. Constituents express their stances and needs in text such as social media posts and survey responses. Politicians conduct campaigns through debates, statements of policy positions, and social media. Government staff need to compile information from various documents to assist in decision-making. Textual data is also prevalent in the documents and debates of the legislative process, in negotiations and treaties to resolve international conflicts, and in media such as news reports, social media, party platforms, and manifestos.

Natural language processing (NLP) is the study of computational methods to automatically analyse text and extract meaningful information for subsequent analysis. The importance of NLP for policymaking has been highlighted since the last century (Gigley, 1993). With the recent success of NLP and its versatility across tasks such as classification, information extraction, summarization, and translation (Brown et al., 2020; Devlin et al., 2019), there is a rising trend to integrate NLP into policy decisions and public administration (Engstrom et al., 2020; Misuraca et al., 2020; Van Roy et al., 2021). Main applications include extracting useful, condensed information from free-form text (Engstrom et al., 2020) and analysing sentiment and citizen feedback with NLP (Biran et al., 2022), as in many projects funded by the EU Horizon programme (European Commission, 2017). Driven by the broad applications of NLP (Jin et al., 2021a), the research community has also started to connect NLP with various social applications in the fields of computational social science (Engel et al., 2021; Lazer et al., 2009; Luz, 2022; Shah et al., 2015) and political science in particular (Glavaš et al., 2019; Grimmer & Stewart, 2013).

We show an overview of NLP for policymaking in Fig. 7.1. Following this overview, the chapter consists of three parts. First, we introduce in Sect. 7.2 NLP methods that are applicable to political science, including text classification, topic modelling, event extraction, and score prediction. Next, we cover a variety of cases where NLP can be applied to policymaking in Sect. 7.3. Specifically, we cover four stages: analysing data for evidence-based policymaking, interpreting political decisions, improving policy communication with the public, and investigating policy effects. Finally, we discuss limitations and ethical considerations when using NLP for policymaking in Sect. 7.4.

Fig. 7.1 Overview of NLP for policymaking

2 NLP for Text Analysis

NLP brings powerful computational tools to analyse textual data (Jurafsky & Martin, 2000). Depending on the type of information we want to extract from the text, we introduce four different NLP tools for analysing text data: text classification (where the extracted information is the category of the text), topic modelling (where the extracted information is the key topics in the text), event extraction (where the extracted information is the list of events mentioned in the text), and score prediction (where the extracted information is a score for the text). Table 7.1 lists each method with the type of information it can extract and some example application scenarios, which we detail in the following subsections.

Table 7.1 Four common NLP methods, the type of information extracted by each of them, and example applications

2.1 Text Classification

As one of the most common types of text analysis, text classification reads in a piece of text and predicts its category using an NLP model, as in Fig. 7.2.

Fig. 7.2 The usage and example applications of text classification on political text

There are many off-the-shelf tools for text classification (Brown et al., 2020; Loria, 2018; Yin et al., 2019), such as the implementation (see footnote 1) using the Python package transformers (Wolf et al., 2020). A well-known subtask of text classification is sentiment classification (also known as sentiment analysis or opinion mining), which aims to identify subjective information in the text, such as positive or negative sentiment (Pang & Lee, 2007). However, off-the-shelf tools only perform well on categories that are common and easy to predict. If the categorization is customized and very specific to a study context, there are two common solutions. One is to use dictionary-based methods, which rely on a list of frequent keywords that correspond to a certain category (Albaugh et al., 2013) or on general linguistic dictionaries such as the Linguistic Inquiry and Word Count (LIWC) dictionary (Pennebaker et al., 2001). The second is to adopt a data-driven pipeline: hand-code a set of documents into a predetermined set of categories, train an NLP model to learn the text classification task (Sun et al., 2019), and verify the performance of the NLP model on a held-out subset of the data, as introduced in Grimmer and Stewart (2013). An example of adapting state-of-the-art NLP models to a customized dataset is demonstrated in this guide (see footnote 2).
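To make this concrete, below is a minimal sketch of zero-shot text classification with the transformers pipeline, in the spirit of Yin et al. (2019); the model name and candidate labels are illustrative assumptions, not the only possible choices.

```python
# A minimal zero-shot classification sketch with the transformers pipeline,
# in the spirit of Yin et al. (2019). Model name and labels are illustrative.
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

text = "The Senate passed a bill to increase funding for renewable energy."
labels = ["environment", "healthcare", "defence", "taxation"]

result = classifier(text, candidate_labels=labels)
# result["labels"] is sorted by score; the top entry is the predicted category
print(result["labels"][0], round(result["scores"][0], 3))
```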

Using the text classification method, we can automate many types of analyses in political science. As listed in the examples in Fig. 7.2, researchers can detect the political perspective of news articles (Huguet Cabot et al., 2020), the stance of media on a certain topic (Luo et al., 2020), whether campaigns use positive or negative sentiment (Ansolabehere & Iyengar, 1995), which issue area a piece of legislation is about (Adler & Wilkerson, 2011), topics in parliamentary speech (Albaugh et al., 2013; Osnabrügge et al., 2021), congressional bills (Collingwood & Wilkerson, 2012; Hillard et al., 2008) and political agendas (Karan et al., 2016), whether an international statement is peaceful or belligerent (Schrodt, 2000), whether a speech contains positive or negative sentiment (Schumacher et al., 2016), and whether a US Circuit Court case decision is conservative or liberal (Hausladen et al., 2020). Moreover, text classification can also be used to categorize the type of language devices that politicians use, such as what type of framing a text uses (Huguet Cabot et al., 2020) and whether a tweet uses political parody (Maronikolakis et al., 2020).

2.2 Topic Modelling

Topic modelling is a method to uncover a list of frequent topics in a corpus of text. For example, news articles that are against vaccination might frequently mention the topic “autism”, whereas news articles supporting vaccination are more likely to mention “immune” and “protective”. One of the most widely used models is Latent Dirichlet Allocation (LDA) (Blei et al., 2001), available in Python packages such as Gensim (commonly used together with NLTK for preprocessing), as in this guide (see footnote 3).

Specifically, LDA is a probabilistic model that represents each topic as a mixture of words and each textual document as a mixture of topics. As in Fig. 7.3, given a collection of textual documents, LDA topic modelling generates a list of topic clusters, where the number N of topics is customized by the analyst. If needed, LDA can also produce a representation of each document as a weighted list of topics. While the number of topics is often predetermined by the analyst, it can also be determined dynamically by measuring the perplexity of the resulting topic models. In addition to LDA, other topic modelling algorithms have been used extensively, such as those based on principal component analysis (PCA) (Chung & Pennebaker, 2008).

Fig. 7.3 Given a collection of text documents, topic modelling generates a list of topic clusters
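As an illustration of this workflow, the following sketch fits an LDA model with Gensim; the toy corpus, the choice of N = 2 topics, and the minimal preprocessing are assumptions made for brevity.

```python
# A minimal LDA sketch with Gensim; the toy corpus stands in for a real
# collection of political documents, and preprocessing is kept deliberately
# minimal (real pipelines typically tokenize, lemmatize, and remove
# stopwords, e.g. with NLTK).
from gensim import corpora, models

docs = [
    "vaccine autism risk children",
    "vaccine immune protective disease",
    "tax cut economy growth jobs",
]
texts = [doc.split() for doc in docs]

dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

# Fit LDA with N topics; N is set by the analyst
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, random_state=0)
for topic_id, words in lda.print_topics():
    print(topic_id, words)

# Each document as a weighted mixture of topics
print(lda.get_document_topics(corpus[0]))

# A held-out likelihood bound (related to perplexity) can guide the choice of N
print(lda.log_perplexity(corpus))
```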

Topic modelling, as described in this section, can facilitate various studies on political text. Previous studies analysed the topics of legislative speech (Quinn et al., 2006, 2010), Senate press releases (Grimmer, 2010a), and electoral manifestos (Menini et al., 2017).

2.3 Event Extraction

Event extraction is the task of extracting a list of events from a given text. It is a subtask of a larger domain of NLP called information extraction (Manning et al., 2008). For example, the sentence “Israel bombs Hamas sites in Gaza” expresses an event “Israel \(\xrightarrow{\mathit{bombs}}\) Hamas sites” with the location “Gaza”. Event extraction usually incorporates both entity extraction (e.g. Israel, Hamas sites, and Gaza in the previous example) and relation extraction (e.g. “bombs” in the previous example).

Event extraction is a handy tool to monitor events automatically, such as detecting news events (Mitamura et al., 2017; Walker et al., 2006) and international conflicts (Azar, 1980; Trappl, 2006). To foster research on event extraction, there have been tremendous efforts in textual data collection (McClelland, 1976; Merritt et al., 1993; Raleigh et al., 2010; Schrodt & Hall, 2006; Sundberg & Melander, 2013), event coding schemes to accommodate different political events (Bond et al., 1997; Gerner et al., 2002; Goldstein, 1992), and dataset validity assessment (Schrodt & Gerner, 1994).

As for event extraction models, similarly to text classification models, there are off-the-shelf tools such as the Python packages stanza (Qi et al., 2020) and spaCy (Honnibal et al., 2020). For customized sets of event types, researchers can also train NLP models on a collection of textual documents with event annotations (Hogenboom et al., 2011; Liu et al., 2020, inter alia).
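As a rough illustration, the sketch below uses spaCy to extract named entities and simple subject-verb-object triples from the dependency parse; this only approximates event extraction on the example sentence above, whereas a production system would be trained on event-annotated data.

```python
# A rough event extraction sketch with spaCy: named entities plus crude
# subject-verb-object triples from the dependency parse. This is only an
# approximation; a full system would be trained on event-annotated data.
import spacy

nlp = spacy.load("en_core_web_sm")  # first: python -m spacy download en_core_web_sm
doc = nlp("Israel bombs Hamas sites in Gaza.")

# Entity extraction
for ent in doc.ents:
    print(ent.text, ent.label_)  # e.g. "Israel" GPE, "Gaza" GPE

# Crude relation extraction: (subject, verb, object) triples
for token in doc:
    if token.pos_ == "VERB":
        subjects = [c for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
        objects = [c for c in token.children if c.dep_ == "dobj"]
        for subj in subjects:
            for obj in objects:
                print((subj.text, token.lemma_, obj.text))
```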

2.4 Score Prediction

NLP can also be used to predict a score given input text. A useful application is political text scaling, which aims to predict a score (e.g. left-to-right ideology, emotionality, and different attitudes towards the European integration process) for a given piece of text (e.g. political speeches, party manifestos, and social media posts) (Gennaro & Ash, 2021; Laver et al., 2003; Lowe et al., 2011; Slapin & Proksch, 2008, inter alia).

Traditional models for text scaling include Wordscores (Laver et al., 2003) and WordFish (Lowe et al., 2011; Slapin & Proksch, 2008). Recent NLP models represent the text by high-dimensional vectors learned by neural networks to predict the scores (Glavaš et al., 2017b; Nanni et al., 2019). One way to use NLP models is to apply off-the-shelf general-purpose models such as InstructGPT (Ouyang et al., 2022) and design a prompt that specifies the type of scaling to the API (see footnote 4), or to borrow existing, trained NLP models if the same type of scaling has been studied by previous researchers. Another way is to collect a dataset of text with hand-coded scales and train NLP models to learn to predict the scale, similar to the practice in Gennaro and Ash (2021) and Slapin and Proksch (2008), inter alia.
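As a sketch of the second, data-driven route, one can embed each text with a pre-trained encoder and fit a simple regressor on hand-coded scores; the encoder name, the toy texts, and the hypothetical left-right scores below are illustrative assumptions.

```python
# A minimal text-scaling sketch: embed texts with a pre-trained encoder and
# fit a regressor on hand-coded scores. The encoder name, the toy texts, and
# the left(-1)-to-right(+1) scores are all illustrative assumptions.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import Ridge

texts = [
    "We must cut taxes and shrink government.",
    "Public healthcare should be expanded for all.",
]
scores = [0.8, -0.7]  # hypothetical hand-coded ideology scores

encoder = SentenceTransformer("all-MiniLM-L6-v2")
X = encoder.encode(texts)

model = Ridge().fit(X, scores)
print(model.predict(encoder.encode(["Lower corporate taxes boost growth."])))
```

In practice, one would train on hundreds or thousands of hand-coded documents and validate the predictions on a held-out subset, mirroring the practice described above.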

3 Using NLP for Policymaking

In the political domain, there are large amounts of textual data to analyse (Neuendorf & Kumar, 2015), such as parliamentary debates (Van Aggelen et al., 2017), speeches (Schumacher et al., 2016), legislative text (Baumgartner et al., 2006; Bevan, 2017), databases of political parties worldwide (Döring & Regel, 2019), and expert survey data (Bakker et al., 2015). Since it is tedious to hand-code all this text, NLP provides a low-cost tool to analyse it automatically at scale.

In this section, we introduce four major areas where NLP can help policymaking: before policies are made, researchers can use NLP to analyse data and extract key information for evidence-based policymaking (Sect. 7.3.1); after policies are made, researchers can interpret the priorities among and reasons behind political decisions (Sect. 7.3.2); researchers can also analyse features of the language politicians use when communicating policies to the public (Sect. 7.3.3); and finally, after policies have taken effect, researchers can investigate their effectiveness (Sect. 7.3.4).

3.1 Analysing Data for Evidence-Based Policymaking

A major use of NLP is to extract information from large collections of text. This function can be very useful for analysing the views and needs of constituents, so that policymakers can make decisions accordingly.

As in Fig. 7.4, we will explain how NLP can be used to analyse data for evidence-based policymaking from three aspects: data, information to extract, and political usage.

Fig. 7.4 NLP to analyse data for evidence-based policymaking

Data

Data is the basis of such analyses. Large amounts of textual data can reveal information about constituents, media outlets, and influential figures. The data can come from a variety of sources, including social media such as Twitter and Facebook, survey responses, and news articles.

Information to Extract

Based on large textual corpora, NLP models can be used to extract information that is useful for political decision-making, ranging from information about people, such as sentiment (Rosenthal et al., 2015; Thelwall et al., 2011), stance (Gottipati et al., 2013; Luo et al., 2020; Stefanov et al., 2020; Thomas et al., 2006), ideology (Hirst et al., 2010; Iyyer et al., 2014; Preoţiuc-Pietro et al., 2017), and reasoning on certain topics (Camp et al., 2021; Demszky et al., 2019; Egami et al., 2018), to factual information, such as main topics (Gottipati et al., 2013), events (Ding & Riloff, 2018; Ding et al., 2019; Mitamura et al., 2017; Trappl, 2006), and needs (Crayton et al., 2020; Paul & Frank, 2019; Sarol et al., 2020) expressed in the data. The extracted information can be not only about people but also about political entities, such as the left-right political scales of parties and political actors (Glavaš et al., 2017b; Slapin & Proksch, 2008), which claims are raised by which politicians (Blessing et al., 2019; Padó et al., 2019), and the legislative body’s vote breakdown for state bills by backgrounds such as gender, rural-urban, and ideological splits (Davoodi et al., 2020).

To extract such information from text, we can often utilize the main NLP tools introduced in Sect. 7.2, including text classification, topic modelling, event extraction, and score prediction (especially text scaling to predict left-to-right ideology). In the NLP literature, social media such as Twitter is a popular source of textual data for collecting public opinion (Arunachalam & Sarkar, 2013; Pak & Paroubek, 2010; Paltoglou & Thelwall, 2012; Rosenthal et al., 2015; Thelwall et al., 2011).
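As a concrete illustration of this pipeline, the sketch below applies an off-the-shelf sentiment classifier to a handful of posts and aggregates the predictions into a simple opinion summary; the posts are invented for illustration.

```python
# A sketch of aggregating public sentiment from social media posts with an
# off-the-shelf classifier; the posts are invented for illustration.
from collections import Counter
from transformers import pipeline

classifier = pipeline("sentiment-analysis")  # downloads a default English model

posts = [
    "The new transit plan will finally fix my commute!",
    "Another tax hike? This policy is a disaster.",
    "Glad to see funding for local schools increase.",
]

labels = [classifier(post)[0]["label"] for post in posts]
counts = Counter(labels)
print(counts)  # e.g. Counter({'POSITIVE': 2, 'NEGATIVE': 1})
print(f"share positive: {counts['POSITIVE'] / len(posts):.2f}")
```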

Political Usage

Such information extracted from data is highly valuable for political usage. For example, voters’ sentiment, stance, and ideology are important supplements to traditional polls and surveys for gathering information about constituents’ political leaning. Identifying the needs expressed by people is another important survey target, which helps politicians understand which needs they should address and match those needs with available resources (Hiware et al., 2020).

Among the more specific political uses is understanding public opinion on parties and presidents, as well as on certain topics. Public sentiment towards parties (Pla & Hurtado, 2014) and the president (Marchetti-Bowick & Chambers, 2012) can supplement traditional approval-rating surveys, and stances towards certain topics (Gottipati et al., 2013; Luo et al., 2020; Stefanov et al., 2020) can be important information for legislators making decisions on debatable issues such as abortion, taxes, and the legalization of same-sex marriage. Many existing studies use NLP on social media text to predict election results (Beverungen & Kalita, 2011; Mohammad et al., 2015; O’Connor et al., 2010; Tjong Kim Sang & Bos, 2012; Unankard et al., 2014). In general, big data-driven analyses can help decision-makers collect more feedback from people and society, enabling policymakers to stay closer to citizens and increasing transparency and engagement in political issues (Arunachalam & Sarkar, 2013).

3.2 Interpreting Political Decisions

After policies are made, political scientists and social scientists can use textual data to interpret political decisions. As in Fig. 7.5, there are two major use cases: mining political agendas and discovering policy responsiveness.

Fig. 7.5 NLP to interpret political decisions

Mining Political Agendas

Researchers can use textual data to infer political agendas, including the topics that politicians prioritize, political events, and different political actors’ stances on certain topics. Such data can come from press releases, legislation, and electoral campaigns. Examples of previous studies analysing the topics and prioritization of political bodies include research on the prioritization each senator assigns to topics using press releases (Grimmer, 2010b), topics in different parties’ electoral manifestos (Glavaš et al., 2017a), topics in EU parliament speeches (Lauscher et al., 2016) and various other types of text (Grimmer, 2010a; Hopkins & King, 2010; King & Lowe, 2003; Roberts et al., 2014), as well as political event detection from congressional text and news (Nanni et al., 2017).

Research on politicians’ stances includes identifying the policy positions of politicians (Laver et al., 2003; Lowe et al., 2011; Slapin & Proksch, 2008; Winter & Stewart, 1977, inter alia), how different politicians agree or disagree on certain topics in electoral campaigns (Menini & Tonelli, 2016), and assessments of political personalities (Immelman, 1993).

Further studies look into how political interests affect legislative behaviour. Legislators tend to show strong personal interest in the issues that come before their committees (Fenno, 1973), and Mayhew (2004) finds that senators relying on appropriations secured for their state have a strong incentive to support legislation that allows them to secure particularistic goods.

Discovering Policy Responsiveness

Policy responsiveness is the study of how policies respond to different factors, such as how changes in public opinion lead to responses in public policy (Stimson et al., 1995). One major finding is that politicians tend to make policies that align with the expectations of their constituents, in order to run a successful re-election campaign in the next term (Canes-Wrone et al., 2002). Studies show that the policy preferences of a state’s public can be a predictor of future state policies (Caughey & Warshaw, 2018). For example, Lax and Phillips (2009) show that more LGBT tolerance leads to more pro-gay legislation in response.

A recent study by Jin et al. (2021b) uses NLP to analyse over 10 million COVID-19-related tweets targeted at US governors; using classification models to obtain public sentiment, they study how public sentiment leads to the COVID-19 policy decisions made by US governors. Such use of NLP on massive textual data contrasts with traditional studies of policy responsiveness, which span several decades and use manually collected survey results (Caughey & Warshaw, 2018; Lax & Phillips, 2009, 2012).

3.3 Improving Policy Communication with the Public

Policy communication is the study of how politicians present policies to their constituents. As in Fig. 7.6, common research questions in policy communication include how politicians establish their images (Fenno, 1978), for example through campaign strategies (Petrocik, 1996; Sigelman & Buell Jr, 2004; Simon, 2002), how constituents allocate credit, what receives attention in Congress (Sulkin, 2005), and what receives attention in news articles (Armstrong et al., 2006; McCombs & Valenzuela, 2004; Semetko & Valkenburg, 2000).

Fig. 7.6 NLP to analyse policy communication

Based on data from press releases, political statements, electoral campaigns, and news articles (see footnote 5), researchers usually analyse two types of information: the language techniques politicians use and the contents, such as topics and underlying moral foundations, of these textual documents.

Language Techniques

Policy communication largely focuses on the types of language that politicians use. Researchers are interested in first analysing the language techniques in political texts; based on these techniques, they can then dive into the questions of why politicians use them and what the effects of such usage are.

For example, previous studies analyse what portions of political texts are position-taking versus credit-claiming (Grimmer, 2013; Grimmer et al., 2012), whether the claims are vague or concrete (Baerg et al., 2018; Eichorst & Lin, 2019), the frequency of credit-claiming messages versus the actual amount of contributions (Grimmer et al., 2012), and whether politicians tend to make credible or dishonourable promises (Grimmer, 2010b). Within the political statements, it is also interesting to check the ideological proportions (Sim et al., 2013) and how politicians make use of dialectal variations and code-mixing (Sravani et al., 2021).

Representation styles usually affect the effectiveness of policy communication; examples include the role of language ambiguity in framing the political agenda (Campbell, 1983; Page, 1976) and the effect of credit-claiming messages on constituents’ allocation of credit (Grimmer et al., 2012).

Contents

The contents of policy communication include the topics in political statements, such as what senators discuss in floor statements (Hill & Hurley, 2002) and what presidents address in daily speeches (Lee, 2008), as well as the moral foundations underlying politicians’ tweets (Johnson & Goldwasser, 2018).

Using the extracted content information, researchers can explore further questions such as whether competing politicians or political elites emphasize the same issues (Gabel & Scheve, 2007; Petrocik, 1996) and how the priorities politicians articulate co-vary with the issues discussed in the media (Bartels, 1996). Another open research direction is to analyse the interaction between newspapers and politicians’ messages, such as how often and in what way newspapers cover a certain politician’s message, and how such coverage affects incumbency advantage.

Meaningful Future Work

Apart from analysing the language of existing political texts, which aims to maximize political interests, a more meaningful question for society is how to improve policy communication to steer towards a more beneficial future for society as a whole. There is relatively little research on this topic, and we welcome future work on it.

3.4 Investigating Policy Effects

After policies take effect, it is important to collect feedback and evaluate their effectiveness. Existing studies evaluate the effects of policies along different dimensions. One dimension is the change in public sentiment, which can be analysed by comparing sentiment classification results before and after a policy, following a paradigm similar to that in Sect. 7.3.1. There are also studies on how policies affect the public’s perception of the democratic process (Miller et al., 1990).

Another dimension is how policies result in economic changes. Calvo-González et al. (2018) investigate the negative consequences of policy volatility, which harms long-term economic growth. Specifically, to measure policy volatility, they first obtain the main topics by topic modelling on presidential speeches and then analyse how the significance of these topics changes over time.

4 Limitations and Ethical Considerations

There are several limitations that researchers and policymakers need to take into consideration when using NLP for policymaking, due to the data-driven and black-box nature of modern NLP. First, the effectiveness of computational models relies on the quality and comprehensiveness of the data. Although many political discourses are public, including data sources such as news, press releases, legislation, and campaigns, when it comes to surveying public opinion, social media might be a biased representation of the whole population. Therefore, when making important policy decisions, traditional polls and surveys can provide more comprehensive coverage. Note that in the case of traditional polls, NLP can still be helpful in expediting the processing of survey answers.

The second concern is the black-box nature of modern NLP models. We do not encourage decision-making systems to depend fully on NLP, but suggest that NLP can assist human decision-makers. Hence, all the applications introduced in this chapter use NLP to compile information that is necessary for policymaking instead of directly suggesting a policy. Nonetheless, some models are hard to interpret or explain, such as text classification using deep learning models (Brown et al., 2020; Yin et al., 2019), which can be vulnerable to adversarial attacks through small paraphrases of the text input (Jin et al., 2020). In practical applications, it is important to ensure the trustworthiness of the use of AI. Transparent machine learning models may be preferable when they can do the work well (e.g. LDA topic models and traditional classification methods using dictionaries or linguistic rules), as may tasks with well-controlled outputs, such as event extraction that selects spans of the given text that mention events. In cases where only deep learning models can provide good performance, there should be more detailed performance analysis (e.g. a study to check the correlation of model decisions with human judgments), error analysis (e.g. different types of errors, failure modes, and potential bias towards certain groups), and studies of the interpretability of the model (e.g. feature attribution of the model, visualization of its internal states).
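For instance, a minimal version of the correlation check mentioned above is to compute an agreement statistic such as Cohen’s kappa between model decisions and human judgments; the label lists below are placeholders for real annotations.

```python
# A minimal check of agreement between model decisions and human judgments
# using Cohen's kappa; the label lists are placeholders for real annotations.
from sklearn.metrics import cohen_kappa_score

human_labels = ["pos", "neg", "neg", "pos", "neu", "pos"]
model_labels = ["pos", "neg", "pos", "pos", "neu", "neg"]

kappa = cohen_kappa_score(human_labels, model_labels)
print(f"Cohen's kappa: {kappa:.2f}")  # 1.0 = perfect agreement, 0 = chance level
```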

Apart from the limitations of the technical methodology, there are also ethical considerations arising from the use of NLP. Among the use cases introduced in this chapter, some applications of NLP are relatively safe, as they mainly involve analysing public political documents and fact-based evidence or effects of policies. However, others could be concerning and vulnerable to misuse. For example, although effective, truthful policy communication is beneficial for society, it might be tempting to overdo policy communication and optimize for votes by all means. As it is highly important for governments and politicians to gain positive public perception, over-optimizing policy communication might lead to propaganda, intrusion on data privacy to collect more user preferences, and, in more severe cases, surveillance and violations of human rights. Hence, there is a strong need for policies to regulate the use of technologies that influence public opinion and pose a challenge to democracy.

5 Conclusions

This chapter provided a brief overview of current research directions in NLP that provide support for policymaking. We first introduced four main NLP tasks that are commonly used in text analysis: text classification, topic modelling, event extraction, and text scaling. We then showed how these methods can be used in policymaking for applications such as data collection for evidence-based policymaking, interpretation of political decisions, policy communication, and investigation of policy effects. We also discussed potential limitations and ethical considerations of which researchers and policymakers should be aware.

NLP holds significant promise for enabling data-driven policymaking. In addition to the tasks overviewed in this chapter, we foresee that other NLP applications, such as text summarization (e.g. to condense information from large documents), question answering (e.g. for reasoning about policies), and culturally adjusted machine translation (e.g. to facilitate international communications), will soon find use in policymaking. The field of NLP is quickly advancing, and close collaborations between NLP experts and public policy experts will be key to the successful use and deployment of NLP tools in public policy.