Towards a large scale analysis of claims: developing a machine learning method for detecting and classifying politicians’ claims of representation

Gevers, Ine; De Mulder, August; Daelemans, Walter

doi:10.1007/s42001-024-00261-y

Research Article
Published: 16 March 2024

Journal of Computational Social Science (2024)

Abstract

In recent decades, many theoreticians have argued that we should pay attention to the claims of representation politicians make about groups in society. Nevertheless, despite recent advances on this topic, empirical research on politicians’ claims of representation remains relatively scant and mostly limited to case studies and manual annotation. Therefore, we develop a reusable Natural Language Processing (NLP) system to automatically classify claims by Dutch-speaking Belgian politicians. Following our new operationalization of claims of representation, which includes six constitutive elements, we use a limited amount of manually annotated data to train NLP models to automatically extract and classify these six elements. Our results show that using a combination of transformer learning (such as BERT), classic machine learning algorithms (such as SVMs), and rule-based methods, we can successfully classify each element of claims of representation, with macro F1-scores between 0.61 and 0.91. Taking all elements into account, we are able to correctly classify 74% of all detected claims in Belgian politicians’ Facebook posts between 2010 and 2022. Being the first to automate this process, this study contributes to the literature by offering a tested and validated method for classifying claims in politicians’ communication, thereby allowing large scale, and longitudinal analysis of claims. In the last section of this article, we further demonstrate some of the possibilities of our models by analyzing which groups politicians claimed to represent in the years before and after the start of the COVID-19 pandemic.

Data availability

The supporting data were scraped from public Facebook pages of active Flemish politicians using the open-source scraping tool CrowdTangle (https://www.crowdtangle.com/). The originally scraped dataset can be shared upon request to the authors.

Notes

Due to our annotation set-up, ‘neutral’ and ‘unclear’ references to groups were already filtered out before student annotators (and later, the models) had to assign deservingness (and linkage, see below) of the claim. That is, based on the combination of a mentioned group and representation, the first binary decision was whether the text unit contained a claim, yes or no. Since representation already requires taking some sort of stance for or against a group, neutral mentions were already filtered out in this first stage. Similarly, if it was unclear whether an “issue is presented as advantageous to/positive towards the mentioned group or not (e.g., a mere summary of policy measures)” student annotators were instructed to not annotate the claim (see step vii, Q1, in codebook). This facilitated the later decision regarding deservingness and linkage.
https://www.crowdtangle.com/.
For the full annotation guidelines see “Appendix 2”.
If multiple subjects were linked to one object as part of different types of claims, they had to be coded as separate claims (e.g., “I represent the people, my opponent does not” would be coded as a representative claim about the claim-maker themselves and a claim of misrepresentation about a political opponent).
If there are multiple text units in a Facebook post that contain one or more claims of representation, annotators further had to annotate all elements for each claim in the text unit(s).
The very high reliability for deservingness may be due to the coding set-up, in which coders first had to indicate whether an object was part of a claim of representation and, only if so, classify whether the group was constructed as deserving or undeserving. This strongly facilitates the classification of deservingness, as neutral mentions of groups are filtered out beforehand.
i.e., if there are two objects mentioned in the focus sentence of the text unit, the unit is only recorded once here. When text units are repeated as many times as there are objects in the focus sentence, this results in 75,247 units.
https://www.nltk.org/.
We use the special [SEP] token because it is a standard way to separate sentences or input features in BERT, marking the boundary between segments and thereby possibly improving the model’s understanding of contextual dependencies within the input.
The implementation of transformers in NER allows importing general linguistic knowledge into the pipeline, which results in better generalizations when fine-tuning. https://spacy.io/universe/project/spacy-transformers.
This was a separate annotation task in which student annotators were asked to identify every reference to social groups, regardless of whether they are considered the object in a representative claim. A pilot study in which two student coders annotated the same 2,000 words showed desirable reliability for this variable (Krippendorff’s alpha = .95).
See “Appendix 2” for the exact procedure.
https://huggingface.co/GroNLP/bert-base-dutch-cased.
https://huggingface.co/GroNLP/bert-base-dutch-cased.
https://spacy.io/models/nl#nl_core_news_md.
NORP = Nationalities or religious or political groups; ORG = Companies, agencies, institutions, etc.; PERSON = People, including fictional.
References in the list: minister (MP), meerderheidspartijen (majority parties), minister-president (prime minister), (Vlaamse) regering ((Flemish) government), Vivaldi, vivaldicoalitie (vivaldi coalition), vivaldiregering (vivaldi government), paars-groen (purple-green), paarsgroen (purplegreen), zweedse coalitie (swedish coalition), kamikazecoalitie (kamikaze coalition), tripartite, de regering (the government), de overheid (the government), de staat (the state), staatssecretaris (secretary of state), premier (prime minister), eerste minister (prime minister), federale regering (federal government), de huidige regering (the current government), deze regering (this government).
https://huggingface.co/GroNLP/bert-base-dutch-cased.
i.e.,
$$\frac{\text{true positives}}{\text{true positives} + \text{false negatives}},$$
which measures the proportion of true positives (i.e., the instances from the positive class that got the positive label) to the total number of actual positives (i.e., the sum of true positives and instances of the positive class that got the wrong label).
Original categories were ‘Medical: people with disabilities, disease or other medical conditions’ (= the ill), ‘Socio-economic: health sector, doctors, health care workers’ (= healthcareworkers) and the combination of categories ‘Socio-economic: companies, SME’s, the self-employed, employers’, ‘Socio-economic: other (economic) sectors and professions’ and ‘Socio-economic: the catering industry, bars, restaurants, hotels’ (= business owners).
https://huggingface.co/GroNLP/bert-base-dutch-cased.
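
The proportion defined in the notes above (true positives divided by true positives plus false negatives) is the standard recall measure. As a minimal sketch, assuming gold and predicted labels are available as two equally long lists, it can be computed as follows. The function name `recall`, the label strings and the toy data are illustrative assumptions, not taken from the study.

```python
def recall(gold, predicted, positive):
    """Recall for one class: true positives / (true positives + false negatives)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    # Instances of the positive class that received the positive label.
    true_pos = sum(1 for g, p in zip(gold, predicted)
                   if g == positive and p == positive)
    # Instances of the positive class that received any other label.
    false_neg = sum(1 for g, p in zip(gold, predicted)
                    if g == positive and p != positive)
    actual_pos = true_pos + false_neg
    # Recall is undefined when the class never occurs in the gold labels;
    # returning 0.0 in that case is a convention chosen for this sketch.
    return true_pos / actual_pos if actual_pos else 0.0


# Hypothetical toy labels, for illustration only.
gold = ["claim", "claim", "no_claim", "claim", "no_claim"]
pred = ["claim", "no_claim", "no_claim", "claim", "claim"]
print(recall(gold, pred, "claim"))
```

In the toy data, three units are gold-labelled as claims and two of them are predicted as claims, so the recall for the claim class is two thirds.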

Acknowledgements

This research was made possible with a grant from the University of Antwerp (BOF/TOP project 43991). We are thankful for the helpful comments we received on earlier versions of this paper by the reviewers of the Journal of Computational Social Science, at the 2023 ICA conference in Toronto and the 2023 CAP conference in Antwerp, as well as the input we received from prof. Stefaan Walgrave.

Author information

Ine Gevers and August De Mulder have contributed equally to this work.

Authors and Affiliations

CLiPS, University of Antwerp, Prinsstraat 13, 2000, Antwerp, Antwerp, Belgium
Ine Gevers & Walter Daelemans
M2P, University of Antwerp, Prinsstraat 13, 2000, Antwerp, Antwerp, Belgium
August De Mulder

Corresponding author

Correspondence to Ine Gevers.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: Dutch version of example Facebook post

See Fig. 5.

Appendix 2: Manual annotation guidelines

[Start codebook]

Qa: What is the unique ID-code of the Facebook post/parliamentary intervention?

[Text entry]

Q1: Does the Facebook post/parliamentary intervention contain one or more claims of representation?

A claim of representation must contain the following two elements, max. one sentence apart (no sentence separating the two elements):

1) A group of people that is mentioned (the object). This may be a social group (e.g., ‘women’, ‘entrepreneurs’, ‘migrants’), a geographically defined constituency (e.g., ‘Flemings’, ‘residents of Antwerp’), or a socio-economic sector (e.g., ‘the health sector’, ‘the catering industry’, ‘the diamond industry’). Groups may be very general (e.g., ‘the people’, ‘the voters’) or very specific (e.g., ‘the factory workers of Ford’, ‘young people with disabilities’). Also, when a single (fictional or non-fictional) person is named standing for a greater group (e.g., ‘Joe the Plumber’), or is used as an example to make a point about a group (e.g., ‘my friend Mike, a teacher, now has to put twice as much effort in administrative tasks. A disgrace!’), this may be considered a group reference. Negative constructions such as ‘no child should grow up in poverty’ should also be considered group references.

AND

2) Reference to either substantive, descriptive or symbolic representation (the representation).

Substantive:

(i) Taking a position on or denouncing an issue that is (dis)advantageous to the mentioned group or taking a position in line with the mentioned group’s position (e.g., ‘Every child has a right to education!’ or ‘convicted child abusers should never again be allowed to come into contact with children’ or ‘many French-speakers agree with me as well!’).
(ii) Claiming (not) to act to improve/worsen the situation of the mentioned group (e.g., ‘Stop War! For the innocent children in Ukraine, I will be protesting today in Brussels’ or ‘The communist party votes against an aid-package to help the people of Ukraine. This is shocking.’).
(iii) Claiming a success/failure to improve the situation of the mentioned group (e.g., ‘lower tax rates on electricity and gas! We did it, socialists make the difference for regular folks’ or ‘Our people face the repercussions of the government’s bad policy day in day out’).

Descriptive:

(iv) Claiming (not) to be part of or to have similar/different experiences to the mentioned group (e.g., ‘as a local business owner myself, I know we should lower taxes’ or ‘Being a resident of Antwerp, I went to admire our beautiful cathedral today’).

Symbolic:

(v) Claiming (not) to care about, (not) to appreciate, (not) to celebrate the group mentioned, without a substantive (position taking, acting or succeeding) or descriptive element (e.g., ‘Shout out to all health care workers. Thank you for your hard work!’ or ‘we are the party of farmers’ or ‘today I had the privilege to visit two companies on renewable energy. Interesting work they are doing!’).

‘No’ should be answered if:

(i) No group is mentioned.
(ii) Only an individual is mentioned, not standing for a group category (e.g., ‘Triathlete Frederik Van Lierde wins the Iron man! What an achievement!’).
(iii) A group is mentioned without reference to substantive, descriptive or symbolic representation (e.g., ‘Today John Smith passed away. He leaves three children, a wife and many friends.’).
(iv) The object refers to a group of party members/other politicians/other parties (e.g., ‘Great job by the mayors of the municipalities of the coast! Thank you all for your hard work.’).
(v) The group is only mentioned as part of the name of a company or organization (e.g., ‘the bridges for youth homeless center’).
(vii) It is unclear whether an issue is presented as advantageous to/positive towards the mentioned group or not (e.g., a mere summary of policy measures).
(viii) The post is written in a language other than Dutch or English.

[Yes/no]
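
The Q1 decision above can be sketched as a small checklist function. This is only an illustration of how the criteria combine, not the authors' implementation: the class `Q1Observation`, its field names and the use of sentence indices are hypothetical, and "max. one sentence apart" is read here as "in the same sentence or in adjacent sentences", which is an assumption. The judgements themselves (for example, whether a stance is unclear) are still made by the annotator.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Q1Observation:
    """Hypothetical record of an annotator's judgements about one post."""
    group_sentence: Optional[int]           # sentence index of the group; None if no group
    representation_sentence: Optional[int]  # sentence index of the representation; None if absent
    only_individual: bool = False           # 'no' condition (ii)
    group_is_politicians: bool = False      # 'no' condition (iv)
    group_only_in_org_name: bool = False    # 'no' condition (v)
    stance_unclear: bool = False            # 'no' condition (vii)
    language: str = "nl"                    # 'no' condition (viii); language code is assumed


def contains_claim(obs: Q1Observation) -> bool:
    """Return True if Q1 would be answered 'yes' under this sketch."""
    # 'no' conditions (i) and (iii): both constitutive elements must be present.
    if obs.group_sentence is None or obs.representation_sentence is None:
        return False
    # 'no' conditions (ii), (iv), (v) and (vii).
    if (obs.only_individual or obs.group_is_politicians
            or obs.group_only_in_org_name or obs.stance_unclear):
        return False
    # 'no' condition (viii): only Dutch and English posts are coded.
    if obs.language not in {"nl", "en"}:
        return False
    # Assumed reading of the distance rule: no sentence may separate the elements.
    return abs(obs.group_sentence - obs.representation_sentence) <= 1


print(contains_claim(Q1Observation(group_sentence=0, representation_sentence=1)))
print(contains_claim(Q1Observation(group_sentence=0, representation_sentence=2)))
```

The first call describes a group and a representation in adjacent sentences, the second a pair separated by an intervening sentence.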

Q2: Who is the object (the represented) of the claim? Write down the group exactly as in the text.

A post/an intervention may contain multiple claims, sometimes these claims should be coded separately.

Rule 1a: If multiple objects (groups) are mentioned that belong to different group categories (see question below) (e.g., ‘we support the elderly, people with disabilities and healthcare workers!’) these should be coded as separate claims. EXCEPTION 1: if a group is mentioned in a post with and without a geographical indication (e.g., ‘Flemish Children’ and ‘children’ in general), it may be regarded as the same group and coded as one claim. EXCEPTION 2: if a group is mentioned with and without reference to every one of the given group (‘all students’ and ‘students’), this should be coded as one claim (see also rule 1e).
Rule 1b: If multiple objects (groups) are mentioned that belong to exactly the same categories (see question below), this should be coded as one claim (e.g., ‘pedophiles’ and ‘street racers’ both belong to the category ‘General: criminals…’ and therefore should be coded as one claim, however, ‘families in poverty’ and ‘children in poverty’ should be coded separately, as they belong to different categories). Important: If the group is mentioned both as part of a symbolic and a substantive/descriptive claim, it should still be coded as one substantive/descriptive claim (substantive/descriptive>symbolic).
Rule 1c: If multiple groups are mentioned as a way to refer to everyone (e.g., ‘young and old’, ‘rich and poor’) or to refer to equality between groups (e.g., ‘Finally equal pay between men and women’) this should be regarded as one claim and both groups should be written down separated by a semi-colon.
Rule 1d: If the different groups are mentioned in relation to each other (e.g., ‘young people with autism and their families’ or ‘Belgians and their children’) all groups should also be coded separately! (in this case ‘young people with autism’ and ‘their families’, and ‘Belgians’ and ‘their children’).
Rule 1e: If a politician claims to represent everyone or all people of a group, and then emphasizes the inclusion of a certain subgroup (e.g., ‘Schools should be affordable for all students, also those with less funds’) this should be coded as separate groups (‘all students’ and ‘those with less funds’). Important: if a post refers to everyone of a given group (‘all students’) and also mentions this group in general (‘students’), this should be coded as one claim.
Rule 1f: If a summary is given of an overarching group (e.g., ‘we support often forgotten professions such as cleaning aids, cashiers, and healthcare workers’) groups should be coded separately if they belong to separate categories in the question below (‘often forgotten professions’; ‘cleaning aids’; ‘cashiers’ make one claim (‘Socio-economic: other (economic) sectors and professions’) and ‘health care workers’ a separate one (‘Socio-economic: health sector, doctors, health care workers’)).
Rule 2a: If a post/intervention contains reference to misrepresentation (substantive, descriptive, or symbolic), and implies representation by the politician making the claim (e.g., ‘the government fails to help the people’ or ‘people are slipping into poverty and the government does nothing’), it should be regarded as one claim (of misrepresentation). Important: if no subject is explicitly mentioned (e.g., ‘they are letting the people down’ or ‘the local business owners are left out in the cold’), this should not be coded as a claim of misrepresentation, but as a claim of representation about the claim-maker him/herself. Important: the representative actor that misrepresents should not be separated by more than one sentence from the mentioned group.
Rule 2b: If a post/intervention contains both an explicit claim of misrepresentation and an explicit representative claim, it should be coded as two (separate) claims (e.g., ‘The political elite does not represent the people. Our party does’). Important: when combined with a claim of misrepresentation, simple position taking is insufficient to be coded as a second claim. Explicit reference to the subject (the representative) (e.g., ‘I’, ‘we’, ‘the party’) is necessary.
Rule 2c: If another representative is called upon to act for a certain group, this is not a claim of misrepresentation, but a representative claim (e.g., ‘people are slipping into poverty! The minister must provide a support package!’).
Rule 3: A non-political actor is never the subject! If a group of non-political actors is criticized (e.g., ‘employees are being exploited by the employers’) then this group (e.g., ‘employers’) should be coded as the object of a claim (as undeserving).
Rule 4: If a claim contains multiple subjects (representatives) that are representing/misrepresenting a group, this should be coded as one claim (e.g., ‘the minister and the rest of the government do not represent the people’).

If the above rules lead to the conclusion that there are multiple claims, base your first answer on the first claim in the post/intervention. After coding the first claim, open Qualtrics again, and then code the second claim, etc. TIP: write down the subject and object of each claim on paper when first reading the post/intervention so you do not miss any claims.

Always enter the mentioned group in the exact wording used in the text. If a number/percentage or pronoun is used, include this. If the group is placed in quotation marks in the post (e.g., ‘victims’), also include these quotation marks. Similarly, if the group is mentioned in a hashtag, include the hashtag (e.g., ‘#internationalwomensday’ is written in full). If the group is broken up by an irrelevant word, this word can be dropped. However, no words should be added.

[Text entry]
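
Rules 1a, 1b and 1f amount to grouping the mentioned objects by their exact set of group categories: mentions with identical category sets form one claim, and mentions with different category sets form separate claims. A minimal sketch of that grouping, using the example from Rule 1f, could look like this. The function name `group_into_claims` and the input format are hypothetical, and the sketch deliberately ignores the exceptions and Rules 1c to 1e, which need human judgement.

```python
def group_into_claims(mentions):
    """Group object mentions into claims by their exact set of categories.

    mentions: list of (group_text, categories) pairs in order of appearance,
    where categories is an iterable of category labels from Q3.
    Returns one string per claim, with group texts joined by a semi-colon.
    """
    claims = {}  # dicts keep insertion order, so claims follow text order
    for group_text, categories in mentions:
        key = frozenset(categories)  # identical category sets -> same claim
        claims.setdefault(key, []).append(group_text)
    return ["; ".join(texts) for texts in claims.values()]


# Example taken from Rule 1f of the codebook.
OTHER = "Socio-economic: other (economic) sectors and professions"
HEALTH = "Socio-economic: health sector, doctors, health care workers"
mentions = [
    ("often forgotten professions", [OTHER]),
    ("cleaning aids", [OTHER]),
    ("cashiers", [OTHER]),
    ("health care workers", [HEALTH]),
]
for claim in group_into_claims(mentions):
    print(claim)
```

Using a `frozenset` as the key means that a group with two categories is only merged with groups that carry exactly the same two categories, which matches the "exactly the same categories" wording of Rule 1b.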

Q3: To which group categories does this mentioned group belong?

Rule 1: if the mentioned group belongs to multiple categories (e.g., ‘Flemish families’ or ‘immigrant factory workers’) multiple categories can be combined (e.g., ‘Geography: Flemings…’ + ‘Socio-demographic: families’ and ‘General: immigrants…’ + ‘Socio-economic: workers…’)
Rule 2: when coding a group which was mentioned in relation to a reference group (e.g., ‘their families’ in ‘young people with autism and their families’), both the reference category (‘Socio-demographic: children…’ + ‘Medical: people with disabilities…’) and the and the relational group (‘Socio-demographic: families’) should be chosen. When coding he reference group (‘young people with autism’), however, only the category of the reference group should be chosen (‘Socio-demographic: children…’ + ‘Medical: people with disabilities…’).
Rule 3: If different groups were mentioned as a way to refer to everyone (e.g., ’young and old’, ’rich and poor’) or to refer to equality between groups (e.g., ’Finally equal pay between men and women’), these different categories should be chosen.
Rule 4: if in a post about ’terrorists’ and no explicit reference is made to religion, only the option ‘criminals’ should be chosen. If reference is made to the religion of the perpetrator then the relevant category regarding religious fundamentalism should also be chosen.
Rule 5: options ’bona fide’ and ’rogue’ always have to be combined with another category (e.g., ’bona fide + people’ or ’rogue + companies’). IMPORTANT: this category should only be chosen if the bona fide or rogue character of the group is explicitly mentioned and points to a subdimension within a group (e.g., ‘companies that abuse the system need to be punished’ = rogue + companies). However, if the bona fide or rogue character of a group is merely implied in the group reference (e.g., ’conmen’ or ‘our brave health care workers’) this category should not be chosen, but only the relevant category (’criminals’ or ‘health care workers’).
Rule 6: negative constructions such as ‘no child should grow up in poverty’ or ‘no one deserves this’ respectively belong to categories ‘General: everyone…’ + ‘Socio-demographic: children…’ and ‘General: everyone…’. However, be aware: ’not everyone’ or ‘not all people’ is not the same as ’everyone’, but usually refers to a subgroup (e.g., in ‘not everyone is safe at home. Victims of domestic violence often are forgotten’, ‘not everyone’ and ‘victims of domestic violence’ refer to the same category, ‘Medical: victims (and their relatives) of disasters, criminality or accidents’).
Rule 8: when reference is made to all people of a subgroup (e.g., ‘all students’) the option ’General: everyone…’ should be combined with the relevant group category (‘Socio-demographic:…students’). IMPORTANT: when reference is made to all people in general (e.g., ‘all people’, ‘all civilians’, ‘all persons’), only the option (’General: everyone…’) should be chosen.
Rule 9: if reference is made to ‘girls’ or ‘boys’, both the option ‘Socio-demographic: children, young people…’ and respectively the option ‘Socio-demographic: women’ and ‘Socio-demographic: men’ should be chosen. Important: if a post refers to ‘boys and girls’ as a way to refer to children more generally (e.g., ‘the boys and girls at this elementary school’), only the option ‘Socio-demographic: children, young people…’ should be chosen, not ‘young people’ + ‘men’ + ‘women’.
Rule 10: ‘Illegal immigrants’ always refers to ‘General: criminals…’ + ‘General: immigrants…’.
Rule 11: When reference is made to companies of a certain sector which does not have its separate category in the list (e.g., ‘IT-companies’, or ‘energy companies’), only the option ‘Socio-economic: companies…’ should be chosen, not ‘Socio-economic: other (economic) sectors and professions’.
Rule 12: When reference is made to personnel or employees of a certain company which does not have its separate category in the list, then the option ‘Socio-economic: employees…’ should be chosen, not ‘Socio-economic: other (economic) sectors and professions’.

General: bona fide
General: rogue
General: criminals, people who perform illegal activities, terrorists
General: the citizens, the people, our people, regular people
General: everyone, all people, all citizens, all + other category
General: immigrants, ethnic minorities, people with foreign background, asylum seekers, refugees
Geography: Flemings; Flemish; Dutch-speakers
Geography: Walloons; Walloon; French-speakers
Geography: Belgians; Belgian
Geography: (residents of) foreign countries
Geography: (residents of a) Belgian province, city, municipality, neighborhood
Geography: Europeans, European
Geography: rural residents
Geography: urban residents
Medical: people with disabilities, disease or other medical conditions
Medical: victims (and their relatives) of disasters, criminality or accidents
Medical: victims (and their relatives) of corona (the disease) (not to be combined with other categories)
Mobility: (car) drivers
Mobility: pedestrians, cyclists, vulnerable road users
Mobility: public transport travelers, holiday travelers, tourists and other mobility
Political: Left-wing, liberals
Political: right-wing people, conservatives
Political: voters, sympathizers, followers
Political: protesters, activists, civil society organizations, unions, NGOs
Socio-demographic: singles
Socio-demographic: families
Socio-demographic: the vulnerable
Socio-demographic: children, young people, babies, toddlers, students, future generations
Socio-demographic: seniors, the elderly, residents of care centers, grandparents
Socio-demographic: fathers
Socio-demographic: mothers
Socio-demographic: parents
Socio-demographic: men
Socio-demographic: women
Socio-demographic: heterosexuals
Socio-demographic: LGBTQIA+
Socio-demographic: Christians
Socio-demographic: Muslims
Socio-demographic: Jews
Socio-demographic: Muslim fundamentalists
Socio-demographic: Religious fundamentalists (non-Muslim)
Socio-demographic: white people
Socio-economic: workers, factory workers, dock workers
Socio-economic: employees, personnel, staff
Socio-economic: consumers
Socio-economic: companies, SMEs, the self-employed, employers
Socio-economic: multinationals, large companies (explicit reference to large character)
Socio-economic: tax payers, people who contribute by working
Socio-economic: the unemployed, welfare beneficiaries, residents of social housing
Socio-economic: farmers, agricultural sector
Socio-economic: the poor, lower class
Socio-economic: the rich, higher class, the elite
Socio-economic: middle class
Socio-economic: cultural sector, artists
Socio-economic: education sector, teachers, schools
Socio-economic: health sector, doctors, health care workers
Socio-economic: sport sector, athletes (both professional and recreational), sport clubs
Socio-economic: academic sector, academics, scientists, experts
Socio-economic: financial sector, financial institutions, banks, insurance companies
Socio-economic: public sector and services (police, jailers, fire fighters, soldiers, bus drivers etc.)
Socio-economic: the catering industry, bars, restaurants, hotels
Socio-economic: other (economic) sectors and professions
Socio-economic: victims of corona measures (not the illness) (not to be combined with other group category)
Psychographic: people with certain interests, lifestyle or personality (e.g., stamp collectors, Harley enthusiasts etc.)
Other
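
Rule 5 and the two ‘not to be combined’ notes in the category list above restrict which combinations of categories are valid. As an illustration only, such constraints could be checked automatically on annotated data along the following lines. The function name and the shortened category keys are hypothetical and are not taken from the authors' annotation tool.

```python
# Illustrative check of two combination constraints from the Q3 codebook.
# Category keys are shortened, hypothetical identifiers, not the authors' own.

MODIFIERS = {"bona fide", "rogue"}  # Rule 5: never chosen on their own
STANDALONE = {  # marked "not to be combined" in the category list
    "corona victims (disease)",
    "corona victims (measures)",
}


def check_selection(selected):
    """Return a list of problems; an empty list means the selection is valid."""
    selected = set(selected)
    problems = []
    # Rule 5: a modifier needs at least one non-modifier category next to it.
    if selected & MODIFIERS and not (selected - MODIFIERS):
        problems.append("bona fide/rogue must be combined with another category")
    # The corona categories may not be combined with any other category.
    for category in sorted(selected & STANDALONE):
        if len(selected) > 1:
            problems.append(f"'{category}' may not be combined with other categories")
    return problems
```

For example, `check_selection({"rogue", "companies"})` returns an empty list, whereas `check_selection({"bona fide"})` reports a violation of Rule 5.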

Q4: Is the object presented as deserving of representation or not deserving of representation?

Some claims may concern a group of people who are seen as undeserving of representation (an out-group) (e.g., when a competitor writes ‘the Green party is the party of immigrants’). In this case, the Green party is presented as representing an ‘undeserving group’. Note that this same social group may also be presented as deserving, for example when the Green party claims that ‘immigrants deserve to be treated with dignity’. Context should be taken into account. In a similar vein, politicians will sometimes claim that they themselves do not represent the interests of an undeserving ‘out-group’ (e.g., ‘the rich should pay for the crisis’).

[Deserving/undeserving]

Q5: Who is the subject (the representative) of the claim?

Rule 1: if the claim maker refers to both him/herself and other politicians (from other parties), then only option a) claim maker himself should be chosen (e.g., ‘together with Valerie Van Peel, I proposed legislation to protect children’).
Rule 2: if the claim maker agrees with an accomplishment of a non-political actor (e.g., Pegida), then the claim maker him/herself (option a) should be regarded as the subject (e.g., ‘impressive demonstration of Pegida against the asylum-scum’).
Rule 3: if the claim maker agrees with an accomplishment of a political actor of another party, then the politician from the other party (option b) should be regarded as the subject if the claim maker does not refer to him/herself (e.g., a competitor writing: ‘impressive demonstration of N-VA against the asylum-scum’).
Rule 4: if the post refers to a non-Belgian politician or party as the subject of the claim, choose option e) other political and type in the mentioned subject as it was mentioned in the post/intervention.
Rule 5: if the claim maker refers positively to a government/parliament in which his/her party is also seated, then only option a) should be chosen.

Multiple answers are possible

a) The claim maker him/herself (may also be implicit e.g., ‘the elderly need park benches’), his/her party, or politicians from the claim maker’s party
b) (Politician from) other parties
c) The government (Flemish or federal) and ministers
d) EU (government, commission)
e) Other political [Text entry].

Q5b: Which political party/parties? (only if option b at Q5)

Multiple answers possible

CD&V
Groen
N-VA
Open VLD
PVDA
sp.a/Vooruit
Vlaams Belang
cdH
Ecolo
DéFi
PP
MR
PS
PTB
Other

[Text entry]

Q6: Is the linkage between subject and object positive or negative?

A positive linkage means that the subject is claimed to represent the object (e.g., ‘I represent women’ or ‘he represents the rich’)

A negative linkage means that the subject is claimed not to represent the object (e.g., ‘he does not represent women’ or ‘convicted child abusers should never be allowed to come into contact with children again’).

Although the subject of a claim may sometimes be implicit, the linkage is never absent (e.g., ‘women need equal rights’ and ‘the rich must pay for the crisis’ respectively have a positive and a negative linkage, even though the subject is implicit).

[Positive/Negative]

[End codebook]

Appendix 3: Procedure for converting the group category to single labels

During our manual annotation process, student annotators could assign multiple group categories to each object (group mentioned as part of a claim of representation). In total, the student annotators assigned 359 unique labels (unique combinations of group categories) to 4,327 claims. However, the large majority of claims (79% or 3,402 claims) were assigned a single category only. In addition, the top 35 most mentioned unique labels were, apart from the combination of ‘criminals’ and ‘migrants and ethnic minorities’, all single-category labels. As single-category labels were by far the most common, we decided to convert all labels to single labels using a rule-based procedure and to use the new labels to train our NLP models. The rule-based procedure was developed inductively and consisted of the following steps:

Step 1: Dropping generic or less meaningful categories when combined with another category

1. When ‘other’ is combined with another category: drop ‘other’ (e.g., experts + other = experts)
2. When ‘everyone/all’ is combined with another category: drop ‘everyone/all’ (e.g., everyone/all + children and students = children and students)
3. When ‘rogue’ is combined with another category: drop ‘rogue’ (e.g., rogue + large companies = large companies)
4. When ‘bona fide’ is combined with another category: drop ‘bona fide’ (e.g., bona fide + large companies = large companies)
5. When ‘the people’ is combined with another category: drop ‘the people’ (e.g., the people + the poor = the poor)
6. When ‘voters’ is combined with another category: drop ‘voters’ (e.g., voters + women = women)
7. When ‘volunteers’ is combined with another category: drop ‘volunteers’ (e.g., volunteers + parents = parents)
8. When ‘people with particular hobbies or interests’ is combined with another category: drop ‘people with particular hobbies or interests’ (e.g., people with particular hobbies or interests + migrants and ethnic minorities = migrants and ethnic minorities)
9. When ‘victims COVID (measures)’ is combined with another category: drop ‘victims COVID (measures)’ (e.g., victims COVID (measures) + businesses = businesses)
10. When geographical labels are combined with another category: drop the geographical labels (e.g., city, municipality, or regional residents + athletes = athletes)
11. When ‘consumers’ is combined with another category: drop ‘consumers’ (e.g., consumers + families = families)
12. When ‘crime victims’ is combined with another category: drop ‘crime victims’ (e.g., crime victims + children and students = children and students)
13. When ‘leftist’ is combined with ‘journalists and media’: drop ‘leftist’
14. When ‘rightist’ is combined with ‘journalists and media’: drop ‘rightist’
15. When ‘migrants and ethnic minorities’ is combined with ‘Muslim fundamentalists’: drop ‘migrants and ethnic minorities’
16. When ‘families’ is combined with ‘children and students’: drop ‘children and students’
17. When ‘families’ is combined with ‘parents’: drop ‘parents’
18. When ‘other sectors and professions’ is combined with ‘businesses’: drop ‘other sectors and professions’
19. When ‘people with disabilities’ is combined with ‘the vulnerable’: drop ‘people with disabilities’
20. When ‘men’ is combined with ‘migrants and ethnic minorities’: drop ‘men’
21. When ‘women’ is combined with ‘migrants and ethnic minorities’: drop ‘women’
22. When ‘men’ is combined with ‘criminals’: drop ‘men’
23. When ‘women’ is combined with ‘criminals’: drop ‘women’
24. When ‘children and students’ is combined with ‘farmers’: drop ‘children and students’
25. When ‘children and students’ is combined with ‘businesses’: drop ‘children and students’
26. When ‘Christians’ is combined with ‘migrants and ethnic minorities’: drop ‘Christians’
27. When ‘criminals’ is combined with ‘Muslim fundamentalists’: drop ‘criminals’
28. When ‘the vulnerable’ is combined with ‘parents’: drop ‘the vulnerable’
29. When ‘the vulnerable’ is combined with ‘migrants and ethnic minorities’: drop ‘the vulnerable’
30. When ‘the poor’ is combined with ‘migrants and ethnic minorities’: drop ‘the poor’
31. When ‘the unemployed’ is combined with ‘migrants and ethnic minorities’: drop ‘the unemployed’
32. When ‘the unemployed’ is combined with ‘parents’: drop ‘parents’
33. When ‘car drivers’ is combined with ‘children and students’: drop ‘car drivers’
34. When ‘multinationals’ is combined with ‘businesses’: drop ‘multinationals’
35. When ‘protesters, civil society, and unions’ is combined with ‘artists and cultural sector’: drop ‘protesters, civil society, and unions’
36. When ‘protesters, civil society, and unions’ is combined with ‘leftist’: drop ‘protesters, civil society, and unions’
37. When ‘protesters, civil society, and unions’ is combined with ‘rightist’: drop ‘protesters, civil society, and unions’
38. When ‘parents’ is combined with ‘the poor’: drop ‘parents’
39. When ‘parents’ is combined with ‘migrants and ethnic minorities’: drop ‘parents’
40. When ‘businesses’ is combined with ‘migrants and ethnic minorities’: drop ‘businesses’
41. When ‘protesters, civil society, and unions’ is combined with ‘migrants and ethnic minorities’: drop ‘protesters, civil society, and unions’
42. When ‘rightist’ is combined with ‘migrants and ethnic minorities’: drop ‘rightist’
43. When ‘migrants and ethnic minorities’ is combined with ‘LGBTQ’: drop ‘migrants and ethnic minorities’
44. When ‘artists and cultural sector’ is combined with ‘migrants and ethnic minorities’: drop ‘artists and cultural sector’
45. When ‘white-collar workers, civil servants and personnel’ is combined with ‘migrants and ethnic minorities’: drop ‘white-collar workers, civil servants and personnel’
46. When ‘criminals’ is combined with ‘children and students’: drop ‘criminals’
47. When ‘parents’ is combined with ‘criminals’: drop ‘parents’
48. When ‘women’ is combined with ‘Jewish fundamentalists’: drop ‘women’
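
The drop rules of Step 1 fall into two types: categories that are dropped whenever they co-occur with any other category (rules 1 to 12), and categories that are dropped only when a specific other category is present (rules 13 to 48). A minimal sketch of how this could be implemented is given below. It encodes only a handful of the rules, assumes that rules are applied in the listed order (the text above does not state the order of application), and uses hypothetical names (`GENERIC`, `PAIRWISE`, `apply_drop_rules`) that are not taken from the authors' code.

```python
# Sketch of Step 1 with a small subset of the 48 rules (illustrative only).

# Dropped whenever combined with any other category (cf. rules 1-6).
GENERIC = ["other", "everyone/all", "rogue", "bona fide", "the people", "voters"]

# (category to drop, category whose presence triggers the drop)
PAIRWISE = [
    ("children and students", "families"),   # rule 16
    ("parents", "families"),                 # rule 17
    ("men", "criminals"),                    # rule 22
    ("criminals", "Muslim fundamentalists"), # rule 27
    ("multinationals", "businesses"),        # rule 34
]


def apply_drop_rules(labels):
    """Return the categories that remain after applying the drop rules."""
    remaining = set(labels)
    for generic in GENERIC:
        # A generic category is only dropped if something else is left over.
        if generic in remaining and len(remaining) > 1:
            remaining.discard(generic)
    for drop, trigger in PAIRWISE:
        if drop in remaining and trigger in remaining:
            remaining.discard(drop)
    return remaining
```

Under these assumptions, `apply_drop_rules({"everyone/all", "children and students"})` leaves only ‘children and students’, while a claim labelled only ‘other’ keeps that label.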

Step 2: Merging multi-labels into new single labels

49. When ‘the vulnerable’ is combined with ‘the unemployed’ = ‘the poor and lower class’
50. When ‘Jews’ is combined with ‘other religious fundamentalists’ = ‘religious fundamentalists’
51. When ‘the vulnerable’ is combined with ‘children and students’ = ‘vulnerable children’
52. When ‘the poor’ is combined with ‘children and students’ = ‘children in poverty’
53. When ‘residents social housing’ is combined with ‘children and students’ = ‘children in poverty’
54. When ‘migrants and ethnic minorities’ is combined with ‘children and students’ = ‘children with ethnic or migration background’
55. When ‘Muslims’ is combined with ‘children and students’ = ‘children with ethnic or migration background’
56. When ‘Jews’ is combined with ‘children and students’ = ‘children with ethnic or migration background’
57. When ‘criminals’ is combined with ‘migrants and ethnic minorities’ = ‘criminal migrants’
58. When ‘criminals’ is combined with ‘migrants and ethnic minorities’ and ‘children’ = ‘criminal migrants’
59. When ‘men’ is combined with ‘women’ = ‘everyone/all’
60. When ‘children and students’ is combined with ‘the elderly’ = ‘everyone/all’
61. When ‘the poor’ is combined with ‘the rich’ = ‘everyone/all’
62. When ‘Muslims’ is combined with ‘Jews’ = ‘migrants and ethnic minorities’
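
The merge rules of Step 2 can be read as a lookup from a combination of categories to one new label. The sketch below illustrates this with a few of the rules. It assumes that a merge applies only when the remaining categories match a rule exactly, which the text above does not specify, and the names `MERGES` and `merge_labels` are hypothetical.

```python
# Sketch of Step 2 with a subset of the merge rules (illustrative only).

MERGES = {
    frozenset({"the vulnerable", "the unemployed"}): "the poor and lower class",   # rule 49
    frozenset({"the poor", "children and students"}): "children in poverty",       # rule 52
    frozenset({"criminals", "migrants and ethnic minorities"}): "criminal migrants",  # rule 57
    frozenset({"men", "women"}): "everyone/all",                                   # rule 59
    frozenset({"the poor", "the rich"}): "everyone/all",                           # rule 61
}


def merge_labels(labels):
    """Replace an exactly matching combination by its merged single label."""
    key = frozenset(labels)
    if key in MERGES:
        return {MERGES[key]}
    # No rule matches: leave the categories untouched.
    return set(labels)
```

Using `frozenset` keys makes the lookup independent of the order in which annotators selected the categories.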

Step 3: Further grouping single labels together

63. Group ‘white-collar workers, civil servants, personnel’ and ‘laborers’ into ‘workers’
64. Group ‘unemployed’ and ‘the poor’ into ‘the poor and lower class’
65. Group ‘victims COVID (disease)’ and ‘people with disabilities’ into ‘people with disabilities or disease’
66. Group ‘middle class’ and ‘the people’ into ‘the people’
67. Group ‘men’ and ‘fathers’ into ‘boys, men and fathers’
68. Group ‘women’ and ‘mothers’ into ‘girls, women and mothers’
69. Group ‘farmers’ and ‘rural residents’ into ‘farmers and rural residents’
70. Group ‘the vulnerable’ and ‘the poor and lower class’ into ‘the poor and lower class’
71. Group ‘vulnerable children’ and ‘children in poverty’ into ‘children in poverty’
72. Group ‘Muslim fundamentalists’ and ‘Jewish fundamentalists’ into ‘religious fundamentalists’
73. Group ‘people with particular hobbies or interests’ and ‘other’ into ‘other’
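
Step 3 amounts to a many-to-one mapping from single labels to coarser single labels. A minimal sketch, covering only rules 66 to 69 and using the hypothetical names `GROUPS` and `group_label`, could look as follows.

```python
# Sketch of Step 3 with a subset of the grouping rules (illustrative only).

GROUPS = {
    "middle class": "the people",                   # rule 66
    "men": "boys, men and fathers",                 # rule 67
    "fathers": "boys, men and fathers",
    "women": "girls, women and mothers",            # rule 68
    "mothers": "girls, women and mothers",
    "farmers": "farmers and rural residents",       # rule 69
    "rural residents": "farmers and rural residents",
}


def group_label(label):
    """Map a single label to its grouped label; unlisted labels are kept."""
    return GROUPS.get(label, label)
```

Labels that no grouping rule mentions, such as ‘the people’ itself, pass through unchanged.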

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Gevers, I., De Mulder, A. & Daelemans, W. Towards a large scale analysis of claims: developing a machine learning method for detecting and classifying politicians’ claims of representation. J Comput Soc Sc (2024). https://doi.org/10.1007/s42001-024-00261-y

Received: 23 October 2023
Accepted: 24 February 2024
Published: 16 March 2024
DOI: https://doi.org/10.1007/s42001-024-00261-y
