Overview of PAN 2021: Authorship Verification, Profiling Hate Speech Spreaders on Twitter, and Style Change Detection

Bevendorff, Janek; Chulvi, Berta; De La Peña Sarracén, Gretel Liz; Kestemont, Mike; Manjavacas, Enrique; Markov, Ilia; Mayerl, Maximilian; Potthast, Martin; Rangel, Francisco; Rosso, Paolo; Stamatatos, Efstathios; Stein, Benno; Wiegmann, Matti; Wolska, Magdalena; Zangerle, Eva

doi:10.1007/978-3-030-85251-1_26

Conference paper
First Online: 14 September 2021

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12880)

Abstract

The paper gives a brief overview of the three shared tasks organized at the PAN 2021 lab on digital text forensics and stylometry, hosted at the CLEF conference. The tasks are authorship verification across domains, author profiling for hate speech spreaders, and style change detection for multi-author documents. Some of the tasks are new, while others continue and extend past shared tasks; the overall goal is to advance the state of the art and to provide an objective evaluation on newly developed benchmark datasets.

Notes

1. https://pan.webis.de/data.html.
2. To generate the datasets, we have followed a methodology that complies with the EU General Data Protection Regulation [26].
3. We should highlight that we are aware of the legal and ethical issues related to collecting, analysing and profiling social media data [26] and that we are committed to legal and ethical compliance in our scientific research and its outcomes. For instance, we have anonymised the user names, masked all user mentions, and changed the class in order to avoid any explicit mention.
4. In a realistic scenario, we would need to know a priori the distribution of haters vs. non-haters; depending on the study, the proportion of hateful messages on Twitter ranges from 1% [21] to 10%–15% [39], although when the targets are communities such as the LGBT community, up to 78% of respondents had experienced online anti-LGBT hate speech in the last 5 years (https://www.report-it.org.uk/files/online-crime-2020_0.pdf). Furthermore, one of the aims of this shared task is to foster research on profiling haters in order to address this problem automatically.
5. Machine learning emerged as a methodology in authorship attribution in the 1990s. To the best of our knowledge, the first paper to apply a text classification approach in this domain is [19].
6. https://zenodo.org/record/3716403.
7. Thanks to Fabrizio Sebastiani (Consiglio Nazionale delle Ricerche, Italy) for this suggestion.
8. The following StackExchange sites were used: Code Review, Computer Graphics, CS Educators, CS Theory, Data Science, DBA, DevOps, GameDev, Network Engineering, Raspberry Pi, Superuser, and Server Fault.

References

1. Basile, V., et al.: SemEval-2019 Task 5: multilingual detection of hate speech against immigrants and women in Twitter. In: Proceedings of the 13th International Workshop on Semantic Evaluation (SemEval-2019), Co-located with the Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019) (2019)
2. Bevendorff, J., Stein, B., Hagen, M., Potthast, M.: Generalizing unmasking for short texts. In: Burstein, J., Doran, C., Solorio, T. (eds.) 14th Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2019), pp. 654–659. Association for Computational Linguistics, June 2019. https://www.aclweb.org/anthology/N19-1068
3. Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly Weather Rev. 78(1), 1–3 (1950). https://doi.org/10.1175/1520-0493(1950)078<0001:VOFEIT>2.0.CO;2. https://journals.ametsoc.org/view/journals/mwre/78/1/1520-0493_1950_078_0001_vofeit_2_0_co_2.xml
4. ElSherief, M., Nilizadeh, S., Nguyen, D., Vigna, G., Belding, E.: Peer to peer hate: hate speech instigators and their targets. In: Proceedings of the International AAAI Conference on Web and Social Media, vol. 12 (2018)
5. Fathallah, J.: Fanfiction and the Author. How FanFic Changes Popular Cultural Texts. Amsterdam University Press, Amsterdam (2017)
6. Hagen, L., et al.: Emoji use in Twitter white nationalism communication. In: Conference Companion Publication of the 2019 on Computer Supported Cooperative Work and Social Computing, pp. 201–205 (2019)
7. Halvani, O., Graner, L.: Cross-domain authorship attribution based on compression: notebook for PAN at CLEF 2018. In: Cappellato, L., Ferro, N., Nie, J., Soulier, L. (eds.) Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum, Avignon, France, 10–14 September 2018, CEUR Workshop Proceedings, vol. 2125. CEUR-WS.org (2018). http://ceur-ws.org/Vol-2125/paper_90.pdf
8. Hellekson, K., Busse, K. (eds.): The Fan Fiction Studies Reader. University of Iowa Press (2014)
9. Juola, P.: Authorship attribution. Found. Trends Inf. Retr. 1(3), 233–334 (2006)
10. Kestemont, M., et al.: Overview of the cross-domain authorship verification task at PAN 2020. In: Cappellato, L., Eickhoff, C., Ferro, N., Névéol, A. (eds.) Working Notes of CLEF 2020 - Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, 22–25 September 2020, CEUR Workshop Proceedings, vol. 2696. CEUR-WS.org (2020). http://ceur-ws.org/Vol-2696/paper_264.pdf
11. Kestemont, M., et al.: Overview of the authorship verification task at PAN 2021. In: Faggioli, G., Ferro, N., Joly, A., Maistro, M., Piroi, F. (eds.) CLEF 2021 Labs and Workshops, Notebook Papers. CEUR-WS.org (2021)
12. Kestemont, M., Stamatatos, E., Manjavacas, E., Daelemans, W., Potthast, M., Stein, B.: Overview of the cross-domain authorship attribution task at PAN 2019. In: CLEF 2019 Labs and Workshops, Notebook Papers (2019)
13. Kestemont, M., Stover, J.A., Koppel, M., Karsdorp, F., Daelemans, W.: Authenticating the writings of Julius Caesar. Expert Syst. Appl. 63, 86–96 (2016). https://doi.org/10.1016/j.eswa.2016.06.029
14. Kestemont, M., et al.: Overview of the author identification task at PAN 2018: cross-domain authorship attribution and style change detection. In: CLEF 2018 Labs and Workshops, Notebook Papers (2018)
15. Koppel, M., Schler, J.: Authorship verification as a one-class classification problem. In: Brodley, C.E. (ed.) Machine Learning, Proceedings of the Twenty-first International Conference (ICML 2004), Banff, Alberta, Canada, 4–8 July 2004, ACM International Conference Proceeding Series, vol. 69. ACM (2004). https://doi.org/10.1145/1015330.1015448
16. Koppel, M., Schler, J., Argamon, S.: Computational methods in authorship attribution. J. Am. Soc. Inform. Sci. Technol. 60(1), 9–26 (2009)
17. Mathew, B., Dutt, R., Goyal, P., Mukherjee, A.: Spread of hate speech in online social media. In: Proceedings of the 10th ACM Conference on Web Science, pp. 173–182 (2019)
18. Mathew, B., Illendula, A., Saha, P., Sarkar, S., Goyal, P., Mukherjee, A.: Hate begets hate: a temporal study of hate speech. Proc. ACM on Hum.-Comput. Interact. 4(CSCW2), 1–24 (2020)
19. Matthews, R.A.J., Merriam, T.V.N.: Neural computation in stylometry I: an application to the works of Shakespeare and Fletcher. Lit. Linguist. Comput. 8(4), 203–209 (1993). https://doi.org/10.1093/llc/8.4.203. ISSN 0268-1145
20. Nockleby, J.T.: Hate speech. In: Levy, L.W., Karst, K.L., et al. (eds.) Encyclopedia of the American Constitution, 2nd edn., pp. 1277–1279. Macmillan, New York (2000)
21. Pereira-Kohatsu, J.C., Quijano-Sánchez, L., Liberatore, F., Camacho-Collados, M.: Detecting and monitoring hate speech in Twitter. Sensors 19(21), 4654 (2019)
22. Potthast, M., Gollub, T., Wiegmann, M., Stein, B.: TIRA integrated research architecture. In: Ferro, N., Peters, C. (eds.) Information Retrieval Evaluation in a Changing World. TIRS, vol. 41, pp. 123–160. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-22948-1_5
23. Qian, J., ElSherief, M., Belding, E.M., Wang, W.Y.: Leveraging intra-user and inter-user representation learning for automated hate speech detection. arXiv preprint arXiv:1804.03124 (2018)
24. Rangel, F., De-La-Peña-Sarracén, G.L., Chulvi, B., Fersini, E., Rosso, P.: Profiling hate speech spreaders on Twitter task at PAN 2021. In: Faggioli, G., Ferro, N., Joly, A., Maistro, M., Piroi, F. (eds.) CLEF 2021 Labs and Workshops, Notebook Papers. CEUR-WS.org (2021)
25. Rangel, F., Giachanou, A., Ghanem, B., Rosso, P.: Overview of the 8th author profiling task at PAN 2020: profiling fake news spreaders on Twitter. In: CLEF 2020 Labs and Workshops, Notebook Papers. CEUR Workshop Proceedings (2020)
26. Rangel, F., Rosso, P.: On the implications of the general data protection regulation on the organisation of evaluation tasks. Lang. Law/Linguagem Direito 5(2), 95–117 (2019)
27. Rangel, F., Rosso, P.: Overview of the 7th author profiling task at PAN 2019: bots and gender profiling. In: CLEF 2019 Labs and Workshops, Notebook Papers (2019)
28. Rangel, F., et al.: Overview of the 2nd author profiling task at PAN 2014. In: CLEF 2014 Labs and Workshops, Notebook Papers (2014)
29. Rangel, F., Franco-Salvador, M., Rosso, P.: A low dimensionality representation for language variety identification. In: Gelbukh, A. (ed.) CICLing 2016. LNCS, vol. 9624, pp. 156–169. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-75487-1_13
30. Rangel, F., Rosso, P., Montes-y-Gómez, M., Potthast, M., Stein, B.: Overview of the 6th author profiling task at PAN 2018: multimodal gender identification in Twitter. In: CLEF 2018 Labs and Workshops, Notebook Papers (2018)
31. Rangel, F., Rosso, P., Koppel, M., Stamatatos, E., Inches, G.: Overview of the author profiling task at PAN 2013. In: CLEF 2013 Labs and Workshops, Notebook Papers (2013)
32. Rangel, F., Rosso, P., Potthast, M., Stein, B.: Overview of the 5th author profiling task at PAN 2017: gender and language variety identification in Twitter. Working Notes Papers of the CLEF (2017)
33. Rangel, F., Rosso, P., Potthast, M., Stein, B., Daelemans, W.: Overview of the 3rd author profiling task at PAN 2015. In: CLEF 2015 Labs and Workshops, Notebook Papers (2015)
34. Rangel, F., Rosso, P., Verhoeven, B., Daelemans, W., Potthast, M., Stein, B.: Overview of the 4th author profiling task at PAN 2016: cross-genre evaluations. In: CLEF 2016 Labs and Workshops, Notebook Papers (2016). ISSN 1613-0073
35. Ribeiro, M., Calais, P., Santos, Y., Almeida, V., Meira Jr., W.: Characterizing and detecting hateful users on Twitter. In: Proceedings of the International AAAI Conference on Web and Social Media, vol. 12 (2018)
36. Rosso, P., Rangel, F., Potthast, M., Stamatatos, E., Tschuggnall, M., Stein, B.: Overview of PAN’16–new challenges for authorship analysis: cross-genre profiling, clustering, diarization, and obfuscation. In: Experimental IR Meets Multilinguality, Multimodality, and Interaction. 7th International Conference of the CLEF Initiative (CLEF 16) (2016)
37. Stamatatos, E.: A survey of modern authorship attribution methods. JASIST 60(3), 538–556 (2009). https://doi.org/10.1002/asi.21001
38. Tschuggnall, M., et al.: Overview of the author identification task at PAN 2017: style breach detection and author clustering. In: CLEF 2017 Labs and Workshops, Notebook Papers (2017)
39. Waseem, Z.: Are you a racist or am I seeing things? Annotator influence on hate speech detection on Twitter. In: Proceedings of the First Workshop on NLP and Computational Social Science, pp. 138–142. Association for Computational Linguistics, Austin, November 2016. https://doi.org/10.18653/v1/W16-5618. https://www.aclweb.org/anthology/W16-5618
40. Zangerle, E., Mayerl, M., Potthast, M., Stein, B.: Overview of the style change detection task at PAN 2021. In: Faggioli, G., Ferro, N., Joly, A., Maistro, M., Piroi, F. (eds.) CLEF 2021 Labs and Workshops, Notebook Papers. CEUR-WS.org (2021)
41. Zangerle, E., Mayerl, M., Specht, G., Potthast, M., Stein, B.: Overview of the style change detection task at PAN 2020. In: CLEF 2020 Labs and Workshops, Notebook Papers (2020)
42. Zangerle, E., Tschuggnall, M., Specht, G., Stein, B., Potthast, M.: Overview of the style change detection task at PAN 2019. In: CLEF 2019 Labs and Workshops, Notebook Papers (2019)

Acknowledgments

The work of the researchers from Universitat Politècnica de València was partially funded by the Spanish MICINN under the project MISMIS-FAKEnHATE on MISinformation and MIScommunication in social media: FAKE news and HATE speech (PGC2018-096212-B-C31), and by the Generalitat Valenciana under the project DeepPattern (PROMETEO/2019/121). This article is also based upon work from the DigForAsp COST Action 17124 on Digital Forensics: evidence analysis via intelligent systems and practices, supported by European Cooperation in Science and Technology.

Author information

Authors and Affiliations

Bauhaus-Universität Weimar, Weimar, Germany
Janek Bevendorff, Benno Stein, Matti Wiegmann & Magdalena Wolska
Universitat Politècnica de València, Valencia, Spain
Berta Chulvi, Gretel Liz De La Peña Sarracén & Paolo Rosso
University of Antwerp, Antwerp, Belgium
Mike Kestemont, Enrique Manjavacas & Ilia Markov
University of Innsbruck, Innsbruck, Austria
Maximilian Mayerl & Eva Zangerle
Universität Leipzig, Leipzig, Germany
Martin Potthast
Symanto Research, Nürnberg, Germany
Francisco Rangel
University of the Aegean, Mytilene, Greece
Efstathios Stamatatos

Corresponding author

Correspondence to Matti Wiegmann.

Editor information

Editors and Affiliations

Arizona State University, Tempe, AZ, USA
K. Selçuk Candan
Politehnica University of Bucharest, Bucharest, Romania
Bogdan Ionescu
Université Grenoble Alpes, Saint-Martin-d’Hères, France
Lorraine Goeuriot
Aalborg University Copenhagen, Copenhagen, Denmark
Birger Larsen
HES-SO Valais-Wallis, Sierre, Switzerland
Henning Müller
University of Montpellier, Montpellier, France
Alexis Joly
University of Copenhagen, Copenhagen, Denmark
Maria Maistro
TU Wien, Vienna, Austria
Florina Piroi
University of Padua, Padova, Italy
Guglielmo Faggioli
University of Padua, Padova, Italy
Nicola Ferro

About this paper

Cite this paper

Bevendorff, J. et al. (2021). Overview of PAN 2021: Authorship Verification, Profiling Hate Speech Spreaders on Twitter, and Style Change Detection. In: Candan, K.S., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2021. Lecture Notes in Computer Science, vol 12880. Springer, Cham. https://doi.org/10.1007/978-3-030-85251-1_26

DOI: https://doi.org/10.1007/978-3-030-85251-1_26
Published: 14 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-85250-4
Online ISBN: 978-3-030-85251-1
eBook Packages: Computer Science (R0)
