Overview of PAN 2021: Authorship Verification, Profiling Hate Speech Spreaders on Twitter, and Style Change Detection

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12880)

Abstract

This paper gives a brief overview of the three shared tasks organized at the PAN 2021 lab on digital text forensics and stylometry, hosted at the CLEF conference. The tasks are authorship verification across domains, author profiling for hate speech spreaders, and style change detection for multi-author documents. Some of the tasks are new, while others continue and extend past shared tasks. The overall goal is to advance the state of the art and to provide an objective evaluation on newly developed benchmark datasets.

Notes

  1. https://pan.webis.de/data.html.

  2. To generate the datasets, we have followed a methodology that complies with the EU General Data Protection Regulation [26].

  3. We should highlight that we are aware of the legal and ethical issues related to collecting, analysing and profiling social media data [26] and that we are committed to legal and ethical compliance in our scientific research and its outcomes. For instance, we have anonymised the user name, masked all the user mentions, and changed the class name in order to avoid any explicit mention.

  4. In a realistic scenario, we would need to know a priori the distribution of haters vs non-haters; depending on the study, the proportion of hateful messages on Twitter ranges from 1% [21] to 10%–15% [39], although when the targets are communities such as the LGBT community, up to 78% of respondents had experienced online anti-LGBT and hate speech in the last 5 years (https://www.report-it.org.uk/files/online-crime-2020_0.pdf). Furthermore, one of the aims of this shared task is to foster research on profiling haters in order to address this problem automatically.

  5. Machine learning emerged as a methodology in authorship attribution in the 1990s. To the best of our knowledge, the first paper to apply a text classification approach in this domain is [19].

  6. https://zenodo.org/record/3716403.

  7. Thanks to Fabrizio Sebastiani (Consiglio Nazionale delle Ricerche, Italy) for this suggestion.

  8. The following StackExchange sites were used: Code Review, Computer Graphics, CS Educators, CS Theory, Data Science, DBA, DevOps, GameDev, Network Engineering, Raspberry Pi, Superuser, and Server Fault.

References

  1. Basile, V., et al.: SemEval-2019 Task 5: multilingual detection of hate speech against immigrants and women in Twitter. In: Proceedings of the 13th International Workshop on Semantic Evaluation (SemEval-2019), Co-located with the Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019) (2019)

  2. Bevendorff, J., Stein, B., Hagen, M., Potthast, M.: Generalizing unmasking for short texts. In: Burstein, J., Doran, C., Solorio, T. (eds.) 14th Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL 2019), pp. 654–659. Association for Computational Linguistics, June 2019. https://www.aclweb.org/anthology/N19-1068

  3. Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly Weather Rev. 78(1), 1–3 (1950). https://doi.org/10.1175/1520-0493(1950)078<0001:VOFEIT>2.0.CO;2. https://journals.ametsoc.org/view/journals/mwre/78/1/1520-0493_1950_078_0001_vofeit_2_0_co_2.xml

  4. ElSherief, M., Nilizadeh, S., Nguyen, D., Vigna, G., Belding, E.: Peer to peer hate: hate speech instigators and their targets. In: Proceedings of the International AAAI Conference on Web and Social Media, vol. 12 (2018)

  5. Fathallah, J.: Fanfiction and the Author. How FanFic Changes Popular Cultural Texts, Amsterdam University Press, Amsterdam (2017)

  6. Hagen, L., et al.: Emoji use in Twitter white nationalism communication. In: Conference Companion Publication of the 2019 on Computer Supported Cooperative Work and Social Computing, pp. 201–205 (2019)

  7. Halvani, O., Graner, L.: Cross-domain authorship attribution based on compression: notebook for PAN at CLEF 2018. In: Cappellato, L., Ferro, N., Nie, J., Soulier, L. (eds.) Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum, Avignon, France, 10–14 September 2018, CEUR Workshop Proceedings, vol. 2125. CEUR-WS.org (2018). http://ceur-ws.org/Vol-2125/paper_90.pdf

  8. Hellekson, K., Busse, K. (eds.): The Fan Fiction Studies Reader. University of Iowa Press (2014)

  9. Juola, P.: Authorship attribution. Found. Trends Inf. Retr. 1(3), 233–334 (2006)

  10. Kestemont, M., et al.: Overview of the cross-domain authorship verification task at PAN 2020. In: Cappellato, L., Eickhoff, C., Ferro, N., Névéol, A. (eds.) Working Notes of CLEF 2020 - Conference and Labs of the Evaluation Forum, Thessaloniki, Greece, 22–25 September 2020, CEUR Workshop Proceedings, vol. 2696. CEUR-WS.org (2020). http://ceur-ws.org/Vol-2696/paper_264.pdf

  11. Kestemont, M., et al.: Overview of the authorship verification task at PAN 2021. In: Faggioli, G., Ferro, N., Joly, A., Maistro, M., Piroi, F. (eds.) CLEF 2021 Labs and Workshops, Notebook Papers. CEUR-WS.org (2021)

  12. Kestemont, M., Stamatatos, E., Manjavacas, E., Daelemans, W., Potthast, M., Stein, B.: Overview of the cross-domain authorship attribution task at PAN 2019. In: CLEF 2019 Labs and Workshops, Notebook Papers (2019)

  13. Kestemont, M., Stover, J.A., Koppel, M., Karsdorp, F., Daelemans, W.: Authenticating the writings of Julius Caesar. Expert Syst. Appl. 63, 86–96 (2016). https://doi.org/10.1016/j.eswa.2016.06.029

  14. Kestemont, M., et al.: Overview of the author identification task at PAN 2018: cross-domain authorship attribution and style change detection. In: CLEF 2018 Labs and Workshops, Notebook Papers (2018)

  15. Koppel, M., Schler, J.: Authorship verification as a one-class classification problem. In: Brodley, C.E. (ed.) Machine Learning, Proceedings of the Twenty-first International Conference (ICML 2004), Banff, Alberta, Canada, 4–8 July 2004, ACM International Conference Proceeding Series, vol. 69. ACM (2004). https://doi.org/10.1145/1015330.1015448

  16. Koppel, M., Schler, J., Argamon, S.: Computational methods in authorship attribution. J. Am. Soc. Inform. Sci. Technol. 60(1), 9–26 (2009)

  17. Mathew, B., Dutt, R., Goyal, P., Mukherjee, A.: Spread of hate speech in online social media. In: Proceedings of the 10th ACM Conference on Web Science, pp. 173–182 (2019)

  18. Mathew, B., Illendula, A., Saha, P., Sarkar, S., Goyal, P., Mukherjee, A.: Hate begets hate: a temporal study of hate speech. Proc. ACM on Hum.-Comput. Interact. 4(CSCW2), 1–24 (2020)

  19. Matthews, R.A.J., Merriam, T.V.N.: Neural computation in stylometry I: an application to the works of Shakespeare and Fletcher. Lit. Linguist. Comput. 8(4), 203–209 (1993). https://doi.org/10.1093/llc/8.4.203. ISSN 0268–1145

  20. Nockleby, J.T.: Hate speech. In: Levy, L.W., Karst, K.L., et al. (eds.) Encyclopedia of the American Constitution. 2nd edn., pp. 1277–1279. Macmillan, New York (2000)

  21. Pereira-Kohatsu, J.C., Quijano-Sánchez, L., Liberatore, F., Camacho-Collados, M.: Detecting and monitoring hate speech in Twitter. Sensors 19(21), 4654 (2019)

  22. Potthast, M., Gollub, T., Wiegmann, M., Stein, B.: TIRA integrated research architecture. In: Ferro, N., Peters, C. (eds.) Information Retrieval Evaluation in a Changing World. TIRS, vol. 41, pp. 123–160. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-22948-1_5

  23. Qian, J., ElSherief, M., Belding, E.M., Wang, W.Y.: Leveraging intra-user and inter-user representation learning for automated hate speech detection. arXiv preprint arXiv:1804.03124 (2018)

  24. Rangel, F., De-La-Peña-Sarracén, G.L., Chulvi, B., Fersini, E., Rosso, P.: Profiling hate speech spreaders on Twitter task at PAN 2021. In: Faggioli, G., Ferro, N., Joly, A., Maistro, M., Piroi, F. (eds.) CLEF 2021 Labs and Workshops, Notebook Papers. CEUR-WS.org (2021)

  25. Rangel, F., Giachanou, A., Ghanem, B., Rosso, P.: Overview of the 8th author profiling task at PAN 2020: profiling fake news spreaders on Twitter. In: CLEF 2020 Labs and Workshops, Notebook Papers. CEUR Workshop Proceedings (2020)

  26. Rangel, F., Rosso, P.: On the implications of the general data protection regulation on the organisation of evaluation tasks. Lang. Law/Linguagem Direito 5(2), 95–117 (2019)

  27. Rangel, F., Rosso, P.: Overview of the 7th author profiling task at PAN 2019: bots and gender profiling. In: CLEF 2019 Labs and Workshops, Notebook Papers (2019)

  28. Rangel, F., et al.: Overview of the 2nd author profiling task at PAN 2014. In: CLEF 2014 Labs and Workshops, Notebook Papers (2014)

  29. Rangel, F., Franco-Salvador, M., Rosso, P.: A low dimensionality representation for language variety identification. In: Gelbukh, A. (ed.) CICLing 2016. LNCS, vol. 9624, pp. 156–169. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-75487-1_13

  30. Rangel, F., Rosso, P., Montes-y-Gómez, M., Potthast, M., Stein, B.: Overview of the 6th author profiling task at PAN 2018: multimodal gender identification in Twitter. In: CLEF 2018 Labs and Workshops, Notebook Papers (2018)

  31. Rangel, F., Rosso, P., Koppel, M., Stamatatos, E., Inches, G.: Overview of the author profiling task at PAN 2013. In: CLEF 2013 Labs and Workshops, Notebook Papers (2013)

  32. Rangel, F., Rosso, P., Potthast, M., Stein, B.: Overview of the 5th author profiling task at PAN 2017: gender and language variety identification in Twitter. Working Notes Papers of the CLEF (2017)

  33. Rangel, F., Rosso, P., Potthast, M., Stein, B., Daelemans, W.: Overview of the 3rd author profiling task at PAN 2015. In: CLEF 2015 Labs and Workshops, Notebook Papers (2015)

  34. Rangel, F., Rosso, P., Verhoeven, B., Daelemans, W., Potthast, M., Stein, B.: Overview of the 4th author profiling task at PAN 2016: cross-genre evaluations. In: CLEF 2016 Labs and Workshops, Notebook Papers (2016). ISSN 1613–0073

  35. Ribeiro, M., Calais, P., Santos, Y., Almeida, V., Meira Jr., W.: Characterizing and Detecting Hateful Users on Twitter. In: Proceedings of the International AAAI Conference on Web and Social Media, vol. 12 (2018)

  36. Rosso, P., Rangel, F., Potthast, M., Stamatatos, E., Tschuggnall, M., Stein, B.: Overview of PAN’16–new challenges for authorship analysis: cross-genre profiling, clustering, diarization, and obfuscation. In: Experimental IR Meets Multilinguality, Multimodality, and Interaction. 7th International Conference of the CLEF Initiative (CLEF 16) (2016)

  37. Stamatatos, E.: A survey of modern authorship attribution methods. JASIST 60(3), 538–556 (2009). https://doi.org/10.1002/asi.21001

  38. Tschuggnall, M., et al.: Overview of the author identification task at PAN 2017: style breach detection and author clustering. In: CLEF 2017 Labs and Workshops, Notebook Papers (2017)

  39. Waseem, Z.: Are you a racist or am I seeing things? Annotator influence on hate speech detection on Twitter. In: Proceedings of the First Workshop on NLP and Computational Social Science, pp. 138–142. Association for Computational Linguistics, Austin, November 2016. https://doi.org/10.18653/v1/W16-5618. https://www.aclweb.org/anthology/W16-5618

  40. Zangerle, E., Mayerl, M., Potthast, M., Stein, B.: Overview of the style change detection task at PAN 2021. In: Faggioli, G., Ferro, N., Joly, A., Maistro, M., Piroi, F. (eds.) CLEF 2021 Labs and Workshops, Notebook Papers. CEUR-WS.org (2021)

  41. Zangerle, E., Mayerl, M., Specht, G., Potthast, M., Stein, B.: Overview of the style change detection task at PAN 2020. In: CLEF 2020 Labs and Workshops, Notebook Papers (2020)

  42. Zangerle, E., Tschuggnall, M., Specht, G., Stein, B., Potthast, M.: Overview of the style change detection task at PAN 2019. In: CLEF 2019 Labs and Workshops, Notebook Papers (2019)

Acknowledgments

The work of the researchers from Universitat Politècnica de València was partially funded by the Spanish MICINN under the project MISMIS-FAKEnHATE on MISinformation and MIScommunication in social media: FAKE news and HATE speech (PGC2018-096212-B-C31), and by the Generalitat Valenciana under the project DeepPattern (PROMETEO/2019/121). This article is also based upon work from the DigForAsp COST Action 17124 on Digital Forensics: evidence analysis via intelligent systems and practices, supported by European Cooperation in Science and Technology.

Author information

Corresponding author

Correspondence to Matti Wiegmann.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Bevendorff, J. et al. (2021). Overview of PAN 2021: Authorship Verification, Profiling Hate Speech Spreaders on Twitter, and Style Change Detection. In: Candan, K.S., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2021. Lecture Notes in Computer Science, vol 12880. Springer, Cham. https://doi.org/10.1007/978-3-030-85251-1_26

  • DOI: https://doi.org/10.1007/978-3-030-85251-1_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-85250-4

  • Online ISBN: 978-3-030-85251-1

  • eBook Packages: Computer Science, Computer Science (R0)
