Overview of PAN 2019: Bots and Gender Profiling, Celebrity Profiling, Cross-Domain Authorship Attribution and Style Change Detection

  • Walter Daelemans
  • Mike Kestemont
  • Enrique Manjavacas
  • Martin Potthast
  • Francisco Rangel
  • Paolo Rosso
  • Günther Specht
  • Efstathios Stamatatos
  • Benno Stein
  • Michael Tschuggnall
  • Matti Wiegmann
  • Eva Zangerle
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11696)

Abstract

We briefly report on the four shared tasks organized as part of the PAN 2019 evaluation lab on digital text forensics and authorship analysis. Each task is introduced and motivated, and the results obtained are presented. Altogether, the four tasks attracted 373 registrations, yielding 72 successful submissions. This, together with our continued policy of inviting the submission of software rather than its run output via the TIRA experimentation platform, marks a good start to the second decade of PAN evaluation labs.

Acknowledgments

The work of Paolo Rosso was partially funded by the Spanish MICINN under the research project MISMIS-FAKEnHATE on Misinformation and Miscommunication in social media: FAKE news and HATE speech (PGC2018-096212-B-C31). Our special thanks go to all PAN participants for providing high-quality submissions, to Symanto (https://www.symanto.net) for sponsoring the PAN Lab 2019, and to The Logic Value (https://thelogicvalue.com) for sponsoring the author profiling shared task award.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Walter Daelemans (1)
  • Mike Kestemont (1)
  • Enrique Manjavacas (1)
  • Martin Potthast (2, corresponding author)
  • Francisco Rangel (3)
  • Paolo Rosso (4)
  • Günther Specht (5)
  • Efstathios Stamatatos (6)
  • Benno Stein (7)
  • Michael Tschuggnall (5)
  • Matti Wiegmann (7)
  • Eva Zangerle (5)

  1. University of Antwerp, Antwerp, Belgium
  2. Leipzig University, Leipzig, Germany
  3. Autoritas Consulting, Valencia, Spain
  4. Universitat Politècnica de València, Valencia, Spain
  5. University of Innsbruck, Innsbruck, Austria
  6. University of the Aegean, Samos, Greece
  7. Bauhaus-Universität Weimar, Weimar, Germany