A Decade of Shared Tasks in Digital Text Forensics at PAN

  • Martin Potthast
  • Paolo Rosso
  • Efstathios Stamatatos
  • Benno Stein
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11438)

Abstract

Digital text forensics aims to examine the originality and credibility of information in electronic documents and, to this end, to extract and analyze information about the authors of these documents. The research field has developed substantially over the last decade. PAN is a series of shared tasks that started in 2009 and has contributed significantly to drawing the research community's attention to well-defined digital text forensics tasks. Several benchmark datasets have been developed to assess state-of-the-art performance across a wide range of tasks. In this paper, we present the evolution of both the examined tasks and the developed datasets over the last decade. We also briefly introduce the upcoming PAN 2019 shared tasks.

Notes

Acknowledgements

We are indebted to many colleagues and friends who contributed greatly to PAN’s tasks: Maik Anderka, Shlomo Argamon, Alberto Barrón-Cedeño, Fabio Celli, Fabio Crestani, Walter Daelemans, Andreas Eiselt, Tim Gollub, Parth Gupta, Matthias Hagen, Teresa Holfeld, Patrick Juola, Giacomo Inches, Mike Kestemont, Moshe Koppel, Manuel Montes-y-Gómez, Aurelio Lopez-Lopez, Francisco Rangel, Miguel Angel Sánchez-Pérez, Günther Specht, Michael Tschuggnall, and Ben Verhoeven. Our special thanks go to PAN’s sponsors throughout the years and not least to the hundreds of participants.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Martin Potthast (1)
  • Paolo Rosso (2)
  • Efstathios Stamatatos (3, corresponding author)
  • Benno Stein (4)

  1. Department of Computer Science, Leipzig University, Leipzig, Germany
  2. PRHLT Research Center, Universitat Politècnica de València, Valencia, Spain
  3. Department of Information and Communication Systems Engineering, University of the Aegean, Samos, Greece
  4. Web Technology and Information Systems, Bauhaus-Universität Weimar, Weimar, Germany
