A Decade of Shared Tasks in Digital Text Forensics at PAN

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11438)

Abstract

Digital text forensics aims at examining the originality and credibility of information in electronic documents and, in this regard, at extracting and analyzing information about the authors of these documents. The field has developed substantially over the last decade. PAN is a series of shared tasks that started in 2009 and has contributed significantly to attracting the attention of the research community to well-defined digital text forensics tasks. Several benchmark datasets have been developed to assess state-of-the-art performance in a wide range of tasks. In this paper, we present the evolution of both the examined tasks and the developed datasets over the last decade. We also briefly introduce the upcoming PAN 2019 shared tasks.

Notes

  1. The acronym originates from the title of the first PAN workshop, held at SIGIR 2007: Plagiarism analysis, Authorship identification, and Near-duplicate detection [36].

References

  1. FIRE 2015 Working Notes Papers, 4–6 December, Gandhinagar, India (2015). http://www.uni-weimar.de/medien/webis/events/pan-at-fire-15

  2. FIRE 2017 Working Notes Papers, 8–11 December, Bangalore, India (2017)

  3. Amigó, E., et al.: Overview of RepLab 2014: author profiling and reputation dimensions for online reputation management. In: Kanoulas, E., et al. (eds.) CLEF 2014. LNCS, vol. 8685, pp. 307–322. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11382-1_24

  4. Argamon, S., Juola, P.: Overview of the international authorship identification competition at PAN-2011. In: Petras, V., Forner, P., Clough, P. (eds.) Notebook Papers of CLEF 2011 Labs and Workshops, 19–22 September, Amsterdam, Netherlands (2011). http://www.clef-initiative.eu/publication/working-notes

  5. Argamon, S., Koppel, M., Fine, J., Shimoni, A.R.: Gender, genre, and writing style in formal written texts. TEXT 23, 321–346 (2003)

  6. Asghari, H., Mohtaj, S., Fatemi, O., Faili, H., Rosso, P., Potthast, M.: Algorithms and corpora for Persian plagiarism detection. In: Majumder, P., Mitra, M., Mehta, P., Sankhavara, J. (eds.) FIRE 2016. LNCS, vol. 10478, pp. 61–79. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-73606-8_5

  7. Bagnall, D.: Authorship clustering using multi-headed recurrent neural networks. Notebook for PAN at CLEF 2016. In: Balog, K., Cappellato, L., Ferro, N., Macdonald, C. (eds.) CLEF 2016 Evaluation Labs and Workshop - Working Notes Papers, 5–8 September, Évora, Portugal. CEUR Workshop Proceedings, CEUR-WS.org, September 2016. http://ceur-ws.org/Vol-1609/

  8. Bensalem, I., Boukhalfa, I., Rosso, P., Abouenour, L., Darwish, K., Chikhi, S.: Overview of the AraPlagDet PAN@FIRE2015 shared task on Arabic plagiarism detection. In: FIRE 2015 Working Notes Papers, 4–6 December, Gandhinagar, India [1]

  9. Flores, E., Rosso, P., Moreno, L., Villatoro-Tello, E.: On the detection of SOurce COde re-use. In: FIRE 2014 Working Notes Papers, 5–7 December, Bangalore, India, pp. 21–30, December 2014

  10. Flores, E., Rosso, P., Villatoro-Tello, E., Moreno, L., Alcover, R., Chirivella, V.: PAN@FIRE: Overview of CL-SOCO track on the detection of cross-language SOurce COde re-use. In: FIRE 2015 Working Notes Papers, 4–6 December, Gandhinagar, India, pp. 1–5 [1]

  11. Gollub, T., et al.: Recent trends in digital text forensics and its evaluation. In: Forner, P., Müller, H., Paredes, R., Rosso, P., Stein, B. (eds.) CLEF 2013. LNCS, vol. 8138, pp. 282–302. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40802-1_28

  12. Halvani, O., Graner, L., Vogel, I.: Authorship verification in the absence of explicit features and thresholds. In: Pasi, G., Piwowarski, B., Azzopardi, L., Hanbury, A. (eds.) ECIR 2018. LNCS, vol. 10772, pp. 454–465. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-76941-7_34

  13. Holmes, J., Meyerhoff, M.: The Handbook of Language and Gender. Blackwell Handbooks in Linguistics. Wiley, Hoboken (2003)

  14. Inches, G., Crestani, F.: Overview of the international sexual predator identification competition at PAN-2012. In: Forner, P., Karlgren, J., Womser-Hacker, C. (eds.) CLEF 2012 Evaluation Labs and Workshop - Working Notes Papers, 17–20 September, Rome, Italy (2012). http://www.clef-initiative.eu/publication/working-notes

  15. Juola, P.: An overview of the traditional authorship attribution subtask. In: Forner, P., Karlgren, J., Womser-Hacker, C. (eds.) CLEF 2012 Evaluation Labs and Workshop - Working Notes Papers, 17–20 September, Rome, Italy (2012). http://www.clef-initiative.eu/publication/working-notes

  16. Koppel, M., Argamon, S., Shimoni, A.R.: Automatically categorizing written texts by author gender (2003)

  17. Koppel, M., Schler, J., Argamon, S., Winter, Y.: The “fundamental problem” of authorship attribution. Engl. Stud. 93(3), 284–291 (2012)

  18. Litvinova, T., Rangel, F., Rosso, P., Seredin, P., Litvinova, O.: Overview of the RusProfiling PAN at FIRE track on cross-genre gender identification in Russian. In: FIRE 2017 Working Notes Papers, 8–11 December, Bangalore, India [2]

  19. Anand Kumar, M., Barathi Ganesh, H.B., Singh, S., Soman, K.P., Rosso, P.: Overview of the INLI PAN at FIRE-2017 track on Indian native language identification. In: FIRE 2017 Working Notes Papers, 8–11 December, Bangalore, India [2]

  20. Pennebaker, J.W.: The Secret Life of Pronouns: What Our Words Say About Us. Bloomsbury, USA (2013)

  21. Potthast, M., Barrón-Cedeño, A., Eiselt, A., Stein, B., Rosso, P.: Overview of the 2nd international competition on plagiarism detection. In: Braschler, M., Harman, D., Pianta, E. (eds.) Working Notes Papers of the CLEF 2010 Evaluation Labs, September 2010. http://www.clef-initiative.eu/publication/working-notes

  22. Potthast, M., et al.: Who wrote the web? Revisiting influential author identification research applicable to information retrieval. In: Ferro, N., et al. (eds.) ECIR 2016. LNCS, vol. 9626, pp. 393–407. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-30671-1_29

  23. Potthast, M., Eiselt, A., Barrón-Cedeño, A., Stein, B., Rosso, P.: Overview of the 3rd international competition on plagiarism detection. In: Notebook Papers of the 5th Evaluation Lab on Uncovering Plagiarism, Authorship and Social Software Misuse (PAN), Amsterdam, The Netherlands, September 2011

  24. Potthast, M., et al.: Overview of the 4th international competition on plagiarism detection. In: Forner, P., Karlgren, J., Womser-Hacker, C. (eds.) Working Notes Papers of the CLEF 2012 Evaluation Labs, September 2012. http://www.clef-initiative.eu/publication/working-notes

  25. Potthast, M., et al.: Overview of the 5th international competition on plagiarism detection. In: Forner, P., Navigli, R., Tufis, D. (eds.) Working Notes Papers of the CLEF 2013 Evaluation Labs, September 2013. http://www.clef-initiative.eu/publication/working-notes

  26. Potthast, M., Gollub, T., Rangel, F., Rosso, P., Stamatatos, E., Stein, B.: Improving the reproducibility of PAN’s shared tasks: plagiarism detection, author identification, and author profiling. In: Kanoulas, E., et al. (eds.) CLEF 2014. LNCS, vol. 8685, pp. 268–299. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11382-1_22

  27. Potthast, M., et al.: Overview of the 6th international competition on plagiarism detection. In: Cappellato, L., Ferro, N., Halvey, M., Kraaij, W. (eds.) Working Notes Papers of the CLEF 2014 Evaluation Labs. CEUR Workshop Proceedings, CLEF and CEUR-WS.org, September 2014. http://www.clef-initiative.eu/publication/working-notes

  28. Potthast, M., Rangel, F., Tschuggnall, M., Stamatatos, E., Rosso, P., Stein, B.: Overview of PAN’17: author identification, author profiling, and author obfuscation. In: Jones, G.J.F., et al. (eds.) CLEF 2017. LNCS, vol. 10456, pp. 275–290. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-65813-1_25

  29. Potthast, M., Stein, B., Anderka, M.: A Wikipedia-based multilingual retrieval model. In: Macdonald, C., Ounis, I., Plachouras, V., Ruthven, I., White, R.W. (eds.) ECIR 2008. LNCS, vol. 4956, pp. 522–530. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-78646-7_51

  30. Potthast, M., Stein, B., Eiselt, A., Barrón-Cedeño, A., Rosso, P.: Overview of the 1st international competition on plagiarism detection. In: Stein, B., Rosso, P., Stamatatos, E., Koppel, M., Agirre, E. (eds.) SEPLN 2009 Workshop on Uncovering Plagiarism, Authorship, and Social Software Misuse (PAN 2009), pp. 1–9. CEUR-WS.org, September 2009. http://ceur-ws.org/Vol-502

  31. Rosso, P., Rangel, F., Potthast, M., Stamatatos, E., Tschuggnall, M., Stein, B.: Overview of PAN’16: new challenges for authorship analysis: cross-genre profiling, clustering, diarization, and obfuscation. In: Fuhr, N., et al. (eds.) CLEF 2016. LNCS, vol. 9822, pp. 332–350. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-44564-9_28

  32. Schler, J., Koppel, M., Argamon, S., Pennebaker, J.W.: Effects of age and gender on blogging. In: AAAI Spring Symposium: Computational Approaches to Analyzing Weblogs, pp. 199–205. AAAI (2006)

  33. Stamatatos, E.: A survey of modern authorship attribution methods. J. Am. Soc. Inf. Sci. Technol. 60, 538–556 (2009)

  34. Stamatatos, E., Potthast, M., Rangel, F., Rosso, P., Stein, B.: Overview of the PAN/CLEF 2015 evaluation lab. In: Mothe, J., et al. (eds.) CLEF 2015. LNCS, vol. 9283, pp. 518–538. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24027-5_49

  35. Stamatatos, E., et al.: Overview of PAN 2018: author identification, author profiling, and author obfuscation. In: Bellot, P., et al. (eds.) CLEF 2018. LNCS, vol. 11018, pp. 267–285. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-98932-7_25

  36. Stein, B., Koppel, M., Stamatatos, E. (eds.): SIGIR 2007 Workshop on Plagiarism Analysis, Authorship Identification, and Near-Duplicate Detection (PAN 2007). CEUR-WS.org (2007). http://www.uni-weimar.de/medien/webis/events/pan-07

  37. Stein, B., Lipka, N., Prettenhofer, P.: Intrinsic plagiarism analysis. Lang. Resour. Eval. (LRE) 45(1), 63–82 (2011)

  38. Stein, B., Rosso, P., Stamatatos, E., Koppel, M., Agirre, E. (eds.): SEPLN 2009 Workshop on Uncovering Plagiarism, Authorship, and Social Software Misuse (PAN 2009). Universidad Politécnica de Valencia and CEUR-WS.org (2009). http://ceur-ws.org/Vol-502

Acknowledgements

We are indebted to many colleagues and friends who contributed greatly to PAN’s tasks: Maik Anderka, Shlomo Argamon, Alberto Barrón-Cedeño, Fabio Celli, Fabio Crestani, Walter Daelemans, Andreas Eiselt, Tim Gollub, Parth Gupta, Matthias Hagen, Teresa Holfeld, Patrick Juola, Giacomo Inches, Mike Kestemont, Moshe Koppel, Manuel Montes-y-Gómez, Aurelio Lopez-Lopez, Francisco Rangel, Miguel Angel Sánchez-Pérez, Günther Specht, Michael Tschuggnall, and Ben Verhoeven. Our special thanks go to PAN’s sponsors throughout the years and not least to the hundreds of participants.

Author information

Correspondence to Efstathios Stamatatos.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Potthast, M., Rosso, P., Stamatatos, E., Stein, B. (2019). A Decade of Shared Tasks in Digital Text Forensics at PAN. In: Azzopardi, L., Stein, B., Fuhr, N., Mayr, P., Hauff, C., Hiemstra, D. (eds) Advances in Information Retrieval. ECIR 2019. Lecture Notes in Computer Science, vol 11438. Springer, Cham. https://doi.org/10.1007/978-3-030-15719-7_39

  • DOI: https://doi.org/10.1007/978-3-030-15719-7_39

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-15718-0

  • Online ISBN: 978-3-030-15719-7

  • eBook Packages: Computer Science, Computer Science (R0)
