Author Profiling Tracks at FIRE

  • Survey Article
  • SN Computer Science

A Publisher Correction to this article was published on 28 September 2023

This article has been updated

Abstract

Benchmarking activities are vital for fostering research and addressing new, challenging problems. During the last 10 years of the FIRE initiative, we have been involved in organizing more than ten tracks, with the aim of creating new resources in several languages and making them available to the research community. This made it possible to compare several new approaches on the same datasets. In this chapter, we focus on three author profiling tracks, describing how their data were created and analyzing the results.

Notes

  1. Lab on digital text forensics and stylometry: https://pan.webis.de.

  2. http://www.clef-initiative.eu.

  3. http://www.folha.uol.com.br.

  4. http://www.dn.pt.

  5. https://github.com/autoritas/RD-Lab/tree/master/data/HispaBlogs.

  6. http://www.liwc.net/.

  7. http://www.psych.rl.ac.uk/.

  8. https://www.inf.uni-hamburg.de/en/inst/ab/lt/resources/data/germeval-2020-psychopred.html.

  9. As suggested by Carol Peters at CLEF 2010 in Padua.

  10. https://www.autoritas.net/APDA/.

  11. The annotation was carried out manually by human annotators. The detailed methodology is described in the overview paper.

  12. The annotation of gender was carried out manually by human annotators. The detailed methodology is described in the overview paper.

  13. https://ru.trustpilot.com/.

References

  1. Al Sukhni E, Alequr Q. Investigating the use of machine learning algorithms in detecting gender of the Arabic tweet author. Int J Adv Comput Sci Appl. 2016;1(7):319–28.

  2. Alsmearat K, Al-Ayyoub M, Al-Shalabi R. An extensive study of the bag-of-words approach for gender identification of Arabic articles. In: 2014 IEEE/ACS 11th international conference on computer systems and applications (AICCSA). 2014. pp 601–608. IEEE.

  3. Alsmearat K, Shehab M, Al-Ayyoub M, Al-Shalabi R, Kanaan G. Emotion analysis of Arabic articles and its impact on identifying the author's gender. In: 12th international conference on computer systems and applications (AICCSA), 2015 IEEE/ACS; 2015.

  4. Álvarez-Carmona MA, López-Monroy AP, Montes-Y-Gómez M, Villaseñor-Pineda L, Jair-Escalante H. Inaoe’s participation at pan’15: author profiling task—notebook for pan at clef 2015; 2015.

  5. Argamon S, Koppel M, Fine J, Shimoni AR. Gender, genre, and writing style in formal written texts. TEXT. 2003;23:321–46.

  6. Argamon S, Dhawle S, Koppel M, Pennebaker JW. Lexical predictors of personality type. In: Proceedings of the joint annual meeting of the interface and the classification society of North America; 2005.

  7. Asghari H, Mohtaj S, Fatemi O, Faili H, Rosso P, Potthast M. Algorithms and corpora for Persian plagiarism detection: overview of pan at fire 2016. In: Notebook Papers of FIRE 2016, FIRE-2016, Kolkata, India, December 7–10, CEUR Workshop Proceedings. CEUR-WS.org, vol 1737; 2016. pp 135–144.

  8. Bachrach Y, Kosinski M, Graepel T, Kohli P, Stillwell D. Personality and patterns of Facebook usage. In: Proceedings of the ACM web science conference. ACM New York, NY, USA; 2012. pp 36–44.

  9. Banerjee S, Chakma K, Naskar SK, Das A, Rosso P, Bandyopadhyay S, Choudhury M. Overview of the mixed script information retrieval (MSIR) at fire-2016. In: Notebook papers of FIRE 2016, FIRE-2016, Kolkata, India, December 7–10, CEUR workshop proceedings. CEUR-WS.org, vol 1737; 2016. pp 94–99.

  10. Barrón-Cedeño A, Rosso P, Lalitha-Devi S, Clough P, Stevenson M. Pan@fire: Overview of the cross-language !ndian text re-use detection competition. In: 2nd and 3rd international workshops FIRE 2010 and 2011, multilingual information access in South Asian languages, Springer, LNCS(7536); 2013. pp 59–70.

  11. Bensalem I, Boukhalfa I, Rosso P, Abouenour L, Darwish K, Chikhi S. Overview of the araplagdet pan@ fire2015 shared task on Arabic plagiarism detection. In: Notebook papers of FIRE 2015, FIRE-2015, Gandhinagar, India, December 4–6, CEUR Workshop Proceedings. CEUR-WS.org, vol 1587; 2015. pp 111–122.

  12. Bishop-Clark C. Cognitive style, personality, and computer programming. Computers in human behavior, vol. 11–2. New York: Elsevier; 1995. p. 241–60.

  13. Castro D, Souza E, de Oliveira AL. Discriminating between Brazilian and European Portuguese national varieties on Twitter texts. In: 5th Brazilian conference on intelligent systems (BRACIS); 2016. pp 265–270.

  14. Celli F, Polonio L. Relationships between personality and interactions in Facebook. Social networking: recent trends, emerging issues and future outlook. New York: Nova Science Publishers Inc; 2013. p. 41–54.

  15. Celli F, Lepri B, Biel JI, Gatica-Perez D, Riccardi G, Pianesi F. The workshop on computational personality recognition 2014. In: Proceedings of the ACM international conference on multimedia, ACM; 2014. pp 1245–1246.

  16. Costa PT, McCrae RR. The revised NEO personality inventory (NEO-PI-R). The SAGE handbook of personality theory and assessment, vol. 2. Thousand Oaks: Sage Publications Inc.; 2008. p. 179–98.

  17. Elfardy H, Diab MT. Sentence level dialect identification in Arabic. In: Association for computational linguistics (ACL); 2013. pp 456–461.

  18. Estival D, Gaustad T, Hutchinson B, Bao-Pham S, Radford W. Author profiling for English and Arabic emails; 2008.

  19. Flores E, Rosso P, Moreno L, Villatoro-Tello E. Pan@fire: Overview of SOCO track on the detection of source code re-use. In: Notebook papers of FIRE, FIRE-2014. India: Bangalore; 2014.

  20. Flores E, Rosso P, Moreno L, Villatoro-Tello E. Pan@ fire 2015: Overview of cl-soco track on the detection of cross-language source code re-use. In: Proceedings of the seventh forum for information retrieval evaluation (FIRE 2015), Gandhinagar, India; 2015. pp 4–6.

  21. Franco-Salvador M, Rangel F, Rosso P, Taule M, Marti M. Language variety identification using distributed representations of words and documents. Experimental IR meets multilinguality, multimodality, and interaction. Berlin: Springer; 2015. p. 28–40.

  22. Golbeck J, Robles C, Turner K. Predicting personality with social media. In: CHI’11 extended abstracts on human factors in computing systems, ACM; 2011. pp 253–262.

  23. Gupta P, Clough P, Rosso P, Stevenson M. Pan@fire: Overview of the cross-language Indian news story search (CLINSS) track. In: Notebook papers of FIRE 2012, FIRE-2012, Kolkata, India, December 17–19; 2012.

  24. Gupta P, Clough P, Rosso P, Stevenson M, Banchs R. Pan@fire: Overview of the cross-language Indian news story search (CLINSS) track. In: Notebook Papers of FIRE 2013, FIRE-2013, Delhi, India, December 4–6; 2013.

  25. Holmes J, Meyerhoff M. The handbook of language and gender. Blackwell handbooks in linguistics. New York: Wiley; 2003.

  26. Huang C, Lee L. Contrastive approach towards text source classification based on top-bag-of-word similarity. In: PACLIC; 2008. pp 404–410.

  27. Karimi Z, Baraani-Dastjerdi A, Ghasem-Aghaee N, Wagner S. Links between the personalities, styles and performance in computer programming. J Syst Softw. 2016;111:228–41.

  28. Koppel M, Argamon S, Shimoni AR. Automatically categorizing written texts by author gender. Lit Linguist Comput. 2002;17:4.

  29. Kosinski M, Bachrach Y, Kohli P, Stillwell D, Graepel T. Manifestations of user personality in website choice and behaviour on online social networks. New York: Springer; 2013. p. 1–24.

  30. Litvinova T, Litvinova O, Zagorovskaya O, Seredin P, Sboev A, Romanchenko O. “RusPersonality”: a Russian corpus for authorship profiling and deception detection. In: Intelligence, social media and web (ISMW FRUCT), 2016 international FRUCT conference on, IEEE; 2016. pp 1–7.

  31. Litvinova T, Seredin P, Litvinova O, Zagorovskaya O, Sboev A, Gudovskih D, Moloshnikov I, Rybka R. Gender prediction for authors of Russian texts using regression and classification techniques. In: CDUD@ CLA; 2016. pp 44–53.

  32. Litvinova T, Gudovskikh D, Sboev A, Seredin P, Litvinova O, Pisarevskaya D, Rosso P. Author gender prediction in Russian social media texts. In: Conference on analysis of images, social networks, and texts, AIST-2017, IEEE; 2017. pp 1101–1106.

  33. Litvinova T, Rangel F, Rosso P, Seredin P, Litvinova O. Overview of the rusprofiling pan at fire track on cross-genre gender identification in Russian. In: Notebook papers of FIRE 2017, FIRE-2017, Bangalore, India, December 8–11, CEUR Workshop Proceedings. CEUR-WS.org, vol 2036; 2017. pp 1–7.

  34. Lui M, Cook P. Classifying English documents by national dialect. In: Proceedings of the Australasian Language Technology Association Workshop; 2013. pp 5–15.

  35. Maharjan S, Shrestha P, Solorio T, Hasan R. A straightforward author profiling approach in mapreduce. In: Advances in artificial intelligence. Iberamia; 2014. pp 95–107.

  36. Maier W, Gomez-Rodriguez C. Language variety identification in Spanish tweets. In: LT4CloseLang 2014; 2014.

  37. Mairesse F, Walker MA, Mehl MR, Moore RK. Using linguistic cues for the automatic recognition of personality in conversation and text. J Artif Intell Res. 2007;30–1:457–500.

  38. Malmasi S, Zampieri M, Ljubešić N, Nakov P, Ali A, Tiedemann J. Discriminating between similar languages and Arabic dialect identification: a report on the third DSL shared task. In: Proceedings of the third workshop on NLP for similar languages, varieties and dialects (VarDial3); 2016. pp 1–14.

  39. Maulana Siagian AHA, Aritsugi M. Dbms-ku approach for author profiling and deception detection in Arabic. In: Mehta P, Rosso P, Majumder P, Mitra M (Eds) Working notes of the forum for information retrieval evaluation (FIRE 2019). CEUR workshop proceedings. CEUR-WS.org, Kolkata, India, December 12–15; 2019.

  40. Neuman Y, Cohen Y. A vectorial semantics approach to personality assessment. Sci Rep. 2014;4:4761.

  41. Oberlander J, Nowson S. Whose thumb is it anyway?: classifying author personality from weblog text. In: Proceedings of the COLING/ACL on main conference poster sessions, Association for Computational Linguistics; 2006. pp 627–634.

  42. Paruma-Pabón OH, González FA, Aponte J, Camargo JE, Restrepo-Calle F. Finding relationships between socio-technical aspects and personality traits by mining developer e-mails. In: Proceedings of the 9th international workshop on cooperative and human aspects of software engineering, ACM; 2016. pp 8–14.

  43. Pennebaker JW, Mehl MR, Niederhoffer KG. Psychological aspects of natural language use: our words, our selves. Annu Rev Psychol. 2003;54(1):547–77.

  44. Quercia D, Lambiotte R, Stillwell D, Kosinski M, Crowcroft J. The personality of popular Facebook users. In: Proceedings of the ACM 2012 conference on computer supported cooperative Work, ACM; 2012. pp 955–964.

  45. Rangel F, Rosso P. On the multilingual and genre robustness of emographs for author profiling in social media. In: 6th international conference of CLEF on experimental IR meets multilinguality, multimodality, and interaction, Springer-Verlag, LNCS(9283); 2015. pp 274–280.

  46. Rangel F, Rosso P. On the impact of emotions on author profiling. Inf Process Manag. 2016;52(1):73–92.

  47. Rangel F, Rosso P. On the implications of the general data protection regulation on the organisation of evaluation tasks. Lang Law. 2019;5:95–117.

  48. Rangel F, Rosso P. Overview of the 7th author profiling task at pan 2019: Bots and gender profiling. In: Cappellato L, Ferro N, Müller H, Losada D (Eds) CLEF 2019 labs and workshops, notebook papers. CEUR Workshop Proceedings. CEUR-WS.org; 2019.

  49. Rangel F, Rosso P, Potthast M, Stein B, Daelemans W. Overview of the 3rd author profiling task at pan 2015. In: Cappellato L, Ferro N, Jones G, San Juan E (Eds) CLEF 2015 labs and workshops, notebook papers. CEUR Workshop Proceedings. CEUR-WS.org, vol. 1391; 2015.

  50. Rangel F, González F, Restrepo-Calle F, Montes M, Rosso P. Pan at fire: Overview of the PR-SOCO track on personality recognition in source code. In: Notebook papers of FIRE 2016, FIRE-2016, Kolkata, India, December 7–10, CEUR workshop proceedings. CEUR-WS.org, vol 1737; 2016. pp 1–5.

  51. Rangel F, Rosso P, Franco-Salvador M. A low dimensionality representation for language variety identification. In: 17th international conference on intelligent text processing and computational linguistics, CICLing. Springer; 2016. LNCS. arXiv:1705.10754

  52. Rangel F, Rosso P, Potthast M, Stein B. Overview of the 5th author profiling task at PAN 2017: gender and language variety identification in Twitter. In: Working notes papers of the CLEF 2017 evaluation labs, CLEF and CEUR-WS.org, CEUR workshop proceedings; 2017.

  53. Rangel F, Rosso P, Charfi A, Zaghouani W, Ghanem B, Sánchez-Junquera J. Overview of the track on author profiling and deception detection in Arabic. In: Mehta P, Rosso P, Majumder P, Mitra M (Eds) Working notes of the forum for information retrieval evaluation (FIRE 2019). CEUR workshop proceedings. CEUR-WS.org, Kolkata, India, December 12–15; 2019.

  54. Rangel F, Rosso P, Zaghouani W, Charfi A. Fine-grained analysis of language varieties and demographics. Nat Lang Eng; 2020. (In Press).

  55. Rosso P, Rangel F, Hernández-Farías I, Cagnina L, Zaghouani W, Charfi A. A survey on author profiling, deception, and irony detection for the Arabic language. Lang Ling Compass. 2018;12:4.

  56. Sadat F, Kazemi F, Farzindar A. Automatic identification of Arabic language varieties and dialects in social media. In: Proceedings of SocialNLP; 2014. p 22.

  57. Schler J, Koppel M, Argamon S, Pennebaker JW. Effects of age and gender on blogging. In: AAAI spring symposium: computational approaches to analyzing weblogs, AAAI; 2006. pp 199–205.

  58. Schwartz HA, Eichstaedt JC, Kern ML, Dziurzynski L, Ramones SM, Agrawal M, Shah A, Kosinski M, Stillwell D, Seligman ME, et al. Personality, gender, and age in the language of social media: the open-vocabulary approach. PLoS One. 2013;8(9):e73791.

  59. Sequiera R, Choudhury M, Gupta P, Rosso P, Kumar S, Banerjee S, Kumar-Naskar S, Bandyopadhyay S, Chittaranjan G, Das A, Chakma K. Overview of fire-2015 shared task on mixed script information retrieval. In: Notebook papers of FIRE 2015, FIRE-2015, Gandhinagar, India, December 4–6, CEUR workshop proceedings. CEUR-WS.org, vol 1587; 2015. pp 19–25.

  60. Sun Y, Ning H, Chen K, Kong L, Yang Y, Wang J, Qi H. Author profiling in Arabic tweets: an approach based on multi-classification with word and character features. In: Mehta P, Rosso P, Majumder P, Mitra M (Eds) Working notes of the forum for information retrieval evaluation (FIRE 2019). CEUR workshop proceedings. CEUR-WS.org, Kolkata, India, December 12–15; 2019.

  61. Weren E, Kauer A, Mizusaki L, Moreira V, de Oliveira P, Wives L. Examining multiple features for author profiling. J Inf Data Manag. 2014;20:266–79.

  62. Xu F, Wang M, Li M. Sentence-level dialects identification in the greater china region. Int J Nat Lang Comput. 2016;5:6.

  63. Zaghouani W, Charfi A. Arap-tweet: a large multi-dialect Twitter corpus for gender, age and language variety identification. In: Proceedings of the 11th international conference on language resources and evaluation (LREC), Miyazaki, Japan; 2018.

  64. Zaghouani W, Charfi A. Guidelines and annotation framework for Arabic author profiling. In: Proceedings of the 3rd workshop on open-source Arabic corpora and processing tools, 11th international conference on language resources and evaluation (LREC), Miyazaki, Japan; 2018.

  65. Zaidan OF, Callison-Burch C. Arabic dialect identification. Comput Ling. 2014;40(1):171–202.

  66. Zampieri M, Gebre B. Automatic identification of language varieties: the case of Portuguese. In: The 11th conference on natural language processing (KONVENS). Österreichische Gesellschaft für Artificial Intelligence (ÖGAI); 2012. pp 233–237.

  67. Zampieri M, Malmasi S, Ljubešić N, Nakov P, Ali A, Tiedemann J, Scherrer Y, Aepli N. Findings of the vardial evaluation campaign 2017. In: Proceedings of the fourth workshop on NLP for similar languages, varieties and dialects; 2017. pp 1–15.

Author information

Corresponding author

Correspondence to Francisco Rangel.

Ethics declarations

Ethical standards

The publication of datasets containing personality profiles and gender labels may raise ethical issues. The datasets were created in compliance with ethical standards and with the EU General Data Protection Regulation. A more in-depth discussion of the legal and ethical issues can be found in [47].

Funding

The work on the author profiling data in Arabic was made possible by NPRP Grant #9-175-1-033 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.

Additional information

This article is part of the topical collection “Forum for Information Retrieval Evaluation” guest edited by Mandar Mitra and Prasenjit Majumder.

About this article

Cite this article

Rosso, P., Rangel, F. Author Profiling Tracks at FIRE. SN COMPUT. SCI. 1, 72 (2020). https://doi.org/10.1007/s42979-020-0073-1

  • DOI: https://doi.org/10.1007/s42979-020-0073-1
