
Language Resources and Evaluation, Volume 50, Issue 2, pp 165–220

Rediscovering 15 + 2 years of discoveries in language resources and evaluation

  • Joseph Mariani
  • Patrick Paroubek
  • Gil Francopoulo
  • Olivier Hamon
Original Paper

Abstract

This paper analyzes the content of the proceedings of the Language Resources and Evaluation Conference (LREC) over the past 17 years (1998–2014), with the goal of gaining a picture of the LREC community and the topics that are most relevant to the field. We follow the methodology used in similar studies, including the survey of the IEEE ICASSP conference proceedings from 1976 to 1990, the survey of the Association for Computational Linguistics conference proceedings over 50 years, and the survey of the proceedings of the conferences contained in the ISCA Archive over 25 years (1987–2012). We expand on results originally presented at LREC 2014, but include the proceedings of LREC 2014 itself in the study together with an analysis of various citation graphs. We show the evolution over time of the number of papers and authors, including their distribution by gender and affiliation, as well as collaborations and citation patterns among authors and papers, funding sources for reported research, and plagiarism and reuse in LREC papers; results for LREC are compared with similar results for major conferences in related fields. We also consider the evolution of research topics over time and identify the authors who introduced key terms. Finally, we propose and apply a measure of a researcher’s notability and provide the results for LREC authors. The study uses NLP methods that have been published in the corpus considered in the study. In addition to providing a revealing characterization of the LRE community, the study also demonstrates the need for establishing a system for unique identification of authors, papers and other sources to facilitate this type of analysis.

Keywords

ELRA Anthology · Language resources · Language processing systems evaluation · Text analytics · Social networks · ISLRN · Bibliometrics · Scientometrics

Notes

Acknowledgments

The authors wish to thank the ACL colleagues Ken Church, Sanjeev Khudanpur, Amjad Abu-Jbara, Dragomir Radev and Simone Teufel, who helped them in the starting phase; Isabel Trancoso, who shared her ISCA Archive analysis on the use of assessment and corpora; Wolfgang Hess, who produced and provided a 14-GByte ISCA Archive; Emmanuelle Foxonet, who provided a list of authors’ given names with gender; Florian Boudin, who made available the TALN Anthology; Helen van der Stelt and Jolanda Voogd (Springer), who provided the LRE data; Douglas O’Shaughnessy, Denise Hurley, Rebecca Wollman and Casey Schwartz (IEEE), who provided the IEEE ICASSP and TASLP data; and Nancy Ide and Christopher Cieri, who greatly improved the readability of the paper. They also thank Khalid Choukri, Alexandre Sicard and Nicoletta Calzolari, who provided information about the past LREC conferences; Victoria Arranz, Ioanna Giannopoulou, Johann Gorlier, Jérémy Leixa, Valérie Mapelli and Hélène Mazo, who helped in recovering the metadata for LREC 1998; and all the organizers, reviewers and authors of the conferences over these 17 years, without whom this analysis could not have been conducted!

References

  1. ACL. (2012). Proceedings of the ACL-2012 special workshop on rediscovering 50 years of discoveries, ACL 2012, Jeju, July 10, 2012. ISBN 978-1-937284-29-9.
  2. Bavelas, A. (1948). A mathematical model for small group structures. Human Organization, 7, 16–30.
  3. Bavelas, A. (1950). Communication patterns in task oriented groups. Journal of the Acoustical Society of America, 22, 271–282.
  4. Boudin, F. (2013). TALN archives: une archive numérique francophone des articles de recherche en traitement automatique de la langue. TALN-RÉCITAL 2013, Les Sables d’Olonne, Juin 17–21, 2013.
  5. Bravo, E., Calzolari, A., De Castro, P., Mabile, L., Napolitani, F., Rossi, A. M., & Cambon-Thomsen, A. (2015). Developing a guideline to standardize the citation of bioresources in journal articles (CoBRA). BMC Medicine, 13, 33.
  6. Calzolari, N., Del Gratta, R., Francopoulo, G., Mariani, J., Rubino, F., Russo, I., et al. (2012). The LRE map. Harmonising community descriptions of resources. In Proceedings of the language resources and evaluation conference (LREC 2012), Istanbul, Turkey, May 23–25, 2012.
  7. Councill, I. G., Giles, C. L., & Kan, M.-Y. (2008). ParsCit: An open-source CRF reference string parsing package. In Proceedings of the language resources and evaluation conference (LREC 2008), Marrakesh, Morocco, May 2008.
  8. Csárdi, G., & Nepusz, T. (2006). The igraph software package for complex network research. InterJournal, Complex Systems, 1695, 1–9.
  9. Drouin, P. (2004). Detection of domain specific terminology using corpora comparison. In Proceedings of the language resources and evaluation conference (LREC 2004), Lisbon, Portugal, May 2004.
  10. Dunne, C., Shneiderman, B., Gove, R., Klavans, J., & Dorr, B. (2012). Rapid understanding of scientific paper collections: Integrating statistics, text analytics, and visualization. Journal of the American Society for Information Science and Technology, 63(12), 2351–2369.
  11. Francopoulo, G. (2007). TagParser: Well on the way to ISO-TC37 conformance. In ICGL (International conference on global interoperability for language resources), Hong Kong.
  12. Francopoulo, G., Marcoul, F., Causse, D., & Piparo, G. (2013). Global atlas: Proper nouns, from Wikipedia to LMF. In G. Francopoulo (Ed.), LMF—Lexical Markup Framework. London: ISTE/Wiley.
  13. Francopoulo, G., Mariani, J., & Paroubek, P. (2015a). NLP4NLP: The cobbler’s children won’t go unshod. In 4th international workshop on mining scientific publications (WOSP2015), joint conference on digital libraries 2015 (JCDL 2015), Knoxville (USA), June 24, 2015.
  14. Francopoulo, G., Mariani, J., & Paroubek, P. (2015b). NLP4NLP: Applying NLP to written and spoken scientific NLP corpora. In Workshop on mining scientific papers: Computational linguistics and bibliometrics, 15th international society of scientometrics and informetrics conference (ISSI 2015), Istanbul (Turkey), June 29, 2015.
  15. Francopoulo, G., Mariani, J., & Paroubek, P. (2016). A study of reuse and plagiarism in LREC papers. In Proceedings of LREC 2016, Portorož, Slovenia, May 23–28, 2016.
  16. Freeman, L. C. (1978). Centrality in social networks, conceptual clarifications. Social Networks, 1(1978/79), 215–239.
  17. Fu, Y., Xu, F., & Uszkoreit, H. (2010). Determining the origin and structure of person names. In Proceedings of the seventh conference on international language resources and evaluation (LREC’10) (pp. 3417–3422), Valletta, Malta. European Language Resources Association (ELRA), May 2010. ISBN 2-9517408-6-7.
  18. Hall, D. L. W., Jurafsky, D., & Manning, C. (2008). Studying the history of ideas using topic models. In Proceedings of the conference on empirical methods in natural language processing (EMNLP’08) (pp. 363–371).
  19. Joerg, B., Höllrigl, T., & Sicilia, M.-A. (2012). Entities and identities in research information systems. In 11th international conference on current research information systems (CRIS2012): “e-Infrastructures for research and innovation: Linking information systems to improve scientific knowledge production”, Prague, Czech Republic, June 6–9, 2012.
  20. Li, H., Councill, I., Lee, W. C., & Giles, C. L. (2006). CiteSeerx: An architecture and web service design for an academic document search engine. In Proceedings of the 15th international conference on the World Wide Web.
  21. Litchfield, B. (2005). Making PDFs portable: Integrating PDF and Java technology. Java Developers Journal, March 24, 2005. http://java.sys-con.com/node/48543 (PDFBox is available at http://pdfbox.apache.org/).
  22. Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval. Cambridge: Cambridge University Press. ISBN 0521865719.
  23. Mariani, J. (1990). La Conférence IEEE-ICASSP de 1976 à 1990: 15 ans de recherches en Traitement Automatique de la Parole. Notes et Documents LIMSI 90-8, Septembre 1990.
  24. Mariani, J., Cieri, C., Francopoulo, G., Paroubek, P., & Delaborde, M. (2014b). Facing the identification problem in language-related scientific data analysis. In Proceedings of LREC 2014, Reykjavik, Iceland, May 26–31, 2014.
  25. Mariani, J., Paroubek, P., Francopoulo, G., & Delaborde, M. (2013). Rediscovering 25 years of discoveries in spoken language processing: A preliminary ISCA archive analysis. In Proceedings of Interspeech 2013, Lyon, France, August 26–29, 2013.
  26. Mariani, J., Paroubek, P., Francopoulo, G., & Hamon, O. (2014a). Rediscovering 15 years of discoveries in language resources and evaluation: The LREC anthology analysis. In Proceedings of LREC 2014, Reykjavik, Iceland, May 26–31, 2014.
  27. Osborne, F., Motta, E., & Mulholland, P. (2013). Exploring scholarly data with Rexplore. In International semantic web conference, Sydney, Australia.
  28. Paul, M., & Girju, R. (2009). Topic modeling of research fields: An interdisciplinary perspective. In Recent advances in natural language processing (RANLP 2009), Borovets, Bulgaria.
  29. Radev, D. R., Muthukrishnan, P., Qazvinian, V., & Abu-Jbara, A. (2013). The ACL Anthology Network corpus. Language Resources and Evaluation, 47, 919–944.
  30. Rochat, Y. (2009). Closeness centrality extended to unconnected graphs: The harmonic centrality index. In Applications of Social Network Analysis (ASNA), 2009, Zurich, Switzerland.
  31. Tang, J., Zhang, J., Yao, L., Li, J., Zhang, L., & Su, Z. (2008). ArnetMiner: Extraction and mining of academic social networks. In Proceeding of the 14th international conference on knowledge discovery and data mining.
  32. The British National Corpus. (2007). Version 3 (BNC XML edition). Distributed by Oxford University Computing Services on behalf of the BNC Consortium. http://www.natcorp.ox.ac.uk/.
  33. The R Journal. (2012). 4(2), 5–12. ISSN 2073-4859, http://journal.r-project.org/.

Copyright information

© Springer Science+Business Media Dordrecht 2016

Authors and Affiliations

  • Joseph Mariani (1, 2)
  • Patrick Paroubek (1)
  • Gil Francopoulo (2, 3)
  • Olivier Hamon (4)

  1. LIMSI, CNRS, Université Paris-Saclay, Orsay, France
  2. IMMI, CNRS, Orsay, France
  3. Tagmatica, Paris, France
  4. Syllabs, Paris, France
