Abstract
In the first section of this chapter, we showcase some of the applications that have traditionally incorporated language identification. In effect, this covers every “mixed monolingual” NLP task, in which language identification routes each instance to the monolingual model appropriate to its source language.
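The routing pattern described above can be sketched roughly as follows. This is a minimal illustration and not the chapter's method: the stopword-based `identify_language` function, the `STOPWORDS` table, the `MODELS` registry, and the `"und"` fallback label are all hypothetical stand-ins for a real language identifier and real monolingual models.

```python
from typing import Callable, Dict, Set

# Hypothetical toy identifier data: a few stopwords per language.
# A real system would use a trained language identifier instead.
STOPWORDS: Dict[str, Set[str]] = {
    "en": {"the", "and", "is", "of"},
    "fi": {"ja", "on", "että", "ei"},
    "es": {"el", "la", "y", "de"},
}


def identify_language(text: str) -> str:
    """Return the language whose stopwords match the most tokens, or 'und'."""
    tokens = text.lower().split()
    scores = {
        lang: sum(token in words for token in tokens)
        for lang, words in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    # No stopword matched at all: report the language as undetermined.
    return best if scores[best] > 0 else "und"


# Hypothetical monolingual "models": each one only tags its input here.
MODELS: Dict[str, Callable[[str], str]] = {
    "en": lambda text: f"[en model] {text}",
    "fi": lambda text: f"[fi model] {text}",
    "es": lambda text: f"[es model] {text}",
}


def route(text: str) -> str:
    """Identify the language of one instance and dispatch it to the matching model."""
    lang = identify_language(text)
    model = MODELS.get(lang)
    if model is None:
        # Assumed fallback policy: flag the instance rather than guess a model.
        return f"[no model for '{lang}'] {text}"
    return model(text)


if __name__ == "__main__":
    for sentence in ["the cat is on the mat", "kissa ja koira", "12345"]:
        print(route(sentence))
```

In this sketch the language identifier acts purely as a dispatcher, so an identification error sends the instance to the wrong monolingual model; an explicit fallback for undetermined input is one way to limit that.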
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Jauhiainen, T., Zampieri, M., Baldwin, T., Lindén, K. (2024). Applications and Related Tasks. In: Automatic Language Identification in Texts. Synthesis Lectures on Human Language Technologies. Springer, Cham. https://doi.org/10.1007/978-3-031-45822-4_6
DOI: https://doi.org/10.1007/978-3-031-45822-4_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-45821-7
Online ISBN: 978-3-031-45822-4
eBook Packages: Synthesis Collection of Technology (R0)