International Conference of the Cross-Language Evaluation Forum for European Languages

Experimental IR Meets Multilinguality, Multimodality, and Interaction pp 28-40

Language Variety Identification Using Distributed Representations of Words and Documents

  • Marc Franco-Salvador
  • Francisco Rangel
  • Paolo Rosso
  • Mariona Taulé
  • M. Antònia Martí
Conference paper

DOI: 10.1007/978-3-319-24027-5_3

Volume 9283 of the book series Lecture Notes in Computer Science (LNCS)
Cite this paper as:
Franco-Salvador M., Rangel F., Rosso P., Taulé M., Antònia Martí M. (2015) Language Variety Identification Using Distributed Representations of Words and Documents. In: Mothe J. et al. (eds) Experimental IR Meets Multilinguality, Multimodality, and Interaction. Lecture Notes in Computer Science, vol 9283. Springer, Cham

Abstract

Language variety identification is an author profiling subtask which aims to detect lexical and semantic variations in order to classify different varieties of the same language. In this work we focus on the use of distributed representations of words and documents using the continuous Skip-gram model. We compare this model with three recent approaches: Information Gain Word-Patterns, TF-IDF graphs and Emotion-labeled Graphs, in addition to several baselines. We evaluate the models introducing the Hispablogs dataset, a new collection of Spanish blogs from five different countries: Argentina, Chile, Mexico, Peru and Spain. Experimental results show state-of-the-art performance in language variety identification. In addition, our empirical analysis provides interesting insights on the use of the evaluated approaches.

Keywords

Author profiling; Language variety identification; Distributed representations; Information Gain Word-Patterns; TF-IDF graphs; Emotion-labeled Graphs

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Marc Franco-Salvador
    • 1
  • Francisco Rangel
    • 1
    • 2
  • Paolo Rosso
    • 1
  • Mariona Taulé
    • 3
  • M. Antònia Martí
    • 3
  1. Universitat Politècnica de València, Valencia, Spain
  2. Autoritas Consulting S.A., Madrid, Spain
  3. Universitat de Barcelona, Barcelona, Spain