Do Your Social Profiles Reveal What Languages You Speak? Language Inference from Social Media Profiles

  • Yu Xu
  • M. Rami Ghorab
  • Zhongqing Wang
  • Dong Zhou
  • Séamus Lawless
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9626)


On the multilingual World Wide Web, it is critical for Web applications, such as multilingual search engines and targeted international advertisements, to know what languages the user understands. However, online users are often unwilling to provide this information explicitly. Additionally, language identification techniques struggle when a user does not use every language they know to interact directly with an application. This work proposes a method of inferring the language(s) online users comprehend by analyzing their social profiles. The method rests on the intuition that a user's experiences can indicate which languages they know. The task is nontrivial, however: social profiles are usually incomplete, which makes useful signals scarce, and languages that are regionally related or similar in vocabulary may share common features, which makes the signals noisy. To address these challenges, this work proposes a factor graph model based on language and social relations. The model draws on external resources to bring in more evidential signals, and exploits both the dependency relations between languages and the social relations between profiles. Experiments on a large-scale dataset demonstrate the effectiveness of the proposed approach for language inference and show that it outperforms several alternative methods.
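The abstract describes the model only at a high level. As a rough illustration of how per-profile evidence, language dependencies and social ties can be combined in one factor graph, the sketch below builds a toy pairwise model and computes each variable's marginal probability exactly. Everything in it is an assumption made for illustration, not the authors' model: the function names (`build_variables`, `log_score`, `marginals`), the parameters (`evidence`, `lang_corr`, `friends`, `w_social`), the binary (user, language) variables, the log-linear factor forms, the hand-set weights, and exact inference by enumeration.

```python
"""Toy pairwise factor graph for inferring which languages users know.

Illustrative sketch only: the variable layout, factor forms, weights and
exact-enumeration inference are assumptions, not the paper's model.
"""
import itertools
import math


def build_variables(users, langs):
    # One binary variable per (user, language): 1 means "user knows language".
    return [(u, lang) for u in users for lang in langs]


def log_score(assign, evidence, lang_corr, friends, w_social):
    """Unnormalised log-potential of a full assignment {(user, lang): 0/1}."""
    s = 0.0
    # Attribute factors: per-variable evidence weight extracted from a profile.
    for var, y in assign.items():
        s += evidence.get(var, 0.0) * y
    # Language-dependency factors: reward two related languages known together.
    for u in {user for user, _ in assign}:
        for (lang_a, lang_b), w in lang_corr.items():
            s += w * assign[(u, lang_a)] * assign[(u, lang_b)]
    # Social factors: reward connected users who both know the same language.
    langs = {lang for _, lang in assign}
    for u, v in friends:
        for lang in langs:
            s += w_social * assign[(u, lang)] * assign[(v, lang)]
    return s


def marginals(users, langs, evidence, lang_corr, friends, w_social=0.5):
    """Exact P(y = 1) per variable by summing over all 2**n assignments."""
    variables = build_variables(users, langs)
    z = 0.0
    mass = dict.fromkeys(variables, 0.0)
    for values in itertools.product((0, 1), repeat=len(variables)):
        assign = dict(zip(variables, values))
        weight = math.exp(log_score(assign, evidence, lang_corr, friends, w_social))
        z += weight
        for var, y in assign.items():
            if y:
                mass[var] += weight
    return {var: mass[var] / z for var in variables}


if __name__ == "__main__":
    # Hypothetical toy data: bob's profile has no direct evidence for Irish ("ga").
    users, langs = ["alice", "bob"], ["en", "ga"]
    evidence = {("alice", "en"): 2.0, ("alice", "ga"): 1.0, ("bob", "en"): 2.0}
    lang_corr = {("en", "ga"): 0.5}
    alone = marginals(users, langs, evidence, lang_corr, friends=[])
    linked = marginals(users, langs, evidence, lang_corr, friends=[("alice", "bob")])
    print("P(bob knows ga) without tie:", round(alone[("bob", "ga")], 3))
    print("P(bob knows ga) with tie:   ", round(linked[("bob", "ga")], 3))
```

In this toy run, linking bob to alice raises bob's probability for "ga" even though his own profile carries no evidence for it, which is the kind of propagation that social factors are meant to provide. Keeping every pairwise weight nonnegative means an added tie can only raise or leave unchanged the marginals. Exact enumeration costs 2^n score evaluations and is only practical for toy graphs; factor graphs of realistic size are normally handled with approximate inference such as loopy belief propagation, with weights learned from labeled profiles rather than set by hand.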



This research is supported by Science Foundation Ireland through the CNGL Programme (Grant 12/CE/I2267) in the ADAPT Centre at Trinity College Dublin. The work is also supported by the National Natural Science Foundation of China under Project No. 61300129, and by a project sponsored by the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry, China, under grant number [2013] 1792.


Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Yu Xu (1, corresponding author)
  • M. Rami Ghorab (1)
  • Zhongqing Wang (2)
  • Dong Zhou (3)
  • Séamus Lawless (1)

  1. ADAPT Centre, Knowledge and Data Engineering Group, School of Computer Science and Statistics, Trinity College Dublin, Dublin, Ireland
  2. Natural Language Processing Lab, Soochow University, Suzhou, China
  3. School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, China
