Tracking Researcher Mobility on the Web Using Snippet Semantic Analysis

  • Jorge J. García Flores
  • Pierre Zweigenbaum
  • Zhao Yue
  • William Turner
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7614)


This paper presents the Unoporuno system: an application of natural language processing methods to the sociology of migration. Our approach extracts names of people from a scientific publications database, refines Web search queries using bibliographical data, and decides on the international mobility category of a person according to the location analysis of those snippets classified as mobility traces. In order to identify mobility traces, snippets are filtered with a name validation grammar, analyzed with mobility-related semantic features and classified with a support vector machine. This classification method is complemented by a semi-automatic one, where Unoporuno selects 5 snippets to help a sociologist decide upon the mobility status of authors. Empirical evidence for the automatic person classification task suggests that Unoporuno classified 78% of the mobile persons in the right mobility category, with F=0.71. We also present empirical evidence for the semi-automatic task: in 80% of the cases sociologists are able to choose the right category, with a moderate level of inter-rater agreement (0.60), based on the 5-snippet selection.
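
The first two steps of the snippet analysis described above (name validation, then mobility-related features) can be illustrated with a minimal sketch. It is not the Unoporuno implementation: the regular expression stands in for the name validation grammar, the cue words in `MOBILITY_CUES` and the function names are hypothetical, and the support vector machine step is left out, since the abstract does not specify the feature set or the training data.

```python
import re

# Hypothetical cue words: the actual semantic features used by the
# system are not listed in the abstract.
MOBILITY_CUES = {
    "degree": ("phd", "doctorate", "postdoc"),
    "organization": ("university", "institute", "laboratory"),
    "position": ("professor", "researcher", "lecturer"),
}


def name_pattern(first, last):
    """Regex accepting 'First Last', 'Last, First' and 'F. Last'.

    A simplified stand-in for a name validation grammar.
    """
    f, l = re.escape(first), re.escape(last)
    initial = re.escape(first[0])
    return re.compile(
        rf"\b(?:{f}\s+{l}|{l},\s*{f}|{initial}\.\s*{l})\b",
        re.IGNORECASE,
    )


def snippet_features(snippet):
    """Binary features: does the snippet contain any cue of each type?"""
    text = snippet.lower()
    return {
        name: any(cue in text for cue in cues)
        for name, cues in MOBILITY_CUES.items()
    }


def candidate_traces(snippets, first, last):
    """Keep snippets that mention the person, paired with their features."""
    pattern = name_pattern(first, last)
    return [(s, snippet_features(s)) for s in snippets if pattern.search(s)]


# Invented example snippets, for illustration only.
snippets = [
    "Jorge Garcia received his PhD from the University of Paris.",
    "Garcia, Jorge - Researcher at a laboratory in Mexico City.",
    "Cheap flights to Paris, book now.",
]
traces = candidate_traces(snippets, "Jorge", "Garcia")
```

In a fuller pipeline, the feature dictionaries would be turned into vectors and passed to a trained classifier, and the locations mentioned in the retained snippets would then be compared to assign a mobility category.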


Keywords (added by machine, not by the authors): Mobility Status; Noun Phrase; Regular Expression; Semantic Feature; Computational Linguistics





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Jorge J. García Flores (1)
  • Pierre Zweigenbaum (1)
  • Zhao Yue (2)
  • William Turner (1)
  1. LIMSI - CNRS, Orsay Cedex, France
  2. Université Paul Valéry, Montpellier, France
