Multilingual Name Disambiguation with Semantic Information

  • Zornitsa Kozareva
  • Sonia Vázquez
  • Andrés Montoyo
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4629)


This paper studies the problem of name ambiguity, which concerns discovering the different underlying meanings behind a name. We have developed a semantic approach in which a graph-based clustering algorithm determines the sets of semantically related sentences that refer to the same name. The approach is evaluated on Bulgarian, Romanian, Spanish and English for various pairs of city, country, person and organization names. The results significantly outperform a majority-based classifier and are compared to a bigram co-occurrence approach.
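The general idea of grouping sentences that mention an ambiguous name can be illustrated with a minimal sketch. This is not the authors' algorithm: plain bag-of-words cosine similarity stands in for the semantic similarity, and connected components of a thresholded similarity graph stand in for the graph-based clustering. The function names, the example sentences and the 0.4 threshold are all assumptions chosen for illustration.

```python
from collections import Counter
from math import sqrt


def bag_of_words(sentence):
    """Lowercased token counts (an assumed stand-in for semantic vectors)."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_sentences(sentences, threshold=0.4):
    """Link sentences whose similarity reaches the threshold, then return
    the connected components of that graph as lists of sentence indices."""
    vectors = [bag_of_words(s) for s in sentences]
    n = len(sentences)
    neighbours = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(vectors[i], vectors[j]) >= threshold:
                neighbours[i].add(j)
                neighbours[j].add(i)

    seen, clusters = set(), []
    for start in range(n):
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        clusters.append(sorted(component))
    return clusters


# Hypothetical sentences mentioning the ambiguous name "Paris".
sentences = [
    "Paris is the capital city of France",
    "the city of Paris lies on the river Seine in France",
    "Paris Hilton starred in a television show",
    "the television show made Paris Hilton famous",
]
clusters = cluster_sentences(sentences, threshold=0.4)
print(clusters)  # [[0, 1], [2, 3]]
```

With these sentences, the two city contexts and the two person contexts end up in separate groups. The threshold is sensitive to sentence length and shared function words, so a real system would likely need a richer similarity measure than raw token overlap.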


Singular Value Decomposition; Semantic Information; Semantic Similarity; Latent Semantic Analysis; Vector Space Model
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Zornitsa Kozareva (1)
  • Sonia Vázquez (1)
  • Andrés Montoyo (1)
  1. Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante
