Analysis and Refinement of Cross-Lingual Entity Linking

  • Taylor Cassidy
  • Heng Ji
  • Hongbo Deng
  • Jing Zheng
  • Jiawei Han
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7488)


In this paper we propose two novel approaches to enhance cross-lingual entity linking (CLEL). The first builds cross-lingual information networks that are aligned using monolingual information extraction; the second uses topic modeling to ensure global consistency. Applied to a strong baseline system that combines state-of-the-art machine translation with monolingual entity linking, these approaches yield an 11.2% improvement in B-Cubed+ F-measure. Our system achieved highly competitive results in the NIST Text Analysis Conference (TAC) Knowledge Base Population (KBP2011) evaluation. We also provide a detailed qualitative and quantitative analysis of the contribution of each approach and of the remaining challenges.


Knowledge Base, Machine Translation, Democratic Progressive Party, Source Document, Source Language
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Taylor Cassidy (1)
  • Heng Ji (1)
  • Hongbo Deng (2)
  • Jing Zheng (3)
  • Jiawei Han (2)

  1. Computer Science Department and Linguistics Department, Queens College and Graduate Center, City University of New York, New York, USA
  2. Computer Science Department, University of Illinois at Urbana-Champaign, Urbana-Champaign, USA
  3. SRI International, Menlo Park, USA
