A Comparative Evaluation of Cross-Lingual Text Annotation Techniques

  • Lei Zhang
  • Achim Rettinger
  • Michael Färber
  • Marko Tadić
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8138)

Abstract

In this paper, we study the problem of extracting knowledge from textual documents written in different languages by annotating the text on the basis of a cross-lingual knowledge base, namely Wikipedia. Our contribution is twofold. First, we propose a novel framework for evaluating cross-lingual text annotation techniques, based on annotation of a parallel corpus to a hub-language in a cross-lingual knowledge base. Second, we investigate the performance of different cross-lingual text annotation techniques according to our proposed evaluation framework. We perform experiments for an empirical comparison of three approaches: (i) Cross-lingual Named Entity Annotation (CL-NEA), (ii) Cross-lingual Wikifier Annotation (CL-WIFI), and (iii) Cross-lingual Explicit Semantic Analysis (CL-ESA). Besides establishing an evaluation framework, our results show the differences between the three investigated approaches and demonstrate their advantages and disadvantages.

Keywords

Named Entity Recognition; Source Language; Computational Linguistics; Parallel Corpus; English Document
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Lei Zhang (1)
  • Achim Rettinger (1)
  • Michael Färber (1)
  • Marko Tadić (2)
  1. Institute AIFB, Karlsruhe Institute of Technology, Germany
  2. Faculty of Humanities and Social Sciences, University of Zagreb, Croatia
