Ontology-Supported Text Classification Based on Cross-Lingual Word Sense Disambiguation

  • Dan Tufiş
  • Svetla Koeva
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4578)

Abstract

The paper reports on recent experiments in cross-lingual document processing, with a case study on the Bulgarian-English-Romanian language pairs, and provides evidence that linguistic ontologies help achieve a high level of accuracy on difficult NLP tasks such as word alignment, word sense disambiguation, document classification, and cross-language information retrieval. We briefly describe the parallel corpus we used, the multilingual lexical ontology that supports our research, and the word alignment and word sense disambiguation systems we developed. We also give a preliminary report on the ongoing development of a cross-lingual text classification system that takes advantage of these multilingual technologies. Unlike keyword-based methods in document processing, concept-based methods are expected to exploit the semantic information contained in a document more fully and thus to provide more accurate results.
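The contrast between keyword-based and concept-based document representations can be illustrated with a minimal sketch. It is not the system described in the paper: the lexicon `TOY_LEXICON`, the concept identifiers, and the helper names are hypothetical, and each word is assumed to have exactly one sense, so the word sense disambiguation step is skipped entirely.

```python
from collections import Counter

# Hypothetical toy lexicon: (language, lemma) -> language-independent concept ID.
# A real multilingual lexical ontology maps word senses, not bare lemmas.
TOY_LEXICON = {
    ("en", "court"): "C_TRIBUNAL",
    ("en", "law"): "C_LAW",
    ("en", "tax"): "C_TAX",
    ("ro", "tribunal"): "C_TRIBUNAL",
    ("ro", "lege"): "C_LAW",
    ("ro", "impozit"): "C_TAX",
}


def keyword_vector(tokens):
    """Bag of surface words: the keyword-based representation."""
    return Counter(tokens)


def concept_vector(tokens, lang):
    """Bag of concept IDs: words missing from the toy lexicon are dropped."""
    return Counter(
        TOY_LEXICON[(lang, token)]
        for token in tokens
        if (lang, token) in TOY_LEXICON
    )


def overlap(a, b):
    """Number of shared items, counting multiplicity (multiset intersection)."""
    return sum((a & b).values())


en_doc = ["court", "law", "law", "tax"]
ro_doc = ["tribunal", "lege", "impozit", "impozit"]

keyword_overlap = overlap(keyword_vector(en_doc), keyword_vector(ro_doc))
concept_overlap = overlap(concept_vector(en_doc, "en"), concept_vector(ro_doc, "ro"))

print("keyword overlap:", keyword_overlap)
print("concept overlap:", concept_overlap)
```

In this toy setting the English and Romanian documents share no surface words, yet they overlap once both are mapped to the same concept inventory, which is the intuition behind comparing documents across languages at the concept level.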

Keywords

cross-lingual document classification; multilingual lexical ontology; parallel corpora; word alignment; word sense disambiguation

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Dan Tufiş (1)
  • Svetla Koeva (2)
  1. Research Institute for Artificial Intelligence, Romanian Academy, 13, "13 Septembrie", 050711, Bucharest, Romania
  2. Institute for Bulgarian Language, Bulgarian Academy of Sciences, 52 Shipchenski prohod, 1113 Sofia, Bulgaria
