Morphological and Syntactic Processing for Text Retrieval

  • Jesús Vilares
  • Miguel A. Alonso
  • Manuel Vilares
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3180)

Abstract

This article describes the application of lemmatization and shallow parsing as a linguistically-based alternative to stemming in Text Retrieval, with the aim of managing linguistic variation at both word level and phrase level. Several alternatives for selecting the index terms among the syntactic dependencies detected by the parser are evaluated. Though this article focuses on Spanish, this approach is extensible to other languages by simply adapting the grammar used by the parser.

Keywords

Noun Phrase; Natural Language Processing; Linguistic Variation; Verbal Group; Prepositional Phrase
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Jesús Vilares (1)
  • Miguel A. Alonso (1)
  • Manuel Vilares (2)
  1. Departamento de Computación, Universidade da Coruña, La Coruña, Spain
  2. Escuela Superior de Ingeniería Informática, Universidade de Vigo, Orense, Spain
