Tools for Nominalization: An Alternative for Lexical Normalization

  • Marco Antonio Insaurriaga Gonzalez
  • Vera Lúcia Strube de Lima
  • José Valdeni de Lima
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3960)


Recognizing morphological variation and conceptual proximity among words is crucial for tasks that rely on lexical normalization, such as term generation and matching in an information retrieval environment. We present tools that automatically perform nominalization as a form of lexical normalization for Portuguese. In an experimental evaluation comparing three alternative strategies (stemming, lemmatization, and our proposal, nominalization), we show that nominalization contributes to improved performance in a probabilistic information retrieval approach for Portuguese.
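To make the contrast between the three strategies concrete, the following is a minimal Python sketch using hand-written toy tables. The suffix list, the lemma table, the nominalization table, and the function names are all assumptions made for illustration; they are not the rules, lexicon, or interface of the tools described here.

```python
# Toy illustration only: every table below is a hand-written assumption,
# not the rules or lexicon used by the tools described in the paper.

# Hypothetical lemma table: inflected form -> dictionary form.
LEMMAS = {
    "construíram": "construir",
    "construindo": "construir",
    "construções": "construção",
}

# Hypothetical nominalization table: lemma -> related noun.
NOMINALIZATIONS = {
    "construir": "construção",
    "construção": "construção",
}

# Hypothetical suffix list for a naive stemmer, checked in this order.
SUFFIXES = ("íram", "indo", "ções", "ção", "ir")


def stem(word: str) -> str:
    """Strip the first matching suffix; the result need not be a real word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def lemmatize(word: str) -> str:
    """Map an inflected form to its dictionary form, if the table knows it."""
    return LEMMAS.get(word, word)


def nominalize(word: str) -> str:
    """Map a word to a related noun via its lemma, if the table knows it."""
    lemma = lemmatize(word)
    return NOMINALIZATIONS.get(lemma, lemma)


if __name__ == "__main__":
    for w in ("construíram", "construindo", "construções"):
        print(w, stem(w), lemmatize(w), nominalize(w))
```

In this toy setting, stemming conflates all three forms into a truncated string that is not itself a word, lemmatization keeps the verb and the noun as separate index terms, and nominalization maps all three forms to a single noun.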


Keywords (added by machine, not by the authors): Information Retrieval; Input String; Input Word; Concrete Noun; Inflectional Language





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Marco Antonio Insaurriaga Gonzalez (1, 2)
  • Vera Lúcia Strube de Lima (1)
  • José Valdeni de Lima (2)
  1. PUCRS – Faculdade de Informática, Porto Alegre, Brazil
  2. UFRGS – Instituto de Informática, Porto Alegre, Brazil
