Stemming Galician Texts

  • Nieves R. Brisaboa
  • Carlos Callón
  • Juan-Ramón López
  • Ángeles S. Places
  • Goretti Sanmartín
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2476)

Abstract

In this paper we describe a stemming algorithm for the Galician language that simultaneously supports the four current orthographic regulations for Galician. The algorithm has already been implemented, and we have started using it in order to improve it. However, this stemming algorithm cannot be applied to documents that predate the first Galician orthographic regulation, which appeared in 1977. To stem documents written before that date we have therefore adopted an exhaustive approach, which consists of defining a very large collection of wordsets that allow systematic word comparisons. We also describe a tool for building the wordsets that this approach requires.
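The exhaustive, wordset-based approach mentioned above can be pictured with a minimal Python sketch. It is not the authors' implementation: the function names `build_index` and `normalize`, the choice of the first word of each wordset as its representative, and the sample wordsets are all assumptions made for illustration only.

```python
from typing import Dict, Iterable, List


def build_index(wordsets: Iterable[List[str]]) -> Dict[str, str]:
    """Map every variant to the first word of its wordset.

    Assumption: the first entry of a wordset acts as its representative.
    """
    index: Dict[str, str] = {}
    for wordset in wordsets:
        if not wordset:
            continue  # ignore empty wordsets
        representative = wordset[0].lower()
        for variant in wordset:
            index[variant.lower()] = representative
    return index


def normalize(word: str, index: Dict[str, str]) -> str:
    """Return the representative of a word, or the lowercased word if unknown."""
    key = word.lower()
    return index.get(key, key)


# Hypothetical wordsets, invented for this sketch (not taken from the paper).
WORDSETS = [
    ["xente", "gente", "xentes", "gentes"],
    ["libro", "libros"],
]
INDEX = build_index(WORDSETS)

print(normalize("Gentes", INDEX))
print(normalize("libros", INDEX))
```

With a lookup table of this kind, two words are treated as equivalent at indexing or query time exactly when they map to the same representative, so no spelling rules are needed for texts that follow no orthographic regulation. The cost is that every variant has to be listed in advance, which is why a dedicated tool for building the wordsets is useful.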

Keywords

Stemming, Digital Libraries, Text Retrieval

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Nieves R. Brisaboa (1)
  • Carlos Callón (2)
  • Juan-Ramón López (1)
  • Ángeles S. Places (1)
  • Goretti Sanmartín (2)
  1. Laboratorio de Bases de Datos, Departamento de Computación, Universidade da Coruña, Spain
  2. Departamento de Galego-Portugués, Francés e Lingüística, Universidade da Coruña, Spain
