Building a Nasa Yuwe Language Test Collection

  • Luz Marina Sierra
  • Carlos Alberto Cobos
  • Juan Carlos Corrales
  • Tulio Rojas Curieux
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9041)


Nasa Yuwe, the language of the Paez people of Colombia, is currently endangered [1]. The Nasa community has therefore been reviewing strategies that use computational tools to 1) make the language more visible and 2) raise awareness of its use. As a contribution to both aims, the building of an information retrieval system (IRS) for texts written in Nasa Yuwe is proposed. Such a system is expected to encourage writing in Nasa Yuwe and the retrieval of documents written in the language. Assessing the IRS requires a test collection, and none is currently available for Nasa Yuwe texts, so the first step, prior to IRS development, is to build one. This paper presents the first Nasa Yuwe test collection, together with its construction process and results. The results cover: 1) the process of building the Nasa Yuwe test collection; 2) the queries, expert opinions, and documents; and 3) a statistical analysis of the data, including an analysis of Zipf's Law [2].
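The Zipf's Law analysis mentioned above can be illustrated with a minimal sketch. It is not the authors' procedure: the whitespace tokenizer, the punctuation set, and the function names `rank_frequency` and `zipf_slope` are assumptions made here for illustration. The idea is to rank terms by frequency and fit a straight line to log(frequency) against log(rank); a slope close to -1 is what Zipf's Law predicts.

```python
import math
from collections import Counter


def rank_frequency(texts):
    """Return term frequencies sorted from most to least frequent."""
    counts = Counter()
    for text in texts:
        # Assumed tokenizer: lowercase, split on whitespace, strip edge punctuation.
        # Apostrophes are kept because they occur inside Nasa Yuwe words.
        for token in text.lower().split():
            token = token.strip(".,;:!?()\"")
            if token:
                counts[token] += 1
    return sorted(counts.values(), reverse=True)


def zipf_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank).

    Expects frequencies sorted in descending order, at least two of them.
    """
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(freq) for freq in frequencies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var
```

Calling `zipf_slope(rank_frequency(documents))` on a list of document strings gives one number to compare against -1. A small collection will usually deviate from that ideal, so the slope is best read alongside a log-log plot rather than on its own.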


Keywords: test collection, Nasa Yuwe language, information retrieval system, expert judgment




  1. Rojas Curieux, T.: Por los caminos de la recuperación de la lengua Paéz (nasa yuwe). Popayán, Letrarte editores (2006)
  2. Manning, C., Raghavan, P., Schütze, H.: An Introduction to Information Retrieval. Cambridge University Press (2009)
  3. Moseley, C.: Atlas de las lenguas del mundo en peligro. Ediciones UNESCO, Popayán (2010), online version (accessed March 2013)
  4. Instituto Colombiano de Cultura Hispánica: Geografía Humana de Colombia, Región Andina Central, Tomo IV, Volumen II. Bogotá: Banco de la República (2000)
  5. Rojas Curieux, T.: Esbozo gramatical de la lengua nasa (lengua Paéz). In: El Lenguaje en Colombia, Tomo I: Realidad Lingüística de Colombia, Bogotá, Academia Colombiana de la Lengua e Instituto Caro y Cuervo, pp. 479–495 (2009)
  6. Universidad del Cauca, CRIC-PEBI-Comisión General de Lenguas: Estudio Sociolingüístico Fase preliminar. Base de datos - CRIC 01/2007 Lengua Nasa Yuwe y Namtrik. Popayán, Cauca, Colombia (2008)
  7. Farfán Martínez, M., Rojas Curieux, T.: Zuy Luuçxkwe kwe’kwe’sx ipx kwetuy piyaaka. Cartilla de aprendizaje de nasa yuwe como segunda lengua, Buenos Aires (2010)
  8. Jung, I.: Gramática del Páez o nasa yuwe. Descripción de una Lengua Indígena de Colombia. LINOM GmbH (1984, 2008)
  9. CRIC y el Programa de Dllo Rural en la Región de Tierra Dentro Cxhab Wala -PT/CW: Diccionario Nasa Yuwe - Castellano, Primera ed., Popayán: Litografía San José (2005)
  10. Rojas Curieux, T., Perdomo Dizu, A., Corrales Carvaja, M.H.: Una Mirada al nasa yuwe de Novirao, Primera ed., Popayán: Sello Editorial Universidad del Cauca (2009)
  11. Rojas Curieux, T.E.: La lengua paéz una visión de su gramática, primera ed., M. d. Cultura, Ed., Bogotá: Panamericana Formas e Impresos S.A (1998)
  12. Carterette, B., Voorhees, E.M.: Overview of Information Retrieval Evaluation. In: Current Challenges in Patent Information Retrieval, pp. 69–85. Springer (2011)
  13. Jadidinejad, A.H., Mahmoudi, F., Dehdari, J.: Evaluation of Perstem: A Simple and Efficient Stemming Algorithm for Persian. In: Peters, C., Di Nunzio, G.M., Kurimo, M., Mandl, T., Mostefa, D., Peñas, A., Roda, G. (eds.) CLEF 2009. LNCS, vol. 6241, pp. 98–101. Springer, Heidelberg (2010)
  14. Agosti, M., Bacchin, M., Ferro, N., Melucci, M.: Improving the Automatic Retrieval of Text Documents. In: Peters, C., Braschler, M., Gonzalo, J. (eds.) CLEF 2002. LNCS, vol. 2785, pp. 279–290. Springer, Heidelberg (2003)
  15. Peters, C., Braschler, M., Clough, P.: Evaluation for Multilingual Information Retrieval Systems. In: Multilingual Information Retrieval, pp. 129–169. Springer (2012)
  16. NTCIR Project: NTCIR Project 2007 (online) (last accessed: December 5, 2014)
  17. Ribeiro-Neto, B., Baeza-Yates, R.: Modern Information Retrieval - the concepts and technology behind search, 2nd edn. Addison Wesley, Harlow (2011)
  18. Sheykh Esmaili, K., Salavati, S., Yosefi, S.: Building A Test Collection For Sorani Kurdish. In: ACS International Conference on Computer Systems and Applications (AICCSA), Ifrane (2013)
  19. Esmaili, K., Abolhassani, H., Neshati, M., Behrangi, E.: Mahak: A Test Collection for Evaluation of Farsi Information Retrieval Systems. In: IEEE/ACS International Conference on Computer Systems and Applications, AICCSA 2007, pp. 639–644. IEEE (2007)
  20. Armenska, J., Tomovski, A., Zdravkova, K., Pehcevski, J.: Information Retrieval Using a Macedonian Test Collection for Question Answering. In: Gusev, M., Mitrevski, P. (eds.) ICT Innovations 2010. CCIS, vol. 83, pp. 205–214. Springer, Heidelberg (2011)
  21. AleAhmad, A., Amiri, H., Darrudi, E., Rahgozar, M., Oroumchian, F.: Hamshahri: A standard Persian text collection. Knowledge-Based Systems 22(5), 382–387 (2009)
  22. Kuriyama, K., Kando, N., Nozue, T., Eguchi, K.: Pooling for a Large-Scale Test Collection: An Analysis of the Search Results from the First NTCIR Workshop. Information Retrieval 5(1), 41–59 (2002)
  23. Consejo Regional Indígena del Cauca – Programa de Educación Bilingüe e Intercultural (PEBI - CRIC): Universidad Autónoma Indígena Intercultural – UAIIN (2015) (accessed March 2015)
  24. Consejo Regional Indígena del Cauca – Programa de Educación Bilingüe e Intercultural (PEBI - CRIC): Cuentos y Cosmovisión Nasa. Area Nasawe’sx Fxinzenxi, Segunda ed., Popayán (2010)
  25. Consejo Regional Indígena del Cauca – Programa de Educación Bilingüe e Intercultural (PEBI - CRIC): Te invitamos a leer. Eç thegya’ ipi’ki’ tha’w, Primera ed., Cali: Grafitextos (2007)
  26. Asociación de Cabildos Ukawe’sx Nasa Çxhab, Consejo Regional Indígena del Cauca – Programa de Educación Bilingüe e Intercultural (PEBI - CRIC): NASAWE’SX KIWAKA FXI’ZENXI ẼEN, Primera ed., Cali: Grafitextos (2006)
  27. Yule Yatacue, M., Vitonas Pavi, C.: Pees kupx fxi’zenxi. La metamorfosis de la vida, Tercera ed., Toribio, Cauca: Grafitextos (2012)
  28. Consejo Regional Indígena del Cauca – CRIC, Programa de Educación Bilingüe e Intercultural: Sistema Educativo Indígena Propio - SEIP. Primer Documento de Trabajo (2011)

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Luz Marina Sierra (1)
  • Carlos Alberto Cobos (1)
  • Juan Carlos Corrales (1)
  • Tulio Rojas Curieux (1)
  1. University of Cauca, Popayán, Colombia
