Language Resources and Evaluation, Volume 51, Issue 4, pp 897–925

Nenek: a cloud-based collaboration platform for the management of Amerindian language resources

  • J. L. Gonzalez
  • Anuschka van’t Hooft
  • Jesus Carretero
  • Victor J. Sosa-Sosa
Original Paper


Abstract

This article presents Nenek, a cloud-based collaboration platform for documenting under-resourced languages. Nenek is based on a crowdsourcing scheme that helps native speakers, indigenous associations, government agencies and researchers create virtual communities of minority-language speakers on the Internet. Nenek includes a set of web tools that enable users to work collaboratively on language documentation tasks, build lexicographic assets and produce new language resources. The platform includes a three-stage management model that controls the acquisition of existing language resources, the manufacturing of new resources, and their distribution within the virtual community and to the general public. In the acquisition stage, existing language resources are either extracted automatically from the web by a crawler or donated by users who participate in a monolingual social network. In the manufacturing stage, lexicographic and collaborative tools enable users to build new resources. In the diffusion stage, both acquired and manufactured resources are published, either within the virtual community or to the general public. We present a life cycle mapping scheme that registers the transformations of the language resources at each of the three stages of language resource management. This scheme also traces the utilization and diffusion of each resource produced by the virtual community. The paper includes a case study on the use of the Nenek platform in a documentation project for Huastec, a Mayan language spoken in the Gulf Coast region of Mexico. The case study shows Nenek's efficiency in the acquisition, annotation, manufacturing and diffusion of language resources, and it discusses the participation of the members of the virtual community.
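The abstract names a three-stage model (acquisition, manufacturing, diffusion) and a life cycle mapping scheme that records each transformation and traces later use, but it gives no implementation details. The Python sketch below is purely illustrative and assumes that each resource carries an append-only event log. All names (`Stage`, `LanguageResource`, `acquire`, `manufacture`, `diffuse`, `register_use`), the action labels, and the rule that only previously acquired or manufactured resources may be published are hypothetical and are not Nenek's actual API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Stage(Enum):
    ACQUISITION = "acquisition"
    MANUFACTURING = "manufacturing"
    DIFFUSION = "diffusion"


@dataclass(frozen=True)
class LifeCycleEvent:
    stage: Stage
    action: str   # hypothetical labels, e.g. "crawler", "donation", "annotated"
    actor: str    # user or system component responsible for the transformation
    at: datetime


@dataclass
class LanguageResource:
    resource_id: str
    history: list[LifeCycleEvent] = field(default_factory=list)
    audience: str | None = None   # "community" or "public" once diffused
    uses: int = 0                 # utilization counter traced after diffusion

    def record(self, stage: Stage, action: str, actor: str) -> None:
        """Append one transformation to this resource's life cycle map."""
        self.history.append(LifeCycleEvent(stage, action, actor, datetime.now(timezone.utc)))

    def stages(self) -> list[Stage]:
        return [event.stage for event in self.history]


def acquire(resource: LanguageResource, source: str, actor: str) -> None:
    # Existing resources enter either through the web crawler or a user donation.
    if source not in ("crawler", "donation"):
        raise ValueError(f"unknown acquisition source: {source}")
    resource.record(Stage.ACQUISITION, source, actor)


def manufacture(resource: LanguageResource, action: str, actor: str) -> None:
    # Collaborative or lexicographic work that creates or transforms a resource.
    resource.record(Stage.MANUFACTURING, action, actor)


def diffuse(resource: LanguageResource, audience: str, actor: str) -> None:
    # Assumed rule: only resources with an acquisition or manufacturing record are published.
    if not resource.history:
        raise ValueError("resource has no acquisition or manufacturing record")
    if audience not in ("community", "public"):
        raise ValueError(f"unknown audience: {audience}")
    resource.audience = audience
    resource.record(Stage.DIFFUSION, f"published:{audience}", actor)


def register_use(resource: LanguageResource) -> None:
    # Trace utilization of a resource once it has been diffused.
    if resource.audience is None:
        raise ValueError("resource has not been diffused yet")
    resource.uses += 1


if __name__ == "__main__":
    doc = LanguageResource("hus-0001")
    acquire(doc, "donation", "speaker_42")
    manufacture(doc, "annotated", "linguist_7")
    diffuse(doc, "public", "moderator_1")
    register_use(doc)
    print([s.value for s in doc.stages()], doc.uses)
    # prints: ['acquisition', 'manufacturing', 'diffusion'] 1
```

An append-only log keeps the full provenance of each resource, so the current stage, the audience it was published to and how often it was used can all be derived from one record rather than from separate tables.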


Keywords

Under-resourced languages · Language resource management · Language documentation · Cloud-based tools · Digital repositories · Collaboration · Life cycle scheme



Acknowledgements

We would like to thank the reviewers for their valuable feedback. The Nenek project is sponsored by a grant from the Mexican Secretariat of Public Education and the National Council of Science and Technology (SEP-Conacyt research grant CB-2012-180863). The work presented in this paper has been partially supported by the EU under COST Action IC1305, Network for Sustainable Ultrascale Computing (NESUS).

Copyright information

© Springer Science+Business Media Dordrecht 2016

Authors and Affiliations

  • J. L. Gonzalez (1)
  • Anuschka van’t Hooft (2)
  • Jesus Carretero (3)
  • Victor J. Sosa-Sosa (1)

  1. Cinvestav Tamaulipas, Mexico
  2. FCSH-UASLP, San Luis Potosí, Mexico
  3. ARCOS-UC3M, Madrid, Spain