Nenek: a cloud-based collaboration platform for the management of Amerindian language resources
This article presents Nenek, a cloud-based collaboration platform for the documentation of under-resourced languages. Nenek is based on a crowdsourcing scheme that supports native speakers, indigenous associations, government agencies and researchers in creating virtual communities of minority-language speakers on the Internet. It includes a set of web tools that enable users to work collaboratively on language documentation tasks, build lexicographic assets and produce new language resources. The platform implements a three-stage management model that controls the acquisition of existing language resources, the manufacturing of new resources, and their distribution within the virtual community and to the general public. In the acquisition stage, existing language resources are either extracted automatically from the web by a crawler or donated by users who participate in a monolingual social network. In the manufacturing stage, lexicographic and collaborative tools enable users to build new resources. In the diffusion stage, the acquired and manufactured resources are published, either within the virtual community or to the general public. We present a life cycle mapping scheme that registers the transformations of each language resource at each of the three stages of language resource management and traces how each resource produced by the virtual community is used and diffused. The paper includes a case study describing the use of Nenek in a documentation project for Huastec, a Mayan language spoken in Mexico's Gulf coast region. This case study reveals Nenek's efficiency in the acquisition, annotation, manufacturing and diffusion of language resources, and it discusses the participation of the members of the virtual community.
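The life cycle mapping scheme can be pictured as an append-only event log kept per resource. The sketch below is a minimal illustration under our own assumptions, not Nenek's actual data model: the names `Stage`, `Event` and `LifeCycleLog`, the `audience` values "community" and "public", and the rule that a resource must be acquired or manufactured before it can be diffused are all hypothetical. It registers each transformation of a resource by stage, lets a manufactured resource cite the resources it was derived from, and answers the two tracing questions the scheme addresses: which resources reuse a given resource, and where a given resource has been published.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Set


class Stage(Enum):
    ACQUISITION = "acquisition"      # crawled from the web or donated by a user
    MANUFACTURING = "manufacturing"  # built with lexicographic/collaborative tools
    DIFFUSION = "diffusion"          # published to the community or the public


@dataclass
class Event:
    stage: Stage
    action: str                      # free text, e.g. "crawled", "annotated", "published"
    actor: str                       # user id, or "crawler" for automatic acquisition
    sources: List[str] = field(default_factory=list)  # ids this resource derives from
    audience: Optional[str] = None   # hypothetical: "community" or "public" (diffusion only)


class LifeCycleLog:
    """Hypothetical life cycle register: one ordered event list per resource."""

    def __init__(self) -> None:
        self.history: Dict[str, List[Event]] = {}

    def record(self, resource_id: str, event: Event) -> None:
        existing = self.history.get(resource_id, [])
        # Assumption: a resource must be acquired or manufactured before it is diffused.
        if not existing and event.stage is Stage.DIFFUSION:
            raise ValueError(f"{resource_id} cannot be diffused before it exists")
        if event.stage is Stage.DIFFUSION and event.audience not in ("community", "public"):
            raise ValueError("diffusion events need audience 'community' or 'public'")
        # Validation passed: only now create the entry, so failed calls leave no trace.
        self.history.setdefault(resource_id, []).append(event)

    def stages(self, resource_id: str) -> List[Stage]:
        """Transformations registered for a resource, in order."""
        return [e.stage for e in self.history.get(resource_id, [])]

    def used_by(self, resource_id: str) -> Set[str]:
        """Trace utilization: resources whose events cite resource_id as a source."""
        return {rid for rid, events in self.history.items()
                if any(resource_id in e.sources for e in events)}

    def audiences(self, resource_id: str) -> Set[str]:
        """Trace diffusion: where a resource has been published."""
        return {e.audience for e in self.history.get(resource_id, [])
                if e.stage is Stage.DIFFUSION and e.audience is not None}


# Usage: a donated text is annotated, then reused to build a published dictionary entry.
log = LifeCycleLog()
log.record("story-01", Event(Stage.ACQUISITION, "donated", "user42"))
log.record("story-01", Event(Stage.MANUFACTURING, "annotated", "user7"))
log.record("entry-09", Event(Stage.MANUFACTURING, "created", "user7", sources=["story-01"]))
log.record("entry-09", Event(Stage.DIFFUSION, "published", "user7", audience="public"))
print([s.value for s in log.stages("story-01")])  # ['acquisition', 'manufacturing']
print(log.used_by("story-01"))                    # {'entry-09'}
print(log.audiences("entry-09"))                  # {'public'}
```

Keeping events rather than a single mutable status per resource means the full history of transformations stays available, which is what tracing utilization and diffusion requires.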
Keywords: Under-resourced languages, Language resource management, Language documentation, Cloud-based tools, Digital repositories, Collaboration, Life cycle scheme
We would like to thank the reviewers for their valuable feedback. The Nenek project is sponsored by a grant from the Mexican Secretariat of Public Education and the National Council of Science and Technology (SEP-Conacyt Research Grant CB-2012-180863). The work presented in this paper has been partially supported by the EU under COST Action IC1305, Network for Sustainable Ultrascale Computing (NESUS).