Abstract
The GOLD Community of Practice is proposed as a model for linking on-line linguistic data to an ontology. The key components of the model are the linguistic data resources themselves and the resources that capture knowledge derived from those data. Data resources include the ever-increasing amount of linguistic field data and other descriptive language resources being migrated to the Web. The knowledge resources capture generalizations about the data and are anchored in the General Ontology for Linguistic Description (GOLD). It is argued that such a model is in the spirit of the Semantic Web vision and thus provides a concrete methodology for rendering highly divergent resources semantically interoperable. The focus of this work, then, is not on annotation at the syntactic level but on how annotated Web resources can be linked to an ontology. Furthermore, a methodology is given for creating specific communities of practice within the overall Web infrastructure for linguistics. Finally, ontology-driven search is discussed as a key application of the proposed model.
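The general idea behind ontology-driven search can be illustrated with a small sketch. Tags from independently annotated resources are mapped to shared ontology concepts, and a query for a concept also retrieves data annotated with any of its subconcepts. This is a minimal Python sketch under stated assumptions: the concept names, tags, resource identifiers, example morphemes and the functions `subsumed_by` and `ontology_search` are all hypothetical and are not taken from GOLD or from the authors' infrastructure. A real system would presumably work over the ontology's published encoding rather than in-memory dictionaries.

```python
# Hypothetical toy ontology: each concept maps to its direct superconcept.
# Concept names are illustrative only, not taken from the actual GOLD release.
SUPERCLASS = {
    "PastTense": "Tense",
    "RemotePastTense": "PastTense",
    "PresentTense": "Tense",
    "Tense": "InflectionalFeature",
}

# Hypothetical legacy tags from two independently annotated resources,
# each mapped to a shared ontology concept.
TAG_TO_CONCEPT = {
    ("resourceA", "PST"): "PastTense",
    ("resourceA", "PRES"): "PresentTense",
    ("resourceB", "past"): "PastTense",
    ("resourceB", "rem.pst"): "RemotePastTense",
}

# Illustrative (resource, morpheme, tag) triples; the data are invented.
EXAMPLE_ANNOTATIONS = [
    ("resourceA", "-ed", "PST"),
    ("resourceB", "-ta", "past"),
    ("resourceB", "-ka", "rem.pst"),
    ("resourceA", "-s", "PRES"),
]


def subsumed_by(concept, query):
    """Return True if `concept` equals `query` or is a transitive subconcept of it."""
    while concept is not None:
        if concept == query:
            return True
        # Walk one step up the (assumed acyclic) superconcept chain.
        concept = SUPERCLASS.get(concept)
    return False


def ontology_search(annotations, query):
    """Return (resource, morpheme) pairs whose tag maps to `query` or a subconcept."""
    hits = []
    for resource, morpheme, tag in annotations:
        # Tags are resource-specific, so look them up per resource.
        concept = TAG_TO_CONCEPT.get((resource, tag))
        if concept is not None and subsumed_by(concept, query):
            hits.append((resource, morpheme))
    return hits


if __name__ == "__main__":
    # "PST", "past" and "rem.pst" all match, although no single tag string is shared.
    print(ontology_search(EXAMPLE_ANNOTATIONS, "PastTense"))
```

A query for `PastTense` returns the items tagged `PST`, `past` and `rem.pst`, even though the two resources share no tag strings. The only common ground is the ontology, which is what makes the divergent markup interoperable in this toy setting.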
Acknowledgements
Special thanks go to Terry Langendoen for his support of our research project from the beginning. The idea to construct an ontology for linguistics was conceived by the authors during their work on the E-MELD project [emeld.org] (NSF ITR-0094934). We gratefully acknowledge the support of the E-MELD PIs and associates, especially Gary Simons, Helen Aristar-Dry and Anthony Aristar. We also thank the members of the GOLD summit held in November 2004 in Fresno, CA, for their comments, including Jeff Good, Baden Hughes, Laura Buszard-Welcher, Brian Fitzsimons, and Ruby Basham. Finally, we gratefully acknowledge the NSF-funded Data-Driven Linguistic Ontology Development project (BCS-0411348), which supported the authors during the writing of this manuscript.
About this article
Cite this article
Farrar, S., Lewis, W.D. The GOLD Community of Practice: an infrastructure for linguistic data on the Web. Lang Resources & Evaluation 41, 45–60 (2007). https://doi.org/10.1007/s10579-007-9016-x