Language Resources and Evaluation, Volume 41, Issue 1, pp 45–60

The GOLD Community of Practice: an infrastructure for linguistic data on the Web

Abstract

The GOLD Community of Practice is proposed as a model for linking on-line linguistic data to an ontology. The key components of the model include the linguistic data resources themselves and those focused on the knowledge derived from data. Data resources include the ever-increasing amount of linguistic field data and other descriptive language resources being migrated to the Web. The knowledge resources capture generalizations about the data and are anchored in the General Ontology for Linguistic Description (GOLD). It is argued that such a model is in the spirit of the vision for a Semantic Web and, thus, provides a concrete methodology for rendering highly divergent resources semantically interoperable. The focus of this work, then, is not on annotation at the syntactic level, but rather on how annotated Web resources can be linked to an ontology. Furthermore, a methodology is given for creating specific communities of practice within the overall Web infrastructure for linguistics. Finally, ontology-driven search is discussed as a key application of the proposed model.

Keywords

Descriptive linguistics, Best practice, Markup, Ontology, Semantic Web, Smart search

Copyright information

© Springer Science+Business Media 2007

Authors and Affiliations

  1. University of Washington, Seattle, USA
  2. California State University Fresno, Fresno, USA
