Treating Dictionaries as a Linked-Data Corpus

Bouda, Peter; Cysouw, Michael

doi:10.1007/978-3-642-28249-2_2

Abstract

In this paper we describe a practical approach to the challenge of linguistic retrodigitization. We propose a strict distinction between a base digitization of the sources and their separate interpretation. The base digitization consists only of a literal electronic transcript of the source. All sources are thus treated simply as strings of characters, i.e. as unstructured corpora. The often complex structure found in many dictionaries and grammars is added subsequently (and possibly much later) as Linked Data in the form of standoff annotation. A further advantage of this approach is that the complete digitization and interpretation can be performed collaboratively without a complex organizational superstructure.
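The separation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the dictionary line, the labels, the `ex:` predicates, and the example URI are all invented for illustration, and the `#char=start,end` fragment is only loosely modeled on character-range identifiers for plain text. The base text is never modified; each interpretation is a separate record that points into it by character offsets.

```python
from dataclasses import dataclass

# Hypothetical literal transcript of one dictionary line (invented, not from the source).
BASE_TEXT = "kalo n. water"

# Hypothetical URI under which the base digitization would be published.
DOC_URI = "http://example.org/dict/page1.txt"


@dataclass(frozen=True)
class Annotation:
    start: int  # inclusive character offset into the base text
    end: int    # exclusive character offset into the base text
    label: str  # interpretation assigned to that span


# Standoff annotations: stored apart from the text, added at any later time.
ANNOTATIONS = [
    Annotation(0, 4, "headword"),
    Annotation(5, 7, "part-of-speech"),
    Annotation(8, 13, "translation"),
]


def covered_text(base: str, ann: Annotation) -> str:
    """Return the substring of the base text that an annotation points at."""
    return base[ann.start:ann.end]


def to_triples(doc_uri: str, base: str, annotations: list[Annotation]):
    """Express each annotation as (subject, predicate, object) triples.

    The subject is the document URI plus a character-range fragment;
    'ex:label' and 'ex:text' are placeholder predicates (assumptions).
    """
    triples = []
    for ann in annotations:
        subject = f"{doc_uri}#char={ann.start},{ann.end}"
        triples.append((subject, "ex:label", ann.label))
        triples.append((subject, "ex:text", covered_text(base, ann)))
    return triples


if __name__ == "__main__":
    for triple in to_triples(DOC_URI, BASE_TEXT, ANNOTATIONS):
        print(triple)
```

Because the annotations only reference offsets, several contributors could add or revise interpretations independently without touching the transcript, which is the collaborative property the abstract points to.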



Author information

Authors and Affiliations

Research Unit “Quantitative Language Comparison”, Ludwig Maximilians University, Munich, Germany
Peter Bouda


Corresponding author

Correspondence to Peter Bouda.

Editor information

Editors and Affiliations

Information Science Institute, University of Southern California, 4676 Admiralty Way, Marina del Rey, California, 90292, USA
Christian Chiarcos
Department of Linguistics, Max Planck Institute for Evolutionary Anthropology, Deutscher Platz 6, Leipzig, 04103, Germany
Sebastian Nordhoff
Business Information Systems, University of Leipzig, Johannisgasse 26, Leipzig, 04103, Germany
Sebastian Hellmann


About this chapter

Cite this chapter

Bouda, P., Cysouw, M. (2012). Treating Dictionaries as a Linked-Data Corpus. In: Chiarcos, C., Nordhoff, S., Hellmann, S. (eds) Linked Data in Linguistics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28249-2_2

DOI: https://doi.org/10.1007/978-3-642-28249-2_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28248-5
Online ISBN: 978-3-642-28249-2
eBook Packages: Computer Science, Computer Science (R0)
