The Ancient Greek and Latin Dependency Treebanks

Bamman, David; Crane, Gregory

doi:10.1007/978-3-642-20227-8_5

Part of the book series: Theory and Applications of Natural Language Processing (NLP)

Abstract

This paper describes the development, composition, and several uses of the Ancient Greek and Latin Dependency Treebanks, large collections of Classical texts in which the syntactic, morphological and lexical information for each word is made explicit. To date, over 200 individuals from around the world have collaborated to annotate over 350,000 words, including the entirety of Homer’s Iliad and Odyssey, Sophocles’ Ajax, all of the extant works of Hesiod and Aeschylus, and selections from Caesar, Cicero, Jerome, Ovid, Petronius, Propertius, Sallust and Vergil. While perhaps the most straightforward value of such an annotated corpus for Classical philology is the morphosyntactic searching it makes possible, it also enables many downstream tasks, such as inducing the syntactic behavior of lexemes and automatically identifying similar passages between texts.
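
As a rough illustration of the morphosyntactic searching and lexeme-level induction mentioned above, the following Python sketch queries a toy dependency-annotated sentence. Everything in it is an assumption made for illustration: the `Token` fields, the relation labels (`SBJ`, `OBJ`, `PRED`), the helper names and the example sentence are hypothetical and do not reproduce the treebanks' actual file format or tagset.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One annotated word (hypothetical fields, not the treebank schema)."""
    id: int         # 1-based position in the sentence
    form: str       # surface form as it appears in the text
    lemma: str      # dictionary headword
    pos: str        # coarse part of speech
    case: str       # grammatical case, "" when not applicable
    head: int       # id of the governing token, 0 for the root
    relation: str   # syntactic relation to the head


# Toy sentence invented for this example: "puella rosam amat".
SENTENCE = [
    Token(1, "puella", "puella", "noun", "nom", 3, "SBJ"),
    Token(2, "rosam", "rosa", "noun", "acc", 3, "OBJ"),
    Token(3, "amat", "amo", "verb", "", 0, "PRED"),
]


def find_dependents(sentence, head_lemma, relation=None, case=None):
    """Return tokens governed by a word with the given lemma,
    optionally filtered by relation label and grammatical case."""
    by_id = {t.id: t for t in sentence}
    hits = []
    for t in sentence:
        head = by_id.get(t.head)  # None for the root (head == 0)
        if head is None or head.lemma != head_lemma:
            continue
        if relation is not None and t.relation != relation:
            continue
        if case is not None and t.case != case:
            continue
        hits.append(t)
    return hits


def relation_profile(sentences, head_lemma):
    """Count (relation, case) pairs among the dependents of a lemma,
    a crude stand-in for inducing its syntactic behavior."""
    counts = Counter()
    for sentence in sentences:
        for t in find_dependents(sentence, head_lemma):
            counts[(t.relation, t.case)] += 1
    return counts


# Morphosyntactic search: accusative objects of the verb "amo".
objects = find_dependents(SENTENCE, "amo", relation="OBJ", case="acc")

# Lexeme-level summary over a (here, one-sentence) corpus.
profile = relation_profile([SENTENCE], "amo")
```

Run over a full corpus rather than one sentence, a profile of this kind would show which cases and relations a verb tends to govern. The method actually used in the chapter may differ.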

Acknowledgements

Grants from the Alpheios Project (“Building a Greek Treebank”), the National Endowment for the Humanities (PR-50013-08, “The Dynamic Lexicon: Cyberinfrastructure and the Automated Analysis of Historical Languages”), the Andrew W. Mellon Foundation (“The CyberEdition Project: Workflow for Textual Data in Cyberinfrastructure”), the Digital Library Initiative Phase 2 (IIS-9817484) and the National Science Foundation (BCS-0616521) provided support for this work. This research used resources of the National Energy Research Scientific Computing Center, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. This paper is made available under a Creative Commons Attribution license.

Author information

Authors and Affiliations

Perseus Project, Tufts University, Medford/Somerville, USA
David Bamman & Gregory Crane

Corresponding author

Correspondence to David Bamman.

Editor information

Editors and Affiliations

Computational Linguistics / MMCI, Saarland University, Saarbrücken, 66041, Germany
Caroline Sporleder
Fac. Humanities, Tilburg University, Tilburg, Netherlands
Antal van den Bosch
Tilburg School for Humanities, Tilburg Center for Cognition and Communi, University of Tilburg, Tilburg, 5000, Netherlands
Kalliopi Zervanou

About this paper

Cite this paper

Bamman, D., Crane, G. (2011). The Ancient Greek and Latin Dependency Treebanks. In: Sporleder, C., van den Bosch, A., Zervanou, K. (eds) Language Technology for Cultural Heritage. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20227-8_5

DOI: https://doi.org/10.1007/978-3-642-20227-8_5
Published: 26 April 2011
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20226-1
Online ISBN: 978-3-642-20227-8
eBook Packages: Computer Science, Computer Science (R0)