Abstract
This paper describes the development, composition, and several uses of the Ancient Greek and Latin Dependency Treebanks, large collections of Classical texts in which the syntactic, morphological and lexical information for each word is made explicit. To date, over 200 individuals from around the world have collaborated to annotate over 350,000 words, including the entirety of Homer’s Iliad and Odyssey, Sophocles’ Ajax, all of the extant works of Hesiod and Aeschylus, and selections from Caesar, Cicero, Jerome, Ovid, Petronius, Propertius, Sallust and Vergil. While perhaps the most straightforward value of such an annotated corpus for Classical philology is the morphosyntactic searching it makes possible, it also enables a large number of downstream tasks, such as inducing the syntactic behavior of lexemes and automatically identifying similar passages between texts.
Acknowledgements
Grants from the Alpheios Project (“Building a Greek Treebank”), the National Endowment for the Humanities (PR-50013-08, “The Dynamic Lexicon: Cyberinfrastructure and the Automated Analysis of Historical Languages”), the Andrew W. Mellon Foundation (“The CyberEdition Project: Workflow for Textual Data in Cyberinfrastructure”), the Digital Library Initiative Phase 2 (IIS-9817484) and the National Science Foundation (BCS-0616521) provided support for this work. This research used resources of the National Energy Research Scientific Computing Center, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. This paper is made available under a Creative Commons Attribution license.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bamman, D., Crane, G. (2011). The Ancient Greek and Latin Dependency Treebanks. In: Sporleder, C., van den Bosch, A., Zervanou, K. (eds) Language Technology for Cultural Heritage. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20227-8_5
DOI: https://doi.org/10.1007/978-3-642-20227-8_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20226-1
Online ISBN: 978-3-642-20227-8
eBook Packages: Computer Science, Computer Science (R0)