Computers and the Humanities, Volume 33, Issue 1–2, pp 129–153

The Bible as a Parallel Corpus: Annotating the ‘Book of 2000 Tongues’

  • Philip Resnik
  • Mari Broman Olsen
  • Mona Diab

Abstract

We report on a project to annotate biblical texts in order to create an aligned multilingual Bible corpus for linguistic research, particularly computational linguistics, including automatically creating and evaluating translation lexicons and semantically tagged texts. The output of this project will enable researchers to take advantage of parallel translations across a larger number of languages than previously available, providing, with relatively little effort, a corpus that contains careful translations and reliable alignment at the near-sentence level. We discuss the nature of the text, our annotation process, preliminary and planned uses for the corpus, and relevant aspects of the Corpus Encoding Standard (CES) with respect to this corpus. We also present a quantitative comparison with dictionary and corpus resources for modern-day English, confirming the relevance of this corpus for research on present-day language.

Keywords: Bible, computational linguistics, parallel corpora, Corpus Encoding Standard, translation lexicons

Copyright information

© Kluwer Academic Publishers 1999

Authors and Affiliations

  • Philip Resnik (1)
  • Mari Broman Olsen (1)
  • Mona Diab (1)
  1. Department of Linguistics and Institute for Advanced Computer Studies, University of Maryland, College Park, USA
