The Bible as a Parallel Corpus: Annotating the ‘Book of 2000 Tongues’

Computers and the Humanities

Abstract

We report on a project to annotate biblical texts in order to create an aligned multilingual Bible corpus for linguistic research, particularly computational linguistics, including the automatic creation and evaluation of translation lexicons and semantically tagged texts. The output of this project will enable researchers to take advantage of parallel translations across a larger number of languages than previously available, providing, with relatively little effort, a corpus that contains careful translations and reliable alignment at the near-sentence level. We discuss the nature of the text, our annotation process, preliminary and planned uses for the corpus, and relevant aspects of the Corpus Encoding Standard (CES) with respect to this corpus. We also present a quantitative comparison with dictionary and corpus resources for modern-day English, confirming the relevance of this corpus for research on present-day language.
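
As a rough illustration of what alignment at the near-sentence level makes possible, the sketch below pairs two translations by a shared verse identifier and counts word co-occurrences as raw candidates for a translation lexicon. It is a minimal sketch, not the authors' method: the (book, chapter, verse) key, the toy sentences, and the function names `align_by_verse` and `cooccurrence_counts` are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy data: verse ID -> lowercased text. Invented for illustration only.
english = {
    ("GEN", 1, 1): "in the beginning god created the heaven and the earth",
    ("GEN", 1, 3): "and god said let there be light",
    ("GEN", 1, 4): "and god saw the light",
}
spanish = {
    ("GEN", 1, 1): "en el principio creo dios los cielos y la tierra",
    ("GEN", 1, 3): "y dijo dios sea la luz",
    ("GEN", 1, 5): "y llamo dios a la luz dia",
}


def align_by_verse(a, b):
    """Pair texts that share a verse ID; verses missing from either side are skipped."""
    shared = sorted(a.keys() & b.keys())
    return [(vid, a[vid], b[vid]) for vid in shared]


def cooccurrence_counts(aligned):
    """Count, per word pair, the number of aligned verses in which both words occur."""
    counts = Counter()
    for _, src, tgt in aligned:
        for s in set(src.split()):
            for t in set(tgt.split()):
                counts[(s, t)] += 1
    return counts


aligned = align_by_verse(english, spanish)
counts = cooccurrence_counts(aligned)
```

Raw counts like these are dominated by frequent function words, so in practice they would be filtered with an association statistic before being treated as lexicon entries.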

References

  • Alexander, D. and P. Alexander. Eerdmans' Concise Bible Handbook. William B. Eerdmans, 1980.

  • Bassnett-McGuire, S. Translation Studies. New York: Methuen, 1980.

  • Beekman, J. and J. Callow. Translating the Word of God. Zondervan, Grand Rapids, MI, 1974.

  • Beheydt, L. and T. Wieers. Elementair woordenboek Nederlands, 1991.

  • Blight, R. C. Translation Problems from A to Z. Summer Institute of Linguistics, 1992.

  • Brown, P., J. Cocke, S. Della Pietra, V. Della Pietra, F. Jelinek, R. Mercer, and P. Roossin. “A Statistical Approach to Machine Translation”. Computational Linguistics, 16(2) (1990), 79–85.

  • Brown, P., S. Della Pietra, V. Della Pietra, and R. Mercer. “A Statistical Approach to Sense Disambiguation in Machine Translation”. In Fourth DARPA Workshop on Speech and Natural Language. Pacific Grove, CA, February, 1991.

  • Buchanan, M. A. A Graded Spanish Word Book. Toronto, 1927.

  • Church, Kenneth W. and Robert Mercer. “Introduction to the Special Issue on Computational Linguistics Using Large Corpora”. Computational Linguistics, 19(1) (1993), 1–24.

  • Connolly, K. The Indestructible Book. Baker, Grand Rapids, MI, 1996.

  • Deibler, E. An Index of Implicit Information in the Gospels. Summer Institute of Linguistics, 1993.

  • de Waard, J. and W. A. Smalley. A Translator's Handbook to the Book of Amos. United Bible Societies, 1979.

  • Dorr, B. J. “Machine Translation Divergences: A Formal Description and Proposed Solution”. Computational Linguistics, 20(4) (1994), 597–633.

  • Dorr, B. J. “Large-scale Dictionary Construction for Foreign Language Tutoring and Interlingual Machine Translation”. Machine Translation, 12(4) (1997), 271–322.

  • Dorr, B. J. and M. B. Olsen. “Deriving Verbal and Compositional Lexical Aspect for NLP Applications”. In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics (ACL-97). Madrid, Spain, July 7–12, 1997, pp. 151–158.

  • Dunning, T. “Accurate Methods for the Statistics of Surprise and Coincidence”. Computational Linguistics, 19(1) (1993), 61–74, March.

  • Freeman, H. E. An Introduction to the Old Testament Prophets. Chicago: Moody Press, 1968.

  • GNB. Good News Bible: The Bible in Today's English Version. 1976.

  • Hofstadter, D. R. Le Ton Beau De Marot: In Praise of the Music of Language. Basic Books, 1997.

  • Hull, D. A. and D. W. Oard. Symposium on Cross-Language Text and Speech Retrieval. Technical Report SS–97–04, American Association for Artificial Intelligence, Menlo Park, CA, March, 1997.

  • Ide, N. Corpus Encoding Standard: Document CES 1, version 1.4, October. http://www.cs.vassar.edu/CES/, 1996.

  • KJV. The Holy Bible, Authorized King James Version.

  • Kleijn, P. de and E. Nieuwborg. Basiswoordenboek Nederlands. Leuven, 1983.

  • Kučera, H. and W. Francis. Computational Analysis of Present-day American English. Brown University Press: Providence, R.I., 1967.

  • LeBlanc, D. “Hands Off My NIV!”. In Christianity Today, June, 1997.

  • Louw, J. P. and E. A. Nida. Greek-English Lexicon of the New Testament Based on Semantic Domains, 2nd edition. New York: United Bible Societies, 1989.

  • MacWhinney, B. The CHILDES Project: Tools for Analyzing Talk. Erlbaum, 1991.

  • Melamed, I. D. “A Geometric Approach to Mapping Bitext Correspondence”. In Conference on Empirical Methods in Natural Language Processing. Philadelphia, Pennsylvania, 1996a.

  • Melamed, I. D. “Automatic Construction of Clean Broad-coverage Translation Lexicons”. In Proceedings of the 2nd Conference of the Association for Machine Translation in the Americas. Montreal, Canada, 1996b.

  • Melamed, I. D. “Automatic Detection of Omissions in Translations”. In Proceedings of the 16th Annual Conference on Computational Linguistics (COLING-96). Copenhagen, 1996c.

  • Melamed, I. D. “Automatic Discovery of Non-compositional Compounds in Parallel Data”. In Proceedings of the 2nd Conference on Empirical Methods in Natural Language Processing (EMNLP-97). Brown University, August, 1997.

  • Melamed, I. D. Manual Annotation of Translational Equivalence: The Blinker Project. Technical Report 98–07, University of Pennsylvania, 1998.

  • Miller, G. “WordNet: An On-line Lexical Database”. International Journal of Lexicography, 3(4) (1990) (Special Issue).

  • Moore, B. R. Doublets in the New Testament. Summer Institute of Linguistics, 1993.

  • Nida, E. A. Towards a Science of Translating. E. J. Brill, Leiden, 1964.

  • Nida, E. A. and C. R. Taber. The Theory and Practice of Translation. E. J. Brill, Leiden, 1969.

  • Niebuhr, G. “Mass Marketing Makes Nonbiblical Texts Readily Accessible”. In New York Times, December, 1997.

  • Olsen, M. B. A Semantic and Pragmatic Model of Lexical and Grammatical Aspect. New York: Garland, 1997.

  • Porter, S. “Verbal Aspect in the Greek of the New Testament, with Reference to Tense and Mood”. In Studies in Biblical Greek, Vol. 1. Ed. D. A. Carson, New York: Peter Lang, 1989.

  • Proctor, P., ed. Longman Dictionary of Contemporary English (LDOCE). Longman Group, 1978.

  • Resnik, P. and I. D. Melamed. “Semi-automatic acquisition of domain-specific translation lexicons”. In Fifth Conference on Applied Natural Language Processing. Washington, D.C., 1997.

  • Robinson, D. The Translator's Turn. Baltimore and London: The Johns Hopkins University Press, 1991.

  • Sciarone, A. G. Vocabolario fondamentale della lingua italiana. Minerva Italica, Bergamo, 1977.

  • Vander Beke, G. E. French Word List. New York: Macmillan, 1929.

  • Vaughan, C. The New Testament from 26 Translations. Zondervan, Grand Rapids, MI, 1967.

  • Véronis, J. Multext Home Page: Document MUL1, version 0.1, April. http://www.lpl.univ-aix.fr/projects/multext/, 1996.

  • Weigelt, M. A. “Textual Criticism of the Bible”. In Baker Encyclopedia of the Bible, Vol. I A-I. Ed. W. A. Elwell, Grand Rapids, MI, 1988.

Cite this article

Resnik, P., Olsen, M.B. & Diab, M. The Bible as a Parallel Corpus: Annotating the ‘Book of 2000 Tongues’. Computers and the Humanities 33, 129–153 (1999). https://doi.org/10.1023/A:1001798929185
