Computers and the Humanities, Volume 29, Issue 6, pp 413–429

Using latent semantic indexing for multilanguage information retrieval

  • Michael W. Berry
  • Paul G. Young


This paper presents a method for indexing cross-language databases for conceptual query matching. Two languages (Greek and English) are combined by appending a small portion of the documents from one language to the identical documents in the other language. The proposed merging strategy duplicates less than 7% of the entire database, which is made up of different translations of the Gospels; previous strategies duplicated up to 34% of the initial database in order to perform the merger. When Latent Semantic Indexing (LSI) is employed, the proposed method retrieves a larger number of relevant documents for both languages, with higher cosine rankings. Using the proposed merge strategies, LSI is shown to be effective in retrieving documents from either language (Greek or English) without requiring any translation of a user's query. An effective Bible search product needs to allow natural-language queries, and LSI enables users to form queries using natural expressions in their own native language. The merging strategy proposed in this study enables LSI to retrieve relevant documents effectively while using a minimum of the database in a foreign language.
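The retrieval idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation or their merge strategy: the six-document toy corpus, the choice of k = 2, the raw term-frequency weighting, and the helper names `term_vector`, `project`, and `cosine` are all assumptions made for illustration. The sketch builds a term-by-document matrix in which two documents carry both English and Greek text, takes a truncated singular value decomposition with NumPy, and ranks documents by cosine similarity against an untranslated English query.

```python
import numpy as np

# Toy corpus (illustrative only, not the Gospel data used in the paper).
# Documents 0 and 1 are "combined": Greek text appended to the English text.
docs = [
    "light world φως κόσμος",   # combined English + Greek
    "bread life άρτος ζωή",     # combined English + Greek
    "light world light",        # English only
    "bread life bread",         # English only
    "φως κόσμος κόσμος",        # Greek only
    "άρτος ζωή",                # Greek only
]

vocab = sorted({w for d in docs for w in d.split()})
row = {w: i for i, w in enumerate(vocab)}

def term_vector(text):
    """Raw term-frequency vector over the shared vocabulary (unknown words ignored)."""
    v = np.zeros(len(vocab))
    for w in text.split():
        if w in row:
            v[row[w]] += 1.0
    return v

# Term-by-document matrix: one row per term, one column per document.
A = np.column_stack([term_vector(d) for d in docs])

# Truncated SVD: keep only the k largest singular triplets (k = 2 is an assumption).
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Uk, sk = U[:, :k], s[:k]

def project(text):
    """Fold a text into the k-dimensional space: q_hat = q^T U_k S_k^{-1}."""
    return (term_vector(text) @ Uk) / sk

def cosine(a, b):
    """Cosine similarity, defined as 0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# An English query, with no translation step.
query = "light"
scores = [cosine(project(query), project(d)) for d in docs]
raw_scores = [cosine(term_vector(query), term_vector(d)) for d in docs]

for j in np.argsort(scores)[::-1]:
    print(f"doc {j}: lsi={scores[j]:.3f} raw={raw_scores[j]:.3f}  {docs[j]}")
```

In this toy setting the Greek-only document about light (document 4) shares no term with the query, so its raw term-vector cosine is zero, yet it scores highly in the reduced space because the combined documents tie the two vocabularies together. How much combined text is needed on a real collection is the question the paper addresses; the sketch makes no claim about that.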

Key words

Bible, English, Gospels, Greek, Hebrew, information retrieval, latent semantic indexing, singular value decomposition





Copyright information

© Kluwer Academic Publishers 1995

Authors and Affiliations

  • Michael W. Berry (1)
  • Paul G. Young (1)

  1. Department of Computer Science, University of Tennessee, Knoxville, USA
