Document Image Retrieval in a Question Answering System for Document Images

  • Koichi Kise
  • Shota Fukushima
  • Keinosuke Matsumoto
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3163)


Question answering (QA) is the task of retrieving an answer in response to a question by analyzing documents. Although most efforts in developing QA systems are devoted to electronic text, we consider it also necessary to develop systems for document images. In this paper, we propose a method of document image retrieval for such QA systems. Since the task is not to retrieve all relevant documents but to find the answer somewhere in the documents, retrieval should be precision oriented. The main contribution of this paper is a method of improving the precision of document image retrieval by taking into account the co-occurrence of successive terms in a question. The indexing scheme is based on two-dimensional distributions of terms, and the weight of co-occurrence is measured by calculating the density distributions of terms. The proposed method was tested on 1253 document pages about Major League Baseball with 20 questions and was found to be superior to the baseline method proposed by the authors.
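The abstract does not state the exact formulas, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that each page is divided into a coarse grid of cells, that a term's density is a box-smoothed count of its occurrences, and that the co-occurrence of two successive question terms is the product of their densities in the same cell. The function names (`term_density`, `page_score`), the grid size, the smoothing `radius`, and `cooc_weight` are all hypothetical.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]          # (row, col) of a grid cell on the page
Grid = List[List[float]]


def term_density(positions: List[Cell], rows: int, cols: int,
                 radius: int = 1) -> Grid:
    """Count term occurrences per cell, then smooth with a box window.

    Assumption: a simple box window stands in for whatever smoothing
    window the paper actually uses.
    """
    counts = [[0.0] * cols for _ in range(rows)]
    for r, c in positions:
        counts[r][c] += 1.0

    smoothed = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += counts[rr][cc]
            smoothed[r][c] = total
    return smoothed


def page_score(term_positions: Dict[str, List[Cell]],
               question_terms: List[str], rows: int, cols: int,
               radius: int = 1, cooc_weight: float = 1.0) -> float:
    """Score a page by its best cell.

    Each cell gets the sum of the question-term densities plus a bonus
    (assumed form: product of densities) for every pair of successive
    question terms that are dense in the same cell.
    """
    densities = [term_density(term_positions.get(t, []), rows, cols, radius)
                 for t in question_terms]
    best = 0.0
    for r in range(rows):
        for c in range(cols):
            single = sum(d[r][c] for d in densities)
            pair = sum(a[r][c] * b[r][c]
                       for a, b in zip(densities, densities[1:]))
            best = max(best, single + cooc_weight * pair)
    return best


# Two toy pages on a 5x5 grid: the terms are adjacent on page_a
# and far apart on page_b.
page_a = {"home": [(0, 0)], "run": [(0, 1)]}
page_b = {"home": [(0, 0)], "run": [(4, 4)]}
question = ["home", "run"]

score_a = page_score(page_a, question, rows=5, cols=5)  # 3.0
score_b = page_score(page_b, question, rows=5, cols=5)  # 1.0
```

Under these assumptions, a page where successive question terms appear close together outranks a page that merely contains both terms somewhere, which is the precision-oriented behaviour the abstract describes. Setting `cooc_weight` to 0 reduces the score to a density-only baseline.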



Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Koichi Kise (1)
  • Shota Fukushima (2)
  • Keinosuke Matsumoto (1)
  1. Department of Computer and Systems Sciences, Graduate School of Engineering, Osaka Prefecture University
  2. Department of Computer and Systems Sciences, College of Engineering, Osaka Prefecture University, Sakai, Osaka, Japan
