Document Information Retrieval

  • Chapter

Part of the Advances in Pattern Recognition book series (ACVPR)

Keywords

  • Query Processing
  • Relevance Feedback
  • Document Image
  • Query Term
  • Query Expansion

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© 2007 Springer-Verlag London Limited

About this chapter

Cite this chapter

Klink, S., Kise, K., Dengel, A., Junker, M., Agne, S. (2007). Document Information Retrieval. In: Chaudhuri, B.B. (eds) Digital Document Processing. Advances in Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-84628-726-8_16

  • DOI: https://doi.org/10.1007/978-1-84628-726-8_16

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-84628-501-1

  • Online ISBN: 978-1-84628-726-8

  • eBook Packages: Computer Science, Computer Science (R0)
