Searching for Text Documents

  • Chapter
  • Book: Multimedia Retrieval

Part of the book series: Data-Centric Systems and Applications (DCSA)

Abstract

Besides text, many documents also contain images, tables, and other material. This chapter concentrates on the text part only. Traditionally, systems that handle text documents are called information storage and retrieval systems. Before the World-Wide Web emerged, such systems were used almost exclusively by professional users, so-called indexers and searchers, e.g., for medical research, in libraries, and by governmental organizations and archives. Typically, professional users act as “search intermediaries” for end users. In an interactive dialogue with the system and the end user, they try to figure out what the end user needs and how this information should be used in a successful search. Professionals know the collection, they know how documents in the collection are represented in the system, and they know how to use Boolean search operators to control the number of retrieved documents.
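
To illustrate how Boolean operators can control the number of retrieved documents, the following is a minimal sketch, not the chapter's own method. It assumes a hypothetical four-document collection and a simple in-memory inverted index; the names `docs`, `build_index`, and `postings` are introduced here for illustration only.

```python
# Hypothetical toy collection; documents and IDs are illustrative only.
docs = {
    1: "medical research on heart disease",
    2: "library archives of medical journals",
    3: "government archives and records",
    4: "heart surgery research in medical libraries",
}


def build_index(collection):
    """Map each term to the set of IDs of the documents that contain it."""
    index = {}
    for doc_id, text in collection.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(doc_id)
    return index


index = build_index(docs)


def postings(term):
    """Return the set of document IDs for a term (empty set if unseen)."""
    return index.get(term, set())


# OR widens the result set: documents containing either term.
broad = postings("medical") | postings("archives")

# AND narrows it: documents containing both terms.
narrow = postings("medical") & postings("research")

# AND NOT narrows it further by excluding documents with a given term.
excluded = narrow - postings("surgery")

print(sorted(broad), sorted(narrow), sorted(excluded))
```

In this sketch, OR enlarges the result set, while AND and AND NOT shrink it, which is the kind of control a search intermediary would exercise when a query returns too many or too few documents.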

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Blanken, H., Hiemstra, D. (2007). Searching for Text Documents. In: Blanken, H.M., Blok, H.E., Feng, L., de Vries, A.P. (eds) Multimedia Retrieval. Data-Centric Systems and Applications. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72895-5_4

  • DOI: https://doi.org/10.1007/978-3-540-72895-5_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-72894-8

  • Online ISBN: 978-3-540-72895-5

  • eBook Packages: Computer Science, Computer Science (R0)
