Combining Families of Information Retrieval Algorithms Using Metalearning

  • Chapter

Abstract

This chapter describes experiments that use metalearning to combine families of information retrieval (IR) algorithms obtained by varying the normalizations and similarity functions. By metalearning, we mean the following simple idea: a family of IR algorithms is applied to a corpus of documents whose relevance to the queries is known, producing a learning set. A machine learning algorithm is then applied to this learning set to produce a classifier that combines the different IR algorithms. In experiments with TREC-3 data, this technique significantly improved precision at the same level of recall. Most prior work in this area has either combined different IR algorithms with various averaging schemes or used a fixed combining function. In metalearning, by contrast, the combining function is itself a statistical model, which in general depends on the document, the query, and the scores produced by the individual component IR algorithms.
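The metalearning loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The family here is hypothetical: three term-weight normalizations (raw, log, and binary term frequency) crossed with two similarity functions (dot product and cosine), giving six component IR algorithms. Logistic regression trained by gradient descent is a swapped-in stand-in for the chapter's machine learning algorithm, and the combiner sees only the component scores, whereas the chapter's combining function may also depend on document and query features. The relevance judgments are toy data invented for illustration.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical family: each (normalization, similarity) pair is one component IR algorithm.
NORMALIZATIONS = {
    "raw": lambda tf: float(tf),
    "log": lambda tf: 1.0 + math.log(tf),
    "binary": lambda tf: 1.0,
}

def dot(q, d):
    """Inner product of two sparse term-weight vectors."""
    return sum(w * d.get(t, 0.0) for t, w in q.items())

def cosine(q, d):
    """Dot product divided by both vector lengths (0 if either vector is empty)."""
    nq = math.sqrt(sum(w * w for w in q.values()))
    nd = math.sqrt(sum(w * w for w in d.values()))
    return dot(q, d) / (nq * nd) if nq and nd else 0.0

SIMILARITIES = {"dot": dot, "cosine": cosine}

def weigh(text, norm):
    """Map each distinct term in the text to its normalized weight."""
    return {t: norm(tf) for t, tf in Counter(text.lower().split()).items()}

def family_scores(query, doc):
    """Score one (query, document) pair with every member of the family."""
    return [
        sim(weigh(query, norm), weigh(doc, norm))
        for norm, sim in product(NORMALIZATIONS.values(), SIMILARITIES.values())
    ]

def build_learning_set(judgments):
    """Turn (query, doc, relevant) triples into feature vectors and 0/1 labels."""
    X = [family_scores(q, d) for q, d, _ in judgments]
    y = [1.0 if rel else 0.0 for _, _, rel in judgments]
    return X, y

def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-z))

def train_combiner(X, y, lr=0.1, epochs=500):
    """Fit a stand-in logistic-regression combiner by batch gradient descent.

    The last weight is the bias term.
    """
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(a * b for a, b in zip(w, xi)) + w[-1]) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def combined_score(w, query, doc):
    """Combiner's estimated probability that doc is relevant to query."""
    x = family_scores(query, doc)
    return sigmoid(sum(a * b for a, b in zip(w, x)) + w[-1])

# Toy relevance judgments, invented for illustration.
judgments = [
    ("metalearning retrieval", "metalearning combines retrieval algorithms", True),
    ("metalearning retrieval", "a recipe for bread and butter", False),
    ("tree based optimization", "tree based optimization for data mining", True),
    ("tree based optimization", "weather report for the weekend", False),
]
X, y = build_learning_set(judgments)
w = train_combiner(X, y)
print(combined_score(w, "metalearning retrieval", "retrieval algorithms"))
```

Each row of `X` is the feature vector of component scores for one judged (query, document) pair, so the learned weights decide how much each normalization/similarity combination contributes to the final ranking score, rather than fixing that combination in advance by averaging.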

Keywords

  • Feature Vector
  • Information Retrieval
  • Similarity Metrics
  • Information Retrieval System
  • Distinct Term

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© 2004 Springer Science+Business Media New York

About this chapter

Cite this chapter

Cornelson, M., Greengrass, E., Grossman, R.L., Karidi, R., Shnidman, D. (2004). Combining Families of Information Retrieval Algorithms Using Metalearning. In: Berry, M.W. (ed.) Survey of Text Mining. Springer, New York, NY. https://doi.org/10.1007/978-1-4757-4305-0_7

  • DOI: https://doi.org/10.1007/978-1-4757-4305-0_7

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4419-3057-6

  • Online ISBN: 978-1-4757-4305-0

  • eBook Packages: Springer Book Archive