Information Theoretic Approach to Information Extraction

  • Giambattista Amati
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4027)

Abstract

We use the hypergeometric distribution to extract relevant information from documents. The hypergeometric distribution gives the probability estimate of observing a given term frequency with respect to a prior. The lower this probability, the more information the term carries. Given a subset of documents, the information items are weighted by a function inversely related to the hypergeometric distribution. We provide an introduction, by way of example, to topic-driven information extraction from a document collection based on the hypergeometric distribution.
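The weighting idea in the abstract can be illustrated with a minimal sketch. It is not the paper's exact formulation: it assumes token-level counts, takes the collection-wide frequency of a term as the prior, and uses the negative base-2 logarithm as the inversely related function. The names `hypergeom_information` and `rank_terms` and all parameter names are hypothetical.

```python
from math import lgamma, log


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k), computed via lgamma."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def hypergeom_information(k, n, K, N):
    """Information content, in bits, of seeing a term k times in a subset.

    Assumed setting (not taken from the paper):
      N: total number of tokens in the collection
      K: occurrences of the term in the whole collection (the prior)
      n: number of tokens in the document subset
      k: occurrences of the term in the subset
    Returns -log2 P(X = k) under the hypergeometric distribution.
    """
    if not (max(0, n - (N - K)) <= k <= min(n, K)):
        raise ValueError("k is outside the support of the distribution")
    log_p = log_comb(K, k) + log_comb(N - K, n - k) - log_comb(N, n)
    return -log_p / log(2)


def rank_terms(subset_counts, collection_counts, n, N):
    """Sort terms by decreasing information content (most informative first)."""
    scores = {
        term: hypergeom_information(k, n, collection_counts[term], N)
        for term, k in subset_counts.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

With toy counts, a term that occurs far more often in the subset than its collection frequency predicts receives a low probability and therefore a high weight. Note that this score also grows when a term is strongly under-represented, so a practical system would likely restrict attention to terms whose observed frequency exceeds the expected one.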

Keywords

Maximum Likelihood Estimator; Information Extraction; Relevance Feedback; Query Expansion; Hypergeometric Distribution
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Giambattista Amati (1)
  1. Fondazione Ugo Bordoni, Rome, Italy
