
Improving Document Transformation Techniques with Collaborative Learned Term-Based Concepts

Chapter in: Reading and Learning

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2956)

Abstract

Document Transformation techniques have been studied for decades. This paper presents an approach that significantly improves them by means of a new query expansion method. Other methods add terms that are similar to the complete query, or terms that are directly similar to the query terms. This method differs: it expands the query with the terms that are most similar to the concept of each individual query term. Experiments show that this significantly improves the retrieval effectiveness of Document Transformation techniques, as measured by recall and precision.
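The per-term expansion idea can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not specify. First, documents and concepts are sparse term-weight vectors. Second, the concept of a query term is the sum of the vectors of documents judged relevant for earlier queries that contained that term, which is a simple form of collaborative learning. Third, each query term contributes its own top `k` concept terms. The function names `learn_concepts` and `expand_query` and the parameter `k` are hypothetical, and this is not necessarily the chapter's exact formulation.

```python
from collections import defaultdict
from typing import Dict, List

# Sparse vector: {term: weight}. Used for documents and learned concepts.
Vector = Dict[str, float]


def learn_concepts(past_queries: List[List[str]],
                   relevant_docs: List[List[Vector]]) -> Dict[str, Vector]:
    """Build one concept vector per query term (hypothetical scheme).

    A term's concept is the sum of the vectors of all documents judged
    relevant for earlier queries that contained that term.
    """
    concepts: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for terms, docs in zip(past_queries, relevant_docs):
        for term in set(terms):
            for doc in docs:
                for t, w in doc.items():
                    concepts[term][t] += w
    return {term: dict(vec) for term, vec in concepts.items()}


def expand_query(query: List[str], concepts: Dict[str, Vector],
                 k: int = 2) -> List[str]:
    """For each query term separately, add the k terms closest to its concept.

    The cosine between a unit term vector e_t and a concept c is c[t] / |c|,
    so ranking candidates by their weight in c is equivalent to ranking them
    by cosine similarity to the concept. Ties are broken alphabetically.
    """
    expanded = list(query)
    for term in query:
        concept = concepts.get(term)
        if not concept:
            continue  # no learned concept: keep the term unexpanded
        candidates = sorted((t for t in concept if t not in expanded),
                            key=lambda t: (-concept[t], t))
        expanded.extend(candidates[:k])
    return expanded


if __name__ == "__main__":
    history = [["jaguar", "speed"], ["jaguar", "habitat"]]
    judged = [[{"cat": 2.0, "fast": 1.0}], [{"cat": 1.0, "rainforest": 3.0}]]
    concepts = learn_concepts(history, judged)
    print(expand_query(["jaguar", "speed"], concepts, k=1))
```

The toy data above is invented for illustration. With it, "jaguar" and "speed" are each expanded from their own concept, yielding `['jaguar', 'speed', 'cat', 'fast']`. A query-level expansion would instead score candidates against a single vector built from the whole query.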





Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Klink, S. (2004). Improving Document Transformation Techniques with Collaborative Learned Term-Based Concepts. In: Dengel, A., Junker, M., Weisbecker, A. (eds) Reading and Learning. Lecture Notes in Computer Science, vol 2956. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24642-8_16


  • DOI: https://doi.org/10.1007/978-3-540-24642-8_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-21904-0

  • Online ISBN: 978-3-540-24642-8

  • eBook Packages: Springer Book Archive
