Abstract
Document Transformation techniques have been studied for decades. This paper presents a new approach that significantly improves them by means of a new query expansion method. In contrast to other methods, the query is expanded by adding those terms that are most similar to the concept of individual query terms, rather than selecting terms that are similar to the complete query or that are directly similar to the query terms. Experiments show that the retrieval effectiveness of Document Transformation techniques, measured in terms of recall and precision, improves significantly.
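The expansion idea described in the abstract can be illustrated with a minimal sketch. It is not the chapter's algorithm: the abstract does not say how a term's concept is learned or how similarity is measured, so the sketch assumes that a concept is a bag of terms accumulated from documents judged relevant to earlier queries containing that term, and that "most similar" means "highest weight in that bag". The names `learn_concepts` and `expand_query` and the parameter `k` are hypothetical.

```python
from collections import Counter


def learn_concepts(feedback):
    """Build one term-weight bag ("concept") per query term.

    feedback: iterable of (query_terms, relevant_doc_terms) pairs.
    Assumption: a term's concept is the sum of the term counts of all
    documents that were relevant to past queries containing that term.
    """
    concepts = {}
    for query_terms, doc_terms in feedback:
        for q in set(query_terms):
            concepts.setdefault(q, Counter()).update(doc_terms)
    return concepts


def expand_query(query_terms, concepts, k=2):
    """Add, for each individual query term, up to k new terms from its concept.

    Expansion terms are chosen per query term, not for the query as a whole.
    Ties in weight are broken alphabetically so the result is deterministic.
    """
    expanded = list(query_terms)
    for q in query_terms:
        concept = concepts.get(q, Counter())
        ranked = sorted(concept.items(), key=lambda item: (-item[1], item[0]))
        added = 0
        for term, _weight in ranked:
            if added == k:
                break
            if term not in expanded:
                expanded.append(term)
                added += 1
    return expanded
```

Choosing expansion terms per query term, rather than for the query as a whole, is the only property taken from the abstract; the weighting and the tie-breaking rule are placeholders.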
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Klink, S. (2004). Improving Document Transformation Techniques with Collaborative Learned Term-Based Concepts. In: Dengel, A., Junker, M., Weisbecker, A. (eds) Reading and Learning. Lecture Notes in Computer Science, vol 2956. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24642-8_16
DOI: https://doi.org/10.1007/978-3-540-24642-8_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21904-0
Online ISBN: 978-3-540-24642-8
eBook Packages: Springer Book Archive