A “Bag” or a “Window” of Words for Information Filtering?

Nanas, Nikolaos; Vavalis, Manolis

doi:10.1007/978-3-540-87881-0_17

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5138)

Included in the following conference series:

Hellenic Conference on Artificial Intelligence

Abstract

Treating documents as bags of words is the norm in Information Filtering. Syntactic and semantic correlations between terms are ignored; in other words, term independence is assumed. In this paper we challenge this common assumption. We use Nootropia, a user profiling model that applies a sliding window approach to capture term dependencies in a network and a spreading activation process to take them into account during document evaluation. Experiments based on TREC’s routing guidelines demonstrate that, given an adequate window size, the additional information encoded by term dependencies results in improved filtering performance over a traditional bag-of-words approach.
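
The contrast between a "bag" and a "window" of words can be illustrated with a minimal Python sketch. It is not Nootropia's actual algorithm: the function names `build_network` and `score`, the `decay` parameter, and the one-step activation rule are all assumptions made for illustration. Terms that co-occur within a sliding window are linked, and a document's score is its bag-of-words score plus a bonus spread along links between profile terms that both appear in the document.

```python
from collections import defaultdict


def build_network(docs, window=3):
    """Link distinct terms that co-occur within `window` consecutive tokens.

    Hypothetical helper: with window=1 no links are created, which
    reduces the profile to a plain bag of words.
    """
    links = defaultdict(float)
    for tokens in docs:
        for i, term in enumerate(tokens):
            # Terms at a distance smaller than `window` share the window.
            for other in tokens[i + 1 : i + window]:
                if other != term:
                    links[frozenset((term, other))] += 1.0
    return links


def score(tokens, weights, links, decay=0.5):
    """Bag-of-words score plus a one-step spreading-activation bonus.

    Assumption: a link contributes only when both of its terms are
    profile terms present in the document; `decay` damps the spread.
    """
    present = set(tokens) & set(weights)
    bag = sum(weights[t] for t in present)
    strongest = max(links.values(), default=1.0)
    spread = 0.0
    for pair, strength in links.items():
        if pair <= present:
            a, b = tuple(pair)
            spread += decay * (strength / strongest) * min(weights[a], weights[b])
    return bag + spread


training = [["term", "dependence", "model"]]
weights = {"term": 1.0, "dependence": 1.0, "model": 1.0}
doc = ["term", "dependence"]

print(score(doc, weights, build_network(training, window=1)))  # 2.0 (bag only)
print(score(doc, weights, build_network(training, window=2)))  # 2.5 (with one link)
```

With `window=1` the two scores coincide with a pure term-weight sum, so the window size is the only knob that turns term dependencies on or off in this sketch.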

Author information

Authors and Affiliations

Centre for Research and Technology - Thessaly (CE.RE.TE.TH), Greece
Nikolaos Nanas & Manolis Vavalis

Editor information

John Darzentas George A. Vouros Spyros Vosinakis Argyris Arnellos

About this paper

Cite this paper

Nanas, N., Vavalis, M. (2008). A “Bag” or a “Window” of Words for Information Filtering? In: Darzentas, J., Vouros, G.A., Vosinakis, S., Arnellos, A. (eds) Artificial Intelligence: Theories, Models and Applications. SETN 2008. Lecture Notes in Computer Science, vol 5138. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87881-0_17

DOI: https://doi.org/10.1007/978-3-540-87881-0_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-87880-3
Online ISBN: 978-3-540-87881-0
eBook Packages: Computer Science, Computer Science (R0)
