A “Bag” or a “Window” of Words for Information Filtering?

  • Conference paper
Artificial Intelligence: Theories, Models and Applications (SETN 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5138)

Abstract

Treating documents as a bag of words is the norm in Information Filtering. Syntactic and semantic correlations between terms are ignored; in other words, term independence is assumed. In this paper we challenge this common assumption. We use Nootropia, a user profiling model that applies a sliding window approach to capture term dependencies in a network, and a spreading activation process to take them into account during document evaluation. Experiments based on TREC’s routing guidelines demonstrate that, given an adequate window size, the additional information encoded by term dependencies results in improved filtering performance over a traditional bag of words approach.
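
The contrast between the two representations can be illustrated with a minimal Python sketch. It is not Nootropia's implementation: the function names `bag_of_words` and `window_links`, the whitespace tokenisation, and the plain co-occurrence counts are all assumptions made for illustration, and the spreading activation step is not shown.

```python
from collections import Counter


def bag_of_words(tokens):
    """Term counts only; word order and proximity are discarded."""
    return Counter(tokens)


def window_links(tokens, window_size):
    """Count unordered pairs of distinct terms that appear together
    inside a sliding window of `window_size` consecutive tokens.

    Hypothetical helper: the abstract does not specify how links are
    weighted, so raw co-occurrence counts are used here.
    """
    links = Counter()
    for i, left in enumerate(tokens):
        # Pair the token at position i with the tokens that follow it
        # inside the same window.
        for right in tokens[i + 1 : i + window_size]:
            if left != right:
                links[tuple(sorted((left, right)))] += 1
    return links


if __name__ == "__main__":
    tokens = "term dependencies improve information filtering".split()
    print(bag_of_words(tokens))
    print(window_links(tokens, window_size=3))
```

A window of size 2 links only adjacent terms, while larger windows also link terms that are further apart, which is why the window size acts as a tunable parameter in this kind of representation.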

Editor information

John Darzentas, George A. Vouros, Spyros Vosinakis, Argyris Arnellos

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nanas, N., Vavalis, M. (2008). A “Bag” or a “Window” of Words for Information Filtering? In: Darzentas, J., Vouros, G.A., Vosinakis, S., Arnellos, A. (eds) Artificial Intelligence: Theories, Models and Applications. SETN 2008. Lecture Notes in Computer Science, vol 5138. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87881-0_17

  • DOI: https://doi.org/10.1007/978-3-540-87881-0_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-87880-3

  • Online ISBN: 978-3-540-87881-0

  • eBook Packages: Computer Science, Computer Science (R0)
