A “Bag” or a “Window” of Words for Information Filtering?

  • Nikolaos Nanas
  • Manolis Vavalis
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5138)

Abstract

Treating documents as bags of words is the norm in Information Filtering. Syntactic and semantic correlations between terms are ignored; in other words, term independence is assumed. In this paper we challenge this common assumption. We use Nootropia, a user profiling model that applies a sliding window approach to capture term dependencies in a network and a spreading activation process to take them into account during document evaluation. Experiments based on TREC’s routing guidelines demonstrate that, given an adequate window size, the additional information that term dependencies encode results in improved filtering performance over a traditional bag-of-words approach.
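To make the contrast in the abstract concrete, the following is a minimal sketch of the general idea only, not Nootropia's actual algorithm or formulas, which the abstract does not give. It assumes that term links are raw co-occurrence counts inside a sliding window, and that document evaluation uses a single spreading step in which each matched profile term receives extra activation from linked terms that also appear in the document. The function names (`build_links`, `bag_score`, `window_score`) and the `spread` parameter are hypothetical.

```python
from collections import defaultdict


def build_links(documents, window_size):
    """Count how often two distinct terms fall inside the same sliding window.

    Assumption: a window of size w covers w consecutive tokens, so each term
    is paired with the w - 1 tokens that follow it.
    """
    links = defaultdict(int)
    for tokens in documents:
        for i, left in enumerate(tokens):
            for right in tokens[i + 1:i + window_size]:
                if left != right:
                    links[frozenset((left, right))] += 1
    return links


def bag_score(weights, tokens):
    """Bag-of-words baseline: sum the weights of profile terms in the document."""
    return sum(weights.get(term, 0.0) for term in set(tokens))


def window_score(weights, links, tokens, spread=0.5):
    """One-step spreading activation (illustrative simplification).

    Each matched profile term starts with its weight as activation and then
    receives a share of the activation of every linked term that is also
    matched, scaled by the normalised link strength.
    """
    active = {t: weights[t] for t in set(tokens) if t in weights}
    strongest = max(links.values(), default=0)
    final = dict(active)
    for pair, count in links.items():
        a, b = tuple(pair)
        if a in active and b in active:
            strength = count / strongest
            final[a] += spread * strength * active[b]
            final[b] += spread * strength * active[a]
    return sum(final.values())


# Hypothetical toy data: one relevant document and a two-term profile.
relevant = [["sliding", "window", "captures", "term", "dependencies"]]
profile = {"term": 1.0, "dependencies": 1.0}
links = build_links(relevant, window_size=2)

candidate = ["term", "dependencies"]
print(bag_score(profile, candidate), window_score(profile, links, candidate))
```

In this sketch a document that contains linked profile terms scores higher than the plain sum of their weights, while a document that matches only isolated terms scores exactly as the bag-of-words baseline would. The window size controls how many links exist in the first place, which is the parameter the abstract identifies as decisive.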

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Nikolaos Nanas (1)
  • Manolis Vavalis (1)
  1. Centre for Research and Technology - Thessaly (CE.RE.TE.TH), Greece
