Information Retrieval, Volume 10, Issue 2, pp 115–141

An approach for the capture of context-dependent document relationships extracted from Bayesian analysis of users' interactions with information

  • D. R. Campbell
  • S. J. Culley
  • C. A. McMahon
  • F. Sellini


A number of technologies enable the unobtrusive capture of computer interface interactions in the background of a user's working environment. The resulting data can be used in a variety of ways to model aspects of search activity and the general use of electronic documents in normal working routines. In this paper we present an approach that uses captured data to identify relationships between documents used by an individual or group. These relationships represent the documents' value in a given context, which may relate to a specific information need or activity. The approach employs a naïve Bayesian classifier to evaluate possible relationships that are derived implicitly from the data. The relationships established are intended to be stored within an information retrieval (IR) system, to aid the retrieval of related documents when future users arrive at a similar context. To evaluate the approach, over 70 hours of data were collected from computer users in industrial and academic settings to assess its overall feasibility. The results indicate that the approach provides a useful method for establishing identifiable relationships between documents based on the context of their usage, rather than their content.
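The classification step mentioned in the abstract can be illustrated with a minimal sketch of a naïve Bayesian classifier over discretised interaction features. This is not the authors' implementation: the feature names (`dwell`, `switches`), their values, the labels (`related`, `unrelated`), the toy training data and the use of Laplace smoothing are all assumptions made purely for illustration.

```python
from collections import Counter, defaultdict
import math


def train(samples):
    """Count label and per-label feature-value frequencies.

    samples: list of (features_dict, label) pairs with categorical values.
    """
    label_counts = Counter()
    value_counts = defaultdict(Counter)  # (label, feature) -> value counts
    vocab = defaultdict(set)             # feature -> values seen in training
    for features, label in samples:
        label_counts[label] += 1
        for name, value in features.items():
            value_counts[(label, name)][value] += 1
            vocab[name].add(value)
    return label_counts, value_counts, vocab


def posterior(model, features):
    """Return P(label | features) under the naive independence assumption."""
    label_counts, value_counts, vocab = model
    total = sum(label_counts.values())
    log_scores = {}
    for label, n in label_counts.items():
        score = math.log(n / total)  # log prior
        for name, value in features.items():
            count = value_counts[(label, name)][value]
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log((count + 1) / (n + len(vocab[name])))
        log_scores[label] = score
    # Normalise in a numerically stable way.
    peak = max(log_scores.values())
    unnorm = {l: math.exp(s - peak) for l, s in log_scores.items()}
    z = sum(unnorm.values())
    return {l: v / z for l, v in unnorm.items()}


# Hypothetical observations of document pairs (illustrative data only).
samples = [
    ({"dwell": "long", "switches": "many"}, "related"),
    ({"dwell": "long", "switches": "few"}, "related"),
    ({"dwell": "short", "switches": "many"}, "related"),
    ({"dwell": "short", "switches": "few"}, "unrelated"),
    ({"dwell": "short", "switches": "few"}, "unrelated"),
    ({"dwell": "long", "switches": "few"}, "unrelated"),
]
model = train(samples)
probs = posterior(model, {"dwell": "long", "switches": "many"})
print(probs)
```

In a sketch like this, a candidate document pair would be recorded as a relationship only when the posterior for `related` exceeds some chosen threshold; the threshold and the set of features would need to be determined from the captured data.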


Keywords: Implicit indicators, Recommender systems, Context, Naïve Bayesian classification



Copyright information

© Springer Science+Business Media, LLC 2006

Authors and Affiliations

  • D. R. Campbell (1)
  • S. J. Culley (1)
  • C. A. McMahon (1)
  • F. Sellini (2)

  1. Department of Mechanical Engineering, University of Bath, Bath, England
  2. Airbus UK (Knowledge Management), Filton, England
