Data Mining of User Navigation Patterns

  • José Borges
  • Mark Levene
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1836)


We propose a data mining model that captures user navigation behaviour patterns. User navigation sessions are modelled as a hypertext probabilistic grammar whose higher-probability strings correspond to the user’s preferred trails. An algorithm to mine such trails efficiently is given. We make use of the N-gram model, which assumes that the last N pages browsed affect the probability of the next page to be visited. The model is based on the theory of probabilistic grammars, which gives it a sound theoretical foundation for future enhancements. Moreover, we propose the use of entropy as an estimator of the grammar’s statistical properties. Extensive experiments were conducted, and the results show that the algorithm runs in linear time, that the grammar’s entropy is a good estimator of the number of mined trails, and that the rules mined from real data confirm the effectiveness of the model.
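The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes N = 1 (only the last page affects the next one), estimates start probabilities from the first page of each session, and uses a depth-first search with a probability threshold. The names `build_model`, `mine_trails`, `transition_entropy`, `cutoff` and the `"<END>"` marker are hypothetical, and the per-page entropy shown here is only one simple way to quantify how predictable the next page is.

```python
from collections import defaultdict
from math import log2

END = "<END>"  # hypothetical marker for "session finished"


def build_model(sessions):
    """Estimate start and transition probabilities from page sequences (N = 1)."""
    start_counts = defaultdict(int)
    link_counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        if not session:
            continue
        start_counts[session[0]] += 1
        for src, dst in zip(session, session[1:]):
            link_counts[src][dst] += 1
        link_counts[session[-1]][END] += 1  # every session ends somewhere
    total = sum(start_counts.values())
    start = {page: c / total for page, c in start_counts.items()}
    trans = {}
    for src, outs in link_counts.items():
        n = sum(outs.values())
        trans[src] = {dst: c / n for dst, c in outs.items()}
    return start, trans


def mine_trails(start, trans, cutoff):
    """Return maximal trails whose probability is at least `cutoff`."""
    if cutoff <= 0:
        raise ValueError("cutoff must be positive so the search terminates")
    trails = []

    def extend(trail, prob):
        extended = False
        for dst, p in trans.get(trail[-1], {}).items():
            if dst != END and prob * p >= cutoff:
                extended = True
                extend(trail + [dst], prob * p)
        if not extended:  # no admissible continuation: the trail is maximal
            trails.append((trail, prob))

    for page, p in start.items():
        if p >= cutoff:
            extend([page], p)
    return trails


def transition_entropy(trans):
    """Entropy (in bits) of the next-page distribution of each page."""
    return {src: -sum(p * log2(p) for p in outs.values())
            for src, outs in trans.items()}


if __name__ == "__main__":
    sessions = [["A", "B", "C"], ["A", "B", "D"], ["A", "B", "C"], ["B", "C"]]
    start, trans = build_model(sessions)
    for trail, prob in mine_trails(start, trans, cutoff=0.3):
        print(" -> ".join(trail), prob)
    print(transition_entropy(trans))
```

Lowering `cutoff` admits more and longer trails. A page whose next-page distribution has high entropy offers many comparably likely continuations, which is the intuition behind relating entropy to the number of trails mined.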


Association Rule, Data Mining Technique, Conditional Entropy, Support Threshold, User Session
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • José Borges (1)
  • Mark Levene (1)
  1. Department of Computer Science, University College London, London, UK
