LumberJack: Intelligent Discovery and Analysis of Web User Traffic Composition

  • Ed H. Chi
  • Adam Rosien
  • Jeffrey Heer
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2703)


Web Usage Mining enables new understanding of user goals on the Web. This understanding has broad applications, and traditional mining techniques such as association rules have been used in business applications. We have developed an automated method to directly infer the major groupings of user traffic on a Web site [Heer01]. We do this by using multiple data features of user sessions in a clustering analysis. We have performed an extensive, systematic evaluation of the proposed approach and have found that certain clustering schemes can achieve categorization accuracies as high as 99% [Heer02b]. In this paper, we describe the further development of this work into a prototype service called LumberJack, a push-button analysis system that is both more automated and more accurate than past systems.
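To make the idea of clustering sessions on multiple data features concrete, the following is a minimal sketch, not the LumberJack implementation. It assumes each session is described by two hypothetical modalities (`content` term counts and `url` visit counts), that each modality is unit-normalized and scaled by an illustrative weight, and that a simple cosine-based k-means groups the resulting vectors. The modality names, weights, toy data, and the choice of k-means are all assumptions made for illustration.

```python
import math


def normalize(vec):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def session_vector(modalities, weights):
    """Concatenate per-modality vectors, each unit-normalized and weighted.

    `modalities` and `weights` are hypothetical structures for this sketch.
    """
    combined = []
    for name in sorted(weights):
        combined.extend(weights[name] * x for x in normalize(modalities[name]))
    return normalize(combined)


def cosine(a, b):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Cosine-based k-means, seeded with the first k vectors for determinism."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each session to its most similar centroid.
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                  for v in vectors]
        # Recompute each centroid as the normalized mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[c] = normalize(mean)
    return labels


# Toy sessions (made up): two look like product browsing, two like support.
sessions = [
    {"content": [3, 0, 1, 0], "url": [1, 1, 0, 0]},
    {"content": [0, 2, 0, 3], "url": [0, 0, 1, 1]},
    {"content": [2, 0, 2, 0], "url": [1, 0, 0, 0]},
    {"content": [0, 1, 0, 4], "url": [0, 0, 1, 0]},
]
weights = {"content": 1.0, "url": 0.5}  # illustrative modality weights

vectors = [session_vector(s, weights) for s in sessions]
labels = kmeans(vectors, k=2)
print(labels)  # [0, 1, 0, 1]
```

Changing the entries of `weights` shifts how much each modality contributes to the similarity between sessions, which is one simple way to compare different feature combinations.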


Clustering, Log Analysis, Web Mining, User Profile, User Sessions, World Wide Web




  1. [Banerjee01]
    Banerjee, A., Ghosh, J.: Clickstream Clustering using Weighted Longest Common Subsequences. In: Proc. of the Workshop on Web Mining, SIAM Conference on Data Mining, Chicago IL, April 2001, pp. 33–40 (2001)
  2. [Barrett97]
    Barrett, R., Maglio, P.P., Kellem, D.C.: How to personalize the Web. In: Proc. of the ACM Conference on Human Factors in Computing Systems, CHI 1997, Atlanta GA, March 1997, pp. 75–82 (1997)
  3. [BenHur02]
    Ben-Hur, A., Elisseeff, A., Guyon, I.: A Stability Based Method for Discovering Structure in Clustered Data. In: Proceedings of the Pacific Symposium on Biocomputing (PSB2002), Kaua’i, HI (January 2002)
  4. [Caruana94]
    Caruana, R., Freitag, D.: Greedy attribute selection. In: Proc. of International Conference on Machine Learning, ML 1994, pp. 28–36. Morgan Kaufmann, San Francisco (1994)
  5. [Chi00]
    Chi, E.H., Pirolli, P., Pitkow, J.: The Scent of a Site: A System for Analyzing and Predicting Information Scent, Usage, and Usability of a Web Site. In: Proc. of ACM CHI 2000 Conference on Human Factors in Computing Systems, Amsterdam, Netherlands, pp. 161–168, 581, 582. ACM Press, New York (2000)
  6. [Chi01]
    Chi, E.H., Pirolli, P., Chen, K., Pitkow, J.: Using information scent to model user information needs and actions on the Web. In: Proc. of the ACM Conference on Human Factors in Computing Systems, CHI 2001, Seattle, WA, pp. 490–497 (2001)
  7. [Cooley97]
    Cooley, R., Mobasher, B., Srivastava, J.: Web Mining: Information and Pattern Discovery on the World Wide Web. In: Proc. of the International Conference on Tools with Artificial Intelligence, pp. 558–567. IEEE, Los Alamitos (1997)
  8. [CLUTO02]
    CLUTO: A Software Package for Clustering High-Dimensional Datasets, Available at
  9. [Fu99]
    Fu, Y., Sandhu, K., Shih, M.: A Generalization-Based Approach to Clustering of Web Usage Sessions. In: Masand, B., Spiliopoulou, M. (eds.) WebKDD 1999. LNCS (LNAI), vol. 1836, pp. 21–38. Springer, Heidelberg (2000)
  10. [Heer01]
    Heer, J., Chi, E.H.: Identification of Web User Traffic Composition using Multi-Modal Clustering and Information Scent. In: Proc. of the Workshop on Web Mining, SIAM Conference on Data Mining, Chicago IL, April 2001, pp. 51–58 (2001)
  11. [Heer02a]
    Heer, J., Chi, E.H.: Mining the Structure of User Activity using Cluster Stability. In: Proceedings of the Workshop on Web Analytics, SIAM Conference on Data Mining, Arlington VA (April 2002)
  12. [Heer02b]
    Heer, J., Chi, E.H.: Separating the Swarm: Categorization Methods for User Access Sessions on the Web. In: Proc. of ACM CHI 2002 Conference on Human Factors in Computing Systems, Minneapolis, MN, pp. 243–250. ACM Press, New York (2002)
  13. [Hong01]
    Hong, J.I., Heer, J., Waterson, S., Landay, J.A.: WebQuilt: A Proxy-based Approach to Remote Web Usability Testing. To appear in ACM Transactions on Information Systems
  14. [MacQueen67]
    MacQueen, J.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281–297. UC Berkeley Press (1967)
  15. [Mobasher00]
    Mobasher, B., Dai, H., Luo, T., Su, Y., Zhu, J.: Integrating usage and content mining for more effective personalization. In: Bauknecht, K., Madria, S.K., Pernul, G. (eds.) EC-Web 2000. LNCS, vol. 1875, p. 165. Springer, Heidelberg (2000)
  16. [Modha02]
    Modha, D., Spangler, W.: Feature Weighting in k-Means Clustering. Machine Learning 47 (2002)
  17. [Porter80]
    Porter, M.F.: An algorithm for suffix stripping. Program 14(3), 130–137 (1980)
  18. [Pirolli99a]
    Pirolli, P., Pitkow, J.E.: Distributions of Surfers’ Paths Through the World Wide Web: Empirical Characterization. World Wide Web 2(1–2), 29–45 (1999)
  19. [Pirolli99b]
    Pirolli, P., Card, S.K.: Information Foraging. Psychological Review 106(4), 643–675 (1999)
  20. [Schuetze99]
    Schuetze, H., Manning, C.: Foundations of Statistical Natural Language Processing. MIT Press, Cambridge (1999)
  21. [Schuetze99b]
    Schuetze, H., Pirolli, P., Pitkow, J., Chen, F., Chi, E., Li, J.: System and Method for clustering data objects in a collection. Xerox PARC UIR QCA Technical Report (1999)
  22. [Shahabi97]
    Shahabi, C., Zarkesh, A.M., Adibi, J., Shah, V.: Knowledge Discovery from User’s Web-page Navigation. In: Proc. 7th IEEE Intl. Conf. On Research Issues in Data Engineering, pp. 20–29 (1997)
  23. [SIAM01]
    Proc. of the Workshop on Web Mining, SIAM Conference on Data Mining, Chicago IL (April 2001)
  24. [Srivastava00]
    Srivastava, J., Cooley, R., Deshpande, M.: Web Usage Mining: Discovery and Application of Usage Patterns from Web Data. SIGKDD Explorations 1(2), 12–23 (2000)
  25. [WEBKDD01]
    Kohavi, R., Masand, B., Spiliopoulou, M., Srivastava, J. (eds.): WebKDD 2001. LNCS (LNAI), vol. 2356. Springer, Heidelberg (2002)
  26. [Yan96]
    Yan, T.W., Jacobsen, M., Garcia-Molina, H., Dayal, U.: From User Access Patterns to Dynamic Hypertext Linking. Computer Networks 28(7–11), 1007–1014 (1996)
  27. [Zhao01]
    Zhao, Y., Karypis, G.: Criterion Functions for Document Clustering: Experiments and Analysis. Technical Report #01–40. University of Minnesota, Computer Science Department. Minneapolis, MN (2001)

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Ed H. Chi (1)
  • Adam Rosien (1)
  • Jeffrey Heer (1)
  1. PARC (Palo Alto Research Center), Palo Alto, USA
