A Framework for Efficient and Anonymous Web Usage Mining Based on Client-Side Tracking

  • Conference paper
WEBKDD 2001 — Mining Web Log Data Across All Customers Touch Points (WebKDD 2001)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2356)

Abstract

Web Usage Mining (WUM), a natural application of data mining techniques to the data collected from user interactions with the web, has attracted considerable attention from both academia and industry in recent years. Through WUM, we are able to gain a better understanding of both the web and web users' access patterns, knowledge that is crucial for realizing the full economic potential of the web. In this chapter, we describe a framework for WUM that specifically addresses the challenging requirements of web personalization applications. For online and anonymous web personalization to be effective, WUM must be accomplished in real time and as accurately as possible. At the same time, the analysis tier of the WUM system should allow a compromise between scalability and accuracy so that it is applicable to real-life websites with numerous visitors. Within our WUM framework, we introduce a distributed user tracking approach for accurate, efficient, and scalable collection of the usage data. We also propose a new model, the Feature Matrices (FM) model, to capture and analyze users' access patterns. With FM, various features of the usage data can be captured with flexible precision, so that we can trade off accuracy for scalability based on the specific application requirements. Moreover, due to the low update complexity of the model, FM can adapt to changes in user behavior in real time. Finally, we define a novel similarity measure based on FM that is specifically designed for accurate classification of partial navigation patterns in real time. Our extensive experiments with both synthetic and real data verify the correctness and efficacy of our WUM framework for efficient web personalization.



Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Shahabi, C., Banaei-Kashani, F. (2002). A Framework for Efficient and Anonymous Web Usage Mining Based on Client-Side Tracking. In: Kohavi, R., Masand, B.M., Spiliopoulou, M., Srivastava, J. (eds) WEBKDD 2001 — Mining Web Log Data Across All Customers Touch Points. WebKDD 2001. Lecture Notes in Computer Science, vol 2356. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45640-6_6

  • DOI: https://doi.org/10.1007/3-540-45640-6_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43969-1

  • Online ISBN: 978-3-540-45640-7

  • eBook Packages: Springer Book Archive
