A Framework for Efficient and Anonymous Web Usage Mining Based on Client-Side Tracking

  • Cyrus Shahabi
  • Farnoush Banaei-Kashani
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2356)

Abstract

Web Usage Mining (WUM), the application of data mining techniques to data collected from user interactions with the web, has attracted considerable attention from both academia and industry in recent years. Through WUM, we can gain a better understanding of both the web and web user access patterns, knowledge that is crucial for realizing the full economic potential of the web. In this chapter, we describe a framework for WUM that specifically addresses the challenging requirements of web personalization applications. For on-line and anonymous web personalization to be effective, WUM must be accomplished in real time and as accurately as possible. On the other hand, the analysis tier of the WUM system should allow a compromise between scalability and accuracy so that it is applicable to real-life web sites with numerous visitors. Within our WUM framework, we introduce a distributed user tracking approach for accurate, efficient, and scalable collection of usage data. We also propose a new model, the Feature Matrices (FM) model, to capture and analyze users' access patterns. With FM, various features of the usage data can be captured with flexible precision, so that accuracy can be traded off for scalability based on the specific application requirements. Moreover, owing to the low update complexity of the model, FM can adapt to changes in user behavior in real time. Finally, we define a novel similarity measure based on FM that is specifically designed for accurate classification of partial navigation patterns in real time. Our extensive experiments with both synthetic and real data verify the correctness and efficacy of our WUM framework for efficient web personalization.

Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Cyrus Shahabi (1)
  • Farnoush Banaei-Kashani (1)
  1. Department of Computer Science, Integrated Media Systems Center, University of Southern California, Los Angeles, USA
