Abstract
Web Usage Mining (WUM), a natural application of data mining techniques to the data collected from user interactions with the web, has attracted considerable attention from both academia and industry in recent years. Through WUM, we can gain a better understanding of both the web and web user access patterns, knowledge that is crucial for realizing the full economic potential of the web. In this chapter, we describe a framework for WUM that specifically addresses the challenging requirements of web personalization applications. For on-line and anonymous web personalization to be effective, WUM must be accomplished in real time and as accurately as possible. At the same time, the analysis tier of the WUM system should allow a trade-off between scalability and accuracy so that it remains applicable to real-life web sites with numerous visitors. Within our WUM framework, we introduce a distributed user tracking approach for accurate, efficient, and scalable collection of the usage data. We also propose a new model, the Feature Matrices (FM) model, to capture and analyze users' access patterns. With FM, various features of the usage data can be captured with flexible precision, so that we can trade off accuracy for scalability based on the specific application requirements. Moreover, owing to the low update complexity of the model, FM can adapt to changes in user behavior in real time. Finally, we define a novel similarity measure based on FM that is specifically designed for accurate classification of partial navigation patterns in real time. Our extensive experiments with both synthetic and real data verify the correctness and efficacy of our WUM framework for efficient web personalization.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Shahabi, C., Banaei-Kashani, F. (2002). A Framework for Efficient and Anonymous Web Usage Mining Based on Client-Side Tracking. In: Kohavi, R., Masand, B.M., Spiliopoulou, M., Srivastava, J. (eds) WEBKDD 2001 — Mining Web Log Data Across All Customers Touch Points. WebKDD 2001. Lecture Notes in Computer Science(), vol 2356. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45640-6_6
DOI: https://doi.org/10.1007/3-540-45640-6_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43969-1
Online ISBN: 978-3-540-45640-7
eBook Packages: Springer Book Archive