International Journal on Digital Libraries, Volume 5, Issue 2, pp 133–150

Using incremental Web log mining to create adaptive web servers

  • Tapan Kamdar
  • Anupam Joshi
Regular contribution


Personalization of content returned from a Web site is an important problem in general and affects e-commerce and e-services in particular. Targeting appropriate information or products to the end user can significantly improve the user experience on a Web site. One possible approach to Web personalization is to mine typical user profiles from the vast amount of historical data stored in access logs. We present a system that mines the logs to obtain profiles and uses them to automatically generate a Web page containing URLs the user might be interested in. The profiles are based only on the user’s prior traversal patterns on the Web site; the user does not have to provide any declarative information or log in. Profiles are dynamic in nature because a user’s traversal pattern changes over time. To reflect these changes in the personalized page generated for the user, the profiles have to be regenerated, taking into account the existing profile. Instead of creating a new profile, we incrementally add information to and/or remove information from a user profile, aiming to save time as well as physical memory.
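
The incremental idea in the abstract can be illustrated with a minimal sketch. This is not the system's actual mining algorithm, which the abstract does not specify; it assumes, purely for illustration, that a profile is a per-user count of visited URLs, and the function name `update_profile` and its parameters are hypothetical. Newly observed visits are added and aged-out visits are removed in place, so the profile is never rebuilt from the full log.

```python
from collections import Counter

def update_profile(profile, new_urls, expired_urls=(), top_k=3):
    """Adjust a URL-count profile in place and return its top_k URLs.

    Assumption for illustration: a profile is a Counter of URL visits.
    """
    profile.update(new_urls)        # add newly observed page visits
    profile.subtract(expired_urls)  # remove visits that have aged out
    # Drop URLs whose count fell to zero or below so the profile stays small.
    for url in [u for u, c in profile.items() if c <= 0]:
        del profile[url]
    # The most frequent URLs are the candidates for the personalized page.
    return [url for url, _ in profile.most_common(top_k)]

profile = Counter()
first = update_profile(profile, ["/a", "/b", "/a"])
second = update_profile(profile, ["/c", "/c", "/c"], expired_urls=["/a", "/a"])
print(first)   # ['/a', '/b']
print(second)  # ['/c', '/b']
```

Each call touches only the URLs in the new and expired batches, which is where the time and memory savings mentioned above would come from in a scheme of this kind.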


Web mining, Personalization, Data mining, E-commerce, Fuzzy clustering





Copyright information

© Springer-Verlag 2005

Authors and Affiliations

  1. Department of Computer Science and Electrical Engineering, University of Maryland Baltimore County, Baltimore, USA
