Web Log Mining


Abstract

In designing and implementing an Intelligent Web Information System (IWIS), one must consider the learning and discovery functionalities that produce the knowledge the system requires. Web log files are a valuable resource for discovering such knowledge. In the context of IWIS, we present a brief survey of Web log mining. An overview of the more general topic of Web mining is given first. Web log mining is then reviewed with a focus on three important aspects: data preparation, Web log mining, and applications.

© 2003 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Lu, Z., Yao, Y., Zhong, N. (2003). Web Log Mining. In: Zhong, N., Liu, J., Yao, Y. (eds) Web Intelligence. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-05320-1_9

  • DOI: https://doi.org/10.1007/978-3-662-05320-1_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-07936-8

  • Online ISBN: 978-3-662-05320-1

  • eBook Packages: Springer Book Archive
