Web Log Mining

Lu, Zhiyong; Yao, Yiyu; Zhong, Ning

doi:10.1007/978-3-662-05320-1_9

Abstract

In designing and implementing an Intelligent Web Information System (IWIS), it is necessary to consider the learning and discovery functionalities that produce the knowledge the system requires. Web log files are a valuable resource for discovering such knowledge. In the context of IWIS, we present a brief survey of Web log mining. An overview of the more general topic known as Web mining is given first. Web log mining is then reviewed with a focus on three important aspects: data preparation, Web log mining, and applications.

Author information

Authors and Affiliations

Department of Computer Science, University of Regina, Regina, Saskatchewan, Canada, S4S 0A2
Zhiyong Lu & Yiyu Yao
Department of Systems and Information Engineering, Maebashi Institute of Technology, 460-1 Kamisadori-Cho, Maebashi-City, 371-0816, Japan
Ning Zhong

Editor information

Editors and Affiliations

Knowledge Information Systems Lab., Dept. of Systems and Information Eng., Maebashi Institute of Technology, 460-1 Kamisadori-Cho, 371-0816, Maebashi-City, Japan
Ning Zhong
Dept. of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong, China
Jiming Liu
Dept. of Computer Science, University of Regina, S4S 0A2, Regina, Saskatchewan, Canada
Yiyu Yao

About this chapter

Cite this chapter

Lu, Z., Yao, Y., Zhong, N. (2003). Web Log Mining. In: Zhong, N., Liu, J., Yao, Y. (eds) Web Intelligence. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-05320-1_9

DOI: https://doi.org/10.1007/978-3-662-05320-1_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-07936-8
Online ISBN: 978-3-662-05320-1
eBook Packages: Springer Book Archive
