Abstract
In the design and implementation of an Intelligent Web Information System (IWIS), it is necessary to consider the learning and discovery functionalities that produce the knowledge the system requires. Web log files are a valuable resource for discovering such knowledge. In the context of IWIS, we present a brief survey of Web log mining. We first give an overview of the more general topic of Web mining, and then review Web log mining with a focus on three important aspects: data preparation, Web log mining, and applications.
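To make the data preparation step concrete, here is a minimal Python sketch. It illustrates the general idea and is not the method described in this chapter. It assumes server logs in the Common Log Format and treats each client host as one user. It keeps only successful (2xx) page requests and discards embedded images, stylesheets and scripts. Finally, it splits each host's requests into sessions after 30 minutes of inactivity. The names `parse_line`, `sessionize`, `IGNORED_SUFFIXES` and `SAMPLE_LOG`, the suffix list, and the 30-minute timeout are illustrative assumptions.

```python
import re
from datetime import datetime, timedelta

# Common Log Format (CLF): host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

# Hypothetical cleaning rule: embedded resources are not treated as page views.
IGNORED_SUFFIXES = ('.gif', '.jpg', '.jpeg', '.png', '.css', '.js', '.ico')


def parse_line(line):
    """Return (host, timestamp, url) for a successful page request, else None."""
    m = CLF_PATTERN.match(line)
    if m is None:
        return None  # malformed line
    if not m.group('status').startswith('2'):
        return None  # keep only 2xx responses
    url = m.group('url')
    if url.lower().endswith(IGNORED_SUFFIXES):
        return None  # drop images, stylesheets, scripts
    ts = datetime.strptime(m.group('time'), '%d/%b/%Y:%H:%M:%S %z')
    return m.group('host'), ts, url


def sessionize(lines, timeout=timedelta(minutes=30)):
    """Group cleaned page requests into per-host sessions using an inactivity timeout."""
    requests = [r for r in map(parse_line, lines) if r is not None]
    requests.sort(key=lambda r: (r[0], r[1]))  # order by host, then time
    sessions = []
    last_host, last_time = None, None
    for host, ts, url in requests:
        # A new host, or a gap longer than the timeout, starts a new session.
        if host != last_host or ts - last_time > timeout:
            sessions.append([])
        sessions[-1].append(url)
        last_host, last_time = host, ts
    return sessions


# Hypothetical sample log used only for illustration.
SAMPLE_LOG = [
    '10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.1 - - [10/Oct/2000:13:56:01 -0700] "GET /logo.gif HTTP/1.0" 200 512',
    '10.0.0.1 - - [10/Oct/2000:13:58:12 -0700] "GET /products.html HTTP/1.0" 200 1800',
    '10.0.0.1 - - [10/Oct/2000:15:10:00 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.2 - - [10/Oct/2000:13:57:00 -0700] "GET /about.html HTTP/1.0" 404 0',
]

if __name__ == '__main__':
    print(sessionize(SAMPLE_LOG))
    # → [['/index.html', '/products.html'], ['/index.html']]
```

Identifying users by host address is a simplification. Proxies and shared machines can merge several users under one address, and a single user may appear under several addresses. This is one reason data preparation is treated as a topic in its own right before any patterns are mined from the resulting sessions.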
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Lu, Z., Yao, Y., Zhong, N. (2003). Web Log Mining. In: Zhong, N., Liu, J., Yao, Y. (eds) Web Intelligence. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-05320-1_9
DOI: https://doi.org/10.1007/978-3-662-05320-1_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-07936-8
Online ISBN: 978-3-662-05320-1
eBook Packages: Springer Book Archive