Web usage mining: extracting unexpected periods from web logs

Data Mining and Knowledge Discovery

Abstract

Existing Web usage mining techniques are either based on an arbitrary division of the data (e.g. “one log per month”) or guided by presumed results (e.g. “what is the customers’ behaviour for the period of Christmas purchases?”). These approaches have two main drawbacks. First, they depend on this arbitrary organization of the data. Second, they cannot automatically extract “seasonal peaks” from the stored data. In this paper, we propose a specific data mining process (in particular, one that extracts frequent behaviour patterns) in order to reveal the densest periods automatically. From the whole set of possible combinations, our method extracts the frequent sequential patterns related to the extracted periods. A period is considered dense if it contains at least one frequent sequential pattern for the set of users connected to the website in that period. Our experiments show that the extracted periods are relevant and that our approach extracts both the frequent sequential patterns and the associated dense periods.
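
The density criterion stated in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: it is a brute-force check written under several assumptions that do not come from the paper, namely a hypothetical `(user, timestamp, page)` log format, a candidate period given as explicit start and end timestamps, patterns restricted to ordered pairs of pages, and support measured as the fraction of users active in the period. All function names are illustrative.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Hypothetical, simplified log record: (user, timestamp, page).
LogEntry = Tuple[str, int, str]


def sequences_in_period(log: List[LogEntry], start: int, end: int) -> Dict[str, List[str]]:
    """Return each user's time-ordered page sequence within [start, end]."""
    sequences: Dict[str, List[str]] = {}
    for user, ts, page in sorted(log, key=lambda entry: entry[1]):
        if start <= ts <= end:
            sequences.setdefault(user, []).append(page)
    return sequences


def frequent_pairs(sequences: Dict[str, List[str]], min_support: float) -> Set[Tuple[str, str]]:
    """Ordered page pairs (a before b) seen for at least min_support of the users."""
    counts: Dict[Tuple[str, str], int] = {}
    for pages in sequences.values():
        # combinations() keeps input order, so (a, b) means a was requested before b.
        # set() ensures each user counts at most once per pair.
        for pair in set(combinations(pages, 2)):
            counts[pair] = counts.get(pair, 0) + 1
    threshold = min_support * len(sequences)
    return {pair for pair, n in counts.items() if n >= threshold}


def is_dense(log: List[LogEntry], start: int, end: int, min_support: float) -> bool:
    """A period is dense if at least one frequent (length-2) pattern exists in it."""
    sequences = sequences_in_period(log, start, end)
    return bool(sequences) and bool(frequent_pairs(sequences, min_support))


# Toy data (invented for illustration only).
log = [
    ("u1", 1, "A"), ("u1", 2, "B"),
    ("u2", 1, "A"), ("u2", 3, "B"),
    ("u3", 2, "C"),
    ("u1", 10, "C"), ("u2", 11, "D"), ("u3", 12, "A"),
]

print(is_dense(log, 1, 3, 0.6))    # two of three users visit A then B
print(is_dense(log, 10, 12, 0.6))  # no user requests more than one page
```

Because support is computed relative to the users present in the period rather than to the whole log, a pattern that is rare over a month can still be frequent within a short window, which is the kind of "seasonal peak" a fixed division of the data would miss.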

Author information

Corresponding author

Correspondence to F. Masseglia.

Additional information

Responsible Editor: Chang-shing Perng.

About this article

Cite this article

Masseglia, F., Poncelet, P., Teisseire, M. et al. Web usage mining: extracting unexpected periods from web logs. Data Min Knowl Disc 16, 39–65 (2008). https://doi.org/10.1007/s10618-007-0080-z

  • DOI: https://doi.org/10.1007/s10618-007-0080-z
