Data Mining and Knowledge Discovery, Volume 16, Issue 1, pp 39–65

Web usage mining: extracting unexpected periods from web logs

  • F. Masseglia
  • P. Poncelet
  • M. Teisseire
  • A. Marascu

Abstract

Existing Web usage mining techniques are currently based on an arbitrary division of the data (e.g. “one log per month”) or guided by presumed results (e.g. “what is the customers’ behaviour for the period of Christmas purchases?”). These approaches have two main drawbacks. First, they depend on the above-mentioned arbitrary organization of data. Second, they cannot automatically extract “seasonal peaks” from among the stored data. In this paper, we propose a specific data mining process (in particular, to extract frequent behaviour patterns) in order to reveal the densest periods automatically. From the whole set of possible combinations, our method extracts the frequent sequential patterns related to the extracted periods. A period is considered to be dense if it contains at least one frequent sequential pattern for the set of users connected to the website in that period. Our experiments show that the extracted periods are relevant and our approach is able to extract both frequent sequential patterns and the associated dense periods.
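
The density criterion stated above can be illustrated with a small sketch. This is not the authors' algorithm: it is a naive check, under assumptions, of whether one given candidate period is dense. The session layout (`login`, `logout`, `pages`), the function names, the fixed pattern length and the `min_support` threshold are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_patterns(sequences, min_support, length=2):
    """Return the page subsequences of the given length that occur in at
    least a min_support fraction of the sequences (naive enumeration)."""
    counts = Counter()
    for seq in sequences:
        # combinations() preserves input order, so each tuple is an
        # ordered subsequence; set() counts it once per user.
        for candidate in set(combinations(seq, length)):
            counts[candidate] += 1
    threshold = min_support * len(sequences)
    return {p for p, c in counts.items() if c >= threshold}


def is_dense(sessions, start, end, min_support, length=2):
    """Assumed reading of the criterion: a period [start, end) is dense if
    the users connected during it share at least one frequent pattern."""
    connected = [
        s["pages"] for s in sessions
        if s["login"] < end and s["logout"] > start  # session overlaps period
    ]
    if not connected:
        return False
    return bool(frequent_patterns(connected, min_support, length))


# Hypothetical toy log: three users with login/logout times and visited pages.
sessions = [
    {"user": "u1", "login": 0, "logout": 10, "pages": ["a", "b", "c"]},
    {"user": "u2", "login": 5, "logout": 12, "pages": ["a", "c"]},
    {"user": "u3", "login": 20, "logout": 30, "pages": ["d", "e"]},
]

print(is_dense(sessions, 4, 11, min_support=1.0))  # u1 and u2 both contain (a, c)
print(is_dense(sessions, 0, 30, min_support=1.0))  # no pattern shared by all three
```

In this toy log the short period covering only u1 and u2 is dense, while the period spanning the whole log is not, which mirrors the abstract's point that an arbitrary division of the data can hide locally frequent behaviour. Testing every candidate period this way would be prohibitively expensive, which is presumably why a dedicated mining process is proposed.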

Keywords

Web usage mining; Sequential patterns; Periods; Users behaviour

Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  • F. Masseglia (1)
  • P. Poncelet (2)
  • M. Teisseire (3)
  • A. Marascu (1)

  1. INRIA Sophia Antipolis – AxIS Project/Team, Sophia Antipolis, France
  2. EMA-LGI2P/Site EERIE, Nimes Cedex 1, France
  3. LIRMM UMR CNRS 5506, Montpellier Cedex 5, France
