Knowledge and Information Systems, Volume 5, Issue 4, pp 439–465

HDM: A Client/Server/Engine Architecture for Real-Time Web Usage Mining

  • Florent Masseglia
  • Maguelonne Teisseire
  • Pascal Poncelet


The behavior of a website's users may change so quickly that predictions based on frequent patterns extracted from an access log file become unreliable. To limit the obsolescence of behavioral patterns, the ideal method would provide frequent patterns in real time, making the result immediately available. In this paper, we propose a method for finding frequent behavioral patterns in real time, whatever the number of connected users. Because frequent behavioral patterns may change quickly after an access log file has been analyzed, patterns obtained in real time provide up-to-date navigation schemata for predicting user behavior. Based on a distributed heuristic, our method also addresses several data mining problems: the discovery of ‘interesting zones’ (a large number of frequent patterns concentrated over a period of time, or ‘super-frequent’ patterns), the discovery of very long sequential patterns, and interactive data mining (‘on-the-fly’ modification of the minimum support).


Distributed; Heuristic; Interactive data mining; Long sequential patterns; Real time; Zone mining





Copyright information

© Springer-Verlag London Limited 2003

Authors and Affiliations

  • Florent Masseglia (1, 2)
  • Maguelonne Teisseire (1)
  • Pascal Poncelet (3)

  1. LIRMM, Montpellier, France
  2. Laboratoire PRiSM, Université de Versailles, Versailles, France
  3. Ecole des Mines d’Alès, Nîmes, France
