Fast incremental mining of web sequential patterns with PLWAP tree

Data Mining and Knowledge Discovery

An Erratum to this article was published on 13 August 2009

Abstract

Point-and-click activity on web pages generates continuous data sequences that flow into web log data, so previously mined web sequential patterns must be updated. Algorithms that mine web sequential patterns from scratch include WAP, PLWAP and the Apriori-based GSP. Updating patterns incrementally, by reusing the old patterns together with only the newly added data sequences, can achieve fast response times with reasonable memory usage. This paper proposes two algorithms, RePL4UP (Revised PLWAP For UPdate) and PL4UP (PLWAP For UPdate), which use the PLWAP tree structure to update web sequential patterns incrementally and efficiently, without scanning the whole database, even when previously small items become frequent. During tree construction, RePL4UP concisely stores in its metadata the position codes of small items in the database sequences. During mining, RePL4UP scans only the newly added database sequences, revises the old PLWAP tree to restore information on previously small items that have become frequent, and uses the small-item position codes to delete previously frequent items that have become small. PL4UP initially builds a larger PLWAP tree that includes all sequences in the database, using a tolerance support t that is lower than the regular minimum support s. When the database is updated, the position code features of the PLWAP tree are used to mine these trees efficiently and extract the current frequent patterns. Both approaches update old frequent patterns more quickly, without re-scanning the entire updated database.
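To make the position-code idea concrete, here is a minimal Python sketch of building a PLWAP-style tree. It assumes the position-code rule used in the earlier PLWAP work (the leftmost child of a node appends "1" to its parent's code, and every other child appends "0" to the code of its nearest left sibling), which yields a simple ancestor test: node a is an ancestor of node b exactly when a's code followed by "1" is a prefix of b's code. Support is assumed to be the fraction of sequences containing an item. The helper names (`Node`, `frequent_items`, `build_tree`, `is_ancestor`, `preorder`) and the four-sequence example database are hypothetical illustrations; the pre-order header linkage used during mining and the actual RePL4UP and PL4UP update steps are not shown.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Node:
    """One PLWAP-style tree node (hypothetical structure for illustration)."""
    label: str | None                  # None marks the root
    code: str = ""                     # binary position code; the root has ""
    count: int = 0                     # number of sequences passing through
    children: list[Node] = field(default_factory=list)


def frequent_items(db: list[list[str]], min_support: float) -> set[str]:
    """Items contained in at least min_support * len(db) sequences."""
    counts = Counter(item for seq in db for item in set(seq))
    return {item for item, c in counts.items() if c >= min_support * len(db)}


def build_tree(db: list[list[str]], min_support: float) -> Node:
    keep = frequent_items(db, min_support)
    root = Node(label=None)
    for seq in db:
        node = root
        for item in (e for e in seq if e in keep):  # drop items below the threshold
            child = next((c for c in node.children if c.label == item), None)
            if child is None:
                # Assumed PLWAP rule: leftmost child = parent code + "1",
                # any later child = nearest left sibling's code + "0".
                if node.children:
                    code = node.children[-1].code + "0"
                else:
                    code = node.code + "1"
                child = Node(label=item, code=code)
                node.children.append(child)
            child.count += 1
            node = child
    return root


def is_ancestor(a_code: str, b_code: str) -> bool:
    """a is an ancestor of b iff a's code followed by "1" is a prefix of b's code."""
    return b_code.startswith(a_code + "1")


def preorder(node: Node) -> Iterator[Node]:
    """Yield nodes in pre-order (root first, then children left to right)."""
    yield node
    for child in node.children:
        yield from preorder(child)


if __name__ == "__main__":
    db = [list("abdac"), list("eaebcac"), list("babfae"), list("afbacfcg")]
    root = build_tree(db, min_support=0.75)
    for n in preorder(root):
        if n.label is not None:
            print(n.label, n.count, n.code)
    print(is_ancestor("1", "1110"), is_ancestor("111", "1110"))
```

With `min_support=0.75` only `a`, `b` and `c` survive; the root's first child gets code `1`, and the `c` node that is the right sibling of the `a` under `a` then `b` gets code `1110`, so `is_ancestor("111", "1110")` is false for these two siblings. Passing a lower threshold such as 0.5, playing the role of a tolerance support t, also keeps `e` and `f`, which illustrates why a tree built at t holds extra items; this is only a sketch of the idea, not the paper's PL4UP procedure.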

Author information

Correspondence to C. I. Ezeife.

Additional information

Responsible editor: Eamonn Keogh.

An erratum to this article can be found at http://dx.doi.org/10.1007/s10618-009-0144-3

About this article

Cite this article

Ezeife, C.I., Liu, Y. Fast incremental mining of web sequential patterns with PLWAP tree. Data Min Knowl Disc 19, 376–416 (2009). https://doi.org/10.1007/s10618-009-0133-6

  • DOI: https://doi.org/10.1007/s10618-009-0133-6