Abstract
Sequential pattern mining is an important data mining task that discovers frequently occurring patterns in sequence databases. As databases evolve, maintaining sequential patterns over a long period becomes essential, since a large number of new records may be added to a database. To reflect the current state of the database, in which previously discovered sequential patterns may become irrelevant and new ones may appear, efficient algorithms are needed to update, maintain and manage the discovered information. Several efficient algorithms for maintaining sequential patterns have been developed. Here, we present an efficient algorithm that handles the maintenance problem for CFM-sequential patterns (Compact, Frequent, Monetary-constraint-based sequential patterns). To capture the dynamic nature of data addition and deletion efficiently, we first construct a CFM-tree from the CFM patterns obtained from the static database. The database is then updated from distributed sources whose data may be static, inserted, or deleted. Whenever the database is updated from the multiple sources, the CFM-tree is also updated to include the updated sequences. The updated CFM-tree is then used to mine the progressive CFM-patterns with the proposed tree pattern mining algorithm. Finally, experiments are carried out on synthetic and real-life distributed databases given as input to the Progressive CFM-Miner. The experimental results and analysis show that it performs better than the existing IncSpan algorithm in terms of the number of generated sequential patterns, execution time and memory usage.
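The abstract describes building a tree from the patterns mined on the static database and then updating it as sequences are inserted or deleted. The exact structure of the CFM-tree is not given in this section, so the following is only a minimal Python sketch of the general idea under simplifying assumptions: each sequence element is a single item (no itemsets), compactness and monetary values are ignored, and only the supports of patterns already stored in the tree are maintained (discovering newly emerging patterns is left out). The class and method names (`PatternTree`, `add_pattern`, `insert_sequence`, `delete_sequence`, `frequent`) are hypothetical and are not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass
class Node:
    # Support count of the pattern spelled by the path from the root.
    count: int = 0
    children: Dict[str, "Node"] = field(default_factory=dict)


def is_subsequence(pattern: Sequence[str], seq: Sequence[str]) -> bool:
    """True if the items of pattern appear in seq in order (gaps allowed)."""
    it = iter(seq)
    return all(item in it for item in pattern)


class PatternTree:
    """Hypothetical pattern tree; a simplification, not the paper's CFM-tree."""

    def __init__(self) -> None:
        self.root = Node()

    def add_pattern(self, pattern: Sequence[str], database: List[List[str]]) -> None:
        # Insert the pattern and all its prefixes, counting their support
        # in the (static) database the first time each node is created.
        node, prefix = self.root, []
        for item in pattern:
            prefix.append(item)
            if item not in node.children:
                support = sum(is_subsequence(prefix, s) for s in database)
                node.children[item] = Node(count=support)
            node = node.children[item]

    def _update(self, node: Node, prefix: List[str], seq: Sequence[str], delta: int) -> None:
        for item, child in node.children.items():
            extended = prefix + [item]
            # Anti-monotonicity: if seq lacks `extended`, it also lacks
            # every longer pattern stored below this child, so prune here.
            if is_subsequence(extended, seq):
                child.count += delta
                self._update(child, extended, seq, delta)

    def insert_sequence(self, seq: Sequence[str]) -> None:
        self._update(self.root, [], seq, +1)

    def delete_sequence(self, seq: Sequence[str]) -> None:
        # Assumes seq was previously part of the database.
        self._update(self.root, [], seq, -1)

    def frequent(self, min_sup: int) -> List[Tuple[Tuple[str, ...], int]]:
        result: List[Tuple[Tuple[str, ...], int]] = []

        def walk(node: Node, prefix: List[str]) -> None:
            for item, child in node.children.items():
                # An infrequent pattern has only infrequent extensions.
                if child.count >= min_sup:
                    pattern = prefix + [item]
                    result.append((tuple(pattern), child.count))
                    walk(child, pattern)

        walk(self.root, [])
        return result


# Usage: build from a static database, then apply one insertion and one deletion.
static_db = [["a", "b", "c"], ["a", "c"], ["b", "c"]]
tree = PatternTree()
for p in (["a", "c"], ["b", "c"]):
    tree.add_pattern(p, static_db)
tree.insert_sequence(["a", "b", "c"])
tree.delete_sequence(["b", "c"])
print(tree.frequent(3))  # [(('a',), 3), (('a', 'c'), 3)]
```

The pruning in `_update` and `frequent` relies on anti-monotonicity: a sequence that does not contain a pattern cannot contain any extension of it, so a pattern's support is never smaller than that of its extensions.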
Abbreviations
- CFM: Compactness, Frequency, Monetary
- min_sup: Minimum support
- CFM-tree: Tree of compact, frequent, monetary (CFM) sequential patterns
- CT: Compactness threshold
- Tm: Monetary threshold
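The abbreviations name three thresholds: min_sup, CT and Tm. As an illustration only, the following Python sketch shows one way they might combine into a single CFM test. The precise definitions of compactness and monetary value are not given in this section, so the ones used here are assumptions, repeated in the code: an event is an `(item, timestamp, monetary_value)` tuple, an occurrence is compact if the time span from its first to its last matched event is at most CT, it is monetary if the sum of the matched values is at least Tm, and a pattern is frequent if at least min_sup sequences contain such an occurrence. The function names `satisfies_cm`, `cfm_support` and `is_cfm_pattern` are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

# Assumed representation: an event is (item, timestamp, monetary_value) and
# a sequence is a time-ordered list of events.
Event = Tuple[str, int, float]


def satisfies_cm(pattern: Sequence[str], seq: Sequence[Event], ct: int, tm: float) -> bool:
    """True if some occurrence of pattern in seq is compact and valuable enough.

    Assumed definitions: compactness = time span from the first to the last
    matched event is at most ct; monetary = sum of matched values is at least tm.
    Brute force over all occurrences, so only suitable for small examples.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    # combinations() yields increasing index tuples, so event order is preserved.
    for idx in combinations(range(len(seq)), len(pattern)):
        events = [seq[i] for i in idx]
        if [e[0] for e in events] != list(pattern):
            continue  # these positions do not spell the pattern
        span = events[-1][1] - events[0][1]
        value = sum(e[2] for e in events)
        if span <= ct and value >= tm:
            return True
    return False


def cfm_support(pattern: Sequence[str], database: List[List[Event]], ct: int, tm: float) -> int:
    # Frequency counts only sequences that have a compact, valuable occurrence.
    return sum(satisfies_cm(pattern, seq, ct, tm) for seq in database)


def is_cfm_pattern(pattern: Sequence[str], database: List[List[Event]],
                   min_sup: int, ct: int, tm: float) -> bool:
    return cfm_support(pattern, database, ct, tm) >= min_sup


db = [
    [("a", 1, 10.0), ("b", 3, 5.0), ("c", 9, 20.0)],
    [("a", 2, 30.0), ("c", 4, 15.0)],
    [("a", 1, 1.0), ("c", 2, 1.0)],
]
print(cfm_support(["a", "c"], db, ct=5, tm=20.0))   # 1
print(cfm_support(["a", "c"], db, ct=10, tm=20.0))  # 2
```

Loosening CT from 5 to 10 admits the first sequence, whose only occurrence of ⟨a, c⟩ spans 8 time units, while the third sequence is always rejected because its matched value is below Tm.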
Rights and permissions
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
About this article
Cite this article
Mallick, B., Garg, D. & Grover, P.S. Progressive CFM-Miner: An Algorithm to Mine CFM-Sequential Patterns from a Progressive Database. Int J Comput Intell Syst 6, 209–222 (2013). https://doi.org/10.1080/18756891.2013.768432
DOI: https://doi.org/10.1080/18756891.2013.768432