A novel mapreduce algorithm for distributed mining of sequential patterns using co-occurrence information

Saleti, Sumalatha; Subramanyam, R. B. V.

doi:10.1007/s10489-018-1259-2

Published: 20 August 2018

Applied Intelligence, Volume 49, pages 150–171 (2019)

Abstract

The Sequential Pattern Mining (SPM) problem has been widely studied and extended in several directions. With the tremendous growth in dataset sizes, traditional algorithms no longer scale. To address this scalability issue, a few researchers have recently developed distributed algorithms based on MapReduce. However, the existing MapReduce algorithms require multiple rounds of MapReduce, which increases communication and scheduling overhead. They also do not address the handling of long sequences. They generate a huge number of candidate sequences that do not appear in the input database, which enlarges the search space and increases the number of candidate sequences that require support counting. Our algorithm is a two-phase MapReduce algorithm that generates promising candidate sequences by applying pruning strategies. This reduces the search space and makes support computation more efficient. We make use of item co-occurrence information, and the proposed Sequence Index List (SIL) data structure helps compute support quickly. The experimental results show that the proposed algorithm outperforms the existing MapReduce algorithms for the SPM problem.
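
To make the role of co-occurrence information concrete, the following is a minimal single-machine sketch in Python. It is not the authors' algorithm: the abstract does not specify the layout of the SIL structure or the two MapReduce phases, so neither is modelled here. The sketch assumes that each sequence is a plain list of single items, and the function names (`build_cooccurrence`, `is_promising`, `support`) are hypothetical. It illustrates one common pruning idea: a candidate can be frequent only if every ordered pair of its items co-occurs in at least `minsup` sequences.

```python
from collections import defaultdict


def build_cooccurrence(db):
    """For every ordered pair (a, b), count the sequences in which b occurs after a."""
    counts = defaultdict(int)
    for seq in db:
        pairs_in_seq = set()
        for i, a in enumerate(seq):
            for b in seq[i + 1:]:
                pairs_in_seq.add((a, b))
        # Each sequence contributes at most once per pair.
        for pair in pairs_in_seq:
            counts[pair] += 1
    return counts


def is_promising(candidate, cooc, minsup):
    """Return False if some ordered item pair of the candidate is infrequent.

    The support of a candidate can never exceed the co-occurrence count of
    any ordered pair of its items, so such candidates are safe to prune.
    """
    for i, a in enumerate(candidate):
        for b in candidate[i + 1:]:
            if cooc.get((a, b), 0) < minsup:
                return False
    return True


def support(candidate, db):
    """Count the sequences that contain the candidate as a subsequence."""
    def contains(seq):
        it = iter(seq)
        # 'item in it' advances the iterator, which enforces item order.
        return all(item in it for item in candidate)
    return sum(1 for seq in db if contains(seq))


if __name__ == "__main__":
    # Toy database, assumed for illustration only.
    db = [["a", "b", "c"], ["a", "c"], ["b", "a", "c"], ["c", "b"]]
    cooc = build_cooccurrence(db)
    for cand in (["a", "c"], ["a", "b", "c"]):
        if is_promising(cand, cooc, minsup=2):
            print(cand, "support =", support(cand, db))
        else:
            print(cand, "pruned without support counting")
```

In this sketch the pruning check touches only the small pair-count table, so candidates that fail it never reach the more expensive scan over the database.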

Author information

Authors and Affiliations

National Institute of Technology, Warangal, India
Sumalatha Saleti & R. B. V. Subramanyam

Corresponding author

Correspondence to Sumalatha Saleti.

About this article

Cite this article

Saleti, S., Subramanyam, R.B.V. A novel mapreduce algorithm for distributed mining of sequential patterns using co-occurrence information. Appl Intell 49, 150–171 (2019). https://doi.org/10.1007/s10489-018-1259-2

Issue Date: 15 January 2019