Applied Intelligence, Volume 46, Issue 3, pp 703–716

An efficient method for mining frequent sequential patterns using multi-core processors

Abstract

The problem of mining frequent sequential patterns (FSPs) has attracted a great deal of research attention. Although many efficient algorithms exist for mining FSPs, mining time remains high, especially for large or dense datasets. Parallel processing has been widely applied to speed up many kinds of problems, and several parallel algorithms for FSP mining have been proposed, but most of them suffer from synchronization and load-balancing problems. This paper proposes a load-balancing parallel approach for multi-core processors, called Parallel Dynamic Bit Vector Sequential Pattern Mining (pDBV-SPM), for mining FSPs from huge datasets; it uses the dynamic bit vector data structure to determine support values quickly. In pDBV-SPM, the frequent 1-sequences are sorted in ascending order of support count before being partitioned into parts, each of which is assigned to a task on a processor. With this ordering, most nodes in the leftmost branches are infrequent and are therefore pruned during the search, which helps balance the search tree. Experiments are conducted to verify the effectiveness of pDBV-SPM. The results show that the proposed algorithm outperforms PIB-PRISM in terms of both mining time and memory usage when mining FSPs.
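The two ideas named in the abstract, support counting with dynamic bit vectors and sorting the frequent 1-sequences by ascending support before splitting them across tasks, can be illustrated with a minimal Python sketch. It is not the pDBV-SPM implementation. The block width (`BLOCK = 8`), the round-robin split into `n_tasks` parts, the restriction to sequence extensions whose elements are single items, and every name (`DBV`, `item_dbvs`, `first_after`, `mine_branch`, `mine_parallel`) are assumptions made for illustration. Here a DBV is a bit vector over sequence IDs that omits its leading all-zero blocks; the AND of two DBVs gives an upper bound on an extension's support, which is then confirmed using per-sequence position information. Because every branch in this sketch extends with all frequent items, the ascending-support order only decides which task receives which root.

```python
from concurrent.futures import ThreadPoolExecutor

BLOCK = 8  # hypothetical block width (bits) used when trimming zero blocks


class DBV:
    """Simplified dynamic bit vector over sequence IDs (leading zero blocks dropped)."""

    def __init__(self, start, bits):
        self.start = start  # block-aligned absolute index of bit 0 of `bits`
        self.bits = bits    # stored bits as a Python int

    @classmethod
    def from_raw(cls, raw, base=0):
        """Build a DBV from `raw`, whose bit 0 sits at absolute index `base`."""
        if raw == 0:
            return cls(0, 0)
        lowest = (raw & -raw).bit_length() - 1  # index of the lowest set bit
        shift = (lowest // BLOCK) * BLOCK       # drop whole zero blocks only
        return cls(base + shift, raw >> shift)

    def __and__(self, other):
        # The operand with the later start has no set bits below it, so only
        # the overlapping region can contain common 1-bits.
        start = max(self.start, other.start)
        a = self.bits >> (start - self.start)
        b = other.bits >> (start - other.start)
        return DBV.from_raw(a & b, start)

    def support(self):
        return bin(self.bits).count("1")  # popcount = number of sequences


def item_dbvs(db):
    """Bit `sid` of an item's DBV is set if sequence `sid` contains the item."""
    raw = {}
    for sid, seq in enumerate(db):
        for item in set().union(*seq):
            raw[item] = raw.get(item, 0) | (1 << sid)
    return {item: DBV.from_raw(bits) for item, bits in raw.items()}


def first_after(seq, item, pos):
    """Index of the first itemset after `pos` that contains `item`, else None."""
    for j in range(pos + 1, len(seq)):
        if item in seq[j]:
            return j
    return None


def mine_branch(db, dbvs, root, items, minsup):
    """Depth-first sequence extensions below one frequent 1-sequence `root`."""
    found = []

    def grow(pattern, dbv, ends):
        # ends[sid] = itemset index where the earliest occurrence of `pattern` ends
        found.append((tuple(pattern), dbv.support()))
        for x in items:
            if (dbv & dbvs[x]).support() < minsup:  # cheap upper-bound prune
                continue
            new_ends, raw = {}, 0
            for sid, pos in ends.items():
                j = first_after(db[sid], x, pos)
                if j is not None:
                    new_ends[sid] = j
                    raw |= 1 << sid
            if len(new_ends) >= minsup:
                grow(pattern + [x], DBV.from_raw(raw), new_ends)

    ends = {}
    for sid, seq in enumerate(db):
        j = first_after(seq, root, -1)
        if j is not None:
            ends[sid] = j
    grow([root], dbvs[root], ends)
    return found


def mine_parallel(db, minsup, n_tasks=2):
    dbvs = item_dbvs(db)
    frequent = [i for i, v in dbvs.items() if v.support() >= minsup]
    # Ascending support order, as described in the abstract (ties broken by item).
    frequent.sort(key=lambda i: (dbvs[i].support(), i))
    # Assumption: round-robin split of the sorted 1-sequences into parts.
    parts = [frequent[k::n_tasks] for k in range(n_tasks)]

    def run(part):
        out = []
        for root in part:
            out.extend(mine_branch(db, dbvs, root, frequent, minsup))
        return out

    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        results = list(pool.map(run, parts))
    return sorted(p for part_result in results for p in part_result)
```

For example, with `db = [[{"a"}, {"b"}, {"c"}], [{"a"}, {"c"}], [{"b"}, {"c"}], [{"a"}, {"b"}]]` and `minsup = 2`, `mine_parallel(db, 2)` returns the 1-sequences a, b and c (support 3) and the 2-sequences a→b, a→c and b→c (support 2). Python threads are used only to show the task structure: under CPython's global interpreter lock they do not run this CPU-bound work in parallel, so a real speedup would need processes or a language with native threads.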

Keywords

Data mining, Dynamic bit vectors, Multi-core processors, Sequence patterns

Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. Center for Applied Information Technology, Ton Duc Thang University, Ho Chi Minh City, Vietnam
  2. Division of Data Science, Ton Duc Thang University, Ho Chi Minh City, Vietnam
  3. Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City, Vietnam
  4. VŠB-Technical University of Ostrava, Ostrava-Poruba, Czech Republic