Distributed Mining of Significant Frequent Colossal Closed Itemsets from Long Biological Dataset

  • Manjunath K. Vanahalli
  • Nagamma Patil
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 940)

Abstract

Mining colossal itemsets has gained increasing attention in recent times. A large set of short and average-sized itemsets does not carry complete and valuable information for decision making, yet traditional itemset mining algorithms spend an enormous amount of time mining these small and average-sized itemsets. Colossal itemsets are significant for numerous applications, including bioinformatics, and are influential in decision making. Bioinformatics has contributed a new kind of dataset known as the long biological dataset. These are high dimensional datasets, characterized by a large number of features (attributes) and a small number of rows (samples). Extracting a large amount of information and knowledge from a high dimensional long biological dataset is a nontrivial task. The existing algorithms for mining significant Frequent Colossal Closed Itemsets (FCCI) from long biological datasets are sequential and computationally expensive. Distributed computing is a good strategy for overcoming the inefficiency of the existing sequential algorithms. This paper proposes a distributed computing approach for mining FCCI. The pruning strategy in the proposed Distributed Row Enumerated Frequent Colossal Closed Itemset Mining (DREFCCIM) algorithm efficiently cuts down the row-enumerated mining search space. DREFCCIM is the first distributed algorithm to mine FCCI from long biological datasets. The experimental results demonstrate the efficient performance of DREFCCIM in comparison with the existing algorithms.
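
To make the terms in the abstract concrete, the following is a minimal sequential sketch of row enumeration on a toy dataset. It is not the DREFCCIM algorithm, and the abstract does not describe the pruning strategy itself. The function name `mine_fcci`, the toy rows, and the cardinality-based pruning rule are assumptions made for illustration. The sketch relies on two standard facts: the intersection of any group of rows is a closed itemset, and intersecting with further rows can only shrink it, so a branch can be abandoned once the intersection falls below the minimum cardinality.

```python
def mine_fcci(rows, min_support, min_cardinality):
    """Toy row-enumeration miner (illustrative only, not DREFCCIM).

    rows: list of frozensets, one per sample.
    Returns {closed itemset: support} for itemsets supported by at least
    min_support rows and containing at least min_cardinality items.
    """
    found = {}

    def extend(start, count, common):
        # Grow the current group of rows by one row at a time.
        for i in range(start, len(rows)):
            new_common = rows[i] if common is None else common & rows[i]
            # Assumed pruning rule: adding rows can only shrink the
            # intersection, so a too-small itemset cannot recover.
            if len(new_common) < min_cardinality:
                continue
            if count + 1 >= min_support:
                # Support is the number of rows containing the itemset.
                found[new_common] = sum(1 for r in rows if new_common <= r)
            extend(i + 1, count + 1, new_common)

    extend(0, 0, None)
    return found


# Hypothetical toy data: few rows (samples), several items (features).
toy_rows = [
    frozenset("abcde"),
    frozenset("abcdf"),
    frozenset("abcg"),
    frozenset("xy"),
]
result = mine_fcci(toy_rows, min_support=2, min_cardinality=3)
```

On this toy input, `result` maps the itemset {a, b, c, d} to support 2 and {a, b, c} to support 3. A distributed variant would additionally need to partition the row-enumeration branches across workers, which this sketch does not attempt.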

Keywords

Bioinformatics, High dimensional dataset, Colossal itemset, Distributed computing, Minimum support, Minimum cardinality, Long biological dataset

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Department of Information Technology, National Institute of Technology Karnataka, Mangalore, India
