Frequent Itemset Mining in High Dimensional Data: A Review

  • Fatimah Audah Md. Zaki
  • Nurul Fariza Zulkurnain
Conference paper
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 481)

Abstract

This paper provides a brief overview of the techniques used in frequent itemset mining. It discusses the search strategies used (depth-first vs. breadth-first) and the dataset representations (horizontal vs. vertical). It also reviews techniques that several algorithms use to make frequent itemset mining more efficient. These algorithms are discussed according to their proposed search strategies, which include row enumeration vs. column enumeration, bottom-up vs. top-down traversal, and a number of new data structures. Finally, the paper reviews the latest algorithms for mining colossal frequent itemsets/patterns, which are currently the most relevant to mining high-dimensional datasets.
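
To make the representation and search-strategy terms concrete, the following is a minimal Python sketch, not taken from the paper. It converts a horizontal dataset (one item set per transaction) into a vertical one (one transaction-id set per item) and then enumerates frequent itemsets depth-first by intersecting those id sets. The function names to_vertical and mine_depth_first, the toy transactions, and the support threshold are all illustrative assumptions.

```python
from collections import defaultdict


def to_vertical(transactions):
    """Turn a horizontal dataset (list of item sets) into item -> set of transaction ids."""
    vertical = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            vertical[item].add(tid)
    return dict(vertical)


def mine_depth_first(vertical, min_support):
    """Enumerate frequent itemsets depth-first; support is the size of the intersected id set."""
    frequent = {}

    def extend(prefix, prefix_tids, candidates):
        for i, (item, tids) in enumerate(candidates):
            # Support of prefix + item is the overlap of their transaction-id sets.
            new_tids = tids if prefix_tids is None else prefix_tids & tids
            if len(new_tids) >= min_support:
                itemset = prefix + (item,)
                frequent[itemset] = len(new_tids)
                # Go deeper before moving on to the next sibling item.
                extend(itemset, new_tids, candidates[i + 1:])

    # Items are visited in sorted order so each itemset is generated once.
    extend((), None, sorted(vertical.items()))
    return frequent


# Hypothetical toy data: four transactions over the items a, b, c, d.
transactions = [
    {"a", "b", "c"},
    {"a", "c"},
    {"a", "d"},
    {"b", "c"},
]
vertical = to_vertical(transactions)
result = mine_depth_first(vertical, min_support=2)
```

A breadth-first miner would instead generate all frequent itemsets of size k before any of size k + 1. The depth-first form is used here only because it pairs naturally with the vertical layout.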

Keywords

Data mining, High-dimensional data

Notes

Acknowledgment

The authors would like to thank the Malaysian Government and the International Islamic University Malaysia (IIUM) for the research grant under the Fundamental Research Grant Scheme (FRGS), Research Project FRGS14-139-0380.

Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  • Fatimah Audah Md. Zaki (1)
  • Nurul Fariza Zulkurnain (1)
  1. Department of Electrical and Computer Engineering, International Islamic University Malaysia, Kuala Lumpur, Malaysia
