Distributed Lazy Association Classification Algorithm Based on Spark

  • Xueming Li (Email author)
  • Chaoyang Zhang
  • Guangwei Chen
  • Xiaoteng Sun
  • Qi Zhang
  • Haomin Yang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10086)

Abstract

Lazy association classification algorithms are inefficient when many unclassified samples must be classified at the same time. Existing lazy association classification algorithms are also sequential, so they cannot handle big data problems. To solve these problems, we propose a distributed lazy association classification algorithm based on Spark, named SDLAC. First, it clusters the unclassified samples with the K-Means algorithm. Second, it performs distributed projections according to the clustering results and mines classification association rules with a distributed mining algorithm based on Spark. It then constructs a classifier to classify the unclassified samples. Experiments are conducted on 5 UCI datasets and on a big dataset from the first national college competition on cloud computing (China). The results show that the SDLAC algorithm is more accurate than the CBA algorithm. In addition, it is far more efficient than the typical distributed lazy association classification algorithm. In other words, the SDLAC algorithm can adapt to big data environments.
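The lazy step described above (project the training data onto an unclassified sample, mine class association rules from the projection, then classify) can be illustrated with a minimal single-machine sketch. This is not the SDLAC implementation: it omits the K-Means clustering and the Spark distribution entirely, and the function names, the thresholds, the rule-length limit, the confidence-sum scoring, and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def project(training, sample):
    """Keep only the items of each training row that also occur in the sample."""
    return [(frozenset(items) & sample, label) for items, label in training]


def mine_rules(projected, min_support=0.2, min_confidence=0.6, max_len=2):
    """Enumerate itemsets up to max_len and keep rules (itemset -> label)
    that meet the assumed support and confidence thresholds."""
    n = len(projected)
    itemset_counts = Counter()
    rule_counts = Counter()
    for items, label in projected:
        for size in range(1, max_len + 1):
            for combo in combinations(sorted(items), size):
                itemset_counts[combo] += 1
                rule_counts[(combo, label)] += 1
    rules = []
    for (combo, label), count in rule_counts.items():
        support = count / n
        confidence = count / itemset_counts[combo]
        if support >= min_support and confidence >= min_confidence:
            rules.append((combo, label, support, confidence))
    return rules


def classify(training, sample, default=None):
    """Mine rules only for this sample (the 'lazy' part) and pick the class
    with the highest summed rule confidence (an assumed scoring scheme)."""
    sample = frozenset(sample)
    rules = mine_rules(project(training, sample))
    if not rules:
        return default
    scores = Counter()
    for _, label, _, confidence in rules:
        scores[label] += confidence
    # Break ties deterministically by label name.
    return min(scores, key=lambda label: (-scores[label], label))


# Hypothetical toy data: each row is (set of attribute=value items, class label).
training = [
    ({"outlook=sunny", "windy=no"}, "play"),
    ({"outlook=sunny", "windy=yes"}, "play"),
    ({"outlook=rain", "windy=yes"}, "stay"),
    ({"outlook=rain", "windy=no"}, "stay"),
    ({"outlook=sunny", "windy=no"}, "play"),
]

print(classify(training, {"outlook=sunny", "windy=yes"}))
print(classify(training, {"outlook=rain", "windy=no"}))
```

Because rules are mined separately for every sample, the cost grows with the number of unclassified samples. That is the inefficiency the abstract attributes to lazy methods; grouping similar samples so that they can share one projection is the stated motivation for the clustering step.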

Keywords

Distributed lazy association classification, Big data, Spark

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Xueming Li (1), Email author
  • Chaoyang Zhang (2)
  • Guangwei Chen (2)
  • Xiaoteng Sun (2)
  • Qi Zhang (2)
  • Haomin Yang (2)

  1. Key Laboratory of Dependable Service Computing in Cyber Physical Society, Ministry of Education, Chongqing University, Chongqing, China
  2. College of Computer Science, Chongqing University, Chongqing, China
