Hybrid Parallel Linguistic Fuzzy Rules with Canopy MapReduce for Big Data Classification in Cloud

  • V. Vennila
  • A. Rajiv Kannan

Abstract

With the growing availability of large volumes of information and the benefits of processing them, big data have gained considerable significance in recent years. Because such data must be handled at scale, big data applications are commonly processed with the MapReduce programming model. However, applying rule-based models to these datasets is not straightforward, and big data are not classified efficiently. To overcome these problems, a parallel linguistic fuzzy rule with canopy MapReduce (LFR-CM) framework is introduced. The LFR-CM framework classifies big data using a canopy MapReduce function for information sharing in the cloud, with higher classification accuracy and lower time consumption. It comprises three steps for efficient classification in a cloud environment. First, it constructs the fuzzy knowledge base (KB) from the big data training set, where the linguistic fuzzy rules are built. The second step comprises three operations. The first operation is the map function, applied in parallel by every cloud user without transmitting any data to other cloud user nodes. The second operation is the processing of data through the map function across all additional cloud user nodes. The third operation is the reduce function, deployed by each cloud user on the partitioned information. Finally, the data are classified on this basis, with higher classification accuracy and lower time consumption. The LFR-CM framework is implemented and evaluated on big data datasets in the Amazon EC2 cloud and compared with other MapReduce-based classification systems in terms of runtime, classification time, classification accuracy and input/output cost. Based on the results of the study, the LFR-CM framework is more efficient than the existing methods.
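The map and reduce operations described in the abstract can be illustrated with a small sketch. The abstract does not specify the membership functions, the rule-weight formula, or how the canopy step is applied, so everything below is an assumption made for illustration: three triangular linguistic labels over features scaled to [0, 1], a product t-norm for the matching degree, and a simple certainty-style rule weight. The function names (`map_partition`, `merge`, `build_knowledge_base`, `classify`) are hypothetical, and the canopy step is not modelled at all.

```python
from collections import defaultdict
from functools import reduce

# Assumption: every feature is scaled to [0, 1] and described by three labels.
LABELS = ("low", "medium", "high")
PEAKS = (0.0, 0.5, 1.0)


def membership(x, peak, width=0.5):
    """Triangular membership degree of x for a label centred on `peak`."""
    return max(0.0, 1.0 - abs(x - peak) / width)


def best_label(x):
    """Return (label, degree) for the label with the highest membership."""
    degrees = [membership(x, p) for p in PEAKS]
    i = max(range(len(LABELS)), key=degrees.__getitem__)
    return LABELS[i], degrees[i]


def map_partition(partition):
    """Map step: derive candidate rules from one node's local data only."""
    local = defaultdict(lambda: defaultdict(float))
    for features, cls in partition:
        pairs = [best_label(x) for x in features]
        antecedent = tuple(label for label, _ in pairs)
        degree = 1.0
        for _, d in pairs:
            degree *= d  # product t-norm over the attributes
        local[antecedent][cls] += degree
    return local


def merge(acc, other):
    """Reduce step: add up per-class support for identical antecedents."""
    for antecedent, by_class in other.items():
        for cls, support in by_class.items():
            acc[antecedent][cls] += support
    return acc


def build_knowledge_base(partitions):
    """Run map on every partition, reduce the results, and weight each rule."""
    empty = defaultdict(lambda: defaultdict(float))
    merged = reduce(merge, map(map_partition, partitions), empty)
    kb = {}
    for antecedent, by_class in merged.items():
        total = sum(by_class.values())
        if total <= 0.0:
            continue  # no example actually matched this antecedent
        cls = max(by_class, key=by_class.get)
        # Assumed weight: support of the winning class minus all other support.
        weight = (2.0 * by_class[cls] - total) / total
        if weight > 0.0:
            kb[antecedent] = (cls, weight)
    return kb


def classify(kb, features):
    """Single-winner reasoning: the rule with the highest degree * weight wins."""
    best_cls, best_score = None, 0.0
    for antecedent, (cls, weight) in kb.items():
        degree = 1.0
        for x, label in zip(features, antecedent):
            degree *= membership(x, PEAKS[LABELS.index(label)])
        if degree * weight > best_score:
            best_cls, best_score = cls, degree * weight
    return best_cls


# Toy data: two partitions standing in for two cloud user nodes.
partitions = [
    [((0.1, 0.2), "A"), ((0.9, 0.8), "B")],
    [((0.0, 0.1), "A"), ((1.0, 0.9), "B")],
]
kb = build_knowledge_base(partitions)
```

In this sketch each partition is processed without seeing the others, mirroring the statement that the map function runs without transmitting data between cloud user nodes. Only the compact per-antecedent summaries reach the reduce step.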

Keywords

Big data · Cloud environment · MapReduce · Linguistic fuzzy rules · Canopy fuzzy MapReduce

List of Symbols

\(\hbox{CS}\): Cloud servers
\(\hbox{CU}\): Cloud users
\(R_{i}\): Fuzzy rules
\(P_{i}^{1}\): Antecedent fuzzy set
\(C_{i}\): Class label
\(\hbox{RW}_{i}\): Rule weight
\(a_{p}\): Membership function
\(C_{\rm mn}\): Cloud master node
\(\hbox{MAP}\): Map function
\(\hbox{FM}_{\text{i}}\): Mapping threshold factor
\({\text{DS}}_{\rm{i}}\): Training set
\({\text{CT}}\): Classification time
\({\text{A}}_{\rm{i}}\): Classification accuracy
\({\text{DCC}}\): Data correctly classified
\(N\): Number of data
\(\hbox{KB}\): Knowledge base
\(n\): Number of instances
\(C_{\rm wn}\): Cloud worker nodes

Copyright information

© Taiwan Fuzzy Systems Association 2019

Authors and Affiliations

  1. Department of Computer Science and Engineering, K.S.R. College of Engineering, Tiruchengode, India