Efficient DANNLO classifier for multi-class imbalanced data on Hadoop

Satyanarayana, S.; Tayar, Yerremsetty; Prasad, R. Siva Ram

doi:10.1007/s41870-018-0187-z

Original Research
Published: 19 April 2018

Volume 11, pages 321–329, (2019)

International Journal of Information Technology

S. Satyanarayana¹,
Yerremsetty Tayar² &
R. Siva Ram Prasad²

Abstract

In recent years, classification of multi-class imbalanced data has become a major problem in big data. To address it, we develop a new Deep Artificial Neural Network Learning Optimization (DANNLO) classifier for large collections of imbalanced data. In the proposed work, the dataset is first reduced using principal component analysis for dimensionality reduction, and the initial centroids are computed. Then, a parallel hierarchical pillar k-means clustering algorithm based on MapReduce partitions the imbalanced data set into similar subsets, which reduces the computational cost. The resulting clusters are given as input to the deep ANN for learning. In the next stage, the deep neural network is trained using the backpropagation algorithm. The firefly optimization algorithm is used to optimize the n-dimensional weight space; the attractiveness and distance of each firefly are computed. Hadoop is used to handle these large volumes of variable-size data. The imbalanced datasets are taken from the ECDC (European Centre for Disease Prevention and Control) repository. The experimental results show that the proposed method can significantly improve the effectiveness of classifying imbalanced data, as measured by TP rate, F-measure, G-mean, confusion matrix, precision, recall, and ROC. They also suggest that the DANNLO classifier outperforms other ordinary classifiers such as SVM and random forest on the tested imbalanced data sets.
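
The abstract says that the attractiveness and distance of each firefly are computed while optimizing the weight space, but it does not give the formulas. The following is a minimal sketch that assumes the commonly used firefly formulation: Euclidean distance r between two weight vectors and attractiveness beta0 * exp(-gamma * r^2). The function names, the parameters `beta0`, `gamma`, and `alpha`, and the synchronous update order are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def distance(a, b):
    """Euclidean distance between two weight vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def attractiveness(r, beta0=1.0, gamma=1.0):
    """Attractiveness decays with squared distance: beta0 * exp(-gamma * r^2)."""
    return beta0 * math.exp(-gamma * r * r)


def move_towards(xi, xj, beta0=1.0, gamma=1.0, alpha=0.0, rng=random):
    """Move firefly xi towards the brighter firefly xj.

    alpha scales an optional uniform random perturbation in [-0.5, 0.5).
    """
    beta = attractiveness(distance(xi, xj), beta0, gamma)
    return [
        x + beta * (y - x) + alpha * (rng.random() - 0.5)
        for x, y in zip(xi, xj)
    ]


def firefly_step(population, loss, beta0=1.0, gamma=1.0, alpha=0.0, rng=random):
    """One sweep: each firefly moves towards every firefly with lower loss.

    Brightness is taken as the negative of the loss, so a lower loss
    means a brighter firefly. Losses are evaluated once, before moving.
    """
    losses = [loss(w) for w in population]
    new_population = []
    for i, wi in enumerate(population):
        moved = list(wi)
        for j, wj in enumerate(population):
            if losses[j] < losses[i]:
                moved = move_towards(moved, wj, beta0, gamma, alpha, rng)
        new_population.append(moved)
    return new_population
```

In a setting like the one described, each firefly would hold a candidate weight vector for the network and `loss` would be the training error of the network under those weights; that pairing is also an assumption made here for illustration.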

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation (K L University), Green Fields, Vaddeswaram, Guntur, AP, India
S. Satyanarayana
Department of Computer Science and Engineering, Acharya Nagarjuna University, Guntur, AP, India
Yerremsetty Tayar & R. Siva Ram Prasad

Corresponding author

Correspondence to S. Satyanarayana.

About this article

Cite this article

Satyanarayana, S., Tayar, Y. & Prasad, R.S.R. Efficient DANNLO classifier for multi-class imbalanced data on Hadoop. Int. j. inf. tecnol. 11, 321–329 (2019). https://doi.org/10.1007/s41870-018-0187-z

Received: 27 May 2017
Accepted: 13 April 2018
Published: 19 April 2018
Issue Date: 04 June 2019
DOI: https://doi.org/10.1007/s41870-018-0187-z
