Big data classification using deep learning and apache spark architecture

Brahmane, Anilkumar V.; Krishna, B. Chaitanya

doi:10.1007/s00521-021-06145-w


Original Article
Published: 07 July 2021

Volume 33, pages 15253–15266, (2021)

Neural Computing and Applications


Abstract

The novelty of big data is rising day by day, so that existing software tools have difficulty managing it. Moreover, the rate of imbalanced data in huge datasets is a key constraint for the research industry. This paper therefore proposes a novel method for handling big data using the Spark framework. The proposed method classifies big data in two stages, feature selection and classification, which are performed in the initial nodes of the Spark architecture. The proposed optimization algorithm, named Rider Chaotic Biogeography Optimization (RCBO), integrates the Rider Optimization Algorithm (ROA) and the standard chaotic biogeography-based optimization (CBBO). The proposed RCBO-deep stacked auto-encoder using the Spark framework handles big data effectively to achieve robust big data classification. Here, the proposed RCBO is used to select suitable features from the massive dataset. In addition, the deep stacked auto-encoder is trained with RCBO in order to classify huge datasets. This research focuses on the problem of managing big data for the Covertype dataset in the UCI Machine Learning Repository. The dataset describes forest cover data and is used to predict the forest cover type from cartographic variables. The dataset is multivariate, with 263,361 web hits. It contains 581,012 instances with 54 attributes, and the associated task is classification. The analysis of the proposed RCBO-deep stacked auto-encoder-based Spark framework on the UCI machine learning datasets showed that the proposed method outperformed other methods, achieving a maximal accuracy of 86.71%, dice coefficient of 92.7%, sensitivity of 75.2% and specificity of 95.4%, respectively.
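The four evaluation measures reported in the abstract can be illustrated with a minimal sketch. The definitions below are the standard confusion-matrix formulas; the abstract does not state how the authors computed them for the multi-class Covertype task, so the one-vs-rest treatment, the function names `confusion_counts` and `binary_metrics`, and the toy labels are all assumptions made for illustration only.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count TP, FP, TN, FN, treating `positive` as the class of interest
    and every other label as negative (one-vs-rest; an assumption here)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def binary_metrics(tp, fp, tn, fn):
    """Standard definitions of the four measures named in the abstract."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "dice": 2 * tp / (2 * tp + fp + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Toy labels (hypothetical, not taken from the paper's experiments).
y_true = [1, 1, 1, 2, 2, 2, 2, 2]
y_pred = [1, 1, 2, 2, 2, 2, 2, 1]
metrics = binary_metrics(*confusion_counts(y_true, y_pred, positive=1))
```

Under this reading, sensitivity and specificity can differ widely when the classes are imbalanced, which is consistent with the gap between the two values reported above.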

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Vaddeswaram, AP, India
Anilkumar V. Brahmane & B. Chaitanya Krishna

Authors

Anilkumar V. Brahmane
B. Chaitanya Krishna

Corresponding author

Correspondence to Anilkumar V. Brahmane.

Ethics declarations

Conflict of interest

We declare that we have no financial relationship with any organization or funding agency related to this article. No research grant was available for this research work.

About this article

Cite this article

Brahmane, A.V., Krishna, B.C. Big data classification using deep learning and apache spark architecture. Neural Comput & Applic 33, 15253–15266 (2021). https://doi.org/10.1007/s00521-021-06145-w

Received: 08 February 2020
Accepted: 21 May 2021
Published: 07 July 2021
Issue Date: November 2021
DOI: https://doi.org/10.1007/s00521-021-06145-w
