Abstract
Data partitioning is considered the backbone of big data applications because it boosts their performance. In recent years, many researchers have focused on data science and analytics for real-time applications that integrate big data. Partitioning big data manually is time-consuming, so partitioning must be elastic and scalable when a distributed system handles a high workload. This paper proposes a multi-objective fuzzy-swarm optimization algorithm for cluster-based data partitioning. It also presents an analytical comparison of different optimization algorithms for data partitioning, namely reduction and clustering approaches, together with their limitations. The proposed approach aims to improve the efficiency of clustering large, complex data.
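The abstract does not spell out the algorithm, so the following is only a generic illustration of one common way to combine fuzzy clustering with swarm search for cluster-based partitioning. Particle swarm optimization (PSO) searches for k centroids that minimize the standard fuzzy c-means objective J_m, and each record is then assigned to the partition where its membership is highest. It is a minimal single-objective sketch, not the authors' multi-objective method. The function names (`fcm_objective`, `pso_partition`, `assign`) and the parameter values (fuzzifier m = 2, inertia w = 0.7, c1 = c2 = 1.5) are hypothetical choices.

```python
import random


def fcm_objective(points, centroids, m=2.0):
    """Fuzzy c-means objective J_m = sum_i sum_j u_ij^m * d_ij^2, where the
    memberships u_ij are computed in closed form from the centroids."""
    total = 0.0
    for p in points:
        # Squared Euclidean distance from this point to every centroid.
        d2 = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        if min(d2) == 0.0:
            continue  # point lies on a centroid: full membership there, zero cost
        for dj in d2:
            # u_ij = 1 / sum_k (d_ij^2 / d_ik^2)^(1/(m-1))
            u = 1.0 / sum((dj / dk) ** (1.0 / (m - 1.0)) for dk in d2)
            total += (u ** m) * dj
    return total


def pso_partition(points, k, n_particles=20, iters=150,
                  w=0.7, c1=1.5, c2=1.5, seed=0):
    """Hypothetical PSO search: each particle encodes k centroids and its
    fitness is the FCM objective (lower is better)."""
    rng = random.Random(seed)
    dim = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dim)]
    hi = [max(p[d] for p in points) for d in range(dim)]

    def random_position():
        # Draw k centroids uniformly inside the data's bounding box.
        return [[rng.uniform(lo[d], hi[d]) for d in range(dim)] for _ in range(k)]

    pos = [random_position() for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[c[:] for c in p] for p in pos]
    pbest_cost = [fcm_objective(points, p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = [c[:] for c in pbest[g]], pbest_cost[g]

    for _ in range(iters):
        for i in range(n_particles):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    # Inertia + pull toward personal best + pull toward global best.
                    vel[i][j][d] = (w * vel[i][j][d]
                                    + c1 * r1 * (pbest[i][j][d] - pos[i][j][d])
                                    + c2 * r2 * (gbest[j][d] - pos[i][j][d]))
                    pos[i][j][d] += vel[i][j][d]
            cost = fcm_objective(points, pos[i])
            if cost < pbest_cost[i]:
                pbest[i], pbest_cost[i] = [c[:] for c in pos[i]], cost
                if cost < gbest_cost:
                    gbest, gbest_cost = [c[:] for c in pos[i]], cost
    return gbest, gbest_cost


def assign(points, centroids):
    """Defuzzify: the highest FCM membership is always the nearest centroid."""
    return [min(range(len(centroids)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points]


if __name__ == "__main__":
    data = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
    centroids, cost = pso_partition(data, k=2)
    print(assign(data, centroids), round(cost, 3))
```

On this toy data the two well-separated groups should end up in different partitions. A genuinely multi-objective variant would replace the single scalar fitness with, for example, a Pareto comparison over several criteria; which criteria the paper optimizes is not stated in the abstract.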
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Goyal, S.B., Bedi, P., Rajawat, A.S., Shaw, R.N., Ghosh, A. (2022). Multi-objective Fuzzy-Swarm Optimizer for Data Partitioning. In: Bianchini, M., Piuri, V., Das, S., Shaw, R.N. (eds) Advanced Computing and Intelligent Technologies. Lecture Notes in Networks and Systems, vol 218. Springer, Singapore. https://doi.org/10.1007/978-981-16-2164-2_25
DOI: https://doi.org/10.1007/978-981-16-2164-2_25
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-2163-5
Online ISBN: 978-981-16-2164-2
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)