Abstract
The evolution and advancement of Information and Communication Technologies (ICT) have enabled large-scale distributed computing, with a large number of applications serving a massive number of users. This has generated large volumes of data, severely burdening both the processing capacity of computers and inflexible traditional networks. State-of-the-art datacenter-level performance fixes still fall short of handling the processing, storage, and network movement of this voluminous data over proprietary protocols. This chapter addresses backend server performance through effective reducer placement, an intelligent compression policy, and the handling of slower tasks. It also addresses in-network performance through traffic engineering, traffic classification, topology discovery, energy minimization, and load balancing in datacenter-oriented applications. Hadoop, the de facto standard for distributed big data storage and processing, is designed to store large datasets reliably in the Hadoop Distributed File System (HDFS) and to process them with its MapReduce engine. However, Hadoop's processing performance depends critically on the time taken to transfer data during the MapReduce shuffle phase. In addition, when tasks execute concurrently, slower tasks must be identified and handled efficiently to reduce job completion time.
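To make the shuffle-cost argument concrete, here is a back-of-envelope sketch under a deliberately simple model (an assumption, not the chapter's model). In this model, shuffle time is map-output size divided by available bandwidth, and compression adds CPU time proportional to the data size. The function names, parameters, and numbers are hypothetical illustrations, not measurements reported in this chapter.

```python
def shuffle_time(output_bytes: float, bandwidth: float) -> float:
    """Seconds to send uncompressed map output over a link (bandwidth in bytes/s)."""
    return output_bytes / bandwidth


def compressed_shuffle_time(output_bytes: float, bandwidth: float,
                            ratio: float, compress_rate: float) -> float:
    """Seconds to compress map output on the CPU and then send it.

    ratio:         original size / compressed size (e.g. 4.0 means 4x smaller)
    compress_rate: CPU compression throughput in bytes/s (hypothetical value)
    Decompression cost at the reducer is ignored in this simplified model.
    """
    cpu_time = output_bytes / compress_rate
    net_time = (output_bytes / ratio) / bandwidth
    return cpu_time + net_time


def should_compress(output_bytes: float, bandwidth: float,
                    ratio: float, compress_rate: float) -> bool:
    """Compress only if the CPU cost is repaid by the smaller network transfer."""
    return (compressed_shuffle_time(output_bytes, bandwidth, ratio, compress_rate)
            < shuffle_time(output_bytes, bandwidth))


if __name__ == "__main__":
    GB = 1e9
    # Congested link (~1 Gbit/s = 125 MB/s): 4 s with compression vs 8 s without.
    print(should_compress(GB, 125e6, ratio=4.0, compress_rate=500e6))  # True
    # Fast link (1 GB/s): CPU time dominates (2.25 s vs 1 s), so send raw data.
    print(should_compress(GB, 1e9, ratio=4.0, compress_rate=500e6))    # False
```

In this toy model, compression helps whenever the network rather than the CPU is the bottleneck. The chapter's policy additionally decides when to compress (before all map tasks have finished), which this sketch does not model.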
To overcome these limitations, three contributions are made: (i) compressing the generated map outputs at a suitable time, before all map tasks have completed, to shift load from the network onto the CPU; (ii) placing the reducer on the nodes where the most computation has been done, based on two counters, one maintained at the rack level and another at the node level, to minimize run-time data copying; and (iii) placing slower map tasks on the nodes where the most computation has been done, while handling the network by prioritizing their traffic. Software-defined networking (SDN) has been a boon for next-generation networking owing to its separation of the control plane from the data plane. It can address network requirements in a timely manner by installing flows for every data movement in both directions and by gathering extensive network statistics at the controller to make informed decisions about the network. A core problem for the controller is traffic classification, which can substantially assist SDN controllers in making efficient routing and traffic engineering decisions. This chapter presents a traffic classification scheme that uses three classifiers, namely a Feed-forward Neural Network (FFNN), Logistic Regression (LR), and Naïve Bayes. The scheme employs Particle Swarm Optimization (PSO) to improve classification with less overhead and without overlooking key Quality of Service (QoS) criteria. Minimizing energy consumption and link utilization is also important for lowering the operating cost of the network and utilizing it effectively. The chapter addresses this issue by formulating a multi-objective problem subject to QoS constraints. Since no polynomial-time solution is known, an evolutionary metaheuristic based on clonal selection, namely Clonal Selection Based Energy Minimization (CSEM), has been devised.
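The two-level counter idea in contribution (ii) can be illustrated with a minimal sketch. It assumes, as our own simplification rather than the chapter's definition, that "computation done" on a node is approximated by the bytes of map output it has produced. Counters are aggregated per node and per rack. The reducer then goes to the rack with the largest counter, and within that rack to the node with the largest counter. The function name `place_reducer` and its input layout are hypothetical.

```python
from collections import defaultdict


def place_reducer(map_output_bytes: dict, node_to_rack: dict) -> str:
    """Choose a reducer node using a rack-level and a node-level counter.

    map_output_bytes: node -> bytes of map output produced on that node
                      (a hypothetical stand-in for "computation done")
    node_to_rack:     node -> rack identifier
    """
    node_counter = defaultdict(int)  # node-level counter
    rack_counter = defaultdict(int)  # rack-level counter
    for node, nbytes in map_output_bytes.items():
        node_counter[node] += nbytes
        rack_counter[node_to_rack[node]] += nbytes

    # Step 1: the rack holding the most map output wins.
    best_rack = max(rack_counter, key=rack_counter.get)
    # Step 2: inside that rack, the node holding the most map output wins.
    in_rack = [n for n in node_counter if node_to_rack[n] == best_rack]
    return max(in_rack, key=node_counter.get)


if __name__ == "__main__":
    outputs = {"n1": 300, "n2": 250, "n3": 400, "n4": 0}
    racks = {"n1": "r1", "n2": "r1", "n3": "r2", "n4": "r2"}
    print(place_reducer(outputs, racks))  # n1 (rack r1 holds 550 vs 400)
```

Selecting the rack first is the point of the second counter. In the example, n3 is the single busiest node, yet a reducer on n1 copies only 400 bytes across racks, whereas a reducer on n3 would copy 550.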
The results show the efficacy of the proposed traffic classification scheme and the CSEM-based solution compared with state-of-the-art techniques. SDN is a promising network paradigm, but security issues and the high capital cost of procuring SDN devices limit its full deployment. A hybrid SDN (h-SDN) deployment is therefore a logical way forward. Because h-SDN combines centralized and decentralized paradigms with intrinsic interoperability issues, it poses challenges for two key tasks: topology gathering by the controller, which is needed to allocate network resources properly, and traffic engineering, which is needed for optimum network performance. State-of-the-art topology-gathering protocols, such as the Link Layer Discovery Protocol (LLDP) and the Broadcast Domain Discovery Protocol (BDDP), require a large number of messages. They also gather link information only for SDN devices and leave out the links of legacy switches (LS), which results in sub-optimal performance. This chapter provides novel topology discovery schemes that require fewer messages and gather link information for all devices in both single- and multi-controller environments (the latter being useful when scalability is an issue in h-SDN). Traffic engineering problems in h-SDN are addressed by placing SDN nodes according to two key criteria, traffic details and node degree, while lowering link utilization in real-world topologies. The results for the proposed topology discovery and SDN node placement schemes demonstrate their merits compared with state-of-the-art protocols.
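The abstract names the two placement criteria, traffic and node degree, but not how they are combined. The following is therefore only a hypothetical greedy sketch. Each node receives a score that mixes its normalized degree and normalized traffic, and the top-k nodes are chosen for upgrade to SDN switches. The weight `alpha`, the max-normalization, and the function name `place_sdn_nodes` are assumptions for illustration. The sketch does not evaluate the resulting link utilization, which the chapter's scheme also targets.

```python
def place_sdn_nodes(adjacency: dict, traffic: dict, k: int,
                    alpha: float = 0.5) -> list:
    """Return the k nodes with the highest degree/traffic score (hypothetical rule).

    adjacency: node -> set of neighbouring nodes (undirected topology)
    traffic:   node -> traffic volume handled by the node (any consistent unit)
    k:         number of legacy switches to upgrade to SDN switches
    alpha:     weight on normalized degree; 1 - alpha weights normalized traffic
    """
    max_degree = max((len(nbrs) for nbrs in adjacency.values()), default=0) or 1
    max_traffic = max(traffic.values(), default=0) or 1

    def score(node):
        degree_term = len(adjacency[node]) / max_degree
        traffic_term = traffic.get(node, 0) / max_traffic
        return alpha * degree_term + (1 - alpha) * traffic_term

    # sorted() is stable even with reverse=True, so ties keep adjacency order.
    return sorted(adjacency, key=score, reverse=True)[:k]


if __name__ == "__main__":
    topology = {"a": {"b", "c", "d"}, "b": {"a", "c"},
                "c": {"a", "b"}, "d": {"a"}}
    load = {"a": 10, "b": 50, "c": 5, "d": 1}
    print(place_sdn_nodes(topology, load, k=2))             # ['b', 'a']
    print(place_sdn_nodes(topology, load, k=1, alpha=1.0))  # ['a']
```

With equal weights, the heavily loaded node b outranks the highest-degree node a. Setting `alpha=1.0` reduces the rule to pure degree-based placement.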
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Hussain, M.W., Roy, D.S. (2023). Performance Optimization Strategies for Big Data Applications in Distributed Framework. In: Dash, S.R., Das, H., Li, KC., Tello, E.V. (eds) Intelligent Technologies: Concepts, Applications, and Future Directions, Volume 2. Studies in Computational Intelligence, vol 1098. Springer, Singapore. https://doi.org/10.1007/978-981-99-1482-1_10
DOI: https://doi.org/10.1007/978-981-99-1482-1_10
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-1481-4
Online ISBN: 978-981-99-1482-1
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)