Performance Optimization Strategies for Big Data Applications in Distributed Framework

Intelligent Technologies: Concepts, Applications, and Future Directions, Volume 2

Abstract

Advances in Information and Communication Technologies (ICT) have enabled large-scale distributed computing that serves a wide range of applications and a massive number of users. The resulting data volumes severely burden both the processing capacity of computers and inflexible traditional networks. State-of-the-art datacenter-level performance techniques still fall short in handling the processing, storage, and network movement of this voluminous data with proprietary protocols. This chapter addresses backend server performance through effective reducer placement, an intelligent compression policy, and the handling of slower tasks, and it addresses in-network performance through traffic engineering, traffic classification, topology discovery, energy minimization, and load balancing in datacenter-oriented applications. Hadoop, the de facto standard for distributed big data storage and processing, stores large datasets reliably in the Hadoop Distributed File System (HDFS) and processes them with its MapReduce engine. However, Hadoop's processing performance depends critically on the time taken to transfer data during the shuffle phase of MapReduce. In addition, when tasks execute concurrently, slower tasks must be properly identified and efficiently handled to improve job completion time.
To overcome these limitations, three contributions are made: (i) compressing the generated map outputs at a suitable time, before all the map tasks have completed, to shift load from the network onto the CPU; (ii) placing the reducer on the node where the most computation has been done, determined from two counters, one maintained at the rack level and one at the node level, to minimize run-time data copying; and (iii) placing slower map tasks on the nodes where the most computation has been done, with network traffic handled through prioritization.
Software-defined networking (SDN) has been a boon for next-generation networking owing to its separation of the control plane from the data plane. It can address network requirements in a timely manner by installing flows for data movement in both directions and by gathering extensive network statistics at the controller, which then makes informed decisions about the network. A core problem for the controller is traffic classification, which can substantially help SDN controllers make efficient routing and traffic engineering decisions. This chapter presents a traffic classification scheme that uses three classifiers, namely a Feed-forward Neural Network (FFNN), Logistic Regression (LR), and Naïve Bayes, and employs Particle Swarm Optimization (PSO) to improve classification with less overhead and without overlooking the key Quality of Service (QoS) criteria. Minimizing energy consumption and link utilization is also important for lowering the operating cost of the network and using it effectively. The chapter addresses this by formulating a multi-objective problem subject to QoS constraints. Since no polynomial-time solution is known, an evolutionary metaheuristic based on clonal selection, namely Clonal Selection Based Energy Minimization (CSEM), is devised.
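As a rough illustration of contribution (ii), the sketch below picks a reducer node from two counters, one per rack and one per node. The abstract does not say what the counters measure or how they are combined, so the function name `pick_reducer_node`, the use of intermediate map-output bytes as the counter value, and the "busiest rack first, then busiest node in it" rule are all assumptions made for this example rather than the chapter's actual scheme.

```python
from collections import defaultdict


def pick_reducer_node(map_output_bytes, rack_of):
    """Return the node holding the most map output inside the busiest rack.

    map_output_bytes: dict node -> bytes of map output on that node
                      (hypothetical node-level counter)
    rack_of:          dict node -> rack identifier (hypothetical topology map)
    """
    # Rack-level counter: total map output produced inside each rack.
    rack_counter = defaultdict(int)
    for node, nbytes in map_output_bytes.items():
        rack_counter[rack_of[node]] += nbytes

    # Choose the rack with the most output, then the busiest node in that rack.
    best_rack = max(rack_counter, key=rack_counter.get)
    candidates = [n for n in map_output_bytes if rack_of[n] == best_rack]
    return max(candidates, key=map_output_bytes.get)


# Toy cluster: rack r1 holds 700 bytes in total, rack r2 holds 550.
outputs = {"n1": 400, "n2": 300, "n3": 500, "n4": 50}
racks = {"n1": "r1", "n2": "r1", "n3": "r2", "n4": "r2"}
chosen = pick_reducer_node(outputs, racks)
print(chosen)
```

In this toy input the node with the single largest counter (`n3`) is not chosen, because its rack holds less map output overall. Consulting the rack-level counter first is one plausible reason to keep two counters: it favours keeping most of the shuffle traffic inside one rack.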
The results show the efficacy of the proposed traffic classification scheme and the CSEM-based solution compared with state-of-the-art techniques.
SDN is a promising newer network paradigm, but security issues and the high capital cost of procuring SDN equipment limit its full deployment, so moving to a hybrid SDN (h-SDN) deployment is a logical step. The use of both centralized and decentralized paradigms in h-SDN, with its intrinsic interoperability issues, complicates two key tasks: topology gathering by the controller for proper allocation of network resources, and traffic engineering for optimum network performance. State-of-the-art topology-gathering protocols, such as the Link Layer Discovery Protocol (LLDP) and the Broadcast Domain Discovery Protocol (BDDP), require a large number of messages and gather link information only for SDN devices, leaving out the links of legacy switches (LS), which results in sub-optimal performance. This chapter provides novel topology discovery schemes that require fewer messages and gather link information for all devices in both single-controller and multi-controller environments (the latter may be used when scalability is a concern in h-SDN). Traffic engineering in h-SDN is addressed through proper placement of SDN nodes, based on an analysis of traffic details and node degree, while lowering link utilization in real-time topologies. The results for the proposed topology discovery and SDN node placement schemes demonstrate their merits compared with state-of-the-art protocols.
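The SDN node placement idea can be sketched as a ranking problem: score every node by its degree and by the traffic it carries, then upgrade the highest-scoring nodes first. The abstract names these two criteria but not how they are combined, so the weighted sum, the weight `alpha`, the `budget` parameter, and the function name `rank_sdn_candidates` are assumptions for illustration only.

```python
def rank_sdn_candidates(links, traffic, budget, alpha=0.5):
    """Pick `budget` nodes to upgrade to SDN, ranked by degree and traffic.

    links:   iterable of (u, v) undirected links
    traffic: dict node -> traffic volume traversing that node (non-empty)
    alpha:   hypothetical weight between normalised degree and traffic
    """
    # Node degree: number of links incident to each node.
    degree = {}
    for u, v in links:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1

    max_deg = max(degree.values())
    max_traffic = max(traffic.values()) or 1  # avoid division by zero

    def score(node):
        # Weighted sum of normalised degree and normalised traffic.
        return (alpha * degree[node] / max_deg
                + (1 - alpha) * traffic.get(node, 0) / max_traffic)

    # Highest score first; ties are broken by node name for determinism.
    return sorted(degree, key=lambda n: (-score(n), n))[:budget]


links = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("d", "e")]
traffic = {"a": 90, "b": 40, "c": 20, "d": 70, "e": 10}
upgraded = rank_sdn_candidates(links, traffic, budget=2)
print(upgraded)
```

Here nodes `b`, `c`, and `d` all have degree 2, and the traffic term separates them. A static ranking like this ignores how each upgrade changes the routing options of the remaining nodes, which an actual placement scheme would likely need to account for.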

Author information

Corresponding author

Correspondence to Mir Wajahat Hussain.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Hussain, M.W., Roy, D.S. (2023). Performance Optimization Strategies for Big Data Applications in Distributed Framework. In: Dash, S.R., Das, H., Li, KC., Tello, E.V. (eds) Intelligent Technologies: Concepts, Applications, and Future Directions, Volume 2. Studies in Computational Intelligence, vol 1098. Springer, Singapore. https://doi.org/10.1007/978-981-99-1482-1_10
