
Near-data Prediction Based Speculative Optimization in a Distribution Environment


Hadoop is an open-source Apache project that provides a distributed file system and the MapReduce distributed computing framework. Its current Apache 2.0 license supports on-demand payment by consumers for cloud platform services, helping users leverage their different hardware to provide cloud services. In a cloud-based environment, there is a need to balance the resource requirements of workloads, optimize load performance, and manage cloud compute costs. When the processing power of clustered machines varies widely, for example when hardware is aging or overloaded, Hadoop offers a speculative execution (SE) optimization strategy: it monitors task progress in real time and, when the tasks of a job are not running at the same speed, starts identical backup tasks on different nodes. Whichever copy finishes first supplies the result, which maintains the overall progress of the job. At present, the SE strategy's incorrect selection of backup nodes, together with resource constraints, may result in poor Hadoop performance and in subsequent tasks failing to complete, among other problems. This paper proposes an SE optimization strategy based on near-data prediction, which analyzes real-time task execution information to predict the required running time and selects backup nodes based on actual requirements and approximate data, so that the SE strategy achieves the best performance. Experiments show that, in a heterogeneous Hadoop environment, the optimization strategy effectively improves the effectiveness and accuracy of various tasks and enhances the performance of cloud computing, so that platform performance benefits consumers better than before.
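The prediction step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each running task reports (elapsed seconds, progress fraction) samples, uses a one-dimensional locally weighted linear regression with a Gaussian kernel to extrapolate the elapsed time at progress 1.0, and flags a task as a backup candidate when its predicted remaining time exceeds a multiple of the mean. The function names, the `bandwidth` parameter, and the `slowdown` threshold are hypothetical choices made for illustration only.

```python
import math


def lwr_predict(xs, ys, x_query, bandwidth=0.5):
    """Locally weighted linear regression in one dimension.

    Fits y = a + b*x with Gaussian weights centred on x_query and
    returns the fitted value at x_query.
    """
    weights = [math.exp(-((x - x_query) ** 2) / (2 * bandwidth ** 2)) for x in xs]
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("all samples are too far from the query point")
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total_weight
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total_weight
    var_x = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    if var_x == 0:
        # Only one distinct x value: fall back to the weighted mean.
        return mean_y
    cov_xy = sum(
        w * (x - mean_x) * (y - mean_y) for w, x, y in zip(weights, xs, ys)
    )
    slope = cov_xy / var_x
    return mean_y + slope * (x_query - mean_x)


def predict_remaining_time(samples, bandwidth=0.5):
    """Estimate the remaining run time of one task.

    samples: list of (elapsed_seconds, progress) pairs, oldest first,
    with progress in [0, 1]. This sample format is an assumption.
    """
    progress = [p for _, p in samples]
    elapsed = [t for t, _ in samples]
    # Regress elapsed time on progress and query at progress = 1.0,
    # so samples closest to completion carry the most weight.
    predicted_total = lwr_predict(progress, elapsed, 1.0, bandwidth)
    return max(0.0, predicted_total - elapsed[-1])


def pick_straggler(tasks, slowdown=1.5):
    """Return the task most in need of a backup copy, or None.

    tasks: dict mapping a task name to its sample list.
    slowdown: hypothetical threshold; a task qualifies only if its
    predicted remaining time exceeds slowdown * mean remaining time.
    """
    remaining = {name: predict_remaining_time(s) for name, s in tasks.items()}
    mean_remaining = sum(remaining.values()) / len(remaining)
    slowest = max(remaining, key=remaining.get)
    if remaining[slowest] > slowdown * mean_remaining:
        return slowest
    return None


if __name__ == "__main__":
    tasks = {
        "map_0": [(10, 0.2), (20, 0.4), (30, 0.6)],
        "map_1": [(10, 0.2), (20, 0.4), (30, 0.6)],
        "map_2": [(10, 0.05), (20, 0.1), (30, 0.15)],
    }
    print(pick_straggler(tasks))
```

Regressing elapsed time on progress, rather than the reverse, lets the query point sit at progress 1.0, so the kernel naturally emphasizes the most recent behaviour of the task. Choosing the node that should host the backup copy is left out of the sketch, because the abstract does not specify how candidate nodes are ranked.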






This work has received funding from the National Natural Science Foundation of China (No. 41911530242, 41975142), 5150 Spring Specialists (05492018012, 05762018039), the Major Program of the National Social Science Fund of China (Grant No. 17ZDA092), the 333 High-Level Talent Cultivation Project of Jiangsu Province (BRA2018332), the Royal Society of Edinburgh, UK and China Natural Science Foundation Council (RSE Reference: 62967_Liu_2018_2) under their Joint International Projects funding scheme, and the Basic Research Programs (Natural Science Foundation) of Jiangsu Province (BK20191398).

Author information


Corresponding author

Correspondence to Yuemei Hu.



About this article


Cite this article

Liu, Q., Wu, X., Liu, X. et al. Near-data Prediction Based Speculative Optimization in a Distribution Environment. Mobile Netw Appl (2022).



  • Distributed systems
  • Hadoop
  • Speculative execution
  • Locally weighted regression
  • Near data prediction