Big Data for Smart Infrastructure Design: Opportunities and Challenges

  • Yasir Arfat
  • Sardar Usman
  • Rashid Mehmood
  • Iyad Katib
Chapter
Part of the EAI/Springer Innovations in Communication and Computing book series (EAISICC)

Abstract

Big data is at the forefront of many ICT-based developments in all spheres of life, be it business, education, or entertainment. It is generated by many diverse sources, including social media, the Internet of Things (IoT), manufacturing, and operations. Big data technologies allow us to make informed decisions from structured or unstructured data. Managing and analyzing the heterogeneous data generated by these sources raises numerous challenges and has produced a diverse range of solutions. The aim of this chapter is to discuss the opportunities, issues, and challenges of big data, with the main focus on Hadoop platforms. We provide a detailed survey of the opportunities, challenges, and issues of Hadoop-based big data developments in terms of data locality, load balancing, heterogeneity, scheduling, in-memory computation, multiple-query optimization, and I/O. A taxonomy of these challenges and opportunities is also presented.
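Data locality, the first challenge listed above, means running a computation on or near the node that already stores its input, so less data crosses the network. The sketch below is a simplified illustration, not Hadoop's actual scheduler code. It assumes a toy cluster where each node belongs to exactly one rack and each map task's input block has a known set of replica nodes. It ranks a task for a given free node as node-local, rack-local, or off-rack (the distinction Hadoop's schedulers draw) and greedily gives each free slot the closest pending task. The names `NODE_RACK`, `MapTask`, `locality_level`, and `assign` are hypothetical and are not part of any Hadoop API.

```python
from dataclasses import dataclass

# Hypothetical cluster topology: node name -> rack name (illustrative only).
NODE_RACK = {"n1": "r1", "n2": "r1", "n3": "r2", "n4": "r2"}


@dataclass
class MapTask:
    task_id: str
    replica_nodes: set  # nodes holding a replica of this task's input block


def locality_level(task, node):
    """Return 0 (node-local), 1 (rack-local) or 2 (off-rack) for running task on node."""
    if node in task.replica_nodes:
        return 0  # input can be read from the node's local disk
    if any(NODE_RACK[r] == NODE_RACK[node] for r in task.replica_nodes):
        return 1  # input must cross the rack switch
    return 2  # input must cross the core network between racks


def assign(tasks, free_nodes):
    """Greedily give each free node the pending task with the best locality level."""
    pending = list(tasks)
    plan = {}
    for node in free_nodes:
        if not pending:
            break
        # min() keeps the first task among ties, so earlier tasks win ties.
        best = min(pending, key=lambda t: locality_level(t, node))
        plan[node] = (best.task_id, locality_level(best, node))
        pending.remove(best)
    return plan


# Usage: t1's input lives on n1 (rack r1) and t2's input on n4 (rack r2).
tasks = [MapTask("t1", {"n1"}), MapTask("t2", {"n4"})]
print(assign(tasks, ["n3", "n1"]))  # {'n3': ('t2', 1), 'n1': ('t1', 0)}
```

Here n3 has no local task, so it takes t2, which is rack-local because its replica sits on n4 in the same rack. Greedy per-slot assignment of this kind can still hand a slot remote work. Delay scheduling addresses this by letting a job skip a few scheduling opportunities while it waits for a local slot.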

Keywords

HPC · Big data · Hadoop · MapReduce · HDFS · Data locality · Scheduling · I/O · Load balancing

Notes

Acknowledgments

The authors acknowledge with thanks the technical and financial support from the Deanship of Scientific Research (DSR) at King Abdul-Aziz University (KAU), Jeddah, Saudi Arabia, under grant number G-673-793-38. The work carried out in this chapter is supported by the HPC Center at King Abdul-Aziz University.

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Yasir Arfat (1)
  • Sardar Usman (1)
  • Rashid Mehmood (2)
  • Iyad Katib (1)

  1. Department of Computer Science, FCIT, King Abdulaziz University, Jeddah, Saudi Arabia
  2. High Performance Computing Center, King Abdulaziz University, Jeddah, Saudi Arabia