
Improving the performance of Apache Hadoop on pervasive environments through context-aware scheduling

  • Guilherme W. Cassales
  • Andrea Schwertner Charão
  • Manuele Kirsch-Pinheiro
  • Carine Souveyet
  • Luiz-Angelo Steffenel (corresponding author)
Original Research

Abstract

This article proposes to improve Apache Hadoop scheduling through a context-aware approach. Apache Hadoop is the most popular implementation of the MapReduce paradigm for distributed computing, but its design does not adapt automatically to the computing nodes’ context and capabilities. By introducing context-awareness into Hadoop, we intend to dynamically adapt its scheduling to the execution environment. This is a necessary feature for pervasive grids, which are heterogeneous, dynamic and shared environments. The solution has been incorporated into Hadoop and assessed through controlled experiments. The experiments demonstrate that context-awareness provides performance gains over unmodified Hadoop, especially when some resources disappear during execution.
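The abstract does not spell out the mechanism, so the following Java sketch is only an illustration under stated assumptions, not the authors' implementation. It shows the two ingredients the abstract names: reading a node's current context through the standard java.lang.management API (the Java SE monitoring facility listed in the references), and letting scheduling follow the capacity each node reports, so that a node that disappears stops receiving work. The class name, the vcoresFor policy, the assign placement rule and the node names are all hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch, not the authors' implementation: a node derives the vcores it
 * advertises from its live context, and a scheduler places tasks in proportion to
 * what each node reports, skipping nodes that report nothing.
 */
public class ContextAwareCapacity {

    /**
     * Assumed policy: on a shared node, cores already kept busy by other work
     * (approximated by the last-minute system load average) are not offered to
     * Hadoop. A reachable node always offers at least one vcore.
     */
    public static int vcoresFor(int availableProcessors, double loadAverage) {
        if (loadAverage < 0) {
            // getSystemLoadAverage() returns a negative value when the platform
            // cannot provide it; fall back to the raw core count.
            return availableProcessors;
        }
        int busyCores = (int) Math.floor(loadAverage);
        return Math.max(1, availableProcessors - busyCores);
    }

    /**
     * Greedy placement: each task goes to the node with the lowest
     * assigned-tasks-to-vcores ratio, so nodes reporting more capacity receive
     * proportionally more tasks. Nodes reporting 0 vcores (e.g. nodes that left
     * the pervasive grid) receive none. Ties go to the first node in the input
     * map's iteration order.
     */
    public static Map<String, Integer> assign(int tasks, Map<String, Integer> vcoresPerNode) {
        Map<String, Integer> assigned = new TreeMap<>();
        for (String node : vcoresPerNode.keySet()) {
            assigned.put(node, 0);
        }
        for (int t = 0; t < tasks; t++) {
            String best = null;
            double bestRatio = Double.MAX_VALUE;
            for (Map.Entry<String, Integer> e : vcoresPerNode.entrySet()) {
                if (e.getValue() <= 0) {
                    continue; // node gone: never schedule on it
                }
                double ratio = (double) assigned.get(e.getKey()) / e.getValue();
                if (ratio < bestRatio) {
                    bestRatio = ratio;
                    best = e.getKey();
                }
            }
            if (best == null) {
                break; // no live node left; remaining tasks stay pending
            }
            assigned.merge(best, 1, Integer::sum);
        }
        return assigned;
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        int local = vcoresFor(os.getAvailableProcessors(), os.getSystemLoadAverage());
        System.out.println("vcores this node would advertise: " + local);

        // Hypothetical three-node cluster; node-c has disappeared mid-execution.
        Map<String, Integer> cluster = new TreeMap<>();
        cluster.put("node-a", 4);
        cluster.put("node-b", 2);
        cluster.put("node-c", 0);
        System.out.println(assign(6, cluster)); // prints {node-a=4, node-b=2, node-c=0}
    }
}
```

In a stock Hadoop 2 deployment the per-node capacity is normally fixed by configuration properties such as yarn.nodemanager.resource.cpu-vcores; the sketch only illustrates why recomputing it from live context matters on shared, volatile nodes. The greedy ratio rule is one simple choice among many and ignores data locality.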

Keywords

Context Information, Slave Node, Gantt Chart, Hadoop Cluster, Pervasive Environment
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Abbreviations

API: Application programming interface
DHT: Distributed hash table
FIFO: First in, first out
HDFS: Hadoop distributed file system
P2P: Peer-to-peer
PER-MARE: Pervasive map-reduce project
SLA: Service-level agreement
VM: Virtual machine
YARN: Yet another resource negotiator

Notes

Acknowledgments

The authors would like to thank their partners in the PER-MARE project (STIC-AmSud 2014) and acknowledge the financial support given to this research by the CAPES/MAEE/ANII STIC-AmSud collaboration program (project number 13STIC07).

References

  1. Apache (2014) Apache Hadoop. http://hadoop.apache.org/docs/r2.6.0/index.html. Last access: November 2014
  2. Assuncao MD, Netto MAS, Koch F, Bianchi S (2012) Context-aware job scheduling for cloud computing environments. In: IEEE Fifth International Conference on Utility and Cloud Computing (UCC), pp 255–262. doi: 10.1109/UCC.2012.33
  3. Baldauf M, Dustdar S, Rosenberg F (2007) A survey on context-aware systems. Int J Ad Hoc Ubiquitous Comput 2(4):263–277
  4. Cassales GW, Charão AS, Pinheiro MK, Souveyet C, Steffenel LA (2014) Bringing Context to Apache Hadoop. In: 8th International Conference on Mobile Ubiquitous Computing, Rome, Italy
  5. Cassales GW, Charão AS, Kirsch Pinheiro M, Souveyet C, Steffenel LA (2015) Context-aware scheduling for Apache Hadoop over pervasive environments. Procedia Comp Sci 52:202–209. The 6th International Conference on Ambient Systems, Networks and Technologies (ANT-2015), the 5th International Conference on Sustainable Energy Information Technology (SEIT-2015). doi: 10.1016/j.procs.2015.05.058. http://www.sciencedirect.com/science/article/pii/S1877050915008583
  6. Cavallo M, Cusma L, Modica GD, Polito C, Tomarchio O (2015) A scheduling strategy to run Hadoop jobs on geodistributed data. In: 3rd Workshop on CLoud for IoT (CLIoT 2015), in conjunction with the European Conference on Service-Oriented and Cloud Computing (ESOCC 2015)
  7. Chen Q, Zhang D, Guo M, Deng Q, Guo S (2010) SAMR: a self-adaptive MapReduce scheduling algorithm in heterogeneous environment. In: Proceedings of the 2010 10th IEEE International Conference on Computer and Information Technology (CIT ’10). IEEE Computer Society, Washington, DC, USA, pp 2736–2743. ISBN 978-0-7695-4108-2
  8. Dean J, Ghemawat S (2008) MapReduce: simplified data processing on large clusters. Commun ACM 51(1):107–113
  9. Engel T, Charão A, Kirsch-Pinheiro M, Steffenel LA (2015) Performance improvement of data mining in Weka through multi-core and GPU acceleration: opportunities and pitfalls. J Ambient Intell Humaniz Comput 6(4):377–390. doi: 10.1007/s12652-015-0292-9
  10. Grid’5000 (2013) Grid’5000. https://www.grid5000.fr/. Last access: July 2014
  11. Hamilton J (2008) Hadoop wins TeraSort. http://perspectives.mvdirona.com/2008/07/hadoop-wins-terasort/. Last access: September 2015
  12. Hofmann P, Woods D (2010) Cloud computing: the limits of public clouds for business applications. IEEE Internet Comput 14(6):90–93. doi: 10.1109/MIC.2010.136
  13. Huang S, Huang J, Dai J, Xie T, Huang B (2010) The HiBench benchmark suite: characterization of the MapReduce-based data analysis. In: 2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW), pp 41–51. doi: 10.1109/ICDEW.2010.5452747
  14. Hunt P, Konar M, Junqueira FP, Reed B (2010) ZooKeeper: wait-free coordination for Internet-scale systems. In: Proceedings of the USENIX Annual Technical Conference. USENIX Association, Boston, MA, USA, p 11. http://dl.acm.org/citation.cfm?id=1855840.1855851
  15. Isard M, Prabhakaran V, Currey J, Wieder U, Talwar K, Goldberg A (2009) Quincy: fair scheduling for distributed computing clusters. In: Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles (SOSP ’09). ACM, New York, NY, USA, pp 261–276. ISBN 978-1-60558-752-3
  16. Kumar KA, Konishetty VK, Voruganti K, Rao GVP (2012) CASH: context aware scheduler for Hadoop. In: Proceedings of the International Conference on Advances in Computing, Communications and Informatics (ICACCI ’12). New York, NY, USA, pp 52–61. ISBN 978-1-4503-1196-0
  17. Li J, Wang Q, Jayasinghe D, Park J, Zhu T, Pu C (2013) Performance overhead among three hypervisors: an experimental study using Hadoop benchmarks. In: 2013 IEEE International Congress on Big Data (BigData Congress), pp 9–16. doi: 10.1109/BigData.Congress..11
  18. Maamar Z, Benslimane D, Narendra NC (2006) What can context do for web services? Commun ACM 49(12):98–103
  19. Marozzo F, Talia D, Trunfio P (2012) P2P-MapReduce: parallel data processing in dynamic cloud environments. J Comput Syst Sci 78(5):1382–1402
  20. Maurer M, Brandic I, Sakellariou R (2012) Self-adaptive and resource-efficient SLA enactment for cloud computing infrastructures. In: 2012 IEEE 5th International Conference on Cloud Computing (CLOUD), pp 368–375. doi: 10.1109/CLOUD.2012.55
  21. Najar S, Kirsch Pinheiro M, Souveyet C (2015) Service discovery and prediction on pervasive information system. J Ambient Intell Humaniz Comput 6(4):407–423. doi: 10.1007/s12652-015-0288-5
  22. Nascimento AP, Boeres C, Rebello VEF (2008) Dynamic self-scheduling for parallel applications with task dependencies. In: Proceedings of the 6th International Workshop on MGC (MGC ’08). New York, NY, USA, pp 1–116. ISBN 978-1-60558-365-5
  23. Oracle (2014) Overview of Java SE Monitoring and Management. http://docs.oracle.com/javase/7/docs/technotes/guides/management/overview.html. Last access: July 2014
  24. Parashar M, Pierson JM (2010) Pervasive grids: challenges and opportunities. In: Li K, Hsu C, Yang L, Dongarra J, Zima H (eds) Handbook of Research on Scalable Computing Technologies. IGI Global, pp 14–30. doi: 10.4018/978-1-60566-661-7.ch002. ISBN 978-1-60566-661-7
  25. Ramakrishnan A, Preuveneers D, Berbers Y (2014) Enabling self-learning in dynamic and open IoT environments. In: Shakshuki E, Yasar A (eds) The 5th International Conference on Ambient Systems, Networks and Technologies (ANT-2014), the 4th International Conference on Sustainable Energy Information Technology (SEIT-2014), vol 32, pp 207–214. doi: 10.1016/j.procs.2014.05.416
  26. Rasooli A, Down DG (2012) COSHH: a classification and optimization based scheduler for heterogeneous Hadoop systems. In: Proceedings of the 2012 SC Companion: High Performance Computing, Networking Storage and Analysis (SCC ’12). IEEE Computer Society, Washington, DC, USA, pp 1284–1291. ISBN 978-0-7695-4956-9
  27. Sandholm T, Lai K (2010) Dynamic Proportional Share Scheduling in Hadoop. In: Proceedings of the 15th International Conference on Job Scheduling Strategies for Parallel Processing (JSSPP ’10). Berlin, Heidelberg, pp 110–131. ISBN 3-642-16504-4, 978-3-642-16504-7
  28. Schadt EE, Linderman MD, Sorenson J, Lee L, Nolan GP (2010) Computational solutions to large-scale data management and analysis. Nat Rev Genet 11(9):647–657. doi: 10.1038/nrg2857
  29. Steffenel LA, Kirsch Pinheiro M (2015) Leveraging data intensive applications on a pervasive computing platform: the case of MapReduce. Procedia Comp Sci 52:1034–1039. The 6th International Conference on Ambient Systems, Networks and Technologies (ANT-2015), the 5th International Conference on Sustainable Energy Information Technology (SEIT-2015). doi: 10.1016/j.procs.2015.05.102. http://www.sciencedirect.com/science/article/pii/S1877050915009023
  30. Steffenel LA, Flauzac O, Charão AS, Barcelos PP, Stein B, Nesmachnow S, Kirsch Pinheiro M, Diaz D (2013) PER-MARE: adaptive deployment of MapReduce over pervasive grids. In: Proceedings of the 2013 Eighth International Conference on P2P, Parallel, Grid, Cloud and Internet Computing (3PGCIC ’13). IEEE Computer Society, Washington, DC, USA, pp 17–24. ISBN 978-0-7695-5094-7
  31. STIC-AmSud (2014) PER-MARE project. http://cosy.univ-reims.fr/PER-MARE. Last access: July 2014
  32. Tian C, Zhou H, He Y, Zha L (2009) A dynamic MapReduce scheduler for heterogeneous workloads. In: Proceedings of the 2009 Eighth International Conference on Grid and Cooperative Computing (GCC ’09). IEEE Computer Society, Washington, DC, USA, pp 218–224. ISBN 978-0-7695-3766-5
  33. Xie J, Ruan X, Ding Z, Tian Y, Majors J, Manzanares A, Yin S, Qin X (2010) Improving MapReduce performance through data placement in heterogeneous Hadoop clusters. In: Parallel and Distributed Processing, Workshops and PhD Forum (IPDPSW)
  34. Zaharia M, Konwinski A, Joseph AD, Katz R, Stoica I (2008) Improving MapReduce performance in heterogeneous environments. In: Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation (OSDI ’08). USENIX Association, Berkeley, CA, USA, pp 29–42

Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  • Guilherme W. Cassales (1)
  • Andrea Schwertner Charão (1)
  • Manuele Kirsch-Pinheiro (2)
  • Carine Souveyet (2)
  • Luiz-Angelo Steffenel (3), corresponding author
  1. Laboratório de Sistemas de Computação, Universidade Federal de Santa Maria, Santa Maria, Brazil
  2. Centre de Recherche en Informatique, Université Paris 1 Panthéon-Sorbonne, Paris, France
  3. Laboratoire CReSTIC, Équipe SysCom, Université de Reims Champagne-Ardenne, Reims, France
