European Conference on Parallel Processing

Euro-Par 2012: Euro-Par 2012 Parallel Processing pp 179–190Cite as

Scheduling MapReduce Jobs in HPC Clusters


  • Marcelo Veiga Neves,
  • Tiago Ferreto &
  • César De Rose

Part of the Lecture Notes in Computer Science book series (LNTCS,volume 7484)

Abstract

MapReduce (MR) has become a de facto standard for large-scale data analysis. It has also attracted the attention of the HPC community due to its simplicity, efficiency, and highly scalable parallel model. However, MR implementations present some issues that may complicate their execution in existing HPC clusters, especially concerning job submission. While MR requires no strict parameters to submit a job, in a typical HPC cluster users must specify the number of nodes and the amount of time required to complete the job execution. This paper presents the MR Job Adaptor, a component to optimize the scheduling of MR jobs alongside HPC jobs in an HPC cluster. Experiments performed using real-world HPC and MapReduce workloads show that the MR Job Adaptor can properly transform MR jobs to be scheduled in an HPC cluster, minimizing the job turnaround time and exploiting unused resources in the cluster.
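The submission mismatch described in the abstract (an MR job carries no node count or walltime, while an HPC resource manager requires both) can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's actual MR Job Adaptor: the function names, the wave-based walltime estimate, and the TORQUE/PBS script layout are all assumptions made for the example.

```python
import math

def adapt_mr_job(map_tasks, avg_task_secs, slots_per_node, free_nodes):
    """Estimate the (nodes, walltime) pair an HPC resource manager
    requires for an MR job that, by itself, specifies neither.
    Hypothetical sketch, not the paper's actual MR Job Adaptor."""
    # Use at most the currently free nodes (exploit unused resources).
    nodes = min(free_nodes, math.ceil(map_tasks / slots_per_node))
    # Map tasks run in waves: each wave fills every slot on the chosen nodes.
    waves = math.ceil(map_tasks / (nodes * slots_per_node))
    walltime_secs = waves * avg_task_secs
    return nodes, walltime_secs

def to_pbs_script(job_name, nodes, walltime_secs, command):
    """Render a TORQUE/PBS submission script for the adapted job."""
    h, rem = divmod(walltime_secs, 3600)
    m, s = divmod(rem, 60)
    return (f"#PBS -N {job_name}\n"
            f"#PBS -l nodes={nodes},walltime={h:02d}:{m:02d}:{s:02d}\n"
            f"{command}\n")

nodes, wt = adapt_mr_job(map_tasks=64, avg_task_secs=120,
                         slots_per_node=8, free_nodes=4)
print(nodes, wt)  # 4 nodes, 2 waves of 120 s each -> 240 s
print(to_pbs_script("wordcount", nodes, wt, "hadoop jar wc.jar"))
```

Capping the node count at the cluster's currently free nodes mirrors the abstract's goal of exploiting unused resources, at the cost of more waves and a longer estimated walltime.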

Keywords

  • Turnaround Time
  • Resource Management System
  • Free Slot
  • Unused Resource
  • Mixed Workload




Author information

Authors and Affiliations

  1. Faculty of Informatics, PUCRS, Brazil

    Marcelo Veiga Neves, Tiago Ferreto & César De Rose


Editor information

Editors and Affiliations

  1. University of Patras, Computer Technology Institute and Press “Diophantus”, N. Kazantzaki, 26504, Rio, Greece

    Christos Kaklamanis

  2. University of Patras, University Building B, 26504, Rio, Greece

    Theodore Papatheodorou

  3. Computer Technology Institute and Press “Diophantus”, University of Patras, N. Kazantzaki, 26504, Rio, Greece

    Paul G. Spirakis


Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Neves, M.V., Ferreto, T., De Rose, C. (2012). Scheduling MapReduce Jobs in HPC Clusters. In: Kaklamanis, C., Papatheodorou, T., Spirakis, P.G. (eds) Euro-Par 2012 Parallel Processing. Euro-Par 2012. Lecture Notes in Computer Science, vol 7484. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32820-6_19

  • DOI: https://doi.org/10.1007/978-3-642-32820-6_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32819-0

  • Online ISBN: 978-3-642-32820-6

  • eBook Packages: Computer Science, Computer Science (R0)

