Toward Scheduling I/O Request of Mapreduce Tasks Based on Markov Model

Ikken, Sonia; Renault, Éric; Tahar Kechadi, M.; Tari, Abdelkamel

doi:10.1007/978-3-319-25744-0_7

Part of the book series: Lecture Notes in Computer Science (LNCCN, volume 9395)

Included in the following conference series:

International Conference on Mobile, Secure and Programmable Networking

Abstract

In Cloud storage systems built on multi-core compute nodes, many MapReduce applications may run in parallel on each node, collocated with local disk storage. This disk storage is shared by multiple applications that use the full CPU power of the node. Each application tends to issue contiguous I/O requests in parallel to the same disk; however, if a large number of MapReduce tasks enter the I/O phase at the same time, the requests from one task may be interrupted by the requests of other tasks. The I/O nodes then receive these requests in a non-contiguous order under I/O contention. This interleaved access pattern degrades the performance of MapReduce applications, particularly when multiple tasks write intermediate files in parallel to the shared disk storage. To overcome this problem, we propose an approach for optimizing write access for MapReduce applications. The contributions of this paper are: (1) an analysis of the open issues in scheduling access requests of MapReduce workloads; (2) a framework for scheduling and predicting I/O requests of MapReduce applications; (3) a description of the role of each component involved in scheduling these I/O requests at the block level of the storage server to provide contiguous access.
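
The interleaving problem described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' framework, whose details are not given on this page; the function names `regroup_contiguous` and `transition_probabilities`, the `(task_id, block_offset)` request format, and the sample data are all assumptions made for illustration. The first function regroups interleaved requests so that each task's blocks are served as one ascending run, and the second estimates first-order Markov transition probabilities between tasks from the observed arrival order, which is one simple way a scheduler might guess which task will issue the next request.

```python
from collections import defaultdict


def regroup_contiguous(requests):
    """Reorder (task_id, block_offset) requests into one ascending run per task.

    Tasks are served in order of their first arrival (hypothetical policy).
    """
    by_task = {}
    for task, offset in requests:
        by_task.setdefault(task, []).append(offset)
    return [(task, off) for task, offs in by_task.items() for off in sorted(offs)]


def transition_probabilities(arrivals):
    """Estimate first-order Markov transition probabilities between task ids."""
    counts = defaultdict(lambda: defaultdict(int))
    for prev, nxt in zip(arrivals, arrivals[1:]):
        counts[prev][nxt] += 1
    probs = {}
    for prev, row in counts.items():
        total = sum(row.values())
        probs[prev] = {nxt: c / total for nxt, c in row.items()}
    return probs


# Illustrative data: three tasks whose contiguous writes arrive interleaved.
interleaved = [
    ("A", 0), ("B", 100), ("A", 1), ("C", 200),
    ("B", 101), ("A", 2), ("C", 201),
]

ordered = regroup_contiguous(interleaved)
probs = transition_probabilities([task for task, _ in interleaved])

print(ordered)
print(probs)
```

In this toy trace, a request from task B was always followed by one from task A, so the estimated probability of that transition is 1.0; a real scheduler would need far more observations and a richer state than the task id alone.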

Acknowledgments

This work was funded by the European Commission under the Erasmus Mundus GreenIT project (GreenIT for the benefit of civil society, 3772227-1-2012-ES-ERAMUNDUS-EMA21; Grant Agreement no. 2012-2625/001-001-EMA2).

Author information

Authors and Affiliations

Institut Mines-Télécom – Télécom SudParis, Évry, France
Sonia Ikken & Éric Renault
Laboratoire Samovar UMR CNRS 5157, Évry, France
Sonia Ikken & Éric Renault
UCD School of Computer Science and Informatics, Dublin, Ireland
M. Tahar Kechadi
University of Abdarahmane Mira, Bejaia, Algeria
Abdelkamel Tari

Corresponding author

Correspondence to Sonia Ikken.

Editor information

Editors and Affiliations

CNAM/CEDRIC, Paris, France
Selma Boumerdassi
CNAM/CEDRIC, Paris, France
Samia Bouzefrane
Institut Mines-Télécom – Télécom SudParis, Évry, France
Éric Renault

About this paper

Cite this paper

Ikken, S., Renault, É., Tahar Kechadi, M., Tari, A. (2015). Toward Scheduling I/O Request of Mapreduce Tasks Based on Markov Model. In: Boumerdassi, S., Bouzefrane, S., Renault, É. (eds) Mobile, Secure, and Programmable Networking. MSPN 2015. Lecture Notes in Computer Science, vol 9395. Springer, Cham. https://doi.org/10.1007/978-3-319-25744-0_7

DOI: https://doi.org/10.1007/978-3-319-25744-0_7
Published: 25 November 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25743-3
Online ISBN: 978-3-319-25744-0
eBook Packages: Computer Science, Computer Science (R0)