
Toward Scheduling I/O Request of Mapreduce Tasks Based on Markov Model

  • Sonia Ikken
  • Éric Renault
  • M. Tahar Kechadi
  • Abdelkamel Tari
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9395)

Abstract

In Cloud storage systems built on multi-core nodes, many MapReduce applications may run in parallel on each compute node and be collocated with local disk storage. This disk storage is shared by multiple applications that use the full CPU power of the node. Each application tends to issue contiguous I/O requests in parallel to the same disk; however, if a large number of MapReduce tasks enter the I/O phase at the same time, the requests from one task may be interrupted by the requests of other tasks. Under this I/O contention, the I/O nodes receive the requests in a non-contiguous order. This interleaved access pattern degrades the performance of MapReduce applications, particularly when multiple tasks write intermediate files in parallel to the shared disk storage. To overcome this problem, we propose an approach for optimizing write access for MapReduce applications. The contributions of this paper are: (1) an analysis of the open issues in scheduling the access requests of MapReduce workloads; (2) a framework for scheduling and predicting the I/O requests of MapReduce applications; (3) a description of the role of each component involved in scheduling these I/O requests at the block level of the storage server to provide contiguous access.
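
The interleaving problem described in the abstract can be illustrated with a small sketch. It is not the framework proposed in the paper, whose details the abstract does not give. It assumes two hypothetical tasks, "A" and "B", each writing four contiguous 4 KiB blocks, and uses hypothetical helper names (count_seeks, group_by_task, transition_probabilities). It counts how many requests break contiguity in arrival order, regroups the requests per task, and estimates a first-order Markov chain over which task issues the next request, as one simple way a predictor might be built.

```python
from collections import Counter, defaultdict

# A request is (task_id, byte_offset, length). All names here are hypothetical.

def count_seeks(requests):
    """Count requests whose offset does not continue the previous request."""
    seeks = 0
    expected = None
    for _task, offset, length in requests:
        if expected is not None and offset != expected:
            seeks += 1
        expected = offset + length
    return seeks

def group_by_task(requests):
    """Serve each task's requests back to back, tasks in first-arrival order."""
    queues = defaultdict(list)
    for req in requests:
        queues[req[0]].append(req)
    # dicts keep insertion order, so tasks come out in first-arrival order
    return [req for q in queues.values() for req in sorted(q, key=lambda r: r[1])]

def transition_probabilities(requests):
    """First-order Markov estimate of P(next task = b | current task = a)."""
    counts = defaultdict(Counter)
    for (a, _, _), (b, _, _) in zip(requests, requests[1:]):
        counts[a][b] += 1
    return {a: {b: n / sum(c.values()) for b, n in c.items()}
            for a, c in counts.items()}

BLOCK = 4096
# Assumed arrival order at the disk: requests of A and B are interleaved.
arrivals = [
    ("A", 0 * BLOCK, BLOCK), ("B", 100 * BLOCK, BLOCK),
    ("A", 1 * BLOCK, BLOCK), ("B", 101 * BLOCK, BLOCK),
    ("A", 2 * BLOCK, BLOCK), ("A", 3 * BLOCK, BLOCK),
    ("B", 102 * BLOCK, BLOCK), ("B", 103 * BLOCK, BLOCK),
]

print("seeks in arrival order:", count_seeks(arrivals))
print("seeks after grouping:  ", count_seeks(group_by_task(arrivals)))
print("P(next | current):     ", transition_probabilities(arrivals))
```

In this toy trace, grouping by task leaves a single discontinuity, the jump from the end of A's region to the start of B's. A real block-level scheduler would also have to bound how long requests are held back, which this sketch ignores.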

Keywords

MapReduce; Cloud storage; Disk I/O; Markov model; Scheduling algorithm

Notes

Acknowledgments

This work was funded by the European Commission under the Erasmus Mundus GreenIT project (GreenIT for the benefit of civil society. 3772227-1-2012-ES-ERAMUNDUS-EMA21; Grant Agreement no. 2012-2625/001-001-EMA2).

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Sonia Ikken (1, 2)
  • Éric Renault (1, 2)
  • M. Tahar Kechadi (3)
  • Abdelkamel Tari (4)

  1. Institut Mines-Télécom – Télécom SudParis, Évry, France
  2. Laboratoire Samovar UMR CNRS 5157, Évry, France
  3. UCD School of Computer Science and Informatics, Dublin, Ireland
  4. University of Abdarahmane Mira, Bejaia, Algeria
