CAD: An Efficient Data Management and Migration Scheme across Clouds for Data-Intensive Scientific Applications

  • Ching-Hsien Hsu
  • Alfredo Cuzzocrea
  • Shih-Chang Chen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6864)


Data management and migration are important research challenges in novel Cloud environments. When data are moved among different geographical domains, lowering the transmission cost is essential for performance. Efficient scheduling methods allow data transmissions to be completed in fewer steps and in shorter transmission time. Several methods have been proposed in the literature to manage data and minimize transmission cost in single-cluster environments. Unfortunately, these methods, and their scheduling policies in particular, are not suitable for large-scale, complex environments such as Clouds. Motivated by these limitations, in this paper we propose an efficient data transmission method for data-intensive scientific applications over Clouds, called Cloud Adaptive Dispatching (CAD). CAD adapts to the specific characteristics of Cloud systems and reduces the transmission cost, while also avoiding node contention when moving data from site to site. We conduct an extensive set of experiments to evaluate the performance of CAD. The results demonstrate the improvements offered by CAD in supporting data transmissions across Clouds for data-intensive scientific applications.
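The abstract does not describe CAD's algorithm, so the following is not CAD. It is a minimal sketch of the general problem the abstract refers to: grouping inter-site transfers into steps so that no node sends more than one message and no node receives more than one message in the same step (avoiding node contention). It assumes a simple cost model in which each step lasts as long as its largest message and steps run sequentially. The names `Transfer`, `schedule_contention_free` and `schedule_cost`, and the largest-first greedy heuristic, are hypothetical illustrations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    src: str   # sending site/node (hypothetical identifier)
    dst: str   # receiving site/node
    size: int  # data volume in arbitrary units


def schedule_contention_free(transfers):
    """Greedily pack transfers into steps so that, within one step,
    each node sends at most one message and receives at most one."""
    steps = []  # each entry: (messages, busy senders, busy receivers)
    # Hypothetical heuristic: place larger messages first so that
    # similar sizes tend to share a step, keeping per-step maxima low.
    for t in sorted(transfers, key=lambda t: t.size, reverse=True):
        for msgs, senders, receivers in steps:
            if t.src not in senders and t.dst not in receivers:
                msgs.append(t)
                senders.add(t.src)
                receivers.add(t.dst)
                break
        else:
            # No existing step can take t without contention: open a new one.
            steps.append(([t], {t.src}, {t.dst}))
    return [msgs for msgs, _, _ in steps]


def schedule_cost(steps):
    """Assumed cost model: a step costs as much as its largest message,
    and the total cost is the sum over sequential steps."""
    return sum(max(t.size for t in step) for step in steps)


transfers = [
    Transfer("A", "X", 40),
    Transfer("A", "Y", 10),
    Transfer("B", "X", 30),
    Transfer("B", "Y", 20),
]
steps = schedule_contention_free(transfers)
print(len(steps), schedule_cost(steps))  # prints "2 70"
```

Here the four transfers fit into two contention-free steps ({A→X, B→Y} and {B→X, A→Y}) with a total cost of 40 + 30 = 70. A greedy pass like this guarantees only contention freedom; it does not in general guarantee the minimum number of steps or the minimum cost.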


Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Ching-Hsien Hsu (1)
  • Alfredo Cuzzocrea (2)
  • Shih-Chang Chen (1)

  1. Chung Hua University, Taiwan
  2. ICAR-CNR and University of Calabria, Italy
