Cluster Computing

Volume 17, Issue 3, pp 957–977

A Threshold-based Dynamic Data Replication and Parallel Job Scheduling strategy to enhance Data Grid

  • N. Mansouri


Data Grids provide an environment for large, data-intensive applications that produce and process enormous volumes of data. Such environments must therefore manage data and schedule jobs at the same time, and these two operations have to be tightly coupled to achieve the best results. Replication techniques are widely used to increase data availability, reduce query latency, and improve load balancing in Data Grids. Effective resource scheduling is likewise a challenging research issue. In this paper we propose a job scheduling policy, called Parallel Job Scheduling (PJS), and a dynamic data replication strategy, called Threshold-based Dynamic Data Replication (TDDR), to improve data access efficiency in a hierarchical Data Grid. PJS uses hierarchical scheduling to reduce the search time for an appropriate computing node. It considers network characteristics, the number of jobs waiting in the queue, file locations, and the disk read speed of the storage drives at the data sources. The main idea of TDDR is to use a threshold value to determine whether the requested replica needs to be copied to the node. TDDR determines this threshold dynamically, based on data request arrival rates and available storage capacities. To overcome the problem of limited storage space at each node, we also design an efficient replica replacement strategy that works as a two-stage process. First, it deletes the files with the minimum transfer time. Second, if space is still insufficient, it considers the last time each replica was requested, its number of accesses, its size, and its file transfer time. Simulation results show that the proposed algorithms perform better than other algorithms in terms of Mean Job Time, Number of Intercommunications, Number of Replications, Computing Resource Usage, and Effective Network Usage.
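The threshold decision and the two-stage replacement described above can be illustrated with a minimal Python sketch. The abstract does not give the actual TDDR formulas, so everything concrete here is an assumption made for illustration: the names Replica, dynamic_threshold, should_replicate and choose_victims, the threshold formula, the cheap_transfer cutoff used for the first stage, and the scoring function used for the second stage are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    size: float           # storage occupied, e.g. in MB
    transfer_time: float  # time to re-fetch the file from another node
    last_access: float    # timestamp of the most recent request
    access_count: int     # number of requests seen so far


def dynamic_threshold(arrival_rate, free_space, capacity, base=10.0):
    """Assumed formula (not from the paper): the threshold drops as
    requests arrive faster and rises as local storage fills up."""
    used_fraction = 1.0 - free_space / capacity
    return base * (1.0 + used_fraction) / (1.0 + arrival_rate)


def should_replicate(request_count, arrival_rate, free_space, capacity):
    """Copy the file to this node once its request count reaches the threshold."""
    return request_count >= dynamic_threshold(arrival_rate, free_space, capacity)


def choose_victims(replicas, needed, now, cheap_transfer=1.0):
    """Pick replicas to delete until `needed` space is freed.

    Stage 1: files that are cheap to re-fetch (transfer time at or below
    the assumed cutoff `cheap_transfer`), cheapest first.
    Stage 2: remaining files ordered by an assumed value score built from
    last access time, access count, size and transfer time.
    Returns an empty list if even deleting everything is not enough.
    """
    def value(r):
        age = now - r.last_access + 1.0
        # Low value: rarely used, stale, large, and cheap to fetch again.
        return r.access_count * r.transfer_time / (age * r.size)

    stage1 = sorted((r for r in replicas if r.transfer_time <= cheap_transfer),
                    key=lambda r: r.transfer_time)
    stage2 = sorted((r for r in replicas if r.transfer_time > cheap_transfer),
                    key=value)

    victims, freed = [], 0.0
    for r in stage1 + stage2:
        if freed >= needed:
            break
        victims.append(r)
        freed += r.size
    return victims if freed >= needed else []
```

In this sketch the second stage is only reached when the cheap-to-transfer files do not free enough space, which mirrors the ordering stated in the abstract; how the paper actually weights the four second-stage factors is not specified here.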


Keywords: Data Grid, Data replication, Job scheduling, File access pattern, Simulation



Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. Department of Computer Science, Shahid Bahonar University of Kerman, Kerman, Iran
