Simultaneous Scheduling of Replication and Computation for Data-Intensive Applications on the Grid

Journal of Grid Computing

Abstract

Managing large datasets has become a major application of Grids. Life science applications typically rely on large databases that must be replicated for the applications to scale. The growing number of users and the ease of access to Internet-based applications have put Grid middleware under stress. Such environments are therefore required to manage data and schedule computation tasks at the same time, and these two operations have to be tightly coupled. This paper presents an algorithm (Scheduling and Replication Algorithm, SRA) that combines data management and scheduling using a steady-state approach. Using a model of the platform, the number of requests and their distribution, and the number and size of the databases, we define a linear program that satisfies all the constraints at every level of the platform in steady state. The solution of this linear program gives a placement of the databases on the servers and, for each kind of job, the server on which it should be executed. Our theoretical results are validated using simulation and logs from a large life science application.
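
To make the steady-state idea more concrete, the following is a minimal sketch of the kind of constraints such a linear program would plausibly have to enforce. It is not the SRA formulation from the paper: the `Server` fields, the function name `check_steady_state`, the units, and the example figures are all assumptions chosen for illustration. The sketch does not solve a linear program; it only checks whether a candidate placement and a candidate set of per-server request rates respect storage, compute, data-dependency, and demand limits, and reports the resulting throughput.

```python
from dataclasses import dataclass


@dataclass
class Server:
    storage: float  # hypothetical: disk capacity (e.g. MB)
    speed: float    # hypothetical: compute capacity (operations per second)


def check_steady_state(servers, db_sizes, job_db, job_cost, demand,
                       placement, rates, eps=1e-9):
    """Check a candidate solution and return (ok, throughput).

    placement[i] : set of database indices replicated on server i
    rates[i][k]  : requests per second of job type k executed on server i
    job_db[k]    : database needed by job type k
    job_cost[k]  : operations needed by one request of job type k
    demand[k]    : requests per second submitted for job type k
    """
    for i, server in enumerate(servers):
        # Storage: the replicas held by a server must fit on its disk.
        if sum(db_sizes[d] for d in placement[i]) > server.storage + eps:
            return False, 0.0
        # Compute: work done per second cannot exceed the server's speed.
        work = sum(rates[i][k] * job_cost[k] for k in range(len(job_cost)))
        if work > server.speed + eps:
            return False, 0.0
        # Data dependency: a job type runs only where its database is stored.
        for k, rate in enumerate(rates[i]):
            if rate < -eps or (rate > eps and job_db[k] not in placement[i]):
                return False, 0.0
    # Demand: a job type is never served faster than requests arrive.
    for k, submitted in enumerate(demand):
        if sum(rates[i][k] for i in range(len(servers))) > submitted + eps:
            return False, 0.0
    return True, sum(sum(row) for row in rates)


# Illustrative (made-up) platform: two servers, two databases, two job types.
servers = [Server(storage=100, speed=10), Server(storage=50, speed=5)]
db_sizes = [60, 40]
job_db = [0, 1]
job_cost = [2.0, 1.0]
demand = [4.0, 6.0]
rates = [[4.0, 2.0], [0.0, 4.0]]

good = check_steady_state(servers, db_sizes, job_db, job_cost, demand,
                          placement=[{0, 1}, {1}], rates=rates)
bad = check_steady_state(servers, db_sizes, job_db, job_cost, demand,
                         placement=[{0}, {1}], rates=rates)
```

In this made-up instance the first placement is accepted, while the second is rejected because server 0 would run requests of job type 1 without holding database 1. A solver would search over placements and rates to maximise the throughput subject to checks of this kind.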

Author information

Corresponding author

Correspondence to Antoine Vernois.

Additional information

This work was supported in part by the ACI GRID and Grid5000 projects of the French Department of Research.

About this article

Cite this article

Desprez, F., Vernois, A. Simultaneous Scheduling of Replication and Computation for Data-Intensive Applications on the Grid. J Grid Computing 4, 19–31 (2006). https://doi.org/10.1007/s10723-005-9016-2

  • DOI: https://doi.org/10.1007/s10723-005-9016-2
