Cloud computing in e-Science: research challenges and opportunities

The Journal of Supercomputing

Abstract

Service-oriented architecture (SOA), workflow, the Semantic Web, and Grid computing are key enabling information technologies in the development of increasingly sophisticated e-Science infrastructures and application platforms. The emergence of Cloud computing as a new computing paradigm has opened new directions and opportunities for e-Science infrastructure development, but it also presents challenges. Researchers increasingly find it difficult to handle “big data” using traditional data processing techniques. These challenges call for a comprehensive analysis of how the above-mentioned informatics techniques can be used to develop appropriate e-Science infrastructure and platforms in the context of Cloud computing. This survey paper describes recent research advances in applying informatics techniques to facilitate scientific research, particularly from the Cloud computing perspective. Our contributions include identifying the associated research challenges and opportunities, presenting lessons learned, and describing our vision for the future application of Cloud computing to e-Science. We believe our findings can help indicate future trends in e-Science and can inform funding and research directions on how to employ computing technologies more appropriately in scientific research. We point out open research issues in the hope of sparking new development and innovation in the e-Science field.

Acknowledgments

We thank the anonymous reviewers for their constructive and insightful suggestions. Professor Michael Wilson of STFC passed away suddenly during the preparation of this paper. He was closely involved in its drafting, and we are indebted to him for his ideas and insights.

Corresponding author

Correspondence to Xiaoyu Yang.

Cite this article

Yang, X., Wallom, D., Waddington, S. et al. Cloud computing in e-Science: research challenges and opportunities. J Supercomput 70, 408–464 (2014). https://doi.org/10.1007/s11227-014-1251-5

  • DOI: https://doi.org/10.1007/s11227-014-1251-5