Architecting Scientific Data Systems in the Cloud

  • Chapter
Cloud Computing

Abstract

Scientists, educators, decision makers, students, and many others use the data produced by science instruments. They study our universe, make new discoveries in areas such as weather forecasting and cancer research, and shape policy decisions that impact nations fiscally, socially, economically, and in many other ways. Over the past 20 years or so, the data produced by these instruments have increased in volume, complexity, and resolution, making it difficult for traditional computing infrastructures to scale up to handle them. This reality has led us, and others, to investigate whether cloud computing can address these scalability challenges. NASA’s Jet Propulsion Laboratory (JPL) is at the forefront of transitioning its science applications to the cloud environment. Through the Apache Object Oriented Data Technology (OODT) framework, NASA’s first software released at the open-source Apache Software Foundation (ASF), engineers at JPL have been able to scale the storage and computational aspects of their scientific data systems to the cloud, thereby reducing costs and improving performance. In this chapter, we report on the use of Apache OODT for cloud computing, citing examples from a number of scientific domains. We also report our experience and specific performance numbers, and we suggest directions for future work in the area.

Acknowledgments

The authors would like to thank the many project sponsors and collaborators from the National Aeronautics and Space Administration, the National Cancer Institute, and the Jet Propulsion Laboratory who have supported this effort. They include Elizabeth Kay-Im, Gary Lau, Sudhir Srivastava, Christos Patriotis, Shan Malhotra, Dana Freeborn, Andrew Hart, Paul Ramirez, Brian Foster, and Brian Chafin.

This work was performed at the Jet Propulsion Laboratory, California Institute of Technology, under contract to the National Aeronautics and Space Administration.

Author information

Corresponding author

Correspondence to Daniel Crichton.

Copyright information

© 2013 Springer-Verlag London

About this chapter

Cite this chapter

Crichton, D. et al. (2013). Architecting Scientific Data Systems in the Cloud. In: Mahmood, Z. (eds) Cloud Computing. Computer Communications and Networks. Springer, London. https://doi.org/10.1007/978-1-4471-5107-4_2

  • DOI: https://doi.org/10.1007/978-1-4471-5107-4_2

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-5106-7

  • Online ISBN: 978-1-4471-5107-4

  • eBook Packages: Computer Science, Computer Science (R0)
