Architecting Data-Intensive Software Systems

Handbook of Data Intensive Computing

Data-intensive software is increasingly prominent in today’s world, where the collection, processing, and dissemination of ever-larger volumes of data have become a driving force behind innovation in the early twenty-first century. The trend towards massive data manipulation is broad-based, and case studies can be found in domains ranging from politics to intelligence gathering to scientific and medical research. The scientific domain in particular provides a rich array of case studies that offer ready insight into many of the modern software engineering and software architecture challenges associated with data-intensive systems.

Notes

  1. We use “archive” and “repository” interchangeably throughout the chapter.

  2. Metadata refers to “data about data.” As an example, consider a book data file, and its associated metadata, “author,” with potentially many values.

  3. In the world of science data systems and data-intensive systems in general, “products” refer to the output data file(s) along with their metadata.

  4. The size of their orbit is comparable to the diameter of the Sun.

  5. A white dwarf is the remnant of a star with a mass of about that of the Sun compressed into a volume about the size of the Earth. The Sun will end its life some five billion years hence as a white dwarf.

Acknowledgements

This work was conducted at the Jet Propulsion Laboratory, California Institute of Technology under contract to the National Aeronautics and Space Administration. The authors would like to thank the editors of the book for their resolve to publish the book and to work with the authors’ tenuous work schedules to get this chapter published.

Author information

Correspondence to Chris A. Mattmann.

Copyright information

© 2011 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Mattmann, C.A. et al. (2011). Architecting Data-Intensive Software Systems. In: Furht, B., Escalante, A. (eds) Handbook of Data Intensive Computing. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1415-5_2

  • DOI: https://doi.org/10.1007/978-1-4614-1415-5_2

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4614-1414-8

  • Online ISBN: 978-1-4614-1415-5

  • eBook Packages: Computer Science (R0)
