Data-intensive software is increasingly prominent in today’s world, where the collection, processing, and dissemination of ever-larger volumes of data has become a driving force behind innovation in the early twenty-first century. The trend towards massive data manipulation is broad-based, and case studies can be examined in domains from politics, to intelligence gathering, to scientific and medical research. The scientific domain in particular provides a rich array of case studies that offer ready insight into many of the modern software engineering, and software architecture challenges associated with data-intensive systems.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
We use “archive” and “repository” interchangeably throughout the chapter.
- 2.
Metadata refers to “data about data.” As an example, consider a book data file, and its associated metadata, “author,” with potentially many values.
- 3.
In the world of science data systems and data-intensive systems in general, “products” refer to the output data file(s) along with their metadata.
- 4.
The size of their orbit is comparable to the diameter of the Sun.
- 5.
A white dwarf is the remnant of a star with a mass of about that of the Sun compressed into a volume about the size of the Earth. The Sun will end its life some five billion years hence as a white dwarf.
References
H. Rottgering, LOFAR, a new low frequency radio telescope. New Astronomy Reviews, Volume 47, Issues 4–5, High-redshift radio galaxies - past, present and future, September 2003, Pages 405–409.
http://twitter.com/{\#}!/chrismattmann/status/66141594474127361.
C. Mattmann. Software Connectors for Highly Distributed and Voluminous Data-Intensive Systems. Ph.D. Dissertation. University of Southern California, 2007.
R. T. Kouzes, G. A. Anderson, S. T. Elbert, I Gorton, D. K. Gracio, The Changing Paradigm of Data-Intensive Computing. Computer, vol.42, no.1, pp.26–34, Jan. 2009.
C. Mattmann, D. Crichton, N. Medvidovic and S. Hughes. A Software Architecture-Based Framework for Highly Distributed and Data Intensive Scientific Applications. In Proceedings of the 28th International Conference on Software Engineering (ICSE06), Software Engineering Achievements Track, pp. 721–730, Shanghai, China, May 20th–28th, 2006.
C. Mattmann, D. Freeborn, D. Crichton, B. Foster, A. Hart, D. Woollard, S. Hardman, P. Ramirez, S. Kelly, A. Y. Chang, C. E. Miller. A Reusable Process Control System Framework for the Orbiting Carbon Observatory and NPP Sounder PEATE missions. In Proceedings of the 3rd IEEE Intl Conference on Space Mission Challenges for Information Technology (SMC-IT 2009), pp. 165–172, July 19–23, 2009.
T. White. Hadoop: The Definitive Guide. 2nd Edition, O’Reilly, 2010.
P. Couvares, T. Kosar, A. Roy, J. Weber, K. Wenger. Workflow Management in Condor. In Workflows for e-Science. I. J. Taylor, E. Deelman, D. B. Gannon, M. Shields, eds. Springer London, pp. 357–375, 2007.
Y. Gil, V. Ratnakar, K. Jihie, J. Moody, E. Deelman, P.A González-Calero, P. Groth. Wings: Intelligent Workflow-Based Design of Computational Experiments. IEEE Intelligent Systems. vol.26, no.1, pp.62–72, Jan.-Feb. 2011.
D. Woollard, N. Medvidovic, Y. Gil, and C. Mattmann. Scientific Software as Workflows: From Discovery to Distribution. IEEE Software – Special Issue on Developing Scientific Software, Vol. 25, No. 4, July/August, 2008.
Science Gateways Group, Indiana University Pervasive Technologies Institute, http://pti.iu.edu/sgg,Accessed:July2011.
D. N. Williams, R. Ananthakrishnan, D. E. Bernholdt, S. Bharathi, D. Brown, M. Chen, A. L. Chervenak, L. Cinquini, R. Drach, I. T. Foster, P. Fox, D. Fraser, J. Garcia, S. Hankin, P. Jones, D. E. Middleton, J. Schwidder, R. Schweitzer, R. Schuler, A. Shoshani, F. Siebenlist, A. Sim, W. G. Strand, M. Su, N. Wilhelmi, The Earth System Grid: Enabling Access to Multi-Model Climate Simulation Data, in the Bulletin of the American Meteorological Society, February 2009.
J. Tran, L. Cinquini, C. Mattmann, P. Zimdars, D. Cuddy, K. Leung, O. Kwoun, D. Crichton and D. Freeborn. Evaluating Cloud Computing in the NASA DESDynI Ground Data System. In Proceedings of the ICSE 2011 Workshop on Software Engineering for Cloud Computing - SECLOUD, Honolulu, HI, May 22, 2011.
M. McCandless, E. Hatcher, and O. Gospodneti. Lucene in Action, Manning Publications, 532 pages, 2011.
C. Mattmann, D. Crichton, J. S. Hughes, S. Kelly, S. Hardman, R. Joyner and P. Ramirez. A Classification and Evaluation of Data Movement Technologies for the Delivery of Highly Voluminous Scientific Data Products. In Proceedings of the NASA/IEEE Conference on Mass Storage Systems and Technologies (MSST2006), pp. 131–135, College Park, Maryland, May 15–18, 2006.
A. Hart, C. Mattmann, J. Tran, D. Crichton, H. Kincaid, J. S. Hughes, S. Kelly, K. Anton, D. Johnsey, C. Patriotis. Enabling Effective Curation of Cancer Biomarker Research Data. In Proceedings of the 22nd IEEE International Symposium on Computer-Based Medical Systems (CBMS), Albuquerque, NM, August 3rd–4th, 2009.
A. Hart, J. Tran, D. Crichton, K. Anton, H. Kincaid, S. Kelly, J.S. Hughes and C. Mattmann. An Extensible Biomarker Curation Approach and Software Infrastructure for the Early De- tection of Cancer. In Proceedings of the IEEE Intl. Conference on Health Informatics, pp. 387–392, Porto, Portugal, January 14–17, 2009.
C. Lynch. Big data: How do your data grow? Nature, 455:28–29, 2008.
N. R. Mehta, N. Medvidovic, and S. Phadke. 2000. Towards a taxonomy of software connectors. In Proceedings of the 22nd international conference on Software engineering (ICSE ’00). ACM, New York, NY, USA, 178–187.
J. Yu, R. Buyya. A Taxonomy of Workflow Management Systems for Grid Computing. J. Grid Comput., 2005: 171 ∼ 200.
D. Woollard, C. Mattmann, and N. Medvidovic. Injecting Software Architectural Constraints into Legacy Scientific Applications. In Proceedings of the ICSE 2009 Workshop on Software Engineering for Computational Science and Engineering, pp. 65–71, Vancouver, Canada, May 23, 2009.
M. Uschold and G. M., Ontologies and Semantics for Seamless Connectivity. SIGMOD Record, vol. 33, 2004.
L. F. Richardson. Weather prediction by numerical process, Cambridge University Press, 1922.
J. Kim. Precipitation and snow budget over the southwestern United Sates during the 1994–1995 winter season in a mesoscale model simulation. Water Res. 33, 2831–2839, 1997.
J. Kim, R. T. Kim, W. Arritt, and N. Miller. Impacts of increased atmopheric CO2 on the hydroclimate of the Western United States. J. Climate 15, 1926–1942, 2002.
F. M. Ralph, P.J. Neiman, and G.A. Wick, 2004. Satellite and CALJET aircraft observations of atmospheric rivers over the eastern North Pacific Ocean during the winter of 1997/1998, Mon. Weather Rev., 132, 1721–1745.
A. Hart, C. Goodale, C. Mattmann, P. Zimdars, D. Crichton, P. Lean, J. Kim, and D. Waliser. A Cloud-Enabled Regional Climate Model Evaluation System. In Proceedings of the ICSE 2011 Workshop on Software Engineering for Cloud Computing - SECLOUD, Honolulu, HI, May 22, 2011.
J. P. McMullin, B. Water, D. Schiebel, W. Young, K. Golap. CASA Architecture and Applications, Proceedings of Astronomical Data Analysis Software and Systems, Vol. 376, p. 127, October 2006.
C. R. Bales., N. P. Molotch, T. H. Painter, M. D. Dettinger, R. Rice, and J. Dozie. Mountain Hydrology of the Western United States, Water Resources Research, in press., 2006.
T. P Barnett, J. C. Adam, and D. P. Lettenmaier. Potential impacts of a warming climate on water availability in snow-dominated regions, Nature, 438, doi:10.1038/nature04141, 2005.
T. P. Barnett et al. Human-induced changes in the hydrology of the western United States, Science, 319(5866), 1080–1083, 2008.
P. W. Mote, A. F. Hamlet, M. P. Clark, and D. P. Lettenmaier. Declining mountain snowpack in western North America, Bulletin of the American Meteorological Society, 86(1), 39–49, 2005.
D. W. Pierce, et al. Attribution of declining western U.S. snowpack to human effects, Journal of Climate, 21, 6425–6444, 2008.
T. H. Painter, A. P. Barrett, C. C. Landry, J. C. Neff, M. P. Cassidy, C. R. Lawrence, K. E. McBride, and G. L. Farmer. Impact of disturbed desert soils on duration of mountain snow cover, Geophysical Research Letters, 34, 2007.
M. T. Anderson and J. Lloyd H. Woosley. Water availability for the Western United States – Key Scientific Challenges, US Geological Survey Circular, 1261(85), 2005.
P. C. D. Milly, J. Betancourt, M. Falkenmark, R. Hirsch, Z. Kundzweicz, D. Lettenmaier, and R. Stouffer. Stationarity is Dead, Wither Water Management?, Science, 319(5863), 573–574, 2008.
W. Tracz. 1995. DSSA (Domain-Specific Software Architecture): pedagogical example. SIGSOFT Softw. Eng. Notes 20, 3 (July 1995), 49–62.
S. Weibel, J. Kunze, C. Lagoze and M. Wolf, Dublin Core Metadata for Resource Discovery, Number 2413 in IETF, The Internet Society, 1998.
Home Page for ISO/IEC 11179 Information Technology, http://metadata-stds.org/11179/,Accessed:July2011.
National Radio Astronomy Observatory Innovations in Data-Intensive Astronomy Workshop, http://www.nrao.edu/meetings/bigdata/,Accessed:06/27/11.
Acknowledgements
This work was conducted at the Jet Propulsion Laboratory, California Institute of Technology under contract to the National Aeronautics and Space Administration. The authors would like to thank the editors of the book for their resolve to publish the book and to work with the authors’ tenuous work schedules to get this chapter published.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2011 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Mattmann, C.A. et al. (2011). Architecting Data-Intensive Software Systems. In: Furht, B., Escalante, A. (eds) Handbook of Data Intensive Computing. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-1415-5_2
Download citation
DOI: https://doi.org/10.1007/978-1-4614-1415-5_2
Published:
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-1414-8
Online ISBN: 978-1-4614-1415-5
eBook Packages: Computer ScienceComputer Science (R0)