Architecting Data-Intensive Software Systems

  • Chris A. Mattmann
  • Daniel J. Crichton
  • Andrew F. Hart
  • Cameron Goodale
  • J. Steven Hughes
  • Sean Kelly
  • Luca Cinquini
  • Thomas H. Painter
  • Joseph Lazio
  • Duane Waliser
  • Nenad Medvidovic
  • Jinwon Kim
  • Peter Lean
Chapter

Data-intensive software is increasingly prominent in today’s world, where the collection, processing, and dissemination of ever-larger volumes of data have become a driving force behind innovation in the early twenty-first century. The trend toward massive data manipulation is broad-based, and case studies can be found in domains ranging from politics to intelligence gathering to scientific and medical research. The scientific domain in particular provides a rich array of case studies that offer ready insight into many of the modern software engineering and software architecture challenges associated with data-intensive systems.

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Chris A. Mattmann (1)
  • Daniel J. Crichton (1)
  • Andrew F. Hart (1)
  • Cameron Goodale (1)
  • J. Steven Hughes (1)
  • Sean Kelly (1)
  • Luca Cinquini (1)
  • Thomas H. Painter (1)
  • Joseph Lazio (1)
  • Duane Waliser (1)
  • Nenad Medvidovic (2)
  • Jinwon Kim (3)
  • Peter Lean (4)

  1. Instrument and Science Data Systems, NASA Jet Propulsion Laboratory, California Institute of Technology, Pasadena, USA
  2. Computer Science Department, Viterbi School of Engineering, University of Southern California, Los Angeles, USA
  3. Joint Institute for Regional Earth System Science and Engineering (JIFRESSE), University of California, Los Angeles, Los Angeles, USA
  4. Department of Meteorology, University of Reading, Reading, UK
