Earth Science Informatics, Volume 10, Issue 1, pp 85–97

Toward cyberinfrastructure to facilitate collaboration and reproducibility for marine integrated ecosystem assessments

  • Stace E. Beaulieu
  • Peter A. Fox
  • Massimo Di Stefano
  • Andrew Maffei
  • Patrick West
  • Jonathan A. Hare
  • Michael Fogarty
Research Article


There is a growing need for cyberinfrastructure to support science-based decision making in management of natural resources. In particular, our motivation was to aid the development of cyberinfrastructure for Integrated Ecosystem Assessments (IEAs) for marine ecosystems. The IEA process involves analysis of natural and socio-economic information based on diverse and disparate sources of data, requiring collaboration among scientists of many disciplines and communication with other stakeholders. Here we describe our bottom-up approach to developing cyberinfrastructure through a collaborative process engaging a small group of domain and computer scientists and software engineers. We report on a use case evaluated for an Ecosystem Status Report, a multi-disciplinary report inclusive of Earth, life, and social sciences, for the Northeast U.S. Continental Shelf Large Marine Ecosystem. Ultimately, we focused on sharing workflows as a component of the cyberinfrastructure to facilitate collaboration and reproducibility. We developed and deployed a software environment to generate a portion of the Report, retaining traceability of derived datasets including indicators of climate forcing, physical pressures, and ecosystem states. Our solution for sharing workflows and delivering reproducible documents includes IPython (now Jupyter) Notebooks. We describe technical and social challenges that we encountered in the use case and the importance of training to aid the adoption of best practices and new technologies by domain scientists. We consider the larger challenges for developing end-to-end cyberinfrastructure that engages other participants and stakeholders in the IEA process.


Keywords: E-science; Executable workflow; Indicator; IPython notebook; Open science; Use case methodology



Copyright information

© Springer-Verlag Berlin Heidelberg (outside the USA) 2016

Authors and Affiliations

  • Stace E. Beaulieu (1)
  • Peter A. Fox (2)
  • Massimo Di Stefano (1, 2, 3)
  • Andrew Maffei (1)
  • Patrick West (2)
  • Jonathan A. Hare (4)
  • Michael Fogarty (4)

  1. Woods Hole Oceanographic Institution, Woods Hole, USA
  2. Tetherless World Constellation, Rensselaer Polytechnic Institute, Troy, USA
  3. Center for Coastal and Ocean Mapping, University of New Hampshire, Durham, USA
  4. Northeast Fisheries Science Center, National Oceanic and Atmospheric Administration, Woods Hole, USA
