Abstract
There is a growing need for cyberinfrastructure to support science-based decision making in management of natural resources. In particular, our motivation was to aid the development of cyberinfrastructure for Integrated Ecosystem Assessments (IEAs) for marine ecosystems. The IEA process involves analysis of natural and socio-economic information based on diverse and disparate sources of data, requiring collaboration among scientists of many disciplines and communication with other stakeholders. Here we describe our bottom-up approach to developing cyberinfrastructure through a collaborative process engaging a small group of domain and computer scientists and software engineers. We report on a use case evaluated for an Ecosystem Status Report, a multi-disciplinary report inclusive of Earth, life, and social sciences, for the Northeast U.S. Continental Shelf Large Marine Ecosystem. Ultimately, we focused on sharing workflows as a component of the cyberinfrastructure to facilitate collaboration and reproducibility. We developed and deployed a software environment to generate a portion of the Report, retaining traceability of derived datasets including indicators of climate forcing, physical pressures, and ecosystem states. Our solution for sharing workflows and delivering reproducible documents includes IPython (now Jupyter) Notebooks. We describe technical and social challenges that we encountered in the use case and the importance of training to aid the adoption of best practices and new technologies by domain scientists. We consider the larger challenges for developing end-to-end cyberinfrastructure that engages other participants and stakeholders in the IEA process.
Acknowledgments
We would like to thank others in the Ecosystem Assessment Program who contributed to use case development, including G. DePiper, K. Friedland, S. Gaichas, K. Hyde, R. Gamble, M. Jones, and S. Lucey. We would like to thank our colleagues for commenting on this manuscript, including B. Lee, J. Futrelle, X. Ma, A. Shipunova, A. Voorhis, and S. Zednik, and three anonymous reviewers. Support for this research was provided by the U.S. National Science Foundation #0955649 with additional support to SB by the Investment in Science Fund at Woods Hole Oceanographic Institution.
Additional information
Communicated by: H. A. Babaie
Cite this article
Beaulieu, S.E., Fox, P.A., Di Stefano, M. et al. Toward cyberinfrastructure to facilitate collaboration and reproducibility for marine integrated ecosystem assessments. Earth Sci Inform 10, 85–97 (2017). https://doi.org/10.1007/s12145-016-0280-4
DOI: https://doi.org/10.1007/s12145-016-0280-4