Why is Data Sharing in Collaborative Natural Resource Efforts so Hard and What can We Do to Improve it?
Increasingly, research and management in natural resource science rely on very large datasets compiled from multiple sources. While having more data is generally beneficial, using large, complex datasets has introduced challenges in data sharing, especially for collaborating researchers in disparate locations (“distributed research teams”). We surveyed natural resource scientists about common data-sharing problems. When providing data, the major issue identified by our survey respondents (n = 118) was a lack of clarity in the data request (including the format of the data requested). When receiving data, survey respondents reported various insufficiencies in the documentation describing the data (e.g., no description of data collection or protocol, or data aggregated or summarized without explanation). Since metadata, or “information about the data,” is a central obstacle to efficient data handling, we suggest that documenting metadata through data dictionaries, protocols, read-me files, explicit null value documentation, and process metadata is essential to any large-scale research program. We advocate that all researchers, but especially those involved in distributed teams, alleviate these problems by using several readily available communication strategies, including organizational charts to define roles, data flow diagrams to outline procedures and timelines, and data update cycles to guide data-handling expectations. In particular, we argue that distributed research teams magnify data-sharing challenges, making data management training even more crucial for natural resource scientists. If natural resource scientists fail to overcome communication and metadata documentation issues, then negative data-sharing experiences will likely continue to undermine the success of many large-scale collaborative projects.
Keywords: Natural resource management, Metadata, Data sharing, Data flow diagrams, Distributed teams, Data transfer
This work was supported by the Integrated Status and Effectiveness Monitoring Program (funded by Bonneville Power Administration (2003-017-00)), the National Research Council, and the Northwest Fisheries Science Center (NOAA-Fisheries). Chris Jordan, Steve Rentmeester, and Andy Albaugh provided valuable insight and experiences in the development of this manuscript.