Scientific and technical data sharing: a trading perspective
- 877 Downloads
It is arguably a precept that the open sharing of data maximises the scientific utility of the research that generated that data. Indeed, progress depends on individual scientists being able to build on the results produced by others. The means to facilitate sharing undoubtedly exist, but various studies have identified reluctance among researchers to share information with their peers, at least until the professional priorities of the original researchers have been accommodated. With a view to encouraging less inhibited collaboration, we appraise the processes of data exchange from the perspective of a trading environment and consider how data exchanges might promote (or perhaps hinder) collaboration in data-rich scientific research disciplines and how such an exchange might be set up. We suggest an exchange with trusted brokers (akin to the commodity markets) as a way to overcome the challenges of the current environment. We conclude by encouraging the scientific and technical community to debate the merits of a trading perspective on data sharing and exchange.
KeywordsPerspective Viewpoint Data Sharing Trading Exchange Broker
It is arguably a precept that the open sharing of data maximises the scientific utility of the research that generated that data . Indeed, progress depends on individual scientists being able to build on the results produced by others . The means to facilitate sharing undoubtedly exist, but various studies have identified reluctance among researchers to share information with their peers, at least until the professional priorities of the original researchers have been accommodated. However, there is little evidence of an integrated approach to the establishment of models that encourage open sharing and exchange, notwithstanding the appeals in the Royal Society report . In this viewpoint article we consider the context and environment in which science tends to be conducted and put forward a model for integrating the channels for sharing and exchange, focusing particularly on the chemistry domain, while recognising fully the potential for wider application.
In the context of scientific collaboration, the terms sharing and exchange are used almost interchangeably, thereby disregarding the wider understanding that exchange involves receiving something in return, whereas sharing might be more altruistic. Such distinctions often become blurred, in that sharing can be mutual and exchanged items are not necessarily equivalent in value. In the scientific research context specifically, neither sharing nor exchange involves an explicit assumption of receiving assets in return, apart from a reliance that proper attribution, such as citation, will be given.
We use the terms sharing and exchange interchangeably and in conjunction in this article, adopting a broad interpretation that each activity involves an individual or group making data and other collateral available for other researchers to use.
In this article, we also refer to open sharing, open access, open data, and openness in general. The Royal Society report  does not venture a definition of the term open, but does very effectively convey the widely accepted understanding of the meaning of the term, which we in turn adopt.
Trade is a synonym of exchange, with the subtle implication that the trading process involves some form of regulatory procedure. Trading commonly involves an agent of some form; we assess in a subsequent section the potential role of an agent in trading scientific and technical data.
Context and environment
We are presenting a viewpoint with regard to the sharing of data that is generated as part of an experiment or other activity, in contrast with data created with the express intention of sharing it for a specific purpose. Our criterion for making such a distinction when defining our context is one of purpose.
For example, the Cambridge structural database (CSD) is “the world’s repository of experimentally determined organic and metal–organic crystal structures” . Crystallographers determine these structures with the purpose of depositing the data to the CSD.
This article relates to facilitating the exchange of the substantial amounts of data generated as an adjunct to research with a purpose other than data creation. One example of such data is the spectral information obtained when confirming the structure of a synthetic product. Typically, the interpretation of the spectra would be included with the published report of the synthesis, but the raw spectral data would remain private. The context of this article is the sharing of complementary data, which is commonly retained in a variety of institutional and private stores.
That range of stores is a primary feature of the environment in which complementary data is generated and preserved. Data might be stored in a national or subject area repository, an institutional (university) repository, a web-based store such as Zenodo  and Figshare , a laboratory repository, a personal computer, and even on a flash drive or other portable media. Data might also be preserved with a publication, as supplementary material. The preservation environment is therefore complex and indicative of diverse means whereby shared data can be accessed.
Undoubtedly a considerable amount of data exchange and sharing occurs on a peer-to-peer basis, relying on pre-existing relationships. This article does not explore such informal exchanges, which would not in any case be susceptible to more formal processes.
Tracking data accesses and reuse thereafter is a more haphazard process, although tools are increasingly becoming available. In all cases, however, the onus is on the researcher to check for access and reuse. Google Scholar  has a provision for tracking citations to publications and researchers can deploy ImpactStory , for example, to measure and share their research impacts. Identifiers such as ORCID and ResearcherID enable the unique identification of researchers. The most significant challenge in tracking access and reuse is to establish with some certainly whether data has been reused or repurposed or obtained merely as a matter of interest or even curiosity.
In some instances, researchers will impose restrictions on the use of data that they are otherwise willing to share, the best-known example being the embargo placed on items until a given date, usually a publication date. Other forms of regulation are copyright licenses such as the Creative Commons license  and specific requirements applied by data providers, such as requests for feedback.
In presenting our viewpoint, we appraise the environment for exchanging and sharing data generated as a consequence of another activity. This environment comprises a disparate set of facilities that lack a means for linking them together, which we regard as a missing “hub” in an area of growing importance and activity. It is a pivotal point that the hub we envisage is not another data store: it is the embodiment of a mechanism for sharing data through a trusted broker that we believe has a well-founded analogy with a trading model. The broker would manage packages (manifests) that describe the data to be exchanged but would not manage or handle the data itself. The “currencies” involved in the trade are reward and recognition for the researcher.
We believe that the existence of a trusted broker operating a regulated data exchange mechanism would offer a valuable opportunity to the Higher Education Community. We further consider that a consortium of national and subject area repositories could advantageously operate the broker. We therefore offer our viewpoint to the community for discussion.
The chemistry domain
We focus on the chemistry domain, while recognizing the indications that barriers to open access exist elsewhere in applied informatics. With regard to studies of information-seeking behaviour, Davis described chemists as an “ideal group to study”, owing to their heavy use of journal literature . In the 10 years since that article was published, chemistry has become ever more dependent on data and accordingly on the preservation, curation, discovery, access, and provenance of that data. The authors of this article have recently reviewed information and data sharing in the chemical sciences from an e-Research perspective .
A survey of data sharing and the use of collaborative technologies by chemists revealed attitudes that appear to be inconsistent with a reliance on the published work of other researchers, such as a tendency to store data as hard copy and a reluctance to allow open access to research results . Another article noted that chemists were not taking full advantage of Web-based resources, yet needed “unfettered bench-top access via the Web” . A recent study of information sharing and exchange in the life sciences detected similar characteristics within that domain . Bird, Willoughby, and Frey discuss attitudes to sharing and the attendant implications for record keeping in chemistry and other sciences in their review of laboratory notebooks in the digital era .
Reward systems favour publication rather than data curation;
Significant effort is required to organise, manage, and curate data; we refer to this as the burden of curation;
Research tends to be competitive, leading to a reluctance to share data until papers have been published and/or data is no longer commercially sensitive;
Researchers value ownership of their data: it is their intellectual property.
We recognise that this potential minefield of conflicting interests and requirements requires a different perspective if the technical innovations of the Web and digital commerce are to help to increase the efficiency of scientific research by encouraging higher quality dissemination of data.
With a view to encouraging the growth of scientific and technical collaboration, we put forward a trading environment as a fresh perspective on scientific data sharing and exchange. We note that historically markets and exchanges evolved to address and manage the inhibitions and conflicts outlined in the preceding paragraphs, perhaps not always completely effectively but ideally at least transparently and with a clear audit trail.
Before we examine models for trading and exchange, we offer in the following section two illustrations of what can occur if data is not shared openly and in contrast what should happen.
Motivations for open data sharing
In our introduction to the context and environment for this viewpoint article, we gave as an example of data generated as an adjunct to other research the spectral information obtained when confirming the structure of a synthetic product. We noted that, typically, the interpretation of the spectra would be included with the published report of the synthesis, whereas the raw spectral data would remain private.
The phrase “remain private” can conceal a range of potential inhibitors to the subsequent reuse and/or reinterpretation of that raw data. At best the data would have been preserved in an institutional (university) repository; at worst the data would have been retained on a personal computer or on portable media. The less controlled the storage medium, the greater the risk that the raw spectral data might subsequently be misplaced, making any reuse or reinterpretation impossible.
If the research group implements a robust data management policy, the raw spectral data would be held in an open access repository that has a warranted preservation period. The publication of the synthesis would be accompanied by a data citation, using for example a DataCite DOI.
Models for trading and exchange
A trading infrastructure offers a novel social and technological solution based on economic models of exchange, which have evolved to ensure transparency, access, appropriate acknowledgement, and adherence to licence conditions. Exchanges (such as commodity and stock exchanges) can be found in all developed economies and once established, provide the reassurance necessary to encourage trading and to facilitate analysis, regulation, and accountability as required.
Among economic models of exchange, the closest correspondences to the sharing of scientific information are: (a) the gift economy, in which custom governs exchanges, rather than explicit remuneration contracts; and (b) knowhow trading, which is based on informal exchanges of technical knowledge. In these models the value of the commodity is not an explicit monetary value, but usually a rather more tenuous concept.
Carter  describes knowhow trading as “the informal exchange of practical technical knowledge between pairs of engineers and other technicians in different firms”. Her approach is theoretical, whereas Meyer  relies on examples of “collective invention”. This term was originally defined by Allen  as “the free exchange of information about new techniques and plant designs among firms in an industry”. Meyer redefines the term as “a process in which improvements or experimental findings about a production process or tool are regularly shared”. Carter and Meyer set the scene, even though their context is commercial and related more to manufacturing rather than scientific data.
Communities can share technical information in ways ranging from patent licensing through collaborative projects to ad hoc groups with a common interest, such as the open source community. Knowhow trading lies in the middle of the sharing spectrum: it is based on established relationships. Meyer shows a presumption in favour of collective invention when outcomes are uncertain, noting that the “process evaporates” when the uncertainty diminishes.
“Technological advances are often kept secret or patented, making them the intellectual property of their inventors. Scientific advances are more often published openly. One reason for the difference is that scientific investigation is driven so much by curiosity, whereas technological investigation is clearly functional, driven by the goal of producing something and usually to earn a profit.”
Von Hippel  characterized the behaviour he observed in steel minimill processing as: “an informal trading network that develops between engineers having common professional interests”. He distinguished knowhow trading from Allen’s view of collection invention in that the valuable information exchanged between traders remains secret from non-traders. On the other hand, Carter emphasises the practical nature of the technical knowledge and notes that it is “generally cheaper to acquire knowhow through exchanges”, even though competitive advantage can thereby be lost.
When involved in a trading network, scientific researchers and commercial technology developers would have in common not only a community with mutual interests but also a view of information and data as assets than could—optionally—be shared and/or exchanged, rather than as goods that have a market value. Moreover, while acknowledging that some advantage might be surrendered as a consequence of cooperation, a trading network should bring a recognition that cooperation is almost always cheaper (in saving of effort as well as in financial terms) than “going it alone”.
Knowhow trading is a form of barter, in which the partners exchange intellectual property assets, with an expectation that the assets are of approximately equivalent significance. In the scientific research context specifically, neither sharing nor exchange involves an explicit assumption of receiving assets in return, apart from a reliance that proper attribution, such as citation, will be given if recipients use the information or data to further their own work. In that sense, arrangements for scientific sharing would resemble a gift economy rather than knowhow trading. In effect, the existence of established relationships is a requirement of knowhow trading. Although data sharing is more likely to occur within an existing relationship, data can also shared on an ad hoc basis. As noted in the Introduction, the future progress of science depends on promoting a culture of sharing.
Arising from the broader basis just outlined, the selection of partners that is implicit in knowhow trading is not a necessary feature of sharing. With knowhow trading as described by Carter and von Hippel, the rewards are technological advance and saving the costs of “re-inventing the wheel”. When scientific data is shared, the rewards are imprecise and less certain: the advancement of knowledge and understanding; and the enhancement of professional reputations through citation.
“Would-be traders develop networks of colleagues and get to know potential partners through professional organizations and informal referrals. Cumulative experience provides a basis for judging individual partners’ contributions.”
Although Carter rightly points out knowhow trading could extend into “multi lateral networks of three or more”, the nature of such networks would remain constrained by their constituent relationships. Meyer devotes an entire section to discussing the “social network perspective” of collective invention. Social networking support is vital for an information-sharing community, which can in practice be very fluid in nature.
Trading scientific and technical data
Before we discuss whether scientists could—and should—trade in data, we consider briefly the extent to which data is, or can be, a part of “knowhow”. For example, if a process, procedure, or characterization relies explicitly on some data, that data forms part of the “knowhow”, and therefore becomes a component of the trade. Alternatively, if the data is no more than the result of applying a “knowhow” method, then consequential data is neither part of the knowhow nor a necessary component of any trade, although the data might be supplied gratuitously.
Information and any data associated with it are already traded routinely, one example being patent licensing, so the answer to the ‘could’ question has to be “yes”. Information differs from tangible goods in several ways, notably that information is held jointly rather than transferred. (Carter notes that special arrangements would be required to deny access to the original possessor of information).
The total holding of traded tangibles remains constant, whereas the sum of information holdings is a multiple of the number of possessors. Moreover, information has effectively no distribution costs, whereas the distance between suppliers and buyers of tangibles can influence trade, for example by encouraging entrepreneurs to set up markets in neutral locations;
Trade in tangibles usually occurs to offset imbalances: we trade our excess stock for other items that for us are in short supply and will be useful to us. When we trade information we do so only on the basis of its usefulness and not its quantity. The value of tangibles can derive from both intrinsic characteristics and usefulness; information has value only in terms of its usefulness. The value of information can vary between possessors and is fragile: one inappropriate transfer can destroy the usefulness of information and therefore its value.
We conclude that scientific data could be traded, provided we take due account of the preceding considerations. However, we argue that scientific data should not be traded in the same sense as tangible goods, partly in response to the same considerations, but particularly because such trading would run counter to the philosophy of open data .
Although scientists undoubtedly do share information without any formal trading arrangement, they are all aware of the disincentives that inhibit the development of a fully open culture. Carter, probing “the economic incentives that motivate the sharing of technical information”, argued that persistence of information provided a strong incentive to exchange . However, Collins had previously attributed a lack of cooperation between universities to their sense of competition . The remarks he quotes are interesting from a behavioural perspective, as is his observation: “Nearly every laboratory expressed a preference for giving information only to those who had something to return”. These conservative attitudes towards true openness are still in evidence today: we have noted how both Downing  and Borgman  describe attitudes that are inimical to the open sharing of information and data.
A data-sharing model must therefore both provide incentives and overcome the inhibitions introduced by competition. The incentives for any individual scientist might range from increasing the sum of human knowledge to achieving international recognition in one’s field, from self-denying curiosity to naked ambition. Without being cynical, it seems highly unlikely that any scientist would reject recognition for good work done, which leads to the conclusion that appropriate acknowledgement would provide a strong incentive.
A data-sharing model must offer explicit benefits to producers, not only for sharing their data, but also for the effort of curation and creating the metadata. The advantage in sharing data might appear to lie with consumers, but in addition to their obligations to give proper attributions, such as citations, they in turn can become producers who rely on the integrity of other consumers. A data-sharing model based on knowhow trading thus requires an infrastructure that facilitates the formation of social links based on trust, thereby realising the benefits of sharing scientific information and data on much the same basis as technological knowhow.
We believe that a research data exchange with associated services is capable of allaying the main misgivings of practitioners about making data more freely available, and ensure that not only the wider education community but also the public achieves maximum gain from research. To overcome the deterrents identified by Borgman and other observers, the trading infrastructure must ensure transparency, access, acknowledgement, and compliance with conditions. To develop a solution based on a trading environment it is essential that we appreciate fully the obligations that might arise from the practice of trading data in a scientific context, in our case chemistry.
The concept of a notional hub that brings together the disparate facilities comprising the data exchange environment is a valuable aid to understanding the embodiment of a mechanism for sharing data through a trusted broker that we believe has a well-founded analogy with a trading model. Moreover, it is a pivotal point that the hub we envisage is not another data store.
As shown in Fig. 1, the broker would accept packages consisting of metadata conforming to a prescribed schema that describes the nature, provenance, and access provisions for the data being published. The package would not include the data itself.
A standard description of each item of data that the broker has available;
A controlled vocabulary of terms used to classify the data held by the broker;
Validation of packages on receipt;
Search facilities for discovering data, not only using the controlled vocabulary but also via free text search;
A mechanism for querying the provenance of an item or set of items;
Records of all accesses by consumers;
Automatic notification to producers of queries and requests, which would form the basis for a reward and recognition system;
Services, for example, to provide feedback from the consumer to the data producer.
Enabling open access to data and metadata, including provenance data;
Enabling publishers to restrict the distribution of the data they share;
Relieving the burden of curation using, for example, methods as described by Shotton et al. ;
Providing access records that researchers can trust and thereby overcome their inhibitions regarding ownership and open access;
Establishing a trusted reward and recognition system that can recognise use of shared data in the form of a citation, drawing on the experience of organisations such as DataCite .
In the UK the funders of significant government-sponsored research have begun to require explicit data management policies and demand that data be shared as the default strategy. These funders and the Universities and the main recipients of these funds are only just beginning to realize the implications of these demands in terms of the information infrastructure required to collect, retain, curate, and deliver the data (in context and with provenance to meet the requirements of transparency). The researchers’ requirements in terms of reward also need to be considered. If the necessary reward structures are not present then there is no incentive for the researchers to participate beyond what they are contractually obliged to by grant condition; this is not the way to archive the highest quality data and metadata in the public domain.
In this viewpoint article, we suggest that a trading perspective could provide a fruitful area of research, which could benefit from the existence of a proven technology, the trusted broker, on which to base a proof of concept. We suggest that an exchange built upon a trusted broker would make a very valuable component of the national and international data infrastructure and could solve many of the perceived problems in data sharing. We hope by offering this viewpoint to encourage the scientific and technical community to debate the merits of a trading perspective on data sharing and exchange and the potential of trusted brokers and their associated services to promote the open sharing that we believe is so important for future progress.
This article arose from many discussions under the e-Science research grants and originally saw the light of day as grant applications, for research which has as yet not been funded. Our views have been formed both as practicing chemists and information researchers and been brought into focus over the last decade with work funded by the RCUK e-Science programme (EPSRC Grant GR/R67729, EP/C008863, EP/E502997, EP/G026238, BBSRC BB/D00652X), the EPSRC National Crystallography Service (Tender RCUK/D/EPSRC/Facilities/XRC/10), the HEFCE and JISC Data Management Programme and the University Modernisation (UMF), and most recently the RCUK Digital Economy Theme as part of the IT as a Utility Network + funding (EPSRC EP/K003569). These views could not have been honed without considerable interaction with our colleagues in Chemistry, Computer Science, and Statistics in Southampton and the e-Research South Consortium (EPSRC EP/F05811X), especially UKOLN, OeRC, STFC and the DCC, and our professional society and industrial colleagues at the Royal Society of Chemistry, Microsoft Research (MSR) and IBM.
- 1.Boulton G, Campbell P, Collins B, Elias P, Hall W, Laurie G, O’Neill O, Rawlins M, Thornton J, Vallance P, Walport M (2012) Science as an open enterprise. Royal Society, LondonGoogle Scholar
- 3.The Cambridge Crystallographic Data Centre (CCDC). http://www.ccdc.cam.ac.uk/pages/Home.aspx
- 4.Zenodo. http://zenodo.org/
- 5.Figshare. http://figshare.com/
- 6.DataCite. http://www.datacite.org/
- 7.Downloading an Archive in Amazon Glacier. http://docs.aws.amazon.com/amazonglacier/latest/dev/downloading-an-archive.html
- 8.Google Scholar. http://scholar.google.co.uk/
- 9.ImpactStory. http://en.wikipedia.org/wiki/ImpactStory
- 10.Creative Commons license. http://en.wikipedia.org/wiki/Creative_Commons_license
- 13.Downing J, Murray-Rust P, Tonge AP, Morgan P, Rzepa HS, Cotterill F, Day N, Harvey MJ (2008) SPECTRa: the deposition and validation of primary chemistry research data in digital repositories. J Chem Inf Model 48(8):1571–1581Google Scholar
- 15.Williams R, Pryor G, Bruce A, Marsden W, MacDonald S (2009) Patterns of information use and exchange: case studies of researchers in the life sciences. A report by the Research Information Network and the British LibraryGoogle Scholar
- 19.Meyer PB (2003) Episodes of collective invention. U.S. Department of Labor, Office of Productivity and Technology, Working Paper 368Google Scholar
- 21.von Hippel E (ed) (1988) Cooperation between rivals: the informal trading of technical know-how, Chap 6. In: The sources of innovation, Oxford University Press, OxfordGoogle Scholar
- 23.What is open data? http://linkedgov.org/what-is-open-data/
Open AccessThis article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.