Reaching Out to Collaborators: Crowdsourcing for Pharmaceutical Research
Examples of Crowdsourcing Resources for Pharmaceutical Research
- Community for do-it-yourself biologists
- Materials, protocols and resources
- Open Notebook science challenge
- Crowdsourced science challenge, initially on solubility measurement
- Example of one scientist’s open notebook
- Science networking site
- Strategies and tools for faster, efficient web-enabled scientific research
- Curated biological pathways
- Open Source Drug Discovery
- Collaboration around genomics and computational technologies
A recent example of the power of crowdsourcing is the availability of freely accessible online resources that enable and support drug discovery. Good examples include online databases such as PubChem, the Chemical Entities of Biological Interest (ChEBI) database (http://www.ebi.ac.uk/chebi/), DrugBank (http://www.drugbank.ca/), the Human Metabolome Database (www.hmdb.ca) and ChemSpider (http://www.chemspider.com/) (6, 7, 8), in addition to commercial databases (9) and collaborative systems like CDD (http://www.collaborativedrug.com). These represent either government or privately funded initiatives with vastly differing resources and scopes. Chemistry (and with it biology) information on the internet has thus become more accessible just as we are seeing a massive increase in screening data coming from individual laboratories. Sometimes crowdsourcing brings synergistic benefits; for example, the efforts behind the ChemSpider platform, originally a hobby project run from a basement and recently acquired by the Royal Society of Chemistry, have been acknowledged to have greatly enriched the content of the NIH’s PubChem (9). We are also seeing crowdsourcing applied to gain more perspectives on a problem, for example the annotation of 64 putative tools and probes from the NIH Roadmap MLSCN effort by scientists from different groups, using multiple filtering methods or molecule quality metrics (10).
What does the future hold for such databases and other crowdsourcing efforts, and what are some of the challenges to be aware of? While access to very large datasets as a starting point for biological information and modeling may be of value, there should be concerns regarding the quality of the compounds used for screening, e.g., will there be a high percentage of false positives? What about the fidelity of the data? Is the same batch of compound used by different groups? Are there differences in the manner in which laboratories use technologies that result in large inter-lab differences (11)? Do cell passage numbers differ? Are the internal standards the same? What is the diet of the animals used? What is the impact of dissolution variance (12)? Will the naïve user actually be able to dissect out the false positives or issues with data curation (13), which may represent a potential pressure point? What about issues with data protection, anonymity, ethics and tissue handling (14)? There are a myriad of other related questions and issues that could hamper merging data from different groups. On the opportunity side, there may be some obvious value in storing the smaller-scale experiments from individual laboratories in a single location. Perhaps we can learn from the systems biology or network-building software communities, which have either manually or automatically annotated large databases (instances of object X interacts with object Y, either directly or indirectly) from individual experiments (15). Kinetic data are rarely captured in these efforts, yet if a database of such information could be created, these data would become accessible. We can therefore see a need, driven predominantly by the academic community, for the curation of single experiments in biology, with benefits that include preventing repetition and a possible decrease in animal and reagent usage.
This drive to curate biology can be encouraged by publishers and funding agencies. Once the information is annotated in a desirable format (e.g. there would be a need to capture the experimental protocol, and an ontology would be essential (16)) and deposited in a suitable location, it could be freely available for other efforts, whether in data mining, SAR, software development or network building. The goal should be to bring scientists to a point where their data is shared and useful. It is one thing to provide large supplemental files with publications, but it is another to put the data in a location and format so that others can potentially learn from it. Perhaps Pharmaceutical Research (and for that matter other pharmaceutical journals) could play a role in ensuring that data within articles in the journal are deposited with freely accessible databases, such as PubChem or ChemSpider and beyond. Ultimately, we foresee a highly networked structure linking the many crowdsourced databases and other non-database tools to reduce redundancy. While we have already seen a dramatic growth in accessible databases, innovation around computational methods for data analysis and mining has really not kept pace (13). There is an opportunity here for the scientific community to address these needs, and we may see a new wave of informatics company innovation. This could be catalyzed by public or private funding or even crowdsourcing X-prize type awards (http://www.xprize.org/future-x-prizes/life-sciences).
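To make the idea of an annotated, protocol-aware record more concrete, the following is a minimal Python sketch of what a single deposited experiment might look like, together with a check that flags protocol differences before results from two groups are merged. It is an illustration under stated assumptions, not a proposed standard: the class `ExperimentRecord`, the function `protocol_differences`, the field names, the ontology-style term and all values are hypothetical and are not drawn from any existing database or ontology.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExperimentRecord:
    """One deposited measurement plus protocol metadata (all fields hypothetical)."""
    lab: str
    compound_id: str                  # made-up identifier, not a real accession
    compound_batch: str               # is the same batch used by different groups?
    assay_term: str                   # placeholder for an ontology term
    cell_passage: Optional[int]       # None when no cell line is involved
    internal_standard: Optional[str]
    value: float
    unit: str


# Metadata fields that should agree before two results are pooled.
COMPARABILITY_FIELDS = (
    "compound_batch",
    "assay_term",
    "cell_passage",
    "internal_standard",
    "unit",
)


def protocol_differences(a: ExperimentRecord, b: ExperimentRecord) -> dict:
    """Return {field: (value_in_a, value_in_b)} for each comparability field that differs."""
    return {
        name: (getattr(a, name), getattr(b, name))
        for name in COMPARABILITY_FIELDS
        if getattr(a, name) != getattr(b, name)
    }


# Illustrative, invented records from two laboratories testing the same compound.
lab_a = ExperimentRecord("Lab A", "CMPD-001", "batch-1", "ASSAY:0001", 12, "std-X", 4.2, "uM")
lab_b = ExperimentRecord("Lab B", "CMPD-001", "batch-2", "ASSAY:0001", 30, "std-X", 9.8, "uM")

for field_name, (in_a, in_b) in protocol_differences(lab_a, lab_b).items():
    print(f"{field_name}: {in_a!r} vs {in_b!r}")
```

In this sketch the two invented records differ in compound batch and cell passage number, so both fields are reported. A real curation effort would need community-agreed field definitions and controlled vocabularies rather than the free-text placeholders used here.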
Perhaps such an open drug discovery model also needs some initial focus to increase the probability of success, maybe around a neglected disease like malaria or tuberculosis (TB), or even a rapidly emerging disease (like swine flu), to demonstrate that it is more than a utopian concept. The incentive here could be that these diseases are rapidly becoming of greater concern globally and could increase demand on healthcare resources (e.g. the reemergence of TB and its ease of transmission). Targeted questions could be posed to the crowd regarding approaches to surmounting TB drug resistance, latency, target identification or developing novel delivery mechanisms (17,18). In addition, a gap analysis may be performed with the crowd to see what other novel issues may not have been considered.
As individuals, we are continually challenged by demands on our time and resources, both financial and intellectual, and participating in crowdsourced neglected disease efforts would surely be a big motivating factor for many. Some companies allow their employees to spend a small percentage of their time on personal projects to foster creativity. Why not allow them to give back in this way by contributing to open-source science, which may be another way to focus the research around their own areas of interest and skills? Perhaps governments can recognize the potential benefits and provide participating companies with tax credits or other incentives. For example, the German government pays people to add to Wikipedia (http://www.boingboing.net/2007/06/27/german_government_pa.html). With the recent stipulation of the Open Access policy by the NIH, government funds are effectively being directed in a manner that results in the release of data to the public very shortly after publication. This is an activity motivated by government grants.
For some in the “for profit” realm, the motive for much of their “open crowdsourcing” effort is the revenue that accrues from an innovation. For others, the motivation to participate in open drug discovery may not be financial but purely philanthropic in nature or simply the pursuit of an intellectual challenge. Think of it as the ultimate challenge where scientists collaborate with thousands of people to help global health. These two types of members of the crowdsourcing community could coexist. We would welcome suggestions from the various stakeholders with an interest in all aspects of the pharmaceutical R&D value chain on how open pharmaceutical collaborations could be facilitated. This is certainly an unsettling time in the industry, but after the storm has passed, we may be in a unique position to conduct more aspects of R&D differently and more cost-effectively, with implications for the whole scientific community and global healthcare. Less may indeed be more.
We are grateful for many discussions with colleagues on this topic as well as for the reviewers’ suggestions.
Conflicts of Interest
SE consults for Collaborative Drug Discovery, Inc. on a Bill and Melinda Gates Foundation Grant #49852 “Collaborative drug discovery for TB through a novel database of SAR data optimized to promote data archiving and sharing.” He is also on the advisory board for ChemSpider. AJW is employed by the Royal Society of Chemistry, which owns ChemSpider and associated technologies.
- 1. Bingham A, Ekins S. Competitive collaboration in the pharmaceutical and biotechnology industry. Drug Discov Today. 2009;14(23–24):1749–81.
- 5. Tapscott D, Williams AD. Wikinomics: how mass collaboration changes everything. New York: Portfolio; 2006.
- 8. Louise-May S, Bunin B, Ekins S. Towards integrated web-based tools in drug discovery. Drug Discovery—Touch Briefings. 2009;6:17–21.
- 13. Williams AJ, Tkachenko V, Lipinski C, Tropsha A, Ekins S. Free online resources enabling crowdsourced drug discovery. Drug Discovery World. 2010;11:1, Winter 2009/10.