Managing risks in drug discovery: reproducibility of published findings

In spite of tremendous advances in biopharmaceutical science and technology, the productivity of pharmaceutical research and development has been steadily declining over the last decades. The reasons for this decline are manifold and range from an improved standard of care that is increasingly difficult to surpass to inappropriate management of technical and translational risks along the R&D value chain. This short review describes the major types of risks in biopharmaceutical R&D and the means to address them. A special focus is on one risk that has so far been neither fully appreciated nor systematically analyzed: the lack of reproducibility of published information. Measures to improve reproducibility and trust in published information are discussed.

The decline in pharmaceutical R&D productivity
The last decades have seen major advances and productivity gains in science and technology. The polymerase chain reaction (Saiki et al. 1985), for example, has revolutionized molecular biology and made it possible to amplify and quantify nucleic acids in a short time and at high throughput.
Next-generation sequencing technologies (Ozsolak 2012) have reduced the time and cost of whole-genome sequencing by several orders of magnitude, from more than 10 years and three billion dollars for the first sequence of the human genome (Lander et al. 2001; Venter et al. 2001) to a few days and a thousand dollars (Hayden 2014), with further reductions in time and cost in sight. Computing power has increased exponentially: with well in excess of 10^11 floating point operations per second (100 GFLOPS), modern smartphones are more than ten times as powerful as Deep Blue, the IBM supercomputer that achieved 11.4 GFLOPS and beat world chess champion Garry Kasparov in 1997. The amount of sequencing, metabolomics, proteomics, microarray, and controlled-access human data available at the European Bioinformatics Institute approximately doubles every 12 months (Elixir 2014). RNA interference and gene-editing technologies have made it possible to investigate the role of individual genes and gene variants in complex systems in vitro and in vivo. The number and size of available chemical libraries have increased tremendously (Dolle 2011), and these compound libraries can be tested against protein targets at significantly higher throughput and lower cost (Mayr and Fuerst 2008).
Yet over the same period, pharmaceutical R&D has suffered from a steady decline in productivity. Whereas output per invested amount of money has steadily improved in other industries, drug discovery and development have become increasingly expensive: the amount of money that has to be invested for a new drug to be approved has approximately doubled every 9 years. This trend has been remarkably stable over the last six decades (see Fig. 1a). In analogy to "Moore's law", which describes the exponential increase in productivity in the semiconductor industry based on the observation that the number of transistors on an integrated circuit approximately doubles every 2 years, this trend has been called "Eroom's law", i.e., Moore's law in reverse (Scannell et al. 2012). Other metrics give even more disconcerting numbers: relating R&D expenditure to the number of new drugs approved between 1997 and 2011, true R&D costs per newly approved drug were estimated to range from 3.7 to 11.8 billion US dollars across 12 major pharmaceutical companies (Herper 2012).
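The compounding implied by these doubling times is easy to underestimate. As a minimal sketch, the Python snippet below converts a doubling time into a cumulative fold change; the 9-year and 2-year doubling times come from the text, while the 60-year window (approximating "six decades") and the function name are illustrative assumptions.

```python
def fold_change(years: float, doubling_time: float) -> float:
    """Cumulative growth factor after `years` for a quantity that doubles every `doubling_time` years."""
    return 2 ** (years / doubling_time)


# Cost per approved drug, doubling roughly every 9 years (Eroom's law), over ~six decades
eroom_factor = fold_change(years=60, doubling_time=9)
# Transistors per integrated circuit, doubling roughly every 2 years (Moore's law), same window
moore_factor = fold_change(years=60, doubling_time=2)

print(f"Eroom's law: ~{eroom_factor:.0f}-fold increase in cost per approved drug")
print(f"Moore's law: ~{moore_factor:.1e}-fold increase in transistor count")
```

Under these assumptions, a 9-year doubling time corresponds to roughly a 100-fold increase in cost per approved drug over 60 years, whereas a 2-year doubling time yields a roughly billion-fold increase in transistor count.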
There are several potential reasons for this decline in R&D productivity. First, in many therapeutic fields, the standard of care is already comparatively efficacious and safe, setting a high bar for the approval and reimbursement of new drugs. This has been dubbed the "better than the Beatles" problem (Scannell et al. 2012), illustrating that every new drug has to be superior to everything that is already available. Second, following cases such as rofecoxib and cerivastatin, health authorities have become more cautious about potential drug safety issues and have accordingly raised the bar for new treatments. For example, in 2008 the FDA issued guidance on the development of antidiabetics requiring long-term cardiovascular safety trials for each new antidiabetic drug (FDA.gov 2008). Third, clinical trial failure rates have risen considerably within a period of two decades (Mignani et al. 2015). Notably, the highest clinical attrition rates have been observed in phase II, with lack of efficacy being the major reason for failure (Hay et al. 2014; Cook et al. 2014). This is primarily due to insufficient target validation and a lack of predictive preclinical models or appropriate biomarkers. Failure rates were found to be particularly high in oncology, with a likelihood of approval of only about 10 % for a phase II compound, whereas it was nearly twice as high in endocrinology or infectious diseases (Hay et al. 2014). Key approaches to reducing later-stage clinical attrition, and thereby addressing these translational risks (see below), are rigorous human target validation and early clinical proof-of-concept studies (Paul et al. 2010). Fourth, the tendency to streamline and industrialize pharmaceutical R&D, thereby neglecting biological complexity, has also contributed to clinical failures. Fifth, non-value-adding activities that are not on the critical path of a project lead to higher costs and longer timelines (Paul et al. 2010): experiments or studies are done because they "can be done" or have traditionally been performed in previous approaches, but they have no impact on decision making within a specific project. Frequent changes in R&D strategy, reorganizations, and inefficient bureaucracy with lengthy decision-making processes also fall into this category of non-value-adding activities. Finally, and this is a major difference from other industries, cycle times in pharmaceutical R&D are, and will likely remain, very long: a project started today will not yield a product for 15 years or often longer. Within this time frame, projects may fail for technical or translational reasons (see below), but many environmental changes, such as improvements in the standard of care, new competitors, or changes in regulatory requirements and medical care systems, may also negatively influence the fate of a once-promising idea or therapeutic concept and cannot always be foreseen when a project is initiated. Additionally, the outcome of pharmaceutical R&D is in most cases binary: a new drug product or no product. There is typically no equivalent to, e.g., a new instrument with a smaller footprint or a new car that consumes half a liter less fuel per 100 km; it is either a new drug or complete failure.
Fig. 1 … Phrma.org (2015). c Number of new molecular entities (NMEs) approved by the FDA for the years between 2001 and 2015. Horizontal gray bars denote the mean for the respective 5-year periods. Data from Phrma.org (2015) and fda.gov
However, there is hope for better performance in the future. Whereas overall biopharmaceutical R&D spending seems to have reached a plateau (Phrma.org 2015; Fig. 1b), the number of new molecular entities (NMEs) approved by the FDA has increased in recent years, from an average of 22 per year between 2006 and 2010 to 36 per year between 2011 and 2015 (Fig. 1c). In a recent analysis, Smietana et al. (2015) used a novel metric, the so-called vintage index, to assess changes in R&D productivity over time. The index is defined as the revenue generated over 7 years by the drugs launched in a given year divided by the R&D costs over the previous 7 years. Following a steady decline over the previous 15 years, the industry-wide vintage index nearly doubled between 2011 and 2014. These are encouraging signals that measures to recover R&D productivity have been successful, but longer-term data are needed to show that this trend is sustainable and not just a transient deviation from Eroom's law.
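The vintage index as described here is a simple ratio of two seven-year sums. The following sketch assumes that the revenue window starts in the launch year and that neither revenues nor costs are discounted or adjusted for inflation; these are simplifications for illustration, not necessarily the choices made by Smietana et al. (2015). The function name and input figures are hypothetical.

```python
from typing import Sequence


def vintage_index(cohort_revenue: Sequence[float],
                  prior_rd_costs: Sequence[float],
                  window: int = 7) -> float:
    """Revenue of one launch cohort over `window` years divided by R&D costs
    over the `window` years preceding the launch year.

    cohort_revenue: annual revenues of the drugs launched in a given year,
                    starting with the launch year (an assumption of this sketch).
    prior_rd_costs: R&D costs for each of the preceding years.
    """
    if len(cohort_revenue) != window or len(prior_rd_costs) != window:
        raise ValueError(f"expected {window} annual values in each series")
    return sum(cohort_revenue) / sum(prior_rd_costs)


# Hypothetical figures in billion USD, for illustration only
revenue = [1.0, 2.0, 3.0, 4.0, 4.0, 4.0, 3.0]  # sums to 21
rd_costs = [3.0] * 7                            # sums to 21
print(vintage_index(revenue, rd_costs))         # 1.0: the cohort earned back the prior R&D spend
```

By construction, an index above 1 means that a launch cohort's seven-year revenue exceeds the preceding seven years of R&D spending, and a doubling of the index means twice the revenue per unit of prior R&D spend.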

Risk management in pharmaceutical R&D
Drug discovery is the management of risk. Embarking on a new drug discovery project can be the start of a journey that may last well over 12 years with a price tag in excess of several hundred million euros. Yet, more than 99 % of all drug discovery projects will not result in an approved product.
Thus, most of the resources in pharmaceutical research and development will eventually be spent not on the few molecules that make it to the market but primarily on the many molecules and projects that fail to do so. The reasons for failure can be manifold, ranging from the wrong choice of target and the inability to find suitable lead compounds to unexpected toxicity or lack of efficacy in clinical trials. Moreover, projects may also be terminated for reasons that are not primarily scientific, such as repeated strategy changes in pharmaceutical companies, changes in drug legislation or in the criteria and conditions for regulatory approval, a successful competitor, or weak intellectual property. Even drugs that have been approved by the regulatory authorities may fail economically: due to cost containment measures in the healthcare systems of many countries, they may simply not generate a sufficient return on the primary investment to fund further R&D.
Thus, when starting a new drug discovery project, potential risks have to be carefully evaluated and mitigation strategies developed. This applies to risks associated with the scientific aspects of the new project as well as to the aforementioned strategic, regulatory, and commercial risks. On the scientific side, two major types of risk can be distinguished: technical and translational risks.
As an attempt at a definition, technical risks can be described as those that lead to an inability to find and sufficiently characterize the right compound that meets the required profile to address the chosen target. Technical risks may present themselves as, for example, the lack of suitable bioassays, inability to generate the appropriate tools or reagents like cell models or antibodies, lack of selectivity toward the target, unsuitable pharmacokinetics, or toxicity of the generated molecules. A more comprehensive list of technical risks is given in Table 1.
Translational risks, on the other hand, can be defined as those responsible for insufficient clinical efficacy even for molecules with an otherwise perfect profile. This lack of efficacy can, for example, be the consequence of the wrong choice of target, the use of models that are not predictive of human disease, or failure of the molecule to engage the target in the clinical situation. Typical translational risks are summarized in Table 2. The most important difference between technical and translational risks is that the former can usually be mitigated by investing more resources, e.g., synthesizing and testing more molecules or making repeated and different attempts at generating tools and models. Translational risks, however, cannot be addressed by more "shots on goal" but only by a rigorous and comprehensive investigation of the science behind the target and its link to human disease, and by the identification of biomarkers to find the right patients for the envisaged therapy, to monitor target engagement, and to serve as surrogates for clinical outcomes.
Table 1 Technical risks (excerpt)
• Potential for drug-drug interactions: CYP450 inhibition or induction
• Adverse effects in standard safety tests (e.g., hERG, genotoxicity)
• Toxicity
Notably, in a recent analysis of factors leading to failures in preclinical and clinical development (Cook et al. 2014), lack of clinical efficacy in phase II was identified as the most frequent cause of failure, and insufficient linkage of the target to the disease as the most important reason for lack of clinical efficacy. Moreover, the phase II success rate was nearly twofold higher when human genetic evidence linked the target to the disease indication and more than threefold higher when efficacy biomarkers were available at the start of phase II. Thus, addressing translational risks very early, already at the target identification stage, can dramatically improve success rates in clinical development.
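Because a compound must pass every phase, its overall likelihood of approval is the product of its phase transition probabilities; raising the phase II success rate therefore raises the overall likelihood of approval by the same factor when the other phases are unchanged. The sketch below illustrates this with hypothetical transition probabilities chosen only for illustration; they are not the values reported by Cook et al. (2014) or Hay et al. (2014).

```python
from math import prod
from typing import Sequence


def likelihood_of_approval(transition_probs: Sequence[float]) -> float:
    """Overall probability of approval: the product of the phase transition probabilities."""
    if not all(0.0 <= p <= 1.0 for p in transition_probs):
        raise ValueError("probabilities must lie between 0 and 1")
    return prod(transition_probs)


# Hypothetical transition probabilities:
# phase I -> II, phase II -> III, phase III -> submission, submission -> approval
baseline = [0.6, 0.3, 0.6, 0.9]
# Same pipeline with the phase II success rate doubled, e.g., through human genetic target validation
genetically_supported = [0.6, 0.6, 0.6, 0.9]

loa_base = likelihood_of_approval(baseline)
loa_genetic = likelihood_of_approval(genetically_supported)
print(f"baseline: {loa_base:.1%}, with genetic support: {loa_genetic:.1%}")
```

With these made-up numbers, the overall likelihood of approval rises from about 9.7 % to about 19.4 %, i.e., it doubles together with the phase II success rate.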
This insight has led to a paradigm shift in pharmaceutical drug discovery, away from a high-throughput, multiple shots on goal approach based on single target genes being regulated in animal models of disease to a more translational approach, starting with the interrogation of human disease biology and the identification of clinical biomarkers and then back-translating these findings into appropriate experimental model systems and molecules addressing these systems.
Additionally, as a key approach to reducing phase II attrition, clinical proof of concept has to be established as early as possible, preferably in a phase I setting or even in parallel to preclinical drug discovery, e.g., by using approved molecules targeting the same mechanism and/or biomarkers as surrogate endpoints (Paul et al. 2010). Thus, priority should be given to targets with strong clinical evidence, appropriate reagents and tools to model the clinical situation in a preclinical setting, and available biomarkers for target engagement, disease progression, or as surrogate endpoints. To aid in prioritization, a translational scoring algorithm has been developed to systematically assess and quantify translational risks (Wehling 2009; Wendler and Wehling 2012).
A prominent example in which selecting a drug target based on human genetics and available biomarkers led to the rapid and successful generation and development of a drug is the secreted circulating protease proprotein convertase subtilisin/kexin type 9 (Pcsk9). In 2003, mutations in Pcsk9 were identified in humans that cause familial hypercholesterolemia (Abifadel et al. 2003) via a gain-of-function mechanism resulting in overproduction of apoB100 (Ouguerram et al. 2004). Conversely, humans carrying a loss-of-function mutation in their Pcsk9 gene were shown to have very low LDL levels and to be protected against coronary heart disease (Cohen et al. 2006). These findings prompted several companies to develop neutralizing monoclonal antibodies against Pcsk9, which have since demonstrated strong clinical efficacy in reducing LDL cholesterol and cardiovascular events on top of statin therapy (Robinson et al. 2015; Sabatine et al. 2015) and received regulatory approval in the third quarter of 2015. Thus, two key factors led to new drugs being approved less than 12 years after the publication of the first basic science findings: (1) strong clinical evidence, a so-called "experiment of nature", demonstrating that loss of function of a particular gene provides clinical benefit without obvious adverse effects, and (2) biomarkers such as LDL-C that allow for a rapid proof of concept and are accepted surrogate outcome markers.

Reproducibility of published findings
Another important risk factor in biopharmaceutical R&D that has so far not been fully and systematically analyzed is the solidity and reproducibility of published data or, more precisely, the lack thereof. Typically, interest in exploring a novel therapeutic target is sparked by literature data describing a role for this target in a disease context, predominantly in an accepted rodent model of disease. However, recent analyses by pharmaceutical industry researchers showed that the majority of these published studies could not be reproduced under well-controlled and standardized conditions. Prinz et al. (2011) investigated 57 drug discovery projects in the fields of cardiovascular disease, women's health, and oncology and concluded that in more than two thirds of them there were major discrepancies with published data, leading to project termination. Another analysis (Begley and Ellis 2012) came to even more dramatic conclusions: of 53 landmark studies in oncology, many of them published in high-profile journals, the primary scientific findings could be reproduced in only six (11 %) cases. Other studies have found irreproducibility rates of 50 % and above across a broader range of the biomedical literature (Vasilevsky et al. 2013; Hartshorne and Schachner 2012). Although these estimates have to be taken with a grain of salt because the term "reproducibility" is not well defined, the enormous gap between these numbers and the conventionally assumed upper false-positive rate of 5 % (corresponding to p = 0.05) is alarming. Besides its scientific implications and the resulting loss of confidence and trust, this lack of reproducibility has a substantial economic impact: Freedman et al. (2015) recently estimated that, in the USA alone, 28 billion dollars are spent annually on preclinical research that is not reproducible.
Table 2 Translational risks (excerpt)
• Insufficient human evidence, e.g., from genetics or tool compounds
• Lack of (surrogate) biomarkers to demonstrate target engagement and treatment efficacy
• Inability to identify the right patient population for treatment
• No add-on to existing therapy
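A significance threshold of p = 0.05 limits the false-positive rate only among hypotheses that are actually false; the share of false findings among "significant" results additionally depends on the pre-study probability that a tested hypothesis is true and on statistical power, as argued by Ioannidis (2005). The sketch below computes this share under simplifying assumptions (no bias, a single test per hypothesis); the prior of 10 % and the two power values are hypothetical.

```python
def positive_predictive_value(prior: float, power: float, alpha: float = 0.05) -> float:
    """Fraction of statistically significant findings that reflect true effects.

    prior: fraction of tested hypotheses that are actually true (pre-study probability)
    power: probability of detecting a true effect (1 - beta)
    alpha: significance threshold, i.e., false-positive rate when no effect exists
    """
    true_positives = power * prior
    false_positives = alpha * (1 - prior)
    return true_positives / (true_positives + false_positives)


# Hypothetical scenario: 1 in 10 tested hypotheses is true
for power in (0.8, 0.2):
    ppv = positive_predictive_value(prior=0.1, power=power)
    print(f"power={power:.0%}: {1 - ppv:.0%} of significant findings are false positives")
```

Under these assumptions, about 36 % of significant findings would be false at 80 % power and about 69 % at 20 % power, showing how a low prior probability and low power alone can push the share of false-positive findings far above 5 %.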
There are several potential reasons for this lack of reproducibility. First, no two experiments are the same. Living organisms in particular are complex systems with many variables, such as strain, age, sex, breeding and housing conditions, and diet, leading to variable results when studies are done by different researchers at different institutions, even when the overall study design and the investigated parameters are very similar.
Second, improper study design and incorrect or inappropriate statistical analysis, including insufficient sample sizes, may lead to unreliable or even false conclusions (Prinz et al. 2011; Ioannidis 2005).
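Insufficient sample size can be addressed before a study starts with a power calculation. The sketch below uses the standard normal approximation for a two-sided comparison of two group means with a standardized effect size d (difference in means divided by the common standard deviation); the effect sizes are hypothetical, and exact calculations based on the t distribution give slightly larger numbers.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate number of animals per group for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * ((z_{1 - alpha/2} + z_{power}) / d) ** 2, rounded up.
    """
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


for d in (2.0, 1.0, 0.5):
    print(f"d = {d}: {n_per_group(d)} animals per group")
```

Under this approximation, detecting a large effect (d = 1) with 80 % power at α = 0.05 requires about 16 animals per group, and halving the effect size roughly quadruples the requirement to 63.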
Third, biological reagents are frequently not sufficiently quality-controlled or inappropriately applied. Cell lines used in biomedical research, for example, are often misidentified or contaminated (Lorsch et al. 2014). A study on 122 different cancer cell lines showed that 30 % of them were not correctly identified (Zhao et al. 2011). Multidrug-resistant MCF-7 breast adenocarcinoma cells were used over two decades and in over 300 studies before they were demonstrated to be ovarian adenocarcinoma cells (Freedman et al. 2015;Liscovitch and Ravid 2007).
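Cell line identity is commonly checked by short tandem repeat (STR) profiling, in which a sample's profile is compared with a reference profile. The sketch below implements a Tanabe-style percent match (twice the number of shared alleles divided by the total number of alleles in both profiles, over loci typed in both). The allele calls are hypothetical, the function names are illustrative, and the 80 % cut-off is a commonly used threshold passed as a parameter rather than a prescription; the applicable authentication standard should be consulted.

```python
from typing import Dict, Set

Profile = Dict[str, Set[str]]  # STR locus name -> set of allele calls at that locus


def str_percent_match(query: Profile, reference: Profile) -> float:
    """Tanabe-style percent match: 2 x shared alleles / (alleles in query + alleles in reference) x 100.

    A homozygous locus contributes one allele; only loci typed in both profiles are compared.
    """
    common_loci = query.keys() & reference.keys()
    shared = sum(len(query[locus] & reference[locus]) for locus in common_loci)
    total = sum(len(query[locus]) + len(reference[locus]) for locus in common_loci)
    if total == 0:
        raise ValueError("profiles share no typed loci")
    return 100.0 * 2 * shared / total


def is_probably_same_line(query: Profile, reference: Profile, threshold: float = 80.0) -> bool:
    """Flag a match at or above `threshold` percent (80 % is a commonly used cut-off)."""
    return str_percent_match(query, reference) >= threshold


# Hypothetical allele calls at four common STR loci, for illustration only
lab_stock = {"TH01": {"6", "9.3"}, "D5S818": {"11", "12"}, "TPOX": {"8", "11"}, "vWA": {"14", "15"}}
reference = {"TH01": {"6", "9.3"}, "D5S818": {"11", "12"}, "TPOX": {"8"}, "vWA": {"16", "18"}}

score = str_percent_match(lab_stock, reference)
print(f"{score:.1f} % match; same line: {is_probably_same_line(lab_stock, reference)}")
```

Here 5 of 15 alleles are shared, giving a 66.7 % match, so the hypothetical lab stock would be flagged as not matching its reference.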
Fourth, research is hypothesis-driven, and researchers are often inclined to give more weight to findings that support their hypothesis than to experimental results that contradict it. It is a widespread phenomenon that negative outcomes are only rarely published (Fanelli 2012; Kyzas et al. 2007; Sena et al. 2010). This hypothesis bias is reinforced by the policy of most scientific publishers and journals to publish only results that come with a good story line and to reject findings or datasets with an ambiguous outcome.
Fifth, there is enormous competition and pressure from supervisors and funding bodies to present complete and conclusive data, and continued employment or funding of future work often depends on publications in high-impact journals. It has been reported that a paper is more likely to support the tested hypothesis when the corresponding author works in a highly competitive environment (Fanelli 2010), where publication pressure may be higher. In extreme cases, this can lead to willful disregard of contradictory data or even outright data fabrication.
Therefore, the following practical measures should be taken to improve the replicability and reproducibility of published findings. Needless to say, they cannot be implemented and "lived" by a single party but require active participation by all stakeholders involved, including but not limited to investigators, supervisors, reviewers, editors, funding bodies, and committees (Landis et al. 2012).
1. Develop a "confirmation mindset": Before engaging in further studies, especially in a long and costly drug discovery project, it is mandatory to repeat the published studies that provide the project's biological rationale and to confirm their major conclusions. Ideally, this should be incentivized by journals and funding agencies. Instead of being a frequent reason for rejecting a manuscript, the first independent replication of an already published finding should also be acceptable for publication. Alternatively, for a potential high-profile publication, journals may ask for confirmation of the major results by an independent party before accepting a manuscript. Funding agencies or resource allocation committees should also prioritize proposals that seek to confirm published data before extensive and costly follow-up experiments are performed. For example, future grants could provide a first tranche of funding to replicate important earlier findings before granting a second, larger tranche for further investigations based on published data that have not yet been independently confirmed.

2. Full transparency on protocols, materials, and methods: Ensure that all experimental details necessary to replicate a set of experiments are described comprehensively and that protocols, analysis plans, and raw data are included whenever possible. Most journals provide space in online supplements for this information. Recommended standards for rigorous study design, conduct, and analysis have been published (Landis et al. 2012; Ioannidis et al. 2014). However, it is still not uncommon to find papers that omit even very basic information, such as the strain, age, and sex of the animals used in the study, the composition of the diet, and whether and how animals were randomized or the study was blinded.
3. Quality control: Establish quality control procedures for reagents and materials used in experimental studies. Encourage the use of validated reagents. Ensure that quality control and reagent validation are required by publishers (Freedman et al. 2015).
4. Scientific rigor: As described above, publication pressure and hypothesis bias have led researchers to seek short-term success by "getting the story right" rather than to pursue a rigorous and robust approach that takes longer and may produce less spectacular but more valid results, following a "rather be first than be right" strategy. For this to change, investigators, promotion committees, and funding bodies have to initiate a cultural change. Instead of relying on a reward system centered on impact factors, institutions and committees should also judge researchers on the methodological rigor and quality of their research and the reproducibility of their findings. Moreover, more credit should be given for teaching and mentoring (Begley and Ellis 2012) and for training the next generation of scientists to rigorously design, conduct, analyze, and report their studies.

5. Robust study design and appropriate use of statistics: Preclinical studies in particular are often performed in an exploratory setting, without a predefined primary outcome, unblinded, and on a small number of animals. Such experiments have to be viewed and interpreted as hypothesis generating rather than hypothesis testing (Landis et al. 2012). For robust hypothesis testing, preclinical studies should be performed in settings that resemble randomized controlled clinical trials, with appropriate sample sizes, stringent protocols, and predefined measures of success. This may even be taken a step further to multicenter studies at several laboratories that, like clinical trials, are centrally coordinated, with data being processed and analyzed under a single protocol, as recently described for the testing of an anti-CD49d antibody in an experimental model of stroke (Llovera et al. 2015; Tymianski 2015). It is also startling how many papers are still published despite containing basic statistical mistakes (Vaux 2012). Institutions therefore have to ensure that students and researchers are trained in proper study design and use of statistics, and reviewers and editors have to pay special attention to verifying the appropriate use of statistics. High-impact journals in particular should be encouraged to have the validity of statistical analyses verified by professional biostatisticians with access to the experimental raw data before accepting manuscripts for publication. Where appropriate, raw data should be provided in a supplement.
6. Embrace ambiguity: It is often the case that the results within a set of experiments testing a specific hypothesis are not fully consistent. However, in an environment where journals, promotion committees, and funding agencies strive for the perfect story and are not prepared to accept ambiguity, it is tempting to deprioritize datasets that do not fit the story and to publish only those parts of the work that support the working hypothesis and are consistent with each other. This not only gives a distorted picture but may also lead to misinterpretation and false conclusions for parts of the study. What is wrong with, for example, saying that "we have found a robust and very interesting phenotype, but the experiments interrogating its molecular mechanism have led to contradictory findings and not given a clear result"? Such a statement would make clear what is robust and what deserves further investigation, dialogue with other researchers, and perhaps collaborative efforts to resolve. Investigators, journals, and funding bodies should realize that there is nothing wrong with such a statement, that it often reflects reality, and that gaps in stories may provide opportunities for further research (Begley and Ellis 2012).
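Randomized allocation and blinding, as recommended above for preclinical studies, require very little tooling. The following is a minimal sketch using Python's standard `random` module; the function name, animal IDs, group names, seed, and code format are hypothetical, and it shows simple randomization into equally sized groups only.

```python
import random
from typing import Dict, List, Sequence, Tuple


def randomize_and_blind(animal_ids: Sequence[str],
                        groups: Sequence[str],
                        seed: int) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Allocate animals at random to equally sized groups and assign blinded sample codes.

    Returns (allocation, blinded_codes). The allocation key should be held by someone
    not involved in dosing or outcome assessment; experimenters work only with the codes.
    """
    if len(animal_ids) % len(groups) != 0:
        raise ValueError("number of animals must be a multiple of the number of groups")
    rng = random.Random(seed)  # a recorded seed makes the allocation reproducible and auditable
    per_group = len(animal_ids) // len(groups)
    labels: List[str] = [group for group in groups for _ in range(per_group)]
    rng.shuffle(labels)  # random order of treatment labels gives a random allocation
    allocation = dict(zip(animal_ids, labels))
    # Unique four-digit codes that reveal nothing about the treatment group
    codes = rng.sample(range(1000, 10000), len(animal_ids))
    blinded_codes = {animal: f"S{code}" for animal, code in zip(animal_ids, codes)}
    return allocation, blinded_codes


# Hypothetical study: 12 mice, vehicle vs. treatment
animals = [f"M{i:02d}" for i in range(1, 13)]
allocation, blinded = randomize_and_blind(animals, ["vehicle", "treatment"], seed=20150601)
print({group: list(allocation.values()).count(group) for group in ["vehicle", "treatment"]})
```

Stratified or block randomization (e.g., by sex or body weight) would be a natural extension but is not implemented in this sketch.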

Conclusions
Drug discovery has changed significantly over the last decade. Previously, drug discovery was a process-driven, technology-oriented, almost factory-like high-throughput approach, starting with a gene regulated in an animal model of disease (or the parallel interrogation of many disease-related genes), proceeding via the identification and optimization of compounds to their characterization in preclinical models, and finally reaching the clinic, where, however, phase II attrition rates have been very high (Scannell et al. 2012; Cook et al. 2014). To reduce attrition at the clinical proof-of-concept stage, this "multiple shots on goal" paradigm is gradually being replaced by a more translational approach in which solid clinical evidence of target relevance, the availability of biomarkers, and a thorough mechanistic understanding of the link between target and disease are sought very early in the drug discovery process, and projects are ranked according to translatability rather than technical feasibility or "druggability". Besides these process changes, there are also cultural and mindset changes that will help to further reduce the cost of drug discovery research and hence improve its productivity. In apparent contrast to widespread scientific practice, experiments should be designed and executed so that they can clearly demonstrate that the initial hypothesis is wrong (so-called no-go experiments), as failing early saves time and resources that would otherwise be spent in vain. Such a "fail early" paradigm should also be reflected in the company culture and in incentives for researchers to stop projects based on rigorous science and to count this as a success rather than a failure.
Target selection, the most important decision in a drug discovery project, is often based on literature data. However, various analyses have shown that a majority of the published findings they examined could not be reproduced by others. Therefore, replication of crucial studies is mandatory before embarking on a long and costly drug discovery program. Furthermore, all stakeholders involved in biomedical research should make efforts to recognize and reward reproducibility and to accept and even embrace ambiguity, as this will not only improve confidence and trust but also stimulate further research and scientific dialogue.