Reproducible pharmacogenomic profiling of cancer cell line panels

Haverty, Peter M.; Lin, Eva; Tan, Jenille; Yu, Yihong; Lam, Billy; Lianoglou, Steve; Neve, Richard M.; Martin, Scott; Settleman, Jeff; Yauch, Robert L.; Bourgon, Richard

doi:10.1038/nature17987

Analysis
Published: 18 May 2016

Nature, Volume 533, pages 333–337 (2016)


Abstract

The use of large-scale genomic and drug response screening of cancer cell lines depends crucially on the reproducibility of results. Here we consider two previously published screens, plus a later critique of these studies. Using independent data, we show that consistency is achievable, and provide a systematic description of the best laboratory and analysis practices for future studies.


**Figure 1: gCSI, CCLE and GDSC drug response statistics reveal different patterns of response for specifically versus broadly active drugs.**

**Figure 2: Highly concordant sensitive and resistant labels assigned via mean viability.**

**Figure 3: Consistent biomarker scoring for gCSI and CCLE drug response data.**

**Figure 4: Follow-up screen shows SYTO 60 assay closely agrees with CTG but is significantly less precise.**


References

1. Barretina, J. et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603–607 (2012).
2. Garnett, M. J. et al. Systematic identification of genomic markers of drug sensitivity in cancer cells. Nature 483, 570–575 (2012).
3. Haibe-Kains, B. et al. Inconsistency in large pharmacogenomic studies. Nature 504, 389–393 (2013).
4. Friedman, J., Hastie, T. & Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1–22 (2010).
5. Ein-Dor, L., Kela, I., Getz, G., Givol, D. & Domany, E. Outcome signature genes in breast cancer: is there a unique set? Bioinformatics 21, 171–178 (2005).
6. Chan, G. K. Y., Kleinheinz, T. L., Peterson, D. & Moffat, J. G. A simple high-content cell cycle assay reveals frequent discrepancies between cell number and ATP and MTS proliferation assays. PLoS ONE 8, e63583 (2013).
7. Greshock, J. et al. Molecular target class is predictive of in vitro response profile. Cancer Res. 70, 3677–3686 (2010).
8. Cancer Cell Line Encyclopedia Consortium & Genomics of Drug Sensitivity in Cancer Consortium. Pharmacogenomic agreement between two cancer cell line data sets. Nature 528, 84–87 (2015).
9. Yu, M. et al. A resource for cell line authentication, annotation and quality control. Nature 520, 307–311 (2015).
10. Jiang, Y., Hascall, D., Li, D. & Pease, J. H. A charged aerosol detector/chemiluminescent nitrogen detector/liquid chromatography/mass spectrometry system for regular and fragment compound analysis in drug discovery. J. Chromatogr. A 1411, 63–68 (2015).
11. Dawes, T. D. et al. Compound transfer by acoustic droplet ejection promotes quality and efficiency in ultra-high-throughput screening campaigns. J. Lab. Autom. 21, 64–75 (2016).
12. McDermott, U. et al. Identification of genotype-correlated sensitivity to selective kinase inhibitors by using high-throughput tumor cell line profiling. Proc. Natl Acad. Sci. USA 104, 19936–19941 (2007).
13. Klijn, C. et al. A comprehensive transcriptional portrait of human cancer cell lines. Nature Biotechnol. 33, 306–312 (2015).
14. Zoppoli, G. et al. Putative DNA/RNA helicase Schlafen-11 (SLFN11) sensitizes cancer cells to DNA-damaging agents. Proc. Natl Acad. Sci. USA 109, 15030–15035 (2012).
15. Duan, Z., Lamendola, D. E., Duan, Y., Yusuf, R. Z. & Seiden, M. V. Description of paclitaxel resistance-associated genes in ovarian and breast cancer cell lines. Cancer Chemother. Pharmacol. 55, 277–285 (2005).
16. Wu, T. D. & Nacu, S. Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics 26, 873–881 (2010).
17. Anders, S. & Huber, W. Differential expression analysis for sequence count data. Genome Biol. 11, R106 (2010).
18. Greenman, C. D. et al. PICNIC: an algorithm to predict absolute allelic copy number variation with microarray cancer data. Biostatistics 11, 164–175 (2010).
19. Rudin, C. M. et al. Comprehensive genomic analysis identifies SOX2 as a frequently amplified gene in small-cell lung cancer. Nature Genet. 44, 1111–1116 (2012).
20. Tibshirani, R. & Wang, P. Spatial smoothing and hot spot detection for CGH data using the fused lasso. Biostatistics 9, 18–29 (2008).
21. Yourshaw, M., Taylor, S. P., Rao, A. R., Martín, M. G. & Nelson, S. F. Rich annotation of DNA sequencing variants by leveraging the Ensembl Variant Effect Predictor with plugins. Brief. Bioinform. 16, 255–264 (2015).
22. Forbes, S. A. et al. COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Res. 39, D945–D950 (2011).
23. González-Pérez, A. & López-Bigas, N. Improving the assessment of the outcome of nonsynonymous SNVs with a consensus deleteriousness score, Condel. Am. J. Hum. Genet. 88, 440–449 (2011).
24. Barretina, J. et al. Addendum: The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 492, 290 (2012).
25. Sigoillot, F. D. & King, R. W. Vigilance and validation: Keys to success in RNAi screening. ACS Chem. Biol. 6, 47–60 (2011).
26. Fraley, C. & Raftery, A. E. Model-based clustering, discriminant analysis and density estimation. J. Amer. Statist. Assoc. 97, 611–631 (2002).


Acknowledgements

We thank G. Manning, T. Sandmann, B. Forrest and D. Stokoe for their valuable contributions to improving the manuscript. We also thank L. Shi and J. Wu for work on early phases of gCSI screening; R. Rodriguez, G. Yuen and D. Hascall for help preparing drug plates; Y. Jiang for assistance with drug plate quality control; and S. Selvaraj and M. Yu for banking and quality control of cell lines used in this study. This manuscript contains an analysis of data released by the Broad Institute (CCLE) and by the GDSC members. Those who carried out the original analysis and collection of these data bear no responsibility for the further analysis or interpretation of it. The relevant subset of the CCLE data are included in the supplementary software package with the explicit written consent of the CCLE group. The GDSC data has been included following the GDSC’s instructions for attribution and under their specified open-source license, Creative Commons 3.0.

Author information

Peter M. Haverty and Eva Lin: These authors contributed equally to this work.

Authors and Affiliations

Department of Bioinformatics and Computational Biology, Genentech Inc., 1 DNA Way, South San Francisco, California, 94080, USA
Peter M. Haverty, Steve Lianoglou & Richard Bourgon
Department of Discovery Oncology, Genentech Inc., 1 DNA Way, South San Francisco, California, 94080, USA
Eva Lin, Jenille Tan, Yihong Yu, Billy Lam, Richard M. Neve, Scott Martin, Jeff Settleman & Robert L. Yauch


Contributions

P.M.H. and R.B. drafted the manuscript, prepared figures and tables and designed the factorial screen. P.M.H. and R.B. performed computational analysis and interpreted results. S.L. contributed to the development of elastic net biomarker identification software. J.S. and R.L.Y. contributed to the experimental design, data analysis, and manuscript preparation. S.M. contributed to manuscript preparation and data interpretation. E.L., J.T., Y.Y. and B.L. performed primary cell-based screening experiments. E.L. and R.M.N. designed and managed the cell-based screening experiments.

Corresponding author

Correspondence to Richard Bourgon.

Ethics declarations

Competing interests

All authors are employees of Genentech Inc. and may be stockholders of Roche Pharmaceuticals.

Extended data figures and tables

Extended Data Figure 1 Comparison of gCSI, CCLE and GDSC mean viability response statistics: all pairwise comparisons.

a, b, IC₅₀ values (the drug concentration estimated to produce a 50% reduction in viability relative to untreated cells) for the cell lines in Fig. 1. Because extrapolation beyond tested doses can produce statistically unstable results (Extended Data Fig. 2b), IC₅₀ values are truncated to the tested dose range (which varies by drug and project). c, d, Mean viability (equivalent, up to scaling by the width of the tested log-dose range, to the area under the log-dose/response curve) is plotted for all drugs that gCSI had in common with one or both of the other studies. 0% corresponds to complete growth inhibition at all doses; 100% corresponds to viability equal, on average, to that of untreated control wells. Values are capped at 100% for consistency with published GDSC data. gCSI mean viability and IC₅₀ values were derived from three biological replicates (Methods).
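The legend defines both statistics in words only. As a minimal sketch of the difference between them, the Python below computes mean viability for a single dose series as the trapezoidal area under the log₁₀-dose/viability curve divided by the width of the tested log-dose range, and an absolute IC₅₀ by linear interpolation on log dose, truncated to the tested range. The trapezoidal rule, the interpolation, the helper names and the example values are assumptions for illustration; the three studies estimate these statistics with their own procedures (Methods), so real values will differ.

```python
import numpy as np

def _sorted_log_curve(doses, viability):
    """Return log10 doses and viability (%) sorted by dose."""
    x = np.log10(np.asarray(doses, dtype=float))
    y = np.asarray(viability, dtype=float)
    order = np.argsort(x)
    return x[order], y[order]

def mean_viability(doses, viability, cap=100.0):
    """Average viability (%) over the tested log10-dose range.

    Trapezoidal area under the log-dose/response curve divided by the width
    of the log-dose range, capped at `cap` percent (assumed GDSC-style cap).
    """
    x, y = _sorted_log_curve(doses, viability)
    area = np.sum((y[1:] + y[:-1]) / 2.0 * np.diff(x))
    return float(min(area / (x[-1] - x[0]), cap))

def truncated_ic50(doses, viability):
    """Absolute IC50 by log-linear interpolation, truncated to tested doses."""
    x, y = _sorted_log_curve(doses, viability)
    if y[0] <= 50.0:               # already at or below 50% at the lowest dose
        return float(10 ** x[0])
    if np.all(y > 50.0):           # 50% inhibition never reached: report top dose
        return float(10 ** x[-1])
    i = int(np.argmax(y <= 50.0))  # first tested dose at or below 50% viability
    frac = (y[i - 1] - 50.0) / (y[i - 1] - y[i])
    return float(10 ** (x[i - 1] + frac * (x[i] - x[i - 1])))

# Hypothetical four-dose series (concentrations in arbitrary units).
doses = [0.01, 0.1, 1, 10]
print(mean_viability(doses, [100, 80, 40, 20]))   # approximately 60.0
print(truncated_ic50(doses, [100, 80, 40, 20]))   # approximately 0.562
print(truncated_ic50(doses, [100, 98, 95, 90]))   # resistant line: top dose, approximately 10.0
```

The resistant example shows why truncation matters: without it, an IC₅₀ for a curve that never crosses 50% would have to be extrapolated, whereas mean viability uses only the tested doses.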

Extended Data Figure 2 Assignment of sensitive and resistant labels for specifically active agents.

a–c, Untruncated mean viability statistics from CCLE and gCSI are plotted for three drugs with specific activity. Two-component Gaussian mixture models were fitted to these values to separate non-responder and responder distributions, separately for CCLE and gCSI²⁶. Non-responder mean viability statistics were scattered around 100%, providing a good indication of assay noise levels. Point colour indicates hard sensitive (red) versus resistant (blue) thresholding based on the likelihood ratio, a strategy that ignores uncertainty of classification. Dotted lines, by contrast, indicate the final thresholds selected by the method used for Fig. 2; this method quantifies uncertainty and, if it is too high, assigns a no-call label to the cell line (Methods). Ellipses indicate the covariance from a two-dimensional, two-component fit and are for illustrative purposes only; response labels were assigned on the basis of individual fits for gCSI and CCLE separately, using all available cell lines for each data set. d–f, Hypothetical log-dose/viability curves illustrating statistical challenges presented by the IC₅₀ response metric. In small screens, it is often feasible to adjust the dosing scale so that 50% inhibition (relative to vehicle-treated wells at the same time point) is cleanly achieved. In large-scale screening, however, many cell lines commonly fail to achieve this level of inhibition at any tested dose, fail to show a lower asymptote in their dose–response curve, or both. The mean viability statistic, in contrast, requires no extrapolation beyond tested doses. d, A hypothetical dose–response curve with characteristics ideal for relative or absolute IC₅₀ computation. Tested doses yield viability values well above and well below 50%. Furthermore, both upper and lower asymptotes can be precisely estimated, enabling use of relative IC₅₀ if desired. (Relative IC₅₀ is computed after first rescaling to the observed viability dynamic range; in the present work, however, all three studies use absolute IC₅₀.) e, A hypothetical dose–response curve that yields precise upper and lower asymptote estimates but a highly imprecise absolute IC₅₀ estimate: doses spanning more than two orders of magnitude are all consistent with 50% inhibition, and the raw data are too noisy for further refinement. Such IC₅₀ values, which are frequently observed in gCSI, CCLE and GDSC results, may complicate cross-study comparisons. f, A hypothetical dose–response curve that requires potentially imprecise extrapolation for estimation of both absolute IC₅₀ and the lower asymptote. We advise against using either absolute or relative IC₅₀ for such curves, which are again common in all three studies under consideration. Finally, note that in settings such as this, where low doses provide no information and inhibition occurs only at the highest tested doses, any response statistic is forced to rank cell lines using only a subset of the data generated. As a consequence, results for such drugs may exhibit reduced concordance regardless of analysis method.
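The legend cites model-based clustering (ref. 26) for the two-component fits. As a stand-in, the Python sketch below fits a one-dimensional, two-component Gaussian mixture to mean viability values with a few lines of expectation-maximisation, then labels each cell line from its posterior probability of belonging to the lower-mean (responder) component. Lines whose posterior falls between the two cut-offs receive a no-call label. The EM initialisation, the 0.9 posterior cut-off (`p_call`) and the function names are assumptions for illustration, not the no-call rule described in Methods.

```python
import numpy as np

def _weighted_densities(x, w, mu, sd):
    """Weighted normal densities, one column per mixture component."""
    z = (x[:, None] - mu) / sd
    return w / (sd * np.sqrt(2.0 * np.pi)) * np.exp(-0.5 * z ** 2)

def fit_two_gaussians(x, n_iter=200):
    """Fit a two-component 1-D Gaussian mixture by expectation-maximisation."""
    x = np.asarray(x, dtype=float)
    mu = np.percentile(x, [10, 90]).astype(float)  # start near the low and high tails
    sd = np.full(2, x.std() / 2.0)
    w = np.array([0.5, 0.5])
    for _ in range(n_iter):
        dens = _weighted_densities(x, w, mu, sd)
        resp = dens / dens.sum(axis=1, keepdims=True)   # E-step: responsibilities
        nk = resp.sum(axis=0)                            # M-step: update parameters
        w = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sd = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        sd = np.maximum(sd, 1e-3)  # keep a component from collapsing onto one point
    return w, mu, sd

def call_response(x, p_call=0.9):
    """Label lines sensitive or resistant, or no-call when the posterior is uncertain."""
    x = np.asarray(x, dtype=float)
    w, mu, sd = fit_two_gaussians(x)
    dens = _weighted_densities(x, w, mu, sd)
    post = dens / dens.sum(axis=1, keepdims=True)
    p_sens = post[:, np.argmin(mu)]  # responders have the lower mean viability
    return np.where(p_sens >= p_call, "sensitive",
                    np.where(p_sens <= 1.0 - p_call, "resistant", "no-call"))

# Hypothetical panel: 80 non-responders near 100%, 20 responders near 40%.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(100, 5, 80), rng.normal(40, 8, 20)])
labels = call_response(x)
```

Unlike a hard likelihood-ratio threshold, which assigns every line to one class, the two posterior cut-offs leave a band of uncertain lines unlabelled, which is the behaviour the legend contrasts with the red/blue point colouring.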

Extended Data Figure 3 Comparison of gCSI, CCLE and GDSC primary drug response statistics: only cell lines assayed by all three projects.

Equivalent to Fig. 1, but restricted to cell lines in the three-way gCSI, CCLE and GDSC intersection. Figure 1 shows a substantial number of cell lines, interrogated by both gCSI and CCLE, that are sensitive to the three specifically active drugs. Few (if any) of these are visible here, because few were included in the gCSI–GDSC intersection. a–d, Mean viability statistics (a, b) and IC₅₀ values (c, d), both truncated as for Fig. 1 and Extended Data Fig. 1. gCSI mean viability and IC₅₀ values were derived from three biological replicates (Methods).

Extended Data Figure 4 gCSI–GDSC concordance for drugs common to only gCSI and GDSC.

a, Genomic features were ranked by direction and strength of their association with GDSC mean viability (from strongest negative to strongest positive). Shaded regions indicate features with significant univariate association (FDR-adjusted P < 0.10). The univariate ranks of features selected by the elastic net using gCSI drug response are indicated with vertical bars, also coloured by direction of association. b, As for panel a, but applying the elastic net to GDSC drug response data and comparing results to univariate statistics obtained from gCSI drug response. c, As in Table 1, lines called sensitive or resistant by both gCSI and GDSC were assessed for concordant classification.
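The ranking in panels a and b can be sketched as follows: compute a signed univariate association between each genomic feature and drug response, order features from most negative to most positive, flag those with Benjamini–Hochberg-adjusted P < 0.10 (the shaded regions), and look up where features selected by a separate model fall in that ordering. Using Pearson correlation as the univariate statistic, and the function names, are assumptions for illustration; the legend does not specify the test.

```python
import numpy as np
from scipy import stats

def bh_adjust(p):
    """Benjamini-Hochberg FDR-adjusted p-values."""
    p = np.asarray(p, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Running minimum from the largest p-value down keeps adjusted values monotone.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted

def rank_features(X, y, fdr=0.10):
    """Rank features (columns of X) from strongest negative to strongest positive association."""
    r = np.empty(X.shape[1])
    p = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        r[j], p[j] = stats.pearsonr(X[:, j], y)
    ranking = np.argsort(r)              # most negative correlation first
    significant = bh_adjust(p) < fdr     # features that would be shaded
    return ranking, significant

def rank_positions(ranking, selected):
    """Position in the ranking of each externally selected feature (the vertical bars)."""
    where = np.empty_like(ranking)
    where[ranking] = np.arange(ranking.size)
    return {f: int(where[f]) for f in selected}
```

Here `X` would hold one study's genomic features (cell lines by features) and `y` the other study's mean viability for the same lines, so that selections from one data set are scored against univariate evidence from the other.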

Extended Data Figure 5 Consistent biomarker scoring for gCSI and CCLE drug response data.

The elastic net algorithm was applied to identify short lists of candidate biomarkers for drug response as assessed by either gCSI or CCLE mean viability values, using gCSI genomic data. Supplementary Table 4 describes the number of cell lines available for each comparison. a, Description of biomarkers expected to associate with response for each of the six drugs tested by both gCSI and CCLE; recovery of expected biomarkers; total selected feature count; and overlap between gCSI and CCLE selected feature sets. Asterisk denotes that ST7-AS1 is adjacent to MET and ST7-AS1 expression correlates with MET copy number (Pearson correlation r = 0.47, P < 2.2 × 10⁻¹⁶, n = 534). The gCSI genomics set does not include translocations, so ALK translocation could not be checked directly. b, Equivalent to panel a, but using CCLE rather than gCSI genomics data. c, d, Equivalent to Fig. 3a, b, but using CCLE rather than gCSI genomics data.
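The elastic net selects a short list of features by combining lasso and ridge penalties; ref. 4 describes fitting it by coordinate descent (the glmnet approach). The Python sketch below is a naive coordinate-descent fit for a single, fixed penalty on standardised features, followed by the overlap of features selected for two response vectors measured on the same genomic matrix. The fixed `lam` and `alpha` values, the absence of cross-validation, and all names and simulated data are assumptions for illustration, not the software used in this work.

```python
import numpy as np

def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def elastic_net(X, y, lam=0.1, alpha=0.5, n_sweeps=200):
    """Coordinate descent for one elastic-net penalty (lam, alpha).

    Minimises (1/2n)||y - Xb||^2 + lam * (alpha*||b||_1 + (1 - alpha)/2 * ||b||^2)
    on standardised X and centred y; returns coefficients on that scale.
    """
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # each column now has mean square 1
    y = y - y.mean()
    n, p = X.shape
    b = np.zeros(p)
    resid = y.copy()                           # y - X @ b
    for _ in range(n_sweeps):
        for j in range(p):
            z = X[:, j] @ resid / n + b[j]     # univariate fit to the partial residual
            new = soft_threshold(z, lam * alpha) / (1.0 + lam * (1.0 - alpha))
            resid -= X[:, j] * (new - b[j])
            b[j] = new
    return b

def selected_features(b):
    """Indices of features with non-zero coefficients."""
    return set(np.flatnonzero(b).tolist())

# Hypothetical data: one genomic matrix, two noisy response vectors
# (standing in for gCSI and CCLE mean viability) sharing the same signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
signal = 2 * X[:, 0] - 1.5 * X[:, 1]
y_a = signal + 0.5 * rng.normal(size=200)
y_b = signal + 0.5 * rng.normal(size=200)
shared = selected_features(elastic_net(X, y_a)) & selected_features(elastic_net(X, y_b))
```

The L1 part of the penalty sets most coefficients exactly to zero, which yields the short candidate lists whose overlap panel a reports; the L2 part keeps correlated features (such as ST7-AS1 and MET copy number) from being chosen arbitrarily one at a time.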

Extended Data Figure 6 Features identified as response biomarkers using gCSI and CCLE discretized mean viability drug response data versus gCSI genomics data.

Equivalent to Fig. 3 and Extended Data Fig. 5a, but using discretized sensitive/resistant labels rather than continuous drug response statistics (lines assigned a no-call label were ignored for this analysis). Recovery of expected biomarkers and scoring of features selected by elastic net in the opposing project’s univariate analysis are both comparable to results obtained from continuous mean viability statistics (Fig. 3 and Extended Data Fig. 5a). a, Description of biomarkers expected to associate with response for each of the six drugs tested by both gCSI and CCLE; recovery of expected biomarkers; total selected feature count; and overlap between gCSI and CCLE selected feature sets. b, Genomic features were ranked by direction and strength of their association with CCLE mean viability (from strongest negative to strongest positive). Shaded regions indicate features with significant univariate association (FDR-adjusted P < 0.10). Blue indicates negative association with response and red indicates positive association. The univariate ranks of features selected by the elastic net using gCSI drug response are indicated with vertical bars, also coloured by direction of association. In general, the gCSI-selected features are strongly enriched at the extreme ends of the distributions, indicating that they are highly relevant to both drug response data sets. c, As for panel b, but applying the elastic net to CCLE drug response data and comparing results to univariate statistics obtained from gCSI drug response.
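One way to put a number on "enriched at the extreme ends" is to score how far the selected features' rank positions lie from the centre of the ranking and compare that score with randomly drawn feature sets of the same size. The Python sketch below does this with a permutation test. The extremity score, the permutation test and the function names are our illustrative choices; the legend does not state how enrichment was assessed.

```python
import numpy as np

def extremity(positions, n_features):
    """Mean distance of rank positions from the centre of the ranking, scaled to [0, 1]."""
    pos = np.asarray(positions, dtype=float)
    centre = (n_features - 1) / 2.0
    return np.mean(np.abs(pos - centre) / centre)

def extremity_pvalue(positions, n_features, n_perm=2000, seed=0):
    """Permutation p-value: are the selected positions more extreme than random ones?"""
    rng = np.random.default_rng(seed)
    observed = extremity(positions, n_features)
    k = len(positions)
    null = np.array([
        extremity(rng.choice(n_features, size=k, replace=False), n_features)
        for _ in range(n_perm)
    ])
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (1 + np.sum(null >= observed)) / (1 + n_perm)

# Hypothetical: six selected features sitting at both ends of a 1,000-feature ranking.
obs, p = extremity_pvalue([0, 1, 2, 997, 998, 999], 1000)
```

A score near 1 means the selected features sit at the ends of the ranking, where the strongest univariate associations of either sign lie; a random set averages about 0.5.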

Extended Data Figure 7 Features identified as response biomarkers using gCSI and GDSC discretized mean viability drug response data versus gCSI genomics data.

Equivalent to Extended Data Fig. 6, but contrasting gCSI with GDSC rather than CCLE. a, Description of biomarkers expected to associate with response for each of the six drugs tested by both gCSI and GDSC; recovery of expected biomarkers; total selected feature count; and overlap between gCSI and GDSC selected feature sets. b, Genomic features were ranked by direction and strength of their association with GDSC mean viability (from strongest negative to strongest positive). Shaded regions indicate features with significant univariate association (FDR-adjusted P < 0.10). Blue indicates negative association with response and red indicates positive association. The univariate ranks of features selected by the elastic net using gCSI drug response are indicated with vertical bars, also coloured by direction of association. In general, the gCSI-selected features are strongly enriched at the extreme ends of the distributions, indicating that they are highly relevant to both drug response data sets. c, As for panel b, but applying the elastic net to GDSC drug response data and comparing results to univariate statistics obtained from gCSI drug response.

Extended Data Figure 8 Media serum concentration affects mean viability for some drugs and cell lines.

Differences in mean viability between the 10% and 5% FBS conditions are shown for the follow-up screen. Only CTG-derived mean viability values were used. a, Changes in single-run mean viability for variable seeding data, before replicates were combined. In many cases, the direction of change was not consistent. In others (for example, 769P or CAL-12T treated with PD0325901), richer media had a consistent effect in each replicate. For each drug, the overall significance of a bidirectional effect was assessed by running a paired t-test for each cell line separately and then pooling results by Fisher’s method. The overall media effect was significant for all drugs except lapatinib (erlotinib, P = 0.037; lapatinib, P = 0.288; paclitaxel, P = 0.014; PD0325901, P = 2.8 × 10⁻⁴). b, Only PD0325901 achieved an average media effect across cell lines that was both statistically significant and of practically important size (4 percentage points). Data for fixed and variable seeding are plotted together because no significant difference in mean viability shift was observed between them for any of the four drugs considered (two-sided, unpaired t-tests: erlotinib, n = 47, P = 0.86; lapatinib, n = 45, P = 0.29; paclitaxel, n = 48, P = 0.80; PD0325901, n = 46, P = 0.44). Mean viability values were derived from biological triplicate experiments (Methods). c, SYTO 60 produces more variable viability measurements than CTG. Shown is the width of the 95% confidence interval for mean viability for each of the 24 cell lines in the follow-up screen (on the viability scale; see Methods for confidence interval estimation). Symbol colour indicates seeding strategy; symbol shape indicates media condition.
For all drugs and screening conditions, the vast majority of cell lines showed wider confidence intervals for SYTO 60 than for CTG, indicating increased noise in the SYTO 60 data and reduced precision in estimation of mean viability (mean increase for erlotinib was 7 percentage points, P = 1.2 × 10⁻¹¹; for lapatinib, 8 points, P = 1.6 × 10⁻¹⁴; for paclitaxel, 5 points, P = 2.7 × 10⁻¹⁰; and for PD0325901, 5 points, P = 2.5 × 10⁻⁷). This effect is not related to seeding strategy or media serum concentration.
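The pooling in panel a (a paired t-test per cell line, combined across lines by Fisher's method) can be sketched in Python with SciPy. Because each per-line test is two-sided, a line contributes evidence whether its viability rises or falls with richer media, which is what lets the pooled test detect a bidirectional effect that would average out in a single across-line comparison. The input layout and the function name are assumptions for illustration, and the replicate values are hypothetical.

```python
import numpy as np
from scipy import stats

def pooled_media_effect(pairs_by_line):
    """Per-cell-line paired t-tests combined across lines with Fisher's method.

    pairs_by_line: hypothetical mapping {cell line: (mv_10pct, mv_5pct)}, each a
    sequence of single-run mean viability values matched by replicate.
    """
    per_line = {}
    for line, (mv_10, mv_5) in pairs_by_line.items():
        # Two-sided test, so a shift in either direction counts as evidence.
        per_line[line] = stats.ttest_rel(mv_10, mv_5).pvalue
    _, pooled = stats.combine_pvalues(list(per_line.values()), method="fisher")
    return per_line, pooled

# Hypothetical triplicates: line A drops with 10% FBS, line B rises, line C is unchanged.
data = {
    "A": ([50, 52, 51], [60, 61, 62]),
    "B": ([70, 72, 69], [60, 61, 60]),
    "C": ([40, 41, 42], [39, 42, 41.5]),
}
per_line, pooled = pooled_media_effect(data)
```

In this toy example the mean shift across lines A and B is close to zero, yet the pooled p-value is small because each of those lines shifts consistently in its own direction.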

Extended Data Figure 9 Seeding density affects drug response for multiple drugs and cell lines.

a, In the follow-up screen, the difference in estimated CTG mean viability between variable and fixed seeding is shown (with the two media conditions averaged). For broadly active compounds, slow-growing lines typically yielded higher mean viability estimates under variable seeding and therefore produced a positive difference. Fast-growing lines, on the other hand, tended to yield lower mean viability estimates under variable seeding (Spearman’s ρ test, P = 1.71 × 10⁻⁵). This trend was not significant for specifically active drugs (P = 8.77 × 10⁻²). Lines of average growth rate, for which fixed and variable seeding were essentially the same and thus no difference was expected, are indicated in the dashed rectangle. b, Similar to panel a, but showing the difference between gCSI (variable seeding) and CCLE (fixed seeding). Because mean viability statistics are not directly comparable when different dosing schedules are used, residuals from a regression of gCSI mean viability on CCLE mean viability are plotted in the heat maps rather than a simple difference. The line plot shows a moving average (75-cell-line window) of the row averages of each heat map. As for the follow-up screen, a significant association was seen between cell line growth rate and the impact of seeding density for broadly active drugs (Spearman’s ρ test, P = 7.76 × 10⁻⁹), but not for specifically active drugs (P = 1.12 × 10⁻¹). Mean viability values were derived from biological triplicate experiments (Methods).
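The panel b summary can be sketched in Python: for each drug, regress gCSI mean viability on CCLE mean viability and keep the residuals; order cell lines by growth rate; average residuals across drugs for each line (the heat-map row averages); smooth them with a 75-line moving average; and test the association with growth rate using Spearman's ρ. Fitting one ordinary least-squares line per drug, the `growth_rate` input (higher meaning faster growth), the function names and the simulated data are assumptions for illustration.

```python
import numpy as np
from scipy import stats

def regression_residuals(mv_gcsi, mv_ccle):
    """Residuals of an ordinary least-squares fit of gCSI on CCLE mean viability."""
    slope, intercept = np.polyfit(mv_ccle, mv_gcsi, deg=1)
    return mv_gcsi - (intercept + slope * mv_ccle)

def moving_average(values, window=75):
    """Unweighted moving average; the output has len(values) - window + 1 points."""
    return np.convolve(values, np.ones(window) / window, mode="valid")

def seeding_summary(growth_rate, mv_gcsi, mv_ccle, window=75):
    """growth_rate: (n_lines,); mv_gcsi, mv_ccle: (n_lines, n_drugs) mean viability."""
    resid = np.column_stack([
        regression_residuals(mv_gcsi[:, d], mv_ccle[:, d])  # one fit per drug
        for d in range(mv_gcsi.shape[1])
    ])
    order = np.argsort(growth_rate)          # heat-map rows, slowest to fastest
    row_means = resid[order].mean(axis=1)
    rho, p = stats.spearmanr(growth_rate[order], row_means)
    return moving_average(row_means, window), rho, p

# Hypothetical panel: slow growers read higher under variable seeding.
rng = np.random.default_rng(0)
g = rng.uniform(0.2, 2.0, 200)
ccle = rng.uniform(20, 100, (200, 3))
gcsi = 0.8 * ccle + 10 - 8 * (g[:, None] - 1.1) + 2 * rng.normal(size=(200, 3))
smooth, rho, p = seeding_summary(g, gcsi, ccle)
```

Regressing rather than subtracting absorbs any overall scale or offset difference between the two dosing schedules, so what remains in the residuals is the line-specific deviation that the growth-rate test examines.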

Extended Data Table 1 Key differences in assay conditions between projects


Supplementary information

Supplementary Information

This file contains a Supplementary note and full legends for Supplementary Tables 1–6. (PDF 193 kb)

Supplementary Data

This file contains Supplementary Tables 1–6; please refer to the Supplementary Information document for full legends. (XLSX 1887 kb)


About this article

Cite this article

Haverty, P., Lin, E., Tan, J. et al. Reproducible pharmacogenomic profiling of cancer cell line panels. Nature 533, 333–337 (2016). https://doi.org/10.1038/nature17987

Received: 24 October 2014
Accepted: 07 April 2016
Published: 18 May 2016
Issue Date: 19 May 2016
DOI: https://doi.org/10.1038/nature17987
Springer Nature Limited


Editorial Summary

Cancer cell lines reassessed

Large-scale analyses of the drug sensitivity of cancer cell lines have been previously reported to yield conflicting conclusions. This Analysis uses independently generated data to demonstrate that consistency can be achieved if key laboratory and data analysis practices are considered when future studies are undertaken.
