Cell lines were sourced from the ATCC, or from the collection developed and authenticated by the Brisbane Breast Bank from 2010 to 2012 (Table 1). P7731 was derived in-house from a metastatic bone deposit from a breast cancer patient enrolled in the Victorian Tissue Bank (manuscript under review), who provided informed written consent to use donated tissue for research purposes. We acknowledge the existence of genotypic and phenotypic variants of some cell lines, and emphasise that the data presented here reflect our specific collection. Nonetheless, this collection broadly represents those used in standard breast cancer research laboratories around the world, and multiple quality control (QC) measures were in place to ensure its integrity. The QC process was as follows: (1) At baseline (earliest passage), the morphology of each line was checked for consistency with published information, and cultures were photographed for future comparison and expanded to generate cryopreserved ‘master stocks’. (2) Mycoplasma testing was performed using the MycoAlert® kit (Lonza) according to the manufacturer’s instructions, using media from cells cultured without antibiotic for 2 weeks. All master stocks used in this study were mycoplasma-negative. (3) DNA was extracted for STR-profiling and targeted mutation analysis (see below). (4) Thawed master stocks were cultured for at least 48 h to 70% confluence prior to each experiment, and used within five passages of the authenticated baseline (Table 1), with ongoing re-analysis of morphology and growth rates, and regular re-testing to confirm negative mycoplasma status.
Culture media were prepared according to ATCC recommendations (Table 1), and cells were maintained at 37 °C in a humidified atmosphere with 5% CO2. Base media and supplements were purchased from Thermo Fisher Scientific (TFS) or Sigma-Aldrich (SA): DMEM (TFS catalogue no. 11995065), RPMI (TFS-11875093), McCoy’s 5A modified medium (TFS-16600082), DMEM/F12 (TFS-11330032), Ham’s F12 (TFS-11765054), foetal bovine serum (FBS; TFS-16000044), horse serum (TFS-16050130), bovine pituitary extract (TFS-13028014), insulin (SA-I5500), hydrocortisone (SA-H0888), cholera toxin (SA-C8052), epidermal growth factor (SA-E9644) and transferrin (SA-T8158). Antibiotic/antimycotic (TFS-15240062) was routinely included in culture media. Master stocks were cryopreserved in a solution comprising 50% regular growth medium, 10% dimethyl sulfoxide (DMSO) and 40% FBS.
Short tandem repeat (STR)-profiling and targeted mutation analysis
DNA extraction from cell pellets was performed using the QIAamp DNA Mini kit (QIAGEN) and profiling was performed using the Cell ID™ System (Promega). The assay allows co-amplification and three-color detection of 9 STR loci (D21S11, TH01, TPOX, vWA, CSF1PO, D16S539, D7S820, D13S317 and D5S818) and Amelogenin for gender identification, and collectively provides a genetic profile with a random match probability of 1 in 2.92 × 10⁹. The amplified and fluorescently-tagged loci were analysed by capillary electrophoresis on an ABI Prism 3100 Genetic Analyser (Applied Biosystems) at QIMR Berghofer (Table-S1). Of the 35 lines tested, 25 matched profiles published by either the ATCC or other laboratories, seven had not been previously published, and three gave partial matches: HBL100, the subject of an authenticity debate (below); Hs578T, which exhibited loss-of-heterozygosity (LOH) at 2/10 STR loci; and MDAMB435, which exhibited LOH at one locus and a different number of STRs at another. The three partially matched lines are triple-negative and may exhibit heightened genomic instability.
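The distinction drawn above between a full match, LOH and a different allele call can be sketched per locus in Python. This is an illustrative sketch only, not the procedure used in the study: the function names, the rule that a strict subset of the reference alleles is flagged as possible LOH, and the allele values in the usage note are all assumptions made for the example.

```python
def compare_locus(reference_alleles, query_alleles):
    """Compare allele calls at one STR locus (hypothetical helper).

    Assumed rule: identical allele sets are a match; a non-empty strict
    subset of the reference is flagged as possible LOH; anything else
    is reported as different.
    """
    reference = set(reference_alleles)
    query = set(query_alleles)
    if query == reference:
        return "match"
    if query and query < reference:
        return "possible LOH"
    return "different"


def compare_profiles(reference_profile, query_profile):
    """Return a per-locus comparison for every locus in the reference."""
    return {
        locus: compare_locus(alleles, query_profile.get(locus, ()))
        for locus, alleles in reference_profile.items()
    }


def summarise(comparison):
    """Label a profile 'full match' only if every locus matches."""
    if all(result == "match" for result in comparison.values()):
        return "full match"
    return "partial match"
```

For example, with made-up alleles (not taken from any real cell line), a reference of `{"TH01": ("6", "9.3"), "vWA": ("16", "17")}` and a query of `{"TH01": ("6", "9.3"), "vWA": ("16",)}` would give a match at TH01, possible LOH at vWA, and an overall partial match.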
Targeted mutation analysis of PIK3CA, HRAS, KRAS and BRAF was performed using the OncoCarta Assay (v1.0; Sequenom). Variants were validated by High Resolution Melt analysis, iPLEX (using different PCR and extension primers), repeat OncoCarta analysis and/or Sanger sequencing. Lines were also Sanger-sequenced to identify mutations in EGFR (NC_000007.1; exons 5–11/23–28) and TP53 (NC_000017.9; exons 4–10 as per International Agency for Research on Cancer (IARC) recommendations [16, 17]). Comparing our findings to cell line mutation data repositories [18, 19] revealed several differences: four instances where we could only detect the wild-type allele but a mutation had been reported previously, possibly indicating LOH at these loci (BT474 (PIK3CA, EGFR), MDAMB231 (BRAF), MDAMB436 (BRAF) and ZR751 (HRAS)); MDAMB436, where a BRAF deletion was found by the CCLE but not COSMIC or our study; and two with discordant findings between the CCLE and COSMIC (BRAF mutations in MDAMB361 and MDAMB415). These findings highlight the existence of clonal and phenotypic drift in different cell line stocks around the world (Table-S1).
Based on these analyses we can also make the following comments about cell line authenticity:
Previous reports have suggested that KPL1 and MCF7 are the same cell line. Indeed, our stocks had identical STR profiles, shared the E545K PIK3CA mutation (Table-S1) and had very similar transcriptional profiles (Fig. 3/S3), though there were some phenotypic differences noted with single-cell analysis (Figs. 1, 2), consistent with KPL1 being a clonal derivative of MCF7.
We did not detect a Y chromosome in HBL100 (Table-S1), although its reported presence has been the basis for the line's exclusion from cell line repositories and cell line panel studies.
MDAMB435 has also been the subject of an historic authenticity debate, with conflicting reports that it could either be the M14 melanoma cell line or a breast-derived line that exhibits lineage infidelity, including melanocytic features [21, 22, 23]. This line was an outlier in multiple experiments in our study, though others have performed more comprehensive analyses. The most recent consensus is that the line was indeed contaminated with melanoma cells and should not be used as a breast cancer model.
Transcriptome subtype (Tx-ST) was assigned based on previous gene expression array studies, summarised in Table-S2. Six lines in the cohort had not been previously classified, and so we applied the surrogate cytokeratin expression method of Hollestelle et al. (Fig. 2 and Table-S2). Briefly, the criteria are: luminal: CK8/18+ and/or CK19+, CK5-; basal-like luminal (L+B): CK8/18+ and/or CK19+, CK5+; null (claudin-low): CK8/18 low, CK19-, CK5-; basal: CK8/18 low, CK19-, CK5+. Triple-negative (TN) cell lines were sub-classified using the TNBCtype tool (Table-S2).
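The four cytokeratin criteria listed above amount to a small decision rule, sketched here in Python. The function name and the boolean inputs are hypothetical, and the sketch assumes that each marker has already been called positive or negative upstream, with "CK8/18 low" treated as not positive. It illustrates the stated criteria and is not the authors' implementation.

```python
def classify_by_cytokeratin(ck8_18_pos, ck19_pos, ck5_pos):
    """Assign a surrogate subtype from three cytokeratin calls.

    Assumption: 'CK8/18 low' in the criteria is represented here as
    ck8_18_pos=False; how positivity is thresholded is out of scope.
    """
    has_luminal_keratin = ck8_18_pos or ck19_pos
    if has_luminal_keratin and not ck5_pos:
        return "luminal"
    if has_luminal_keratin and ck5_pos:
        return "basal-like luminal (L+B)"
    if not ck5_pos:
        # CK8/18 low, CK19-, CK5-
        return "null (claudin-low)"
    # CK8/18 low, CK19-, CK5+
    return "basal"
```

Because the first two branches cover every case with CK8/18+ and/or CK19+, the remaining two branches only need to test CK5, so all eight input combinations map to exactly one class.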
Multiplexed flow cytometry analysis (FC)
Cells were prepared from three separate cultures as previously described. Single cell suspensions were stained with SYTOX BLUE (Molecular Probes), CD49f-PE-Cy5, EpCAM-FITC, CD44-APC H7 and CD24-PE (Becton–Dickinson (BD); Table 2), or isotype and fluorophore-matched control antibodies. Raw fluorescence data were collected on either FACSAria-I or LSRFortessa flow cytometers (BD) using FACSDiva acquisition software (v6.1.3; BD). Particles and dead cells were excluded based on low light scatter and SYTOX BLUE positivity. Doublet discrimination was also performed by gating out cells with disproportionate forward scatter height and area. Positivity thresholds were determined based on the signal from isotype controls for each marker and cell line combination (Fig-S1). At least 1 × 10⁴ events representing live, single cells were collected for each sample. Fluorescence compensation was performed on each occasion, then retrospectively checked and modified if necessary using FCS Express software (v6.0; DeNovo). Population frequencies were determined for individual and combined parameters from an average of three cultures (consecutive passages). The proportions of luminal-1, luminal-2, basal and mesenchymal subpopulations were determined as described.
Cultures were washed twice in PBS and then harvested with cell scrapers after soaking in versene (EDTA) for 5 min, so as not to remove trypsin-sensitive cell surface proteins. An equal volume of growth media was added before centrifuging, and the cells were then washed twice in PBS, resuspended in a minimal volume of PBS and fixed in 5 mL 10% neutral buffered formalin for 30 min at room temperature. After centrifuging (200 × g), samples were washed twice in PBS before being processed for paraffin embedding. A cell line TMA was then constructed from 0.6 mm cores, and 4 μm TMA sections were cut for IHC. For each of the primary antibodies used, Table 2 lists the working dilution and one of the following antigen retrieval methods: heat retrieval in a decloaking chamber (Biocare Medical) with (EDTA) 0.001 M Tris-ethylenediaminetetraacetic acid pH 8.8, at 105 °C for 15 min; (citrate) 0.01 M citric acid buffer pH 6.0, at 125 °C for 5 min; or (Chymo) 0.1% chymotrypsin in 0.01 M CaCl2 + 0.05 M Tris buffer, pH 7.8 at 37 °C for 10 min. The Dako EnVision+ (Dakocytomation) and Vectastain® Universal ABC kit (Vector Laboratories) were used for detection according to the manufacturers’ instructions. The Dako HercepTest™ kit was used for HER2 staining. Sections were reviewed and described by a qualified pathologist (ACV). We used at least two separate cultures (successive passages) for each cell line.
Digital IHC analysis
Stained TMA sections were scanned on an Aperio ScanScope T2 high-resolution slide scanner (20× magnification). Digital images were imported into Definiens Tissue Studio 3.0 for automated analysis. Cells were segmented based on nuclear haematoxylin staining, and antigen-specific DAB staining was assessed per cell, either within the nucleus for nuclear stains (ER, p53, AR) or in the cytoplasm by dilating the segmented nuclear area. The intensity of staining for each antigen was divided into negative, low, moderate and strong categories, and the thresholds were verified by visual assessment of a small panel of positive and negative controls. These thresholds were then used to calculate the frequency of cells in each category across all cores in the TMAs. An overall histological score for each antigen was calculated using an algorithm that considers the frequency of cells assigned to each intensity category: histological score = (1 × % low + 2 × % moderate + 3 × % strong). TMA sections had to contain at least 100 cells for inclusion. Where multiple cores were analysed, an average was taken. Unsupervised hierarchical clustering was performed on the eight antigens (ER, E-cadherin, CK5, CK8/18, CK19, Vimentin, EGFR and HER2) using Euclidean distance and the complete linkage method, via the R Project for Statistical Computing.
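As a worked illustration of the histological score, the Python sketch below applies the weighting of 1, 2 and 3 to the percentages of cells in the low (weak), moderate and strong categories, excludes cores with fewer than 100 cells, and averages the remaining cores. The function names and the dictionary layout for a core are assumptions made for this example; the study itself used Definiens Tissue Studio for the analysis.

```python
from statistics import mean

MIN_CELLS_PER_CORE = 100  # inclusion threshold stated in the text


def histological_score(pct_low, pct_moderate, pct_strong):
    """Score = 1 x % low + 2 x % moderate + 3 x % strong (range 0-300)."""
    return 1 * pct_low + 2 * pct_moderate + 3 * pct_strong


def average_score(cores):
    """Average the score over cores that pass the cell-count threshold.

    Each core is a dict with hypothetical keys 'n_cells', 'low',
    'moderate' and 'strong' (the last three as percentages of cells).
    Returns None when no core is eligible.
    """
    eligible = [c for c in cores if c["n_cells"] >= MIN_CELLS_PER_CORE]
    if not eligible:
        return None
    return mean(
        histological_score(c["low"], c["moderate"], c["strong"])
        for c in eligible
    )
```

A core with 10% low, 20% moderate and 30% strong cells scores 10 + 40 + 90 = 140; a core in which every cell stains strongly scores the maximum of 300.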
NanoString® targeted PanCancer Pathways assay
RNA was extracted from each line using the RNeasy kit (QIAGEN), then quantified using a NanoDrop spectrophotometer. We used the PanCancer Pathways Panel, a set of barcoded probes for 730 genes and 40 housekeeping genes (NanoString®). Assays and analysis were performed according to the manufacturer’s instructions using 50 ng of total RNA. Data were collected using the nCounter® Dx Digital Analyzer (QIMR Berghofer) and processed using nSolver Software (v3.0; NanoString). Unsupervised clustering based on the expression of all the genes in the PanCancer Pathways panel was performed in R, using rank correlation as the distance metric and centroid linkage, with genes ordered by cluster tightness. Significance thresholds (p) used to assign functional enrichment were 0.01 (genes) and 0.05 (samples).
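To make the distance metric concrete, the sketch below computes a rank-correlation distance between two expression vectors in plain Python. The text does not state which rank correlation was used or how it was converted to a distance, so this example assumes Spearman's rho with distance = 1 − rho; the function names are hypothetical, and the actual analysis was performed in R.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rank_correlation_distance(x, y):
    """Assumed distance: 1 - Spearman's rho (0 = same ordering, 2 = reversed)."""
    return 1.0 - pearson(average_ranks(x), average_ranks(y))
```

Under this assumption, two samples whose genes are ranked identically have a distance of 0 regardless of absolute counts, which is why a rank-based metric is less sensitive to differences in overall signal intensity between samples than Euclidean distance.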