Abstract
The first genomic scar-based homologous recombination deficiency (HRD) measures were produced using SNP arrays. As array-based technology has been largely replaced by next generation sequencing (NGS) approaches, it has become important to develop algorithms that derive the same type of genomic scar scores from NGS (whole exome “WXS”, whole genome “WGS”) data. To this end, we introduce the scarHRD R package and show that the SNP array-based and NGS-based HRD scores derived with this method correlate well (Pearson correlation between 0.73 and 0.87, depending on the HRD measure), and that the NGS-based HRD scores distinguish similarly well between BRCA mutant and BRCA wild-type cases in a cohort of triple-negative breast cancer patients from the TCGA data set.
Introduction
Reliable quantification of homologous recombination deficiency in human tumor biopsies, especially in the case of ovarian and breast cancer, is expected to identify patients that are particularly sensitive to platinum or PARP inhibitor-based therapy.1 Before the widespread introduction of next generation sequencing (NGS) to characterize tumor biopsies, SNP arrays were used to identify large-scale genomic aberrations associated with homologous recombination deficiency, often induced by the loss of BRCA1 or BRCA2 function. Three such measures were identified: telomeric allelic imbalance (HRD-TAI score),2 loss of heterozygosity profiles (HRD-LOH score),3 and large-scale state transitions (HRD-LST score).4 These three measures have also been combined into a single summary measure of HR deficiency.5 The HRD-LOH score has also become an integral part of a recently published, whole-genome sequencing-based measure of homologous recombination deficiency, HRDetect.6 These measures, along with functional assays,7 showed promise to identify HR-deficient cases and thus predict response to platinum or PARP inhibitor therapy.2,8,9 Since NGS has become the main genomic characterization method of cancer biopsies, it has become essential to migrate the SNP array-based methodology to NGS-based platforms.
TCGA breast cancer biopsies have been both profiled on SNP arrays and subjected to NGS, allowing a direct comparison of the two platforms.8
Results and discussion
We found good correlation between the SNP array-based and NGS-based HRD scores (Fig. 1). When comparing the results of the scarHRD R package to SNP array-based measurements, we found the following Pearson correlation coefficients: number of telomeric allelic imbalances (NtAI): r = 0.84 (R² = 0.70, adjusted R² = 0.70, p < 2.2e-16); large-scale state transitions (LST): r = 0.79 (R² = 0.62, adjusted R² = 0.62, p < 2.2e-16); loss of heterozygosity (HRD-LOH): r = 0.73 (R² = 0.53, adjusted R² = 0.52, p < 2.2e-16). These three measures are often combined for diagnostic purposes5 and in HRDetect.6 Therefore, we also compared the sum of the three scores across the two platforms (HRD-sum): r = 0.87 (R² = 0.75, adjusted R² = 0.75, p < 2.2e-16) (Fig. 1). The artificial reduction of coverage to 30× did not affect this correlation (Supplementary Material, Figures S7 and S8). The BRCA1/2-mutated samples showed significantly higher NGS-based HRD-sum values (Fig. 2, Supplementary Figure S6). The predictive value of HRD-sum, measured as the area under the corresponding ROC curve (AUC), was 80.8% (Supplementary Figure S2).
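The relationship between the reported r and R² values can be checked with a short calculation. The sketch below is only an illustration in Python, not part of the scarHRD package: the function name `pearson_r` and the example scores are hypothetical. It computes the Pearson coefficient for paired scores and relies on the fact that, for a simple linear regression with one predictor, R² equals r².

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations around the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical paired scores for five samples (not data from the study)
array_scores = [10, 25, 40, 55, 70]
ngs_scores = [12, 22, 43, 50, 75]

r = pearson_r(array_scores, ngs_scores)
r_squared = r ** 2  # equals R² of a one-predictor linear fit
print(f"r = {r:.2f}, R^2 = {r_squared:.2f}")
```

Applied to the values in the text, 0.84² ≈ 0.71, 0.79² ≈ 0.62, 0.73² ≈ 0.53 and 0.87² ≈ 0.76, which agrees with the reported R² values up to rounding of r.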
There was no significant difference in SNP versus WXS-based estimation of tAI, LST, and HRD-sum, but the number of LOH events was significantly lower in the WXS-based estimation (p = 0.012, Kolmogorov–Smirnov test). This could be attributed to differences in the segmentation algorithm (the more fragmented the WXS segmentation is, the fewer LOH regions are called) or to low sample quality or coverage. However, when comparing the ROC curves for BRCA1/2 status of the SNP-based and WXS-based HRD scores, there was no significant difference between the SNP array-based and NGS-based methods (Supplementary Figure S3).
In line with our expectations and previous results, the BRCA1/2-deficient cases showed higher values for each of the four scores (Supplementary Figures S4 and S5).
The sum of the three HRD scores showed good correlation across the two platforms. Thus, in more advanced NGS-based HR deficiency measures such as HRDetect, the SNP array-based step could be replaced by an NGS-based estimate of the HR deficiency scores.
Brief description of the methods
Based on receptor status determined by immunohistochemistry, 139 paired tumor and normal samples of the TCGA breast cancer cohort could be classified as triple-negative breast cancer. Of these patients, 95 had Affymetrix SNP 6.0 array-based HRD estimates (LOH, TAI, LST) previously published by our group.10 In this publication we present the scarHRD R package (https://github.com/sztup/scarHRD), which estimates the level of the three HR deficiency measures using NGS data.
A sample’s LOH score is the total number of LOH regions across the entire genome that are larger than 15 Mb but do not cover whole chromosomes. In the original publication this 15 Mb lower limit for LOH was determined by comparing SNP array profiles between BRCA mutant and BRCA wild-type cases.3 We performed a similar analysis using NGS data and found that the original 15 Mb cutoff performed best in this case as well (Supplementary Figure S1).
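The LOH definition above can be sketched in a few lines. This is a simplified Python illustration under stated assumptions, not the scarHRD implementation: segments are hypothetical dictionaries with `chrom`, `start`, `end`, `major` and `minor` fields, LOH is taken to mean a lost minor allele with a retained major allele, consecutive LOH segments on a chromosome are merged into one region, and a chromosome whose segments are all in LOH is treated as whole-chromosome LOH.

```python
from itertools import groupby

MIN_LOH_LENGTH = 15_000_000  # 15 Mb lower limit stated in the text

def is_loh(seg):
    # Assumption: LOH = minor allele lost, major allele still present
    return seg["minor"] == 0 and seg["major"] > 0

def loh_score(segments):
    """Count LOH regions longer than 15 Mb that do not cover a whole chromosome."""
    ordered = sorted(segments, key=lambda s: (s["chrom"], s["start"]))
    score = 0
    for _, chrom_segs in groupby(ordered, key=lambda s: s["chrom"]):
        chrom_segs = list(chrom_segs)
        if all(is_loh(s) for s in chrom_segs):
            continue  # whole-chromosome LOH is excluded
        # Merge runs of consecutive LOH segments into single regions
        for loh, run in groupby(chrom_segs, key=is_loh):
            if not loh:
                continue
            run = list(run)
            if run[-1]["end"] - run[0]["start"] > MIN_LOH_LENGTH:
                score += 1
    return score

def seg(chrom, start_mb, end_mb, major, minor):
    """Hypothetical helper: build a segment from coordinates given in Mb."""
    return {"chrom": chrom, "start": start_mb * 1_000_000,
            "end": end_mb * 1_000_000, "major": major, "minor": minor}

example = [
    seg("chr1", 0, 20, 1, 0), seg("chr1", 20, 100, 1, 1),   # 20 Mb LOH: counted
    seg("chr2", 0, 10, 1, 0), seg("chr2", 10, 50, 1, 1),    # 10 Mb LOH: too short
    seg("chr3", 0, 90, 2, 0),                               # whole chromosome: excluded
    seg("chr4", 0, 5, 1, 1), seg("chr4", 5, 15, 1, 0),
    seg("chr4", 15, 25, 2, 0), seg("chr4", 25, 60, 1, 1),   # merged 20 Mb LOH: counted
]
print(loh_score(example))
```

A production implementation would compare LOH extent against known chromosome lengths rather than against the segmented extent, which this sketch does not attempt.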
The LST is defined as a chromosomal break between adjacent regions of at least 10 Mb, with a distance between them not larger than 3 Mb.
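As an illustration of this definition, the Python sketch below counts breakpoints between neighboring segments that are each at least 10 Mb long and at most 3 Mb apart. It is a simplification under stated assumptions and not the scarHRD code: the segment dictionaries and the `seg` helper are hypothetical, a break is only counted when the allele-specific copy-number state changes, and any handling of chromosome arms or pre-filtering of short segments is left out.

```python
MIN_FLANK = 10_000_000  # each adjacent region must be at least 10 Mb
MAX_GAP = 3_000_000     # distance between the two regions at most 3 Mb

def lst_score(segments):
    """Count breaks between adjacent long segments on the same chromosome."""
    ordered = sorted(segments, key=lambda s: (s["chrom"], s["start"]))
    score = 0
    for left, right in zip(ordered, ordered[1:]):
        if left["chrom"] != right["chrom"]:
            continue
        both_long = (left["end"] - left["start"] >= MIN_FLANK
                     and right["end"] - right["start"] >= MIN_FLANK)
        close = right["start"] - left["end"] <= MAX_GAP
        # Assumption: a break requires a change in (major, minor) copy number
        state_changes = ((left["major"], left["minor"])
                         != (right["major"], right["minor"]))
        if both_long and close and state_changes:
            score += 1
    return score

def seg(chrom, start_mb, end_mb, major, minor):
    """Hypothetical helper: build a segment from coordinates given in Mb."""
    return {"chrom": chrom, "start": start_mb * 1_000_000,
            "end": end_mb * 1_000_000, "major": major, "minor": minor}

example = [
    seg("chr1", 0, 20, 1, 1), seg("chr1", 20, 40, 2, 1),   # break counted
    seg("chr1", 45, 60, 1, 1),                             # 5 Mb gap: not counted
    seg("chr1", 60, 65, 2, 0), seg("chr1", 65, 90, 1, 1),  # 5 Mb segment: not counted
    seg("chr2", 0, 30, 1, 1), seg("chr2", 31, 50, 1, 1),   # same state: not counted
    seg("chr2", 50, 80, 2, 2),                             # break counted
]
print(lst_score(example))
```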
The number of telomeric allelic imbalances is the number of AIs (the unequal contribution of parental allele sequences with or without changes in the overall copy number of the region) that extend to the telomeric end of a chromosome.
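A minimal Python sketch of this count is given below. It rests on several assumptions that are not taken from the text: segments are hypothetical dictionaries with `major` and `minor` copy numbers, allelic imbalance is read as unequal major and minor copy numbers, the first and last segment of each chromosome stand in for the telomeric ends, and a chromosome covered by a single segment is skipped rather than counted at both ends. It is not the scarHRD implementation.

```python
from itertools import groupby

def has_allelic_imbalance(seg):
    # Assumption: imbalance = unequal contribution of the two parental alleles
    return seg["major"] != seg["minor"]

def ntai_score(segments):
    """Count allelic imbalance regions that reach a chromosome end."""
    ordered = sorted(segments, key=lambda s: (s["chrom"], s["start"]))
    score = 0
    for _, chrom_segs in groupby(ordered, key=lambda s: s["chrom"]):
        chrom_segs = list(chrom_segs)
        if len(chrom_segs) < 2:
            continue  # assumption: whole-chromosome events are not counted
        # The first and last segments approximate the two telomeric ends
        score += int(has_allelic_imbalance(chrom_segs[0]))
        score += int(has_allelic_imbalance(chrom_segs[-1]))
    return score

def seg(chrom, start_mb, end_mb, major, minor):
    """Hypothetical helper: build a segment from coordinates given in Mb."""
    return {"chrom": chrom, "start": start_mb * 1_000_000,
            "end": end_mb * 1_000_000, "major": major, "minor": minor}

example = [
    seg("chr1", 0, 30, 2, 1), seg("chr1", 30, 80, 1, 1), seg("chr1", 80, 100, 1, 0),
    seg("chr2", 0, 30, 1, 1), seg("chr2", 30, 60, 2, 0), seg("chr2", 60, 90, 1, 1),
    seg("chr3", 0, 90, 2, 0),
    seg("chr4", 0, 40, 1, 1), seg("chr4", 40, 80, 2, 1),
]
print(ntai_score(example))
```

In this example both ends of chr1 and the distal end of chr4 are in allelic imbalance, while the interstitial imbalance on chr2 does not reach a chromosome end.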
Allele-specific copy number estimation is a crucial part of estimating HR deficiency. As previously shown, allele-specific copy number estimates derived from NGS data with the Sequenza R package show high agreement with SNP array-based copy number profiles.11 The scarHRD package is, therefore, able to use Sequenza preprocessed files as well as other allele-specific segmentation files in the same format.
As it has been previously shown that in ovarian cancer the sum of the genomic scar scores is elevated in BRCA-deficient cancers,5 an additional aim of our study was to compare the unweighted numeric sum of LOH, tAI, and LST, called here HRD-sum, to the BRCA1/2 status of the patients. A sample was classified as BRCA-deficient if (1) there was a deep deletion of BRCA1/2, (2) a germline and a somatic mutation in BRCA1/2 with LOH, or (3) if LOH had co-occurred with promoter methylation in one of the BRCA1/2 genes. The somatic mutation status (mutations with likely pathogenic function) and methylation data were acquired from the TCGA data portal. The germline mutation status was determined using HaplotypeCaller and annotated with InterVar;12 likely pathogenic mutations and frameshift insertions/deletions of unknown significance were used in our analysis. LOH was determined using Sequenza’s allele-specific segmentation results (Supplementary Table S1).
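The HRD-sum and its comparison with BRCA1/2 status can be illustrated with a short Python sketch. The function names and example values are hypothetical and the numbers are not data from the study. The AUC is computed here through its rank interpretation, that is, the probability that a randomly chosen BRCA-deficient sample scores higher than a randomly chosen BRCA-intact sample, with ties counted as one half; this is one standard way to obtain the area under a ROC curve and is not necessarily the procedure used by the authors.

```python
def hrd_sum(loh, tai, lst):
    """Unweighted numeric sum of the three genomic scar scores."""
    return loh + tai + lst

def auc(deficient_scores, intact_scores):
    """Area under the ROC curve via pairwise comparison of the two groups."""
    if not deficient_scores or not intact_scores:
        raise ValueError("both groups need at least one sample")
    wins = 0.0
    for d in deficient_scores:
        for i in intact_scores:
            if d > i:
                wins += 1.0   # deficient sample ranked above intact sample
            elif d == i:
                wins += 0.5   # ties count as half
    return wins / (len(deficient_scores) * len(intact_scores))

# Hypothetical (LOH, tAI, LST) triples for a few samples
deficient = [hrd_sum(*s) for s in [(15, 20, 15), (18, 22, 20), (20, 25, 25)]]
intact = [hrd_sum(*s) for s in [(2, 5, 3), (5, 8, 7), (17, 20, 18)]]

print(deficient, intact)
print(auc(deficient, intact))
```

An AUC of 0.5 corresponds to no separation between the groups and 1.0 to perfect separation, which gives a sense of scale for the 80.8% reported in the Results.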
Data availability
The data sets generated during the current study are available from the corresponding author on reasonable request.
Code availability
The code/algorithm for performing the experiments is available for download at https://github.com/sztup/scarHRD.
References
1. Lord, C. J. & Ashworth, A. BRCAness revisited. Nat. Rev. Cancer 16, 110–120 (2016).
2. Birkbak, N. J. et al. Telomeric allelic imbalance indicates defective DNA repair and sensitivity to DNA-damaging agents. Cancer Discov. 2, 366–375 (2012).
3. Abkevich, V. et al. Patterns of genomic loss of heterozygosity predict homologous recombination repair defects in epithelial ovarian cancer. Br. J. Cancer 107, 1776–1782 (2012).
4. Popova, T. et al. Ploidy and large-scale genomic instability consistently identify basal-like breast carcinomas with BRCA1/2 inactivation. Cancer Res. 72, 5454–5462 (2012).
5. Telli, M. L. et al. Homologous recombination deficiency (HRD) score predicts response to platinum-containing neoadjuvant chemotherapy in patients with triple-negative breast cancer. Clin. Cancer Res. 22, 3764–3773 (2016).
6. Davies, H. et al. HRDetect is a predictor of BRCA1 and BRCA2 deficiency based on mutational signatures. Nat. Med. 23, 517–525 (2017).
7. Mutter, R. W. et al. Bi-allelic alterations in DNA repair genes underpin homologous recombination DNA repair defects in breast cancer. J. Pathol. 242, 165–177 (2017).
8. Mirza, M. R. et al. Niraparib maintenance therapy in platinum-sensitive, recurrent ovarian cancer. N. Engl. J. Med. 375, 2154–2164 (2016).
9. Zhao, E. Y. et al. Homologous recombination deficiency and platinum-based therapy outcomes in advanced breast cancer. Clin. Cancer Res. 23, 7521–7530 (2017).
10. Marquard, A. M. et al. Pan-cancer analysis of genomic scar signatures associated with homologous recombination deficiency suggests novel indications for existing cancer drugs. Biomark. Res. 3, 9 (2015).
11. Favero, F. et al. Sequenza: allele-specific copy number and mutation profiles from tumor sequencing data. Ann. Oncol. 26, 64–70 (2015).
12. Li, Q. & Wang, K. InterVar: clinical interpretation of genetic variants by the 2015 ACMG-AMP guidelines. Am. J. Hum. Genet. 100, 267–280 (2017).
Acknowledgements
This work was supported by the Research and Technology Innovation Fund (KTIA_NAP_13-2014-0021 to Z.S.); Breast Cancer Research Foundation and the Novo Nordisk Foundation Interdisciplinary Synergy Programme Grant (NNF15OC0016584 to I.C. and Z.S.), by the ÚNKP-17-4-III-SE-63 New National Excellence Program of the Ministry of Human Capacities to L.R. and by Tesaro Inc. The results shown here are based upon data generated by the TCGA Research Network: http://cancergenome.nih.gov/.
Author information
Contributions
Conception, design, writing, and review of the manuscript: Zs.S., M.D., M.K., L.R., I.C., F.F., N.J.B., A.C.E., A.S., and Zo.S. Development of methodology: Zs.S., M.D., M.K. Analysis and interpretation of data: Zs.S., M.D., M.K., Zo.S.
Ethics declarations
Competing interests
N.J.B., A.C.E., and Zo.S. are listed as co-inventors on a patent on telomeric allelic imbalance, which is owned by Children’s Hospital Boston and licensed to Myriad Genetics. The remaining authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Sztupinszki, Z., Diossy, M., Krzystanek, M. et al. Migrating the SNP array-based homologous recombination deficiency measures to next generation sequencing data of breast cancer. npj Breast Cancer 4, 16 (2018). https://doi.org/10.1038/s41523-018-0066-6