Background

Globally, the incidence of breast cancer increased to approximately 2 million cases in 2017, while the mortality rate declined between 2007 and 2017 [1]. In Pakistan, breast cancer is the most frequent invasive malignancy among women, accounting for 36.8% of all female malignancies [2]. Pakistan has one of the highest rates of breast cancer in Asia, with age-standardized (world) annual incidence and mortality rates of 43.9 and 23.2 per 100,000, respectively [2]. Breast cancer incidence and mortality are still increasing [3, 4], making breast cancer a major public health burden in this developing country.

Approximately 50% of familial breast cancer is due to pathogenic germline variants in high- and moderate-penetrance genes and to common low-penetrance genetic variants [5]. Most of these genes are involved in the DNA repair pathway and the maintenance of genomic stability, which underlines the potential significance of other genes involved in this pathway. In 2015, the RecQ Like Helicase (RECQL) gene was identified in West European and East Asian populations as a candidate breast cancer susceptibility gene [6, 7]. It encodes a DNA helicase that is involved in the repair of DNA double-strand breaks and plays a crucial role in the maintenance of genomic stability. Several studies conducted among unselected breast cancer patients from Belarus and Germany [8] and the USA [9], and among early-onset and familial breast cancer patients from Poland [6], Canada [6], and Australia [10], reported pathogenic RECQL variant frequencies ranging from 0 to 2.6%. Breast tumors associated with pathogenic RECQL variants were predominantly positive for the estrogen and progesterone receptors (ER and PR) [6,7,8, 11].

Apart from two studies conducted in an East Asian population from China [7, 11], data on the contribution of pathogenic RECQL variants to early-onset and/or familial breast cancer patients from other Asian regions are lacking. In Pakistan, breast cancer is the most common malignancy and main cause of cancer-related deaths in women. The burden of breast cancer in terms of estimated age-standardised incidence and mortality rates is 43.9 and 23.2 per 100,000, respectively [12]. Pathogenic variants in high- and moderate-penetrance breast cancer susceptibility genes (BRCA1, BRCA2, TP53, CHEK2, RAD51C, and PALB2) account for about 27% of early-onset and familial breast cancers in Pakistan [13,14,15,16,17], leaving a substantial proportion of cases unexplained. In the present study, we determined the contribution of pathogenic RECQL variants to hereditary breast cancer in 302 early-onset and familial BRCA1 and BRCA2 negative patients with ER positive and/or PR positive breast cancer in a South Asian population from Pakistan.

Methods

Study subjects

Patients diagnosed with invasive breast cancer were selected from the institutional registry of genetically enriched breast and ovarian cancer families enrolled at the Shaukat Khanum Memorial Cancer Hospital and Research Centre (SKMCH&RC) in Lahore, Pakistan, from June 2001 to August 2015. All patients fulfilled the inclusion criteria described previously [17, 18]. The present study included 302 early-onset and familial breast cancer patients with ER positive and/or PR positive tumors. All study participants had tested negative for pathogenic variants in BRCA1 and BRCA2 [17, 18], and about 60% had also tested negative for pathogenic variants in PALB2 (n = 187), TP53 (n = 180), CHEK2 (n = 168), and RAD51C (n = 168) [13,14,15,16] (Muhammad U. Rashid, unpublished TP53 data). We categorized study participants into four risk groups based on age at cancer diagnosis or family history of breast and/or ovarian cancer (Table 1) [17].

Table 1 Frequency of RECQL pathogenic variants according to family structure

The control population comprised 250 healthy women with no family history of breast or ovarian cancer. They were selected from the institutional registry of 1012 female controls enrolled in a Pakistani breast cancer case-control study, as previously described [19]. The Institutional Review Board (IRB) of the SKMCH&RC approved the current study (IRB approval number ONC-BRCA-001/2). All study participants provided written informed consent.

Variant screening

The complete coding sequence and exon-intron junctions of the RECQL gene (GenBank accession number NM_002907.3) were screened in the 302 index patients and 250 controls by denaturing high-performance liquid chromatography (DHPLC) analysis. Details of the PCR primers are described elsewhere [7]. When available, a positive control with a known variant was included in each set of DHPLC analyses. Bidirectional DNA sequencing was performed to confirm each variant, as described elsewhere [20].

Variant classification

The novel RECQL variants were analyzed using Sherloc, a numerical score-based variant classification system that is a comprehensive refinement of the American College of Medical Genetics and Genomics-Association for Molecular Pathology (ACMG-AMP) guidelines. Five evidence categories (two clinical and three functional) were used to evaluate variants. Clinical criteria include variant frequency information from large human population data, namely the Genome Aggregation Database (gnomAD; https://gnomad.broadinstitute.org/gene/-ENSG00000004700?dataset=gnomad_r2_1), and variant observation in unaffected and affected individuals and families. For variant classification, allele frequencies of the South Asian population from gnomAD were used, as this population has ethnic and geographic relevance to the Pakistani population. Functional criteria include variant type, experimental studies, and computational data. The following in silico tools for prediction of protein function or splicing were used: PolyPhen-2 (http://genetics.bwh.harvard.edu/pph2/), SIFT (https://sift.bii.a-star.edu.sg/), SNAP2 (http://www.rostlab.org/services/snap/submit), MutationTaster (http://www.mutationtaster.org/), SNPs&GO (http://snps.biofold.org/snps-and-go/snps-and-go.html), and nsSNP Analyzer (http://snpanalyzer.uthsc.edu/) for the missense variants [14, 16], and the splice-site prediction algorithms MaxEntScan (http://genes.mit.edu/burgelab/maxent/Xmaxentscan_scoreseq.html), NNSPLICE (http://www.fruitfly.org/seq_tools/splice.html), Human Splicing Finder (http://www.umd.be/HSF3/), GeneSplicer (http://ccb.jhu.edu/software/genesplicer/), and SpliceSiteFinder-like (http://www.umd.be/searchSpliceSite.html) for splice-site and intronic variants [14]. In case of any disagreement between clinical and functional evidence, the clinical evidence was considered more convincing.

Variants were classified as pathogenic, likely pathogenic, benign, likely benign, or as variants of uncertain significance (VUS), according to the Sherloc guidelines [21]. Sherloc is a semiquantitative system in which each criterion is awarded a preset number of points on orthogonal benign (1B-5B) or pathogenic (1P-5P) scales using clinical and functional criteria. Point thresholds for pathogenic and benign classifications are 5P and 5B, for likely pathogenic and likely benign classifications 4P and 3B, and for VUS < 4P and < 3B. Pathogenic and benign point scores were calculated separately.
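The point thresholds quoted above can be expressed as a small decision function. The following Python sketch is illustrative only: the function name and the "conflicting evidence" outcome are our assumptions, and it does not reproduce the full Sherloc rules for weighing evidence on both scales, so variants scoring on both scales may be classified differently under the complete guidelines.

```python
def classify_by_points(pathogenic_points: float, benign_points: float) -> str:
    """Map point totals to a class using only the thresholds quoted in the text.

    Hypothetical helper: it is not part of Sherloc or of any published software.
    """
    if pathogenic_points >= 4 and benign_points >= 3:
        # Both scales reach a reportable threshold; flagging this for manual
        # review is our assumption, not a rule stated in the text.
        return "conflicting evidence"
    if pathogenic_points >= 5:
        return "pathogenic"
    if pathogenic_points >= 4:
        return "likely pathogenic"
    if benign_points >= 5:
        return "benign"
    if benign_points >= 3:
        return "likely benign"
    # Below 4P and below 3B: variant of uncertain significance.
    return "VUS"


if __name__ == "__main__":
    for p, b in [(8, 0), (2.5, 0), (1, 8)]:
        print(f"{p}P / {b}B -> {classify_by_points(p, b)}")
```

Keeping the two scales as separate arguments mirrors the statement that pathogenic and benign point scores were calculated separately.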

RNA analysis of the c.868-2A>G splice-site variant

Total RNA was extracted using TRIzol reagent (Invitrogen, Carlsbad, CA, USA) from blood samples of the proband and an unaffected sister, both harboring RECQL c.868-2A>G, as well as from another unaffected, variant-negative sister and a variant-negative control. Total RNA was reverse transcribed to cDNA using the RevertAid First Strand cDNA Synthesis Kit (Thermo Fisher Scientific, Vilnius, Lithuania) with random hexamer primers according to the manufacturer’s protocol. Reverse transcriptase (RT)-PCR was performed using the forward primer (5′ – CAG TTC CCT AAC GCA TCA CT – 3′) and reverse primer (5′ – TTT CAT TGG CTG ACC ATT TT – 3′), located in exon 7 and exon 9 of the RECQL transcript variant 1 (ENST00000444129.7), respectively. PCR reactions were carried out in a 25 μl volume containing 1 μl of the respective cDNA, 1x PCR Gold Buffer (Applied Biosystems, California, USA), 2.5 mM MgCl2, 0.2 μM of each primer, 250 μM of each dNTP (Invitrogen, Carlsbad, CA, USA), and 1 unit of AmpliTaq Gold DNA polymerase (Applied Biosystems, California, USA). After an initial denaturation for 15 min at 95 °C, cDNA was amplified by 35 cycles of 1 min at 94 °C, 1 min at 57.5 °C, and 1 min at 72 °C, followed by a final extension step of 5 min at 72 °C. Five μl of each RT-PCR product was loaded on a 2% agarose gel containing ethidium bromide (Sigma-Aldrich, Steinheim, Germany), and electrophoresis was performed at 140 V for 80 min. The RT-PCR products were confirmed by Sanger sequencing as described previously [20].
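As a quick plausibility check of the primer pair reported above, basic primer properties can be computed from the sequences. The sketch below uses the Wallace rule (2 °C per A/T, 4 °C per G/C), which is only a rough estimate for short oligonucleotides; it is our illustration and not the method used to design the primers or to choose the annealing temperature. The function names are hypothetical.

```python
def clean(primer: str) -> str:
    """Remove the spacing used for readability and normalize to upper case."""
    return primer.replace(" ", "").upper()


def gc_percent(primer: str) -> float:
    """Percentage of G and C bases in the primer."""
    seq = clean(primer)
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in degrees Celsius by the Wallace rule."""
    seq = clean(primer)
    gc = sum(base in "GC" for base in seq)
    at = sum(base in "AT" for base in seq)
    return 2 * at + 4 * gc


# Primer sequences as reported in the text (5' to 3').
forward = "CAG TTC CCT AAC GCA TCA CT"
reverse = "TTT CAT TGG CTG ACC ATT TT"

if __name__ == "__main__":
    for name, primer in [("forward", forward), ("reverse", reverse)]:
        print(name, len(clean(primer)), "nt,",
              f"GC {gc_percent(primer):.1f}%,",
              f"Wallace Tm {wallace_tm(primer)} C")
```

Both primers are 20 nucleotides long; the rule gives 60 °C for the forward primer (50% GC) and 54 °C for the reverse primer (35% GC).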

Results

Characteristics of the study participants

A total of 302 BRCA1 and BRCA2 negative index breast cancer patients were screened for RECQL germline variants. Of these patients, 122 (40.4%) were early-onset breast cancer patients (≤30 years of age), 133 (44.0%) belonged to families with two or more breast cancer cases with at least one case diagnosed at 50 years or younger, 18 (6.0%) belonged to families with both breast and ovarian cancer, and 29 (9.6%) were male breast cancer patients diagnosed at any age (Table 1). Of the index patients, 223 presented with ER positive and PR positive breast tumors, 55 with ER positive tumors only, and 24 with PR positive tumors only. The mean age at disease presentation was 36.6 years (range 20–78) for female breast cancer (n = 273) and 51.5 years (range 27–73) for male breast cancer (n = 29).
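The group sizes and percentages reported above can be cross-checked with a few lines of Python. This is a minimal consistency check using only the counts given in the text; the dictionary keys are shorthand labels of our own.

```python
# Risk-group counts as reported in the text (labels are our shorthand).
risk_groups = {
    "early-onset (<=30 years)": 122,
    "familial breast cancer": 133,
    "breast and ovarian cancer families": 18,
    "male breast cancer": 29,
}
# Hormone receptor status counts as reported in the text.
receptor_status = {"ER+ and PR+": 223, "ER+ only": 55, "PR+ only": 24}

total = sum(risk_groups.values())
percentages = {name: round(100 * n / total, 1) for name, n in risk_groups.items()}

if __name__ == "__main__":
    print("total index patients:", total)
    for name, pct in percentages.items():
        print(f"{name}: {pct}%")
    print("receptor status total:", sum(receptor_status.values()))
```

Both breakdowns sum to the 302 index patients, and the rounded percentages match the reported values.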

Spectrum of identified RECQL variants

In total, 31 distinct RECQL variants were detected. Of these, 20 were novel: one nonsense variant, one splice-site variant, three missense variants, three silent variants, and twelve noncoding variants (Table 2). The remaining eleven variants were previously reported: three missense variants and eight noncoding variants.

Table 2 RECQL germline variants identified in the study cases and controls from Pakistan

Classification and characteristics of identified RECQL variants

The novel variants were analyzed for their potential functional effect according to the Sherloc guidelines [21]. The evidence considered included a minor allele frequency (MAF) > 1% in the Genome Aggregation Database (gnomAD) or in our study as support for a benign classification (Table 2), as well as in silico prediction tools (Table 3). One variant was classified as pathogenic, three as VUS, and 16 as benign/likely benign.

Table 3 In silico analyses of novel RECQL variants identified in the study cases from Pakistan

Pathogenic RECQL variant

The novel pathogenic RECQL variant is a nonsense variant at nucleotide position 225 in exon 4 (c.225G>A, p.W75*), which is predicted to result in premature protein termination. It was identified in a 37-year-old familial breast cancer patient (III:3, Fig. 1a) of Punjabi ethnicity and was absent in 250 controls. The patient carrying this variant presented with a grade 3, ER positive and PR positive invasive ductal carcinoma (IDC) with lymph node involvement. The pathogenic variant frequencies were 0.3% (1/302) in early-onset and familial breast cancer patients and 0.8% (1/133) in familial patients. The variant had a Sherloc score of 8P and was classified as pathogenic (Table 4).

Fig. 1

Pedigrees of breast cancer patients with RECQL variants. a Family 282 carrying the pathogenic variant p.W75*. b-d Families 565, 649, and 625 carrying the VUS p.I141F, p.S182S, and p.C475C, respectively. e-g Families 471, 577 and 595 carrying the benign variant c.868-2A>G. Circles are females, squares are males, and a diagonal slash indicates a deceased individual. Symbols with filled left upper quadrant: unilateral breast cancer. Symbols with filled right lower quadrant: cancer other than breast, the name of that cancer is indicated. Double line between spouses: consanguineous marriage. Identification numbers of individuals are below the symbols. The index patient is indicated by an arrow. BC: breast cancer. The numbers following these abbreviations indicate age at cancer diagnosis. +: carrier, −: non-carrier

Table 4 Sherloc variant classification criteria of novel RECQL variants

RECQL variants of uncertain significance (VUS)

One novel missense variant (p.I141F) was identified in a 47-year-old familial breast cancer patient (II:4, Fig. 1b) of Punjabi origin. Two silent variants (p.S182S and p.C475C) were detected in two familial breast cancer patients of Saraiki background, diagnosed at ages 68 (I:1, Fig. 1c) and 47 (III:10, Fig. 1d), respectively. These variants were not detected in 250 controls. The population allele frequencies of p.I141F, p.S182S, and p.C475C were low (MAF = 0.0188%, MAF = 0.0165% and MAF = 0.0033%, respectively) and within the pathogenic range of < 8 total alleles among South Asians (n ≥ 12,086) in gnomAD. The missense variant had a Sherloc score of 2.5P, and both silent variants had scores of 1.5P and 3B. All three variants were classified as VUS (Table 4).

Benign or likely benign variants

One novel variant in the canonical splice acceptor site of intron 7, c.868-2A>G, was detected in three patients (1.0%, 3/302): a 36-year-old female familial breast cancer patient (II:4, Fig. 1e), a 61-year-old male breast cancer patient (II:8, Fig. 1f), and a 25-year-old female early-onset breast cancer patient (II:9, Fig. 1g), of Punjabi, Urdu speaking and Pathan ethnicity, respectively. It was also found in one of the two tested unaffected sisters (II:7, Fig. 1g) of the early-onset patient. Moreover, c.868-2A>G was detected in two controls (0.8%, 2/250). The similar frequencies in cases and controls indicate that this variant is not likely to be pathogenic. In gnomAD, the G allele had a high frequency (MAF = 0.5669%) among South Asians (n = 13,582), which was taken into account in the Sherloc classification. The variant was predicted to have a functional impact by three of five splice-site prediction tools (Table 3).
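The statement that carrier frequencies are similar in cases and controls can be illustrated with a two-sided Fisher exact test on the reported counts (3 of 302 cases, 2 of 250 controls). No such test is reported in the study; the sketch below is our own illustration, written with the Python standard library only, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k: int) -> float:
        # Probability that k of the col1 carriers fall in the first row.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    k_min, k_max = max(0, col1 - row2), min(col1, row1)
    p_obs = prob(a)
    p = sum(pk for pk in (prob(k) for k in range(k_min, k_max + 1))
            if pk <= p_obs * (1 + 1e-9))
    return min(1.0, p)


# Carriers and non-carriers of c.868-2A>G, as reported in the text.
cases_carriers, cases_total = 3, 302
controls_carriers, controls_total = 2, 250

p_value = fisher_exact_two_sided(
    cases_carriers, cases_total - cases_carriers,
    controls_carriers, controls_total - controls_carriers,
)

if __name__ == "__main__":
    print(f"cases: {100 * cases_carriers / cases_total:.2f}%")
    print(f"controls: {100 * controls_carriers / controls_total:.2f}%")
    print(f"two-sided Fisher exact p = {p_value:.3f}")
```

With these counts the observed table is also the most probable one given the margins, so the two-sided p-value is essentially 1, consistent with no detectable difference between cases and controls.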

To assess whether c.868-2A>G affects splicing, we performed RT-PCR analysis of RNA extracted from two variant carriers and two non-carriers (one family member and one control). This revealed a single transcript corresponding to the reference full-length transcript (364 bp) in all samples (Fig. 2a). All transcripts were confirmed by Sanger sequencing (Fig. 2b-e). Thus, this variant may not affect the splicing of RECQL. It had a Sherloc score of 1P and 8B and was classified as benign (Table 4).

Fig. 2

RT-PCR analysis of the RECQL c.868-2A>G splice-site variant. a Photograph of an ethidium bromide-stained gel of the RECQL transcripts. Lane 1, DNA 100 bp marker; lanes 2 and 3, c.868-2A>G carriers; lanes 4 and 5, non-carriers (family member, control); lane 6, no template control; lane 7, gDNA wild type control; lane 8, Lambda DNA/HindIII marker. Product sizes: gDNA = 2260 bp; cDNA = 364 bp. Sequencing profiles of forward strand using PCR product from the cDNA of: b non-carrier (control), c-d c.868-2A>G carriers, e non-carrier (family member)

The remaining eleven variants (three missense variants and eight noncoding variants) have been previously reported as benign/likely benign in the ClinVar database (as of April 2020) or in other populations.

Discussion

This is the first study to investigate the prevalence of pathogenic RECQL germline variants in BRCA1 and BRCA2 negative high-risk patients with ER positive and/or PR positive breast tumors from Pakistan (n = 302). We identified a single novel pathogenic RECQL variant. Although several studies have been conducted in Europe and two in East Asia, there is still conflicting evidence for a role of RECQL in breast cancer predisposition [6, 8, 10, 23]. Our study provides additional information on the contribution of the RECQL gene to hereditary breast cancer in a South Asian population from Pakistan.

The novel pathogenic RECQL variant, p.W75*, was identified in 0.3% of early-onset and familial breast cancer patients with hormone receptor-positive tumors, but not in controls, suggesting that p.W75* may be disease-causing. In studies performed in China [7, 11], higher pathogenic variant frequencies ranging from 0.54 to 1.6% were observed in BRCA1 and BRCA2 negative early-onset and/or familial breast cancer cases. In Caucasian studies conducted in Australia [10], Canada [6], Poland [6], and the USA [9], frequencies similar to ours, ranging from 0.1 to 0.4%, have been reported in familial breast cancer patients, while no pathogenic variants were detected in studies performed in South-West Poland and West Ukraine [24]. In other Caucasian studies conducted in Belarus, Germany, and Australia, the frequencies of pathogenic variants in controls were similar to or higher than those in cases [8, 10]. Overall, these findings indicate that the role of RECQL as a breast cancer susceptibility gene remains controversial.

Previously, a missense variant (p.R215Q) in the highly conserved RecA-like domain D1 of RECQL (amino acid residues 63 to 281) was reported to disrupt RECQL helicase activity and was classified as pathogenic [7]. In the current study, a novel missense variant in the same domain, p.I141F, was found in one familial breast cancer patient (0.3%), but not in controls. It may also affect the ATP-dependent translocation activity of RECQL, leading to disruption of helicase activity [25]; however, functional assays are warranted to confirm this. The p.I141F allele was rare among South Asians in gnomAD. Overall, population data, variant type, clinical observations, and in silico predictions support the classification of p.I141F as a VUS under the Sherloc guidelines.

The recurrent splice-site variant, c.868-2A>G, was identified in three breast cancer patients (1.0%) and two controls (0.8%). Its similar frequency in cases and controls indicates that this variant may be benign. This is supported by its high frequency (0.5669%) among South Asians in gnomAD. In addition, RT-PCR analysis did not reveal any effect on RECQL splicing. Thus, based on the Sherloc variant classification guidelines, our data suggest that c.868-2A>G may be benign. However, we cannot exclude that an aberrantly spliced transcript escaped detection owing to nonsense-mediated decay, or that other splicing events occurred that were not investigated in the present study.

The ER and PR positive breast tumor of the Pakistani patient with the pathogenic RECQL variant showed high grade and IDC histology. These findings are in line with those from other studies conducted in China [7], Poland [6], Belarus, and Germany [8] further supporting the notion that high grade, hormone receptor-positive breast tumors of IDC histology may be predictors of the pathogenic RECQL variant status.

Our study has several limitations. First, although the study was of reasonable size, larger studies are warranted to confirm our findings. Second, mutation analysis was restricted to patients with ER and/or PR positive breast tumors, in whom a predominance of pathogenic RECQL mutations has been reported [6,7,8, 11]. Since patients with both ER and PR negative or triple-negative breast tumors were not tested, the prevalence of pathogenic RECQL variants reported in this study may be an underestimate. Third, the functional analyses of the splice-site variant should be extended in order to confirm its classification as benign.

Conclusion

In summary, we identified a single pathogenic RECQL variant in 302 BRCA1 and BRCA2 negative high-risk patients with ER positive and/or PR positive breast tumors. The frequencies of the novel pathogenic variant were 0.3% (1/302) in early-onset and familial breast cancer patients and 0.8% (1/133) in familial patients. Our data suggest that pathogenic RECQL variants explain a negligible proportion of hereditary breast cancer in Pakistan.