Background

Acute lymphoblastic leukemia (ALL) results from uncontrolled, clonal proliferation of lymphoid cells arrested at an early stage of differentiation; these cells can invade the bone marrow, blood, and extramedullary sites [32]. In India, 3.6% of new cancer cases were diagnosed as leukemia [3], of which 75–80% were reported to be ALL [1]. The ALL disease phenotype comprises different subtypes with distinct somatic and genetic alterations, clinical features, and treatment responses [14, 15].

Mitochondria generate adenosine triphosphate (ATP) through oxidative phosphorylation, carried out by the electron transport chain (ETC) complexes, and also harmonize multiple interconnected aspects of cell proliferation, apoptosis and metabolism [2, 21]. Mitochondria have their own genome, a double-stranded, 16.6-kb, circular, unmethylated DNA encoding 13 proteins of the ETC complexes, the 12S and 16S rRNAs, and 22 transfer RNAs. Mitochondrial DNA (mtDNA) is highly susceptible to oxidative stress due to its proximity to reactive oxygen species (ROS) produced in the mitochondrial matrix [33]. The ETC complexes are made of multiple proteins encoded by both mtDNA and nuclear DNA (nDNA). A mitochondrion can contain two to eight copies of mtDNA [21], which is constantly turned over to maintain its copy number. The mtDNA changes reported in cancer include mutations, deletions, inversions, and copy number alterations, and mutations in mtDNA regions have been reported in all types of human cancers [23, 50]. An analysis of mtDNA mutations in over 1600 tumors reported over 1900 somatic mtDNA mutations, with great variability in the number of mtDNA mutations across cancers and a very low frequency in hematological cancers [24]. We have previously observed significantly higher mtDNA copy numbers in pediatric ALL patients than in controls, and that mtDNA copy numbers could predict survival in ALL cases [20]. Variants in mtDNA have been analyzed in the D-loop and ETC coding regions in ALL [22, 27, 54, 55]. The current study aimed to explore the link between mtDNA variants and mtDNA copy numbers by whole mitochondrial DNA sequencing, and to identify the regions affected by these variants and their associated pathogenicity.

Materials and methods

Patients

The patients recruited in the study were newly diagnosed, untreated pediatric ALL patients. These were patients who visited the outpatient department of Medical Oncology at the B.R.A.I.R.C.H. Cancer Hospital at All India Institute of Medical Sciences (AIIMS), New Delhi between January 2017 and December 2017. The ALL diagnosis was confirmed by morphological, immunophenotyping and cytogenetic analyses on the bone marrow samples.

Sample collection and processing

Samples for sequencing and analysis of mtDNA were selected from bone marrow aspirates. The samples were collected from ALL patients in ethylenediaminetetraacetic acid (EDTA) vacutainers at the time of diagnostic sample collection. Mononuclear cells were obtained by whole blood cell lysis, aliquoted and stored at −80 °C. Total cellular DNA was extracted as previously described, using lysis buffer, proteinase K and 20% SDS [8]. DNA was quantified using a Qubit fluorometer, and quality was ascertained by the A260/280 ratio. One aliquot was used to estimate the activity of ETC Complex I.

Mitochondrial DNA copy number estimation

Mitochondrial gene copy number was measured on the Bio-Rad CFX96 Real-Time PCR system for all samples using mitochondrial gene-specific primer sequences from the minor arc [35]. The primer sequences for the mitochondrial gene were Forward: CTAAATAGCCCACACGTTCCC and Reverse: AGAGCTCCCGTGAGTGGTTA, and those for the nuclear gene β-actin were Forward: TAGCTGTGCTCGCGCTACT and Reverse: TCTCTGCTGGATGACGTGAG. The mtDNA copy number, normalized to copies of the nuclear gene β-actin per cell, was calculated for each subject and control sample using the formula 2^[Ct(β-actin) − Ct(minor arc)].
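The copy number formula above can be illustrated with a short Python sketch. The function name and the Ct values are hypothetical and chosen only to make the arithmetic easy to follow; they are not study data.

```python
def mtdna_copy_number(ct_beta_actin: float, ct_minor_arc: float) -> float:
    """Relative mtDNA copy number as 2^(Ct(beta-actin) - Ct(minor arc)).

    A lower Ct for the mitochondrial amplicon than for beta-actin means
    more mitochondrial template, hence a value above 1.
    """
    return 2.0 ** (ct_beta_actin - ct_minor_arc)


# Hypothetical Ct values for illustration only.
example = mtdna_copy_number(ct_beta_actin=25.0, ct_minor_arc=17.0)
print(example)  # 2^8 = 256.0
```

Because the value is an exponential of the Ct difference, a one-cycle change in either Ct doubles or halves the estimate.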

ETC Complex I enzyme activity estimation

Specific enzyme activity of ETC Complex I was estimated in cell pellets from the 20 tumor patients and 7 healthy controls. The activity was estimated using an ELISA-based kit for complex I (ab109721, USA), and protein concentration was estimated using a BCA Assay Kit (Thermo Scientific Pierce Cat No. 23225), as per the manufacturers’ protocols. Complex I activity was determined by following the oxidation of NADH to NAD+ and the simultaneous reduction of the dye (provided in the kit), measured as the increase in absorbance at 450 nm in a spectrophotometer.

Whole mitochondrial DNA sequencing

Whole mitochondrial genome sequencing was performed on 27 participants: bone marrow aspirates were collected from 20 ALL patients, and peripheral blood samples from two healthy adults and five sibling controls. The patient samples were stratified by mtDNA copy number to include 10 high and 10 low mtDNA copy number samples. The mitochondrial genome was amplified in two overlapping fragments using long-range PCR, and a two-step PCR protocol was used to capture the target region and prepare the amplicon library for sequencing. PCR primers were designed for the target region, and genomic DNA was enriched for the region of interest by amplicon-based capture. The enriched genomic DNA was ligated with sequencing adapters to produce a library that was sequenced on an Illumina HiSeq 2500 to generate 2 × 150 bp reads at a coverage depth greater than 10,000×.

Identification of mtDNA variants

Multiple quality checks were done during the sequencing process, as described in Additional file 1: Sect. S1.I. The mtDNA variants, including SNPs and indels, were predicted individually for each sample using HaplotypeCaller, and the results were combined using bcftools merge (version 1.9). In addition, to distinguish somatic mtDNA variants (SNVs and indels) from germline variants, a case–control analysis was performed using MuTect2 with default parameters, comparing the sequences of five tumor samples with those of matched normal samples from the respective siblings. Normal tissue from the patients themselves could not serve as a control because ALL is a hematological cancer and leukemic cells are present throughout the blood and bone marrow of ALL patients. In addition, the systemic nature of ALL results in inflammation, which has been seen to affect the integrity of mitochondrial DNA. Hence, we used sibling pairs wherever feasible as the best available match.

Variant annotation

The impacts of rare and common variants on protein structure or function were predicted using SnpEff [4] and ANNOVAR [52]. Various mitochondria-specific databases and predictors were used to predict the pathogenicity of the variants, as listed in Additional file 1: Sect. S1.II. Additionally, variants were classified as pathogenic, likely pathogenic, of uncertain significance, likely benign or benign using the variant assessment guidelines specified by the American College of Medical Genetics [39]. Human mitochondrial genome resources such as mtDB [18], mtSNP [7], HmtVar [37] and PhyloP [36], together with the relevant literature, were used for this classification.

Candidate somatic variants

Common variants with a minor allele frequency (MAF) of > 1% in population-based databases (such as the dbSNP-based Allele Frequency Aggregator, ALFA) [44] were not considered, since such variants are likely to represent population polymorphisms or low-penetrance variants rather than pathogenic somatic driver mutations. Known germline variants reported in dbSNP (version 151) were excluded, and alterations listed as known somatic variations in COSMIC [6] were retained. Each variant was labeled either ‘known’, if it had a previously reported dbSNP ID, or ‘unreported, new’, if no dbSNP ID was assigned.
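The filtering logic described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the pipeline used in the study: the `Variant` record, its field names, and the rule that COSMIC evidence takes precedence over a germline annotation are hypothetical, and the example records are made up.

```python
from dataclasses import dataclass
from typing import Optional

MAF_CUTOFF = 0.01  # 1% minor allele frequency, as stated in the text


@dataclass
class Variant:
    position: int                     # mtDNA base position
    population_maf: Optional[float]   # None when no population frequency is reported
    dbsnp_id: Optional[str]           # None when no dbSNP ID is assigned
    known_germline: bool              # hypothetical flag: annotated as germline in dbSNP
    in_cosmic: bool                   # hypothetical flag: listed as somatic in COSMIC


def is_candidate(v: Variant) -> bool:
    """Return True if the variant survives the filters described above."""
    if v.population_maf is not None and v.population_maf > MAF_CUTOFF:
        return False  # common population variant
    if v.known_germline and not v.in_cosmic:
        return False  # known germline without somatic evidence (assumed precedence)
    return True


def label(v: Variant) -> str:
    """'known' if a dbSNP ID exists, otherwise 'unreported, new'."""
    return "known" if v.dbsnp_id else "unreported, new"


# Made-up records for illustration; positions and IDs are not study data.
variants = [
    Variant(100, 0.25, "rs0000001", True, False),
    Variant(200, None, None, False, False),
    Variant(300, 0.001, "rs0000002", False, True),
]
kept = [(v.position, label(v)) for v in variants if is_candidate(v)]
print(kept)  # [(200, 'unreported, new'), (300, 'known')]
```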

Heteroplasmic and homoplasmic variants

Heteroplasmy is a condition in which distinct mtDNA sequences for the same locus coexist in the same cell. Detection of heteroplasmic and homoplasmic variants and haplogroup classification were performed using the publicly available online tool mtDNA-server [53].
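As a minimal sketch of the distinction, a site can be classified from per-base read counts by comparing the fraction of the second most common allele with a detection threshold. The 1% threshold and the function names below are assumptions made for illustration; this is not the algorithm implemented by mtDNA-server.

```python
def minor_allele_fraction(base_counts: dict) -> float:
    """Fraction of reads supporting the second most common base at a site."""
    total = sum(base_counts.values())
    if total == 0:
        raise ValueError("no reads at this site")
    ordered = sorted(base_counts.values(), reverse=True)
    second = ordered[1] if len(ordered) > 1 else 0
    return second / total


def classify_site(base_counts: dict, threshold: float = 0.01) -> str:
    """Label a site heteroplasmic if the minor allele reaches the threshold.

    The default threshold of 1% is an illustrative assumption.
    """
    if minor_allele_fraction(base_counts) >= threshold:
        return "heteroplasmic"
    return "homoplasmic"


# Hypothetical read counts at two sites.
print(classify_site({"A": 9000, "G": 1000}))  # heteroplasmic (10% minor allele)
print(classify_site({"A": 10000}))            # homoplasmic
```

At the coverage depths reported here (greater than 10,000×), even a 1% minor allele corresponds to roughly 100 supporting reads, which is why low-level heteroplasmy becomes detectable.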

Statistical analysis

Non-parametric statistical tests were applied using GraphPad Prism 7.0 software (San Diego, California, US). The Mann–Whitney U test (two-sample Wilcoxon rank-sum test) was used to compare mitochondrial DNA copy numbers and complex I enzyme activities between patients and controls. The effect of variant type (loss-of-function (LOF) or missense) on complex I enzyme activity was also analyzed using the Mann–Whitney U test. A ‘p’ value of < 0.05 was considered statistically significant.
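For readers unfamiliar with the test, the following Python sketch computes the Mann–Whitney U statistic and a two-sided p-value from the normal approximation. It is a standard-library illustration, not the GraphPad Prism procedure: it applies no tie or continuity correction, so its p-values may differ from those reported by Prism, especially for small samples. The function names and the input values are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U, two-sided p) using the normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u, p


# Hypothetical values for two fully separated groups.
u_stat, p_value = mann_whitney_u([1, 2, 3], [4, 5, 6])
print(u_stat, round(p_value, 3))
```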

Results

Patient characteristics

All 20 pediatric ALL patient samples included in the sequencing analysis were of B-cell origin (Table 1). The mean age was 10.2 years for patients and 12.8 years for controls. All the patients selected for mitochondrial genome sequencing had > 70% leukemic lymphoblasts in the bone marrow.

Table 1 Characteristics of B-ALL patients (n = 20) included in the study

MtDNA copy number analysis

The mtDNA copy numbers in patients were significantly higher than those in healthy controls (p-value = 0.0026, Fig. 1a). For comparison of the mtDNA genome sequencing data, the patients were divided into two categories, high and low mtDNA copy number, based on the median mtDNA copy number (417.3 copies per cell) in the patient group.
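The median-based stratification can be sketched as below. The sample identifiers and copy numbers are hypothetical, and assigning values equal to the median to the low group is an assumption, since the text does not state how ties were handled.

```python
from statistics import median


def split_by_median(copy_numbers: dict):
    """Split samples into high and low groups around the median copy number."""
    cutoff = median(copy_numbers.values())
    high = sorted(s for s, cn in copy_numbers.items() if cn > cutoff)
    # Assumption: values equal to the median fall in the low group.
    low = sorted(s for s, cn in copy_numbers.items() if cn <= cutoff)
    return cutoff, high, low


# Hypothetical copy numbers per cell for four samples.
cutoff, high, low = split_by_median(
    {"P1": 120.0, "P2": 800.0, "P3": 300.0, "P4": 530.0}
)
print(cutoff, high, low)  # 415.0 ['P2', 'P4'] ['P1', 'P3']
```

With an even number of samples and no tied values, the median lies between the two middle observations, so the split yields two groups of equal size, consistent with the 10 high and 10 low samples described in the methods.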

Fig. 1

(a) Graph representing significant differences between mtDNA copy numbers in ALL patients (n = 20) and healthy controls (n = 7), (b) Scatter-plot comparing complex I enzyme activities between ALL patients and controls

Correlation between complex activity and mtDNA copy number

ELISA-based estimation of ETC Complex I enzyme activity showed a higher activity in the patient samples than controls, which was not statistically significant (p-value = 0.088, Fig. 1b). No significant correlation was found between enzyme activity and mtDNA copy numbers, when analyzed using Spearman’s correlation coefficient (data not shown). However, in our larger cohort, complex activities were found to be significantly higher in tumors and they showed a correlation with mtDNA copy number (unpublished results).

Analysis of variants

Full-length mitochondrial genome sequencing of the patient and control samples on the Illumina HiSeq platform revealed a total of 325 variants in the mitochondrial genome across 20 patients and seven controls that passed the minimum genotype quality score of 20. The 325 variants included 104 unreported variants, while the remaining 221 had a previously known dbSNP ID (database for single nucleotide polymorphism). Of these 325 variants, 94 were shared between tumor and control samples, 199 were present only in ALL patients, and 32 were present only in controls (Table 2). Region-wise filtering of the unassigned variants from GATK variant calling identified the regions with a disproportionately high number of variants, i.e., the D-loop (n = 17), ND5 (n = 11), and the ND2 and ND4 regions (n = 10 each).

Table 2 Classification and number of variants in patients and controls

Based on their predicted impact on protein levels or function, the 325 variants were classified into low (n = 131; synonymous), high (n = 7; loss-of-function, LOF), moderate (n = 75; missense), and modifier (n = 112; non-protein coding) variants. The seven LOF variants were observed in 7 out of 20 tumor patients and 2 out of 7 healthy controls. Two LOFs, at positions 5186 and 12007, were present in both tumor and sibling samples, while five were present exclusively in tumors. The new LOF variants are given in Table 3; Additional file 1: Table S1 lists the LOFs with a previously known dbSNP ID. Both LOFs in controls were seen in sibling samples. The LOF variants were present in the regions encoding the ND1, ND2, ND4, and ND5 subunits of complex I and the COX1 and COX3 subunits of complex IV of the ETC (Fig. 2a). The complex I activities in patients with LOFs in regions encoding ETC complex I were not significantly different from those in patients with no LOFs or with LOFs in other complexes.
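The three breakdowns of the 325 variants reported above (by dbSNP status, by sample group, and by predicted impact) can be checked for internal consistency with a few lines of Python. The counts are taken from the text; the dictionary names are illustrative.

```python
TOTAL_VARIANTS = 325

by_dbsnp_status = {"unreported": 104, "known dbSNP ID": 221}
by_sample_group = {"tumor and control": 94, "ALL only": 199, "control only": 32}
by_impact = {"low": 131, "high": 7, "moderate": 75, "modifier": 112}

for name, counts in [
    ("dbSNP status", by_dbsnp_status),
    ("sample group", by_sample_group),
    ("impact", by_impact),
]:
    subtotal = sum(counts.values())
    # Each partition of the variant set should add up to the same total.
    assert subtotal == TOTAL_VARIANTS, (name, subtotal)
    shares = {k: round(100 * v / TOTAL_VARIANTS, 1) for k, v in counts.items()}
    print(name, shares)
```

Each partition sums to 325, and the printed percentages show, for example, that the variants found only in ALL patients make up about 61% of the total.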

Table 3 List of the new LOF variants in different regions (column 1) encoding genes (column 2) of the mitochondrial genome at various base positions (column 3), and their presence/absence in the number of tumor and control samples (column 4)

Fig. 2

Circos plot with the outer circle representing the mitochondrial genome of 16,569 bp, showing the genes encoding 13 proteins of the ETC complexes, 2 ribosomal RNAs (rRNAs) and 22 tRNAs. The inner circle denotes (a) the 7 LOF variants (LOF) in green and (b) the 21 missense variants (Mis) in blue

Of the 75 missense variants mentioned earlier, variant annotation by SnpEff and ANNOVAR found 21 unreported missense variants in genes of the ETC complexes and tRNAs (Table 4) and 54 missense variants with an assigned dbSNP ID (Additional file 1: Table S2). The 21 missense variants, marked in the Circos plot (Fig. 2b), could have a moderate impact on protein-coding genes and could be disease-associated. Among these 21 variants, the majority were found in tumor samples (18 ALL patients and 3 siblings), mostly in the ND4- and ND5-encoding regions. No statistically significant difference in complex I activity was found between patients with missense variants in regions encoding complex I and patients with no missense variants. We observed a missense variant at base position 4769, which has a known dbSNP ID, in all the patients. This position lies in a helical region of ND2 and has previously been reported in multiple diseases such as schizophrenia, muscle disease, cardiomyopathy and neuropathy.

Table 4 List tabulating the 21 new, missense variants, with unassigned dbSNP IDs, in different regions (column 1) encoding genes (column 2) of the mitochondrial genome at various base positions (column 3) and their presence/absence in the number of tumor and control samples (column 4)

Differences in mtDNA variants between high and low mtDNA copy number ALL patient samples

LOF variants were present in 4 of 10 ALL patients with high mtDNA copy number and 3 of 10 with low mtDNA copy number, but none of the variants was unique to either group. All ten patients in the high mtDNA copy number group had new missense variants at different mitochondrial genome sites. There were 11 such sites in all, encoding ND2, ND5 and ND4L of ETC complex I, COX1 of complex IV and a tRNA. In the low mtDNA copy number group, eight of the ten ALL patients had ten missense variants at various base positions, encoding ND1 of complex I, the COX2 and COX3 subunits of complex IV, ATP6 of complex V, and tRNAs. No predilection for any specific region was observed between the two categories of ALL patients stratified by mtDNA copy number.

Analysis of new, unassigned missense variants using publicly available human mitochondrial genome databases

The deleterious impact and pathogenicity of missense variants without an assigned dbSNP ID were summarized in a variant annotation table. The variants were characterized by pathogenicity predictors, cancer-specific predictors, pathogenicity meta-predictors and databases of allelic frequency and phenotype. Seventeen variants with unassigned dbSNP IDs were classified as possibly damaging, probably damaging, deleterious, damaging, disease, disease-causing, high impact, pathogenic or fast-evolving sites by the different predictive software (Additional file 1: Table S3). The most common deleterious effects were seen in missense variants at base positions 5452, 13022, and 13535, encoding complex I, and 8594, encoding complex V. In addition, 41 of the 54 missense variants with a known dbSNP ID were listed as pathogenic or deleterious by these prediction tools (Additional file 1: Table S4).

Analysis of somatic variants

The MuTect2 pipeline of GATK was used to compare the five paired samples of ALL patients and their respective siblings in order to filter out germline variants and list the true positive somatic variants [6]. Other criteria are listed in Additional file 1: Sect. S1.III. Overall, 93 somatic variants were identified, of which 6 were known somatic variants already reported in COSMIC and 87 were probable somatic variants. Among these 87, a total of 32 variants had an existing dbSNP ID (Additional file 1: Table S5) and 55 were new probable somatic variants. Table 5 lists these new probable somatic variants, consisting of both missense and LOF variants at various base positions. The new somatic variants predicted by MuTect2 were mapped against the variants called individually by GATK, and 3 LOFs and 17 missense variants overlapped with the lists in Tables 3 and 4. The details of somatic variants in the five pairs of ALL patients and their respective siblings are given in Additional file 1: Table S6.

Table 5 List of the new probable somatic variants at various base positions (column 1) in the mitochondrial genome after comparing five pairs of patient and sibling samples using MuTect2 and ruling out germ-line and known COSMIC somatic variants, encoding for various regions (column 2) and the type of variant (column 3)

Haplogroup determination of the mtDNA variants in ALL cases and controls

MtDNA-server was used to assign haplogroups to the ALL patients and paired sibling controls. The haplogroup assignments were M6, R, U2b1a, U1a1c1d and M35b2.

Analysis of heteroplasmic and homoplasmic variants

MtDNA-server was used to identify homoplasmic and heteroplasmic variants at different base positions in the sequenced samples. A total of 415 base positions showed heteroplasmic or homoplasmic variants, of which 64 positions showed heteroplasmy in some patients and homoplasmy in others. There were 58 positions (13.9%) that exclusively showed heteroplasmy, and 229 positions had a homoplasmic appearance in the mtDNA genome. Only the most common heteroplasmic and homoplasmic mtDNA sites are reported here. The largest numbers of homoplasmic variations were found at the mtDNA sites m.263A > G (D-loop), m.750A > G and m.1438A > G (12S rRNA), m.4769A > G (ND2) and m.8860A > G (ATP6), which were present in all 20 tumor samples and seven controls. m.73A > G (D-loop), m.7028C > T (COX1), m.14766C > T and m.15326A > G (CYTB) were present in 19 patients and seven controls, and m.11719G > A (ND4) was found in 18 patients and seven controls. Similarly, the highest number of heteroplasmic variations was found at m.3107N > T, encoding RNR2 (16S rRNA), which was present in all patients and controls. The presence of homoplasmic and heteroplasmic variants in tumors and controls was also checked in the filtered list of new missense and LOF variants in all samples (Additional file 1: Sect. S1.IV).

Discussion

The research data presented in this paper describe the deleterious variants that we identified in our study, in addition to previously reported ones, in the mitochondrial genome in lymphoblastic leukemia. Integrated genomic and transcriptomic analyses of large cohorts of childhood ALL have revolutionized our understanding of the genetic basis of ALL by identifying new subtypes, dysregulated pathways, and therapeutic targets within the last decade [41]. Metabolomics and proteomics studies have also been performed in ALL earlier [29]. However, whole mitochondrial DNA sequencing data is limited for ALL and the mitochondrial genome’s specific role in affecting the pathogenesis and prognosis in ALL is still unknown. This is due to factors such as the technical complexity inherent in analyzing the mitochondrial genome, the redundancy of the function of individual units of the mtDNA, and the non-feasibility of acquiring adjacent normal tissue.

Next-generation sequencing (NGS) is being used to detect targetable lesions in patients with ALL and to integrate sequencing data into disease management protocols [17]. The genomic landscape of adult and pediatric BCR-ABL1-like B-lymphoblastic leukemia has been described using parallel DNA and RNA sequencing [42]. A few studies have analyzed the whole mitochondrial genome to identify variants that could be helpful in ALL diagnosis, risk stratification, treatment planning and minimal residual disease (MRD) assessment [5, 25]. A pan-cancer mtDNA study identified the presence of 19 LOF somatic mutations in different regions of ETC complexes in pediatric B-ALL cases [49]. Our work reports novel LOFs not reported previously in B-ALL and, in fact, we have identified distinct LOF variants, with five of them encoding for ND1, ND2, ND4 and ND5 of complex I subunits and the remaining two for COX1 and COX3 of the complex IV region. We observed a higher number of LOFs in tumors (n = 7) as compared to controls (n = 2), suggesting a possible impact on the structure and function of these complexes in leukemia. Aside from previously reported disease-associated ATP6 variants such as m.8860A > G that occur with high frequency in breast cancers [10], we found additional deleterious/disease-related mutations in the ATP6 region in 13 ALL cases studied. Recently, authors have shown the utility of genome sequencing in diagnosis of pediatric patients with suspected mitochondrial disease and identification of nuclear and mtDNA variants [40]. A study performed in MELAS patients carrying the heteroplasmic m.3243A > G mutation showed that high mtDNA copy number and low heteroplasmy levels correlated with less severe disease [11].

An elevated mtDNA copy number might be associated with higher number of variations which might subsequently be associated with the pathogenesis of B-ALL. Previous studies have reported non-neutral missense or nonsense leukemia-specific mutations in mtDNA regions encoding ND1, ND2, ND4L, ND6, COX3, ATP6 and ATP8 genes in pediatric ALL patients indicating a role as a potential diagnostic marker [22, 54, 55]. In comparison to the two recent studies on pediatric B-ALL [22, 49], we observed 21 missense variants at new, unreported sites. Interestingly, many variants were only present in the tumor samples suggesting a likely association with some aspect of cancer. Most of the new variants were in regions encoding for different complex I subunits, of which the greatest number were in ND4 and ND5 regions. Given the role of complex I in maintaining the NAD+/NADH balance and ROS levels, the generation of mitochondrial membrane potential and ATP production, its dysfunction often underlies or contributes to mitochondrial disorders [26]. This hypothesis is supported by studies on mtDNA mutations in other cancers showing that functional disruption of Complex I leads to elevated ROS, manifesting as an increase in metastatic propensity and in the transcription of glycolytic- and metastasis-related genes [16, 24, 57]. We also observed two mutations linked to metastatic potential in other cancers: A13966G and A10398G (Additional file 1: Table S1) with the latter also being a germline polymorphism associated with cancer susceptibility [48]. While recent large-scale pan cancer analysis of mtDNA genomes suggest that the mtDNA mutational load in hematopoietic cancers is low compared to other cancers, the observation of some known and some novel mutations in our dataset indicate that mtDNA changes may contribute to pathogenesis and may have prognostic value [34, 56]. 
Additionally, the preponderance of variants in complex I emphasizes its critical role, as complex I is the largest complex of the ETC. Its functional alterations could have downstream effects on the assembly of mammalian ETC supercomplexes [31]. In ALL patients, the presence of LOF and missense variants in regions encoding complex I did not correlate with complex I enzymatic activity, indicating that the activity does not depend directly on these variants, although the variants might still affect the functionality of the ETC complexes.

Appropriate representation for assessing mtDNA mutations requires using paired tissue samples (tumor and normal) for identifying somatic variants [13]. We found 87 probable somatic variants in five case–control pairs, among which 32 had dbSNP IDs but 55 were new somatic variants, most commonly present in ETC complex I. Interestingly, there were overlaps between the new, probable somatic variants and the three LOFs and 17 missense variants present in ALL patients. We found that the ND4 and ND5 regions had the maximum number of new somatic mutations, a fact reported for other cancers [28, 43] and suggested to alter tumor progression [19].

Most pathogenic mtDNA mutations are heteroplasmic; mutant and wild-type (WT) mitochondrial genotypes can coexist, and affected cells will manifest metabolic defects only when the mtDNA mutation levels exceed a critical threshold [47]. Genetic and metabolic plasticity in cancer cells bearing heteroplasmic mutations was shown to allow higher oncogenic/metastatic potential [9]. Studies support the notion that mtDNA mutations occur throughout the tumor course and drift to homoplasmy with appreciable frequency [24, 56]. There appears to be a tissue-specific selection process for mtDNA mutations that is advantageous for cancer spread [51]. The LOFs we found were more numerous in tumors and, in ALL samples, were more often in the homoplasmic than the heteroplasmic state. LOFs in the control samples were in a heteroplasmic state, which could theoretically keep mutation levels below the threshold needed to maintain normal mitochondrial function. Deep NGS data from 1916 patients with 24 cancer types showed positive selection in tumors via a preferential increase in allelic frequencies of non-synonymous mtDNA variants [12]. A direct correlation was seen between the increase in the number of heteroplasmic mtDNA variants and the extent of the atherosclerotic morphologic phenotype, based on the ratio of minor and major nucleotide mtDNA variants [45].

We did not observe a correlation between higher mtDNA copy numbers and complex activity, although the overall complex I activity was higher in tumors than controls. These observations indicate that higher mtDNA copy numbers in tumors do not necessarily translate into significantly higher complex activities. Perhaps a cumulative suppressive effect of LOF variants could impair complex function in leukemic cells. A similar observation was made in AML cells where increased mitochondrial mass and activity of the respiratory chain enzymes were not concordant [46]. A study in head and neck cancers, and esophageal squamous cell carcinoma, proposed that increased mtDNA copy number could compensate for the damaged mtDNA to maintain the supply of ATP by mitochondrial respiration to maintain cellular survival [30].

Conclusions

In summary, our data identify novel LOF and missense variants in the ETC complexes in leukemic patients at sites beyond those already considered somatic mutation hotspots in the mtDNA genome of cancers. These variants could possibly have disease-associated or disease-modifying effects and deserve further exploration for functional relevance. The loss-of-function variants may support tumor growth by facilitating the rewiring of mitochondrial function in cancer cells, shifting it from energy generation to re-routing metabolites for macromolecular biosynthesis. Finally, the lack of correlation between mtDNA copy numbers and complex activity indicates that leukemic cells may maintain complex activities by upregulating copy numbers to reach a critical threshold for the mitochondrial function necessary to support tumor growth and survival [38].

The presence of mtDNA variants in B-ALL portends a role in cancer through its impact on protein function and metabolic rewiring. However, the exact role of mitochondrial variants in leukemia requires in-vitro experiments to delineate mechanistic effects.

The major limitation of our study is that although we have found LOF and missense variants in greater numbers in ALL, functional validation of the variants linking them with causation/progression of cancer has not yet been done.