Next generation sequencing for molecular diagnosis of neuromuscular diseases
- First Online:
- Cite this article as:
- Vasli, N., Böhm, J., Le Gras, S. et al. Acta Neuropathol (2012) 124: 273. doi:10.1007/s00401-012-0982-8
Inherited neuromuscular disorders (NMD) are chronic genetic diseases posing a significant burden on patients and the health care system. Despite tremendous research and clinical efforts, the molecular causes remain unknown for nearly half of the patients, due to genetic heterogeneity and conventional molecular diagnosis based on a gene-by-gene approach. We aimed to test next generation sequencing (NGS) as an efficient and cost-effective strategy to accelerate patient diagnosis. We designed a capture library to target the coding and splice site sequences of all known NMD genes and used NGS and DNA multiplexing to retrieve the pathogenic mutations in patients with heterogeneous NMD with or without known mutations. We retrieved all known mutations, including point mutations and small indels, intronic and exonic mutations, and a large deletion in a patient with Duchenne muscular dystrophy, validating the sensitivity and reproducibility of this strategy on a heterogeneous subset of NMD with different genetic inheritance. Most pathogenic mutations were ranked on top in our blind bioinformatic pipeline. Following the same strategy, we characterized probable TTN, RYR1 and COL6A3 mutations in several patients without previous molecular diagnosis. The cost was less than conventional testing for a single large gene. With appropriate adaptations, this strategy could be implemented into a routine genetic diagnosis set-up as a first screening approach to detect most kind of mutations, potentially before the need of more invasive and specific clinical investigations. An earlier genetic diagnosis should provide improved disease management and higher quality genetic counseling, and ease access to therapy or inclusion into therapeutic trials.
KeywordsNeuromuscular disorder Sequencing Molecular diagnosis DNA barcoding Myopathy Neuropathy
Inherited neuromuscular disorders (NMD) form a group of genetic diseases which result in chronic long term disability posing a significant burden to the patients, their families and public health care. NMD are often severe and include more than 200 monogenic disorders with a total incidence exceeding 1 in 3,000 . Despite tremendous research and clinical efforts, the molecular causes of NMD are still unknown for approximately half of patients. The precise diagnosis of NMD requires an extensive clinical evaluation in conjunction with targeted complementary tests. To date, routine genetic diagnosis is mainly done on a gene-by-gene basis, starting from the most pertinent one. Diagnostic challenges in this group of diseases include genetic heterogeneity in most of the disorders and lack of segregation data in sporadic cases to orient the screening. As an example, hereditary sensorimotor neuropathies (HSMN) are due to mutations in nearly 50 genes, while congenital myopathies implicate at least 14 different genes [21, 22]. Also, large genes, such as Titin (TTN) with 363 exons, are not entirely sequenced even if previously linked to NMD . As a consequence, clinical tests are multiplied, DNA is sent to different laboratories and patients are submitted to thorough examination that includes sometimes invasive investigations. Often genetic diagnosis is delayed, exposing the patient to unnecessary investigations and treatments, precluding the full benefit of a targeted approach to treatment, and increasing recurrence risk in the families.
Current molecular diagnostic approaches are time-consuming and expensive. Recently, massively parallel sequencing using next generation sequencing (NGS) technologies has emerged as a successful approach to interrogate multiple genes simultaneously and is currently mainly used to identify novel disease genes in a research setting [10, 18, 19, 20, 31]. A fewer studies reported the use of whole genome (WGS) or whole exome (WES) sequencing for genetic diagnosis of a given monogenic disease. Concerning NMD, Lupski et al.  and Montenegro et al. , respectively, used WGS and WES in patients with hereditary sensorimotor neuropathies (HSMN) on a research setting. In both cases, they analyzed a single family and focused their variants ranking only on known HSMN genes. Targeted resequencing of known disease genes appears more relevant for routine diagnosis and until now was tested on a few specific disease genes like 5 ataxia genes or 21 breast cancer genes or a single large gene like DMD in very homogeneous patient cohorts, or for carrier testing [6, 13, 30].
As no previous large scale sequencing study targeting several NMD genes was reported, our aim here is to pilot an efficient screening strategy in an attempt to improve the clinical and molecular investigations of neuromuscular diseases from a very heterogeneous panel of patients. We used targeted enrichment of 267 known NMD genes followed by NGS in patients affected by different neuromuscular diseases with or without known mutations. DNA multiplexing and blind variant ranking retrieved successfully different mutation types for diseases with different segregations.
Patients and methods
Two groups of patients with various neuromuscular diseases were selected: eight patients with pathogenic mutations previously identified by conventional Sanger sequencing of candidate genes (patients A to H), and eight random patients without known mutations and different clinical diagnosis encompassing myopathies and neuropathies (patients I to P). Clinical and segregation data are listed in the online resource data. DNA was extracted from venous blood by three different methods: two manual methods, FlexiGene DNA kit (Qiagen GmbH, Hilden, Germany) and Bacc Nucleon 3 (Amersham-Bioscience), and one automated method, the QIA symphony DNA midi kit (Qiagen GmbH, Hilden, Germany). Informed consent was obtained from all individuals, and the study was approved by the comité de protection des personnes (DC-2012-1497).
Targeted massively parallel sequencing
All the 267 NMD genes, known to be implicated in 16 different disease classes (online resource Table 1; http://www.musclegenetable.org/) were targeted for enrichment. Capture design was done using the Agilent eArray (http://earray.chem.agilent.com/earray/). In this pilot study, we included the 267 genes because genetic heterogeneity exists in all disease classes and several genes are implicated in different classes (e.g. LMNA or DNM2). Oligonucleotides covered all coding exons and all intron–exon boundaries including at least 50 intronic nucleotides. 5′ and 3′ UTRs and deep intronic sequences were not targeted to avoid increasing the sequence target size that would have strongly decreased the mean sequence coverage. After masking the repetitive elements, the 4,604 targeted exons represented 1.6 Mb (online resource Table 2). A minimum of 3 μg of genomic DNA was sheared to obtain a mean fragment size of 250 nt using Covaris E210 (KBioscience, Herts, UK) followed by automatic library preparation with the SPRI-TE Nucleic Acid Extractor (Beckman Coulter Inc, Brea, CA) using the SPRIworks fragment library cartridge (Beckman Coulter Inc, Brea, CA) and Illumina adapters. Targeted regions were captured using the Agilent SureSelect custom target enrichment kit (Agilent Technologies, Santa Clara, CA) following Agilent protocols. Enriched DNA fragments were barcoded with the Illumina multiplexing sample preparation kit (Illumina, San Diego, CA), pooled by 4, and sequenced on an Illumina Genome Analyzer IIx to generate 72nt paired-end reads for 4 DNAs per channel, following the manufacturer’s protocols.
Variants filtering and ranking were done using the VaRank program. Briefly, for each variant, VaRank used the Alamut software (Interactive Biosoftware, Rouen, France) to collect genomic annotations and different scores such as the coding status, the nucleotide and amino acid conservation scores and the effect of each change on the protein and splice site, and then compiles them to rank SNVs and indels starting from the most probable pathogenic. The SNVs/indels are characterized using several genomic or functional annotations that VaRank summarizes into a score to produce a list of ranked variants (manuscript in preparation). The probable mutations are ranked starting from the most likely to be pathogenic according to the following list: nonsense, frameshift, essential splice site (affecting the conserved consensus intronic positions), start loss, stop loss, missense, predicted splice site mutation (outside of the consensus sites), in frame indels, and synonymous coding. The scores are modulated according to the genomic conservation based on the phastcons score  and to the SIFT  and PolyPhen v2  scores to assess the effect of amino acid change on the protein function. Synonymous coding variants might have an effect on the biosynthesis of the protein , and their potential impact on splicing was also scored. Splicing effect is assessed using three different softwares: Human Splicing Finder , MaxEntScan  and NNSplice . Scores of compound heterozygous mutations in the same gene were added in case of a recessive segregation of the disease to prioritize the best candidate genes. Clinical significance was extracted from dbSNP134 and some of the known mutations were confirmed using locus-specific mutation data bases (LSDB-http://www.umd.be/). The “Clinical significance” field from dbSNP highlights known mutations with an “rs” identifier. Variants annotated as “probably-pathogenic” or “pathogenic” usually corresponded to reported mutations and were weighted to reach a high VaRank score. Indeed, these variants were not filtered as some healthy people are carriers of mutations. In the next step, genes within the patient disease class(es) were extracted and the mode of inheritance of the disease in the family, if known, was matched to the known type of transmission for every selected candidate genes.
In order to detect large deletions, a coverage-based method was used where the number of reads in a sliding window of 20nt was computed across the genome for each patient and coverage compared to three randomly selected patients.
Sanger sequencing was performed to confirm sequence variants in the original DNA samples and to assess the segregation in the families included in this study (GATC Biotech).
Sequencing, coverage and variant statistics
Sequence after filtering (in nt)b
Sequence in target regions (in nt)
Mean coverage (x)
Median coverage (x)
% Base ≥3× coverage
% Base ≥10× coverage
Fully covered exons
SNVs + indels
SNVs + indels without rs number
Novel coding non-synonymousa
Novel splice site changea
Novel coding stop (gained/lost)a
Novel coding frameshifta
Variants identification and ranking
Mutations identified in patients with known mutations and probable mutations in patients without previous molecular characterization
Mutation nucleotide (protein)
Carrier for myotubular myopathy (XL)
Exon4: c.141–144delAAAG (p.Glu48LeufsX24)
Centronuclear myopathy (AR)
Exon20: c.1717 C > T (p.Gln573X)
Duchenne muscular dystrophy (XL)
Ataxia ocular apraxia (AR)
Exon10: c.3213–3214insT (p.Gln1072SerfsX3); Int10: c.5275-1 G > A
Myotubular myopathy (XL)
Exon4: c.156–157insA (p.Cys53MetfsX8)
Centronuclear myopathy (AD)
Exon14: c.1565G>A (p.Arg522His)
Myotubular myopathy (XL)
Ataxia ocular apraxia (AR)
Exon10: c.2967-2971delGAAAG(p.Arg989SerfsX5); Exon8: c.994C>T (p.Arg332Trp)
HMSN, demyelinating CMT neuropathy (AR)a
Myopathy with cytoplasmic aggregates
Exon292: c.68576C>T (p.Pro22859Leu)
Bethlem dystrophy or myofibrillar myopathy (AD)
Muscular dystrophies, other myopathies
Exon27: c.6812G>A (p.Arg2271Lys)
Hereditary spastic paraplegia (sporadic)
Vacuolar myopathy (sporadic)
Congenital myopathies, distal myopathies, other myopathies
HMSN, axonal CMT (AR)
Exon11: c.1928C>A (p.Thr643Asn); c.1930C>T (p.Arg644Cys)b
Muscular dystrophy (AR)
Exon18: c.3100G>A (p.Val1034Met); Exon240: c.49243G>A (p.Ala16415Thr)
Muscular dystrophy and arthrogryposis (AR)
Muscular dystrophies, congenital myopathies, other NMD diseases
Exon55: c.8554C>T (p.Arg2852X); Exon81: c.11557G>A (p.Glu3853Lys)
Identification and confirmation of mutations
The large deletion encompassing exons 18–44 of the DMD gene was detected in a patient with Duchenne Muscular Dystrophy by comparing the number of reads in these regions with other sequenced DNA samples (Fig. 2b). The mean coverage for exons 18–44 is 0 for this patient and 177 for other patients (online resource Table 2, DMD gene for patient C). Unexpectedly, off-target reads from genomic DNA fragments not targeted for enrichment and located in DMD introns allowed to restrict the areas containing the upstream and downstream breakpoints from 27 to 11 Kb and 248 to 72 Kb, respectively (Fig. 2c). To assess the accuracy of deletion breakpoints predicted through off-target reads with the precise deleted positions, we analyzed this DMD deletion using the custom-designed oligonucleotides CGH-array previously described by Saillour et al.  where oligonucleotide probes cover both intronic and exonic DMD regions with an average tiling interval of 50 bases. CGH-array indicated that the 3′ breakpoint maps between 32,187,417 (position of the non-deleted probe) and 32,187,427 (position of the deleted probe), and the 5′ breakpoint maps between 32,538,435 and 32,538,443 (Fig. 2d). NGS data are coherent with CGH-array as the off-target reads closer to the deletion mapped at positions 32,072,428 and 32,547,130. The differences between the precise positions based on CGH-array and breakpoints found by NGS data are 115 kb for the 3′ breakpoint and 9 kb for the 5′ breakpoint. For a better precision of NGS method for mapping intronic breakpoints, targeting for enrichment of intronic sequences could be a possibility, but will increase the total targeted sequence length and thus decrease the overall coverage for a given sequencing depth.
Following a similar strategy, we analyzed 8 DNAs (I to P) from patients with heterogeneous neuromuscular disorders without molecular characterization. These patients were not selected based neither on the amount or quality of clinical data nor on the availability of DNA from other members of the family, to mimic the situation of routine diagnosis. There were no specific inclusion criteria. We identified probable disease-causing mutations in several patients. Sanger sequencing was used to validate the presence of mutations in the original DNA and also confirmed disease segregation (Fig. 3e–f, Table 2, and online resource Fig. 4). Sequencing, coverage and variant statistics were similar to the previous experiment (online resource Table 4). The identified mutations in RYR1, TTN and COL6A3 genes were in agreement with the clinical data (online resource data for patient descriptions). Importantly, while variants ranking can be made without detailed clinical and histological data to suggest probable mutated genes, such data are very valuable to validate the molecular findings. For example, we identified compound heterozygous mutations (including a nonsense mutation) in the ryanodine receptor (RYR1) in patient P presenting with muscular dystrophy and arthrogryposis (Fig. 3e–f). Indeed RYR1 mutations have previously been linked to several congenital myopathies,  and also to severe neonatal arthrogryposis . The mutations were found in his affected twin brother and each parent was found to be heterozygous for one mutation. We also identified probable mutations in the large TTN gene in patients J and O with myopathy with cytoplasmic aggregates and limb girdle muscular dystrophy, respectively, widening the clinical and molecular spectra for this gene that is not routinely sequenced on a diagnosis setting due to its large size.
We did not have false negative in the eight patients with known mutations as we retrieved all mutations. We checked the rate of false positive, i.e. variant not present in the starting DNA, in patients with unknown mutations where we found the probable disease-causing mutations. We found that the probability of being false positive due to sequencing or mapping errors is high when the percentage of reads showing the change is less than 25 or when the number of reads showing the change is less than 8. Even with these cut-offs, 16.5 % of false positive variants were found out of 40 variants tested, calling for validation of the mutations by Sanger sequencing.
In this study, we performed targeted sequencing of the coding sequences and all intron–exon boundaries including at least 50 intronic nucleotides of the known NMD genes through massively parallel sequencing in a cohort of patients with heterogeneous neuromuscular diseases. We were able to retrieve all the known mutations in previously characterized patients and we identified several novel pathogenic mutations in patients lacking molecular diagnosis.
We demonstrate that this strategy can detect several types of mutations including intronic and exonic changes as well as small indel and a large deletion. All mutations were detected from the massively parallel sequencing and analysis of a single proband, unlike previous studies where comparative sequencing of several individual exomes was used to retrieve the causative mutations in a family with HSMN . A main challenge in NGS data analysis is the identification of the pathogenic change among the large list of variants. Our blind analysis based on variant ranking and disease class allowed the identification of all known mutations. However, detailed clinical, histological and molecular data were necessary for the confirmation steps, i.e., matching the genetic data with the phenotype.
Such targeted parallel sequencing of all candidate genes is especially suitable for diseases with high genetic heterogeneity, as it is the case for NMD, and should ease the identification of allelic diseases, i.e., different diseases caused by mutations of the same gene. In addition, this strategy allows the analysis of large genes, such as TTN, that are routinely not fully tested by conventional Sanger sequencing even if known to be implicated in diseases. For example, our identification of a probable TTN mutation in a patient with myopathy with cytoplasmic aggregates and respiratory insufficiency widen the clinical spectrum compared to previous studies .
We did not find disease-causing mutations among the coding sequences of the NMD genes in four patients with unknown genetic cause. Patient I was first clinically diagnosed with demyelinating polyneuropathy, but clinical and biochemical re-analyses in parallel to NGS suggested he had a mitochondrial disease which implicated genes are not covered by our present design. Patient N showed two missense changes in LMNA including the p.Arg644Cys change, previously linked to various laminopathies. Both changes are on the same allele, as they were always found in the same reads/fragments (online resource Fig. 4e), and thus cannot be the sole cause of the axonal neuropathy. We did not have access to parent DNAs to investigate this further. For the other two patients, who were previously excluded for several candidate genes by Sanger sequencing, mutations were also missed by our approach. The disease-causing mutation may be a deep intronic change, repeat expansions or translocation for which detection has not been tested in this study. Concerning specifically repeat expansions, they cannot be mapped back to a reference genome unambiguously due to their repetitive nature. Alternatively, these patients may also be mutated in a gene not linked to NMD at the time of our targeting library design.
WGS or WES would in principle allow re-analysis of variants list once newly discovered genes are identified, while these genes should be added to update the NMD capture library. However, these approaches have several disadvantages for routine molecular diagnosis compared to the NMD-seq strategy described in this study, especially concerning design, coverage, variant analysis and validation, incidental findings, and throughput and price. WES capture library should also be updated to incorporate novel gene and exon predictions, and cannot be customized to increase the enrichment of specific exons difficult to capture or incorporate known intronic mutation hotspots. NMD-seq has a higher coverage and leads to a smaller list of variants as it focuses on a subset of genes, whereas the sensitivity and heterozygosity assessment decrease following WGS or WES due to lower coverage . Sequencing more genes at a lower coverage leads to an increased risk of false negative and an increased number of false positive variants that are time-consuming to validate. Indeed, WES at 50× mean coverage results in about 20 % of targeted regions covered less than 10 times, outputs not suited for routine diagnosis. WES or WGS also potentiate incidental finding, i.e. the discovery of an unrelated disease not targeted by the diagnosis measure; this could be an ethical issue. Moreover, WGS or WES have lower throughput and a higher cost, both are important issues for routine diagnosis. We validated DNA multiplexing for four DNAs in one channel to increase throughput and decrease the cost in comparison with conventional Sanger approaches which is about 500–1,000 € per sample for one gene but can increase to 5,000 € or more depending on gene size. Testing several candidates on a gene-by-gene basis may exceed 8,000 € . With recent developments in sequencers and DNA barcoding, we estimate the total cost of NMD-seq from a pool of 12 barcoded DNAs to about 500 € per patient for at least 140× coverage, while WES and WGS cost about 1,500 € and 5,000 € for a 50× coverage, respectively. It takes about 2 months to perform the NMD-seq approach for 267 genes (excluding validation of the data by Sanger sequencing), a similar turnaround time to what is proposed by diagnosis laboratories to test a gene with 20 exons under current routine diagnosis.
Major conditions for further use of the NMD-seq strategy as a routine approach in genetic diagnostic labs are the reproducibility, detection sensitivity and the study of heterogeneous cohort of patients with sometimes incomplete clinical characterization as it was the case in our study. After validation of other types of mutations not tested in this study, this strategy could be implemented as a first screening approach, potentially before the need of more invasive and time-consuming investigations such as biopsies.
Our strategy does not require the knowledge of detailed clinical data for proposing candidate mutations; however, this knowledge is necessary for the final validation of the diagnosis that can only be performed matching both clinical and genetic data. NGS will not replace clinical investigations but rather direct clinicians towards the most adequate investigations, while excluding unnecessary costly and time-consuming tests. As a general rule for all NGS approaches, the targeted regions are not homogeneously covered. This point could most probably be enhanced by increasing the amount of oligonucleotides targeting regions more difficult to enrich. For an exclusion diagnosis, conventional Sanger sequencing of specific exons might be necessary if the best candidate genes for a specific disorder are not well covered by NGS. In any case, mutations found by NGS must be validated by Sanger sequencing to rule out sequence errors introduced during the NGS protocol, to provide a second independent confirmation and a basis for a simplified test for further counseling within the family. Importantly, variant ranking and confirmation depend on clinical data and knowledge on NMD, and are thus probably not directly applicable for direct-to-consumer testing.
A faster molecular diagnosis of NMD will have major impacts on patients as it will improve disease management and genetic counseling, and will allow access to therapy or inclusion into therapeutic trials. As an example, the identification of RYR1 mutations in patient P is of major medical importance as the treatment of RYR1 patients with salbutamol has shown significant amelioration of muscle weakness . In conclusion, we provided the first proof-of-principle that next generation sequencing could be apply for molecular diagnosis of neuromuscular disorders.
We thank Michel Koenig and Mathieu Anheim for DNA samples with known mutations, Ngoc-Hoan Nguyen and Céline Keime for help in the bioinformatics setup and analysis, Claudia Strafella, Nicolas Dondaine, Gabrielle Rudolf, Juliette Nectoux, Céline Leroux, Isabelle Atlan and France Leturcq for technical assistance, and Gisèle Bonne for discussions. This study was supported by the Institut National de la Santé et de la Recherche Médicale (INSERM), the Centre National de la Recherche Scientifique (CNRS), University of Strasbourg (UdS), Collège de France and grants from the Agence Nationale de la Recherche, Muscular Dystrophy Association and Myotubular Trust.
Conflict of interest
The authors declare that they have no conflict of interest.
This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.