Introduction

Next-generation sequencing (NGS) comprises several technological advances over Sanger’s dideoxy chain-termination sequencing [1] and Mullis’ polymerase chain reaction [2], which were the cornerstones of medical sequencing technologies until the last decade. Different NGS technologies and applications are currently available to researchers and clinicians, and their rapid expansion makes it likely that NGS will become part of routine pediatric practice in the not-too-distant future.

Only a few short years after its introduction, NGS is already receiving considerable attention in the media as well as in the scientific literature. The public is aware of extraordinary success stories, such as that of the young Nicholas Volker. This 4-year-old boy was affected by a severe undiagnosed illness characterized by bowel inflammation. On the recommendation of his pediatrician, Nicholas’ DNA underwent NGS, and a mutation was identified in the X-linked XIAP gene, which is involved in the regulation of the immune system. Mutations in XIAP are associated with X-linked lymphoproliferative syndrome type 2 (XLP2, OMIM # 300635) [3]. In this instance, the information gathered by NGS directed Nicholas’ diagnosis and treatment: he underwent an allogeneic hematopoietic progenitor cell transplant to prevent hemophagocytic lymphohistiocytosis, a life-threatening condition [4].

Despite these success stories, the use of NGS in both research and clinical settings presents several difficulties and requires careful application, specialized technical knowledge, and advanced bioinformatics for data analysis, storage, and interpretation. NGS is predicted to become even more common as new technologies [6] and bioinformatics tools [5] are implemented. Here, we review technical considerations in study design and applications in the clinical setting. We provide examples of NGS studies with a specific focus on pediatrics, and discuss some of the advantages and pitfalls of NGS. Technical considerations on platform selection, costs, turnaround times, and sequencing error rates are beyond the scope of this paper and are reviewed elsewhere [5].

In addition to Whole Genome Sequencing (WGS), NGS can be directed at regions of interest in the genome, which are selected, or “enriched for”, during the preparation of the experiment [7]. Targeted sequencing allows investigators to achieve higher coverage, i.e. the number of times each DNA base is read, at a fraction of the cost. For this reason, it is particularly useful for designing diagnostic assays for a number of well-defined conditions, such as non-syndromic deafness [8] or cardiomyopathy (Pan Cardiomyopathy Panel, Center for Personalized Genetic Medicine, Harvard Medical School).
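The relation between sequencing output and coverage can be illustrated with a back-of-the-envelope calculation: mean coverage is approximately the number of reads multiplied by the read length, divided by the size of the region sequenced. The Python sketch below assumes a hypothetical run of 40 million 100-bp reads, a genome of about 3 Gb, and an exome target of about 30 Mb. It ignores off-target reads, duplicate reads, and uneven capture, all of which lower real on-target coverage.

```python
def mean_coverage(n_reads: int, read_length: int, target_size_bp: int) -> float:
    """Approximate mean depth: total sequenced bases / size of the region sequenced."""
    return n_reads * read_length / target_size_bp


# Hypothetical run: 40 million reads of 100 bp each (illustrative numbers only).
reads, length = 40_000_000, 100
genome = mean_coverage(reads, length, 3_000_000_000)  # ~3 Gb human genome
exome = mean_coverage(reads, length, 30_000_000)      # ~30 Mb exome target
print(f"genome: {genome:.2f}x, exome: {exome:.0f}x")  # genome: 1.33x, exome: 133x
```

Spending the same reads on a target 100 times smaller yields roughly 100 times the depth, which is why targeted assays reach high coverage at a fraction of the cost.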

The most common type of targeted sequencing is whole exome sequencing (WES), which focuses on the coding regions, or exons, of known genes. WES target regions collectively comprise about 30 Mb (1 % of the genome) [9], and its utility rests on the assumption that 85 % of disease-causing mutations are located in exons [10, 11]. It is important to keep in mind that WES is not designed to detect variation in regulatory DNA regions (such as distal promoters, enhancers, or deep intronic regions), or in functional chromatin elements between one gene and the next (intergenic regions), which serve to regulate gene expression levels and may also be responsible for disease.
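Because WES only reads captured regions, whether a variant can be detected depends on whether it falls inside a target interval. The sketch below assumes a hypothetical, sorted, non-overlapping list of target intervals on a single chromosome, in 0-based half-open coordinates as in the BED format, and uses binary search to test variant positions. All coordinates are invented for illustration.

```python
from bisect import bisect_right

# Hypothetical capture targets on one chromosome: sorted, non-overlapping
# (start, end) intervals, 0-based and half-open as in the BED format.
TARGETS = [(1000, 1200), (5000, 5150), (9000, 9300)]
STARTS = [start for start, _ in TARGETS]


def in_target(pos: int) -> bool:
    """Return True if a 0-based position lies inside a capture interval."""
    i = bisect_right(STARTS, pos) - 1  # last interval starting at or before pos
    return i >= 0 and pos < TARGETS[i][1]


variants = {"exonic": 1100, "intronic": 3000, "intergenic": 20000}
for label, pos in variants.items():
    print(label, in_target(pos))  # only the exonic position is covered
```

In practice, target intervals would be grouped by chromosome and loaded from the capture kit's target file.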

Applications and Interpretation

Many common human conditions do not follow the classical Mendelian paradigm, in which mutations in one gene are responsible for one disorder only (OGOD, “one gene, one disorder”). Accordingly, NGS applications such as WGS and WES are beginning to be implemented clinically (1) for the diagnosis of congenital or early-onset disorders with variable or atypical phenotypic presentations, to confirm or exclude a clinical suspicion or a dubious diagnosis; and (2) for conditions that can be caused by mutations in any one of many genes (known as genetic heterogeneity) [12], because screening all candidate genes at the same time is faster and more cost-effective than testing them one at a time [13].

To maximize the likelihood of achieving a genetic diagnosis, NGS studies must be planned carefully, particularly when deciding how many patients to sequence and whether to include family members. WGS and WES have been applied to: (1) single individuals, (2) families, and (3) groups of unrelated individuals sharing the same diagnosis or the same phenotypic trait. Each strategy has advantages and disadvantages:

  1.

    A clinician may be drawn to study a single patient with a striking phenotype or an unusual presentation. To be confident that NGS can identify the causative gene, information beyond the sequence results, such as the inheritance pattern, is required. The clinician should instruct the laboratory whether to look for a single mutation (sufficient to cause an Autosomal Dominant [AD] disorder), two mutations in the same gene (necessary to cause an Autosomal Recessive [AR] disorder), or a mutation mapping to the X chromosome (necessary to cause an X-linked [XL] disorder) (Fig. 1). Knowledge of the phenotype also helps to prioritize genes for particular scrutiny during data interpretation; for example, the laboratory should scrutinize mutations in genes encoding cytoskeleton proteins in a patient with a blood defect, in enzyme-coding genes in a patient with a metabolic disorder, or in synaptic receptor genes in a patient with a neurological condition. This guidance is essential because, although sequencing a single patient is relatively cheap, a typical WES experiment yields about 20,000 variants and a typical WGS experiment yields millions [14•, 15], which can leave hundreds of candidate mutations to follow up [5].

    Fig. 1

    Overview of the inheritance patterns discussed in this review. Classic Mendelian inheritance patterns (upper row): AD, autosomal dominant, in which the disease mutation, indicated by the red star, is passed on from an affected parent to the offspring; one mutation is sufficient to cause the phenotype. AR, autosomal recessive, in which each parent contributes one mutation to the offspring; two mutations in the same gene are required to cause the disease. XL, X-linked recessive, in which the mutation on the X chromosome is passed on from an unaffected carrier female; on average, 50 % of her male children inherit the mutation and manifest the disease, and 50 % of her female children are unaffected carriers. Examples of non-Mendelian inheritance patterns are shown in the bottom row: de novo mutations originate in the germline, and neither parent is a carrier. In UPD, uniparental disomy, the affected individual receives two copies of an entire chromosome or chromosomal region containing the mutation from one parent, and no copies from the other. Complex disorders are multifactorial, i.e. variants in multiple genes are required for the condition to manifest, often with the contribution of environmental factors

  2.

    Family studies, on the other hand, are particularly informative. Large multigenerational families are probably the most advantageous situation, because sequencing can be combined with classical linkage analysis to point to the region of the genome with the highest statistical likelihood of containing the mutation. Unfortunately, such families are rarely encountered in the clinical setting. The most basic study designs involve “trios”, consisting of a proband and his/her unaffected parents, or, less commonly, “quads”, which also include a sibling. Trios and quads are particularly useful to identify de novo and AR mutations. De novo mutations occur during germ cell formation, and neither parent is a carrier (Fig. 1); for this reason, the recurrence risk in siblings is very low. AR mutations, instead, come in one of two forms: homozygous, if both parents contribute an identical mutation, or compound heterozygous, if each parent contributes a different mutation in the same gene. Interpretation and candidate prioritization are more challenging for AD, heterozygous variants, because any given family carries a larger number of candidate mutations consistent with this inheritance pattern.

  3.

    Finally, cohorts of individuals with homogeneous phenotypes are the most informative, as they may point to different mutations in the same gene across unrelated individuals. Unfortunately, cohort studies are also particularly susceptible to confounding factors such as genetic heterogeneity, when different genes cause similar or even identical phenotypes.
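The single-patient and trio strategies above translate into simple genotype filters. The Python sketch below assumes a hypothetical trio in which each call records the number of alternate alleles (0, 1, or 2) in the proband and in each parent, and a male proband for the X-linked filter; gene names and positions are placeholders. It is a deliberate simplification that ignores genotype quality, read depth, mosaicism, and incomplete penetrance.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    chrom: str
    pos: int
    child: int   # alternate-allele count in the proband (0, 1 or 2)
    mother: int  # alternate-allele count in the mother
    father: int  # alternate-allele count in the father


def is_de_novo(v: Variant) -> bool:
    # Present in the proband, absent from both parents.
    return v.child >= 1 and v.mother == 0 and v.father == 0


def is_homozygous_recessive(v: Variant) -> bool:
    # Autosomal AR, homozygous form: both parents are heterozygous carriers.
    return v.chrom != "X" and v.child == 2 and v.mother == 1 and v.father == 1


def is_x_linked_in_male(v: Variant) -> bool:
    # XL recessive in a male proband: inherited from a carrier mother.
    return v.chrom == "X" and v.child >= 1 and v.mother >= 1 and v.father == 0


def compound_heterozygous_genes(variants: list[Variant]) -> list[str]:
    # Autosomal AR, compound heterozygous form: in the same gene, at least one
    # heterozygous variant inherited from each parent.
    from_mother, from_father = set(), set()
    for v in variants:
        if v.chrom == "X" or v.child != 1:
            continue
        if v.mother == 1 and v.father == 0:
            from_mother.add(v.gene)
        elif v.father == 1 and v.mother == 0:
            from_father.add(v.gene)
    return sorted(from_mother & from_father)


calls = [
    Variant("GENE_A", "2", 1000, child=1, mother=0, father=0),
    Variant("GENE_B", "5", 2000, child=2, mother=1, father=1),
    Variant("GENE_C", "7", 3000, child=1, mother=1, father=0),
    Variant("GENE_C", "7", 3500, child=1, mother=0, father=1),
    Variant("GENE_D", "X", 4000, child=1, mother=1, father=0),
    Variant("GENE_E", "1", 5000, child=1, mother=1, father=0),
]
print([v.gene for v in calls if is_de_novo(v)])               # ['GENE_A']
print([v.gene for v in calls if is_homozygous_recessive(v)])  # ['GENE_B']
print(compound_heterozygous_genes(calls))                     # ['GENE_C']
print([v.gene for v in calls if is_x_linked_in_male(v)])      # ['GENE_D']
```

GENE_E, a heterozygous variant inherited from an unaffected mother, passes none of the filters. Under full penetrance it is not an AD candidate, although incomplete penetrance would reopen that possibility.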

NGS in the Clinical Setting

The major purposes of NGS have been either to diagnose genetic conditions or to develop personalized therapeutics by establishing a causal connection between a patient’s genetic variation and his/her disease. Although NGS has yielded several outstanding results in translational medicine, its main application is currently the identification of causative mutations for unexplained genetic disorders (Tables 1, 2).

Table 1 Examples of disease-causing genes identified by means of NGS
Table 2 High-throughput NGS in different clinical categories, including apparently healthy subjects

Individually, Mendelian diseases are rare, but collectively they are estimated to affect 40–82 individuals per 1,000 live births [16]. If congenital anomalies are included in the rare disease group, up to 8 % of individuals in the general population are affected by a genetic disorder [17]. These genetic conditions typically manifest in childhood and are responsible for up to 20 % of hospital admissions in pediatric units.

Pediatric diseases with a relevant genetic component are mainly neurologic and include intellectual disability, speech delay, autism spectrum disorders and seizures, which often occur in syndromic association with each other. Major malformations, such as congenital heart diseases (CHD), are also included in this group.

Intellectual Disability

Intellectual disability (ID) affects approximately 1–3 % of the general population. Many severe forms of ID are genetic in origin, with mutations ranging from large cytogenetic abnormalities, such as gains and losses of chromosomal regions or entire chromosomes, to point mutations in single genes. The genetic etiology of ID remains unexplained in about 50 % of cases, and mutations in single genes yet to be discovered are likely to be the underlying cause in many of these patients. NGS technologies are drastically changing this scenario. However, before these technologies can be translated into clinical practice, several problems must be solved to obtain both reliable genetic results and consistent genotype-phenotype correlations.

As a rule, a careful clinical evaluation is recommended before pursuing NGS analysis, and it should be performed by clinicians experienced in dysmorphology. Many conditions with a highly distinctive clinical presentation can indeed be investigated by sequencing a specific gene. Examples are Mowat-Wilson (OMIM # 235730) (ZEB2), Rubinstein-Taybi (OMIM # 180849) (CREBBP or EP300), Sotos (NSD1), Pitt-Hopkins (OMIM # 610954) (TCF4), Kabuki (OMIM # 147920, # 300867) (MLL2 or KDM6A) and Schinzel-Giedion (OMIM # 269150) (SETBP1) syndromes. If targeted sequencing does not identify a causative mutation, multiplex ligation-dependent probe amplification (MLPA) may be used to exclude the uncommon instances of small intragenic deletions or duplications (of one or more exons), which conventional sequencing can miss. Should targeted sequencing, MLPA, and array comparative genomic hybridization (array-CGH) be unrevealing in patients with these distinctive phenotypes, proceeding to NGS is an appropriate next step in the diagnostic process.
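MLPA results are usually interpreted as dosage quotients: the signal of each target probe, normalized to reference probes, in the patient divided by the same ratio in a normal control. The sketch below assumes hypothetical probe names, peak heights, and cut-offs (0.7 and 1.3); clinical analysis relies on validated software, several control samples, and kit-specific thresholds.

```python
from statistics import median


def dosage_quotients(patient: dict[str, float], control: dict[str, float],
                     reference_probes: list[str]) -> dict[str, float]:
    """Dosage quotient of each target probe, patient vs. a normal control.

    Peak signals are first normalized within each sample to the median of the
    reference probes, so that overall differences in signal strength cancel out.
    """
    p_ref = median(patient[r] for r in reference_probes)
    c_ref = median(control[r] for r in reference_probes)
    return {probe: (patient[probe] / p_ref) / (control[probe] / c_ref)
            for probe in patient if probe not in reference_probes}


def classify(dq: float, low: float = 0.7, high: float = 1.3) -> str:
    # Assumed cut-offs. A heterozygous deletion is expected near 0.5 and a
    # heterozygous duplication near 1.5; 1.0 indicates two normal copies.
    if dq < low:
        return "deletion"
    if dq > high:
        return "duplication"
    return "normal"


# Hypothetical peak heights for three reference probes and three exon probes.
patient = {"ref1": 1000, "ref2": 1100, "ref3": 900,
           "exon1": 1000, "exon2": 500, "exon3": 1500}
control = {"ref1": 2000, "ref2": 2200, "ref3": 1800,
           "exon1": 2000, "exon2": 2000, "exon3": 2000}
dq = dosage_quotients(patient, control, ["ref1", "ref2", "ref3"])
print({probe: classify(q) for probe, q in dq.items()})
# {'exon1': 'normal', 'exon2': 'deletion', 'exon3': 'duplication'}
```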

Other disorders with highly distinctive clinical presentation are caused by either single gene mutations or deletion of the responsible gene, the latter being detectable by array-CGH. Disorders in this category include Kleefstra syndrome (OMIM # 610253) (EHMT1, or del9q34), Smith-Magenis syndrome (OMIM # 182290) (RAI1, or del17p11) and chromosome 17q21.31 deletion syndrome (OMIM # 610443) (KANSL1, or del17q21.31).

Finally, clinical evaluation may suggest an imprinting disorder affecting specific chromosome regions, such as Prader-Willi/Angelman (OMIM # 176270, # 105830) and Beckwith-Wiedemann (OMIM # 130650) syndromes. Imprinting disorders involve genes whose expression depends on whether they are inherited from the father or from the mother. Methylation-sensitive MLPA detects the resulting abnormal methylation patterns, including those caused by uniparental disomy (UPD), in which an entire chromosome region is inherited from just one parent (Fig. 1).

In cases for which the mutation cannot be predicted from the clinical phenotype, NGS, and WES in particular, is a powerful tool for investigating the genetic basis of the disease; it is more cost-effective than traditional gene-discovery methods and avoids an excess of ineffective genetic testing. It is also effective in detecting mutations in known genes that can cause a variable clinical presentation (phenotypic heterogeneity), and in elucidating the genetic basis of diseases caused by variants in different genes of a common molecular pathway (genetic heterogeneity) [12].

A first step in the selection of patients could be to group them into two main categories: (1) likely monogenic, or single gene, disorders; or (2) complex conditions, i.e. multigenic, or multiple gene, disorders, with or without the contribution of environmental factors (Fig. 1). The recurrent association of ID with minor physical anomalies (the so-called dysmorphic features), hypotonia, altered growth parameters, and motor delay should be considered a strong indicator of a syndromic, monogenic form of ID. Non-syndromic clinical presentations in the autism spectrum, or non-syndromic conditions characterized by very mild ID, should instead be assigned to the second category, and meta-analysis of exome data should be performed according to a multigenic model of pathogenesis. However, no clear-cut categories can be defined in the broad spectrum of ID: a substantial number of severe, non-syndromic ID cases were recently associated with mutations in single autosomal genes by exome sequencing [18••, 19].

The appropriate approach for mutation discovery depends on whether the condition is sporadic or familial. In sporadic forms of syndromic ID, filtering across different patients with overlapping clinical presentations is probably more effective in achieving consistent genetic diagnoses, particularly for the discovery of variants in new genes. This approach rests on the assumption that all or most patients affected by the same recognizable clinical condition harbor mutations within the same gene or, in cases of genetic heterogeneity, within a group of functionally related genes [12]. De novo status can help prioritize heterozygous variants, but incomplete penetrance should be considered when a candidate variant is also carried by a healthy parent. Incomplete penetrance occurs when not every mutation carrier develops the condition; often, unpredictable environmental and/or stochastic factors account for this situation.
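Filtering across unrelated patients with overlapping phenotypes can be sketched as counting how many patients carry a candidate variant in each gene. In the Python sketch below, patient IDs, gene names, and the threshold are hypothetical; the optional `gene_to_group` mapping, also hypothetical, pools genes of a shared pathway as a crude way of accommodating genetic heterogeneity.

```python
from collections import Counter
from typing import Optional


def recurrent_units(candidates_by_patient: dict[str, list[str]],
                    min_patients: int = 2,
                    gene_to_group: Optional[dict[str, str]] = None) -> list[str]:
    """Genes (or gene groups) with candidate variants in >= min_patients patients.

    candidates_by_patient maps a patient ID to the genes in which that patient
    carries a candidate variant (e.g. rare and protein-altering). Each gene or
    group is counted at most once per patient.
    """
    gene_to_group = gene_to_group or {}
    counts = Counter()
    for genes in candidates_by_patient.values():
        counts.update({gene_to_group.get(g, g) for g in genes})
    hits = [unit for unit, n in counts.items() if n >= min_patients]
    return sorted(hits, key=lambda unit: (-counts[unit], unit))  # most shared first


# Hypothetical cohort of four unrelated patients with a shared phenotype.
cohort = {
    "P1": ["GENE_X", "GENE_Y", "GENE_Y"],
    "P2": ["GENE_X", "GENE_Z"],
    "P3": ["GENE_X", "GENE_Y"],
    "P4": ["GENE_W"],
}
print(recurrent_units(cohort))  # ['GENE_X', 'GENE_Y']
# Pooling two genes assumed to act in the same pathway:
print(recurrent_units(cohort, gene_to_group={"GENE_Z": "PATHWAY_1", "GENE_W": "PATHWAY_1"}))
# ['GENE_X', 'GENE_Y', 'PATHWAY_1']
```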

The study of familial cases with a recognizable Mendelian pattern of inheritance can lead to the identification of causal genes by considering the inheritance pattern of sequence variants in a limited number of affected and unaffected individuals from the same kindred.
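Segregation analysis in a kindred can be sketched as keeping only the variants carried by all affected relatives and absent from unaffected ones. The example below uses a hypothetical pedigree and hypothetical variant IDs; the `max_unaffected_carriers` parameter is an illustrative way to tolerate unaffected carriers when incomplete penetrance is suspected.

```python
def segregating_variants(genotypes: dict[str, dict[str, int]],
                         affected: list[str], unaffected: list[str],
                         max_unaffected_carriers: int = 0) -> list[str]:
    """Variants carried by every affected relative and by at most
    max_unaffected_carriers unaffected relatives.

    genotypes maps a variant ID to {individual: alternate-allele count}.
    With the default of 0, full penetrance of a dominant allele is assumed.
    Individuals without a call are treated as non-carriers (an assumption).
    """
    kept = []
    for variant, gt in genotypes.items():
        all_affected = all(gt.get(i, 0) >= 1 for i in affected)
        n_unaffected = sum(gt.get(i, 0) >= 1 for i in unaffected)
        if all_affected and n_unaffected <= max_unaffected_carriers:
            kept.append(variant)
    return kept


# Hypothetical kindred: three affected relatives and one unaffected relative.
kindred = {
    "var1": {"II-1": 1, "II-2": 1, "III-1": 1, "II-3": 0},
    "var2": {"II-1": 1, "II-2": 0, "III-1": 1, "II-3": 0},
    "var3": {"II-1": 1, "II-2": 1, "III-1": 1, "II-3": 1},
}
affected, unaffected = ["II-1", "II-2", "III-1"], ["II-3"]
print(segregating_variants(kindred, affected, unaffected))     # ['var1']
print(segregating_variants(kindred, affected, unaffected, 1))  # ['var1', 'var3']
```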

Examples of patient stratification are provided below.

  1.

    Angelman-like or Rett-like phenotype, characterized by severe ID, usually absent speech, microcephaly, epilepsy, and stereotypical, repetitive movements. Suggested testing includes CNV analysis by array-CGH, MLPA of chromosome region 15q11q13, and sequencing of UBE3A, MECP2, CDKL5, FOXG1 and MEF2C. Genetic heterogeneity is established for the Rett phenotype (OMIM # 312750 and # 613454), whose causative genes include MECP2, CDKL5, FOXG1 and MEF2C, and a shared molecular pathway has been demonstrated for MECP2 and MEF2C mutations [20].

  2.

    Pitt-Hopkins syndrome-like phenotype. Classical Pitt-Hopkins syndrome (PTHS, OMIM # 610954) is a syndromic form of non-progressive encephalopathy, linked to the TCF4 gene on 18q21.1. A checklist-based clinical score aids in determining the likelihood of the diagnosis [21, 22]. Patients without a detectable TCF4 mutation who score high on the checklist should be studied by NGS in an effort to identify mutations in as-yet unrecognized genes associated with PTHS-like phenotype.

  3.

    Syndromic forms of epilepsy. Epilepsy is one of the most common features in the clinical spectrum of many syndromes caused by either chromosome imbalances or mutations in single genes (Epi4K Consortium 2013). Thus, accurate clinical evaluation is particularly important in this category of patients before enrolling them in NGS studies. Genes encoding ion channel proteins are likely candidates in these patients.

  4.

    Familial ID with AD, AR, or XL Mendelian inheritance. Confounding factors in AD transmission are gonadal mosaicism, in which the mutation is present only in the germ cells of a clinically unaffected parent, and incomplete penetrance, in which a parent carries a mutation that does not cause the disorder in 100 % of carriers. In these scenarios, for example when two affected siblings are born to unaffected parents, inheritance may falsely appear as AR, and exome data must be analyzed accordingly [23, 24•, 25].

Finally, the most straightforward strategy in the identification of genetic causes of isolated idiopathic ID relies on the paradigm that de novo mutations are more likely to be pathogenic than inherited ones [26]. This paradigm can be applied to private syndromic forms of ID, when only one patient is selected for WES, but caution is recommended before transferring WES results into clinical practice, particularly regarding variants detected in new genes.

Congenital Heart Diseases

CHDs encompass a diverse group of birth defects, which together affect about 1 % of live births in the Western World [27]. Although many of these conditions are postulated to have a strong genetic component, genetic analyses are complicated by (a) non-Mendelian inheritance patterns, particularly when mutations in multiple genes are required at the same time to develop the condition (multigenic model); (b) genetic heterogeneity; and (c) gene-environment interactions. Reflecting this heterogeneity, 55 phenotype-causing genes are known in humans and hundreds in mouse models [27]. Array-CGH experiments have shown that rare copy number variants (CNVs) may cause a sizable fraction of CHD, but the majority of cases remain unexplained.

Recently, 362 families comprising a proband and his/her unaffected parents (“trios”, as discussed above) from the Congenital Heart Disease Genetic Network Study of the National Heart, Lung, and Blood Institute Pediatric Cardiac Genomics Consortium underwent WES [28••]. These patients were affected by conotruncal defects, left ventricular outflow tract obstruction, or heterotaxy. The study was designed to identify mutations originating de novo (Fig. 1). CHD cases showed a significant excess of de novo mutations in genes expressed in the developing heart. Odds ratios were particularly high for deleterious mutation classes, such as premature termination, frameshift, or splice site mutations [28••]. Interestingly, a number of the genes identified were involved in chromatin remodeling, particularly in the H3K4 and H3K27 methylation pathways that regulate the expression of key developmental genes. The authors concluded that de novo single-base substitutions or insertions/deletions of a few bases (InDels) discovered by WES might be responsible for as many as 10 % of previously unexplained CHD cases [28••].
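An odds ratio compares the odds of carrying a given class of mutation in cases with the corresponding odds in controls. The sketch below computes it from a 2×2 table, with a log-based (Woolf) 95 % confidence interval. The counts are purely illustrative, and this generic calculation is not the statistical method used in [28••].

```python
import math


def odds_ratio(a: int, b: int, c: int, d: int) -> tuple[float, float, float]:
    """Odds ratio with a Woolf (log-based) 95 % confidence interval.

    a: cases with the variant class      b: cases without it
    c: controls with the variant class   d: controls without it
    All four counts must be > 0 (no continuity correction is applied).
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = math.exp(math.log(or_) - 1.96 * se)
    upper = math.exp(math.log(or_) + 1.96 * se)
    return or_, lower, upper


# Illustrative counts only; not the data of the study cited above.
print(odds_ratio(20, 80, 5, 95))  # OR = (20 * 95) / (80 * 5) = 4.75, plus its 95 % CI
```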

Different study designs need to be implemented to detect the rare or common inherited variants that are likely to cause the majority of CHD occurrences. This issue is particularly delicate because, for most inherited conditions, suitably sized and screened control populations are not available, which complicates the interpretation of rare genetic variants in the context of rare monogenic conditions. A recent study attempted to determine the mutation frequency of 12 genes associated with Brugada syndrome in the general population using exome data from the NHLBI GO Exome Sequencing Project (ESP) [29]. This study highlighted how difficult it is to discriminate between disease-causing mutations and low-frequency genetic variants, which matters because frequency is often used as a criterion for candidate prioritization during NGS interpretation.
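Using population frequency as a prioritization criterion can be sketched as a threshold filter. The example below assumes hypothetical variant records annotated with allele counts from a reference exome database and an assumed cut-off of 0.1 % (`max_af = 0.001`); sites genotyped in too few alleles are kept for review rather than discarded.

```python
def rare_candidates(variants: list[dict], max_af: float = 0.001,
                    min_alleles: int = 2000) -> list[str]:
    """IDs of variants rarer than max_af in a reference population.

    Each variant is a dict with 'id', 'alt_count' (alternate alleles observed
    in the reference population) and 'allele_number' (alleles genotyped).
    Sites genotyped in fewer than min_alleles alleles are kept for manual
    review, because their frequency estimate is unreliable.
    """
    kept = []
    for v in variants:
        if v["allele_number"] < min_alleles:
            kept.append(v["id"])
            continue
        if v["alt_count"] / v["allele_number"] < max_af:
            kept.append(v["id"])
    return kept


# Hypothetical annotations against a reference exome database.
variants = [
    {"id": "v1", "alt_count": 0, "allele_number": 13000},    # absent
    {"id": "v2", "alt_count": 5, "allele_number": 13000},    # AF ~0.0004
    {"id": "v3", "alt_count": 130, "allele_number": 13000},  # AF 0.01
    {"id": "v4", "alt_count": 3, "allele_number": 500},      # too few alleles
]
print(rare_candidates(variants))  # ['v1', 'v2', 'v4']
```

Lowering `max_af` to 0.0001 drops v2 as well, which shows how the choice of threshold decides whether a low-frequency variant survives filtering.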

Incidental Findings

Because of the extraordinary amount of data produced, NGS is raising ethical and policy issues concerning incidental, or secondary, findings. Incidental findings are defined as results that are not related to the indication for ordering the test, but may have clinical relevance for the patient and his/her physician. Patient autonomy, privacy, and consent are additional concerns.

The American College of Medical Genetics (ACMG) issued a set of recommendations for reporting incidental findings in clinical WGS or WES [30••]. The ACMG Working Group identified two categories of variants: (1) previously reported and recognized causes of a disorder, or “Known Pathogenic”; and (2) previously unreported variants of a type expected to cause the disorder, or “Expected Pathogenic”. According to their consensus, Known and Expected Pathogenic variants should be reported to patients for the genes on a “minimum” list, which covers mostly rare, monogenic disorders. Conditions caused by translocations, inversions, repeat expansions, chromosomal deletions, and other mechanisms not readily investigated by clinical NGS were not included in the list. The Working Group recommended that laboratories actively screen their data for mutations in the genes on the minimum list [30••].
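The reporting rule described above can be sketched as a filter over classified variants. The example assumes that each variant has already been classified by expert review and that genes analyzed for the clinical indication yield primary rather than incidental findings. It uses a small illustrative subset of genes rather than the actual minimum list, and the field names are hypothetical.

```python
# Illustrative subset only; the full ACMG minimum list is given in [30••].
MINIMUM_LIST = {"BRCA1", "BRCA2", "MYH7", "KCNQ1"}
REPORTABLE = {"known_pathogenic", "expected_pathogenic"}


def incidental_findings(variants: list[dict], indication_genes: set[str]) -> list[dict]:
    """Variants to report as incidental findings.

    A variant qualifies if its gene is on the minimum list, the gene is not
    one analyzed for the clinical indication (those are primary findings),
    and its classification, assumed to come from expert review, is Known or
    Expected Pathogenic.
    """
    return [v for v in variants
            if v["gene"] in MINIMUM_LIST
            and v["gene"] not in indication_genes
            and v["classification"] in REPORTABLE]


# Hypothetical test ordered for suspected long QT syndrome (KCNQ1 is primary).
calls = [
    {"gene": "MYH7", "classification": "known_pathogenic"},
    {"gene": "BRCA2", "classification": "expected_pathogenic"},
    {"gene": "BRCA1", "classification": "uncertain_significance"},
    {"gene": "GENE_Q", "classification": "known_pathogenic"},
    {"gene": "KCNQ1", "classification": "known_pathogenic"},
]
print([v["gene"] for v in incidental_findings(calls, {"KCNQ1"})])  # ['MYH7', 'BRCA2']
```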

The ACMG Working Group recognizes that the scrutiny applied to these genes will not equal that applied to the genes related to the clinical indication for the test. According to their policy statement, clinicians should be aware of the limitations of clinical sequencing and counsel their patients accordingly, emphasizing that a negative result does not rule out mutations in the genes on the minimum list. Patients, both adults and children, may not opt out of receiving a report on incidental findings [30••]. This policy has been criticized as “paternalistic”, as it allegedly ignores the fact that patients may choose not to learn about potentially serious conditions [31], particularly late-onset disorders for which preventive therapies may not be available (e.g. Alzheimer’s disease). The ACMG Working Group, however, believes that it is up to the ordering physician to contextualize any incidental findings based on relevant clinical information, and that case-by-case decisions on what to report would be prohibitive for laboratories [30••, 32].

Medically actionable findings are estimated to occur in ~3.4 % of European-descent and ~1.2 % of African-descent adults undergoing NGS, based on a longer gene list than the ACMG “minimum” [33•]. The majority of those variants could be equated to the ACMG “Known Pathogenic”, according to the Human Gene Mutation Database (hgmd.org). However, the authors noted that currently available databases do not classify pathogenic mutations with the necessary rigor, and every incidental finding requires expert review before being reported [33•].

Interestingly, parents of children with rare, undiagnosed genetic conditions overwhelmingly opt for the return of incidental findings, particularly those related to disorders that could be prevented or treated (what has been referred to as “useful” information) [34, 35]. Clinical diagnostic sequencing, however, is currently reserved for patients with severe chronic or even life-threatening conditions, and these results may not be transferable to cohorts of generally healthy individuals, who are on average less educated about genetic disorders.

Special Applications

Pathology

WES is introducing a new paradigm in pathology. Guidelines have recommended incorporating gene sequencing into the autopsy workup of sudden unexpected deaths in the young [36, 37]. However, cost and practical issues have prevented coroners and medical examiners from implementing this policy, considering that almost 100 genes are implicated in conditions associated with sudden death. Recently, a clinical report provided proof-of-principle for postmortem genetic testing by WES in a 16-year-old whose autopsy was inconclusive for a specific cardiomyopathy. In this instance, WES revealed the R249Q MYH7 mutation, yielding a final diagnosis of familial hypertrophic cardiomyopathy [38].

Prenatal Diagnosis

The presence of cell-free fetal DNA in the maternal circulation has been used as the basis for developing non-invasive prenatal testing by NGS techniques [39]. Apoptosis of trophoblastic cells is currently considered the primary source of cell-free fetal DNA, and up to 10 % of the DNA in maternal serum is of fetal origin [39]. Fetal DNA can be studied by NGS from 10 weeks of gestation to detect dosage differences or sequence variants between fetal and reference sequences.

Several studies have demonstrated high sensitivity for the detection of trisomies 13, 18, and 21 [40] and of sex chromosome abnormalities [41]. RhD genotyping and the detection of paternally inherited disease-causing mutations, including those causing Huntington disease and myotonic dystrophy, can achieve a high degree of accuracy [42–44].
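Aneuploidy detection by counting plasma DNA fragments is commonly explained with a z-score: the fraction of reads mapping to the chromosome of interest is compared with the distribution observed in euploid pregnancies. The sketch below assumes hypothetical read counts and reference fractions, a fetal fraction of 10 %, and a cut-off of z > 3. Commercial assays apply additional corrections (e.g., for GC content) that are not shown.

```python
from statistics import mean, stdev


def chromosome_fraction(read_counts: dict[str, int], chrom: str) -> float:
    """Fraction of all mapped reads that fall on one chromosome."""
    return read_counts[chrom] / sum(read_counts.values())


def z_score(fraction: float, euploid_fractions: list[float]) -> float:
    """Number of standard deviations a sample lies from euploid reference pregnancies."""
    return (fraction - mean(euploid_fractions)) / stdev(euploid_fractions)


# Hypothetical chr21 read fractions from euploid reference pregnancies.
reference = [0.0130, 0.0131, 0.0129, 0.0130, 0.0132, 0.0128]

# With fetal fraction f, a fetal trisomy raises the chromosome's share of reads
# by a factor of about (1 + f / 2); here f = 0.10, so 0.0130 -> 0.01365.
test_sample = {"chr21": 136_500, "other": 9_863_500}
z = z_score(chromosome_fraction(test_sample, "chr21"), reference)
print(f"z = {z:.1f}")  # z = 4.6, above the assumed cut-off of 3
```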

Haplotype analysis combined with targeted NGS can allow non-invasive prenatal diagnosis for alpha-thalassemia and beta-thalassemia by detecting paternally inherited alleles in the maternal plasma [45••].
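The principle of detecting paternally inherited alleles can be sketched for a single informative site at which the mother is homozygous: any allele seen in plasma but absent from the maternal genotype must derive from the fetus. The read counts and the minimum-read threshold below are hypothetical, and the haplotype-based method of [45••] is more elaborate than this single-site illustration.

```python
def paternal_alleles_in_plasma(maternal_genotype: set[str],
                               plasma_allele_counts: dict[str, int],
                               min_reads: int = 3) -> list[str]:
    """Alleles seen in maternal plasma that the mother does not carry.

    Because the mother lacks them, such alleles must come from the fetus and,
    therefore, from the father. min_reads is an assumed threshold to discount
    sporadic sequencing errors.
    """
    return sorted(allele for allele, n in plasma_allele_counts.items()
                  if allele not in maternal_genotype and n >= min_reads)


# Hypothetical site: the mother is homozygous A/A; the father carries G on the
# haplotype assumed to be linked to his disease allele.
print(paternal_alleles_in_plasma({"A"}, {"A": 950, "G": 48, "T": 1}))  # ['G']
```

Failing to observe the paternal allele is harder to interpret, because at a low fetal fraction it may simply be undersampled.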

However, although several preliminary studies are promising, non-invasive prenatal screening has limitations, including the possibility of false-positive results due to mosaicism confined to the placenta (placental DNA may not correspond to fetal DNA) and the restricted number of diagnosable chromosome abnormalities.

Copy Number Variants

In addition to single nucleotide variants and small InDels (insertions or deletions of a small number of bases), WGS and WES are being investigated as tools to detect CNVs, which are currently the domain of array-CGH and SNP chips. Intragenic deletions are not consistently identified by clinically available microarrays, with the exception of targeted custom arrays specifically designed for a region of the genome [46]. If implemented, CNV detection by NGS would streamline genetic diagnosis by reducing the number of tests required. More complex rearrangements, such as chromosomal translocations, can also be studied [47].

Unfortunately, WGS is still too costly for routine clinical applications, and CNV-detection methods designed for WGS are not readily applicable to WES. Moreover, the WES enrichment process introduces technical biases [48].

Several algorithms are being designed for CNV analysis of WES data. Currently, none outperforms array-CGH, which is considered the gold standard. Overall, compared with array-CGH, the available tools appear to have high false-positive rates, low sensitivity, and a duplication bias [49–52]. Interestingly, some of these algorithms were shown to reliably call rare (minor allele frequency <1 %) and small (1–30 kb, spanning three or more exons) exonic CNVs from autism spectrum disorder WES data; such CNVs are among those less well covered by clinical array-CGH [53•].
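Most WES-based CNV tools build on read depth: an exon deleted on one allele is expected to show about half the normalized depth of a reference. The sketch below is a minimal version of that idea, not any specific published algorithm. Exon names, depths, and cut-offs are hypothetical, and real tools add corrections for GC content and capture efficiency and segment calls across adjacent exons.

```python
import math
from statistics import median


def exon_log2_ratios(sample_depth: dict[str, float],
                     reference_depth: dict[str, float]) -> dict[str, float]:
    """log2 ratio of normalized per-exon read depth, sample vs. reference.

    Each profile is divided by its own median exon depth so that differences
    in total sequencing output cancel out. reference_depth would typically be
    an average over many exomes captured with the same kit (an assumption).
    """
    s_med = median(sample_depth.values())
    r_med = median(reference_depth.values())
    return {exon: math.log2((sample_depth[exon] / s_med) /
                            (reference_depth[exon] / r_med))
            for exon in sample_depth}


def call_exons(ratios: dict[str, float], del_cut: float = -0.6,
               dup_cut: float = 0.4) -> dict[str, str]:
    # Assumed cut-offs: a one-copy loss is expected near log2(1/2) = -1 and a
    # one-copy gain near log2(3/2), about 0.58.
    return {exon: "del" if r < del_cut else "dup" if r > dup_cut else "normal"
            for exon, r in ratios.items()}


sample = {"e1": 100, "e2": 110, "e3": 50, "e4": 90, "e5": 150}
reference = {"e1": 200, "e2": 220, "e3": 200, "e4": 180, "e5": 200}
print(call_exons(exon_log2_ratios(sample, reference)))
# {'e1': 'normal', 'e2': 'normal', 'e3': 'del', 'e4': 'normal', 'e5': 'dup'}
```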

In addition to WGS and WES, a custom-designed targeted NGS assay has been used successfully to discover CNVs and non-coding mutations in 55 retinitis pigmentosa and Leber congenital amaurosis genes in 126 patients [54].

Conclusions

Recent advances in sequencing technology have transformed the pace of scientific discovery. NGS poses many challenges, both technical and related to study design. Research applications have included rare Mendelian monogenic, or single gene, disorders, as well as common conditions. Clinical NGS is mostly applied to the diagnosis of patients likely to be affected by genetic disorders of unknown etiology. While NGS is still the purview of a few specialized centers, falling costs are making it a viable diagnostic option for clinicians.