Key Points

Exome reanalysis should be a routine clinical practice, as it may yield additional diagnoses, primarily due to novel gene-disease discoveries, updated clinical features, and improved bioinformatics tools.

Challenges exist for both physician-initiated and laboratory-initiated exome reanalysis, and collaboration between clinical laboratories and clinicians is critical for its success.

Better incorporation of automated workflows will greatly benefit the long-term sustainability of exome reanalysis.

1 Introduction

Clinical exome sequencing (CES) is now routinely used for the diagnosis of rare genetic conditions and has a reported diagnostic yield of 25–40% [1,2,3]. Unlike many clinical tests, such as a blood lipid profile that requires periodic resampling as part of an individual’s medical care, germline diagnostic genetic testing is often viewed as a “once-in-a-lifetime” test, as the analyte (the germline genome sequence) is presumed to be invariant over time. However, changes in the interpretation of results from complex clinical genomic tests are inevitable, as technology, bioinformatics pipelines, and medical knowledge about variant-gene-disease associations evolve and expand over time. The reassessment of existing exome sequence data provides an opportunity to identify previously unknown genetic causes for a given patient’s clinical phenotype. Ultimately, such efforts may result in changes to the clinical management of the patients and families. Several previous studies have demonstrated the clinical validity of exome reanalysis to increase the diagnostic yield [4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19], and the American College of Medical Genetics and Genomics (ACMG) recently published a series of points to consider regarding the reevaluation and reanalysis of genomic test results at various levels [20]. While challenges remain, it is important for both ordering physicians and clinical laboratories to recognize the need for exome reanalysis as a routine clinical practice, given the evolving nature of the genomic field. Here, we highlight critical considerations and benefits of CES reanalysis. Furthermore, we propose a model for the future that could maximize the clinical utility of CES reanalysis by integrating information from electronic medical records and knowledge databases into the reanalysis workflow.

2 Benefits of Exome Reanalysis

Diagnostic tests are rarely performed multiple times, as a negative result typically rules out one or more specific diagnoses. Screening tests, such as a complete blood count (CBC), are often performed iteratively, and interpretation of CBC results can depend on a patient’s rapidly changing clinical status. Unlike most diagnostic tests, a negative CES result does not rule out a genetic disorder; instead, it indicates that a diagnosis could not be identified, given the clinical and gene-disease information available at the time of analysis.

In theory, the majority of negative CES cases fall into three categories: (1) those where the cause of the patient’s phenotype is non-Mendelian, such as oligogenic disease, or non-genetic, such as environmental factors and infection; (2) those where the disease-causing variant is located outside of the analyzed genomic region (for example, in a deep intronic or regulatory region that was not covered during CES testing), is structural (for example, an inversion event that results in gene disruption), or is of a type that is not tractable or confidently identified by exome sequencing (for example, repeat expansions); and (3) those where the disease-causing variant(s) exist within the exome, but the tools and knowledge available at the time of the initial analysis precluded recognition of the diagnosis. For the last category, it is expected that exome reanalysis would increase diagnostic yield. Based on previously published studies, exome reanalysis results in an increase in diagnostic yield of approximately 12%, with reported increases ranging from 5 to 26% [4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19]. Although the reported interval between initial analysis and reanalysis varies from 6 months to 7 years among reanalysis studies, the majority of reanalyses were performed at 1- to 2-year intervals [4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19]. Larger studies may be helpful for defining a standard practice for the timing of reanalysis, taking into consideration the evolving rate of novel gene-disease and variant-disease discovery versus the cost and labor required for reanalysis. The phenotypes included in these studies were diverse; however, cases with congenital anomalies, intellectual disability, epilepsy, and other neurological phenotypes were most commonly included. These data suggest that reanalysis considerably increases diagnostic yield and demonstrate the need for periodic interrogation of exome data in routine clinical practice.
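To make the scale of these figures concrete, the following minimal sketch works through the arithmetic for a hypothetical cohort. It assumes, purely for illustration, that the approximately 12% increase applies to the previously negative cases that are reanalyzed; the cited studies define their denominators in different ways, so the function name, the cohort size, and the 30% initial yield are all assumptions rather than values taken from any single study.

```python
def cumulative_yield(cohort_size, initial_yield, reanalysis_yield):
    """Return (initially diagnosed, newly diagnosed, cumulative yield).

    Assumption: reanalysis_yield is the fraction of previously
    negative cases that receive a diagnosis on reanalysis.
    """
    diagnosed = round(cohort_size * initial_yield)
    negative = cohort_size - diagnosed
    newly_diagnosed = round(negative * reanalysis_yield)
    total_yield = (diagnosed + newly_diagnosed) / cohort_size
    return diagnosed, newly_diagnosed, total_yield


# Hypothetical cohort: 1000 patients, 30% initial yield, 12% on reanalysis.
diagnosed, newly_diagnosed, total_yield = cumulative_yield(1000, 0.30, 0.12)
print(diagnosed, newly_diagnosed, total_yield)
```

Under these assumptions, 84 of the 700 initially negative cases would gain a diagnosis, raising the cumulative yield from 30% to 38.4%.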

In line with the rapid pace of improvement in knowledge of gene-disease associations, the majority of new molecular diagnoses result from newly discovered disease genes [4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19]. As of 8 February 2021, the Online Mendelian Inheritance in Man (OMIM) database contains 6809 phenotypes for which a molecular basis is known and 4380 genes with disease-causing variants. The disease-causing gene and variant entries are approximately four times greater in number than they were 17 years ago [21]. Since 2014, OMIM has added approximately 300 new phenotype entries per year, and the total number of phenotypes and genes continues to grow, in part due to the application of large-scale sequencing efforts in the clinical diagnostic setting [22]. Even for a previously known disease gene, the phenotypic spectrum may expand over time and result in the recognition of an association between a previously overlooked variant and the patient’s reported clinical features. The integration of additional clinically observed phenotypes into exome reanalysis may result in an expanded gene list for analysis and/or an association with the clinical phenotypes in previously analyzed genes.

A new molecular diagnosis can also result from an upgraded classification for variants in known disease genes. A variant of uncertain clinical significance (VUS) can be reclassified as pathogenic or likely pathogenic with additional evidence, such as new functional data and new test results that were not available during the initial analysis. A common scenario illustrated in several previously published studies is that a candidate variant is confirmed to be de novo after follow-up parental studies and is reclassified to likely pathogenic or pathogenic using the ACMG variant classification guidelines [23]. Similarly, a heterozygous VUS may be upgraded to likely pathogenic when detected in trans with a pathogenic variant for a recessive disorder after targeted parental studies demonstrate biparental inheritance of the two variants. For the same reasons, the yield of trio- and/or family-based exome sequencing is generally higher than that of singleton or duo sequencing, partially due to the ability to highlight de novo and compound heterozygous (biallelic) variants [1].
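The inheritance logic described above can be sketched in a few lines. This is a minimal illustration, not a validated pipeline: the `TrioCall` record, the gene names, and the variant labels are hypothetical, genotypes are reduced to alternate-allele counts, and real workflows would also need to confirm parentage, check parental read depth, and apply the full ACMG criteria before any reclassification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrioCall:
    """One variant site in a proband-mother-father trio (hypothetical record)."""
    gene: str
    variant: str       # illustrative label only
    proband_alt: int   # number of alternate alleles: 0, 1, or 2
    mother_alt: int
    father_alt: int


def is_de_novo_candidate(call):
    """Alternate allele seen in the proband but in neither parent."""
    return call.proband_alt > 0 and call.mother_alt == 0 and call.father_alt == 0


def genes_with_biparental_hets(calls):
    """Genes with at least one heterozygous proband variant from each parent."""
    from_mother, from_father = set(), set()
    for c in calls:
        if c.proband_alt != 1:
            continue  # only heterozygous proband calls are considered
        if c.mother_alt > 0 and c.father_alt == 0:
            from_mother.add(c.gene)
        elif c.father_alt > 0 and c.mother_alt == 0:
            from_father.add(c.gene)
    return from_mother & from_father


EXAMPLE_CALLS = [
    TrioCall("GENE_A", "var1", 1, 0, 0),  # absent from both parents
    TrioCall("GENE_B", "var2", 1, 1, 0),  # maternally inherited
    TrioCall("GENE_B", "var3", 1, 0, 1),  # paternally inherited
    TrioCall("GENE_C", "var4", 1, 1, 0),  # single maternal variant
]

de_novo = [c.variant for c in EXAMPLE_CALLS if is_de_novo_candidate(c)]
compound_het_genes = genes_with_biparental_hets(EXAMPLE_CALLS)
print(de_novo, compound_het_genes)
```

In this toy data set, `var1` is flagged as a de novo candidate and `GENE_B` as a candidate for compound heterozygosity, neither of which could be established from the proband's data alone.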

There is no doubt that improved bioinformatics tools contribute substantially to new genetic diagnoses. Improvements can be made over time throughout the process, including variant calling (particularly for INDELs [24]), variant annotation, and the incorporation of up-to-date gene-disease-variant databases and variant population frequency databases. In addition, copy number variants (CNVs), including intragenic exonic deletions and/or duplications, also account for a subset of genetic disorders [25]. While the sensitivity and specificity of CNV detection from exome data may vary among clinical laboratories, tools for detecting CNVs continue to be developed, and those available now may not have existed at the time of the initial analysis.

Other contributing factors include multiple molecular diagnoses with blended phenotypes, identification of a genetic cause that may have been overlooked due to low exome coverage, recognition of synonymous variants affecting gene splicing that were filtered out at the time of the original analysis, as well as international and external data-sharing efforts to aid in the interpretation of genes with unknown function. Although uncommon, misinterpretation of a previously analyzed variant/gene can potentially be identified in the process of exome reanalysis. Manually evaluating hundreds of genetic variants per exome case takes a significant amount of time and effort, and the best possible interpretation often requires a multidisciplinary team’s input, particularly in the area of clinical correlation; it may therefore not be surprising that a molecular diagnosis can be “missed” during the initial analysis.

3 Advantages and Challenges Associated With the Initiation of Exome Reanalysis

The expansion of collaborations between clinical laboratories, clinicians, and researchers is a driving factor contributing to the finding of new insights through the reanalysis of CES data. At the present time, exome reanalysis is considered to be a shared responsibility among the ordering health-care provider, the clinical testing laboratory, and the patient. Most commonly, exome reanalysis is initiated by the clinician on a patient-by-patient basis, or by a clinical laboratory on a cohort level [4,5,6,7,8,9,10,11,12,13,14,15,16,17,18].

3.1 Clinician-Initiated Individual Exome Reanalysis

Typically, reanalysis is ordered as part of a clinical encounter at a clinician’s office. Reanalysis can be ordered by a healthcare provider for the following reasons: (1) Evolving clinical presentation: An exome-based analysis typically leverages a patient’s phenotypic features to prioritize variants in genes known to cause diseases clinically correlated to the patient’s phenotype. For example, a variant may be excluded from analysis and/or reporting if a commonly observed clinical feature of that disease is not observed in the patient. Thus, a previously tested patient with a negative result who later presents with one or more additional features may be a good candidate for reanalysis. The ability of the new features to influence diagnosis upon reanalysis depends on their number, severity, and presentation, such as whether they represent major changes (for example, an additional organ system malformation) or minor ones (for example, a minor biochemical abnormality). These updated clinical phenotypes may ultimately result in an enhanced clinical correlation. The evolving presentations, especially in pediatric patients, can also assist in the diagnosis of complex blended phenotypes [26]. (2) Updated family history: Clinical assessment and genetic testing of family members can be particularly helpful in prioritizing or excluding variants. Segregation analysis can be performed in scenarios where additional family members have been diagnosed with the same condition as the proband or vice versa. Incorporation of genetic testing results from additional family members, such as siblings and parents, is another reason to initiate a reanalysis request in order to get an updated report. (3) Sufficient time has elapsed since the last analysis: New information on gene-disease associations and newly published evidence can lead to reclassification of variants. A clinician may consider ordering reanalysis at a certain point after an initial analysis that did not yield an informative diagnosis. In this case, “sufficient time” is subjective; it depends on the ordering clinician and is influenced by the clinical presentation. For example, novel genes related to intellectual disability are discovered on a regular basis, and an annual reanalysis may be acceptable. However, for a rare disorder with no gene association, periodic reanalysis can be spaced accordingly. Other factors that influence the number of reanalyses per patient include access to insurance, the ability to self-pay, and the number of reanalyses, if any, provided by the laboratory at no cost. (4) New gene discovery of interest: For a specific clinical presentation, a new gene discovery can trigger reanalysis if the patient’s phenotype is consistent with the new gene-disease association. (5) VUS or candidate genes reported in the initial exome analysis: A reanalysis can be initiated to check for new information specifically on what was already reported, such as a VUS or a previously reported candidate gene with an uncertain link to human disease. (6) Changes to the laboratory test: If a provider is made aware of an updated exome test, for example, a new exome platform, an updated bioinformatics pipeline, or additional capability in detecting copy number variants, they may choose to order reanalysis.

The advantages of a clinician-initiated reanalysis are that the laboratory is more likely to receive an updated clinical phenotype and family history that can be helpful at the time of reanalysis. During follow-up visits, clinicians may find there are additional affected family members and relatives. By submitting their samples for additional genetic testing, the results can be used for family segregation and further clarify the significance of any VUS.

The major limitation is that this approach is often ad hoc, and not every patient receives a systematic reevaluation. In the absence of a deliberate reevaluation strategy for all previously tested negative patients, there is a risk of cases with a “hidden diagnosis” staying in the clinic and in the laboratory’s internal dataset. A second limitation is potential overutilization of reanalysis for a plausible genetic finding, for example, a copy number test for a patient who already has a diagnosis that explains the majority of the patient’s reported phenotype. In such cases, laboratories rely on the expertise of the ordering clinician to determine if such additional testing is warranted.

The role of the clinician/ordering provider is critical in the reevaluation process, and it is important to educate clinicians on opportunities for leveraging reanalysis, including: (1) the benefits of clinical reassessment and systematic reevaluation to increase the likelihood of identifying a genetic diagnosis; (2) the various exome reanalysis options offered by testing laboratories; (3) improvements in the process of exome analysis, such as copy number variant detection in the bioinformatics pipeline, that may increase the diagnostic rate; and (4) reassessment of new literature that could provide evidence for a VUS present in the patient’s exome data. Even if the patient’s disease phenotype remains static, clinicians may still request an exome reanalysis that may lead to a positive finding. The pretest probability of achieving a diagnosis varies vastly among different clinical indications. An exome reanalysis may not be the best option for some cases after a negative exome analysis, while it could remain a good choice for others. When such requests are ordered, the laboratory may determine whether to reprocess the original raw data, to reanalyze the previous variant calls, or simply to reevaluate the previously reported variants.

Socioeconomic factors affecting the family can also impact access to reanalysis. Depending on the laboratory policy, there may be a fee associated with exome reanalysis, which would limit access to this test, since reimbursement and insurance payment may be a consideration for patients. Additionally, in certain states/institutions, the wording of the CES consent form signed by the patient or family may not include permission to perform reanalysis; thus, clinicians may need to re-consent the patients before reanalysis can be performed. Further, physicians may not have access to variants detected in their patients that were not listed on the initial exome report. If physicians are interested in certain genes associated with conditions that may be suggestive of the patient’s disease phenotype, they would have to request that the testing laboratory specifically analyze and report the variants in the genes of interest. In any case, closer collaboration between ordering physicians and clinical laboratories is likely to result in increased identification of molecular diagnoses, a win-win situation for patients and their families, physicians, and clinical laboratories alike.

3.2 Laboratory-Initiated Reanalysis

A laboratory may initiate reanalysis that is targeted at a subset of patients (e.g., based on a phenotype) [6, 8, 14] or an entire cohort [11]. There are several common contexts in which a laboratory might consider performing CES reanalysis. Such situations may include: (1) reassessment of patients using up-to-date curated gene-disease information, such as in the context of reanalysis of a group of patients using an updated disease-specific panel or exome slice gene list; (2) availability of semi-automated software tools, which can reduce the effort required for reanalysis and triage cases most likely to have a previously unidentified molecular diagnosis [11]; (3) technical updates that allow detection of variants and/or variant types not available for evaluation during initial analysis, for example, improved INDEL calling or CNV detection [24, 27]; and (4) reconciling conflicting variant classifications between or within internal and/or external databases [28].

When exome reanalysis is initiated by a clinical laboratory, existing exome-sequencing data are often utilized along with the historical phenotype information. Typically, there is no new exome wet-lab work involved in this process. However, an updated bioinformatics pipeline or novel software tools can be applied to quickly screen previously undiagnosed exome cases. An up-to-date gene-disease database as well as newly available population databases are required to maximize the potential diagnostic yield from the reanalysis.

Routine reevaluation of a clinical laboratory’s entire exome database of variant classifications may be impractical or unfeasible for many laboratories that lack resources. However, depending on the bioinformatics setting, laboratories can use multiple approaches to achieve an additional diagnosis without a formal order from the clinical care provider. A less complex approach is to use the patient’s genomic position information and to search for a reported pathogenic or likely pathogenic variant in variant-disease databases, such as ClinVar or HGMD. Using HPO-term-based tools such as PhenoTips is another popular method to link patients’ variants located within known disease-causing genes to diagnoses which fit the phenotypic presentation of the patient. A third approach identifies primary literature relevant to a given cohort member utilizing genotype data, phenotype data, and one or more text-mining tools. Online tools, such as Exomiser, can be used to generate a phenotype match score and/or variant rank. Such approaches can be used independently or in combination with variant level evidence, such as allele frequency or segregation. The prioritized results can then be reviewed at the case-level by clinical laboratory personnel to determine whether any of the prioritized findings could explain the patient’s reported phenotype.
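Two of the approaches above, the database lookup by genomic position and the phenotype match score, can be illustrated with a minimal sketch. Everything here is a stand-in: `KNOWN_CLASSIFICATIONS` is a hypothetical in-memory table playing the role of a locally held snapshot of a variant-disease database, the coordinates are invented, and the overlap score is a plain set ratio that is not how Exomiser or PhenoTips compute their scores.

```python
# Hypothetical snapshot of a variant-disease database, keyed by
# (chromosome, position, reference allele, alternate allele).
KNOWN_CLASSIFICATIONS = {
    ("1", 1000, "A", "G"): "Pathogenic",
    ("2", 2000, "C", "T"): "Likely pathogenic",
    ("3", 3000, "G", "A"): "Benign",
}

REPORTABLE = {"Pathogenic", "Likely pathogenic"}


def flag_known_variants(patient_variants, known=KNOWN_CLASSIFICATIONS):
    """Return patient variants listed as pathogenic or likely pathogenic."""
    return [v for v in patient_variants if known.get(v) in REPORTABLE]


def phenotype_overlap(patient_terms, disease_terms):
    """Share of HPO terms in common (intersection over union).

    A deliberately simple proxy for a phenotype match score.
    """
    patient, disease = set(patient_terms), set(disease_terms)
    if not patient or not disease:
        return 0.0
    return len(patient & disease) / len(patient | disease)


patient_variants = [
    ("1", 1000, "A", "G"),
    ("3", 3000, "G", "A"),
    ("4", 4000, "T", "C"),  # not present in the snapshot
]
flagged = flag_known_variants(patient_variants)
score = phenotype_overlap(
    {"HP:0001250", "HP:0001249"},  # seizure, intellectual disability
    {"HP:0001250", "HP:0000252"},  # seizure, microcephaly
)
print(flagged, score)
```

A flagged variant or a high score would only prioritize a case; as described above, laboratory personnel would still review each prioritized finding at the case level.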

It is easier to generate CES data than to process, analyze, and interpret it. A multidisciplinary team in a clinical diagnostic setting would enable this critical effort.

Reanalysis initiated by a clinical laboratory can be more systematic than clinician-initiated reanalysis, as laboratory-developed semi-automated approaches can be applied across patient cohorts [11]. The improved use of information technology systems with up-to-date databases for patients at a cohort level can reduce the time and labor required for reanalysis. A genetic diagnosis may be uncovered without a clinical request having been received. Efforts should focus on automating some of the analysis when feasible and appropriate. However, the setup of such infrastructure is itself labor-intensive, and it requires an understanding that a non-trivial resource allocation is needed to perform reanalysis on a regular basis. Laboratories cannot deliver an infinite number of time- and labor-intensive services without adequate financial support. It is challenging to acquire updated clinical information when reanalysis is performed on a cohort level; therefore, this reanalysis approach is based on the assumption that the patient’s disease phenotype has remained static. This can be a limitation, especially in the pediatric setting, where clinical presentations can evolve over time. Finally, an ethical obligation, based on the principle of beneficence, requires laboratories to attempt to recontact the ordering physician and patient in circumstances that may meaningfully alter medical management; however, doing so can be challenging. This is particularly true in situations where the original ordering clinician(s) no longer follow the patient or have moved to another institution. Thus, it would be prudent for the clinician to inform the patient prior to exome sequencing that the interpretation and results have the potential to be updated and that it is important for the patient to provide up-to-date contact information [29]. The development of laboratory policies regarding reanalysis, including those regarding cost and turnaround time, is necessary for reanalysis to occur smoothly.

4 Future of Exome Reanalysis

The integration of genomic medicine into clinical care has changed routine clinical practice. Reanalysis of CES data represents a powerful and cost-effective approach to identify additional diagnoses and provides opportunities for improved patient care. CES reanalysis is expected to benefit a growing number of patients with previous negative CES testing and is an important consideration for both the clinical team as part of routine medical care of patients who have had CES and the laboratory as a contributor to the discovery and characterization of new gene-phenotype correlations.

We recognize that a greater incorporation of automation is needed for the reanalysis process. While challenges remain, we propose a reanalysis model in which many interactions between clinicians and laboratories are initiated and performed through electronic health record (EHR) systems, such as Epic or Cerner (Fig. 1a). When significant new clinical information becomes available, the clinician may choose to notify the laboratory through the EHR system. The EHR system may send notifications to the genetic testing laboratory that include the updated information, such as newly manifested phenotypes, pedigrees, or additional laboratory testing results.

Fig. 1 Future of exome reanalysis

It is expected that deep phenotyping will result in more accurate variant analysis. Extraction of meaningful information from the EHR has long been a challenge for both clinicians and laboratories. Studies have shown that facial analysis technologies may represent a promising non-invasive approach to syndrome recognition, with high sensitivity and specificity for some genetic disorders [30, 31]. Tools such as Face2Gene use computer vision and deep-learning algorithms to identify and quantify similarities within and differences between hundreds of syndromes (Fig. 1b). In an experiment reflecting a real clinical setting, a facial image analysis framework achieved 91% top-10 accuracy in identifying the correct syndrome on 502 different images [30]. Such facial image recognition can be used to identify potential rare genetic disorders prior to exome reanalysis. In addition, software tools such as ClinPhen can automatically extract and prioritize patient phenotypes directly from medical records, convert the information to a standardized vocabulary of phenotypic abnormalities using HPO terms, and feed the HPO terms into an exome reanalysis pipeline to accelerate genetic disease diagnosis (Fig. 1b) [32]. It is not inconceivable that a dedicated clinician can process 200 patient records in a typical week; however, ClinPhen can do the same in 10 min [32]. Such systems could be leveraged to update clinical information and facilitate rapid and efficient decision making by the clinician, while greatly reducing the manual effort required to review patients’ clinical history. Although such computational tools may not always have the sensitivity and benefit of human experience and clinical judgment, the practical utility of these systems may outweigh these limitations.
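The idea of converting free-text clinical notes into HPO terms can be illustrated with a deliberately naive sketch. This is not ClinPhen's algorithm or interface: the phrase table, the function name, and the example note are hypothetical, matching is by simple lowercase substring, and negated or family-history mentions (for example, "no seizures") are not handled, which a real tool would need to address.

```python
# Hypothetical phrase-to-HPO lookup table; a real tool would draw on the
# full ontology, including synonyms, rather than three hand-picked entries.
PHRASE_TO_HPO = {
    "seizure": "HP:0001250",
    "microcephaly": "HP:0000252",
    "global developmental delay": "HP:0001263",
}


def extract_hpo_terms(note):
    """Return the sorted HPO IDs whose phrase appears in the note text."""
    text = note.lower()
    return sorted(hpo for phrase, hpo in PHRASE_TO_HPO.items() if phrase in text)


note = "Follow-up visit: new-onset seizures; microcephaly noted on exam."
terms = extract_hpo_terms(note)
print(terms)
```

The resulting term list is the kind of structured input that a phenotype-driven reanalysis pipeline could consume in place of manually abstracted features.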

One benefit of having genomic data available for reevaluation is the rapid assessment of new disease genes as they are reported. Revisiting genomic sequencing data in light of novel disease gene associations becomes a necessary component of exome reanalysis (Fig. 1c). When information on novel disease-causing genes and variants becomes available, an automated or semi-automated system in the laboratory can interrogate existing patients’ genomic data to search for matching results. The laboratory would then be notified when a plausible match is identified, which may trigger a formal manual review and ultimately result in a new genetic diagnosis with minimal effort. Conversely, a conflicting classification from another clinical laboratory in the ClinVar database for a previously classified and reported variant could automatically flag the variant for further review, including potential reclassification.
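The two triggers described above can be sketched as follows. The data structures are assumptions made for illustration: `stored_cases` maps hypothetical case identifiers to the genes in which rare variants were retained from the original analysis, and the three-tier grouping used to decide what counts as a conflict is one possible convention rather than a rule stated in the text.

```python
def cases_to_review(stored_cases, new_disease_genes):
    """Map case ID to the newly reported disease genes it carries variants in."""
    hits = {}
    for case_id, genes in stored_cases.items():
        matched = sorted(set(genes) & set(new_disease_genes))
        if matched:
            hits[case_id] = matched
    return hits


# Assumed grouping: classifications in the same tier are not treated as conflicting.
TIER = {
    "Pathogenic": "P/LP",
    "Likely pathogenic": "P/LP",
    "Uncertain significance": "VUS",
    "Likely benign": "B/LB",
    "Benign": "B/LB",
}


def has_conflict(reported, external_assertions):
    """True if any external assertion falls in a different tier than reported."""
    return any(TIER[a] != TIER[reported] for a in external_assertions)


stored_cases = {
    "case-001": {"GENE_A", "GENE_B"},
    "case-002": {"GENE_C"},
    "case-003": {"GENE_B", "GENE_D"},
}
review = cases_to_review(stored_cases, {"GENE_B", "GENE_X"})
conflict = has_conflict("Likely pathogenic", ["Pathogenic", "Uncertain significance"])
print(review, conflict)
```

Either result would only queue a case or variant for the formal manual review described above; neither constitutes a diagnosis or a reclassification on its own.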

In the context of exome reanalysis, genomic data-sharing efforts that aim to improve variant interpretation and collaborative curation to characterize and disseminate the clinical relevance of genomic variation are paramount. For example, ClinVar and ClinGen, two NIH-based efforts, have formed a partnership to improve the knowledge of clinically relevant genomic variation. ClinVar serves as a repository for clinical assertions about genomic variants and their association with disease [28]. ClinGen Expert Panels review variant-disease assertions, as well as data in the primary literature describing these variants, and submit their standardized interpretations to ClinVar as expert-reviewed records. Such expert-reviewed records are especially suited for use in automated workflows, as these assertions have been evaluated in accordance with ClinGen working group standards and are less likely to result in spurious or false positive genotype-phenotype calls.

We expect that data sharing will play a larger role in the future of CES reanalysis (Fig. 1d), though data must be shared in a manner that protects each individual’s privacy and is legally compliant. Achieving the ideal balance of sharing versus privacy is particularly challenging for genomic data, as each genome is the ultimate identifier of its owner and DNA samples therefore can never be truly anonymized. While institutional review board (IRB) approval of informed patient consent and understanding of HIPAA requirements are essential, setting up an open source database across institutions has become increasingly feasible. A flexible and powerful computing platform for data management is critical, although standardizing data formats may represent a practical challenge. Leveraging robust API solutions may be a potential future direction. From the exome reanalysis perspective, a cohort of unrelated individuals at multiple institutions suspected of having a similar presentation could be evaluated and compared with thousands of unaffected individuals, for example, from the Genome Aggregation Database (gnomAD), as well as with comprehensive disease databases. One can imagine that, utilizing the comprehensive, sophisticated infrastructure described above, a molecular diagnosis could be identified with minimal human input.

In the near future, we anticipate computational tools utilizing artificial intelligence (AI) methods that scour large volumes of information from scientific studies and databases will be applied to the analysis of in-house clinical genomic sequencing results. Large genomic datasets, collections of annotated medical images, clinical notes, and functional datasets can be used for AI algorithm training. Currently, the applications for AI in extraction of deep phenotypic information from images, EHRs, and other medical devices to inform downstream genetic analysis appear to be promising [33, 34]. Such approaches would support a proactive medical IT system that continuously learns and is consequently able to provide valuable insights into existing data. For the purpose of reevaluation of genomic data, AI may draw associations between phenotype and genotype data, as well as knowledge from genomic databases and updated publications, and therefore be able to generate a potential genetic diagnosis. Such methods are already in development, though, to our knowledge, none are part of routine clinical testing workflows [35]. Ultimately, an AI-based system may have the potential to improve the performance and work efficiency in exome reanalysis.