Introduction

The adoption of copy number variation (CNV) analysis by the clinical diagnostic laboratory has had a major effect on the field of medical genetics. It has helped refine genotype-phenotype relationships in known disorders and has led to the discovery of new syndromes [1]. Systematic CNV analysis in large populations has begun to reveal the frequency and the effect of this variation in the human genome.

CNV within the genome is widely recognized as a source of disease. CNVs that involve genomic fragments containing one or more dosage-sensitive genes can result in genetic disorders and complex diseases, including autism, cancer, immune deficiency, and neurodegenerative and neuropsychiatric disorders [2–8]. However, apparently healthy individuals also have a significant number of CNVs within the human genome that seem to have no association with adverse phenotypic outcomes [8–18].

The term copy number variation refers to a difference in the dosage of genes or genomic fragments when compared with a reference human genome. It was originally used to describe genomic fragments that ranged in size from at least one kilobase to several million bases and had a variable copy number [12, 19]. Higher resolution CNV analysis has revealed the existence of increasingly smaller CNVs (100 to 1,000 bp) in the human genome [20, 21]. CNVs usually result from structural genomic alterations such as a deletion (loss), a duplication (gain), an insertion (usually a gain) or unbalanced translocations/inversions that may lead to either loss or gain of sequences near the breakpoints [12].
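As a rough illustration of what "dosage compared with a reference" means in array data, the sketch below converts a log2 test/reference intensity ratio into an estimated copy number. It assumes a diploid reference and a non-mosaic sample; the function names and the ±0.3 calling thresholds are illustrative assumptions, not values taken from any platform or from the studies cited here.

```python
import math


def estimated_copy_number(log2_ratio, reference_copies=2):
    """Estimate copy number from a log2(test/reference) ratio.

    Assumes the reference carries `reference_copies` copies of the
    segment (2 for an autosomal region in a diploid genome).
    """
    return reference_copies * 2 ** log2_ratio


def call_state(log2_ratio, loss_threshold=-0.3, gain_threshold=0.3):
    """Label a segment as loss, gain or neutral.

    The thresholds are hypothetical; real pipelines tune them to the
    noise of the platform and usually require several adjacent probes.
    """
    if log2_ratio <= loss_threshold:
        return "loss"
    if log2_ratio >= gain_threshold:
        return "gain"
    return "neutral"


# Theoretical ratios for a heterozygous deletion (1 copy vs 2)
# and a single-copy duplication (3 copies vs 2).
heterozygous_deletion = math.log2(1 / 2)
single_copy_duplication = math.log2(3 / 2)

for ratio in (heterozygous_deletion, 0.0, single_copy_duplication):
    print(round(ratio, 3), call_state(ratio), round(estimated_copy_number(ratio), 2))
```

Note that the expected signal for a duplication (about 0.58) is smaller in magnitude than for a heterozygous deletion (-1.0), which is one reason small gains are harder to call than small losses.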

Both recurrent and unique (non-recurrent) CNVs are described and each class of CNV is mediated by a different rearrangement mechanism. Recurrent CNVs are usually flanked by low copy number repeats or segmental duplications. This allows for recombination between large, identical blocks of sequences (duplicons) by a process referred to as non-allelic homologous recombination [22–24]. Non-recurrent CNV formation is less well understood, although breakpoint analysis in several non-recurrent CNVs has suggested the involvement of non-homologous end joining, and replication-based mechanisms such as fork stalling and template switching [25–28].

Microarray-based CNV detection has been universally adopted in clinical diagnostics owing to the comprehensive nature of genome-wide analysis [29]. Despite the great advances made in CNV detection technologies, there are still several limitations to genome-wide CNV analyses that may affect their clinical utility. In this review we discuss the advantages and limitations of CNV discovery in the clinical diagnostic laboratory with specific emphasis on the impact of CNV analysis on the clinician in both the prenatal and the postnatal setting.

Variation in healthy individuals and in disease

The role of CNVs in genetic syndromes has long been recognized, with recurrent microdeletion/microduplications detected in syndromes such as Prader-Willi [30], Smith-Magenis [31] and Williams-Beuren [32]. However, initial studies utilizing microarray-based analysis focused on phenotypically normal individuals and identified large CNVs that did not appear to be associated with a genetic disease [9, 10]. The recent expansion of microarray-based CNV analysis has led to a better appreciation of the extent of CNV-based variation within the genomes of apparently healthy individuals. These initial studies using microarrays with limited coverage predicted as many as ten CNVs per individual [9, 10, 16], but, as the resolution of the detection technology has improved, the number of CNVs detected within an individual's genome has continued to grow. Thus, more recent studies have estimated that individuals are hemizygous for approximately 30 to 50 deletions larger than 5 kb [11] and the frequency of CNVs may be greater than 100 per individual [12, 15]. The accumulating CNV data from healthy controls has led to the establishment of public databases such as the Database of Genomic Variants and NCBI's Database of Genomic Structural Variation (Box 1).

There are a number of ways by which CNVs can result in a disease phenotype. In the most common scenario, a deletion or duplication alters the genomic copy number of dosage-sensitive gene(s) [33, 34]. Alternatively, a deletion either within or encompassing a gene sensitive to haploinsufficiency can have the same effect as a disruptive point mutation within the gene. Further, CNVs may result in monogenetic diseases by altering the expression of genes flanking the CNV due to the disruption of regulatory elements [35, 36]. CNVs have also been shown to be a source of mutation in autosomal recessive disorders, in which a deletion of one allele of the gene in combination with a point mutation on the other allele results in the disease phenotype. Recently, three individuals with unexplained intellectual disabilities (IDs) were found to have Cohen syndrome, an autosomal recessive disorder, after CNV analysis. All three patients had a CNV that deleted the COH1 gene and a second pathogenic point mutation was subsequently identified on the other allele [37]. All of these possibilities can lead to difficulty in differentiating a pathogenic from a benign CNV, further complicating CNV data interpretation.

Over the past decade, based solely upon the increased clinical use of microarray-based CNV analysis, the list of recurrent and non-recurrent CNVs associated with disease phenotypes has continued to grow. This has led to the discovery of many new microdeletion and microduplication syndromes, such as 1q21.1 microdeletions [38–40] and 15q13.3 microdeletion/microduplication [41–45]. These novel syndromes, combined with the ever-expanding literature on rare, one-off CNVs associated with disease phenotypes, highlight the significant involvement of CNVs as a causative mutation in genetic diseases. Clinically relevant CNVs can be found in databases such as DECIPHER, ECARUCA and the International Standards for Cytogenomic Arrays Database (Box 1). As clinical laboratories adopt CNV analysis, these resources will become invaluable for the clinician to discriminate pathogenic from non-disease associated CNVs.

CNV detection by high-resolution microarray

A number of technologies can detect copy number gains and losses throughout the human genome and their resolution is continuously increasing. These are either targeted assays for specific genomic regions or known disease genes using techniques such as real-time PCR [46–48], multiplex amplifiable probe hybridization [49, 50], multiplex ligation-dependent probe amplification [50–52] or genome-wide assays using high-density microarrays [24, 29], and more recently high-throughput sequencing (HTS) technologies [53, 54].

High-resolution microarray-based CNV analysis provides a method to detect structural genomic alterations. It is useful for uncovering microdeletions and microduplications as well as novel CNVs that are undetectable by standard karyotype analysis or fluorescence in situ hybridization (FISH) [55]. CNV analysis is typically performed using two types of microarray: either array-based comparative genomic hybridization (aCGH) or SNP-based microarrays (SNP-arrays) (Table 1). aCGH and SNP-arrays are both efficient tools for researchers and clinicians. To decide which is most suitable for a certain application, several factors need to be considered, including resolution desired and ability to customize probe content.

Table 1 Array-based comparative genomic hybridization versus single nucleotide polymorphism array

Both aCGH and SNP-arrays can detect low-level mosaicism, which would be missed by traditional cytogenetic testing and may provide a more accurate measurement of mosaicism level [56–58]. Although mosaicism can also be identified through FISH analysis, typically it has to be suspected before FISH analysis is performed. Thus, microarray-based analysis is especially useful in cases when mosaicism is not clinically suspected and therefore would not have been screened for by FISH with an increased number of metaphase counts. Furthermore, SNP-arrays have an additional advantage that they may help determine whether the mosaic cell line originated from a meiotic or mitotic event [59, 60].

In addition to determining copy number alterations, the genotype information provided by SNP-arrays allows the identification of copy number neutral loss of heterozygosity (LOH). This helps in identifying regions that are homozygous due to segmental uniparental disomy or parent of common origin effect, both of which can result in a disease phenotype if a disease gene within the segment is mutated or silenced by imprinting [61, 62]. LOH has also been successfully used to identify candidate genes, especially in families with known consanguinity, as segments of homozygosity by descent may indicate a region containing a gene with a homozygous mutation [63–65].
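The idea of flagging copy number neutral LOH from SNP genotypes can be sketched as a search for long runs of consecutive homozygous calls. The example below is a toy illustration only: it assumes the input SNPs all lie on one chromosome in a region already known to have a copy number of two, and the function name, the genotype coding ('AA', 'AB', 'BB') and the thresholds are hypothetical. Clinical software typically requires far more markers per run and models genotyping error.

```python
def find_homozygous_runs(snps, min_snps=3, min_length=1_000_000):
    """Return (start, end, n_snps) for runs of consecutive homozygous calls.

    snps: list of (position, genotype) tuples sorted by position, where
    genotype is 'AA', 'AB' or 'BB'. A run is reported only if it contains
    at least `min_snps` markers and spans at least `min_length` bases.
    """
    runs = []
    start = end = None
    count = 0

    def close_run():
        # Record the current run if it passes both thresholds.
        if start is not None and count >= min_snps and end - start >= min_length:
            runs.append((start, end, count))

    for position, genotype in snps:
        if genotype in ("AA", "BB"):
            if start is None:
                start = position
                count = 0
            end = position
            count += 1
        else:
            # A heterozygous call ends any run in progress.
            close_run()
            start = None
    close_run()
    return runs


# Made-up genotypes for illustration.
example = [
    (100_000, "AB"),
    (200_000, "AA"),
    (900_000, "BB"),
    (1_500_000, "AA"),
    (2_000_000, "AB"),
    (2_100_000, "AA"),
]
print(find_homozygous_runs(example))
```

In this toy input, only the stretch between positions 200,000 and 1,500,000 meets both thresholds; the isolated homozygous call at the end is ignored.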

Limitations

Despite the proven utility of microarrays in CNV detection, there are noted limitations in the clinical application. Pathogenic duplications are less commonly identified through clinical microarray-based testing than pathogenic deletions. This may be due partially to technical limitations of identifying small duplications [66, 67], although duplications are also associated with milder phenotypes, which may not result in CNV analysis, and duplications are often inherited from one parent and may be assumed to be benign as a result [68]. It is likely that this contributed to the dogma that deletion of genomic material is more likely to result in disease, and may have also introduced a bias that has probably led to an underestimation of copy number gains.

Further, microarray-based analysis will not detect genomic alterations that do not result in changes in the amount of genetic material (copy neutral alterations), such as balanced translocations and inversions. Inversions can be directly associated with disease such as the inversion in the F8C gene, which is found in approximately 50% of individuals with hemophilia A [69, 70]. Such disease-causing inversions will not be identified through CNV analysis. Inversions can also predispose to genomic rearrangements. Inversions have been noted at an increased frequency in parents of children with Williams-Beuren syndrome [71] and in the mothers of children with Angelman syndrome [72].

As discussed above, an important disease mechanism in any CNV is the possible interruption of a critical gene. This is true of both small CNVs and large genomic alterations. Balanced translocations can also result in a disease phenotype if the breakpoints of the translocation reside within a coding region or ultraconserved element. Although the mechanism for disease may be similar, due to the neutral copy number change associated with a balanced translocation, the etiology for disease will go unrecognized by microarray-based CNV analysis. In a study of 36,325 patients with idiopathic ID, 0.78% of individuals were found to carry copy-number-neutral genomic alterations. These deleterious genomic alterations were identified through chromosome analysis and were not detected through microarray-based testing [73]. Important disease mechanisms may go undiagnosed and may be underestimated if only microarray-based CNV analysis is performed. Other strategies, such as karyotype analysis, may be needed to rule out translocations and/or inversions as the underlying mechanism of disease etiology.

Examples of the utility and impact of clinical CNV analysis

Improvements in traditional cytogenetic techniques have helped identify a number of cytogenetic anomalies associated with ID [74, 75]. Yet in comparison with traditional cytogenetic analysis, CNV arrays have a significantly increased diagnostic yield [76, 77] and studies using microarray testing have identified pathogenic CNVs in approximately 10% to 20% of individuals with idiopathic ID [73, 78–80]. The clinical laboratory has quickly adopted microarray-based CNV analysis as the recommended first tier of testing for individuals with non-syndromic ID, autism spectrum disorders (ASDs), and multiple congenital anomalies (MCA) [29, 81].

The use of microarray testing in individuals with ASD and MCA has significantly increased the rate of identification of an underlying genetic etiology. In a study of 852 subjects with ASD, an underlying diagnosis was established in 0.46% of cases through fragile X testing, 2.23% through karyotype analysis, and 7.0% through microarray analysis [82]. Similar to the studies of individuals with ID and ASD, detection of genomic alterations in MCA through microarray analysis was significantly higher than in traditional cytogenetic analysis [83, 84]. Establishing a genetic diagnosis is important as it can lead to appropriate referrals for therapy and surveillance for other organ involvement, end the diagnostic odyssey and provide accurate information for genetic counseling [85, 86].

The use of CNV analysis has extended beyond the diagnosis of ID, ASD and MCA, allowing for a broader understanding of known disorders and the identification of new syndromes. Before microarray-based analysis, the diagnosis of microdeletion or duplication syndromes required either a visible submicroscopic deletion, such as in Miller-Dieker syndrome [87], or disease-specific FISH analysis. Recognition of disparate phenotypes within a specific syndrome remains a difficult challenge for the clinician. The unbiased nature of CNV analysis has allowed the diagnosis of known microdeletion and duplication syndromes with a wide phenotypic spectrum, which may have otherwise gone undiagnosed [88]. One example is Potocki-Lupski syndrome, which results from a duplication of 17p11.2 and is characterized by developmental delay and variable congenital anomalies. Although Potocki-Lupski syndrome was clinically characterized previously [89], the use of microarray-based testing has greatly improved the characterization of phenotypic spectrum and molecular analysis of the duplications found in the syndrome [90].

With the increased utilization of CNV analysis, new microdeletion and duplication syndromes have been identified. For example, screening of more than 10,000 patients with developmental disabilities through CNV analysis revealed seven patients with a similar phenotype (dysmorphic features, midline defects, seizures and developmental delay) and a microdeletion at 1q41q42 [91]. Isolated cases were previously reported with a similar microdeletion [92, 93], although microarray-based CNV analysis provided a means to appreciate the phenotype and insight into pathophysiology of the recurrent microdeletion [94–97].

Clinical CNV analysis typically includes known microdeletion and duplication syndromes, although it can also be designed to include single gene disorders. Exonic or multiexonic CNVs within dosage-sensitive genes represent a significant percentage of mutations in monogenetic disorders [98]. As a result, exonic arrays (microarrays designed to evaluate CNV within transcribed regions) have been designed to target the exon of a single gene, exons within a group of specific disorders, or genome-wide exonic coverage [99, 100]. Intragenic deletions/duplications have been identified in a number of autosomal dominant, X-linked recessive and X-linked dominant disorders, many of which would have been missed by non-exon targeted arrays and traditional sequencing methods [21].

Although CNV analysis is often used to evaluate a non-discrete phenotype such as ID, the high frequency of CNVs within monogenic disorders allows researchers to use microarray-based platforms for candidate gene selection. CHARGE syndrome has a distinct phenotype, follows an autosomal dominant inheritance pattern, and, until recently, had an unknown genetic etiology. A number of strategies were used to identify a candidate gene [101] and aCGH analysis in a patient with CHARGE syndrome revealed a microdeletion containing the CHD7 gene [102]. Similar strategies have been used to identify candidate genes in other diseases, such as NBPF23 in neuroblastoma [103], TCTE3 in congenital diaphragmatic hernia [104], TUSC3 in non-syndromic ID [105], and MTUS1 in familial breast cancer [106].

Challenges in interpretation of CNV analysis

CNV analysis is now routinely utilized for the diagnosis of an individual or family and the results are used for genetic counseling and clinical management [107]. Findings from clinical CNV analysis can often provide results that are unintended or difficult to interpret. The ability to detect CNVs has far outpaced our ability to discern their role in disease.

Initially, clinical CNV analysis was designed to limit the number of ambiguous findings and maximize the relevancy of results. This included limiting genomic coverage to known deletion and duplication syndromes [108–110], as well as using probes, such as bacterial artificial chromosomes (BACs), that could be confirmed through traditional cytogenetic and molecular techniques [111, 112]. Although there may still be a role for such targeted arrays, the majority of clinical laboratories have adopted platforms with increasing genomic coverage, which increased detection of both deleterious CNVs and variants of unknown significance.

As discussed above, CNVs can result in disease pathology through multiple mechanisms. This makes the interpretation of novel CNVs difficult, and there appears to be wide variability in reporting the clinical significance of CNV results. In one study, 13 CNVs (from both BAC and oligo arrays) were sent to 11 different clinical laboratories. The laboratories designated each CNV as one of the following: normal, likely benign, uncertain clinical significance or abnormal. In none of the 13 cases was there unanimous agreement over the clinical significance of the CNV [113]. A number of guidelines are available to aid clinical laboratories in reporting the clinical significance of CNVs [111–113], and these include consideration of known contiguous gene syndromes, the size, dose and inheritance pattern of the CNV, genomic content within the CNV, and comparison of the medical literature and CNV databases.
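Comparing a newly detected CNV against catalogued regions is often done by measuring how much two intervals overlap. The sketch below uses reciprocal overlap, in which each interval must cover a minimum fraction of the other. The 50% cutoff is a commonly used convention but is an assumption here, not a threshold taken from the guidelines cited above, and the catalog entries and coordinates are invented for illustration.

```python
def reciprocal_overlap(a, b):
    """Return the smaller of the two overlap fractions between intervals.

    a, b: (chromosome, start, end) tuples with start < end.
    Returns 0.0 when the intervals are on different chromosomes
    or do not overlap.
    """
    if a[0] != b[0]:
        return 0.0
    shared = min(a[2], b[2]) - max(a[1], b[1])
    if shared <= 0:
        return 0.0
    return min(shared / (a[2] - a[1]), shared / (b[2] - b[1]))


def matching_entries(query, catalog, min_reciprocal=0.5):
    """List catalog names whose region reciprocally overlaps the query."""
    return [
        name
        for name, region in catalog.items()
        if reciprocal_overlap(query, region) >= min_reciprocal
    ]


# Hypothetical catalog; real entries would come from a curated database.
catalog = {
    "region_A": ("chr1", 1000, 2000),
    "region_B": ("chr1", 1900, 5000),
    "region_C": ("chr2", 1000, 2000),
}
query = ("chr1", 1200, 2200)
print(matching_entries(query, catalog))
```

An overlap match only indicates that a similar interval has been reported before; it does not by itself establish whether the CNV is pathogenic, since size, dose, gene content and inheritance still need to be weighed.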

As for any sequence or genomic variant, it is important to determine whether a CNV is inherited or de novo [114], and many clinical CNV analyses utilize parental samples or testing of relatives when possible [115]. In general, microduplications are more likely to be inherited than microdeletions [116], although simply being inherited does not indicate that a CNV is benign. Inherited CNVs may have different and unrecognized breakpoints and mechanisms such as mosaicism, incomplete penetrance and variable expression, which can result in inherited CNVs having significantly different impacts among individuals [115]. For example, both deletions and duplications of 1q21.1 are associated with varying levels of ID, microcephaly, dysmorphic features and congenital anomalies, despite the 1q21.1 CNV being inherited from both mildly affected and unaffected parents [38, 39].

Clinicians and clinical laboratories often rely on available databases and medical literature that examine the phenotype and frequency at which the CNV has been identified. Yet ambiguity remains about the possible pathogenic effects of a number of CNVs. The 15q11 region is prone to unequal recombination events (mediated by low copy number repeats) leading to deletion, duplication and triplication of the genomic region. Duplications and triplications of 15q11 have been associated with autism [117–120], drug-resistant epileptic seizures [120–123] and schizophrenia [123–125]. Despite a number of studies reporting the pathogenic nature of the 15q11 duplication, researchers have also refuted that increased dose of 15q11 alone is sufficient for a disease phenotype [44, 116]. Whether a parent of origin effect or other epigenetic factors contribute to a disease phenotype has yet to be discerned.

Prenatal CNV analysis

Studies evaluating the efficacy of microarray-based CNV analysis have mainly been performed in pediatric populations, although CNV analysis has been performed in spontaneous miscarriages [109], in fetuses terminated for MCA [126, 127] and for prenatal diagnosis [128–132]. Indications for pursuing prenatal CNV analysis are similar to the indications for pediatric CNV analysis, and they include an increased risk for chromosomal abnormalities [128, 129] and a family history of an intragenic CNV [133]. Unique to prenatal testing, CNV analysis is also performed as a result of advanced maternal age, an abnormal fetal ultrasound or parental anxiety [134].

Initial concerns over the use of prenatal CNV analysis have faded with the experience of clinicians and the laboratory. Similar to postnatal microarray studies, the use of prenatal microarray analysis has resulted in a significant increase in the identification of genomic alterations. In a recent meta-analysis by Hilleman et al., prenatal aCGH appears to increase the detection rate of fetal chromosomal imbalances by 3.6% over traditional karyotype analysis [135], and, similar to postnatal MCA studies, prenatal CNV analysis will identify a significantly increased rate of chromosomal abnormalities when a structural fetal anomaly is present [135, 136].

Currently, CNV analysis is not recommended to replace traditional prenatal cytogenetic testing [137], although it is often used as a first-tier option for invasive prenatal testing [138, 139]. Prenatal diagnosis allows potential parents to make informed decisions about reproduction. Prenatal CNV analysis extends parental autonomy to decisions over microdeletion or duplication syndromes as well as chromosome abnormalities. Whether microarray analysis is utilized as first-tier testing or in limited circumstances such as fetal anomalies, genetic counseling is a necessary component of prenatal CNV testing [140–142].

Genetic counseling

CNV analysis has provided an ability to identify disease-causing alterations in an unprecedented number of diseases and phenotypes. Despite the promise of CNV analysis, testing may reveal a variant of unknown significance, CNVs with incomplete penetrance or variable expressivity, or unanticipated findings such as misattributed paternity. As a result, an emphasis has been placed on the genetic counseling of patients and families undergoing testing.

As noted above, CNV analysis may yield ambiguous results, which complicates their interpretation. This difficulty is intensified in the prenatal setting, as the decision to continue a pregnancy is often made after the results of prenatal CNV analysis are available. Prenatal aCGH analysis may increase the number of variants of unknown significance by 1% to 2% compared with prenatal karyotype analysis alone [135, 138]. The potential for indeterminate results has led many to question the utility of CNV analysis in prenatal diagnosis [143] and has raised concerns about possible emotional harm to the expectant parents [144]. As a result, informed consent and genetic counseling are paramount prior to undergoing prenatal CNV analysis.

Although delivery of counseling will differ among clinicians, genetic counseling should inform patients and families undergoing CNV analysis of the potential benefit of testing as well as the potential risks of testing, such as variants of unknown significance [128, 141]. Genetic counselors, geneticists and medical geneticists are familiar with discussing such issues, which are not unique to CNV analysis. Genetic counseling remains integral to providing clinical CNV analysis whether it is performed in the postnatal or prenatal setting.

Ethical concerns

The unbiased genomic nature of microarray-based CNV analysis is both a benefit and a concern for the clinician. CNV analysis may also discover information that was not intended, such as CNVs that predispose for adult-onset disorders, regions associated with neoplasia and misattributed paternity. In evaluating patients for developmental delay and/or congenital anomalies, individuals have been noted to have CNVs in cancer predisposition syndromes such as familial adenomatous polyposis [145], Peutz-Jeghers syndrome and Li-Fraumeni syndrome [146].

There is significant debate over disclosing incidental findings in genetic research, although the moral obligation to disclose information about pre-symptomatic conditions or neoplasia syndromes is distinct from disclosing misattributed paternity [147, 148]. Genetic counseling for CNV analysis should include the possibility of unintended or incidental results. Prior to testing, it is important to discuss with the patient both the policies on disclosure of unintended results and how results will be relayed. Rarely, CNV analysis may reveal results that have legal consequences. SNP-arrays will also identify consanguinity as a result of LOH. Depending on the age of the parent and degree of relation, consanguinity may indicate sexual abuse [140, 149, 150], and laboratories should have institutional policies concerning the possible legal implications of such testing.

The ethical issues surrounding CNV analysis and HTS technologies are not dissimilar from those of genetic testing in general. The importance of informed consent and minimizing the risk of privacy violations are emphasized due to the vast amount of genetic information generated by CNV analysis. Informed consent for CNV analysis may be difficult due to the nature of the testing as opposed to informed consent for the testing of one single gene. Although it is important to include specific information such as the possibility of identifying pre-symptomatic conditions, informed consent should address both the goal and general methodology of the testing and not every possible result [151].

There has been a recent shift in both clinical testing and research testing to share genetic results and phenotypic data. In part this is due to the need for large data sets to differentiate between benign and pathogenic CNVs. The potential harm of loss of privacy or confidentiality may be increased due to the sharing of data, although the actual risk of privacy violations is hard to estimate [152]. It is also important for the clinician to make the patient aware of the laboratory's policy on data sharing. Potential mistrust of the clinician may occur if the patient is unaware of the potential risk of loss of privacy and discovers that the clinical laboratory shares genetic and phenotypic data [151].

Towards the future: high-throughput sequencing technologies

Although microarray technology is the current mainstay of the clinical cytogenetic laboratory, HTS technologies are a powerful tool for the analysis of genetic variation as well as mutation detection in patient cohorts. The strength of HTS is its potential ability to detect all forms of variation, including single nucleotide variations, small (<50 bp) insertions/deletions, CNVs and copy neutral structural variations, on a single platform. HTS technologies hold great promise as the future method of choice for detection of all forms of genomic variation, including CNVs.

Structural variation may be underrepresented regardless of whether microarray analysis or HTS technology is utilized. Currently no single platform or strategy identifies all forms of genomic variation and the choice of method for identifying CNVs may ultimately limit results [153]. The understanding of the benefits and limitations of each technology becomes paramount as use in clinical practice is expanding. We suggest a testing algorithm to identify genomic variation in clinical samples based on available technologies for mutation detection and the clinical presentation of the patient (Figure 1).

Figure 1

Clinical copy number variation analysis algorithm. Whole genome or exome copy number variation (CNV) analysis is an accepted first-line screening tool for the evaluation of patients with complex clinical presentations, and also intellectual disability (ID), autism spectrum disorder (ASD) or multiple congenital anomalies (MCA). If negative, whole exome/clinical exome analysis is now clinically available for subsequent analysis. CNVs detected by array-based comparative genomic hybridization (aCGH) or SNP-based microarray (SNP-array) may lead to a diagnosis or point to a coexisting mutation in the affected gene(s) in autosomal recessive (AR) traits. Targeted exonic CNV analysis can similarly be applied to identify deletions in genes with a heterozygous mutation detected by DNA sequencing. The emergence of whole genome DNA sequence analysis on a clinical basis will allow for integrated whole exome CNV/whole exome sequence analysis in the near future. MLPA, multiplex ligation-dependent probe amplification; NGS, next-generation high-throughput DNA sequencing; qPCR, quantitative PCR.

Conclusion

CNV analysis is a powerful tool for gene discovery, evaluating pathogenic effects of genomic alterations and establishing a diagnosis for patients with a number of phenotypes. The goal of clinical CNV analysis is to identify structural alterations that would establish a genetic diagnosis in an individual or family. A genetic diagnosis may aid in the clinical management of an individual and allows for accurate genetic counseling, including providing recurrence risks and prenatal testing options, and a genetic diagnosis may relieve the family's anxiety surrounding the etiology of disease.

Despite the noted benefit of CNV analysis, it is often difficult to determine the pathogenic effects of a CNV. This ambiguity may create an added burden of uncertainty, which is often shared between the laboratory, the clinician and the family. Appropriate genetic counseling helps inform the family of possible outcomes, including a normal result, variants of unknown significance and the possibility of incidental findings. As the ability to detect CNVs continues to increase, so must the ability to discern the pathogenic effects of CNVs. A current framework exists to investigate the possible effects of CNVs in the human genome. Continued collaboration between researchers, clinicians and families is imperative both to maximize the benefit of CNV analysis and to minimize the risk to patients undergoing testing.

Box 1. Web resources

The URLs for websites referred to herein are as follows:

Database of Genomic Variants http://projects.tcag.ca/variation/

NCBI Database of genomic structural variation (dbVar) http://www.ncbi.nlm.nih.gov/dbvar/

DECIPHER http://decipher.sanger.ac.uk

ECARUCA http://umcecaruca01.extern.umcn.nl:8080/ecaruca/ecaruca.jsp

International Standards for Cytogenomic Arrays Database https://www.iscaconsortium.org