Screening of triploid with low-coverage whole-genome sequencing by a single-nucleotide polymorphism-based test in miscarriage tissue
To establish a single-nucleotide polymorphism-based analysis (SBA) method to identify triploidy in the miscarriage tissue by using low-coverage whole-genome sequencing (LC-WGS).
The method was established by fitting a quadratic curve model by counting the distribution of three heterozygous mutation content intervals. The triploid test result was mainly determined by the opening direction and the axis of symmetry of the quadratic curve, and Z test between the same batch samples was also used for auxiliary judgment.
Two hundred thirteen diploid samples and 8 triploid samples were used for establishment of the analytical method and 203 unknown samples were used for blind testing. In the blind testing, we found 2 cases positive for triploidy. After chromosome microarray analysis (CMA) and mass spectrometry verification, we found that both samples were true positives. We randomly selected 5 samples from the negative samples for mass spectrometry verification, and the results showed that these samples were all true negatives.
Our method achieved accurate detection of triploidy in the miscarriage tissue and has the potential to detect more chromosomal abnormality types such as uniparental disomy (UPD) using a single LC-WGS approach.
Keywords: Single-nucleotide polymorphism, Heterozygous mutation, Quadratic curve, Z test, Triploid, Uniparental disomy
Chromosomal abnormalities generally refer to structural or copy number variations that can lead to a number of serious health outcomes such as infertility [1, 2], recurrent miscarriage, birth defects, and cancer. Of all clinically recognized pregnancies, approximately 10–15% end in abortion, and most abortions occur in the first trimester. About 50% of early pregnancy abortions are caused by chromosomal abnormalities, most of which (86%) are due to abnormal chromosome numbers, including trisomy, monosomy, and polyploidy. Triploidy (69, XXX; 69, XXY; 69, XYY), in which an extra set of chromosomes is present in each cell, is one of the most common chromosomal abnormalities [6, 7, 8]. Triploidy occurs in about 1–2% of pregnancies and accounts for about 10% of early abortions [5, 8, 9, 10]. Most triploid fetuses are unable to survive pregnancy and undergo spontaneous abortion between the 7th and 17th weeks of pregnancy [6, 8, 11, 12]. Live birth of triploid babies is very rare, and they usually die soon after birth, with only a few reported cases of unusually long survival.
Because triploidy is quite common in pregnancy and has serious consequences, it is clinically important to develop a technique that can detect this abnormality accurately. Many techniques are available for detecting chromosomal abnormalities, such as karyotyping, fluorescence in situ hybridization (FISH), and chromosome microarray analysis (CMA). FISH can confirm the locations of chromosomal abnormalities identified by CMA, targeted next-generation sequencing (NGS), and WGS, but fails to detect de novo chromosomal abnormalities. CMA has been widely used to study the causes of abortion [6, 14], but it cannot detect certain chromosome rearrangements, such as balanced translocations and inversions. Recently, targeted NGS and WGS have been applied to detect chromosomal abnormalities by using paired reads that span breakpoints [15, 16]. As sequencing cost continues to decline, WGS has been widely used to investigate the cause of miscarriage. In general, the greater the depth of WGS, the more accurate the analysis will be. However, as the depth of sequencing increases, so does the cost, so the sequencing depth employed is determined by a balance between accuracy and cost. Given these technological advances, it is of great value to develop a comprehensive detection method for chromosomal abnormalities including CNV, aneuploidy, triploidy, and UPD using a single NGS method such as LC-WGS.
Materials and methods
Phase 1—Establishment of methodology
First, genomic DNA isolated from 8 triploid miscarried products of conception (POC) and genomic DNA from 213 diploid individuals were sequenced using LC-WGS. The method was then developed in the following steps. In short, we determined the triploidy status according to the proportion of mutant reads at each heterozygous SNP. Heterozygous sites were first selected for each sample. We defined the ratio of the number of mutated reads to the number of all reads at a site as the mutation ratio (MR). Since there are two copies of each autosome in diploid cells, the MR of a heterozygous SNP should be close to 1/2 in diploid cells. In contrast, in triploid cells, MRs of 1/3 and 2/3 are expected for heterozygous SNPs. However, due to sequencing variability, the MR of each SNP site is not exactly 1/3, 1/2, or 2/3, so we defined the “1/3 interval” as [0.28, 0.38], the “2/3 interval” as [0.62, 0.72], and the “1/2 interval” as [0.45, 0.55]. Next, the number of SNPs (SNs) falling within each of the three intervals was counted. We set up a coordinate system with MR as the abscissa and SNs as the ordinate. For each sample, we obtained three data points: (1/3, y1), (1/2, y2), (2/3, y3). The parabolic equation y = ax² + bx + c was fitted to these data points. When the sample is triploid, the parabola opens upward (a > 0); when the sample is diploid, the parabola opens downward (a < 0). The axis of symmetry of the parabola lies at x = −b/(2a), and this value should be within [0.45, 0.55].
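The interval counting and parabola fit described above can be illustrated with a minimal Python sketch. The interval bounds and the three abscissae (1/3, 1/2, 2/3) are taken from the text; the function names, the "uncertain" label, and the exact decision rule (sign of a combined with the axis of symmetry lying in [0.45, 0.55]) are assumptions made for illustration and are not the authors' implementation. Because only three points are fitted, the parabola passes through them exactly, so the coefficients are obtained in closed form from divided differences.

```python
# Interval bounds come from the text; all names below are hypothetical.
INTERVALS = (
    (0.28, 0.38),  # "1/3 interval"
    (0.45, 0.55),  # "1/2 interval"
    (0.62, 0.72),  # "2/3 interval"
)
X_POINTS = (1 / 3, 1 / 2, 2 / 3)


def count_intervals(mutation_ratios):
    """Count heterozygous SNPs whose MR falls in each of the three intervals."""
    return tuple(
        sum(1 for mr in mutation_ratios if low <= mr <= high)
        for low, high in INTERVALS
    )


def fit_parabola(y1, y2, y3, xs=X_POINTS):
    """Return (a, b, c) of y = a*x^2 + b*x + c through the three points."""
    x1, x2, x3 = xs
    d1 = (y2 - y1) / (x2 - x1)  # slope between the first two points
    d2 = (y3 - y2) / (x3 - x2)  # slope between the last two points
    a = (d2 - d1) / (x3 - x1)
    b = d1 - a * (x1 + x2)
    c = y1 - a * x1 * x1 - b * x1
    return a, b, c


def classify(y1, y2, y3, axis_range=(0.45, 0.55)):
    """Assumed rule: a > 0 is triploid-like, a < 0 is diploid-like,
    provided the axis of symmetry -b/(2a) lies inside axis_range."""
    a, b, _ = fit_parabola(y1, y2, y3)
    if a == 0:
        return "uncertain", a, None  # flat or linear: no axis of symmetry
    axis = -b / (2 * a)
    if not (axis_range[0] <= axis <= axis_range[1]):
        return "uncertain", a, axis
    return ("triploid-like" if a > 0 else "diploid-like"), a, axis


if __name__ == "__main__":
    # Synthetic counts, not data from the study.
    print(classify(300, 100, 310))
    print(classify(100, 500, 110))
```

In this sketch, a sample whose counts are strongly skewed toward one side interval is returned as "uncertain" rather than being forced into either class; how such samples were handled in the study is not stated in the text.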
Phase 2—Validation of SBA analysis
A blinded test was performed to evaluate the feasibility of the SBA method in differentiating diploids and triploids at low sequencing depth. To ensure that the blinded set contained at least one triploid, we mixed a triploid sample into 202 samples of unknown type. These 203 samples were blindly tested using the SBA method. The triploids detected by SBA were subsequently verified by mass spectrometry genotyping and CMA. In addition, 5 negative samples were randomly selected from the blinded samples and verified by mass spectrometry analysis.
Patients and sample collection
All samples of abortion tissue involved in this study were obtained in SZMHH from January 1, through August 1, 2013. The tissue was frozen in liquid nitrogen, and DNA was stored in a −80 °C freezer immediately after isolation. All samples were analyzed at Beijing Cheerland Medical Laboratory Co., Ltd., which is certified by the National Health Commission of the People’s Republic of China.
Library construction and sequencing
The aborted tissue was washed with phosphate-buffered saline (PBS) after being thawed. Genomic DNA (gDNA) was extracted using the QIAamp DNA mini kit (Qiagen, Hilden, Germany). Quantity and purity of gDNA were assessed using a Qubit 3.0 fluorometer (Invitrogen, Carlsbad, CA, USA) and a NanoDrop-One (Thermo Scientific, Wilmington, DE, USA). One microgram of gDNA was fragmented to a size range of 200–500 bp using an M220 Focused-ultrasonicator (Covaris, UK), and 100–300-bp DNA fragments were selected using AMPure XP beads (Agencourt, CA, USA). The selected DNA fragments were then end-repaired and modified at the 3′ end. The dTTP-tailed adapter sequence was ligated to the ends of the DNA fragments, and the fragments were amplified for 8 cycles and subjected to a single-strand circularization process. The PCR product was denatured together with a specific molecule and then ligated by DNA ligase. The remaining linear molecules were digested by exonuclease, and finally a single-stranded circular DNA library was obtained. The quality of the sequencing library was checked using an Agilent 2100 bioanalyzer (Agilent, CA, USA) and Qubit 3.0 (Invitrogen). All samples were subjected to 50-bp paired-end sequencing on the BGISEQ-500 sequencing platform.
Phase A—Mutation detection and quality control
Approximately 60 M raw paired reads were obtained for each sample. All reads were inversely adjusted, and the inverted reads were merged with the original data to form a new data set for triploid analysis. First, data filtering was performed using SOAPnuke (version 1.5.0). After filtering for clean reads, the data were mapped to the reference genome sequence (hg19) using BWA (0.7.12-r1039). Variant detection and filtration were performed with GATK HaplotypeCaller. We then combined the filtered indels and SNPs. Finally, the data size and average depth of each chromosome were compared for statistics and quality control.
Phase B—SNP filtering
We screened the SNPs for each sample. Every SNP used to calculate the parabola had to satisfy both of the following conditions. First, it had to exist in the dbSNP database (version 138), with a frequency in the 1000 Genomes database (phase 3 version) of more than 5%. Second, the number of reads at the site had to be more than 10 after the inversion. Through these two screening steps, we obtained the SNPs available for triploidy analysis.
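The two screening conditions can be expressed as a simple predicate over per-SNP records. The following is a minimal sketch, assuming each SNP has already been annotated with dbSNP membership, a 1000 Genomes allele frequency, and a read count; the dictionary keys (`in_dbsnp`, `af_1000g`, `depth`) and function names are hypothetical and not part of the authors' pipeline. The thresholds (frequency above 5%, more than 10 reads) are taken from the text and applied as strict inequalities.

```python
def passes_filters(snp, min_af=0.05, min_depth=10):
    """Return True if a SNP record meets both screening conditions.

    snp is a dict with hypothetical keys:
      in_dbsnp  -- bool, site is present in dbSNP
      af_1000g  -- float or None, allele frequency in 1000 Genomes
      depth     -- int, number of reads covering the site
    """
    af = snp.get("af_1000g")
    known_and_common = bool(snp.get("in_dbsnp")) and af is not None and af > min_af
    deep_enough = snp.get("depth", 0) > min_depth
    return known_and_common and deep_enough


def filter_snps(snps):
    """Keep only the SNPs usable for the triploidy analysis."""
    return [snp for snp in snps if passes_filters(snp)]


if __name__ == "__main__":
    # Synthetic records for illustration only.
    records = [
        {"id": "snp_a", "in_dbsnp": True, "af_1000g": 0.20, "depth": 15},
        {"id": "snp_b", "in_dbsnp": True, "af_1000g": 0.01, "depth": 30},
        {"id": "snp_c", "in_dbsnp": False, "af_1000g": None, "depth": 40},
        {"id": "snp_d", "in_dbsnp": True, "af_1000g": 0.30, "depth": 10},
    ]
    print([snp["id"] for snp in filter_snps(records)])
```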
Establishment of analysis method
Parabolic data for triploid and diploid
a/(a + c) × 100%
d/(b + d) × 100%
(a + d)/(a + b + c + d) × 100%
Parabolic data for triploid and partial diploid
(1/3 + 2/3)/(1/2)
Performance of the analysis method on 203 unknown samples
Detection of UPD using SBA method
Several technologies such as cytogenetics, CMA, FISH, and WGS can be used to investigate chromosomal abnormalities. Cytogenetic analysis of cultured chorionic or fetal tissues is still regarded as the gold standard for chromosomal ploidy analysis. However, this technology has shortcomings such as difficulties in tissue culture, contamination from maternal cells, and long turnaround time. Some new technologies have emerged, including array comparative genomic hybridization (aCGH) and multiplex ligation-dependent probe amplification (MLPA). Microarray-based methods can detect millions of genomic loci simultaneously with high resolution and are particularly suitable for detection of microduplications/microdeletions. NGS-based CNV-Seq is mainly used to analyze chromosome aneuploidy, including microdeletions/microduplications and trisomy, but still faces challenges in triploidy detection. In this study, we developed a triploidy detection method that analyzes the frequency distribution of heterozygous SNPs in a sample by taking advantage of LC-WGS technology. Our method can detect triploidy with high accuracy. Moreover, we envision that the utility of our method is not limited to the detection of triploidy and that it also has potential for the detection of other chromosomal abnormalities such as UPD.
Although our method can detect triploidy in the majority of tested cases, some situations will negatively affect the accuracy of triploidy detection. For example, if fetal samples contain significant maternal contamination or are positive for somatic mosaicism, the accuracy of the analysis will decrease. When maternal blood is mixed into a triploid sample, the triploidy signal will be reduced, leading to potential false negative results. Therefore, maternal cell contamination should be avoided as much as possible. At the data analysis level, we can use the proportion of the X chromosome to help rule out potential maternal contamination. If the proportion of the X chromosome is abnormally high, we should consider the possibility of maternal cell contamination. Somatic mosaicism also poses a significant analytic challenge for our LC-WGS method.
In summary, we successfully developed an LC-WGS-based SBA method that allowed accurate detection of triploidy in 203 blinded POC samples. We also provided proof of concept that this method could be used for whole-chromosome or segmental UPD detection. By combining low sequencing depth with WGS, this method offers high detection accuracy, wide utility, and reasonable cost. As the sequencing cost of WGS continues to decrease, we foresee that our method will provide more value in the analysis of chromosomal abnormalities as well as in mutation detection, aiding studies that seek to identify the causes of infertility, abortion, and genetic diseases.
The authors thank SZMHH for providing triploid samples and the Southern University of Science and Technology-CheerLand Institute of Precision Medicine for providing research equipment.
Compliance with ethical standards
Conflict of interest
The authors declare that they have no conflict of interest.
All procedures performed in the study involving abortion tissue samples were in accordance with the ethical standards of the Shenzhen Maternity and Child Healthcare Hospital.
- 22. Kolarski M, Ahmetovic B, Beres M, Topic R, Nikic V, Kavecan I, et al. Genetic counseling and prenatal diagnosis of triploidy during the second trimester of pregnancy. Mediev Archaeol. 2017;71:144–7.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.