Abstract
Infertility is a major reproductive health issue that affects about 12% of women of reproductive age in the United States. Aneuploidy in eggs accounts for a significant proportion of early miscarriages and in vitro fertilization (IVF) failures. Recent studies have shown that genetic variants in several genes affect chromosome segregation fidelity and predispose women to a higher incidence of egg aneuploidy. However, the exact genetic causes of aneuploid egg production remain unclear, making it difficult to diagnose infertility based on individual genetic variants in the mother’s genome. In this study, we evaluated machine learning-based classifiers for predicting the embryonic aneuploidy risk in female IVF patients using whole-exome sequencing data. Using two exome datasets, we obtained areas under the receiver operating characteristic curve of 0.77 and 0.68, respectively. High precision could be traded off for high specificity in classifying patients by selecting different prediction score cutoffs. For example, a strict prediction score cutoff of 0.7 identified 29% of patients as high-risk with 94% precision. In addition, we identified MCM5, FGGY, and DDX60L as potential aneuploidy risk genes that contribute the most to the predictive power of the model. These candidate genes and their molecular interaction partners are enriched for meiosis-related gene ontology categories and pathways, such as microtubule organizing center and DNA recombination. In summary, we demonstrate that sequencing data can be mined to predict patients’ aneuploidy risk, thus improving clinical diagnosis. The candidate genes and pathways we identified are promising targets for future aneuploidy studies.
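The two evaluation quantities reported above, the area under the ROC curve and the precision obtained at a chosen prediction score cutoff, can be illustrated with a minimal sketch. It is not the study's pipeline: the scores and labels are invented toy values, the function names `roc_auc` and `cutoff_summary` are hypothetical, and treating a score greater than or equal to the cutoff as "high-risk" is an assumption.

```python
from typing import Dict, Optional, Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def cutoff_summary(
    scores: Sequence[float], labels: Sequence[int], cutoff: float
) -> Dict[str, Optional[float]]:
    """Summarize a classification at one score cutoff.

    Assumption: patients with score >= cutoff are called high-risk.
    """
    tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
    tn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 0)
    flagged = tp + fp
    return {
        "fraction_flagged": flagged / len(scores),
        # Precision is undefined when nobody is flagged.
        "precision": tp / flagged if flagged else None,
        # Specificity is undefined when there are no true negatives or false positives.
        "specificity": tn / (tn + fp) if (tn + fp) else None,
    }


# Toy data (invented for illustration): 1 = high aneuploidy rate, 0 = low.
toy_scores = [0.92, 0.81, 0.74, 0.66, 0.58, 0.49, 0.41, 0.33, 0.27, 0.15]
toy_labels = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]

if __name__ == "__main__":
    print("AUC:", roc_auc(toy_scores, toy_labels))
    for cutoff in (0.7, 0.5):
        print(cutoff, cutoff_summary(toy_scores, toy_labels, cutoff))
```

On the toy data, raising the cutoff from 0.5 to 0.7 flags fewer patients but with higher precision, which is the kind of trade-off the cutoff selection controls.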
Data availability
The gene scores used for the classification model construction are available at Rutgers University Community Repository (RUcore): https://doi.org/doi:10.7282/t3-fn30-rv12.
Acknowledgements
We thank the patients who participated in and contributed to this study. We thank Leelabati Biswas and Mansour Aboelenain for helpful discussions. We gratefully acknowledge access to the HPC facilities and support of the computational STEM and bioinformatics scientists from the Office of Advanced Research Computing at Rutgers University.
Funding
This work is partly supported by a grant from the NIH/NICHD to KS, JX, and XT: R01-HD091331. YB was supported by the NIH/NIGMS grant R01-GM115486 and NIH/NIMH R01-MH115958.
Author information
Contributions
YB, KS, and JX contributed to the study conception and design. Patient samples were recruited by RTS and XT. Data analysis was performed by SS, KMT, XC, and JX. Pipeline development was performed by MM, YW, and YB. The first draft of the manuscript was written by SS and JX. All authors commented on the draft of the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Sun, S., Miller, M., Wang, Y. et al. Predicting embryonic aneuploidy rate in IVF patients using whole-exome sequencing. Hum Genet 141, 1615–1627 (2022). https://doi.org/10.1007/s00439-022-02450-z
DOI: https://doi.org/10.1007/s00439-022-02450-z