Abstract
Copy number variation (CNV) related disorders tend to show complex phenotypic profiles that do not match known diseases. This makes it difficult to ascertain their underlying molecular basis. A potential solution is to compare the affected genomic regions of multiple patients who share a pathological phenotype, looking for commonalities. Here, we present a novel approach to associate phenotypes with functional systems (FunSys), in terms of GO categories and KEGG and Reactome pathways, based on patient data. The approach uses genomic and phenomic data from the same patients, finding shared genomic regions between patients with similar phenotypes. These regions are mapped to genes to find associated functional systems. We applied the approach to analyse patients in the DECIPHER database with de novo CNVs, finding functional systems associated with most phenotypes, often due to mutations affecting related genes in the same genomic region. Manual inspection of the ten top-scoring phenotypes found multiple FunSys connections supported by previous studies for seven of them. The workflow also produces reports focussed on the genes and FunSys connected to the different phenotypes, alongside patient-specific reports, which give details of the associated genes and FunSys for each individual in the cohort. These can be run in “confidential” mode, preserving patient confidentiality. The workflow presented here can be used to associate phenotypes with functional systems using data at the level of a whole cohort of patients, identifying important connections that could not be found when considering the patients individually. The full workflow is available for download, enabling it to be run on any patient cohort for which phenotypic and CNV data are available.
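The general idea summarised above (find regions shared by patients with the same phenotype, map them to genes, then test functional systems for over-representation) can be illustrated with a minimal Python sketch. This is not the published workflow, which is written in R and bash. The coordinates, gene names, pathway and universe size are hypothetical, and the one-sided hypergeometric test is a generic stand-in for enrichment analysis, not a description of the authors' statistics.

```python
from math import comb


def shared_segments(intervals, min_patients=2):
    """Return half-open (start, end) segments covered by >= min_patients CNVs."""
    events = []
    for start, end in intervals:
        events.append((start, 1))   # a CNV opens
        events.append((end, -1))    # a CNV closes
    # Tuples sort closings (-1) before openings (+1) at the same position,
    # so CNVs that merely touch are not counted as overlapping.
    events.sort()
    segments, depth, seg_start = [], 0, None
    for pos, delta in events:
        previous = depth
        depth += delta
        if previous < min_patients <= depth:
            seg_start = pos
        elif depth < min_patients <= previous and pos > seg_start:
            segments.append((seg_start, pos))
    return segments


def genes_in_segments(segments, gene_coords):
    """Return the genes whose coordinates overlap any shared segment."""
    return {
        gene
        for gene, (g_start, g_end) in gene_coords.items()
        if any(g_start < s_end and s_start < g_end for s_start, s_end in segments)
    }


def enrichment_pvalue(hits, set_size, drawn, universe):
    """One-sided hypergeometric P(X >= hits) for a gene set of size set_size."""
    upper = min(set_size, drawn)
    favourable = sum(
        comb(set_size, i) * comb(universe - set_size, drawn - i)
        for i in range(hits, upper + 1)
    )
    return favourable / comb(universe, drawn)


# Hypothetical CNVs (single chromosome) of three patients sharing a phenotype.
cnvs = [(100, 500), (300, 700), (600, 900)]
# Hypothetical gene coordinates and functional system (e.g. one pathway).
gene_coords = {"GENE_A": (320, 400), "GENE_B": (650, 680), "GENE_C": (800, 850)}
pathway = {"GENE_A", "GENE_B", "GENE_X"}
universe_size = 10  # hypothetical number of annotated genes

segments = shared_segments(cnvs)
candidates = genes_in_segments(segments, gene_coords)
hits = len(candidates & pathway)
p_value = enrichment_pvalue(hits, len(pathway), len(candidates), universe_size)
print(segments, sorted(candidates), round(p_value, 4))
```

In a real cohort the same steps would be repeated per chromosome and per phenotype, and the resulting p-values would need correction for multiple testing.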
Availability of data and materials
The data that support the findings of this study are available as additional material. However, patient information has been removed from the results for confidentiality reasons. The datasets used and/or analysed during the current study are available from the DECIPHER database under signed agreement. All code underlying the workflow is freely available from https://github.com/fmjabato/PhenFun. It is written in R and bash and uses a workflow manager, AutoFlow, to run on UNIX-like systems. All dependencies are explained in the README file of the GitHub repository.
Acknowledgements
The authors would like to thank the Supercomputing and Bioinnovation Center (SCBI) of the University of Malaga for their provision of computational resources and technical support (www.scbi.uma.es/site). This study makes use of data generated by the DECIPHER community. A full list of centers who contributed to the generation of the data is available from http://decipher.sanger.ac.uk and via email from decipher@sanger.ac.uk. Funding for the project was provided by the Wellcome Trust. Those who carried out the original analysis and collection of the data bear no responsibility for the further analysis or interpretation of it by the Recipient or its Registered Users.
Funding
The study was funded by grants from the Spanish Ministry of Science and Innovation with European Regional Development Fund [PID2019-108096RB-C21 to J.A.G and PID2019-108096RB-C22 to F.P.]; the Andalusian Government with European Regional Development Fund [UMA18-FEDERJA-102]; biomedicine research project [PI-0075-2017] (Fundacion Progreso y Salud); and the Ramon Areces foundation for rare disease investigation (National call for research on life and material sciences, XIX edition). The CIBERER is an initiative from the Carlos III Health Institute (Instituto de Salud Carlos III).
Ethics declarations
Conflict of interest
The authors declare that they have no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Electronic supplementary material
Below is the link to the electronic supplementary material.
439_2020_2214_MOESM2_ESM.html
Phenotype-GO HTML enrichment report with phenotype-specific enrichment results. An extended explanation is included in the file.
439_2020_2214_MOESM3_ESM.html
Phenotype-KEGG HTML enrichment report with phenotype-specific enrichment results. An extended explanation is included in the file.
439_2020_2214_MOESM4_ESM.html
Phenotype-Reactome HTML enrichment report with phenotype-specific enrichment results. An extended explanation is included in the file.
439_2020_2214_MOESM5_ESM.html
Patient-GO HTML enrichment report with patient-specific enrichment results. An extended explanation is included in the file.
Cite this article
Jabato, F.M., Seoane, P., Perkins, J.R. et al. Systematic identification of genetic systems associated with phenotypes in patients with rare genomic copy number variations. Hum Genet 140, 457–475 (2021). https://doi.org/10.1007/s00439-020-02214-7
DOI: https://doi.org/10.1007/s00439-020-02214-7