Abstract
Numerous computational tools have been introduced to predict the functional impact of variants in the human genome from evolutionary constraint and biochemical metrics. However, their use for variant classification in diagnostic settings has been limited by concerns about accuracy and validity. Most existing tools are pan-genome and pan-disease: they neglect gene- and disease-specific properties and have limited access to curated data. As a proof of concept, we developed a disease-specific tool, the Deafness Variant deleteriousness Prediction tool (DVPred), which focuses on the 157 genes reported to cause genetic hearing loss (HL). DVPred applies the gradient boosting decision tree (GBDT) algorithm to a dataset of expert-curated pathogenic and benign variants drawn from a large in-house HL patient cohort and public databases. By incorporating both variant-level and gene-level features, DVPred outperformed the existing universal tools, achieving an area under the curve (AUC) of 0.98 and showing consistent performance (AUC = 0.985) on an independent assessment dataset. We further demonstrated that multiple gene-level metrics, including low-complexity genomic regions and substitution intolerance scores, were the top features of the model. A comprehensive analysis of missense variants showed that the ratio of predicted deleterious to predicted neutral variants is gene-specific, implying that tolerance to variation differs between genes. DVPred demonstrates the utility of a disease-specific strategy for improving deafness variant prediction. It can help prioritize pathogenic variants among the large numbers of variants identified by high-throughput sequencing of HL genes, and it sheds light on the development of variant prediction tools for other genetic disorders.
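To make the AUC figures quoted above concrete, the following is a minimal sketch of how an area under the ROC curve can be computed from classifier scores. It is not the authors' evaluation code: the function name `roc_auc`, the 1/0 label encoding, and the toy scores are all assumptions made for illustration. It uses the pairwise interpretation of AUC, namely the probability that a randomly chosen pathogenic variant receives a higher score than a randomly chosen benign one.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    labels: 1 = pathogenic, 0 = benign (hypothetical encoding).
    A tie between a positive and a negative counts as half a win.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up toy data for illustration only; not taken from the study.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.95, 0.80, 0.40, 0.60, 0.20, 0.10]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"toy AUC = {toy_auc:.3f}")
```

In the toy data, eight of the nine pathogenic/benign pairs are ranked correctly, so the value is 8/9. The quadratic pairwise loop is chosen for readability; a rank-based formulation gives the same value and scales better to large variant sets.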
Acknowledgements
We are grateful to all students and staff who have contributed to the data curation.
Funding
This study was supported by the National Key Research and Development Program of China (2017YFC0907503) and the 1·3·5 project for disciplines of excellence, West China Hospital, Sichuan University (ZYJC20002).
Author information
Contributions
F. B., H. Y., and R. J. H. S. conceived the study. F. B. wrote the manuscript, with contributions from K. T. B. and H. A. The data curation was organized and performed by M. Z., Y. L., and J. C. Q. C., Y. W., and X. Z. created and evaluated the model, with contributions from F. B., Q. Z., and X. L.
Ethics declarations
Conflict of interest
Yumei Wang, Xia Zhao, and Xiarong Li were employed at GeneDock Co., Ltd. at the time of submission. The authors report no other conflicts of interest relevant to this study.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
439_2022_2440_MOESM6_ESM.xlsx
Supplementary file 6: Table S5. List of P/LP variants from DVD (v8.1), ClinVar (v20171203), and HGMD (2017q4) (XLSX 817 KB)
About this article
Cite this article
Bu, F., Zhong, M., Chen, Q. et al. DVPred: a disease-specific prediction tool for variant pathogenicity classification for hearing loss. Hum Genet 141, 401–411 (2022). https://doi.org/10.1007/s00439-022-02440-1
DOI: https://doi.org/10.1007/s00439-022-02440-1