Abstract
The first step in understanding the molecular biology of an inherited disease is to identify the gene or genes that carry the disease-associated variants. The process starts by localizing the mutations to a chromosomal band, as narrow as possible, and continues with the manual analysis of all the genes mapping to this region. This is usually not an easy task, but it can be facilitated by complementary computational approaches that evaluate all genes in a region of interest. We present here a method that combines literature mining, gene annotations, and sequence homology searches to prioritize candidate genes involved in a given genetic disorder. The method proceeds in two steps. First, we compute associations between molecular and phenotypic features taken from MEDLINE. Second, for a disease with a given phenotype that is linked to a chromosomal region, sequence homology searches are carried out on that region to identify potential candidates, which are then scored using the precomputed associations. The scoring of associations between biological concepts using links across databases can be extended to other molecular biology databases and to nondisease phenotypes.
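The two-step procedure can be illustrated with a small sketch. It is not the chapter's implementation: the abstract does not state the scoring formula, so the Jaccard-style co-occurrence score, the mean-of-best aggregation, the function names, the toy records, and the gene names GENE_A to GENE_C are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product


def association_scores(records):
    """Step 1 (assumed form): score phenotype/function term pairs.

    `records` is an iterable of (phenotype_terms, function_terms) pairs,
    one per literature record. The score is a Jaccard-style overlap:
    records mentioning both terms divided by records mentioning either.
    """
    pheno_count = defaultdict(int)
    func_count = defaultdict(int)
    pair_count = defaultdict(int)
    for phenos, funcs in records:
        phenos, funcs = set(phenos), set(funcs)
        for p in phenos:
            pheno_count[p] += 1
        for f in funcs:
            func_count[f] += 1
        for p, f in product(phenos, funcs):
            pair_count[(p, f)] += 1
    return {
        (p, f): n / (pheno_count[p] + func_count[f] - n)
        for (p, f), n in pair_count.items()
    }


def prioritize(candidates, disease_phenotypes, scores):
    """Step 2 (assumed form): rank candidate genes in the linked region.

    `candidates` maps a gene name to its function terms, for example terms
    transferred from annotated homologs. A gene's score is the mean, over
    its function terms, of the best association with any disease phenotype.
    Genes without function terms score 0.0.
    """
    ranked = []
    for gene, funcs in candidates.items():
        best = [
            max((scores.get((p, f), 0.0) for p in disease_phenotypes), default=0.0)
            for f in funcs
        ]
        ranked.append((gene, sum(best) / len(best) if best else 0.0))
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


# Toy literature records (hypothetical, not taken from MEDLINE).
records = [
    ({"seizures"}, {"ion channel activity"}),
    ({"seizures"}, {"ion channel activity", "GABA receptor activity"}),
    ({"cardiomyopathy"}, {"phosphodiesterase activity"}),
    ({"seizures", "cardiomyopathy"}, {"phosphodiesterase activity"}),
]
scores = association_scores(records)

# Hypothetical genes in the linked region, with terms inferred by homology.
candidates = {
    "GENE_A": ["ion channel activity"],
    "GENE_B": ["phosphodiesterase activity"],
    "GENE_C": [],
}
ranking = prioritize(candidates, ["seizures"], scores)
for gene, score in ranking:
    print(f"{gene}\t{score:.3f}")
```

In this toy setting the gene annotated with the term that most often co-occurs with the disease phenotype ranks first. Separating the precomputed association table from the per-region scoring mirrors the two steps described above, so the table can be reused for any disease and region.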
Keywords
- Monogenic Disease
- RefSeq Gene
- Sequence Homology Search
- Prioritize Candidate Gene
- International Human Genome Sequencing Consortium
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© 2006 Landes Bioscience and Springer Science+Business Media
About this chapter
Cite this chapter
Perez-Iratxeta, C., Bork, P., Andrade, M.A. (2006). Literature and Genome Data Mining for Prioritizing Disease-Associated Genes. In: Discovering Biomolecular Mechanisms with Computational Biology. Molecular Biology Intelligence Unit. Springer, Boston, MA. https://doi.org/10.1007/0-387-36747-0_6
DOI: https://doi.org/10.1007/0-387-36747-0_6
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-34527-7
Online ISBN: 978-0-387-36747-7
eBook Packages: Biomedical and Life Sciences, Biomedical and Life Sciences (R0)
