Literature and Genome Data Mining for Prioritizing Disease-Associated Genes

Part of the Molecular Biology Intelligence Unit book series (MBIU)

Abstract

The first step in understanding the molecular biology of an inherited disease is to identify which gene or genes carry the variants. This process starts by mapping the mutations to a chromosomal band that is as narrow as possible, and continues with the manual analysis of all the genes that map to this region. This is usually not an easy task, but it can be facilitated by complementary computational approaches that evaluate all genes in a region of interest. We present here a method that combines literature mining, gene annotations, and sequence homology searches to prioritize candidate genes involved in a given genetic disorder. The method proceeds in two steps. First, we compute associations between molecular and phenotypic features taken from MEDLINE. Second, for a disease with a given phenotype that is linked to a chromosomal region, sequence-homology-based searches are carried out on the chromosomal region to identify potential candidates, which are then scored using the precomputed associations. The scoring of associations between biological concepts using links across databases can be extended to other databases in molecular biology and to nondisease phenotypes.
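The two steps described above can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the scoring formula, so the Jaccard-style co-occurrence score used here is an assumption, as are the function names `association_scores` and `rank_candidates`, the toy records, and the gene names. The homology search itself is not performed; its result is assumed to be available as a mapping from each candidate gene in the region to the function terms of its homologs.

```python
from collections import Counter
from itertools import product


def association_scores(records):
    """Step 1 (assumed form): score phenotype/function term pairs by
    co-occurrence across literature records.

    records: iterable of (phenotype_terms, function_terms) pairs, one per record.
    Returns {(phenotype, function): score}, where the score is the number of
    records containing both terms divided by the number containing either
    (a Jaccard index, chosen here purely for illustration).
    """
    pair_counts, pheno_counts, func_counts = Counter(), Counter(), Counter()
    for phenos, funcs in records:
        phenos, funcs = set(phenos), set(funcs)
        pheno_counts.update(phenos)
        func_counts.update(funcs)
        pair_counts.update(product(phenos, funcs))
    return {
        (p, f): n / (pheno_counts[p] + func_counts[f] - n)
        for (p, f), n in pair_counts.items()
    }


def rank_candidates(disease_phenotypes, candidates, scores):
    """Step 2 (assumed form): score each candidate gene by its best
    phenotype/function association and sort in descending order.

    candidates: {gene: function terms transferred from its homologs},
    taken here as the hypothetical output of a homology search on the region.
    """
    ranked = []
    for gene, funcs in candidates.items():
        best = max(
            (scores.get((p, f), 0.0) for p in disease_phenotypes for f in funcs),
            default=0.0,
        )
        ranked.append((gene, best))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Toy literature records (hypothetical, not real MEDLINE data).
records = [
    ({"neuropathy"}, {"gap junction"}),
    ({"neuropathy"}, {"gap junction", "ion transport"}),
    ({"epilepsy"}, {"ion transport"}),
]
scores = association_scores(records)

# Hypothetical candidate genes in the linked region, with homolog-derived terms.
candidates = {
    "geneA": {"ion transport"},
    "geneB": {"gap junction"},
    "geneC": {"kinase activity"},
}
ranked = rank_candidates({"neuropathy"}, candidates, scores)
for gene, score in ranked:
    print(f"{gene}\t{score:.3f}")
```

Taking the maximum over term pairs is one possible design choice; an average or a sum would weight genes with many weakly associated terms differently.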

Copyright information

© 2006 Landes Bioscience and Springer Science+Business Media

About this chapter

Cite this chapter

Perez-Iratxeta, C., Bork, P., Andrade, M.A. (2006). Literature and Genome Data Mining for Prioritizing Disease-Associated Genes. In: Discovering Biomolecular Mechanisms with Computational Biology. Molecular Biology Intelligence Unit. Springer, Boston, MA. https://doi.org/10.1007/0-387-36747-0_6
