Identifying Candidate Disease Genes with High-Performance Computing

The Journal of Supercomputing

Abstract

The publicly funded effort to read the complete nucleotide sequence of the human genome, the Human Genome Project (HGP), is nearing completion of its sequencing of the approximately three billion nucleotides of the human genome. In addition, several valuable sources of information have been developed as direct and indirect results of the HGP. These include the genome sequencing of model organisms (Escherichia coli, Saccharomyces cerevisiae, the fruit fly Drosophila melanogaster, the worm Caenorhabditis elegans, and the laboratory mouse), gene discovery projects (expressed sequence tags and full-length), and new high-throughput expression analyses. These resources are invaluable in identifying the transcriptome and proteome, the sets of transcribed and translated sequences. However, the bulk of the effort still remains: to identify the functional and structural elements contained within gene sequences. Addressing these challenges requires the use of high-performance computing. Hundreds of databases of biological information currently exist, and many may hold data relevant to the identification of disease-causing genes. Knowledge discovery using these databases holds enormous potential, provided that sufficient computing resources are used to process the overwhelming amounts of data. We are developing a system to acquire and mine data from a subset of these databases to aid our efforts to identify disease genes. A high-performance cluster of Linux workstations is used to perform distributed sequence alignments as part of our analysis and processing. This system has been used to mine the GeneMap99 database within specific genomic intervals to identify potential candidate disease genes associated with Bardet-Biedl syndrome (BBS).
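As a rough illustration of what mining a gene map "within specific genomic intervals" can involve, the following minimal sketch filters mapped entries down to those that fall inside a chromosomal interval of interest. It is not the authors' system: the `MappedEntry` record, the `candidates_in_interval` function, the position units, and all sample data are hypothetical and chosen only for illustration; the actual GeneMap99 format and the authors' pipeline are not described in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappedEntry:
    """Hypothetical record for one mapped gene or transcript."""
    name: str
    chromosome: str
    position: float  # map position in arbitrary, made-up units


def candidates_in_interval(entries, chromosome, start, end):
    """Return entries on `chromosome` with start <= position <= end, sorted by position."""
    if start > end:
        # Accept the interval bounds in either order.
        start, end = end, start
    hits = [
        e for e in entries
        if e.chromosome == chromosome and start <= e.position <= end
    ]
    return sorted(hits, key=lambda e: e.position)


# Made-up sample data; these are not real map positions.
entries = [
    MappedEntry("geneA", "15", 10.0),
    MappedEntry("geneB", "15", 42.5),
    MappedEntry("geneC", "16", 42.0),
    MappedEntry("geneD", "15", 55.0),
]

hits = candidates_in_interval(entries, "15", 40.0, 60.0)
print([e.name for e in hits])  # prints ['geneB', 'geneD']
```

In a workflow like the one the abstract outlines, the entries that survive such a filter would be the ones passed on to the more expensive step, the distributed sequence alignments run on the cluster.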

Cite this article

Braun, T.A., Scheetz, T.E., Webster, G. et al. Identifying Candidate Disease Genes with High-Performance Computing. The Journal of Supercomputing 26, 7–24 (2003). https://doi.org/10.1023/A:1024417200364
