A Parallel Expressed Sequence Tag (EST) Clustering Program

  • Kevin Pedretti
  • Todd Scheetz
  • Terry Braun
  • Chad Roberts
  • Natalie Robinson
  • Thomas Casavant
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2127)

Abstract

This paper describes the UIcluster software tool, which partitions Expressed Sequence Tag (EST) sequences and other genetic sequences into "clusters" based on sequence similarity. Ideally, each cluster will contain sequences that all represent the same gene. If a naïve approach such as an N×N comparison (where N is the number of input sequences) is taken, the work grows quadratically with N, and the problem is feasible only for very small data sets. UIcluster has been developed over the course of four years to solve this problem efficiently and accurately for large data sets consisting of tens or hundreds of thousands of EST sequences. The latest version of the application has been parallelized using the MPI (Message Passing Interface) standard. Both the computation and the memory requirements of the program can be distributed among multiple UNIX processes, which may run on separate machines.
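The abstract does not state UIcluster's actual similarity criterion, so the following is only an illustrative sketch of why avoiding all-against-all comparison matters. It assumes a hypothetical criterion: two sequences are linked if they share at least one exact word (k-mer) of length `k`, and clusters are the transitive closure of those links, computed with union-find. The names `naive_pair_count`, `cluster_by_shared_kmers`, and the default `k` are assumptions for illustration, not UIcluster's method or parameters.

```python
from collections import defaultdict


def naive_pair_count(n: int) -> int:
    """Distinct pairwise comparisons an all-against-all (N x N) scheme must make."""
    return n * (n - 1) // 2


class UnionFind:
    """Disjoint-set structure used to merge sequences into clusters transitively."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_by_shared_kmers(seqs: list[str], k: int = 8) -> list[list[int]]:
    """Group sequence indices; sequences sharing any exact k-mer end up together.

    Hypothetical criterion for illustration only: a real EST clusterer would
    likely confirm candidate pairs with an alignment before merging them.
    """
    uf = UnionFind(len(seqs))
    first_owner: dict[str, int] = {}  # k-mer -> first sequence index containing it
    for i, s in enumerate(seqs):
        for j in range(len(s) - k + 1):
            word = s[j:j + k]
            if word in first_owner:
                # Link this sequence to the earlier one that had the same word.
                uf.union(first_owner[word], i)
            else:
                first_owner[word] = i
    groups: dict[int, list[int]] = defaultdict(list)
    for i in range(len(seqs)):
        groups[uf.find(i)].append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    ests = ["ACGTACGTTTGA", "TTACGTACGTAA", "GGGCCCAAATTT", "CCCAAATTTGGG"]
    print(cluster_by_shared_kmers(ests, k=6))  # [[0, 1], [2, 3]]
    print(naive_pair_count(100_000))           # 4999950000
```

For N = 100,000 sequences, an all-against-all scheme needs 4,999,950,000 pairwise comparisons, whereas the word index in this sketch performs one dictionary lookup per k-mer position, so its cost grows with the total sequence length rather than with N². Partitioning such an index by k-mer across processes is one plausible way to spread both computation and memory, but that is an assumption here, not a description of UIcluster's MPI design.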

Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Kevin Pedretti (1)
  • Todd Scheetz (1)
  • Terry Braun (1)
  • Chad Roberts (1)
  • Natalie Robinson (1)
  • Thomas Casavant (1)
  1. Parallel Processing Laboratory and The Coordinated Laboratory for Computational Genomics, Dept. of Electrical and Computer Engineering, University of Iowa, Iowa City, USA