Alternative Parallelization Strategies in EST Clustering

  • Nishank Trivedi
  • Kevin T. Pedretti
  • Terry A. Braun
  • Todd E. Scheetz
  • Thomas L. Casavant
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2763)

Abstract

A fundamental component of large-scale gene discovery projects is the clustering of Expressed Sequence Tags (ESTs) from complementary DNA (cDNA) clone libraries. Clustering is used to create non-redundant catalogs and indices of these sequences. In particular, clustering of ESTs is frequently used to estimate the number of genes derived from cDNA-based gene discovery efforts. This paper presents a novel parallel extension to an EST clustering program, UIcluster4, that incorporates alternative splicing information and a new parallelization strategy. The results are compared to other parallelized EST clustering systems in terms of overall processing time and accuracy of the resulting clustering.

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Nishank Trivedi (1)
  • Kevin T. Pedretti (2)
  • Terry A. Braun (1)
  • Todd E. Scheetz (1)
  • Thomas L. Casavant (1)

  1. The University of Iowa, Iowa City, Iowa, USA
  2. Sandia National Labs, Albuquerque, USA
