
The Journal of Supercomputing, Volume 3, Issue 4, pp 255–269

Study of protein sequence comparison metrics on the connection machine CM-2

  • Eric Lander
  • Jill P. Mesirov
  • Washington Taylor IV

Abstract

Software tools have been developed to do rapid, large-scale protein sequence comparisons on databases of amino acid sequences, using a data-parallel computer architecture. This software enables one to compare a protein against a database of several thousand proteins in the time that a conventional computer requires to do a single protein-protein comparison, thus enabling biologists to find relevant similarities much more quickly and to evaluate many different comparison metrics in a reasonable period of time. We have used this software to analyze the effectiveness of various scoring metrics in determining sequence similarity, and to generate statistical information about the behavior of these scoring systems under the variation of certain parameters.
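The abstract does not spell out the comparison algorithm or the scoring systems studied. As a minimal sketch, assuming a Smith-Waterman-style local alignment computed by dynamic programming with a linear gap penalty, and a toy match/mismatch scorer standing in for a real substitution matrix, the code below scores one query against every database entry in lockstep: each step over the query applies the same update to all entries, the pattern that lets a data-parallel machine hold one comparison per processor. The names `lockstep_local_scores` and `identity_scorer` are hypothetical, and the pure-Python loops only emulate that layout; this is not the authors' CM-2 software.

```python
from typing import Callable, List

# Hypothetical scorer type: score(a, b) for two amino acid residues.
Scorer = Callable[[str, str], int]


def identity_scorer(match: int = 2, mismatch: int = -1) -> Scorer:
    """Toy match/mismatch scorer, an illustrative stand-in for a
    substitution matrix."""
    def score(a: str, b: str) -> int:
        return match if a == b else mismatch
    return score


def lockstep_local_scores(query: str, database: List[str],
                          score: Scorer, gap: int) -> List[int]:
    """Best local alignment score (Smith-Waterman recurrence, linear gap
    penalty `gap`) of `query` against each sequence in `database`.

    The outer loop walks the query one residue at a time; the inner loop
    applies the identical row update to every database entry, mimicking a
    data-parallel machine where each entry lives on its own processor.
    """
    # prev[k][j]: DP row for entry k after the previous query residue.
    prev = [[0] * (len(s) + 1) for s in database]
    best = [0] * len(database)
    for q in query:                        # sequential over query residues
        for k, s in enumerate(database):   # "parallel" over database entries
            row = [0] * (len(s) + 1)
            for j in range(1, len(s) + 1):
                row[j] = max(
                    0,                                    # start a new local alignment
                    prev[k][j - 1] + score(q, s[j - 1]),  # align q with s[j-1]
                    prev[k][j] - gap,                     # q against a gap
                    row[j - 1] - gap,                     # s[j-1] against a gap
                )
                best[k] = max(best[k], row[j])
            prev[k] = row
    return best


if __name__ == "__main__":
    db = ["HEAGAWGHEE", "PAWHEAE", "GGGG"]
    # Re-running the search under different parameters is how one would
    # compare scoring metrics; here only the gap penalty varies.
    for g in (1, 4):
        print(g, lockstep_local_scores("PAWHEAE", db, identity_scorer(), gap=g))
```

Because the scorer and gap penalty are plain parameters, evaluating a different metric means rerunning the same search with new arguments, which is the kind of parameter sweep the abstract describes; a larger gap penalty can never raise a local alignment score.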

Key words

proteins, sequence comparison, dynamic programming, parallel computing, computational biology


Copyright information

© Kluwer Academic Publishers 1989

Authors and Affiliations

  • Eric Lander (1, 2)
  • Jill P. Mesirov (3)
  • Washington Taylor IV (3)
  1. Whitehead Institute for Biomedical Research, 9 Cambridge Center, Cambridge, USA
  2. Harvard University, Cambridge, USA
  3. Thinking Machines Corporation, Cambridge, USA

Personalised recommendations