Bulletin of Mathematical Biology, Volume 51, Issue 1, pp 39–54

Algorithms for the optimal identification of segment neighborhoods

  • Ivan E. Auger
  • Charles E. Lawrence


Two algorithms for the efficient identification of segment neighborhoods are presented. A segment neighborhood is a set of contiguous residues that share common features. Two procedures are developed to efficiently find estimates for the parameters of the model that describe these features and for the residues that define the boundaries of each segment neighborhood. The algorithms can accept nearly any model of segment neighborhood, and can be applied with a broad class of best fit functions including least squares and maximum likelihood. The algorithms successively identify the most important features of the sequence. The application of one of these methods to the haemagglutinin protein of influenza virus reveals a possible mechanism for conformational change through the finding of a break in a strong heptad repeat structure.
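As a rough illustration of the kind of computation the abstract describes, the sketch below finds segment boundaries by dynamic programming under a least-squares fit. It is a minimal sketch under stated assumptions, not the authors' published procedures: each segment is modeled by a constant mean (one of many possible segment models), and the names `segment_costs` and `best_segmentations` are hypothetical. It returns the best segmentation for every number of segments up to a chosen maximum, which loosely mirrors the idea of successively identifying the most important features of a sequence.

```python
from itertools import accumulate


def segment_costs(y):
    """Return cost(i, j): the sum of squared deviations of y[i:j] from its mean.

    Assumption: a constant-mean model per segment with a least-squares fit.
    Prefix sums make each cost evaluation O(1).
    """
    s1 = [0.0] + list(accumulate(y))
    s2 = [0.0] + list(accumulate(v * v for v in y))

    def cost(i, j):
        n = j - i
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / n

    return cost


def best_segmentations(y, max_segments):
    """Return {k: (total_cost, [(start, end), ...])} for k = 1..max_segments.

    Segments are half-open index ranges into y. Runs in O(max_segments * n^2).
    """
    n = len(y)
    cost = segment_costs(y)
    inf = float("inf")
    # best[k][j]: minimal cost of splitting y[:j] into exactly k segments.
    best = [[inf] * (n + 1) for _ in range(max_segments + 1)]
    # back[k][j]: start index of the last segment in that optimal split.
    back = [[0] * (n + 1) for _ in range(max_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, max_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = best[k - 1][i] + cost(i, j)
                if c < best[k][j]:
                    best[k][j] = c
                    back[k][j] = i
    results = {}
    for k in range(1, min(max_segments, n) + 1):
        bounds, j = [], n
        # Walk the back-pointers from the end of the sequence to the start.
        for m in range(k, 0, -1):
            i = back[m][j]
            bounds.append((i, j))
            j = i
        results[k] = (best[k][n], bounds[::-1])
    return results


if __name__ == "__main__":
    scores = [1, 1, 1, 5, 5, 5, 9, 9]  # made-up per-residue values
    for k, (total, bounds) in best_segmentations(scores, 3).items():
        print(k, total, bounds)
```

Swapping `segment_costs` for a different fit function (for example, a negative log-likelihood) would leave the dynamic program unchanged, as long as the total cost is a sum of independent per-segment costs.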


Influenza Virus, Heptad, Coiled Coil, Hydrophobic Segment, Segmented Regression
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Society for Mathematical Biology 1989

Authors and Affiliations

  • Ivan E. Auger (1)
  • Charles E. Lawrence (1)
  1. Laboratory of Biometrics, Wadsworth Center for Laboratories and Research, New York State Department of Health, Albany, U.S.A.
