
Algorithms for the optimal identification of segment neighborhoods

Bulletin of Mathematical Biology

Abstract

Two algorithms for the efficient identification of segment neighborhoods are presented. A segment neighborhood is a set of contiguous residues that share common features. Both procedures efficiently estimate the parameters of the model that describes these features, together with the residues that define the boundaries of each segment neighborhood. The algorithms accept nearly any segment-neighborhood model and can be applied with a broad class of best-fit functions, including least squares and maximum likelihood. They successively identify the most important features of the sequence. Applying one of these methods to the haemagglutinin protein of influenza virus reveals a break in a strong heptad repeat structure, which suggests a possible mechanism for conformational change.
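The general idea of finding optimal segment boundaries for every number of segments can be illustrated with a dynamic-programming sketch. This is not the authors' published procedure. It assumes a specific model and fit function chosen for illustration: each segment is modeled by a constant mean and scored by least squares. The names `segment_cost` and `segment_neighborhoods` are hypothetical. For each k from 1 to `max_k`, the sketch returns the minimal residual sum of squares and the start index of each segment, so the order in which boundaries appear as k grows gives one possible reading of "successively identifying the most important features".

```python
from typing import List, Tuple


def segment_cost(prefix: List[float], prefix_sq: List[float], i: int, j: int) -> float:
    """Least-squares cost of fitting one constant mean to y[i:j] (assumed model)."""
    n = j - i
    s = prefix[j] - prefix[i]
    s2 = prefix_sq[j] - prefix_sq[i]
    # Residual sum of squares about the segment mean.
    return s2 - s * s / n


def segment_neighborhoods(y: List[float], max_k: int) -> List[Tuple[float, List[int]]]:
    """For k = 1..max_k, return (minimal cost, segment start indices) of the best k-segment fit."""
    n = len(y)
    max_k = min(max_k, n)  # cannot have more non-empty segments than points
    # Prefix sums make each segment cost O(1).
    prefix = [0.0] * (n + 1)
    prefix_sq = [0.0] * (n + 1)
    for t, v in enumerate(y):
        prefix[t + 1] = prefix[t] + v
        prefix_sq[t + 1] = prefix_sq[t] + v * v

    inf = float("inf")
    # best[k][j]: minimal cost of splitting y[0:j] into exactly k segments.
    best = [[inf] * (n + 1) for _ in range(max_k + 1)]
    # back[k][j]: start index of the last segment in that optimal split.
    back = [[0] * (n + 1) for _ in range(max_k + 1)]
    best[0][0] = 0.0

    for k in range(1, max_k + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                if best[k - 1][i] == inf:
                    continue
                c = best[k - 1][i] + segment_cost(prefix, prefix_sq, i, j)
                if c < best[k][j]:
                    best[k][j] = c
                    back[k][j] = i

    results = []
    for k in range(1, max_k + 1):
        # Trace the boundaries back from the end of the sequence.
        starts = []
        j = n
        for kk in range(k, 0, -1):
            i = back[kk][j]
            starts.append(i)
            j = i
        starts.reverse()
        results.append((best[k][n], starts))
    return results
```

For example, `segment_neighborhoods([0, 0, 0, 5, 5, 5, 1, 1], 3)` places the single boundary at index 3 when k = 2. When k = 3, it recovers starts `[0, 3, 6]` with zero residual. The triple loop costs O(K n²) segment evaluations. Replacing `segment_cost` with a negative log-likelihood would give a maximum-likelihood variant under the same recursion.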




Cite this article

Auger, I.E., Lawrence, C.E. Algorithms for the optimal identification of segment neighborhoods. Bulletin of Mathematical Biology 51, 39–54 (1989). https://doi.org/10.1007/BF02458835
