Minimum message length encoding and the comparison of macromolecules

Allison, L.; Yee, C. N.

doi:10.1007/BF02458580

Published: May 1990

Volume 52, pages 431–453, (1990)
Bulletin of Mathematical Biology

Abstract

A method of inductive inference known as minimum message length encoding is applied to string comparison in molecular biology. The questions of whether two strings are related and, if so, how they are related, together with the problem of finding a good theory of string mutation, are treated as inductive inference problems. The method allows the posterior odds ratio of two string alignments, or of two models of string mutation, to be computed. The connection between models of mutation and existing string alignment algorithms is made explicit. A fast minimum message length alignment algorithm is also described.
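The abstract does not give the paper's mutation models or parameters, so the following is only a minimal Python sketch of the general idea under loudly stated assumptions. It assumes a hypothetical one-state DNA mutation model (match, change, insert and delete with illustrative probabilities 0.7, 0.1, 0.1, 0.1, and equiprobable bases), sums the probabilities of all alignments by dynamic programming to get a message length for the hypothesis "the strings are related", and compares it with the null hypothesis of two unrelated strings coded at 2 bits per base. Since a message length is -log2 of a probability, the posterior odds of two hypotheses follow as 2 raised to the difference of their message lengths. All function names and constants are hypothetical; costs for stating lengths, parameters and hypothesis priors are omitted (equal priors assumed), so this is not the authors' algorithm.

```python
import math

# HYPOTHETICAL one-state mutation model (illustrative values, not from the paper).
# At each step of an alignment one of four events occurs; probabilities sum to 1.
P_MATCH, P_CHANGE, P_INSERT, P_DELETE = 0.7, 0.1, 0.1, 0.1
ALPHABET_SIZE = 4  # DNA bases, each assumed equally likely


def log2_add(x, y):
    """Return log2(2**x + 2**y) without underflow."""
    if x == -math.inf:
        return y
    if y == -math.inf:
        return x
    hi, lo = max(x, y), min(x, y)
    return hi + math.log2(1.0 + 2.0 ** (lo - hi))


def message_length_related(a, b):
    """Bits to encode a and b jointly under the 'related' hypothesis,
    summing the probabilities of ALL alignments (not just the best one)."""
    lp_match = math.log2(P_MATCH / ALPHABET_SIZE)  # one base shared by both
    # change: choose a's base (1/4), then one of the 3 other bases for b
    lp_change = math.log2(P_CHANGE / (ALPHABET_SIZE * (ALPHABET_SIZE - 1)))
    lp_insert = math.log2(P_INSERT / ALPHABET_SIZE)  # base present only in b
    lp_delete = math.log2(P_DELETE / ALPHABET_SIZE)  # base present only in a
    n, m = len(a), len(b)
    # f[i][j] = log2 of P(a[:i], b[:j]) summed over all alignments of the prefixes
    f = [[-math.inf] * (m + 1) for _ in range(n + 1)]
    f[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            total = -math.inf
            if i > 0 and j > 0:
                step = lp_match if a[i - 1] == b[j - 1] else lp_change
                total = log2_add(total, f[i - 1][j - 1] + step)
            if i > 0:
                total = log2_add(total, f[i - 1][j] + lp_delete)
            if j > 0:
                total = log2_add(total, f[i][j - 1] + lp_insert)
            f[i][j] = total
    return -f[n][m]


def message_length_null(a, b):
    """Bits to encode a and b independently: log2(4) = 2 bits per base."""
    return (len(a) + len(b)) * math.log2(ALPHABET_SIZE)


def odds_from_message_lengths(ml_h1, ml_h2):
    """Posterior odds P(H1|D) / P(H2|D) = 2 ** (ml_h2 - ml_h1)."""
    return 2.0 ** (ml_h2 - ml_h1)


def posterior_odds_related(a, b):
    """Odds of 'related' versus 'unrelated', assuming equal prior odds."""
    return odds_from_message_lengths(message_length_related(a, b),
                                     message_length_null(a, b))


if __name__ == "__main__":
    print(posterior_odds_related("ACGTACGT", "ACGTACGT"))  # greater than 1
    print(posterior_odds_related("AAAAAAAA", "CCCCCCCC"))  # less than 1
```

Summing over all alignments, rather than keeping only an optimal one, reflects the view that the relatedness question does not depend on any single alignment; the same `odds_from_message_lengths` helper would compare two individual alignments, or two mutation models, given their message lengths.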

Author information

Authors and Affiliations

Department of Computer Science, Monash University, 3168, Australia
L. Allison & C. N. Yee

Additional information

Supported by Australian Research Council grant A48830856.

About this article

Cite this article

Allison, L., Yee, C.N. Minimum message length encoding and the comparison of macromolecules. Bull. Math. Biol. 52, 431–453 (1990). https://doi.org/10.1007/BF02458580

Received: 24 May 1989
Revised: 20 September 1989
Issue Date: May 1990
DOI: https://doi.org/10.1007/BF02458580

Similar content being viewed by others

libFLASM: a software library for fixed-length approximate string matching

Compositional Properties of Alignments

The Chain Alignment Problem