Minimum message length encoding and the comparison of macromolecules
A method of inductive inference known as minimum message length encoding is applied to string comparison in molecular biology. Three questions are treated as inductive inference problems: whether two strings are related, how they are related if so, and what constitutes a good theory of string mutation. The method allows the posterior odds-ratio of two string alignments, or of two models of string mutation, to be computed. The connection between models of mutation and existing string alignment algorithms is made explicit. A fast minimum message length alignment algorithm is also described.
Keywords: Dynamic Programming Algorithm, Edit Distance, Alignment Algorithm, Kolmogorov Complexity, Edit Operation
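The link between message lengths and posterior odds-ratios mentioned in the abstract can be illustrated with a small sketch. It is not the paper's algorithm: it assumes a hypothetical one-state mutation model over a four-letter alphabet, with illustrative operation probabilities (`p_match`, `p_change`, `p_indel`) chosen only so that they sum to one, and it ignores the cost of stating the alignment length. Under these assumptions, an alignment is encoded as a sequence of operations plus the characters they involve, and a difference of d bits between two message lengths corresponds to posterior odds of 2^d in favour of the shorter message.

```python
import math


def alignment_message_length(top, bottom, p_match=0.7, p_change=0.1,
                             p_indel=0.1, alphabet_size=4):
    """Bits to encode a given alignment under an assumed one-state model.

    `top` and `bottom` are equal-length rows, with '-' marking a gap.
    The probabilities are illustrative assumptions, not values from the
    paper: p_match + p_change + 2 * p_indel must equal 1 (insert and
    delete each get p_indel).
    """
    if len(top) != len(bottom):
        raise ValueError("alignment rows must have equal length")
    if abs(p_match + p_change + 2 * p_indel - 1.0) > 1e-9:
        raise ValueError("operation probabilities must sum to 1")

    char_bits = math.log2(alphabet_size)  # cost of one uniformly coded character
    bits = 0.0
    for a, b in zip(top, bottom):
        if a == "-" and b == "-":
            raise ValueError("a column cannot be a gap in both rows")
        if a == "-" or b == "-":
            # insert or delete: state the operation, then the one character
            bits += -math.log2(p_indel) + char_bits
        elif a == b:
            # match: state the operation, then the shared character
            bits += -math.log2(p_match) + char_bits
        else:
            # change: state the operation, one character, then a different one
            bits += -math.log2(p_change) + char_bits + math.log2(alphabet_size - 1)
    return bits


def null_message_length(s, t, alphabet_size=4):
    """Bits to encode two strings independently (the 'unrelated' hypothesis)."""
    return (len(s) + len(t)) * math.log2(alphabet_size)


def posterior_odds(bits_a, bits_b):
    """Odds in favour of hypothesis A over B, given their message lengths in bits."""
    return 2.0 ** (bits_b - bits_a)


if __name__ == "__main__":
    related = alignment_message_length("ACGTAC-T", "ACGAACGT")
    unrelated = null_message_length("ACGTACT", "ACGAACGT")
    print(f"related: {related:.2f} bits, unrelated: {unrelated:.2f} bits")
    print(f"odds (related : unrelated) = {posterior_odds(related, unrelated):.1f}")
```

The same `posterior_odds` comparison applies to two candidate alignments of the same pair of strings, or to two parameter settings of the model, provided both message lengths are computed for the same data.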