Finite-state models in the alignment of macromolecules
 L. Allison,
 C. S. Wallace,
 C. N. Yee
Minimum message length encoding is a technique of inductive inference with theoretical and practical advantages. It allows the posterior odds-ratio of two theories or hypotheses to be calculated. Here it is applied to problems of aligning or relating two strings, in particular two biological macromolecules. We compare the r-theory, that the strings are related, with the null-theory, that they are not related. If they are related, the probabilities of the various alignments can be calculated. This is done for one-, three-, and five-state models of relation or mutation. These correspond to linear and piecewise linear cost functions on runs of insertions and deletions. We describe how to estimate parameters of a model. The validity of a model is itself an hypothesis and can be objectively tested. This is done on real DNA strings and on artificial data. The tests on artificial data indicate limits on what can be inferred in various situations. The tests on real DNA support either the three- or five-state models over the one-state model. Finally, a fast, approximate minimum message length string comparison algorithm is described.
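The comparison of the r-theory with the null-theory can be illustrated with a minimal sketch for a one-state model. This is not the authors' implementation: the function names, the fixed step probabilities (0.7 match, 0.1 change, 0.1 each for insert and delete) and the uniform character distribution are all assumptions made for illustration, whereas the abstract says parameters are estimated. The sketch also ignores the cost of stating the string lengths and the parameters. It sums probability over all alignments by dynamic programming and reports the difference in message lengths, which equals the log posterior odds-ratio if the two theories are assumed equally likely a priori.

```python
import math

def log2_sum(*logs):
    """Return log2(sum(2**x for x in logs)), computed stably."""
    top = max(logs)
    return top + math.log2(sum(2.0 ** (x - top) for x in logs))

def null_theory_bits(a, b, alphabet_size=4):
    """Null-theory: the strings are unrelated, so each character is
    encoded independently (assumed uniform over the alphabet)."""
    return (len(a) + len(b)) * math.log2(alphabet_size)

def r_theory_bits(a, b, p_match=0.7, p_change=0.1, p_indel=0.1,
                  alphabet_size=4):
    """r-theory under a hypothetical one-state model: message length of
    both strings with probability summed over ALL alignments.
    Assumes p_match + p_change + 2 * p_indel == 1."""
    log_char = -math.log2(alphabet_size)
    # log2 probability of each kind of step, including the characters it emits
    step_match = math.log2(p_match) + log_char
    step_change = (math.log2(p_change) + log_char
                   - math.log2(alphabet_size - 1))
    step_indel = math.log2(p_indel) + log_char

    # prev[j] = log2 P(generating a[:i-1] and b[:j]); first row is all inserts
    prev = [j * step_indel for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [prev[0] + step_indel]  # first column is all deletes
        for j in range(1, len(b) + 1):
            diag = step_match if a[i - 1] == b[j - 1] else step_change
            cur.append(log2_sum(prev[j - 1] + diag,      # match or change
                                prev[j] + step_indel,    # delete a[i-1]
                                cur[j - 1] + step_indel))  # insert b[j-1]
        prev = cur
    return -prev[len(b)]

def log_odds_bits(a, b):
    """Positive values favour the r-theory, negative the null-theory."""
    return null_theory_bits(a, b) - r_theory_bits(a, b)
```

For example, `log_odds_bits("ACGTACGT", "ACGTACGT")` is positive (the strings are more cheaply encoded as related), while `log_odds_bits("AAAAAAAA", "CCCCCCCC")` is negative. Three- and five-state models would need additional dynamic programming tables, one per state, which this sketch does not attempt.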
 Journal

Journal of Molecular Evolution
Volume 35, Issue 1, pp 77-89
 Cover Date
 1992-07-01
 DOI
 10.1007/BF00160262
 Print ISSN
 0022-2844
 Online ISSN
 1432-1432
 Publisher
 Springer-Verlag
 Keywords

 Alignment
 Edit distance
 Homology
 Inductive inference
 Minimum message length
 Similarity
 String
 Authors

 L. Allison ^{(1)}
 C. S. Wallace ^{(1)}
 C. N. Yee ^{(1)}
 Author Affiliations

 1. Department of Computer Science, Monash University, 3168, Australia