Abstract
A new manner of relating formal language theory to the study of informational macromolecules is initiated. A language is associated with each pair of sets where the first set consists of double-stranded DNA molecules and the second set consists of the recombinational behaviors allowed by specified classes of enzymatic activities. The associated language consists of strings of symbols that represent the primary structures of the DNA molecules that may potentially arise from the original set of DNA molecules under the given enzymatic activities.
Attention is focused on the potential effect of sets of restriction enzymes and a ligase that allow DNA molecules to be cleaved and reassociated to produce further molecules. The associated languages are analysed by means of a new generative formalism called a splicing system. A significant subclass of these languages, which we call the persistent splicing languages, is shown to coincide with a class of regular languages which have been previously studied in other contexts: the strictly locally testable languages.
This study initiates the formal analysis of the generative power of recombinational behaviors in general. The splicing system formalism allows observations to be made concerning the generative power of general recombination and also of sets of enzymatic activities that include general recombination.
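The splicing operation described in the abstract can be illustrated with a minimal sketch. This is a simplified reading, not the paper's formal definition: each molecule is modelled as a single-strand string, each enzyme site as a hypothetical triple `(c, x, d)` (left context, crossing, right context), and two molecules may recombine only at sites that share the same crossing `x`. The function names, the `max_len` bound and the sample sequences are assumptions introduced for illustration. The site `GAATTC` with crossing `AATT` is modelled on the EcoRI recognition sequence.

```python
from itertools import product


def splice_once(s1, s2, pattern1, pattern2):
    """Return the recombinants formed by the left part of s1 and the right part of s2.

    Each pattern is a triple (c, x, d). Splicing is allowed only when both
    patterns share the same crossing x (a simplifying assumption).
    """
    c, x, d = pattern1
    e, x2, f = pattern2
    if x != x2:
        return set()
    site1, site2 = c + x + d, e + x + f
    results = set()
    i = s1.find(site1)
    while i != -1:
        j = s2.find(site2)
        while j != -1:
            left = s1[:i] + c + x              # s1 up to and including the crossing
            right = f + s2[j + len(site2):]    # s2 after the crossing
            results.add(left + right)
            j = s2.find(site2, j + 1)
        i = s1.find(site1, i + 1)
    return results


def splicing_closure(initial, patterns, max_len=30):
    """Iterate splicing until no new string of length <= max_len appears.

    The length bound is an assumption that guarantees termination; the
    language of a splicing system may be infinite.
    """
    language = set(initial)
    while True:
        new = set()
        for s1, s2 in product(language, repeat=2):
            for p1, p2 in product(patterns, repeat=2):
                for w in splice_once(s1, s2, p1, p2):
                    if len(w) <= max_len and w not in language:
                        new.add(w)
        if not new:
            return language
        language |= new


if __name__ == "__main__":
    molecules = {"AAGAATTCTT", "CCGAATTCGG"}
    sites = [("G", "AATT", "C")]
    for word in sorted(splicing_closure(molecules, sites)):
        print(word)
```

With the two sample molecules, the closure contains the two originals together with the two strings obtained by exchanging their right-hand ends at the shared site.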
Cite this article
Head, T. Formal language theory and DNA: An analysis of the generative capacity of specific recombinant behaviors. Bulletin of Mathematical Biology 49, 737–759 (1987). https://doi.org/10.1007/BF02481771