Abstract
Motivated by work on bio-operations on DNA sequences, we consider string duplication systems. A string duplication system S consists of an initial string over \(\varSigma \) and a set of duplication functions that iteratively generate new strings from the strings already in the system. As our main result, we introduce the concept of deduplication (the reverse of duplication) on a nondeterministic finite-state automaton (NFA) and propose a deduplication operation that transforms a given NFA into a smaller NFA that generates the same language in the string duplication system. We then introduce nested duplication, which is similar to tandem duplication but depends on the information of the nested duplication in the previous step. We propose an NFA construction for an arbitrary nested duplication system, analyze its properties, and present an algorithm that computes the system capacity.
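To make the notion of a string duplication system concrete, the following is a minimal sketch that assumes the only duplication function is unrestricted tandem duplication (a factor u of x u y is copied in place to give x u u y). The function names `tandem_duplications` and `generate`, and the length bound `max_len`, are illustrative assumptions and do not come from the paper. The sketch does not implement deduplication on NFAs or nested duplication, whose definitions are not given in the abstract.

```python
from typing import Set


def tandem_duplications(s: str) -> Set[str]:
    """Return every string obtained from s by one tandem duplication.

    Assumption: any non-empty factor may be duplicated (no length bound).
    """
    results = set()
    n = len(s)
    for i in range(n):
        for j in range(i + 1, n + 1):
            # Write s = x u y with u = s[i:j]; the result is x u u y.
            results.add(s[:j] + s[i:j] + s[j:])
    return results


def generate(initial: str, max_len: int) -> Set[str]:
    """Iteratively apply tandem duplication, keeping strings up to max_len.

    The length bound is only there to keep the (otherwise infinite)
    generated set finite for illustration.
    """
    seen = {initial}
    frontier = [initial]
    while frontier:
        current = frontier.pop()
        for candidate in tandem_duplications(current):
            if len(candidate) <= max_len and candidate not in seen:
                seen.add(candidate)
                frontier.append(candidate)
    return seen


if __name__ == "__main__":
    print(sorted(tandem_duplications("ab")))
    print(sorted(generate("ab", 3)))
```

Starting from the initial string `ab`, each round adds the strings reachable by one more duplication. The system's language is the set of all strings reachable this way without the length bound.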
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Cho, DJ., Han, YS., Kim, H. (2017). Deduplication on Finite Automata and Nested Duplication Systems. In: Patitz, M., Stannett, M. (eds) Unconventional Computation and Natural Computation. UCNC 2017. Lecture Notes in Computer Science, vol 10240. Springer, Cham. https://doi.org/10.1007/978-3-319-58187-3_15
DOI: https://doi.org/10.1007/978-3-319-58187-3_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-58186-6
Online ISBN: 978-3-319-58187-3
eBook Packages: Computer Science, Computer Science (R0)