Deduplication on Finite Automata and Nested Duplication Systems

  • Conference paper
Unconventional Computation and Natural Computation (UCNC 2017)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10240)

Abstract

A string duplication system S, a model motivated by work on bio-operations on DNA sequences, consists of an initial string over \(\varSigma\) and a set of duplication functions that iteratively generate new strings from existing strings in the system. As our main result, we introduce the concept of deduplication, the reverse of duplication, on a nondeterministic finite-state automaton (NFA) and propose a deduplication operation that transforms a given NFA into a smaller NFA while generating the same language in the string duplication system. We then introduce nested duplication, which is similar to tandem duplication but depends on the information of the nested duplication in the previous step. We propose an NFA construction for an arbitrary nested duplication system, analyze its properties, and present an algorithm that computes the system capacity.
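As a rough illustration of the string-level operations the abstract refers to, the following sketch assumes the standard definition of tandem duplication (a string uvw with nonempty v becomes uvvw) and treats deduplication as its reverse on plain strings. The function names and the `max_len` bound are hypothetical and chosen only for this example. It does not reproduce the paper's NFA-based deduplication, its nested duplication, or its capacity algorithm.

```python
def tandem_duplications(s, max_len=None):
    """All strings u v v w obtainable from s = u v w with v nonempty.

    max_len (hypothetical parameter) optionally bounds the length of v.
    """
    n = len(s)
    out = set()
    for i in range(n):
        for j in range(i + 1, n + 1):
            if max_len is not None and j - i > max_len:
                break
            # v = s[i:j]; insert a second copy of v right after the first
            out.add(s[:j] + s[i:j] + s[j:])
    return out


def tandem_deduplications(s):
    """All strings u v w obtainable from s = u v v w with v nonempty."""
    n = len(s)
    out = set()
    for i in range(n):
        for k in range(1, (n - i) // 2 + 1):
            if s[i:i + k] == s[i + k:i + 2 * k]:
                # drop the second copy of v
                out.add(s[:i + k] + s[i + 2 * k:])
    return out


def generate(seed, steps, max_len=None):
    """Strings reachable from seed in at most `steps` tandem duplications."""
    seen = {seed}
    frontier = {seed}
    for _ in range(steps):
        frontier = {
            t for s in frontier for t in tandem_duplications(s, max_len)
        } - seen
        seen |= frontier
    return seen
```

Under these assumptions, every string produced by `tandem_duplications(s)` has `s` among its `tandem_deduplications`, which is the sense in which the second function reverses the first.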

Author information

Corresponding author

Correspondence to Yo-Sub Han.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Cho, DJ., Han, YS., Kim, H. (2017). Deduplication on Finite Automata and Nested Duplication Systems. In: Patitz, M., Stannett, M. (eds) Unconventional Computation and Natural Computation. UCNC 2017. Lecture Notes in Computer Science, vol 10240. Springer, Cham. https://doi.org/10.1007/978-3-319-58187-3_15

  • DOI: https://doi.org/10.1007/978-3-319-58187-3_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-58186-6

  • Online ISBN: 978-3-319-58187-3

  • eBook Packages: Computer Science, Computer Science (R0)
