
Lost Strings in Genomes: What Sense Do They Make?

  • Conference paper
Bioinformatics and Biomedical Engineering (IWBBIO 2017)

Abstract

We studied the sets of avoided strings observed across a family of genomes. The length of the minimal avoided string was found to rarely exceed 9 nucleotides, regardless of the phylogeny of the genome under consideration. The lists of avoided strings observed over sets of (related) genomes were analyzed, and very low correlation was found between the phylogeny and the set of those strings.
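The abstract does not spell out how avoided strings are detected, so the following is only a minimal sketch under stated assumptions: an "avoided string" is taken to be a word over {A, C, G, T} that never occurs in the sequence, the "minimal" one is the shortest such word, only one strand is scanned, and windows containing ambiguity codes such as N are skipped. The function names and the use of Jaccard similarity to compare avoided-string sets between genomes are illustrative choices, not the authors' method.

```python
from itertools import product

ALPHABET = "ACGT"
VALID = frozenset(ALPHABET)


def kmers_present(seq: str, k: int) -> set[str]:
    """Return every length-k word of seq made only of A, C, G, T (single strand)."""
    seq = seq.upper()
    found = set()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= VALID:  # skip windows containing N or other ambiguity codes
            found.add(word)
    return found


def shortest_absent_words(seq: str, max_k: int = 10) -> tuple[int, set[str]]:
    """Find the smallest k with at least one absent k-mer; return (k, absent k-mers).

    Returns (0, set()) if every k-mer up to max_k occurs in seq.
    """
    for k in range(1, max_k + 1):
        present = kmers_present(seq, k)
        if len(present) < 4 ** k:  # fewer distinct words than the 4^k possible ones
            all_words = {"".join(p) for p in product(ALPHABET, repeat=k)}
            return k, all_words - present
    return 0, set()


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two avoided-string sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

For example, `shortest_absent_words("ACGTACGTTTGCA")` returns `k = 2`: all four nucleotides occur, but 8 of the 16 dinucleotides (AA, AG, AT, CC, CT, GA, GG, TC) do not. Comparing such sets pairwise with `jaccard` gives one possible similarity matrix that could then be set against a phylogeny. Enumerating all 4^k words is a brute-force choice that stays cheap for k ≤ 10 (about a million words).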




Acknowledgement

This study was supported by research grant No. 14.Y26.31.0004 from the Government of the Russian Federation.

Author information


Correspondence to Michael Sadovsky.


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Sadovsky, M., Fontaine, JF., Andrade-Navarro, M.A., Yakubailik, Y., Rudenko, N. (2017). Lost Strings in Genomes: What Sense Do They Make? In: Rojas, I., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2017. Lecture Notes in Computer Science, vol 10209. Springer, Cham. https://doi.org/10.1007/978-3-319-56154-7_3


  • DOI: https://doi.org/10.1007/978-3-319-56154-7_3


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-56153-0

  • Online ISBN: 978-3-319-56154-7

  • eBook Packages: Computer Science, Computer Science (R0)
