Fuzzy Genome Sequence Assembly for Single and Environmental Genomes

Fuzzy Systems in Bioinformatics and Computational Biology

Part of the book series: Studies in Fuzziness and Soft Computing (STUDFUZZ, volume 242)

Summary

Traditional methods obtain a microorganism’s DNA by culturing the organism individually. Recent advances in genomics have led to the procurement of DNA from more than one organism directly from their natural habitat. Indeed, natural microbial communities are often very complex, with tens to hundreds of species. Assembling these genomes is a crucial step irrespective of how the DNA was obtained. This chapter presents fuzzy methods for multiple genome sequence assembly of cultured genomes (single organism) and environmental genomes (multiple organisms).

An optimal alignment of DNA genome fragments depends on several factors, such as the quality of the bases and the length of the overlap. Base quality, for instance, indicates whether a base call is reliable or likely the result of an experimental error. We propose a sequence assembly solution based on fuzzy logic that tolerates inexactness and errors in fragment matching and can be used to improve assembly.
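As a rough illustration of how overlap length, base quality, and tolerance of mismatches could be combined in a fuzzy score, the sketch below grades a candidate suffix-prefix overlap between two fragments. It is not the chapter's method: the function names, the piecewise-linear membership shapes, every threshold (10 to 40 bases, Phred 10 to 30, 20% mismatches), and the use of `min` as the fuzzy AND are assumptions chosen only for the example.

```python
def ramp(x, low, high):
    """Piecewise-linear membership: 0 at or below low, 1 at or above high."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def overlap_membership(frag_a, frag_b, qual_a, qual_b, overlap_len,
                       max_mismatch_frac=0.2):
    """Fuzzy degree to which the suffix of frag_a overlaps the prefix of frag_b.

    qual_a and qual_b are per-base quality scores (assumed Phred-like).
    All thresholds are illustrative assumptions, not values from the chapter.
    """
    if overlap_len <= 0 or overlap_len > min(len(frag_a), len(frag_b)):
        return 0.0

    suffix = frag_a[-overlap_len:]
    prefix = frag_b[:overlap_len]

    # Inexact matching is tolerated: membership falls linearly with the
    # mismatch fraction instead of rejecting the overlap outright.
    mismatches = sum(1 for x, y in zip(suffix, prefix) if x != y)
    mu_match = 1.0 - ramp(mismatches / overlap_len, 0.0, max_mismatch_frac)

    # Longer overlaps are more convincing.
    mu_length = ramp(overlap_len, 10, 40)

    # Higher mean base quality in the overlapping region is more trustworthy.
    quals = list(qual_a[-overlap_len:]) + list(qual_b[:overlap_len])
    mu_quality = ramp(sum(quals) / len(quals), 10, 30)

    # Fuzzy AND (minimum): the overlap is only as good as its weakest factor.
    return min(mu_match, mu_length, mu_quality)
```

An assembler could rank candidate overlaps by this score and merge the highest-scoring pairs first; a product or weighted mean in place of `min` would let a strong factor compensate for a weak one.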

We propose fuzzy classification using modified fuzzy weighted averages to classify fragments belonging to different organisms within an environmental genome population. Our proposed approach uses DNA-based signatures such as GC content and nucleotide frequencies as features for the classification. This divide-and-conquer strategy also improves performance on larger datasets. We evaluate our method on artificially created environmental genomes to test various combinations of organisms and on an environmental genome.
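To make the idea of signature-based classification concrete, the sketch below computes GC content and dinucleotide frequencies for a fragment and assigns it to the candidate group whose signature it most resembles. This is an assumption-laden illustration: the chapter's modified fuzzy weighted average is not specified in this summary, so a plain weighted average of per-feature similarities stands in for it, and all function names, the choice of k = 2, and the prototype sequences are hypothetical.

```python
from itertools import product


def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmer_frequencies(seq, k=2):
    """Relative frequency of every k-mer over ACGT, in a fixed order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing N or other symbols
            counts[kmer] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def features(seq, k=2):
    """Feature vector: GC content followed by k-mer frequencies."""
    return [gc_content(seq)] + kmer_frequencies(seq, k)


def memberships(fragment, prototypes, weights=None, k=2):
    """Degree of membership of the fragment in each group, in [0, 1].

    Uses a plain weighted average of per-feature similarities
    (an assumption standing in for the chapter's modified version).
    """
    f = features(fragment, k)
    if weights is None:
        weights = [1.0] * len(f)
    scores = {}
    for label, proto_seq in prototypes.items():
        p = features(proto_seq, k)
        sims = [1.0 - abs(x - y) for x, y in zip(f, p)]
        scores[label] = sum(w * s for w, s in zip(weights, sims)) / sum(weights)
    return scores


def classify(fragment, prototypes, weights=None, k=2):
    """Assign the fragment to the group with the highest membership."""
    scores = memberships(fragment, prototypes, weights, k)
    return max(scores, key=scores.get)
```

Fragments grouped this way could then be assembled group by group, which is the sense in which the classification step acts as divide and conquer. Because the memberships are graded, a fragment with near-equal scores for two groups could be passed to both rather than forced into one.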

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Nasser, S., Breland, A., Harris, F.C., Nicolescu, M., Vert, G.L. (2009). Fuzzy Genome Sequence Assembly for Single and Environmental Genomes. In: Jin, Y., Wang, L. (eds) Fuzzy Systems in Bioinformatics and Computational Biology. Studies in Fuzziness and Soft Computing, vol 242. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89968-6_2

  • DOI: https://doi.org/10.1007/978-3-540-89968-6_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-89967-9

  • Online ISBN: 978-3-540-89968-6

  • eBook Packages: Engineering, Engineering (R0)
