Summary
Traditional methods obtain a microorganism’s DNA by culturing the organism in isolation. Recent advances in genomics have made it possible to obtain the DNA of more than one organism directly from their natural habitat. Natural microbial communities are often very complex, containing tens to hundreds of species. Assembling these genomes is a crucial step regardless of how the DNA is obtained. This chapter presents fuzzy methods for multiple genome sequence assembly of cultured genomes (single organism) and environmental genomes (multiple organisms).
An optimal alignment of DNA fragments depends on several factors, such as the quality of the bases and the length of the overlap. Quality values indicate whether a base call is reliable or likely the result of an experimental error. We propose a sequence assembly solution based on fuzzy logic, which tolerates inexactness and errors in fragment matching and can therefore improve assembly.
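As a rough illustration of how overlap length and base quality might be combined with fuzzy logic, the sketch below scores a candidate overlap with two ramp-shaped membership functions joined by a fuzzy AND. It is a minimal sketch, not the chapter's method: the function names, the thresholds, and the choice of the minimum operator are all assumptions made for the example.

```python
def ramp(x, low, high):
    """Membership that rises linearly from 0 at `low` to 1 at `high`."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def overlap_membership(overlap_len, mean_quality,
                       len_range=(20, 100), qual_range=(10, 40)):
    """Degree (0..1) to which an overlap counts as a good match.

    The ranges are illustrative assumptions, not values from the chapter.
    """
    mu_len = ramp(overlap_len, *len_range)     # longer overlaps score higher
    mu_qual = ramp(mean_quality, *qual_range)  # higher base quality scores higher
    return min(mu_len, mu_qual)                # fuzzy AND (minimum operator)


if __name__ == "__main__":
    # A mid-length overlap of high-quality bases gets a partial membership.
    print(overlap_membership(60, 40))
```

Because the result is a degree rather than a yes/no decision, an assembler could rank candidate overlaps by this value instead of rejecting every overlap that misses a hard cutoff.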
We propose fuzzy classification using modified fuzzy weighted averages to classify fragments belonging to different organisms within an environmental genome population. Our approach uses DNA-based signatures such as GC content and nucleotide frequencies as features for the classification. This divide-and-conquer strategy also improves performance on larger datasets. We evaluate our method on artificially created environmental genomes, which allow us to test various combinations of organisms, and on an environmental genome.
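The idea of classifying fragments by DNA signatures can be sketched as follows. This example computes GC content and single-nucleotide frequencies for a fragment, measures its membership in each group with a triangular function around that group's feature values, and combines the memberships with a plain weighted average. It does not reproduce the chapter's modified fuzzy weighted average; the group centers, weights, and width below are hypothetical values chosen only for illustration.

```python
def gc_content(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def nucleotide_frequencies(seq):
    """Relative frequency of A, C, G and T, in that order."""
    seq = seq.upper()
    n = len(seq)
    return [seq.count(b) / n if n else 0.0 for b in "ACGT"]


def features(seq):
    """Feature vector: GC content followed by the four base frequencies."""
    return [gc_content(seq)] + nucleotide_frequencies(seq)


def triangular(x, center, width):
    """Membership that is 1 at `center` and falls to 0 at distance `width`."""
    return max(0.0, 1.0 - abs(x - center) / width)


def classify(seq, group_centers, weights, width=0.2):
    """Return the best-scoring group and all weighted-average scores."""
    x = features(seq)
    scores = {}
    for name, centers in group_centers.items():
        mu = [triangular(xi, ci, width) for xi, ci in zip(x, centers)]
        scores[name] = sum(w * m for w, m in zip(weights, mu)) / sum(weights)
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical groups, each represented by the features of one sequence.
    centers = {
        "high_gc": features("GGCCGGCCGA"),
        "low_gc": features("ATATATATGC"),
    }
    weights = [2, 1, 1, 1, 1]  # assumed: GC content counts double
    print(classify("GCGCGGCCAT", centers, weights))
```

Grouping fragments this way before assembly is what makes the strategy divide-and-conquer: each group can then be assembled separately as a smaller problem.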
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Nasser, S., Breland, A., Harris, F.C., Nicolescu, M., Vert, G.L. (2009). Fuzzy Genome Sequence Assembly for Single and Environmental Genomes. In: Jin, Y., Wang, L. (eds) Fuzzy Systems in Bioinformatics and Computational Biology. Studies in Fuzziness and Soft Computing, vol 242. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89968-6_2
DOI: https://doi.org/10.1007/978-3-540-89968-6_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89967-9
Online ISBN: 978-3-540-89968-6
eBook Packages: Engineering, Engineering (R0)