Alignment-Free Sequence Comparison Based on Next Generation Sequencing Reads: Extended Abstract

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 7262)

Abstract

Next generation sequencing (NGS) technologies have generated an enormous amount of shotgun read data, and assembly of the reads can be challenging, especially for organisms without template sequences. We study the power of genome comparison based on shotgun read data without assembly using three alignment-free sequence comparison statistics, \(D_2\), \(D_2^*\), and \(D_2^S\), both theoretically and by simulations. We derive theoretical formulas for the power of detecting the relationship between two sequences related through a common motif model. Both \(D_2^*\) and \(D_2^S\) are shown to outperform \(D_2\) for detecting the relationship between two sequences based on NGS data. We then study the effects of tuple length, read length, coverage, and sequencing error on the power of \(D_2^*\) and \(D_2^S\). Finally, variants of these statistics, \(d_2\), \(d_2^*\), and \(d_2^S\), are used first to cluster 5 mammalian species with known phylogenetic relationships and then to cluster 13 tree species whose complete genome sequences are not available, using NGS shotgun reads. The clustering results using \(d_2^S\) are consistent with biological knowledge for both the 5 mammalian and the 13 tree species. Thus, the statistic \(d_2^S\) provides a powerful alignment-free comparison tool to study the relationships among different organisms based on NGS read data without assembly.
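
The abstract names the three statistics without defining them. As a rough illustration only, the sketch below computes them from two sets of reads using the forms commonly given in the alignment-free literature: \(D_2 = \sum_w X_w Y_w\), \(D_2^* = \sum_w \tilde{X}_w \tilde{Y}_w / \sqrt{E[X_w]\,E[Y_w]}\), and \(D_2^S = \sum_w \tilde{X}_w \tilde{Y}_w / \sqrt{\tilde{X}_w^2 + \tilde{Y}_w^2}\), where \(X_w\) and \(Y_w\) are the counts of tuple \(w\) in each read set and \(\tilde{X}_w = X_w - E[X_w]\). Several choices here are assumptions and may differ from the paper's exact formulation: the i.i.d. background model estimated from single-base frequencies, counting on the forward strand only, and the function names `count_kmers`, `base_freqs`, and `d2_statistics`, which are hypothetical. Both read sets are assumed to be non-empty.

```python
from collections import Counter
from itertools import product
from math import sqrt

BASES = "ACGT"


def count_kmers(reads, k):
    """Count k-tuples over all reads, skipping tuples with non-ACGT symbols."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            word = read[i:i + k]
            if all(ch in BASES for ch in word):
                counts[word] += 1
    return counts


def base_freqs(reads):
    """Estimate single-base frequencies (assumed i.i.d. background model)."""
    c = Counter(ch for read in reads for ch in read if ch in BASES)
    total = sum(c.values())
    return {b: c[b] / total for b in BASES}


def word_prob(word, freqs):
    """Probability of a word under the i.i.d. model: product of base frequencies."""
    p = 1.0
    for ch in word:
        p *= freqs[ch]
    return p


def d2_statistics(reads_x, reads_y, k):
    """Return (D2, D2*, D2S) for two read sets; an illustrative sketch only."""
    x, y = count_kmers(reads_x, k), count_kmers(reads_y, k)
    n, m = sum(x.values()), sum(y.values())  # total tuples counted per sample
    fx, fy = base_freqs(reads_x), base_freqs(reads_y)

    d2 = d2_star = d2_s = 0.0
    for letters in product(BASES, repeat=k):
        w = "".join(letters)
        ex, ey = n * word_prob(w, fx), m * word_prob(w, fy)  # expected counts
        xt, yt = x[w] - ex, y[w] - ey                        # centred counts
        d2 += x[w] * y[w]
        if ex > 0 and ey > 0:            # skip words impossible under the model
            d2_star += xt * yt / sqrt(ex * ey)
        denom = sqrt(xt * xt + yt * yt)
        if denom > 0:                    # skip words where both centred counts are 0
            d2_s += xt * yt / denom
    return d2, d2_star, d2_s


if __name__ == "__main__":
    sample_a = ["ACGTACGTAC", "GTACGTTGCA"]
    sample_b = ["ACGTTGCAAC", "TTGACGTACG"]
    print(d2_statistics(sample_a, sample_b, k=3))
```

Enumerating all \(4^k\) tuples keeps the sketch short but is only practical for small \(k\); for longer tuples one would iterate over observed tuples and handle the unobserved ones separately.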

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Song, K., Ren, J., Zhai, Z., Liu, X., Deng, M., Sun, F. (2012). Alignment-Free Sequence Comparison Based on Next Generation Sequencing Reads: Extended Abstract. In: Chor, B. (eds) Research in Computational Molecular Biology. RECOMB 2012. Lecture Notes in Computer Science, vol 7262. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29627-7_29

  • DOI: https://doi.org/10.1007/978-3-642-29627-7_29

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-29626-0

  • Online ISBN: 978-3-642-29627-7

  • eBook Packages: Computer Science, Computer Science (R0)
