Seed Design Framework for Mapping SOLiD Reads

  • Laurent Noé
  • Marta Gîrdea
  • Gregory Kucherov
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6044)

Abstract

The advent of high-throughput sequencing technologies constituted a major advance in genomic studies, offering new prospects in a wide range of applications. We propose a rigorous and flexible algorithmic solution to mapping SOLiD color-space reads to a reference genome. The solution relies on an advanced method of seed design that uses, on the one hand, a faithful probabilistic model of read matches and, on the other hand, a novel seeding principle especially adapted to read mapping. Our method can handle both lossy and lossless frameworks and is able to distinguish, at the level of seed design, between SNPs and reading errors. We illustrate our approach with several seed designs and demonstrate their efficiency.
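
As background for the distinction between SNPs and reading errors, the following minimal sketch illustrates the standard two-base color encoding, in which each color is determined by a pair of adjacent bases. It is not the authors' seed design method; the function names and the example sequences are hypothetical and chosen only for illustration. Under this encoding, a single-base substitution such as a SNP changes two adjacent colors, whereas an isolated reading error changes a single color.

```python
# 2-bit base codes; the color of two adjacent bases is the XOR of their codes.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def to_colors(bases):
    """Translate a nucleotide string into its list of colors (values 0-3)."""
    return [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(bases, bases[1:])]


def mismatch_positions(colors_a, colors_b):
    """Return the positions at which two equal-length color lists differ."""
    return [i for i, (x, y) in enumerate(zip(colors_a, colors_b)) if x != y]


# Hypothetical reference fragment and a read carrying one substitution (T -> A).
reference = "ACGTTGCA"
snp_read = "ACGATGCA"

ref_colors = to_colors(reference)
snp_colors = to_colors(snp_read)

# Hypothetical reading error: one color of the reference is misread.
error_colors = list(ref_colors)
error_colors[4] = 2

snp_mismatches = mismatch_positions(ref_colors, snp_colors)      # two adjacent positions
error_mismatches = mismatch_positions(ref_colors, error_colors)  # one isolated position

print("reference colors:", ref_colors)
print("SNP mismatches:  ", snp_mismatches)
print("error mismatches:", error_mismatches)
```

A seed that tolerates a pair of adjacent color mismatches, but treats an isolated mismatch differently, can therefore in principle separate the two kinds of events; how the paper models this is not reproduced here.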

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Laurent Noé (1)
  • Marta Gîrdea (1)
  • Gregory Kucherov (1)
  1. INRIA Lille - Nord Europe, LIFL/CNRS, Université Lille 1, Villeneuve d’Ascq, France
