Multiple Sequence Alignment System for Pyrosequencing Reads
Pyrosequencing is among the emerging sequencing techniques, capable of generating up to 100,000 overlapping reads in a single run. It is much faster and cheaper than established sequencing techniques such as Sanger sequencing. However, the reads generated by pyrosequencing are short and contain numerous errors. Before these reads can be used in any subsequent analysis, they must be aligned. Existing multiple sequence alignment methods cannot be used, because they do not take into account the specific positions of the sequences with respect to the genome and are highly inefficient for large numbers of sequences. The common practice has therefore been to use either simple pairwise alignment, despite its poor accuracy for error-prone pyroreads, or computationally expensive techniques based on sequential gap propagation. In this paper, we develop a computationally efficient method based on domain decomposition, referred to as pyro-align, to align such large numbers of reads. The proposed algorithm accurately aligns the erroneous reads and is orders of magnitude faster than any existing method. The accuracy of the alignment is confirmed by the consensus obtained from the multiple alignments.
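To illustrate how a consensus can expose the quality of an alignment, the following is a minimal sketch of a column-wise majority vote over reads that have already been aligned. It is not the pyro-align algorithm itself. The function name `consensus`, the use of `-` for both gaps and padding outside a read, and the toy reads are assumptions made for this example only.

```python
from collections import Counter


def consensus(aligned_reads, gap="-"):
    """Return the column-wise majority base of equal-length aligned rows.

    Assumption for this sketch: `gap` marks both alignment gaps and the
    padding outside a read, and such characters do not take part in the vote.
    """
    if not aligned_reads:
        return ""
    width = len(aligned_reads[0])
    if any(len(row) != width for row in aligned_reads):
        raise ValueError("all aligned rows must have the same length")

    bases = []
    for column in zip(*aligned_reads):
        counts = Counter(base for base in column if base != gap)
        if not counts:
            continue  # no read covers this column
        bases.append(counts.most_common(1)[0][0])
    return "".join(bases)


# Hypothetical overlapping reads, already placed on a common coordinate system.
rows = [
    "ACGTAC----",
    "--GTTCGA--",  # contains one substitution error (T instead of A)
    "----ACGATT",
    "-CGTACG---",
]
result = consensus(rows)
```

In this toy input the erroneous base is outvoted by the three other reads covering the same column, so the consensus recovers the underlying sequence. When the reads are misaligned, the columns mix unrelated bases and the consensus degrades, which is why it can serve as a check on alignment accuracy.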