Abstract
Mapping sequenced reads to a reference genome, also known as sequence read alignment, is central to sequence analysis. Emerging sequencing technologies such as next-generation sequencing (NGS) have led to an explosion of sequencing data that far exceeds the processing capabilities of existing alignment tools. Consequently, sequence alignment has become the bottleneck of sequence analysis, and substantial computing power is required to address this challenge. A key feature of sequence alignment is that different reads are independent of one another. Exploiting this property, we propose a multi-level parallelization strategy to speed up BWA, a widely used sequence alignment tool, and develop our massively parallel sequence aligner, mBWA. mBWA contains two levels of parallelization: first, parallelization of data input/output (IO) and read alignment through a three-stage parallel pipeline; second, parallelization enabled by Intel Many Integrated Core (MIC) coprocessor technology. In this paper, we demonstrate that mBWA outperforms BWA through a combination of these techniques. To the best of our knowledge, mBWA is the first sequence alignment tool to run on Intel MIC, and it can achieve more than a 5-fold speedup over the original BWA while maintaining alignment precision.
mBWA is released under the BSD license and is freely available at http://sourceforge.net/projects/mbwa
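The abstract describes the first level of parallelism as a three-stage pipeline that overlaps input IO, read alignment, and output IO, relying on the fact that reads are independent. The sketch below illustrates that general idea only; it is not mBWA's code. It assumes the three stages are "read batches", "align batches with several workers", and "write results in input order", and it uses a hypothetical `align_read` placeholder in place of BWA's real alignment. Because of Python's global interpreter lock, it shows the pipeline structure rather than the speedup (BWA itself is written in C).

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream on a queue


def align_read(read: str) -> str:
    # Hypothetical stand-in for BWA's per-read alignment; it only tags the
    # read with its length so the pipeline runs end to end.
    return f"{read}\t{len(read)}"


def reader(reads, batch_size, in_q, n_workers):
    # Stage 1 (input IO): split the read stream into numbered batches.
    batch, idx = [], 0
    for r in reads:
        batch.append(r)
        if len(batch) == batch_size:
            in_q.put((idx, batch))
            batch, idx = [], idx + 1
    if batch:
        in_q.put((idx, batch))
    for _ in range(n_workers):
        in_q.put(SENTINEL)  # one stop signal per worker


def worker(in_q, out_q):
    # Stage 2 (alignment): reads are independent, so any worker may take any batch.
    while True:
        item = in_q.get()
        if item is SENTINEL:
            out_q.put(SENTINEL)
            return
        idx, batch = item
        out_q.put((idx, [align_read(r) for r in batch]))


def writer(out_q, n_workers, sink):
    # Stage 3 (output IO): restore the input order by batch index before writing.
    pending, next_idx, done = {}, 0, 0
    while done < n_workers:
        item = out_q.get()
        if item is SENTINEL:
            done += 1
            continue
        idx, results = item
        pending[idx] = results
        while next_idx in pending:
            sink.extend(pending.pop(next_idx))
            next_idx += 1


def run_pipeline(reads, batch_size=2, n_workers=4):
    # Bounded queues let the three stages overlap without unbounded buffering.
    in_q, out_q = queue.Queue(maxsize=8), queue.Queue(maxsize=8)
    sink = []
    threads = [threading.Thread(target=reader, args=(reads, batch_size, in_q, n_workers))]
    threads += [threading.Thread(target=worker, args=(in_q, out_q)) for _ in range(n_workers)]
    threads.append(threading.Thread(target=writer, args=(out_q, n_workers, sink)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sink
```

For example, `run_pipeline(["ACGT", "GG", "TTTAA"])` returns `["ACGT\t4", "GG\t2", "TTTAA\t5"]`: batches may finish alignment out of order, but the writer stage buffers them so output order matches input order.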
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Cui, Y., Liao, X., Zhu, X., Wang, B., Peng, S. (2014). mBWA: A Massively Parallel Sequence Reads Aligner. In: Saez-Rodriguez, J., Rocha, M., Fdez-Riverola, F., De Paz Santana, J. (eds) 8th International Conference on Practical Applications of Computational Biology & Bioinformatics (PACBB 2014). Advances in Intelligent Systems and Computing, vol 294. Springer, Cham. https://doi.org/10.1007/978-3-319-07581-5_14
DOI: https://doi.org/10.1007/978-3-319-07581-5_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07580-8
Online ISBN: 978-3-319-07581-5
eBook Packages: Engineering, Engineering (R0)