BWTCP: A Parallel Method for Constructing BWT in Large Collection of Genomic Reads
Short-read alignment and assembly are fundamental procedures in the analysis of DNA sequencing data. Many state-of-the-art short-read aligners employ the Burrows-Wheeler transform (BWT) as an in-memory index for the reference genome. The BWT is also used in genome assembly to index the reads. In a typical data set, the volume of reads can reach several hundred gigabases, so fast construction of the BWT index for reads is essential for efficient sequence processing. In this paper, we present BWTCP, a parallel method for large-scale BWT construction. BWTCP can harness heterogeneous computing resources, including multi-core CPUs, multiple CPUs, and accelerators such as GPUs or the Intel Xeon Phi, and it features a novel pruning strategy. Using BWTCP, we constructed the BWT for 1 billion 100 bp reads within 30 minutes using 16 compute nodes (2 CPUs per node) on the Tianhe-2 supercomputer. This significantly outperforms the baseline tool BCR, which would need 13 hours to process the same dataset. BWTCP is freely available at https://github.com/hwang91/BWTCP.
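As a toy illustration of what "the BWT of a collection of reads" means, the following minimal sketch builds it by naively sorting all suffixes. It is not the BWTCP method: it involves no parallelism and no pruning, and it is only practical for tiny inputs. The function name `bwt_of_reads` is hypothetical, and the sketch assumes each read is terminated by `$`, that `$` sorts before every base, and that terminators of different reads are ordered by read index.

```python
from typing import List


def bwt_of_reads(reads: List[str]) -> str:
    """Naive BWT of a read collection (illustrative only).

    Assumptions (not taken from the paper): every read ends with '$',
    '$' is smaller than any base, and identical suffixes from different
    reads are ordered by read index.
    """
    entries = []
    for read_index, read in enumerate(reads):
        terminated = read + "$"
        for start in range(len(terminated)):
            suffix = terminated[start:]
            # The BWT symbol is the character preceding the suffix;
            # for the whole read it wraps around to the terminator.
            preceding = terminated[start - 1] if start > 0 else "$"
            entries.append((suffix, read_index, preceding))
    # Sort by suffix text, breaking ties between reads by read index.
    entries.sort(key=lambda entry: (entry[0], entry[1]))
    return "".join(preceding for _, _, preceding in entries)


if __name__ == "__main__":
    print(bwt_of_reads(["AC", "CA"]))
```

Sorting every suffix explicitly takes memory and time far beyond what is feasible for hundreds of gigabases, which is the motivation for dedicated large-scale construction methods.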
Keywords: BWT, Genome assembly, BWTCP, BCR, CX1, Radix sort, Parallel computing
We acknowledge Prof. T.W. Lam, Project Manager Ruibang Luo, and C.M. Liu of the BAL lab, Department of Computer Science, The University of Hong Kong, for providing the source code and related data, and for constructive advice on both the design and testing of BWTCP. This work is supported by NSFC Grants 61272056, U1435222, 61133005, 61120106005, and 91432018.