Resolving Load Balancing Issues in BWA on NUMA Multicore Architectures
Running BWA in multithreaded mode on a multi-socket server results in poor scaling behaviour. This is because the current parallelisation strategy does not take into account the load imbalance inherent in the data being aligned, e.g. varying read lengths and numbers of mutations. Further load imbalance arises because the BWA code does not anticipate certain hardware characteristics of multi-socket multicores, such as non-uniform memory access (NUMA), where memory access times differ between cores. We show that rewriting the parallel section using Cilk removes the load imbalance, resulting in a factor of two performance improvement over the original BWA.
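The load-balancing argument can be illustrated with a small toy model. This is not BWA or Cilk code: it compares the finishing time (makespan) of a static round-robin split of reads over threads with greedy dynamic scheduling, where each idle thread takes the next read. Cilk itself uses randomized work stealing rather than a shared queue; greedy assignment is a simpler stand-in with a similar balancing effect. The function names, the per-read `costs`, and the per-thread `slowdown` factors (a crude model of cores that see slower, remote NUMA memory) are all hypothetical.

```python
import heapq


def static_makespan(costs, slowdown):
    """Static split (hypothetical model): read i is fixed to thread i % n
    up front, regardless of how expensive each read turns out to be."""
    n = len(slowdown)
    loads = [0] * n
    for i, cost in enumerate(costs):
        t = i % n
        loads[t] += cost * slowdown[t]  # slower (e.g. remote-NUMA) threads take longer
    return max(loads)


def dynamic_makespan(costs, slowdown):
    """Greedy dynamic scheduling: the next read always goes to whichever
    thread becomes idle first, tracked with a min-heap of (finish_time, thread)."""
    n = len(slowdown)
    heap = [(0, t) for t in range(n)]  # every thread starts idle at time 0
    heapq.heapify(heap)
    finish = [0] * n
    for cost in costs:
        now, t = heapq.heappop(heap)  # earliest-idle thread claims this read
        finish[t] = now + cost * slowdown[t]
        heapq.heappush(heap, (finish[t], t))
    return max(finish)


# Two expensive reads (e.g. long or highly mutated) among cheap ones, 4 equal threads.
costs = [8, 1, 1, 1, 8, 1, 1, 1]
print(static_makespan(costs, [1, 1, 1, 1]))   # 16: both expensive reads land on thread 0
print(dynamic_makespan(costs, [1, 1, 1, 1]))  # 9

# Uniform reads, but one thread runs 3x slower (a toy stand-in for remote memory).
print(static_makespan([1] * 8, [1, 1, 1, 3]))   # 6
print(dynamic_makespan([1] * 8, [1, 1, 1, 3]))  # 3
```

In both cases the static split is bounded by its unluckiest thread, while dynamic assignment lets faster or less-loaded threads absorb the remaining work.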
Keywords: BWA, Multithreading, NUMA, Load balancing, Cilk
This work is funded by Intel, Janssen Pharmaceutica and by the Institute for the Promotion of Innovation through Science and Technology in Flanders (IWT).