Massively Parallel Sequence Alignment with BLAST Through Work Distribution Implemented Using PCJ Library
This article presents massively parallel execution of the BLAST algorithm on supercomputers and HPC clusters using thousands of processors. Our approach is based on optimally splitting the set of queries, which are then processed by the unmodified NCBI-BLAST package for sequence alignment. The work distribution and search management are implemented in Java using the PCJ (Parallel Computing in Java) library. The PCJ-BLAST package is responsible for reading the sequences to be compared, splitting them up, and starting multiple NCBI-BLAST executables. We also investigated the problem of parallel I/O, and thanks to the PCJ library we deliver high-throughput execution of BLAST. The presented results show that using Java and the PCJ library we achieved very good performance and efficiency. As a result, we have significantly reduced the time required for sequence analysis. We have also shown that the PCJ library can be used as an efficient tool for fast development of scalable applications.
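The splitting step described above can be illustrated with a minimal sketch. It is not the PCJ-BLAST code: it uses plain Java without the PCJ library, it balances blocks by record count only (the article's splitting criterion is not reproduced here), and the class name `QuerySplitter`, its method names, and the choice of the `blastn` program are assumptions made for illustration. Only the `-query`, `-db`, and `-out` options of the BLAST+ command line are taken as given.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of PCJ-BLAST: splits a FASTA query set into
// blocks and builds one BLAST command line per block.
final class QuerySplitter {

    /** Groups FASTA lines into records; each record starts at a '>' header line. */
    static List<List<String>> parseRecords(List<String> fastaLines) {
        List<List<String>> records = new ArrayList<>();
        List<String> current = null;
        for (String line : fastaLines) {
            if (line.startsWith(">")) {
                current = new ArrayList<>();
                records.add(current);
            }
            // Lines before the first header and empty lines are skipped.
            if (current != null && !line.isEmpty()) {
                current.add(line);
            }
        }
        return records;
    }

    /** Splits records into `parts` contiguous blocks whose sizes differ by at most one. */
    static List<List<List<String>>> split(List<List<String>> records, int parts) {
        if (parts <= 0) {
            throw new IllegalArgumentException("parts must be positive");
        }
        List<List<List<String>>> blocks = new ArrayList<>();
        int base = records.size() / parts;   // minimum records per block
        int extra = records.size() % parts;  // the first `extra` blocks get one more
        int start = 0;
        for (int i = 0; i < parts; i++) {
            int size = base + (i < extra ? 1 : 0);
            blocks.add(new ArrayList<>(records.subList(start, start + size)));
            start += size;
        }
        return blocks;
    }

    /** Builds the command line for one worker; the program name and paths are assumptions. */
    static List<String> blastCommand(String queryFile, String db, String outFile) {
        return Arrays.asList("blastn", "-query", queryFile, "-db", db, "-out", outFile);
    }

    public static void main(String[] args) {
        List<String> fasta = Arrays.asList(">q1", "ACGT", ">q2", "GGCC", ">q3", "TTAA");
        List<List<List<String>>> blocks = split(parseRecords(fasta), 2);
        for (int i = 0; i < blocks.size(); i++) {
            System.out.println("block " + i + ": " + blocks.get(i).size() + " record(s)");
            System.out.println(blastCommand("part" + i + ".fasta", "mydb", "part" + i + ".out"));
        }
    }
}
```

In a real run, each block would be written to its own query file and the corresponding command list passed to `java.lang.ProcessBuilder` on the worker that owns the block. Splitting by record count ignores sequence length, so blocks of equal size may still take very different times to search.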
Keywords: Sequence alignment, NGS, Next Generation Sequencing, Parallel programming, Java, BLAST, NCBI-BLAST, PCJ
The authors would like to thank the CHIST-ERA consortium for financial support under the HPDCJ project (Polish part funded by NCN grant 2014/14/Z/ST6/00007) and NordForsk for support within the NIASC consortium. The performance tests were performed using the computational facilities of ICM, University of Warsaw.