Massively Parallel Sequence Alignment with BLAST Through Work Distribution Implemented Using PCJ Library

  • Marek Nowicki
  • Davit Bzhalava
  • Piotr Bała
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10393)

Abstract

This article presents massively parallel execution of the BLAST algorithm on supercomputers and HPC clusters using thousands of processors. Our work is based on optimally splitting the set of queries, which are then run with the unmodified NCBI-BLAST package for sequence alignment. The work distribution and search management have been implemented in Java using the PCJ (Parallel Computing in Java) library. The PCJ-BLAST package is responsible for reading the input sequences for comparison, splitting them up, and starting multiple NCBI-BLAST executables. We also investigated the problem of parallel I/O and, thanks to the PCJ library, we deliver high-throughput execution of BLAST. The presented results show that, using Java and the PCJ library, we achieved very good performance and efficiency. As a result, we have significantly reduced the time required for sequence analysis. We have also shown that the PCJ library can be used as an efficient tool for fast development of scalable applications.
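
The query-splitting idea described above can be illustrated with a minimal Java sketch. It is not the PCJ-BLAST code and uses no PCJ calls. The class and method names (`QuerySplitSketch`, `parseRecords`, `distribute`, `blastCommand`) are hypothetical, and round-robin dealing is a simple stand-in for whatever splitting strategy the authors use. The choice of the `blastn` program and the database name `nt` are also assumptions; only the `-query`, `-db`, and `-out` options of NCBI BLAST+ are relied on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class QuerySplitSketch {

    // Split FASTA text into records; each record starts at a '>' header line.
    public static List<String> parseRecords(String fasta) {
        List<String> records = new ArrayList<>();
        StringBuilder current = null;
        for (String line : fasta.split("\\R")) {
            if (line.startsWith(">")) {
                if (current != null) {
                    records.add(current.toString());
                }
                current = new StringBuilder();
            }
            // Lines before the first header and empty lines are skipped.
            if (current != null && !line.isEmpty()) {
                current.append(line).append('\n');
            }
        }
        if (current != null) {
            records.add(current.toString());
        }
        return records;
    }

    // Deal records round-robin into one chunk per worker (assumed strategy).
    public static List<List<String>> distribute(List<String> records, int workers) {
        List<List<String>> chunks = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            chunks.add(new ArrayList<>());
        }
        for (int i = 0; i < records.size(); i++) {
            chunks.get(i % workers).add(records.get(i));
        }
        return chunks;
    }

    // Command line for one unmodified BLAST process working on one chunk file.
    public static List<String> blastCommand(String queryFile, String db, String outFile) {
        return Arrays.asList("blastn", "-query", queryFile, "-db", db, "-out", outFile);
    }

    public static void main(String[] args) {
        String fasta = ">q1\nACGT\n>q2\nGGCC\nTTAA\n>q3\nACGTACGT\n";
        List<List<String>> chunks = distribute(parseRecords(fasta), 2);
        for (int w = 0; w < chunks.size(); w++) {
            // Each worker would write its chunk to a file and launch the command,
            // for example with new ProcessBuilder(command).start().
            List<String> command = blastCommand("chunk" + w + ".fasta", "nt", "chunk" + w + ".out");
            System.out.println("worker " + w + ": " + chunks.get(w).size() + " queries");
            System.out.println(String.join(" ", command));
        }
    }
}
```

The sketch only prints the commands instead of launching them, so it runs without a BLAST installation. In a real deployment each worker would also need its own output file, as shown, so that concurrent processes do not write to the same path.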

Keywords

Sequence alignment, NGS, Next Generation Sequencing, Parallel programming, Java, BLAST, NCBI-BLAST, PCJ

Notes

Acknowledgments

The authors would like to thank the CHIST-ERA consortium for financial support under the HPDCJ project (Polish part funded by NCN grant 2014/14/Z/ST6/00007) and NordForsk for support within the NIASC consortium. The performance tests were performed using the computational facilities of ICM, University of Warsaw.

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Faculty of Mathematics and Computer Science, Nicolaus Copernicus University, Toruń, Poland
  2. Department of Laboratory Medicine, F46, Karolinska Institutet, Stockholm, Sweden
  3. Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, Warsaw, Poland