Parka: A Parallel Implementation of BLAST with MapReduce
Bioinformatics applications have become increasingly data-intensive and compute-intensive, which calls for effective parallel computing methods that deliver high throughput. Although some tools already parallelize BLAST, most of them depend on complex platforms or software. This paper presents Parka, a parallel implementation of BLAST built on Spark. Its parallel execution time and speedup are evaluated in a cluster environment and then compared with a Hadoop-based parallelization method. The results show that Parka is a scalable and effective parallelization approach for sequence alignment.
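The abstract does not say how Parka divides the work, so the following is only a minimal sketch of one common strategy for MapReduce-style BLAST, query segmentation: the query FASTA file is split at record boundaries into partitions, each partition is searched independently, and the hits are merged. Every name here (`parse_fasta`, `partition`, `search_partition`, `run`) is hypothetical, the exact-substring search is a toy stand-in for a real BLAST invocation, and a plain Python loop stands in for the map step that a framework such as Spark would distribute across a cluster.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (sequence id, sequence)


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA text into (header, sequence) records."""
    records: List[Record] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def partition(records: List[Record], n: int) -> List[List[Record]]:
    """Deal whole records round-robin into n partitions."""
    if n < 1:
        raise ValueError("n must be at least 1")
    parts: List[List[Record]] = [[] for _ in range(n)]
    for i, rec in enumerate(records):
        parts[i % n].append(rec)
    return parts


def search_partition(part: List[Record], database: Dict[str, str]) -> List[Tuple[str, str]]:
    """Toy stand-in for BLAST (assumption): report exact substring matches only."""
    return [
        (qid, sid)
        for qid, qseq in part
        for sid, sseq in database.items()
        if qseq in sseq
    ]


def run(queries_fasta: str, database: Dict[str, str], n: int) -> List[Tuple[str, str]]:
    """Map the search over n independent partitions, then merge the hits."""
    parts = partition(parse_fasta(queries_fasta), n)
    # Partitions share no state, so this map step is what a cluster
    # framework would run in parallel on separate workers.
    per_partition = [search_partition(p, database) for p in parts]
    return sorted(hit for hits in per_partition for hit in hits)
```

Because each query record is searched independently of the others, the merged result does not depend on the number of partitions; that independence is what makes this kind of workload a natural fit for a map operation.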
Keywords: BLAST, MapReduce, Data-intensive computing, Bioinformatics applications
This work is partly supported by the National Natural Science Foundation of China (No. 61602169), the Natural Science Foundation of Hunan Province (No. 2015JJ3071), and the Scientific Research Fund of Hunan Provincial Education Department (No. 16C0643).