Parka: A Parallel Implementation of BLAST with MapReduce

Zhang, Li; Tang, Bing

doi:10.1007/978-3-319-69096-4_26

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 686)

Included in the following conference series:

International Conference on Intelligent and Interactive Systems and Applications

Abstract

Bioinformatics applications have become increasingly data-intensive and compute-intensive, which calls for an effective way to parallelize computation and achieve high throughput. Although some tools already parallelize BLAST, most of them depend on complex platforms or software. This paper presents Parka, a parallel implementation of BLAST built on Spark. The parallel execution time and speedup of Parka are evaluated in a cluster environment and compared with a Hadoop-based parallelization method. The results show that Parka is a scalable and effective parallelization approach for sequence alignment.
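
The abstract evaluates Parka by its parallel execution time and speedup. As a minimal sketch of how such figures are conventionally derived, the Python below computes speedup S(p) = T(1) / T(p) and efficiency E(p) = S(p) / p from measured wall-clock times. The function names and the timing values are assumptions made for illustration only; they are not taken from the paper and do not represent its results.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup S(p) = T(1) / T(p), the ratio of serial to parallel run time."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("execution times must be positive")
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, workers: int) -> float:
    """Parallel efficiency E(p) = S(p) / p for p worker nodes."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return speedup(t_serial, t_parallel) / workers


# Hypothetical wall-clock times in seconds, keyed by worker count.
# These are illustrative placeholders, NOT measurements from the paper.
timings = {1: 1200.0, 2: 640.0, 4: 350.0, 8: 200.0}

for p, t in sorted(timings.items()):
    s = speedup(timings[1], t)
    e = efficiency(timings[1], t, p)
    print(f"workers={p}: speedup={s:.2f}, efficiency={e:.2f}")
```

An efficiency close to 1 would indicate near-linear scaling; values that fall as workers are added usually point to scheduling, I/O, or data-distribution overhead.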

Acknowledgments

This work is partly supported by the National Natural Science Foundation of China (No. 61602169), the Natural Science Foundation of Hunan Province (No. 2015JJ3071), and the Scientific Research Fund of Hunan Provincial Education Department (No. 16C0643).

Author information

Authors and Affiliations

School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan, 411201, Hunan, China
Li Zhang & Bing Tang

Corresponding author

Correspondence to Li Zhang.

Editor information

Editors and Affiliations

Technical University of Catalonia, Barcelona, Spain
Fatos Xhafa
Department of Computer Science and Engineering, Faculty of Engineering and Technology, SOA University, Bhubaneswar, Odisha, India
Srikanta Patnaik
School of Information Technologies, University of Sydney, Sydney, New South Wales, Australia
Albert Y. Zomaya

About this paper

Cite this paper

Zhang, L., Tang, B. (2018). Parka: A Parallel Implementation of BLAST with MapReduce. In: Xhafa, F., Patnaik, S., Zomaya, A. (eds) Advances in Intelligent Systems and Interactive Applications. IISA 2017. Advances in Intelligent Systems and Computing, vol 686. Springer, Cham. https://doi.org/10.1007/978-3-319-69096-4_26

DOI: https://doi.org/10.1007/978-3-319-69096-4_26
Published: 01 November 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69095-7
Online ISBN: 978-3-319-69096-4
eBook Packages: Engineering (R0)
