Probabilistic Approach Processing Scheme Based on BLAST for Improving Search Speed of Bioinformatics

Jeong, Yoon-Su; Shin, Seung-Soo

doi:10.1007/s11277-018-5955-3

Published: 14 September 2018

Wireless Personal Communications, Volume 105, pages 405–426 (2019)

Yoon-Su Jeong¹ &
Seung-Soo Shin²

Abstract

As heuristic algorithms have drawn increasing attention in bioinformatics research, information management in bioinformatics fields such as new drug development, medical diagnosis, and agricultural product improvement has centered on the BLAST algorithm. However, many of the algorithms used with large genome databases rely on a complete sorting procedure, so searching the database for protein or nucleic acid sequences takes a long time, which causes problems when processing large amounts of biological information. We propose a BLAST-based probabilistic access processing method that can manage, analyze, and process large amounts of distributed biological data on top of information and communication infrastructure and IT technology. The proposed method aims to improve data accessibility by linking weighted bioinformatics information with probability factors, making large-capacity biological data easier to access. In addition, the proposed scheme classifies the priority information assigned to the bioinformatics information by hierarchical grouping according to degree of similarity, which is intended to keep the search results highly accurate; at the same time, it aims for low processing time by classifying information (type, attribute, priority, etc.) into weights by property. Previous researchers have suggested clustering algorithms for fragments of genetic information to solve the haplotype assembly problem in genetics, or have proposed particle swarm optimization methods, similar to existing genetic algorithms, that use a heuristic clustering method based on the MEC model. In the performance evaluation, the proposed method improved accuracy by an average of 13.5% and data retrieval efficiency by an average of 19.7% compared with the previous scheme. The overhead of bioinformatics information processing was 8.8% lower, and the processing time was 13.5% lower on average.
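
The abstract describes the scheme only at a high level, so the following Python sketch is an illustration of the general idea rather than the authors' algorithm. It assumes, purely for demonstration, that similarity is measured as the Jaccard overlap of 3-mers, that hierarchical grouping can be approximated by fixed similarity thresholds, and that each record carries a single weight that is normalized into an access probability. All function names, thresholds, and sample sequences are hypothetical.

```python
from collections import defaultdict


def kmers(seq, k=3):
    """Return the set of length-k words in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=3):
    """Jaccard similarity of two sequences' k-mer sets (assumed measure)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def access_probabilities(weights):
    """Normalize non-negative record weights so that they sum to 1."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def group_by_similarity(query, database, thresholds=(0.5, 0.2)):
    """Place each record in a tier: 0 is most similar to the query."""
    tiers = defaultdict(list)
    for name, seq in database.items():
        score = similarity(query, seq)
        # First threshold the score reaches decides the tier;
        # records below every threshold fall into the last tier.
        tier = next(
            (i for i, t in enumerate(thresholds) if score >= t),
            len(thresholds),
        )
        tiers[tier].append((name, score))
    return tiers


def ranked_search(query, database, weights):
    """Visit tiers from most to least similar; inside a tier,
    order records by access probability times similarity."""
    probs = access_probabilities(weights)
    tiers = group_by_similarity(query, database)
    ordered = []
    for tier in sorted(tiers):
        members = sorted(
            tiers[tier], key=lambda m: probs[m[0]] * m[1], reverse=True
        )
        ordered.extend(name for name, _ in members)
    return ordered


# Hypothetical toy data, not taken from the paper.
database = {"a": "ACGTACGT", "b": "ACGTTTTT", "c": "GGGGCCCC"}
weights = {"a": 1.0, "b": 1.0, "c": 2.0}
print(ranked_search("ACGTACGT", database, weights))
```

The point of the tiering is that a search can stop after the most similar groups instead of scoring every record in full, which is one plausible reading of how grouping by similarity and weighting by property could reduce processing time.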

Acknowledgements

This research was supported by the Tongmyong University Research Grants 2016 (2016A013).

Author information

Authors and Affiliations

Department of Information Communication Engineering, Mokwon University, 88 Doanbuk-ro, Seo-gu, Daejeon, 35349, Korea
Yoon-Su Jeong
Department of Information Security, Tongmyong University, 428, Sinseonno, Nam-gu, Busan, 48520, Korea
Seung-Soo Shin

Corresponding author

Correspondence to Seung-Soo Shin.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Jeong, YS., Shin, SS. Probabilistic Approach Processing Scheme Based on BLAST for Improving Search Speed of Bioinformatics. Wireless Pers Commun 105, 405–426 (2019). https://doi.org/10.1007/s11277-018-5955-3

Issue Date: 30 March 2019
DOI: https://doi.org/10.1007/s11277-018-5955-3
