Probabilistic Approach Processing Scheme Based on BLAST for Improving Search Speed of Bioinformatics

Abstract

As heuristic algorithms have attracted growing attention in bioinformatics research, information management in its application fields (new drug development, medical diagnosis, agricultural product improvement, etc.) has been studied mainly around the BLAST algorithm. However, many of the algorithms used on large genome databases rely on a complete sorting procedure, so searching the database for protein or nucleic acid sequences takes a long time, which causes problems when processing large amounts of bio information. We propose a BLAST-based probabilistic access processing method that can manage, analyze, and process large amounts of distributed bio data on top of information and communication infrastructure and IT technology. The proposed method aims to improve data accessibility by linking weighted bioinformatics information with probability factors so that large-capacity bio data can be accessed easily. In addition, the proposed scheme hierarchically groups the priority information allocated to the bioinformatics information according to the degree of similarity, which ensures high accuracy of the search results; at the same time, it aims for low processing time by classifying information (type, attribute, priority, etc.) into weights by property. Previous researchers have suggested clustering algorithms for fragments of genetic information to solve the haplotype assembly problem in genetics, or have proposed particle swarm optimization methods, similar to existing genetic algorithms, that use a heuristic clustering method based on the MEC model. In the performance evaluation, the proposed method improved accuracy by 13.5% on average and data retrieval efficiency by 19.7% on average compared with the previous scheme. The overhead of bioinformatics information processing was 8.8% lower, and the processing time was 13.5% lower on average.
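
The abstract does not give the scheme's formulas, so the following is only a minimal sketch of the general idea it describes: records are grouped into similarity tiers, each tier carries a weight, the weights are normalized into access probabilities, and the search visits the most probable tier first. The function names, the similarity thresholds, and the tier weights are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical illustration only: thresholds, weights, and names are assumptions,
# not the authors' published method.

def group_by_similarity(records, thresholds=(0.8, 0.5)):
    """Assign each (name, similarity) record to a tier; tier 0 is the most similar."""
    groups = defaultdict(list)
    for name, similarity in records:
        # Count how many thresholds the similarity falls below.
        tier = sum(similarity < t for t in thresholds)
        groups[tier].append(name)
    return dict(groups)

def access_probabilities(groups, tier_weights):
    """Normalize the weights of the tiers that are present into probabilities."""
    total = sum(tier_weights[tier] for tier in groups)
    return {tier: tier_weights[tier] / total for tier in groups}

def search_order(groups, probabilities):
    """Yield record names, visiting tiers from most to least probable."""
    for tier in sorted(groups, key=lambda t: probabilities[t], reverse=True):
        yield from groups[tier]

records = [("seqA", 0.91), ("seqB", 0.42), ("seqC", 0.67), ("seqD", 0.85)]
groups = group_by_similarity(records)
probs = access_probabilities(groups, tier_weights={0: 3.0, 1: 2.0, 2: 1.0})
order = list(search_order(groups, probs))
print(order)
```

In this toy run the two high-similarity records are visited before the lower tiers, which is the kind of prioritized access the abstract attributes to the weighting step; a real system would pass the prioritized candidates to BLAST rather than just listing them.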

Acknowledgements

This research was supported by the Tongmyong University Research Grants 2016 (2016A013).

Author information

Correspondence to Seung-Soo Shin.

About this article

Cite this article

Jeong, Y., Shin, S. Probabilistic Approach Processing Scheme Based on BLAST for Improving Search Speed of Bioinformatics. Wireless Pers Commun 105, 405–426 (2019). https://doi.org/10.1007/s11277-018-5955-3

Keywords

  • Bioinformatics
  • BLAST
  • Probability
  • Distributed data management
  • Algorithm
  • Cloud
  • Networking
  • Computing