Probabilistic Approach Processing Scheme Based on BLAST for Improving Search Speed of Bioinformatics

Wireless Personal Communications

Abstract

As heuristic algorithms have been studied increasingly in bioinformatics, information management in various bioinformatics fields (new drug development, medical diagnosis, agricultural product improvement, etc.) has centered mainly on the BLAST algorithm. However, many of the algorithms used with large genome databases rely on a complete sorting procedure, so searching the database for protein or nucleic acid sequences takes a long time, which causes many problems when processing large amounts of bio information. We propose a BLAST-based probabilistic access processing method that can manage, analyze, and process large amounts of distributed bio data on top of information and communication infrastructure and IT technology. The proposed method aims to improve data accessibility by linking weighted bioinformatics information with probability factors, so that large-capacity bio data can be accessed easily. In addition, the proposed scheme classifies the priority information allocated to the bioinformatics information by hierarchical grouping according to the degree of similarity, which ensures high accuracy of the search results; at the same time, it aims for low processing time by classifying information (type, attribute, priority, etc.) into weights by property. Previous researchers have suggested clustering algorithms for the fragmentation of genetic information to solve the haplotype assembly problem in genetics, or have proposed particle swarm optimization methods, similar to existing genetic algorithms, that use a heuristic clustering method based on the MEC model. In the performance evaluation, the proposed method improved accuracy by 13.5% on average and data retrieval efficiency by 19.7% on average compared with the previous scheme. The overhead of bioinformatics information processing was 8.8% lower, and the processing time was 13.5% lower on average.
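The abstract does not give the scheme's formulas, so the following is only a minimal sketch of the general idea it describes: weights are turned into probability factors, records are grouped into tiers by similarity, and the search visits the most similar, most probable records first. Everything here is an assumption made for illustration and is not the authors' method: the k-mer Jaccard score stands in for a BLAST similarity score, and the names `similarity`, `access_probabilities`, `group_by_similarity`, `search_order`, the `weight` field, and the tier thresholds are all hypothetical.

```python
from collections import defaultdict


def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=3):
    """Jaccard similarity of two k-mer sets (assumed stand-in for a BLAST score)."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


def access_probabilities(records):
    """Normalize each record's weight into a probability factor (sums to 1)."""
    total = sum(r["weight"] for r in records)
    return {r["id"]: r["weight"] / total for r in records}


def group_by_similarity(records, query, thresholds=(0.5, 0.2)):
    """Place records into tiers by similarity to the query (tier 0 = most similar)."""
    tiers = defaultdict(list)
    for r in records:
        s = similarity(r["seq"], query)
        # First threshold the score reaches; otherwise the lowest tier.
        tier = next((i for i, t in enumerate(thresholds) if s >= t), len(thresholds))
        tiers[tier].append(r)
    return tiers


def search_order(records, query):
    """Visit tiers from most to least similar; within a tier, highest probability first."""
    probs = access_probabilities(records)
    tiers = group_by_similarity(records, query)
    order = []
    for tier in sorted(tiers):
        order.extend(sorted(tiers[tier], key=lambda r: probs[r["id"]], reverse=True))
    return [r["id"] for r in order]


# Hypothetical toy data: "weight" is an assumed combination of type,
# attribute, and priority weights.
records = [
    {"id": "a", "seq": "ACGTACGT", "weight": 1.0},
    {"id": "b", "seq": "ACGTACGA", "weight": 3.0},
    {"id": "c", "seq": "TTTTTTTT", "weight": 6.0},
]
order = search_order(records, "ACGTACGT")
print(order)  # ['b', 'a', 'c']
```

In this toy run, record `c` has the largest weight but shares no k-mers with the query, so it falls into the last tier and is visited after `b` and `a`. The design choice illustrated is that similarity grouping decides the tier and the probability factor only orders records within a tier.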

Acknowledgements

This research was supported by the Tongmyong University Research Grants 2016 (2016A013).

Author information

Corresponding author

Correspondence to Seung-Soo Shin.

About this article

Cite this article

Jeong, YS., Shin, SS. Probabilistic Approach Processing Scheme Based on BLAST for Improving Search Speed of Bioinformatics. Wireless Pers Commun 105, 405–426 (2019). https://doi.org/10.1007/s11277-018-5955-3

  • DOI: https://doi.org/10.1007/s11277-018-5955-3
