A Survey of Data Mining and Deep Learning in Bioinformatics

Abstract

Medical science and health informatics have made great progress in recent years, and the generation, collection and accumulation of massive data now demand in-depth analytics. At the same time, novel technologies are beginning to analyze and extract knowledge from these very large volumes of data, opening considerable potential for information growth. Machine learning and deep learning play an increasingly significant role in the success of bioinformatics research on biological data, and both industry and academia have emphasized and established links between these two data analytics techniques and bioinformatics. This survey reviews recent research that applies data mining and deep learning approaches to domain-specific problems in bioinformatics. The authors give a brief summary of numerous data mining algorithms used for preprocessing, classification and clustering, as well as various optimized neural network architectures used in deep learning, and discuss and compare their advantages and disadvantages in practical applications in terms of industrial usage. The review is intended to provide valuable insights for those who are starting to use data analytics methods in bioinformatics.

Funding

The authors are thankful for the financial support from the following research grants: 1) MYRG2015–00024-FST, titled ‘Building Sustainable Knowledge Networks through Online Communities’, offered by RDAO/FST, University of Macau and Macau SAR government; 2) MYRG2016–00069, titled ‘Nature-Inspired Computing and Metaheuristics Algorithms for Optimizing Data Mining Performance’, offered by RDAO/FST, University of Macau and Macau SAR government; 3) FDCT/126/2014/A3, titled ‘A Scalable Data Stream Mining Methodology: Stream-based Holistic Analytics and Reasoning in Parallel’, offered by FDCT of Macau SAR government.

Author information

Corresponding author

Correspondence to Simon Fong.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

This article is part of the Topical Collection on Image & Signal Processing

Cite this article

Lan, K., Wang, Dt., Fong, S. et al. A Survey of Data Mining and Deep Learning in Bioinformatics. J Med Syst 42, 139 (2018). https://doi.org/10.1007/s10916-018-1003-9

Keywords

  • Bioinformatics
  • Biomedicine
  • Data mining
  • Machine learning
  • Deep learning