Massive datasets and machine learning for computational biomedicine: trends and challenges

  • Anton Kocheturov
  • Panos M. Pardalos
  • Athanasia Karakitsiou
S.I.: Computational Biomedicine


This survey paper attempts to cover a broad range of topics in computational biomedicine. The field has attracted considerable attention because of the benefits it can offer society, and recent technological and theoretical advances have enabled substantial progress. Problems arising in this field have traditionally been challenging from many perspectives. In this paper, we consider the influence of big data on the field, the problems associated with massive datasets in biomedicine, and ways to address these problems. We analyze the most commonly used machine learning and feature mining tools, as well as several new trends in computational biomedicine, such as deep learning and biological networks.



Panos Pardalos was partially supported by the Laboratory of Algorithm and Technologies for Network Analysis, Nizhny Novgorod, Russia.


  1. Abdi, H., & Williams, L. J. (2010). Principal component analysis. Wiley Interdisciplinary Reviews: Computational Statistics, 2(4), 433–459.CrossRefGoogle Scholar
  2. Abeyratne, U. R., Tun, A. K., Lye, N. T., Guanglan, Z., & Saratchandran, P. (2000). RBF networks for source localization in quantitative electrophysiology. Critical Reviews in Biomedical Engineering, 28(3&4), 463–472.CrossRefGoogle Scholar
  3. Acharya, U. R., Faust, O., Kadri, N. A., Suri, J. S., & Yu, W. (2013). Automated identification of normal and diabetes heart rate signals using nonlinear measures. Computers in Biology and Medicine, 43(10), 1523–1529.CrossRefGoogle Scholar
  4. Acharya, U. R., Sree, S. V., Ang, P. C. A., Yanti, R., & Suri, J. S. (2012). Application of non-linear and wavelet based features for the automated identification of epileptic EEG signals. International Journal of Neural Systems, 22(02), 1250002.CrossRefGoogle Scholar
  5. Aizerman, M. A., Braverman, E. M., & Rozonoer, L. I. (1964). Theoretical foundations of potential function method in pattern recognition. Automation and Remote Control, 25, 917–936.Google Scholar
  6. Albarqouni, S., Baur, C., Achilles, F., Belagiannis, V., Demirci, S., & Navab, N. (2016). Aggnet: Deep learning from crowds for mitosis detection in breast cancer histology images. IEEE Transactions on Medical Imaging, 35(5), 1313–1321.CrossRefGoogle Scholar
  7. Albert, R., Jeong, H., & Barabási, A.-L. (1999). Internet: Diameter of the world-wide web. Nature, 401(6749), 130.CrossRefGoogle Scholar
  8. Almeida, L. B. (2003). Misep-linear and nonlinear ica based on mutual information. Journal of Machine Learning Research, 4, 1297–1318.Google Scholar
  9. Azevedo, F. A. C., Carvalho, L. R. B., Grinberg, L. T., Farfel, J. M., Ferretti, R. E. L., Leite, R. E. P., et al. (2009). Equal numbers of neuronal and nonneuronal cells make the human brain an isometrically scaled-up primate brain. Journal of Comparative Neurology, 513(5), 532–541.CrossRefGoogle Scholar
  10. Balasubramanian, M., & Schwartz, E. L. (2002). The isomap algorithm and topological stability. Science, 295(5552), 7–7.CrossRefGoogle Scholar
  11. Baldi, P. (2012). Autoencoders, unsupervised learning, and deep architectures. In Proceedings of ICML workshop on unsupervised and transfer learning (pp. 37–49).Google Scholar
  12. Barabási, A.-L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439), 509–512.CrossRefGoogle Scholar
  13. Barua, S., Islam, M. M., Yao, X., & Murase, K. (2014). Mwmote-majority weighted minority oversampling technique for imbalanced data set learning. IEEE Transactions on Knowledge and Data Engineering, 26(2), 405–425.CrossRefGoogle Scholar
  14. Batal, I., Cooper, G. F., Fradkin, D., Harrison, J., Moerchen, F., & Hauskrecht, M. (2016). An efficient pattern mining approach for event detection in multivariate temporal data. Knowledge and Information Systems, 46(1), 115–150.CrossRefGoogle Scholar
  15. Bock, D. D., Lee, W.-C. A., Kerlin, A. M., Andermann, M. L., Hood, G., Wetzel, A. W., et al. (2011). Network anatomy and in vivo physiology of visual cortical neurons. Nature, 471(7337), 177–182.CrossRefGoogle Scholar
  16. Boginski, V., & Commander, C. W. (2009). Identifying critical nodes in protein–protein interaction networks. In Clustering challenges in biological networks (pp. 153–167). World Scientific.Google Scholar
  17. Borghini, G., Astolfi, L., Vecchiato, G., Mattia, D., & Babiloni, F. (2014). Measuring neurophysiological signals in aircraft pilots and car drivers for the assessment of mental workload, fatigue and drowsiness. Neuroscience & Biobehavioral Reviews, 44, 58–75.CrossRefGoogle Scholar
  18. Boser, B. E., Guyon, I. M., Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. In Proceedings of the fifth annual workshop on computational learning theory (pp. 144–152). ACM.Google Scholar
  19. Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123–140.Google Scholar
  20. Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.CrossRefGoogle Scholar
  21. Breiman, L., Friedman, J., Stone, C. J., & Olshen, R. A. (1984). Classification and regression trees. Boca Raton: CRC press.Google Scholar
  22. Brosch, T., Tang, L. Y. W., Yoo, Y., Li, D. K. B., Traboulsee, A., & Tam, R. (2016). Deep 3d convolutional encoder networks with shortcuts for multiscale feature integration applied to multiple sclerosis lesion segmentation. IEEE Transactions on Medical Imaging, 35(5), 1229–1239.CrossRefGoogle Scholar
  23. Butenko, S., Chaovalitwongse, W. A., & Pardalos, P. M. (2009). Clustering challenges in biological networks. Singapore: World Scientific.CrossRefGoogle Scholar
  24. Button, K. S., Ioannidis, J. P. A., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S. J., et al. (2013). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14(5), 365.CrossRefGoogle Scholar
  25. Chandola, V., Banerjee, A., & Kumar, V. (2009). Anomaly detection: A survey. ACM Computing Surveys (CSUR), 41(3), 15.CrossRefGoogle Scholar
  26. Chan, H.-P., Lo, S.-C. B., Sahiner, B., Lam, K. L., & Helvie, M. A. (1995). Computer-aided detection of mammographic microcalcifications: Pattern recognition with an artificial neural network. Medical Physics, 22(10), 1555–1567.CrossRefGoogle Scholar
  27. Chang, H.-H., & Moura, J. M. F. (2010). Biomedical signal processing. Biomedical Engineering and Design Handbook, 2, 559–579.Google Scholar
  28. Chang, R. L., Ghamsari, L., Manichaikul, A., Hom, E. F. Y., Balaji, S., Weiqi, F., et al. (2011). Metabolic network reconstruction of chlamydomonas offers insight into light-driven algal metabolism. Molecular Systems Biology, 7(1), 518.CrossRefGoogle Scholar
  29. Chang, Y. D. C., Ido, M. S., & Long, Q. (2016). Multiple imputation for general missing data patterns in the presence of high-dimensional data. Scientific Reports, 6, 21689.CrossRefGoogle Scholar
  30. Chaovalitwongse, W. A., & Pardalos, P. M. (2008). On the time series support vector machine using dynamic time warping kernel for brain activity classification. Cybernetics and Systems Analysis, 44(1), 125–138.CrossRefGoogle Scholar
  31. Charles, D., Gabriel, M., & Furukawa, M. F. (2013). Adoption of electronic health record systems among us non-federal acute care hospitals: 2008–2012. ONC Data Brief, 9, 1–9.Google Scholar
  32. Chawla, M. P. S. (2011). Pca and ica processing methods for removal of artifacts and noise in electrocardiograms: A survey and comparison. Applied Soft Computing, 11(2), 2216–2226.CrossRefGoogle Scholar
  33. Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). Smote: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321–357.CrossRefGoogle Scholar
  34. Chou, K.-C., & Shen, H.-B. (2007). Recent progress in protein subcellular location prediction. Analytical Biochemistry, 370(1), 1–16.CrossRefGoogle Scholar
  35. CireşAn, D., Meier, U., Masci, J., & Schmidhuber, J. (2012). Multi-column deep neural network for traffic sign classification. Neural Networks, 32, 333–338.CrossRefGoogle Scholar
  36. Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297.Google Scholar
  37. Crookston, N. L., Finley, A. O., et al. (2008). yaimpute: An R package for kNN imputation. Journal of Statistical Software, 23(10), 1–16.CrossRefGoogle Scholar
  38. Csermely, P., Korcsmáros, T., Kiss, H. J. M., London, G., & Nussinov, R. (2013). Structure and dynamics of molecular networks: A novel paradigm of drug discovery: A comprehensive review. Pharmacology & Therapeutics, 138(3), 333–408.CrossRefGoogle Scholar
  39. de Rooij, M., Crienen, S., Witjes, J. A., Barentsz, J. O., Rovers, M. M., & Grutters, J. P. C. (2014). Cost-effectiveness of magnetic resonance (mr) imaging and mr-guided targeted biopsy versus systematic transrectal ultrasound-guided biopsy in diagnosing prostate cancer: A modelling study from a health care perspective. European Urology, 66(3), 430–436.CrossRefGoogle Scholar
  40. De Solla Price, D. J. (1965). Networks of scientific papers. Science, 149, 510–515.CrossRefGoogle Scholar
  41. Dehzangi, A., Paliwal, K., Sharma, A., Dehzangi, O., & Sattar, A. (2013). A combination of feature extraction methods with an ensemble of different classifiers for protein structural class prediction problem. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), 10(3), 564–575.CrossRefGoogle Scholar
  42. Delorme, A., Sejnowski, T., & Makeig, S. (2007). Enhanced detection of artifacts in EEG data using higher-order statistics and independent component analysis. Neuroimage, 34(4), 1443–1449.CrossRefGoogle Scholar
  43. Donoho, D. L., & Grimes, C. (2003). Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data. Proceedings of the National Academy of Sciences, 100(10), 5591–5596.CrossRefGoogle Scholar
  44. Drummond, C., Holte, R. C., et al. (2003). C4.5, class imbalance, and cost sensitivity: Why under-sampling beats over-sampling. In Workshop on learning from imbalanced datasets II (Vol. 11, pp. 1–8). Citeseer.Google Scholar
  45. Duarte, N. C., Becker, S. A., Jamshidi, N., Thiele, I., Mo, M. L., Vo, T. D., et al. (2007). Global reconstruction of the human metabolic network based on genomic and bibliomic data. Proceedings of the National Academy of Sciences, 104(6), 1777–1782.CrossRefGoogle Scholar
  46. Efron, B., Hastie, T., Johnstone, I., Tibshirani, R., et al. (2004). Least angle regression. The Annals of Statistics, 32(2), 407–499.CrossRefGoogle Scholar
  47. Eguiluz, V. M., Chialvo, D. R., Cecchi, G. A., Baliki, M., & Apkarian, A. V. (2005). Scale-free brain functional networks. Physical Review Letters, 94(1), 018102.CrossRefGoogle Scholar
  48. Eisenstein, M. (2015). Big data: The power of petabytes. Nature, 527(7576), S2–S4.CrossRefGoogle Scholar
  49. Elbuni, A., Kanoun, S., Elbuni, M., & Ali, N. (2009). ECG parameter extraction algorithm using (dwtae) algorithm. In International conference on computer engineering & systems, 2009. ICCES 2009 (pp. 315–320). IEEE.Google Scholar
  50. Elkan, C. (2001). The foundations of cost-sensitive learning. In International joint conference on artificial intelligence (Vol. 17, pp. 973–978). Lawrence Erlbaum Associates Ltd.Google Scholar
  51. Enders, C. K. (2010). Applied missing data analysis. Guilford Press.Google Scholar
  52. Fan, W., Stolfo, S. J., Zhang, J., & Chan, P. K. (1999). Adacost: Misclassification cost-sensitive boosting. In Icml (Vol. 99, pp. 97–105).Google Scholar
  53. Faust, O., Acharya, U. R., Adeli, H., & Adeli, A. (2015). Wavelet-based EEG processing for computer-aided seizure detection and epilepsy diagnosis. Seizure-European Journal of Epilepsy, 26, 56–64.CrossRefGoogle Scholar
  54. Ferrari, M., & Quaresima, V. (2012). A brief review on the history of human functional near-infrared spectroscopy (fnirs) development and fields of application. Neuroimage, 63(2), 921–935.CrossRefGoogle Scholar
  55. Freeman, L. (1977). A set of measures of centrality based on betweenness. Sociometry, 40(1), 35–41. Scholar
  56. Freund, Y., Schapire, R. E., et al. (1996). Experiments with a new boosting algorithm. In Icml (Vol. 96, pp. 148–156). Bari, Italy.Google Scholar
  57. Friedman, J. H. (1991). Multivariate adaptive regression splines. The Annals of Statistics, 19, 1–67.CrossRefGoogle Scholar
  58. Friedman, J. H. (2002). Stochastic gradient boosting. Computational Statistics & Data Analysis, 38(4), 367–378.CrossRefGoogle Scholar
  59. Furnival, G. M., & Wilson, R. W. (1974). Regressions by leaps and bounds. Technometrics, 16(4), 499–511.CrossRefGoogle Scholar
  60. Gao, Z.-K., Cai, Q., Yang, Y.-X., Dang, W.-D., & Zhang, S.-S. (2016). Multiscale limited penetrable horizontal visibility graph for analyzing nonlinear time series. Scientific Reports, 6, 35622.CrossRefGoogle Scholar
  61. Gardner, A. B., Worrell, G. A., Marsh, E., Dlugos, D., & Litt, B. (2007). Human and automated detection of high-frequency oscillations in clinical intracranial EEG recordings. Clinical Neurophysiology, 118(5), 1134–1143.CrossRefGoogle Scholar
  62. Gilchrist, J., Ennett, C.M., Frize, M., & Bariciak, E. (2011). Neonatal mortality prediction using real-time medical measurements. In 2011 IEEE international workshop on medical measurements and applications proceedings (MeMeA) (pp. 65–70). IEEE.Google Scholar
  63. Glasser, M. F., Coalson, T. S., Robinson, E. C., Hacker, C. D., Harwell, J., Yacoub, E., et al. (2016). A multi-modal parcellation of human cerebral cortex. Nature, 536(7615), 171–178.CrossRefGoogle Scholar
  64. Goel, S., Tomar, P., & Kaur, G. (2016). An optimal wavelet approach for ECG noise cancellation. International Journal of Bio-Science and Bio-Technology, 8(4), 39–52.CrossRefGoogle Scholar
  65. Gong, G., He, Y., Concha, L., Lebel, C., Gross, D. W., Evans, A. C., et al. (2008). Mapping anatomical connectivity patterns of human cerebral cortex using in vivo diffusion tensor imaging tractography. Cerebral Cortex, 19(3), 524–536.CrossRefGoogle Scholar
  66. Gorber, S. C., Tremblay, M., Moher, D., & Gorber, B. (2007). A comparison of direct vs. self-report measures for assessing height, weight and body mass index: A systematic review. Obesity Reviews, 8(4), 307–326.CrossRefGoogle Scholar
  67. Graves, A., Mohamed, A., & Hinton, G. (2013). Speech recognition with deep recurrent neural networks. In 2013 IEEE international conference on acoustics, speech and signal processing (icassp) (pp. 6645–6649). IEEE.Google Scholar
  68. Grech, R., Cassar, T., Muscat, J., Camilleri, K. P., Fabri, S. G., Zervakis, M., et al. (2008). Review on solving the inverse problem in eeg source analysis. Journal of Neuroengineering and Rehabilitation, 5(1), 25.CrossRefGoogle Scholar
  69. Green, W. J. F., Ball, G., Hulman, G., Johnson, C., Van Schalwyk, G., Ratan, H. L., et al. (2016). KI67 and DLX2 predict increased risk of metastasis formation in prostate cancer-a targeted molecular approach. British Journal of Cancer, 115(2), 236.CrossRefGoogle Scholar
  70. Greenspan, H., van Ginneken, B., & Summers, R. M. (2016). Guest editorial deep learning in medical imaging: Overview and future promise of an exciting new technique. IEEE Transactions on Medical Imaging, 35(5), 1153–1159.CrossRefGoogle Scholar
  71. Grossi, E., Veggo, F., Narzisi, A., Compare, A., & Muratori, F. (2016). Pregnancy risk factors in autism: A pilot study with artificial neural networks. Pediatric Research, 79(2), 339.CrossRefGoogle Scholar
  72. Guo, H., & Viktor, H. L. (2004). Learning from imbalanced data sets with boosting and data generation: The databoost-im approach. ACM Sigkdd Explorations Newsletter, 6(1), 30–39.CrossRefGoogle Scholar
  73. Hajian-Tilaki, K. (2013). Receiver operating characteristic (roc) curve analysis for medical diagnostic test evaluation. Caspian Journal of Internal Medicine, 4(2), 627.Google Scholar
  74. Halford, J. J., Sabau, D., Drislane, F. W., Tsuchida, T. N., & Sinha, S. R. (2016). American clinical neurophysiology society guideline 4: Recording clinical eeg on digital media. The Neurodiagnostic Journal, 56(4), 261–265.CrossRefGoogle Scholar
  75. Han, H., Wang, W.-Y., & Mao, B.-H. (2005). Borderline-smote: A new over-sampling method in imbalanced data sets learning. In International conference on intelligent computing (pp. 878–887). Springer.Google Scholar
  76. Harrison, R. R., Kier, R. J., Chestek, C. A., Gilja, V., Nuyujukian, P., Ryu, S., et al. (2009). Wireless neural recording with single low-power integrated circuit. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 17(4), 322–329.CrossRefGoogle Scholar
  77. He, H., Bai, Y., Garcia, E. A., & Li, S. (2008). Adasyn: Adaptive synthetic sampling approach for imbalanced learning. In IEEE international joint conference on neural networks, 2008. IJCNN 2008 (IEEE world congress on computational intelligence) (pp. 1322–1328). IEEE.Google Scholar
  78. He, H., & Garcia, E. A. (2009). Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering, 21(9), 1263–1284.CrossRefGoogle Scholar
  79. Helmstaedter, M. (2013). Cellular-resolution connectomics: Challenges of dense neural circuit reconstruction. Nature Methods, 10(6), 501.CrossRefGoogle Scholar
  80. Hess, K. R., Keith Anderson, W., Symmans, F., Valero, V., Ibrahim, N., Mejia, J. A., et al. (2006). Pharmacogenomic predictor of sensitivity to preoperative chemotherapy with paclitaxel and fluorouracil, doxorubicin, and cyclophosphamide in breast cancer. Journal of Clinical Oncology, 24(26), 4236–4244.CrossRefGoogle Scholar
  81. Hoerl, A. E., & Kennard, R. W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1), 55–67.CrossRefGoogle Scholar
  82. Hoffmann, A., Huang, Y., Suetsugu-Maki, R., Ringelberg, C. S., Tomlinson, C. R., Rio-Tsonis, K. D., et al. (2012). Implication of the mir-184 and mir-204 competitive rna network in control of mouse secondary cataract. Molecular Medicine, 18(1), 528.Google Scholar
  83. Hormozdiari, F., Penn, O., Borenstein, E., & Eichler, E. E. (2015). The discovery of integrated gene networks for autism and related disorders. Genome Research, 25(1), 142–154.CrossRefGoogle Scholar
  84. Huang, P.-S., Boyken, S. E., & Baker, D. (2016). The coming of age of de novo protein design. Nature, 537(7620), 320–327.CrossRefGoogle Scholar
  85. Hughes, C., Henderson, A., Kansiz, M., Dorling, K. M., Jimenez-Hernandez, M., Brown, Michael D., et al. (2015). Enhanced ftir bench-top imaging of single biological cells. Analyst, 140(7), 2080–2085.CrossRefGoogle Scholar
  86. Hyvärinen, A., Karhunen, J., & Oja, E. (2004). Independent component analysis (Vol. 46). Wiley.Google Scholar
  87. Hyvärinen, A., & Pajunen, P. (1999). Nonlinear independent component analysis: Existence and uniqueness results. Neural Networks, 12(3), 429–439.CrossRefGoogle Scholar
  88. Iasemidis, L. D., Shiau, D.-S., Pardalos, P. M., Chaovalitwongse, W., Narayanan, K., Prasad, A., et al. (2005). Long-term prospective on-line real-time seizure prediction. Clinical Neurophysiology, 116(3), 532–544.CrossRefGoogle Scholar
  89. Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Medicine, 2(8), e124.CrossRefGoogle Scholar
  90. Jeong, H., Mason, S. P., Barabási, A.-L., & Oltvai, Z. N. (2001). Lethality and centrality in protein networks. Nature, 411(6833), 41.CrossRefGoogle Scholar
  91. Jeong, H., Tombor, B., Albert, R., Oltvai, Z. N., & Barabási, A.-L. (2000). The large-scale organization of metabolic networks. Nature, 407(6804), 651.CrossRefGoogle Scholar
  92. Jia, J., Liu, Z., Xiao, X., Liu, B., & Chou, K.-C. (2015). ippi-esml: An ensemble classifier for identifying the interactions of proteins by incorporating their physicochemical properties and wavelet transforms into pseaac. Journal of Theoretical Biology, 377, 47–56.CrossRefGoogle Scholar
  93. Jia, Y., Wei, E., Wang, X., Zhang, X., Morrison, J. C., Parikh, M., et al. (2014). Optical coherence tomography angiography of optic disc perfusion in glaucoma. Ophthalmology, 121(7), 1322–1332.CrossRefGoogle Scholar
  94. Johnson, A. E. W., Pollard, T. J., Shen, L., Li-wei, H. L., Feng, M., Ghassemi, M., et al. (2016). Mimic-III, a freely accessible critical care database. Scientific Data, 3, 160035.CrossRefGoogle Scholar
  95. Johnsson, P., Ackley, A., Vidarsdottir, L., Lui, W.-O., Corcoran, M., Grandér, D., et al. (2013). A pseudogene long-noncoding-rna network regulates pten transcription and translation in human cells. Nature Structural and Molecular Biology, 20(4), 440.CrossRefGoogle Scholar
  96. Jombart, T., Devillard, S., & Balloux, F. (2010). Discriminant analysis of principal components: A new method for the analysis of genetically structured populations. BMC Genetics, 11(1), 94.CrossRefGoogle Scholar
  97. Kabir, M. A., & Shahnaz, C. (2012). Denoising of ECG signals based on noise reduction algorithms in EMD and wavelet domains. Biomedical Signal Processing and Control, 7(5), 481–489.CrossRefGoogle Scholar
  98. Kasthuri, N., Hayworth, K. J., Berger, D. R., Schalek, R. L., Conchello, J. A., Knowles-Barley, S., et al. (2015). Saturated reconstruction of a volume of neocortex. Cell, 162(3), 648–661.CrossRefGoogle Scholar
  99. Khaligh-Razavi, S.-M., & Kriegeskorte, N. (2014). Deep supervised, but not unsupervised, models may explain it cortical representation. PLoS Computational Biology, 10(11), e1003915.CrossRefGoogle Scholar
  100. Khalilia, M., Chakraborty, S., & Popescu, M. (2011). Predicting disease risks from highly imbalanced data using random forest. BMC Medical Informatics and Decision Making, 11(1), 51.CrossRefGoogle Scholar
  101. Kohavi, R., & John, G. H. (1997). Wrappers for feature subset selection. Artificial Intelligence, 97(1–2), 273–324.CrossRefGoogle Scholar
  102. Kohonen, T. (1998). The self-organizing map. Neurocomputing, 21(1–3), 1–6.CrossRefGoogle Scholar
  103. Korenkevych, D., Chien, J.-H., Zhang, J., Shiau, D.-S., Sackellares, C., & Pardalos, P. M. (2013). Small world networks in computational neuroscience. In Handbook of combinatorial optimization (pp. 3057–3088). Springer.Google Scholar
  104. Korenkevych, D., Ozrazgat-Baslanti, T., Thottakkara, P., Hobson, C. E., Pardalos, P., Momcilovic, P., et al. (2016). The pattern of longitudinal change in serum creatinine and ninety-day mortality after major surgery. Annals of Surgery, 263(6), 1219.CrossRefGoogle Scholar
  105. Kotsiantis, S., Kanellopoulos, D., Pintelas, P., et al. (2006). Handling imbalanced datasets: A review. GESTS International Transactions on Computer Science and Engineering, 30(1), 25–36.Google Scholar
  106. Kubat, M., & Matwin, S. (1997). Addressing the curse of imbalanced data sets: One sided sampling. In Proceedings of the fourteenth international conference on machine learning (pp. 179–186).Google Scholar
  107. Latora, V., & Marchiori, M. (2003). Economic small-world behavior in weighted networks. The European Physical Journal B-Condensed Matter and Complex Systems, 32(2), 249–263.CrossRefGoogle Scholar
  108. Lee, D.-S., Park, J., Kay, K. A., Christakis, N. A., Oltvai, Z. N., & Barabási, A.-L. (2008). The implications of human metabolic network topology for disease comorbidity. Proceedings of the National Academy of Sciences, 105(29), 9880–9885.CrossRefGoogle Scholar
  109. Ling, C. X., & Li, C. (1998). Data mining for direct marketing: Problems and solutions. In KDD (Vol. 98, pp. 73–79).Google Scholar
  110. Ling, C. X., & Sheng, V. S. (2011). Cost-sensitive learning. In Encyclopedia of machine learning (pp. 231–235). Springer.Google Scholar
  111. Ling, C. X., Yang, Q., Wang, J., & Zhang, S. (2004). Decision trees with minimal costs. In Proceedings of the twenty-first international conference on Machine learning (p.  69). ACM.Google Scholar
  112. Liu, B., Wei, Y., Zhang, Y., & Yang, Q. (2017). Deep neural networks for high dimension, low sample size data. In Proceedings of the twenty-sixth international joint conference on artificial intelligence, IJCAI-17 (pp. 2287–2293).Google Scholar
  113. Liu, W., Liu, C., Chen, F., Yang, J., & Zheng, L. (2016). Discrimination of transgenic soybean seeds by terahertz spectroscopy. Scientific Reports, 6, 35799.CrossRefGoogle Scholar
  114. Liu, X.-Y., Wu, J., & Zhou, Z.-H. (2009). Exploratory undersampling for class-imbalance learning. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 39(2), 539–550.CrossRefGoogle Scholar
  115. Liu, X.-Y., & Zhou, Z.-H. (2006). The influence of class imbalance on cost-sensitive learning: An empirical study. In Sixth international conference on data mining, 2006. ICDM’06 (pp. 970–974). IEEE.Google Scholar
  116. Lorente, D., Aleixos, N., Gómez-Sanchis, J., Cubero, S., García-Navarrete, Or L., & Blasco, J. (2012). Recent advances and applications of hyperspectral imaging for fruit and vegetable quality assessment. Food and Bioprocess Technology, 5(4), 1121–1142.CrossRefGoogle Scholar
  117. Lowery, A. J., Miller, N., Devaney, A., McNeill, R. E., Davoren, P. A., Lemetre, C., et al. (2009). Microrna signatures predict oestrogen receptor, progesterone receptor and her2/neu receptor status in breast cancer. Breast Cancer Research, 11(3), R27.CrossRefGoogle Scholar
  118. Luo, J., Min, W., Gopukumar, D., & Zhao, Y. (2016). Big data application in biomedical research and health care: A literature review. Biomedical Informatics Insights, 8, 1.Google Scholar
  119. Mangasarian, O. L., & Wild, E. W. (2006). Multisurface proximal support vector machine classification via generalized eigenvalues. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(1), 69–74.CrossRefGoogle Scholar
  120. Mani, I., & Zhang, I. (2003). kNN approach to unbalanced data distributions: A case study involving information extraction. In Proceedings of workshop on learning from imbalanced datasets (Vol. 126).Google Scholar
  121. Manjón, J. V., Coupé, P., & Buades, A. (2015). Mri noise estimation and denoising using non-local pca. Medical Image Analysis, 22(1), 35–47.CrossRefGoogle Scholar
  122. Mardis, E. R. (2011). A decades perspective on DNA sequencing technology. Nature, 470(7333), 198.CrossRefGoogle Scholar
  123. Martis, R. J., Acharya, U. R., Lim, C. M., Mandana, K. M., Ray, A. K., & Chakraborty, C. (2013). Application of higher order cumulant features for cardiac health diagnosis using ECG signals. International Journal of Neural Systems, 23(04), 1350014.CrossRefGoogle Scholar
  124. McCarthy, K., Zabar, B., & Weiss, G. (2005). Does cost-sensitive learning beat sampling for classifying rare classes? In Proceedings of the 1st international workshop on Utility-based data mining (pp. 69–77). ACM.Google Scholar
  125. Mika, S., Ratsch, G., Weston, J., Scholkopf, B., & Mullers, K.-R. (1999). Fisher discriminant analysis with kernels. In Neural networks for signal processing IX, 1999. Proceedings of the 1999 IEEE signal processing society workshop (pp. 41–48). IEEE.Google Scholar
  126. Mikula, S. (2016). Progress towards mammalian whole-brain cellular connectomics. Frontiers in Neuroanatomy, 10, 62.CrossRefGoogle Scholar
  127. Ming, L., Zhang, Q., Deng, M., Miao, J., Guo, Y., Gao, W., et al. (2008). An analysis of human microrna and disease associations. PloS ONE, 3(10), e3420.CrossRefGoogle Scholar
  128. Miranda, H., Gilja, V., Chestek, C. A., Shenoy, K. V., & Meng, T. H. (2010). Hermesd: A high-rate long-range wireless transmission system for simultaneous multichannel neural recording applications. IEEE Transactions on Biomedical Circuits and Systems, 4(3), 181–191.CrossRefGoogle Scholar
  129. Moore, G. E., et al. (1975). Progress in digital integrated electronics. Electron Devices Meeting, 21, 11–13.Google Scholar
  130. Murray, C. J. L., Lozano, R., Flaxman, A. D., Serina, P., Phillips, D., Stewart, A., et al. (2014). Using verbal autopsy to measure causes of death: The comparative performance of existing methods. BMC Medicine, 12(1), 5.CrossRefGoogle Scholar
  131. Naimi, H., Adamou-Mitiche, A. B. H., & Mitiche, L. (2015). Medical image denoising using dual tree complex thresholding wavelet transform and wiener filter. Journal of King Saud University-Computer and Information Sciences, 27(1), 40–45.CrossRefGoogle Scholar
  132. Naseer, N., Hong, M. J., & Hong, K.-S. (2014). Online binary decision decoding using functional near-infrared spectroscopy for the development of brain-computer interface. Experimental Brain Research, 232(2), 555–564.CrossRefGoogle Scholar
  133. Newman, M. E. J. (2012). Communities, modules and large-scale structure in networks. Nature Physics, 8(1), 25.CrossRefGoogle Scholar
  134. Newman, M. E. J., & Girvan, M. (2004). Finding and evaluating community structure in networks. Physical Review E, 69(2), 026113.CrossRefGoogle Scholar
  135. Ng, M., Fleming, T., Robinson, M., Thomson, B., Graetz, N., Margono, C., et al. (2014). Global, regional, and national prevalence of overweight and obesity in children and adults during 1980–2013: A systematic analysis for the global burden of disease study 2013. The Lancet, 384(9945), 766–781.CrossRefGoogle Scholar
  136. Nguyen, T. B., Wang, S., Anugu, V., Rose, N., McKenna, M., Petrick, N., et al. (2012). Distributed human intelligence for colonic polyp classification in computer-aided detection for CT colonography. Radiology, 262(3), 824–833.CrossRefGoogle Scholar
  137. Niedermeyer, E., & da Silva, F. L. (Eds.). (2005). Electroencephalography: Basic principles, clinical applications, and related fields. Lippincott Williams & Wilkins.Google Scholar
  138. Nunez, P. L., & Pilgreen, K. L. (1991). The spline-laplacian in clinical neurophysiology: A method to improve EEG spatial resolution. Journal of Clinical Neurophysiology: Official Publication of the American Electroencephalographic Society, 8(4), 397–413.CrossRefGoogle Scholar
  139. Oberhardt, M. A., Palsson, B. Ø., & Papin, J. A. (2009). Applications of genome-scale metabolic reconstructions. Molecular Systems Biology, 5(1), 320.Google Scholar
  140. Oh, S., Lee, M. S., & Zhang, B.-T. (2011). Ensemble learning with active example selection for imbalanced biomedical data classification. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 8(2), 316–325.CrossRefGoogle Scholar
  141. Orth, J. D., Conrad, T. M., Na, J., Lerman, J. A., Nam, H., Feist, A. M., et al. (2011). A comprehensive genome-scale reconstruction of escherichia coli metabolism2011. Molecular Systems Biology, 7(1), 535.CrossRefGoogle Scholar
  142. Pappu, V., Panagopoulos, O. P., Xanthopoulos, P., & Pardalos, P. M. (2015). Sparse proximal support vector machines for feature selection in high dimensional datasets. Expert Systems with Applications, 42(23), 9183–9191.CrossRefGoogle Scholar
  143. Pardalos, P. M., Chaovalitwongse, W., Iasemidis, L. D., Sackellares, J. C., Shiau, D.-S., Carney, P. R., et al. (2004). Seizure warning algorithm based on optimization and nonlinear dynamics. Mathematical Programming, 101(2), 365–385.CrossRefGoogle Scholar
  144. Park, Y. S., Choi, Y. H., Lee, H. S., Moon, D. J., Kim, S. G., Lee, J. H., et al. (2013). The impact of laser doppler imaging on the early decision-making process for surgical intervention in adults with indeterminate burns. Burns, 39(4), 655–661.CrossRefGoogle Scholar
  145. Peng, Y., Jiang, Y., Yang, C., Brown, J. B., Antic, T., Sethi, I., et al. (2013). Quantitative analysis of multiparametric prostate mr images: Differentiation between prostate cancer and normal tissue and correlation with gleason scorea computer-aided diagnosis development study. Radiology, 267(3), 787–796.CrossRefGoogle Scholar
  146. Picard, D. (1985). Testing and estimating change-points in time series. Advances in Applied Probability, 17(4), 841–867.
  147. Quinlan, J. R. (1993). Combining instance-based and model-based learning. In Proceedings of the tenth international conference on machine learning (pp. 236–243).
  148. Quinlan, J. R., et al. (1992). Learning with continuous classes. In 5th Australian joint conference on artificial intelligence (Vol. 92, pp. 343–348). Singapore.
  149. Raghunathan, T., & Siscovick, D. (1996). A multiple-imputation analysis of a case-control study of the risk of primary cardiac arrest among pharmacologically treated hypertensives. Journal of the Royal Statistical Society. Series C (Applied Statistics), 45, 335–352.
  150. Ramgopal, S., Thome-Souza, S., Jackson, M., Kadish, N. E., Fernández, I. S., Klehm, J., et al. (2014). Seizure detection, seizure prediction, and closed-loop warning systems in epilepsy. Epilepsy & Behavior, 37, 291–307.
  151. Robb, R. A. (1999). Biomedical imaging, visualization, and analysis. Wiley.
  152. Rokach, L. (2010). Ensemble-based classifiers. Artificial Intelligence Review, 33(1–2), 1–39.
  153. Romero, I. (2011). PCA and ICA applied to noise reduction in multi-lead ECG. In Computing in cardiology, 2011 (pp. 613–616). IEEE.
  154. Roweis, S. T., & Saul, L. K. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500), 2323–2326.
  155. Rubin, D. B. (2004). Multiple imputation for nonresponse in surveys (Vol. 81). Wiley.
  156. Saeys, Y., Inza, I., & Larrañaga, P. (2007). A review of feature selection techniques in bioinformatics. Bioinformatics, 23(19), 2507–2517.
  157. Salam, M. T., Sawan, M., & Nguyen, D. K. (2011). A novel low-power-implantable epileptic seizure-onset detector. IEEE Transactions on Biomedical Circuits and Systems, 5(6), 568–578.
  158. Salathé, M., Kazandjieva, M., Lee, J. W., Levis, P., Feldman, M. W., & Jones, J. H. (2010). A high-resolution human contact network for infectious disease transmission. Proceedings of the National Academy of Sciences, 107(51), 22020–22025.
  159. Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85–117.
  160. Scholz, M., Kaplan, F., Guy, C. L., Kopka, J., & Selbig, J. (2005). Non-linear PCA: A missing data approach. Bioinformatics, 21(20), 3887–3895.
  161. Shaw, L. J., Raggi, P., Berman, D. S., & Callister, T. Q. (2006). Coronary artery calcium as a measure of biologic age. Atherosclerosis, 188(1), 112–119.
  162. Shin, H.-C., Roth, H. R., Gao, M., Lu, L., Xu, Z., Nogues, I., et al. (2016). Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Transactions on Medical Imaging, 35(5), 1285–1298.
  163. Shivaswamy, P. K., Bhattacharyya, C., & Smola, A. J. (2006). Second order cone programming approaches for handling missing and uncertain data. Journal of Machine Learning Research, 7, 1283–1314.
  164. Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., et al. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484–489.
  165. Sinha, S. R., Sullivan, L. R., Sabau, D., Orta, D. S. J., Dombrowski, K. E., Halford, J. J., et al. (2016). American Clinical Neurophysiology Society guideline 1: Minimum technical requirements for performing clinical electroencephalography. The Neurodiagnostic Journal, 56(4), 235–244.
  166. Skidmore, F., Korenkevych, D., Liu, Y., He, G., Bullmore, E., & Pardalos, P. M. (2011). Connectivity brain networks based on wavelet correlation analysis in Parkinson fMRI data. Neuroscience Letters, 499(1), 47–51.
  167. Sosenko, J. M., Mahon, J., Rafkin, L., Lachin, J. M., Krause-Steinrauf, H., Krischer, J. P., et al. (2011). A comparison of the baseline metabolic profiles between Diabetes Prevention Trial-Type 1 and TrialNet Natural History Study participants. Pediatric Diabetes, 12(2), 85–90.
  168. Sporns, O., Honey, C. J., & Kötter, R. (2007). Identification and classification of hubs in brain networks. PLoS ONE, 2(10), e1049.
  169. Sporns, O., Tononi, G., & Edelman, G. M. (2000). Theoretical neuroanatomy: Relating anatomical and functional connectivity in graphs and cortical connection matrices. Cerebral Cortex, 10(2), 127–141.
  170. Statnikov, A. (2011). A gentle introduction to support vector machines in biomedicine: Theory and methods (Vol. 1). World Scientific.
  171. Szklarczyk, D., Franceschini, A., Wyder, S., Forslund, K., Heller, D., Huerta-Cepas, J., et al. (2014). STRING v10: Protein–protein interaction networks, integrated over the tree of life. Nucleic Acids Research, 43(D1), D447–D452.
  172. Tan, M., Wang, L., & Tsang, I. W. (2010). Learning sparse SVM for feature selection on very high dimensional datasets. In Proceedings of the 27th international conference on machine learning (ICML-10) (pp. 1047–1054).
  173. Tang, G., & Qin, A. (2008). ECG de-noising based on empirical mode decomposition. In The 9th international conference for young computer scientists, 2008. ICYCS 2008 (pp. 903–906). IEEE.
  174. Targ, S., Almeida, D., & Lyman, K. (2016). Resnet in resnet: Generalizing residual architectures. arXiv preprint arXiv:1603.08029.
  175. Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), 58(1), 267–288.
  176. Tsirka, V., Simos, P. G., Vakis, A., Kanatsouli, K., Vourkas, M., Erimaki, S., et al. (2011). Mild traumatic brain injury: Graph-model characterization of brain networks for episodic memory. International Journal of Psychophysiology, 79(2), 89–96.
  177. van Buuren, S., & Groothuis-Oudshoorn, K. (2010). mice: Multivariate imputation by chained equations in R. Journal of Statistical Software, 45, 1–68.
  178. van Grinsven, M. J. J. P., van Ginneken, B., Hoyng, C. B., Theelen, T., & Sánchez, C. I. (2016). Fast convolutional neural network training using selective data sampling: Application to hemorrhage detection in color fundus images. IEEE Transactions on Medical Imaging, 35(5), 1273–1284.
  179. Vapnik, V. N., & Lerner, A. Y. (1963). Recognition of patterns with help of generalized portraits. Avtomat. i Telemekh, 24(6), 774–780.
  180. Vasconcelos, C. N., & Vasconcelos, B. N. (2017). Increasing deep learning melanoma classification by classical and expert knowledge based image transforms. CoRR, abs/1702.07025.
  181. Waldrop, M. M. (2016). More than Moore. Nature, 530(7589), 144–148.
  182. Wang, W., Liu, Q.-H., Cai, S.-M., Tang, M., Braunstein, L. A., & Stanley, H. E. (2016). Suppressing disease spreading by using information diffusion on multiplex networks. Scientific Reports, 6, 29259.
  183. Wang, X., Fan, N., & Pardalos, P. M. (2018). Robust chance-constrained support vector machines with second-order moment information. Annals of Operations Research, 263(1–2), 45–68.
  184. Watts, D. J., & Strogatz, S. H. (1998). Collective dynamics of ‘small-world’ networks. Nature, 393(6684), 440.
  185. Webb, A., & Kagadis, G. C. (2003). Introduction to biomedical imaging. Medical Physics, 30(8), 2267–2267.
  186. White, J. G., Southgate, E., Thomson, J. N., & Brenner, S. (1986). The structure of the nervous system of the nematode Caenorhabditis elegans. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 314(1165), 1–340.
  187. Wong, H. R., Lindsell, C. J., Pettilä, V., Meyer, N. J., Thair, S. A., Karlsson, S., et al. (2014). A multibiomarker-based outcome risk stratification model for adult septic shock. Critical Care Medicine, 42(4), 781.
  188. Wong, S. C., Gatt, A., Stamatescu, V., & McDonnell, M. D. (2016). Understanding data augmentation for classification: When to warp? In 2016 international conference on digital image computing: techniques and applications (DICTA) (pp. 1–6). IEEE.
  189. Xu, Y., Jia, R., Mou, L., Li, G., Chen, Y., Lu, Y., & Jin, Z. (2016). Improved relation classification by deep recurrent neural networks with data augmentation. In COLING.
  190. Yao, D. (2001). A method to standardize a reference of scalp EEG recordings to a point at infinity. Physiological Measurement, 22(4), 693.
  191. Yu, Y., Su, R., Wang, L., Qi, W., & He, Z. (2010). Comparative QSAR modeling of antitumor activity of ARC-111 analogues using stepwise MLR, PLS, and ANN techniques. Medicinal Chemistry Research, 19(9), 1233–1244.
  192. Zhang, D., Wang, Y., Zhou, L., Yuan, H., Shen, D., & Alzheimer’s Disease Neuroimaging Initiative. (2011). Multimodal classification of Alzheimer’s disease and mild cognitive impairment. NeuroImage, 55(3), 856–867.
  193. Zhao, X.-M., Li, X., Chen, L., & Aihara, K. (2008). Protein classification with imbalanced data. Proteins: Structure, Function, and Bioinformatics, 70(4), 1125–1132.
  194. Zhou, J., Greicius, M. D., Gennatas, E. D., Growdon, M. E., Jang, J. Y., Rabinovici, G. D., et al. (2010). Divergent network connectivity changes in behavioural variant frontotemporal dementia and Alzheimer’s disease. Brain, 133(5), 1352–1367.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Center for Applied Optimization, University of Florida, Gainesville, USA
  2. Laboratory of Algorithms and Technologies for Network Analysis, National Research University Higher School of Economics, Nizhny Novgorod, Russia
  3. Department of Business Administration, Technological Educational Institute of Central Macedonia, Serres, Greece
