
Deep Mining from Omics Data

Protocol in: Data Mining Techniques for the Life Sciences (Methods in Molecular Biology, vol 2449)

Abstract

Since the advent of high-throughput omics technologies, various molecular data such as genes, transcripts, proteins, and metabolites have been made widely available to researchers. This has afforded clinicians, bioinformaticians, statisticians, and data scientists the opportunity to apply their innovations in feature mining and predictive modeling to a rich data resource and to develop a wide range of generalizable prediction models. What has become apparent over the last 10 years is that researchers have adopted deep neural networks (or "deep nets") as their preferred paradigm for complex data modeling, owing to their superior performance over more traditional statistical machine learning approaches such as support vector machines. A key stumbling block, however, is that deep nets inherently lack transparency and are considered a "black box" approach. This naturally makes it very difficult for clinicians and other stakeholders to trust deep learning models, even when their predictions appear to be highly accurate. In this chapter, we therefore provide a detailed summary of the deep net architectures typically used in omics research, together with a comprehensive overview of the notable "deep feature mining" techniques researchers have applied to open up this black box and provide insights into the salient input features and why these models behave as they do. We group these techniques into three categories: (a) hidden layer visualization and interpretation; (b) input feature importance and impact evaluation; and (c) output layer gradient analysis. While omics researchers have made considerable gains in opening up the black box by interpreting hidden layer weights and node activations to identify salient input features, we highlight further approaches, such as deconvolutional network-based methods and bespoke attribute impact measures, that can help researchers better understand the relationships between the input data, the hidden layer representations formed, and thus the output behavior of their deep nets.


Notes

  1. https://www.cancer.gov/

  2. https://www.genome.gov/

  3. https://www.mskcc.org/

  4. We direct the reader to the recent survey and guidelines produced by Zhang et al. [25] for a more detailed systematic review of deep learning technologies (including available software tools), their application to omics data, and good practice guidelines.

  5. As shown in Fig. 1a, an additional "bias" weight, b, attached to a single unit held at a constant value of 1, is assigned to each processing element within the hidden and output layers of an MLP. The bias weight allows the activation function to "move" horizontally across the net input space (x-axis), whereas the standard weights w1…wn transform the shape and direction of the activation function, allowing it to pivot around that horizontal point and therefore enabling more adaptive and effective learning [35]. A minimal sketch of a single unit with a bias weight is given after these notes.

  6. The terms weights, weighted values, connections, and connection strengths are used synonymously throughout this section.

  7. The purpose of the hidden layers within an autoencoder is not to reproduce a precise copy of the higher-dimensional input vector on the output layer. Rather, the aim is a rough approximation (within an allowable error tolerance) that is less sensitive to variations within the training data, thereby filtering out noise and forcing the network to concentrate on covariances within the input data via nonlinear projection onto a lower-dimensional space, akin to a form of nonlinear principal component analysis. See the toy autoencoder sketch after these notes.

  8. For example, training an MLP with back-propagation for 100,000 epochs to learn the MNIST handwritten digits benchmark data set [47], consisting of 60,000 images for training and 10,000 for testing, would not be feasible on such multi-core serial processors due to insufficient memory and/or lengthy training times (months) caused by intractably slow processing [48].

  9. The NVIDIA GTX 580 GPU provides up to 512 cores and between 197.6 GFLOPS and 1.581 TFLOPS of floating-point compute power.

  10. Residual functions make stronger reference to layer inputs by allowing the architecture to include "jump connections", where the input to one layer can be fed directly into a hidden feature layer n layers ahead, where n > 1. This is expressed as F(x) = H(x) − x, where H(x) represents the nonlinear hidden mapping we want to learn with input x through an array of stacked nonlinear hidden layers. H(x) is then reformulated as F(x) + x, where F(x) and x represent the stacked nonlinear layers and the identity function, respectively. A minimal residual block sketch is given after these notes.

  11. Which is simply f(x) = x·sigmoid(x); see the one-line sketch after these notes.

  12. Hubel and Wiesel discovered that the cat's visual system contains locally sensitive, orientation-selective neurons.

  13. Which could be a standard MLP hidden layer or a convolutional or pooling layer in a CNN.

  14. A deconvolutional net or "deconvnet" uses the same filtering and pooling layers as a standard CNN but in reverse, so instead of mapping pixels to feature maps, it maps features back to input pixels.

  15. PCA is commonly used for dimensionality reduction: it derives, from the original variables, a new set of variables that better captures the variation in the sample and can be viewed as rotations of the original variable set. There are fewer new variables than originals; these new variables, referred to as principal components, are uncorrelated and ordered by the fraction of the total information retained. A limitation is the underlying assumption of a linear relationship between the data and the set of variables. A short usage sketch is given after these notes.

  16. A logistic regression model establishes a relationship between a binary outcome or "response" variable and a group of feature or "predictor" variables. It models the logit-transformed probability as a linear function of the predictor variables. For more information, please see [104, 105]. A short fitting sketch is given after these notes.

  17. Odds ratios can be recovered by simply exponentiating the log odds ratios (and probabilities by applying the inverse logit transform).

  18. Wrapper-based approaches train a machine learning model (e.g., Support Vector Machine, Logistic Regression, or Decision Tree) on a range of independent subsets of features and compare their predictive accuracy on a hold-out test set. The model's error rate on the test set is the score given to the subset of features used. Given the iterative training procedure, the approach is computationally very intensive. Filter methods, on the other hand, use common statistical measures, such as the Pearson product-moment correlation coefficient and mutual information, to score each feature/class combination. Filter methods are much faster than wrapper-based approaches and can be used as a precursor to the wrapper method to filter out irrelevant features before the iterative training process. A common such approach is Recursive Feature Elimination [110], where feature subsets are trained with Support Vector Machines, features with low weights are removed, and the process is repeated (see the sketch after these notes). See [103] for more information.
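
To make Note 5 concrete, here is a minimal NumPy sketch (the input values, weights, and unit size are illustrative, not taken from the chapter) showing how the bias weight b shifts a sigmoid unit's activation along the net-input axis while the standard weights w1…wn control its shape and orientation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative hidden unit: net input is the weighted sum of inputs plus a
# bias weight attached to a constant "on" unit (value 1), as in Note 5.
x = np.array([0.2, -1.3, 0.7])   # input vector x1..xn
w = np.array([0.5, 0.8, -0.1])   # standard weights w1..wn
b = -0.4                         # bias weight on the constant unit

net = np.dot(w, x) + b * 1.0     # net = sum_i w_i * x_i + b
activation = sigmoid(net)

# Changing b slides the sigmoid left/right along the net-input axis;
# rescaling w changes its slope and orientation around that point.
print(net, activation)
```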
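The autoencoder idea in Note 7 can be sketched in a few lines of PyTorch. This is a hypothetical toy configuration (layer sizes and data are placeholders, not the chapter's model): a narrow nonlinear hidden layer forces the network to reconstruct the input only approximately from a lower-dimensional code, acting like a nonlinear analogue of PCA.

```python
import torch
import torch.nn as nn

# Toy autoencoder: 2000 illustrative omics features compressed to a 32-unit code.
class ToyAutoencoder(nn.Module):
    def __init__(self, n_features=2000, n_code=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, n_code), nn.Tanh())
        self.decoder = nn.Linear(n_code, n_features)

    def forward(self, x):
        code = self.encoder(x)       # lower-dimensional nonlinear projection
        return self.decoder(code)    # approximate reconstruction of x

model = ToyAutoencoder()
x = torch.randn(16, 2000)                      # stand-in mini-batch of profiles
loss = nn.functional.mse_loss(model(x), x)     # reconstruction error to minimize
loss.backward()
```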
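Note 10's reformulation H(x) = F(x) + x corresponds to a residual block; the following PyTorch fragment is a minimal sketch (the layer width is arbitrary) of a jump connection adding the input back onto the output of a stack of nonlinear layers.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """H(x) = F(x) + x, where F is a small stack of nonlinear layers."""
    def __init__(self, width=128):
        super().__init__()
        self.f = nn.Sequential(
            nn.Linear(width, width), nn.ReLU(),
            nn.Linear(width, width),
        )

    def forward(self, x):
        return self.f(x) + x   # jump (identity) connection carries x forward

block = ResidualBlock()
out = block(torch.randn(4, 128))
```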
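The activation in Note 11 (x·sigmoid(x), commonly known as Swish/SiLU) can be written in one line of NumPy for reference:

```python
import numpy as np

def swish(x):
    # f(x) = x * sigmoid(x)
    return x * (1.0 / (1.0 + np.exp(-x)))
```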
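A minimal scikit-learn sketch of the PCA usage described in Note 15 (sample and feature counts are placeholders): the principal components are uncorrelated linear combinations of the original variables, ordered by the fraction of variance they retain.

```python
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(100, 500)    # 100 samples x 500 illustrative omics features
pca = PCA(n_components=10)      # keep far fewer new variables than originals
scores = pca.fit_transform(X)   # samples projected onto the principal components

# Fraction of the total variance retained by each (ordered) component
print(pca.explained_variance_ratio_)
```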
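Notes 16 and 17 describe logistic regression on the logit scale. This scikit-learn sketch uses synthetic data (all names and sizes are illustrative): it fits a binary classifier, exponentiates the coefficients to turn log odds ratios into odds ratios, and recovers probabilities via the inverse logit.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))    # 200 samples, 5 predictor variables
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=200) > 0).astype(int)

model = LogisticRegression().fit(X, y)   # models logit(p) as linear in X
log_odds_ratios = model.coef_.ravel()    # per-feature change in log odds
odds_ratios = np.exp(log_odds_ratios)    # Note 17: exponentiate to get odds ratios
probs = model.predict_proba(X)[:, 1]     # probabilities via the inverse logit
print(odds_ratios)
```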
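Finally, as a sketch of the wrapper-style Recursive Feature Elimination mentioned in Note 18 [110] (synthetic data; the number of features to keep and the step size are illustrative), scikit-learn's RFE repeatedly trains a linear SVM and discards the lowest-weighted features.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(150, 40))            # 150 samples, 40 candidate features
y = (X[:, 0] - X[:, 3] > 0).astype(int)   # outcome driven by two of the features

selector = RFE(SVC(kernel="linear"), n_features_to_select=5, step=5)
selector.fit(X, y)                        # iteratively drops low-weight features
print(np.flatnonzero(selector.support_))  # indices of the retained features
```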

References

  1. Aliper A, Plis S, Artemov A, Ulloa A, Mamoshina P, Zhavoronkov A (2016) Deep learning applications for predicting pharmacological properties of drugs and drug repurposing using transcriptomic data. Mol Pharm 13(7):2524–2530

  2. Bengio Y, Courville A, Vincent P (2013) Representation learning: a review and new perspectives. IEEE Trans Pattern Anal Mach Intell 35(8):1798–1828

  3. Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press, Cambridge

  4. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436

  5. Horgan RP, Kenny LC (2011) ‘Omic’ technologies: genomics, transcriptomics, proteomics and metabolomics. Obstet Gynaecol 13(3):189–195

  6. Dziuda DM (2010) Data mining for genomics and proteomics: analysis of gene and protein expression data. Wiley, Hoboken, NJ

  7. Van’t Veer LJ, Dai H, Van De Vijver MJ, He YD, Hart AAM, Mao M, Peterse HL, Van Der Kooy K, Marton MJ, Witteveen AT et al (2002) Gene expression profiling predicts clinical outcome of breast cancer. Nature 415(6871):530

  8. Abate-Shen C, Shen MM (2009) The prostate-cancer metabolome. Nature 457(7231):799–800

  9. Azuaje F (2010) Bioinformatics and biomarker discovery. Wiley Online Library

  10. Swan AL, Mobasheri A, Allaway D, Liddell S, Bacardit J (2013) Application of machine learning to proteomics data: classification and biomarker identification in postgenomics biology. Omics 17(12):595–610

  11. Urbanczyk-Wochniak E, Luedemann A, Kopka J, Selbig J, Roessner-Tunali U, Willmitzer L, Fernie AR (2003) Parallel analysis of transcript and metabolic profiles: a new approach in systems biology. EMBO Rep 4(10):989–993

  12. Joyce AR, Palsson BØ (2006) The model organism as a system: integrating ‘omics’ data sets. Nat Rev Mol Cell Biol 7:198–210

  13. Alzubaidi A (2018) Challenges in developing prediction models for multimodal high-throughput biomedical data. In: Proceedings of SAI intelligent systems conference. Springer, New York, pp 1056–1069

  14. Weinstein JN, Collisson EA, Mills GB, Mills Shaw KR, Ozenberger BA, Ellrott K, Shmulevich I, Sander C, Stuart JM, Cancer Genome Atlas Research Network et al (2013) The Cancer Genome Atlas pan-cancer analysis project. Nat Genet 45(10):1113

  15. Cerami E, Gao J, Dogrusoz U, Gross BE, Sumer SO, Aksoy BA, Jacobsen A, Byrne CJ, Heuer ML, Larsson E et al (2012) The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data

  16. Gao J, Aksoy BA, Dogrusoz U, Dresdner G, Gross B, Sumer SO, Sun Y, Jacobsen A, Sinha R, Larsson E et al (2013) Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Sci Signal 6(269):pl1

  17. AlQuraishi M (2019) AlphaFold at CASP13. Bioinformatics 35(22):4862–4865

  18. Alzubaidi A, Tepper J et al (2020) A novel deep mining model for effective knowledge discovery from omics data. Artif Intell Med 2020:101821

  19. Berest I, Arnold C, Reyes-Palomares A, Palla G, Rasmussen KD, Giles H, Bruch P-M, Huber W, Dietrich S, Helin K et al (2019) Quantification of differential transcription factor activity and multiomics-based classification into activators and repressors: diffTF. Cell Rep 29(10):3147–3159

  20. Rumelhart DE, Hinton GE, Williams RJ (1985) Learning internal representations by error propagation. Technical report, California Univ, San Diego, La Jolla Inst for Cognitive Science

  21. Angermueller C, Lee H, Reik W, Stegle O (2017) Accurate prediction of single-cell DNA methylation states using deep learning. bioRxiv 055715

  22. Kelley DR, Snoek J, Rinn JL (2016) Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. Genome Res 26(7):990–999

  23. Patel-Murray NL, Adam M, Huynh N, Wassie BT, Milani P, Fraenkel E (2020) A multi-omics interpretable machine learning model reveals modes of action of small molecules. Sci Rep 10(1):1–14

  24. Zhou J, Troyanskaya OG (2015) Predicting effects of noncoding variants with deep learning–based sequence model. Nat Methods 12(10):931

  25. Zhang Z, Zhao Y, Liao X, Shi W, Li K, Zou Q, Peng S (2019) Deep learning in omics: a survey and guideline. Brief Funct Genomics 18(1):41–57

  26. Danaee P, Ghaeini R, Hendrix DA (2017) A deep learning approach for cancer detection and relevant gene identification. In: Pacific symposium on biocomputing 2017. World Scientific, Singapore, pp 219–229

  27. Gomez-Verdejo V, Parrado-Hernández E, Tohka J (2019) Sign-consistency based variable importance for machine learning in brain imaging. Neuroinformatics 17(4):593–609

  28. Kim B, Wattenberg M, Gilmer J, Cai C, Wexler J, Viegas F et al (2018) Interpretability beyond feature attribution: quantitative testing with concept activation vectors (TCAV). In: International conference on machine learning, pp 2668–2677

  29. Tepper JA, Shertil MS, Powell HM (2016) On the importance of sluggish state memory for learning long term dependency. Knowl-Based Syst 96:104–114

  30. van Aken B, Winter B, Loser A, Gers FA (2019) How does BERT answer questions? A layer-wise analysis of transformer representations. In: Proceedings of the 28th ACM international conference on information and knowledge management, pp 1823–1832

  31. Tan J, Hammond JH, Hogan DA, Greene CS (2016) ADAGE-based integration of publicly available Pseudomonas aeruginosa gene expression data with denoising autoencoders illuminates microbe-host interactions. mSystems 1(1):e00025–e00015

  32. Tan J, Ung M, Cheng C, Greene CS (2014) Unsupervised feature construction and knowledge extraction from genome-wide assays of breast cancer with denoising autoencoders. In: Pacific symposium on biocomputing co-chairs. World Scientific, Singapore, pp 132–143

  33. Simonyan K, Vedaldi A, Zisserman A (2013) Deep inside convolutional networks: visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034

  34. Sundararajan M, Taly A, Yan Q (2017) Axiomatic attribution for deep networks. In: International conference on machine learning, PMLR, pp 3319–3328

  35. Samarasinghe S (2006) Neural networks for applied sciences and engineering: from fundamentals to complex pattern recognition. Auerbach Publications, Boca Raton, FL

  36. Nair V, Hinton GE (2010) Rectified linear units improve restricted Boltzmann machines. In: ICML

  37. Orojo O, Tepper J, McGinnity TM, Mahmud M (2020) Time sensitivity and self-organisation in multi-recurrent neural networks. In: 2020 International Joint Conference on Neural Networks (IJCNN). IEEE, New York, pp 1–7

  38. Rumelhart DE, Hinton GE, Williams RJ (1986) Learning representations by back-propagating errors. Nature 323(6088):533–536

  39. Bottou L, Gallinari P (1991) A framework for the cooperation of learning algorithms. In: Advances in neural information processing systems, pp 781–788

  40. Pearlmutter BA (1995) Gradient calculations for dynamic recurrent neural networks: a survey. IEEE Trans Neural Netw 6(5):1212–1228

  41. Hornik K, Stinchcombe M, White H (1989) Multilayer feedforward networks are universal approximators. Neural Netw 2(5):359–366

  42. Hinton GE, Zemel RS (1994) Autoencoders, minimum description length, and Helmholtz free energy. Adv Neural Inf Proces Syst 6:3–10

  43. Bengio Y, Lamblin P, Popovici D, Larochelle H (2007) Greedy layer-wise training of deep networks, pp 153–160

  44. Hinton GE, Salakhutdinov RR (2006) Reducing the dimensionality of data with neural networks. Science 313(5786):504–507

  45. Bengio Y, Simard P, Frasconi P (1994) Learning long-term dependencies with gradient descent is difficult. IEEE Trans Neural Netw 5(2):157–166

  46. Hochreiter S, Bengio Y, Frasconi P, Schmidhuber J et al (2001) Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. IEEE, New York

  47. LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324

  48. Cireşan DC, Meier U, Gambardella LM, Schmidhuber J (2010) Deep, big, simple neural nets for handwritten digit recognition. Neural Comput 22(12):3207–3220

  49. Hinton GE, Sejnowski TJ et al (1986) Learning and relearning in Boltzmann machines. In: Parallel distributed processing: explorations in the microstructure of cognition, vol 1, pp 282–317

  50. Carreira-Perpinan MA, Hinton GE (2005) On contrastive divergence learning. AISTATS 10:33–40

  51. Zhang S, Zhou J, Hu H, Gong H, Chen L, Cheng C, Zeng J (2016) A deep learning framework for modeling structural features of RNA-binding protein targets. Nucleic Acids Res 44(4):e32

  52. Mahmud M, Kaiser MS, McGinnity TM, Hussain A (2021) Deep learning in mining biological data. Cogn Comput 13(1):1–33

  53. Erhan D, Bengio Y, Courville A, Manzagol PA, Vincent P, Bengio S (2010) Why does unsupervised pre-training help deep learning? J Mach Learn Res 11:625–660

  54. LeCun Y, Chopra S, Hadsell R, Ranzato M, Huang F (2006) A tutorial on energy-based learning. Predict Struct Data 1

  55. Vincent P, Larochelle H, Lajoie I, Bengio Y, Manzagol P-A (2010) Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J Mach Learn Res 11:3371–3408

  56. Simard PY, Steinkraus D, Platt JC et al (2003) Best practices for convolutional neural networks applied to visual document analysis. In: ICDAR, vol 3. Citeseer, Pennsylvania

  57. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097–1105

  58. He K, Zhang X, Ren S, Sun J (2015) Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: Proceedings of the IEEE international conference on computer vision, pp 1026–1034

  59. Ramachandran P, Zoph B, Le QV (2017) Searching for activation functions. arXiv preprint arXiv:1710.05941

  60. Alipanahi B, Delong A, Weirauch MT, Frey BJ (2015) Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat Biotechnol 33(8):831

  61. Lanchantin J, Singh R, Lin Z, Qi Y (2016) Deep motif: visualizing genomic sequence classifications. arXiv preprint arXiv:1605.01133

  62. Min X, Chen N, Chen T, Jiang R (2016) DeepEnhancer: predicting enhancers by convolutional neural networks. In: 2016 IEEE International conference on bioinformatics and biomedicine (BIBM). IEEE, New York, pp 637–644

  63. Poplin R, Chang P-C, Alexander D, Schwartz S, Colthurst T, Ku A, Newburger D, Dijamco J, Nguyen N, Afshar PT et al (2018) A universal SNP and small-indel variant caller using deep neural networks. Nat Biotechnol 36(10):983–987

  64. Umarov RK, Solovyev VV (2017) Recognition of prokaryotic and eukaryotic promoters using convolutional deep learning neural networks. PLoS One 12(2):e0171410

  65. Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T (2014) Caffe: convolutional architecture for fast feature embedding. In: Proceedings of the 22nd ACM international conference on multimedia, pp 675–678

  66. Abadi M, Agarwal A, Barham P, Brevdo E, Chen Z, Citro C, Corrado GS, Davis A, Dean J, Devin M et al (2016) TensorFlow: large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467

  67. Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z, Gimelshein N, Antiga L et al (2019) PyTorch: an imperative style, high-performance deep learning library. arXiv preprint arXiv:1912.01703

  68. Hubel DH, Wiesel TN (1962) Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. J Physiol 160(1):106–154

  69. Sharma A, Vans E, Shigemizu D, Boroevich KA, Tsunoda T (2019) DeepInsight: a methodology to transform a non-image data to an image for convolution neural network architecture. Sci Rep 9(1):1–7

  70. Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780

  71. Gers FA, Schmidhuber J, Cummins F (2000) Learning to forget: continual prediction with LSTM. Neural Comput 12(10):2451–2471

  72. Werbos PJ (1990) Backpropagation through time: what it does and how to do it. Proc IEEE 78(10):1550–1560

  73. Williams RJ, Peng J (1990) An efficient gradient-based algorithm for on-line training of recurrent network trajectories. Neural Comput 2(4):490–501

  74. Binner JM, Tino P, Tepper J, Anderson R, Jones B, Kendall G (2010) Does money matter in inflation forecasting? Phys A Stat Mech Appl 389(21):4793–4808

  75. Cao Q, Ewing BT, Thompson MA (2012) Forecasting wind speed with recurrent neural networks. Eur J Oper Res 221(1):148–154

  76. Dorffner G (1996) Neural networks for time series processing. Neural Netw World 4(6):447–468

  77. Gers FA, Schmidhuber E (2001) LSTM recurrent networks learn simple context-free and context-sensitive languages. IEEE Trans Neural Netw 12(6):1333–1340

  78. Ulbricht C (1994) Multi-recurrent networks for traffic forecasting. In: AAAI, pp 883–888

  79. Sekhon A, Singh R, Qi Y (2018) DeepDiff: deep-learning for predicting differential gene expression from histone modifications. Bioinformatics 34(17):i891–i900

  80. Karimi M, Wu D, Wang Z, Shen Y (2019) DeepAffinity: interpretable deep learning of compound–protein affinity through unified recurrent and convolutional neural networks. Bioinformatics 35(18):3329–3338

  81. Cho K, Van Merrienboer B, Bahdanau D, Bengio Y (2014) On the properties of neural machine translation: encoder-decoder approaches. arXiv preprint arXiv:1409.1259

  82. Chung NC, Mirza B, Choi H, Wang J, Wang D, Ping P, Wang W (2019) Unsupervised classification of multi-omics data during cardiac remodeling using deep learning. Methods 166:66–73

  83. Wright RE (1995) Logistic regression

  84. Hearst MA, Dumais ST, Osuna E, Platt J, Scholkopf B (1998) Support vector machines. IEEE Intell Syst Appl 13(4):18–28

  85. Breiman L (2001) Random forests. Mach Learn 45(1):5–32

  86. Chen T, Guestrin C (2016) XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp 785–794

  87. Shao Y, Cheng Y, Shah RU, Weir CR, Bray BE, Zeng-Treitler Q (2021) Shedding light on the black box: explaining deep neural network prediction of clinical outcomes. J Med Syst 45(1):1–9

  88. Taigman Y, Ranzato MA (2014) DeepFace: closing the gap to human-level performance in face verification. Facebook Research Publication, Menlo Park, CA

  89. Wang M, Deng W (2018) Deep face recognition: a survey. CoRR abs/1804.06655

  90. Pomerleau DA (1989) ALVINN: an autonomous land vehicle in a neural network. Technical report, Carnegie-Mellon Univ, Pittsburgh, PA Artificial Intelligence and Psychology

  91. Bojarski M, Testa DD, Dworakowski D, Firner B, Flepp B, Goyal P, Jackel LD, Monfort M, Muller U, Zhang J et al (2016) End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316

  92. Zeiler MD, Fergus R (2014) Visualizing and understanding convolutional networks. In: European conference on computer vision. Springer, New York, pp 818–833

  93. Zeiler MD, Taylor GW, Fergus R (2011) Adaptive deconvolutional networks for mid and high level feature learning. In: 2011 International conference on computer vision. IEEE, New York, pp 2018–2025

  94. Yosinski J, Clune J, Nguyen A, Fuchs T, Lipson H (2015) Understanding neural networks through deep visualization. arXiv preprint arXiv:1506.06579

  95. Vincent P, Larochelle H, Bengio Y, Manzagol PA (2008) Extracting and composing robust features with denoising autoencoders. In: Proceedings of the twenty-fifth international conference on machine learning (ICML 2008), Helsinki, Finland

  96. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357

  97. Cartling B (2008) On the implicit acquisition of a context-free grammar by a simple recurrent neural network. Neurocomputing 71(7–9):1527–1537

  98. Lee Giles C, Lawrence S, Tsoi AC (2001) Noisy time series prediction using recurrent neural networks and grammatical inference. Mach Learn 44(1):161–183

  99. Horne BG, Hush DR (1996) Bounds on the complexity of recurrent neural network implementations of finite state machines. Neural Netw 9(2):243–252

  100. Jacobsson H, Ziemke T (2005) CrySSMEx, a novel rule extractor for recurrent neural networks: overview and case study. In: International conference on artificial neural networks. Springer, New York, pp 503–508

  101. Kolen JF (1994) Fool’s gold: extracting finite state machines from recurrent network dynamics. In: Advances in neural information processing systems, pp 501–501

  102. Won SH, Song I, Lee SY, Park CH (2010) Identification of finite state automata with a class of recurrent neural networks. IEEE Trans Neural Netw 21(9):1408–1421

  103. Witten IH, Frank E, Hall MA (2011) Data mining: practical machine learning tools and techniques. Morgan Kaufmann, Burlington, MA

  104. Lemeshow S, Moeschberger ML (2005) Review of regression methods in biostatistics: linear, logistic, survival, and repeated measures models by Vittinghoff, Glidden, Shiboski, and McCulloch. Stata J 5(2):274–278

  105. Saarela M, Jauhiainen S (2021) Comparison of feature importance measures as explanations for classification models. SN Appl Sci 3(2):1–12

  106. Quinlan JR (2014) C4.5: programs for machine learning. Elsevier, Amsterdam

  107. Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140

  108. Shannon CE (1948) A mathematical theory of communication. Bell Syst Tech J 27(3):379–423

  109. Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J R Stat Soc B Stat Methodol 67(2):301–320

  110. Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182

  111. Shrikumar A, Greenside P, Shcherbina A, Kundaje A (2016) Not just a black box: learning important features through propagating activation differences. arXiv preprint arXiv:1605.01713

  112. Alzubaidi AHA (2019) Evolutionary and deep mining models for effective biomarker discovery. PhD thesis, Nottingham Trent University

Author information

Correspondence to Abeer Alzubaidi.


Copyright information

© 2022 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature

About this protocol


Cite this protocol

Alzubaidi, A., Tepper, J. (2022). Deep Mining from Omics Data. In: Carugo, O., Eisenhaber, F. (eds) Data Mining Techniques for the Life Sciences. Methods in Molecular Biology, vol 2449. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2095-3_15


  • DOI: https://doi.org/10.1007/978-1-0716-2095-3_15


  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-2094-6

  • Online ISBN: 978-1-0716-2095-3

  • eBook Packages: Springer Protocols
