
A survey of deep network techniques all classifiers can adopt


Abstract

Deep neural networks (DNNs) have introduced novel and useful tools to the machine learning community. Other types of classifiers can potentially adopt these tools as well to improve their performance and generality. This paper reviews the current state of the art in deep learning technologies that are being applied outside of deep neural networks. Non-neural network classifiers can employ many components found in DNN architectures. Specifically, we review the feature learning, optimization, and regularization methods that form the core of deep network technologies. We then survey non-neural network learning algorithms that make innovative use of these methods to improve classification performance. Because many opportunities and challenges remain, we discuss directions that can be pursued to extend deep learning techniques to a broader variety of classification algorithms.
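To make the abstract's premise concrete, below is a minimal sketch (illustrative only, not code from the paper) of one such borrowed component: dropout, a DNN regularizer, applied at prediction time to a bagged ensemble of decision trees. It assumes NumPy and scikit-learn are available; the helper predict_with_dropout and all parameter choices are hypothetical.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, random_state=0)

# Train a plain bagged ensemble of shallow trees.
trees = []
for _ in range(50):
    idx = rng.integers(0, len(X), len(X))  # bootstrap sample
    trees.append(DecisionTreeClassifier(max_depth=3).fit(X[idx], y[idx]))

def predict_with_dropout(trees, X, drop_rate=0.3, n_draws=20):
    """Average predictions over random sub-ensembles, mimicking
    dropout at inference time (Monte Carlo style)."""
    votes = np.zeros(len(X))
    for _ in range(n_draws):
        # Randomly drop ensemble members; keep at least one tree.
        keep = [t for t in trees if rng.random() > drop_rate] or trees[:1]
        votes += np.mean([t.predict(X) for t in keep], axis=0)
    return (votes / n_draws) > 0.5

acc = np.mean(predict_with_dropout(trees, X) == y)
print(f"dropout-ensemble training accuracy: {acc:.3f}")

Randomly dropping ensemble members and averaging over several draws plays a role analogous to dropout in a network: no single member can be relied upon, which discourages co-adaptation among the trees.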



Acknowledgements

The authors would like to thank Tharindu Adikari, Chris Choy, Ji Feng, Yani Ioannou, Stanislaw Jastrzebski and Marco A. Wiering for their valuable assistance in providing code and additional implementation details of the algorithms that were evaluated in this paper. We would also like to thank Samaneh Aminikhanghahi and Tinghui Wang for their feedback and guidance on the methods described in this survey. This material is based upon work supported by the National Science Foundation under Grant No. 1543656.

Author information


Corresponding author

Correspondence to Alireza Ghods.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Responsible editor: Pierre Baldi.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Ghods, A., Cook, D.J. A survey of deep network techniques all classifiers can adopt. Data Min Knowl Disc 35, 46–87 (2021). https://doi.org/10.1007/s10618-020-00722-8

