Post-training discriminative pruning for RBMs
Abstract
One of the major challenges in the area of artificial neural networks is identifying a suitable architecture for a specific problem. Choosing an unsuitable topology can exponentially increase the training cost and even hinder network convergence. On the other hand, recent research indicates that larger or deeper nets can map the problem features into a more appropriate space and thereby improve the classification process, leading to an apparent dichotomy. In this regard, it is interesting to ask whether independent measures, such as mutual information, could provide a clue to finding the most discriminative neurons in a network. In the present work, we explore this question in the context of Restricted Boltzmann machines by employing different measures to perform post-training pruning. The neurons that each measure determines to be the most discriminative are combined, and a classifier is applied to the resulting network to determine its usefulness. We find that two measures in particular are good indicators of the most discriminative neurons, generally producing savings of more than 50% of the neurons while maintaining an acceptable error rate. Further, it is borne out that starting with a larger network architecture and then pruning is more advantageous than using a smaller network to begin with. Finally, a quantitative index is introduced which can provide information on choosing a suitable pruned network.
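The pruning scheme the abstract describes — scoring each trained hidden unit by how much information its activation carries about the class label, then discarding the low-scoring units — can be sketched as follows. This is an illustrative reconstruction, not the paper's exact procedure: the helper names, the 0.5 binarization threshold, and the `keep_fraction` parameter are all assumptions made for the sketch.

```python
# Hypothetical sketch of post-training discriminative pruning for an RBM:
# score each hidden unit by the mutual information between its (binarized)
# activation and the class label, then keep only the top-scoring units.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hidden_activations(v, W, b):
    """Mean-field hidden activations p(h=1 | v) of a trained RBM."""
    return sigmoid(v @ W + b)

def unit_label_mi(h_bin, labels):
    """Mutual information (in nats) between one binary unit and the labels."""
    mi = 0.0
    for hv in (0, 1):
        ph = np.mean(h_bin == hv)            # marginal of the unit
        if ph == 0:
            continue
        for c in np.unique(labels):
            pc = np.mean(labels == c)        # marginal of the class
            pjoint = np.mean((h_bin == hv) & (labels == c))
            if pjoint > 0:
                mi += pjoint * np.log(pjoint / (ph * pc))
    return mi

def prune_by_mi(v, labels, W, b, keep_fraction=0.5):
    """Return indices of the most discriminative hidden units."""
    h = hidden_activations(v, W, b) > 0.5    # binarize activations
    scores = np.array([unit_label_mi(h[:, j], labels)
                       for j in range(W.shape[1])])
    k = max(1, int(keep_fraction * W.shape[1]))
    return np.argsort(scores)[::-1][:k]      # top-k units by MI
```

Keeping roughly half of the hidden units mirrors the savings reported in the abstract; the retained indices can then be used to slice `W` and `b` before the reduced representation is passed to a classifier.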
Keywords
Restricted Boltzmann machines · Pruning · Discriminative information · Phoneme classification · Emotion classification
Acknowledgements
The authors wish to thank SEP and CONACyT (Program SEP-CONACyT CB-2012-01, No. 182432) and the Universidad Autónoma Metropolitana from México; Universidad Nacional del Litoral (with PACT 2011 #58, CAI+D 2011 #58-511, PRODACT 2016 (FICH-UNL)), ANPCyT (with PICT 2015-0977) and Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET) from Argentina, for their support. We also want to thank ELRA for supplying the Emotional speech synthesis database (catalogue reference: ELRA-S0329).
Compliance with ethical standards
Conflict of interest:
The authors declare that they have no conflict of interest.
Ethical approval:
This article does not contain any studies with human participants or animals performed by any of the authors.