Soft Computing, Volume 23, Issue 3, pp 767–781

Post-training discriminative pruning for RBMs

  • Máximo Sánchez-Gutiérrez
  • Enrique M. Albornoz
  • Hugo L. Rufiner
  • John Goddard
Methodologies and Application


One of the major challenges in artificial neural networks is identifying a suitable architecture for a specific problem. Choosing an unsuitable topology can exponentially increase the training cost and even prevent network convergence. On the other hand, recent research indicates that larger or deeper nets can map the problem features into a more appropriate space and thereby improve classification, leading to an apparent dichotomy. In this regard, it is interesting to ask whether independent measures, such as mutual information, could provide a clue to finding the most discriminative neurons in a network. In the present work, we explore this question in the context of restricted Boltzmann machines, by employing different measures to perform post-training pruning. The neurons which each measure determines to be the most discriminative are combined, and a classifier is applied to the resulting network to assess its usefulness. We find that two measures in particular are good indicators of the most discriminative neurons, generally producing savings of more than 50% of the neurons while maintaining an acceptable error rate. Furthermore, starting with a larger network architecture and then pruning proves more advantageous than starting with a smaller network. Finally, a quantitative index is introduced which can help in choosing a suitable pruned network.
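The core idea of selecting discriminative hidden neurons by mutual information can be sketched as follows. This is an illustrative outline only, not the paper's exact method: the discretization scheme, the `mutual_information` and `prune_by_mi` helpers, and the fixed `keep_fraction` are all assumptions made for the example; the paper combines several measures rather than MI alone.

```python
import numpy as np

def mutual_information(h, y, n_bins=2):
    """Empirical MI (in nats) between a discretized hidden activation h and labels y.
    Illustrative estimator: activations in [0, 1] are binned uniformly."""
    classes = sorted(set(y))
    joint = np.zeros((n_bins, len(classes)))
    # Discretize each activation into one of n_bins bins
    hb = np.minimum((np.asarray(h) * n_bins).astype(int), n_bins - 1)
    for hv, yv in zip(hb, y):
        joint[hv, classes.index(yv)] += 1
    joint /= joint.sum()
    ph = joint.sum(axis=1, keepdims=True)   # marginal over hidden bins
    py = joint.sum(axis=0, keepdims=True)   # marginal over classes
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (ph @ py)[nz])).sum())

def prune_by_mi(hidden_acts, labels, keep_fraction=0.5):
    """Rank hidden units by MI with the class labels and keep the top fraction.
    hidden_acts: (n_samples, n_hidden) array of post-training activations."""
    scores = np.array([mutual_information(hidden_acts[:, j], labels)
                       for j in range(hidden_acts.shape[1])])
    n_keep = max(1, int(keep_fraction * hidden_acts.shape[1]))
    return np.argsort(scores)[::-1][:n_keep]   # indices of units to retain
```

On toy data where one hidden unit tracks the class label and the rest are noise, pruning to a quarter of the units keeps exactly the informative one; in the paper's setting, the retained units would then feed the downstream classifier.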


Keywords: Restricted Boltzmann machines · Pruning · Discriminative information · Phoneme classification · Emotion classification



The authors wish to thank SEP and CONACyT (Program SEP-CONACyT CB-2012-01, No. 182432) and the Universidad Autónoma Metropolitana from México; and the Universidad Nacional del Litoral (PACT 2011 #58, CAI+D 2011 #58-511, PRODACT 2016 (FICH-UNL)), ANPCyT (PICT 2015-0977) and the Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET) from Argentina, for their support. We also wish to thank ELRA for supplying the Emotional speech synthesis database, catalogue reference ELRA-S0329.

Compliance with ethical standards

Conflict of interest:

The authors declare that they have no conflict of interest.

Ethical approval:

This article does not contain any studies with human participants or animals performed by any of the authors.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2017

Authors and Affiliations

  1. Departamento de Ingeniería Eléctrica, Universidad Autónoma Metropolitana, Iztapalapa, Mexico
  2. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional, sinc(i), FICH-UNL-CONICET, Ciudad Universitaria, Santa Fe, Argentina
  3. Laboratorio de Cibernética, Facultad de Ingeniería, UNER, Oro Verde, Argentina
