A Novel Connectionist-Oriented Feature Normalization Technique

  • Edmondo Trentin
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4132)


Feature normalization is a topic of practical relevance in real-world applications of neural networks. Although it is sometimes overlooked, the success of connectionist models in difficult tasks may depend on a proper normalization of the input features. Indeed, the relevance of normalization is pointed out in the classic pattern recognition literature. In addition, neural nets require input values that do not compromise numerical stability during the computation of the partial derivatives of the nonlinearities. For instance, inputs to connectionist models should not exceed certain ranges, in order to avoid the phenomenon of “saturation” of sigmoids. This paper introduces a novel feature normalization technique that yields values uniformly distributed over the (0,1) interval. The normalization is obtained by first estimating the probability distribution of the input features and then evaluating, at the value of the feature to be normalized, a “mixture of logistics” approximation of the cumulative distribution function. The approach is consistent with the very nature of neural networks, since it is realized via a mixture of sigmoids that can be encapsulated within the network itself. Experiments on a real-world continuous speech recognition task show that the technique is effective, comparing favorably with some standard feature normalizations.
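The normalization step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's procedure: the abstract does not say how the feature density is estimated, so the sketch assumes a one-dimensional Gaussian mixture fitted by EM, and it turns each Gaussian CDF into a logistic through the standard approximation Φ(z) ≈ σ(1.702 z). The resulting function f(x) = Σ_k w_k σ(1.702 (x − μ_k)/σ_k) is a mixture of sigmoids that approximates the feature's cumulative distribution, so (by the probability integral transform) the normalized values should be approximately uniform on (0,1). The helper names `fit_gmm_1d`, `mixture_of_logistics` and `bimodal_sample`, the choice k = 3 and the synthetic data are all hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function (math.exp raises OverflowError for large arguments)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_gmm_1d(xs, k=3, iters=60):
    """Fit a k-component 1-D Gaussian mixture to xs with plain EM.

    Assumption: a Gaussian mixture stands in for the paper's density-estimation step.
    """
    n = len(xs)
    srt = sorted(xs)
    mean = sum(xs) / n
    std = max(math.sqrt(sum((x - mean) ** 2 for x in xs) / n), 1e-3)
    mus = [srt[int((j + 0.5) * n / k)] for j in range(k)]  # init means at spread-out quantiles
    sigmas = [std] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each sample
        resp = []
        for x in xs:
            # the constant 1/sqrt(2*pi) cancels when normalizing, so it is omitted
            dens = [w * math.exp(-0.5 * ((x - m) / s) ** 2) / s
                    for w, m, s in zip(weights, mus, sigmas)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and standard deviations
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sigmas[j] = math.sqrt(max(var, 1e-6))  # variance floor avoids collapse
    total_w = sum(weights)
    return [w / total_w for w in weights], mus, sigmas


def mixture_of_logistics(weights, mus, sigmas, a=1.702):
    """Return f(x) = sum_k w_k * sigmoid(a * (x - mu_k) / sigma_k).

    With a = 1.702 each logistic approximates a Gaussian CDF, so f approximates
    the mixture CDF and takes values in [0, 1].
    """
    params = list(zip(weights, mus, sigmas))

    def normalize(x):
        return sum(w * sigmoid(a * (x - m) / s) for w, m, s in params)

    return normalize


def bimodal_sample(n_per_mode=400, seed=1):
    """Hypothetical synthetic feature values (two Gaussian modes), for illustration only."""
    rng = random.Random(seed)
    return ([rng.gauss(-2.0, 0.5) for _ in range(n_per_mode)]
            + [rng.gauss(3.0, 1.0) for _ in range(n_per_mode)])
```

Usage: `f = mixture_of_logistics(*fit_gmm_1d(bimodal_sample()))`, then `f(x)` for each raw feature value. Because `f` is a weighted sum of k logistic units with input weights 1.702/σ_k, biases −1.702 μ_k/σ_k and output weights w_k, it could be realized as a small fixed layer in front of a network, which is one possible reading of the remark that the normalization can be encapsulated within the network itself.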


Keywords: Speech Recognition, Input Feature, Feature Normalization, Connectionist Model, Sigmoid Activation Function





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Edmondo Trentin (1)
  1. Dipartimento di Ingegneria dell’Informazione, Università di Siena, Siena, Italy
