A Novel Connectionist-Oriented Feature Normalization Technique
Feature normalization is of practical relevance in real-world applications of neural networks. Although sometimes overlooked, the success of connectionist models on difficult tasks may depend on a proper normalization of the input features; indeed, the relevance of normalization is pointed out in the classic pattern recognition literature. In addition, neural nets require input values that do not compromise the numerical stability of the computation of partial derivatives of the nonlinearities. For instance, inputs to connectionist models should be kept within limited ranges in order to avoid saturation of the sigmoids. This paper introduces a novel feature normalization technique that yields values uniformly distributed over the (0,1) interval. The normalization is obtained by first estimating the probability distribution of the input feature, and then evaluating a “mixture of logistics” approximation of its cumulative distribution function (CDF) at the feature value to be normalized; by the probability integral transform, evaluating a random variable’s CDF at its own realizations yields values uniformly distributed over (0,1). The approach is compliant with the very nature of the neural network, since it is realized via a mixture of sigmoids that can be encapsulated within the network itself. Experiments on a real-world continuous speech recognition task show that the technique is effective, comparing favorably with standard feature normalization schemes.
Keywords: Speech Recognition · Input Feature · Feature Normalization · Connectionist Model · Sigmoid Activation Function
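To make the procedure concrete, the following is a minimal sketch (not the paper’s reference implementation) of the two steps described in the abstract: a mixture of K logistic sigmoids is fitted to the empirical CDF of a feature sample, and normalized values are then obtained by evaluating the fitted mixture. The mixture size K, the least-squares fitting criterion against the empirical CDF, and all function and variable names are illustrative assumptions.

```python
# A minimal sketch of CDF-based feature normalization with a mixture of
# logistic sigmoids.  K, the least-squares fit, and all names here are
# illustrative assumptions, not the paper's reference implementation.
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # numerically stable logistic sigmoid

def empirical_cdf(x):
    """Empirical CDF of the sample x, evaluated at its own points."""
    ranks = np.argsort(np.argsort(x))
    return (ranks + 1) / (len(x) + 1)  # values strictly inside (0,1)

def mixture_of_logistics_cdf(x, mu, s, w):
    """Weighted sum of logistic sigmoids: a smooth CDF approximation."""
    z = (np.asarray(x)[:, None] - mu[None, :]) / s[None, :]
    return (w[None, :] * expit(z)).sum(axis=1)

def fit_normalizer(x, K=3):
    """Fit centers mu, scales s, weights w by least squares on the CDF."""
    target = empirical_cdf(x)
    mu0 = np.quantile(x, np.linspace(0.1, 0.9, K))  # spread initial centers
    s0 = np.full(K, x.std() / K + 1e-6)

    def unpack(theta):
        mu = theta[:K]
        s = np.exp(theta[K:2 * K])   # log-parametrized: scales stay positive
        w = np.exp(theta[2 * K:])
        return mu, s, w / w.sum()    # mixing weights sum to one

    def loss(theta):
        mu, s, w = unpack(theta)
        return np.mean((mixture_of_logistics_cdf(x, mu, s, w) - target) ** 2)

    theta0 = np.concatenate([mu0, np.log(s0), np.zeros(K)])
    res = minimize(loss, theta0, method="Nelder-Mead",
                   options={"maxiter": 5000})
    return unpack(res.x)

# Usage: normalized features are approximately uniform on (0,1).
rng = np.random.default_rng(0)
feature = rng.gamma(shape=2.0, scale=1.5, size=500)  # skewed raw feature
mu, s, w = fit_normalizer(feature, K=3)
normalized = mixture_of_logistics_cdf(feature, mu, s, w)
print(normalized.min(), normalized.max())            # both inside (0,1)
```

Because the fitted mixture is a convex combination of sigmoids, its outputs lie strictly inside (0,1); and since each component is itself a logistic activation, the normalizer can in principle be encapsulated as an extra sigmoid layer in front of the network, as the abstract notes.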