Abstract
Feature transformation is an important step in pattern recognition systems. A feature transformation matrix can be obtained under different criteria, such as discrimination between classes, feature independence, or the mutual information between features and classes; the resulting matrix can also be used for feature reduction. In this paper, we propose a new method for finding a feature transformation based on Mutual Information (MI). We assume that the class-conditional Probability Density Function (PDF) of the features is Gaussian, and then use gradient ascent to maximize the mutual information between features and classes. Experimental results show that the proposed MI projection consistently outperforms other methods across a variety of cases. On the UCI Glass database, it improves classification accuracy by up to 7.95%, and on TIMIT it improves the phoneme recognition rate by 3.55%.
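The paper's exact update equations appear in its body, not in this abstract. As a minimal sketch of the idea, the following code assumes a common Gaussian surrogate for the mutual information, J(A) = (1/2) log|A S_T A^T| - (1/2) sum_c P_c log|A S_c A^T|, and maximizes it over a linear transform A by plain gradient ascent; the function name mi_transform and the hyperparameters lr, n_iter are illustrative, not the authors' notation.

import numpy as np

def mi_transform(X, y, p, lr=0.05, n_iter=500, seed=0):
    """Sketch: gradient ascent on a Gaussian MI surrogate.

    Maximizes J(A) = 1/2 log|A S_T A^T| - 1/2 sum_c P_c log|A S_c A^T|
    over a p x d projection A, where S_T is the total covariance and
    S_c the class-conditional covariances (Gaussian PDF assumption).
    """
    d = X.shape[1]
    classes, counts = np.unique(y, return_counts=True)
    priors = counts / len(y)                      # class priors P_c
    S_c = [np.cov(X[y == c].T) for c in classes]  # per-class covariances
    S_T = np.cov(X.T)                             # total covariance
    A = np.random.default_rng(seed).standard_normal((p, d)) * 0.1

    for _ in range(n_iter):
        # d/dA log|A S A^T| = 2 (A S A^T)^{-1} A S for symmetric S;
        # the common factor 2 is absorbed into the learning rate.
        grad = np.linalg.solve(A @ S_T @ A.T, A @ S_T)
        for P, S in zip(priors, S_c):
            grad -= P * np.linalg.solve(A @ S @ A.T, A @ S)
        A += lr * grad
    return A

Usage: Y = X @ mi_transform(X, y, p=2).T projects d-dimensional features onto a 2-dimensional MI-preserving subspace; note that J(A) is invariant to rescaling of A, so no explicit normalization step is needed.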
Notes
Probability Density Function.
Comprehensive Medicinal Chemistry.
Hidden Markov Model.
Mel Frequency Cepstral Coefficient.
Signal-to-Noise Ratio.
Appendix
Proofs of relations (8) and (7) are presented here. Starting from relation (8), H_upp is first expanded, then rewritten using the properties of logarithms, and, since \sum_c P_c = 1, reduced to its final form; the chain of steps is sketched below.
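A reconstruction of the chain, assuming relation (8) is the joint-entropy upper bound H(Y) \le H(Y, C) = H(C) + H(Y \mid C) for the transformed features Y = AX with Gaussian class-conditional densities:

$$
H_{\mathrm{upp}} = -\sum_c P_c \log P_c + \sum_c P_c\, H(Y \mid c),
\qquad
H(Y \mid c) = \frac{1}{2}\log\big((2\pi e)^p\,\lvert A S_c A^T\rvert\big),
$$

where $H(Y \mid c)$ is the entropy of a $p$-dimensional Gaussian with covariance $A S_c A^T$. Expanding the logarithm,

$$
H_{\mathrm{upp}} = -\sum_c P_c \log P_c
+ \frac{1}{2}\sum_c P_c\Big(p\log(2\pi e) + \log\lvert A S_c A^T\rvert\Big),
$$

and since $\sum_c P_c = 1$,

$$
H_{\mathrm{upp}} = \frac{p}{2}\log(2\pi e)
+ \frac{1}{2}\sum_c P_c \log\lvert A S_c A^T\rvert
- \sum_c P_c \log P_c.
$$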
Relation (7) can be proved by following the same steps.
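If relation (7) is the matching lower bound H(Y) \ge H(Y \mid C) (conditioning never increases entropy), which is an assumption here, the same Gaussian-entropy expansion gives:

$$
H_{\mathrm{low}} = \frac{p}{2}\log(2\pi e)
+ \frac{1}{2}\sum_c P_c \log\lvert A S_c A^T\rvert.
$$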
The proof of relation (23), which is the objective function of the LDA method, is presented here, starting from relation (12). LDA assumes that all class covariance matrices are identical; therefore, setting every class covariance S_c in relation (12) equal to the within-class covariance matrix S_w leads directly to the objective function of the LDA method, as sketched below.
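A sketch of this reduction, assuming relation (12) is the Gaussian surrogate for the mutual information, $I(Y;C) \approx \frac{1}{2}\log\lvert A S_T A^T\rvert - \frac{1}{2}\sum_c P_c \log\lvert A S_c A^T\rvert$: substituting $S_c = S_w$ for every class and using $\sum_c P_c = 1$,

$$
I(Y;C) \approx \frac{1}{2}\log\lvert A S_T A^T\rvert
- \frac{1}{2}\log\lvert A S_w A^T\rvert
= \frac{1}{2}\log\frac{\lvert A S_T A^T\rvert}{\lvert A S_w A^T\rvert},
$$

which is a monotone function of the classical LDA criterion $\lvert A S_T A^T\rvert / \lvert A S_w A^T\rvert$ (equivalently of $\lvert A S_b A^T\rvert / \lvert A S_w A^T\rvert$, since $S_T = S_b + S_w$).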
Cite this article
Bassir, S.M., Akbari, A. & Nassersharif, B. An improved feature transformation method using mutual information. Int J Speech Technol 17, 107–115 (2014). https://doi.org/10.1007/s10772-013-9211-7