An improved feature transformation method using mutual information

International Journal of Speech Technology

Abstract

Feature transformation is an important step in pattern recognition systems. A feature transformation matrix can be obtained using different criteria, such as discrimination between classes, feature independence, or mutual information between features and classes, and the resulting matrix can also be used for feature reduction. In this paper, we propose a new method for finding a feature transformation based on mutual information (MI). For this purpose, we assume that the probability density function (PDF) of the features within each class is Gaussian, and we then use gradient ascent to maximize the mutual information between features and classes. Experimental results show that the proposed MI projection consistently outperforms other methods across a variety of cases. On the UCI Glass database we improve classification accuracy by up to 7.95 %, and on TIMIT the phoneme recognition rate improves by 3.55 %.
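As a rough illustration of the statistical quantities this method relies on, the following Python sketch (a hypothetical helper assuming NumPy, not the authors' implementation) estimates the class priors P_c, class means M_c, class covariances S_c, and the global mean M and total covariance S from a labeled feature matrix; these are the quantities that appear throughout the appendix:

```python
import numpy as np

def class_statistics(X, y):
    """Estimate the Gaussian statistics used in the derivations below.

    X: (N, n) feature matrix; y: (N,) integer class labels.
    Returns class priors P_c, class means M_c, class covariances S_c,
    the global mean M, and the total covariance S.
    """
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.stack([X[y == c].mean(axis=0) for c in classes])
    covs = np.stack([np.cov(X[y == c], rowvar=False) for c in classes])
    return priors, means, covs, X.mean(axis=0), np.cov(X, rowvar=False)
```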

Notes

  1. Probability Density Function.

  2. Comprehensive Medicinal Chemistry.

  3. Hidden Markov Model.

  4. Mel Frequency Cepstral Coefficient.

  5. Signal-to-Noise Ratio.

  6. Signal-to-Noise Ratio.

References

  • Blake, C., Keogh, E., & Merz, C. J. (1998). UCI repository of machine learning databases.

  • Duda, R. O., Hart, P. E., & Stork, D. G. (2001). Pattern classification (2nd ed.). New York: Wiley-Interscience.

  • Fukunaga, K. (1990). Introduction to statistical pattern recognition. New York: Academic Press.

  • Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., & Witten, I. H. (2009). The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter, 11(1), 10–18.

  • Hild, K. E. II, Erdogmus, D., Torkkola, K., & Principe, J. C. (2006). Feature extraction using information-theoretic learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(9), 1385–1392.

  • Kumar, N., & Andreou, A. G. (1998). Heteroscedastic discriminant analysis and reduced rank HMM’s for improved speech recognition. Speech Communication, 26, 283–297.

  • Lee, K., & Hon, H. (1988). Speaker-independent phone recognition using hidden Markov models. IEEE Transactions on Acoustics, Speech, and Signal Processing.

  • Padmanabhan, M., & Dharanipragada, S. (2005). Maximizing information content in feature extraction. IEEE Transactions on Speech and Audio Processing, 13(4).

  • Siohan, O. (1998). On the robustness of linear discriminant analysis as a preprocessing step for noisy speech recognition. In IEEE international conference on acoustics, speech, and signal processing (pp. 125–128).

  • Torkkola, K. (2003). Feature extraction by non-parametric mutual information maximization. Journal of Machine Learning Research, 3(7–8), 1415–1438.

  • Torkkola, K., & Campbell, W. M. (2000). Mutual information in learning feature transformations. In Proceedings of the 17th international conference on machine learning, Stanford, CA, USA (pp. 1015–1022).

Author information

Corresponding author

Correspondence to Seyed Milad Bassir.

Appendix

Proofs of relations (8) and (7) are presented here. From relation (8) we have:

$$ H_{upp} ( F ) = - \int_{F} p ( F ) \log p_{app} ( F )\, dF $$
(24)

where

$$\begin{aligned} &p ( F ) = \sum_{c} P_{c}\, p ( F \mid c ) \\ &p_{app} ( F ) = \frac{1}{ ( 2\pi )^{n/2} \vert S \vert^{1/2}} \exp \Bigl( - \tfrac{1}{2} ( F - M )^{T} S^{-1} ( F - M ) \Bigr) \end{aligned}$$

Therefore, $H_{upp}$ can be rewritten as follows:

$$\begin{aligned} H_{upp} ( F ) = {}& - \sum_{c} P_{c} \int_{F} p ( F \mid c ) \log \biggl( \frac{1}{ ( 2\pi )^{n/2} \vert S \vert^{1/2}} \\ &{}\times \exp \Bigl( - \tfrac{1}{2} ( F - M )^{T} S^{-1} ( F - M ) \Bigr) \biggr)\, dF \end{aligned}$$
(25)

Expanding the logarithm, the above relation can be expressed as follows:

$$\begin{aligned} H_{upp} ( F ) ={}& - \sum_{c} P_{c} \int_{F} p ( F \mid c ) \biggl( - \frac{n}{2}\log ( 2\pi ) - \frac{1}{2}\log \vert S \vert \\ &{}- \frac{1}{2} ( F - M )^{T} S^{-1} ( F - M ) \biggr)\, dF \\ ={}& \frac{1}{2} \sum_{c} P_{c} \biggl( n \log ( 2\pi ) + \log \vert S \vert \\ &{}+ \int_{F} p ( F \mid c )\, ( F - M )^{T} S^{-1} ( F - M )\, dF \biggr) \\ ={}& \frac{1}{2} \sum_{c} P_{c} \Bigl( n \log ( 2\pi ) + \log \vert S \vert \\ &{}+ \operatorname{trace} \bigl( S^{-1} \bigl( S_{c} + ( M - M_{c} ) ( M - M_{c} )^{T} \bigr) \bigr) \Bigr) \end{aligned}$$
(26)

Since $\sum_{c} P_{c} = 1$, the equation can be expressed as follows:

$$H_{upp} ( F ) = \frac{1}{2} \biggl( n \log ( 2\pi ) + \log \vert S \vert + \operatorname{trace} \biggl( S^{-1} \sum_{c} P_{c} \bigl( S_{c} + ( M - M_{c} ) ( M - M_{c} )^{T} \bigr) \biggr) \biggr)$$
(27)

By the law of total covariance, the sum inside the trace equals the total covariance $S = \sum_{c} P_{c} \bigl( S_{c} + ( M - M_{c} ) ( M - M_{c} )^{T} \bigr)$, so the trace term reduces to $\operatorname{trace} ( S^{-1} S ) = n$. We finally arrive at the following relation for $H_{upp}$:

$$ H_{upp} ( F ) = \frac{n}{2}\log ( 2\pi ) + \frac{1}{2}\log\vert S \vert + \frac{n}{2} $$
(28)
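The step from (27) to (28) hinges on the identity just noted: the prior-weighted sum inside the trace is exactly the total covariance $S$, so the trace collapses to $n$. A minimal NumPy check of this identity and of relation (28), using made-up class statistics (a sketch, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
# Two hypothetical Gaussian classes with distinct means and covariances.
A1, A2 = rng.normal(size=(n, n)), rng.normal(size=(n, n))
S1, S2 = A1 @ A1.T + np.eye(n), A2 @ A2.T + np.eye(n)  # class covariances S_c
M1, M2 = rng.normal(size=n), rng.normal(size=n)        # class means M_c
P1, P2 = 0.3, 0.7                                      # class priors P_c

M = P1 * M1 + P2 * M2                                  # global mean
# Law of total covariance: S = sum_c P_c (S_c + (M - M_c)(M - M_c)^T)
S = (P1 * (S1 + np.outer(M - M1, M - M1))
     + P2 * (S2 + np.outer(M - M2, M - M2)))

# The trace term in (27) is trace(S^{-1} S) = n.
assert np.isclose(np.trace(np.linalg.solve(S, S)), n)

H_upp = 0.5 * (n * np.log(2 * np.pi) + np.linalg.slogdet(S)[1] + n)
print(H_upp)  # relation (28)
```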

Relation (7) can be proved in the same way, noting that $\int_{F} p ( F \mid c )\, ( F - M_{c} )^{T} S_{c}^{-1} ( F - M_{c} )\, dF = \operatorname{trace} ( S_{c}^{-1} S_{c} ) = n$:

$$\begin{aligned} H ( F \mid C ) ={}& - \sum_{c} P_{c} \int_{F} p ( F \mid c ) \log p ( F \mid c )\, dF \\ ={}& - \sum_{c} P_{c} \int_{F} p ( F \mid c ) \log \biggl( \frac{1}{ ( 2\pi )^{n/2} \vert S_{c} \vert^{1/2}} \\ &{}\times \exp \Bigl( - \tfrac{1}{2} ( F - M_{c} )^{T} S_{c}^{-1} ( F - M_{c} ) \Bigr) \biggr)\, dF \\ ={}& - \sum_{c} P_{c} \int_{F} p ( F \mid c ) \biggl( - \frac{n}{2}\log ( 2\pi ) - \frac{1}{2}\log \vert S_{c} \vert \\ &{}- \frac{1}{2} ( F - M_{c} )^{T} S_{c}^{-1} ( F - M_{c} ) \biggr)\, dF \\ ={}& \frac{1}{2} \biggl( n \log ( 2\pi ) + \sum_{c} P_{c} \log \vert S_{c} \vert + \sum_{c} P_{c}\, \operatorname{trace} \bigl( S_{c}^{-1} S_{c} \bigr) \biggr) \\ ={}& \frac{n}{2}\log ( 2\pi ) + \frac{1}{2} \sum_{c} P_{c} \log \vert S_{c} \vert + \frac{n}{2} \\ ={}& \sum_{c} P_{c} \biggl[ \frac{n}{2}\log ( 2\pi ) + \frac{1}{2}\log \vert S_{c} \vert + \frac{n}{2} \biggr] \end{aligned}$$
(29)
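Relation (29) says that, under the Gaussian assumption, the conditional entropy is a prior-weighted sum of per-class Gaussian entropies, each of the form $\frac{n}{2}\log ( 2\pi ) + \frac{1}{2}\log \vert S_{c} \vert + \frac{n}{2}$. A Monte Carlo sanity check for a single class (a sketch assuming NumPy and SciPy; the class parameters are made up):

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)
n = 3
A = rng.normal(size=(n, n))
S_c = A @ A.T + np.eye(n)          # a made-up class covariance
M_c = rng.normal(size=n)           # a made-up class mean

# Closed form from (29): entropy of a single Gaussian class.
closed = 0.5 * (n * np.log(2 * np.pi) + np.linalg.slogdet(S_c)[1] + n)

# Monte Carlo estimate of -E[log p(F|c)].
dist = multivariate_normal(mean=M_c, cov=S_c)
samples = dist.rvs(size=200_000, random_state=2)
mc = -dist.logpdf(samples).mean()
print(closed, mc)  # the two values should agree closely
```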

The proof of relation (23), the objective function of the LDA method, is presented here. From relation (12) we have:

$$ I_{upp} ( Y;c ) = \frac{1}{2}\log\bigl\vert WSW^{T} \bigr\vert - \frac{1}{2}\sum _{c} P_{c}\log\bigl\vert WS_{c}W^{T} \bigr\vert $$
(30)

LDA assumes that all class covariance matrices are identical. Therefore, if in the above relation every class covariance $S_{c}$ is set equal to the within-class covariance matrix $S_{w}$, we arrive at the following relation:

$$\begin{aligned} I_{upp} ( Y;c ) &= \frac{1}{2}\log \bigl\vert WSW^{T} \bigr\vert - \frac{1}{2} \biggl( \sum_{c} P_{c} \biggr) \log \bigl\vert WS_{w}W^{T} \bigr\vert \\ &= \frac{1}{2}\log \frac{\vert WSW^{T} \vert}{\vert WS_{w}W^{T} \vert} \end{aligned}$$
(31)

The above relation is the objective function of the LDA method.
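To make relation (30) operational, the sketch below (hypothetical function names, assuming NumPy; an illustration of the gradient-ascent idea, not the authors' code) evaluates $I_{upp} ( Y;c )$ for a projection $W$ and maximizes it by plain gradient ascent, using the standard identity $\partial \log \vert W A W^{T} \vert / \partial W = 2 ( W A W^{T} )^{-1} W A$ for symmetric $A$:

```python
import numpy as np

def I_upp(W, S, covs, priors):
    """Relation (30): 0.5 log|W S W^T| - 0.5 sum_c P_c log|W S_c W^T|."""
    ld_total = np.linalg.slogdet(W @ S @ W.T)[1]
    ld_classes = sum(p * np.linalg.slogdet(W @ Sc @ W.T)[1]
                     for p, Sc in zip(priors, covs))
    return 0.5 * (ld_total - ld_classes)

def grad_I_upp(W, S, covs, priors):
    """Gradient of (30) via d log|W A W^T| / dW = 2 (W A W^T)^{-1} W A."""
    g = np.linalg.solve(W @ S @ W.T, W @ S)
    for p, Sc in zip(priors, covs):
        g -= p * np.linalg.solve(W @ Sc @ W.T, W @ Sc)
    return g

def fit_projection(S, covs, priors, m, lr=1e-2, steps=500, seed=0):
    """Gradient ascent on I_upp; W projects n-dim features to m dims."""
    n = S.shape[0]
    W = np.random.default_rng(seed).normal(size=(m, n))
    for _ in range(steps):
        W += lr * grad_I_upp(W, S, covs, priors)
    return W
```

Consistent with relation (31), replacing every entry of `covs` with a common within-class covariance $S_{w}$ makes `I_upp` return exactly half the log of the LDA determinant ratio; the fixed step size here is the simplest choice, and the paper's actual optimization schedule may differ.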

Cite this article

Bassir, S.M., Akbari, A. & Nassersharif, B. An improved feature transformation method using mutual information. Int J Speech Technol 17, 107–115 (2014). https://doi.org/10.1007/s10772-013-9211-7
