An improved feature transformation method using mutual information

International Journal of Speech Technology

Abstract

Feature transformation is an important step in pattern recognition systems. A feature transformation matrix can be obtained using different criteria, such as discrimination between classes, feature independence, or mutual information between features and classes, and the resulting matrix can also be used for feature reduction. In this paper, we propose a new method for finding a feature transformation based on mutual information (MI). For this purpose, we assume that the probability density function (PDF) of the features within each class is Gaussian, and we then use gradient ascent to maximize the mutual information between features and classes. Experimental results show that the proposed MI projection consistently outperforms other methods across a variety of cases. On the UCI Glass database we improve classification accuracy by up to 7.95 %, and on TIMIT the phoneme recognition rate improves by 3.55 %.
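As a rough illustration of the statistical quantities this method relies on, the following Python sketch (a hypothetical helper assuming NumPy, not the authors' implementation) estimates the class priors P_c, class means M_c, class covariances S_c, and the global mean M and total covariance S from a labeled feature matrix; these are the quantities that appear throughout the appendix:

```python
import numpy as np

def class_statistics(X, y):
    """Estimate the Gaussian statistics used in the derivations below.

    X: (N, n) feature matrix; y: (N,) integer class labels.
    Returns class priors P_c, class means M_c, class covariances S_c,
    the global mean M, and the total covariance S.
    """
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.stack([X[y == c].mean(axis=0) for c in classes])
    covs = np.stack([np.cov(X[y == c], rowvar=False) for c in classes])
    return priors, means, covs, X.mean(axis=0), np.cov(X, rowvar=False)
```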

Notes

  1. Probability Density Function.

  2. Comprehensive Medicinal Chemistry.

  3. Hidden Markov Model.

  4. Mel Frequency Cepstral Coefficient.

  5. Signal-to-Noise Ratio.

  6. Signal-to-Noise Ratio.

References

  • Blake, C., Keogh, E., & Merz, C. J. (1998). UCI repository of machine learning databases.

  • Duda, R. O., Hart, P. E., & Stork, D. G. (2001). Pattern classification (2nd ed.). New York: Wiley-Interscience.

  • Fukunaga, K. (1990). Introduction to statistical pattern recognition. New York: Academic Press.

  • Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., & Witten, I. H. (2009). The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter, 11(1), 10–18.

  • Hild, K. E. II, Erdogmus, D., Torkkola, K., & Principe, J. C. (2006). Feature extraction using information-theoretic learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(9), 1385–1392.

  • Kumar, N., & Andreou, A. G. (1998). Heteroscedastic discriminant analysis and reduced rank HMM’s for improved speech recognition. Speech Communication, 26, 283–297.

  • Lee, K., & Hon, H. (1988). Speaker-independent phone recognition using hidden Markov models. IEEE Transactions on Acoustics, Speech, and Signal Processing.

  • Padmanabhan, M., & Dharanipragada, S. (2005). Maximizing information content in feature extraction. IEEE Transactions on Speech and Audio Processing, 13(4).

  • Siohan, O. (1998). On the robustness of linear discriminant analysis as a preprocessing step for noisy speech recognition. In IEEE international conference on acoustics, speech, and signal processing (pp. 125–128).

  • Torkkola, K. (2003). Feature extraction by non-parametric mutual information maximization. Journal of Machine Learning Research, 3(7–8), 1415–1438.

  • Torkkola, K., & Campbell, W. M. (2000). Mutual information in learning feature transformations. In Proceedings of the 17th international conference on machine learning, Stanford, CA, USA (pp. 1015–1022).

Author information

Corresponding author

Correspondence to Seyed Milad Bassir.

Appendix

Proofs of relations (8) and (7) are presented here. From relation (8) we have:

$$ H_{upp} ( F ) = - \int_{F} p ( F ) \log p_{app} ( F )\, dF $$
(24)

where

$$\begin{aligned} &p ( F ) = \sum_{c} P_{c}\, p ( F \mid c ) \\ &p_{app} ( F ) = \frac{1}{ ( 2\pi )^{n/2} \vert S \vert^{1/2}} \exp \Bigl( - \tfrac{1}{2} ( F - M )^{T} S^{-1} ( F - M ) \Bigr) \end{aligned}$$

Therefore, $H_{upp}$ can be rewritten as follows:

$$\begin{aligned} H_{upp} ( F ) = {}& - \sum_{c} P_{c} \int_{F} p ( F \mid c ) \log \biggl( \frac{1}{ ( 2\pi )^{n/2} \vert S \vert^{1/2}} \\ &{}\times \exp \Bigl( - \tfrac{1}{2} ( F - M )^{T} S^{-1} ( F - M ) \Bigr) \biggr)\, dF \end{aligned}$$
(25)

Expanding the logarithm, the above relation can be expressed as follows:

$$\begin{aligned} H_{upp} ( F ) ={}& - \sum_{c} P_{c} \int_{F} p ( F \mid c ) \biggl( - \frac{n}{2}\log ( 2\pi ) - \frac{1}{2}\log \vert S \vert \\ &{}- \frac{1}{2} ( F - M )^{T} S^{-1} ( F - M ) \biggr)\, dF \\ ={}& \frac{1}{2} \sum_{c} P_{c} \biggl( n \log ( 2\pi ) + \log \vert S \vert \\ &{}+ \int_{F} p ( F \mid c )\, ( F - M )^{T} S^{-1} ( F - M )\, dF \biggr) \\ ={}& \frac{1}{2} \sum_{c} P_{c} \Bigl( n \log ( 2\pi ) + \log \vert S \vert \\ &{}+ \operatorname{trace} \bigl( S^{-1} \bigl( S_{c} + ( M - M_{c} ) ( M - M_{c} )^{T} \bigr) \bigr) \Bigr) \end{aligned}$$
(26)

Since $\sum_{c} P_{c} = 1$, the equation can be expressed as follows:

$$H_{upp} ( F ) = \frac{1}{2} \biggl( n \log ( 2\pi ) + \log \vert S \vert + \operatorname{trace} \biggl( S^{-1} \sum_{c} P_{c} \bigl( S_{c} + ( M - M_{c} ) ( M - M_{c} )^{T} \bigr) \biggr) \biggr)$$
(27)

By the law of total covariance, the sum inside the trace equals the total covariance $S = \sum_{c} P_{c} \bigl( S_{c} + ( M - M_{c} ) ( M - M_{c} )^{T} \bigr)$, so the trace term reduces to $\operatorname{trace} ( S^{-1} S ) = n$. We finally arrive at the following relation for $H_{upp}$:

$$ H_{upp} ( F ) = \frac{n}{2}\log ( 2\pi ) + \frac{1}{2}\log\vert S \vert + \frac{n}{2} $$
(28)
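The step from (27) to (28) hinges on the identity just noted: the prior-weighted sum inside the trace is exactly the total covariance $S$, so the trace collapses to $n$. A minimal NumPy check of this identity and of relation (28), using made-up class statistics (a sketch, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
# Two hypothetical Gaussian classes with distinct means and covariances.
A1, A2 = rng.normal(size=(n, n)), rng.normal(size=(n, n))
S1, S2 = A1 @ A1.T + np.eye(n), A2 @ A2.T + np.eye(n)  # class covariances S_c
M1, M2 = rng.normal(size=n), rng.normal(size=n)        # class means M_c
P1, P2 = 0.3, 0.7                                      # class priors P_c

M = P1 * M1 + P2 * M2                                  # global mean
# Law of total covariance: S = sum_c P_c (S_c + (M - M_c)(M - M_c)^T)
S = (P1 * (S1 + np.outer(M - M1, M - M1))
     + P2 * (S2 + np.outer(M - M2, M - M2)))

# The trace term in (27) is trace(S^{-1} S) = n.
assert np.isclose(np.trace(np.linalg.solve(S, S)), n)

H_upp = 0.5 * (n * np.log(2 * np.pi) + np.linalg.slogdet(S)[1] + n)
print(H_upp)  # relation (28)
```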

Relation (7) can be proved in the same way, noting that $\int_{F} p ( F \mid c )\, ( F - M_{c} )^{T} S_{c}^{-1} ( F - M_{c} )\, dF = \operatorname{trace} ( S_{c}^{-1} S_{c} ) = n$:

$$\begin{aligned} H ( F \mid C ) ={}& - \sum_{c} P_{c} \int_{F} p ( F \mid c ) \log p ( F \mid c )\, dF \\ ={}& - \sum_{c} P_{c} \int_{F} p ( F \mid c ) \log \biggl( \frac{1}{ ( 2\pi )^{n/2} \vert S_{c} \vert^{1/2}} \\ &{}\times \exp \Bigl( - \tfrac{1}{2} ( F - M_{c} )^{T} S_{c}^{-1} ( F - M_{c} ) \Bigr) \biggr)\, dF \\ ={}& - \sum_{c} P_{c} \int_{F} p ( F \mid c ) \biggl( - \frac{n}{2}\log ( 2\pi ) - \frac{1}{2}\log \vert S_{c} \vert \\ &{}- \frac{1}{2} ( F - M_{c} )^{T} S_{c}^{-1} ( F - M_{c} ) \biggr)\, dF \\ ={}& \frac{1}{2} \biggl( n \log ( 2\pi ) + \sum_{c} P_{c} \log \vert S_{c} \vert + \sum_{c} P_{c}\, \operatorname{trace} \bigl( S_{c}^{-1} S_{c} \bigr) \biggr) \\ ={}& \frac{n}{2}\log ( 2\pi ) + \frac{1}{2} \sum_{c} P_{c} \log \vert S_{c} \vert + \frac{n}{2} \\ ={}& \sum_{c} P_{c} \biggl[ \frac{n}{2}\log ( 2\pi ) + \frac{1}{2}\log \vert S_{c} \vert + \frac{n}{2} \biggr] \end{aligned}$$
(29)
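Relation (29) says that, under the Gaussian assumption, the conditional entropy is a prior-weighted sum of per-class Gaussian entropies, each of the form $\frac{n}{2}\log ( 2\pi ) + \frac{1}{2}\log \vert S_{c} \vert + \frac{n}{2}$. A Monte Carlo sanity check for a single class (a sketch assuming NumPy and SciPy; the class parameters are made up):

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)
n = 3
A = rng.normal(size=(n, n))
S_c = A @ A.T + np.eye(n)          # a made-up class covariance
M_c = rng.normal(size=n)           # a made-up class mean

# Closed form from (29): entropy of a single Gaussian class.
closed = 0.5 * (n * np.log(2 * np.pi) + np.linalg.slogdet(S_c)[1] + n)

# Monte Carlo estimate of -E[log p(F|c)].
dist = multivariate_normal(mean=M_c, cov=S_c)
samples = dist.rvs(size=200_000, random_state=2)
mc = -dist.logpdf(samples).mean()
print(closed, mc)  # the two values should agree closely
```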

The proof of relation (23), the objective function of the LDA method, is presented here. From relation (12) we have:

$$ I_{upp} ( Y;c ) = \frac{1}{2}\log\bigl\vert WSW^{T} \bigr\vert - \frac{1}{2}\sum _{c} P_{c}\log\bigl\vert WS_{c}W^{T} \bigr\vert $$
(30)

LDA assumes that all class covariance matrices are identical. Therefore, if in the above relation every class covariance $S_{c}$ is set equal to the within-class covariance matrix $S_{w}$, we arrive at the following relation:

$$\begin{aligned} I_{upp} ( Y;c ) &= \frac{1}{2}\log \bigl\vert WSW^{T} \bigr\vert - \frac{1}{2} \biggl( \sum_{c} P_{c} \biggr) \log \bigl\vert WS_{w}W^{T} \bigr\vert \\ &= \frac{1}{2}\log \frac{\vert WSW^{T} \vert}{\vert WS_{w}W^{T} \vert} \end{aligned}$$
(31)

The above relation is the objective function of the LDA method.
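To make relation (30) operational, the sketch below (hypothetical function names, assuming NumPy; an illustration of the gradient-ascent idea, not the authors' code) evaluates $I_{upp} ( Y;c )$ for a projection $W$ and maximizes it by plain gradient ascent, using the standard identity $\partial \log \vert W A W^{T} \vert / \partial W = 2 ( W A W^{T} )^{-1} W A$ for symmetric $A$:

```python
import numpy as np

def I_upp(W, S, covs, priors):
    """Relation (30): 0.5 log|W S W^T| - 0.5 sum_c P_c log|W S_c W^T|."""
    ld_total = np.linalg.slogdet(W @ S @ W.T)[1]
    ld_classes = sum(p * np.linalg.slogdet(W @ Sc @ W.T)[1]
                     for p, Sc in zip(priors, covs))
    return 0.5 * (ld_total - ld_classes)

def grad_I_upp(W, S, covs, priors):
    """Gradient of (30) via d log|W A W^T| / dW = 2 (W A W^T)^{-1} W A."""
    g = np.linalg.solve(W @ S @ W.T, W @ S)
    for p, Sc in zip(priors, covs):
        g -= p * np.linalg.solve(W @ Sc @ W.T, W @ Sc)
    return g

def fit_projection(S, covs, priors, m, lr=1e-2, steps=500, seed=0):
    """Gradient ascent on I_upp; W projects n-dim features to m dims."""
    n = S.shape[0]
    W = np.random.default_rng(seed).normal(size=(m, n))
    for _ in range(steps):
        W += lr * grad_I_upp(W, S, covs, priors)
    return W
```

Consistent with relation (31), replacing every entry of `covs` with a common within-class covariance $S_{w}$ makes `I_upp` return exactly half the log of the LDA determinant ratio; the fixed step size here is the simplest choice, and the paper's actual optimization schedule may differ.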

Cite this article

Bassir, S.M., Akbari, A. & Nassersharif, B. An improved feature transformation method using mutual information. Int J Speech Technol 17, 107–115 (2014). https://doi.org/10.1007/s10772-013-9211-7
