Abstract
Shannon’s seminal work on information theory provided the conceptual framework for communication through noisy channels (Shannon, 1948). By quantifying the information content of coded messages, it laid the foundation for all modern systems that transmit information through any medium.
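To make that quantification concrete, the short Python sketch below (an illustration added here, not part of the chapter; the function name and example strings are our own) estimates the Shannon entropy H(X) = -sum_x p(x) log2 p(x) of a message from its empirical symbol frequencies.

```python
from collections import Counter
from math import log2

def empirical_entropy(message: str) -> float:
    """Shannon entropy in bits per symbol, from empirical symbol frequencies.

    Illustrative helper (not from the chapter): estimates H(X) by the
    plug-in formula H(X) = -sum_x p(x) * log2 p(x) over observed symbols.
    """
    counts = Counter(message)
    n = len(message)
    return -sum((c / n) * log2(c / n) for c in counts.values())

# Four equiprobable symbols carry log2(4) = 2 bits each;
# a skewed distribution carries less.
print(empirical_entropy("abcd" * 100))  # -> 2.0
print(empirical_entropy("aaab" * 100))  # -> ~0.811
```

Mutual information, the quantity driving many of the feature-selection methods cited below, is built from the same entropy terms, e.g. I(X;Y) = H(X) + H(Y) - H(X,Y).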
References
M.E. Aladjem. Nonparametric discriminant analysis via recursive optimization of Patrick-Fisher distance. IEEE Transactions on Systems, Man, and Cybernetics, 28(2):292–299, April 1998.
C. Aliferis, I. Tsamardinos, and A. Statnikov. HITON, a novel Markov blanket algorithm for optimal variable selection. In Proceedings of the 2003 American Medical Informatics Association (AMIA) Annual Symposium, pages 21–25, Washington, DC, USA, November 8–12, 2003.
A. Antos, L. Devroye, and L. Györfi. Lower bounds for Bayes error estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(7):643–645, July 1999.
A. Banerjee, I. Dhillon, J. Ghosh, and S. Merugu. An information theoretic analysis of maximum likelihood mixture estimation for exponential families. In Proc. International Conference on Machine Learning (ICML), pages 57–64, Banff, Canada, July 2004.
R. Battiti. Using mutual information for selecting features in supervised neural net learning. IEEE Transactions on Neural Networks, 5(4):537–550, July 1994.
S. Becker. Mutual information maximization: Models of cortical self-organization. Network: Computation in Neural Systems, 7(1), February 1996.
L. Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
L. Breiman, J.H. Friedman, R.A. Olshen, and C.J. Stone. Classification and Regression Trees. Wadsworth International Group, Belmont, CA, 1984.
G. Chechik, A. Globerson, N. Tishby, and Y. Weiss. Information bottleneck for Gaussian variables. Journal of Machine Learning Research, 6:165–188, 2005.
P.A. Devijver and J. Kittler. Pattern Recognition: A Statistical Approach. Prentice Hall, London, 1982.
D. Erdogmus, K.E. Hild, and J.C. Principe. Online entropy manipulation: Stochastic information gradient. IEEE Signal Processing Letters, 10:242–245, 2003.
D. Erdogmus, J.C. Principe, and K.E. Hild. Beyond second order statistics for learning: A pairwise interaction model for entropy estimation. Natural Computing, 1:85–108, 2002.
R.M. Fano. Transmission of Information: A Statistical Theory of Communications. Wiley, New York, 1961.
M. Feder and N. Merhav. Relations between entropy and error probability. IEEE Trans. on Information Theory, 40:259–266, 1994.
M. Feder, N. Merhav, and M. Gutman. Universal prediction of individual sequences. IEEE Trans. on Information Theory, 38:1258–1270, 1992.
F. Fleuret. Fast binary feature selection with conditional mutual information. Journal of Machine Learning Research, 5:1531–1555, 2004.
G. Forman. An extensive empirical study of feature selection metrics for text classification. Journal of Machine Learning Research, 3:1289–1305, March 2003.
L. Frey, D. Fisher, I. Tsamardinos, C. Aliferis, and A. Statnikov. Identifying Markov blankets with decision tree induction. In Proceedings of the IEEE International Conference on Data Mining, Melbourne, FL, USA, November 19–22, 2003.
A. Globerson and N. Tishby. Sufficient dimensionality reduction. Journal of Machine Learning Research, 3:1307–1331, 2003.
X. Guorong, C. Peiqi, and W. Minhui. Bhattacharyya distance feature selection. In Proceedings of the 13th International Conference on Pattern Recognition, volume 2, pages 195–199. IEEE, 25–29 Aug. 1996.
T.S. Han and S. Verdú. Generalizing the Fano inequality. IEEE Trans. on Information Theory, 40(4):1147–1157, July 1994.
M.E. Hellman and J. Raviv. Probability of error, equivocation and the Chernoff bound. IEEE Transactions on Information Theory, 16:368–372, 1970.
A.O. Hero, B. Ma, O. Michel, and J. Gorman. Alpha-divergence for classification, indexing and retrieval. Technical Report CSPL-328, Communications and Signal Processing Laboratory, University of Michigan, Ann Arbor, May 2001.
J.N. Kapur. Measures of information and their applications. Wiley, New Delhi, India, 1994.
S. Kaski and J. Sinkkonen. Principle of learning metrics for data analysis. Journal of VLSI Signal Processing, special issue on Machine Learning for Signal Processing, 37:177–188, 2004.
R. Kohavi and G.H. John. Wrappers for feature subset selection. Artificial Intelligence, 97:273–324, 1997.
D. Koller and M. Sahami. Toward optimal feature selection. In Proceedings of ICML-96, 13th International Conference on Machine Learning, pages 284–292, Bari, Italy, 1996.
A. Kraskov, H. Stögbauer, and P. Grassberger. Estimating mutual information. e-print arXiv:cond-mat/0305641, 2003.
L. Paninski. Estimation of entropy and mutual information. Neural Computation, 15:1191–1253, 2003.
J. Peltonen and S. Kaski. Discriminative components of data. IEEE Transactions on Neural Networks, 2005.
J. Peltonen, A. Klami, and S. Kaski. Improved learning of Riemannian metrics for exploratory analysis. Neural Networks, 17:1087–1100, 2004.
J.C. Principe, J.W. Fisher III, and D. Xu. Information theoretic learning. In S. Haykin, editor, Unsupervised Adaptive Filtering. Wiley, New York, NY, 2000.
J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA, 1993.
A. Rényi. On measures of entropy and information. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, pages 547–561. University of California Press, 1961.
G. Saon and M. Padmanabhan. Minimum Bayes error feature selection for continuous speech recognition. In T.K. Leen, T.G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems 13 (Proc. NIPS’00), pages 800–806. MIT Press, 2001.
C.E. Shannon. A mathematical theory of communication. The Bell System Technical Journal, 27:379–423, 623–656, July and October 1948.
N. Tishby, F. Pereira, and W. Bialek. The information bottleneck method. In Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing, pages 368–377, 1999.
K. Torkkola. Feature extraction by non-parametric mutual information maximization. Journal of Machine Learning Research, 3:1415–1438, March 2003.
I. Tsamardinos, C. Aliferis, and A. Statnikov. Algorithms for large scale Markov blanket discovery. In Proceedings of the 16th International FLAIRS Conference, St. Augustine, Florida, USA, 2003.
I. Tsamardinos and C.F. Aliferis. Towards principled feature selection: Relevancy, filters and wrappers. In Proceedings of the Workshop on Artificial Intelligence and Statistics, 2003.
E. Tuv. Feature selection and ensemble learning. In I. Guyon, S. Gunn, M. Nikravesh, and L. Zadeh, editors, Feature Extraction, Foundations and Applications. Springer, New York, 2005.
N. Vasconcelos. Feature selection by maximum marginal diversity: optimality and implications for visual recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 762–772, Madison, WI, USA, 2003.
D.R. Wolf and E.I. George. Maximally informative statistics. In J.M. Bernardo, editor, Bayesian Methods in the Sciences. Real Academia de Ciencias, Madrid, Spain, 1999.
D.H. Wolpert and D.R. Wolf. Estimating functions of distributions from a finite set of samples. Phys. Rev. E, 52(6):6841–6854, 1995.
E.P. Xing, M.I. Jordan, and R.M. Karp. Feature selection for high-dimensional genomic microarray data. In Proc. 18th International Conf. on Machine Learning, pages 601–608. Morgan Kaufmann, San Francisco, CA, 2001.
Y. Yang and J.O. Pedersen. A comparative study on feature selection in text categorization. In Proc. 14th International Conference on Machine Learning, pages 412–420. Morgan Kaufmann, 1997.
L. Yu and H. Liu. Feature selection for high-dimensional data: A fast correlation-based filter solution. In Proceedings of the 20th International Conference on Machine Learning (ICML’03), Washington, DC, 2003.
M. Zaffalon and M. Hutter. Robust feature selection by mutual information distributions. In Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence, pages 577–584, San Francisco, 2002. Morgan Kaufmann.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Torkkola, K. (2008). Information-Theoretic Methods. In: Guyon, I., Nikravesh, M., Gunn, S., Zadeh, L.A. (eds) Feature Extraction. Studies in Fuzziness and Soft Computing, vol 207. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-35488-8_7
DOI: https://doi.org/10.1007/978-3-540-35488-8_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35487-1
Online ISBN: 978-3-540-35488-8