Abstract
We introduce a simple on-line algorithm for clustering paired samples of continuous and discrete data. The clusters are defined in the continuous data space, where they are local, while within-cluster differences between the associated, implicitly estimated conditional distributions of the discrete variable are minimized. The discrete variable can be seen as an indicator of relevance or importance that guides the clustering. Minimizing the Kullback-Leibler divergence-based distortion criterion is equivalent to maximizing the mutual information between the generated clusters and the discrete variable. We apply the method to a time-series data set, namely yeast gene expression measured with DNA chips, with biological knowledge about the functions of the genes encoded into the discrete variable.
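The equivalence stated in the abstract means the clustering quality can be scored by the empirical mutual information between cluster assignments and the discrete auxiliary variable. As a minimal sketch (not the authors' on-line algorithm, which uses soft assignments and stochastic updates), the following assumed helper computes that criterion for hard cluster assignments:

```python
from collections import Counter
from math import log

def mutual_information(cluster_ids, labels):
    """Empirical mutual information I(C; L), in nats, between hard
    cluster assignments and a discrete auxiliary variable.

    Illustrative sketch only: the paper optimizes an equivalent
    KL-divergence-based distortion with soft, on-line updates.
    """
    n = len(labels)
    n_c = Counter(cluster_ids)            # cluster counts
    n_l = Counter(labels)                 # label counts
    n_cl = Counter(zip(cluster_ids, labels))  # joint counts
    mi = 0.0
    for (c, l), joint in n_cl.items():
        # p(c,l) * log( p(c,l) / (p(c) * p(l)) ), zero terms skipped
        mi += (joint / n) * log(joint * n / (n_c[c] * n_l[l]))
    return mi
```

A clustering that perfectly separates the label values attains the label entropy, while a clustering independent of the labels scores zero; the method in the paper searches for local clusters in the continuous space that push this score as high as possible.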
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
Cite this paper
Kaski, S., Sinkkonen, J., Nikkilä, J. (2001). Clustering Gene Expression Data by Mutual Information with Gene Function. In: Dorffner, G., Bischof, H., Hornik, K. (eds) Artificial Neural Networks — ICANN 2001. ICANN 2001. Lecture Notes in Computer Science, vol 2130. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44668-0_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42486-4
Online ISBN: 978-3-540-44668-2