Abstract
Combined analysis of multiple data sources is of increasing interest in applications, in particular for distinguishing shared and source-specific aspects. We extend this rationale to the generative, non-parametric clustering setting by introducing a novel non-parametric hierarchical mixture model. The lower level of the model describes each source with a flexible non-parametric mixture, and the top level combines these to describe commonalities across the sources. The lower-level clusters arise from hierarchical Dirichlet processes, inducing an infinite-dimensional contingency table between the sources. The commonalities between the sources are modeled by an infinite-component model of the contingency table, interpretable as a non-negative factorization of infinite matrices, or as a prior for infinite contingency tables. With Gaussian mixture components plugged in for continuous measurements, the model is applied to two views of genes, mRNA expression and abundance of the produced proteins, to expose groups of genes that are co-regulated in either or both of the views. We discover complex relationships, multimodal in both marginals, that would remain undetected by simpler models. Cluster analysis of co-expression is a standard method of screening for co-regulation; the two-view analysis extends the approach to distinguish between pre- and post-translational regulation.
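The generative structure described above can be illustrated with a minimal finite truncation: a small number of top-level components define a non-negative factorization of the (here finite) contingency table between the two views' cluster sets, and each view emits continuous measurements from Gaussian components. All sizes, weights, and means below are hypothetical choices for illustration; the paper's actual model is non-parametric (infinite-dimensional) and inferred from data, not simulated forward like this.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical finite sizes (the model in the paper is infinite-dimensional).
K = 3        # top-level components capturing cross-view commonalities
I, J = 4, 5  # lower-level cluster counts for the two views (mRNA, protein)
N = 200      # number of genes

# Non-negative factorization of the contingency table:
#   P(i, j) = sum_k w_k * a[k, i] * b[k, j]
w = rng.dirichlet(np.ones(K))          # top-level component weights
a = rng.dirichlet(np.ones(I), size=K)  # per-component cluster dists, view 1
b = rng.dirichlet(np.ones(J), size=K)  # per-component cluster dists, view 2
P = np.einsum('k,ki,kj->ij', w, a, b)  # I x J joint cluster table

# Gaussian mixture components for the continuous measurements in each view.
mu1 = rng.normal(0.0, 3.0, size=I)
mu2 = rng.normal(0.0, 3.0, size=J)

# Generate paired observations: draw a cluster pair (i, j) from the table,
# then emit one Gaussian measurement per view.
flat = rng.choice(I * J, size=N, p=P.ravel())
i, j = np.unravel_index(flat, (I, J))
mrna = rng.normal(mu1[i], 1.0)
protein = rng.normal(mu2[j], 1.0)
```

Because each top-level component couples one cluster distribution per view, marginals that are multimodal in both views can still exhibit structured joint behavior through the factorized table, which is the kind of relationship the model is designed to expose.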
Editors: Nicolo Cesa-Bianchi, David R. Hardoon, and Gayle Leen.
Cite this article
Rogers, S., Klami, A., Sinkkonen, J. et al. Infinite factorization of multiple non-parametric views. Mach Learn 79, 201–226 (2010). https://doi.org/10.1007/s10994-009-5155-1