Abstract
Combined analysis of multiple data sources is of increasing interest in applications, in particular for distinguishing shared and source-specific aspects. We extend this rationale to the generative, non-parametric clustering setting by introducing a novel non-parametric hierarchical mixture model. The lower level of the model describes each source with a flexible non-parametric mixture, and the top level combines these to describe commonalities across the sources. The lower-level clusters arise from hierarchical Dirichlet processes, inducing an infinite-dimensional contingency table between the sources. The commonalities between the sources are modeled by an infinite-component model of the contingency table, interpretable as a non-negative factorization of infinite matrices, or as a prior for infinite contingency tables. With Gaussian mixture components plugged in for continuous measurements, the model is applied to two views of genes, mRNA expression and abundance of the produced proteins, to expose groups of genes that are co-regulated in either or both of the views. We discover complex relationships, multimodal in both marginals, that would remain undetected by simpler models. Cluster analysis of co-expression is a standard method of screening for co-regulation; the two-view analysis extends the approach to distinguish between pre- and post-translational regulation.
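The generative structure described above can be illustrated with a minimal finite truncation: a small number of top-level components define a non-negative factorization of the (here finite) contingency table between the two views' cluster sets, and each view emits continuous measurements from Gaussian components. All sizes, weights, and means below are hypothetical choices for illustration; the paper's actual model is non-parametric (infinite-dimensional) and inferred from data, not simulated forward like this.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical finite sizes (the model in the paper is infinite-dimensional).
K = 3        # top-level components capturing cross-view commonalities
I, J = 4, 5  # lower-level cluster counts for the two views (mRNA, protein)
N = 200      # number of genes

# Non-negative factorization of the contingency table:
#   P(i, j) = sum_k w_k * a[k, i] * b[k, j]
w = rng.dirichlet(np.ones(K))          # top-level component weights
a = rng.dirichlet(np.ones(I), size=K)  # per-component cluster dists, view 1
b = rng.dirichlet(np.ones(J), size=K)  # per-component cluster dists, view 2
P = np.einsum('k,ki,kj->ij', w, a, b)  # I x J joint cluster table

# Gaussian mixture components for the continuous measurements in each view.
mu1 = rng.normal(0.0, 3.0, size=I)
mu2 = rng.normal(0.0, 3.0, size=J)

# Generate paired observations: draw a cluster pair (i, j) from the table,
# then emit one Gaussian measurement per view.
flat = rng.choice(I * J, size=N, p=P.ravel())
i, j = np.unravel_index(flat, (I, J))
mrna = rng.normal(mu1[i], 1.0)
protein = rng.normal(mu2[j], 1.0)
```

Because each top-level component couples one cluster distribution per view, marginals that are multimodal in both views can still exhibit structured joint behavior through the factorized table, which is the kind of relationship the model is designed to expose.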
Editors: Nicolo Cesa-Bianchi, David R. Hardoon, and Gayle Leen.
Cite this article
Rogers, S., Klami, A., Sinkkonen, J. et al. Infinite factorization of multiple non-parametric views. Mach Learn 79, 201–226 (2010). https://doi.org/10.1007/s10994-009-5155-1