Vine copulas for mixed data: multi-view clustering for mixed data beyond meta-Gaussian dependencies
Abstract
Copulas enable flexible parameterization of multivariate distributions in terms of constituent marginals and dependence families. Vine copulas, hierarchical collections of bivariate copulas, can model a wide variety of dependencies in multivariate data, including asymmetric and tail dependencies, which the more widely used Gaussian copulas (the basis of meta-Gaussian distributions) cannot. However, current inference algorithms for vines cannot fit data with mixed features (a combination of continuous, binary and ordinal features), which are common in many domains. We design a new inference algorithm to fit vines on mixed data, thereby extending their use to several applications. We illustrate our algorithm by developing a dependency-seeking multi-view clustering model based on a Dirichlet Process mixture of vines that generalizes previous models to arbitrary dependencies as well as to mixed marginals. Empirical results on synthetic and real datasets demonstrate the performance of the model in clustering single-view and multi-view data with asymmetric and tail dependencies and with mixed marginals.
Keywords
Vine copula, Mixed data, Multi-view, Dependency-seeking clustering
1 Introduction
Even with the Gaussian copula alone, combining it with different marginals yields many different joint distributions (including multimodal ones), called meta-Gaussian distributions, which have been used in several applications (Letham et al. 2014; Eickhoff et al. 2015). However, the meta-Gaussian dependencies induced by the Gaussian copula do not include the asymmetric and tail dependencies captured by other copula families (Joe 2014): see Fig. 1. Vine copulas provide a flexible way of modeling pairwise dependencies using hierarchical collections of bivariate copulas, each of which can belong to any copula family, thereby capturing a wide variety of dependencies (see Sect. 2 for details). In the machine learning literature, vines have been used in many applications, such as time series analysis (Aas et al. 2009), domain adaptation (Lopez-Paz et al. 2012) and variational inference (Tran et al. 2015).
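To make the notion of tail dependence concrete, the following simulation is an illustrative sketch, not part of the method described in this paper. It draws from a Clayton copula and from a Gaussian copula matched to the same Kendall's tau of 0.5, and estimates the probability that both coordinates fall in their lower 1% tail. The two families, the value of tau and the threshold q are our assumptions; only the Python standard library is used.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # maps Gaussian draws to uniforms via the normal CDF


def sample_clayton(theta, n, rng):
    """Draw n (u, v) pairs from a Clayton copula by inverting h(v | u)."""
    pairs = []
    for _ in range(n):
        u = 1.0 - rng.random()  # in (0, 1], avoids u = 0
        w = 1.0 - rng.random()
        v = (u ** -theta * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
        pairs.append((u, v))
    return pairs


def sample_gaussian_copula(rho, n, rng):
    """Draw n (u, v) pairs from a bivariate Gaussian copula with correlation rho."""
    scale = math.sqrt(1.0 - rho * rho)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + scale * rng.gauss(0.0, 1.0)
        pairs.append((STD_NORMAL.cdf(z1), STD_NORMAL.cdf(z2)))
    return pairs


def lower_tail_ratio(pairs, q):
    """Empirical P(V <= q | U <= q); approaches the lower tail-dependence coefficient as q -> 0."""
    v_in_tail = [v for u, v in pairs if u <= q]
    return sum(v <= q for v in v_in_tail) / len(v_in_tail)


rng = random.Random(0)
tau = 0.5                            # same Kendall's tau for both families
theta = 2.0 * tau / (1.0 - tau)      # Clayton: tau = theta / (theta + 2)
rho = math.sin(math.pi * tau / 2.0)  # Gaussian: tau = (2 / pi) * arcsin(rho)
n, q = 100_000, 0.01

clayton_tail = lower_tail_ratio(sample_clayton(theta, n, rng), q)
gauss_tail = lower_tail_ratio(sample_gaussian_copula(rho, n, rng), q)
print(f"Clayton : {clayton_tail:.2f}  (limit as q -> 0: {2 ** (-1 / theta):.2f})")
print(f"Gaussian: {gauss_tail:.2f}  (limit as q -> 0: 0)")
```

The Clayton estimate stays close to its lower tail-dependence coefficient \(2^{-1/\theta}\), whereas the Gaussian estimate is much smaller and keeps shrinking as q decreases; a meta-Gaussian model cannot reproduce this behavior whatever marginals it uses.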
Real-world data often have mixed (continuous, binary and ordinal valued) features. While copulas have been useful for modeling continuous multivariate distributions, their use with discrete data remains difficult: the copula is not margin-free and may not be identifiable. Nevertheless, copulas still provide usable dependence relationships for discrete data (Genest and Neslehova 2007). In particular, there are no existing techniques to fit vine copulas on mixed data, and the challenge lies mainly in parameter inference. Two previous approaches, both for purely discrete (not mixed) features, require expensive estimation of marginals: one by Panagiotelis et al. (2012) and another by Smith and Khaled (2012). The latter can be extended to mixed data, but its MCMC algorithm requires, per sampling step, a number of computations exponential in the data dimension, making it practically infeasible. We address this problem by designing a new, efficient inference algorithm for vine copulas that can fit mixed data.
Our algorithm facilitates the extension of multivariate models to arbitrary dependencies, through the use of vines, and to mixed data, through our inference algorithm. We demonstrate such an extension in the context of multi-view learning. Multiple views of data refer to different measurement modalities or information sources for the same learning task, for example image and text (Chen et al. 2012) or text in two languages (Guo and Xiao 2012). The views could also be distinct information from the same source, such as words and context (Pennington et al. 2014), or similar information from different measurement regimes (Wang et al. 2013) with potentially different noise characteristics. The multi-view learning paradigm involves simultaneously learning a model for each view, assuming that the views are conditionally independent given the class label (or the cluster assignment, in a clustering scenario). Multi-view approaches exploit the dependencies within and across views by co-learning, which improves learning in, for example, multi-class classification (Minh et al. 2013), image denoising (White et al. 2012) and co-clustering (Sun et al. 2015). Empirical (Wang et al. 2015; Minh et al. 2013) and theoretical (Chaudhuri et al. 2009) results show the effectiveness of such approaches, especially over methods that concatenate the features from multiple views.
Multi-view clustering based on Canonical Correlation Analysis (CCA) has been studied extensively (Chaudhuri et al. 2009; Kumar et al. 2011; Dhillon et al. 2011). The CCA-based dependency-seeking clustering model of Klami and Kaski (2008) groups co-occurring samples in the combined space of the views such that the views are independent given the clustering (see Sect. 3 for details). Through the use of Gaussian copulas, Rey and Roth (2012) eliminate two restrictive assumptions of Klami and Kaski’s model: a Gaussian-only dependence structure and identical marginal distributions in all dimensions. However, as noted above, Gaussian copulas cannot capture many kinds of dependencies prevalent in real-world datasets. For datasets with asymmetric and tail dependencies, Rey and Roth’s model, which assumes a meta-Gaussian distribution, suffers from model mismatch and infers an erroneously large number of clusters (see Sect. 6). Another limitation of their method, shared by many other clustering methods, is the inability to fit mixed data. We overcome both limitations by developing a dependency-seeking multi-view clustering model based on a Dirichlet Process mixture of vines that generalizes to arbitrary dependencies as well as to mixed marginals.
Our contributions are as follows.
1. We take the first step toward fitting vine copulas to mixed data (with arbitrary continuous, ordinal and binary marginals) by designing a new MCMC inference algorithm whose time complexity per sampling step is quadratic in the data dimension and linear in the number of bivariate copulas used. Our sampling scheme bypasses the costly estimation of marginals by using a rank-based likelihood (Hoff 2007) to obtain approximate parameter estimates [Sect. 4]. Empirically, it is faster than the algorithm of Panagiotelis et al. (2012) for discrete marginals, and it yields more accurate parameter estimates than the current best estimators in both the continuous and the discrete case.
2. We develop a Dirichlet Process mixture of vine copulas for dependency-seeking multi-view clustering that generalizes the model of Rey and Roth (2012) to arbitrary dependencies (beyond meta-Gaussian) as well as to mixed marginals. This flexibility brings two challenges: fitting mixed data and the non-conjugacy of the priors for the latent variables in our model. We design an inference algorithm that overcomes both hurdles by extending our inference algorithm for vines [Sect. 5].
3. Our empirical results on synthetic and real datasets demonstrate (i) the scalability and accuracy of our inference algorithm and (ii) the clustering performance of our model on single-view and multi-view data with asymmetric and tail dependencies and with mixed marginals [Sect. 6].
2 Background
Copula. An M-dimensional copula is a multivariate distribution function \(C: [0,1]^M \to [0,1]\) with uniform margins. A theorem by Sklar (1959) shows that copulas uniquely characterize continuous joint distributions: for every joint distribution \(F(X_1, \ldots , X_M)\) with continuous marginals \(F_j(X_j)\), \(1 \le j \le M\), there exists a unique copula C such that \(F(X_1, \ldots , X_M) = C(F_1(X_1), \ldots , F_M(X_M))\), and conversely, any copula C combined with marginals \(F_1, \ldots , F_M\) in this way defines a joint distribution. For strictly increasing and continuous marginals \(F_j\) with densities \(p_j\), the joint density p can be expressed in terms of the copula density c as \(p(X_1,\ldots ,X_M) = c(F_1(X_1), \ldots ,F_M(X_M)) \cdot p_1(X_1) \cdots p_M(X_M)\). In the discrete case, the copula is not unique in general; it is uniquely determined only on \(\mathrm{Ran}(F_1) \times \cdots \times \mathrm{Ran}(F_M)\), where \(\mathrm{Ran}(F_j)\) is the range of marginal \(F_j\), but a copula-based decomposition remains well defined. See Genest and Neslehova (2007) for a discussion of the extent to which dependence properties of copulas remain valid for discrete data.
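The density form of Sklar's theorem can be checked numerically. In this sketch (illustrative parameter values, not taken from the paper) we use a bivariate normal distribution, whose copula is the Gaussian copula, and confirm that \(c(F_1(x_1), F_2(x_2)) \cdot p_1(x_1)\, p_2(x_2)\) reproduces the closed-form joint density. Only the Python standard library is used.

```python
import math
from statistics import NormalDist


def gaussian_copula_density(u, v, rho):
    """Bivariate Gaussian copula density c(u, v; rho)."""
    a, b = NormalDist().inv_cdf(u), NormalDist().inv_cdf(v)
    one_minus = 1.0 - rho * rho
    exponent = -(rho * rho * (a * a + b * b) - 2.0 * rho * a * b) / (2.0 * one_minus)
    return math.exp(exponent) / math.sqrt(one_minus)


def bivariate_normal_pdf(x1, x2, mu1, mu2, s1, s2, rho):
    """Closed-form bivariate normal density, used only as the reference value."""
    z1, z2 = (x1 - mu1) / s1, (x2 - mu2) / s2
    one_minus = 1.0 - rho * rho
    quad = (z1 * z1 - 2.0 * rho * z1 * z2 + z2 * z2) / one_minus
    return math.exp(-0.5 * quad) / (2.0 * math.pi * s1 * s2 * math.sqrt(one_minus))


def sklar_density(x1, x2, mu1, mu2, s1, s2, rho):
    """Sklar's density form: c(F1(x1), F2(x2)) * p1(x1) * p2(x2)."""
    m1, m2 = NormalDist(mu1, s1), NormalDist(mu2, s2)
    copula = gaussian_copula_density(m1.cdf(x1), m2.cdf(x2), rho)
    return copula * m1.pdf(x1) * m2.pdf(x2)


params = dict(mu1=1.0, mu2=-2.0, s1=0.5, s2=3.0, rho=0.6)
for x1, x2 in [(1.0, -2.0), (1.3, 0.5), (0.2, -6.0)]:
    direct = bivariate_normal_pdf(x1, x2, **params)
    via_sklar = sklar_density(x1, x2, **params)
    print(f"x=({x1}, {x2}): direct {direct:.6g}, via Sklar {via_sklar:.6g}")
```

Replacing the two normal marginals in `sklar_density` by any other continuous marginals, while keeping the Gaussian copula, yields a meta-Gaussian density through exactly the same formula.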
Analytic expressions for h-functions have been derived for commonly used copulas; see Aas et al. (2009) for more details and an introduction to vines. An advantage of such a model is that the bivariate copulas need not all belong to the same family, which enables us to model different kinds of bivariate dependencies. In this paper we describe our models using D-vines, but the techniques can easily be extended to other regular vines for a given configuration of pair copulas. The choice of D-vines is motivated by the ready availability of baselines for continuous data (Brechmann and Schepsmeier 2013) and discrete data (Panagiotelis et al. 2012); no baseline is available for mixed data.
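As an example of such analytic expressions, the sketch below takes the h-function of a bivariate copula to be \(h(u \mid v) = \partial C(u, v) / \partial v\), the conditional CDF of U given V = v (the convention of Aas et al. 2009), and writes it out for the Clayton family, chosen here only as one lower-tail-dependent example. A central finite difference confirms that h is the partial derivative, and the closed-form inverse is what one would use to simulate from the conditional. Function names are our own.

```python
def clayton_cdf(u, v, theta):
    """Bivariate Clayton copula C(u, v; theta), theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def clayton_h(u, v, theta):
    """h(u | v) = dC(u, v)/dv, the conditional CDF of U given V = v."""
    return v ** (-theta - 1.0) * (u ** -theta + v ** -theta - 1.0) ** (-1.0 - 1.0 / theta)


def clayton_h_inverse(w, v, theta):
    """Solve h(u | v) = w for u; pushing a uniform w through it simulates U given V = v."""
    return (v ** -theta * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)


theta, v, eps = 2.0, 0.4, 1e-6
for u in (0.1, 0.5, 0.9):
    # central finite difference of C in its second argument
    numeric = (clayton_cdf(u, v + eps, theta) - clayton_cdf(u, v - eps, theta)) / (2.0 * eps)
    print(f"u={u}: h={clayton_h(u, v, theta):.8f}  dC/dv~{numeric:.8f}")
```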
Nonparametric clustering. Bayesian nonparametric models enable clustering with mixture models without fixing the number of mixture components a priori, allowing the model to adapt to the observed data. The Dirichlet Process (DP) serves as a prior for a mixture distribution over countably infinitely many components (Teh 2010). The DP is briefly described below through a generative process that produces countably infinite weights \(\{\pi_k\}_{k=1}^{\infty}\) summing to one (see Teh (2010) for alternative definitions). This generative process is also called the stick-breaking process (Aldous 1985), and the distribution of the weights \(\{\pi_k\}_{k=1}^{\infty}\) is often denoted GEM, after Griffiths, Engen and McCloskey.
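In the stick-breaking construction, \(\beta_k \sim \mathrm{Beta}(1, \alpha)\) and \(\pi_k = \beta_k \prod_{l<k} (1 - \beta_l)\): each weight is a Beta-distributed fraction of the stick that remains. An infinite sequence cannot be generated in code, so this sketch truncates after a fixed number of sticks (50, an assumption) and reports the mass left unassigned; \(\alpha = 2\) is likewise an illustrative value.

```python
import random


def stick_breaking(alpha, num_sticks, rng):
    """Truncated GEM(alpha) draw.

    beta_k ~ Beta(1, alpha) and pi_k = beta_k * prod_{l<k} (1 - beta_l).
    Returns the first num_sticks weights and the stick length still unassigned.
    """
    weights = []
    remaining = 1.0  # length of the stick not yet broken off
    for _ in range(num_sticks):
        beta = rng.betavariate(1.0, alpha)
        weights.append(beta * remaining)
        remaining *= 1.0 - beta
    return weights, remaining


weights, leftover = stick_breaking(alpha=2.0, num_sticks=50, rng=random.Random(1))
print("first five weights:", [round(w, 3) for w in weights[:5]])
print("mass not yet assigned after 50 sticks:", leftover)
```

Smaller values of \(\alpha\) break off larger pieces early, concentrating the mass on fewer components, which is why \(\alpha\) controls the expected number of clusters.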
3 Related work
Parameter estimation for discrete vines. For vines with discrete margins, Smith and Khaled (2012) propose an MCMC inference algorithm that uses a data augmentation approach to compute the probability mass function (PMF). It is extensible to mixed data but requires, for an M-dimensional vine, \(\mathcal {O}(2^M)\) computations per sampling step. Panagiotelis et al. (2012) derive a decomposition of the PMF that requires only \(2M(M-1)\) evaluations of bivariate copula functions in the vine, but their method cannot be used for vines with mixed margins. Further, their recommended estimation method is the two-step inference functions for margins (IFM) approach (Joe 2014; Panagiotelis et al. 2012), in which the marginals are estimated first and ML estimates of the copula parameters are then obtained by nonlinear maximization methods such as gradient ascent, which are prone to getting stuck in local maxima.
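The exponential cost has a simple source: with discrete margins, the PMF of a copula model is a signed sum of copula CDF values over all \(2^M\) corners of the cell \((F_j(x_j - 1), F_j(x_j)]\). The sketch below evaluates that sum directly and counts copula evaluations. Bernoulli margins and an M-dimensional Clayton copula are illustrative assumptions, and this is not the data augmentation algorithm of Smith and Khaled (2012) itself; it only shows the cost that the vine-specific decomposition of Panagiotelis et al. (2012) avoids.

```python
import itertools
from functools import partial


def clayton_cdf(us, theta):
    """M-dimensional Clayton copula CDF; it is zero whenever some argument is zero."""
    if any(u <= 0.0 for u in us):
        return 0.0
    return (sum(u ** -theta for u in us) - len(us) + 1.0) ** (-1.0 / theta)


def independence_cdf(us):
    """Independence copula: the product of its arguments."""
    prod = 1.0
    for u in us:
        prod *= u
    return prod


def bernoulli_cdf(p):
    """CDF of a Bernoulli(p) variable, evaluated at an integer k."""
    return lambda k: 0.0 if k < 0 else (1.0 - p if k < 1 else 1.0)


def discrete_copula_pmf(x, marginal_cdfs, copula_cdf):
    """P(X = x) as a signed sum of copula CDF values over all 2^M cell corners."""
    total, evaluations = 0.0, 0
    for corner in itertools.product((0, 1), repeat=len(x)):
        # corner[j] == 0 uses F_j(x_j); corner[j] == 1 uses F_j(x_j - 1)
        us = [F(xj - cj) for F, xj, cj in zip(marginal_cdfs, x, corner)]
        total += (-1) ** sum(corner) * copula_cdf(us)
        evaluations += 1
    return total, evaluations


probs = [0.3, 0.5, 0.6, 0.2]
margins = [bernoulli_cdf(p) for p in probs]
clayton = partial(clayton_cdf, theta=1.5)

pmf, cost = discrete_copula_pmf((1, 0, 1, 0), margins, clayton)
print(f"P(X = (1,0,1,0)) = {pmf:.4f} using {cost} copula evaluations")
grand_total = sum(discrete_copula_pmf(x, margins, clayton)[0]
                  for x in itertools.product((0, 1), repeat=len(probs)))
print(f"sum over all outcomes = {grand_total:.6f}")
```

Each additional dimension doubles the number of copula evaluations per PMF value.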
Multi-view dependency-seeking clustering. Finding linear inter-view dependencies through CCA has been extended in several ways in recent years to capture nonlinear dependencies (e.g., kernelized CCA (Shawe-Taylor and Cristianini 2004)) and non-normal distributions (e.g., exponential CCA (Klami et al. 2010)).
Model-based clustering techniques such as that of Yerebakan et al. (2014) attempt to capture more complex continuous densities by modeling each mixture component with a multimodal density based on an infinite Gaussian mixture, but they cannot be used with multi-view or mixed data.
Clustering mixed data. Recent model-based clustering methods for mixed data include those of McParland and Gormley (2016) and Browne and McNicholas (2012), which use latent variable approaches similar to ours but assume Gaussian distributions, and that of McParland et al. (2014), which uses a mixture of factor analyzers model.
Recent copula-based models include a mixture of D-vines by Kim et al. (2013) that can only fit continuous data. A more general mixture of copulas by Kosmidis and Karlis (2015) mentions possible extensions to discrete and mixed data, but for several copula families their algorithm scales exponentially with the dimension, rendering it impractical. For vines, which capture more complex dependencies and are our main focus, they do not discuss mixed-data extensions, and for discrete vines they suggest the same PMF decomposition of Panagiotelis et al. (2012) that we compare against in our experiments and significantly outperform.
Correlation clustering also attempts to find clusters based on dependencies and is typically PCA-based. For example, INCONCO (Plant and Böhm 2011) can be used with mixed data but models dependencies by distinct Gaussian distributions for each category of each discrete feature. SCENIC (Plant 2012), which has been found empirically to outperform INCONCO, is less restrictive in the dependencies it models, but it is also limited by its assumption of a Gaussian distribution to find a low-dimensional embedding of the data. These methods are not suited to multi-view clustering; we use SCENIC and ClustMD (McParland and Gormley 2016) as baselines in single-view settings only.
4 D-vines for mixed data
Our approach involves a generative formulation of D-vines in which we explicitly introduce the marginals of each data point as latent variables. The model and inference algorithm can readily be extended to other regular vines, but for ease of exposition we restrict ourselves to D-vines.
\({\varSigma }=\{\sigma _{s,t} : 1 \le s < t \le M \}\) is the collection of parameters of all the constituent bivariate copulas in the D-vine definition. Again for simplicity, we place a uniform prior over the support of each parameter \(\sigma_{s,t}\). Alternative priors that exploit conjugacy are preferable where available; for instance, for a bivariate Gaussian copula we place an inverse-Wishart prior, exploiting conjugacy. (See Sects. 4.5 and 4.6 of Murphy (2012) for a discussion of the Wishart distribution in Bayesian inference; the use of an inverse-Wishart prior for Bayesian inference with the Gaussian copula is discussed in detail in Hoff (2007).)
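When a pair-copula parameter has a uniform prior and no conjugate update, a Metropolis-Hastings step is a natural way to sample it. The following is a minimal sketch under illustrative assumptions that are not the paper's settings: a Clayton pair copula, a uniform prior on (0, 20) (Clayton's natural support is unbounded, so the upper limit is our choice), a Gaussian random-walk proposal and simulated data with \(\theta = 2\).

```python
import math
import random

THETA_MAX = 20.0  # assumed upper end of the uniform prior's support (0, THETA_MAX)


def sample_clayton(theta, n, rng):
    """Simulate n (u, v) pairs from a Clayton copula by inverting h(v | u)."""
    pairs = []
    for _ in range(n):
        u, w = 1.0 - rng.random(), 1.0 - rng.random()  # both in (0, 1]
        v = (u ** -theta * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
        pairs.append((u, v))
    return pairs


def clayton_log_density(u, v, theta):
    """log c(u, v; theta) for the bivariate Clayton copula."""
    s = u ** -theta + v ** -theta - 1.0
    return (math.log1p(theta) - (1.0 + theta) * (math.log(u) + math.log(v))
            - (2.0 + 1.0 / theta) * math.log(s))


def log_posterior(theta, data):
    """Uniform prior: log posterior = log likelihood on the support, -inf outside it."""
    if not 0.0 < theta < THETA_MAX:
        return -math.inf
    return sum(clayton_log_density(u, v, theta) for u, v in data)


def mh_theta(data, theta0, steps, step_size, rng):
    """Random-walk Metropolis-Hastings for theta with a symmetric Gaussian proposal."""
    theta, current = theta0, log_posterior(theta0, data)
    draws = []
    for _ in range(steps):
        proposal = theta + rng.gauss(0.0, step_size)
        candidate = log_posterior(proposal, data)
        # accept with probability min(1, exp(candidate - current))
        if math.log(1.0 - rng.random()) < candidate - current:
            theta, current = proposal, candidate
        draws.append(theta)
    return draws


data = sample_clayton(theta=2.0, n=500, rng=random.Random(3))
draws = mh_theta(data, theta0=1.0, steps=2000, step_size=0.2, rng=random.Random(4))
kept = draws[500:]  # discard burn-in
posterior_mean = sum(kept) / len(kept)
print(f"posterior mean of theta: {posterior_mean:.2f} (data simulated with theta = 2)")
```

Because the proposal is symmetric, the acceptance ratio reduces to the posterior ratio, and proposals outside the prior's support are rejected automatically.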
Inference. Exact inference for this problem is intractable, so we propose an approximate inference algorithm for vines on mixed data based on Gibbs sampling with the extended rank likelihood (Hoff 2007), which bypasses the estimation of margins and can therefore accommodate both continuous and discrete ordinal margins. Because the priors are non-conjugate, our Gibbs sampling steps are interspersed with Metropolis-Hastings steps, similar to the sampling approaches of Neal (2000) and Meeds et al. (2007).
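The rank constraint can be illustrated on a single column: each latent \(U_{i,j}\) may move only within the interval left open by the latents of the observations ranked below and above \(x_{i,j}\). The data therefore enter only through their ranks, no marginal CDF is ever estimated, and ties in discrete ordinal data are handled naturally. The sketch below makes the simplifying assumption, also stated in the code, that each draw is uniform on that interval; helper names are hypothetical.

```python
import random


def rank_bounds(column_x, column_u, i):
    """Interval allowed for u_i by the ranks: x_k < x_i => u_k < u_i and x_k > x_i => u_k > u_i."""
    lower = max((u for x, u in zip(column_x, column_u) if x < column_x[i]), default=0.0)
    upper = min((u for x, u in zip(column_x, column_u) if x > column_x[i]), default=1.0)
    return lower, upper


def initial_latents(column_x):
    """Rank-consistent start: distinct values mapped to evenly spaced points in (0, 1)."""
    distinct = sorted(set(column_x))
    position = {value: idx for idx, value in enumerate(distinct)}
    return [(position[x] + 1.0) / (len(distinct) + 1.0) for x in column_x]


def gibbs_sweep_column(column_x, column_u, rng):
    """Resample every latent u_i of one column inside its rank interval (in place).

    Simplification: u_i is drawn uniformly on the interval, which is its full
    conditional only under an independence copula; in a vine model the draw
    would instead target the vine's conditional density restricted to the interval.
    """
    for i in range(len(column_x)):
        lower, upper = rank_bounds(column_x, column_u, i)
        column_u[i] = rng.uniform(lower, upper)
    return column_u


x = [2, 0, 1, 1, 3, 0, 2]  # an ordinal column with ties
u = initial_latents(x)
rng = random.Random(7)
for _ in range(20):
    gibbs_sweep_column(x, u, rng)
print([round(val, 3) for val in u])
```

This naive version scans the whole column for every update; keeping the column sorted would make each bound lookup cheaper.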
Computational complexity. Drawing a single sample from a rank-constrained D-vine with uniform marginals using the Metropolis-Hastings algorithm has time complexity \(O(M^2)\) with the chosen proposal. Hence, the time complexity of a single Gibbs sweep in our algorithm is \(O(M^2 N)\), owing to the quadratic cost of sampling the \(U_{i,\cdot}\) variables for each of the N samples and of sampling the parameters and families of the \(\binom{M}{2}\) pair copulas.
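The quadratic cost per row comes from the structure of the D-vine density: it is a product over all \(\binom{M}{2}\) pair copulas, and the arguments of each tree are conditional CDFs obtained through h-functions from the tree before it. The sketch below evaluates such a log-density, assuming Gaussian pair copulas throughout (any family with a density and an h-function could be substituted) and the edge-indexing convention given in the docstring. It shows only the density evaluation that a Metropolis-Hastings step on a row \(U_{i,\cdot}\) needs, not the sampler itself.

```python
import math
from statistics import NormalDist

STD = NormalDist()


def gauss_log_c(u, v, rho):
    """Log-density of the bivariate Gaussian copula."""
    a, b = STD.inv_cdf(u), STD.inv_cdf(v)
    one_minus = 1.0 - rho * rho
    return (-0.5 * math.log(one_minus)
            - (rho * rho * (a * a + b * b) - 2.0 * rho * a * b) / (2.0 * one_minus))


def gauss_h(u, v, rho):
    """h(u | v) for the Gaussian copula: conditional CDF of U given V = v."""
    a, b = STD.inv_cdf(u), STD.inv_cdf(v)
    return STD.cdf((a - rho * b) / math.sqrt(1.0 - rho * rho))


def dvine_log_density(u, rhos):
    """Log-density of a D-vine on uniforms u[0..M-1] with Gaussian pair copulas.

    rhos[t][j] is the parameter of edge (j, j+t+1 | j+1, ..., j+t) in tree t+1
    (0-based variable indices). In tree t+1, fwd[j] = F(u_j | u_{j+1..j+t}) and
    bwd[j] = F(u_{j+t+1} | u_{j+1..j+t}). Returns (log_density, pairs_evaluated).
    """
    m = len(u)
    fwd, bwd = list(u[:-1]), list(u[1:])  # tree 1: the raw uniforms
    log_c, pairs_evaluated = 0.0, 0
    for t in range(m - 1):
        for j in range(m - 1 - t):
            log_c += gauss_log_c(fwd[j], bwd[j], rhos[t][j])
            pairs_evaluated += 1
        if t < m - 2:
            # conditional CDFs that become the arguments of the next tree
            new_fwd = [gauss_h(fwd[j], bwd[j], rhos[t][j]) for j in range(m - 2 - t)]
            new_bwd = [gauss_h(bwd[j + 1], fwd[j + 1], rhos[t][j + 1]) for j in range(m - 2 - t)]
            fwd, bwd = new_fwd, new_bwd
    return log_c, pairs_evaluated


m = 6
constant_rhos = [[0.3] * (m - 1 - t) for t in range(m - 1)]
value, count = dvine_log_density([0.15, 0.4, 0.55, 0.7, 0.2, 0.9], constant_rhos)
print(f"log c = {value:.4f} from {count} pair copulas (M(M-1)/2 = {m * (m - 1) // 2})")
```

With Gaussian pair copulas the result coincides with a multivariate Gaussian copula whose correlations are implied by the partial correlations in `rhos`, which makes a convenient correctness check.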
A popular technique to reduce the complexity of vine inference is truncation (Joe 2014), in which all copulas beyond a certain tree level in the vine structure are assumed to be independence copulas. This can potentially reduce the cost per sampling step per data point to linear in M. Our algorithm can be extended to truncated vines, but we do not investigate this further in this paper.
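Counting the pair copulas that are not fixed to independence makes the saving explicit. Tree \(\ell\) of a D-vine on M variables has \(M-\ell\) edges; with T denoting the truncation level (our notation):

\[
\underbrace{\sum_{\ell=1}^{M-1} (M-\ell) = \frac{M(M-1)}{2} = O(M^2)}_{\text{full D-vine}}
\qquad
\underbrace{\sum_{\ell=1}^{T} (M-\ell) = TM - \frac{T(T+1)}{2} = O(TM)}_{\text{truncated at level } T}
\]

For a fixed T, the number of non-trivial pair copulas, and with it the per-data-point cost of evaluating the vine density, grows linearly rather than quadratically in M; setting \(T = M-1\) recovers the full count.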
5 Vines for multi-view dependency-seeking clustering of mixed data
We now present a model for multi-view dependency-seeking clustering using D-vines. Consider data \(\{X_{i,v,j}\}\): N data points, \(i \in [N]\), collected from V views, \(v \in [V]\), where \(j \in [M_v]\) indexes the dimensions of view v. Our goal is to cluster the data jointly using all the views while modeling the intra-view dependencies in each view. (For readability, we deviate slightly from the superscript notation used in Sect. 3 to denote a view.)
Notation: a set with a subscript starting with a hyphen (−) denotes the set of all elements except the one indexed after the hyphen. Let \(n_k = |\{ \mathbf {X}_i : Z_i=k\}|\) denote the number of data points assigned to cluster k.
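The −i notation is what a collapsed update of a cluster assignment \(Z_i\) relies on. Because the vine likelihoods are not conjugate to their priors, an update in the style of Neal's (2000) Algorithm 8, with m auxiliary components drawn from the prior, is one natural option. The sketch below shows only how such assignment probabilities could be normalized. It assumes that the per-view log-likelihoods (for example, D-vine log-densities of each view) are already computed and that empty clusters have been dropped; the function names are hypothetical. Summing over views reflects the modeling assumption that the views are generated independently given the cluster.

```python
import math


def multiview_loglik(per_view_logliks):
    """Views are conditionally independent given the cluster, so log-likelihoods add."""
    return sum(per_view_logliks)


def assignment_probs(counts_minus_i, loglik_existing, loglik_auxiliary, alpha):
    """Normalized P(Z_i = k | Z_{-i}, ...) in the style of Neal's (2000) Algorithm 8.

    counts_minus_i[k]  : n_{-i,k}, size of cluster k without point i (all > 0)
    loglik_existing[k] : log p(X_i | parameters of existing cluster k)
    loglik_auxiliary   : log p(X_i | auxiliary component), one per fresh prior draw
    """
    m = len(loglik_auxiliary)
    logits = [math.log(n_k) + ll for n_k, ll in zip(counts_minus_i, loglik_existing)]
    logits += [math.log(alpha / m) + ll for ll in loglik_auxiliary]
    top = max(logits)  # subtract the maximum before exponentiating, for stability
    weights = [math.exp(logit - top) for logit in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Made-up per-view log-likelihoods for one data point with V = 2 views
existing = [multiview_loglik([-3.2, -1.1]), multiview_loglik([-2.5, -2.9])]
auxiliary = [multiview_loglik([-4.0, -3.5]), multiview_loglik([-3.8, -4.1])]
probs = assignment_probs([5, 3], existing, auxiliary, alpha=1.0)
print([round(p, 3) for p in probs])
```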
For sampling \(\alpha \), we follow the standard technique of Escobar and West (1995). Sampling \(U, {\varSigma }, {\varTheta }\) follows from Sect. 4 because of our modeling assumption that the data in each view and each cluster are generated independently from a D-vine. Hence, for each cluster k and each view v, the random variables corresponding to the marginal distributions, \(\mathbf {U}^{k,v}=\{U_{i,v,\cdot} :i\in [N],Z_i=k\}\), the pair-copula parameters \({\varSigma }_{k,v}\) and the families \({\varTheta }_{k,v}\) are sampled independently, following the same steps as in the Gibbs sampling iteration of Algorithm 1.
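The Escobar and West (1995) update introduces an auxiliary variable \(\eta \sim \mathrm{Beta}(\alpha + 1, N)\) and then draws \(\alpha\) from a two-component mixture of Gamma distributions. The sketch below assumes a \(\mathrm{Gamma}(a, b)\) prior on \(\alpha\) (shape a, rate b); the hyperparameters, the chain length and the helper `run_chain` are illustrative assumptions.

```python
import math
import random


def sample_alpha(alpha, num_clusters, n, a, b, rng):
    """One Escobar and West (1995) update of the DP concentration alpha.

    Assumes a Gamma(a, b) prior on alpha (shape a, rate b); num_clusters is the
    current number of occupied clusters K (at least 1) and n the number of points.
    """
    eta = rng.betavariate(alpha + 1.0, n)          # auxiliary variable
    rate = b - math.log(eta)
    odds = (a + num_clusters - 1.0) / (n * rate)   # pi_eta / (1 - pi_eta)
    pi_eta = odds / (1.0 + odds)
    shape = a + num_clusters if rng.random() < pi_eta else a + num_clusters - 1.0
    return rng.gammavariate(shape, 1.0 / rate)     # gammavariate expects a scale, not a rate


def run_chain(num_clusters, n, steps, rng, a=1.0, b=1.0):
    """Repeatedly update alpha with K and n held fixed (hypothetical helper)."""
    alpha, draws = 1.0, []
    for _ in range(steps):
        alpha = sample_alpha(alpha, num_clusters, n, a, b, rng)
        draws.append(alpha)
    return draws


rng = random.Random(11)
few = run_chain(num_clusters=2, n=500, steps=2000, rng=rng)
many = run_chain(num_clusters=20, n=500, steps=2000, rng=rng)
print(f"mean alpha with K=2: {sum(few) / len(few):.2f}, with K=20: {sum(many) / len(many):.2f}")
```

With more occupied clusters the draws of \(\alpha\) shift upward, as the comparison of K = 2 and K = 20 illustrates.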
6 Experiments
Average RMSE between the true and the estimated D-vine parameters
Data type  Algorithm  RMSE

Continuous  ExtDVine  0.0389
Continuous  MLE (Aas et al.)  0.0395
Discrete  ExtDVine  0.06
Discrete  Panagiotelis et al.  0.106
Mixed  ExtDVine  0.0429
Goodness of fit: RMSE between correlation values on original data and data simulated using parameter estimates from MLE and our method
Method  Kendall’s tau  Pearson  Spearman’s rho

MLE  0.031  0.071  0.046 
ExtDVine  0.016  0.066  0.023 
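The goodness-of-fit measure above compares dependence summaries rather than parameters. A sketch under our own assumptions: the RMSE is taken over all variable pairs of one original and one simulated dataset (how the paper aggregates across pairs and runs is not specified here), and Kendall's tau is computed in its simple tau-a form without a tie correction. The small columns are illustrative, not data from the paper.

```python
import math


def average_ranks(xs):
    """Ranks 1..n; tied values share the average of the ranks they span."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and xs[order[end + 1]] == xs[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2.0 + 1.0
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def kendall_tau_a(xs, ys):
    """Kendall's tau-a over all pairs (O(n^2), no tie correction)."""
    n, score = len(xs), 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = (xs[i] > xs[j]) - (xs[i] < xs[j])
            dy = (ys[i] > ys[j]) - (ys[i] < ys[j])
            score += dx * dy
    return score / (n * (n - 1) / 2.0)


def correlation_rmse(original, simulated, corr):
    """RMSE between the pairwise correlations of two datasets given as lists of columns."""
    m = len(original)
    diffs = [corr(original[s], original[t]) - corr(simulated[s], simulated[t])
             for s in range(m) for t in range(s + 1, m)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


original = [[1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 1.0, 4.0, 3.0, 6.0], [5.0, 3.0, 4.0, 1.0, 2.0]]
simulated = [[1.1, 2.2, 2.9, 4.5, 5.2], [1.0, 2.5, 3.5, 3.0, 6.1], [4.0, 4.5, 3.0, 1.5, 2.0]]
for name, corr in [("Kendall", kendall_tau_a), ("Pearson", pearson), ("Spearman", spearman)]:
    print(f"{name:8s} RMSE: {correlation_rmse(original, simulated, corr):.3f}")
```

Kendall's tau and Spearman's rho depend only on ranks, so they are unaffected by the marginals, whereas Pearson correlation also reflects the marginal shapes.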
Table 1 shows the average RMSE between the true parameters and the parameters estimated by ExtDVine and the baselines. Our estimates are closer to the true parameters than those obtained by MLE for continuous margins and by the method of Panagiotelis et al. (2012) for discrete margins. We repeat this experiment with mixed marginals (Gaussian, gamma, negative binomial and Poisson) and obtain a low RMSE of the estimated parameters with respect to the original parameters (there is no available baseline for mixed data).
Time complexity. We empirically evaluate the time complexity and accuracy for discrete marginals by plotting the inference time for varying dimensions M, for a fixed data size of \(N = 500\) points with parameters generated from the priors. Since there is no baseline for mixed data, we restrict this evaluation to discrete data and use the baseline of Panagiotelis et al. (2012), the most efficient known method for discrete vines. We use 15 sampling sweeps, whereas the method of Panagiotelis et al. (2012) takes significantly more time to run to convergence (between 10 and 20 iterations) and obtains less accurate parameter estimates (results over 25 runs, with error bars, are shown in Fig. 3). Although our inference method has complexity quadratic in M (and linear in the number of pair copulas), its runtime in Fig. 3 looks almost linear in M because the runtime of the baseline is so much higher. In fact, the baseline did not run to convergence within a day even for 20-dimensional data. In Fig. 4, we show a stand-alone plot of the runtime and accuracy of our technique (without the discrete baseline) for up to \(M = 50\) dimensions. We observe the expected \(O(M^2 N)\) complexity, i.e., quadratic in M and linear in the number of pair copulas, for a fixed data size \(N = 500\).
6.1 Dependency-seeking clustering
Evaluation metrics. We evaluate the ability of GCMVC and our method ExtVineMVC to identify the correct number of clusters. We also evaluate the clustering performance of ExtVineMVC and ExtGCMVC when the number of clusters is given as input. Clustering performance is measured by the Adjusted Rand Index (ARI) (Hubert and Arabie 1985), Variation of Information (VI) (Meilă 2007), Normalized Mutual Information (NMI) (Vinh et al. 2010) and the classification accuracy obtained by fixing the labels of the inferred clusters. Lower VI is better, while higher values of the other metrics indicate better performance. All results shown are averages over 25 simulations.
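All three partition-comparison scores can be computed from the contingency table of true versus inferred labels. A self-contained sketch: for NMI we assume the geometric-mean normalization \(I(A;B)/\sqrt{H(A)H(B)}\), which is only one of the variants discussed by Vinh et al. (2010), so the variant used in the experiments may differ. Accuracy after label matching is not shown, and the example labels are made up.

```python
import math
from collections import Counter


def _comb2(k):
    """Number of unordered pairs among k items."""
    return k * (k - 1) / 2.0


def _tables(labels_a, labels_b):
    """Joint and marginal label counts of the contingency table."""
    return Counter(zip(labels_a, labels_b)), Counter(labels_a), Counter(labels_b)


def adjusted_rand_index(labels_a, labels_b):
    """ARI of Hubert and Arabie (1985); undefined when both partitions are trivial."""
    n = len(labels_a)
    joint, rows, cols = _tables(labels_a, labels_b)
    index = sum(_comb2(c) for c in joint.values())
    sum_rows = sum(_comb2(c) for c in rows.values())
    sum_cols = sum(_comb2(c) for c in cols.values())
    expected = sum_rows * sum_cols / _comb2(n)
    max_index = 0.5 * (sum_rows + sum_cols)
    return (index - expected) / (max_index - expected)


def _entropies(labels_a, labels_b):
    """H(A), H(B) and the mutual information I(A; B), in nats."""
    n = len(labels_a)
    joint, rows, cols = _tables(labels_a, labels_b)
    h_a = -sum(c / n * math.log(c / n) for c in rows.values())
    h_b = -sum(c / n * math.log(c / n) for c in cols.values())
    mi = sum(c / n * math.log(c * n / (rows[a] * cols[b])) for (a, b), c in joint.items())
    return h_a, h_b, mi


def variation_of_information(labels_a, labels_b):
    """VI = H(A) + H(B) - 2 I(A; B) (Meila 2007); lower is better."""
    h_a, h_b, mi = _entropies(labels_a, labels_b)
    return h_a + h_b - 2.0 * mi


def normalized_mutual_information(labels_a, labels_b):
    """NMI with sqrt(H(A) H(B)) normalization; undefined if either partition has one cluster."""
    h_a, h_b, mi = _entropies(labels_a, labels_b)
    return mi / math.sqrt(h_a * h_b)


truth = [0, 0, 0, 1, 1, 1, 2, 2]
found = [1, 1, 0, 0, 0, 0, 2, 2]
print(f"ARI {adjusted_rand_index(truth, found):.3f}, "
      f"NMI {normalized_mutual_information(truth, found):.3f}, "
      f"VI {variation_of_information(truth, found):.3f}")
```

All three scores are invariant to relabeling the clusters, so no label matching is needed for them.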
Multi-view clustering on synthetic datasets with continuous marginals
Measure  ARI  NMI  VI  Accuracy 

ExtVineMVC  0.346  0.308  0.936  0.795 
GCMVC  0.110  0.117  1.128  0.661 
Multi-view clustering on synthetic datasets with mixed marginals
Measure  ARI  NMI  VI  Accuracy 

ExtVineMVC  0.252  0.207  1.095  0.729 
ExtGCMVC  0.167  0.138  1.185  0.692 
Single-view setting. While our focus application is multi-view dependency-seeking clustering, we also run our algorithm in the special case of a single view to demonstrate it on datasets with more complex dependencies, such as combinations of asymmetric and tail dependencies. We generate data with pairwise tail dependencies and asymmetric dependencies as shown in Fig. 7. In dataset 1 we use gamma, normal and exponential marginals, and in dataset 2 we use gamma, negative binomial and Poisson marginals. Cluster 1 has asymmetric dependencies and cluster 2 has tail dependencies. We also use additional baselines: Gaussian Mixture Models (GMM) for continuous features, and two state-of-the-art methods for mixed data, SCENIC (Plant 2012) and ClustMD (McParland and Gormley 2016).
Figure 8 shows the proportion of runs, out of 25, in which ExtVineMVC and GCMVC obtain a given number of clusters in the single-view setting, showing how well each model fits data generated from a known number of clusters. In the continuous case, our method infers the right number of clusters 80% of the time, and in the remaining cases the deviation is small (Fig. 8a). GCMVC has to compensate for the model mismatch by increasing the number of clusters (Fig. 8b). In the case of mixed data, ExtGCMVC erroneously infers more than 5 clusters in 90% of the cases (Fig. 8d). ExtVineMVC does much better, inferring the right number of clusters in 80% of the cases, and in the remaining cases the deviation is at most 1 (Fig. 8c). Table 5 compares ExtVineMVC with GCMVC and GMM for dependency-seeking clustering on continuous data, and Table 6 compares ExtVineMVC with the baselines ExtGCMVC, SCENIC and ClustMD. ExtVineMVC outperforms the baselines in ARI, NMI and accuracy on both the continuous and the mixed datasets; on mixed data, ClustMD and ExtGCMVC obtain a lower (better) VI.
Clustering accuracy results: single-view, continuous marginals
Measure  ARI  NMI  VI  Accuracy 

ExtVineMVC  0.220  0.195  1.084  0.738 
GCMVC  0.065  0.056  1.295  0.634 
GMM  0.017  0.021  1.354  0.572 
Clustering accuracy results: single-view, mixed marginals
Measure  ARI  NMI  VI  Accuracy 

ExtVineMVC  0.124  0.101  1.237  0.664 
ExtGCMVC  0.075  0.074  1.215  0.635 
SCENIC  0.006  0.014  1.366  0.508 
ClustMD  0.058  0.083  1.153  0.602 
Results on the mortality dataset (mortality prediction)
Measure  ARI  NMI  VI  Accuracy 

ExtVineMVC  0.20  0.20  0.90  0.734 
ExtGCMVC  0.02  0.009  1.27  0.58 
6.2 Summary of results
1. ExtDVine obtains more accurate parameter estimates than the MLE method of Aas et al. (2009) for continuous margins and the method of Panagiotelis et al. (2012) for discrete margins. It is faster than the latter and is the first method to fit vines on mixed margins.
2. ExtVineMVC, our DP mixture model for dependency-seeking clustering in multi-view and single-view settings, is evaluated on simulated continuous and mixed data containing asymmetric and tail dependencies. We show superior performance over baselines in (i) clustering accuracy in a finite mixture setting and (ii) detecting the correct number of clusters in a nonparametric setting.
3. ExtVineMVC significantly outperforms GCMVC and ExtGCMVC (which follow the model of Rey and Roth (2012) and are limited to modeling meta-Gaussian dependencies) in clustering real-world datasets.
7 Conclusion
We design a new MCMC inference algorithm to fit vines on mixed data that runs in \(O(M^2 N)\) time per sampling step (M dimensions, N observations). Our model, a DP mixture of vines, can fit mixed margin distributions and arbitrary dependencies. Empirically, we demonstrate the benefits of our model in dependency-seeking clustering, extending state-of-the-art multi- and single-view models by modeling asymmetric and tail dependencies and fitting mixed data.
References
Aas, K., Czado, C., Frigessi, A., & Bakken, H. (2009). Pair-copula constructions of multiple dependence. Insurance: Mathematics and Economics, 44(2), 182–198.
Aldous, D. J. (1985). In École d’été de probabilités de Saint-Flour, XIII–1983. Lecture notes in mathematics (pp. 1–198). Springer.
Amoualian, H., Gaussier, E., Clausel, M., & Amini, M.-R. (2016). Streaming-LDA: A copula-based approach to modeling topic dependencies in document streams. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining.
Bache, K., & Lichman, M. (2013). UCI Machine learning repository. http://archive.ics.uci.edu/ml.
Brechmann, E. C., & Schepsmeier, U. (2013). Modeling dependence with C- and D-vine copulas: The R package CDVine. Journal of Statistical Software, 52(3). doi: 10.18637/jss.v052.i03.
Browne, R. P., & McNicholas, P. D. (2012). Model-based clustering, classification, and discriminant analysis of data with mixed type. Journal of Statistical Planning and Inference, 142(11), 2976–2984.
Chang, Y., Li, Y., Ding, A., & Dy, J. (2016). A robust-equitable copula dependence measure for feature selection. In Proceedings of the 19th international conference on artificial intelligence and statistics (AISTATS), (pp. 84–92).
Chaudhuri, K., Kakade, S. M., Livescu, K., & Sridharan, K. (2009). Multi-view clustering via canonical correlation analysis. In Proceedings of the 26th annual international conference on machine learning, (pp. 129–136). ACM.
Chen, N., Zhu, J., Sun, F., & Xing, E. P. (2012). Large-margin predictive latent subspace learning for multiview data analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(12), 2365–2378.
Dhillon, P., Foster, D. P., & Ungar, L. H. (2011). Multi-view learning of word embeddings via CCA. In Advances in neural information processing systems (NIPS), (pp. 199–207).
Eickhoff, C., de Vries, A. P., & Hofmann, T. (2015). Modelling term dependence with copulas. In Proceedings of the 38th international ACM SIGIR conference on research and development in information retrieval, (pp. 783–786).
Elidan, G. (2010). Copula Bayesian networks. In Advances in neural information processing systems (NIPS), (pp. 559–567).
Elidan, G. (2012). Copula network classifiers (CNCs). In Proceedings of the seventeenth international conference on artificial intelligence and statistics (AISTATS), (pp. 346–354).
Escobar, M. D., & West, M. (1995). Bayesian density estimation and inference using mixtures. Journal of the American Statistical Association, 90, 577–588.
Fujimaki, R., Sogawa, Y., & Morinaga, S. (2011). Online heterogeneous mixture modeling with marginal and copula selection. In Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining, (pp. 645–653).
Genest, C., & Neslehova, J. (2007). A primer on copulas for count data. Astin Bulletin, 37(2), 475.
Goldberger, A. L., Amaral, L. A. N., Glass, L., Hausdorff, J. M., Ivanov, P. Ch., Mark, R. G., et al. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation, 101(23), 215–220.
Gonçalves, A., Von Zuben, F. J., & Banerjee, A. (2016). Multi-task sparse structure learning with Gaussian copula models. Journal of Machine Learning Research, 17(33), 1–30.
Guo, Y., & Xiao, M. (2012). Cross language text classification via subspace co-regularized multi-view learning. In Proceedings of the 29th international conference on machine learning (ICML).
Han, F., & Liu, H. (2013). Principal component analysis on non-Gaussian dependent data. In Proceedings of the 30th international conference on machine learning (ICML), (pp. 240–248).
Han, F., Zhao, T., & Liu, H. (2013). CODA: High dimensional copula discriminant analysis. Journal of Machine Learning Research, 14, 629–671.
Hoff, P. D. (2007). Extending the rank likelihood for semiparametric copula estimation. The Annals of Applied Statistics, 1(1), 265–283.
Hoff, P. D. (2008). Rank likelihood estimation for continuous and discrete data. ISBA Bulletin, 15(1), 8–10.
Hubert, L., & Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1), 193–218.
Joe, H. (2014). Dependence Modeling with Copulas. Boca Raton: CRC Press.
Kalaitzis, A., & Silva, R. (2013). Flexible sampling of discrete data correlations without the marginal distributions. In Advances in neural information processing systems (NIPS).
Kim, D., Kim, J.-M., Liao, S.-M., & Jung, Y.-S. (2013). Mixture of D-vine copulas for modeling dependence. Computational Statistics & Data Analysis, 64, 1–19.
Klami, A., & Kaski, S. (2008). Probabilistic approach to detecting dependencies between data sets. Neurocomputing, 72(1), 39–46.
Klami, A., Virtanen, S., & Kaski, S. (2010). Bayesian exponential family projections for coupled data sources. In Proceedings of the twenty-sixth conference on uncertainty in artificial intelligence (UAI), (pp. 286–293).
Kosmidis, I., & Karlis, D. (2015). Model-based clustering using copulas with applications. In Statistics and computing. Springer.
Kumar, A., Rai, P., & Daume, H. (2011). Co-regularized multi-view spectral clustering. In Advances in neural information processing systems (NIPS), (pp. 1413–1421).
Letham, B., Sun, W., & Sheopuri, A. (2014). Latent variable copula inference for bundle pricing from retail transaction data. In Proceedings of the 31st international conference on machine learning (ICML), (pp. 217–225).
Lopez-Paz, D., Hernández-Lobato, J. M., & Schölkopf, B. (2012). Semi-supervised domain adaptation with non-parametric copulas. In Advances in neural information processing systems (NIPS), (pp. 665–673).
Lopez-Paz, D., Hernández-Lobato, J. M., & Ghahramani, Z. (2013). Gaussian process vine copulas for multivariate dependence. In International conference on machine learning (ICML), (pp. 10–18).
Marlin, B. M., Kale, D. C., Khemani, R. G., & Wetzel, R. C. (2012). Unsupervised pattern discovery in electronic health care data using probabilistic clustering models. In Proceedings of the 2nd ACM SIGHIT international health informatics symposium, (pp. 389–398). ACM.
McParland, D., & Gormley, I. C. (2016). Model based clustering for mixed data: clustMD. Advances in Data Analysis and Classification. doi: 10.1007/s11634-016-0238-x.
McParland, D., Gormley, I. C., McCormick, T. H., Clark, S. J., Kabudula, C. W., & Collinson, M. A. (2014). Clustering South African households based on their asset status using latent variable models. The Annals of Applied Statistics, 8(2), 747.
Meeds, E., Ghahramani, Z., Neal, R., & Roweis, S. (2007). Modeling dyadic data with binary latent factors. In Advances in neural information processing systems (NIPS), 19.
Meilă, M. (2007). Comparing clusterings: an information based distance. Journal of Multivariate Analysis, 98(5), 873–895.
Minh, H. Q., Bazzani, L., & Murino, V. (2013). A unifying framework for vector-valued manifold regularization and multi-view learning. In Proceedings of the 30th international conference on machine learning (ICML), (pp. 100–108).
Murphy, K. P. (2012). Machine learning: A probabilistic perspective. Boston: MIT Press.
Neal, R. M. (2000). Markov chain sampling methods for Dirichlet process mixture models. Journal of Computational and Graphical Statistics, 9(2), 249–265.
Panagiotelis, A., Czado, C., & Joe, H. (2012). Pair copula constructions for multivariate discrete data. Journal of the American Statistical Association, 107(499), 1063–1072.
Pennington, J., Socher, R., & Manning, C. D. (2014). GloVe: Global vectors for word representation. In EMNLP, (Vol. 14, pp. 1532–1543).
Plant, C. (2012). Dependency clustering across measurement scales. In Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining, (pp. 361–369).
Plant, C., & Böhm, C. (2011). INCONCO: Interpretable clustering of numerical and categorical objects. In Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining, (pp. 1127–1135).
Rey, M., & Roth, V. (2012). Copula mixture model for dependency-seeking clustering. In International conference on machine learning (ICML).
Shawe-Taylor, J., & Cristianini, N. (2004). Kernel methods for pattern analysis. Cambridge: Cambridge University Press.
Sklar, A. (1959). Fonctions de répartition à n dimensions et leurs marges. Publications de l’Institut de statistique de l’Université de Paris, 8, 229–231.
Smith, M. S., & Khaled, M. A. (2012). Estimation of copula models with discrete margins via Bayesian data augmentation. Journal of the American Statistical Association, 107(497), 290–303.
Sun, J., Lu, J., Xu, T., & Bi, J. (2015). Multi-view sparse co-clustering via proximal alternating linearized minimization. In Proceedings of the 32nd international conference on machine learning (ICML), (pp. 757–766).
Teh, Y. W. (2010). Dirichlet processes. In Encyclopedia of machine learning. Springer.
Tenzer, Y., & Elidan, G. (2013). Speedy model selection (SMS) for copula models. In Proceedings of the 30th conference on uncertainty in artificial intelligence (UAI).
Tran, D., Blei, D., & Airoldi, E. M. (2015). Copula variational inference. In Advances in neural information processing systems (NIPS), (pp. 3564–3572).
Vinh, N. X., Epps, J., & Bailey, J. (2010). Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance. The Journal of Machine Learning Research, 11, 2837–2854.
Wang, H., Nie, F., & Huang, H. (2013). Multi-view clustering and feature learning via structured sparsity. In Proceedings of the 30th international conference on machine learning (ICML), (pp. 352–360).
Wang, W., Arora, R., Livescu, K., & Bilmes, J. (2015). On deep multi-view representation learning. In Proceedings of the 32nd international conference on machine learning (ICML), (pp. 1083–1092).
White, M., Zhang, X., Schuurmans, D., & Yu, Y.-L. (2012). Convex multi-view subspace learning. In Advances in neural information processing systems (NIPS), (pp. 1673–1681).
Wu, Y., Hernández-Lobato, J. M., & Ghahramani, Z. (2013). Dynamic covariance models for multivariate financial time series. In Proceedings of the 31st international conference on machine learning (ICML), (pp. 558–566).
Yerebakan, H. Z., Rajwa, B., & Dundar, M. (2014). The infinite mixture of infinite Gaussian mixtures. In Advances in neural information processing systems (NIPS).