Biconditional Generative Adversarial Networks for Multiview Learning with Missing Views
Abstract
In this paper, we present a conditional GAN with two generators and a common discriminator for multiview learning problems where observations have two views, but one of them may be missing for some of the training samples. This is for example the case for multilingual collections where documents are not available in all languages. Some studies have tackled this problem by assuming the existence of view generation functions to approximately complete the missing views, for example Machine Translation to translate documents into the missing languages. These functions generally require an external resource to be set, and their quality has a direct impact on the performance of the multiview classifier learned over the completed training set. Our proposed approach addresses this problem by jointly learning the missing views and the multiview classifier using a tripartite game with two generators and a discriminator. Each generator is associated with one of the views and tries to fool the discriminator by generating the missing view conditionally on the observed one. The discriminator then tries to identify, for an observation, whether one of its views was completed by one of the generators, or, when both views are observed, to predict its class. Our results on a subset of the Reuters RCV1/RCV2 collections show that the discriminator achieves strong classification performance, and that the generators learn the missing views with high quality without the need for any substantial external resource.
Keywords
Bilingual text classification · Multiview learning · Conditional GAN

1 Introduction
We address the problem of multiview learning with Generative Adversarial Networks (GANs) in the case where some observations may have missing views and no external resource is available to complete them. This is a typical situation in many applications where different sources generate different views of samples unevenly; for example, text is present on all Wikipedia pages while images are scarcer. Another example is multilingual text classification, where documents are available in two languages and share the same set of classes, while some are written in only one language. Previous works assumed the existence of view generating functions to complete the missing views before deploying a learning strategy [2]. However, the performance of the global multiview approach is biased by the quality of these generating functions, which generally require external resources to be set. The challenge is hence to learn an efficient model from the multiple views of the training data without relying on an extrinsic approach to generate the missing views of incomplete samples.
In this direction, GANs provide a promising and broad framework with a high ability to capture the underlying distribution of the data and create new samples [11]. These models have mostly been applied to image analysis, and major advances have been made on generating realistic images with low variability [7, 15, 16]. GANs originate from game theory and are formulated as a two-player game between a generator G and a discriminator D. The generator takes a noise vector z and produces a sample G(z) in the input space, while the discriminator determines whether a sample comes from the true data distribution or was generated by G. Other works included an inverse mapping from the input to the latent representation, mostly referred to as BiGANs, and showed the usefulness of the learned feature representation for auxiliary discriminant problems [8, 9]. This idea paved the way for the design of efficient approaches for generating coherent synthetic views of an input image [6, 14, 21].
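For reference, this two-player game is formalized by the minimax objective of [11], with generator G, discriminator D, data distribution \(p_{data}\) and noise distribution \(p_z\):

```latex
\min_G \max_D \; V(D, G) \;=\;
  \mathbb{E}_{\mathbf{x} \sim p_{data}}\left[\log D(\mathbf{x})\right]
  \;+\;
  \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]
```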
In this work, we propose a GAN based model for bilingual text classification, called \(\texttt {Cond}^2\texttt {GANs}\), where some training documents are written in only one language. The model learns the representations of the missing versions of bilingual documents jointly with their association to their respective classes, and is composed of a discriminator D and two generators \(G_1\) and \(G_2\) formulated as a tripartite game. For a given document with a missing version in one language, the corresponding generator induces the latter conditionally on the observed one. The generators are trained by minimizing a regularized version of the cross-entropy measure proposed for multi-class classification with GANs [19], in a way that forces the models to generate views such that the completed bilingual documents will be assigned to their true classes with high confidence. At the same time, the discriminator learns the association between documents and their classes and distinguishes between observations having both of their views observed and those for which one view was completed by a generator. This is achieved by minimizing an aggregated cross-entropy measure that forces the discriminator to be certain of the class of observations with complete views and uncertain of the class of documents for which one of the versions was completed. The regularization term in the generators' objectives is derived from an adapted feature matching technique [17], which is an effective way of preventing situations where training becomes unstable, and which leads to fast convergence.
The contributions of this paper are threefold. We:
- propose a new tripartite GAN model that makes class predictions along with the generation of high-quality document representations in different input spaces when the corresponding versions are not observed (Sect. 3.2);
- achieve state-of-the-art performance on multilingual document classification compared to multiview approaches that rely on external view generating functions; this is a challenging application other than image analysis, the domain of choice for the design of new GAN models (Sect. 4.2);
- demonstrate the value of the views generated within our approach compared to views generated using MT (Sect. 4.2).
2 Related Work
Multiview learning has been an active domain of research in recent years, and many advances have been made on both the theoretical and algorithmic sides [5, 12]. The three main families of techniques for (semi-)supervised learning are (kernel) Canonical Correlation Analysis (CCA), Multiple Kernel Learning (MKL) and co-regularization. CCA finds pairs of highly correlated subspaces between the views, which are used to map the data before training or are integrated into the learning objective [3, 10]. MKL considers one kernel per view, and different approaches have been proposed for learning them. In one of the earliest works, [4] proposed an efficient algorithm based on sequential minimization techniques for learning a corresponding support vector machine defined over a convex nonsmooth optimization problem. Co-regularization techniques minimize the disagreement between the single-view classifiers over their outputs on unlabeled examples by adding a regularization term to the objective function [18]. Some approaches have also tackled the difficult question of combining the predictions of the view-specific classifiers [20]. However, all these techniques assume that all views of a sample are available during training and testing.
Recently, many other studies have considered the generation of multiple views from a single input image using GANs [14, 21, 23] and have demonstrated the intriguing capacity of these models to generate coherent unseen views. These approaches mostly rely on an encoder-decoder network to first map images into a latent space and then generate their views using an inverse mapping. This is a very exciting problem; however, our learning objective differs from these approaches, as we are mostly interested in the classification of multi-view samples with missing views. The work most similar to ours that uses GANs for multiview classification is probably [6]. That approach generates missing views of images in the same latent space as the input image, while \(\texttt {Cond}^2\texttt {GANs}\) learns the representations of the missing views in their respective input spaces, conditionally on the observed ones, which in general lie in different feature spaces. Furthermore, \(\texttt {Cond}^2\texttt {GANs}\) benefits from low complexity and stable convergence, which has been shown to be lacking in the previous approach.
Another work that considered multiview learning with incomplete views, also for document classification, is [2]. The authors proposed Rademacher complexity bounds for a multiview Gibbs classifier trained on multilingual collections where the missing versions of documents have been generated by Machine Translation (MT) systems. Their bounds exhibit a term corresponding to the quality of the MT system generating the views. The bottleneck is that MT systems depend on external resources, and they require a huge amount of parallel collections, containing documents and their translations in all languages of interest, for their tuning. For rare languages, this can ultimately affect the performance of the learning models. In these respects, our proposed approach differs from all previous studies, as we assume neither parallel training sets nor MT systems to generate the missing versions of the training observations.
3 Cond\(^2\)GANs
In the following sections, we first present the basic definitions that frame our problem setting, and then the proposed model for multiview classification with missing views.
3.1 Framework and Problem Setting
We consider multiclass classification problems, where a bilingual document is defined as a sequence \(\mathbf{x}= (x^{1},x^{2})\in \mathcal{X}\) that belongs to one and only one class \(\mathbf {y}\in \mathcal{Y}= \{0,1\}^K\). The class membership indicator vector \(\mathbf {y}=(y_k)_{1\le k\le K}\), of each bilingual document, has all its components equal to 0 except the one that indicates the class associated with the example which is equal to one. Here we suppose that \(\mathcal{X}= \left( \mathcal{X}_1 \cup \{\perp \}\right) \times \left( \mathcal{X}_{2} \cup \{\perp \}\right) \), where \(x^{v}= \perp \) means that the v-th view is not observed. Hence, each observed view \(x^{v}\in \mathbf{x}\) is such that \(x^{v}\ne \perp \) and it provides a representation of \(\mathbf{x}\) in a corresponding input space \(\mathcal{X}_v\subseteq \mathbb {R}^{d_v}\). Following the conclusions of the co-training study [5], our framework is based on the following main assumption:
Assumption 1
([5]). Observed views are not completely correlated, and are equally informative.
Furthermore, we assume that each example \((\mathbf{x},\mathbf {y})\) is identically and independently distributed (i.i.d.) according to a fixed yet unknown distribution \(\mathcal {D}\) over \(\mathcal{X}\times \mathcal{Y}\), and that at least one of its views is observed. Additionally, we suppose to have access to a training set \(\mathcal {S}=\{(\mathbf{x}_i,\mathbf {y}_i); i\in \{1,\ldots ,m\}\}=\mathcal {S}_F\sqcup \mathcal {S}_1\sqcup \mathcal {S}_2\) of size m drawn i.i.d. according to \(\mathcal {D}\), where \(\mathcal {S}_F=\{\left( \left( x^{1}_i,x^{2}_i\right) ,\mathbf {y}_i\right) \mid i\in \{1,\ldots ,m_F\}\}\) denotes the subset of training samples with both of their views observed, and \(\mathcal {S}_1=\{\left( \left( x^{1}_i,\perp \right) ,\mathbf {y}_i\right) \mid i\in \{1,\ldots ,m_1\}\}\) (respectively \(\mathcal {S}_2=\{\left( \left( \perp , x^{2}_i\right) ,\mathbf {y}_i\right) \mid i\in \{1,\ldots ,m_2\}\}\)) is the subset of training samples whose second (respectively first) view is not observed (i.e. \(m=m_F+m_1+m_2\)).
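As a concrete illustration of this partition, a missing view (the symbol \(\perp \) above) can be encoded as `None`; the function and variable names below are ours, not the paper's:

```python
def partition(S):
    """Split S into S_F (both views observed), S_1 (second view
    missing) and S_2 (first view missing); a missing view is None."""
    S_F = [(x, y) for (x, y) in S if x[0] is not None and x[1] is not None]
    S_1 = [(x, y) for (x, y) in S if x[1] is None]
    S_2 = [(x, y) for (x, y) in S if x[0] is None]
    return S_F, S_1, S_2

# Toy collection: one fully observed bilingual document and two
# documents with one missing version.
S = [((("cat",), ("chat",)), 0),
     ((("dog",), None), 1),
     ((None, ("oiseau",)), 2)]
S_F, S_1, S_2 = partition(S)
assert len(S_F) + len(S_1) + len(S_2) == len(S)  # m = m_F + m_1 + m_2
```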
It is possible to address this problem using existing techniques; for example, by learning singleview classifiers independently on the examples of \(\mathcal {S}_F\sqcup \mathcal {S}_1\) (respectively \(\mathcal {S}_F\sqcup \mathcal {S}_2\)) for the first (respectively second) view. To make predictions, one can then combine the outputs of the classifiers [20] if both views of a test example are observed; otherwise, one uses the output corresponding to the observed view. Another solution is to apply multiview approaches over the training samples of \(\mathcal {S}_F\) only; or over the whole training set \(\mathcal {S}\), by completing the views of examples in \(\mathcal {S}_1\) and \(\mathcal {S}_2\) using external view generation functions beforehand.
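The baseline prediction scheme described above can be sketched as follows; `clf1` and `clf2` stand in for the two singleview classifiers returning class-probability vectors, and the uniform average is just one simple instance of the output-combination strategies of [20]:

```python
import numpy as np

def predict(x, clf1, clf2):
    """Predict a class for x = (x1, x2); a missing view is None.

    clf1 / clf2 return class-probability vectors for their view.
    When both views are observed the outputs are averaged (a simple
    combination rule); otherwise the observed view's classifier is
    used alone.
    """
    x1, x2 = x
    if x1 is not None and x2 is not None:
        probs = 0.5 * (clf1(x1) + clf2(x2))
    elif x1 is not None:
        probs = clf1(x1)
    else:
        probs = clf2(x2)
    return int(np.argmax(probs))
```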
3.2 The Tripartite Game
\(\texttt {Cond}^2\texttt {GANs}\) is composed of three neural networks that are trained using an objective implementing a three-player game between a discriminator, D, and two generators, \(G_1\) and \(G_2\). The game that these models play is depicted in Fig. 1 and can be summarized as follows. At each step, if an observation with a missing view is chosen, the corresponding generator – \(G_1\) (respectively \(G_2\)) if the first (respectively second) view is missing – produces that view from random noise conditionally on the observed view, in a way to fool the discriminator. The discriminator, in turn, takes as input an observation with both of its views complete or completed, and either classifies it if both views were initially observed, or detects that a view was produced by one of the generators. Formally, the generators \(G_1\) and \(G_2\) take as input samples from the training subsets \(\mathcal {S}_2\) and \(\mathcal {S}_1\) respectively, as well as random noise drawn from a uniform distribution defined over the input space of the missing view, and produce the corresponding missing pseudo-view; i.e. \(G_1(z^1,x^2)=\tilde{x}^1\) and \(G_2(x^1, z^2)=\tilde{x}^2\). These models are learned in a way to replicate the conditional distributions \(p(x^1|x^2,z^1)\) and \(p(x^2|x^1,z^2)\), and inherently define two probability distributions, denoted respectively by \(p_{G_1}\) and \(p_{G_2}\), as the distributions of samples if both views were observed; i.e. \((\tilde{x}^1,x^2)\sim p_{G_1}(x^1,x^2)\) and \((x^1,\tilde{x}^2)\sim p_{G_2}(x^1,x^2)\). The discriminator takes as input a training sample, either from the set \(\mathcal {S}_F\), or from one of the training subsets \(\mathcal {S}_1\) or \(\mathcal {S}_2\) where the missing view of the example is generated by the corresponding generator.
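A minimal sketch of the view-completion step, with linear maps standing in for the generator networks (the real \(G_1\) and \(G_2\) are trained neural networks; the shapes and names here are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2 = 5, 4  # dimensions of the two view spaces (illustrative)

# Stand-in generators: each maps (noise, observed view) to the other
# view's input space, mirroring G_1(z^1, x^2) and G_2(x^1, z^2).
W1 = rng.normal(size=(d1 + d2, d1))
W2 = rng.normal(size=(d1 + d2, d2))

def G1(z1, x2):
    """Produce a pseudo first view from noise z^1 and observed x^2."""
    return np.concatenate([z1, x2]) @ W1

def G2(x1, z2):
    """Produce a pseudo second view from observed x^1 and noise z^2."""
    return np.concatenate([x1, z2]) @ W2

def complete(x):
    """Complete a bilingual observation x = (x1, x2); None = missing."""
    x1, x2 = x
    if x1 is None:
        x1 = G1(rng.uniform(size=d1), x2)  # noise z^1 drawn uniformly
    elif x2 is None:
        x2 = G2(x1, rng.uniform(size=d2))  # noise z^2 drawn uniformly
    return x1, x2
```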
The task of D is then to recognize observations from \(\mathcal {S}_1\) and \(\mathcal {S}_2\) whose views were completed by \(G_1\) and \(G_2\), and to classify examples from \(\mathcal {S}_F\) into their true classes. To achieve this goal we add a fake class, \(K+1\), to the set of classes \(\mathcal{Y}\), corresponding to samples that have one of their views generated by \(G_1\) or \(G_2\). The dimension of the discriminator's output is hence set to \(K+1\), which, after applying a softmax, estimates the posterior probability of each class for the multiview observation (with complete or completed views) given in input. For an observation \(\mathbf{x}\in \mathcal{X}\), we use \(D_{K+1}(\mathbf{x})= p_D(y=K+1 | \mathbf{x})\) to estimate the probability that one of its views was generated by \(G_1\) or \(G_2\). As the task of the generators is to produce views of sufficient quality that the observation with the completed view is assigned to its true class with high probability, we follow [17] and constrain the discriminator not to be fooled easily, as stated in the following assumption:
Assumption 2
([17]). An observation \(\mathbf{x}\) has one of its views generated by \(G_1\) or \(G_2\); if and only if \(D_{K+1}(\mathbf{x}) > \sum _{k=1}^K D_k(\mathbf{x})\).
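Since the softmax outputs of the discriminator sum to one, the rule of Assumption 2 amounts to checking whether the fake-class probability exceeds 1/2. A small numpy sketch, with names of our choosing:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a vector of K+1 logits."""
    e = np.exp(logits - logits.max())
    return e / e.sum()

def flagged_as_generated(logits):
    """Assumption 2: flag x as having a generated view iff
    D_{K+1}(x) > sum_{k=1}^K D_k(x). Because the K+1 softmax
    outputs sum to 1, this is equivalent to D_{K+1}(x) > 0.5."""
    D = softmax(logits)          # K+1 class probabilities
    return D[-1] > D[:-1].sum()

def D_bar(logits):
    """Sum of the first K outputs, i.e. 1 - D_{K+1}(x); this is the
    auxiliary function used later in the convergence analysis."""
    return softmax(logits)[:-1].sum()
```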
3.3 Analyses and Convergence
From the outputs of the discriminator for all \(\mathbf{x}\in \mathcal{X}\), we build an auxiliary function \(\mathbf {D}(\mathbf{x})=\sum _{k=1}^K p_D(y=k\mid \mathbf{x})\) equal to the sum of the first K outputs, associated with the true classes. In the following, we provide a theoretical analysis of \(\texttt {Cond}^2\texttt {GANs}\) involving the auxiliary function \(\mathbf {D}\) under nonparametric hypotheses.
Proposition 1
Proof
By plugging back \(\mathbf {D}^*_{G_1,G_2}\) (Eq. 3) into the value function \(V_{D}\) we have the following necessary and sufficient condition for attaining the global minimum of this function:
Theorem 1
Proof
From Eq. 4, it is straightforward to verify that \(p_{real}(x^1,x^2)=p_{G_1}(x^1,x^2)=p_{G_2}(x^1,x^2)\) is a global Nash equilibrium, but it may not be unique. In order to ensure uniqueness, we add the Jensen-Shannon divergences between \(p_{G_1}\) and \(p_{real}\), and between \(p_{G_2}\) and \(p_{real}\), to the value function \(V_D\) (Eq. 1), as stated in the corollary below.
Corollary 1
Proof
The proof follows from the non-negativity of the JSD and the necessary and sufficient condition for it to be equal to 0. Hence, \(\bar{V_D}(\mathbf {D},G_1,G_2)\) reaches its minimum, \(-\log 4\), iff \(p_{G_1}=p_{real}=p_{G_2}\). \(\Box \)
This result suggests that at equilibrium, both generators produce views such that observations with a completed view follow the same real distribution as those which have both of their views observed.
3.4 Algorithm and Implementation
In order to be in line with the premises of Corollary 1, we empirically tested different solutions; the most effective one we found was the feature matching technique proposed in [17], which addresses the instability of generator training by adding a penalty term \(\mathcal {L}_{FM}^v=\Vert \mathbb {E}_{\mathbf{x}\sim p_{real}}f(\mathbf{x}) - \mathbb {E}_{\mathbf{x}\sim p_{G_v}}f(\mathbf{x})\Vert \), \(v\in \{1,2\}\), to their corresponding objectives (Eq. (8)), where \(\Vert \cdot \Vert \) is the \(\ell _2\) norm and f is the sigmoid activation of an intermediate layer of the discriminator. The overall algorithm of \(\texttt {Cond}^2\texttt {GANs}\) is shown above. The parameters of the three neural networks are first initialized using Xavier initialization. For a given number of iterations T, minibatches of size \(3m_b\) are randomly sampled from the sets \(\mathcal {S}_F\), \(\mathcal {S}_1\) and \(\mathcal {S}_2\). Minibatches of noise vectors are randomly drawn from the uniform distribution. The parameters of the discriminator and of both generators are then sequentially updated using the Adam optimization algorithm [13]. We implemented our method with two-layer neural networks for each of the components of \(\texttt {Cond}^2\texttt {GANs}\). These networks have hidden layers of 200 nodes with sigmoid activations. Since the generated samples should be able to take any real value, we use no activation function on the outputs of the generators.^{1}
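The feature matching penalty can be sketched as follows; for illustration, f is a fixed random projection followed by a sigmoid, standing in for the intermediate discriminator layer (our simplification, not the paper's trained architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2, h = 5, 4, 8                # view dimensions, feature dimension
W = rng.normal(size=(d1 + d2, h))  # stand-in intermediate layer

def f(x1, x2):
    """Sigmoid activation of an intermediate discriminator layer,
    applied to a batch of bilingual observations (rows)."""
    return 1.0 / (1.0 + np.exp(-np.concatenate([x1, x2], axis=1) @ W))

def feature_matching_loss(real_pairs, completed_pairs):
    """L_FM = || E_real f(x) - E_gen f(x) ||_2 : match the mean
    intermediate features of real bilingual documents against those
    of documents with one generated view."""
    mu_real = f(*real_pairs).mean(axis=0)
    mu_gen = f(*completed_pairs).mean(axis=0)
    return np.linalg.norm(mu_real - mu_gen)

# A batch matched against itself incurs zero penalty.
x1 = rng.normal(size=(16, d1))
x2 = rng.normal(size=(16, d2))
assert feature_matching_loss((x1, x2), (x1, x2)) == 0.0
```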
4 Experiments
The statistics of RCV1/RCV2 Reuters data collection used in our experiments.
4.1 Experimental Setup
In our experiments, we consider the case where the number of training documents having their two versions is much smaller than the number with only one available version (i.e. \(m_F\ll m_1+m_2\)). This corresponds to the situation where the effort of gathering documents in different languages is much smaller than that of translating them from one language to another. To this end, we randomly select \(m_F=300\) samples having both of their views, \(m_1 = m_2 = 6000\) samples with one of their views missing, and keep the remaining samples, without their translations, for testing. In order to evaluate the quality of the views generated by \(\texttt {Cond}^2\texttt {GANs}\), we considered two scenarios. In the first one (denoted by T \(_{\texttt {EN}\tilde{v}}\)), we test on English documents by considering their generation with respect to the other view (\(v\in \{\texttt {FR},\texttt {GR},\texttt {IT},\texttt {SP}\}\)) using the corresponding generator. In the second scenario (denoted by T \(_{\tilde{\texttt {EN}} v}\)), we test on documents written in a language other than English by considering their generation in English provided by the other generator. For evaluation, we compare \(\texttt {Cond}^2\texttt {GANs}\) with four classification approaches: one singleview approach and three multiview approaches. In the singleview approach (denoted by \(\mathbf {c}_v\)), classifiers have the same architecture as the discriminator and are trained on the part of the training set whose examples have the corresponding view observed. The multiview approaches are MKL [4], co-classification (co-classif) [1], and unanimous vote (mv \(_b\)) [2]. Results are evaluated over the test set using the accuracy and the F\(_1\) measure, which is the harmonic mean of precision and recall. The reported performances are averaged over 20 random (train/test) splits, and the parameters of the Adam optimization algorithm are set to \(\alpha = 10^{-4}\), \(\beta = 0.5\).
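For completeness, the F\(_1\) measure used above is the harmonic mean of precision and recall; a minimal per-class (binary) implementation:

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for one class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

In the multiclass setting of the paper, such per-class scores are typically aggregated by micro or macro averaging.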
4.2 Experimental Results
Test classification accuracy averaged over 20 random training/test sets. For each of the pairs of languages, the best result is in bold, and a \(^\downarrow \) indicates a result that is statistically significantly worse than the best, according to a Wilcoxon rank sum test with \(p<.01\).
| Approaches | (EN, \(v=\) FR) | | (EN, \(v=\) GR) | | (EN, \(v=\) IT) | | (EN, \(v=\) SP) | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | T \(_{\texttt {EN}\tilde{v}}\) | T \(_{\tilde{\texttt {EN}} v}\) | T \(_{\texttt {EN}\tilde{v}}\) | T \(_{\tilde{\texttt {EN}} v}\) | T \(_{\texttt {EN}\tilde{v}}\) | T \(_{\tilde{\texttt {EN}} v}\) | T \(_{\texttt {EN}\tilde{v}}\) | T \(_{\tilde{\texttt {EN}} v}\) |
| MKL | 75.6\(^\downarrow \) | 77.3\(^\downarrow \) | 79.4\(^\downarrow \) | 79.6\(^\downarrow \) | 78.4\(^\downarrow \) | 79.8\(^\downarrow \) | 81.2\(^\downarrow \) | 83.5\(^\downarrow \) |
| co-classif | 81.4\(^\downarrow \) | 83.2\(^\downarrow \) | 84.3\(^\downarrow \) | 81.6\(^\downarrow \) | 82.7\(^\downarrow \) | 82.5\(^\downarrow \) | 85.1\(^\downarrow \) | 86.2\(^\downarrow \) |
| mv \(_b\) | 83.1\(^\downarrow \) | 84.5\(^\downarrow \) | 85.2\(^\downarrow \) | 79.9\(^\downarrow \) | 84.3\(^\downarrow \) | 82.1\(^\downarrow \) | 84.4\(^\downarrow \) | 86.2\(^\downarrow \) |
| \(\texttt {Cond}^2\texttt {GANs}\) | **85.3** | **85.1** | **86.6** | **82.9** | **85.3** | **84.5** | **86.5** | **88.3** |
Impact of the Increasing Number of Observed Views. In Fig. 3, we compare the F\(_1\) measures of \(\texttt {Cond}^2\texttt {GANs}\) and of one of the singleview classifiers as the number of training samples having the view corresponding to the singleview classifier observed increases, while the number of training examples with the other observed view is fixed. With more training samples, the singleview classifier gains in performance. \(\texttt {Cond}^2\texttt {GANs}\), on the other hand, compensates for the lack of information by also leveraging the training examples having the other view observed, so the performance gap between the two models is larger for small numbers of training samples.
5 Conclusion
In this paper we presented \(\texttt {Cond}^2\texttt {GANs}\) for multiview multiclass classification where observations may have missing views. The model consists of three neural networks implementing a three-player game between a discriminator and two generators. For an observation with a missing view, the corresponding generator produces that view conditionally on the observed one. The discriminator is trained to recognize observations with a generated view from those having both of their views observed, and to classify the latter into one of the existing classes. We evaluated the effectiveness of our approach on a challenging application other than image analysis, the domain of choice for the design of new GAN models. Our experiments on a subset of Reuters RCV1/RCV2 show the effectiveness of \(\texttt {Cond}^2\texttt {GANs}\) in generating high-quality views, allowing it to achieve significantly better results than when the missing views are generated by Machine Translation, which requires a large collection of sentences and their translations for tuning. As future work, we will study the generalization of the proposed model to more than two views. One possible direction is the use of an aggregation function of the available views as a condition to the generators.
References
- 1. Amini, M.R., Goutte, C.: A co-classification approach to learning from multilingual corpora. Mach. Learn. J. 79(1–2), 105–121 (2010)
- 2. Amini, M.R., Usunier, N., Goutte, C.: Learning from multiple partially observed views - an application to multilingual text categorization. In: Advances in Neural Information Processing Systems, vol. 22, pp. 28–36 (2009)
- 3. Bach, F.R., Jordan, M.I.: Kernel independent component analysis. J. Mach. Learn. Res. 3(Jul), 1–48 (2003)
- 4. Bach, F.R., Lanckriet, G.R.G., Jordan, M.I.: Multiple kernel learning, conic duality, and the SMO algorithm. In: Proceedings of the \(21^{st}\) International Conference on Machine Learning (ICML) (2004)
- 5. Blum, A., Mitchell, T.: Combining labeled and unlabeled data with co-training. In: Proceedings of the Eleventh Annual Conference on Computational Learning Theory (COLT), pp. 92–100 (1998)
- 6. Chen, M., Denoyer, L.: Multi-view generative adversarial networks. In: Ceci, M., Hollmén, J., Todorovski, L., Vens, C., Džeroski, S. (eds.) ECML PKDD 2017. LNCS (LNAI), vol. 10535, pp. 175–188. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-71246-8_11
- 7. Denton, E.L., Chintala, S., Szlam, A., Fergus, R.: Deep generative image models using a Laplacian pyramid of adversarial networks. In: Advances in Neural Information Processing Systems, vol. 28, pp. 1486–1494 (2015)
- 8. Donahue, J., Krähenbühl, P., Darrell, T.: Adversarial feature learning. In: International Conference on Learning Representations (ICLR) (2017)
- 9. Dumoulin, V., et al.: Adversarially learned inference. In: International Conference on Learning Representations (ICLR) (2017)
- 10. Farquhar, J., Hardoon, D., Meng, H., Shawe-Taylor, J., Szedmák, S.: Two view learning: SVM-2K, theory and practice. In: Advances in Neural Information Processing Systems, vol. 18, pp. 355–362 (2006)
- 11. Goodfellow, I., et al.: Generative adversarial nets. In: Advances in Neural Information Processing Systems, vol. 27, pp. 2672–2680 (2014)
- 12. Goyal, A., Morvant, E., Germain, P., Amini, M.R.: PAC-Bayesian analysis for a two-step hierarchical multiview learning approach. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 205–221 (2017)
- 13. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: International Conference on Learning Representations (ICLR) (2015)
- 14. Ma, L., Jia, X., Sun, Q., Schiele, B., Tuytelaars, T., Van Gool, L.: Pose guided person image generation. In: Advances in Neural Information Processing Systems, vol. 30, pp. 406–416 (2017)
- 15. Odena, A., Olah, C., Shlens, J.: Conditional image synthesis with auxiliary classifier GANs. In: Proceedings of the 34th International Conference on Machine Learning (ICML), pp. 2642–2651 (2017)
- 16. Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. In: International Conference on Learning Representations (ICLR) (2016)
- 17. Salimans, T., et al.: Improved techniques for training GANs. In: Advances in Neural Information Processing Systems, vol. 29, pp. 2234–2242 (2016)
- 18. Sindhwani, V., Rosenberg, D.S.: An RKHS for multi-view learning and manifold co-regularization. In: Proceedings of the \(25^{th}\) International Conference on Machine Learning (ICML) (2008)
- 19. Springenberg, J.T.: Unsupervised and semi-supervised learning with categorical generative adversarial networks. In: International Conference on Learning Representations (ICLR) (2016)
- 20. Tian, L., Nie, F., Li, X.: A unified weight learning paradigm for multi-view learning. In: Proceedings of Machine Learning Research, pp. 2790–2800 (2019)
- 21. Tian, Y., Peng, X., Zhao, L., Zhang, S., Metaxas, D.N.: CR-GAN: learning complete representations for multi-view generation. In: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI), pp. 942–948 (2018)
- 22. Ueffing, N., Simard, M., Larkin, S., Johnson, H.: NRC's portage system for WMT 2007. In: Proceedings of the Second Workshop on Statistical Machine Translation, pp. 185–188 (2007)
- 23. Zhao, B., Wu, X., Cheng, Z., Liu, H., Feng, J.: Multi-view image generation from a single-view. In: Proceedings of the 26th ACM International Conference on Multimedia (MM), pp. 383–391 (2018)