Abstract
Missing data is one of the most common preprocessing problems. In this paper, we experimentally study the use of generative and non-generative models for feature reconstruction. Variational Autoencoder with Arbitrary Conditioning (VAEAC) and Generative Adversarial Imputation Network (GAIN) were studied as representatives of generative models, while the denoising autoencoder (DAE) represented non-generative models. The performance of the models is compared to the traditional methods k-nearest neighbors (kNN) and Multiple Imputation by Chained Equations (MICE). Moreover, we introduce WGAIN as the Wasserstein modification of GAIN, which turns out to be the best imputation model when the degree of missingness is less than or equal to \(30\%\). Experiments were performed on real-world and artificial datasets with continuous features where different percentages of features, varying from \(10\%\) to \(50\%\), were missing. Evaluation of algorithms was done by measuring the accuracy of the classification model previously trained on the uncorrupted dataset. The results show that GAIN and especially WGAIN are the best imputers regardless of the conditions. In general, they outperform or are comparable to MICE, kNN, DAE, and VAEAC.
1 Introduction
When working with real-world datasets, one of the standard problems that needs solving as part of the data preprocessing phase is dealing with missing data. The missingness can take the form of either individual missing values randomly located in instances or the absence of entire features.
To the best of our knowledge, little attention has been paid to the second scenario, where entire features are missing; i.e., there are no clear answers to questions such as how to face the situation, how the standard imputation methods will perform, or whether there is a need to approach this challenge in a different way.
The aim of our work is to study these issues by experimentally comparing several state-of-the-art imputation methods in real-world scenarios where one needs to impute (i.e., reconstruct) entire features. This work follows up on our previous work presented in [12], where we focused on the comparison of traditional (kNN, linear regression, MICE) and modern (multilayer perceptron, extreme gradient boosted trees) imputation methods.
In the current paper, we research more universal imputers represented by autoencoders and generative neural network models. These models have a common advantage in that one does not need to know which features are missing in advance. On the contrary, regular imputation methods need to be trained for each combination of missing features separately. A typical example where a universal imputer is needed is the prediction of a classification model from sensor data, where a sensor breakdown leads to missing data in one or more features. Usually, the prediction model itself is not able to handle this situation without a significant decrease in its performance. Furthermore, one typically does not know in advance which sensor is going to be broken. The best approach would be to retrain the model using data without missing features. However, in a production setting model retraining is impossible as the existing model needs to respond to corrupted data immediately.
We consider a situation where the prediction model is trained on a complete preprocessed dataset with numeric features, and we study how its accuracy changes on new unseen data with imputed missing features. The amount of missing data (i.e., features) varies between \(10\%\) and \(50\%\). Experiments are performed on ten real and two artificial datasets. The impact of imputation is measured as the change in classification accuracy of the best-performing of six commonly used classification models: logistic regression, multilayer perceptron, kNN, naive Bayes, extreme gradient boosted trees [7], and random forest. Besides accuracy, we also use the root mean squared error (RMSE), which was also used in [6, 17, 35], as a measure of the quality of the imputation.
We compare the denoising autoencoder (DAE) [33], Generative Adversarial Imputation Network (GAIN) [35], and Variational Autoencoder with Arbitrary Conditioning (VAEAC) [17] with kNN and MICE [4], which are considered to be successful traditional imputation methods. Moreover, we introduce Wasserstein Generative Adversarial Imputation Network (WGAIN), a Wasserstein-based modification of GAIN, see [2]. WGAIN is a generative imputation model and generally outperforms other presented models on the tested datasets. The Earth-Mover distance and the corresponding discriminator’s critic of the Wasserstein approach do not suffer from vanishing gradients in the way that a vanilla GAN would. This enables the model to capture the desired distribution better.
The paper is organized as follows. In Sect. 2, we briefly review related work in this field. In Sect. 3 the WGAIN model is introduced. Section 4 is devoted to the description of experiments performed, including the evaluation of their results. Finally, the paper is concluded in Sect. 5.
2 Related Work
There are many traditional imputation methods, see e.g. [11, 24, 32]. Some of the most common and successful are k-nearest neighbors imputation (kNN) [18] and multivariate imputation by chained equations (MICE) [29, 32].
Approaches based on deep learning have been under active development for the last few years. They use many variants of neural networks, starting from the multilayer perceptron, e.g., in [3, 30]. A more advanced approach is based on the autoencoder, a specific kind of neural network that aims to reconstruct its inputs on its outputs. Here, one of the most commonly used models is the denoising autoencoder (DAE) [33], see e.g. [5, 8, 10, 15, 34]. Typically, these models are used in a discriminative way (see [15] for the difference), meaning that they impute a single value, which is deterministic once the network is trained.
On the other hand, the most recent research focuses on generative models, which enable one to sample from the distribution conditioned on the observed features and thus get information about the uncertainty in imputed values. There are two groups of deep learning generative models. First, there are models based on the variational autoencoder (VAE) [19] and its conditional variants, see [25, 26, 31, 36]. In this group, some of the most successful imputation models are VAEAC [17] and HIVAE [27].
The second group contains models based on the Generative Adversarial Network (GAN) [16]. Notably, one can encounter them in image reconstruction tasks (i.e., image inpainting), see [20, 22, 28]. One of the most prominent methods based on GAN is GAIN [35], which uses the generator-discriminator mechanism to learn the desired distribution. The generator observes some components of a real data vector, imputes the missing components conditioned on what is observed, and outputs a completed vector. The discriminator then takes a completed vector and attempts to determine which components were observed and which were imputed. GAIN forms the base for our modification of the imputation method based on Wasserstein GAN [2], which is introduced in the next section. Only recently, GAIN was outperformed by the previously mentioned VAEAC and HIVAE. However, for numeric variables, HIVAE achieves a comparable error to the rest of the methods [27]. Therefore, we have chosen only VAEAC for the experimental comparison.
3 Wasserstein Generative Adversarial Imputation Network
In this section, the WGAIN model is introduced as a variant of GAIN that replaces the discriminator with the critic of the Wasserstein GAN.
Let us denote by \(\mathcal {X}= \mathbb {R}^d\) the d-dimensional numeric data domain and let \(\boldsymbol{X} = (X_1,\dotsc , X_d)\) be a random vector with values in \(\mathcal {X}\) whose distribution is denoted by \(\mathrm {P}(\boldsymbol{X})\). Let the mask be a random binary vector \(\boldsymbol{M}\), i.e., a random vector with values in \(\{0,1\}^d\). The mask indicates the unobserved values of \(\boldsymbol{X}\): the value 0 of its j-th component means that the j-th feature \(X_j\) is missing and the value 1 means that the j-th feature \(X_j\) is not missing. The distribution of \(\boldsymbol{M}\) corresponds to the distribution of missingness in the data. Let us further denote by \(\tilde{\boldsymbol{X}}\) the vector \(\boldsymbol{X}\) having zeros in place of missing values given by
\[\tilde{\boldsymbol{X}} = \boldsymbol{X} \odot \boldsymbol{M},\]
where \(\odot \) denotes elementwise multiplication. Our aim is to impute missing values in \(\tilde{\boldsymbol{X}}\) based on information from the non-missing features of \(\tilde{\boldsymbol{X}}\) and \(\boldsymbol{M}\). This is done in a generative way, which means that we want to learn the conditional distribution \(\mathrm {P}(\boldsymbol{X} \mid \tilde{\boldsymbol{X}} = \tilde{\boldsymbol{x}}, \boldsymbol{M} = \boldsymbol{m})\) of \(\boldsymbol{X}\) given \(\tilde{\boldsymbol{X}} = \tilde{\boldsymbol{x}}\) and \(\boldsymbol{M} = \boldsymbol{m}\). To do this, let \(\boldsymbol{Z}\) be a random vector with independent identically distributed components having normal distribution \(\text {N}(0,\sigma ^2)\) with variance \(\sigma ^2\) and define
\[\tilde{\boldsymbol{X}}_{\boldsymbol{Z}} = \tilde{\boldsymbol{X}} + (\boldsymbol{1} - \boldsymbol{M}) \odot \boldsymbol{Z},\]
i.e. \(\tilde{\boldsymbol{X}}_{\boldsymbol{Z}}\) is \(\tilde{\boldsymbol{X}}\) with missing components replaced by normal random variables.
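For illustration, the construction of \(\tilde{\boldsymbol{X}}\) and \(\tilde{\boldsymbol{X}}_{\boldsymbol{Z}}\) can be sketched in plain Python. The sketch assumes that the zero-filled vector is the elementwise product of the data vector with the mask and that noise is added only at the missing positions. The function name `corrupt` and the example values are hypothetical.

```python
import random


def corrupt(x, m, sigma=0.01, rng=None):
    """Return (x_tilde, x_tilde_z) for one data vector x and binary mask m.

    m[j] == 1 means feature j is observed, m[j] == 0 means it is missing.
    """
    rng = rng or random.Random()
    # x_tilde: missing entries are zeroed out (elementwise product with the mask).
    x_tilde = [xj * mj for xj, mj in zip(x, m)]
    # x_tilde_z: missing entries carry N(0, sigma^2) noise instead of zero;
    # observed entries are left unchanged because (1 - mj) is 0 there.
    x_tilde_z = [xt + (1 - mj) * rng.gauss(0.0, sigma)
                 for xt, mj in zip(x_tilde, m)]
    return x_tilde, x_tilde_z


# Hypothetical example: features 1 and 3 are missing.
x = [0.5, -1.2, 3.0, 0.7]
m = [1, 0, 1, 0]
x_tilde, x_tilde_z = corrupt(x, m, rng=random.Random(0))
```

Observed components pass through both vectors unchanged, so the generator always sees the true values where they are available.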
The WGAIN model consists of two parts, the generator g and the critic f, both represented by deep neural networks. The generator g is constructed as a mapping \(g: \mathcal {X}\times \{0,1\}^d \rightarrow \mathcal {X}\) so that
\[\hat{\boldsymbol{X}}_{\boldsymbol{Z}} = g(\tilde{\boldsymbol{X}}_{\boldsymbol{Z}}, \boldsymbol{M})\]
is a random vector whose conditional distribution \(\mathrm {P}(\hat{\boldsymbol{X}}_{\boldsymbol{Z}} \mid \tilde{\boldsymbol{X}} = \tilde{\boldsymbol{x}}, \boldsymbol{M} = \boldsymbol{m})\), determined by the distribution \(\mathrm {P}(\boldsymbol{Z})\) of \(\boldsymbol{Z},\) should be close to the conditional distribution \(\mathrm {P}(\boldsymbol{X} \mid \tilde{\boldsymbol{X}} = \tilde{\boldsymbol{x}}, \boldsymbol{M} = \boldsymbol{m})\). Note that \(g(\tilde{\boldsymbol{x}}_{\boldsymbol{Z}}, \boldsymbol{m})\) is a random vector corresponding to \(\tilde{\boldsymbol{x}}\) with all missing components imputed.
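In order to train it, we employ the standard squared loss function
\[L_{\text {MSE}} = {{\,\mathrm{E}\,}} \Vert \hat{\boldsymbol{X}}_{\boldsymbol{Z}} - \boldsymbol{X} \Vert ^2,\]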
forcing the output \(\hat{\boldsymbol{X}}_{\boldsymbol{Z}}\) to be close to the original data \(\boldsymbol{X}\). However, it turns out that this condition alone is not sufficient for learning the proper conditional distribution. To improve the performance of the generator, one may introduce a discriminator trying to find out which components of \(\hat{\boldsymbol{X}}_{\boldsymbol{Z}}\) were imputed and use the discriminator for adversarial training. This approach was introduced in [35] and is the base of WGAIN.
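In this paper we present a similar way to improve the conditional distribution of the generator’s output. It is based on the Earth-Mover (EM) distance between two probability distributions \(\mathrm {P}(X), \mathrm {P}(Y)\) defined by
\[W\big (\mathrm {P}(X), \mathrm {P}(Y)\big ) = \inf _{\gamma \in \mathbf {\Pi }(\mathrm {P}(X), \mathrm {P}(Y))} {{\,\mathrm{E}\,}}_{(X, Y) \sim \gamma } \Vert X - Y \Vert ,\]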
where \(\mathbf {\Pi }(\mathrm {P}(X), \mathrm {P}(Y))\) denotes the set of all joint distributions of (X, Y) whose marginals are respectively \(\mathrm {P}(X)\) and \(\mathrm {P}(Y)\). The term \({{\,\mathrm{E}\,}}_{(X, Y) \sim \gamma } \Vert X - Y \Vert \) might be understood as a measure of how much probability mass has to be transported in order to transform the distribution \(\mathrm {P}(X)\) into the distribution \(\mathrm {P}(Y)\) when the joint distribution is \(\gamma \). The EM distance can thus be seen as the cost of the optimal transport plan, see [2] and references therein for more details. The EM distance is usually expressed using the Kantorovich-Rubinstein duality as
\[W\big (\mathrm {P}(X), \mathrm {P}(Y)\big ) = \sup _{\Vert f \Vert _L \le 1} \Big ( {{\,\mathrm{E}\,}} f(X) - {{\,\mathrm{E}\,}} f(Y) \Big ), \tag{1}\]
where \(\Vert f \Vert _L \le 1\) means that f is Lipschitz continuous with Lipschitz constant 1, which might be changed to any constant K since it just multiplies \(W\big (\mathrm {P}(X), \mathrm {P}(Y)\big )\) by the same constant.
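For intuition about the EM distance, consider the one-dimensional case with two samples of equal size. There, the optimal transport plan between the two empirical distributions matches the sorted values, so the distance reduces to the mean absolute difference of the sorted samples. The following sketch illustrates only this special case (the function name is hypothetical); it is not how the critic estimates the distance in higher dimensions.

```python
def em_distance_1d(xs, ys):
    """EM (Wasserstein-1) distance between two equal-size 1-D samples.

    For empirical distributions on the real line with the same number of
    points, the optimal coupling pairs the i-th smallest value of xs with
    the i-th smallest value of ys.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch only handles samples of equal size")
    paired = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in paired) / len(xs)


# Shifting a sample by a constant moves every unit of mass by that constant.
shifted = em_distance_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0])
# The order of the observations does not matter, only the distribution.
permuted = em_distance_1d([0.0, 2.0], [2.0, 0.0])
```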
In Wasserstein GAN one approximates (1) by training the neural network \(f_{\boldsymbol{w}}\) parametrized with weights \(\boldsymbol{w}\) in some compact space \(\mathcal {W}\), thus enforcing the Lipschitz continuity. The function \(f_{\boldsymbol{w}}\) is called the critic and is trained to maximize the difference of expectations in (1). For a one-dimensional generator g trying to transform a random variable Z so that it has the distribution \(\mathrm {P}(X)\), one maximizes
\[{{\,\mathrm{E}\,}} f_{\boldsymbol{w}}(X) - {{\,\mathrm{E}\,}} f_{\boldsymbol{w}}\big (g(Z)\big ).\]
In our case we want to minimize the EM distance between \(\mathrm {P}(\hat{\boldsymbol{X}}_{\boldsymbol{Z}} \mid \tilde{\boldsymbol{X}} = \tilde{\boldsymbol{x}}, \boldsymbol{M} = \boldsymbol{m})\) and \(\mathrm {P}(\boldsymbol{X} \mid \tilde{\boldsymbol{X}} = \tilde{\boldsymbol{x}}, \boldsymbol{M} = \boldsymbol{m})\). Hence, we take the mask \(\boldsymbol{M}\) as the second argument of the critic as additional information to the first argument given by \(\boldsymbol{X}\) with correct features behind the mask \(\boldsymbol{M}\). The critic is therefore a mapping \(f_{\boldsymbol{w}}: \mathcal {X}\times \{0,1\}^d \rightarrow \mathbb {R}\) trained to maximize
\[{{\,\mathrm{E}\,}} f_{\boldsymbol{w}}(\boldsymbol{X}, \boldsymbol{M}) - {{\,\mathrm{E}\,}} f_{\boldsymbol{w}}(\hat{\boldsymbol{X}}_{\boldsymbol{Z}}, \boldsymbol{M}),\]
which is usually estimated by sample means from minibatches. The overall structure of WGAIN is depicted in Fig. 1.
3.1 Training
The critic \(f_{\boldsymbol{w}}\) is used in the adversarial training of both the generator g and the critic itself. The generator and the critic play an iterative two-player minimax game in which the critic tries to distinguish the imputed values from the real ones and the goal of the generator is to trick the critic so that it cannot distinguish them. Moreover, the generator’s output is tied to the correct output by the squared loss function \(L_{\text {MSE}}\).
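Putting it all together, we have two objective functions to minimize. The first corresponds to the training of the critic and is given by
\[J(f_{\boldsymbol{w}}) = \lambda _{f_{\boldsymbol{w}}} \Big ( {{\,\mathrm{E}\,}} f_{\boldsymbol{w}}(\hat{\boldsymbol{X}}_{\boldsymbol{Z}}, \boldsymbol{M}) - {{\,\mathrm{E}\,}} f_{\boldsymbol{w}}(\boldsymbol{X}, \boldsymbol{M}) \Big ),\]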
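where the weight \(\lambda _{f_{\boldsymbol{w}}}\) enables one to increase or decrease the influence of the corresponding gradient. The second is the objective for the generator,
\[J(g) = - \lambda _g {{\,\mathrm{E}\,}} f_{\boldsymbol{w}}(\hat{\boldsymbol{X}}_{\boldsymbol{Z}}, \boldsymbol{M}) + \lambda _{\text {MSE}} L_{\text {MSE}},\]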
where \(\lambda _g\) and \(\lambda _{\text {MSE}}\) are weights enabling one to strengthen or weaken the influence of the critic term and of the squared loss function, respectively. The optimization is done via alternating gradient descent, where the first step is updating the critic \(f_{\boldsymbol{w}}\) and the second step is updating the generator g. Hence, when perfectly trained, the critic gives negative values to cases with imputed features and positive values to cases with true features. On the other hand, the generator’s output entering the critic will be pushed toward the large positive critic values that real data receive.
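A minimal sketch of how the two objectives could be estimated on a mini-batch is given below. It assumes that they take the usual Wasserstein GAN form (for the critic, the mean score of imputed rows minus the mean score of real rows; for the generator, the negated mean score of imputed rows plus the squared loss) and that the critic weights are clipped elementwise as in the original Wasserstein GAN [2]. All function names are hypothetical, and the default weights are the values listed in Sect. 4.1.

```python
from statistics import fmean


def critic_objective(f_real, f_fake, lam_f=10.0):
    """Mini-batch estimate of J(f_w), to be minimized by the critic.

    f_real / f_fake are critic scores of real rows and of imputed rows.
    """
    return lam_f * (fmean(f_fake) - fmean(f_real))


def generator_objective(f_fake, x_real, x_fake, lam_g=2.0, lam_mse=1.0):
    """Mini-batch estimate of J(g), to be minimized by the generator."""
    # Squared Euclidean error per row between true and generated vectors.
    sq_err = [sum((a - b) ** 2 for a, b in zip(xr, xf))
              for xr, xf in zip(x_real, x_fake)]
    return -lam_g * fmean(f_fake) + lam_mse * fmean(sq_err)


def clip_weights(weights, w_max=1.0):
    """Elementwise clipping to [-w_max, w_max] (assumed clipping scheme)."""
    return [max(-w_max, min(w_max, w)) for w in weights]


# Hypothetical critic scores and data rows for a batch of two.
j_f = critic_objective(f_real=[1.0, 3.0], f_fake=[0.0, 1.0])
j_g = generator_objective(f_fake=[0.0, 1.0],
                          x_real=[[0.0, 0.0], [1.0, 1.0]],
                          x_fake=[[0.0, 1.0], [1.0, 1.0]])
clipped = clip_weights([-2.0, 0.3, 5.0])
```

In an actual training loop these quantities would be differentiated with respect to the critic and generator weights in alternating steps, with the clipping applied after each critic update.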
The pseudocode of the WGAIN training is given in Algorithm 1.
4 Experiments
An experimental validation of WGAIN using ten real and two artificial publicly available datasets is presented below. These datasets contain numeric data only and are devoted to the classification task. Their overview, together with the corresponding best performing classification models, is given in Table 2.
During the experiments, all datasets were divided as follows: \(70\%\) of data was used to train all classification and imputation models and \(30\%\) was used as a test set to evaluate imputation performance. The imputation models were trained to impute in scenarios where randomly selected combinations of multiple features are missing. The amount of missingness varies from \(10\%\) to \(50\%\) of missing features. Finally, evaluation of the accuracy of the classification model combined with all imputation methods is performed on the test dataset.
4.1 Imputation Models and Their Parameters
Let us start with the presented WGAIN model. The generator and the critic architectures were the same for all datasets and are described in Table 1. During the training, the following settings were used:

The original data \(\boldsymbol{X}\) are sampled in minibatches of size \(m = 128\).

The missingness is introduced using the mask \(\boldsymbol{M}\) with the following distribution: for each training point, the portion of missingness is sampled from a uniform distribution between 0 and maximum missing rate, which was chosen to be 0.3. Then the binary elements of \(\boldsymbol{M}\) were independently sampled with this portion of missingness, i.e., its item is 0 with a probability which was previously sampled.

The components of random vector \(\boldsymbol{Z}\) are i.i.d. with normal distribution having 0 mean and standard deviation 0.01.

The weights of the objective functions \(J(f_{\boldsymbol{w}})\) and J(g) are \(\lambda _{f_{\boldsymbol{w}}} = 10\), \(\lambda _g = 2\), and \(\lambda _{\text {MSE}} = 1\).

Maximal norm used in clipping of the critic weights is \(w_{\max } = 1\).

We use RMSProp with learning rate \(\alpha = 0.0001\) as the optimizer for both networks.

The number of training epochs is 8000.
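The mask sampling described in the settings above can be sketched as follows: for each training point a missing rate is drawn uniformly between 0 and the maximum missing rate, and each mask component is then set to 0 independently with that probability. The function name, the feature count of 20, and the seed are hypothetical.

```python
import random


def sample_mask(d, max_missing_rate=0.3, rng=None):
    """Sample one binary mask of length d (0 = missing, 1 = observed)."""
    rng = rng or random.Random()
    # Per-row missing rate, uniform on [0, max_missing_rate].
    p_missing = rng.uniform(0.0, max_missing_rate)
    # Each feature is dropped independently with probability p_missing.
    return [0 if rng.random() < p_missing else 1 for _ in range(d)]


# One mini-batch of 128 masks for a hypothetical dataset with 20 features.
rng = random.Random(42)
batch_masks = [sample_mask(20, rng=rng) for _ in range(128)]
missing_fraction = sum(v == 0 for row in batch_masks for v in row) / (128 * 20)
```

Because the rate is drawn per row, a single mini-batch mixes nearly complete rows with rows missing up to about \(30\%\) of their features, and the average missing fraction is about half of the maximum rate.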
The GAIN implementation follows the original paper [35] and is analogous to the described WGAIN with the following differences:

The generator architecture differs only in the sizes of layers, which are all equal to the input dimension.

The discriminator architecture is analogous to the generator architecture except for the sigmoid activation function on the last layer.

The binary elements of \(\boldsymbol{M}\) are independently sampled with the common portion of missingness, which is 0.2.

The hint rate used for the hint matrix is 0.9.

As an optimizer, we use Adam with learning rate of 0.0001.

The number of training epochs is 7000.
In the case of DAE, we follow the structure presented in [15]. For the hyperparameter search, the Hyperband [21] algorithm was used. The typical best setup is the following: ELU as the activation function, three layers in both the encoder and decoder parts, a code size of twice the input dimension, and no regularization.
DAE, GAIN, and WGAIN models were implemented in the TensorFlow library^{Footnote 1}.
The implementation of VAEAC was based on the repository^{Footnote 2} corresponding to the original paper [17]. All hyperparameters stayed in the default settings.
For the MICE method (mice), we used the IterativeImputer class from the scikit-learn library^{Footnote 3}. In the default settings, the implementation uses Bayesian ridge regression as the internal imputation model and multiple imputations are pooled by the mean.
The kNN imputation (knn) was implemented using the fancyimpute library^{Footnote 4}. A missing value is imputed by the mean of the values of its neighbors, weighted proportionally to their inverse distances. In the case where multiple features are missing, we impute all missing values at once (per row). For the hyperparameter k, the values 11, 13, 15, 17, 19, 21, 23, and 25 were tested. The best k was chosen based on the RMSE value.
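The inverse-distance weighting used in kNN imputation can be illustrated with a short sketch. It is not the fancyimpute implementation: it assumes complete training rows, Euclidean distances computed on the observed features only, and a small constant to avoid division by zero. The function name and data are hypothetical.

```python
import math


def knn_impute_row(row, mask, train, k=3):
    """Impute entries of `row` where mask[j] == 0 using complete rows in `train`."""
    observed = [j for j, mj in enumerate(mask) if mj == 1]
    # Distance to every training row, measured on the observed features only.
    scored = [(math.sqrt(sum((row[j] - t[j]) ** 2 for j in observed)), t)
              for t in train]
    neighbors = sorted(scored, key=lambda pair: pair[0])[:k]
    # Weights proportional to inverse distance (1e-12 guards against zero).
    weights = [1.0 / (dist + 1e-12) for dist, _ in neighbors]
    total = sum(weights)
    imputed = list(row)
    # All missing features of the row are filled at once from the same neighbors.
    for j, mj in enumerate(mask):
        if mj == 0:
            imputed[j] = sum(w * t[j]
                             for w, (_, t) in zip(weights, neighbors)) / total
    return imputed


train = [[0.0, 0.0, 10.0], [1.0, 1.0, 20.0], [5.0, 5.0, 100.0]]
# The third feature is missing; the two nearest rows are equally distant.
result = knn_impute_row([0.5, 0.5, 0.0], [1, 1, 0], train, k=2)
```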
4.2 Evaluation
The impact of imputation is evaluated using the classification accuracy changes of the best performing classification model chosen from the six commonly used ones: logistic regression (LR), multilayer perceptron (MLP), k-nearest neighbors (kNN), naive Bayes (NB), extreme gradient boosted trees (XGBT) (for details see [7]), and random forest (RF). The best hyperparameters for each model were found using a randomized search. The accuracy of the best performing model for each dataset is shown in Table 2. Furthermore, the root mean squared error (RMSE) between the original and the imputed data is also used for evaluation, as in e.g. [6, 17, 35].
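As a reference point for the second measure, the RMSE between an original and an imputed data matrix can be computed as below. The sketch averages over all entries of the matrix; whether the average is instead restricted to the imputed entries is not specified here, so this choice is an assumption, as are the function name and example values.

```python
import math


def rmse(original, imputed):
    """Root mean squared error over all entries of two equally shaped matrices."""
    n_entries = sum(len(row) for row in original)
    squared_error = sum((a - b) ** 2
                        for row_o, row_i in zip(original, imputed)
                        for a, b in zip(row_o, row_i))
    return math.sqrt(squared_error / n_entries)


# Every entry is off by exactly one, so the RMSE is one.
unit_error = rmse([[0.0, 0.0], [0.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]])
# A perfect imputation has zero error.
no_error = rmse([[1.0, 2.0]], [[1.0, 2.0]])
```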
After all classification models were trained, and the most accurate one for each dataset was chosen, they were combined with imputation methods. Then, the accuracies of classification models on the imputed test dataset were measured.
Since it is not sound to compare accuracies for different datasets, we use a rank comparison. To do so, the algorithms are ranked for each dataset separately, the best performing algorithm getting the rank of 1, the secondbest rank 2, etc. An example of accuracies and corresponding ranks for 10% of missingness is presented in Tables 4 and 5. Even in cases when WGAIN is not the best, its performance is always comparable to the best performers. The only exception is the EEG dataset, where kNN imputation performs the best and the WGAIN is in second place with a difference of almost two percent.
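The rank comparison can be sketched as follows. Each row of the input holds the accuracies of all methods on one dataset, ranks are assigned per dataset, and the mean rank of each method is taken over the datasets. Assigning tied methods the average of their ranks follows [9] and is an assumption here; the function names and accuracy values are hypothetical.

```python
def ranks_desc(scores):
    """Rank methods on one dataset: rank 1 is the highest score, ties share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        # Extend the group [pos, end] over all methods with the same score.
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for i in order[pos:end + 1]:
            ranks[i] = average_rank
        pos = end + 1
    return ranks


def mean_ranks(accuracy_table):
    """Mean rank of each method (column) over all datasets (rows)."""
    per_dataset = [ranks_desc(row) for row in accuracy_table]
    n_methods = len(accuracy_table[0])
    return [sum(r[j] for r in per_dataset) / len(per_dataset)
            for j in range(n_methods)]


tied = ranks_desc([0.9, 0.8, 0.9, 0.7])
averaged = mean_ranks([[0.9, 0.8], [0.7, 0.8]])
```

For RMSE, where lower is better, the same functions can be applied to the negated values.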
The algorithms can be compared, taking the mean over the datasets. The results can be seen in Table 9. When the degree of missingness varies from \(10\%\) to \(30\%\) the WGAIN performs the best. When the degree of missingness is upwards of \(30\%\) the GAIN outperforms the WGAIN.
The results of the ranking evaluation can be statistically evaluated using the Friedman test [13, 14] and the corresponding post-hoc tests. For more details, see [9]. P-values of the Friedman \(\chi ^2_F\) and \(F_F\) tests are shown in Table 8. One can see that from \(20\%\) to \(40\%\) of missing data the null hypothesis that all methods perform the same can be rejected at a \(10\%\) significance level. However, when the Bonferroni-Dunn post-hoc test is applied, the performance of WGAIN is significantly better than DAE only, and just for \(20\%\) and \(30\%\) of missing data.
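For reference, the two test statistics can be computed from the mean ranks using the formulas given in [9], as in the sketch below. With k methods and N datasets, \(\chi ^2_F\) is compared against the chi-squared distribution with \(k-1\) degrees of freedom and \(F_F\) against the F distribution with \(k-1\) and \((k-1)(N-1)\) degrees of freedom; the p-value lookup is omitted. The function name and the example ranks are hypothetical.

```python
def friedman_statistics(mean_ranks, n_datasets):
    """Return (chi2_F, F_F) computed from the mean ranks of k methods over N datasets."""
    k = len(mean_ranks)
    n = n_datasets
    # Friedman statistic: 12N / (k(k+1)) * (sum_j R_j^2 - k(k+1)^2 / 4).
    chi2 = 12.0 * n / (k * (k + 1)) * (
        sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4.0)
    # Less conservative F_F statistic derived from chi2 (see [9]); it is
    # undefined when chi2 reaches its maximum N(k - 1).
    f_stat = (n - 1) * chi2 / (n * (k - 1) - chi2)
    return chi2, f_stat


# Three methods on four datasets; the mean ranks sum to k(k+1)/2 = 6.
chi2_f, f_f = friedman_statistics([1.5, 1.5, 3.0], n_datasets=4)
```

In the setting of this paper the function would be called with the mean ranks of the six imputation methods and `n_datasets=12`.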
The same ranking process is repeated for RMSE with results in Table 3. An example of RMSE and corresponding ranks for 10% of missingness is presented in Tables 6 and 7. Interestingly, the WGAIN performance is one of the worst, whereas the GAIN performs the best. This contrasts with the fact that the WGAIN imputes the best from the accuracy point of view. Hence, we can see that a low RMSE, which is usually taken as a measure of imputation quality, may not lead to the desired performance on the target task. On the other hand, the RMSE differences are relatively small, as can be seen in Table 6.
5 Conclusion
We propose a Wasserstein Generative Adversarial Imputation Network as a new deep learning imputation model. It is inspired by the GAIN. However, the discriminator is replaced by the Wasserstein critic. It is known that the Wasserstein approach does not suffer from vanishing gradients in the way that a vanilla GAN does. This enables the model to capture the desired distribution better. One may assume such benefits in WGAIN as well. We experimentally showed that in the imputation performance measured by classification accuracy, the WGAIN outperforms the other methods when the degree of missingness is lower than or equal to \(30\%\). In other cases, it is competitive. In future work, we would like to focus on the use of WGAIN in image inpainting tasks.
Notes
1. TensorFlow platform: https://www.tensorflow.org.
2. VAEAC implementation: https://github.com/tigvarts/vaeac.
3. Scikit-learn library: https://scikit-learn.org.
4. Fancyimpute repository: https://github.com/iskandr/fancyimpute.
References
1. Alcalá-Fdez, J., et al.: Keel data-mining software tool: data set repository, integration of algorithms and experimental analysis framework. J. Mult. Valued Logic Soft Comput. 17, 255–287 (2011)
2. Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein GAN (2017)
3. Arroyo, Á., Herrero, Á., Tricio, V., Corchado, E., Woźniak, M.: Neural models for imputation of missing ozone data in air-quality datasets. Complexity 2018, 14 (2018)
4. Azur, M.J., Stuart, E.A., Frangakis, C., Leaf, P.J.: Multiple imputation by chained equations: what is it and how does it work? Int. J. Methods Psychiatric Res. 20(1), 40–49 (2011)
5. Beaulieu-Jones, B.K., Moore, J.H.: Missing data imputation in the electronic health record using deeply learned autoencoders. In: Pacific Symposium on Biocomputing 2017, pp. 207–218. World Scientific (2017)
6. Camino, R.D., Hammerschmidt, C.A., State, R.: Improving missing data imputation with deep generative models. CoRR, abs/1902.10666 (2019)
7. Chen, T., Guestrin, C.: XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2016, pp. 785–794. ACM, New York (2016)
8. Costa, A.F., Santos, M.S., Soares, J.P., Abreu, P.H.: Missing data imputation via denoising autoencoders: the untold story. In: Duivesteijn, W., Siebes, A., Ukkonen, A. (eds.) IDA 2018. LNCS, vol. 11191, pp. 87–98. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01768-2_8
9. Demšar, J.: Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30 (2006)
10. Duan, Y., Lv, Y., Liu, J.L., Wang, F.Y.: An efficient realization of deep learning for traffic data imputation. Transp. Res. Part C Emerg. Technol. 72, 168–181 (2016)
11. Farhangfar, A., Kurgan, L.A., Dy, J.G.: Impact of imputation of missing values on classification error for discrete data. Pattern Recogn. 41, 3692–3705 (2008)
12. Friedjungová, M., Jiřina, M., Vašata, D.: Missing features reconstruction and its impact on classification accuracy. In: Rodrigues, J.M.F., et al. (eds.) ICCS 2019. LNCS, vol. 11538, pp. 207–220. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-22744-9_16
13. Friedman, M.: The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J. Am. Statist. Assoc. 32(200), 675–701 (1937)
14. Friedman, M.: A comparison of alternative tests of significance for the problem of \(m\) rankings. Ann. Math. Statist. 11(1), 86–92 (1940)
15. Gondara, L., Wang, K.: MIDA: multiple imputation using denoising autoencoders. In: Phung, D., Tseng, V.S., Webb, G.I., Ho, B., Ganji, M., Rashidi, L. (eds.) PAKDD 2018. LNCS (LNAI), vol. 10939, pp. 260–272. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-93040-4_21
16. Goodfellow, I.J., et al.: Generative adversarial nets. In: Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 8–13 December 2014, Montreal, Quebec, Canada, pp. 2672–2680 (2014)
17. Ivanov, O., Figurnov, M., Vetrov, D.: Variational autoencoder with arbitrary conditioning. In: International Conference on Learning Representations (2019)
18. Jonsson, P., Wohlin, C.: An evaluation of k-nearest neighbour imputation using Likert data. In: 10th International Symposium on Software Metrics, 2004. Proceedings, pp. 108–118. IEEE (2004)
19. Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. In: 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, 14–16 April 2014, Conference Track Proceedings (2014)
20. Lee, D., Kim, J., Moon, W.J., Ye, J.C.: CollaGAN: collaborative GAN for missing image data imputation. CoRR, abs/1901.09764 (2019)
21. Li, L., Jamieson, K., DeSalvo, G., Rostamizadeh, A., Talwalkar, A.: Hyperband: a novel bandit-based approach to hyperparameter optimization. J. Mach. Learn. Res. 18(1), 6715–6816 (2017)
22. Li, S.C.X., Jiang, B., Marlin, B.M.: MisGAN: learning from incomplete data with generative adversarial networks. CoRR, abs/1902.09599 (2019)
23. Lichman, M.: UCI machine learning repository (2013). http://archive.ics.uci.edu/ml
24. Little, R.J.A., Rubin, D.B.: Statistical Analysis with Missing Data, vol. 333. Wiley, Hoboken (2014)
25. Lopez-Martin, M., Carro, B., Sanchez-Esguevillas, A., Lloret, J.: Conditional variational autoencoder for prediction and feature recovery applied to intrusion detection in IoT. Sensors 17(9) (2017)
26. McCoy, J.T., Kroon, S., Auret, L.: Variational autoencoders for missing data imputation with application to a simulated milling circuit. IFAC-PapersOnLine 51(21), 141–146 (2018)
27. Nazábal, A., Olmos, P.M., Ghahramani, Z., Valera, I.: Handling incomplete heterogeneous data using VAEs. CoRR, abs/1807.03653 (2018)
28. Pathak, D., Krahenbuhl, P., Donahue, J., Darrell, T., Efros, A.A.: Context encoders: feature learning by inpainting. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2536–2544 (2016)
29. Schafer, J.L.: Analysis of Incomplete Multivariate Data. Chapman and Hall, London (1997)
30. Silva-Ramírez, E.L., Pino-Mejías, R., López-Coello, M.: Single imputation with multilayer perceptron and multiple imputation combining multilayer perceptron and k-nearest neighbours for monotone patterns. Appl. Soft Comput. 29, 65–74 (2015)
31. Sohn, K., Lee, H., Yan, X.: Learning structured output representation using deep conditional generative models. In: Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems 28, pp. 3483–3491. Curran Associates Inc. (2015)
32. Van Buuren, S.: Flexible Imputation of Missing Data. Chapman and Hall/CRC, Boca Raton (2018)
33. Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.A.: Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th International Conference on Machine Learning. ACM (2008)
34. Wong, L.Z., Chen, H., Lin, S., Chen, D.C.: Imputing missing values in sensor networks using sparse data representations. In: Proceedings of the 17th ACM International Conference on Modeling, Analysis and Simulation of Wireless and Mobile Systems, MSWiM 2014, pp. 227–230. ACM, New York (2014)
35. Yoon, J., Jordon, J., van der Schaar, M.: GAIN: missing data imputation using generative adversarial nets. In: Dy, J., Krause, A. (eds.) Proceedings of the 35th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 80, pp. 5689–5698. PMLR, Stockholmsmässan, Stockholm, Sweden, 10–15 July 2018
36. Zadeh, A., Lim, Y.C., Liang, P.P., Morency, L.P.: Variational auto-decoder. CoRR, abs/1903.00840 (2019)
Acknowledgements
This research has been supported by SGS grant No. SGS20/213/OHK3/3T/18 and by GACR grant No. GA18-18080S.
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Friedjungová, M., Vašata, D., Balatsko, M., Jiřina, M. (2020). Missing Features Reconstruction Using a Wasserstein Generative Adversarial Imputation Network. In: Krzhizhanovskaya, V.V., et al. Computational Science – ICCS 2020. ICCS 2020. Lecture Notes in Computer Science, vol 12140. Springer, Cham. https://doi.org/10.1007/978-3-030-50423-6_17
DOI: https://doi.org/10.1007/978-3-030-50423-6_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-50422-9
Online ISBN: 978-3-030-50423-6
eBook Packages: Computer Science, Computer Science (R0)