Abstract
Most data analysis techniques, such as those considered in [15,16,17,18], require datasets without missing data. However, in the current age of large datasets, it is increasingly common to deal with datasets in which some values are missing. For this reason, several techniques for the imputation of missing values have been developed. In the present work, we present one such technique based on the use of multivariate medians within the context of nearest neighbour imputation. Several experiments are carried out, showing better performance than some state-of-the-art methods in the presence of noise.
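To make the general idea concrete, the following is a minimal sketch of nearest neighbour imputation in which the neighbours are aggregated by a multivariate median rather than a coordinate-wise mean. It rests on several assumptions that are not stated in the abstract: the geometric (spatial) median is used as the multivariate median and is approximated with a Weiszfeld-style iteration, neighbours are drawn only from fully observed rows, and distances are Euclidean over the observed coordinates. The function names `geometric_median` and `knn_median_impute` and the parameter `k` are hypothetical, and the sketch should not be read as the procedure evaluated in the chapter.

```python
import numpy as np


def geometric_median(points, n_iter=200, tol=1e-9):
    """Approximate the point minimising the sum of Euclidean distances
    to `points` using a Weiszfeld-style fixed-point iteration."""
    points = np.asarray(points, dtype=float)
    y = points.mean(axis=0)  # start from the coordinate-wise mean
    for _ in range(n_iter):
        d = np.linalg.norm(points - y, axis=1)
        d = np.maximum(d, 1e-12)  # guard against division by zero
        w = 1.0 / d
        y_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(y_new - y) < tol:
            return y_new
        y = y_new
    return y


def knn_median_impute(X, k=5):
    """Fill NaNs in each incomplete row with the corresponding coordinates
    of the geometric median of its k nearest complete rows (assumed design)."""
    X = np.asarray(X, dtype=float)
    out = X.copy()
    complete = X[~np.isnan(X).any(axis=1)]
    if len(complete) == 0:
        raise ValueError("at least one fully observed row is required")
    for i, row in enumerate(X):
        missing = np.isnan(row)
        if not missing.any():
            continue
        observed = ~missing
        # Distance to each complete row, using observed coordinates only.
        dist = np.linalg.norm(complete[:, observed] - row[observed], axis=1)
        neighbours = complete[np.argsort(dist)[:k]]
        out[i, missing] = geometric_median(neighbours)[missing]
    return out
```

Calling `knn_median_impute(X, k=5)` on a two-dimensional array with `NaN` entries returns a copy in which only the missing entries have been filled. The median is used here because a single noisy neighbour shifts it less than it would shift a mean, which is the kind of behaviour the abstract's reference to noise suggests.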
References
Choudhury, S.J., Pal, N.R.: Imputation of missing data with neural networks for classification. Knowl.-Based Syst. 182, 104838 (2019)
Donoho, D., Gasko, M.: Multivariate generalizations of the median and trimmed means. Tech. rep., Technical Reports 128 and 133, Department of Statistics, University of California, Berkeley (1987)
Enders, C.K.: Applied Missing Data Analysis. Guilford Press, New York (2010)
Gagolewski, M., Pérez-Fernández, R., De Baets, B.: An inherent difficulty in the aggregation of multidimensional data. IEEE Trans. Fuzzy Syst. 28(3), 602–606 (2020)
Hotelling, H.: Stability in competition. Econ. J. 39(153), 41–57 (1929)
Hukkelås, H., Lindseth, F., Mester, R.: Image inpainting with learnable feature imputation. In: Akata, Z., Geiger, A., Sattler, T. (eds.) DAGM GCPR 2020. LNCS, vol. 12544, pp. 388–403. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-71278-5_28
Kim, K.Y., Kim, B.J., Yi, G.S.: Reuse of imputed data in microarray analysis increases imputation efficiency. BMC Bioinform. 5(1), 1–9 (2004)
Krarup, J., Vajda, S.: On Torricelli’s geometrical solution to a problem of Fermat. IMA J. Manag. Math. 8(3), 215–224 (1997)
Li, D., Deogun, J., Spaulding, W., Shuart, B.: Towards missing data imputation: a study of fuzzy k-means clustering method. In: International Conference on Rough Sets and Current Trends in Computing, pp. 573–579. Springer, Uppsala (2004)
Oba, S., Sato, M., Takemasa, I., Monden, M., Matsubara, K., Ishii, S.: A Bayesian missing value estimation method for gene expression profile data. Bioinformatics 19(16), 2088–2096 (2003)
Rousseeuw, P., Hubert, M.: High-breakdown estimators of multivariate location and scatter. In: Becker, C., Fried, R., Kuhnt, S. (eds.) Robustness and Complex Data Structures, pp. 49–66. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-35494-6_4
Schmitt, P., Mandel, J., Guedj, M.: A comparison of six methods for missing data imputation. J. Biometr. Biostatist. 6(1), 1 (2015)
Small, C.G.: Measures of centrality for multivariate and directional distributions. Canadian J. Statist. 15(1), 31–39 (1987)
Small, C.G.: A survey of multidimensional medians. Int. Stat. Rev. 58(3), 263–277 (1990)
Troiano, L., Birtolo, C., Armenise, R., Cirillo, G.: Optimization of menu layouts by means of genetic algorithms. In: van Hemert, J., Cotta, C. (eds.) EvoCOP 2008. LNCS, vol. 4972, pp. 242–253. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-78604-7_21
Troiano, L., Rodríguez-Muñiz, L., Díaz, I.: Discovering user preferences using Dempster-Shafer theory. Fuzzy Sets Syst. 278, 98–117 (2015). https://doi.org/10.1016/j.fss.2015.06.004
Troiano, L., Rodríguez-Muñiz, L., Marinaro, P., Díaz, I.: Statistical analysis of parametric t-norms. Inf. Sci. 257, 138–162 (2014). https://doi.org/10.1016/j.ins.2013.09.041
Troiano, L., Rodríguez-Muñiz, L., Ranilla, J., Díaz, I.: Interpretability of fuzzy association rules as means of discovering threats to privacy. Int. J. Comput. Math. 89(3), 325–333 (2012). https://doi.org/10.1080/00207160.2011.613460
Troyanskaya, O., et al.: Missing value estimation methods for DNA microarrays. Bioinformatics 17(6), 520–525 (2001)
Tukey, J.W.: Mathematics and the picturing of data. In: Proceedings of the International Congress of Mathematicians, pp. 523–531. Vancouver (1975)
Van Buuren, S., Oudshoorn, C.G.: mice: Multivariate imputation by chained equations. J. Stat. Softw. 45(3), 1–67 (2011)
Vardi, Y., Zhang, C.H.: The multivariate L1-median and associated data depth. Proc. Natl. Acad. Sci. 97(4), 1423–1426 (2000)
Weber, A.: Ueber den Standort der Industrien. Mohr Siebeck Verlag, Tübingen (1909)
Weiszfeld, E.: Sur le point pour lequel la somme des distances de n points donnés est minimum. Tohoku Math. J. First Ser. 43, 355–386 (1937)
Zuo, Y., Serfling, R.: General notions of statistical depth function. Annal. Statist. 28(2), 461–482 (2000)
Acknowledgments
Raúl Pérez-Fernández acknowledges the support of Campus de Excelencia Internacional de la Universidad de Oviedo in collaboration with Banco de Santander.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Gissi, F., Benedetto, V., Bhandari, P., Pérez-Fernández, R. (2023). On the Use of Multivariate Medians for Nearest Neighbour Imputation. In: Troiano, L., Vaccaro, A., Kesswani, N., Díaz Rodriguez, I., Brigui, I., Pastor-Escuredo, D. (eds) Key Digital Trends in Artificial Intelligence and Robotics. ICDLAIR 2022. Lecture Notes in Networks and Systems, vol 670. Springer, Cham. https://doi.org/10.1007/978-3-031-30396-8_13
DOI: https://doi.org/10.1007/978-3-031-30396-8_13
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-30395-1
Online ISBN: 978-3-031-30396-8
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)