On the Use of Multivariate Medians for Nearest Neighbour Imputation

Gissi, Francesco; Benedetto, Vincenzo; Bhandari, Parth; Pérez-Fernández, Raúl

doi:10.1007/978-3-031-30396-8_13

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 670)

Included in the following conference series:

International Conference on Deep Learning, Artificial Intelligence and Robotics

Abstract

Most data analysis techniques, such as those addressed in [15,16,17,18], require datasets without missing data. However, in the current age of large datasets, it is increasingly common to deal with datasets in which some values are missing. For this reason, several techniques for the imputation of missing values have been developed. In the present work, we present one such technique, based on the use of multivariate medians within the context of nearest neighbour imputation. Several experiments show better performance than some state-of-the-art methods in the presence of noise.
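
The abstract does not state which multivariate median is used or how it is combined with the neighbour search, so the following Python sketch is only one possible reading of the idea, not the authors' method. It assumes the spatial (geometric) median, approximated with Weiszfeld-type iterations (both appear in the reference list), and assumes that neighbours are searched among complete rows using Euclidean distance on the observed coordinates. The names `spatial_median` and `knn_median_impute`, and the parameters `k`, `n_iter` and `tol`, are hypothetical.

```python
import numpy as np


def spatial_median(points, n_iter=200, tol=1e-9):
    """Approximate the spatial (geometric) median with Weiszfeld-type iterations."""
    points = np.asarray(points, dtype=float)
    estimate = points.mean(axis=0)  # start from the centroid
    for _ in range(n_iter):
        dist = np.linalg.norm(points - estimate, axis=1)
        dist = np.maximum(dist, 1e-12)  # guard against division by zero
        weights = 1.0 / dist
        new_estimate = (weights[:, None] * points).sum(axis=0) / weights.sum()
        if np.linalg.norm(new_estimate - estimate) < tol:
            return new_estimate
        estimate = new_estimate
    return estimate


def knn_median_impute(data, k=3):
    """Fill NaNs in each incomplete row from the spatial median of its k nearest complete rows.

    Assumption (not from the paper): at least one complete row exists.
    """
    data = np.asarray(data, dtype=float)
    complete = data[~np.isnan(data).any(axis=1)]
    filled = data.copy()
    for i, row in enumerate(data):
        missing = np.isnan(row)
        if not missing.any():
            continue
        # Distance to each complete row, measured on the observed coordinates only.
        dist = np.linalg.norm(complete[:, ~missing] - row[~missing], axis=1)
        neighbours = complete[np.argsort(dist)[:k]]
        # Take the multivariate median of the neighbours and copy its
        # coordinates into the positions that were missing.
        filled[i, missing] = spatial_median(neighbours)[missing]
    return filled
```

Calling `knn_median_impute(data, k=3)` on a two-dimensional array with `NaN` entries returns a copy in which observed values are untouched and each missing entry is taken from the median of the selected neighbours. Replacing `spatial_median` with a coordinate-wise mean would give the more common mean-based nearest neighbour imputation, which is generally more sensitive to outlying neighbours.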

References

1. Choudhury, S.J., Pal, N.R.: Imputation of missing data with neural networks for classification. Knowl.-Based Syst. 182, 104838 (2019)
2. Donoho, D., Gasko, M.: Multivariate generalizations of the median and trimmed means. Tech. rep., Technical Reports 128 and 133, Department of Statistics, University of California, Berkeley (1987)
3. Enders, C.K.: Applied Missing Data Analysis. Guilford Press, New York (2010)
4. Gagolewski, M., Pérez-Fernández, R., De Baets, B.: An inherent difficulty in the aggregation of multidimensional data. IEEE Trans. Fuzzy Syst. 28(3), 602–606 (2020)
5. Hotelling, H.: Stability in competition. Econ. J. 39(153), 41–57 (1929)
6. Hukkelås, H., Lindseth, F., Mester, R.: Image inpainting with learnable feature imputation. In: Akata, Z., Geiger, A., Sattler, T. (eds.) DAGM GCPR 2020. LNCS, vol. 12544, pp. 388–403. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-71278-5_28
7. Kim, K.Y., Kim, B.J., Yi, G.S.: Reuse of imputed data in microarray analysis increases imputation efficiency. BMC Bioinform. 5(1), 1–9 (2004)
8. Krarup, J., Vajda, S.: On Torricelli’s geometrical solution to a problem of Fermat. IMA J. Manag. Math. 8(3), 215–224 (1997)
9. Li, D., Deogun, J., Spaulding, W., Shuart, B.: Towards missing data imputation: a study of fuzzy k-means clustering method. In: International Conference on Rough Sets and Current Trends in Computing, pp. 573–579. Springer, Uppsala (2004)
10. Oba, S., Sato, M., Takemasa, I., Monden, M., Matsubara, K., Ishii, S.: A Bayesian missing value estimation method for gene expression profile data. Bioinformatics 19(16), 2088–2096 (2003)
11. Rousseeuw, P., Hubert, M.: High-breakdown estimators of multivariate location and scatter. In: Becker, C., Fried, R., Kuhnt, S. (eds.) Robustness and Complex Data Structures, pp. 49–66. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-35494-6_4
12. Schmitt, P., Mandel, J., Guedj, M.: A comparison of six methods for missing data imputation. J. Biometr. Biostatist. 6(1), 1 (2015)
13. Small, C.G.: Measures of centrality for multivariate and directional distributions. Canadian J. Statist. 15(1), 31–39 (1987)
14. Small, C.G.: A survey of multidimensional medians. Int. Stat. Rev. 58(3), 263–277 (1990)
15. Troiano, L., Birtolo, C., Armenise, R., Cirillo, G.: Optimization of menu layouts by means of genetic algorithms. In: van Hemert, J., Cotta, C. (eds.) EvoCOP 2008. LNCS, vol. 4972, pp. 242–253. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-78604-7_21
16. Troiano, L., Rodríguez-Muñiz, L., Díaz, I.: Discovering user preferences using Dempster-Shafer theory. Fuzzy Sets Syst. 278, 98–117 (2015). https://doi.org/10.1016/j.fss.2015.06.004
17. Troiano, L., Rodríguez-Muñiz, L., Marinaro, P., Díaz, I.: Statistical analysis of parametric t-norms. Inf. Sci. 257, 138–162 (2014). https://doi.org/10.1016/j.ins.2013.09.041
18. Troiano, L., Rodríguez-Muñiz, L., Ranilla, J., Díaz, I.: Interpretability of fuzzy association rules as means of discovering threats to privacy. Int. J. Comput. Math. 89(3), 325–333 (2012). https://doi.org/10.1080/00207160.2011.613460
19. Troyanskaya, O., et al.: Missing value estimation methods for DNA microarrays. Bioinformatics 17(6), 520–525 (2001)
20. Tukey, J.W.: Mathematics and the picturing of data. In: Proceedings of the International Congress of Mathematicians, pp. 523–531. Vancouver (1975)
21. Van Buuren, S., Oudshoorn, C.G.: mice: Multivariate imputation by chained equations. J. Stat. Softw. 45(3), 1–67 (2011)
22. Vardi, Y., Zhang, C.H.: The multivariate L1-median and associated data depth. Proc. Natl. Acad. Sci. 97(4), 1423–1426 (2000)
23. Weber, A.: Ueber den Standort der Industrien. Mohr Siebeck Verlag, Tübingen (1909)
24. Weiszfeld, E.: Sur le point pour lequel la somme des distances de n points donnés est minimum. Tohoku Math. J. First Ser. 43, 355–386 (1937)
25. Zuo, Y., Serfling, R.: General notions of statistical depth function. Annal. Statist. 28(2), 461–482 (2000)

Acknowledgments

Raúl Pérez-Fernández acknowledges the support of Campus de Excelencia Internacional de la Universidad de Oviedo in collaboration with Banco de Santander.

Author information

Authors and Affiliations

Department of Engineering, University of Sannio, Viale Traiano, 1, 82100, Benevento, Italy
Francesco Gissi, Vincenzo Benedetto & Parth Bhandari
Kebula srl, Via della Biblioteca 2, 84084, Fisciano, Italy
Francesco Gissi, Vincenzo Benedetto & Parth Bhandari
Department of Statistics and O.R. and Mathematics Didactics, University of Oviedo, Oviedo, Spain
Parth Bhandari & Raúl Pérez-Fernández

Corresponding author

Correspondence to Vincenzo Benedetto.

Editor information

Editors and Affiliations

Department of Management and Innovation Systems, University of Salerno, Salerno, Italy
Luigi Troiano
Department of Engineering, University of Sannio, Benevento, Italy
Alfredo Vaccaro
Department of Computer Science, Central University of Rajasthan, Ajmer, Rajasthan, India
Nishtha Kesswani
Department of Computer Science, University of Oviedo, Gijón, Spain
Irene Díaz Rodriguez
EMLYON Business School, Écully, France
Imene Brigui
University College London, London, UK
David Pastor-Escuredo

About this paper

Cite this paper

Gissi, F., Benedetto, V., Bhandari, P., Pérez-Fernández, R. (2023). On the Use of Multivariate Medians for Nearest Neighbour Imputation. In: Troiano, L., Vaccaro, A., Kesswani, N., Díaz Rodriguez, I., Brigui, I., Pastor-Escuredo, D. (eds) Key Digital Trends in Artificial Intelligence and Robotics. ICDLAIR 2022. Lecture Notes in Networks and Systems, vol 670. Springer, Cham. https://doi.org/10.1007/978-3-031-30396-8_13

DOI: https://doi.org/10.1007/978-3-031-30396-8_13
Published: 17 April 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-30395-1
Online ISBN: 978-3-031-30396-8
eBook Packages: Intelligent Technologies and Robotics (R0)
