On the Use of Multivariate Medians for Nearest Neighbour Imputation

  • Conference paper
Key Digital Trends in Artificial Intelligence and Robotics (ICDLAIR 2022)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 670)

Abstract

Most data analysis techniques, e.g. those addressed in [15,16,17,18], require datasets without missing values. However, in the current age of large datasets, it is increasingly common to work with data in which some values are missing. For this reason, several techniques for imputing missing values have been developed. In the present work, we present one such technique, based on the use of multivariate medians within the context of nearest neighbour imputation. Several experiments are carried out, showing better performance than some state-of-the-art methods in the presence of noise.
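
The general idea can be illustrated with a minimal sketch. The full method is not visible in this preview, so everything below is an assumption made for illustration: the choice of the geometric (spatial) median computed by Weiszfeld iterations [24], the restriction of donors to complete rows, the Euclidean distance on observed coordinates, and the hypothetical function names `geometric_median` and `knn_median_impute`. It should not be read as the authors' implementation.

```python
import numpy as np


def geometric_median(points, n_iter=200, tol=1e-9):
    """Approximate the geometric (spatial) median with Weiszfeld iterations."""
    points = np.asarray(points, dtype=float)
    estimate = points.mean(axis=0)  # start from the centroid
    for _ in range(n_iter):
        dist = np.linalg.norm(points - estimate, axis=1)
        dist = np.maximum(dist, 1e-12)  # avoid division by zero at a data point
        weights = 1.0 / dist
        new_estimate = (weights[:, None] * points).sum(axis=0) / weights.sum()
        if np.linalg.norm(new_estimate - estimate) < tol:
            return new_estimate
        estimate = new_estimate
    return estimate


def knn_median_impute(data, k=5):
    """Fill NaNs in each incomplete row using the geometric median
    of its k nearest complete rows (illustrative assumption)."""
    data = np.asarray(data, dtype=float)
    missing = np.isnan(data)
    complete = data[~missing.any(axis=1)]  # donor pool: rows without gaps
    if len(complete) == 0:
        raise ValueError("need at least one complete row")
    imputed = data.copy()
    for i in np.flatnonzero(missing.any(axis=1)):
        observed = ~missing[i]
        # Distance to each donor, measured on the observed coordinates only.
        dist = np.linalg.norm(complete[:, observed] - data[i, observed], axis=1)
        neighbours = complete[np.argsort(dist)[:k]]
        centre = geometric_median(neighbours)
        imputed[i, missing[i]] = centre[missing[i]]
    return imputed
```

For example, if the three nearest donors have the values 1, 1 and 100 in the missing coordinate and agree elsewhere, the sketch imputes a value close to 1, whereas averaging the neighbours would give 34. This is the kind of robustness to noisy neighbours that motivates replacing the mean with a multivariate median.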

References

  1. Choudhury, S.J., Pal, N.R.: Imputation of missing data with neural networks for classification. Knowl.-Based Syst. 182, 104838 (2019)

  2. Donoho, D., Gasko, M.: Multivariate generalizations of the median and trimmed means. Tech. rep., Technical Reports 128 and 133, Department of Statistics, University of California, Berkeley (1987)

  3. Enders, C.K.: Applied Missing Data Analysis. Guilford Press, New York (2010)

  4. Gagolewski, M., Pérez-Fernández, R., De Baets, B.: An inherent difficulty in the aggregation of multidimensional data. IEEE Trans. Fuzzy Syst. 28(3), 602–606 (2020)

  5. Hotelling, H.: Stability in competition. Econ. J. 39(153), 41–57 (1929)

  6. Hukkelås, H., Lindseth, F., Mester, R.: Image inpainting with learnable feature imputation. In: Akata, Z., Geiger, A., Sattler, T. (eds.) DAGM GCPR 2020. LNCS, vol. 12544, pp. 388–403. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-71278-5_28

  7. Kim, K.Y., Kim, B.J., Yi, G.S.: Reuse of imputed data in microarray analysis increases imputation efficiency. BMC Bioinform. 5(1), 1–9 (2004)

  8. Krarup, J., Vajda, S.: On Torricelli’s geometrical solution to a problem of Fermat. IMA J. Manag. Math. 8(3), 215–224 (1997)

  9. Li, D., Deogun, J., Spaulding, W., Shuart, B.: Towards missing data imputation: a study of fuzzy k-means clustering method. In: International Conference on Rough Sets and Current Trends in Computing, pp. 573–579. Springer, Uppsala (2004)

  10. Oba, S., Sato, M., Takemasa, I., Monden, M., Matsubara, K., Ishii, S.: A Bayesian missing value estimation method for gene expression profile data. Bioinformatics 19(16), 2088–2096 (2003)

  11. Rousseeuw, P., Hubert, M.: High-breakdown estimators of multivariate location and scatter. In: Becker, C., Fried, R., Kuhnt, S. (eds.) Robustness and Complex Data Structures, pp. 49–66. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-35494-6_4

  12. Schmitt, P., Mandel, J., Guedj, M.: A comparison of six methods for missing data imputation. J. Biometr. Biostatist. 6(1), 1 (2015)

  13. Small, C.G.: Measures of centrality for multivariate and directional distributions. Canadian J. Statist. 15(1), 31–39 (1987)

  14. Small, C.G.: A survey of multidimensional medians. Int. Stat. Rev. 58(3), 263–277 (1990)

  15. Troiano, L., Birtolo, C., Armenise, R., Cirillo, G.: Optimization of menu layouts by means of genetic algorithms. In: van Hemert, J., Cotta, C. (eds.) EvoCOP 2008. LNCS, vol. 4972, pp. 242–253. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-78604-7_21

  16. Troiano, L., Rodríguez-Muñiz, L., Díaz, I.: Discovering user preferences using Dempster-Shafer theory. Fuzzy Sets Syst. 278, 98–117 (2015). https://doi.org/10.1016/j.fss.2015.06.004

  17. Troiano, L., Rodríguez-Muñiz, L., Marinaro, P., Díaz, I.: Statistical analysis of parametric t-norms. Inf. Sci. 257, 138–162 (2014). https://doi.org/10.1016/j.ins.2013.09.041

  18. Troiano, L., Rodríguez-Muñiz, L., Ranilla, J., Díaz, I.: Interpretability of fuzzy association rules as means of discovering threats to privacy. Int. J. Comput. Math. 89(3), 325–333 (2012). https://doi.org/10.1080/00207160.2011.613460

  19. Troyanskaya, O., et al.: Missing value estimation methods for DNA microarrays. Bioinformatics 17(6), 520–525 (2001)

  20. Tukey, J.W.: Mathematics and the picturing of data. In: Proceedings of the International Congress of Mathematicians, pp. 523–531. Vancouver (1975)

  21. Van Buuren, S., Oudshoorn, C.G.: mice: Multivariate imputation by chained equations. J. Stat. Softw. 45(3), 1–67 (2011)

  22. Vardi, Y., Zhang, C.H.: The multivariate L1-median and associated data depth. Proc. Natl. Acad. Sci. 97(4), 1423–1426 (2000)

  23. Weber, A.: Ueber den Standort der Industrien. Mohr Siebeck Verlag, Tübingen (1909)

  24. Weiszfeld, E.: Sur le point pour lequel la somme des distances de n points donnés est minimum. Tohoku Math. J. First Ser. 43, 355–386 (1937)

  25. Zuo, Y., Serfling, R.: General notions of statistical depth function. Annal. Statist. 28(2), 461–482 (2000)

Acknowledgments

Raúl Pérez-Fernández acknowledges the support of Campus de Excelencia Internacional de la Universidad de Oviedo in collaboration with Banco de Santander.

Author information

Corresponding author

Correspondence to Vincenzo Benedetto.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Gissi, F., Benedetto, V., Bhandari, P., Pérez-Fernández, R. (2023). On the Use of Multivariate Medians for Nearest Neighbour Imputation. In: Troiano, L., Vaccaro, A., Kesswani, N., Díaz Rodriguez, I., Brigui, I., Pastor-Escuredo, D. (eds) Key Digital Trends in Artificial Intelligence and Robotics. ICDLAIR 2022. Lecture Notes in Networks and Systems, vol 670. Springer, Cham. https://doi.org/10.1007/978-3-031-30396-8_13
