Evaluating the Quality of Data Imputation in Cardiovascular Risk Studies Through the Dissimilarity Profile Analysis

Solaro, Nadia

doi:10.1007/978-3-030-21140-0_9

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Included in the following conference series:

Scientific Meeting of the Classification and Data Analysis Group of the Italian Statistical Society

Abstract

Handling missing data is a crucial problem in statistical analysis and is almost always addressed by imputation. Although the literature is rich in imputation approaches, assessing the quality of imputation, i.e., appraising whether the imputed values or categories are plausible for the variables and units concerned, seems to have received less attention. This issue is critical in every field of application, including the medical context considered here: the assessment of cardiovascular disease risk. We address the problem of comparing the results obtained with different imputation methods and of assessing the quality of imputation through dissimilarity profile analysis (DPA), a multivariate exploratory method for the analysis of dissimilarity matrices. We also combine DPA with traditional profile analysis for data matrices to better understand what differentiates the imputation methods.
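
As a purely illustrative sketch of the setting described in the abstract, the Python snippet below completes the same toy data set with two simple imputation methods and compares the unit-by-unit dissimilarity matrices that result. It is not an implementation of DPA, whose details are not given here. The function names, the choice of mean and nearest-neighbour imputation, the Euclidean distance, and the toy data are all assumptions made for the example.

```python
import math

def mean_impute(rows):
    """Replace each None with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]

def nearest_neighbour_impute(rows):
    """Replace each None with the value of the closest unit observing that column.

    Closeness is a root-mean-square difference over jointly observed columns.
    """
    def dist(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    completed = []
    for i, r in enumerate(rows):
        new = list(r)
        for j, v in enumerate(r):
            if v is None:
                donors = [d for k, d in enumerate(rows) if k != i and d[j] is not None]
                new[j] = min(donors, key=lambda d: dist(r, d))[j]
        completed.append(new)
    return completed

def dissimilarity_matrix(rows):
    """Euclidean dissimilarities between all pairs of units (rows)."""
    return [[math.dist(a, b) for b in rows] for a in rows]

def mean_abs_difference(d1, d2):
    """Average absolute gap between two dissimilarity matrices (upper triangle)."""
    n = len(d1)
    pairs = [(i, k) for i in range(n) for k in range(i + 1, n)]
    return sum(abs(d1[i][k] - d2[i][k]) for i, k in pairs) / len(pairs)

# Hypothetical toy data: four units, three variables, None marks a missing value.
data = [
    [1.0, 2.0, 3.0],
    [1.1, None, 3.2],
    [5.0, 6.0, None],
    [5.2, 6.1, 7.0],
]

d_mean = dissimilarity_matrix(mean_impute(data))
d_nn = dissimilarity_matrix(nearest_neighbour_impute(data))
gap = mean_abs_difference(d_mean, d_nn)
print(f"Mean absolute gap between the two dissimilarity matrices: {gap:.3f}")
```

A gap of zero would mean that the two imputation methods lead to identical dissimilarities between units. A larger gap indicates that the choice of method changes how similar the units appear to one another, which is the kind of difference a dissimilarity-based comparison is meant to expose.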

Acknowledgements

The author would like to thank Daniela Lucini and Massimo Pagani, BIOMETRA Department, University of Milan, for sharing their data and research on the neurovegetative system and CVD risk factors, and for their valuable comments and suggestions.

Author information

Authors and Affiliations

Department of Statistics and Quantitative Methods, University of Milano-Bicocca, Milano, Italy
Nadia Solaro

Corresponding author

Correspondence to Nadia Solaro.

Editor information

Editors and Affiliations

Department of Statistics and Quantitative Methods, University of Milano-Bicocca, Milan, Italy
Francesca Greselin
Department of Statistical Sciences, Università Cattolica del Sacro Cuore, Milan, Italy
Laura Deldossi
Department of Economic and Social Sciences, Università Cattolica del Sacro Cuore, Piacenza, Italy
Luca Bagnato
Department of Statistical Sciences, Sapienza University of Rome, Rome, Italy
Maurizio Vichi

About this paper

Cite this paper

Solaro, N. (2019). Evaluating the Quality of Data Imputation in Cardiovascular Risk Studies Through the Dissimilarity Profile Analysis. In: Greselin, F., Deldossi, L., Bagnato, L., Vichi, M. (eds) Statistical Learning of Complex Data. CLADAG 2017. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-030-21140-0_9

DOI: https://doi.org/10.1007/978-3-030-21140-0_9
Published: 07 September 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-21139-4
Online ISBN: 978-3-030-21140-0
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
