Developments in big data analytics and machine learning algorithms have transformed the business strategies of industries such as banking, financial services, asset management, and e-commerce. One of the most common problems these firms face when using data is the presence of missing values in the dataset. The objective of this study is to impute fundamental data that is missing in financial statements. The study uses the ‘Multiple Imputation by Chained Equations’ (MICE) framework, exploiting the interdependency among variables so that the imputed values wholly comply with accounting rules. The proposed framework has two stages: the first stage produces an initial imputation based on predictive mean matching, and the second stage resolves the financial constraints. The MICE framework thus allows accounting constraints to be incorporated into the imputation process. Performance tests conducted on the imputed dataset indicate that the imputed values for the 177 line items are good and in line with the expectations of subject matter experts.
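The two stages described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `pmm_impute` and `reconcile_total`, the toy figures, and the accounting identity (total assets = current assets + non-current assets) are all assumptions chosen for illustration. The first function is a simplified, single-variable predictive mean matching step that uses point estimates of the regression coefficients, whereas a full MICE run would draw the coefficients, cycle over all incomplete variables, and repeat the process to obtain several imputed datasets. The second function is one simple way to restore an additive constraint, by rescaling only the imputed components.

```python
import numpy as np

def pmm_impute(X, y, k=3, seed=0):
    """Fill NaNs in y by simplified predictive mean matching on complete predictors X."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float).copy()
    X1 = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    obs = ~np.isnan(y)
    # Least-squares fit on the rows where y is observed.
    beta, *_ = np.linalg.lstsq(X1[obs], y[obs], rcond=None)
    pred = X1 @ beta
    donor_pred, donor_val = pred[obs], y[obs]
    for i in np.flatnonzero(~obs):
        # The k observed rows whose predictions are closest to this row's prediction.
        nearest = np.argsort(np.abs(donor_pred - pred[i]))[:k]
        # Borrow the actually observed value of one randomly chosen donor.
        y[i] = donor_val[rng.choice(nearest)]
    return y

def reconcile_total(parts, total, imputed_mask):
    """Rescale imputed components so that each row of parts sums to total."""
    parts = np.asarray(parts, dtype=float).copy()
    total = np.asarray(total, dtype=float)
    fixed = np.where(imputed_mask, 0.0, parts).sum(axis=1)  # reported values
    free = np.where(imputed_mask, parts, 0.0).sum(axis=1)   # imputed values
    gap = total - fixed
    # Rows with nothing imputed (free == 0) keep a scale of 1.
    scale = np.divide(gap, free, out=np.ones_like(gap), where=free != 0)
    return np.where(imputed_mask, parts * scale[:, None], parts)

# Toy balance-sheet data (hypothetical figures).
total_assets = np.array([100.0, 200.0, 150.0, 300.0, 250.0, 120.0])
current = np.array([40.0, 90.0, np.nan, 130.0, np.nan, 50.0])
noncurrent = np.array([60.0, 110.0, 90.0, 170.0, 140.0, 70.0])

# Stage 1: initial imputation of current assets from total assets.
missing = np.isnan(current)
stage1 = pmm_impute(total_assets, current)

# Stage 2: enforce current + noncurrent == total on the imputed cells only.
mask = np.column_stack([missing, np.zeros_like(missing)])
parts = reconcile_total(np.column_stack([stage1, noncurrent]), total_assets, mask)
```

In this toy case only one component per row is missing, so the identity alone pins down the final value and the first-stage draw merely serves as a starting point. When several components of the same total are missing, the first-stage values determine how the remaining gap is split between them.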
The views and opinions expressed in this article are those of the authors and do not necessarily reflect the views of Credit Rating Information Services India Ltd (CRISIL).
Cite this article
Meghanadh, B., Aravalath, L., Joshi, B. et al. Imputation of Missing Values in the Fundamental Data: Using MICE Framework. J. Quant. Econ. 17, 459–475 (2019). https://doi.org/10.1007/s40953-018-0142-7
Keywords
- Multiple imputation
- Fundamental data
- Accounting and financial statement