Imputation of Missing Values in the Fundamental Data: Using MICE Framework


Revolutionary developments in big data analytics and machine learning algorithms have transformed the business strategies of industries such as banking, financial services, asset management, and e-commerce. One of the most common problems these firms face when using data is the presence of missing values in the dataset. The objective of this study is to impute fundamental data that is missing in financial statements. The study uses the ‘Multiple Imputation by Chained Equations’ (MICE) framework, exploiting the interdependency among the variables while fully complying with accounting rules. The proposed framework has two stages: the first performs an initial imputation based on predictive mean matching, and the second resolves financial constraints. The MICE framework allows us to incorporate accounting constraints in the imputation process. The performance tests conducted on the imputed dataset indicate that the imputed values for the 177 line items are good and in line with the expectations of subject matter experts.
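The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`chained_pmm`, `resolve_sum_constraint`), the three-column layout (current assets, non-current assets, total assets), the single additive constraint, and the proportional rescaling rule are all assumptions made for illustration. The first stage is a simplified chained-equations loop with predictive mean matching (it skips the posterior draw of regression coefficients that a full MICE implementation would use), and the second stage adjusts only the imputed cells so that the parts sum to the total.

```python
import numpy as np


def chained_pmm(X, n_iter=5, n_donors=3, seed=0):
    """Stage 1 (simplified): chained equations with predictive mean matching."""
    rng = np.random.default_rng(seed)
    miss = np.isnan(X)
    # Crude starting values: column means of the observed entries.
    filled = np.where(miss, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            if not miss[:, j].any():
                continue
            obs = ~miss[:, j]
            # Regress column j on all other (currently filled) columns.
            A = np.column_stack([np.ones(len(X)), np.delete(filled, j, axis=1)])
            beta, *_ = np.linalg.lstsq(A[obs], X[obs, j], rcond=None)
            pred = A @ beta
            obs_idx = np.flatnonzero(obs)
            for i in np.flatnonzero(miss[:, j]):
                # Donors: observed rows whose predicted value is closest.
                order = np.argsort(np.abs(pred[obs_idx] - pred[i]))
                donors = obs_idx[order[:n_donors]]
                # Borrow the observed value of a randomly chosen donor.
                filled[i, j] = X[rng.choice(donors), j]
    return filled


def resolve_sum_constraint(filled, miss, total, parts):
    """Stage 2 (assumed rule): enforce total == sum(parts) on imputed cells only."""
    out = filled.copy()
    for i in range(out.shape[0]):
        imputed = [p for p in parts if miss[i, p]]
        if miss[i, total]:
            # The total was imputed: recompute it from its parts.
            out[i, total] = out[i, parts].sum()
        elif imputed:
            # The total is observed: rescale imputed parts to fill the residual.
            observed_sum = sum(out[i, p] for p in parts if not miss[i, p])
            residual = out[i, total] - observed_sum
            current = out[i, imputed].sum()
            if current != 0:
                out[i, imputed] *= residual / current
            else:
                out[i, imputed] = residual / len(imputed)
    return out


# Hypothetical toy data: columns are current, non-current, and total assets.
gen = np.random.default_rng(42)
current = gen.uniform(10, 100, size=50)
non_current = 2.0 * current + gen.normal(0, 5, size=50)
X = np.column_stack([current, non_current, current + non_current])
X[0, 1] = np.nan            # one missing part
X[1, 2] = np.nan            # a missing total
X[2, 0] = X[2, 1] = np.nan  # both parts missing, total observed

miss = np.isnan(X)
stage1 = chained_pmm(X)
stage2 = resolve_sum_constraint(stage1, miss, total=2, parts=[0, 1])
```

After the second stage, every row of `stage2` satisfies the assumed identity and the originally observed cells are left untouched. A real balance sheet involves many interlocking identities, so the resolution step would have to handle a system of constraints rather than the single sum shown here.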





Author information



Corresponding author

Correspondence to Manish Kumar.

Additional information

The views and opinions expressed in this article are those of the authors and do not necessarily reflect the views of Credit Rating Information Services of India Ltd (CRISIL).


About this article


Cite this article

Meghanadh, B., Aravalath, L., Joshi, B. et al. Imputation of Missing Values in the Fundamental Data: Using MICE Framework. J. Quant. Econ. 17, 459–475 (2019).



Keywords

  • Multiple imputation
  • MICE
  • Fundamental data
  • Accounting and financial statement

JEL Classification

  • C13
  • C32
  • C51
  • C53
  • G20