Evaluation of Missing Data Imputation Methods for an Enhanced Distributed PV Generation Prediction

  • Aditya Sundararajan
  • Arif I. Sarwat
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1069)


Three parameters are critical to predicting the generation of distributed photovoltaic (PV) systems effectively: irradiance, ambient temperature, and module temperature. However, their completeness cannot be guaranteed because of issues in data acquisition. Many methods in the literature address missingness, but their applicability varies with the missingness mechanism, and methods for imputing missing data in PV systems remain largely unexplored. This paper conducts statistical analyses to understand the missingness mechanism in data from a real grid-tied 1.4 MW PV system in Miami. It then compares the imputation performance of four methods (random imputation, multiple imputation using expectation-maximization, k-nearest neighbors (kNN), and random forests) using error metrics and effect size measures. The imputed values are fed to a multilayer perceptron to predict PV generation, and the predictions are compared with observed values. Results show that values imputed using kNN and random forests have the smallest differences in proportions and help utilities predict generation more accurately for distribution planning.
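The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the linear relation between module temperature, ambient temperature, and irradiance is an assumption made only for illustration, and the helper names (`make_pv_samples`, `mask_mcar`, `random_impute`, `knn_impute`, `rmse`) are hypothetical. It hides a random fraction of module temperature values, fills them by random imputation and by a simple kNN imputation, and scores both with RMSE, one possible error metric.

```python
import math
import random


def make_pv_samples(n, rng):
    """Synthetic [irradiance, ambient temp, module temp] rows (not real site data)."""
    rows = []
    for _ in range(n):
        irradiance = rng.uniform(0.0, 1000.0)  # W/m^2
        ambient = rng.uniform(20.0, 35.0)      # deg C
        # Assumed linear relation plus noise, used only for illustration.
        module = ambient + 0.03 * irradiance + rng.gauss(0.0, 1.0)
        rows.append([irradiance, ambient, module])
    return rows


def mask_mcar(rows, col, fraction, rng):
    """Hide a random fraction of one column (missing completely at random)."""
    hidden = sorted(rng.sample(range(len(rows)), int(fraction * len(rows))))
    masked = [r[:] for r in rows]
    for i in hidden:
        masked[i][col] = None
    return masked, hidden


def random_impute(masked, col, rng):
    """Fill each gap with a value drawn from the observed values of the column."""
    observed = [r[col] for r in masked if r[col] is not None]
    filled = [r[:] for r in masked]
    for r in filled:
        if r[col] is None:
            r[col] = rng.choice(observed)
    return filled


def knn_impute(masked, col, k=5):
    """Fill each gap with the mean of the k nearest complete rows."""
    predictors = [j for j in range(len(masked[0])) if j != col]
    # Standardize predictors so irradiance does not dominate the distance.
    stats = {}
    for j in predictors:
        vals = [r[j] for r in masked]
        mean = sum(vals) / len(vals)
        std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals)) or 1.0
        stats[j] = (mean, std)

    def scaled(row):
        return [(row[j] - stats[j][0]) / stats[j][1] for j in predictors]

    donors = [(scaled(r), r[col]) for r in masked if r[col] is not None]
    filled = [r[:] for r in masked]
    for r in filled:
        if r[col] is None:
            target = scaled(r)
            nearest = sorted(donors, key=lambda d: math.dist(d[0], target))[:k]
            r[col] = sum(value for _, value in nearest) / len(nearest)
    return filled


def rmse(truth, filled, hidden, col):
    """Root-mean-square error over the rows whose value was hidden."""
    sq = sum((truth[i][col] - filled[i][col]) ** 2 for i in hidden)
    return math.sqrt(sq / len(hidden))


rng = random.Random(0)
MODULE_TEMP = 2  # column index of module temperature
truth = make_pv_samples(400, rng)
masked, hidden = mask_mcar(truth, MODULE_TEMP, fraction=0.2, rng=rng)
random_filled = random_impute(masked, MODULE_TEMP, rng)
knn_filled = knn_impute(masked, MODULE_TEMP, k=5)
rmse_random = rmse(truth, random_filled, hidden, MODULE_TEMP)
rmse_knn = rmse(truth, knn_filled, hidden, MODULE_TEMP)
print(f"random imputation RMSE: {rmse_random:.2f} deg C")
print(f"kNN imputation RMSE:    {rmse_knn:.2f} deg C")
```

On data where the missing variable depends strongly on the observed ones, as assumed here, neighbor-based imputation would be expected to recover values more closely than random draws. The sketch covers only the missing-completely-at-random case and a single error metric; the effect size measures and the multilayer perceptron stage mentioned in the abstract are outside its scope.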


Distributed PV, Missing data, Data processing, Imputation methods, PV Generation Prediction



This work is the result of research sponsored by the National Science Foundation (NSF) CNS division under award 1553494.



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Florida International University, Miami, USA
