Abstract
Missing data poses a significant challenge in large datasets, particularly those containing time-series information, and can lead to inaccuracies in data analysis and machine learning model development. To address this issue, this paper compares and evaluates four imputation methods: the Random Forest-based MissForest, MICE, SimpleFill, and SoftImpute. The research examines how the missing ratio and temporal variation affect the performance of each method. The results indicate that MissForest consistently outperformed the other methods, yielding the lowest RMSE values and a high coefficient of determination (R²), which reflects both its accuracy and its ability to explain the variation in the data. Graphical analyses further showed that MissForest remained stable over time, whereas MICE and SimpleFill were more sensitive to changes in date. SoftImpute was relatively consistent but performed slightly worse than MissForest. Overall, this study identifies MissForest as the preferred imputation method for AVL time-series data.
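The evaluation protocol summarised above (hide a known fraction of values, impute them, then score the imputed values against the hidden truth with RMSE and R²) can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic sine series, the helper names `mask_values`, `mean_fill`, `rmse`, and `r2`, and the missing ratios are all assumptions chosen for illustration, and a simple mean fill stands in for the four methods compared in the paper.

```python
import math
import random


def mask_values(series, missing_ratio, seed=0):
    """Hide a fraction of entries at random; return (masked series, hidden indices)."""
    rng = random.Random(seed)
    n_missing = int(round(missing_ratio * len(series)))
    hidden = sorted(rng.sample(range(len(series)), n_missing))
    masked = list(series)
    for i in hidden:
        masked[i] = None
    return masked, hidden


def mean_fill(masked):
    """Baseline imputer (assumption): replace missing entries with the observed mean."""
    observed = [v for v in masked if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in masked]


def rmse(truth, imputed, hidden):
    """Root mean squared error, computed only over the hidden positions."""
    return math.sqrt(sum((truth[i] - imputed[i]) ** 2 for i in hidden) / len(hidden))


def r2(truth, imputed, hidden):
    """Coefficient of determination, computed only over the hidden positions."""
    mean_truth = sum(truth[i] for i in hidden) / len(hidden)
    ss_res = sum((truth[i] - imputed[i]) ** 2 for i in hidden)
    ss_tot = sum((truth[i] - mean_truth) ** 2 for i in hidden)
    return 1.0 - ss_res / ss_tot


# Synthetic series standing in for the real time-series data (assumption).
series = [10.0 + 3.0 * math.sin(t / 5.0) for t in range(200)]

# Repeat the mask-impute-score cycle for several illustrative missing ratios.
for ratio in (0.1, 0.3, 0.5):
    masked, hidden = mask_values(series, ratio, seed=42)
    filled = mean_fill(masked)
    print(f"ratio={ratio:.1f}  "
          f"RMSE={rmse(series, filled, hidden):.3f}  "
          f"R2={r2(series, filled, hidden):.3f}")
```

Swapping `mean_fill` for another imputer while keeping the masking and scoring steps fixed gives a like-for-like comparison across methods and missing ratios, which is the kind of comparison the abstract reports.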
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Jaafar, N.N., Rosdi, M.N.A., Jamaludin, K.R., Ramlie, F., Talib, H.A. (2024). Imputation Analysis of Time-Series Data Using a Random Forest Algorithm. In: Mohd. Isa, W.H., Khairuddin, I.M., Mohd. Razman, M.A., Saruchi, S.'., Teh, SH., Liu, P. (eds) Intelligent Manufacturing and Mechatronics. iM3F 2023. Lecture Notes in Networks and Systems, vol 850. Springer, Singapore. https://doi.org/10.1007/978-981-99-8819-8_4
DOI: https://doi.org/10.1007/978-981-99-8819-8_4
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8818-1
Online ISBN: 978-981-99-8819-8
eBook Packages: Engineering, Engineering (R0)