Imputation Analysis of Time-Series Data Using a Random Forest Algorithm

Jaafar, Nur Najmiyah; Rosdi, Muhammad Nur Ajmal; Jamaludin, Khairur Rijal; Ramlie, Faizir; Talib, Habibah Abdul

doi:10.1007/978-981-99-8819-8_4

Nur Najmiyah Jaafar (ORCID: orcid.org/0000-0002-3039-9118)¹⁵,
Muhammad Nur Ajmal Rosdi¹⁵,¹⁶,
Khairur Rijal Jamaludin¹⁶,
Faizir Ramlie¹⁶ &
Habibah Abdul Talib¹⁶

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 850)

Included in the following conference series:

Innovative Manufacturing, Mechatronics & Materials Forum

Abstract

Missing data poses a significant challenge in large datasets, particularly those containing time-series information, and can lead to inaccurate data analysis and poorly performing machine learning models. To address this issue, this paper compares and evaluates four imputation methods: MissForest (which is based on the Random Forest algorithm), MICE, Simplefill, and Softimpute. The study examines how the missing ratio and temporal variation affect the performance of each method. MissForest consistently outperformed the other methods, achieving the lowest RMSE values and a high coefficient of determination (R²), which indicates that it imputes accurately and explains much of the variation in the data. Graphical analyses further showed that MissForest remained stable over time, whereas MICE and Simplefill were more sensitive to changes in date. Softimpute was relatively consistent but performed slightly worse than MissForest. Overall, this study identifies MissForest as the preferred imputation method for AVL time-series data.
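The evaluation protocol summarized in the abstract (hide a share of the observed values, impute them, then score RMSE and R² on the hidden cells only) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' code: the data are a synthetic three-channel series standing in for the AVL data, values are hidden completely at random, mean filling with scikit-learn's `SimpleImputer` stands in for a Simplefill-style baseline, and scikit-learn's `IterativeImputer` with a `RandomForestRegressor` estimator stands in for MissForest. The helper names `make_series`, `mask_at_ratio`, `score`, and `compare`, and all parameter values, are hypothetical.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (must precede the IterativeImputer import)
from sklearn.impute import IterativeImputer, SimpleImputer
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score


def make_series(n_steps=200, seed=0):
    """Hypothetical synthetic multivariate time series (stand-in for the AVL data)."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_steps)
    base = np.sin(2 * np.pi * t / 24)  # assumed periodic pattern
    # Three correlated channels, so each channel can be predicted from the others.
    X = np.column_stack([
        base,
        0.8 * base + 0.1 * t / n_steps,
        -0.5 * base + 0.05 * rng.standard_normal(n_steps),
    ])
    return X + 0.05 * rng.standard_normal(X.shape)


def mask_at_ratio(X, ratio, seed=1):
    """Hide roughly a fraction `ratio` of cells completely at random (assumed MCAR)."""
    rng = np.random.default_rng(seed)
    mask = rng.random(X.shape) < ratio
    X_missing = X.copy()
    X_missing[mask] = np.nan
    return X_missing, mask


def score(X_true, X_imputed, mask):
    """RMSE and R^2 computed only on the cells that were hidden."""
    y_true, y_pred = X_true[mask], X_imputed[mask]
    rmse = np.sqrt(mean_squared_error(y_true, y_pred))
    return rmse, r2_score(y_true, y_pred)


def compare(ratios=(0.1, 0.2, 0.3)):
    """Score each imputer at each missing ratio; returns {(name, ratio): (rmse, r2)}."""
    X = make_series()
    imputers = {
        # Column-mean fill, analogous to a Simplefill-style baseline.
        "mean": SimpleImputer(strategy="mean"),
        # Iterative imputation with a random-forest regressor: a MissForest-like approach.
        "rf_iterative": IterativeImputer(
            estimator=RandomForestRegressor(n_estimators=50, random_state=0),
            max_iter=5,
            random_state=0,
        ),
    }
    results = {}
    for ratio in ratios:
        X_missing, mask = mask_at_ratio(X, ratio)
        for name, imputer in imputers.items():
            X_hat = imputer.fit_transform(X_missing)  # refits from scratch for each ratio
            results[(name, ratio)] = score(X, X_hat, mask)
    return results


if __name__ == "__main__":
    for (name, ratio), (rmse, r2) in sorted(compare().items()):
        print(f"{name:>12}  ratio={ratio:.1f}  RMSE={rmse:.3f}  R2={r2:.3f}")
```

Because both imputers are scored on the same hidden cells at each ratio, a lower RMSE necessarily means a higher R² for that ratio. scikit-learn's iterative imputer only approximates MissForest (for example, it stops on a tolerance or `max_iter` rather than MissForest's own criterion), and MICE and Softimpute are not shown here.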

Author information

Authors and Affiliations

Faculty of Manufacturing and Mechatronic Engineering Technology, Universiti Malaysia Pahang Al-Sultan Abdullah, 26600, Pekan, Pahang, Malaysia
Nur Najmiyah Jaafar & Muhammad Nur Ajmal Rosdi
Razak Faculty of Technology and Informatics, Universiti Teknologi Malaysia, Jalan Sultan Yahya Petra, 54100, Kuala Lumpur, Malaysia
Muhammad Nur Ajmal Rosdi, Khairur Rijal Jamaludin, Faizir Ramlie & Habibah Abdul Talib

Corresponding author

Correspondence to Nur Najmiyah Jaafar.

Editor information

Editors and Affiliations

Faculty of Manufacturing and Mechatronic Engineering Technology, Universiti Malaysia Pahang Al-Sultan Abdullah, Pekan, Malaysia
Wan Hasbullah Mohd. Isa, Ismail Mohd. Khairuddin, Mohd. Azraai Mohd. Razman & Sarah 'Atifah Saruchi
School of Intelligent Manufacturing Ecosystem, Xi’an Jiaotong-Liverpool University, Suzhou, China
Sze-Hong Teh
Department of Computer Science, University of York, York, UK
Pengcheng Liu

About this paper

Cite this paper

Jaafar, N.N., Rosdi, M.N.A., Jamaludin, K.R., Ramlie, F., Talib, H.A. (2024). Imputation Analysis of Time-Series Data Using a Random Forest Algorithm. In: Mohd. Isa, W.H., Khairuddin, I.M., Mohd. Razman, M.A., Saruchi, S.'., Teh, SH., Liu, P. (eds) Intelligent Manufacturing and Mechatronics. iM3F 2023. Lecture Notes in Networks and Systems, vol 850. Springer, Singapore. https://doi.org/10.1007/978-981-99-8819-8_4

DOI: https://doi.org/10.1007/978-981-99-8819-8_4
Published: 18 April 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8818-1
Online ISBN: 978-981-99-8819-8
eBook Packages: Engineering, Engineering (R0)
