TSPredIT: Integrated Tuning of Data Preprocessing and Time Series Prediction Models

Chapter in Transactions on Large-Scale Data- and Knowledge-Centered Systems LIV

Abstract

Prediction is one of the most important tasks when working with time series. There are many alternative ways to model a time series, and choosing the right one is challenging. Most data-centric models, whether statistical or machine learning, have hyperparameters to tune, and setting them correctly is essential for good predictions. The task is further complicated because time series prediction also demands choosing a data preprocessing method that complies with the chosen model. Many machine learning frameworks, such as Scikit-learn, offer features to build models and tune their hyperparameters. However, few works address the joint tuning of data preprocessing hyperparameters and model building. TSPredIT addresses this issue by providing a framework that seamlessly integrates the tuning of data preprocessing with models' hyperparameters. TSPredIT is made available as an R package that provides functions for defining and conducting time series prediction, including data pre(post)processing, decomposition, hyperparameter optimization, modeling, prediction, and accuracy assessment. TSPredIT is also extensible, which significantly expands the framework's applicability, notably through integration with other languages such as Python.
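The integrated tuning the abstract describes can be illustrated, in scikit-learn terms, by a single search whose grid spans both the preprocessing step and the model's hyperparameters, so that each preprocessing choice is evaluated together with the model it feeds. The sketch below is conceptual and does not use TSPredIT's actual API; the lag embedding, the candidate scalers, and the Ridge model are illustrative assumptions, not choices taken from the chapter.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Synthetic series: a noisy sine wave.
rng = np.random.default_rng(42)
t = np.arange(200, dtype=float)
y = np.sin(t / 10) + 0.1 * rng.standard_normal(200)

# Sliding-window embedding: predict y[t] from the previous `lag` values.
lag = 5
X = np.column_stack([y[i:len(y) - lag + i] for i in range(lag)])
target = y[lag:]

# Preprocessing and model chained in one pipeline.
pipe = Pipeline([("scale", StandardScaler()),
                 ("model", Ridge())])

# The grid spans BOTH the preprocessing choice and the model hyperparameter:
# every scaler is evaluated jointly with every regularization strength.
grid = {"scale": [StandardScaler(), MinMaxScaler()],
        "model__alpha": [0.1, 1.0, 10.0]}

# Forward-chaining splits respect temporal order during validation.
search = GridSearchCV(pipe, grid, cv=TimeSeriesSplit(n_splits=3),
                      scoring="neg_mean_squared_error")
search.fit(X, target)
print(search.best_params_)
```

Because the scaler sits inside the pipeline, each cross-validation fold refits the preprocessing on training data only, which is the behavior an integrated tuner needs to avoid leaking validation information into the preprocessing step.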


Notes

  1. https://github.com/cefet-rj-dal/tspredit
  2. https://cran.r-project.org/web/packages/daltoolbox/index.html
  3. http://www.fertilizer.org
  4. https://eic.cefet-rj.br/~dal/tspredit/
  5. https://cefet-rj-dal.github.io/tspredit/


Acknowledgements

The authors thank CNPq, CAPES (finance code 001), and FAPERJ for partially sponsoring this research.

Author information

Correspondence to Eduardo Ogasawara.


Copyright information

© 2023 Springer-Verlag GmbH Germany, part of Springer Nature

About this chapter


Cite this chapter

Salles, R., et al. (2023). TSPredIT: Integrated Tuning of Data Preprocessing and Time Series Prediction Models. In: Hameurlain, A., Tjoa, A.M., Boucelma, O., Toumani, F. (eds.) Transactions on Large-Scale Data- and Knowledge-Centered Systems LIV. Lecture Notes in Computer Science, vol. 14160. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-68014-8_2

  • DOI: https://doi.org/10.1007/978-3-662-68014-8_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-68013-1

  • Online ISBN: 978-3-662-68014-8

  • eBook Packages: Computer Science (R0)
