TSPredIT: Integrated Tuning of Data Preprocessing and Time Series Prediction Models

Chapter in Transactions on Large-Scale Data- and Knowledge-Centered Systems LIV

Abstract

Prediction is one of the most important tasks when working with time series. There are many alternative ways to model a time series, and choosing the right one is challenging. Most data-centric models, whether statistical or machine learning, have hyperparameters to tune, and setting them correctly is essential for good predictions. The task is further complicated because time series prediction also demands choosing a data preprocessing method that complies with the chosen model. Many machine learning frameworks, such as Scikit-learn, offer features to build models and tune their hyperparameters. However, few works address the joint tuning of data preprocessing hyperparameters and model building. TSPredIT addresses this issue by providing a framework that seamlessly integrates the tuning of data preprocessing with models' hyperparameters. TSPredIT is made available as an R package that provides functions for defining and conducting time series prediction, including data pre(post)processing, decomposition, hyperparameter optimization, modeling, prediction, and accuracy assessment. TSPredIT is also extensible, which significantly expands the framework's applicability, notably through integration with other languages such as Python.
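The integrated tuning the abstract describes can be illustrated, in scikit-learn terms, by a single search whose grid spans both the preprocessing step and the model's hyperparameters, so that each preprocessing choice is evaluated together with the model it feeds. The sketch below is conceptual and does not use TSPredIT's actual API; the lag embedding, the candidate scalers, and the Ridge model are illustrative assumptions, not choices taken from the chapter.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Synthetic series: a noisy sine wave.
rng = np.random.default_rng(42)
t = np.arange(200, dtype=float)
y = np.sin(t / 10) + 0.1 * rng.standard_normal(200)

# Sliding-window embedding: predict y[t] from the previous `lag` values.
lag = 5
X = np.column_stack([y[i:len(y) - lag + i] for i in range(lag)])
target = y[lag:]

# Preprocessing and model chained in one pipeline.
pipe = Pipeline([("scale", StandardScaler()),
                 ("model", Ridge())])

# The grid spans BOTH the preprocessing choice and the model hyperparameter:
# every scaler is evaluated jointly with every regularization strength.
grid = {"scale": [StandardScaler(), MinMaxScaler()],
        "model__alpha": [0.1, 1.0, 10.0]}

# Forward-chaining splits respect temporal order during validation.
search = GridSearchCV(pipe, grid, cv=TimeSeriesSplit(n_splits=3),
                      scoring="neg_mean_squared_error")
search.fit(X, target)
print(search.best_params_)
```

Because the scaler sits inside the pipeline, each cross-validation fold refits the preprocessing on training data only, which is the behavior an integrated tuner needs to avoid leaking validation information into the preprocessing step.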


Notes

  1. https://github.com/cefet-rj-dal/tspredit
  2. https://cran.r-project.org/web/packages/daltoolbox/index.html
  3. http://www.fertilizer.org
  4. https://eic.cefet-rj.br/~dal/tspredit/
  5. https://cefet-rj-dal.github.io/tspredit/


Acknowledgements

The authors thank CNPq, CAPES (finance code 001), and FAPERJ for partially sponsoring this research.

Author information

Correspondence to Eduardo Ogasawara.


Copyright information

© 2023 Springer-Verlag GmbH Germany, part of Springer Nature

About this chapter


Cite this chapter

Salles, R., et al. (2023). TSPredIT: Integrated Tuning of Data Preprocessing and Time Series Prediction Models. In: Hameurlain, A., Tjoa, A.M., Boucelma, O., Toumani, F. (eds.) Transactions on Large-Scale Data- and Knowledge-Centered Systems LIV. Lecture Notes in Computer Science, vol. 14160. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-68014-8_2

  • DOI: https://doi.org/10.1007/978-3-662-68014-8_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-68013-1

  • Online ISBN: 978-3-662-68014-8

  • eBook Packages: Computer Science (R0)
