1 Introduction and background

Forecasting is essential for supporting decision-making, especially in applications that involve substantial uncertainty. For instance, accurately forecasting the future demand of stock keeping units (SKUs) can significantly improve supply chain management (Ghobbar and Friend, 2003), reduce inventory costs (Syntetos et al., 2010), and increase service levels (Pooya et al., 2019), particularly under the presence of promotions (Gür Ali et al., 2009; Abolghasemi et al., 2020). In order to obtain more accurate forecasts, forecasters typically try to identify the most appropriate forecasting model for each series from a variety of alternatives. Although this task can provide significant improvements under perfect foresight (Fildes, 2001), it is difficult to perform effectively in practice due to model, parameter, and data uncertainty (Petropoulos et al., 2018). Thus, many strategies have been proposed in the literature to effectively perform forecasting model selection (Fildes and Petropoulos, 2015), most of which are based on the in-sample and out-of-sample accuracy of the forecasting models (Tashman, 2000), their complexity (Hyndman et al., 2002), and the features that time series display (Montero-Manso et al., 2020; Petropoulos et al., 2014).

However, in business forecasting applications, data is typically grouped based on its context and characteristics, thus structuring cross-sectional hierarchies. For example, although the demand of an SKU can be reported at a store level, it can also be aggregated (summed) at a regional or national level. Similarly, demand can be aggregated for various SKUs of the same type (e.g., dairy products) or category (e.g., foods). As a result, hierarchical time series introduce additional complexity to the whole forecasting process since, apart from selecting the most appropriate forecasting model for each series, forecasters also have to account for coherence, i.e. ensure that the forecasts produced at the lower hierarchical levels sum up to those produced at the higher ones (Athanasopoulos et al., 2020). In fact, coherence is a prerequisite in hierarchical forecasting (HF) applications as it ensures that different decisions made across different hierarchical levels will be aligned.

Naturally, the demand recorded at lower hierarchical levels will always add up to the observed demand at higher levels. However, this is rarely the case for forecasts, which are usually produced for each series separately and are therefore incoherent. To achieve coherence, various HF methods can be used for reconciling the individual, base forecasts (Spiliotis et al., 2019). The most basic HF method is probably the bottom-up (BU), according to which base forecasts are produced just for the series at the lowest level of the hierarchy and are then aggregated to provide forecasts for the series at the higher levels (Dangerfield and Morris, 1992). Top-down (TD) is another option, which involves forecasting just the series at the highest level of the hierarchy and then using proportions to disaggregate these forecasts and predict the series at the lower levels (Gross and Sohl, 1990; Athanasopoulos et al., 2009). Middle-out (MO) mixes the above-mentioned methods, producing base forecasts for a middle level of the hierarchy and then aggregating or disaggregating them to forecast the higher and lower levels, respectively (Abolghasemi et al., 2019). Finally, a variety of HF methods that combine (COM) the forecasts produced at all hierarchical levels have been proposed in the literature, usually resulting in coherent and more accurate forecasts (Hyndman et al., 2011; Wickramasuriya et al., 2019; Jeon et al., 2019).

Among the HF methods found in the literature, a COM method called minimum trace (MinT; Wickramasuriya et al., 2019) has distinguished itself due to the strong theory supporting it and the results of many empirical studies highlighting its merits over other alternatives (Abolghasemi et al., 2019; Burba and Chen, 2021; Spiliotis et al., 2020). However, there are still circumstances under which MinT might fail to provide the most accurate forecasts. For instance, since MinT is based on the estimation of the one-step-ahead error covariance matrix, the method may prove inappropriate when the in-sample errors of the baseline forecasting models do not represent post-sample accuracy, when the assumption that the multi-step forecast error covariance is proportional to the one-step forecast error covariance is unrealistic, or when the required estimates are computationally too expensive to obtain. Moreover, since MinT treats all levels equally, it is not optimized with respect to particular hierarchical levels of interest. Finally, given that medians are not additive, there is no reason to expect that MinT will always improve the mean absolute forecast error, or other accuracy measures that are based on absolute forecast errors.

In such cases, simpler HF methods like BU and TD may be useful. However, there is inadequate evidence about which of the two methods to use (Hyndman et al., 2011). For example, the BU method is typically regarded as more suitable for short-term forecasts and for hierarchies in which bottom series are not highly correlated and not dominated by noise (Kahn, 1998). On the other hand, the TD method is usually regarded as more appropriate for long-term forecasts, but less accurate for predicting the series at the lower aggregation levels due to information loss (Dangerfield and Morris, 1992; Kahn, 1998). It seems that no reconciliation method fits all kinds of HF problems and that, similarly to forecasting model selection, the appropriateness of the different HF methods depends on various factors, including the particularities of the time series (Nenova and May, 2016) and the structure of the hierarchy (Abolghasemi et al., 2019; Fliedner, 1999; Fliedner, 2001; Gross and Sohl, 1990). The above findings highlight the potential benefits of conditional hierarchical forecasting (CHF), i.e. the improvements in forecasting accuracy that could be achieved if forecasters were able to select the most appropriate HF method according to the characteristics of the series that form a hierarchy. In this paper we propose an approach for performing such a conditional selection using time series features as leading indicators (Kang et al., 2017; Spiliotis et al., 2020a) and machine learning (ML) methods for conducting the classification. Essentially, we suggest that the forecasting accuracy of the different HF methods found in the literature is closely related to the characteristics of the individual series and that, based on these relationships, “horses for courses” can be effectively identified (Petropoulos et al., 2014). In addition, CHF allows the selection to be tailored according to the accuracy measure of preference (e.g. mean absolute or squared error) and the hierarchical level(s) of interest (e.g. top or bottom level), thus adapting to the requirements of the examined forecasting task and effectively supporting decisions.

Table 1 State-of-the-art: Major studies conducted in the field of hierarchical forecasting for reconciling base forecasts using either combination or selection approaches

Table 1 summarizes major studies conducted in the field of HF, placing particular emphasis on approaches proposed in the literature for reconciling base forecasts using either combination or selection methods. The method proposed in this paper is also included in the table to facilitate comparisons. As seen, various studies have considered ML methods for performing TD (Mancuso et al., 2020), MO (Abolghasemi et al., 2019), or BU (Burba and Chen, 2021; Spiliotis et al., 2020) reconciliation in a dynamic fashion by using base forecasts and explanatory variables as input to regression models, including neural networks (NN), regression trees (RT), and support vector machines (SVM), among others. Instead of reconciling the base forecasts directly, other studies have proposed combining the reconciled forecasts produced by standard HF methods using simple weighting schemes (Abouarghoub et al., 2018) or selecting the most appropriate one from a list of alternatives (Nenova and May, 2016). As such, the spirit of our work is similar to that of Nenova and May (2016), since both studies aim to select the most suitable reconciliation method for a hierarchy of interest. However, we take a different approach in doing so. First, in contrast to Nenova and May (2016), who exploit time series correlations and rank predictors (i.e., features related to the structure of the hierarchy), our selection is performed using a comprehensive set of time series features that describe the behavior of the individual series comprising the hierarchy. Time series features have received only limited consideration in other studies, which evaluated the impact of series autocorrelation (Chen and Boylan, 2009), demand type (Widiarta et al., 2007; Widiarta et al., 2008), and forecasting horizon (Burba and Chen, 2021) on the appropriateness of the BU and TD methods, mostly using simulations and ex-post evaluations. Second, in our study we use a different set of baseline HF methods, including COM in addition to the TD and BU methods. This is done because several studies have shown that COM can outperform standard HF methods, while also being significantly different in nature from BU and TD in terms of the approach used for performing the reconciliation (Hyndman et al., 2016; Hyndman et al., 2011; Abolghasemi et al., 2020). Third, we use a different set of models for conducting the classification, including more advanced decision-tree-based algorithms, such as random forests (RF) and eXtreme Gradient Boosting (XGB), that have shown promising results in various forecasting tasks and competitions (Montero-Manso et al., 2020; Chen and Guestrin, 2016). Fourth, we evaluate the performance of our method by considering diverse sets of hierarchical data and optimization criteria in terms of the hierarchical level at which the forecasts should be considered optimal, the characteristics of the time series, the measure used for assessing accuracy, and the forecasting horizon. We also conduct an empirical comparison between the method proposed in this paper and the one described in Nenova and May (2016), and show the importance of time series characteristics in selecting the most appropriate reconciliation method.

We benchmark the accuracy of the proposed approach against various HF methods, both standard and state-of-the-art, considering a variety of optimization criteria, and using three large data sets from the retail, tourism and justice sectors. Our results suggest that CHF leads to superior forecasts that outperform those of the individual HF methods examined. Thus, we conclude that selection should not be limited to forecasting models, but be expanded to HF methods as well.

The remainder of the paper is organized as follows. Section 2 describes the most popular HF methods found in the literature and Sect. 3 introduces CHF. Section 4 presents the primary data set used for the empirical evaluation of the proposed approach and describes the experimental set-up. Section 5 presents the results of the experiment and discusses our findings. Finally, Sect. 6 concludes the paper.

2 Hierarchical forecasting methods

In this section, we discuss TD, BU, and COM, three well-established HF methods that are widely used in the literature and in practice for reconciling hierarchical base forecasts. These methods are also the ones considered in this study, both as the candidate methods of the conditional HF approach described in the next section and as benchmarks. For a more detailed discussion of existing HF methods, their advantages, and their drawbacks, please refer to the study of Athanasopoulos et al. (2020).

Before proceeding, we introduce the following notation and parameters, which will facilitate the discussion of the three methods:

\(m\): Total number of series in the hierarchy;
\(m_i\): Number of series at level \(i\);
\(k\): Number of levels in the hierarchy;
\(n\): Number of observations in each series;
\(Y_{x,t}\): The \(t^{th}\) observation of series \(Y_x\);
\({\hat{Y}}_{x,n}(h)\): The h-step-ahead independent base forecast of series \(Y_x\) based on \(n\) observations;
\({\varvec{Y}}_{i,t}\): The vector of all observations at level \(i\);
\(\hat{{\varvec{Y}}}_{i,t}(h)\): The h-step-ahead forecast at level \(i\);
\({\varvec{Y}}_t\): A column vector including all observations;
\(\hat{{\varvec{Y}}}_n(h)\): The h-step-ahead independent base forecasts of all series based on \(n\) observations;
\(\tilde{{\varvec{Y}}}_n(h)\): The final reconciled forecasts of all series.

Fig. 1: A three-level hierarchical structure

We can express a hierarchical time series as \({\varvec{Y}}_t = {\varvec{S}} {\varvec{Y}}_{k,t}\), where \({\varvec{S}}\) is a summing matrix of order \(m \times m_k\). For example, we can express the three-level hierarchical time series shown in Fig. 1 as:

$$\begin{aligned} \left[ \begin{array}{c} Y_t \\ Y_{A,t} \\ Y_{B,t} \\ Y_{AA,t} \\ Y_{AB,t} \\ Y_{BA,t} \\ Y_{BB,t} \end{array} \right] = \left[ \begin{array}{cccc} 1 & 1 & 1 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ \multicolumn{4}{c}{{\varvec{I}}_4} \end{array} \right] \times \left[ \begin{array}{c} Y_{AA,t} \\ Y_{AB,t} \\ Y_{BA,t} \\ Y_{BB,t} \end{array} \right] \end{aligned}$$

Accordingly, we can express various hierarchical structures in a unified format as \(\tilde{{\varvec{Y}}}_n (h)= {\varvec{S}} {\varvec{G}} \hat{{\varvec{Y}}}_n(h)\), where \({\varvec{G}}\) is a matrix of order \(m_k \times m\) whose elements depend on the type of reconciliation method used, in our case the BU, TD, and COM methods (Hyndman and Athanasopoulos, 2021).
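To make the algebra concrete, the summing matrix of this example and the generic reconciliation step can be written in a few lines of R. This is a minimal sketch in base R, not the authors' implementation:

```r
# Summing matrix S (m = 7 series, m_k = 4 bottom series) for the
# three-level hierarchy of Fig. 1.
S <- rbind(
  c(1, 1, 1, 1),  # total: AA + AB + BA + BB
  c(1, 1, 0, 0),  # node A: AA + AB
  c(0, 0, 1, 1),  # node B: BA + BB
  diag(4)         # identity block for the four bottom series
)

# Every HF method below reduces to a choice of G (of order m_k x m):
# the reconciled forecasts are S %*% G %*% y_hat, where y_hat stacks
# the m base forecasts for a given horizon.
reconcile <- function(S, G, y_hat) S %*% G %*% y_hat
```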

2.1 Bottom-up

BU is the simplest HF method, according to which we forecast the series at the bottom level of the hierarchy and then aggregate these forecasts to obtain forecasts for the higher levels. In this case, the matrix \({\varvec{G}}\) can be constructed as \({\varvec{G}}= [{\varvec{0}}_{m_k \times (m - m_k)} \mid {\varvec{I}}_{m_k}]\), where \({\varvec{0}}_{i \times j}\) is an \(i \times j\) null matrix.
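Continuing the sketch above, the BU mapping simply discards the upper-level base forecasts and keeps the bottom-level ones:

```r
# Sketch: G for bottom-up reconciliation (m = 7, m_k = 4).
m <- 7; m_k <- 4
G_bu <- cbind(matrix(0, nrow = m_k, ncol = m - m_k), diag(m_k))
# Coherent BU forecasts, given stacked base forecasts y_hat:
# y_tilde <- reconcile(S, G_bu, y_hat)
```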

2.2 Top-down

In the TD method, base forecasts are produced at the top level of the hierarchy and then disaggregated to the lower levels with appropriate factors. While there are various ways of computing such factors and disaggregating the top-level forecasts, we consider the proportions of the historical averages, a widely used alternative that provides reasonable results (Athanasopoulos et al., 2009). These proportions are computed as follows:

$$\begin{aligned} p_j = \frac{\sum _{t=1}^{n} Y_{j,t}}{\sum _{t=1}^{n} Y_t}, \qquad j=1, \dots , m_k \end{aligned}$$
(1)

where \(p_j\) represents the average historical value of the bottom-level series \({Y_{j,t}}\) relative to the average value of the total aggregate \({Y_t}\). We can then construct the column vector \({\varvec{g}} =[p_1, p_2, p_3, \dots , p_{m_k}]'\) and the matrix \({\varvec{G}}= [{\varvec{g}} \mid {\varvec{0}}_{m_k \times (m - 1)}]\).
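In the running R sketch, with `Y_bottom` an \(n \times m_k\) matrix holding the bottom-level history and `Y_top` the length-\(n\) top-level history (both names assumed for illustration), Eq. (1) and the TD mapping become:

```r
# Sketch: historical-average proportions (Eq. 1) and the top-down G.
p    <- colSums(Y_bottom) / sum(Y_top)                 # p_j, j = 1, ..., m_k
G_td <- cbind(p, matrix(0, nrow = m_k, ncol = m - 1))  # [g | 0]
# y_tilde <- reconcile(S, G_td, y_hat)
```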

2.3 Optimal combination

The COM method produces base forecasts for all series across all hierarchical levels and then combines them with a linear model to obtain the reconciled forecasts. Suppose

$$\begin{aligned} \tilde{{\varvec{Y}}}_n(h) = {\varvec{S}} {\varvec{G}} \hat{{\varvec{Y}}}_n(h) \end{aligned}$$

depicts the h-step-ahead reconciled forecasts. Then, the covariance matrix of the errors of these forecasts can be given by

$$\begin{aligned} {\varvec{V}}_h=\text {Var}[{\varvec{Y}}_{n+h} - \tilde{{\varvec{Y}}}_n (h)]={\varvec{S}}{\varvec{G}}{\varvec{W}}_h{\varvec{G}}'{\varvec{S}}', \end{aligned}$$

where \({\varvec{W}}_h\) is the variance-covariance matrix of the h-step-ahead base forecast errors (Wickramasuriya et al., 2019; Hyndman et al., 2016). It can be shown that the matrix \({\varvec{G}}\) that minimizes the trace of \({\varvec{V}}_h\), subject to generating unbiased reconciled forecasts, i.e., \({\varvec{S}}{\varvec{G}}{\varvec{S}}={\varvec{S}}\), is given by

$$\begin{aligned} {\varvec{G}}= ({\varvec{S}}' {\varvec{W}}^{\dagger }_h{\varvec{S}})^{-1} {\varvec{S}}' {\varvec{W}}_h^{\dagger }, \end{aligned}$$

where \({\varvec{W}}^\dagger _h\) is the generalized inverse of \({\varvec{W}}_h\).

There are a few different ways to estimate \({\varvec{W}}_h\). In this study we consider the shrinkage estimator, as it has been empirically shown to yield the most accurate forecasts in many HF applications (Abolghasemi et al., 2019; Spiliotis et al., 2020; Wickramasuriya et al., 2019). Using the shrinkage method, this matrix is estimated by \({\varvec{W}}_h=k_h\left( \lambda _D \hat{{\varvec{W}}}_{1,D} + (1-\lambda _D)\hat{{\varvec{W}}}_1\right)\), where \(k_h > 0\) is a proportionality constant and \(\hat{{\varvec{W}}}_1\) is the covariance matrix of the one-step-ahead in-sample errors. The diagonal target of the shrinkage estimator is \(\hat{{\varvec{W}}}_{1,D} = \text {diag}(\hat{{\varvec{W}}}_1)\) and the shrinkage parameter is given by

$$\begin{aligned} \lambda _D = \frac{\sum _{i \ne j} \text {Var}({\hat{r}}_{ij})}{\sum _{i \ne j} {\hat{r}}^2_{ij}}, \end{aligned}$$

where \({\hat{r}}_{ij}\) is the \((i,j)^{th}\) element of the one-step-ahead in-sample correlation matrix (Schäfer and Strimmer, 2005).

The COM method was implemented using the MinT function of the hts package for R (Hyndman et al., 2020b).
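For illustration, the reconciliation calls might look as follows with the hts package, assuming `y_hts` is an hts object built from the bottom-level series. This is a hedged sketch with illustrative arguments; here the base forecasts come from per-series ARIMA models rather than the Reg-ARMA models used later in the paper:

```r
# Sketch: BU, TD (proportions of historical averages), and MinT-based
# COM reconciliation via the hts package for R.
library(hts)
fc_bu  <- forecast(y_hts, h = 4, method = "bu",    fmethod = "arima")
fc_td  <- forecast(y_hts, h = 4, method = "tdgsf", fmethod = "arima")
fc_com <- forecast(y_hts, h = 4, method = "comb", weights = "mint",
                   covariance = "shr", fmethod = "arima")
```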

3 Conditional hierarchical forecasting

Time series often exhibit different patterns, such as seasonality, randomness, and autocorrelation (Kang et al., 2017). As a result, there is no model that can consistently forecast all types of series more accurately than other models, even relatively simple ones (Petropoulos et al., 2014; Fildes and Petropoulos, 2015). Similarly, although some models may perform better on a time series data set with particular characteristics, there is no guarantee that this will always be the case (Spiliotis et al., 2020a). For example, although exponential smoothing (Gardner, 1985) typically produces relatively accurate forecasts for seasonal series, it might be outperformed by ML methods when a large number of observations is available (Smyl, 2020). Thus, selecting the most appropriate forecasting model for each series becomes a challenging task for improving overall forecasting accuracy (Montero-Manso et al., 2020).

Model selection has been extensively studied in the forecasting literature. Although there is no unique way to determine the most appropriate forecasting model for each series, empirical studies have provided effective strategies for performing this task (Fildes and Petropoulos, 2015). From these strategies, the approaches that build on time series features are among the most promising given that the latter can effectively represent the behaviour of the series in an abstract form and match it with the relative performance of various forecasting models (Reid, 1972; Meade, 2000; Wang et al., 2009; Petropoulos et al., 2014; Kang et al., 2017; Abolghasemi et al., 2020).

Expert systems and rule-based forecasting were two of the early approaches to be suggested for forecasting model selection (Collopy and Armstrong, 1992; Mahajan and Wind, 1988). Collopy and Armstrong (1992) considered domain knowledge along with 18 time series features and proposed a framework that consisted of 99 rules to select the most appropriate forecasting model from 4 alternatives. In another study, Adya et al. (2000) considered 6 features and 64 rules to select the most accurate forecasting model from 3 alternatives. Similarly, Adya et al. (2001) proposed an approach to automatically extract time series features and choose the best forecasting model. Petropoulos et al. (2014) measured the impact of 7 time series features plus the length of the forecasting horizon on the accuracy of 14 popular forecasting models, while Kang et al. (2017) and Spiliotis et al. (2020a) linked the performance of standard time series forecasting models with that of various indicative features using data from well-known forecasting competitions. More recently, Montero-Manso et al. (2020) used 42 time series features to determine the weights for optimally combining 9 different forecasting models, finishing in second place in the M4 forecasting competition (Makridakis et al., 2020).

Inspired by the work done in the area of forecasting model selection, we posit that HF methods can be similarly selected using time series features and that, based on such a selection, forecasting accuracy can be improved for the case of hierarchical series, while simultaneously reducing computational cost (for more details on this topic please refer to Appendix E). In this respect, we proceed by computing various time series features across all hierarchical levels and propose using these features for selecting the HF method that best suits the hierarchy, i.e., produces on average the most accurate forecasts for all the series it comprises (Abolghasemi et al., 2020; Petropoulos et al., 2014). The selection is done by employing a popular ML classification method. The proposed approach, to be called conditional HF (CHF), is summarised in Appendix A and presented in the flowchart of Fig. 2.

Fig. 2: CHF algorithm flowchart

Given that CHF builds on time series features and its accuracy is directly connected with the representativeness of the features used, as well as with the capacity of the algorithm employed for selecting the most appropriate HF method, choosing a set of diverse yet finite features is a prerequisite for enhancing the performance of the proposed classification approach. There are many features that can be used to describe time series patterns. For example, Fulcher et al. (2013) extracted more than 7,700 features for describing the behavior of time series, which were later summarized into 22 canonical features at a cost of just 7% of accuracy in a classification task (Lubba et al., 2019). Similarly, Wang et al. (2006) and Kang et al. (2017) suggested that a relatively small number of features can be used for effectively visualizing time series and performing forecasting model selection. Based on the above, we decided to consider 32 features for the CHF method so that the patterns of the hierarchical series are effectively captured without resorting to an excessive feature set. These features, described in Appendix B, include entropy, lumpiness, stability, hurst, seasonal-period, seasonal-strength, trend, curvature, e-acf1, e-acf10, x-acf1, x-acf10, diff1-acf1, diff1-acf10, diff2-acf1, diff2-acf10, seas-acf1, x-pacf5, diff1x-pacf5, diff2x-pacf5, seas-pacf, linearity, non-linearity, max-var-shift, max-kl-shift, fluctanal-prop-r1, unitroot-kpss, arch-acf, garch-acf, arch-r2, garch-r2, and arch-test, and were computed using the tsfeatures package for R (Hyndman et al., 2019). Our list of features is mostly inspired by recent studies that have successfully used time series features to develop meta-learning forecasting algorithms, e.g. for model selection and combination (Montero-Manso et al., 2020), tailored however to the particular requirements of the conditional hierarchical forecasting task. For more details about these features, please refer to the studies of Wang et al. (2006) and Kang et al. (2017).
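As a brief, hedged illustration of how such a feature vector can be extracted for a single series `y` (name assumed), most of the features above map onto tsfeatures helpers:

```r
# Sketch: computing indicative features with the tsfeatures package for R.
library(tsfeatures)
feats <- tsfeatures(list(y),
                    features = c("entropy", "lumpiness", "stability",
                                 "hurst", "stl_features", "acf_features",
                                 "pacf_features", "max_var_shift",
                                 "max_kl_shift", "unitroot_kpss",
                                 "heterogeneity", "arch_stat"))
```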

Note that CHF is flexible in terms of the method that will be employed for performing the classification. That is, users can select their classification method of choice for identifying the most accurate HF method and reconciling the base forecasts produced for the examined hierarchy. In this study we considered seven methods, namely logistic regression (LR), linear discriminant analysis (LDA), regression tree (RT), RF, XGB, SVM, and neural network (NN). However, we decided to include only XGB in the main part of the paper since, in most cases, the other classification models performed worse than XGB, a powerful classification method that has been successfully applied in various forecasting and classification problems (Chatzis et al., 2018; Demolli et al., 2019; Nielsen, 2016). Appendix C summarizes the results of our experiments when classifiers other than XGB are used for implementing CHF.
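To make the classification step concrete, a minimal sketch with the xgboost package for R is given below; `X` (one row of level-wise features per hierarchy and forecast origin), `y_class` (label of the winning HF method), and `X_new` are assumed names, and the hyperparameters are illustrative only:

```r
# Sketch: multi-class XGB selecting among the three HF methods
# (labels: 0 = BU, 1 = TD, 2 = COM).
library(xgboost)
clf <- xgboost(data = as.matrix(X), label = y_class,
               objective = "multi:softmax", num_class = 3,
               nrounds = 100, max_depth = 6, eta = 0.3, verbose = 0)
chosen <- predict(clf, as.matrix(X_new))  # one selected HF method per hierarchy
```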

Observe that CHF is also flexible in terms of the forecasting models that will be used for producing the base forecasts and the HF methods that will be considered for their reconciliation. For the latter case, we considered three HF methods (BU, TD, and COM), as described in Sect. 2. The reasoning is two-fold. First, classification methods tend to perform better when the number of classes is limited and, as a result, the key differences between the classes are easier to identify (Hastie and Tibshirani, 1998). Second, we believe that the selected HF methods are diverse enough, each one focusing on different levels of the hierarchy and adopting a significantly different approach for reconciling the base forecasts. Although we could have considered more of the HF methods proposed in the literature, these are mostly variants of the three examined methods (especially the COM method) and are therefore sufficiently covered.

We should also highlight that the rolling origin evaluation of the off-line phase can be adjusted to any set-up that might be more suitable to the user. For example, if computational cost is not an issue, instead of updating the forecast origin by h periods at a time, a step of one period could be considered to further increase the size of the set used for training the classification method and facilitate learning. The main motivation for considering an h-step-ahead update is that this practice suits the way retail firms operate when forecasting their sales and making their plans, while also creating a rich set on which the ML classification method can be effectively trained without excessive computational cost.

Finally, although we chose to train the classification method so that the average forecast error of CHF is minimized across the entire hierarchy, this objective can easily be adjusted in order for CHF to provide more accurate forecasts for a specific level of interest, as demonstrated in Appendix C. This choice depends on the decision-makers and can vary based on their focus and objectives. However, we do believe that our choice to optimize forecasting accuracy across the entire hierarchy, weighting all hierarchical levels equally, is realistic when dealing with demand forecasting and supply chain management, given that in such settings each level supports very different, yet equally important decisions. A similar weighting scheme was adopted in the latest M competition, M5 (Makridakis et al., 2020), whose objective was to produce the most accurate point forecasts for 42,840 time series representing the hierarchical unit sales of ten Walmart stores.

4 Data and experimental setup

4.1 Data

Although HF is relevant in many applications, such as energy (Spiliotis et al., 2020b) and tourism (Kourentzes and Athanasopoulos, 2019), it is most commonly found in the retail industry where SKU demand can be grouped based on location and product-related information. Therefore, the primary data set used in this study for empirically evaluating the accuracy of the proposed HF approach involves the sales and prices recorded for 55 hierarchies, corresponding to 55 fast-moving consumer goods of a food manufacturing company sold in various locations in Australia. Although the exact labels of the products are unknown to us, the products include breakfast cereals, long-life milk, and other breakfast products.

The hierarchical structure is the same for all 55 products of the data set and is depicted in Fig. 3. The number of series at each hierarchical level is provided in Table 2. As seen, each hierarchy consists of three levels, where the top level (level 0) represents total product sales, the middle level (level 1) the product sales recorded for 2 major retailer companies, and the bottom level (level 2) the way the product sales are disaggregated across 12 distribution centers (DC), 6 per retailer company, located in different states of Australia. Thus, each hierarchy includes 15 time series, each containing 120 weeks of observations, spanning from September 2016 to December 2018.

Fig. 3: Hierarchical structure of the time series included in the examined data set, representing the sales of 55 food products sold in Australia

Table 2 Number of time series per hierarchical level in the examined data set, representing the sales of 55 food products sold in Australia

Figure 4 presents the hierarchical time series of an indicative product of the data set. Observe that sales may experience spikes at different levels of the hierarchy, i.e. different levels of the supply chain. These spikes correspond to promotional periods, and their frequency and size vary significantly across products. Moreover, different nodes in the hierarchy may experience spikes of different extents. Finally, since each hierarchy corresponds to a different product, the series of the data set display different strengths of trend and seasonality, volumes, and entropy. For example, some series display seasonality and a small volume of sales, whereas others report large sales volumes and high entropy, making them more volatile and difficult to forecast. As such, the data set represents a diverse set of demand patterns which are affected, among other things, by the presence of promotions, i.e. price changes.

Fig. 4: Hierarchical sales of an indicative product of the examined data set. Level 0 represents total product sales, level 1 the product sales recorded for 2 major retailer companies, and level 2 the product sales across 12 distribution centers

In order to provide more empirical evidence regarding the effectiveness of the proposed approach over both standard and state-of-the-art HF methods on a diverse range of data, we have considered two more data sets in addition to the Sales data presented above. These data sets have different properties, thus offering a more diverse basis for our empirical study. The first additional data set, to be called the Tourism data set, provides information on Australian domestic tourism demand, measured as the number of overnight trips Australians spend away from home and disaggregated by state and region. The data set consists of 85 series, namely the total demand (level 0; 1 series), the demand per state (level 1; 8 series), and the demand per region (level 2; 76 series). The series are quarterly and span from Q1-1998 to Q4-2017 (80 periods). The second data set, to be called the Prison data set, provides information on the size of the prison population in Australia, disaggregated by state and gender. The data set consists of 25 series, namely the total prison population (level 0; 1 series), the prison population per state (level 1; 8 series), and the prison population of males and females per state (level 2; 16 series). The series are quarterly and span from Q1-2005 to Q4-2016 (48 periods). Each data set exhibits a different structure, with its series also characterized by different seasonal and trend patterns, randomness, lengths, and autocorrelation. Moreover, for each case, different forecasting horizons and optimization criteria have been used for employing CHF. The results of these experiments are summarized in Appendices 4.1 and 4.2.

4.2 Experimental setup

Considering that the examined data set involves products whose sales are highly affected by promotions, we produce base forecasts for all the series of the 55 hierarchies using a regression model with ARMA errors (Reg-ARMA), where product prices are used as a regressor variable. We choose Reg-ARMA for two reasons. First, it is a powerful method that can incorporate explanatory variables into the model and has been successfully implemented in various forecasting tasks (Abolghasemi et al., 2019; Abolghasemi et al., 2020). Second, it is dynamic in nature as its ARMA component can account for unexplained variations, making it a desirable choice for forecasting sales series that are impacted by promotions. Reg-ARMA can effectively capture sales both during promotional and non-promotional periods, as price inherently carries the impact of promotions and therefore sufficiently explains the corresponding variations in sales. The Reg-ARMA model is implemented using the forecast package for R (Hyndman et al., 2020a).
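As an indicative sketch (the variable names `sales`, `price`, and `future_price` are assumed, and the exact specification used may differ), such a model can be fitted with the forecast package:

```r
# Sketch: regression with ARMA errors, using price as the regressor.
library(forecast)
fit <- auto.arima(sales, xreg = price, d = 0)  # d = 0 keeps the error term ARMA
fc  <- forecast(fit, xreg = future_price)      # base forecasts over the horizon
```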

To evaluate the accuracy of the proposed HF approach both in terms of median and mean approximation (Kolassa, 2016), we consider two measures, namely the mean absolute scaled error (MASE) and the root mean squared scaled error (RMSSE), respectively. The measures are calculated as follows:

$$\begin{aligned} \text {MASE}&= \frac{n-1}{h} \frac{ \sum \nolimits _{t=n+1}^{n+h} {|y_{t}-f_{t}|} }{\sum \nolimits _{t=2}^{n} |y_{t}-y_{t-1}|}, \\ \text {RMSSE}&= \sqrt{ \frac{n-1}{h} \frac{ \sum \nolimits _{t=n+1}^{n+h} {(y_{t}-f_{t})^2} }{\sum \nolimits _{t=2}^{n} (y_{t}-y_{t-1})^2} }, \end{aligned}$$

where \(y_t\) and \(f_t\) are the observation and the forecast for period t, n is the sample size (observations used for training the forecasting model), and h is the forecasting horizon. Smaller MASE and RMSSE values suggest more accurate forecasts. Note that both measures are scale-independent, thus allowing us to average the results across series.
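Both measures translate directly into code; the following is a direct transcription, where `y` holds the training and test observations and `f` the h forecasts:

```r
# MASE and RMSSE as defined above (n: training size, h: horizon).
mase <- function(y, f, n, h) {
  num <- sum(abs(y[(n + 1):(n + h)] - f))
  den <- sum(abs(diff(y[1:n])))
  ((n - 1) / h) * num / den
}
rmsse <- function(y, f, n, h) {
  num <- sum((y[(n + 1):(n + h)] - f)^2)
  den <- sum(diff(y[1:n])^2)
  sqrt(((n - 1) / h) * num / den)
}
```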

Once the base forecasts are produced, we use the BU, TD, and COM methods to reconcile them across the three levels of the hierarchy. These baseline methods are used for benchmarking the proposed HF approach as they have been successfully applied in numerous applications and are considered standards in the area of HF (Abolghasemi et al., 2019; Hyndman et al., 2011; Hyndman et al., 2016). We also use the CHF method to select which of the three benchmarks is more suitable for forecasting each hierarchy.

Fig. 5: Time series rolling origin approach for training, evaluating, and testing forecasting models. The blue circles represent the training set and the orange circles indicate the test set on each round

In order to fit and evaluate the CHF method, we split the original data set into a training and a test set. Specifically, we used the first 26 weeks of data to initially train the Reg-ARMA model and the following 58 periods to produce 4-step-ahead base forecasts on a rolling origin basis (Tashman, 2000). We considered 26 weeks of observations as the initial set to provide enough observations for training the forecasting models and then generated 4-step-ahead forecasts, since this horizon (one month) is often sufficient in practice for operational planning on a weekly basis. Moreover, this creates a sufficient number of observations for training the classification model. Note that the initial number of observations and the number of forecast steps can be chosen differently, but one should generally consider a high enough number of observations for training both the baseline forecasting models and the classifiers.

Figure 5 depicts the rolling origin approach that we have used for training and testing the forecasting models. The blue circles represent our training set, while the orange circles indicate the test set in each round. Consider a particular product with its 15 time series. At each iteration, and for each series, we produce 4-step-ahead forecasts and then roll the forecast origin forward by four periods, i.e. we add four observations to the training data set and proceed by forecasting the following four periods. Each time the forecast origin is updated, the set used for fitting the Reg-ARMA model is accordingly extended so that the base forecasts are appropriately adjusted. Moreover, at each step, the BU, TD, and COM methods are applied to reconcile the base forecasts and identify the most accurate alternative according to MASE. We consider the first 26 weeks as the initial training data set and repeat this process until the end of the evaluation set, i.e. period 84, for each set of hierarchical time series. In this respect, a total of 14 accuracy measurements (average accuracy of 4-step-ahead forecasts over 58 weeks) \(\times\) 55 hierarchies \(\times\) 15 series \(= 11,550\) evaluations were recorded. We then summarized the results across the hierarchy and constructed a data set of 14 evaluations \(\times\) 55 hierarchies \(= 770\) observations that was used for training the XGB classification method. The remaining 36 weeks of data were used as a test set to evaluate the actual accuracy of the proposed approach, again in a rolling-origin fashion. Thus, our evaluation is based on a total of 9 accuracy measurements (average accuracy of 4-step-ahead forecasts over 36 weeks) \(\times\) 55 hierarchies \(\times\) 15 series \(= 7,425\) observations. Note that XGB is retrained each time the window moves forward, with the values of the time series features being accordingly updated.
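A condensed sketch of this loop for a single series, reusing the assumed names from the earlier snippets, is:

```r
# Rolling-origin sketch: fit, forecast four steps ahead, and advance
# the origin by the forecast horizon.
h <- 4; origin <- 26
while (origin + h <= length(sales)) {
  fit  <- auto.arima(sales[1:origin], xreg = price[1:origin], d = 0)
  base <- forecast(fit, xreg = price[(origin + 1):(origin + h)])
  # ...reconcile the 15 base forecasts with BU/TD/COM, score each with
  # MASE, and record the winning method together with the features...
  origin <- origin + h
}
```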

5 Empirical results and discussion

Table 3 displays the forecasting accuracy (MASE and RMSSE) of the three HF methods considered in this study as benchmarks, as well as classes for training the CHF algorithm, on the retail data set presented in Sect. 4.1. The accuracy is reported both per hierarchical level and on average (arithmetic mean of the three levels), while CHF is implemented using the XGB classifier, as described in Sect. 4.2. Note that, as explained in Sect. 3, the average accuracy across the three hierarchical levels, as measured by MASE, is used for determining the training labels of the classifiers.

Table 3 Forecasting performance of HF methods in terms of MASE and RMSSE over the test set

The results indicate that, on average, CHF is the best HF method according to both accuracy measures used. Specifically, CHF provides about 5%, 9%, and 2% more accurate forecasts than BU, TD, and COM, respectively, indicating that the XGB method has effectively managed to classify the hierarchies based on the features that their series display.

The improvements are similar if not better for the middle and bottom levels of the hierarchy. However, CHF fails to outperform TD and COM for the top hierarchical level, being about 6% and 2% less accurate, despite being still 8% better than the BU method. This finding confirms our initial claim that, depending on the hierarchical level of interest, different HF methods may be more suitable. In this study, we focused on the average accuracy of the hierarchical levels and optimized CHF with such an objective. However, as described in Sect. 3, different objectives could be considered in order to explicitly optimize the top, middle or bottom level of the hierarchy. In Appendix C we have examined such objectives and evaluated the respective performance of the CHF method. For instance, the results of Table 5 suggest that when XGB is optimized in terms of MASE and with respect to the top level, the forecast error of CHF is reduced from 0.466 (XGB-Avg) to 0.449 (XGB-L0), i.e., by 4% compared to the optimization criteria currently used.

In order to better evaluate the performance of the proposed approach, we proceed by investigating the distribution of the error ratios reported between CHF and the three benchmark methods examined in our study across all the 55 hierarchies of the data set. The results, presented per hierarchical level and accuracy measure, are visualized in Fig. 6, where box plots are used to display the minimum, 1st quartile, median, 3rd quartile, and maximum values of the ratios, as well as any possible outliers. Values lower than unity indicate that CHF provides more accurate forecasts, and vice versa. As observed, in most of the cases, CHF outperforms the rest of the HF methods, having a median ratio value lower than unity. The only exception is when forecasting level 0 and using RMSSE for measuring forecasting accuracy, where the TD method tends to provide superior forecasts. Thus, we conclude that CHF does not only provide the most accurate forecasts on average across all the 55 hierarchies, but also the most accurate forecasts across the individual ones.

Fig. 6: The ratio of the forecasting accuracy of the CHF approach over the baseline HF methods across all 55 hierarchies included in the test set. The results are reported for each accuracy measure and hierarchical level separately

To validate this finding, we also examine the significance of the differences reported between the various HF methods using the multiple comparisons with the best (MCB) test, as proposed by Koning et al. (2005). According to MCB, the methods are first ranked based on the accuracy they display for each series of the hierarchy, and then their average ranks are compared considering a critical difference, \(r_{\alpha , K, N}\), as follows:

$$\begin{aligned} r_{\alpha , K, N}= q_{\alpha } \times \sqrt{\frac{K(K+1)}{12N}}, \end{aligned}$$
(2)

where N is the number of time series, K is the number of examined HF methods, and \(q_{\alpha }\) is the critical quantile corresponding to the chosen confidence level. In our case, where \(\alpha\) is set equal to 0.05 (95% confidence), \(q_{\alpha }\) takes a value of 3.219. Accordingly, K is equal to 4 (TD, BU, COM, CHF) and N is equal to 7,425.
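Substituting these values into Eq. (2), an arithmetic step we spell out for clarity, gives

$$\begin{aligned} r_{0.05,\, 4,\, 7425}= 3.219 \times \sqrt{\frac{4 \times 5}{12 \times 7425}} \approx 3.219 \times 0.0150 \approx 0.048, \end{aligned}$$

so two methods whose average ranks differ by more than roughly 0.048 are deemed significantly different.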

Fig. 7: MCB test conducted on the HF methods examined in this study using the test set of the sales time series data. MASE and RMSSE are used for computing the ranks and a 95% confidence level is considered. The ranks are computed considering all the series of the 55 hierarchies

The results of the MCB test are presented in Fig. 7. If the intervals of two methods do not overlap, their performance is statistically different. Thus, methods whose intervals do not overlap with the gray zone of Fig. 7 are significantly worse than the best, and vice versa. As seen, our results indicate that CHF provides significantly better forecasts than the rest of the HF methods, both in terms of MASE and RMSSE. Moreover, we find that CHF is followed by COM and then by TD and BU. Interestingly, the performance of TD is not significantly different from that of BU according to either RMSSE or MASE. In this regard, we conclude that CHF performs better than the state-of-the-art HF methods found in the literature, while also being significantly more accurate than standard HF methods such as BU and TD. As such, CHF can be effectively used for selecting the most appropriate HF method from a set of alternatives and improving the overall forecasting accuracy of various hierarchies.

We also investigate the performance of the XGB classifier in terms of precision, recall, and \(\text {F}_1\) score. The precision metric measures the proportion of correct predictions among all predictions made for each class. Recall (also known as sensitivity) measures the proportion of cases in which the classifier successfully chose the best HF method for each class. Finally, the \(\text {F}_1\) score is the harmonic mean of precision and recall, computed as \(F_1= \frac{2pr}{p+r}\), and combines the information provided by the other two metrics (Sokolova and Lapalme, 2009).
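These metrics follow directly from the classifier's confusion matrix; a minimal sketch, assuming a 3 × 3 matrix `cm` over the BU/TD/COM classes with rows as predictions and columns as actuals:

```r
# Per-class precision, recall, and F1 from a confusion matrix `cm`.
precision <- diag(cm) / rowSums(cm)
recall    <- diag(cm) / colSums(cm)
f1        <- 2 * precision * recall / (precision + recall)
```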

Before presenting the results, we note that the TD, BU, and COM methods were identified as the most accurate HF method in 1,860, 2,745, and 2,820 cases, respectively. This indicates that the data set used for training the classification method was reasonably balanced, with no class dominating. Having a balanced training sample is important for our experiment since it facilitates the training of the ML algorithm (enough observations from each class are available and biases can be effectively mitigated) and provides more opportunities for accuracy improvements (if no HF method is dominant, selecting between different HF methods becomes promising). The opposite is expected to be true for highly imbalanced data sets: given a dominant HF method, even if the classifier is effectively trained, there will be little room for improvement in terms of forecasting accuracy.

The performance of the XGB classification method is presented in Table 4. According to the precision metric, XGB manages to select the COM method more accurately, but finds it difficult to make appropriate selections when the BU or TD method is preferable according to the MASE metric. This indicates that, although XGB identifies the conditions under which the COM method is preferable, the opposite is not true. We provide the detailed confusion matrices in Appendix D.

Table 4 Performance of the XGB classification method

As a final step in our analysis, we investigate the importance of the time series features used by the XGB classifier, i.e. the number of times each feature was considered by the method for making a prediction. We compute 32 features for each time series and select the top 5 of them for each reconciliation method. Figure 8 presents the features that are most frequently selected by the classifier. As seen, non-linearity at levels 1 and 2, stability at level 1, e-acf10 at level 1, and max-var-shift at level 0 are the most frequently used features and, therefore, the most critical variables for selecting a HF method. The distributions of these features vary across the selected HF methods. Non-linearity at levels 1 and 2 is among the strongest features in our data set, and stability at level 1 also plays an important role. We believe this is because promotions frequently and strongly affect the sales of the products, thus changing the volatility of their demand both over promotional and non-promotional periods (Abolghasemi et al., 2020). Maximum variance shift at level 0 is another feature that is frequently selected. This may be attributed to the sudden changes and spikes caused in the sales data by promotions. TD is selected for higher values of this metric, followed by COM and BU. Finally, e-acf10, which contains the sum of squared values of the first 10 autocorrelation coefficients of the error terms of the series at level 1, is also among the top selected features. One possible explanation is that sales are affected by promotions and therefore display high levels of variation during promotional periods. Therefore, even after fitting a forecasting model, some correlation will remain in the residual term.

Fig. 8: The five features most frequently used by the XGB classification method. The features are displayed separately for each HF method, i.e. when selected by the classifier for performing the reconciliation

In order to shed more light on the importance of the time series characteristics in selecting a HF method, we investigated the performance of a LR classifier, a statistical and more interpretable classification model, and focused on the distributions of the estimated coefficients of the multinomial logistic regression. Figure 9 shows the boxplots of the estimated coefficients for the top 5 features of the series, as determined earlier by XGB. We consider the COM method as the reference group in the LR model and show two groups of boxplots, including the BU/COM and TD/COM coefficient ratios. Since the parameter estimates are relative to the reference group, the estimated values on the y-axis show how much the log-odds of the corresponding method are expected to change for a unit change in the respective time series characteristic, all other characteristics in the model held constant. For example, the odds of selecting BU over COM increase by less than 0.24% when maximum variance shift at level 0 (max-var-shift-L0) increases by one unit. Interestingly, when the stability at level 1 (stability-L1) increases by one unit, the odds of selecting BU over COM increase by more than 1%, on average. The results are different for selecting TD over COM. As we can see, the odds of selecting TD over COM increase the most when the sum of the first 10 autocorrelations at level 1 (e-acf10-L1) increases by one unit.

Fig. 9: The coefficients of the top 5 features estimated by the LR classification method

We also trained a decision tree model on the data for all series and hierarchies to visualise the selection process implemented by a single decision tree; the resulting tree is included in Appendix D.

Note that when conducting this experiment we considered another training set-up for the classifiers where, apart from time series features, we also provided information about the correlation of the series, both across levels and within each level separately, similar to Nenova and May (2016). The results were similar to those reported in Table 3, and we therefore decided to exclude those features from our models for reasons of simplicity. In another experiment, we implemented the model of Nenova and May (2016) on our data set, using series correlations instead of time series characteristics as input predictors. We report the details and results in Appendix E.

Another aspect of the proposed approach that could be examined in a future study is that it focuses on selecting a single, most appropriate hierarchical forecasting method per hierarchy. However, numerous empirical studies have shown that combining forecasts from multiple forecasting methods can improve forecasting accuracy (Makridakis et al., 2020; Lemke and Gabrys, 2010; Atiya, 2020). Thus, replacing the classifiers with methods that combine various HF methods using appropriate weights becomes a promising alternative to CHF. Simple, equal-weighted combinations of standard HF methods have already proven useful in some settings (Mircetic et al., 2021; Abouarghoub et al., 2018), while feature-based forecast model averaging has demonstrated its potential to generate robust and accurate forecasts (Montero-Manso et al., 2020).

6 Conclusion

This paper introduced conditional hierarchical forecasting, a dynamic approach for effectively selecting the most accurate method for reconciling incoherent hierarchical forecasts. Inspired by the work done in the area of forecasting model selection and the advances reported in the field of machine learning, the proposed approach computes various features for the time series of the examined hierarchy and relates their values to the forecasting accuracy achieved by different hierarchical forecasting methods, such as bottom-up, top-down, and combination methods, using an appropriate classification method. Based on the lessons learned, and depending on the characteristics of time series in the hierarchy, the most suitable hierarchical forecasting method can be chosen and used to enhance overall forecasting performance.

We exploited various time series features at different levels of the hierarchy that represent the behavior of the series, and trained an extreme gradient boosting classification model to choose the most appropriate hierarchical forecasting method for each hierarchical time series based on the selected features. The accuracy of the proposed approach was evaluated using a large data set coming from the retail industry and compared to that of three popular hierarchical forecasting methods. Our results indicate that conditional hierarchical forecasting can produce significantly more accurate forecasts than the benchmarks considered, especially at the lower hierarchical levels. Thus, we suggest that, when dealing with hierarchical forecasting applications, selection should be expanded from forecasting models to reconciliation methods as well. We further validated our approach by experimenting on two additional, diverse data sets covering tourism demand and prison populations. These data sets have different properties and confirm our finding that the best reconciliation method can be selected according to the characteristics of the time series and the structure of the hierarchy. These factors may also impact the performance of a classifier.

Undoubtedly, our study displays some limitations that are worth investigating in future endeavors. The forecasting performance of the conditional hierarchical forecasting algorithm depends on the classification performance of the models used for its implementation and the data used for their training. For instance, if the available training data set is highly imbalanced, i.e. one hierarchical forecasting method is dominant over the others, the performance of the classification models can diminish. The class imbalance in the training set can become more severe if we consider a larger number of hierarchical forecasting methods, thus making the multi-class classification task more challenging. Developing an algorithm that can deal with these issues within the proposed framework could help further improve the overall performance of the proposed method. Another avenue for future research that seems a natural extension to our study is to use features of the hierarchy, e.g. correlations between series, number of levels, and number of series, as alternative inputs for selecting the most appropriate reconciliation method.