A mixed integer linear program to compress transition probability matrices in Markov chain bootstrapping

  • Original - OR Modeling/Case Study
Annals of Operations Research

Abstract

Bootstrapping time series is one of the most widely recognized tools for studying the statistical properties of an evolving phenomenon. An important class of bootstrapping methods is based on the assumption that the sampled phenomenon evolves according to a Markov chain. This assumption does not apply when the process takes values in a continuous set, as frequently happens with time series related to economic and financial phenomena. In this paper we apply Markov chain theory to bootstrap continuous-valued processes, starting from a suitable discretization of the support that provides the state space of a Markov chain of order \(k \ge 1\). Even for small k, the transition probability matrix, which can have up to \(n^{k}\) rows when the support is split into n states, is generally too large and, in many practical cases, may incorporate much more information than is really required to replicate the phenomenon satisfactorily. The paper studies the problem of compressing the transition probability matrix while preserving the “law” characterising the process that generates the observed time series, so that the bootstrapped series maintain the typical features of the observed time series. For this purpose, we formulate a partitioning problem on the set of rows of such a matrix and propose a mixed integer linear program specifically tailored to this problem. We also provide an empirical analysis by applying our model to the time series of Spanish and German electricity prices, and we show that, in these medium-size real-life instances, the bootstrapped time series reproduce the typical features of the observed ones.
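
The bootstrap outlined in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the series is assumed to be already discretized into integer states, transition probabilities are estimated by relative frequencies of observed transitions, and simulation starts from the first observed k-state. The optional `partition` argument (a hypothetical dict mapping each observed k-state to a class label) pools the rows of k-states in the same class, which shows how a given row partition compresses the matrix; the mixed integer linear program that chooses the partition is not reproduced. When the simulated path reaches a k-state with no estimated row, which can happen after pooling, the sketch borrows the row of a randomly chosen observed k-state; this rule is also an assumption.

```python
from collections import Counter, defaultdict

import numpy as np


def transition_counts(states, k, partition=None):
    """Count observed transitions from each k-state (row) to the next state.

    If `partition` is given (hypothetical format: dict mapping each observed
    k-state tuple to a class label), k-states in the same class are pooled
    into a single row of the compressed matrix.
    """
    counts = defaultdict(Counter)
    # The last observed k-state has no successor, so it contributes nothing.
    for t in range(len(states) - k):
        kstate = tuple(states[t:t + k])
        row = partition[kstate] if partition is not None else kstate
        counts[row][states[t + k]] += 1
    return counts


def markov_bootstrap(states, k, length, partition=None, seed=None):
    """Simulate `length` states from the empirical order-k Markov chain."""
    if len(states) <= k:
        raise ValueError("need more than k observations")
    rng = np.random.default_rng(seed)
    counts = transition_counts(states, k, partition)
    # Observed k-states that have at least one successor (used for restarts).
    starts = [tuple(states[t:t + k]) for t in range(len(states) - k)]

    def row_of(kstate):
        if partition is None:
            return kstate
        return partition.get(kstate)  # None if the k-state is unclassified

    path = list(starts[0])  # assumption: start from the first observed k-state
    while len(path) < length:
        row = row_of(tuple(path[-k:]))
        if row not in counts:
            # Assumption: no estimated row, so borrow the row of a random
            # observed k-state instead.
            row = row_of(starts[rng.integers(len(starts))])
        successors = list(counts[row])
        freqs = np.array([counts[row][s] for s in successors], dtype=float)
        # Relative frequencies serve as the estimated transition probabilities.
        path.append(successors[rng.choice(len(successors), p=freqs / freqs.sum())])
    return path[:length]
```

For instance, with `states = [0, 1, 2, 0, 1, 2, 0, 1, 2, 0]` and `k = 2`, every observed 2-state has a single successor, so `markov_bootstrap(states, 2, 8)` reproduces the cycle `[0, 1, 2, 0, 1, 2, 0, 1]`; a partition that merges `(0, 1)` and `(1, 2)` into one class lets the chain move from either of them to state 0 or state 2.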

Notes

  1. The size of the solution space is reduced from \(B\left( n^{k}\right) \) to \([B\left( n\right) ]^{k}\), where \(B\left( m\right) \) denotes the m-th Bell number, i.e., the number of partitions of a set of m elements; here n is the number of states of the Markov chain of order k, so that \(n^{k}\) is the number of its k-states.

  2. For the sake of simplicity, we shall not introduce a specific notation for the estimates of the transition probabilities here.

  3. The matrices are available from the authors upon request.

  4. For both Spain and Germany, the last observed 2-state is excluded from the computation of the cardinality of \({\mathcal {O}}_{2}\).

  5. In this case, no aggregation is performed on the rows of the transition probability matrix, which therefore remains the original one.

  6. The matrices are available from the authors upon request.
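
The reduction described in note 1 can be made concrete with a short Python sketch that computes Bell numbers with the Bell triangle and compares \(B\left( n^{k}\right) \) with \([B\left( n\right) ]^{k}\). The values n = 12 and k = 2 are illustrative assumptions, chosen to match the 12 initial intervals of Appendix 2 and the 2-states mentioned in note 4.

```python
def bell_numbers(m):
    """Return [B(0), B(1), ..., B(m)] using the Bell triangle."""
    bells = [1]
    row = [1]
    for _ in range(m):
        # Each row starts with the last entry of the previous row...
        new_row = [row[-1]]
        # ...and each further entry is its left neighbour plus the entry
        # above that neighbour.
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])  # the first entry of row i is B(i)
    return bells


n, k = 12, 2  # illustrative values (assumption)
bells = bell_numbers(n ** k)
full = bells[n ** k]     # B(n^k): partitions of all n^k k-states
reduced = bells[n] ** k  # [B(n)]^k
print(f"B({n ** k}) has {len(str(full))} digits")
print(f"[B({n})]^{k} = {reduced}")
```

Python's arbitrary-precision integers keep every Bell number exact, so the two sizes can be compared directly.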

Acknowledgments

The fourth and fifth authors gratefully acknowledge partial support from the Spanish Ministry of Science and Technology through grant number MTM2013-46962-C2-1-P.

Author information

Correspondence to Andrea Scozzari.

Appendices

Appendix 1: Trend and weekly seasonality removal

The estimation of the exponential trend and weekly seasonality is based on the following model:

$$\begin{aligned} e_{t}^{(c)}=\exp (rt+\eta _{1}{\mathbb {I}}_{1}(t)+\eta _{2}{\mathbb {I}}_{2}(t)+\eta _{3}{\mathbb {I}}_{3}(t)+\eta _{4}{\mathbb {I}}_{4}(t)+\eta _{5}{\mathbb {I}}_{5}(t)+\eta _{6}{\mathbb {I}}_{6}(t)+\eta _{7}{\mathbb {I}}_{7}(t)+\varepsilon _{t})\text {,} \end{aligned} \tag{14}$$

where \(e_{t}^{(c)}\) is the raw (original) price at time t; for \(j=1,\dots ,7\), \({\mathbb {I}}_{j}(t)\) is the dummy variable signalling whether t is the jth day of the week and \(\eta _{j}\) is its coefficient; r is the growth rate; and \(\varepsilon _{t}\) is the error term. Taking the natural logarithm of both sides of (14) yields the linear model

$$\begin{aligned} z_{t}=rt+\eta _{1}{\mathbb {I}}_{1}(t)+\eta _{2}{\mathbb {I}}_{2}(t)+\eta _{3} {\mathbb {I}}_{3}(t)+\eta _{4}{\mathbb {I}}_{4}(t)+\eta _{5}{\mathbb {I}}_{5}(t)+\eta _{6}{\mathbb {I}}_{6}(t)+\eta _{7}{\mathbb {I}}_{7}(t)+\varepsilon _{t}\text {,} \end{aligned}$$

where \(z_{t}=\ln e_{t}^{(c)}\).

For estimation purposes, we assume that the usual linear regression hypotheses on the errors \(\varepsilon _{t}\) hold. The OLS estimates of r and \(\eta _{j}\), \(j=1,\dots ,7\), are significant at the \(5\,\%\) level (see Table 4).

Table 4 Coefficient estimates of the exponential regression model of trend and weekly seasonality applied to the series of electricity prices of Spain and Germany
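
The estimation step can be sketched with NumPy's least-squares solver. This is a minimal illustration, not the authors' code: it returns only the point estimates \(\hat{r}\) and \(\hat{\eta }_{j}\) (no significance tests), and it assumes, hypothetically, that day t = 1 is weekday j = 1, so that \({\mathbb {I}}_{j}(t)=1\) exactly when \((t-1) \bmod 7 = j-1\). Because the seven dummies sum to one on every day, the model needs no separate intercept, and the design matrix has full column rank as long as the series covers more than one week.

```python
import numpy as np


def weekday_dummies(times):
    """Rows of I_1(t), ..., I_7(t).

    Assumption (hypothetical convention): day t is weekday ((t - 1) mod 7) + 1,
    i.e. t = 1 falls on day j = 1.
    """
    times = np.asarray(times)
    dummies = np.zeros((times.size, 7))
    dummies[np.arange(times.size), (times - 1) % 7] = 1.0
    return dummies


def fit_trend_seasonality(raw_prices):
    """OLS estimates (r_hat, eta_hat) for z_t = r t + sum_j eta_j I_j(t) + eps_t."""
    z = np.log(np.asarray(raw_prices, dtype=float))  # z_t = ln e_t^(c)
    t = np.arange(1, z.size + 1)
    # No intercept column: the seven dummies already sum to one in every row.
    X = np.column_stack([t, weekday_dummies(t)])
    coef, *_ = np.linalg.lstsq(X, z, rcond=None)
    return coef[0], coef[1:]
```

To check the sketch, one can simulate prices from (14) with known r and \(\eta _{j}\) and verify that `fit_trend_seasonality` recovers them approximately.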

To remove the exponential trend and weekly seasonality from the original series, we define the series of prices \( e(T)=(e_{1},\dots ,e_t, \dots , e_{T})\), where

$$\begin{aligned} e_{t}= & {} \exp [z_{t}-(\hat{r}t+\hat{\eta }_{1}{\mathbb {I}}_{1}(t)+\hat{\eta }_{2} {\mathbb {I}}_{2}(t)+\hat{\eta }_{3}{\mathbb {I}}_{3}(t)+\hat{\eta }_{4}{\mathbb {I}} _{4}(t)+\hat{\eta }_{5}{\mathbb {I}}_{5}(t)+\hat{\eta }_{6}{\mathbb {I}}_{6}(t)\\&+\,\hat{ \eta }_{7}{\mathbb {I}}_{7}(t))]\text {, }t=1,\dots ,T\text {.} \end{aligned}$$

The series e(T) is the input of the bootstrapping method, whose output is the bootstrapped series \(x(\ell )=(x_{1},\ldots ,x_{\ell })\). To reintroduce the exponential trend and weekly seasonality into \(x(\ell )\), we multiply each point \(x_{j}\) by \(\exp (\hat{r}j+\hat{\eta }_{1}{\mathbb {I}}_{1}(j)+\hat{\eta }_{2}{\mathbb {I}}_{2}(j)+\hat{\eta }_{3}{\mathbb {I}}_{3}(j)+\hat{\eta }_{4}{\mathbb {I}}_{4}(j)+\hat{\eta }_{5}{\mathbb {I}}_{5}(j)+\hat{\eta }_{6}{\mathbb {I}}_{6}(j)+\hat{\eta }_{7}{\mathbb {I}}_{7}(j))\), \(j=1,\dots ,\ell \).
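
Removing and reintroducing the deterministic component amounts to dividing and multiplying by the same exponential factor. The sketch below assumes that the estimates `r_hat` and `eta_hat` are already available (for example from an OLS fit of the log-linear model) and uses a hypothetical weekday convention in which day t = 1 is weekday j = 1; it is an illustration, not the authors' implementation.

```python
import numpy as np


def deterministic_part(times, r_hat, eta_hat):
    """r_hat * t + sum_j eta_hat_j * I_j(t), for t = 1, 2, ...

    Assumption (hypothetical convention): t = 1 falls on weekday j = 1.
    """
    times = np.asarray(times)
    return r_hat * times + np.asarray(eta_hat)[(times - 1) % 7]


def remove_trend_seasonality(raw_prices, r_hat, eta_hat):
    """e_t = exp[z_t - (r_hat t + sum_j eta_hat_j I_j(t))], t = 1, ..., T."""
    raw = np.asarray(raw_prices, dtype=float)
    t = np.arange(1, raw.size + 1)
    return np.exp(np.log(raw) - deterministic_part(t, r_hat, eta_hat))


def restore_trend_seasonality(x, r_hat, eta_hat):
    """Multiply each bootstrapped point x_j by the exponential factor, j = 1, ..., l."""
    x = np.asarray(x, dtype=float)
    j = np.arange(1, x.size + 1)
    return x * np.exp(deterministic_part(j, r_hat, eta_hat))
```

Since both functions index time from 1, applying `restore_trend_seasonality` to the output of `remove_trend_seasonality` recovers the raw prices up to rounding.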

Appendix 2: Initial states, or intervals

Table 5 reports the 12 intervals of the initial partition of the support \([\alpha ,\beta ]\) of the Spanish and German series after removal of the exponential trend and weekly seasonality.

Table 5 Elements of the initial partition of the support of the exponentially detrended and deseasonalized series of electricity prices of Spain and Germany
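
Mapping the detrended and deseasonalized values to the initial states only requires the interval boundaries. Since the bounds in Table 5 are not reproduced here, the sketch takes a generic array of edges \(\alpha =a_{0}<a_{1}<\dots <a_{n}=\beta \) and, as an assumption, treats every interval as closed on the left and open on the right, except the last one, which also contains \(\beta \).

```python
import numpy as np


def to_states(values, edges):
    """Map each value to the 0-based index of the interval containing it.

    edges = [alpha, a_1, ..., a_{n-1}, beta]; intervals are assumed to be
    [a_i, a_{i+1}), with the last one closed at beta (hypothetical convention).
    """
    values = np.asarray(values, dtype=float)
    edges = np.asarray(edges, dtype=float)
    if np.any(values < edges[0]) or np.any(values > edges[-1]):
        raise ValueError("values must lie in the support [alpha, beta]")
    # side="right" counts the edges that are <= each value.
    idx = np.searchsorted(edges, values, side="right") - 1
    # A value equal to beta would get index n; put it in the last interval.
    return np.minimum(idx, len(edges) - 2)
```

For example, `to_states(e, np.linspace(e.min(), e.max(), 13))` would split a series `e` into 12 equal-width intervals; equal widths are only an illustration and need not match the partition in Table 5.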

About this article

Cite this article

Cerqueti, R., Falbo, P., Pelizzari, C. et al. A mixed integer linear program to compress transition probability matrices in Markov chain bootstrapping. Ann Oper Res 248, 163–187 (2017). https://doi.org/10.1007/s10479-016-2181-9

  • DOI: https://doi.org/10.1007/s10479-016-2181-9
