
Scenario tree construction driven by heuristic solutions of the optimization problem

  • Original Paper
  • Computational Management Science

Abstract

We present a new scenario-generation approach driven purely by the out-of-sample performance of a pool of solutions obtained by some heuristic procedure. We formulate a loss function that measures the discrepancy between the out-of-sample and in-sample (in-tree) performance of the solutions. By minimizing this (usually non-linear, non-convex) loss function for a given number of scenarios, we obtain an approximation of the underlying probability distribution with respect to the optimization problem. The approach is especially convenient when the optimization problem is solvable only for a very limited number of scenarios, but an out-of-sample evaluation of a solution is reasonably fast. Another possible use is the case of binary distributions, where classical scenario-generation methods based on fitting the scenario tree to the underlying distribution do not work.


Notes

  1. In a large majority of applications, this still means highly subjective descriptions of the random phenomena.

  2. In other methods, p denotes the probabilities of the particular scenarios. In our approach, however, we do not require them to sum to 1, so we avoid calling them probabilities.

  3. For example, the minimum Hamiltonian cycle problem, where even finding a Hamiltonian cycle in a given graph is NP-complete (Garey and Johnson 1979).

  4. For example, a “nearest neighbor” heuristic for a traveling salesman problem with stochastic travel times may generate only routes with relatively short distances between customers. Travel times on intermediate- and long-distance edges might then be assigned randomly (since none of those edges is included in the pool) and thus arbitrarily distort the final results.

  5. We would naturally prefer the perfect ranking. Then, for any subset of solutions, solving \(\max _x f(x, {\mathcal {T}})\) and \(\max _x f(x, \xi )\) would be equivalent tasks with respect to our objective. But requiring the perfect ranking of solutions is meaningless, since it implies the ability to solve \(\max _x f(x, \xi )\) directly; the scenario tree would then be redundant.

  6. Moreover, a correct in-tree classification of the optimal solution can be reached by luck for a random scenario tree. Thus, our computational test would have to be performed multiple times.

  7. Based on a visual assessment. In order to provide a numerical justification of that claim, we would need to define an appropriate metric for the comparison.

  8. That is, we start the whole procedure by generating a new pool of solutions with the subsequent loss function minimization and final optimization. In the case of sampling, we generate a new scenario tree.

  9. The time limit is set in wall-clock time. This corresponds to a CPU time that oscillates around 430 s due to the presence of 4 cores (not all the subprocesses can be parallelized).

  10. In the (2, 12) case, there are only \(2^{11} = 2048\) unique feasible solutions. Thus, it is not so surprising that some very high-quality (if not optimal) solution can be found among 1000 solutions, especially if they are not generated purely randomly.

  11. Based on our numerical tests; due to lack of space, we do not report the variance in the table.

  12. Or, generally, any difficult problem where the number of scenarios, and thus the number of constraints and variables, significantly influences the computational time.

  13. We could compute numerical approximations of the sub-gradients directly from the definition of the sub-gradient instead of deriving them analytically. That is, however, too expensive and significantly increases the computational time.

  14. The function \(\max (\cdot , \cdot )\) is non-differentiable at the point where the two arguments are equal. Hence, to be precise, we use the sub-gradient instead of the gradient; this has no practical impact from the computational point of view.

References

  • Ball MO, Colbourn CJ, Provan JS (1995) Network reliability. In: Ball MO, Magnanti TL, Monma CL, Nemhauser GL (eds) Network models, volume 7 of Handbooks in Operations Research and Management Science, chapter 11. North-Holland, Amsterdam

  • Bent RW, Van Hentenryck P (2004) Scenario-based planning for partially dynamic vehicle routing with stochastic customers. Oper Res 52(6):977–987

  • Birge J, Louveaux F (1997) Introduction to stochastic programming. Springer, New York

  • Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, Cambridge

  • Cario MC, Nelson B (1997) Modeling and generating random vectors with arbitrary marginal distributions and correlation matrix. Technical report, Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, IL

  • Dantzig GB, Infanger G (1993) Multi-stage stochastic linear programs for portfolio optimization. Ann Oper Res 45(1):59–76

  • Fairbrother J, Turner A, Wallace S (2017) Problem-driven scenario generation: an analytical approach to stochastic programs with tail risk measure. arXiv preprint arXiv:1511.03074

  • Garey M, Johnson D (1979) Computers and intractability, a guide to the theory of NP-completeness. Freeman, New York

  • Gendreau M, Jabali O, Rei W (2016) 50th anniversary invited article—future research directions in stochastic vehicle routing. Transp Sci 50(4):1163–1173

  • Haugland D, Wallace SW (1988) Solving many linear programs that differ only in the righthand side. Eur J Oper Res 37(3):318–324

  • Hendrix E, Tóth B (2010) Introduction to nonlinear and global optimization. Springer, New York

  • Higle JL, Sen S (1991) Stochastic decomposition: an algorithm for two-stage linear programs with recourse. Math Oper Res 16:650–669

  • Høyland K, Wallace SW (2001) Generating scenario trees for multistage decision problems. Manag Sci 47(2):295–307

  • Kall P, Wallace SW (1994) Stochastic programming. Wiley, Chichester

  • Kaut M (2014) A copula-based heuristic for scenario generation. Comput Manag Sci 11(4):503–516

  • Kaut M, Wallace SW (2007) Evaluation of scenario-generation methods for stochastic programming. Pac J Optim 3(2):257–271

  • Kaut M, Wallace SW, Vladimirou H, Zenios S (2007) Stability analysis of portfolio management with conditional value-at-risk. Quant Finance 7(4):397–409

  • King AJ, Wallace SW (2012) Modeling feasibility and dynamics, chapter 2. In: Modeling with stochastic programming. Springer series in operations research and financial engineering. Springer, New York

  • King AJ, Wallace SW, Kaut M (2012) Scenario-tree generation, chapter 4. In: Modeling with stochastic programming. Springer series in operations research and financial engineering. Springer, New York

  • Lurie PM, Goldberg MS (1998) An approximate method for sampling correlated random variables from partially-specified distributions. Manag Sci 44(2):203–218

  • Pflug GC (2001) Scenario tree generation for multiperiod financial optimization by optimal discretization. Math Program 89(2):251–271

  • Prochazka V, Wallace SW (2018) Stochastic programs with binary distributions: structural properties of scenario trees and algorithms. Comput Manag Sci 15(3):397–410


Author information

Correspondence to Vit Prochazka.

Appendices

Appendix A: Heuristic for loss function minimization

We provide a detailed description of the heuristic used in Example 2.3.1 for the minimization of the loss function. The method is based on sub-gradients of the loss function with respect to the decision variables, denoted \(g_{p}\) and \(g_{w}\). To obtain them, we derive (see footnote 13) \(\frac{dL}{df}\), the gradient of the loss function with respect to the in-sample evaluation function; further \(\frac{df}{de}\), the gradient of the in-sample evaluation function with respect to the exceeded capacity; and so on. By doing so, we get a chain of simple operations (square, multiplication, addition, \(\max (\cdot , \cdot )\) (see footnote 14), etc.), for which we have standard differentiation rules. Simple applications of the chain rule then yield the desired sub-gradients \(g_{p}\) and \(g_{w}\).
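To illustrate this chain-rule mechanics on a toy example (this is not the paper's actual loss function, only a stand-in built from the same kind of simple operations), consider \(L(e) = \sum _i \max (e_i, 0)^2\), differentiated step by step:

```python
import numpy as np

def max0_subgrad(e):
    # A valid sub-gradient of max(e, 0): 1 where e > 0, 0 elsewhere.
    # At e == 0 any value in [0, 1] is admissible; we pick 0.
    return (e > 0).astype(float)

def loss_and_subgrad(e):
    """Toy loss L(e) = sum(max(e_i, 0)^2), an illustrative stand-in
    for a chain of simple operations containing a max() term."""
    m = np.maximum(e, 0.0)           # the max() step (e.g. exceeded capacity)
    L = float(np.sum(m ** 2))        # square and sum
    g = 2.0 * m * max0_subgrad(e)    # chain rule: dL/de = 2*max(e,0)*d max/de
    return L, g
```

Each factor in `g` comes from one link of the chain, exactly as the analytical derivation in the paper composes \(\frac{dL}{df}\), \(\frac{df}{de}\), and so on.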

Let us note that in other optimization problems it might not be as straightforward to derive sub-gradients as in our case; more complicated functions might come into play. It could then require more effort in analytical derivation, or the use of numerical sub-gradients, which should always be available, at least in principle. Alternatively, one can choose a method from non-linear optimization theory that is not based on gradients [see a textbook on non-linear optimization, for instance Hendrix and Tóth (2010) or Boyd and Vandenberghe (2004)].

With the sub-gradient method, we take a small step in the direction of the negative sub-gradient, that is, in the direction of the steepest descent, at every iteration. Such an approach converges to a local minimum, which in the case of convex minimization is also the global minimum. Our problem is not convex, however, so we add two features that enhance exploration of the search space and help avoid terminating at a low-quality local minimum.

The first feature is the use of multiple starts of the procedure from different initial points. The second is the recognition and replacement of “useless” scenarios. We recognize these scenarios by evaluating their impact on the loss function, defined as the change in the loss function value when the particular scenario is removed from the scenario tree. If the change is very small, the scenario is not very useful, and we replace it by a new one (chosen randomly). The heuristic is summarized in Algorithm 1.

Algorithm 1 (pseudo-code figure)
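As a rough sketch of this procedure (the loss, its sub-gradients, and the scenario sampler are problem-specific and assumed to be supplied; for brevity, the replacement check is done once per start rather than at every iteration as in Algorithm 1):

```python
import numpy as np

def minimize_loss(loss_fn, subgrad_fn, draw_scenario, K,
                  n_starts=5, n_iters=200, alpha0=0.1, eps=1e-4, seed=0):
    """Multi-start sub-gradient heuristic in the spirit of Algorithm 1.

    loss_fn(p, w)    -> scalar loss for weights p and scenarios w
    subgrad_fn(p, w) -> sub-gradients (g_p, g_w)
    draw_scenario()  -> one fresh random scenario (a row of w)
    """
    rng = np.random.default_rng(seed)
    best_L, best_p, best_w = np.inf, None, None
    for _ in range(n_starts):                        # feature 1: multiple starts
        w = np.array([draw_scenario() for _ in range(K)])
        p = rng.random(K)                            # random initial weights
        for t in range(1, n_iters + 1):
            g_p, g_w = subgrad_fn(p, w)
            alpha = alpha0 / t                       # decreasing step size
            p = p - alpha * g_p                      # steepest-descent steps
            w = w - alpha * g_w
        # feature 2: replace "useless" scenarios whose removal
        # barely changes the loss function value
        L = loss_fn(p, w)
        for s in range(K):
            L_wo = loss_fn(np.delete(p, s), np.delete(w, s, axis=0))
            if abs(L - L_wo) < eps:
                w[s] = draw_scenario()
        L = loss_fn(p, w)
        if L < best_L:
            best_L, best_p, best_w = L, p, w
    return best_L, best_p, best_w
```

On a toy convex loss such as \(\sum p_i^2 + \sum w_{si}^2\), the sketch shrinks both \(p\) and \(w\) towards the minimizer; for the paper's non-convex loss, only the multi-start and replacement features give it a chance of escaping poor local minima.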

The parameters p and w are updated (rows 5 and 6 in Algorithm 1) by small steps in the direction of the steepest descent. The step sizes, denoted \(\alpha ^p\) and \(\alpha ^w\), decrease as the number of iterations grows. They are hyper-parameters that enter the procedure and need to be set carefully (steps that are too small lead to slow convergence; steps that are too large lead to oscillation or divergence). We set these steps based on trial runs.

The routine of scenario usefulness assessment is computationally expensive: it would require computing the loss function K times at every iteration. To avoid this, we run a pre-test in which we check the \(\infty \)-norm of the sub-gradient related to each scenario (row 9), which allows us to break the test as soon as we find one of its elements greater than \(\epsilon ^w\). Only if the sub-gradient is small do we proceed to the evaluation of the impact on the loss function.
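A minimal sketch of this early-breaking pre-test (function and variable names are our illustrative choices):

```python
def small_subgradient(g_w_s, eps_w):
    """Pre-test in the spirit of row 9 of Algorithm 1: scan the
    sub-gradient of one scenario and break out as soon as any element
    exceeds eps_w in absolute value, i.e. its infinity-norm is not small."""
    for g in g_w_s:
        if abs(g) > eps_w:
            return False   # scenario still "active"; skip the costly check
    return True            # only now evaluate the loss impact of removal
```

The early break is what makes the pre-test cheap: in the common case, the first large element is found quickly and the expensive loss re-evaluation is skipped.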

Initialization and replacement of scenarios are performed by a random draw from the original data \({\hat{w}}\); the weights p are set randomly. Algorithm 1 is just pseudo-code, not the most efficient implementation. Obviously, storing all \(L_{jm}\) is not necessary: at any time, we can store just the two best values, a local one for the j cycle and a global one for the m cycle. We can also compute the value of the loss function while evaluating the sub-gradients of the function.

A.1 Note

We do not claim that this is the most efficient heuristic for the problem; most likely it is not. It is based on a simple sub-gradient method. If the main focus were on developing the most efficient algorithm for this task, it could be built on more sophisticated algorithms for non-linear optimization, such as adaptive gradient methods, possibly with momentum, or methods based on second-order information.

We believe this can still serve as an inspiration for developing more efficient algorithms, if needed. We have pointed out some issues and suggested ways to overcome them. For some applications, the presented algorithm is “good enough”, as it was in our case. It is important to realize that even if we could guarantee the global optimum of the loss function, the whole framework would still be a heuristic in the sense that there are no guarantees relative to other feasible solutions that are not included in the pool.

Appendix B: Heuristic for (in)feasibility classification

The main issue when considering (in)feasibility is the minimization of the loss function (15). The problem is that we cannot utilize the sub-gradients of the function \(u(x, {\mathcal {T}})\): the function is not continuous and returns only two values, which means its derivative is 0 wherever it is defined. One needs either to use methods for non-linear optimization that do not rely on derivatives, or to approximate the function u with some differentiable function. The latter approach is used for the computational test in Sect. 2.5.

In our computational test, the classification of (in)feasibility, i.e., the approximation of \(u(x, {\mathcal {T}})\), is performed by a simple neural network with one hidden layer, the sigmoid as the activation function in both layers, and the sum of squared errors as the error measure. As input we use the vector \(e_s\) scaled to (0, 1). The neural network is trained on the training pool of heuristically obtained solutions. This simple model works well in our case and correctly classifies the (in)feasibility of solutions.
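A minimal sketch of such a network (the class name, layer sizes, learning rate, and training data below are our illustrative choices, not values from the paper):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class FeasibilityNet:
    """One-hidden-layer network, sigmoid activations in both layers,
    trained on the sum of squared errors by plain gradient descent."""

    def __init__(self, n_in, n_hidden=8, lr=0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 1.0, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 1.0, (n_hidden, 1))
        self.b2 = np.zeros(1)
        self.lr = lr

    def forward(self, E):
        # E: rows are the scaled e_s vectors
        self.H = sigmoid(E @ self.W1 + self.b1)
        self.out = sigmoid(self.H @ self.W2 + self.b2)
        return self.out

    def train_step(self, E, y):
        # one full-batch gradient step; returns the pre-update error
        out = self.forward(E)
        err = float(np.sum((out - y[:, None]) ** 2))
        d_out = 2.0 * (out - y[:, None]) * out * (1.0 - out)
        d_h = (d_out @ self.W2.T) * self.H * (1.0 - self.H)
        self.W2 -= self.lr * self.H.T @ d_out
        self.b2 -= self.lr * d_out.sum(axis=0)
        self.W1 -= self.lr * E.T @ d_h
        self.b1 -= self.lr * d_h.sum(axis=0)
        return err

    def input_grad(self, E, y):
        # gradient of the error w.r.t. the input e_s vectors: this is
        # what lets the error be pushed further onto the scenarios w_si
        out = self.forward(E)
        d_out = 2.0 * (out - y[:, None]) * out * (1.0 - out)
        d_h = (d_out @ self.W2.T) * self.H * (1.0 - self.H)
        return d_h @ self.W1.T
```

The `input_grad` method mirrors the key point of the next paragraph: back-propagation does not stop at the weights but also yields a sub-gradient with respect to the input vector itself.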

The main advantage of this approach is that the neural network can back-propagate the sub-gradients of the error (on misclassified solutions) through the weights of the network to the sub-gradient of the \(e_s\) vector and further to the scenarios \(w_{si}\). Thus, we can, in principle, use Algorithm 1 to find the scenario tree.

About this article

Cite this article

Prochazka, V., Wallace, S.W. Scenario tree construction driven by heuristic solutions of the optimization problem. Comput Manag Sci 17, 277–307 (2020). https://doi.org/10.1007/s10287-020-00369-2
