Rule-based Strategies for Dynamic Life Cycle Investment

In this work, we consider rule-based investment strategies for managing a defined contribution saving scheme under the Dutch pension fund testing model. We find that dynamic rule-based investment can outperform traditional static strategies, in the sense that the pensioner achieves the target retirement income with higher probability and limits the shortfall when the target is not met. Compared with the popular dynamic programming technique, the rule-based strategy has a more stable asset allocation over time and avoids excessive transactions, which may be hard to explain to the investor. We also study a strategy that combines a rule-based target with dynamic programming. Another key feature of this work is that there is no risk-free asset in our setting. Instead, a matching portfolio is introduced so that the investor can avoid unnecessary risk.


Introduction
Nowadays, many people invest their retirement savings in a defined contribution pension scheme. In such a scheme, the contributions are agreed upon and are, e.g., a percentage of one's salary. The pension, however, is uncertain as it depends on the returns on investment. At retirement, the accumulated wealth is converted to a pension income that intends to replace a proportion of the investor's income, typically about 70%, which is referred to as the replacement ratio. In this paper, we propose a dynamic strategy that optimally steers the investor towards a replacement ratio target.
Our dynamic strategy reduces risk after several years of good returns on investment. It presumes that upward potential comes with downside risk. Our pension investor is only interested in reaching her replacement ratio target: missing the target is considered downside risk, and she is indifferent between any two outcomes above the target. We will show that, in this sense, the designed dynamic strategy outperforms static life cycle strategies. By decreasing risk after several good years, our dynamic strategy prevents unnecessary risk taking.
A well-known static life cycle strategy is Bogle's rule (Bogle), which prescribes investing 100% minus one's age in risky assets. Decreasing risk over the life cycle in this way is called a glide path. When the glide path is known in advance up to retirement, the strategy is static and does not adjust as events unfold. Therefore, static strategies may take unnecessary risk when returns on investment are better than anticipated. See Arnott, Sherrerd, and Wu, and Graf for a discussion of the drawbacks of static life cycle strategies. The strategy we propose is also rule-based, but it is dynamic because the prescribed rule depends on events that have yet to unfold.
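To make the contrast with dynamic strategies concrete, the sketch below computes Bogle's rule as a glide path for the investor considered later, who saves from age 26 to age 67. The function names are illustrative and do not come from the paper. The clipping to [0, 1] mirrors the no-short-selling and no-borrowing constraint. The whole schedule is fixed before any return is observed, which is exactly what makes the strategy static.

```python
def bogle_allocation(age: float) -> float:
    """Fraction of wealth in risky assets under Bogle's rule: (100 - age)%.

    The result is clipped to [0, 1], i.e., no short selling and no borrowing.
    """
    return min(max((100.0 - age) / 100.0, 0.0), 1.0)


def bogle_glide_path(start_age: int, retirement_age: int) -> list:
    """Static glide path: one allocation per year, all fixed at the start,
    before any return on investment has been observed."""
    return [bogle_allocation(age) for age in range(start_age, retirement_age + 1)]


path = bogle_glide_path(26, 67)
print(f"age 26: {path[0]:.2f}, age 67: {path[-1]:.2f}, years: {len(path)}")
# age 26: 0.74, age 67: 0.33, years: 42
```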
In the literature, dynamic strategies are often studied in the context of dynamic programming (Bellman). Dynamic programming optimizes the investment strategy backwards in time. At each step, it optimizes the decision for the coming period, given that all subsequent decisions are already taken optimally. Merton was the first to apply dynamic programming to an asset allocation problem with two assets, a risky and a risk-free asset, and he also allowed for consumption during the investment period. Optimal decisions were based on a constant relative risk aversion (CRRA) utility function. Merton showed that the optimal strategy continuously rebalances to a constant allocation.
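For reference, the constant allocation in Merton's continuous-time setting has a well-known closed form. The following symbols are introduced here for illustration only: a single risky asset with expected return $\mu$ and volatility $\sigma$, a risk-free rate $r_f$, and CRRA utility with relative risk aversion $\gamma$. Under these assumptions, the optimal fraction of wealth in the risky asset is

$$\alpha^{*} = \frac{\mu - r_f}{\gamma\,\sigma^{2}}.$$

This fraction depends on neither time nor wealth, which is why continuous rebalancing keeps the allocation constant.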
The literature on optimal asset allocation is very rich, and we cite here some contributions that influenced our work. Li and Ng introduced mean-variance strategies with respect to a wealth target. The wealth target allows the investor to identify a surplus: wealth up to the target may be invested in stocks, and any remainder is invested in the risk-free asset. Zhang et al. solved a similar, but utility-based, problem and combined dynamic programming with the least squares Monte Carlo method. That paper prescribed upper and lower bounds for the wealth, reflecting the view that upward potential comes with downside risk. Terminal wealth is steered towards a desired range by investing in the risk-free asset the difference between the risk-free-discounted value of the upper bound and the current wealth. Forsyth and Vetzal also applied dynamic programming and used a PDE solver to solve a so-called time-consistent mean-variance problem, meaning that similar mean-variance problems are solved at future times. In addition to the mean-variance problem, which balances the mean and variance of returns, they studied a problem with a fixed wealth target. To reduce risk, both Forsyth and Vetzal and Zhang et al. proposed investing excess wealth in a risk-free asset. Similarly, the rule-based strategies introduced in this paper invest excess wealth in a so-called matching portfolio. Compared to static strategies, the distributions of outcomes are then more centered around the target value, and the probability mass below the target value becomes smaller.
Besides its many positive aspects, dynamic programming and its resulting strategies also have some drawbacks. First, dynamic programming is computationally rather intensive. Second, the corresponding investment decisions can be sensitive to small changes in parameters and underlying assumptions. As a result, the allocation may fluctuate over time and cause large turnovers that are hard to justify from a practical perspective. Intuitively defined rules typically do not suffer from these drawbacks. Moreover, it is not straightforward to apply dynamic programming in the pension setting, because an investor's replacement ratio target often depends on future inflation, which in turn also influences future contributions.
Rule-based dynamic strategies fall between the static and dynamic programming paradigms. When well constructed, they aim for the best of both worlds. As shown by Basu, Byrne, and Drew, even simple rule-based strategies that reduce risk halfway through the life cycle can outperform static life cycle strategies. Compared to Basu, Byrne, and Drew, our rule-based strategies can reduce risk annually, and they consider the market price of future pension payments instead of a wealth target. In addition to the rule-based strategies, we also combine a rule-based strategy with dynamic programming in an integrated approach.
2 The optimal asset allocation problem

Model setting
To demonstrate the practical value of the rule-based strategies, we consider a specific pension investor and use typical retirement data from the Netherlands. At $t = 0$, the 26-year-old investor starts saving up to retirement at time $t = T$, which here coincides with a retirement age of 67 years. She intends to replace 70% of her income by her pension (including government allowances for old age). In practice, an investor might want to insure longevity risk or employ advanced withdrawal strategies. However, Blanchett, Kowara, and Chen illustrate that simple withdrawal strategies can perform well, e.g., strategies based on an annuity with a maturity roughly equal to the investor's life expectancy. Since we focus on accumulating wealth before retirement, we simply assume that the investor buys an annuity indexed with expected inflation. This is a bond with equal annual payoffs, apart from the indexation for expected inflation, for a period of $N$ (say, 20) years after retirement. Whichever withdrawal strategy an investor might follow, we assume that this annuity gives a good estimate of at least her income in the first year after retirement. It thereby indicates the extent to which she can replace 70% of her salary with a pension.
The investor can invest her wealth $W_t$ in a risky, equity-like asset, called the return portfolio, or in a safe, bond-like asset with annual payoffs during retirement, called the matching portfolio. In our setting, the strategy uses the matching portfolio to protect current gains. The matching portfolio grows with expected, rather than realized, inflation, and it therefore also carries risk. Put differently, we assume that the investor does not hedge inflation risk with inflation-protected securities. The market for such securities is illiquid, and strategies that hedge against inflation are not straightforward to follow in practice (Martellini, Milhau, and Tarelli). Finally, we assume that there is no risk-free asset to invest in.
The investor manages her portfolio annually. Decisions, contributions and pension payments are made in discrete time, which runs up to retirement, from $t = 0$ to $t = T$. The pension payments start at $t = T$ and run up to $t = T + N - 1$. At time $t \le T$ before retirement, she invests a fraction $\alpha_t$ of her wealth $W_t$ in the return portfolio. The investor is not allowed to short-sell assets or borrow money, so that

$$0 \le \alpha_t \le 1. \tag{1}$$

In the dynamic programming literature, $\alpha_t$ is referred to as the control, since decisions intend to give the investor control over the outcome. A strategy maps the information $Z_t$ available at time $t$, e.g., past returns and current wealth $W_t$, to the desired allocation:

$$\alpha_t = \alpha_t(Z_t). \tag{2}$$

Here, $Z_t$ is adapted to the filtration $\mathcal{F}_t$ of the underlying stochastic processes. Before time $t$, the information $Z_t$ is not yet available, and $\alpha_t$ is thus a stochastic quantity. In a static strategy, such as Bogle's rule, $\alpha_t$ depends only on time and is known in advance. In other words, it is deterministic, even before the information $Z_t$ becomes available. In practice, risk is reduced towards retirement, meaning that $\alpha_t$ typically decreases over time.
Just before rebalancing, the investor makes a contribution $c_t$ to the portfolio. Each contribution is an age-dependent percentage $p_t$ (see Table 3) of the salary $s_t$ that she earned in the period from $t-1$ up to $t$, i.e., $c_t = p_t s_t$. We assume that the investor's salary $s_t$ follows a deterministic career path, meaning that it increases with age. In addition, the salary increases stochastically with the wage inflation $w_t$; see Appendix A.
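The contribution mechanics can be sketched as follows. Table 3 is not reproduced here, so the flat contribution percentage, the real career growth and the wage inflation path below are hypothetical placeholders, not the paper's inputs. The salary combines a deterministic career increment with one simulated wage inflation path, and each contribution is the age-dependent percentage times the salary.

```python
def salary_path(s0: float, career_growth: list, wage_inflation: list) -> list:
    """Salaries s_0, s_1, ..., combining a deterministic career increment g_t
    with stochastic wage inflation w_t (one simulated path)."""
    salaries = [s0]
    for g, w in zip(career_growth, wage_inflation):
        salaries.append(salaries[-1] * (1.0 + g) * (1.0 + w))
    return salaries


def contributions(salaries: list, pct: list) -> list:
    """Contributions c_t = p_t * s_t for age-dependent percentages p_t."""
    return [p * s for p, s in zip(pct, salaries)]


# Hypothetical inputs: 41 years (age 26 to 67), 1% real career growth,
# a constant 2.5% wage inflation path and a flat 15% contribution rate.
T = 41
s = salary_path(30_000.0, [0.01] * T, [0.025] * T)
c = contributions(s, [0.15] * (T + 1))
print(len(c), round(c[0], 2))
```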
The investor's objective is to achieve her 70% replacement ratio target at retirement without incurring too much downside risk. The replacement ratio at retirement, $RR_T$, is given by

$$RR_T = \frac{W_T}{M_T} \left( \frac{1}{T} \sum_{t=1}^{T} s_t \prod_{\tau=t+1}^{T} (1+\pi_\tau) \right)^{-1}. \tag{3}$$

Here, the second term divides by the investor's average wage in nominal amounts, indexed with inflation $\pi_t$ up to retirement at $t = T$. $M_t$ is the market value factor that discounts $N$ future pension payments, indexed by expected inflation, to time $t \le T$:

$$M_t = \mathbb{E}_t\left[ \sum_{\tau=T}^{T+N-1} \frac{\prod_{u=T+1}^{\tau} (1+\pi_u)}{\left(1+r_t^{\tau-t}\right)^{\tau-t}} \right], \tag{4}$$

where $\mathbb{E}_t$ is the expectation conditional on $\mathcal{F}_t$ (i.e., conditional on the information available at time $t$), and $r_t^{\tau-t}$ represents the market rate that discounts payments $\tau - t$ years in the future back to the present time. Using the market value factor $M_T$ at retirement, the first term in (3) converts the accumulated wealth $W_T$ into $N$ annual income payments indexed for expected future inflation.
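The sketch below computes the market value factor and the replacement ratio under simplifying assumptions. The market value factor values $N$ annual payments from $T$ to $T+N-1$, indexed with expected inflation, and discounts them to time $t$. The replacement ratio divides the annual income $W_T/M_T$ by the average salary indexed with realized inflation up to retirement. The sketch assumes a flat, deterministic yield curve (`rate`) and a constant expected inflation after retirement (`exp_infl`). In the paper, both come from the stochastic model of Section 2.2. The function names are hypothetical.

```python
def market_value_factor(t: int, T: int, N: int, rate: float, exp_infl: float) -> float:
    """Market value factor M_t under simplifying assumptions: N annual payments
    at tau = T, ..., T+N-1, indexed with constant expected inflation after T,
    discounted to time t <= T at a flat annual rate."""
    return sum(
        (1.0 + exp_infl) ** (tau - T) / (1.0 + rate) ** (tau - t)
        for tau in range(T, T + N)
    )


def replacement_ratio(W_T: float, M_T: float, salaries: list, inflation: list) -> float:
    """RR_T: pension income W_T / M_T divided by the average salary, where each
    salary is indexed with realized inflation up to retirement.

    salaries[k] is the salary s_{k+1} earned between k and k+1 (k = 0..T-1);
    inflation[k] is the realized inflation pi_{k+1}.
    """
    T = len(salaries)
    indexed_total = 0.0
    for k, s in enumerate(salaries):
        factor = 1.0
        for pi in inflation[k + 1:]:  # pi_{k+2}, ..., pi_T
            factor *= 1.0 + pi
        indexed_total += s * factor
    return (W_T / M_T) / (indexed_total / T)


# Hypothetical example: T = 41, N = 20, 2% flat rate, 2% expected inflation.
M_T = market_value_factor(41, 41, 20, 0.02, 0.02)
print(M_T)  # 20.0: equals N when rate == exp_infl and t == T
```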
To measure whether a strategy achieves the investor's objective, we use a utility function $U$. Whenever a decision is to be taken, the strategy intends to maximize the following expression:

$$\max_{\alpha_t, \ldots, \alpha_{T-1}} \mathbb{E}\left[U(Z_T) \mid \mathcal{F}_t\right], \tag{5}$$

where $\mathcal{F}_t$ represents the current market information, $\alpha_t$ is as in (2), and $Z_T$ is a vector with outcomes including the terminal replacement ratio. Although other choices are possible, we choose $U(\cdot)$ to be the negative of the shortfall below the investor's target replacement ratio of 70%:

$$U(Z_T) = -\max\left(0.7 - RR_T,\; 0\right), \tag{6}$$

so that there is no shortfall if the replacement ratio ends above 70%. Note that the expected shortfall is not conditional on a shortfall occurring: it averages over all scenarios, including those in which the target is met. Therefore, we additionally evaluate a strategy's performance using the 10% conditional value at risk $\mathrm{CVaR}_{0.1}(RR_T)$ of the replacement ratio. This is the expectation of the 10% worst-case outcomes, defined by

$$\mathrm{CVaR}_{0.1}(RR_T) = \mathbb{E}\left[RR_T \,\middle|\, RR_T \le F^{-1}_{RR_T}(0.1)\right], \tag{7}$$

where $F^{-1}_{RR_T}(p)$ is the inverse cumulative distribution function of the terminal replacement ratio $RR_T$. It gives the $p$-th quantile, below which the worst-case outcomes lie.
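Given a set of simulated terminal replacement ratios, the expected shortfall below the 70% target and the 10% CVaR can be estimated empirically, as sketched below. Averaging the worst 10% of the sorted outcomes is a common sample approximation of the CVaR. The function names and the toy sample are illustrative, not results from the paper.

```python
import math


def expected_shortfall(rr: list, target: float = 0.7) -> float:
    """Sample mean of max(target - RR_T, 0), the shortfall below the target.

    The average runs over all scenarios, including those without shortfall."""
    return sum(max(target - x, 0.0) for x in rr) / len(rr)


def cvar(rr: list, level: float = 0.1) -> float:
    """Empirical CVaR: mean of the worst `level` fraction of outcomes."""
    ordered = sorted(rr)
    # Number of tail scenarios; the small epsilon guards against float round-off.
    k = max(1, math.floor(level * len(ordered) + 1e-9))
    return sum(ordered[:k]) / k


sample = [0.55, 0.62, 0.68, 0.71, 0.74, 0.78, 0.80, 0.85, 0.90, 0.95]
print(round(expected_shortfall(sample), 4), cvar(sample))
```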

Governing stochastic model
For general applicability, we require that the designed strategies are not defined in terms of the parameters of the governing stochastic model. The strategies can therefore also be applied when a different governing stochastic model is used. We merely assume that the governing stochastic model can be simulated by means of Monte Carlo simulation. To illustrate this, we use a standard model developed to make risk analyses comparable between Dutch pension funds (Koijen, Nijman, and Werker). The model and its calibration are well documented (Draper). A calibration on recent market data and a Monte Carlo simulation of the model are publicly available on the website of the Dutch Central Bank (DNB). In this paper, we use the 2017 (first quarter) scenario set, which is calibrated on data up to the end of 2016, and we start simulating from there.
In discrete time, the model is a VAR(1) model with normally distributed increments; see Muns for a short summary of the model specification. In the calibration, some structure is imposed to achieve realistic market dynamics. Based on the model, sample paths are generated for the following variables:
• Equity returns $x_t$, which are used for the return portfolio;
• Inflation $\pi_t$;
• Wage inflation $w_t$, which equals inflation $\pi_t$ plus 0.5%;
• A yield curve with interest rates $r^m_t$ for each maturity $m$.
The matching portfolio is tailored to the investor's retirement age. Its returns $m_t$ equal the rate of change in the market value factor:

$$m_t = \frac{M_t}{M_{t-1}} - 1, \tag{8}$$

where $M_t$ is defined in (4). Note that the matching portfolio protects the investor against expected future inflation. To determine the expected future inflation, we use the least squares Monte Carlo technique, as presented in Section B.3. Table 1 gives the annual return statistics of the variables. Due to the fluctuating market price of future pension payments, the standard deviation of the matching returns is very similar to that of the equity returns. Although the matching portfolio follows these fluctuations, it is considered less risky in terms of the investor's goals. By investing in the matching portfolio, the pensioner will receive the corresponding amount from the annuity, regardless of future market prices.
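The structure of such a scenario generator can be sketched as a Gaussian VAR(1) recursion $y_{t+1} = c + A y_t + L z_{t+1}$. Here, $z_{t+1}$ is standard normal and $L$ is a lower-triangular (Cholesky) factor of the shock covariance. The coefficients below are hypothetical placeholders, not the DNB calibration. The three-variable state (equity return, inflation, one short rate) is also a simplification of the full model. The second function computes matching portfolio returns as the rate of change of a path of market value factors.

```python
import random


def simulate_var1(c, A, L, y0, n_steps, rng):
    """Simulate y_{t+1} = c + A y_t + L z_{t+1}, with z_{t+1} ~ N(0, I).

    c: intercept vector, A: coefficient matrix, L: lower-triangular shock
    loading (Cholesky factor of the shock covariance), y0: initial state.
    Returns the list [y_0, y_1, ..., y_n]."""
    d = len(y0)
    path = [list(y0)]
    for _ in range(n_steps):
        y = path[-1]
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        path.append([
            c[i]
            + sum(A[i][j] * y[j] for j in range(d))
            + sum(L[i][j] * z[j] for j in range(d))
            for i in range(d)
        ])
    return path


def matching_returns(M):
    """Matching portfolio returns m_t = M_t / M_{t-1} - 1 for t = 1, ..., len(M)-1."""
    return [M[t] / M[t - 1] - 1.0 for t in range(1, len(M))]


# Hypothetical state (equity return, inflation, short rate) and coefficients.
c = [0.04, 0.006, 0.004]
A = [[0.0, 0.0, 0.0], [0.0, 0.7, 0.0], [0.0, 0.1, 0.8]]
L = [[0.17, 0.0, 0.0], [0.0, 0.01, 0.0], [0.0, 0.002, 0.008]]
rng = random.Random(2017)
path = simulate_var1(c, A, L, [0.0, 0.02, 0.02], 41, rng)
wage_inflation = [y[1] + 0.005 for y in path]  # w_t = pi_t + 0.5%
```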

Rule-based strategies
In this section, we define three rule-based strategies. The cumulative target strategy decreases risk once wealth reaches a cumulative target for the contributions paid so far. The individual target strategy tracks the investment of each contribution separately and decreases risk once that investment reaches its own target. The combination strategy combines the individual target strategy with dynamic programming. All three strategies intend to steer towards a target replacement ratio of 70%, and they decrease risk when return on investment develops well. They differ in their views on when return on investment has developed well enough to decrease risk.

Cumulative target strategy
The cumulative target strategy that we consider here has similarities with the strategies studied in Zhang et al. and Forsyth and Vetzal: risk is reduced once wealth exceeds a predefined wealth target. Contrary to Zhang et al. and Forsyth and Vetzal, however, our investor saves for retirement, and we relate the wealth target to the price of a bond with payoff equal to the desired pension. Given a density forecast for the matching and return portfolios (see Section 2.2), the strategy depends on two parameters: a required real rate of return $r$ before retirement, and a discount rate $\delta$ that discounts pension payments after retirement to the retirement date. At time $t \in \{0, \ldots, T\}$ before retirement, the investor contributes $c_t$ to her pension savings; see Table 3. The contributions $c_\tau$ up to time $t$, i.e., $\tau = 0, \ldots, t$, are supposed to grow with inflation $\pi$ plus the real rate of return $r$ to a target wealth $c_\tau \mathbb{E}_t[F_\tau]$ at retirement, where $F_\tau$ is given by

$$F_\tau = \prod_{u=\tau+1}^{T} (1+\pi_u)(1+r), \tag{9}$$

and the conditional expectation $\mathbb{E}_t$ enforces that realized inflation is used up to time $t$ and expected inflation beyond time $t$. The wealth targets at retirement for all contributions $c_\tau$ up to time $t$ are combined and converted into a target pension using a discount factor $\widetilde{M}_T$, which is based on the discount rate $\delta$:

$$\widetilde{M}_T = \sum_{k=0}^{N-1} \frac{1}{(1+\delta)^{k}}. \tag{10}$$

Using the market value factor $M_t$, as defined in (4), this gives the following current target wealth $\widetilde{W}_t$:

$$\widetilde{W}_t = \frac{M_t}{\widetilde{M}_T} \sum_{\tau=0}^{t} c_\tau\, \mathbb{E}_t[F_\tau], \tag{11}$$

where the summation represents the combined wealth targets at retirement for all contributions $c_\tau$ up to time $t$.
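The sketch below builds the cumulative wealth target in three steps. First, each contribution is grown to retirement with inflation and the real rate $r$. Second, the results are summed and converted into a target pension with an annuity factor based on $\delta$. Third, the target pension is revalued at the current market value factor $M_t$. The sketch makes two assumptions: expected inflation beyond time $t$ is constant (in the paper, this expectation is estimated by least squares Monte Carlo), and the $\delta$-based discount factor is a plain annuity factor. All function names and inputs are hypothetical.

```python
def expected_growth_factor(tau, t, T, realized_infl, exp_infl, r):
    """E_t[F_tau]: grow one unit invested at tau to retirement T with inflation
    plus the real return r. Realized inflation is used up to t, and a constant
    expected inflation (assumption) is used from t+1 to T.

    realized_infl[u] holds pi_u for 1 <= u <= t (index 0 is unused)."""
    factor = 1.0
    for u in range(tau + 1, T + 1):
        pi_u = realized_infl[u] if u <= t else exp_infl
        factor *= (1.0 + pi_u) * (1.0 + r)
    return factor


def annuity_factor(N, delta):
    """Discount factor based on delta: value at T of N unit payments."""
    return sum((1.0 + delta) ** -k for k in range(N))


def cumulative_target_wealth(t, T, N, contributions, realized_infl, exp_infl, r, delta, M_t):
    """Current target wealth: the combined retirement targets of c_0..c_t,
    converted into a target pension and valued at the market value factor M_t."""
    target_at_T = sum(
        contributions[tau] * expected_growth_factor(tau, t, T, realized_infl, exp_infl, r)
        for tau in range(t + 1)
    )
    target_pension = target_at_T / annuity_factor(N, delta)
    return M_t * target_pension


# Hypothetical usage at t = 10 for T = 41, N = 20, r = 2%, delta = 2.5%.
T, N, t = 41, 20, 10
realized = [0.0] + [0.02] * t  # pi_1..pi_t (index 0 unused)
contribs = [4_500.0] * (t + 1)  # c_0..c_t
target = cumulative_target_wealth(t, T, N, contribs, realized, 0.02, 0.02, 0.025, M_t=15.0)
```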
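The cumulative target strategy starts by investing new contributions $c_t$ in the risky asset. If the current wealth $W_t$, including the current contribution $c_t$, exceeds the target wealth $\widetilde{W}_t$, risk is reduced and $W_t$ is transferred to the matching portfolio. For the matching portfolio, the investor follows a buy-and-hold strategy. New contributions invested in the risky asset will also be transferred to the matching portfolio if the current wealth $W_t$ exceeds the target wealth $\widetilde{W}_t$. Here, $W_t$ consists of the current contribution $c_t$, the value of the matching portfolio and the value of the return portfolio. In other words, at $t = 0$, the control $\alpha_0$ is given by

$$\alpha_0 = \begin{cases} 1, & c_0 \le \widetilde{W}_0, \\ 0, & c_0 > \widetilde{W}_0, \end{cases} \tag{12}$$

and, for $t = 1, \ldots, T$, the control $\alpha_t$ is given by

$$\alpha_t = \begin{cases} W^{x}_t / W_t, & W_t \le \widetilde{W}_t, \\ 0, & W_t > \widetilde{W}_t, \end{cases} \tag{13}$$

where $W^{x}_t$ denotes the value of the return portfolio, including the new contribution $c_t$.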
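The switching rule of the cumulative target strategy can be sketched for a single scenario as below. The returns, contributions and targets are passed in as inputs (hypothetical names). New contributions enter the return portfolio, everything moves to the matching portfolio when wealth exceeds the target, and the matching portfolio is then held as buy-and-hold. The sketch records the allocation $\alpha_t$ after each decision.

```python
def cumulative_target_strategy(contributions, targets, equity_returns, matching_returns):
    """Simulate one scenario of the cumulative target strategy.

    contributions[t]:    c_t, paid just before the decision at t = 0..T
    targets[t]:          target wealth at time t
    equity_returns[t]:   return portfolio return over (t, t+1], t = 0..T-1
    matching_returns[t]: matching portfolio return over (t, t+1]
    Returns (list of allocations alpha_t, terminal wealth W_T).
    """
    risky, matching = 0.0, 0.0
    alphas = []
    T = len(contributions) - 1
    for t in range(T + 1):
        risky += contributions[t]  # new contributions start in the return portfolio
        wealth = risky + matching
        if wealth > targets[t]:
            matching += risky  # target exceeded: move everything to matching
            risky = 0.0
        alphas.append(risky / wealth if wealth > 0 else 1.0)
        if t < T:  # grow both portfolios to t+1; matching is buy-and-hold
            risky *= 1.0 + equity_returns[t]
            matching *= 1.0 + matching_returns[t]
    return alphas, risky + matching


alphas, W_T = cumulative_target_strategy(
    contributions=[1.0, 1.0, 1.0],
    targets=[10.0, 2.5, 100.0],
    equity_returns=[1.0, 0.0],
    matching_returns=[0.0, 0.0],
)
print(alphas, W_T)  # [1.0, 0.0, 0.25] 4.0
```

In this toy path, the doubled first contribution pushes wealth above the target at $t = 1$, so all wealth is secured. The next contribution then starts again in the return portfolio.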

Individual target strategy
Contrary to the cumulative target strategy, the individual target strategy, which is the second strategy we analyze here, defines a wealth target per contribution and invests each contribution separately. The wealth $W_t$ is thus seen as a sum of the individual wealth components resulting from investing the contributions separately:

$$W_t = \sum_{\tau=0}^{t} W_{t,\tau}, \tag{14}$$

where $W_{t,\tau}$ is the wealth component from investing the contribution $c_\tau$. As in (11), a wealth target $\widetilde{W}_{t,\tau}$ at time $t$ for a contribution invested at time $\tau \le t$ is given by

$$\widetilde{W}_{t,\tau} = \frac{M_t}{\widetilde{M}_T}\, c_\tau\, \mathbb{E}_t[F_\tau]. \tag{15}$$

Apart from this, the strategy works similarly. Each contribution is invested in the risky asset until its invested amount exceeds the wealth target for that contribution. It is then transferred to the matching portfolio, where it remains until retirement. Thus, the control $\alpha_{t,\tau}$ for investing contribution $c_\tau$ is given by

$$\alpha_{t,\tau} = \begin{cases} 1, & W_{s,\tau} \le \widetilde{W}_{s,\tau} \text{ for all } s = \tau, \ldots, t, \\ 0, & \text{otherwise}. \end{cases} \tag{16}$$

At the aggregated level, the control $\alpha_t$ is now given by

$$\alpha_t = \frac{1}{W_t} \sum_{\tau=0}^{t} \alpha_{t,\tau}\, W_{t,\tau}. \tag{17}$$

Conceptually, the difference between the cumulative target strategy and the individual target strategy is what triggers the risk reduction. In the cumulative target strategy, new investments have to make up for insufficient past returns before a transfer to the matching portfolio can take place, which is not the case in the individual target strategy. On the other hand, in the cumulative target strategy, good past returns may cause new contributions to be transferred immediately to the matching portfolio. With the individual target strategy, each contribution has to generate sufficient return on investment before such a transfer takes place.
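With the same kind of hypothetical inputs, the individual target strategy can be sketched by tracking one wealth component per contribution. A component switches permanently to the matching portfolio once it exceeds its own target. The aggregate allocation is the wealth-weighted share of the components that are still in the return portfolio. Here `targets[t][tau]` is an assumed input holding the target for the component from contribution $c_\tau$ at time $t$.

```python
def individual_target_strategy(contributions, targets, equity_returns, matching_returns):
    """Simulate one scenario of the individual target strategy.

    targets[t][tau]: target at time t for the component from contribution c_tau.
    Other inputs as in the cumulative sketch. Returns (alpha_t list, W_T)."""
    T = len(contributions) - 1
    wealth = []    # wealth[tau] = W_{t,tau}
    switched = []  # switched[tau] becomes True once component tau is in matching
    alphas = []
    for t in range(T + 1):
        wealth.append(contributions[t])
        switched.append(False)
        for tau in range(t + 1):
            if not switched[tau] and wealth[tau] > targets[t][tau]:
                switched[tau] = True  # permanent switch until retirement
        total = sum(wealth)
        risky = sum(w for w, done in zip(wealth, switched) if not done)
        alphas.append(risky / total)  # aggregated control alpha_t
        if t < T:
            for tau in range(t + 1):
                ret = matching_returns[t] if switched[tau] else equity_returns[t]
                wealth[tau] *= 1.0 + ret
    return alphas, sum(wealth)


alphas, W_T = individual_target_strategy(
    contributions=[1.0, 1.0],
    targets=[[10.0], [1.5, 10.0]],
    equity_returns=[1.0],
    matching_returns=[0.0],
)
print(alphas, W_T)  # [1.0, 0.3333333333333333] 3.0
```

In contrast to the cumulative sketch, only the first component is secured at $t = 1$, while the new contribution stays in the return portfolio until it reaches its own target.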

Combination strategy
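Both the cumulative and the individual target strategy either reduce risk by switching completely to the matching portfolio or do not reduce risk at all. The combination strategy, which is the third strategy considered, avoids this all-or-nothing switch. It combines the individual target strategy with dynamic programming to dynamically steer the wealth $W_{t,\tau}$ resulting from the contribution $c_\tau$ above its wealth target $\widetilde{W}_{t,\tau}$. For this, we define the wealth to target ratio

$$Z_{t,\tau} = \frac{W_{t,\tau}}{\widetilde{W}_{t,\tau}}, \tag{18}$$

and solve

$$V(z,t,\tau) = \sup_{A_{t,\tau}} \mathbb{E}\left[\check{U}(Z_{T,\tau}) \,\middle|\, Z_{t,\tau} = z,\ \mathcal{F}_t\right], \tag{19}$$

where $\check{U}$ is a utility function, $V(z,t,\tau)$ is the value function of the dynamic programming problem, and the control $A_{t,\tau}$ consists of the future investment decisions:

$$A_{t,\tau} = \left\{\alpha_{s,\tau} : s = t, \ldots, T-1\right\}, \qquad 0 \le \alpha_{s,\tau} \le 1. \tag{20}$$

Using the dynamic programming principle, it follows that the optimal control satisfies

$$V(z,t,\tau) = \sup_{0 \le \alpha_{t,\tau} \le 1} \mathbb{E}\left[V(Z_{t+1,\tau}, t+1, \tau) \,\middle|\, Z_{t,\tau} = z,\ \mathcal{F}_t\right], \qquad V(z,T,\tau) = \check{U}(z),$$

which allows us to solve for the optimal control $A^*_{t,\tau}$ backwards in time.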
In this context, we choose a utility function that steers the ratio $Z_{T,\tau}$ in between the bounds $z^*_{\min}$ and $z^*_{\max}$. This is in line with the investor's goal of minimizing downside risk and with our assumption that upward potential comes with downside risk. The utility function should be positive and concave. Its functional form is parametrized by $z^*_{\min}$ and $z^*_{\max}$ and is shown in Figure 1 (note that this is a different utility function than $U(\cdot)$ from (6)). The utility function $\check{U}(\cdot)$ is concave and continuous on the domain $\mathbb{R}_{>0}$. We set $z^*_{\min} = 1$ and $z^*_{\max} = 3$. This choice fits well with the investor's replacement ratio target and, as we will show in Section 4.1, is sufficient to demonstrate the strategy's added value. Next, we show how the ratio $Z_{t,\tau}$ between the current wealth $W_{t,\tau}$ and its target $\widetilde{W}_{t,\tau}$ evolves in time. The numerator earns the returns on investment, and the denominator is updated with the matching return and the inflation expectation. Since this time evolution is independent of $\tau$, the optimal control $\alpha^*_{t,\tau}$ is independent of $\tau$. Once the optimal control is found, it can therefore be applied to all contributions.
Lemma 1. The optimal control $\alpha^*_{t,\tau}$ of dynamic programming problem (19) is independent of the contribution $c_\tau$ and of the time $\tau$ at which the contribution is made.
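Proof. The portfolio wealth $W_{t,\tau}$, accumulated by investing contribution $c_\tau$, increases with the return on investment and, therefore, satisfies

$$W_{t+1,\tau} = W_{t,\tau}\left(1 + \alpha_{t,\tau}\, x_{t+1} + (1-\alpha_{t,\tau})\, m_{t+1}\right). \tag{23}$$

From (15), (9) and (8), it follows that the wealth target $\widetilde{W}_{t,\tau}$ satisfies

$$\widetilde{W}_{t+1,\tau} = \widetilde{W}_{t,\tau}\,(1+m_{t+1})\,G_{t+1}, \qquad G_{t+1} = \frac{\mathbb{E}_{t+1}\left[\prod_{u=t+1}^{T}(1+\pi_u)\right]}{\mathbb{E}_{t}\left[\prod_{u=t+1}^{T}(1+\pi_u)\right]}, \tag{24}$$

since the factors $c_\tau$, $\widetilde{M}_T$, $(1+r)^{T-\tau}$ and the realized inflation up to time $t$ cancel in the ratio $\widetilde{W}_{t+1,\tau}/\widetilde{W}_{t,\tau}$. Substitution of (23) and (24) in (19) yields that the optimal controls $\alpha^*_{t,\tau}$ solve

$$V(z,t,\tau) = \sup_{0 \le \alpha \le 1} \mathbb{E}\left[ V\!\left( z\,\frac{1 + \alpha\, x_{t+1} + (1-\alpha)\, m_{t+1}}{(1+m_{t+1})\, G_{t+1}},\; t+1,\; \tau \right) \,\middle|\, Z_{t,\tau} = z,\ \mathcal{F}_t \right], \qquad V(z,T,\tau) = \check{U}(z). \tag{25}$$

Since neither the dynamics in (25) nor the terminal condition depend on $\tau$ or $c_\tau$, both the value function $V(z,t,\tau)$ and the optimal control $\alpha^*_{t,\tau}$ are independent of $\tau$.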
Lemma 1 implies that, theoretically, the dynamic programming problem has to be solved only once: the investment decisions for the first contribution $c_0$ can be used for all other contributions.
For the practical implementation of the dynamic programming algorithm, we refer to Appendix B.
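Because of Lemma 1, a single backward induction over the wealth to target ratio suffices. The sketch below illustrates this on a discretized $z$-grid, with allocations restricted to multiples of 20% as in Figure 3. Over one year, the ratio is updated as $z' = z\,(1 + \alpha x + (1-\alpha) m)/((1+m)\,g)$, where $x$ is the return portfolio return, $m$ the matching return and $g$ the inflation-expectation update of the target. This is not the paper's implementation, which is described in Appendix B. The sketch relies on three assumptions:
• `u_check` is a hypothetical concave placeholder, because the paper's functional form is not reproduced here;
• the joint distribution of $(x, m, g)$ is replaced by a few equally likely, hypothetical, i.i.d. scenarios;
• values between grid points are linearly interpolated.

```python
import bisect
import math


def u_check(z, z_min=1.0, z_max=3.0):
    """Hypothetical concave placeholder utility (NOT the paper's form):
    increasing up to z_max, flat above it, with an extra penalty below z_min."""
    z = max(z, 1e-6)  # keep log() well defined
    return math.log(min(z, z_max)) - 2.0 * max(z_min - z, 0.0)


def interp(grid, values, z):
    """Piecewise-linear interpolation on a sorted grid, clamped at both ends."""
    if z <= grid[0]:
        return values[0]
    if z >= grid[-1]:
        return values[-1]
    i = bisect.bisect_right(grid, z) - 1
    w = (z - grid[i]) / (grid[i + 1] - grid[i])
    return (1.0 - w) * values[i] + w * values[i + 1]


def solve_dp(grid, n_years, scenarios, allocations=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0)):
    """Backward induction for the value function V(z, t) and allocation alpha*(z, t).

    scenarios: equally likely (x, m, g) tuples.
    Returns policy[t][i], the optimal alpha at grid[i] for t = 0..n_years-1.
    Ties are resolved towards the lowest allocation to the return portfolio.
    """
    values = [u_check(z) for z in grid]  # terminal condition V(z, T) = U(z)
    policy = [None] * n_years
    for t in reversed(range(n_years)):
        new_values, best_alphas = [], []
        for z in grid:
            best_v, best_a = -math.inf, None
            for a in allocations:
                # Expected continuation value over the scenarios.
                v = sum(
                    interp(grid, values, z * (1.0 + a * x + (1.0 - a) * m) / ((1.0 + m) * g))
                    for x, m, g in scenarios
                ) / len(scenarios)
                if v > best_v:
                    best_v, best_a = v, a
            new_values.append(best_v)
            best_alphas.append(best_a)
        values, policy[t] = new_values, best_alphas
    return policy


grid = [0.25 + 0.05 * i for i in range(76)]  # z from 0.25 to 4.0
scenarios = [(0.25, 0.03, 1.0), (0.07, 0.03, 1.0), (-0.15, 0.03, 1.0)]
policy = solve_dp(grid, 41, scenarios)
# By Lemma 1, policy[t] applies to every contribution c_tau at time t.
```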

Target replacement ratio
The variable r, used in the construction of the wealth target, can be interpreted in multiple ways. First of all, it serves as a discount rate, which is used to compute the present value of contributions that are made in the future. It can also be viewed as an annual return requirement: each contribution is required to have an average annual return of r. A third interpretation of r is that of a future expected annual return. The computation of the expected replacement ratio requires a future annual return assumption.
Let $t \in \{0, \ldots, T\}$ and let $\mathcal{F}_t$ be the corresponding filtration. The expected replacement ratio $R_t$ is the expectation of the terminal replacement ratio, conditional on $\mathcal{F}_t$. Its computation requires four different estimators:
• The discount rate $r$ is used as an estimator for the future expected annual return.
• The future inflation, $I(T;t)$, has to be estimated through a regression between the future and the past cumulative inflation, as shown in Equation (27).
• Future salaries are based on the information from Table 3.
• The estimator for the market value factor at the end of the investment horizon, $\mathbb{E}[M_T \mid \mathcal{F}_t]$, is based on a regression of $M_T$ on $M_t$ with basis functions $\Phi = \{1, x\}$.
See Appendix B.3 for details of the regression method.
The market value factor is assumed to be independent of the discount rate $r$, inflation and wage inflation. The division by $M_T$ is therefore taken out of the expected value operator, which, by Jensen's inequality, is an approximation rather than an identity.
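The regression estimator with basis $\Phi = \{1, x\}$ amounts to an ordinary least squares fit of simulated values of $M_T$ on $M_t$ across scenarios. The fitted line then serves as the estimate of $\mathbb{E}[M_T \mid \mathcal{F}_t]$. Below is a minimal sketch with hypothetical sample values; the paper's regression method is detailed in Appendix B.3.

```python
def fit_linear_basis(x, y):
    """Least squares fit y ~ b0 + b1 * x, i.e., regression on the basis {1, x}."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def conditional_expectation(b0, b1, x_now):
    """Regression estimate of E[M_T | F_t], given the current value M_t = x_now."""
    return b0 + b1 * x_now


# Hypothetical simulated pairs (M_t, M_T) across five scenarios.
M_t_samples = [15.2, 16.0, 16.4, 16.9, 17.5]
M_T_samples = [15.9, 16.4, 17.1, 17.6, 17.9]
b0, b1 = fit_linear_basis(M_t_samples, M_T_samples)
print(conditional_expectation(b0, b1, 16.5))
```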
The computation of the target replacement ratio at time $t$ is similar to the computation of the expected replacement ratio. The only difference is that the portfolio wealth $W_t$ is replaced by the target terminal wealth $W^*(t)$. Two properties make the target replacement ratio $R^*(t)$ independent of the market value factor: the independence of the market value factor $M_T$ and the wealth process, and the definition of the current target wealth $\widetilde{W}_t$. As mentioned, the investor's target is to reach a replacement ratio of 70%. To steer towards a fixed replacement ratio target, the wealth target in terms of portfolio wealth would have to be altered for each scenario. From a computational point of view, however, it is easier to let the quantity $R^*(t)$ differ slightly between scenarios. Therefore, $\widetilde{W}_t$ is defined as in Equation (11), and $r$ is set to the required annual return.
Numerically, we find that the target replacement ratio within a scenario is almost constant throughout time, as can be seen in the bottom-left plot of Figure 2. Small variations are caused by the estimators for inflation and wage inflation. For a discount rate of 2.5%, the target replacement ratio varies by at most 0.01 within a scenario, and all target replacement ratios lie between 0.6847 and 0.7033.

Numerical evaluation
In this section, we apply the rule-based strategies described in Section 3 to the pension investor introduced in Section 2.1, using the governing stochastic model described in Section 2.2.

Rule-based strategies
To illustrate the dynamics of the rule-based strategies, Figure 2 shows one of the 2000 sample paths of the investor's portfolio. The top-left panel shows the investor's wealth $W_t$ under the cumulative target strategy (orange), together with its wealth target $\widetilde{W}_t$ (yellow). When the wealth exceeds the target, the investments are transferred to the matching portfolio (orange line, bottom-right panel). The individual target strategy (green) works similarly but, as discussed, uses a target per contribution. Typically, only part of the wealth is therefore transferred to the matching portfolio (green line, bottom-right panel). The bottom-left panel illustrates that, in this sample path, the rule-based strategies outperform the optimal static strategy in terms of the expected replacement ratio. Yet the rule-based strategies take substantially more risk, i.e., have a substantially higher allocation to the return portfolio, only in the first 10 years of the investment. Therefore, in this particular sample path, one could argue that the better performance stems from the rules themselves and not from an increased exposure to risk.
The combination strategy is best illustrated by means of the resulting investment decisions, i.e., the optimal control $\alpha_{t,\tau}$ as defined by equation (25). Figure 3 illustrates the optimal allocation to the matching portfolio, $1 - \alpha_{t,0}$, for the first contribution $c_0$ as a function of time $t$ and the wealth to target ratio, as defined by (18). In this example, allocations are restricted to multiples of 20%. Note that, contrary to the rule-based strategies, the combination strategy can increase risk and transfer investments from the matching to the return portfolio. Altogether, this makes the allocation of the combination strategy more refined than that of the rule-based strategies, which follow a "risk on" or "risk off" approach. Figure 4 compares the distribution of the terminal replacement ratio for the following best-performing strategies, in terms of the expected shortfall below the investor's 70% replacement ratio target: two rule-based strategies, a combination strategy and a static strategy. The figure illustrates that, as intended, the dynamic strategies reduce downside risk at the expense of upward potential. The outcomes of the dynamic strategies are thus more centered around the target replacement ratio of 70%.
All strategies are best compared by plotting their success in achieving the intended 70% replacement ratio target, measured by the expected shortfall (6), against their downside risk, measured by the 10% CVaR (7). In this comparison, the parameters that control each strategy's risk appetite are varied; see Figure 5. From this figure, we conclude that all the dynamic strategies clearly outperform the traditional static strategies. Together with the intuitive rationale to reduce risk after several good years, we believe this sufficiently demonstrates the added value of these dynamic strategies. We do, however, find these simulations insufficient to rank the dynamic strategies by their effectiveness. It is well known that the relative performance of dynamic strategies can be sensitive to the characteristics of the underlying stochastic model. A ranking based on a single model is therefore not completely objective. We believe that applying the strategies in practice is an appropriate way to test them further, which lies beyond the scope of this research.

Discussion
One of the intended advantages of a static life cycle strategy is the reduced risk close to retirement. As a result, the investor can be given an accurate estimate of her retirement income in the years before retirement. Table 2 relates the expected replacement ratio before retirement to the replacement ratio at retirement. We conclude that, when following the rule-based strategies, the investor can be provided with a similarly accurate estimate of the replacement ratio before retirement.

Figure 2: Sample paths of wealth, return of the matching and return portfolio, expected replacement ratio, and allocation to the matching portfolio. The paths are for the pension investor introduced in Section 2.1 following rule-based strategies with the discount rates r = 2% and π = 2.5%, using the governing stochastic model described in Section 2.2. Top left: wealth $W_t$ for the cumulative target strategy (orange), its wealth target $\widetilde{W}_t$ (yellow), the individual target strategy (green), the optimal static strategy (blue) and the cumulative contribution (dark blue). Top right: return of the matching portfolio (blue) and the return portfolio (orange). Bottom left: 70% replacement ratio target (yellow) together with the expected replacement ratio of the cumulative target strategy (orange), the individual target strategy (green) and the optimal static strategy (blue). Bottom right: allocation $1 - \alpha_t$ to the matching portfolio for the cumulative target strategy (orange), the individual target strategy (green) and the optimal static strategy (blue).

Figure 3: Optimal allocation $1 - \alpha_{t,0}$ to the matching portfolio, as a function of the wealth to target ratio $W_{t,0}/\widetilde{W}_{t,0}$ (y-axis), as defined by equation (18), and time $t$ (x-axis). The allocation is for the first contribution of an investor following the combination strategy, discussed in Section 3.3, using the stochastic model of Section 2.2. Allocations are restricted to multiples of 20%.

Figure 4: Distribution of the replacement ratio for the cumulative target strategy with r = 3.06% (orange), the individual target strategy with r = 2.99% (green), the combination strategy with r = 1% (yellow) and a static strategy with a constant 46.02% allocation to the return portfolio.

Figure 5: Expected shortfall below the investor's 70% replacement ratio target, see equation (6), versus the 10% CVaR of the terminal replacement ratio, as defined by (7). Shown are the cumulative target strategy (orange), the individual target strategy (green) and the combination strategy (yellow), all parametrized by the real rate of return r, as well as several annually rebalanced allocations (blue) and several default life cycles that reduce risk with the investor's age.

Although the rule-based strategies outperform the other strategies in our examples, we wish to point out that the all-or-nothing approach also has disadvantages. For example, the portfolio remains fully invested in the riskier return portfolio when targets are not reached. Such truly worst-case scenarios appear to have a minor influence, but they are illustrated, e.g., by the far left lower tail in Figure 5. The individual target strategy presented in Section 3.2 suffers less from the all-or-nothing disadvantages, as it defines a target per contribution. As a result, inferior past performance does not influence the required performance of current and future contributions.
Compared to the rule-based strategies, the combination strategy does not exploit the fact that the matching portfolio can securely grow an investment to its intended target (indexed by expected inflation) until retirement. Since the rule-based strategies explicitly make use of this, the combination strategy could be further improved by exploiting it as well.
One advantage of the combination strategy in practical use is that its asset allocation is much smoother than that of the rule-based strategies. Large turnovers are difficult to explain, and investors might be uncomfortable following such a drastic strategy to the end.

Conclusion
In this paper, we discussed several dynamic strategies, suitable for pension investors that aim to replace a proportion of their salary with a retirement income. The strategies reduce risk after several good years and steer the investor to her target. By having the allocation depend on return on investment, the approaches exploit a freedom which is typically not used by traditional static approaches. We have shown that the dynamic approaches may outperform some traditional static approaches and prevent unnecessary risk taking.
Two simple and intuitive rule-based strategies were introduced that secure investments in a cash flow matching portfolio once they have yielded sufficient return. Although both rule-based strategies can be implemented straightforwardly in practice, we recommend also investigating alternatives in which the investor, e.g., switches between an aggressive traditional life cycle and a matching portfolio, to rule out very aggressive portfolios close to retirement.
The rule-based strategies were further refined into a combination strategy based on dynamic programming. In the current setup, the combination strategy is not necessarily superior; in a numerical example, we even found that the rule-based strategies outperform it. We certainly believe that dynamic strategies based on dynamic programming can be further improved, as this research also clearly demonstrates their added value to pension investors.
The most suitable dynamic strategy is hard to determine objectively, as its performance depends on the governing stochastic model. Also, such a dynamic strategy should fit well with practical requirements, such as whether an investor will follow through on the strategy or feel the need to combine it with her own judgement, and whether the strategy complies with regulations. This research, however, demonstrates the added value of dynamic strategies to pension investors. In summary, such strategies exploit freedom that is not used by traditional approaches, can steer a pension investor to her target and prevent unnecessary risk taking.

B Dynamic Programming Implementation
In this section, we give a brief description of various computational insights from our implementation. The algorithm for the combination strategy comprises the dynamic programming design, the selection of the state variable and the use of local regression. The local regression technique in this section is also used by the target strategies.

B.1 Dynamic Programming Algorithm
Asset allocations for a dynamic programming strategy follow from Algorithm 1. The algorithm runs backward in time, similar to the algorithms in Cong and Oosterlee and in Binsbergen and Brandt. The solution space is discretized so that a suitable algorithm can be used to solve the dynamic programming problem: the control is assumed to take values $a_j \in [0, 1]$ for $j \in \{1, \dots, k\}$, i.e., the investor can choose between at most $k$ asset allocations at each time $t$. Algorithm 1 solves the optimal control problem backward in time by calculating the expected utility, $\mathbb{E}[\check{U}(Z_T) \mid \mathcal{F}_t]$, with the state variable $Z_t$ (see Section B.2), by the local regression technique (see Section B.3). The use of local regression is similar to the use of bundling in Cong and Oosterlee: neighborhood points are used in the local regressions for each step of the algorithm.
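A heavily simplified sketch of backward induction with discrete controls and regression-estimated expected utilities is given below; it is not Algorithm 1 itself. Everything beyond the discretized controls $a_j$ is our own assumption: the ratio dynamics (the matching portfolio is taken to track the target exactly, so only the return portfolio moves $Z_t$), the hypothetical `utility` penalizing squared shortfall below $Z^* = 1$, a global polynomial regression standing in for the local regression of Section B.3, an initial allocation of 0.5 at times that have not been solved yet, and a single backward pass without the snake-like updates. All function names are hypothetical.

```python
import numpy as np

def utility(z):
    # Hypothetical stand-in for U(Z_T): penalize squared shortfall below the target ratio 1.
    return -np.maximum(1.0 - z, 0.0) ** 2

def fit(z, y, degree=2):
    # Global polynomial least squares fit of y on z (stand-in for local regression).
    coef, *_ = np.linalg.lstsq(np.vander(z, degree + 1), y, rcond=None)
    return coef

def predict(coef, z):
    return np.vander(z, len(coef)) @ coef

def dp_backward_pass(R, z0, controls, init_action=0.5, degree=2):
    """One backward pass over rebalancing times t_{n-1}, ..., t_0.

    R[p, s]: gross return of the return portfolio relative to target growth on
    path p in period s (assumption: the matching portfolio tracks the target).
    """
    n_paths, n_steps = R.shape
    coefs = [None] * n_steps  # coefs[t][j]: fit of E[U(Z_T) | Z_t] if control j is used at t

    def policy(t, z):
        if coefs[t] is None:  # sub-problem not solved yet: use the initial guess
            return np.full_like(z, init_action)
        values = np.stack([predict(c, z) for c in coefs[t]])
        return controls[np.argmax(values, axis=0)]

    def roll(z, t_from, t_to):
        # Propagate the wealth-to-target ratio from t_from to t_to under the current policy.
        for s in range(t_from, t_to):
            a = policy(s, z)
            z = z * (a * R[:, s] + (1.0 - a))
        return z

    for t in range(n_steps - 1, -1, -1):
        z_t = roll(np.full(n_paths, float(z0)), 0, t)  # states depend on earlier decisions
        coefs_t = []
        for a in controls:
            z_T = roll(z_t * (a * R[:, t] + (1.0 - a)), t + 1, n_steps)
            coefs_t.append(fit(z_t, utility(z_T), degree))
        coefs[t] = coefs_t
    return coefs, policy

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    R = np.exp(rng.normal(0.03, 0.15, size=(5000, 10)))  # hypothetical excess returns
    controls = np.linspace(0.0, 1.0, 5)
    coefs, policy = dp_backward_pass(R, z0=0.8, controls=controls)
    print(policy(5, np.array([0.6, 0.9, 1.2])))  # allocations to the return portfolio
```

Note that `roll(..., 0, t)` recomputes the states $Z_t$ with the current policy, which is exactly why earlier decisions change later states and why a single pass may not be optimal.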
Solving the sub-problem at time $t_i$ changes the future states $Z_{t_{i+1}}, \dots, Z_{t_{n-1}}$. These states have previously been used in the local regressions to find the optimal solutions for the sub-problems at times $t_{i+1}, \dots, t_{n-1}$. Thus, the optimal solutions for the sub-problems at $t_{i+1}, \dots, t_{n-1}$ may be different after solving the sub-problem at time $t_i$. This is why Algorithm 1 follows a "snake-like pattern" through time: after the sub-problem at time $t_i$ is solved, the future sub-problems are first updated forward in time. The sub-problems are subsequently updated backward in time at the next step, until the sub-problem at time $t_{i-1}$ is solved for the first time.
Once all sub-problems have been solved, the solution can be further improved by repeating the procedure: Algorithm 1 restarts at the beginning of the snake-like pattern through time. Each iteration of Algorithm 1 follows the snake-like pattern from $T$ to $t_0$ once.
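The order in which sub-problems are visited can be made concrete with a small generator of time indices $0, \dots, n-1$ (standing for $t_0, \dots, t_{n-1}$). This is our reading of the description above, not code from the paper: in particular, we assume the backward sweep starts at $t_{n-2}$, so that $t_{n-1}$ is not revisited twice in a row. The function name `snake_order` is hypothetical.

```python
def snake_order(n, n_iter=1):
    """Visiting order of sub-problems t_0..t_{n-1} in the snake-like pattern.

    Interpretation (an assumption): solve t_{n-1} first; then, for each earlier
    time t_i, sweep backward until t_i is solved for the first time, and then
    sweep forward to update t_{i+1}, ..., t_{n-1}. Each iteration repeats this.
    """
    order = [n - 1]
    for i in range(n - 2, -1, -1):
        order.extend(range(n - 2, i - 1, -1))  # backward sweep down to the new time t_i
        order.extend(range(i + 1, n))          # forward update of all later sub-problems
    return order * n_iter

if __name__ == "__main__":
    print(snake_order(4))
```

The list grows quadratically in $n$, which reflects the cost of re-solving every later sub-problem each time an earlier one changes.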

B.2 State Variable
Individual wealth targets are not constant over the time horizon. They are defined using the expected inflation and the market value factor at time $t$ (see also Section 3.4). The individual wealth targets also differ between scenarios, as they depend on the contribution. The dynamic programming approach, however, requires a fixed wealth target in order to evaluate the expected utility. This issue is resolved by using the ratio $Z_t$ between the portfolio wealth and the wealth target, defined in Equation (18), as the state variable of the dynamic programming algorithm. The target state is then constant in time: $Z_t^* = 1$. An advantage of this choice of state variable is that, theoretically, the dynamic programming solution only has to be computed once: the investment decisions for the first contribution of the investor can be used for all other contributions. The dynamic programming results for the first contribution span the time horizon $[t_0, T]$, so at each rebalancing time the optimal investment decision, as a function of the state $Z_t$, has already been computed. Because a ratio is used, these investment decisions can also be applied to the contributions in later years. In practice, however, this does not hold, as Algorithm 1 does not fully converge due to the limited sample size and because the sets of observations change multiple times per iteration. Running Algorithm 1 separately for each contribution therefore gives better decision rules, i.e., a better approximation of the optimal solution on average.
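The practical appeal of the ratio state is that a decision rule indexed by $Z_t$ does not care about the size of the contribution or its target. The sketch below uses an entirely hypothetical step-wise rule (`Z_GRID`, `ALLOCATION`) at a single rebalancing time; it only illustrates that two contributions with the same wealth-to-target ratio receive the same allocation, and is not a result from our computations.

```python
import numpy as np

# Hypothetical decision rule at one rebalancing time: allocation to the return
# portfolio as a step function of the ratio state Z_t = wealth / target.
Z_GRID = np.array([0.0, 0.8, 0.9, 1.0])
ALLOCATION = np.array([1.0, 0.75, 0.5, 0.0])

def allocation_for(wealth, target):
    """Look up the allocation for Z_t = wealth / target in the step-wise rule."""
    z = np.asarray(wealth, dtype=float) / np.asarray(target, dtype=float)
    idx = np.searchsorted(Z_GRID, z, side="right") - 1  # last grid point <= z
    return ALLOCATION[np.clip(idx, 0, Z_GRID.size - 1)]

if __name__ == "__main__":
    # A small and a large contribution with the same ratio Z_t = 0.85.
    print(allocation_for(85.0, 100.0), allocation_for(8500.0, 10000.0))
```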

B.3 Least squares Monte Carlo method
At each optimization step in Algorithm 1, a least squares Monte Carlo method is used to avoid nested simulations (and the related explosion of computation times). The least squares Monte Carlo method was introduced by Longstaff and Schwartz as a simple method for pricing American options by simulation. The conditional expectation of the pay-off under the assumption of not exercising the option is estimated by using cross-sectional information already available in the simulation. Realized pay-offs from continuation (or, in the pension investment setting, the final utility $\check{U}(Z_T)$) are regressed on functions of the state variables. The fitted value provides an estimate of the conditional expectation.
The regression employed is based on a so-called regress-now strategy, specifically on a local regression version of it. Regress-now estimates the expectation $\mathbb{E}[Y_{t_{i+1}} \mid X_{t_i}]$, with $X_{t_i} \in \mathcal{F}_{t_i}$, by using a set of basis functions $\Phi$ with index set $J$:

$$\mathbb{E}\left[Y_{t_{i+1}} \mid X_{t_i}\right] \approx \sum_{j \in J} c_j \, \phi_j(X_{t_i}),$$

with coefficients $c_j$ found by least squares regression and $\phi_j \in \Phi$.

Local regression, introduced by Cleveland, estimates a linear or quadratic polynomial fit at $x$ by using a weighted least squares regression. The weights for an observation $(x_i, y_i)$ depend on the distance between $x_i$ and $x$ (Cleveland and Grosse). The smoothness of the fit depends on the percentage of observations that are taken into account when evaluating at $x$.
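A global (non-local) regress-now step can be sketched in a few lines: the simulated pairs $(x_j, y_j)$ of $X_{t_i}$ and $Y_{t_{i+1}}$ are regressed on the basis functions, and the fitted sum $\sum_{j \in J} c_j \phi_j(x)$ then serves as the estimate of the conditional expectation at any state $x$. The function name `regress_now` and the quadratic basis in the usage example are our own illustrative choices; with the basis $\{1, x\}$ this reduces to an ordinary linear fit.

```python
import numpy as np

def regress_now(x, y, basis):
    """Least squares estimate of E[Y | X = x] ~ sum_j c_j * phi_j(x).

    x, y: simulated pairs (one per scenario) of X_{t_i} and Y_{t_{i+1}}.
    basis: list of callables phi_j acting elementwise on NumPy arrays.
    Returns the coefficients c_j and a function evaluating the fitted expectation.
    """
    x = np.asarray(x, dtype=float)
    design = np.column_stack([phi(x) for phi in basis])
    c, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)

    def fitted(x_new):
        x_new = np.asarray(x_new, dtype=float)
        return np.column_stack([phi(x_new) for phi in basis]) @ c

    return c, fitted

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(0.5, 1.5, size=10_000)
    y = 1.0 + 0.5 * x - 0.2 * x**2 + rng.normal(0.0, 0.1, size=x.size)  # noisy "pay-offs"
    quadratic = [np.ones_like, lambda v: v, lambda v: v**2]
    coef, f = regress_now(x, y, quadratic)
    print(coef, f([1.0]))
```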
Let $n$ be the number of observations and let $0 < d \le 1$ be a neighborhood parameter, i.e., the share of observations used for the weighted least squares regression at the evaluation point. Furthermore, let $k = \lceil d \cdot n \rceil$, let $\Delta_i(x)$ be the Euclidean distance of $x$ to $x_i$, and let $\Delta_{(i)}(x)$ be the values of these distances, ordered from smallest to largest.
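The weight $\zeta_i(x)$ for an observation $(x_i, y_i)$ is then equal to

$$\zeta_i(x) = T\left(\frac{\Delta_i(x)}{\Delta_{(k)}(x)}\right),$$

with

$$T(u) = \begin{cases} (1 - u^3)^3, & 0 \le u < 1, \\ 0, & u \ge 1, \end{cases}$$

also known as the tri-cube weight function.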
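For a one-dimensional state, the local regression estimate at a single evaluation point can be sketched as follows: take the $k = \lceil d \cdot n \rceil$ nearest observations, weight them with the tri-cube function of $\Delta_i(x)/\Delta_{(k)}(x)$, and solve a weighted least squares problem. This is a textbook LOESS step written by us for illustration (names `tricube` and `loess_at` are hypothetical), not the implementation used for the results in this paper; it assumes $\Delta_{(k)}(x) > 0$.

```python
import numpy as np

def tricube(u):
    """Tri-cube weight T(u) = (1 - u^3)^3 for 0 <= u < 1, and 0 otherwise."""
    u = np.asarray(u, dtype=float)
    return np.where(u < 1.0, (1.0 - u**3) ** 3, 0.0)

def loess_at(x0, x, y, d=0.3, degree=1):
    """Local polynomial (linear by default) fit of y on 1-D x, evaluated at x0.

    Uses k = ceil(d * n) nearest neighbours; assumes Delta_(k)(x0) > 0.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    k = int(np.ceil(d * x.size))
    dist = np.abs(x - x0)                   # Delta_i(x0) in one dimension
    h = np.sort(dist)[k - 1]                # Delta_(k)(x0), the k-th smallest distance
    w = tricube(dist / h)                   # weights zeta_i(x0); zero from distance h on
    design = np.vander(x - x0, degree + 1)  # centred powers; last column is the intercept
    sw = np.sqrt(w)                         # weighted least squares via row scaling
    coef, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return float(coef[-1])                  # intercept = fitted value at x0

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.sort(rng.uniform(0.0, 1.0, size=500))
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, size=x.size)
    grid = np.linspace(0.0, 1.0, 5)
    print([round(loess_at(g, x, y, d=0.2), 3) for g in grid])
```

Smaller values of `d` give a rougher fit that follows local features; larger values average over more observations and give a smoother, more biased estimate.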
The regress-now strategy is not only used to approximate the expected utility; it is also used to estimate the expected annual inflation $\mathbb{E}_t[\pi_\tau]$, with $\Phi = \{1, x\}$. The future cumulative inflation, $Y_{t_{i+1}}$, is regressed on the past cumulative inflation, $X_{t_i}$, to estimate the future annual inflation at time $t_i$. Both are products of annual inflation factors $(1 + \pi_k)$, with $x_j \in X_{t_i}$ and $y_j \in Y_{t_{i+1}}$ for each scenario $j$ in the scenario set.
The resulting regression function is of the form $f(x) = \xi_1 x + \xi_2$. The expected effective annual inflation for the time period $k$ to $T$ is now equal to: