1 Introduction

Over the last decades, index tracking (IT) has become an increasingly popular investment strategy all over the world. The aim is to create a portfolio of assets that replicates (tracks) the movements of a given financial index taken as a benchmark. The difference between the performance of the replicating portfolio and the benchmark is generally referred to as tracking error, and the problem entails the definition of a portfolio that minimizes this gap. Depending on the specific tracking-error function used, different formulations have been proposed [see, for example, Mutunge and Haugland (2018) and the references therein].

IT is traditionally regarded as a passive management strategy. In contrast, an active strategy aims at defining portfolios that achieve higher returns than the benchmark, generating an "excess return". Sometimes the term index-plus-\(\alpha \) portfolio is used to indicate a portfolio that outperforms the benchmark by a given, typically small, value \(\alpha \). Since, in general, there is no guarantee of achieving a positive excess return under every circumstance, the risk of underperforming the benchmark should be limited. Enhanced index tracking (EIT), or simply enhanced indexation, can be seen as an evolution of IT relying on the idea of properly combining the strengths of both the passive and the active management approaches (Canakgoz and Beasley 2009), by defining portfolios that outperform the benchmark while incurring a limited additional risk.

The design of IT strategies is an active research area that has been receiving increasing attention from both researchers and practitioners. Compared to IT, the contributions dealing with the EIT problem are fewer and more recent. The first formalization of the problem, due to Beasley et al. (2003), dates back to 2003, and almost all the contributions appeared after 2005.

Most of the proposed models rely on a backward-looking perspective, in that the tracking portfolios are built using historical observations as input data, in the hope that high accuracy in the past will carry over to the future. A forward-looking view would clearly be advisable, especially in periods of high financial turmoil such as those experienced in recent years.

Among the few papers that analyze the problem under this latter perspective, we cite the contribution by Stoyan and Kwon (2010), who propose a two-stage stochastic programming formulation for the classical index tracking problem. In this paper, we adopt the alternative paradigm of chance constraints (CC, for short), introduced by Charnes and Cooper (1959), with the aim of defining replicating portfolios whose performance exceeds the future benchmark return with a high probability level.

During the last decades, CC-based models have been applied to formulate several real-world problems (see, for example, Beraldi et al. 2012, 2015, 2017) where providing reliable solutions is a primary concern. Their use is actually not new in the field of portfolio optimization: the well-known Value at Risk measure, being a quantile, is modeled by a chance constraint, and a large number of papers on portfolio optimization can be "labeled" as CC-based models.

For the EIT problem the number of contributions is more limited and recent. A CC-based formulation was proposed by Lejeune and Samatli-Pac (2013), where the variance of the benchmark return is forced to be below a given threshold with high probability. A deterministic equivalent reformulation is derived under the assumption that the asset returns are represented by a stochastic factor model with the random factor following a normal distribution. More recently, extending the reliable model proposed by Lejeune (2012), Xu et al. (2018) presented a sparse EIT model aimed at maximizing the excess return that can be attained with a high probability level; in that paper, the CC is approximated via a data-driven distributionally robust approach. Bruni et al. (2017) use the CC as a relaxation of the zero-order \(\epsilon \)-stochastic dominance criterion.

In this paper, we present a stochastic formulation for the EIT problem where the CC is handled under the assumption of discrete random variables. We point out that CCs are typically formulated by considering continuous distributions, in particular the Gaussian one. For the considered application, this assumption might be inappropriate, since most asset return distributions are leptokurtic or fat-tailed. In the case of discrete random variables, a deterministic equivalent reformulation of the stochastic problem can be obtained by introducing binary supporting variables, one for each realization of the random variables, together with a classical knapsack-like constraint that limits the violation of the stochastic constraint. Depending on the total number of realizations used to model the uncertain parameters, the solution of the corresponding problem can be computationally demanding. In this paper, we empower the classical Branch and Bound method with an initial warm-start solution determined by exploiting the specific problem structure.

Besides the CC model, we present another formulation based on the integrated version of the CC (ICC). While the CC is used to control the probability of beating the benchmark and, as such, accounts for the "qualitative" side of a possible shortfall, the ICC provides a "quantitative" measure. In this case, the negative deviation of the tracking portfolio performance from the benchmark is quantified and the corresponding expected value is bounded by a given threshold. The ICC-based formulation presents some similarities with recent contributions such as those considering the Conditional Value at Risk measure [see, for example, Goel et al. (2018)] or those including second-order stochastic dominance (SSD) constraints [see, for example, Dentcheva and Ruszczyński (2003)], which, indeed, can be seen as a collection of ICCs.

The main contribution of this paper is to investigate the EIT problem under the CC paradigm, assuming that the random variables follow a discrete distribution. The performance of the proposed models is also compared with that of other recent formulations on an out-of-sample basis.

The rest of the paper is organized as follows. Section 2 surveys the most recent literature on the EIT problem. Section 3 presents the proposed models, whereas Sect. 4 details the deterministic equivalent reformulations in the case of discrete random variables. Section 5 is devoted to the presentation of the computational experiments carried out to measure the performance of the proposed strategies also in comparison with other approaches proposed in the recent literature. Some conclusions and possible research directions are presented in Sect. 6.

2 Related literature

The growing popularity of enhanced index funds, in both mature and emerging markets (Parvez and Sudhir 2005; Weng and Wang 2017), has pushed the academic community towards the design of quantitative tools to support investors. While the scientific literature on IT is rather extensive [see, for example, Sant'Anna et al. (2017) for a recent overview], the number of contributions on the EIT problem is lower but steadily increasing. The EIT problem was originally introduced by Beasley et al. (2003), and an overview of the early literature can be found in Canakgoz and Beasley (2009). In what follows, we review the most recent contributions, focusing on the main measures used to control the risk of underperforming the benchmark.

A first group of contributions adopts the variance (or the standard deviation) as risk measure. For example, de Paulo et al. (2016) consider the variance of the difference between the replicating portfolio return and the benchmark return and propose a mean-risk approach. The standard deviation has been used by Wu et al. (2007), where the authors consider a bi-objective function, handled by a goal programming technique, with the term to be maximized set equal to the excess portfolio return. More recently, Cesarone et al. (2019) proposed a novel approach aimed at finding a Pareto-optimal portfolio that maximizes the weighted geometric mean of the differences between its risk and gain and those of a suitable benchmark index; in their experiments, the authors use the standard deviation of the replicating portfolio as risk measure. A mean-risk structure has also been adopted in Li et al. (2011), where the tracking error is expressed as the downside standard deviation. The tracking error variance has been used in the very recent contribution by Gnägi and Strub (2020), where the authors propose a model aimed at minimizing a quadratic function of the covariances between the stock returns, the stock weights in the tracking portfolio, and the stock weights in the benchmark. The authors showed that minimizing this risk measure may provide better performance in terms of the out-of-sample tracking error when compared with other strategies. The main limitation of this approach is the necessity of knowing the actual composition of the index, which is often not available for the existing test instances.

Absolute deviation related measures have been used in several other contributions. Koshizuka et al. (2009) proposed a model aimed at minimizing the tracking error from an index-plus-\(\alpha \) portfolio, choosing among the portfolios whose composition is highly correlated with the benchmark. Two alternative measures of the tracking error are considered: one based on the absolute deviation and the other on the downside absolute deviation. In Valle et al. (2014) the authors study an absolute return portfolio problem and propose a three-stage solution approach. The maximum downside deviation of the replicating portfolio from the benchmark was used by Bruni et al. (2015) in a bi-objective approach.

More recently, the Conditional Value at Risk (CVaR) measure has been used to model risk in the EIT problem. For example, Goel et al. (2018) consider two variants of the CVaR: the two-tails CVaR and the mixed CVaR. While the former is the weighted sum of the CVaR of the left tail (worst scenarios) and of the right tail (best scenarios), the latter is the weighted sum of the CVaR calculated at different confidence levels. The mixed CVaR has also been used by Guastaroba et al. (2017) in the framework of a risk-reward ratio. That contribution is similar to the model proposed by the same authors in Guastaroba et al. (2016), where they consider the maximization of the Omega ratio in its standard and extended forms.

Stochastic dominance (SD) criteria, in different forms, have also been applied to control the risk of underperforming the benchmark. The basic aim is to create replicating portfolios whose random return stochastically dominates the return of the benchmark. One of the first EIT models based on SD was proposed by Kuosmanen (2004), who applied First-order Stochastic Dominance (FSD) and Second-order SD rules. Later on, Luedtke (2008) presented compact linear programming formulations where the objective is to maximize the portfolio expected return subject to SSD constraints over the benchmark. More recently, Roman et al. (2013) applied an SSD strategy to construct a portfolio whose return distribution dominates that of the benchmark. We note that the solution of a problem including SSD constraints is typically computationally demanding, since the number of constraints to include is of the order of the number of scenarios squared. Moreover, the SSD conditions are judged by many agents as excessively demanding, because the most extreme risk-averse preferences are taken into account as well. For this reason, some relaxed forms of SD have been proposed. Sharma et al. (2017) use underachievement (surplus) and overachievement (slack) variables and control the relaxed SSD condition by imposing bounds on the ratio of the total underachievement to the sum of the total underachievement and overachievement variables. More recently, Bruni et al. (2017) applied a zero-order \(\epsilon \)-SD rule and its cumulative version. Higher-order SD rules, which can also be viewed as relaxations of SSD, have been applied by Post and Kopa (2017).

As final contributions, we mention the papers, more closely related to our work, that use chance constraints to control the risk. Lejeune (2012) proposes a stochastic model aimed at maximizing the probability of obtaining an excess return of the invested portfolio with respect to the benchmark. Recently, an extension of this paper has been proposed by Xu et al. (2018), where the authors present a sparse enhanced indexation model aimed at maximizing the excess return that can be achieved with a high probability; the resulting problem is handled via a distributionally robust approach. In the same stochastic stream, we cite Lejeune and Samatli-Pac (2013), where the CC is used to impose that the variance of the index fund return be below a threshold with a large probability. A deterministic equivalent formulation, exploiting the classical standardization technique, is provided under the assumption that the asset returns are represented by a stochastic factor model with the random factor following a Normal distribution.

Our CC formulation shares with these last-mentioned papers the choice of the modeling paradigm: the CC is used to guarantee that the performance of the replicating portfolio beats the benchmark with a given probability level. In contrast with those works, the CC is handled here under the assumption of random variables with a discrete distribution, represented by a set of scenarios, each occurring with a given probability. The formulation based on the integrated form of the CC controls the expected shortfall and, as such, shares some similarities with CVaR-based models and with SSD ones, since SSD constraints can be seen as a family of ICCs. In the paper, the CC-based formulations are compared with each other and with other recent approaches proposed for the EIT problem. The extensive computational phase can be seen as a further contribution of the paper.

3 The EIT problem under chance constraints

We consider the problem of a portfolio manager who wants to determine a portfolio that outperforms a benchmark, represented by the rate of return of a market index, possibly increased by a given value. A buy-and-hold strategy is assumed, in that the selected portfolio is kept until the end of a given investment horizon. Let \(J=\{1,2,\dots ,N\}\) denote the set of the index constituents. A portfolio is identified by specifying the fraction \(x_j\) of capital invested in each asset \(j \in J\). No short sales are allowed (the \(x_j\) are non-negative decision variables) and the budget constraint \( \sum _{j \in J}x_j =1\) is imposed. We denote by \({{\mathcal {X}}}\) the feasible set determined by these basic constraints. Each asset j, as well as the benchmark, generates an uncertain return, modeled by the random variables \(\widetilde{r_j}\) and \({\widetilde{\beta }}\), defined on a given probability space \((\Omega , {\mathcal {F}}, \mathrm{I\!P})\). Under this assumption, the EIT problem becomes a stochastic optimization problem:

$$\begin{aligned}&\max f( \widetilde{R_p},{\widetilde{\beta }}) \nonumber \\& \widetilde{R_p} \ge {\widetilde{\beta }} \nonumber \\& x \in {{\mathcal {X}}} \end{aligned}$$
(1)

where \( \widetilde{R_p} = \sum _{j \in J} \widetilde{r_j} x_j\) denotes the random portfolio return.

The way in which the feasibility and optimality conditions of problem (1) are redefined depends on the approach used to deal with the randomness. In the proposed formulation, the objective function is handled by means of the expected value operator \(\mathrm{I\!E}[\cdot ]\), whereas the stochastic constraint is treated by adopting the CC paradigm:

$$\begin{aligned} \mathrm{I\!P}(\widetilde{R_p} \ge {\widetilde{\beta }}) \ge (1-\gamma ) \end{aligned}$$
(2)

Roughly speaking, the violation of the stochastic constraint is allowed provided that it occurs with a low probability, denoted by the parameter \(\gamma \) in (2). This value represents the allowed tolerance level and is chosen by the decision maker on the basis of his risk perception. Low values of \(\gamma \) (possibly even 0) are used to model very risk-averse positions: the replicating portfolio should beat the benchmark under every possible circumstance that might occur in the future. Apart from the feasibility issue, a natural consequence of this pessimistic choice is the over-conservativeness of the suggested investment policies. Higher values of \(\gamma \), on the contrary, may provide portfolios with higher potential returns, as long as the investor is willing to accept the risk of underperforming the benchmark.

Dating back to the late 1950s (Charnes and Cooper 1959), the CC still represents a challenging topic in optimization. All the existing approaches for solving CC-based problems rely on the derivation of deterministic equivalent reformulations that, in turn, depend on the nature of the random variables. In the case of Normal random variables, a deterministic equivalent reformulation can be easily derived by applying the standard standardization technique. In particular, it is easy to show that (2) can be rewritten as

$$\begin{aligned} \sum _{j \in J}\overline{r}_j x_j \ge {\overline{\beta }} - \phi ^{-1}(\gamma ) \sqrt{\sum _{j \in J} \sigma _j^2 x_j^2 + \sum _{j \in J} \sum _{i \in J, i\ne j} \sigma _{ij} x_jx _i - 2 \sum _{j \in J} \sigma _{j \beta }x_j +\sigma _{\beta }^2} \end{aligned}$$
(3)

Here \(\overline{r}_j\) and \({\overline{\beta }}\) represent the expected rates of return of asset j and of the benchmark, respectively, whereas \(\sigma _{ij}\), \(\sigma _{j \beta }\) and \(\sigma _{\beta }^2\) denote the covariance between the returns of assets i and j, the covariance between the return of asset j and the benchmark return, and the variance of the benchmark return, respectively. Finally, \(\phi ^{-1}(\gamma )\) is the \(\gamma \)-quantile of the standard normal distribution. The deterministic reformulation highlights that the expected return of the replicating portfolio is required to exceed the expected benchmark return augmented by a penalty term (note that \(\phi ^{-1}(\gamma )\) is negative for \(\gamma < 0.5\)) that increases as \(\gamma \) decreases. We note that a reformulation similar to (3) can be derived when the probability distribution \(\mathrm{I\!P}\) is unknown, provided that the first two moments are known. In this case, the term \(-\phi ^{-1}(\gamma )\) multiplying the standard deviation is replaced by \(\sqrt{\frac{1-\gamma }{\gamma }}\).
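For illustration only, the check implied by (3) for a candidate portfolio can be coded directly once the first two moments are estimated. The following sketch uses placeholder moments (not data from the paper) and the standard normal quantile from scipy.stats:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
N = 5
rbar = rng.normal(0.001, 0.0005, N)          # expected asset returns (placeholder)
beta_bar = 0.001                             # expected benchmark return (placeholder)
A = rng.normal(size=(N + 1, N + 1))
V = A @ A.T * 1e-4                           # joint covariance of (assets, benchmark)
Sigma, sigma_jb, sigma_b2 = V[:N, :N], V[:N, N], V[N, N]

gamma = 0.05
x = np.full(N, 1.0 / N)                      # candidate portfolio

lhs = rbar @ x
tracking_var = x @ Sigma @ x - 2 * sigma_jb @ x + sigma_b2   # Var(R_p - beta)
rhs = beta_bar - norm.ppf(gamma) * np.sqrt(tracking_var)     # note: norm.ppf(gamma) < 0
print("chance constraint (3) satisfied:", lhs >= rhs)
```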

In this paper, we deal with the CC assuming that the random variables follow a discrete distribution. This hypothesis allows us to overcome the shortcomings related to the choice of the Normal distribution, which is inappropriate in the case of distributions exhibiting fat tails. Discrete realizations can be derived by using specific scenario generation techniques [see, for example, Beraldi et al. (2010) and the references therein] or available historical observations. The following section presents the deterministic equivalent reformulation under this assumption.

It is worthwhile noting that the CC (2) allows us to control the probability of beating the benchmark, thus accounting for the "qualitative" side of a possible shortfall. Even though this probability is limited, the shortfall, if experienced, could be large. In this respect, the integrated version of the CC provides a way to keep the expected shortfall under control. According to this paradigm, the stochastic constraint (2) is written as

$$\begin{aligned} \mathrm{I\!E}[({\widetilde{\beta }}-\widetilde{R_p})_{+}] \le \epsilon \end{aligned}$$
(4)

where \(({\widetilde{\beta }}-\widetilde{R_p})_{+}\) denotes the random shortfall of the replicating portfolio return with respect to the benchmark. In this case, the probability distribution is used to measure the expected magnitude of the shortfall, so that both quantitative and qualitative aspects are jointly accounted for. The parameter \(\epsilon \) is defined by the end-user on the basis of the expected shortfall that he is willing to accept.
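The distinction between the two paradigms is easy to see on scenario data. The minimal numpy sketch below, on placeholder scenario data and an arbitrary candidate portfolio, computes the quantity controlled by the CC (the probability of a shortfall) and the quantity controlled by the ICC (its expected value):

```python
import numpy as np

rng = np.random.default_rng(0)
S, N = 104, 5
r = rng.normal(0.001, 0.02, (S, N))        # asset returns per scenario (placeholder)
beta = rng.normal(0.001, 0.02, S)          # benchmark returns per scenario (placeholder)
pi = np.full(S, 1.0 / S)                   # scenario probabilities
x = np.full(N, 1.0 / N)                    # candidate portfolio

shortfall = np.maximum(beta - r @ x, 0.0)
prob_shortfall = pi[shortfall > 0].sum()   # quantity bounded by gamma in the CC (2)
exp_shortfall = pi @ shortfall             # quantity bounded by epsilon in the ICC (4)
```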

4 The proposed formulations

This section presents the CC-based reformulations under the assumption of discrete random variables. We denote by \({{\mathcal {S}}}= \{1,\dots ,S\}\) the set of scenarios, i.e. the realizations that the random parameters can take, and by \(\pi _s\) the probability of occurrence of scenario s.

4.1 The CC-based model

Under this assumption, constraint (2) can be rewritten in the following disjunctive form:

$$\begin{aligned} \Gamma (1-\gamma ) = \bigcup _{K \in {{\mathcal {K}}}} \bigcap _{s \in K} \{ R_p^s \ge \beta ^s\}. \end{aligned}$$
(5)

Here the set \({{\mathcal {K}}}\) is defined as

$$\begin{aligned} {{\mathcal {K}}} = \{ K | K \subseteq {{\mathcal {S}}}, \sum _{s \in K} \pi _s \ge 1-\gamma \} \end{aligned}$$
(6)

whereas \(R_p^s\) and \(\beta ^s\) denote the s-th realization of the portfolio return and of the benchmark, respectively. A natural approach to rewriting constraint (5) is the "big-M" reformulation:

$$\begin{aligned}&\sum _{j \in J} r_{js} x_j + M z_s \ge \beta ^s \ \ \ \ \forall s \end{aligned}$$
(7)
$$\begin{aligned}&\sum _{s \in {{\mathcal {S}}}} \pi _s z_s \le \gamma \end{aligned}$$
(8)
$$\begin{aligned}&z_s \in \{0,1 \} \end{aligned}$$
(9)

where M is a real number large enough to ensure that when the binary variable \(z_s\) takes value 1, constraint (7) becomes inactive, whereas it is enforced when \(z_s\) is 0. Constraint (8) is a knapsack-like restriction that limits to \(\gamma \) the total probability of violating the scenario-dependent constraints.

The complete CC-based reformulation of the EIT problem is a mixed-integer program with the following structure:

$$\begin{aligned}&\max \sum _{j \in J} \overline{r}_j x_j \end{aligned}$$
(10)
$$\begin{aligned}&(7)-(9) \end{aligned}$$
(11)
$$\begin{aligned}&x \in {{\mathcal {X}}} \end{aligned}$$
(12)
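As a concrete illustration, problem (10)–(12) can be assembled and solved with scipy.optimize.milp, as in the sketch below. This is not the GAMS/CPLEX implementation used in the experiments: the scenario returns, the hypothetical index composition used to build the benchmark, the tolerance \(\gamma \) and the big-M value are all placeholders chosen only to make the example self-contained.

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

rng = np.random.default_rng(0)
S, N = 104, 30                                 # scenarios, assets (illustrative sizes)
r = rng.normal(0.001, 0.02, (S, N))            # r[s, j]: return of asset j in scenario s
w_bench = rng.dirichlet(np.ones(N))            # hypothetical index composition
beta = r @ w_bench                             # benchmark return per scenario
pi = np.full(S, 1.0 / S)                       # scenario probabilities
gamma, M = 0.05, 10.0                          # tolerance level and big-M constant

# Decision vector: [x_1, ..., x_N, z_1, ..., z_S]
c = np.concatenate([-(pi @ r), np.zeros(S)])   # maximize expected portfolio return (10)

# (7): r_s.x + M z_s >= beta_s   <=>   -r_s.x - M z_s <= -beta_s
cons_cc = LinearConstraint(np.hstack([-r, -M * np.eye(S)]), ub=-beta)
# (8): pi.z <= gamma
cons_knap = LinearConstraint(np.concatenate([np.zeros(N), pi]).reshape(1, -1), ub=gamma)
# budget constraint: sum_j x_j = 1
cons_budget = LinearConstraint(np.concatenate([np.ones(N), np.zeros(S)]).reshape(1, -1),
                               lb=1.0, ub=1.0)

integrality = np.concatenate([np.zeros(N), np.ones(S)])   # x continuous, z binary (9)
bounds = Bounds(np.zeros(N + S), np.ones(N + S))

res = milp(c, constraints=[cons_cc, cons_knap, cons_budget],
           integrality=integrality, bounds=bounds)
if res.status == 0:
    x_opt, z_opt = res.x[:N], np.round(res.x[N:])
    print("expected portfolio return:", -res.fun)
    print("relaxed scenarios:", int(z_opt.sum()))
```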

Depending on the cardinality of the scenario set and on the imposed risk level \(\gamma \), the solution of the reformulated problem can be computationally demanding. Let us assume, for example, that each scenario s has the same probability of occurrence, i.e. \(\pi _s=1/S\). In this case, constraint (8) can be rewritten as:

$$\begin{aligned} \sum _{s \in {{\mathcal {S}}}}z_s \le q \end{aligned}$$
(13)

with q equal to \(\left\lfloor \gamma S \right\rfloor \). Let us denote by \({{\mathcal {S}}}_q\) the whole set of feasible solutions of constraint (13). Each solution is associated with a subset \({{\mathcal {I}}}_q\) of q scenarios. A trivial approach to solving the CC-based problem would rely on an enumeration scheme: for each element of \({{\mathcal {S}}}_q\), solve the following linear programming problem:

$$\begin{aligned}&\max \sum _{j \in J} \overline{r}_j x_j \end{aligned}$$
(14)
$$\begin{aligned}& R_p^s \ge \beta ^s \ \ \ \ \ \forall s \in {{\mathcal {S}}}-{{\mathcal {I}}}_q \end{aligned}$$
(15)
$$\begin{aligned}& x \in {{\mathcal {X}}} \end{aligned}$$
(16)

The optimal solution of the original problem is then the best among those determined. It is evident that such an approach is prohibitive even for a moderate size of the scenario set, since the cardinality of \({{\mathcal {S}}}_q\) is equal to \(\left( {\begin{array}{c}S\\ q\end{array}}\right) . \)
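To see how quickly the enumeration grows, the following snippet counts the candidate subsets for the scenario size used later in the experiments (S = 104, equal probabilities assumed):

```python
from math import comb, floor

S = 104                                    # two years of weekly scenarios
for gamma in (0.01, 0.05, 0.10):
    q = floor(gamma * S)                   # number of scenario constraints that may be relaxed
    print(f"gamma={gamma}: q={q}, candidate subsets={comb(S, q):.3e}")
```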

Solution approaches exploiting the specific problem structure have been proposed in the last decades (e.g. Beraldi and Bruni 2010; Bruni et al. 2013; Beraldi and Ruszczyński 2005). In this paper, we use the classical Branch and Bound approach implemented in commercial solvers, empowered with a procedure for determining an initial incumbent solution. The basic idea, reported in the following scheme, relies on the use of the Lagrangian multipliers \(\lambda _s\) associated with the scenario-based constraints. Since a positive value of the multiplier represents the sensitivity of the optimal objective function value to variations in the s-th constraint, we adopt a locally best choice by removing the q constraints associated with the largest values. Finally, the reduced problem, containing only the promising scenario constraints, is solved and the corresponding objective function value is imposed as the initial incumbent value. The basic scheme is reported in the following.

  • Initialization. Create the supporting set \({{\mathcal {R}}}_q\) initially containing all the scenarios.

  • Step 1. Solve model (14)–(16) by imposing the scenario constraints for all s \(\in {{\mathcal {R}}}_q\).

  • Step 2. If the problem is feasible go to Step 3; otherwise go to Step 4.

  • Step 3. Let \(\lambda _s\) be the Lagrangian multipliers associated with the scenario constraints. Sort these values in non-increasing order. Compose the set \({{\mathcal {I}}}_q\) in (15) by using the corresponding first q scenarios. Solve model (14)–(16) and set the initial incumbent equal to the corresponding objective function value. Go to Step 5.

  • Step 4. Randomly select a scenario l and update the set \({{\mathcal {R}}}_q = {{\mathcal {R}}}_q \setminus \{l\}\). Go to Step 1.

  • Step 5. Invoke the classical Branch & Bound algorithm.

We note that when the model solved at Step 1 is infeasible, an iterative procedure is applied, consisting in removing one scenario constraint at a time. Other rules for selecting the q scenarios out of the S available ones can be applied (for example, a simple random choice). However, we have found that this simple initialization allows a nice reduction of the computational effort when used as a MIP start in a classical Branch & Bound algorithm.
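A sketch of the warm-start procedure, following Steps 1–5 above, is reported below. It uses the duals returned by SciPy's HiGHS-based LP solver as the Lagrangian multipliers \(\lambda _s\); the data layout, the use of absolute dual values (sign conventions differ across solvers), the equal-probability objective and the hand-off of the incumbent to the Branch & Bound solver are assumptions of this illustration, not details taken from the paper.

```python
import numpy as np
from scipy.optimize import linprog

def solve_scenario_lp(r, beta, scenarios):
    """LP (14)-(16): max expected return s.t. r_s.x >= beta_s for s in `scenarios`."""
    S, N = r.shape
    return linprog(-r.mean(axis=0),                    # equal scenario probabilities assumed
                   A_ub=-r[scenarios, :], b_ub=-beta[scenarios],
                   A_eq=np.ones((1, N)), b_eq=[1.0],
                   bounds=[(0, None)] * N, method="highs")

def warm_start(r, beta, q, seed=0):
    """Return an incumbent portfolio and objective value for the CC model."""
    rng = np.random.default_rng(seed)
    kept = list(range(r.shape[0]))                     # supporting set R_q (Initialization)
    while True:                                        # Step 1 / Step 2
        res = solve_scenario_lp(r, beta, kept)
        if res.success:
            break
        kept.remove(int(rng.choice(kept)))             # Step 4: drop a random scenario
    lam = np.abs(res.ineqlin.marginals)                # Step 3: multipliers of kept constraints
    drop = {kept[i] for i in np.argsort(-lam)[:q]}     # q largest multipliers -> I_q
    reduced = [s for s in kept if s not in drop]
    res_red = solve_scenario_lp(r, beta, reduced)
    return res_red.x, -res_red.fun                     # incumbent passed to Step 5 as MIP start

# Example on the same kind of placeholder data used above
rng = np.random.default_rng(0)
r = rng.normal(0.001, 0.02, (104, 30))
beta = r @ rng.dirichlet(np.ones(30))
x0, incumbent = warm_start(r, beta, q=5)
```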

4.2 The ICC-based model

In the case of the ICC, the reformulation is less involved and relies on the introduction of continuous scenario-dependent variables \(y_s\) defined as \(\max (0, \beta ^s-R_p^s)\). More specifically, constraint (4) can be written as:

$$\begin{aligned}&y_s \ge \beta ^s - \sum _{j \in J} r_{js} x_j \ \ \ \forall s \end{aligned}$$
(17)
$$\begin{aligned}&\sum _{s \in {{\mathcal {S}}}} \pi _s y_s \le \epsilon \end{aligned}$$
(18)
$$\begin{aligned}&y_s \ge 0 \ \ \ \ \ \forall s \end{aligned}$$
(19)
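For completeness, the resulting linear program, objective (10) together with (17)–(19) and the budget constraint, can be assembled as in the following sketch (illustrative placeholder data, not the paper's instances; the benchmark is built from a hypothetical index composition so that the example is guaranteed to be feasible):

```python
import numpy as np
from scipy.optimize import linprog

def solve_icc(r, beta, pi, eps):
    """Max expected return subject to (17)-(19) and the budget constraint.
    Decision vector: [x_1, ..., x_N, y_1, ..., y_S]."""
    S, N = r.shape
    c = np.concatenate([-(pi @ r), np.zeros(S)])              # objective (10)
    A_short = np.hstack([-r, -np.eye(S)])                     # (17): beta_s - r_s.x - y_s <= 0
    A_exp = np.concatenate([np.zeros(N), pi]).reshape(1, -1)  # (18): pi.y <= eps
    A_eq = np.concatenate([np.ones(N), np.zeros(S)]).reshape(1, -1)
    res = linprog(c,
                  A_ub=np.vstack([A_short, A_exp]),
                  b_ub=np.concatenate([-beta, [eps]]),
                  A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * (N + S), method="highs")
    return (res.x[:N], -res.fun) if res.success else (None, None)

# Example on placeholder data
rng = np.random.default_rng(0)
S, N = 104, 20
r = rng.normal(0.001, 0.02, (S, N))
beta = r @ rng.dirichlet(np.ones(N))                          # benchmark spanned by the assets
x_icc, exp_ret = solve_icc(r, beta, np.full(S, 1.0 / S), eps=0.005)
```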

It is worthwhile observing that the ICC paradigm is related to other approaches recently proposed for the EIT problem. An evident connection is with the cumulative \(\epsilon \)-ZSD criterion introduced in Bruni et al. (2017). While the \(\epsilon \)-ZSD imposes that under every scenario s the underperformance with respect to the benchmark \(\beta ^s\) is limited by \(\epsilon \), the cumulative condition takes the aggregated loss into account. Instead of considering the bound \(\epsilon \) as a parameter, the authors treat it as a decision variable to be minimized. In our formulation, we also consider an aggregated loss, but we additionally weight each scenario by its probability in the expected value.

Finally, we note the connection between the ICC and the SSD constraints. We recall that a random variable \({\widetilde{R}}_p\) dominates a random variable \({\widetilde{\beta }}\) in the second order if

$$\begin{aligned} F_{\tilde{R}_p}^2(\eta ) \le F_{\tilde{\beta }}^2(\eta ) \ \ \ \ \ \forall \eta \in R \end{aligned}$$
(20)

where \(F^2(\cdot )\) denotes the second-order performance function, which can also be expressed as

$$\begin{aligned} F^2_{V}(\eta ) = \mathrm{I\!E}[(\eta -V)_+] \end{aligned}$$

with V representing a generic random variable.

Under the assumption of discrete distributions, constraint (20) can be written as:

$$\begin{aligned} \mathrm{I\!E}[(\beta _t - {\widetilde{R}}_p)_+] \le \mathrm{I\!E}[(\beta _t - {\widetilde{\beta }})_+] \ \ \ \forall t \end{aligned}$$
(21)

Thus, the SSD constraints can be written as a collection of ICC:

$$\begin{aligned}&\sum _{j \in J} r_{js} x_j + l_{ts} \ge \beta _t \ \ \ \forall t,s \end{aligned}$$
(22)
$$\begin{aligned}&\sum _{s\in {{\mathcal {S}}}} \pi _s l_{ts} \le \ v_t\ \ \forall t \end{aligned}$$
(23)
$$\begin{aligned}&l_{ts} \ge 0 \ \ \ \ \forall t,s \end{aligned}$$
(24)

where \(v_t = \mathrm{I\!E}[(\beta _t - {\widetilde{\beta }})_+] \).
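The right-hand sides \(v_t\) can be precomputed directly from the benchmark scenarios, as in the short sketch below (equal-probability scenarios and placeholder data assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
S = 104
beta = rng.normal(0.001, 0.02, S)          # benchmark scenario returns (placeholder)
pi = np.full(S, 1.0 / S)                   # equal scenario probabilities

# v_t = E[(beta_t - beta)_+], one threshold per benchmark realization beta_t
v = np.array([pi @ np.maximum(b_t - beta, 0.0) for b_t in beta])

# SSD check (21) for a candidate portfolio with scenario returns R_p
R_p = rng.normal(0.0012, 0.02, S)
ssd_ok = all(pi @ np.maximum(b_t - R_p, 0.0) <= v_t for b_t, v_t in zip(beta, v))
```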

5 Computational experiments

This section is devoted to the presentation and discussion of the computational experiments carried out with the aim of empirically assessing the effectiveness of the proposed models. We have considered the data sets used in Guastaroba et al. (2016), consisting of two groups of instances, namely GMS and ORL. For all the instances, which are available at http://or-brescia.unibs.it, the number of scenarios corresponds to two years of weekly observations and is equal to 104. The performance of the generated portfolios is evaluated on an out-of-sample basis, by measuring their return in the 52 weeks following the date of portfolio selection. Table 1 reports the main characteristics of the tested instances: the name of the instance, the benchmark, the number of available assets and the number of scenarios. The GMS set includes four instances created to span four different market trends. For example, the instance referred to as GMS-UU is characterized by an increasing trend of the market (i.e. the market is moving up) in both the in-sample and the out-of-sample periods. Similar considerations hold for the other GMS instances.

Table 1 Data set

The ORL set is generated from 8 benchmark instances for the index tracking problem that consider different stock market indices.

Besides using historical data as realizations of the random asset returns, additional tests have been carried out by adopting a Monte Carlo scenario generation technique.

In particular, we have assumed that the random asset prices are modeled by a correlated Brownian motion (the market index is treated as an additional asset in our generation). Starting from the historical observations, we have estimated for each asset i the expected return \(\mu _i\) and the variance-covariance matrix, with elements \(\sigma _i^2\) denoting the variance of the return of asset i and \(\sigma _{ij}\) the covariance between the returns of assets i and j. For each asset i the following formula has been applied:

$$\begin{aligned} r_i= (\mu _i- (\sigma _i^2)/2)+\sigma _i \sum _{k=1}^i C_{ik} \epsilon _k \end{aligned}$$

where \(C_{ik}\) are the coefficients of the Cholesky factor of the variance-covariance matrix and \(\epsilon _k \sim N(0,1)\). By applying the Monte Carlo simulation, different scenario sets of increasing cardinality have been generated. In particular, in our tests we have considered instances with 150, 250 and 450 scenarios. These new instances are referred to by appending the scenario number to the original name. Thus, for example, ORL-IT8-250 refers to the version of the original instance where 250 scenarios are generated by the Monte Carlo simulation technique. For all the tested instances, the same probability of occurrence has been associated with each scenario.
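A sketch of this scenario generation step is reported below. The historical returns are placeholders, and the Cholesky factor is taken here from the correlation matrix so that the term \(\sigma _i \sum _k C_{ik}\epsilon _k\) reproduces the estimated covariance structure; this is one possible reading of the formula above, not necessarily the authors' exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
hist = rng.normal(0.001, 0.02, (104, 31))      # placeholder history, last column = index
mu = hist.mean(axis=0)
Sigma = np.cov(hist, rowvar=False)
sigma = np.sqrt(np.diag(Sigma))
C = np.linalg.cholesky(Sigma / np.outer(sigma, sigma))   # Cholesky of the correlation matrix

def generate_scenarios(n_scen):
    eps = rng.standard_normal((n_scen, mu.size))         # epsilon_k ~ N(0, 1)
    # r_i = (mu_i - sigma_i^2 / 2) + sigma_i * sum_k C_ik eps_k, per scenario
    return (mu - sigma**2 / 2.0) + sigma * (eps @ C.T)

scenarios = generate_scenarios(250)                      # e.g. a 250-scenario instance
```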

The results presented hereafter mainly refer to the set of test instances available in the literature, i.e. the scenarios are determined by considering historical observations. The motivation for this choice is the possibility of making the results easily reproducible. The last subsection is devoted to the presentation and discussion of the results obtained by considering the Monte Carlo simulation technique.

All the models have been implemented in GAMS 24.7.4 and solved by CPLEX 12.6.1 on an Intel Core i7 (2.5 GHz) with 8 GB of DDR3 RAM. The solution of the CC formulations has been carried out by applying the specialized Branch & Bound approach described in Sect. 4.1. We note that the solution time increases as higher \(\gamma \) values are considered. For those instances, the application of the specialized approach allows a reduction of the solution time of around \(20\%\). Anyhow, even for the larger instances the solution times remain limited. As for the ICC formulation, which entails the solution of an LP problem, the required computational times are much lower (a few seconds).

5.1 Numerical results

Different computational experiments have been carried out to evaluate the performance of the proposed CC models.

Figure 1 reports the expected portfolio return as a function of the risk aversion level \(\gamma \). The numbers shown next to the graph denote the standard tracking error (computed by considering the deviations below the benchmark).

The results refer to the test case GMS-DD, but a similar behavior in terms of risk-return trade-off has been observed for all the other tested instances.

Fig. 1: Efficient frontier for the GMS-DD instance

As is evident, when lower values of \(\gamma \) are considered, the replicating portfolios show lower expected returns and, as expected, lower tracking errors. As the \(\gamma \) value is increased, the performance improves (the problem is less constrained), but the tracking error slightly worsens. By varying the \(\gamma \) value, portfolios with different performance can be obtained, thus providing the decision maker with a wide range of solutions to choose from according to his risk attitude. A very risk-averse investor may be willing to sacrifice some return as long as "safer" solutions are guaranteed.

The results on the portfolio composition confirm that the lower the \(\gamma \) value, the larger the number of selected assets. For example, for the test case introduced above, the number of assets decreases from 38 to 20 as \(\gamma \) passes from 0.01 to 0.10. The same trend has been observed for all the tested instances, which present a limited number of assets in the replicating portfolios. Similar results have been obtained for the ICC model, which on the whole shows slightly worse performance.

Additional experiments have been carried out to evaluate the realized performance of the CC formulations on an out-of-sample basis. The following figures report the cumulative returns of the benchmark and of the replicating portfolios as a function of \(\gamma \) for the GMS instances.

The results clearly show that the CC portfolios closely mimic the behavior of the market over the entire out-of-sample horizon: the cumulative returns of the benchmark and of the replicating portfolios jointly increase and/or decrease. Moreover, in some periods the replicating portfolios outperform the benchmark. Analyzing the results in more detail, we observe that when the market is down both in the in-sample and in the out-of-sample periods (GMS-DD instance) the best results have been obtained for higher values of \(\gamma \), as shown in Fig. 2.

Fig. 2: CC formulation: cumulative realized returns as a function of \(\gamma \)

This behavior can be explained by observing that when the market experiences a negative trend, tracking the benchmark as closely as possible is not a winning strategy if the aim is to gain more. On the contrary, when the market is up (see Fig. 3), the best strategy seems to be a faithful replication of the benchmark, obtained by choosing lower values of \(\gamma \).

Fig. 3: CC formulation: cumulative realized returns as a function of \(\gamma \)

When the market has a mixed trend, lower values of \(\gamma \) are preferable, as shown in Figs. 4 and 5, which refer to the GMS-UD and GMS-DU instances.

Fig. 4: CC formulation: cumulative realized returns as a function of \(\gamma \)

Fig. 5: CC formulation: cumulative realized returns as a function of \(\gamma \)

For the ORL data set, the results (not reported here for the sake of brevity) are similar, and the best performance is obtained for small values of \(\gamma \).

The results reported hereafter refer to the out-of-sample analysis for the ICC formulation. Figures 6 and 7 show the cumulative returns for the GMS-DD and GMS-UD instances, respectively. A similar behavior has been observed for the other tested instances. As is evident, the replicating portfolios closely track the ex-post behavior of the benchmark. In some cases (see Fig. 6) the return patterns almost overlap; in others, the tracking portfolio provides better performance than the market index (see Fig. 7). In all cases, the results appear quite satisfactory.

Fig. 6: ICC formulation: cumulative realized returns

Fig. 7: ICC formulation: cumulative realized returns

5.2 Comparison with other strategies

Additional experiments have been carried out to compare the CC-based formulations with other strategies recently proposed for the EIT problem. For the CC model, since different performances have been obtained for different values of \(\gamma \), we report the average results. As a basis of comparison, we have considered the model presented in Guastaroba et al. (2016), referred to as omega, where the objective function is the maximization of the extended Omega ratio (the value of \(\alpha \) has been set to 0), and the SSD formulation (22)–(24). The analysis has been carried out by computing some performance measures typically used in the portfolio optimization literature, namely the average return, the Sortino ratio and the number of weeks in which the replicating portfolio beats the benchmark, divided by 52 and expressed as a percentage. All the values have been computed over the out-of-sample time horizon and, for all the measures, larger values are preferred.

Tables 2 and 3 report the values of the different measures for the tested models. The best value recorded for each measure is reported in bold.

Table 2 Out-of-sample results for the GMS instances
Table 3 Out-of-sample results for the ORL instances

Looking at the overall results, we may observe that no formulation dominates the others according to all the computed measures and for all the tested instances. Table 4 reports the number of times that a given approach provides the best result with respect to a given measure.

Table 4 Comparison of the different strategies

The results show that the CC formulation provides better results than the ICC one. The SSD formulation presents satisfactory results, better than those provided by the ICC formulation. Anyhow, it is not possible to establish a clear winning or losing model. More interestingly, the results confirm the efficacy of the proposed models in supporting investment decisions, suggesting strategies able to outperform the benchmark ex-post.

5.3 Solution of larger instances

This section is devoted to the presentation and discussion of the results obtained when considering a larger number of scenarios generated by the Monte Carlo simulation technique. As expected, the quality of the scenarios affects the realized performance. Table 5 reports the results obtained for the test instance ORL-IT8 as a function of the scenario number. Looking at the results, we may notice that better performance can be achieved when the Monte Carlo simulation is adopted. For example, considering a high level of risk aversion (\(\gamma =0.99\)), the realized average return passes from 0.0072, when using historical observations (104 scenarios), to 0.050, when the scenarios (150) are generated by the Monte Carlo simulation. The improvement in terms of the Sortino ratio and of the number of times the portfolio outperforms the benchmark is also impressive, especially when considering a high level of risk aversion. We may notice that the performance seems to deteriorate when passing from 250 to 450 scenarios. Anyhow, even in this case the results are much better than those achieved when historical data are used. If we look at the values averaged over the different sizes of the scenario set, it appears evident that the best results are obtained for a \(\gamma \) value equal to 0.01, confirming that keeping a prudent attitude pays off in the long run. We finally notice that the same considerations drawn for the ORL-IT8 test case hold for all the other instances. The whole set of results is not reported here for the sake of brevity.

Table 5 Out-of-Sample results for the ORL-IT8 instance as function of the scenario number

6 Conclusions

The enhanced index tracking problem is a challenging problem that has been receiving increasing attention from the scientific community in the last decades. In this paper, the problem is addressed by applying the machinery of chance constraints, and two formulations, based on the basic and on the integrated form of the stochastic constraint, are proposed. Extensive computational experiments have been carried out on different benchmark instances. Both the proposed formulations suggest investment strategies that track the benchmark very closely over the out-of-sample horizon and often achieve better performance. When compared with other existing strategies, the empirical analysis reveals that no optimization model clearly dominates the others in the out-of-sample analysis, even though the chance-constrained formulation seems to be very competitive. Additional tests have been carried out to evaluate the impact of the quality of the scenarios used as input data in the optimization models. We have empirically found that when scenarios are generated by applying a Monte Carlo simulation technique, the realized returns are superior to those achieved when considering historical observations. The results confirm that the adoption of a forward-looking perspective is preferable, since high accuracy in the past seldom guarantees the same result in the future.