Introduction

Basic fund investment strategies are classified into active and passive strategies. In an active strategy, investors believe in the inefficient market hypothesis, i.e., that market prices do not accurately reflect real values. Thus, investors aim to beat the market through their experience, in-depth research, financial forecasting, and stock analysis [19, 34, 35]. In contrast, investors who adopt a passive strategy believe in the efficient market hypothesis (EMH), which states that market values accurately include and reflect all information at all times [40, 41, 51]. Under the EMH assumption, investors believe that it is difficult to outperform the market, and they assume that the market posts positive returns over time. As a result, such investors adopt a buy-and-hold portfolio strategy over the long term with minimal trading activity, and they seek to replicate the performance of the chosen benchmark market index as closely as possible.

Recently, the significance of passive management strategies has increased tremendously. Three major motivations underlie this phenomenon [5, 11, 54, 61]. First, benchmark indices have risen continually in the past, so passive investors have a good chance of earning a reasonable return. Second, fund managers have difficulty beating the market in the long term; the longer the selected time frame, the more likely it is that investors underperform the market. Third, active management incurs high fixed costs, while passive management incurs much lower ones.

The aforementioned reasons have prompted investors to shift from active investment strategies to passive investment strategies [1]. Passive investment can be achieved through different instruments, such as index funds, passive mutual funds, and passive exchange-traded funds. The principal approach behind these instruments is index tracking. This method is designed to replicate the performance of a particular market index; it can be viewed as a matching between the tracking portfolio and the actual market index.

The most straightforward approach is full replication, which holds all stocks in the index with their corresponding weights. A tracking error of zero is achieved, as this method utilizes all stocks in the same proportions as the chosen benchmark index. However, full replication does not work well in practice, as it brings several drawbacks. Imagine an investor who purchases every stock in the Dow Jones Wilshire 5000 Total Stock Market Index or the Standard and Poor's 500. First, the cost becomes relatively expensive for small assets under management (AUM), which significantly diminishes the investment return. Second, the portfolio contains various small, illiquid stocks that are difficult to sell for cash without a considerable loss; the approach thus damages the return and incurs relatively high costs. Third, when rebalancing the tracking portfolio, the proportions of the whole portfolio must be reassessed, so more fund management effort is needed.

The second index-tracking method is partial replication. This approach utilizes a small number of stocks to approximately reproduce the performance of a chosen market index. Although the tracking is no longer a perfect match, the costs are lowered, and the process of rebalancing portfolio weights is simplified. Unlike full replication, partial replication involves lower transaction costs and can avoid purchasing illiquid stocks, as only a small number of stocks are employed. In addition, this method reassesses only part of the tracking portfolio's proportions. Thus, partial replication incurs lower rebalancing costs and is less complicated than full replication.

Given these considerations, the greatest challenge in index tracking is the tradeoff between tracking accuracy and cost. When the portfolio includes a large number of assets, the cost becomes high. A common way to handle this problem is partial replication of market performance without using all assets. However, sparsity and other practical constraints bring complexity, as they form a discontinuous global function optimization problem. A metaheuristic is preferable for dealing with index tracking, as traditional local methods may become trapped in local solutions. The contributions of this paper are summarized as follows:

  • A framework is proposed for the comprehensive index-tracking problem (ITP) based on metaheuristics.

  • The comprehensive ITP is addressed through a fully global method instead of other reviewed suboptimal global-local methods.

  • Competitive simulation performance results are obtained on a benchmark tracking index.

  • The proposed framework can be extended with other practical constraints and via the application of other metaheuristics.

In this paper, we present a metaheuristic-based framework to address the enhanced ITP (EITP) with various practical constraints. We propose a solution strategy that incorporates a quantitative tracking model, metaheuristic procedure, lookback approach, and constraint validator. This method aims to reduce the complexity of the considered problem, presents an efficient model, and systematizes the process. This paper focuses on the US market.

The structure of this paper is organized as follows. The next section discusses a literature review and related works. The third section presents the formulation of the EITP with various practical constraints. The fourth section presents the proposed framework. The fifth section presents the simulation results and discussion. The last section provides the conclusion of the paper.

Related work

We first define some notations that we use throughout the paper. These notations are presented as follows:

  • \(E_{t}\) is the measured tracking error.

  • \(r_{p}\) denotes the return of the tracking portfolio, computed as \( \sum _{n=1}^{N} r_{p(t, n)} \odot w_{(t, n)} \) at time t over the current stocks n; it is expressed as \([r_{p(1)}, \ldots , r_{p(T)}]\).

  • \(w_{(t, n)}\) denotes the weight to be optimized for the n stocks, repeated over time t.

  • \( r_{b}\) denotes the return of the tracking benchmark at time t, expressed as \([r_{b(1)}, \ldots , r_{b(T)}]\).

  • T is the maximum number of trading days in the given period, and \(t=1 \ldots T\).

  • N is the total number of available assets in the tracking benchmark, and \(n=1 \ldots N\).

  • \(\odot \) denotes the Hadamard product [25, 44].

The well-known modern portfolio theory (MPT) was an important breakthrough in personal investing, and it provides insight into index tracking. The mean-variance model was the first approach in MPT to discover the efficient frontier for a tradeoff between the expected return and risk [42]. Some reviewed papers are based on portfolio optimization in multiobjective (MO) test problems, where the overall return and financial risk are optimized [22, 31, 32, 36, 60]. The ITP has been widely studied by different researchers and financial analysts. The objective is to minimize the difference between the chosen benchmark index and the tracking portfolio. The artificial index should be as similar to the benchmark index value as possible. This problem is handled with a lookback approach in which historical price information provides hints about the future. The empirical index-tracking error equation is shown as follows:

$$\begin{aligned} \begin{aligned} \min E_{t}&~= \frac{1}{T}|| r_{p} - r_{b} ||_2^2 \\ \text {s.t.}&~\sum _{n=1}^{N} w_n \le 1 . \end{aligned} \end{aligned}$$
(1)

The portfolio weights must be optimized to approximately replicate the market performance. The out-of-sample metric is used to estimate the performance of the proposed framework, as the optimization of index tracking is based on historical information.

In addition, an investor is also concerned about the return of their portfolio. Enhanced index tracking no longer involves a single-factor model, as it tries to achieve greater returns than those of the benchmark index by sacrificing a degree of tracking error. Evaluating a solution thus involves measuring both the tracking error and the excess return. The square root of the tracking error, \(E_{s}\), is used so that its magnitude is comparable with that of the excess return, as shown as follows:

$$\begin{aligned} \min E_{s} = \sqrt{ \frac{1}{T}|| r_{p} - r_{b} ||_2^2 }. \end{aligned}$$
(2)

The excess return \(E_{e}\) is shown as follows:

$$\begin{aligned} \max E_{e} = \frac{1}{T} \sum _{t=1}^{T} \left( r_{p(t)} - r_{b(t)} \right) . \end{aligned}$$
(3)

This function generalizes the tracking error and excess return into a single minimization function, and it is shown as follows:

$$\begin{aligned} \begin{aligned} \min&~ \lambda (E_{s} ) - (1 - \lambda )E_{e} \\ \text {s.t.}&~ \sum _{n=1}^{N} w_n \le 1 . \end{aligned} \end{aligned}$$
(4)

The tradeoff between the tracking error and excess return is determined by \(\lambda \). In addition, the logarithmic return is applied in this paper instead of the arithmetic return, as it is more suitable for estimating the tracking portfolio.
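To make the objective concrete, the following minimal Python sketch evaluates Eqs. (2)-(4) for given return vectors. The function and variable names are illustrative, and the portfolio returns are assumed to have already been computed from the weights:

```python
import numpy as np

def enhanced_objective(r_p, r_b, lam=0.5):
    """Combined enhanced index-tracking objective (Eqs. 2-4).

    r_p, r_b : arrays of portfolio and benchmark log returns over T days.
    lam      : tradeoff between tracking error and excess return.
    """
    T = len(r_b)
    e_s = np.sqrt(np.sum((r_p - r_b) ** 2) / T)  # root tracking error, Eq. (2)
    e_e = np.sum(r_p - r_b) / T                  # mean excess return, Eq. (3)
    return lam * e_s - (1.0 - lam) * e_e         # minimization objective, Eq. (4)
```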

A simple example explains why the logarithmic return is better than the arithmetic return. When a stock price rises from 50 to 100, the arithmetic return is 1.0, and the logarithmic return is approximately 0.69. When the price decreases from 100 to 50, the arithmetic return is − 0.5, and the logarithmic return is approximately − 0.69. Based on this observation, arithmetic returns do not assign equal magnitudes to equal and opposite price changes. As a result, arithmetic returns probably overestimate excess returns, whereas logarithmic returns give the same price change magnitude for both positive and negative movements [26, 27, 50]. The formula of an arithmetic return is shown as follows:

$$\begin{aligned} r _{a} = \frac{P_{t} - P_{t-1} }{P_{t-1}}. \end{aligned}$$
(5)

The formula of a logarithmic return is shown as follows:

$$\begin{aligned} r_{l} = \ln \left( \frac{P_{t} }{P_{t-1}} \right) . \end{aligned}$$
(6)

Note that the number of days used for the return is one less than the number of days used for the closing price. The returns of the tracking portfolio \(r_{p(t,n)}\) and the benchmark \(r_{b(t)}\) are computed with \(r_{l}\).
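As a quick sanity check, assuming numpy, the following snippet reproduces the 50-to-100-to-50 example above:

```python
import numpy as np

prices = np.array([50.0, 100.0, 50.0])
arithmetic = np.diff(prices) / prices[:-1]   # [ 1.0, -0.5 ]          Eq. (5)
logarithmic = np.diff(np.log(prices))        # [ 0.693..., -0.693...] Eq. (6)
print(arithmetic, logarithmic)
```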

The first approach to dealing with sparse index tracking is a two-step procedure that decomposes the problem into stock selection and weight allocation. The manual process was an early stock selection method based on financial analysis tools, such as composition features [30]. Later, researchers focused on automating the stock selection process, as automated approaches are superior to manual or random methods, and various evolutionary heuristic and clustering algorithms have been applied [9, 15, 20, 29, 38, 52]. Once the stocks are determined, the weight allocation step is addressed by an exact method such as quadratic programming. However, this procedure is a suboptimal global-local method that accomplishes the two steps separately. A second approach, the joint approach, was introduced to address this problem. It integrates the two-step procedure into a single process that is optimized through evolutionary heuristics or stochastic neural networks [3, 21, 37, 46, 63, 65]. The last approach is to reformulate the sparse ITP into an alternative approximate function that is optimized through mixed-integer programming (MIP) [4, 10, 43, 45]. This approach is complex, and the approximate function is not entirely equivalent to the original function: the original problem is discontinuous and nonconvex, whereas it is approximated by a function that is convex and differentiable. More details about the MIP approach can be found in [7, 8].

Several types of approaches have been reviewed, and the proposed framework belongs to the second category. A metaheuristic is the principal method for optimizing the joint problem in this framework, and the joint formulation possesses several advantages. First, the joint method is a fully global optimization technique, whereas the two-step procedure is a suboptimal global-local approach in which it is unclear whether the obtained solution is a nearby local solution or a global one. Second, the joint problem is equivalent to the original problem, which implies that further constraints can be applied without reformulating the whole equation. Unlike the third approach, where each added objective or constraint increases the complexity of the approximate function, adding more considerations is not complex for the joint approach. Third, the joint method is a direct approach, as joint optimization depends only on the selected metaheuristic. A desirable solution is expected at an acceptable computational cost.

Metaheuristics

A metaheuristic is a high-level, problem-independent procedure that provides a collection of guidelines for developing heuristic optimization algorithms [55]. Sparsity and other constraints bring complexity to the problem, making it a discontinuous and nondifferentiable function that is difficult to address with an exact method. Although metaheuristics do not guarantee globally optimal solutions, they converge to good approximate solutions and retain a possibility of acquiring the globally optimal solution. In addition, the performance of metaheuristics is often superior to that of traditional methods [6, 48, 56]. For instance, metaheuristics have been successfully applied to a wide range of fields, such as recommender systems, job scheduling, fake news stance detection, feature selection, fuzzy shortest paths, and electric vehicle routing [39, 49, 53, 62, 64, 66]. Thus, a metaheuristic is adopted in the proposed framework to address the comprehensive ITP.

A genetic algorithm (GA) was developed by Holland and his students [24]. It was inspired by Darwin's theory of natural selection based on the "survival of the fittest" rule, and it was seemingly the first approach to put this strategy into practice. The GA is a stochastic search method that simulates the mechanics of biological evolution. The search operators include reproduction, crossover, and mutation. First, reproduction maintains better solutions through selection pressure on the set of candidate solutions. Then, crossover swaps the information of two parents to generate offspring. Finally, the mutation operator is an uncommon random modification that maintains genetic diversity.

Particle swarm optimization (PSO) was developed by Kennedy and Eberhart [28]. It simulates social behaviors such as bird flocking and fish schooling. The swarm searches for food in multidimensional space through its velocity, personal-best position, and global-best position. However, PSO suffers from premature convergence. Some researchers have combined Gaussian mutation with PSO to maintain the population diversity and escape local minima [23, 57].

A competitive swarm optimizer (CSO) was developed by Cheng and Jin for large-scale optimization [12]. Although this algorithm was inspired by PSO, their concepts and theories are dissimilar. The CSO operates based on a random pairwise competition mechanism with the current swarm to generate a set of losers and winners among the particles. The swarm updates its position through the competition mechanism instead of the global-best position or personal-best position. The losers learn from the winners, and the winners are retained in the next generation.

Differential evolution (DE) was developed by Storn and Price [58]. This heuristic algorithm is simple and efficient, demands only a few problem parameters, and combines the mutation, crossover, and selection operators. First, mutation generates candidate solutions by combining existing solutions. Next, crossover determines the new vector based on a predefined crossover rate. Then, selection retains the solution with the best fitness value for the next iteration. The algorithm has two update schemes: scheme 1 includes a crossover rate and an amplification factor, and scheme 2 introduces an additional control term to incorporate the current best position.
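For illustration, a minimal sketch of one DE generation under scheme 1 (rand/1 mutation, binomial crossover, greedy selection for minimization) is given below. The interface is illustrative rather than the exact implementation used in the simulations:

```python
import numpy as np

def de_step(pop, fit, f, F=1.0, CR=0.3, rng=None):
    """One DE generation, scheme 1: rand/1 mutation + binomial crossover."""
    rng = rng or np.random.default_rng()
    P, D = pop.shape
    for i in range(P):
        r1, r2, r3 = rng.choice([j for j in range(P) if j != i], 3, replace=False)
        mutant = pop[r1] + F * (pop[r2] - pop[r3])   # differential mutation
        mask = rng.random(D) < CR
        mask[rng.integers(D)] = True                 # keep at least one mutant gene
        trial = np.where(mask, mutant, pop[i])       # binomial crossover
        f_trial = f(trial)
        if f_trial <= fit[i]:                        # greedy selection
            pop[i], fit[i] = trial, f_trial
    return pop, fit
```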

These metaheuristics are applied in the proposed framework, and their performance is compared via objective measurement. Metaheuristics can be viewed as alternative ways to address this nondeterministic polynomial-time (NP)-hard problem through their search abilities and gradient-free optimization process. A good solution is expected when addressing the comprehensive ITP with practical constraints.

Problem formulation

Enhanced index tracking and practical constraints

To better deal with the EITP, the temporal significance of the data is considered in the optimization process. The weights of the time series increase steadily across the training dataset, as the days closest to the trading days in the test dataset are the most informative in financial data:

$$\begin{aligned} \begin{aligned} \min&~ \lambda (E_{s}) - (1 - \lambda )E_{e} \\ \text {s.t.}&~ r_p = \tau _{b} \odot \sum _{n=1}^{N} r_{p(t, n)} \odot w_{(t, n)} \\&~ r_b = \tau _{b} \odot r_{b(t)} \\&~ \tau _{b} = \frac{\zeta _{b(t)} T}{\sum _{t=1}^{T} \zeta _{b(t)}} \\&~ \zeta _{b} = 1 + \varsigma \left[ \frac{\sum _{t=1}^{1} \vartheta _{t}}{\sum _{t=1}^{T}\vartheta _{t}}, \frac{\sum _{t=1}^{2} \vartheta _{t}}{\sum _{t=1}^{T}\vartheta _{t}}, \ldots , \frac{\sum _{t=1}^{T} \vartheta _{t}}{\sum _{t=1}^{T}\vartheta _{t}} \right] \\&~ \vartheta = [\ln {(1)}, \ln {(2)}, \ldots , \ln {(T)}] \\&~ \varsigma \in {\mathbb {R}}_{\ge 0} \end{aligned} \end{aligned}$$
(7)

where \(\tau _{b}\) denotes the biased time factor, and \(\varsigma \) denotes the biased coefficient, a non-negative real number. If the biased coefficient is set to zero, the formulation reduces to the standard ITP; as the biased coefficient grows, the recent data receive greater weight. The numerator grows with the natural logarithm, and the denominator normalizes by the sum of the numerator terms; this form is used instead of an iteratively increasing weight so that the values do not grow too quickly. Note that t starts from one for both the training and test datasets.
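A short sketch of the biased time factor computation, assuming numpy arrays and T > 1; setting the biased coefficient to zero recovers uniform weights:

```python
import numpy as np

def biased_time_factor(T, varsigma):
    """Biased time weights tau_b from Eq. (7); varsigma = 0 gives the standard ITP."""
    theta = np.log(np.arange(1, T + 1))                     # [ln(1), ..., ln(T)]
    zeta = 1.0 + varsigma * np.cumsum(theta) / np.sum(theta)
    return zeta * T / np.sum(zeta)                          # weights averaging to one
```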

In particular, fund managers not only consider the EITP but also consider other constraints such as sparsity, weights, AUM, transaction fees, the full share restriction, and risk diversification. Various real-life constraints are considered in this comprehensive model.

Commission fee structures can be divided into tiered pricing and fixed pricing frameworks. For tiered pricing, the prices are split into a few levels. When the investor purchases more stocks, the transaction fee decreases. Fixed pricing includes all exchange and regulatory fees, and it is applied to demonstrate the cost constraint for simplicity. The regulatory fees come from the Financial Industry Regulatory Authority (FINRA) trading activity fee.

The prices of fixed costs in this paper are based on Interactive Brokers (IB), as it is one of the largest trading platforms in the US market. The IB brokerage firm charges USD 0.005 per share, the minimum cost is USD 1.00, and the maximum cost is 1.0% of the trade value. Note that if the calculated maximum per order is smaller than the minimum per order, the maximum per order applies. The FINRA trading activity fee charges 0.000119% of the total trade volume, the minimum trading activity cost is USD 0.01, and the maximum trading activity cost is USD 5.95. The equations of the commission fee and the FINRA trading activity fee are shown as follows:

$$\begin{aligned} \begin{aligned} o_{n}&~ = \min \{ \max \{ 0.005 \odot I_{n}, \; 1.00 \}, \; \xi \odot w_{n} \odot 0.01 \} \\ u_{n}&~ = \min \{ \max \{ \xi \odot w_{n} \odot 0.000119, \; 0.01 \}, \; 5.95 \} \\ \eta&~ = \sum _{n=1}^{N} (o_{n} + u_{n}), \end{aligned} \end{aligned}$$
(8)

where \(o_n\) denotes the commission fee for the n current assets, \( u_{n}\) denotes the FINRA trading activity fee for the n current assets, \(\xi \) denotes the value of the AUM, \(w_{n}\) denotes the weight of the n current stocks, N denotes the total number of stocks, and \(\eta \) denotes the total transaction fee.

In addition, a suitable transaction fee is considered. According to standard practice, the FINRA 5% rule stipulates that a broker should not charge a commission of more than 5% of the trade value in the US stock market. The equation of the proper transaction fee is shown as follows:

$$\begin{aligned} \begin{aligned} \upsilon _{n} = {\left\{ \begin{array}{ll} o_{n} - \xi \odot w_{n} \odot \varrho , \; &{} {\text {if}} \; o_{n} > \xi \odot w_{n} \odot \varrho \\ 0, \; &{} {\text {if}} \, o_{n} \le \, \xi \odot w_{n} \odot \varrho \; \end{array}\right. } \end{aligned} \end{aligned}$$
(9)

where \(\varrho \) denotes the acceptable commission percentage of the trade value, set to a rate of 0.05.
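The fee logic of Eqs. (8) and (9) can be sketched as follows; the clamped forms and the 5% threshold follow the IB and FINRA figures quoted above, while the function name and interface are illustrative:

```python
def transaction_fees(shares, trade_value, rho=0.05):
    """Fixed-pricing fees for one asset, following Eqs. (8)-(9).

    shares      : number of full shares I_n traded.
    trade_value : xi * w_n, the dollar value allocated to the asset.
    """
    # IB commission: 0.005 USD/share, clamped to [1.00 USD, 1% of trade value].
    o = min(max(0.005 * shares, 1.00), 0.01 * trade_value)
    # FINRA trading activity fee, clamped to [0.01 USD, 5.95 USD].
    u = min(max(0.000119 * trade_value, 0.01), 5.95)
    # Violation of the FINRA 5% rule (zero when the commission is acceptable).
    v = max(o - rho * trade_value, 0.0)
    return o, u, v
```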

Regarding risk, systematic risk and unsystematic risk exist in the finance market. Systematic risk represents the aggregation of risk from all investors in the market, such as the risks related to natural disasters and epidemics. Unsystematic risk denotes the risk that is unique to a particular company, and it is lowered by diversifying the portfolio weights among different stocks. Therefore, systematic risk is unpredictable in the finance market, and unsystematic risk is considered in risk diversification.

Finance analysts recommend that investors practice risk management strategies that incorporate a broad range of investments within a portfolio. A combination of distinct assets can lower the financial exposure to any particular asset risk. The portfolio standard deviation (SD) and an upper bound are used to limit the total risk. Before discussing the portfolio SD, the stock correlation coefficient (CC) and the portfolio variance (VAR) are discussed first, as they are highly related terms.

The stock CC measures the co-movement of two or more assets by calculating the Pearson CC, and the value of the CC is between − 1 and 1 [2, 59]. A positive CC means that when one stock price increases, the other stock price also increases. Conversely, a negative CC denotes an inverse correlation, where the stock prices move in opposite directions. For instance, a highly positive CC implies that the compared stock prices move simultaneously in the same direction and by similar percentages most of the time. Note that a negative stock CC is unusual in the real world. The equation for calculating the stock CC is shown as follows:

$$\begin{aligned} \rho {(x_{1} , x_{2} )} = \frac{\text {cov}(x_{1} , x_{2} )}{\sigma _ {x_{1} } \sigma _ {x_ {2} } }, \end{aligned}$$
(10)

where \(\rho \) denotes the CC operand, cov denotes the covariance operator, and \(\sigma \) denotes the SD operator. The CC equation for assets \( x_1 \) and \( x_2\) can be expanded as follows:

$$\begin{aligned} \rho {(x_{1} , x_{2} )} = \frac{{}\sum _{t=1}^{T} (x_{(1, t)} - \overline{x_{1} })(x_{(2, t)} - \overline{x_{2} })}{\sqrt{\sum _{t=1}^{T} (x_{(1, t)} - \overline{x_{1} })^2 } \sqrt{ \sum _{t=1}^{T}(x_{(2, t)} - \overline{x_{2} })^2} } \end{aligned}$$
(11)

where \(x_{(i, t)}\) is the return of asset \(x_i\) at time t, and \({\overline{x_i}} \) is its mean value. Then, the matrix of correlation coefficients is shown as follows:

$$\begin{aligned} R (X) = \begin{bmatrix} 1 &{}\quad \rho {(x_{1},x_{2} )} &{}\quad \cdots &{}\quad \rho {(x_{1},x_{n} )} \\ \rho {(x_{2} , x_{1} )} &{}\quad 1 &{}\quad \cdots &{}\quad \rho {(x_{2},x_{n} )} \\ \vdots &{}\quad \vdots &{}\quad \ddots &{}\quad \vdots \\ \rho {(x_{n}, x_{1} )} &{}\quad \rho {(x_{n},x_{2} )} &{}\quad \cdots &{}\quad 1 \end{bmatrix}, \end{aligned}$$
(12)

where R(X) denotes the CC matrix. After introducing the stock correlation concept, we turn back to the portfolio SD, which measures the overall portfolio risk. A low portfolio SD implies that the portfolio exhibits less volatility and higher stability. In contrast, a high portfolio SD highlights that the investment risk is high. The equation for calculating the portfolio SD is shown as follows:

$$\begin{aligned} \sigma _{p} = \sqrt{ w_{n} \otimes \text {cov}(r_{p}) \otimes w_{n}^T }, \end{aligned}$$
(13)

where \(\sigma _{p}\) denotes the portfolio SD, \( \otimes \) denotes matrix multiplication, and \(w_{n}^T\) denotes the matrix transpose. Note that the order of the operands is not exchangeable in matrix multiplication. However, it is hard to determine whether a portfolio SD value is high or low from the value itself. Therefore, an equally weighted portfolio SD is used as the baseline. A multiplier is introduced to relax the risk constraint, as cardinality restricts risk diversification; the more stocks held in the portfolio, the lower the risk exposure is [16].
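A sketch of the risk computation, assuming a (T, N) matrix of asset log returns; the equally weighted SD serves as the baseline that the multiplier \(\phi \) relaxes, and the names are illustrative:

```python
import numpy as np

def portfolio_sd(w, returns):
    """Portfolio standard deviation, Eq. (13); returns has shape (T, N)."""
    cov = np.cov(returns, rowvar=False)   # N x N covariance matrix
    return float(np.sqrt(w @ cov @ w))

def risk_ok(w, returns, phi=1.2):
    """Risk constraint: portfolio SD bounded by phi times the equal-weight SD."""
    N = returns.shape[1]
    e = np.full(N, 1.0 / N)               # equally weighted baseline portfolio
    return portfolio_sd(w, returns) <= phi * portfolio_sd(e, returns)
```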

Various real-life constraints are considered in this model. First, the cardinality constraint restricts the maximum number of stocks and provides sparsity [33]. Thus, the management cost is decreased, and the fund administration workload is reduced. Second, the investor cannot exceed the budget value, and the budget should be utilized as much as possible; thus, the minimum percentage of the budget value is determined. Note that this also implies that the summed portfolio weights should be smaller than one. Third, the transaction cost should not be too expensive, and this cost is limited. Fourth, risk diversification is considered, as a portfolio with less risk benefits the return, and risk diversification is measured by the portfolio SD. Fifth, the full share restriction is examined, as the number of buyable stocks should be an integer; although the idea of fractional shares has been raised, this option is not available at every brokerage. Sixth, the lower bound for the weights is defined as greater than or equal to zero, and short selling is not permitted, as it is a high-risk activity that may cause very large losses. Seventh, the upper bound for the weights is defined to provide risk diversification: the investor should not put all of their eggs in one basket, as concentrating all resources on a particular asset could lose everything. Under these considerations, the equation of the EITP with various practical constraints is formulated.

Fig. 1 Diagram of the ITP

The notations and equation are presented as follows:

  • \(\kappa \) denotes the cardinality constraint that restricts the number of assets in the portfolio.

  • \(\xi \) denotes the amount of AUM being invested.

  • \(\eta \) denotes the transaction fee. There are two major types of transaction fees, with tiered pricing and fixed pricing structures. For simplicity, fixed pricing is considered in this framework.

  • \(\varphi \) denotes the minimum percentage of the budget value that must be contributed to the portfolio.

  • \(e_{n}\) denotes the equal weights, which are stated as follows: \( e_{n} = [\frac{1}{N}, \frac{1}{N}, \ldots , \frac{1}{N}] \) with N entries.

  • \( ( w \odot \sigma ) ^T \) denotes the matrix transpose operation.

  • \( w_{n} \otimes \text {cov}(r_{p(t, n)}) \otimes w_{n}^T \) denotes the portfolio variance (VAR), and the portfolio SD is the square root of the portfolio VAR.

  • \( e_{n} \otimes \text {cov}(r_{p(t, n)}) \otimes e_{n}^T \) denotes the equally weighted portfolio VAR, and the square root of the equally weighted portfolio VAR is the equally weighted portfolio SD.

  • \(\phi \) denotes the multiplier coefficient for the equally weighted portfolio SD.

  • \( \iota \) denotes the modulo operator.

  • \(P_{n}\) denotes the closing price on the starting day of the specific period of interest for the n current assets.

  • \(\mu \) denotes the upper bound for each stock.

Note that some notations have been previously mentioned.

$$\begin{aligned} \begin{aligned} \min&~ \lambda (E_{s}) - (1 - \lambda )E_{e} \\ \text {s.t.}&~ \sum _{n=1}^{N} \bigtriangleup (w_{n}) \le \kappa \\&~ \varphi \xi \le \sum _{n=1}^{N} ( \xi \odot w_{n} ) + \eta \le \xi \\&~ \sum _{n=1}^{N} \upsilon \le 0 \\&~ \sqrt{ w_{n} \otimes \text {cov}(r_{p(t, n)}) \otimes w_{n}^T } \le \phi \sqrt{ e_{n} \otimes \text {cov}(r_{p(t, n)}) \otimes e_{n}^T } \\&~ \iota \, ( \xi \odot w_{n} , P_{n} ) = 0 \\&~ 0 \le w_{n} \le \mu . \end{aligned} \end{aligned}$$
(14)
Fig. 2 Process flow of the proposed framework

The \( \bigtriangleup \) denotes the operator for measuring the cardinality constraint. When the weight of a stock is larger than zero, the cardinality is one:

$$\begin{aligned} \bigtriangleup (w_n) = {\left\{ \begin{array}{ll} 0, \; \textrm{if} \; w_n = 0 \\ 1, \; \textrm{if} \, w_n > 0 . \end{array}\right. } \end{aligned}$$
(15)

In addition, fractional rounding is practiced to address the full share restriction, as the equality constraint is hard for metaheuristics to handle [13]. As a result, the rounding mechanism is applied, and the price is estimated on the starting day of the training and test period. Note that the minimum trading unit is one share in the US. Therefore, investors can invest more freely in the US market than in other regions. For instance, Hong Kong Exchanges have a restrictive policy in which the minimum trading unit is one lot. The equations of fractional rounding are shown as follows:

$$\begin{aligned} \begin{aligned} \lfloor x \rfloor&~ = \; \sup {\{ m \in {\mathbb {Z}}, m \le x\} }, \\ \lceil x \rceil&~ = \; \inf { \{ m \in {\mathbb {Z}}, m \ge x \} }, \\ [ x ]&~ = {\left\{ \begin{array}{ll} \lfloor x \rfloor , \; &{} \textrm{if} \; ( x - \lfloor x \rfloor ) \le q \\ \lceil x \rceil , \; &{} \textrm{if} \; ( x - \lfloor x \rfloor ) > q \end{array}\right. }, \end{aligned} \end{aligned}$$
(16)
Fig. 3 Dataset with the standard Pareto principle

where x denotes a real number, [x] denotes the rounding operand, \(\lfloor x\rfloor \) denotes the floor operand, \(\lceil x\rceil \) denotes the ceiling operand, \(\sup \) denotes the supremum, \(\inf \) denotes the infimum, q is set to 0.5 (corresponding to midpoint rounding), m denotes an integer, and \({\mathbb {Z}}\) denotes the set of integers. The equation for integer rounding is shown as follows:

$$\begin{aligned} I_{n} = \left[ \frac{\xi \odot w_{n} }{P_{n} }\right] , \end{aligned}$$
(17)

where \(I_{n}\) denotes the number of full shares and \(P_{n}\) denotes the closing price of the current n assets on the starting day. Once the numbers of full shares are obtained, the portfolio weights are reassigned. The equation of the weights reassignment process is shown as follows:

$$\begin{aligned} w_n = \frac{ P_{n} \odot I_{n} }{\xi }. \end{aligned}$$
(18)
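The rounding chain of Eqs. (16)-(18) can be sketched in Python as follows, with q = 0.5 giving the midpoint rounding defined above; the names are illustrative:

```python
import numpy as np

def round_to_full_shares(w, prices, aum, q=0.5):
    """Full-share rounding and weight reassignment, Eqs. (16)-(18).

    w : optimized weights, prices : closing prices P_n, aum : budget xi.
    """
    x = aum * w / prices                                  # fractional shares, Eq. (17)
    frac = x - np.floor(x)
    shares = np.where(frac <= q, np.floor(x), np.ceil(x)) # midpoint rule, Eq. (16)
    w_new = prices * shares / aum                         # reassigned weights, Eq. (18)
    return shares.astype(int), w_new
```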

After the weight solutions for the test data are rounded, an equation is applied to prevent cardinality constraint violations. The equation is shown as follows:

$$\begin{aligned} w_{e} = \bigtriangleup (w_{b}) \odot w_{e}, \end{aligned}$$
(19)

where \(w_{e}\) is the weight of the test data, and \(w_{b}\) is the best discovered weight of the training data.

Penalty technique

The enhanced penalty technique is practiced to handle the various practical constraints, as it is widely applied [14]. The concept of this technique comes from Lagrangian relaxation (LR), an early method for approximating a challenging constrained problem with a more straightforward one [17, 18]. The constrained inequality optimization problem is shown as follows:

$$\begin{aligned} \begin{aligned} \min&~ c^Tx \\ \text {s.t.}&~ Ax \le b \\&~ x \in \chi , \end{aligned} \end{aligned}$$
(20)

where x denotes the optimization variables of the primal problem, b and c are given vectors, \(c^T\) denotes the transpose of c, \(\chi \) denotes a set of elements, and \(Ax \le b\) denotes the inequality constraint. Then, the equation of the inequality constraint with LR is shown as follows:

$$\begin{aligned} \begin{aligned} \min&~ c^Tx + d(Ax - b) \\ \text {s.t.}&~ x \in \chi , \\ \end{aligned} \end{aligned}$$
(21)

where d denotes the positive Lagrangian multiplier coefficients. After discussing the LR method, the enhanced penalty technique is presented. This method handles the inequality constraints through the summation of a penalty term and the original objective function; the fitness function is shown as follows:

$$\begin{aligned} \begin{aligned} F(x) = f(x) + \sum _{s=1} ^{S} p_s \langle g_s (x) \rangle ^ 2, \end{aligned} \end{aligned}$$
(22)

where f(x) denotes the original objective function, S denotes the number of constraints, s is an index from \( 1 \cdots S\), \(\langle \cdot \rangle \) denotes the bracket operand, which returns its argument for a positive value and zero for a negative value, \(p_s\) denotes the penalty parameter that adjusts the magnitude of the sth constraint, and \(g_s (x)\) denotes the constraint for the current s.
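A minimal sketch of the penalized fitness of Eq. (22); the constraint functions \(g_s\) (with \(g_s(x) \le 0\) when satisfied) and the penalty parameters \(p_s\) are supplied by the caller:

```python
def penalized_fitness(x, f, constraints, penalties):
    """Fitness of Eq. (22): objective plus squared, weighted violations.

    constraints : list of functions g_s, each <= 0 when satisfied.
    penalties   : list of penalty parameters p_s matching the constraints.
    """
    bracket = lambda g: max(g, 0.0)  # <.> operand: violation amount or zero
    return f(x) + sum(p * bracket(g(x)) ** 2
                      for p, g in zip(penalties, constraints))
```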

Table 1 Major objectives under the cardinality constraint with \(\kappa = 5 \)
Table 2 Major objectives under the cardinality constraint with \(\kappa = 10 \)
Table 3 Major objectives under the cardinality constraint with \(\kappa = 15 \)
Table 4 Major objectives under the cardinality constraint with \(\kappa = 20 \)

Constrained index tracking

Eventually, the EITP with various practical constraints is considered in this metaheuristic-based framework. First, the cardinality constraint is applied to capture the sparsity of the problem. Second, the maximum and minimum budgets and the transaction fee are limited; this also implies that the summed weights should not exceed 100%. Third, the acceptable commission fee is examined. Fourth, the portfolio SD and the benchmark portfolio SD capture the portfolio risk. These constraints are based on Eq. 14. However, the magnitudes of the different constraints are not of the same order, so before determining reasonable penalty terms, a fraction is applied to regulate these magnitudes. In addition, some constraints are not yet restricted, and they are handled by the proposed framework. The equations of this problem are shown as follows:

$$\begin{aligned} \begin{aligned} \min&~ \lambda (E_{s}) - (1 - \lambda ) E_{e} + \sum _{s=1} ^{S} p_s \langle g_s (x) \rangle ^ 2 \\ \text {s.t.}&~ g_{1}(x) = \frac{ \sum _{n=1}^{N} \bigtriangleup (w_{n}) }{N} - \frac{\kappa }{N} \\&~ g_{2}(x) = \frac{ \sum _{n=1}^{N} ( \xi \odot w_{n} ) + \eta }{\xi } - 1 \\&~ g_{3}(x) = \varphi - \frac{ \sum _{n=1}^{N} ( \xi \odot w_{n} ) + \eta }{\xi } \\&~ g_{4}(x) = \frac{ \sum _{n=1}^{N} \upsilon }{N} \\&~ g_{5}(x) = \sqrt{ w_{n} \otimes \text {cov}(r_{p(t, n)}) \otimes w_{n}^T } - \phi \sqrt{ e_{n} \otimes \text {cov}(r_{p(t, n)}) \otimes e_{n}^T } \\&~ r_p = \tau _{b} \odot \sum _{n=1}^{N} r_{p(t, n)} \odot w_{(t, n)} \\&~ r_b = \tau _{b} \odot r_{b(t)} \\&~ \tau _{b} = \frac{\zeta _{b(t)} T}{\sum _{t=1}^{T} \zeta _{b(t)}} \\&~ \zeta _{b} = 1 + \varsigma \left[ \frac{\sum _{t=1}^{1} \vartheta _{t}}{\sum _{t=1}^{T}\vartheta _{t}}, \frac{\sum _{t=1}^{2} \vartheta _{t}}{\sum _{t=1}^{T}\vartheta _{t}}, \ldots , \frac{\sum _{t=1}^{T} \vartheta _{t}}{\sum _{t=1}^{T}\vartheta _{t}} \right] \\&~ \vartheta = [\ln {(1)}, \ln {(2)}, \ldots , \ln {(T)}] \\&~ \varsigma \in {\mathbb {R}}_{\ge 0}, \end{aligned} \end{aligned}$$
(23)

where S denotes the number of penalty constraints, and s ranges over \(1 \cdots 5\). The biased coefficient \( \varsigma \) is only applied to the major objective, not to the constraints.

Fig. 4 Simulation results obtained on the test data when \(\kappa = 5\)

Fig. 5 Simulation results obtained on the test data when \(\kappa = 10\)

Fig. 6 Simulation results obtained on the test data when \(\kappa = 15\)

Fig. 7 Simulation results obtained on the test data when \(\kappa = 20\)

Proposed framework

This proposed metaheuristic-based framework addresses the comprehensive ITP. An investor considers the major objective, general constraints, risk diversification, and the total budget. These considerations form a mathematical formulation via the penalty technique, and they are addressed through the framework. The investor determines the quality of the solutions and the number of iterations, and he or she repeats this process until reaching the maximum number of iterations. The overall considerations and operations are reviewed in Fig. 1. Furthermore, other considerations can be incorporated into this framework.

The framework starts by collecting asset data for the tracking index and computing the returns. Then, the investor considers the problem with various realistic constraints. Once these are settled, the considerations are formulated with the penalty technique. After formulation, the dataset is split into training and test subsets. Once the dataset is available, a metaheuristic is applied to optimize the portfolio weights. The following step is to check whether the optimized portfolio violates any constraints. Eventually, the performance of the optimized portfolio is evaluated. The process flow of the framework is summarized in Fig. 2.

Table 5 Major objectives under the cardinality constraint \(\kappa = 5 \) with biased coefficient \(\varsigma \) on DE1
Table 6 Major objectives under the cardinality constraint \(\kappa = 10 \) with biased coefficient \(\varsigma \) on DE1
Table 7 Major objectives under the cardinality constraint \(\kappa = 15 \) with biased coefficient \(\varsigma \) on DE1
Table 8 Major objectives under the cardinality constraint \(\kappa = 20 \) with biased coefficient \(\varsigma \) on DE1

The movement of the dataset is discussed here. When the solutions satisfy the problem constraints on the training set, these solutions are passed to the candidate set. After checking for constraint violations, the best solution is applied to the test dataset. The training dataset is used to discover the optimal evaluation model in the development stage. In addition, the standard Pareto principle is applied to split the dataset properly, as the 80/20 rule states that 80% of the results come from 20% of the causes [47]. Following this rule, the dataset is split into segments from 0 to 64%, from 64 to 80%, and from 80 to 100% of the whole dataset. The dataset is summarized in Fig. 3. The experimental dataset is derived from Standard and Poor's 100 Index from 01/01/2017 to 31/12/2017.

Framework 1 Pseudocode of the proposed framework

The procedure of the proposed framework is shown in Framework 1. The pseudocode begins by retrieving the closing prices of the tracking index. Once the asset dataset and benchmark data are ready, the returns are computed. Next, the remaining functions and parameters are confirmed. After these steps, the portfolio weights are optimized through metaheuristics. The following step is to check for constraint violations. Once a solution lies in the feasible region, its weights are stored among the candidate solutions. When all candidate solutions are available, the best candidate on the training dataset is applied to the test dataset. Eventually, the simulation results are reported, and the graph is plotted.
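A condensed Python sketch of Framework 1 is given below. Here `optimizer` stands for any of the compared metaheuristics, `fitness` for the penalized objective, and `feasible` for the constraint validator; all three are assumed interfaces for illustration rather than the paper's exact routines:

```python
import numpy as np

def run_framework(asset_prices, index_prices, optimizer, fitness, feasible, runs=30):
    """Condensed Framework 1: returns, split, optimize, validate, evaluate."""
    r_assets = np.diff(np.log(asset_prices), axis=0)  # asset log returns, Eq. (6)
    r_bench = np.diff(np.log(index_prices))           # benchmark log returns
    split = int(0.8 * len(r_bench))                   # Pareto 80/20 split
    train, test = slice(0, split), slice(split, None)

    candidates = []
    for _ in range(runs):                             # repeated metaheuristic runs
        w = optimizer(lambda w: fitness(w, r_assets[train], r_bench[train]))
        if feasible(w):                               # constraint validator
            candidates.append(w)
    # assumes at least one feasible run; pick the best training candidate
    best = min(candidates, key=lambda w: fitness(w, r_assets[train], r_bench[train]))
    return fitness(best, r_assets[test], r_bench[test])  # out-of-sample evaluation
```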

Simulation

Simulation settings

In this simulation, the investors are assumed to not have any bias with respect to the tracking error or the excess return. Thus, \(\lambda \) is set to 0.5. The AUM is set to \(10^5\), the minimum percentage of the budget to be spent (\(\varphi \)) is set to 0.98, and the multiplier coefficient for an equally weighted portfolio SD (\(\phi \)) is set to 1.2. The cardinality constraint is separately set to 10 and 20. When the cardinality constraint is set to 10, the upper bound is set as 0.2. When the cardinality constraint is set to 20, the upper bound is set as 0.1.

Fig. 8 Simulation results obtained on the test data when \(\kappa = 5\) with various biased coefficients \(\varsigma \) on DE1

Fig. 9 Simulation results obtained on the test data when \(\kappa = 10\) with various biased coefficients \(\varsigma \) on DE1

Fig. 10 Simulation results obtained on the test data when \(\kappa = 15\) with various biased coefficients \(\varsigma \) on DE1

Fig. 11 Simulation results obtained on the test data when \(\kappa = 20\) with various biased coefficients \(\varsigma \)

Fig. 12 Change in fitness value under the cardinality constraint for the compared metaheuristics

Fig. 13 Change in fitness value with various biased coefficients \(\varsigma \) on DE1

For a fair comparison, the population sizes of all metaheuristic algorithms are set to 100. The stopping criterion is a maximum of 20,000 iterations; the CSO is allotted 40,000 iterations because it evaluates only half of the particles in each iteration.

For the GA, binary tournament selection and uniform crossover are used. For PSO, the inertia weight is set to 0.72984, and the personal-best and global-best acceleration parameters are set to 2.05. Mutation is applied to enhance the algorithmic performance, and the mutation rates of the GA and PSO are set to 0.02. For the CSO, the control parameter of the mean position is set to zero because the number of decision variables is less than one thousand; this value is suggested in the original paper. Regarding DE, two schemes are compared in this paper. In scheme 1, the amplification factor is set to 1, and the crossover rate is set to 0.3. In scheme 2, the amplification factor is set to 1, the crossover rate is set to 0.2, and the additional control parameter is set to 0.99.

Simulation results

The penalty terms control the magnitudes of the constraint violations and thus need to be set carefully. The penalty parameters \(p_1\), \(p_2\), \(p_3\), \(p_4\), and \(p_5\) are set to 100, 100, 2000, 10, and 200, respectively. After determining the suitable penalty terms, the remaining simulations are based on the described settings.

The performances of the GA, PSO, the CSO, and DE are compared within the proposed framework. The major objectives under various cardinality constraints are presented in Tables 1, 2, 3 and 4, and the cumulative returns on the test data are presented in Figs. 4, 5, 6 and 7. Note that N/A denotes that a constraint is not satisfied and the solution is dropped. DE1 performs better than the other compared algorithms in most of the test cases, so further investigations are conducted using DE1. When the biased time coefficient is larger, the weights are biased towards the later period; the later period of the training data is expected to be more informative in the ITP, so a biased time coefficient can improve the result. The biased time coefficient is set to 0, 250, 500, 750, and 1000, and the performance of each setting is tested. The results are presented in Tables 5, 6, 7 and 8, and Figs. 8, 9, 10 and 11 present the corresponding cumulative returns. Note that the cumulative return is calculated as the cumulative product of the returns. When the cardinality constraint is relaxed, the fitness value decreases. Figure 12 shows the change in fitness value for the compared metaheuristics, and Fig. 13 presents the change in fitness value for various biased time coefficients on DE1. It can be concluded that metaheuristics within the proposed framework are able to solve the formulated optimization problem.

Conclusion

In this paper, the EITP with various practical constraints is addressed by the proposed metaheuristic-based framework. The proposed framework differs from traditional frameworks in that it makes use of a fully global approach rather than a suboptimal global-local approach, whose performance is unstable. Metaheuristics can obtain globally optimal solutions with nonzero probability. Moreover, sparsity, weights, AUM, transaction fees, the full share restriction, and risk diversification are considered.

In summary, the comprehensive ITP is addressed by the proposed framework. In the simulation, the GA, PSO, the CSO, and DE are applied to the comprehensive ITP, and competitive results are obtained by the proposed method. In addition, the framework can incorporate other practical constraints. In future work, the framework can feasibly be extended through further simulations with other metaheuristics and with datasets derived from other benchmark market tracking indices.