An Iterative Approach to Ill-Conditioned Optimal Portfolio Selection

Covariance matrix of the asset returns plays an important role in the portfolio selection. A number of papers is focused on the case when the covariance matrix is positive definite. In this paper, we consider portfolio selection with a singular covariance matrix. We describe an iterative method based on a second order damped dynamical systems that solves the linear rank-deficient problem approximately. Since the solution is not unique, we suggest one numerical solution that can be chosen from the iterates that balances the size of portfolio and the risk. The numerical study confirms that the method has good convergence properties and gives a solution as good as or better than the solutions that are based on constrained least norm Moore–Penrose, Lasso, and naive equal-weighted approaches. Finally, we complement our result with an empirical study where we analyze a portfolio with actual returns listed in S&P 500 index.


Introduction
Modern portfolio theory has drawn much attention in the academic literature starting from 1952 when Harry Max Markowitz published his seminal paper about portfolio selection (see Markowitz 1952). He proposed efficient way of portfolio allocation that guarantees the lowest risk for a given level of the expected return.
A number of papers are devoted to questions like, e.g., how can an optimal portfolio be constructed, monitored, and/or estimated by using historical data (see, e.g., Alexander and Baptista 2004;Golosnoy and Okhrin 2009;Bodnar 2009;Bodnar et al. 2017a;Bauder et al. 2018), what is the influence of parameter uncertainty on the portfolio performance (cf., Okhrin and Schmid 2006;Bodnar and Schmid 2008;Javed et al. 2017), how do the asset returns influence the portfolio choice (see, e.g., Jondeau and Rockinger 2006;Mencia and Sentana 2009;Adcock 2010;Harvey et al. 2010;Amenguala and Sentana 2010), how is it possible to estimate the characteristics of the distribution of the asset returns (see, e.g., Jorion 1986;Wang 2005;Frahm and Memmel 2010), how can the structure of optimal portfolio be statistically justified (Gibbons et al. 1989;Britten-Jones 1999;Bodnar and Schmid 2009). Björk et al. (2014) studied the mean-variance portfolio optimization in continuous time, whereas Liesiö and Salo (2012) developed a portfolio selection framework which uses the set inclusion to capture incomplete information about scenario probabilities and utility functions. Chiarawongse et al. (2012) formulated a mean-variance portfolio selection problem that accommodates qualitative input about expected returns and provided an algorithm that solves the problem, while Levy and Levy (2014) analyzed the parameter estimation error in portfolio optimization.
There is another strand of research which focuses on factor models with applications to finance. In particular, factor models are commonly used in asset pricing theory and classical examples are the Capital Asset Pricing Model (CAPM) and the Arbitrage Pricing Theory (APT) (see Ross 1976;Sun et al. 2019 and references therein). Practically speaking, factor models can be used in the portfolio construction, the evaluation of performance of portfolio managers, the predicting asset returns, etc. (Meucci 2005;Chincarini and Kim 2006). Different estimation techniques for the covariance and precision matrices which are based on factor models in small and large dimensions with applications in portfolio theory are proposed by Ledoit and Wolf (2003), Fan et al. (2008), Fan et al. (2012), Fan et al. (2013), Bodnar and Reiss (2016), De Nard et al.
(2019) and among others. There is also the well known Barra Risk Factor Analysis which is pioneered by Bar Rosenberg, founder of Barra Inc. (see Grinold and Kahn 2000;Christopherson et al. 2009;Connor and Korajczyk 2010). In its core the model involves a number of factors that can be utilized to predict and control risk. 1 All above discussed papers are focused on the case when the covariance matrix of the asset returns is positive definite. In practice, covariance matrix is unknown, therefore, it needs to be estimated using historical data. Most common estimators are sample and maximum likelihood estimators. If the number of observations is greater than the number of assets in the portfolio then both estimators of the covariance matrix are positive definite. However, if the number of observations is less than the number of assets in the portfolio then both estimators of the covariance matrix are singular. 2 In this paper, we focus on the second case that leads us to the singular estimators of the covariance matrix. In practice, one could face this situation because of different 1 Let us note that Barra Inc. delivered different practical methods that can also control risk via additional constraints and without using factor models (Barra 2007). 2 Under the assumption of normally distributed data, the sample estimator of the covariance matrix has a (singular) Wishart distribution and its distributional properties are well studied by Muirhead (1982), Srivastava (2003), Bodnar and Okhrin (2008) and among others. Moreover, a product of a (singular) Wishart, or a (singular) inverse Wishart, matrix with a (singular) Gaussian vector characterizes the sample estimator of the tangency portfolio weights. Distributional properties of these products in different settings are analyzed by Bodnar and Okhrin (2011), Bodnar et al. (2013Bodnar et al. ( , 2014, Kotsiuba and Mazur (2015), Bodnar et al. (2018a), Bodnar et al. (2019b). reasons which are discussed by Bodnar et al. ( , 2017b and Bodnar et al. (2019c). For example, it is very common to consider the sample size that is shorter over time period to avoid varying dependence between assets. Additionally, one can have a large number of assets in the portfolio: the S&P 500 index consists of 500 companies that are traded on the NYSE and NASDAQ. It is also possible to face the multicollinearity because of high correlation of assets within a specific industry branch.
Since optimal portfolio weights depend on the inverse of the covariance matrix, the original optimization problem formulated by Markowitz (1952) will have an infinite number of solutions when the covariance matrix is singular. In Pappas et al. (2010), the solution to the optimization problem with singular covariance matrix is obtained by replacing the inverse with the Moore-Penrose inverse. It leads us to the unique solution with the minimal Euclidean norm. Statistical properties of the optimal portfolio weights for small sample and singular covariance matrix are well studied by Bodnar et al. ( , 2017b and Bodnar et al. (2019c). There are also various regularization methods that can be used when the covariance matrix is singular. The most common techniques are the ridge-type approach, the Landweber-Fridman algorithm, the spectral cut-off method, and the Lasso-type approach. The ridge-type approach is constructed by adding a diagonal matrix to the covariance matrix (Tikhonov and Arsenin 1977), while the Landweber-Fridman algorithm delivers a convergent iterative scheme (Kress 1999). The spectral cut-off method is based on the removing the eigenvectors associated with the smallest eigenvalues (Chernousova and Golubev 2014), and the Lasso-type approach penalizes the l 1 norm of the optimal portfolio weights (Brodie et al. 2009).
The main aim of the present paper is to deliver an alternative approach. In particular, we employ an iterative method that solves the linear ill-posed problem approximately. Iterative methods for linear ill-posed problems are certainly not new and include classical methods like Landweber iteration with Nesterov acceleration and Conjugate gradient methods, see Neubauer (2000Neubauer ( , 2017. Very recently a new approach has been developed based on second order damped dynamical systems, see Gulliksson et al. (2019) for an introduction and overview and specifically Zhang and Hofmann (2018) for the ill-posed linear case. In Zhang and Hofmann (2018) it is shown that the method is a regularization method, i.e., loosely speaking for an ill-posed linear problem there exists a unique solution when the number of iterations tend to infinity and the error tends to zero. We have applied and extended this method to the rank-deficient and linearly constrained portfolio selection problem considered here. Specifically, we show that the method is convergent and how to choose optimal parameters (time step and damping). As seen in Sects. 3.1 and 3.2 the iterative method generally performs better in the sense of giving a smaller variance of the portfolio.
The rest of the paper is structured as follows. In Sect. 2, an iterative approach to ill-conditioned optimal portfolio selection is discussed. The results of numerical and empirical studies are discussed in details in Sect. 3, while Sect. 4 summarizes the paper.

Main Results
We consider a portfolio of k assets and let x i = (x 1i , . . . , x ki ) T be the k-dimensional vector of log-returns of these assets at time i = 1, . . . , N . We assume that the second moment of x i is finite. Let the mean vector of the asset returns be denoted by μ and the covariance matrix by such that rank( ) = r ≤ k, i.e. can also be singular matrix. Let w = (w 1 , . . . , w k ) T be the vector of portfolio weights, where w j denotes the weight of the jth asset, and let 1 be the vector of ones while I be the identity matrix.
The classical problem of portfolio selection is defined as where q is the expected rate of return that is required on the portfolio. In general, we allow for short sales and, consequently, for negative weights. 3 If is a positive definite matrix then the optimization problem (1) has unique solution which is given by However, if is singular then the optimization problem (1) has an infinite number of solutions. Pappas et al. (2010) suggested the solution that appears to be unique with the minimal Euclidean norm and is obtained by replacing the inverse with the Moore-Penrose inverse In practice, both μ and are unknown parameters and the investor cannot determine w. Consequently, she/he should estimate μ and using previous observations. The most common estimators of μ and are given bŷ where X = (x 1 , . . . , x N ), and V = I − 1 N 11 T is a symmetric and idempotent matrix, i.e., V = V T and V 2 = V. If N ≥ k then sample covariance matrixˆ is positive definite, butˆ is singular when N < k. Hence, we can get the case when portfolio size is larger the sample size and, therefore, sample covariance matrix will be singular.
Then one can use the solution with smallest Euclidean norm that is defined in (3). Alternatively, one can use an iterative approach that is discussed in the next section.

The Discrete Functional Particle Method
In this section we give a brief description of the Discrete Functional Particle Method (DFPM) and describe how DFPM can be used to solve (1). For a more comprehensive discussion of DFPM see Gulliksson et al. (2019).
Let us first consider the unconstrained minimization problem where V : R n → R is at least a twice continuously differentiable convex function giving a unique solution u * . We use the conventional notation for inner-product in R n and norm as (·, ·) and · , respectively. The main idea for solving (5) is to utilize the fact that the solution u * to (5) is also a stationary solution to the second order damped dynamical system and this solution is unique and globally exponentially stable, see, e.g., references in Bégout et al. (2015). The dynamical system (6) is most efficiently used using a symplectic method such as, e.g., symplectic Runge-Kutta or Störmer-Verlet Hairer et al. (2006) applied on a reformulation of (6) as the first order systeṁ The approach of finding the solution to (5) by solving (6) with a symplectic method is DFPM. It is the combination of the damped dynamical system together with an efficient (fast, stable, accurate) symplectic solver that makes DFPM a very competitive method. Let us now return to our main optimization problem (1) in order to apply DFPM. The constraints in (1) can be treated in mainly three different ways. Firstly, a projected symplectic method can be used enabling the iterates to stay on the constraint manifold. Secondly, additional damped equations can be formulated such that the constraints are asymptotically satisfied. Thirdly, since the constraints are linear they can be eliminated and an unconstrained problem can be solved by DFPM. Here, we choose the third alternative and leave the other two for future research. Define then the constraints can be written as Bw = c and the solution is w = Zu + g where Z spans the null space of B and g is any solution to Bw = c. After some algebra the minimization problem (1) can be written as The solution will be uniquely defined if M = Z T Z is invertible. In the following we always choose g = B + c where B + is the Moore-Penrose inverse of B, i.e., g is the least norm solution to the constraint equations. Consider again the unconstrained problem (8).
where we will assume that M is positive semidefinite. In the general setting of the minimization problem (5) we can define and it is straightforward to formulate DFPM for V (u) in (10) as or the first order systemu Additionally we need intial conditions, say, u(0) = u 0 ,u(0) = v 0 . Using symplectic Euler on (12) or, equivalently, where t is the time step, η the damping, and we have defined Equation (14) defines DFPM for our problem. In the next section we derive values of t, η that will ensure fast convergence.

Convergence Analysis and Choice of Parameters
When M in (9) has full rank u(t) will converge to the unique solution u = M −1 d.
Turning to the the rank-deficient case we consider the SVD of M = U M U T and tranform (11) to where r M is the rank of M. Since the system now is decoupled we can by partitioning T write the solution of (15) as where γ j i = η/2± η 2 /4 − s i and α kl i are given by the initial conditions. We conclude that the solution is unbounded and grows linearly with t. However, that does not mean that the dynamical system (15) can not be used for getting one, of infinitely many, solutions to (1). Indeed, by iteratively solving (15) and carefully choosing when to stop the iterations we attain a, hopefully useful, regularized solution. This is called iterative regularization in the literature of ill-posed problems and is an alternative to the least norm solution (Vogel 2002). In order to ensure fast convergence of the iterative scheme (13) one must choose the time step and damping such that for convergence G < 1 and for efficiency G is as small as possible. The otpimal choice of parameters can be summarized in the following theorem, see Gulliksson (2017), Gulliksson et al. (2019) and references therein. However, the result in Theorem 1 is not applicable for the case when M does not have full rank since the smallest eigenvalue will be zero and the damping will in turn be zero giving an oscillating solution that does not converge to a solution of the underdetermined system. Therefore, we choose the parameters only in the subspace defined by the nonzero eigenvalues of M.

Theorem 1 Consider symplectic Euler
where λ r M = min λ i >0 λ i and λ max = max i λ i is convergent.
Proof The time step is smaller and the damping is larger than the parameters given by Theorem 1 and therefore the method is stable and convergent.

Numerical Study
In this section we examine the iterative approach which is proposed by us. In particular, we evaluate optimal portfolio weights, its norm 4 and variance of the portfolio. All results are compared with the optimal portfolio weights which are obtained by using Moore-Penrose inverse, Lasso-type approach, and naive equal-weighted approach.
In the simulation study, each element of μ is uniformly distributed on the interval [−0.01, 0.01]. The population covariance matrix is generated as follows: • first r non-zero eigenvalues of are generated from the uniform distribution on the interval (0, 0.01], while the rest k − r eigenvalues are set to 0; • the eigenvectors of are generated from the Haar distribution by generating a Wishart matrix with identity covariance and calculating its eigenvectors. The expected rate of portfolio return q is taken to be 1 T μ/k, i.e. q is equal to expected rate of naive equal-weighted portfolio. The results are compared for several values of k ∈ {10, 50, 100, 150, 300} and r ∈ {0.1k, 0.4k, 0.6k, 0.9k}. 5 The solution is not unique but one numerical solution can be chosen from the iterates that balances the size of the portfolio and the risk, see Figs. 1, 2, 3, 4 and 5. Therefore, the choice of convergence criteria is important in order to stop the iterations at a satisfactory solution. We have chosen to use a relative convergence criterion based on the projected objective function, i.e., we stop the iterations when where u k is the approximation of the solution of (8) at iteration k and (u) is defined in (8). The tolerance is set to 10 −12 in order to get a small risk. Again referring to Figs. 1, 2, 3, 4 and 5, we note that a higher tolerance will give a higher risk and size of portfolio (norm of w). Maximum number of iterations is taken to be 10 4 but in the presented tables this is not attained. In Figs. 6,7,8,9 and 10, we present optimal portfolio weights that are obtained by using DFPM, Moore-Penrose inverse, and Lasso approaches. We observe that behaviour of the weights is quite different, i.e., three methods suggest three very different investment strategies. In particular, for k ∈ {50, 100, 150, 300} we can observe that absolute values of almost all weights obtained by DFPM approach are much larger than the ones obtained by using Moore-Penrose inverse and Lasso methods.
In Table 1, we present the norm of the optimal portfolio weights that is obtained by using different approaches. We can observe that the DFPM method delivers the largest norm in almost all considered cases except the case when k = 10 and r = 9. In this case, DFPM gives us smaller norm in comparison with Moore-Penrose inverse and Lasso approaches, while the naive equal-weighted method has larger norm than the one which is obtained by using the DFPM approach.
In Table 2, we compare the variance of portfolio for different methods. In all considered cases, we can observe that the DFPM approach shows much better performance than the Moore-Penrose inverse, Lasso, and naive equal-weighted approaches. We would note that the variance of portfolio is much smaller in comparison with vari- ances that are obtained by using other methods. Since the expected return of portfolio q is the same for all methods, the smallest variance leads us to the highest Sharpe ratio. Comparing Tables 1 and 2, we can conclude that the DFPM approach delivers smaller variance of portfolio, while the norm of the weights can be larger than the ones obtained by using the Moore-Penrose inverse, Lasso, and naive equal-weighted approaches. So, the smallest norm of the weights doesn't guarantee us the smallest variance of portfolio.

Empirical Study
In this section the results of an empirical study are presented. It is shown how one can apply the theory from the previous sections to real data. We use weekly S&P 500 logarithmic returns of 440 stocks for the period from the 4th of May, 2007 to the 25th of January, 2013 resulting in 300 observations. Expected rate of return q is taken to be equal to 1, i.e. q = 1. In DFPM approach, the convergence criterion, the tolerance and the maximum number of iterations are taken as in Sect. 3.1. First of all, we estimate mean vector and covariance matrix by using their empirical counterparts that are defined in (4). Since the number of stocks k = 440 is greater than the sample size N = 300, sample estimator of the covariance matrix will be singular matrix with rank( ) = N − 1 = 299, i.e. r = 299. In Fig. 11, we present eigenvalues of the sample covariance matrix. We can observe that the first 299 eigenvalues and much larger than the rest of the eigenvalues. It confirms that the rank of the sample covariance matrix is 299.
Since we have estimated both mean vector and covariance matrix, we are able to construct optimal portfolio weights by using DFPM, Moore-Penrose inverse, Lasso, and naive equal-weighted methods. In Fig. 12, we deliver the plot with optimal portfolio weights that are obtained by DFPM, Moore-Penrose inverse, and Lasso methods. We can observe quite different behaviour of the weights. Consequently, we can see that these methods suggest completely different investment strategies. One can also note that the behavior of the optimal portfolio weights looks like a white noise process. To check this observation, we should verify several conditions. Here, we focus on the portfolio weights which were obtained by DFPM and Moore-Penrose inverse approaches. First, the expected value of the weights are the same in both methods and equal to 2.2727e−03 that is very close to 0. Second, the variance of the weights obtained by DFPM is 2.0208e+01, while the variance obtained by Moore-Penrose inverse is 1.8273e+01. So, we can observe that both variances are finite and the variance of the weights obtained by using Moore-Penrose inverse is slightly smaller than the variance obtained via DFMP. Third, white noise process is assumed to be uncorrelated. To verify this assumption, we conducted a Ljung-Box Q-test on the optimal portfolio weights. In both cases, this test indicated that there is not enough evidence to reject the null hypothesis of no autocorrelation between the weights. Fourth, we conducted an Engle's ARCH test on the variance of the weights. This test reported that the null hypothesis of no ARCH effects cannot be rejected in both methods. Thus, we can conclude that optimal portfolio weights that are obtained by DFPM and Moore-Penrose inverse approaches can be modelled by a white noise process.
In Table 3, we present the norm of the optimal weights, variance of portfolio, and Sharpe ratio that are obtained by DFPM, Moore-Penrose inverse, Lasso, and naive equal-weighted methods. We can observe the highest norm for the weights for the DFPM method, while the lowest norm is obtained for the naive equal-weighted strategy. However, DFPM delivers the smallest variance of the portfolio. Let us note that a quite similar behaviour is observed in the numerical study. In our comparison, we evaluate the Sharpe ratio that is used to help an investor to understand the expected rate of an investment in comparison with its risk (variance of portfolio). According to Table 3, the highest Sharpe ratio is obtained by the DFPM approach. It is remarkable that Sharpe ratio of the portfolio which is obtained via naive equal-weighted strategy is negative, i.e. nobody should invest in this portfolio.

Summary
Modern portfolio theory plays an important role in economics and suggests an investment strategy that give the lowest risk for a given level of expected return. If covariance matrix of the asset returns is positive definite, then Markowitz' optimization problem has unique solution that depends on the mean vector and covariance matrix of the asset returns. In practice, both mean vector and covariance matrix need to be esti- mated. Hence, one can get an estimator of the covariance matrix which is singular and, therefore, optimization problem will not have a unique solution. In our paper, we delivered a new iterative approach (DFPM) that solves the constrained and rankdeficient portfolio selection problem approximately. The method is based on using symplectic solvers for a damped dynamical system that solves the optimization problem and the solution is generally different from the least norm Moore-Penrose, Lasso, and naive equal-weighted solutions. We showed how to determine the optimal time step and damping for symplectic Euler that give fast convergence using only matrix-vector multiplications in each iteration step. In the numerical study we examined DFPM and compared it with solutions that are based on the Moore-Penrose inverse, Lasso, and naive equal-weighted methods. The results are compared for several values of the portfolio size and the rank of the covariance matrix. We observed that iterative and analytical approaches deliver quite different investment strategies. We also found that the norm of the weights in DFPM approach is higher than the ones obtained from the Moore-Penrose inverse, Lasso, and naive equal-weighted methods in almost all considered cases, while the variance of portfolio is always smaller in all considered cases.
In the empirical study, we analyzed weekly S&P 500 logarithmic returns of 440 stocks with 300 observations. It is shown that DFPM approach guarantees smaller variance of portfolio return than the solutions based on the Moore-Penrose inverse, Lasso, and naive equal-weighted methods. We also observed that optimal portfolio weights that are obtained by DFPM and Moore-Penrose inverse methods can be modelled by white noise process.  The comparison is done for q = 1