Statistical properties of estimators for the log-optimal portfolio

The best constant re-balanced portfolio represents the standard estimator for the log-optimal portfolio. It is shown that a quadratic approximation of log-returns works very well on a daily basis, and a mean-variance estimator is proposed as an alternative to the best constant re-balanced portfolio. The mean-variance estimator can easily be computed, and the numerical algorithm is very fast even if the number of dimensions is high. Some small-sample and the basic large-sample properties of the estimators are derived. The asymptotic results can be used for constructing hypothesis tests and for computing confidence regions. For this purpose, one should apply a finite-sample correction, which substantially improves the large-sample approximation. However, it is shown that the impact of estimation errors concerning the expected asset returns is serious. The given results confirm a general rule, which has become folklore during the last decades, namely that portfolio optimization typically fails because of the difficulty of estimating expected asset returns.

and Samuelson 1974), it cannot be denied that the LOP has a number of attractive properties. For example, it is asymptotically optimal among all portfolios that share the same constraints on the portfolio weights (Cover and Thomas 1991, Chapter 15). Moreover, the LOP can be considered a discrete-time approximation of the GOP, which serves as a numéraire portfolio and thus plays a major role in financial mathematics (Karatzas and Kardaras 2007; Platen and Heath 2006). The GOP provides a link between financial mathematics, neoclassical finance, and financial econometrics (Frahm 2016). Hence, the LOP is of particular interest for a variety of reasons.
In this work, the statistical properties of LOP estimators are investigated. To the best of my knowledge, this has not yet been done in the literature. We will consider the standard estimator for the LOP, i.e., the best constant re-balanced portfolio (BCRP), and the mean-variance estimator (MVE), which is based on a quadratic approximation of log-returns. The question of whether or not the BCRP or the MVE outperforms any other investment strategy is not discussed in this work, since it seems to be well investigated in the literature. In particular, the BCRP and the MVE are not compared with one another in order to clarify whether maximizing the logarithmic utility or the mean-variance objective function is preferable (Hakansson 1971). Instead, the MVE is used only to approximate the BCRP. Correspondingly, the mean-variance optimal portfolio (MVOP) is not an end in itself. Here, it just represents an approximation of the LOP.
The main conclusions of this work are as follows: (i) The MVE provides a very good approximation of the BCRP if re-balancing takes place on a daily basis. The numerical implementation of the MVE is quite easy, and the corresponding algorithm is very fast even if the number of dimensions is high. (ii) One typically overestimates the expected out-of-sample log-return on the BCRP and even the expected log-return on the LOP. Similar statements hold true for the expected out-of-sample performance of the MVE and the performance of the MVOP. (iii) The BCRP exists and is unique under mild regularity conditions. Moreover, it is strongly consistent, and the expected out-of-sample log-return on the BCRP as well as its in-sample average log-return are consistent, too. Similar results are obtained for the MVE. (iv) Although both the BCRP and the MVE are affected by short-selling constraints, they are $\sqrt{n}$-consistent. The asymptotic results can be used in order to construct hypothesis tests and to compute confidence regions.
(v) Due to the constraints on the portfolio weights, the asymptotic results are inaccurate in most practical applications. Nonetheless, a finite-sample correction exists. It substantially improves the large-sample approximation of the MVE (and thus of the BCRP). (vi) However, the impact of estimation risk that comes from estimating expected asset returns is tremendous in most real-life situations. This problem is so serious that estimating the LOP becomes a futile endeavour if we have no prediction power.
The rest of this work is organized as follows: Sect. 2 states the basic assumptions and explains the mathematical notation. Section 3 contains some elementary results and provides a simple characterization of the LOP. In Sect. 4, the small-sample and large-sample properties of the BCRP are derived, including its existence, uniqueness, and consistency. That section also contains the asymptotic distribution of the BCRP. The corresponding results for the MVE can be found in Sect. 5. In Sect. 6, some computational issues related to the BCRP are discussed and the finite-sample correction for the MVE is demonstrated. Section 7 concludes this work. Finally, the "Appendix" contains an important but quite tedious derivation.

Basic assumptions and notation
Throughout this work, $\mathbb{N}$ denotes the set of positive integers, i.e., $\mathbb{N} := \{1, 2, \ldots\}$, and the symbol "log" stands for the natural logarithm. The symbol $\mathbf{0}$ denotes a vector of zeros and $\mathbf{1}$ is a vector of ones. The dimensions of $\mathbf{0}$ and $\mathbf{1}$ should always be clear from the context. Any tuple $x = (x_1, x_2, \ldots, x_d) \in \mathbb{R}^d$ is understood to be a column vector and $x' = (x_1\; x_2\; \ldots\; x_d)$ is the transpose of $x$. It is implicitly assumed that we have an underlying probability space $(\Omega, \mathcal{A}, P)$, where $\Omega$ is some state space, $\mathcal{A}$ is a σ-algebra on $\Omega$, and $P$ is a probability measure on $\mathcal{A}$, which is often referred to as the physical or real-world probability measure in financial mathematics (Frahm 2016). A random quantity is a (measurable) real-valued function on $\Omega$. According to probability theory, two random quantities are considered identical if and only if they coincide with probability 1. Analogously, any statement about a random quantity is meant to be true almost surely (a.s.). Hence, we can drop the additional remarks "$P(\cdot) = 1$" and "a.s." for convenience. For example, if $X$ is a random variable, "$X = x$" with $x \in \mathbb{R}$ means that $P(X = x) = 1$, and if $Y$ is a random variable, too, then "$X > Y$" means that $X$ is greater than $Y$ with probability 1, etc. Further, if $\{X_n\}_{n \in \mathbb{N}}$ is a random sequence, "$X_n \to x$" means that $X_n$ converges a.s. to $x$ as $n$ tends to infinity. Thus, we may also drop the notation "$n \to \infty$".
Consider an asset universe with one riskless asset and $N \in \mathbb{N}$ risky assets. It is assumed that the assets are infinitely divisible and that market frictions are ignored. Let $S_t = (S_{0t}, S_{1t}, \ldots, S_{Nt})$ be the vector of asset prices at time $t = 0, 1, \ldots$, where $S_{0t}$ denotes the price of the riskless asset. The unit of time is one trading day. It is assumed that $S_{0t} = 1$ for $t = 0, 1, \ldots$ and that $S_{i0} = 1$ for $i = 1, 2, \ldots, N$. In the following, each statement that contains the index $i$ or $t$ is meant to be true for all $i$ and $t$ that are appropriate in the given context. The price process $\{S_t\}_{t=0,1,\ldots}$ shall be positive. The time-index set is always $\{0, 1, \ldots\}$, and thus the subscript in "$\{\cdot\}_{t=0,1,\ldots}$" will be omitted for notational convenience.
Let $X_t := S_t / S_{t-1}$ be the vector of price relatives after trading day $t$, where the division of $S_t$ by $S_{t-1}$ is understood to be componentwise. Any capital appreciation for Asset $i$ during Day $t$, e.g., interest or dividend income, is considered part of the asset price $S_{it}$. The portfolio weights of the risky assets are denoted by $w_1, w_2, \ldots, w_N$, whereas $w_0$ is the weight of the riskless asset. Hence, $w = (w_0, w_1, \ldots, w_N) \in \mathbb{R}^{N+1}$ is a portfolio that consists of the riskless asset and $N$ risky assets. Each single asset is considered a portfolio, i.e., a canonical vector in $\mathbb{R}^{N+1}$. In order to distinguish the weights of the risky assets, $w_1, w_2, \ldots, w_N$, from the weight of the riskless asset, $w_0$, the notation $\tilde{w} = (w_1, w_2, \ldots, w_N) \in \mathbb{R}^N$ is used. This means that $w = (w_0, \tilde{w})$.

G. Frahm
Analogously, $\tilde{X}_t$ indicates the risky part of $X_t$, i.e., we have that $X_t = (1, \tilde{X}_t)$. Finally, the return on Asset $i$ after Day $t$ is given by $R_{it} := X_{it} - 1$, and $R_t = \tilde{X}_t - \mathbf{1} = (R_{1t}, R_{2t}, \ldots, R_{Nt})$ denotes the vector of risky asset returns. Since we assume that $S_{0t} = 1$ for $t = 0, 1, \ldots$, the risk-free interest rate is zero. This assumption is made without loss of generality, as explained below.
Although the following terms will be defined later on, their notation is used throughout this work, so it shall be clarified beforehand: The symbol $w^* = (w_0^*, w_1^*, \ldots, w_N^*)$ denotes the LOP, whereas $w^\star = (w_0^\star, w_1^\star, \ldots, w_N^\star)$ is the MVOP. Note that the former superscript, "$*$", has 6 spikes, whereas the latter, "$\star$", has 5 spikes. This shall symbolize a key observation of this work, namely that the LOP and the MVOP are almost indistinguishable in most practical applications, which hopefully does not hold for the symbols themselves. The symbols $\tilde{w}^* = (w_1^*, w_2^*, \ldots, w_N^*)$ and $\tilde{w}^\star = (w_1^\star, w_2^\star, \ldots, w_N^\star)$ denote the "risky parts" of $w^*$ and $w^\star$, respectively. Further, $w_i$ is the portfolio weight of Asset $i$. By contrast, $w_n$ is an estimator for the portfolio $w$, where $n \in \mathbb{N}$ is the number of observations.¹ Consequently, $w_{in}$ is the estimator for the portfolio weight of Asset $i$.
At the end of each trading day, the investor re-balances his portfolio according to a constant vector of portfolio weights $w$ satisfying the budget constraint $\mathbf{1}'w = 1$. The portfolio value at Day $n \in \mathbb{N}$ amounts to $V_{wn} := \prod_{t=1}^{n} w'X_t$. The investment capital might vanish during some trading day if we do not impose any additional constraints on the portfolio weights. In fact, if we allow the investor to enter short positions, the probability of going bankrupt, i.e., $V_{wn} \le 0$, is positive unless we make some additional assumption about $X_t$, which is not done in this work. Hence, the portfolio $w$ must be an element of the (unit) simplex
$$\mathcal{S} := \{w \in \mathbb{R}^{N+1} : w \ge \mathbf{0},\ \mathbf{1}'w = 1\}.$$
The assumption that $w \in \mathcal{S}$ is crucial. It guarantees that $w'X_t > 0$, so that $V_{wn} > 0$ for all $n \in \mathbb{N}$. Hence, the log-value process $\log V_{wn} = \sum_{t=1}^{n} \log w'X_t$ exists for all $w \in \mathcal{S}$ and $n \in \mathbb{N}$, where $\log w'X_t$ is referred to as the log-return on the portfolio after Day $t$. The short-selling constraints are indispensable because otherwise $\log w'X_t$ might not be defined.
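As a minimal sketch of the value process, the following Python function computes $V_{wn}$ and $\log V_{wn}$ for a constant re-balanced portfolio, assuming an initial value of 1 and price relatives given as plain lists $X_t = (1, X_{1t}, \ldots, X_{Nt})$. The function names and the input layout are illustrative assumptions, not part of the original work.

```python
import math

def in_simplex(w, tol=1e-12):
    """True if w >= 0 componentwise and 1'w = 1 (the unit simplex S)."""
    return all(wi >= -tol for wi in w) and abs(sum(w) - 1.0) <= tol

def crp_value(w, price_relatives):
    """Return (V_wn, log V_wn) for a constant re-balanced portfolio w.

    Each element of price_relatives is a vector X_t = (1, X_1t, ..., X_Nt);
    the initial portfolio value is assumed to be 1.
    """
    if not in_simplex(w):
        raise ValueError("w must lie in the unit simplex S")
    value, log_value = 1.0, 0.0
    for x in price_relatives:
        gross = sum(wi * xi for wi, xi in zip(w, x))  # w'X_t > 0 because w is in S
        value *= gross                                 # re-balance back to w every day
        log_value += math.log(gross)                   # log-return after day t
    return value, log_value
```

For instance, `crp_value([0.5, 0.5], [[1.0, 2.0], [1.0, 0.5]])` multiplies the daily factors $1.5$ and $0.75$, giving $V_{w2} = 1.125$, whereas a weight vector with a short position is rejected before any logarithm is taken.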
As already mentioned above, we can assume without loss of generality that the risk-free interest rate is zero: Let $r \equiv R_0 > -1$ be the risk-free interest rate and $Y_t$ the vector of relative prices with $Y_{0t} = 1 + r$. Then we can use the discounted relative-price vector $X_t := Y_t / (1 + r)$. The log-return on any portfolio $w \in \mathcal{S}$ amounts to $\log(w'Y_t) = \log(1 + r) + \log(w'X_t)$. We can ignore the first term, $\log(1 + r)$, provided we are interested only in maximizing the expected log-return on the portfolio $w$, which is our main focus here. Throughout this work, we suppose that $X_t$ contains the discounted relative prices but omit the word "discounted" for convenience. Now, the following basic assumptions are made:
A1. The relative-price process $\{X_t\}$ is strictly stationary,
A2. the expected value of $\log w'X_t$ is finite for all $w \in \mathcal{S}$, and
A3. $w_1'X_t$ and $w_2'X_t$ do not coincide for any $w_1, w_2 \in \mathbb{R}^{N+1}$ with $w_1 \neq w_2$.
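The decomposition $\log(w'Y_t) = \log(1 + r) + \log(w'X_t)$ can be checked numerically. In the following Python sketch, the interest rate, the relative prices, and the portfolio are hypothetical numbers chosen only for illustration.

```python
import math

def discount(y, r):
    """Discounted relative prices X_t = Y_t / (1 + r), assuming r > -1."""
    return [yi / (1.0 + r) for yi in y]

def log_return(w, x):
    """Log-return log(w'x) of portfolio w for relative prices x."""
    return math.log(sum(wi * xi for wi, xi in zip(w, x)))

# Hypothetical daily data: riskless rate r and relative prices Y_t.
r = 0.0002
y = [1.0 + r, 1.013, 0.991]   # Y_0t = 1 + r, then two risky assets
w = [0.2, 0.5, 0.3]           # a portfolio in the simplex S
x = discount(y, r)            # x[0] == 1, i.e., a zero riskless rate

lhs = log_return(w, y)
rhs = math.log(1.0 + r) + log_return(w, x)
```

Because $\log(1 + r)$ does not depend on $w$, it shifts every portfolio's log-return by the same constant and leaves the maximizer unchanged.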
A1 is a fundamental assumption in econometrics and implies that the random vectors $X_1, X_2, \ldots$ are identically distributed, but it is not assumed that they are serially independent. Further, A2 guarantees that we can work with the quantity $E(\log w'X_t)$, and A3 requires the relative prices to span $(0, \infty)^{N+1}$. It follows that no risky asset can be replicated by a linear combination of other assets. More precisely, since $P(w_1'X_t = w_2'X_t) < 1$ for all $w_1, w_2 \in \mathbb{R}^{N+1}$ with $w_1 \neq w_2$, it holds that $P(w'X_t = 0) < 1$ for all $w \in \mathbb{R}^{N+1}$ with $w \neq \mathbf{0}$, and vice versa. Hence, it cannot happen that $\tilde{w}'R_t = c \in \mathbb{R}$ for any $\tilde{w} \in \mathbb{R}^N$ with $\tilde{w} \neq \mathbf{0}$. Otherwise, we could define $w_0 := -(c + \mathbf{1}'\tilde{w})$ so that
$$w'X_t = w_0 + \tilde{w}'\tilde{X}_t = w_0 + \mathbf{1}'\tilde{w} + \tilde{w}'R_t = 0$$
for the portfolio $w = (w_0, \tilde{w}) \neq \mathbf{0}$. To sum up, $P(w'X_t = 0) < 1$ for all $w \neq \mathbf{0}$ if and only if $P(\tilde{w}'R_t = c) < 1$ for all $c \in \mathbb{R}$ and $\tilde{w} \neq \mathbf{0}$. How can we interpret this basic condition from an economic point of view? Suppose that $\tilde{w}'R_t = c \neq 0$. Then we have an arbitrage opportunity, which is not possible if the market is in equilibrium and the market participants are rational (Frahm 2018). By contrast, in the case that $\tilde{w}'R_t = 0$, at least one risky asset is redundant. For example, let us assume that $w_N \neq 0$ without loss of generality. Then we can construct the portfolio $\tilde{v} := -\tilde{w}/w_N$ of risky assets, so that $\tilde{v}'R_t = 0$ with $v_N = -1$. Now, we are able to replicate Asset $N$ by a linear combination of all other assets, including the riskless asset, by using the portfolio $(1 - \mathbf{1}'(v_1, v_2, \ldots, v_{N-1}), v_1, v_2, \ldots, v_{N-1}) \in \mathbb{R}^N$, which satisfies the budget constraint. Hence, we can ignore Asset $N$ and reduce the asset universe to $N - 1$ risky assets. Of course, the converse is also true. That is, if we are able to replicate a risky asset by a linear combination of all other assets, we must have that $\tilde{w}'R_t = 0$ with $\tilde{w} \neq \mathbf{0}$.
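The replication argument can be traced numerically. The sketch below assumes a hypothetical market in which $R_{3t} = 0.4R_{1t} + 0.3R_{2t}$ for every $t$, so that $\tilde{w} = (0.8, 0.6, -2)$ satisfies $\tilde{w}'R_t = 0$; the helper names are illustrative, and the riskless return is zero as assumed in this section.

```python
def replicating_portfolio(w_tilde):
    """Weights (u_0, u_1, ..., u_{N-1}) on the riskless asset and the first
    N - 1 risky assets that replicate Asset N, given risky weights w_tilde
    with w_tilde[-1] != 0 and w_tilde'R_t = 0 for every t."""
    w_last = w_tilde[-1]
    if w_last == 0:
        raise ValueError("the weight of Asset N must be nonzero")
    v = [-wi / w_last for wi in w_tilde]   # v := -w_tilde / w_N, so v_N = -1
    head = v[:-1]                          # (v_1, ..., v_{N-1})
    return [1.0 - sum(head)] + head        # budget constraint: weights sum to 1

def portfolio_return(u, risky_returns):
    """Return of weights u = (u_0, u_1, ...) when the riskless return is zero."""
    return sum(ui * ri for ui, ri in zip(u[1:], risky_returns))
```

Here the replicating weights come out as $(0.3, 0.4, 0.3)$, i.e., 30% in the riskless asset and the remainder in Assets 1 and 2, and their return matches $R_{3t}$ day by day.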
Throughout this work, it is assumed that the dimension reduction has already been made in advance. Note also that without the dimension reduction the covariance matrix of $R_t$ would not be positive definite, because $\tilde{w}'R_t = c$ implies that $\tilde{w}'\,\mathrm{Var}(R_t)\,\tilde{w} = \mathrm{Var}(\tilde{w}'R_t) = 0$ for $\tilde{w} \neq \mathbf{0}$, and vice versa. Hence, A3 is indispensable also for statistical reasons. In fact, this assumption plays a major role in portfolio theory, where it is typically required that $\mathrm{Var}(R_t) > 0$, i.e., that the covariance matrix of the risky asset returns is positive definite.

The log-optimal portfolio
Definition 1 A log-optimal portfolio is a portfolio $w^* \in \mathcal{S}$ that maximizes the expected log-return, i.e.,
$$w^* \in \arg\max_{w \in \mathcal{S}} E(\log w'X_t).$$
The LOP is often associated with the "Kelly criterion" (Kelly 1956). Its asymptotic optimality properties are elaborated by Algoet and Cover (1988), Bell and Cover (1980), and Breiman (1961).² Although it was originally studied in information theory, it has attracted growing interest in the finance community over the last decades. As already mentioned in Sect. 1, the LOP is sometimes referred to as the GOP. However, the GOP is typically studied in a continuous-time framework, whereas the LOP is based on a discrete-time setting.³ The Lagrange function of the optimization problem given by Definition 1 is
$$\mathcal{L}(w, \kappa, \lambda) = E(\log w'X_t) + \kappa'w - \lambda(\mathbf{1}'w - 1)$$
with $\kappa = (\kappa_0, \kappa_1, \ldots, \kappa_N) \ge \mathbf{0}$ and $\lambda \in \mathbb{R}$. The corresponding Karush-Kuhn-Tucker (KKT) conditions are quite nice (Cover and Thomas 1991, Theorem 15.2.1). The following theorem also establishes the existence and uniqueness of the LOP.

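Theorem 1 The LOP exists and is unique. It is characterized by
$$E\left(\frac{X_{it}}{w^{*\prime}X_t}\right) \begin{cases} = 1, & w_i^* > 0, \\ \le 1, & w_i^* = 0, \end{cases} \qquad i = 0, 1, \ldots, N.$$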
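Proof The simplex $\mathcal{S}$ is compact and convex. Further, the random variables $v'X_t$ and $w'X_t$ do not coincide for any $v, w \in \mathcal{S}$ with $v \neq w$. Hence, since the natural logarithm is strictly concave, for each $0 < \pi < 1$ and $v, w \in \mathcal{S}$ with $v \neq w$ it holds that
$$E\{\log[(\pi v + (1 - \pi)w)'X_t]\} > \pi E(\log v'X_t) + (1 - \pi)E(\log w'X_t).$$
This means that the objective function $w \mapsto E(\log w'X_t)$ is strictly concave, which implies that $w^*$ exists and is unique. Further, due to the concavity of the logarithm, the partial difference quotient
$$\frac{\log(w + h e_i)'X_t - \log w'X_t}{h},$$
where $e_i$ denotes the $i$-th canonical vector, increases monotonically as $h \downarrow 0$ and tends to $X_{it}/w'X_t$. Hence, by the Monotone Convergence Theorem, we have that
$$\frac{\partial E(\log w'X_t)}{\partial w_i} = E\left(\frac{X_{it}}{w'X_t}\right).$$
The KKT conditions require that $E(X_t/w^{*\prime}X_t) + \kappa - \lambda\mathbf{1} = \mathbf{0}$ with $\kappa \ge \mathbf{0}$ and $w^{*\prime}\kappa = 0$. Since $w^{*\prime}E(X_t/w^{*\prime}X_t) = 1$, $w^{*\prime}\mathbf{1} = 1$, and $w^{*\prime}\kappa = 0$, we conclude that $\lambda = 1$. Thus, we obtain
$$E\left(\frac{X_t}{w^{*\prime}X_t}\right) = \mathbf{1} - \kappa, \qquad \kappa \ge \mathbf{0}, \qquad \kappa_i = 0 \text{ whenever } w_i^* > 0,$$
which leads to the given expression in the theorem.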
The portfolio weight $w_i^*$ is bounded by $\mathcal{S}$ if and only if $E(X_{it}/w^{*\prime}X_t) < 1$, whereas the partial derivative equals 1 whenever $w_i^* > 0$. If all (optimal) portfolio weights are positive, the solution of the optimization problem, $w^*$, lies in the interior of $\mathcal{S}$, which is denoted by $\mathcal{S}^\circ$. Since we then have that $d'E(X_t/w^{*\prime}X_t) = d'\mathbf{1} = 0$ for every direction $d \in \mathbb{R}^{N+1}$ with $\mathbf{1}'d = 0$, the expected log-return stays constant, to first order, after a local change of the portfolio weights.⁴ This could be true even on the boundary of $\mathcal{S}$, $\partial\mathcal{S}$, as long as $E(X_t/w^{*\prime}X_t) = \mathbf{1}$. In this case, none of the portfolio weights is bounded by $\mathcal{S}$. By contrast, if (at least) one partial derivative is lower than 1, the corresponding portfolio weight must be zero, i.e., $w^* \in \partial\mathcal{S}$.
Then the expected log-return decreases after a local change of a portfolio weight that is bounded by S. These basic considerations will be important later on when deriving the asymptotic properties of the LOP estimators.
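The distinction between bounded and unbounded weights can be illustrated with a hypothetical two-asset market (an assumption for illustration, not taken from this work): the riskless asset and one risky asset whose relative price equals $u$ with probability $p$ and $d$ otherwise, where $u > 1 > d > 0$. The objective $b \mapsto p\log(1 + b(u - 1)) + (1 - p)\log(1 + b(d - 1))$ is strictly concave in the risky weight $b$, so the LOP is its stationary point clipped to $[0, 1]$, and the ratios $E(X_{it}/w^{*\prime}X_t)$ can be computed exactly.

```python
def lop_binary(p, u, d):
    """LOP weight b on one risky asset with relative price u (prob. p) or d
    (prob. 1 - p), the rest in the riskless asset; assumes u > 1 > d > 0."""
    a, c = u - 1.0, d - 1.0
    b = -(p * a + (1.0 - p) * c) / (a * c)   # stationary point of the concave objective
    return min(max(b, 0.0), 1.0)             # clip to the simplex

def kkt_ratios(p, u, d, b):
    """Return (E[X_0 / w'X], E[X_1 / w'X]) for w = (1 - b, b), with X_0 = 1."""
    up, down = 1.0 + b * (u - 1.0), 1.0 + b * (d - 1.0)   # w'X in both states
    riskless = p / up + (1.0 - p) / down
    risky = p * u / up + (1.0 - p) * d / down
    return riskless, risky
```

With $p = 0.5$, $u = 2$, $d = 0.5$, the LOP is $w^* = (0.5, 0.5)$ and both ratios equal 1. With $p = 0.3$, the risky weight is zero and its ratio is $0.95 < 1$, i.e., that weight is bounded by $\mathcal{S}$; with $p = 0.9$, the riskless weight is zero and its ratio is $0.65$.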

The best constant re-balanced portfolio
Definition 2 A best constant re-balanced portfolio is a portfolio $w_n^* \in \mathcal{S}$ that maximizes the in-sample average log-return, i.e.,
$$w_n^* \in \arg\max_{w \in \mathcal{S}} \frac{1}{n}\sum_{t=1}^{n} \log w'X_t. \qquad (1)$$
The relative prices contained in $X_1, X_2, \ldots, X_n$, except for the relative price 1 of the riskless asset, are nondegenerate random variables, and so, in general, $\frac{1}{n}\sum_{t=1}^{n} \log w'X_t$ is a nondegenerate random variable, too. This means that for each element of the state space, $\omega \in \Omega$, and thus for each realization of $X_1, X_2, \ldots, X_n$, we maximize $\frac{1}{n}\sum_{t=1}^{n} \log w'X_t(\omega)$, which leads us to a particular realization, $w_n^*(\omega)$, of a BCRP $w_n^*$, which thus represents a random vector. A BCRP can be considered an empirical version of the LOP. It is called the "best" constant re-balanced portfolio because $w_n^*$ maximizes the final value after Period $n \in \mathbb{N}$, i.e., $V_{wn}$, over all constant re-balanced portfolios $w \in \mathcal{S}$. However, the maximization is done in hindsight, i.e., after all asset prices have been revealed to the investor, and thus the BCRP is unknown in advance.
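To make the "in hindsight" nature of the BCRP concrete, the following sketch computes it for the special case of one risky asset plus the riskless asset (an assumption chosen so that a simple bisection suffices; it is not Cover's algorithm). Because the in-sample objective is concave in the risky weight $b$, its derivative is decreasing, and the maximizer over $[0, 1]$ can be located by bisection on the sign of the derivative.

```python
import math

def bcrp_one_risky(risky_relatives, tol=1e-12):
    """BCRP weight b in [0, 1] on a single risky asset (rest riskless) that
    maximizes (1/n) sum_t log(1 + b (x_t - 1)) over the observed sample."""
    def slope(b):
        # Derivative of the in-sample objective (times n); decreasing in b.
        return sum((x - 1.0) / (1.0 + b * (x - 1.0)) for x in risky_relatives)
    if slope(0.0) <= 0.0:
        return 0.0            # corner solution: everything in the riskless asset
    if slope(1.0) >= 0.0:
        return 1.0            # corner solution: everything in the risky asset
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if slope(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def log_value(b, risky_relatives):
    """log V_wn of the constant re-balanced portfolio w = (1 - b, b)."""
    return sum(math.log(1.0 + b * (x - 1.0)) for x in risky_relatives)
```

For the hypothetical sample `[2.0, 0.5, 2.0, 0.5, 2.0]`, the BCRP puts 80% into the risky asset, and no other constant weight on a fine grid yields a higher final value, which is only known after all five prices have been observed.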

Existence and uniqueness
Let $n \in \mathbb{N}$ be the number of price observations. The following additional assumption is made for statistical reasons:
A4. The sample of price relatives, i.e., $\mathbf{X} = (X_1\; X_2\; \cdots\; X_n)$, has rank $N + 1$.
A4 can be considered an empirical version of A3: it rules out that a risky asset is redundant in the sample. Since $\mathbf{X}$ is an $(N + 1) \times n$ matrix, A4 requires that the number of observations exceeds the number of risky assets, i.e., $n > N$.
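A4 can be checked directly on a given sample. The following sketch computes the rank of the $(N + 1) \times n$ matrix $\mathbf{X}$ by exact Gaussian elimination with Python's `fractions.Fraction`, which avoids floating-point tolerances; the function names and the list-of-columns input layout are assumptions for illustration.

```python
from fractions import Fraction

def matrix_rank(rows):
    """Rank of a matrix given as a list of rows (exact Gaussian elimination)."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue                                   # no pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank

def satisfies_a4(price_relatives):
    """A4: the matrix X = (X_1 ... X_n), whose columns are the observed
    vectors X_t = (1, X_1t, ..., X_Nt), has rank N + 1."""
    n_assets = len(price_relatives[0])                 # N + 1, riskless asset included
    rows = [list(r) for r in zip(*price_relatives)]    # row i holds asset i over time
    return matrix_rank(rows) == n_assets
```

A sample in which one risky asset duplicates another, or a sample with $n \le N$ observations, fails the check.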

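Theorem 2 The BCRP exists and is unique. It is characterized by
$$\frac{1}{n}\sum_{t=1}^{n} \frac{X_{it}}{w_n^{*\prime}X_t} \begin{cases} = 1, & w_{in}^* > 0, \\ \le 1, & w_{in}^* = 0, \end{cases} \qquad i = 0, 1, \ldots, N.$$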
Proof Since $w \mapsto \frac{1}{n}\sum_{t=1}^{n} \log w'X_t$ is a concave objective function, Eq. 1 represents a convex optimization problem and, because the simplex is compact and convex, the BCRP exists. The rank of $\mathbf{X}$ is full, so that $v'X_t \neq w'X_t$ for some $t$ whenever $v, w \in \mathbb{R}^{N+1}$ with $v \neq w$, i.e., different portfolios lead to different value processes. That is, the given objective function is strictly concave, which implies that the BCRP is unique. The rest of the proof follows by the arguments that are used in the proof of Theorem 1.
A simple numerical algorithm for computing the BCRP was developed by Cover (1984). We will come back to this point in Sect. 6.
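For illustration, here is a minimal sketch of the multiplicative fixed-point iteration usually associated with Cover (1984), $w_i \leftarrow w_i \cdot \frac{1}{n}\sum_{t=1}^{n} X_{it}/w'X_t$. It is our reading of the algorithm rather than code from this work; the starting point, tolerance, and iteration cap are assumptions. A fixed point with positive weights satisfies the characterization in Theorem 2.

```python
def average_ratios(w, price_relatives):
    """(1/n) sum_t X_it / w'X_t for every asset i (cf. Theorem 2)."""
    n, k = len(price_relatives), len(w)
    ratios = [0.0] * k
    for x in price_relatives:
        gross = sum(wi * xi for wi, xi in zip(w, x))
        for i in range(k):
            ratios[i] += x[i] / gross / n
    return ratios

def cover_bcrp(price_relatives, max_iter=5000, tol=1e-12):
    """Multiplicative update w_i <- w_i * (1/n) sum_t X_it / w'X_t, started at
    the uniformly weighted portfolio. Since sum_i w_i * ratio_i = 1, the
    iterates stay in the simplex S."""
    k = len(price_relatives[0])
    w = [1.0 / k] * k
    for _ in range(max_iter):
        new_w = [wi * ri for wi, ri in zip(w, average_ratios(w, price_relatives))]
        if max(abs(a - b) for a, b in zip(new_w, w)) < tol:
            return new_w
        w = new_w
    return w
```

On a hypothetical sample with one risky asset whose relative price is 2 on three days and 0.5 on two days, the iteration approaches $w_n^* = (0.2, 0.8)$, where both averaged ratios equal 1. Each step costs $O(nN)$ operations, and convergence can be slow when the objective is flat, which is relevant for the comparison with quadratic optimization discussed later.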

Finite-sample bias
Let $w_n$ be a portfolio that is based only on the price observations that have been made up to Day $n$. A standard assumption of portfolio theory is that $w_n$ is stochastically independent of $R_{n+1}$ or, equivalently, of $X_{n+1}$ (Frahm 2015). If $w_n$ depended on $R_{n+1}$, the decision of the investor at time $n$ would be influenced by some asset returns at time $n + 1$ or, vice versa, his financial transactions would have an impact on forthcoming asset prices. In this case, he would be able to predict the future price evolution on the basis of past asset prices. This is typically ruled out in finance theory and, especially, in portfolio theory. Put another way, we assume that the investor has no prediction power. This basic assumption will be discussed further in Sect. 5.1.2.
Hence, we can make the following additional assumptions:
A5. The BCRP $w_n^*$ is stochastically independent of $X_{n+1}$.
A6. The BCRP does not coincide with the LOP, i.e., $P(w_n^* = w^*) < 1$.
A6 just states that $w_n^* = w^*$ holds only with probability lower than 1. This assumption is innocuous, since otherwise we would not have any estimation risk at all.
Let $X$ be any positive random vector that has the same distribution as the vectors $X_1, X_2, \ldots$ of price relatives and define the following quantities:
$$\varphi(w) := E(\log w'X), \qquad \varphi_n(w_n) := E\left(\frac{1}{n}\sum_{t=1}^{n} \log w_n'X_t\right), \qquad \varphi_{n+1}(w_n) := E(\log w_n'X_{n+1}).$$
Hence, by substituting $w$ with $w^*$ or $w_n^*$, respectively, we can see that
• $\varphi(w^*)$ is the expected log-return on the LOP,
• $\varphi_n(w_n^*)$ represents the expected in-sample average log-return on the BCRP, and
• $\varphi_{n+1}(w_n^*)$ denotes the expected out-of-sample log-return on the BCRP.
The investor cannot achieve $\varphi(w^*)$ because the LOP is unknown to him. Instead, he maximizes the average log-return $\frac{1}{n}\sum_{t=1}^{n} \log w'X_t$ over $w \in \mathcal{S}$ in order to compute the BCRP. At the end of Day $n$ he applies the BCRP, and one day later he obtains the log-return $\log w_n^{*\prime}X_{n+1}$. For this reason, $\varphi_{n+1}(w_n^*)$ may be considered the basic performance measure for $w_n^*$. The following theorem explains why the BCRP might lead to wrong conclusions in real-life situations, especially if the number of observations, $n$, is small.

Theorem 3 We have that $\varphi_{n+1}(w_n^*) < \varphi(w^*) < \varphi_n(w_n^*)$.
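Proof By definition, $w^*$ is the element of $\mathcal{S}$ that maximizes the expected log-return. Moreover, due to A5 and A6 and the fact that $w^*$ is unique, we have that
$$E(\log w_n^{*\prime}X_{n+1} \mid w_n^*) \le E(\log w^{*\prime}X_{n+1})$$
with probability 1, where the inequality is strict with positive probability. From the Law of Total Expectation and the stationarity of $\{X_t\}$ we conclude that
$$\varphi_{n+1}(w_n^*) = E\{E(\log w_n^{*\prime}X_{n+1} \mid w_n^*)\} < E(\log w^{*\prime}X_{n+1}) = \varphi(w^*).$$
Moreover, since $w_n^*$ is unique and does not coincide with $w^*$, we have that
$$\frac{1}{n}\sum_{t=1}^{n} \log w_n^{*\prime}X_t \ge \frac{1}{n}\sum_{t=1}^{n} \log w^{*\prime}X_t$$
with probability 1, where the inequality is strict with positive probability. This means that
$$\varphi_n(w_n^*) > E\left(\frac{1}{n}\sum_{t=1}^{n} \log w^{*\prime}X_t\right) = \varphi(w^*).$$
Hence, the expected out-of-sample log-return on the BCRP, $\varphi_{n+1}(w_n^*)$, is always lower than the expected log-return on the LOP, i.e., $\varphi(w^*)$. Nonetheless, the investor typically overestimates not only $\varphi_{n+1}(w_n^*)$ but even $\varphi(w^*)$ when computing $\varphi_n(w_n^*)$ by maximizing $\frac{1}{n}\sum_{t=1}^{n} \log w'X_t$. This phenomenon is not limited to the BCRP. It is a general problem of portfolio optimization (see, e.g., Frahm 2015; Frahm and Memmel 2010; Kan and Zhou 2007; Memmel 2004).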
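The ordering $\varphi_{n+1}(w_n^*) < \varphi(w^*) < \varphi_n(w_n^*)$ can be illustrated by simulation. The sketch assumes a hypothetical i.i.d. market with the riskless asset and one risky asset whose relative price is 2 with probability $p$ and 0.5 otherwise. In this model the in-sample average log-return has the same form as the expected log-return with $p$ replaced by the sample frequency $\hat{p}$, so the BCRP is available in closed form, and under A5 the conditional out-of-sample expectation given $w_n^*$ equals $\varphi(w_n^*)$. All parameter values are illustrative.

```python
import math
import random

def phi(b, p, u=2.0, d=0.5):
    """E log(1 + b(X - 1)) when X equals u with probability p and d otherwise."""
    return p * math.log(1.0 + b * (u - 1.0)) + (1.0 - p) * math.log(1.0 + b * (d - 1.0))

def optimal_weight(p, u=2.0, d=0.5):
    """Maximizer of phi(., p) over [0, 1]: stationary point, clipped."""
    a, c = u - 1.0, d - 1.0
    return min(max(-(p * a + (1.0 - p) * c) / (a * c), 0.0), 1.0)

def simulate_bias(p=0.55, n=20, reps=20000, seed=1):
    """Monte Carlo averages (phi(w*), in-sample BCRP, in-sample LOP, out-of-sample BCRP)."""
    rng = random.Random(seed)
    b_star = optimal_weight(p)                       # risky weight of the LOP
    in_bcrp = in_lop = out_bcrp = 0.0
    for _ in range(reps):
        p_hat = sum(rng.random() < p for _ in range(n)) / n
        b_n = optimal_weight(p_hat)                  # BCRP: in-sample objective is phi(., p_hat)
        in_bcrp += phi(b_n, p_hat)                   # in-sample average log-return of the BCRP
        in_lop += phi(b_star, p_hat)                 # in-sample average log-return of the LOP
        out_bcrp += phi(b_n, p)                      # E(log w_n*'X_{n+1} | w_n*) under A5
    return phi(b_star, p), in_bcrp / reps, in_lop / reps, out_bcrp / reps
```

The in-sample average log-return of the LOP itself is an unbiased estimate of $\varphi(w^*)$; the upward bias appears only after optimizing over the same sample, while the out-of-sample value of the BCRP falls short of $\varphi(w^*)$ in every replication where $w_n^* \neq w^*$.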

Consistency
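For the subsequent analysis it is convenient to define the function $x \mapsto f_w(x) := \log w'x$ for all $w \in \mathcal{S}$ and $x > \mathbf{0}$ as well as the functions
$$M(w) := E f_w(X) \qquad \text{and} \qquad M_n(w) := \frac{1}{n}\sum_{t=1}^{n} f_w(X_t)$$
for all $n \in \mathbb{N}$. We make the following statistical assumption, which is often used in the theory of empirical processes (see, e.g., van der Vaart 1998, Chapter 19):
A7. The class $\mathcal{F} := \{f_w : w \in \mathcal{S}\}$ is Glivenko-Cantelli, i.e., $\sup_{w \in \mathcal{S}} |M_n(w) - M(w)| \to 0$.
Hence, the Strong Law of Large Numbers shall hold true for the sequence $\{M_n(w)\}$ uniformly in $\mathcal{S}$. For example, according to van der Vaart (1998, p. 46), it suffices that (i) $w$ stems from a compact set, (ii) the elements of $\mathcal{F}$ are continuous in $w$ for every $x > \mathbf{0}$, and (iii) they are dominated by an integrable function, provided $X_1, X_2, \ldots$ are serially independent.⁶ The first two properties are clearly satisfied in our context. In order to see that the third property is satisfied, too, note that
$$-\mathbf{1}'(\log x)^- \le w'\log x \le \log w'x \le \log \mathbf{1}'x$$
for all $w \in \mathcal{S}$ and $x > \mathbf{0}$, where $(\log x)^-$ denotes the negative part of the vector $\log x$ and the second inequality follows from Jensen's inequality. Hence, the function
$$g(x) := \max\{|\log \mathbf{1}'x|,\ \mathbf{1}'(\log x)^-\}$$
satisfies $|f_w(x)| \le g(x)$ for all $w \in \mathcal{S}$ and $x > \mathbf{0}$, and due to A2 both arguments of the maximum are nonnegative and integrable. The maximum of two nonnegative and integrable random variables is also integrable. Thus, we conclude that $E\, g(X_t) < \infty$, i.e., the dominating function $g$ is integrable.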
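The integrability argument can be sanity-checked numerically. The sketch below assumes the dominating function $g(x) = \max\{|\log \mathbf{1}'x|,\ \mathbf{1}'(\log x)^-\}$, which works because $-\mathbf{1}'(\log x)^- \le w'\log x \le \log w'x \le \log \mathbf{1}'x$ for $w \in \mathcal{S}$ and $x > \mathbf{0}$; this particular choice of $g$ is an assumption of the sketch. It draws random $w \in \mathcal{S}$ and positive $x$ and records the largest value of $|f_w(x)| - g(x)$, which should never be positive.

```python
import math
import random

def f(w, x):
    """f_w(x) = log w'x."""
    return math.log(sum(wi * xi for wi, xi in zip(w, x)))

def g(x):
    """Candidate dominating function max{|log 1'x|, 1'(log x)^-}."""
    negative_part = sum(max(-math.log(xi), 0.0) for xi in x)   # 1'(log x)^-
    return max(abs(math.log(sum(x))), negative_part)

def random_simplex_point(k, rng):
    """Uniform draw from the unit simplex via normalized exponentials."""
    e = [rng.expovariate(1.0) for _ in range(k)]
    total = sum(e)
    return [ei / total for ei in e]

def max_violation(trials=10000, k=4, seed=0):
    """Largest value of |f_w(x)| - g(x) over random w in S and x > 0."""
    rng = random.Random(seed)
    worst = -math.inf
    for _ in range(trials):
        x = [math.exp(rng.gauss(0.0, 0.5)) for _ in range(k)]
        w = random_simplex_point(k, rng)
        worst = max(worst, abs(f(w, x)) - g(x))
    return worst
```

A random search cannot prove the bound, but it quickly exposes a wrong candidate for $g$.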
As mentioned at the beginning of Sect. 2, each statement that refers to a random quantity is meant to be true with probability 1. The next theorem asserts that the BCRP is strongly consistent for the LOP. This means that $w_n^*$ converges almost surely to $w^*$, which is simply denoted by "$w_n^* \to w^*$", i.e., without the additional remark "a.s.", for convenience.
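Theorem 4 $w_n^* \to w^*$.

Proof The BCRP $w_n^*$ represents an M-estimator, whose criterion functions are given by $M$ and $M_n$. Let $\varepsilon$ be any positive real number and $P_\varepsilon := \{w \in \mathcal{S} : \|w - w^*\| = \varepsilon\}$.⁷ Since $M$ is strictly concave, there exists some $\delta > 0$ such that $M(w^*) - M(w) > \delta$ for all $w \in P_\varepsilon$. Now, since $\mathcal{F}$ is Glivenko-Cantelli, we can find a sufficiently large number $m \in \mathbb{N}$ such that, for all natural numbers $n \ge m$, $\sup_{w \in \mathcal{S}} |M_n(w) - M(w)| < \delta/2$ and thus
$$M_n(w^*) - M_n(w) > M(w^*) - M(w) - \delta > 0 \qquad \text{for all } w \in P_\varepsilon.$$
Since $M_n$ is strictly concave, too, we have that $\|w_n^* - w^*\| < \varepsilon$ for all $n \ge m$ (otherwise, the line segment from $w^*$ to $w_n^*$ would intersect $P_\varepsilon$ at some $w$ with $M_n(w) < M_n(w^*) \le M_n(w_n^*)$, which contradicts concavity). This holds true for every $\varepsilon > 0$ and thus $w_n^* \to w^*$.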
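Strong consistency can be visualized in a hypothetical binary model (an assumption for illustration): the riskless asset plus one risky asset whose relative price is 2 or 0.5, i.i.d. with probability $p$ of the up-move. There, the risky BCRP weight equals the LOP formula evaluated at the running frequency of up-moves, so $w_n^* \to w^*$ follows from the Strong Law of Large Numbers. The sketch records the BCRP along one simulated path.

```python
import random

def bcrp_binary(p_hat, u=2.0, d=0.5):
    """Risky BCRP weight in the hypothetical binary model: the in-sample
    objective has the LOP form with p replaced by the frequency p_hat."""
    a, c = u - 1.0, d - 1.0
    return min(max(-(p_hat * a + (1.0 - p_hat) * c) / (a * c), 0.0), 1.0)

def bcrp_path(p=0.5, checkpoints=(10, 100, 1000, 10000, 100000), seed=7):
    """Risky BCRP weight after each checkpoint number of i.i.d. observations."""
    rng = random.Random(seed)
    ups, path = 0, {}
    for n in range(1, max(checkpoints) + 1):
        ups += rng.random() < p          # count of days with relative price u
        if n in checkpoints:
            path[n] = bcrp_binary(ups / n)
    return path
```

With $p = 0.5$, the LOP weight of the risky asset is 0.5, and the recorded BCRP weights settle around that value as $n$ grows.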
The next theorem asserts that the expected out-of-sample log-return on the BCRP converges to the expected log-return on the LOP.

Theorem 5 $\varphi_{n+1}(w_n^*) \to \varphi(w^*)$.
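Proof Theorem 4 and the Continuous Mapping Theorem reveal that $\log w_n^{*\prime}x \to \log w^{*\prime}x$ for all $x > \mathbf{0}$. Further, we already know that there exists an integrable function $x \mapsto g(x)$ such that $|f_w(x)| \le g(x)$ for all $w \in \mathcal{S}$ and $x > \mathbf{0}$. Hence, by the Dominated Convergence Theorem, we obtain
$$\varphi_{n+1}(w_n^*) = E(\log w_n^{*\prime}X_{n+1}) \to E(\log w^{*\prime}X) = \varphi(w^*).$$
Finally, the in-sample average log-return on the BCRP also converges to the expected log-return on the LOP as the number of observations grows to infinity.

Theorem 6 $M_n(w_n^*) \to M(w^*)$, i.e., $\frac{1}{n}\sum_{t=1}^{n} \log w_n^{*\prime}X_t \to \varphi(w^*)$.

Proof We have that
$$|M_n(w_n^*) - M(w^*)| \le \sup_{w \in \mathcal{S}} |M_n(w) - M(w)| + |M(w_n^*) - M(w^*)|.$$
The convergence of the former term to zero is an immediate consequence of A7. Moreover, the Dominated Convergence Theorem tells us that $M(w_n) \to M(w)$ for every sequence $\{w_n\}$ with $w_n \in \mathcal{S}$ such that $w_n \to w \in \mathcal{S}$. This means that $M$ is continuous at each $w \in \mathcal{S}$. Theorem 4 and the Continuous Mapping Theorem complete the proof.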

Asymptotic distribution
In this section, the asymptotic distribution of $\sqrt{n}(w_n^* - w^*)$ is established. This can be done for all components of $w^*$ that are not bounded by $\mathcal{S}$, i.e., for which $E(X_{it}/w^{*\prime}X_t) = 1$. As already explained at the end of Sect. 3, every other component of $w^*$ is bounded by the simplex. If $w_i^* = 0$ represents such a component, i.e., $E(X_{it}/w^{*\prime}X_t) < 1$, it is well known that $\sqrt{n}\, w_{in}^* \to 0$, i.e., $w_{in}^*$ is superconsistent. However, not all components of the LOP can be affected by the given constraints on the portfolio weights. Indeed, we must have that $E(X_{it}/w^{*\prime}X_t) = 1$ for at least one asset, because otherwise the KKT conditions given by Theorem 1 cannot be satisfied. Thus, we can reduce the asset universe until there is no portfolio weight that is bounded by $\mathcal{S}$. The riskless asset need not be part of the reduced asset universe. However, in order to avoid the trivial solution $w_n^* = 1$, there should be at least two remaining assets in the universe.
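Hence, we assume that the given asset universe has been reduced such that $E(X_t/w^{*\prime}X_t) = \mathbf{1}$. This guarantees that
$$\nabla M(w^*) = E\left(\frac{X_t}{w^{*\prime}X_t}\right) = \mathbf{1}.$$
This means that the function $M$ can be locally approximated at $w^*$ by
$$M(w) \approx M(w^*) + (w - w^*)'\mathbf{1} + \frac{1}{2}(w - w^*)'\nabla^2 M(w^*)(w - w^*) = M(w^*) + \frac{1}{2}(w - w^*)'\nabla^2 M(w^*)(w - w^*),$$
since $\mathbf{1}'(w - w^*) = 0$ for all $w \in \mathcal{S}$. From the Monotone Convergence Theorem we conclude that the Hessian is given by
$$\nabla^2 M(w^*) = -E\left(\frac{X_t X_t'}{(w^{*\prime}X_t)^2}\right).$$
The following assumption implies that the Hessian is finite. Finally, A3 guarantees that $\nabla^2 M(w^*)$ is negative definite.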
A8. The second moments of $X_t/(w^{*\prime}X_t)$ are finite.
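Further, we have to make the following assumptions:
A9. The function $f_w$ can be locally approximated at $w^*$ by
$$f_w(X_t) = f_{w^*}(X_t) + (w - w^*)'\frac{X_t}{w^{*\prime}X_t} + \|w - w^*\|\, r(X_t; w),$$
where the process $\{r(X_t; w)\}$ is stochastically equicontinuous. This means that for every $\epsilon > 0$ and $\eta > 0$ there exists a neighborhood $U$ of $w^*$ in the simplex $\mathcal{S}$ such that
$$\limsup_{n \to \infty} P^*\left(\sup_{w \in U} \left|\frac{1}{\sqrt{n}}\sum_{t=1}^{n} \{r(X_t; w) - E\, r(X_t; w)\}\right| > \eta\right) < \epsilon,$$
where $P^*$ is an outer measure associated with $P$.
A10. We have that
$$\frac{1}{\sqrt{n}}\sum_{t=1}^{n} \left(\frac{X_t}{w^{*\prime}X_t} - \mathbf{1}\right) \rightsquigarrow \mathcal{N}(\mathbf{0}, A).$$
Here, "$\rightsquigarrow$" denotes convergence in distribution.
A9 is a basic regularity condition, which guarantees that the remainder $r(X_t; w)$ of the linear approximation becomes negligible as $n \to \infty$. To be more precise, it requires that
$$\frac{1}{\sqrt{n}}\sum_{t=1}^{n} \{r(X_t; w) - E\, r(X_t; w)\} \approx 0$$
if the sample size, $n$, is large and $w$ is close to $w^*$.⁹ Further, A10 says that the process $\{X_t/w^{*\prime}X_t\}$ satisfies the Central Limit Theorem. In particular, if the elements of $\{X_t\}$ are serially independent, we obtain the asymptotic covariance matrix
$$A = \mathrm{Var}\left(\frac{X_t}{w^{*\prime}X_t}\right) = E\left(\frac{X_t X_t'}{(w^{*\prime}X_t)^2}\right) - \mathbf{1}\mathbf{1}'.$$
Nonetheless, we could also take any form of serial dependence into account, provided the Central Limit Theorem expressed by A10 is satisfied. There exist many strong mixing conditions that guarantee that this theorem holds true for the process $\{X_t/w^{*\prime}X_t\}$ (see, e.g., Bradley 2005).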
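For a distribution with finitely many states, the Hessian $\nabla^2 M(w) = -E\{XX'/(w'X)^2\}$ and the i.i.d. asymptotic covariance matrix $A = \mathrm{Var}(X/w^{*\prime}X)$ can be computed exactly. The sketch below does so for a hypothetical two-state market in which the riskless asset is paired with one risky asset whose relative price is 2 or 0.5 with equal probability; there the LOP is $w^* = (0.5, 0.5)$. At an interior LOP, $E(X/w^{*\prime}X) = \mathbf{1}$, so $A = -\nabla^2 M(w^*) - \mathbf{1}\mathbf{1}'$, and since $w^{*\prime}X/w^{*\prime}X = 1$ is constant, $Aw^* = \mathbf{0}$.

```python
def hessian_and_A(states, probs, w):
    """For a discrete distribution of X (states with probabilities probs),
    return H = -E[X X' / (w'X)^2], A = Var(X / w'X), and m = E[X / w'X]."""
    k = len(w)
    second = [[0.0] * k for _ in range(k)]
    m = [0.0] * k
    for x, pr in zip(states, probs):
        gross = sum(wi * xi for wi, xi in zip(w, x))
        z = [xi / gross for xi in x]                  # X / w'X in this state
        for i in range(k):
            m[i] += pr * z[i]
            for j in range(k):
                second[i][j] += pr * z[i] * z[j]
    H = [[-second[i][j] for j in range(k)] for i in range(k)]
    A = [[second[i][j] - m[i] * m[j] for j in range(k)] for i in range(k)]
    return H, A, m
```

In this example, $A = \begin{pmatrix} 1/9 & -1/9 \\ -1/9 & 1/9 \end{pmatrix}$, which is singular, as expected from $Aw^* = \mathbf{0}$.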
Suppose that $\Theta \subseteq \mathbb{R}^d$ is any parameter set and let $\theta \in \Theta$ be the "true" parameter. The tangent cone of $\Theta$ at $\theta$ is the set that we obtain after centering $\Theta$ at $\theta$, blowing it up by some factor $\tau > 0$, and taking the set limit for $\tau \to \infty$ (Geyer 1994, p. 1993). In order to study the asymptotic behavior of a sequence $\{\theta_n\}$ of global optimizers that converges to $\theta$, it is crucial to guarantee that the parameter set is Chernoff regular (Geyer 1994). In our context, the parameter $\theta$ corresponds to $w^* \in \mathcal{S}$, which represents the global solution of the convex optimization problem expressed by Definition 1. The simplex $\mathcal{S}$ is Chernoff regular, and so let $T_{\mathcal{S}}(w^*) := \lim_{\tau \to \infty} \tau(\mathcal{S} - w^*)$ be the tangent cone of the simplex at $w^*$.¹⁰ Consider any random vector $Y \sim \mathcal{N}(\mathbf{0}, A)$ and define the function
$$\mathcal{Y}(v) := v'Y + \frac{1}{2}v'\nabla^2 M(w^*)\, v, \qquad v \in T_{\mathcal{S}}(w^*).$$
The (unique) maximizer of $\mathcal{Y}$ is denoted by
$$\zeta^* := \arg\max_{v \in T_{\mathcal{S}}(w^*)} \mathcal{Y}(v).$$
The following theorem describes the asymptotic behavior of the BCRP.

Theorem 7 We have that $\sqrt{n}(w_n^* - w^*) \rightsquigarrow \zeta^*$.
Proof The theorem asserts that $\sqrt{n}(w_n^* - w^*) \rightsquigarrow \zeta^*$, which is an immediate consequence of Theorem 4.4 in Geyer (1994).
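Hence, if the sample size is large, $\sqrt{n}(w_n^* - w^*)$ behaves essentially like the solution of a relatively simple quadratic optimization problem. In the case in which the elements of $\{X_t\}$ are serially independent, we have that $\nabla^2 M(w^*) = -(A + \mathbf{1}\mathbf{1}')$ and, since $\mathbf{1}'v = 0$ for all $v \in T_{\mathcal{S}}(w^*)$, we obtain
$$\zeta^* = \arg\max_{v \in T_{\mathcal{S}}(w^*)} \left\{v'Y - \frac{1}{2}v'Av\right\}.$$
The following corollary establishes the long-run distribution of the log-return on the BCRP relative to the log-return on the LOP. i.e., The rest of the proof follows from Theorem 4.4 in Geyer (1994).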
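The limit law can be compared with simulation in a hypothetical i.i.d. binary model (riskless asset plus one risky asset whose relative price is 2 or 0.5 with equal probability; an assumption for illustration). There, $w^* = (0.5, 0.5)$ is interior, $A = \begin{pmatrix} 1/9 & -1/9 \\ -1/9 & 1/9 \end{pmatrix}$, and $T_{\mathcal{S}}(w^*) = \{v : \mathbf{1}'v = 0\}$. Writing $v = (-s, s)$, the maximizer of $v'Y - \frac{1}{2}v'Av$ is $s = (Y_1 - Y_0)/q$ with $q = A_{00} - 2A_{01} + A_{11} = 4/9$, so the risky component of $\zeta^*$ is $\mathcal{N}(0, 1/q) = \mathcal{N}(0, 9/4)$. The sketch simulates $\sqrt{n}(w_{1n}^* - w_1^*)$ and compares its standard deviation with $1/\sqrt{q} = 1.5$.

```python
import math
import random
import statistics

def bcrp_binary(p_hat, u=2.0, d=0.5):
    """Risky BCRP weight in the hypothetical binary model (clipped to [0, 1])."""
    a, c = u - 1.0, d - 1.0
    return min(max(-(p_hat * a + (1.0 - p_hat) * c) / (a * c), 0.0), 1.0)

def scaled_errors(n=400, reps=2000, p=0.5, seed=3):
    """Draws of sqrt(n) * (w*_1n - w*_1) for i.i.d. samples of size n."""
    rng = random.Random(seed)
    b_star = bcrp_binary(p)                     # LOP risky weight (0.5 here)
    errors = []
    for _ in range(reps):
        p_hat = sum(rng.random() < p for _ in range(n)) / n
        errors.append(math.sqrt(n) * (bcrp_binary(p_hat) - b_star))
    return errors

# Asymptotic covariance matrix A of X_t / w*'X_t in this model (computed exactly).
A = [[1 / 9, -1 / 9], [-1 / 9, 1 / 9]]
q = A[0][0] - 2 * A[0][1] + A[1][1]             # v'Av for v = (-1, 1)
limit_sd = 1.0 / math.sqrt(q)                   # sd of the risky part of zeta*
simulated_sd = statistics.stdev(scaled_errors())
```

Because the LOP is interior here, the constraint set plays no role in the limit, and $\zeta^*$ is Gaussian; with a boundary LOP the maximization over the tangent cone would truncate it.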
This completes our analysis of the BCRP. In the next section we focus on the MVE and derive its corresponding statistical properties.

The mean-variance estimator
Consider some portfolio $w \in \mathcal{S}$ and let $\tilde{w} = (w_1, w_2, \ldots, w_N)$ be the "risky part" of that portfolio. The return on Asset $i$ after Day $t$ is given by $R_{it} = X_{it} - 1$, and so the return on $w$ amounts to $R_{wt} = w'X_t - 1 = w_0 + \mathbf{1}'\tilde{w} + \tilde{w}'R_t - 1 = \tilde{w}'R_t$, where $R_t = (R_{1t}, R_{2t}, \ldots, R_{Nt})$ denotes the vector of risky asset returns.¹¹ The assumptions A1 to A10 shall still hold true. Now, we make the following additional assumption:
B1. The second moments of $R_t$ are finite.
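Let $R$ be any random vector that has the same distribution as $R_1, R_2, \ldots$. Define $\mu := E(R)$ and $\Sigma := E(RR')$. Note that the matrix $\Sigma$ contains the second noncentral moments of the risky asset returns, and thus it is not the covariance matrix of $R$. We already know that A3 guarantees that there cannot be any $\tilde{w} \in \mathbb{R}^N$ with $\tilde{w} \neq \mathbf{0}$ such that $\tilde{w}'R = c \in \mathbb{R}$. In particular, $\tilde{w}'\Sigma\tilde{w} = E\{(\tilde{w}'R)^2\} > 0$ for all $\tilde{w} \neq \mathbf{0}$, i.e., $\Sigma$ is positive definite.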

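Now, we may apply the quadratic approximation $\log(1 + r) \approx r - \frac{1}{2}r^2$ and come to the conclusion that
$$E(\log w'X) = E\{\log(1 + \tilde{w}'R)\} \approx E\left\{\tilde{w}'R - \frac{1}{2}(\tilde{w}'R)^2\right\} = \tilde{w}'\mu - \frac{1}{2}\tilde{w}'\Sigma\tilde{w}.$$
This section is built upon the observation that this approximation is very good in most practical applications. Hence, instead of maximizing the expected log-return, we can simply maximize the objective function $\tilde{w} \mapsto \tilde{w}'\mu - \frac{1}{2}\tilde{w}'\Sigma\tilde{w}$.¹² In the following, this objective function is called "mean-variance", although $\Sigma$ contains the second noncentral moments of $R$ and so it does not coincide with the covariance matrix $\mathrm{Var}(R)$.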
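The quality of the approximation $\log(1 + r) \approx r - \frac{1}{2}r^2$ is easy to quantify: the error is of order $|r|^3/3$. The sketch compares both sides for a few hypothetical daily returns and for one much larger return; the specific values are illustrative assumptions.

```python
import math

def quadratic_approx(r):
    """Second-order Taylor approximation of log(1 + r) around r = 0."""
    return r - 0.5 * r * r

def approx_error(r):
    """Absolute error of the quadratic approximation."""
    return abs(math.log1p(r) - quadratic_approx(r))

daily = [-0.05, -0.02, -0.01, 0.01, 0.02, 0.05]   # hypothetical daily returns
worst_daily = max(approx_error(r) for r in daily)
large_return_error = approx_error(0.30)           # a much larger return
```

Even a daily move of 5% leaves an error below $5 \times 10^{-5}$, whereas for a 30% return the error is more than a hundred times larger, which is why the approximation is tied to daily re-balancing.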
Definition 3 A mean-variance optimal portfolio is a portfolio $w^\star \in \mathcal{S}$ that maximizes the mean-variance objective function, i.e.,
$$w^\star \in \arg\max_{w \in \mathcal{S}}\ \tilde{w}'\mu - \frac{1}{2}\tilde{w}'\Sigma\tilde{w}.$$
Some important remarks may be appropriate at this point:
• The vector $w^\star$ is called mean-variance optimal, although $\Sigma$ is not a covariance matrix. However, in most practical applications, $\Sigma$ is close to $\mathrm{Var}(R)$ whenever $R$ is a vector of daily asset returns, since $\Sigma = \mathrm{Var}(R) + \mu\mu'$ and expected daily returns are small.
• We focus on the feasible set $\mathcal{S}$ only because $w^\star$ serves as an approximation of the LOP. In general, however, a mean-variance optimal portfolio need not be restricted to $\mathcal{S}$.
• Under general (but quite technical) regularity conditions, the MVOP can be considered an approximation of the GOP (Karatzas and Kardaras 2007). Nonetheless, for the reasons explained in Sect. 3, we should refrain from calling $w^\star$ "GOP."
Now, the Lagrange function of the optimization problem expressed by Definition 3 is
$$\mathcal{L}(w, \kappa, \lambda) = \tilde{w}'\mu - \frac{1}{2}\tilde{w}'\Sigma\tilde{w} + \kappa'w - \lambda(\mathbf{1}'w - 1)$$
with $\kappa = (\kappa_0, \kappa_1, \ldots, \kappa_N) \ge \mathbf{0}$ and $\lambda \in \mathbb{R}$. The following theorem is analogous to Theorem 2.

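Theorem 8 The MVOP exists and is unique. It is characterized by
$$\mu_i - (\Sigma\tilde{w}^\star)_i \begin{cases} = \lambda, & w_i^\star > 0, \\ \le \lambda, & w_i^\star = 0, \end{cases} \qquad i = 1, 2, \ldots, N,$$
where $\lambda \ge 0$ and $\lambda = 0$ if $w_0^\star > 0$.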
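Proof The objective function $\tilde{w} \mapsto \tilde{w}'\mu - \frac{1}{2}\tilde{w}'\Sigma\tilde{w}$ is strictly concave, and the given set of constraints on the portfolio weights $w_1, w_2, \ldots, w_N$, i.e., $\tilde{w} \ge \mathbf{0}$ and $\mathbf{1}'\tilde{w} \le 1$, is compact and convex. Hence, the "risky part" of $w^\star$, i.e., $\tilde{w}^\star$, exists and is unique, which means that $w^\star$ exists and is unique, too. Thus, the KKT conditions
$$\mu - \Sigma\tilde{w}^\star + \tilde{\kappa} - \lambda\mathbf{1} = \mathbf{0}, \qquad \kappa_0 = \lambda, \qquad \kappa \ge \mathbf{0}, \qquad \kappa'w^\star = 0,$$
with $\tilde{\kappa} = (\kappa_1, \kappa_2, \ldots, \kappa_N)$, must be satisfied, which leads to the given expression in the theorem.
The next corollary shows how to identify the components of $w^\star$ that are bounded by $\mathcal{S}$. This will be helpful later on.

Corollary 2 The number λ in

Let $\mu_n := \frac{1}{n}\sum_{t=1}^{n} R_t$ and $\Sigma_n := \frac{1}{n}\sum_{t=1}^{n} R_t R_t'$ be the moment estimators for $\mu$ and $\Sigma$. Now, we are ready to define the MVE for $w^\star$, which also serves as an estimator for the LOP $w^*$.

Definition 4 A mean-variance estimator for $w^\star$ is a portfolio $w_n^\star \in \mathcal{S}$ that maximizes the in-sample mean-variance objective function, i.e.,
$$w_n^\star \in \arg\max_{w \in \mathcal{S}}\ \tilde{w}'\mu_n - \frac{1}{2}\tilde{w}'\Sigma_n\tilde{w}.$$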

Existence and uniqueness
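Let $\mathbf{R} = (R_1\; R_2\; \cdots\; R_n)$ be the sample of risky asset returns. A4 implies that we cannot find any $\tilde{w} \in \mathbb{R}^N$ with $\tilde{w} \neq \mathbf{0}$ such that $\mathbf{R}'\tilde{w} = \mathbf{0}$. Hence, we have that
$$\tilde{w}'\Sigma_n\tilde{w} = \frac{1}{n}\sum_{t=1}^{n} (\tilde{w}'R_t)^2 = \frac{1}{n}\|\mathbf{R}'\tilde{w}\|^2 > 0$$
for all $\tilde{w} \in \mathbb{R}^N$ with $\tilde{w} \neq \mathbf{0}$, which means that $\Sigma_n$ is positive definite.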
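The moment estimators and the positive definiteness of $\Sigma_n$ can be checked in a few lines. The sketch below computes $\mu_n$ and $\Sigma_n$ from a list of return vectors and attempts a Cholesky factorization, which succeeds exactly when the matrix is (numerically) positive definite; the relative tolerance and the sample data are assumptions for illustration.

```python
import math

def moment_estimators(returns):
    """mu_n = (1/n) sum R_t and Sigma_n = (1/n) sum R_t R_t' (noncentral)."""
    n, k = len(returns), len(returns[0])
    mu = [sum(r[i] for r in returns) / n for i in range(k)]
    sigma = [[sum(r[i] * r[j] for r in returns) / n for j in range(k)] for i in range(k)]
    return mu, sigma

def cholesky(a, rel_tol=1e-12):
    """Lower Cholesky factor of a symmetric matrix; raises ValueError if a
    pivot is not clearly positive, i.e., a is not (numerically) positive definite."""
    k = len(a)
    low = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                if s <= rel_tol * abs(a[i][i]):
                    raise ValueError("matrix is not positive definite")
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low
```

For a sample such as `[(0.25, 0.5), (0.5, 1.0)]`, the second return is always twice the first, so A4 fails and the factorization is rejected.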
The following corollary is a straightforward consequence of Theorem 8, and thus its proof is omitted.

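Corollary 3 The MVE exists and is unique. It is characterized by
$$\mu_{in} - (\Sigma_n\tilde{w}_n^\star)_i \begin{cases} = \lambda_n, & w_{in}^\star > 0, \\ \le \lambda_n, & w_{in}^\star = 0, \end{cases} \qquad i = 1, 2, \ldots, N,$$
where $\mu_{in}$ is the $i$-th component of $\mu_n$, $\lambda_n \ge 0$, and $\lambda_n = 0$ if $w_{0n}^\star > 0$.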
Numerical procedures for solving quadratic optimization problems exist in abundance, and so it is easy to compute $w_n^\star$ even if the number of dimensions is high. Two points, which are discussed in more detail in Sect. 6, are worth emphasizing: (i) The estimates $w_{in}^*$ and $w_{in}^\star$ are indistinguishable in most real-life situations.¹³ Put another way, the MVE leads to a very good approximation of the BCRP. (ii) Cover's (1984) algorithm for $w_n^*$ is slow compared to quadratic optimization algorithms for $w_n^\star$. In particular, this holds true in the high-dimensional case.
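Any quadratic programming routine can compute $w_n^\star$. As one simple possibility (not necessarily the method used in this work), the sketch below runs projected gradient ascent on the full weight vector $w = (w_0, \tilde{w}) \in \mathcal{S}$, using the standard sort-based Euclidean projection onto the simplex and the step size $1/\mathrm{tr}(\Sigma)$, which does not exceed the reciprocal of the largest eigenvalue. The inputs `mu` and `sigma` stand for $\mu_n$ and $\Sigma_n$; the numbers in the usage are hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based)."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for k, uk in enumerate(u, start=1):
        cumulative += uk
        t = (cumulative - 1.0) / k
        if uk - t > 0.0:
            theta = t            # the last k with uk > t determines the shift
    return [max(vi - theta, 0.0) for vi in v]

def mv_objective(w, mu, sigma):
    """w~'mu - 0.5 w~'Sigma w~ with w = (w_0, w~)."""
    risky = w[1:]
    k = len(mu)
    linear = sum(risky[i] * mu[i] for i in range(k))
    quad = sum(risky[i] * sigma[i][j] * risky[j] for i in range(k) for j in range(k))
    return linear - 0.5 * quad

def mve(mu, sigma, iterations=5000):
    """Projected gradient ascent over the simplex S."""
    k = len(mu)
    step = 1.0 / sum(sigma[i][i] for i in range(k))   # 1/trace <= 1/lambda_max
    w = [1.0] + [0.0] * k                             # start in the riskless asset
    for _ in range(iterations):
        risky = w[1:]
        grad = [mu[i] - sum(sigma[i][j] * risky[j] for j in range(k)) for i in range(k)]
        # The objective does not depend on w_0, so its partial derivative is zero.
        w = project_to_simplex([w[0]] + [ri + step * gi for ri, gi in zip(risky, grad)])
    return w
```

For $\mu = (4, 3) \times 10^{-4}$ and $\Sigma = \begin{pmatrix} 4 & 1 \\ 1 & 3 \end{pmatrix} \times 10^{-4}$, the iteration approaches $w = (0, 0.6, 0.4)$, which can be verified by hand: $\mu - \Sigma\tilde{w} = (1.2, 1.2) \times 10^{-4}$, so both risky assets share the same multiplier $\lambda = 1.2 \times 10^{-4} \ge 0$ and the riskless weight is zero.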

Finite-sample bias
Let $w_n$ be any portfolio that is constructed on the basis of the asset returns $R_1, R_2, \dots, R_n$. We know that the quantity $\bar w_n^\top R_{n+1} - \frac12(\bar w_n^\top R_{n+1})^2$ approximates the out-of-sample log-return on $w_n$, and thus we call
$$\varphi_{n+1}(w_n) := E\Big(\bar w_n^\top R_{n+1} - \tfrac12\big(\bar w_n^\top R_{n+1}\big)^2\Big)$$
the expected out-of-sample performance of $w_n$. As already mentioned, it is reasonable to presume that $w_n$ is stochastically independent of $R_{n+1}$. Otherwise, the investment decision at time $n$ would depend on asset returns that occur one day later, which is usually considered implausible in finance theory. Thus, we obtain the conditional expectation
$$E\Big(\bar w_n^\top R_{n+1} - \tfrac12\big(\bar w_n^\top R_{n+1}\big)^2 \,\Big|\, w_n\Big) = \bar w_n^\top\mu - \tfrac12\,\bar w_n^\top\Gamma\bar w_n,$$
which can be viewed as the out-of-sample performance of $w_n$. Correspondingly, due to the Law of Total Expectation, its expected out-of-sample performance is
$$\varphi_{n+1}(w_n) = E\big(\bar w_n^\top\mu - \tfrac12\,\bar w_n^\top\Gamma\bar w_n\big).$$
The latter expectation is a basic performance measure in portfolio optimization (see, e.g., Frahm 2015; Kan and Zhou 2007; Markowitz and Usmen 2003). Hence, as already mentioned, it is an implicit assumption of portfolio theory that $w_n$ is stochastically independent of $R_{n+1}$. If $w_n \equiv w$ is a fixed portfolio, we have that $\varphi_{n+1}(w) = \bar w^\top\mu - \frac12\bar w^\top\Gamma\bar w$. In this case, we may drop the prefix "expected out-of-sample" and just say that $\varphi_{n+1}(w)$ is the performance of $w$. Further, we can then simply write $\varphi(w)$ instead of $\varphi_{n+1}(w)$. In particular,
$$\varphi(w^\diamond) = \bar w^{\diamond\top}\mu - \tfrac12\,\bar w^{\diamond\top}\Gamma\bar w^\diamond$$
represents the performance of the MVOP. Hence, the following assumptions, which are analogous to A5 and A6, are made:
B2. The MVE $w^\diamond_n$ is stochastically independent of $R_{n+1}$.
B3. The MVE does not coincide with the MVOP, i.e., $P(w^\diamond_n \neq w^\diamond) = 1$.
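Due to B2, the expected out-of-sample performance of the MVE amounts to
$$\varphi_{n+1}(w^\diamond_n) = E\big(\bar w^{\diamond\top}_n\mu - \tfrac12\,\bar w^{\diamond\top}_n\Gamma\bar w^\diamond_n\big).$$
Finally, $\bar w^{\diamond\top}_n\mu_n - \frac12\bar w^{\diamond\top}_n\Gamma_n\bar w^\diamond_n$ represents the in-sample performance of the portfolio $w^\diamond_n$, and thus
$$\varphi_n(w^\diamond_n) := E\big(\bar w^{\diamond\top}_n\mu_n - \tfrac12\,\bar w^{\diamond\top}_n\Gamma_n\bar w^\diamond_n\big)$$
is the expected in-sample performance of the MVE.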
The following theorem is similar to Theorem 3.
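Theorem 9 Under B2 and B3, we have that $\varphi_{n+1}(w^\diamond_n) < \varphi(w^\diamond) < \varphi_n(w^\diamond_n)$.
Proof By definition, $w^\diamond$ is the (unique) portfolio that maximizes the performance $\varphi$ on $S$. Due to B2 and B3, we conclude that
$$\varphi_{n+1}(w^\diamond_n) = E\big(\varphi(w^\diamond_n)\big) < \varphi(w^\diamond).$$
Moreover, since $w^\diamond_n$ is the unique maximizer of the in-sample objective function and does not coincide with $w^\diamond$, we have that
$$P\Big(\bar w^{\diamond\top}_n\mu_n - \tfrac12\bar w^{\diamond\top}_n\Gamma_n\bar w^\diamond_n \ge \bar w^{\diamond\top}\mu_n - \tfrac12\bar w^{\diamond\top}\Gamma_n\bar w^\diamond\Big) = 1$$
and
$$P\Big(\bar w^{\diamond\top}_n\mu_n - \tfrac12\bar w^{\diamond\top}_n\Gamma_n\bar w^\diamond_n > \bar w^{\diamond\top}\mu_n - \tfrac12\bar w^{\diamond\top}\Gamma_n\bar w^\diamond\Big) = 1.$$
Taking expectations and using the unbiasedness of $\mu_n$ and $\Gamma_n$ yields $\varphi_n(w^\diamond_n) > \bar w^{\diamond\top}\mu - \frac12\bar w^{\diamond\top}\Gamma\bar w^\diamond = \varphi(w^\diamond)$.

Theorem 9 shows that we still suffer from the same problems that we have already found for the BCRP. This means that the in-sample performance of the MVE typically overestimates its expected out-of-sample performance and even the performance of the MVOP.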
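Theorem 9 can be illustrated with a small Monte Carlo experiment. The sketch assumes a single risky asset plus cash, i.i.d. normal daily returns, and hypothetical parameters $\mu = 0.0002$ and $\sigma = 0.02$. In this special case, the MVE is simply the ratio $\mu_n/\Gamma_n$ clipped to $[0, 1]$, so no optimizer is needed. The function returns Monte Carlo estimates of $\varphi_{n+1}(w^\diamond_n)$ and $\varphi_n(w^\diamond_n)$ together with the exact value $\varphi(w^\diamond)$.

```python
import random


def mve_one_asset(mu_n, gamma_n):
    """MVE with one risky asset plus cash: argmax of w*mu_n - 0.5*w^2*gamma_n
    over w in [0, 1], i.e. the ratio mu_n / gamma_n clipped to [0, 1]."""
    return min(max(mu_n / gamma_n, 0.0), 1.0)


def performance_gap(mu=0.0002, sigma=0.02, n=250, reps=1000, seed=1):
    """Monte Carlo estimates of (phi_{n+1}(w_n), phi(w_opt), phi_n(w_n))."""
    rng = random.Random(seed)
    gamma = sigma**2 + mu**2  # Gamma = E(R^2)

    def phi(w):               # true (population) performance of weight w
        return w * mu - 0.5 * w * w * gamma

    w_opt = mve_one_asset(mu, gamma)  # MVOP
    out_sum = in_sum = 0.0
    for _ in range(reps):
        r = [rng.gauss(mu, sigma) for _ in range(n)]
        mu_n = sum(r) / n
        gamma_n = sum(x * x for x in r) / n
        w_n = mve_one_asset(mu_n, gamma_n)
        in_sum += w_n * mu_n - 0.5 * w_n * w_n * gamma_n  # in-sample
        out_sum += phi(w_n)                                # out-of-sample
    return out_sum / reps, phi(w_opt), in_sum / reps


out_of_sample, optimum, in_sample = performance_gap()
print(out_of_sample, optimum, in_sample)
```

With parameters of this kind, the in-sample figure typically exceeds the true optimum by a wide margin, which is the overestimation described by Theorem 9.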

Consistency
The next assumption requires that $R_t$ and $R_tR_t^\top$ obey the Strong Law of Large Numbers. This holds true under very mild regularity conditions. If $R_1, R_2, \dots$ are serially independent, B1 is already sufficient. However, there exist much weaker mixing conditions that guarantee that the Strong Law of Large Numbers is satisfied both for $R_t$ and for $R_tR_t^\top$. These mixing conditions are typically discussed in ergodic theory (see, e.g., Davidson 1994).
B4. The estimators $\mu_n$ and $\Gamma_n$ are strongly consistent for $\mu$ and $\Gamma$, i.e., $\mu_n \to \mu$ and $\Gamma_n \to \Gamma$ almost surely.
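Theorem 10 $w^\diamond_n \to w^\diamond$ almost surely.
Proof Note that $w^\diamond_n = \arg\max_{w\in S}\ \bar w^\top\mu_n - \frac12\bar w^\top\Gamma_n\bar w$ is a function of $\mu_n$ and $\Gamma_n$. Since $S$ is compact and convex and the objective function is strictly concave, this function is continuous in $\mu_n$ and $\Gamma_n$. From B4 and the Continuous Mapping Theorem, it follows that $w^\diamond_n \to w^\diamond$ almost surely.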
The next theorem is analogous to Theorem 5.
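Theorem 11 We have that $\varphi_{n+1}(w^\diamond_n) \to \varphi(w^\diamond)$ and $\bar w^{\diamond\top}_n\mu_n - \frac12\bar w^{\diamond\top}_n\Gamma_n\bar w^\diamond_n \to \varphi(w^\diamond)$ almost surely.
Proof The function $w \mapsto \varphi(w) = \bar w^\top\mu - \frac12\bar w^\top\Gamma\bar w$ is continuous in $w \in S$, and the set $S$ is compact. From the Extreme Value Theorem, we conclude that it has a minimum, $a$, and a maximum, $b$. Hence, the constant $\max\{|a|, |b|\}$ dominates $|\varphi(w^\diamond_n)|$, and it is clearly integrable. We already know that $w^\diamond_n \to w^\diamond$ almost surely, and from the Dominated Convergence Theorem it follows that $\varphi_{n+1}(w^\diamond_n) = E\big(\varphi(w^\diamond_n)\big) \to \varphi(w^\diamond)$. Moreover, analogous to Theorem 6, the Continuous Mapping Theorem immediately implies that $\bar w^{\diamond\top}_n\mu_n - \frac12\bar w^{\diamond\top}_n\Gamma_n\bar w^\diamond_n \to \varphi(w^\diamond)$ almost surely, i.e., the in-sample performance of the MVE converges to the performance of the MVOP.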
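Theorem 10 can be illustrated in the one-asset setting (one risky asset plus cash), where the MVE is $\mu_n/\Gamma_n$ clipped to $[0, 1]$. To make convergence visible in modest samples, the sketch below uses deliberately unrealistic, hypothetical parameters ($\mu = 0.5$, $\sigma = 1$, so that $w^\diamond = \mu/(\sigma^2 + \mu^2) = 0.4$) and reports the average absolute estimation error for two sample sizes.

```python
import random


def mve_one_asset(r):
    """MVE for one risky asset plus cash, computed from a return sample."""
    mu_n = sum(r) / len(r)
    gamma_n = sum(x * x for x in r) / len(r)
    return min(max(mu_n / gamma_n, 0.0), 1.0)


def mean_abs_error(n, mu=0.5, sigma=1.0, reps=50, seed=5):
    """Average |w_n - w_opt| over independent normal samples of size n."""
    rng = random.Random(seed)
    w_opt = min(max(mu / (sigma**2 + mu**2), 0.0), 1.0)  # 0.4 for the defaults
    total = 0.0
    for _ in range(reps):
        r = [rng.gauss(mu, sigma) for _ in range(n)]
        total += abs(mve_one_asset(r) - w_opt)
    return total / reps


err_small, err_large = mean_abs_error(100), mean_abs_error(10_000)
print(err_small, err_large)
```

Increasing the sample size by a factor of 100 should shrink the error by roughly a factor of 10, in line with $\sqrt n$-consistency; with realistic daily parameters the error at any given $n$ is far larger, because $\mu_n$ is very noisy relative to $\mu$.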

Asymptotic distribution
Now, the asymptotic distribution of $\sqrt n\,(w^\diamond_n - w^\diamond)$ is derived. If some portfolio weight $w^\diamond_i$ is bounded by $S$, it must be zero, and the associated MVE is superconsistent, i.e., $\sqrt n\, w^\diamond_{in} \xrightarrow{p} 0$. Hence, in order to derive the asymptotic distribution of $\sqrt n\,(w^\diamond_n - w^\diamond)$, we must guarantee that no component of the MVOP $w^\diamond$ is bounded by $S$. According to Corollary 2, this holds true if and only if $\mu - \Gamma\bar w^\diamond = 0$, i.e., $\bar w^\diamond = \Gamma^{-1}\mu$. However, in practical situations it often happens that the weight of the riskless asset, $w^\diamond_0$, is bounded by $S$, which means that the Lagrange multiplier $\lambda$ in Theorem 8 is positive. In this case, we must remove the riskless asset from our asset universe and focus on the risky assets. Then the MVOP is simply characterized by $\bar w^\diamond \in S$ such that the $i$th component of $\mu - \Gamma\bar w^\diamond$ equals $\lambda$ if $\bar w^\diamond_i > 0$ and does not exceed $\lambda$ if $\bar w^\diamond_i = 0$, with $\lambda > 0$. Thus, in the case in which the riskless asset has been removed, we assume that the remaining asset universe is such that $\mu - \Gamma\bar w^\diamond = \lambda\mathbf 1$ for some $\lambda > 0$.
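Consider the family $\mathcal F = \{f_w\}_{w\in S}$ with
$$f_w(r) = \bar w^\top r - \tfrac12\,(\bar w^\top r)^2$$
for all $w \in S$ and $r \in \mathbb R^N$. Further, define the functions
$$F(w) = E\big(f_w(R)\big) = \bar w^\top\mu - \tfrac12\,\bar w^\top\Gamma\bar w, \qquad F_n(w) = \frac1n\sum_{t=1}^n f_w(R_t) = \bar w^\top\mu_n - \tfrac12\,\bar w^\top\Gamma_n\bar w.$$
It is obvious that the function $F$ can be locally approximated at $w^\diamond$ by
$$F(w) = F(w^\diamond) + (\bar w - \bar w^\diamond)^\top(\mu - \Gamma\bar w^\diamond) - \tfrac12\,(\bar w - \bar w^\diamond)^\top\Gamma\,(\bar w - \bar w^\diamond),$$
where $\Gamma$ is positive definite. The next regularity conditions are analogous to A9 and A10:
B5. The function $f_w$ can be locally approximated at $w^\diamond$ by a linear expansion in $\bar w - \bar w^\diamond$ whose remainder process $r(R_t; w)$ is stochastically equicontinuous.
B6. We have that $\sqrt n\,\big((\mu_n - \mu) - (\Gamma_n - \Gamma)\bar w^\diamond\big) \xrightarrow{d} N(0, B)$.
Once again, B5 guarantees that the remainder $r(R_t; w)$ of the linear approximation becomes negligible as $n \to \infty$. Further, B6 requires the joint asymptotic normality of the given estimators for $\mu$ and $\Gamma$ after the usual standardization. Since $\mu_n$ and $\Gamma_n$ are the moment estimators of $\mu$ and $\Gamma$, it basically states that $R_t - R_tR_t^\top\bar w^\diamond$ should satisfy the Central Limit Theorem. The latter assumption indicates that we can decompose the estimation risk into two parts: (i) $\sqrt n\,(\mu_n - \mu)$ represents the estimation risk that can be attributed to $\mu$, whereas (ii) $\sqrt n\,(\Gamma_n - \Gamma)\bar w^\diamond$ stands for the estimation risk that is related to $\Gamma$.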
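As a worked example of B6, the following derivation sketches $B$ for serially independent, normally distributed returns. It is our own calculation under that assumption, not a formula quoted from this work; $Y_t$, $\bar Y_n$, $S_n$, and $M$ are notation introduced only here. The centered sample mean and $\sqrt n\,(S_n - \Sigma)\bar w^\diamond$ are asymptotically uncorrelated because the odd central moments of the normal distribution vanish, and subtracting $\Sigma\bar w^\diamond\bar w^{\diamond\top}\Sigma$ from the fourth-moment term (Isserlis' theorem) gives the covariance part.

```latex
% Assumption: R_1, R_2, ... i.i.d. N(mu, Sigma), hence Gamma = Sigma + mu mu^T.
\begin{align*}
Y_t &:= R_t - \mu, \qquad \bar Y_n := \mu_n - \mu, \qquad
      S_n := \tfrac1n \textstyle\sum_{t=1}^n Y_t Y_t^\top, \\
\Gamma_n - \Gamma &= \mu \bar Y_n^\top + \bar Y_n \mu^\top + (S_n - \Sigma), \\
(\mu_n - \mu) - (\Gamma_n - \Gamma)\bar w^\diamond
  &= M \bar Y_n - (S_n - \Sigma)\bar w^\diamond, \qquad
     M := (1 - \mu^\top \bar w^\diamond) I_N - \mu \bar w^{\diamond\top}, \\
E\big[(\bar w^{\diamond\top} Y)^2\, Y Y^\top\big]
  &= (\bar w^{\diamond\top}\Sigma \bar w^\diamond)\,\Sigma
     + 2\,\Sigma \bar w^\diamond \bar w^{\diamond\top}\Sigma, \\
B &= \underbrace{M \Sigma M^\top}_{B_\mu}
   + \underbrace{(\bar w^{\diamond\top}\Sigma \bar w^\diamond)\,\Sigma
   + \Sigma \bar w^\diamond \bar w^{\diamond\top}\Sigma}_{B_\Gamma}.
\end{align*}
```

In this reading, $B_\Gamma$ is the part that remains if $\mu$ is known, and $B_\mu$ is the additional risk caused by estimating $\mu$.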
Note that such a risk decomposition cannot be accomplished for the BCRP. In some cases, it is possible to calculate the asymptotic covariance matrix $B$ in B6 explicitly. For example, if $R_1, R_2, \dots$ are serially independent and normally distributed, $B$ has a closed form that depends only on $\mu$, $\Sigma$, and $\bar w^\diamond$, where $\Sigma = \Gamma - \mu\mu^\top$ denotes the covariance matrix of $R$. More precisely, we can apply the decomposition $B = B_\mu + B_\Gamma$, where $B_\mu$ quantifies the estimation risk that is associated with $\mu$ and $B_\Gamma$ measures the estimation risk related to $\Gamma$. Similar results can be obtained if we assume that $R$ has an elliptical distribution possessing heavy tails and tail dependence. Alternatively, we could apply a (block) bootstrap (see, e.g., Politis 2003) in order to approximate $B$, or even $B_\mu$ and $B_\Gamma$, without making any parametric assumption. Consider any random vector $Z \sim N(0, B)$. Now, we may define
$$\mathcal Z(\varsigma) := \bar\varsigma^\top Z - \tfrac12\,\bar\varsigma^\top\Gamma\bar\varsigma$$
with $\varsigma = (\varsigma_0, \varsigma_1, \dots, \varsigma_N)$ and $\bar\varsigma = (\varsigma_1, \varsigma_2, \dots, \varsigma_N)$. The (unique) maximizer of $\mathcal Z$ over the tangent cone of $S$ at $w^\diamond$ is denoted by $\varsigma^\diamond$. The following theorem clarifies the asymptotic behavior of the MVE. This result follows by the same arguments that were used for Theorem 7, and so the proof can be skipped.

Theorem 12 We have that $\sqrt n\,(w^\diamond_n - w^\diamond) \xrightarrow{d} \varsigma^\diamond$.
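When the riskless asset has been removed and all components of $\bar w^\diamond$ are positive, the tangent cone of the simplex at $\bar w^\diamond$ is the hyperplane $\{\bar\varsigma : \mathbf 1^\top\bar\varsigma = 0\}$, and the limit in Theorem 12 is available in closed form. The following is a short sketch under exactly that assumption; the multiplier $\kappa$ and the matrix $P$ are notation introduced here.

```latex
\begin{align*}
&\max_{\bar\varsigma}\ \bar\varsigma^\top Z - \tfrac12\,\bar\varsigma^\top\Gamma\bar\varsigma
  \quad\text{s.t.}\quad \mathbf 1^\top\bar\varsigma = 0
  \;\Longrightarrow\; Z - \Gamma\bar\varsigma - \kappa\,\mathbf 1 = 0, \\
&\kappa = \frac{\mathbf 1^\top\Gamma^{-1}Z}{\mathbf 1^\top\Gamma^{-1}\mathbf 1},
  \qquad
  \bar\varsigma^\diamond = P Z, \qquad
  P := \Gamma^{-1} - \frac{\Gamma^{-1}\mathbf 1\mathbf 1^\top\Gamma^{-1}}{\mathbf 1^\top\Gamma^{-1}\mathbf 1}, \\
&\sqrt n\,(\bar w^\diamond_n - \bar w^\diamond) \xrightarrow{d} N\big(0,\ P B P\big).
\end{align*}
```

Since $\mathbf 1^\top P = 0$, the limiting normal distribution is degenerate in the direction of $\mathbf 1$, as it must be when the weights always sum to one.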
In the case in which the riskless asset has been removed from the asset universe, we may consider the (unique) maximizer of $\mathcal Z$ subject to $\mathbf 1^\top\bar\varsigma = 0$. The derivation of $B$ can be found in the "Appendix".

6 Some practical remarks

6.1 Computational issues

Cover's (1984) algorithm for the BCRP is simple and works like this:
(i) Choose any initial portfolio $w^{(0)} \in S$ with strictly positive weights and set $k \leftarrow 0$.
(ii) Update the portfolio according to
$$w^{(k+1)}_i = w^{(k)}_i\,\frac1n\sum_{t=1}^n \frac{X_{ti}}{w^{(k)\top}X_t}, \qquad i = 0, 1, \dots, N,$$
and set $k \leftarrow k+1$.
(iii) Repeat the second step until the largest component of the vector $\frac1n\sum_{t=1}^n X_t/(w^{(k)\top}X_t)$ falls below a critical threshold just above 1.
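Steps (i)–(iii) translate directly into code. The following is a minimal sketch, assuming that each $X_t$ is the vector of gross returns (price relatives) including the riskless asset; the default uniform start, the argument names, and the iteration cap `max_iter` are our own choices, while the threshold $\exp(10^{-6})$ matches the value used in this work.

```python
import math


def cover_bcrp(x, w0=None, threshold=math.exp(1e-6), max_iter=100_000):
    """Cover's (1984) multiplicative update for the BCRP.

    x:  list of price-relative vectors X_t (riskless asset included);
    w0: strictly positive starting weights (a zero weight would stay zero).
    """
    n, dim = len(x), len(x[0])
    w = list(w0) if w0 is not None else [1.0 / dim] * dim
    for _ in range(max_iter):
        # g_i = (1/n) sum_t X_ti / (w'X_t); at the BCRP, g_i <= 1 for all i.
        g = [0.0] * dim
        for xt in x:
            wx = sum(wi * xi for wi, xi in zip(w, xt))
            for i in range(dim):
                g[i] += xt[i] / wx / n
        if max(g) < threshold:
            break
        w = [wi * gi for wi, gi in zip(w, g)]
    return w


# Hypothetical market: cash and a stock that alternately doubles and halves.
market = [[1.0, 2.0], [1.0, 0.5]] * 50
print(cover_bcrp(market, w0=[0.9, 0.1]))
```

Because $\sum_i w_i g_i = 1$, the update keeps the weights in $S$; in this toy market the BCRP is $(0.5, 0.5)$, which the iteration approaches at a linear rate.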
The computations made during this work are based on MATLAB. The critical threshold for the BCRP is $\exp(10^{-6})$. Further, the MOSEK optimization toolbox for MATLAB is used in order to compute the MVE, which proves to be very fast and reliable. It turns out that the BCRP, $w^*_n$, and the MVE, $w^\diamond_n$, are almost identical. However, computing $w^\diamond_n$ by quadratic optimization is much faster. In order to demonstrate these statements, we can simulate $n$ independent and identically distributed vectors of daily asset returns $R_1, R_2, \dots, R_n \sim N(\mu, \Sigma)$ with given parameters $\mu$ and $\Sigma$. Let us assume that the number of risky assets is $N = 100$ and the number of daily observations is $n = 250$. In this case, both $w^*_0$ and $w^\diamond_0$ are bounded by $S$, i.e., $w^*_0 = w^\diamond_0 = 0$. Thus, we remove the riskless asset from the asset universe.
The numerical simulations are repeated 100 times. Each time, Cover's algorithm for $\bar w^*_n$ and the quadratic optimizer for $\bar w^\diamond_n$ are applied. On average, Cover's algorithm needs 5.5914 s, whereas MOSEK takes only 0.0103 s. The supremum norm of $\bar w^*_n - \bar w^\diamond_n$ is 0.0173. Although Cover's algorithm is much slower than the quadratic optimizer, the outcome of the latter turns out to be slightly better: The quadratic optimizer leads to an annualized average log-return of 0.4892, whereas Cover's algorithm yields only 0.4890 per year. That is, the quadratic optimizer comes even closer to the (true) BCRP than Cover's algorithm, which stops as soon as its threshold is reached. In fact, the average log-returns produced by the quadratic optimizer are always better than those of Cover's algorithm. Hence, the computed $\bar w^\diamond_n$ dominates the computed $\bar w^*_n$ in a numerical sense. Moreover, Cover's algorithm is very slow in high dimensions, whereas the quadratic optimizer works well even for $N = 1000$ and $n = 2500$, in which case the computational time for $\bar w^\diamond_n$ is still below 1 s.
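The claim that the BCRP and the MVE are almost identical on daily data can be checked in the simplest case of one risky asset plus cash. The sketch below assumes i.i.d. normal daily returns with hypothetical parameters $\mu = 0.0003$ and $\sigma = 0.01$; it computes the BCRP exactly by bisection and the MVE in closed form. It does not reproduce the $N = 100$ experiment described above.

```python
import random


def bcrp_one_asset(r, tol=1e-10):
    """BCRP with one risky asset plus cash: argmax over w in [0, 1] of
    mean(log(1 + w*R_t)), found by bisection on the decreasing derivative
    mean(R_t / (1 + w*R_t)).  Assumes every R_t > -1."""
    def slope(w):
        return sum(x / (1.0 + w * x) for x in r) / len(r)

    if slope(0.0) <= 0.0:
        return 0.0
    if slope(1.0) >= 0.0:
        return 1.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if slope(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mve_from_returns(r):
    """Quadratic approximation: mu_n / gamma_n clipped to [0, 1]."""
    mu_n = sum(r) / len(r)
    gamma_n = sum(x * x for x in r) / len(r)
    return min(max(mu_n / gamma_n, 0.0), 1.0)


rng = random.Random(7)
diffs = []
for _ in range(100):
    r = [rng.gauss(0.0003, 0.01) for _ in range(250)]  # hypothetical daily returns
    diffs.append(abs(bcrp_one_asset(r) - mve_from_returns(r)))
print(max(diffs))
```

The approximation error comes from the cubic and higher terms of $\log(1 + x)$, which are tiny for daily returns; for large returns such as $+100\%$ and $-50\%$ the two estimators differ noticeably.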
There is another computational issue. For applying the asymptotic results derived in Sect. 4.2.2, we have to simulate the random vector $Y \sim N(0, A)$, where the covariance matrix $A$ appears in A10. The problem is that $A$ is singular. More precisely, we have that
$$w^{*\top}Aw^* = w^{*\top}E\Big(\frac{XX^\top}{(w^{*\top}X)^2}\Big)w^* - w^{*\top}\mathbf 1\mathbf 1^\top w^* = E\Big(\frac{(w^{*\top}X)^2}{(w^{*\top}X)^2}\Big) - 1^2 = 0,$$
which means that $A$ is not positive definite. Thus, we have to apply a matrix decomposition that can handle singular matrices in order to simulate $Y$. This issue does not arise when applying the asymptotic results derived in Sect. 5.2.2, in which case we must simulate the random vector $Z \sim N(0, B)$. As already mentioned in Sect. 5.2.2, we can even provide a closed-form expression for $B$ in many standard situations. The principal approach is demonstrated in the "Appendix".
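The singularity of $A$, and one way to simulate $Y \sim N(0, A)$ despite it, can be seen in a toy market. The sketch assumes a hypothetical market with a riskless asset (price relative 1) and a stock whose price relative is 2 or 1/2 with equal probability; maximizing $\frac12\log(1 + w) + \frac12\log(1 - w/2)$ shows that the LOP is $w^* = (0.5, 0.5)$. The factorization is a simple non-pivoted Cholesky variant that skips zero pivots (our choice; an eigendecomposition would work as well).

```python
import math
import random


def psd_cholesky(a, tol=1e-12):
    """Factor L with A = L L' for a symmetric positive SEMIdefinite A;
    a (numerically) zero pivot yields a zero column instead of an error."""
    m = len(a)
    low = [[0.0] * m for _ in range(m)]
    for j in range(m):
        d = a[j][j] - sum(low[j][k] ** 2 for k in range(j))
        if d <= tol:
            continue  # zero pivot: for a PSD matrix the rest of column j is 0
        low[j][j] = math.sqrt(d)
        for i in range(j + 1, m):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = s / low[j][j]
    return low


x_states = [[1.0, 2.0], [1.0, 0.5]]  # equally likely price relatives
w_star = [0.5, 0.5]                  # LOP of this toy market
a = [[0.0, 0.0], [0.0, 0.0]]
for xt in x_states:
    wx = sum(wi * xi for wi, xi in zip(w_star, xt))
    for i in range(2):
        for j in range(2):
            a[i][j] += 0.5 * xt[i] * xt[j] / wx**2
a = [[a[i][j] - 1.0 for j in range(2)] for i in range(2)]  # subtract 1 1'
quad = sum(w_star[i] * a[i][j] * w_star[j] for i in range(2) for j in range(2))

low = psd_cholesky(a)
rng = random.Random(0)
xi = [rng.gauss(0.0, 1.0) for _ in range(2)]
y = [sum(low[i][k] * xi[k] for k in range(2)) for i in range(2)]  # Y ~ N(0, A)
print(quad, y)
```

Here $A$ equals $\frac19\begin{pmatrix}1 & -1\\ -1 & 1\end{pmatrix}$, so every simulated $Y$ satisfies $w^{*\top}Y = 0$, exactly as the identity above requires.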
To sum up, the quadratic approximation proposed at the beginning of Sect. 5 works very well and, in contrast to the BCRP, the MVE does not suffer from computational issues. For this reason, we focus on $\bar w^\diamond_n$ in the following discussion.

Statistical inference
Let us assume that the return vectors $R_t$ are serially independent and normally distributed. To keep things as simple as possible, we may choose the parameterization in Eq. 3. Further, let the number of risky assets be $N = 2$ and the number of observations be $n = 250$. Once again, we generate 100 samples and compute a realization of $\bar w^\diamond_n$ from each one. In the upper left of Fig. 1, we can see that most of the estimates are far away from $\bar w^\diamond = (0.5, 0.5)$. The vast majority of the estimates are boundary solutions. More precisely, 50 estimates equal (0, 1) and 41 equal (1, 0). The result does not improve essentially if we increase the number of observations to $n = 2500$, and it is still sobering even for 1 million observations. By contrast, if we assume that $\mu$ is known, the estimates turn out to be much better (see the lower part of Fig. 1). In particular, no estimate lies at the boundary of the simplex anymore, and in the case of $n = 10^6$ observations the estimates are almost identical to $\bar w^\diamond$.
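The boundary-solution effect is easy to reproduce. The parameters below are hypothetical, not those of Eq. 3: two risky assets with equal expected returns $\mu = 0.0004$, volatilities $\sigma = 0.01$, and correlation $\rho = 0.5$, chosen so that $\bar w^\diamond = (0.5, 0.5)$ as in the text. With the riskless asset removed, the MVE is the weight $t$ of asset 1 that maximizes a concave quadratic on $[0, 1]$, which has a closed form. The "μ known" variant plugs the true $\mu$ into the objective while still estimating $\Gamma$; that reading of "μ known" is our assumption.

```python
import math
import random


def mve_two_assets(m1, m2, g11, g12, g22):
    """Weight t of asset 1 in the MVE on {(t, 1 - t) : 0 <= t <= 1}.

    Maximizes t*m1 + (1-t)*m2 - 0.5*[t^2 g11 + 2t(1-t) g12 + (1-t)^2 g22];
    the objective is concave in t, so the stationary point is clipped.
    """
    t = (m1 - m2 + g22 - g12) / (g11 - 2.0 * g12 + g22)
    return min(max(t, 0.0), 1.0)


def boundary_shares(n=250, samples=500, mu=0.0004, sigma=0.01, rho=0.5, seed=3):
    """Share of corner solutions when mu is estimated vs. plugged in."""
    rng = random.Random(seed)
    corner_est = corner_known = 0
    for _ in range(samples):
        s1 = s2 = s11 = s12 = s22 = 0.0
        for _ in range(n):
            z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            r1 = mu + sigma * z1
            r2 = mu + sigma * (rho * z1 + math.sqrt(1.0 - rho**2) * z2)
            s1 += r1
            s2 += r2
            s11 += r1 * r1
            s12 += r1 * r2
            s22 += r2 * r2
        g11, g12, g22 = s11 / n, s12 / n, s22 / n
        if mve_two_assets(s1 / n, s2 / n, g11, g12, g22) in (0.0, 1.0):
            corner_est += 1
        if mve_two_assets(mu, mu, g11, g12, g22) in (0.0, 1.0):
            corner_known += 1
    return corner_est / samples, corner_known / samples


print(boundary_shares())
```

In this setting the noise in $\mu_{1n} - \mu_{2n}$ dwarfs the curvature of the objective, so most estimates end up in a corner, whereas the estimation error in $\Gamma_n$ alone barely moves the solution.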
Are we able to replicate the finite-sample results by a large-sample approximation? For this purpose, we could use Theorem 12 and the expressions for $B_\mu$ and $B_\Gamma$ presented in Sect. 5.2.2. The corresponding realizations of the synthetic estimator $\bar w^\infty_n := \bar w^\diamond + \bar\varsigma^\diamond/\sqrt n$ are depicted in Fig. 2. The upper left of this figure indicates that 90 realizations lie outside the simplex. This is because the large-sample approximation is based on the maximizer $\bar\varsigma^\diamond$, which belongs to the tangent cone of $S$ at $\bar w^\diamond$. Hence, the support of $\bar w^\infty_n$ does not correspond to $S$. Similarly, 82 realizations of $\bar w^\infty_n$ are missing in the simplex on the upper center. By contrast, the simplex on the upper right contains all 100 realizations of $\bar w^\infty_n$. The picture changes substantially in the lower part of Fig. 2, where it is assumed that $\mu$ is known. In this case, we cannot find any realization of $\bar w^\infty_n$ outside $S$. Moreover, the large-sample approximation satisfactorily reproduces the finite-sample results that are depicted in the lower part of Fig. 1.
The problem is that the expected asset returns are unknown in real life. However, we can substantially improve the large-sample approximation by applying a finite-sample correction that guarantees that the realizations always belong to the simplex. We know from Theorem 12 that, if the sample size is large, $\sqrt n\,(\bar w^\diamond_n - \bar w^\diamond)$ behaves essentially like the maximizer, $\bar\varsigma^\diamond$, of $\bar\varsigma \mapsto \bar\varsigma^\top Z - \frac12\bar\varsigma^\top\Gamma\bar\varsigma$ over the tangent cone of $S$ at $\bar w^\diamond$. Hence, since the sample size is not large enough, we may substitute $\bar\varsigma^\diamond$ with
$$\bar\varsigma^\diamond_n := \arg\max_{\bar\varsigma\,\in\,\sqrt n\,(S - \bar w^\diamond)}\ \bar\varsigma^\top Z - \tfrac12\,\bar\varsigma^\top\Gamma\bar\varsigma.$$
The corrected version of $\bar w^\infty_n$ reads $\tilde w_n := \bar w^\diamond + \bar\varsigma^\diamond_n/\sqrt n$, which always belongs to the simplex. Numerically, the constraint $\bar\varsigma \in \sqrt n\,(S - \bar w^\diamond)$ can simply be implemented by setting $\bar\varsigma \ge -\sqrt n\,\bar w^\diamond$ and $\mathbf 1^\top\bar\varsigma = 0$.

In order to verify that the finite-sample correction works well, we may compare the empirical distribution functions of 10,000 realizations of $\bar w^\diamond_{1n}$ and $\tilde w_{1n}$, where $\mu$ is assumed to be unknown. We still have only $N = 2$ risky assets, and the parameterization is the same as before (see Eq. 3). The results are given in Fig. 3. Obviously, the finite-sample correction serves its purpose. Indeed, the corrected large-sample approximation is very accurate for all sample sizes.

Figure 3 reveals that most realizations of $\bar w^\diamond_n$ are either (0, 1) or (1, 0) unless the sample size equals $n = 10^6$. The LOP corresponds to $\bar w^\diamond = (0.5, 0.5)$ and thus lies precisely in between (0, 1) and (1, 0). It seems that estimating the LOP is a mission impossible in real-life situations, at least without any prior information about $\mu$. Table 1 contains the probability that the realization of the MVE is a single-asset portfolio for different numbers of assets ($N = 5, 50, 100, 500, 1000$) and observations ($n = 250, 2500, 5000, 10{,}000, 10^6$). The results are based on 1000 realizations of $\bar w^\diamond_n$ for each combination of $N$ and $n$. Note that the LOP always corresponds to the equally weighted portfolio, i.e., $\bar w^\diamond = \mathbf 1/N$.
The table shows that, in all practical applications, the MVE proposes a single-asset portfolio with high probability, although the LOP is well diversified. It is worth emphasizing that the results would not change essentially if we substituted the MVE with the BCRP, since these estimators for the LOP are almost identical. Now, in principle, we are able to construct hypothesis tests and compute confidence regions. For example, we could try to apply a hypothesis test of the form $H_0\colon \bar w^\diamond = \bar w_0$ vs. $H_1\colon \bar w^\diamond \neq \bar w_0$ for any $\bar w_0 \in S$, even in the case of $N > 2$ (see footnote 21). However, in the light of the previous results, we may doubt that any hypothesis test will ever lead to a rejection or that a confidence region will ever be sufficiently small in real-life situations. This conclusion might appear negative to the reader, but the author fears that this is the price we sometimes have to pay in science.
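The finite-sample correction can be sketched for $N = 2$ risky assets with the riskless asset removed and $\bar w^\diamond = (0.5, 0.5)$ in the interior of the simplex. Then $\bar\varsigma = (s, -s)$, the uncorrected maximizer is $s = (Z_1 - Z_2)/c^\top\Gamma c$ with $c = (1, -1)$, and the corrected maximizer over $\sqrt n\,(S - \bar w^\diamond)$ is the same $s$ clipped to $[-\sqrt n\,\bar w^\diamond_1,\ \sqrt n\,\bar w^\diamond_2]$. The market parameters are hypothetical ($\mu = (0.0004, 0.0004)$, $\sigma = 0.01$, $\rho = 0.5$, $n = 250$), and $\mathrm{Var}(Z_1 - Z_2) = c^\top Bc$ is computed from $B = M\Sigma M^\top + (\bar w^{\diamond\top}\Sigma\bar w^\diamond)\Sigma + \Sigma\bar w^\diamond\bar w^{\diamond\top}\Sigma$ with $M = (1 - \mu^\top\bar w^\diamond)I - \mu\bar w^{\diamond\top}$, a closed form we derived for i.i.d. normal returns and state here as an assumption.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(m, v):
    return [dot(row, v) for row in m]


# Hypothetical market (riskless asset removed); the MVOP is w = (0.5, 0.5).
mu = [0.0004, 0.0004]
sigma, rho, n = 0.01, 0.5, 250
cov = [[sigma**2, rho * sigma**2], [rho * sigma**2, sigma**2]]
gamma = [[cov[i][j] + mu[i] * mu[j] for j in range(2)] for i in range(2)]
w = [0.5, 0.5]
c = [1.0, -1.0]

# Var(Z1 - Z2) = c'Bc under the assumed closed form for normal returns.
mtc = [(1.0 - dot(mu, w)) * ci - wi * dot(mu, c) for ci, wi in zip(c, w)]  # M'c
var_cz = (dot(mtc, matvec(cov, mtc))
          + dot(w, matvec(cov, w)) * dot(c, matvec(cov, c))
          + dot(c, matvec(cov, w)) ** 2)
curv = dot(c, matvec(gamma, c))  # c' Gamma c

rng = random.Random(11)
root_n = math.sqrt(n)
outside, corrected = 0, []
for _ in range(10_000):
    cz = rng.gauss(0.0, math.sqrt(var_cz))   # realization of Z1 - Z2
    s = cz / curv                            # maximizer on the tangent line
    if not 0.0 <= w[0] + s / root_n <= 1.0:  # synthetic estimator leaves S
        outside += 1
    s_n = min(max(s, -root_n * w[0]), root_n * w[1])  # s in sqrt(n)(S - w)
    corrected.append(w[0] + s_n / root_n)             # corrected weight 1

print(outside / 10_000, min(corrected), max(corrected))
```

Most uncorrected draws fall outside $[0, 1]$, whereas every corrected draw stays in the simplex and piles up at the corners, mirroring the behavior of the MVE itself.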

Conclusion
A quadratic approximation of log-returns works very well on a daily basis. Thus, in order to find the BCRP, we may focus on the MVE, which can easily be computed. The corresponding algorithm is very fast even if the number of dimensions is high, and its results are even slightly better than those of Cover's algorithm for the BCRP. However, in most practical applications, we typically overestimate the expected out-of-sample performance of the MVE and even the performance of the MVOP. The same holds true for the expected out-of-sample log-return on the BCRP and the expected log-return on the LOP.
Both the BCRP and the MVE exist and are unique under mild regularity conditions. Moreover, they are strongly consistent. Analogously, both their out-of-sample performance measures and their in-sample performances converge to the performance of the LOP or the MVOP, respectively, as the number of observations grows to infinity. The given estimators for the LOP are even $\sqrt n$-consistent. In principle, the asymptotic results derived in this work can be used for constructing hypothesis tests and for computing confidence regions, but for this purpose one should apply a finite-sample correction, which substantially improves the large-sample approximation.

21 Note that $\bar w_0$ is not the weight of the riskless asset but some portfolio of $N$ risky assets.
However, it turns out that the impact of estimation risk concerning μ is tremendous in most real-life situations. Estimating the LOP without having any prediction power seems to be a futile undertaking. The estimators often lead to a single-asset portfolio even if the LOP corresponds to the equally weighted portfolio and thus is well-diversified. The given results confirm a general rule, which has become folklore during the last decades, namely that portfolio optimization typically fails on estimating expected asset returns.
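which quantifies the estimation risk if the parameter $\mu$ were known to the investor. However, in real life the expected asset returns are unknown, and so $B$ equals the asymptotic covariance matrix of
$$\sqrt n\,(\mu_n - \mu) - \sqrt n\,(\Sigma_n - \Sigma)\bar w^\diamond - \sqrt n\,(\mu_n - \mu)\,\mu^\top\bar w^\diamond - \mu\,\sqrt n\,(\mu_n - \mu)^\top\bar w^\diamond,$$
which can be rewritten as
$$\sqrt n\,\big((1 - \mu^\top\bar w^\diamond)\,I_N - \mu\,\bar w^{\diamond\top}\big)(\mu_n - \mu) - \sqrt n\,(\Sigma_n - \Sigma)\bar w^\diamond.$$
By using the above arguments, we conclude that $B = M\Sigma M^\top + B_\Gamma$ with $M := (1 - \mu^\top\bar w^\diamond)\,I_N - \mu\,\bar w^{\diamond\top}$. Now, the reader can verify that the impact of estimating the expected asset returns is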