Abstract
The best constant rebalanced portfolio represents the standard estimator for the log-optimal portfolio. It is shown that a quadratic approximation of log-returns works very well on a daily basis, and a mean-variance estimator is proposed as an alternative to the best constant rebalanced portfolio. It can easily be computed and the numerical algorithm is very fast even if the number of dimensions is high. Some small-sample and the basic large-sample properties of the estimators are derived. The asymptotic results can be used for constructing hypothesis tests and for computing confidence regions. For this purpose, one should apply a finite-sample correction, which substantially improves the large-sample approximation. However, it is shown that the impact of estimation errors concerning the expected asset returns is serious. The given results confirm a general rule, which has become folklore during the last decades, namely that portfolio optimization typically fails on estimating expected asset returns.
1 Motivation
During the last decades, the log-optimal portfolio (LOP) has become increasingly important in portfolio theory. There is a significant number of publications related to the LOP, or to the growth-optimal portfolio (GOP), which is often treated synonymously. The reader can find a huge number of articles in MacLean et al. (2011). For an overview on the subject matter see Christensen (2005). In spite of a controversial debate (Merton and Samuelson 1974), it cannot be denied that the LOP has a number of attractive properties. For example, it is asymptotically optimal among all portfolios that share the same constraints on the portfolio weights (Cover and Thomas 1991, Chapter 15). Moreover, the LOP can be considered a discrete-time approximation of the GOP, which serves as a numéraire portfolio and thus plays a major role in financial mathematics (Karatzas and Kardaras 2007; Platen and Heath 2006). The GOP provides a link between financial mathematics, neoclassical finance, and financial econometrics (Frahm 2016). Hence, the LOP is of particular interest for a variety of reasons.
In this work, the statistical properties of LOP estimators are investigated. To the best of my knowledge, this has not been done so far in the literature. We will consider the standard estimator for the LOP, i.e., the best constant rebalanced portfolio (BCRP), and the mean-variance estimator (MVE), which is based on a quadratic approximation of log-returns. The question of whether or not the BCRP or the MVE outperforms any other investment strategy is not discussed in this work. This seems to be well investigated in the literature. In particular, the BCRP and the MVE are not compared with one another in order to clarify whether maximizing the logarithmic utility or the mean-variance objective function is preferable (Hakansson 1971). Instead, the MVE is used only to approximate the BCRP. Correspondingly, the mean-variance optimal portfolio (MVOP) is not an end in itself. Here, it just represents an approximation of the LOP.
The main conclusions of this work are as follows:

(i)
The MVE provides a very good approximation of the BCRP if rebalancing takes place on a daily basis. The numerical implementation of the MVE is quite easy and the corresponding algorithm is very fast even if the number of dimensions is high.

(ii)
One typically overestimates the expected out-of-sample log-return on the BCRP and even the expected log-return on the LOP. Similar statements hold true for the expected out-of-sample performance of the MVE and the performance of the MVOP.

(iii)
The BCRP exists and is unique under mild regularity conditions. Moreover, it is strongly consistent, which holds true also for the expected out-of-sample log-return on the BCRP and its in-sample average log-return. Similar results are obtained for the MVE.

(iv)
Although both the BCRP and the MVE are affected by short-selling constraints, they are \(\sqrt{n}\)-consistent. The asymptotic results can be used in order to construct hypothesis tests and to compute confidence regions.

(v)
Due to the constraints on the portfolio weights, the asymptotic results are inaccurate in most practical applications. Nonetheless, a finite-sample correction exists. It substantially improves the large-sample approximation of the MVE (and thus of the BCRP).

(vi)
However, the impact of estimation risk that comes from estimating expected asset returns is tremendous in most real-life situations. This problem is so serious that estimating the LOP becomes a futile endeavour if we have no prediction power.
The rest of this work is organized as follows: In Sect. 2 the basic assumptions are made and the mathematical notation is explained. Section 3 contains some elementary results and provides a simple characterization of the LOP. In Sect. 4 the small-sample and large-sample properties of the BCRP are derived, which includes its existence, uniqueness, and consistency. That section also contains the asymptotic distribution of the BCRP. The reader can find the corresponding results for the MVE in Sect. 5. In Sect. 6 some computational issues that are related to the BCRP are discussed and the finite-sample correction for the MVE is demonstrated. Section 7 concludes this work. Finally, the “Appendix” contains an important but quite tedious derivation.
2 Basic assumptions and notation
Throughout this work, \({\mathbb {N}}\) denotes the set of positive integers, i.e., \({\mathbb {N}}:=\big \{1,2,\ldots \big \}\), and the symbol “\(\log \)” stands for the natural logarithm. The symbol \({\varvec{0}}\) denotes a vector of zeros and \({\varvec{1}}\) is a vector of ones. The dimensions of \({\varvec{0}}\) and \({\varvec{1}}\) should always be clear from the context. Any tuple \(x=(x_1,x_2,\ldots ,x_d)\in \mathbb {R}^d\) is understood to be a column vector and \(x'=\big [x_1~x_2~\ldots ~x_d\big ]\) is the transpose of x. It is implicitly assumed that we have an underlying probability space \(\big (\Omega ,{\mathcal {A}},{\mathbb {P}}\big )\), where \(\Omega \) is some state space, \({\mathcal {A}}\) is a \(\sigma \)-algebra on \(\Omega \), and \({\mathbb {P}}\) is a probability measure on \({\mathcal {A}}\), which is often referred to as the physical or real-world probability measure in financial mathematics (Frahm 2016). A random quantity is a (measurable) real-valued function on \(\Omega \). According to probability theory, two random quantities are considered identical if and only if they coincide with probability 1. Analogously, any statement about a random quantity is meant to be true almost surely (a.s.). Hence, we can drop the additional remarks “\({\mathbb {P}}(\cdot )=1\)” and “a.s.” for convenience. For example, if X is a random variable, “\(X=x\)” with \(x\in \mathbb {R}\) means that \({\mathbb {P}}(X=x)=1\) and if Y is a random variable, too, then “\(X>Y\)” means that X is greater than Y with probability 1, etc. Further, if \(\big \{X_n\big \}_{n\in {\mathbb {N}}}\) is a random sequence, “\(X_n\rightarrow x\)” means that \(X_n\) converges a.s. to x as n tends to infinity. Thus, we may drop also the notation “\(n\rightarrow \infty \)”.
Consider an asset universe with one riskless asset and \(N\in {\mathbb {N}}\) risky assets. It is assumed that the assets are infinitely divisible and any market frictions are ignored. Let \(S_t=\big (S_{0t},S_{1t},\ldots ,S_{Nt}\big )\) be the vector of asset prices at time \(t=0,1,\ldots \,\), where \(S_{0t}\) denotes the price of the riskless asset. The unit of time is one trading day. It is assumed that \(S_{0t}=1\) for \(t=0,1,\ldots \) and that \(S_{i0}=1\) for \(i=1,2,\ldots ,N\). In the following, each statement that contains the index i or t is meant to be true for all i and t that are appropriate in the given context. The price process \(\big \{S_t\big \}_{t=0,1,\ldots }\) shall be positive. The time-index set is always \(\big \{0,1,\ldots \big \}\) and thus the subscript in “\(\{\cdot \}_{t=0,1,\ldots }\)” will be omitted for notational convenience.
Let \(X_t:=S_t/S_{t-1}\) be the vector of price relatives after the trading day t, where the division of \(S_t\) by \(S_{t-1}\) is understood to be componentwise. Any capital appreciation for Asset i during Day t, e.g., interest or dividend income, is considered part of the asset price \(S_{it}\). The portfolio weights of the risky assets are denoted by \(w_1,w_2,\ldots ,w_N\), whereas \(w_0\) is the weight of the riskless asset. Hence, \(w=(w_0,w_1,\ldots ,w_N)\in \mathbb {R}^{N+1}\) is a portfolio that consists of the riskless asset and N risky assets. Each single asset is considered a portfolio, i.e., a canonical vector in \(\mathbb {R}^{N+1}\). In order to distinguish the weights of the risky assets, \(w_1,w_2,\ldots ,w_N\), from the weight of the riskless asset, \(w_0\), the notation \({\tilde{w}}=(w_1,w_2,\ldots ,w_N)\in \mathbb {R}^N\) is used. This means that \(w=(w_0,{\tilde{w}})\). Analogously, \({\tilde{X}}_t\) indicates the risky part of \(X_t\), i.e., we have that \(X_t=(1,{\tilde{X}}_t)\). Finally, the return on Asset i after Day t is given by \(R_{it}:=X_{it}-1\) and \(R_t={\tilde{X}}_t-{\varvec{1}}=(R_{1t},R_{2t},\ldots ,R_{Nt})\) denotes the vector of risky asset returns. Since we assume that \(S_{0t}=1\) for \(t=0,1,\ldots \), the risk-free interest rate is supposed to be zero. This assumption is made without loss of generality, which will be explained below.
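The notation of this paragraph can be illustrated numerically. The following is only a minimal sketch: the price matrix `S` is made-up illustration data and the variable names are assumptions, not part of the original text. It computes the price relatives componentwise and the risky returns as the risky part of the price relatives minus one.

```python
import numpy as np

# Hypothetical price paths: rows are days t = 0, 1, 2; columns are assets 0..N.
# Column 0 is the riskless asset with S_0t = 1, and every asset starts at 1.
S = np.array([
    [1.0, 1.00, 1.00],
    [1.0, 1.02, 0.99],
    [1.0, 1.01, 1.03],
])

X = S[1:] / S[:-1]     # price relatives: day-t prices divided by day-(t-1) prices
X_risky = X[:, 1:]     # risky part of the price relatives
R = X_risky - 1.0      # risky asset returns

print(X)
print(R)
```

The first column of `X` is identically 1, which reflects the convention that the risk-free interest rate is zero.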
Although the following terms will be defined later on, their notation is used throughout this work and so it shall be clarified beforehand: The symbol \(w^*=(w^*_0,w^*_1,\ldots ,w^*_N)\) denotes the LOP, whereas \(w^\star =(w^\star _0,w^\star _1,\ldots ,w^\star _N)\) is the MVOP. Note that the former superscript, “\(*\),” has 6 spikes, whereas the latter, “\(\star \),” consists of 5 spikes. This shall symbolize a key observation of this work, namely that the LOP and the MVOP are almost indistinguishable in most practical applications, which hopefully does not hold for the symbols themselves. The symbols \({\tilde{w}}^*=(w^*_1,w^*_2,\ldots ,w^*_N)\) and \({\tilde{w}}^\star =(w^\star _1,w^\star _2,\ldots ,w^\star _N)\) denote the “risky parts” of \(w^*\) and \(w^\star \), respectively. Further, \(w_i\) is the portfolio weight of Asset i. By contrast, \(w_n\) is an estimator for the portfolio w, where \(n\in {\mathbb {N}}\) is the number of observations.^{Footnote 1} Consequently, \(w_{in}\) is the estimator for the portfolio weight of Asset i.
At the end of each trading day, the investor rebalances his portfolio according to a constant vector of portfolio weights w satisfying the budget constraint \({\varvec{1}}'w=1\). The portfolio value at Day \(n\in {\mathbb {N}}\) amounts to \(V_{wn}:=\prod _{t=1}^n w'X_t\). The investment capital might vanish during some trading day if we do not pose any additional constraints on the portfolio weights. In fact, if we allow the investor to enter short positions, the probability of going bankrupt, i.e., \(V_{wn}\le 0\), is positive unless we make some additional assumption about \(\big \{X_t\big \}\), but this is omitted in this work. Hence, the portfolio w must be an element of the (unit) simplex
$$\begin{aligned} {\mathcal {S}}:=\big \{w\in \mathbb {R}^{N+1}\!:{\varvec{1}}'w=1,\,w\ge {\varvec{0}}\big \}. \end{aligned}$$
The assumption that \(w\in {\mathcal {S}}\) is crucial. It guarantees that \(w'X_t>0\) so that \(V_{wn}>0\) for all \(n\in {\mathbb {N}}\). Hence, the log-value process \(\log V_{wn}=\sum _{t=1}^n\log w'X_t\) exists for all \(w\in {\mathcal {S}}\) and \(n\in {\mathbb {N}}\), where \(\log w'X_t\) is referred to as the log-return on the portfolio after Day t. The short-selling constraints are indispensable because otherwise \(\log w'X_t\) might not be defined.
As already mentioned above, we can assume without loss of generality that the risk-free interest rate is zero: Let \(r\equiv R_0>-1\) be the risk-free interest rate and \(Y_t\) the vector of relative prices with \(Y_{0t}=1+r\). Then we could use the discounted relative-price vector \(X_t:=Y_t/(1+r)\). The log-return on any portfolio \(w\in {\mathcal {S}}\) amounts to \(\log (w'Y_t)=\log (1+r)+\log (w'X_t)\). We can ignore the first term, \(\log (1+r)\), provided we are interested only in maximizing the expected log-return on the portfolio w, which is our main focus here. Throughout this work, we suppose that \(X_t\) contains the discounted relative prices but omit the word “discounted” for convenience.
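As a small illustration of the value and log-value processes, the sketch below evaluates both for a constant rebalanced portfolio. The sample `X`, the weights `w`, and the helper name `log_value` are hypothetical and serve only to make the definitions concrete.

```python
import numpy as np

def log_value(w, X):
    """Return the log-value sum_t log(w'X_t) of a constant rebalanced portfolio.

    X has shape (n, N+1); row t holds the price relatives of day t.
    """
    w = np.asarray(w, dtype=float)
    # The simplex constraints keep every w'X_t positive, so the log exists.
    if np.any(w < 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("w must lie in the unit simplex")
    return float(np.sum(np.log(X @ w)))

# Hypothetical two-day sample (riskless asset in column 0).
X = np.array([[1.0, 1.02, 0.99],
              [1.0, 0.99, 1.04]])
w = np.array([0.2, 0.5, 0.3])

V = float(np.prod(X @ w))   # portfolio value after n = 2 days
print(V, log_value(w, X))
```

Working with the sum of log-returns rather than the product of gross returns is also the numerically safer choice when n is large.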
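The decomposition of the log-return into the risk-free part and the discounted part can be checked numerically. In this sketch the rate `r`, the sample `Y`, and the weights `w` are made-up values chosen only for illustration.

```python
import numpy as np

r = 0.0001                                  # hypothetical daily risk-free rate
Y = np.array([[1 + r, 1.02, 0.99],          # undiscounted price relatives
              [1 + r, 0.99, 1.04]])
X = Y / (1 + r)                             # discounted price relatives
w = np.array([0.2, 0.5, 0.3])               # any portfolio in the simplex

lhs = np.log(Y @ w)
rhs = np.log(1 + r) + np.log(X @ w)
print(np.allclose(lhs, rhs))
```

Because the shift \(\log (1+r)\) does not depend on w, the maximizer of the expected log-return is the same for `Y` and `X`.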
Now, the following basic assumptions are made:
 A1.:

The relative-price process \(\big \{X_t\big \}\) is strictly stationary,
 A2.:

the expected value of \(\log w'X_t\) is finite for all \(w\in {\mathcal {S}}\), and
 A3.:

\(w'_1X_t\) and \(w'_2X_t\) do not coincide for any \(w_1,w_2\in \mathbb {R}^{N+1}\) with \(w_1\ne w_2\).
A1 is a fundamental assumption in econometrics and implies that the elements of \(\big \{X_t\big \}\) are identically distributed, but it is not assumed that they are serially independent. Further, A2 guarantees that we can work with the quantity \({\mathbf {E}}(\log w'X_t)\) and A3 requires the relative prices to span \((0,\infty )^{N+1}\). It follows that no risky asset can be replicated by a convex combination of other assets. More precisely, since \({\mathbb {P}}\big (w'_1X_t=w'_2X_t\big )\ne 1\) for all \(w_1,w_2\in \mathbb {R}^{N+1}\) with \(w_1\ne w_2\), it holds that \({\mathbb {P}}\big (w'X_t=0\big )\ne 1\) for all \(w\in \mathbb {R}^{N+1}\) with \(w\ne {\varvec{0}}\) and vice versa. Hence, it cannot happen that \({\tilde{w}}'R_t=c\in \mathbb {R}\) for any \({\tilde{w}}\in \mathbb {R}^N\) with \({\tilde{w}}\ne {\varvec{0}}\). Otherwise, we could define \(w_0:=-(c+{\varvec{1}}'{\tilde{w}})\) so that \(w'X_t=-(c+{\varvec{1}}'{\tilde{w}})+({\varvec{1}}'{\tilde{w}}+c)=0\) with \(w\ne {\varvec{0}}\). Conversely, if there is no \({\tilde{w}}\in \mathbb {R}^N\) with \({\tilde{w}}\ne {\varvec{0}}\) such that \({\tilde{w}}'R_t=c\in \mathbb {R}\), then we cannot have that \(w'X_t=0\) with \(w\ne {\varvec{0}}\) because otherwise \({\tilde{w}}'R_t=-(w_0+{\varvec{1}}'{\tilde{w}})\in \mathbb {R}\) with \({\tilde{w}}\ne {\varvec{0}}\).
To sum up, it holds that \({\mathbb {P}}\big (w'X_t=0\big )\ne 1\Leftrightarrow {\mathbb {P}}\big ({\tilde{w}}'R_t=c\big )\ne 1\) with \({\tilde{w}}\ne {\varvec{0}}\). How can we interpret this basic condition from an economic point of view? Suppose that \({\tilde{w}}'R_t=c\ne 0\). Then we have an arbitrage opportunity, which is not possible if the market is in equilibrium and the market participants are rational (Frahm 2018). By contrast, in the case that \({\tilde{w}}'R_t=0\), at least one risky asset is redundant. For example, let us assume that \({\tilde{w}}_N\ne 0\) without loss of generality. Then we can construct the portfolio \({\tilde{v}}:=-{\tilde{w}}/{\tilde{w}}_N\) of risky assets so that \({\tilde{v}}'R_t=0\) with \({\tilde{v}}_N=-1\). Now, we are able to replicate Asset N by a linear combination of all other assets, including the riskless asset, by using the portfolio \((1-{\varvec{1}}'({\tilde{v}}_1,{\tilde{v}}_2,\ldots ,{\tilde{v}}_{N-1}),{\tilde{v}}_1,{\tilde{v}}_2,\ldots ,{\tilde{v}}_{N-1})\in \mathbb {R}^N\), which satisfies the budget constraint. Hence, we can ignore Asset N and reduce the asset universe to \(N-1\) risky assets. Of course, the converse is also true. That is, if we are able to replicate a risky asset by a linear combination of all other assets, we must have that \({\tilde{w}}'R_t=0\) with \({\tilde{w}}\ne {\varvec{0}}\).
Throughout this work, it is assumed that the dimension reduction has already been made in advance. Note also that without the dimension reduction the covariance matrix of \(R_t\) would not be positive definite because \({\tilde{w}}'R_t=c\) implies that \({\tilde{w}}'{\mathbf {Var}}\big (R_t\big ){\tilde{w}}={\mathbf {Var}}\big ({\tilde{w}}'R_t\big )=0\) for \({\tilde{w}}\ne {\varvec{0}}\) and vice versa. Hence, A3 is indispensable also for statistical reasons. In fact, this assumption plays a major role in portfolio theory, where it is typically required that \({\mathbf {Var}}(R_t)>0\), i.e., that the covariance matrix of the risky asset returns is positive definite.
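The link between redundancy and a singular covariance matrix can be seen in a short numerical sketch. The simulated returns, the tolerance `tol`, and the helper `is_positive_definite` are assumptions made for illustration; the check uses the sample covariance matrix as a stand-in for \({\mathbf {Var}}(R_t)\).

```python
import numpy as np

def is_positive_definite(cov, tol=1e-12):
    """Check whether the smallest eigenvalue of a symmetric matrix exceeds tol."""
    return bool(np.linalg.eigvalsh(cov).min() > tol)

rng = np.random.default_rng(0)
# Hypothetical daily returns of three non-redundant risky assets.
R = rng.normal(0.0005, 0.01, size=(500, 3))
# Add a fourth asset that is an exact linear combination of the first two.
R_redundant = np.column_stack([R, 0.5 * R[:, 0] + 0.5 * R[:, 1]])

print(is_positive_definite(np.cov(R, rowvar=False)))
print(is_positive_definite(np.cov(R_redundant, rowvar=False)))
```

In the second case the weight vector (0.5, 0.5, 0, -1) yields a portfolio return of zero on every day, so the quadratic form vanishes and the matrix fails to be positive definite.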
3 The log-optimal portfolio
Definition 1
A log-optimal portfolio is a portfolio \(w^*\!\in {\mathcal {S}}\) that maximizes the expected log-return, i.e.,
$$\begin{aligned} w^*=\underset{w\in {\mathcal {S}}}{\arg \max }\;{\mathbf {E}}\big (\log w'X_t\big ). \end{aligned}$$
The LOP is often associated with the “Kelly criterion” (Kelly 1956). Its asymptotic optimality properties are elaborated by Algoet and Cover (1988), Bell and Cover (1980), as well as Breiman (1961).^{Footnote 2} Although it was originally studied in information theory, it became of growing interest to the finance community over the last decades. As already mentioned in Sect. 1, the LOP is sometimes referred to as the GOP. However, the GOP is typically studied in a continuous-time framework, whereas the LOP is based on a discrete-time setting.^{Footnote 3}
The Lagrange function of the optimization problem given by Definition 1 is
$$\begin{aligned} L(w,\kappa ,\lambda )={\mathbf {E}}\big (\log w'X_t\big )+\kappa 'w-\lambda \big ({\varvec{1}}'w-1\big ) \end{aligned}$$
with \(\kappa =(\kappa _0,\kappa _1,\ldots ,\kappa _N)\ge {\varvec{0}}\) and \(\lambda \in \mathbb {R}\). The corresponding Karush-Kuhn-Tucker (KKT) conditions take a particularly simple form (Cover and Thomas 1991, Theorem 15.2.1). The following theorem establishes also the existence and uniqueness of the LOP.
Theorem 1
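The LOP exists and is unique. It is characterized by \(w^*\!\in {\mathcal {S}}\) such that
$$\begin{aligned} {\mathbf {E}}\left( \frac{X_{it}}{w^{*\prime }X_t}\right) {\left\{ \begin{array}{ll} =1, &{} w^*_i>0,\\ \le 1, &{} w^*_i=0. \end{array}\right. } \end{aligned}$$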
Proof
The simplex \({\mathcal {S}}\) is compact and convex. Further, the random variables \(v'X_t\) and \(w'X_t\) do not coincide for any \(v,w\in {\mathcal {S}}\) with \(v\ne w\). Hence, since the natural logarithm is strictly concave, for each \(0<\pi <1\) and \(v,w\in {\mathcal {S}}\) with \(v\ne w\) it holds that
and
This means that the objective function \(w\mapsto {\mathbf {E}}\big (\log w'X_t\big )\) is strictly concave, which implies that \(w^*\) exists and is unique. Further, the partial difference quotient
$$\begin{aligned} \frac{\log \big (w'x+\Delta w_i\,x_i\big )-\log w'x}{\Delta w_i} \end{aligned}$$
increases monotonically and tends to \(x_i/w'x>0\) as \(\Delta w_i\!\searrow 0\) for each \(x>{\varvec{0}}\). From the Monotone Convergence Theorem, we conclude that
$$\begin{aligned} \frac{\partial }{\partial w_i}\,{\mathbf {E}}\big (\log w'X_t\big )={\mathbf {E}}\left( \frac{X_{it}}{w'X_t}\right) . \end{aligned}$$
Hence, we have that \({\mathbf {E}}\big (X_t/w^{*\prime }X_t\big )=\lambda {\varvec{1}}-\kappa \) with \(w^*\!\in {\mathcal {S}}\), \(\lambda \in \mathbb {R}\), \(\kappa =(\kappa _0,\kappa _1,\ldots ,\kappa _N)\ge {\varvec{0}}\), and \(w^*_i\kappa _i=0\). From \(w^{*\prime }{\mathbf {E}}\big (X_t/w^{*\prime }X_t\big )={\mathbf {E}}\big (w^{*\prime }X_t/w^{*\prime }X_t\big )=1\), \(w^{*\prime }{\varvec{1}}=1\), and \(w^{*\prime }\kappa =0\), we conclude that \(\lambda =1\). Thus, we obtain
$$\begin{aligned} {\mathbf {E}}\left( \frac{X_t}{w^{*\prime }X_t}\right) ={\varvec{1}}-\kappa , \end{aligned}$$
which leads to the given expression in the theorem. \(\square \)
The portfolio weight \(w^*_i\) is bounded by \({\mathcal {S}}\) if and only if \({\mathbf {E}}(X_{it}/w^{*\prime }X_t)<1\), whereas the partial derivative equals 1 whenever \(w^*_i>0\). If all (optimal) portfolio weights are positive, the solution of the optimization problem, \(w^*\), lies in the interior of \({\mathcal {S}}\), which is denoted by \({\mathcal {S}}^{\text{ o }}\). Since we have that
the expected log-return stays constant after a local change of the portfolio weights.^{Footnote 4} This could be true even on the boundary of \({\mathcal {S}}\), \(\partial {\mathcal {S}}\), as long as \({\mathbf {E}}(X_t/w^{*\prime }X_t)={\varvec{1}}\). In this case, all portfolio weights are still unbounded by \({\mathcal {S}}\). By contrast, if (at least) one partial derivative is lower than 1, some portfolio weight must be zero, i.e., \(w^*\in \partial {\mathcal {S}}\). Then the expected log-return decreases after a local change of a portfolio weight that is bounded by \({\mathcal {S}}\). These basic considerations will be important later on when deriving the asymptotic properties of the LOP estimators.
4 The best constant rebalanced portfolio
Definition 2
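A best constant rebalanced portfolio is a portfolio \(w^*_n\in {\mathcal {S}}\) that maximizes the in-sample average log-return, i.e.,
$$\begin{aligned} w^*_n=\underset{w\in {\mathcal {S}}}{\arg \max }\;\frac{1}{n}\sum _{t=1}^n\log w'X_t. \end{aligned}$$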
The relative prices contained in \(X_1,X_2,\ldots ,X_n\), except for the relative price 1 of the riskless asset, are nondegenerate random variables and so, in general, \(\frac{1}{n}\sum _{t=1}^n \log w'X_t\) is a nondegenerate random variable, too. This means that for each element of the state space, \(\omega \in \Omega \), and so for each realization of \(X_1,X_2,\ldots ,X_n\), we maximize \(\frac{1}{n}\sum _{t=1}^n \log w'X_t(\omega )\), which leads us to a particular realization, \(w^*_n(\omega )\), of a BCRP \(w^*_n\), which thus represents a random vector.
A BCRP can be considered an empirical version of the LOP. It is said to be the “best” constant rebalanced portfolio because \(w^*_n\) maximizes the final value after Period \(n\in {\mathbb {N}}\), i.e., \(V_{wn}\), over all constant rebalanced portfolios \(w\in {\mathcal {S}}\). However, the maximization is done in hindsight, i.e., after all asset prices have been revealed to the investor, and thus the BCRP is unknown in advance.
4.1 Small-sample properties
4.1.1 Existence and uniqueness
Let \(n\in {\mathbb {N}}\) be the number of price observations. The following additional assumption is made for statistical reasons:
 A4.:

The sample of price relatives, i.e., \({\mathbf {X}}=\big [X_1~X_2~\cdots ~X_n\big ]\), has rank \(N+1\).
A4 can be considered an empirical version of A3, which implies that no risky asset is redundant. It requires that the number of observations exceeds the number of risky assets, i.e., \(n>N\).
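Whether a given sample satisfies A4 can be checked directly. The sketch below is illustrative only: the function name `satisfies_a4` and the two toy samples are assumptions, and the sample is stored with one row per day (i.e., as the transpose of \({\mathbf {X}}\)), which does not affect the rank.

```python
import numpy as np

def satisfies_a4(X):
    """Check that the (n, N+1) sample of price relatives has rank N+1."""
    n, d = X.shape
    return bool(n >= d and np.linalg.matrix_rank(X) == d)

# Hypothetical sample with n = 4 days, one riskless and two risky assets.
X_ok = np.array([[1.0, 1.02, 0.99],
                 [1.0, 0.99, 1.04],
                 [1.0, 1.01, 1.00],
                 [1.0, 0.98, 1.01]])

# Same sample, but the last asset is replicated by the first two columns.
X_bad = X_ok.copy()
X_bad[:, 2] = 0.5 * X_bad[:, 0] + 0.5 * X_bad[:, 1]

print(satisfies_a4(X_ok), satisfies_a4(X_bad), satisfies_a4(X_ok[:2]))
```

The last call fails because two observations cannot span three dimensions, which is the requirement \(n>N\) in numerical form.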
Theorem 2
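The BCRP exists and is unique. It is characterized by \(w^*_n\in {\mathcal {S}}\) such that
$$\begin{aligned} \frac{1}{n}\sum _{t=1}^n\frac{X_{it}}{w^{*\prime }_nX_t}{\left\{ \begin{array}{ll} =1, &{} w^*_{in}>0,\\ \le 1, &{} w^*_{in}=0. \end{array}\right. } \end{aligned}$$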
Proof
Since \(w\mapsto \frac{1}{n}\sum _{t=1}^n\log w'X_t\) is a concave objective function, Eq. 1 represents a convex optimization problem and, because the simplex is compact and convex, the BCRP exists. The rank of \({\mathbf {X}}\) is full so that \(v,w\in \mathbb {R}^{N+1}\) must lead to different value processes unless \(v=w\). That is, the given objective function is strictly concave, which implies that the BCRP is unique. The rest of the proof follows by the arguments that are used in the proof of Theorem 1. \(\square \)
A simple numerical algorithm for computing the BCRP is developed by Cover (1984). We will come back to this point in Sect. 6.
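To give a concrete impression of such an algorithm, the following sketch uses the multiplicative update that is commonly attributed to Cover (1984): each weight is multiplied by the sample average of \(X_{it}/w'X_t\). The function name `bcrp_cover`, the stopping rule, the tolerance, and the toy sample are assumptions made for illustration; this is not necessarily the implementation discussed in Sect. 6.

```python
import numpy as np

def bcrp_cover(X, tol=1e-10, max_iter=100_000):
    """Approximate the BCRP by a Cover-type multiplicative update.

    X has shape (n, N+1); row t holds the (positive) price relatives of day t.
    """
    n, d = X.shape
    w = np.full(d, 1.0 / d)                 # start in the interior of the simplex
    for _ in range(max_iter):
        # Sample average of X_t / (w'X_t), i.e., the gradient of the objective.
        grad = (X / (X @ w)[:, None]).mean(axis=0)
        w_new = w * grad                    # sums to 1 because w'grad = 1
        if np.max(np.abs(w_new - w)) < tol: # illustrative stopping rule
            return w_new
        w = w_new
    return w

def avg_log_return(w, X):
    """In-sample average log-return of the constant rebalanced portfolio w."""
    return float(np.mean(np.log(X @ w)))

# Toy sample: riskless asset plus one risky asset that triples or halves.
X = np.array([[1.0, 3.0],
              [1.0, 0.5]])
w_hat = bcrp_cover(X)
print(w_hat, avg_log_return(w_hat, X))
```

For this toy sample the first-order condition can be solved by hand and gives the weights (0.25, 0.75), which the iteration reproduces up to the chosen tolerance. At a fixed point the update leaves every positive weight unchanged, which is exactly the condition that the corresponding sample average equals 1.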
4.1.2 Finite-sample bias
Let \(w_n\) be a portfolio that is based only on the price observations that have been made up to Day n. A standard assumption of portfolio theory is that \(w_n\) is stochastically independent of \(R_{n+1}\) or, equivalently, of \(X_{n+1}\) (Frahm 2015). If \(w_n\) depended on \(R_{n+1}\), the decision of the investor at time n would be influenced by some asset returns at time \(n+1\) or, vice versa, his financial transactions would have an impact on forthcoming asset prices. In this case he would be able to predict the future price evolution on the basis of past asset prices. This is typically ruled out in finance theory and, especially, in portfolio theory. Put another way, we assume that the investor has no prediction power. This basic assumption will be elaborated also in Sect. 5.1.2.
For example, suppose that \(X_1,X_2,\ldots ,X_{n+1}\) are serially independent. Since \(w_n\) is a function of \(X_1,X_2,\ldots ,X_n\), the portfolio \(w_n\) does not depend on \(X_{n+1}\). However, the converse is not true. Consider some (measurable) real-valued function f of some random variable \(\xi \). The fact that \(f(\xi )\) is independent of another random variable \(\zeta \) does not imply that \(\xi \) is independent of \(\zeta \). A trivial example is any constant function f. Another well-known and more sophisticated example is the case in which \(\xi _1,\xi _2,\ldots ,\xi _n\) are independent and identically normally distributed. Obviously, \(\xi _{n+1}:=\frac{1}{n}\sum _{t=1}^n\xi _t\) depends on \(\xi _1,\xi _2,\ldots ,\xi _n\), but it is known that \(f(\xi _1,\xi _2,\ldots ,\xi _n)=\sum _{t=1}^n (\xi _t-\xi _{n+1})^2\) is independent of \(\xi _{n+1}\).^{Footnote 5} Thus, although \(\xi _{n+1}\) depends on \(\xi _1,\xi _2,\ldots ,\xi _n\) and \(f(\xi _1,\xi _2,\ldots ,\xi _n)\) is not constant, but a non-degenerate random variable, \(f(\xi _1,\xi _2,\ldots ,\xi _n)\) is still independent of \(\xi _{n+1}\). We conclude that, although \(X_1,X_2,\ldots ,X_{n+1}\) may be serially dependent, a (random) portfolio based on \(X_1,X_2,\ldots ,X_n\) need not depend on \(X_{n+1}\).
Hence, we can make the following additional assumptions:
 A5.:

The BCRP \(w^*_n\) is stochastically independent of \(X_{n+1}\).
 A6.:

The BCRP does not coincide with the LOP, i.e., \({\mathbb {P}}\big (w^*_n=w^*\big )\ne 1\).
A6 just states that \(w^*_n=w^*\) holds only with probability lower than 1. This assumption is trivial, since otherwise we would not have any estimation risk at all.
Let X be any positive random vector that has the same distribution as the vectors \(X_1,X_2,\ldots \) of price relatives and define the following quantities:

\(\varphi (w):={\mathbf {E}}\big (\log w'X\big )\),

\(\varphi _n(w):={\mathbf {E}}\big (\frac{1}{n}\sum _{t=1}^n\log w'X_t\big )\), and

\(\varphi _{n+1}(w):={\mathbf {E}}\big (\log w'X_{n+1}\big )\).
Hence, by substituting w with \(w^*\) or \(w^*_n\), respectively, we can see that

\(\varphi (w^*)\) is the expected log-return on the LOP,

\(\varphi _n(w^*_n)\) represents the expected in-sample average log-return on the BCRP, and

\(\varphi _{n+1}(w^*_n)\) denotes the expected out-of-sample log-return on the BCRP.
The investor cannot achieve \(\varphi (w^*)\) because the LOP is unknown to him. Instead, he maximizes the average log-return \(\frac{1}{n}\sum _{t=1}^n\log w^{*\prime }_nX_t\) in order to compute the BCRP. At the end of Day n he applies the BCRP and one day later he obtains the log-return \(\log w^{*\prime }_nX_{n+1}\). For this reason, \(\varphi _{n+1}(w^*_n)\) may be considered the basic performance measure for \(w^*_n\).
The following theorem describes why the BCRP might lead to wrong conclusions in real-life situations, especially if the number of observations, n, is small.
Theorem 3
\(\varphi _{n+1}(w^*_n)<\varphi (w^*)<\varphi _n(w^*_n)\)
Proof
By definition, \(w^*\) is the element of \({\mathcal {S}}\) that maximizes the expected logreturn. Moreover, due to A5 and A6, and the fact that \(w^*\) is unique, we have that
with probability 1 but \({\mathbf {E}}\big (\log w'X_{n+1}\big )<{\mathbf {E}}\big (\log w^{*\prime }X_{n+1}\big )\) with positive probability. From the Law of Total Expectation and the stationarity of \(\big \{X_t\big \}\) we conclude that
Moreover, since \(w^*_n\) is unique and does not coincide with \(w^*\), we have that
and
This means that
\(\square \)
Hence, the expected out-of-sample log-return on the BCRP, \(\varphi _{n+1}(w^*_n)\), is always lower than the expected log-return on the LOP, i.e., \(\varphi (w^*)\). Nonetheless, the investor typically overestimates not only \(\varphi _{n+1}(w^*_n)\) but even \(\varphi (w^*)\) when computing \(\varphi _n(w^*_n)\) by maximizing \(\frac{1}{n}\sum _{t=1}^n\log w'X_t\). This phenomenon is not limited to the BCRP. It is a general problem of portfolio optimization (see, e.g., Frahm 2015; Frahm and Memmel 2010; Kan and Zhou 2007; Memmel 2004).
4.2 Large-sample properties
4.2.1 Consistency
For the subsequent analysis it is convenient to define the function \(x\mapsto f_w(x):=\log w'x\) for all \(w\in {\mathcal {S}}\) and \(x>{\varvec{0}}\) as well as the functions
for all \(n\in {\mathbb {N}}\). We make the following statistical assumption, which is often used in the theory of empirical processes (see, e.g., van der Vaart 1998, Chapter 19):
 A7.:

The family \({\mathcal {F}}=\big \{f_w\big \}_{w\in {\mathcal {S}}}\) is Glivenko-Cantelli, i.e.,
$$\begin{aligned} \sup _{w\in {\mathcal {S}}}\big |M_n(w)-M(w)\big | \rightarrow 0\,. \end{aligned}$$
Hence, the Strong Law of Large Numbers shall hold true for the sequence \(\big \{M_n(w)\big \}\) uniformly in \({\mathcal {S}}\). For example, according to van der Vaart (1998, p. 46), it is sufficient to guarantee that

(i)
w stems from a compact set,

(ii)
the elements of \({\mathcal {F}}\) are continuous for every \(x>{\varvec{0}}\), and

(iii)
they are dominated by an integrable function,
provided \(X_1,X_2,\ldots \) are serially independent.^{Footnote 6} The first two properties are clearly satisfied in our context. In order to see that the third property is satisfied, too, note that
for all \(x>{\varvec{0}}\), where \((\log x)^-\) denotes the negative part of the vector \(\log x\). Hence, the function
dominates each \(f_w\). Since \({\mathbf {E}}\big (|\log w'X_t|\big )<\infty \) for all \(w\in {\mathcal {S}}\), we have that \({\mathbf {E}}\big (|\log X_{it}|\big )<\infty \) and thus \({\mathbf {E}}\big ({\varvec{1}}'(\log X_t)^-\big )<\infty \). Moreover, note that \({\mathbf {E}}\left( |\log w'X_t|\right) <\infty \) for \(w={\varvec{1}}/(N+1)\) and \({\varvec{1}}'X_t>1\) because \(X_{0t}=1\). Hence, we obtain
The maximum of two nonnegative and integrable random variables is also integrable. Thus, we conclude that \({\mathbf {E}}\big (g(X_t)\big )<\infty \), i.e., the dominating function g is integrable.
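The domination argument can be probed numerically. The sketch below rests on an assumption: it takes the dominating function to be \(g(x)=\max \{{\varvec{1}}'(\log x)^-,\log {\varvec{1}}'x\}\), which is how the two bounds described above suggest reading it. The random draws and all names are purely illustrative.

```python
import numpy as np

def g(x):
    """Assumed dominating function: max of 1'(log x)^- and log(1'x)."""
    neg_part = np.maximum(-np.log(x), 0.0)   # componentwise negative part of log x
    return max(float(neg_part.sum()), float(np.log(x.sum())))

rng = np.random.default_rng(1)
dominated = True
for _ in range(1000):
    # Hypothetical price relatives with the riskless asset fixed at 1.
    x = np.concatenate(([1.0], rng.lognormal(0.0, 0.5, size=4)))
    w = rng.dirichlet(np.ones(5))            # random portfolio in the simplex
    dominated = dominated and bool(abs(np.log(w @ x)) <= g(x) + 1e-12)

print(dominated)
```

The check reflects two elementary bounds: \(w'x\le {\varvec{1}}'x\) for every w in the simplex, and \(w'x\ge \min _i x_i\), whose logarithm is bounded from below by \(-{\varvec{1}}'(\log x)^-\).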
At the beginning of Sect. 2 it has already been mentioned that each statement that refers to a random quantity is meant to be true with probability 1. The next theorem asserts that the BCRP is strongly consistent for the LOP. This means that \(w^*_n\) converges almost surely to \(w^*\), which is simply denoted by “\(w^*_n\rightarrow w^*\),” i.e., without the additional remark “a.s.,” for convenience.
Theorem 4
\(w^*_n\rightarrow w^*\)
Proof
The BCRP \(w^*_n\) represents an M-estimator, whose criterion functions are given by M and \(M_n\). Let \(\varepsilon \) be any positive real number and \({\mathcal {P}}_\varepsilon :=\big \{w\in {\mathcal {S}}\!:\Vert w-w^*\Vert =\varepsilon \big \}\).^{Footnote 7} Since M is strictly concave, there exists some \(\delta >0\) such that \(M(w^*)-M(w)>\delta \) for all \(w\in {\mathcal {P}}_\varepsilon \). Now, since \({\mathcal {F}}\) is Glivenko-Cantelli, we can find a sufficiently large number \(m\in {\mathbb {N}}\) such that, for all natural numbers \(n\ge m\), \(|M_n(w^*)-M(w^*)|\le \delta /2\) and \(|M_n(w)-M(w)|\le \delta /2\) for all \(w\in {\mathcal {P}}_\varepsilon \). Thus, \(M_n(w)<M_n(w^*)\) for all \(w\in {\mathcal {P}}_\varepsilon \). Since \(M_n\) is strictly concave, too, we have that \(\Vert w^*_n-w^*\Vert <\varepsilon \) for all \(n\ge m\). This holds true for every \(\varepsilon >0\) and thus \(w^*_n\rightarrow w^*\). \(\square \)
The next theorem asserts that the expected out-of-sample log-return on the BCRP converges to the expected log-return on the LOP.
Theorem 5
\(\varphi _{n+1}(w^*_n)\rightarrow \varphi (w^*)\)
Proof
Theorem 4 and the Continuous Mapping Theorem reveal that \(\log w^{*\prime }_nx\rightarrow \log w^{*\prime }x\) for all \(x>{\varvec{0}}\). Further, we already know that there exists an integrable function \(x\mapsto g(x)\) such that \(|f_w(x)|\le g(x)\) for all \(w\in {\mathcal {S}}\) and \(x>{\varvec{0}}\). Hence, by the Dominated Convergence Theorem, we obtain
\(\square \)
Finally, the in-sample average log-return on the BCRP also converges to the expected log-return on the LOP as the number of observations grows to infinity.
Theorem 6
\(\frac{1}{n}\sum _{t=1}^n \log w^{*\prime }_nX_t\rightarrow \varphi (w^*)\)
Proof
The statement is equivalent to \(M_n(w^*_n)-M(w^*)\rightarrow 0\). Thus, it suffices to demonstrate that
The former is an immediate consequence of A7. Moreover, the Dominated Convergence Theorem tells us that \(M(w_n)\rightarrow M(w)\) for every sequence \(\{w_n\}\) with \(w_n\in {\mathcal {S}}\) such that \(w_n\rightarrow w\in {\mathcal {S}}\). This means that M is continuous at each \(w\in {\mathcal {S}}\). Theorem 4 and the Continuous Mapping Theorem complete the proof. \(\square \)
4.2.2 Asymptotic distribution
In this section, the asymptotic distribution of \(\sqrt{n}\,\big (w^*_n-w^*\big )\) is established. This can be done for all dimensions of \(w^*\) that are not bounded by \({\mathcal {S}}\), i.e., \({\mathbf {E}}\big (X_{it}/w^{*\prime }X_t\big )=1\). As already explained at the end of Sect. 3, each other component of \(w^*\) is bounded by the simplex. If \(w^*_i=0\) represents such a component, i.e., \({\mathbf {E}}\big (X_{it}/w^{*\prime }X_t\big )<1\), it is well-known that
i.e., \(w^*_{in}\) is superconsistent. However, not all components of the LOP can be affected by the given constraints on the portfolio weights. Indeed, we must have that \({\mathbf {E}}\big (X_{it}/w^{*\prime }X_t\big )=1\) for at least one asset because otherwise the KKT conditions given by Theorem 1 cannot be satisfied. Thus, we can reduce the asset universe until there is no portfolio weight that is bounded by \({\mathcal {S}}\). The riskless asset need not be part of the reduced asset universe. However, in order to avoid the trivial solution \(w^*_n=1\), there should be at least two remaining assets in the universe.
Hence, we assume that the given asset universe has been reduced such that \({\mathbf {E}}\big (X_t/w^{*\prime }X_t\big )={\varvec{1}}\). This guarantees that
This means that the function M can be locally approximated at \(w^*\) by
From the Monotone Convergence Theorem we conclude that the Hessian is given by
The following assumption implies that the Hessian is finite. Finally, A3 guarantees that \(\nabla ^2M(w^*)\) is negative definite.
 A8.:

The second moments of \(X_t/(w^{*\prime }X_t)\) are finite.
Further, we have to make the following assumptions:
 A9.:

The function \(f_w\) can be locally approximated at \(w^*\) by
$$\begin{aligned} f_w(X_t) = f_{w^*}(X_t) + (w-w^*)'\left( \frac{X_t}{w^{*\prime }X_t}\right) + \Vert w-w^*\Vert \,r(X_t;w), \end{aligned}$$where the process \(\big \{r(X_t;w)\big \}\) is stochastically equicontinuous. This means that for every \(\epsilon >0\) and \(\eta >0\) there exists a neighborhood \({\mathcal {U}}\) of \(w^*\) in the simplex \({\mathcal {S}}\) such that
$$\begin{aligned} \limsup _{n\rightarrow \infty }\,{\mathbb {P}}^*\left( \sup _{w\in {\mathcal {U}}} \left| \sqrt{n}\left( \frac{1}{n}\sum _{t=1}^nr(X_t;w)-{\mathbf {E}}\big (r(X;w)\big )\right) \right| >\eta \right) < \epsilon , \end{aligned}$$where \({\mathbb {P}}^*\) is an outer measure associated with \({\mathbb {P}}\).
 A10.:

We have that
$$\begin{aligned} \sqrt{n}\,\left( \frac{1}{n}\sum _{t=1}^n\frac{X_t}{w^{*\prime }X_t}-{\varvec{1}}\right) \rightsquigarrow {\mathcal {N}}\big ({\varvec{0}},A\big ). \end{aligned}$$^{Footnote 8}
A9 is a basic regularity condition, which guarantees that the remainder \(r(X_t;w)\) of the linear approximation becomes negligible as \(n\rightarrow \infty \). To be more precise, it requires that
if the sample size, n, is large and w is close to \(w^*\).^{Footnote 9} Further, A10 says that the process \(\big \{X_t/w^{*\prime }X_t\big \}\) satisfies the Central Limit Theorem. In particular, if the elements of \(\big \{X_t\big \}\) are serially independent, we obtain the asymptotic covariance matrix
Nonetheless, we could take also any form of serial dependence into account, provided the Central Limit Theorem expressed by A10 is satisfied. There exist many strong mixing conditions that guarantee that this theorem holds true for the process \(\big \{X_t/w^{*\prime }X_t\big \}\) (see, e.g., Bradley 2005).
Suppose that \(\Theta \subseteq \mathbb {R}^d\) is any parameter set and let \(\theta \in \Theta \) be the “true” parameter. The tangent cone at \(\theta \) is the set that we obtain after centering \(\Theta \) at \(\theta \), blowing it up by some factor \(\tau >0\), and taking the set limit for \(\tau \rightarrow \infty \) (Geyer 1994, p. 1993). In order to study the asymptotic behavior of a sequence \(\{\theta _n\}\) of global optimizers that converges to \(\theta \) it is crucial to guarantee that the parameter set \(\Theta \) is Chernoff regular (Geyer 1994), viz.
In our context, the parameter \(\theta \) corresponds to \(w^*\!\in {\mathcal {S}}\), which represents the global solution of the convex optimization problem expressed by Definition 1. The simplex \({\mathcal {S}}\) is Chernoff regular and so let \({\mathcal {T}}_{{\mathcal {S}}}(w^*):=\lim _{\tau \rightarrow \infty }\tau \big ({\mathcal {S}}-w^*\big )\) be the tangent cone of the simplex at \(w^*\).^{Footnote 10}
Consider any random vector \(Y\sim {\mathcal {N}}\big ({\varvec{0}},A\big )\) and define the function
The (unique) maximizer of \(\Psi _Y\) is denoted by
The following theorem describes the asymptotic behavior of the BCRP.
Theorem 7
We have that
Proof
The theorem asserts that \(\sqrt{n}\,\big (w^*_n-w^*\big )\rightsquigarrow \zeta ^*\), which is an immediate consequence of Theorem 4.4 in Geyer (1994). \(\square \)
Hence, if the sample size is large, \(\sqrt{n}\,\big (w^*_n-w^*\big )\) behaves essentially like the solution of a relatively simple quadratic optimization problem. In the case in which the elements of \(\big \{X_t\big \}\) are serially independent, we obtain
The following corollary establishes the long-run distribution of the log-return on the BCRP relative to the log-return on the LOP.
Corollary 1
Proof
Note that
i.e.,
The rest of the proof follows from Theorem 4.4 in Geyer (1994). \(\square \)
This completes our analysis of the BCRP. In the next section we focus on the MVE and derive its corresponding statistical properties.
5 The mean-variance estimator
Consider some portfolio \(w\in {\mathcal {S}}\) and let \({\tilde{w}}=(w_1,w_2,\ldots ,w_N)\) be the “risky part” of that portfolio. The return on Asset i after Day t is given by \(R_{it}=X_{it}-1\) and so the return on w amounts to \(R_{wt}={\tilde{w}}'R_t\), where \(R_t=(R_{1t},R_{2t},\ldots ,R_{Nt})\) denotes the vector of risky asset returns.^{Footnote 11} The assumptions A1 to A10 shall still hold true. Now, we make the following additional assumption:
 B1.:

The second moments of \(R_t\) are finite.
Let R be any random vector that has the same distribution as \(R_1,R_2,\ldots \,\). Define \(\mu :={\mathbf {E}}(R)\) and \(\Sigma :={\mathbf {E}}(RR')\). Note that the matrix \(\Sigma \) contains the second non-central moments of the risky asset returns and thus it is not the covariance matrix of R. We already know that A3 guarantees that there cannot be any \({\tilde{w}}\in \mathbb {R}^N\) with \({\tilde{w}}\ne {\varvec{0}}\) such that \({\tilde{w}}'R=c\in \mathbb {R}\), i.e., \(\Sigma \) is positive definite.
Now, we may apply the quadratic approximation \(\log (1+r)\approx r-\frac{1}{2}r^2\) and come to the conclusion that
$$\begin{aligned} {\mathbf {E}}\big (\log (1+{\tilde{w}}'R)\big )\approx {\tilde{w}}'\mu -\frac{1}{2}\,{\tilde{w}}'\Sigma \,{\tilde{w}}. \end{aligned}$$
This section is built upon the observation that this approximation is very good in most practical applications. Hence, instead of maximizing the expected log-return, we can simply maximize the objective function \(w\mapsto {\tilde{w}}'\mu -\frac{1}{2}\,{\tilde{w}}'\Sigma \,{\tilde{w}}\).^{Footnote 12} In the following, this objective function is called “mean-variance” although \(\Sigma \) contains the second non-central moments of R and so it does not coincide with the covariance matrix \({\mathbf {Var}}(R)\).
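The quality of the quadratic approximation can be checked numerically. The following Python sketch compares \(\log (1+r)\) with \(r-\frac{1}{2}r^2\) for returns of a typical daily magnitude. The function names `quad_approx` and `approx_error` as well as the grid of returns are illustrative assumptions and are not taken from the original study.

```python
import math


def quad_approx(r):
    """Second-order Taylor approximation of log(1 + r) around r = 0."""
    return r - 0.5 * r * r


def approx_error(r):
    """Absolute error of the quadratic approximation for a simple return r."""
    return abs(math.log1p(r) - quad_approx(r))


if __name__ == "__main__":
    # Hypothetical grid of daily returns, chosen only for illustration.
    for r in (-0.05, -0.01, 0.001, 0.01, 0.05):
        print(f"r = {r:+.3f}   error = {approx_error(r):.2e}")
```

The leading term of the error is \(r^3/3\), so for a daily move of one percent the error is of the order \(10^{-7}\).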
Definition 3
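A mean-variance optimal portfolio is a portfolio \(w^\star \!\in {\mathcal {S}}\) that maximizes the mean-variance objective function, i.e.,
$$\begin{aligned} w^\star :=\arg \max _{w\in {\mathcal {S}}}\;{\tilde{w}}'\mu -\frac{1}{2}\,{\tilde{w}}'\Sigma \,{\tilde{w}}. \end{aligned}$$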
Some important remarks may be appropriate at this point:

The vector \(w^\star \) is called mean-variance optimal, although \(\Sigma \) is not a covariance matrix. However, in most practical applications, \(\Sigma \) is close to \({\mathbf {Var}}(R)\) whenever R is a vector of daily asset returns.

We focus on the feasible set \({\mathcal {S}}\) only because \(w^\star \) serves as an approximation of the LOP. However, in general a mean-variance optimal portfolio need not be restricted to \({\mathcal {S}}\).

Under general (but quite technical) regularity conditions, the MVOP can be considered an approximation of the GOP (Karatzas and Kardaras 2007). Nonetheless, due to the reasons explained in Sect. 3, we should refrain from calling \(w^\star \) “GOP.”
Now, the Lagrange function of the optimization problem expressed by Definition 3 is
The following theorem is analogous to Theorem 2.
Theorem 8
The MVOP exists and is unique. It is characterized by \(w^\star \!\in {\mathcal {S}}\) such that the ith component of \(\mu -\Sigma \,{\tilde{w}}^\star \) is
Proof
The objective function
is strictly concave and the given set of constraints on the portfolio weights \(w_1,w_2,\ldots ,w_N\), i.e., \({\tilde{w}}\ge {\varvec{0}}\) and \({\varvec{1}}'{\tilde{w}}\le 1\), is closed and convex. Hence, the “risky part” of \(w^\star \), i.e., \({\tilde{w}}^\star \), exists and is unique, which means that \(w^\star \) exists and is unique, too. Thus, we must have that
with \(w^\star \!\in {\mathcal {S}}\), \(\lambda \in \mathbb {R}\), \(\kappa =(\kappa _0,\kappa _1,\ldots ,\kappa _N)\ge {\varvec{0}}\), and \(w^\star _i\kappa _i=0\) for \(i=0,1,\ldots ,N\). It follows that \(\lambda =\kappa _0\ge 0\). \(\square \)
The next corollary shows how to identify the components of \(w^\star \) that are bounded by \({\mathcal {S}}\). This will be helpful later on.
Corollary 2
The number \(\lambda \) in Theorem 8 is uniquely determined by \(\lambda ={\tilde{w}}^{\star \prime }\big (\mu -\Sigma \,{\tilde{w}}^\star \big )\). Moreover, the portfolio weight

\(w^\star _0\) is bounded by \({\mathcal {S}}\) if and only if \(\lambda >0\), whereas

\(w^\star _i\) is bounded by \({\mathcal {S}}\) if and only if the ith component of \(\mu -\Sigma \,{\tilde{w}}^\star \) is lower than \(\lambda \).
Proof
The proof of Theorem 8 reveals that \({\tilde{w}}^{\star \prime }\big (\mu -\Sigma \,{\tilde{w}}^\star \big )=\lambda =\kappa _0\). Since \(w^\star \) is unique, the same holds true for \(\lambda \). Moreover, \(w^\star _0\) is bounded by \({\mathcal {S}}\) if and only if \(\kappa _0>0\), i.e., \(\lambda >0\), whereas \(w^\star _i\) is bounded by \({\mathcal {S}}\) if and only if \(\kappa _i>0\), i.e., the ith component of \(\mu -\Sigma \,{\tilde{w}}^\star \) is below \(\lambda \). \(\square \)
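Corollary 2 translates directly into a numerical check. The following Python sketch computes \(\lambda ={\tilde{w}}'(\mu -\Sigma \,{\tilde{w}})\) for a candidate solution and flags the weights that are bounded by \({\mathcal {S}}\). The function name `kkt_diagnostics`, the tolerance `tol`, and the numbers in the example are assumptions made for illustration only.

```python
def kkt_diagnostics(mu, sigma, w_tilde, tol=1e-12):
    """Apply the criteria of Corollary 2 to a candidate risky part w_tilde.

    mu      : list of expected returns (length N)
    sigma   : N x N list of lists with the second non-central moments
    w_tilde : candidate weights of the risky assets (assumed to be optimal)
    Returns (lam, riskless_bounded, risky_bounded).
    """
    n_assets = len(mu)
    # Gradient of the objective with respect to the risky part: mu - Sigma w.
    grad = [
        mu[i] - sum(sigma[i][j] * w_tilde[j] for j in range(n_assets))
        for i in range(n_assets)
    ]
    # lambda = w' (mu - Sigma w)
    lam = sum(w * g for w, g in zip(w_tilde, grad))
    # The riskless weight is bounded if and only if lambda > 0.
    riskless_bounded = lam > tol
    # A risky weight is bounded if and only if its gradient entry is below lambda.
    risky_bounded = [g < lam - tol for g in grad]
    return lam, riskless_bounded, risky_bounded


if __name__ == "__main__":
    # Hypothetical daily moments for two risky assets.
    mu = [0.001, 0.001]
    sigma = [[1e-4, 0.0], [0.0, 1e-4]]
    print(kkt_diagnostics(mu, sigma, [0.5, 0.5]))
```

In this hypothetical example the unconstrained solution \(\Sigma ^{-1}\mu \) would put a weight of 10 on each risky asset, so the riskless asset is pushed to zero and \(\lambda \) is positive, whereas no risky weight is bounded.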
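In the following, let
$$\begin{aligned} \mu _n:=\frac{1}{n}\sum _{t=1}^nR_t\quad \text {and}\quad \Sigma _n:=\frac{1}{n}\sum _{t=1}^nR_tR'_t \end{aligned}$$
be the moment estimators for \(\mu \) and \(\Sigma \). Now, we are ready to define the MVE for \(w^\star \), which serves also as an estimator for the LOP \(w^*\).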
Definition 4
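A mean-variance estimator for \(w^\star \) is a portfolio \(w^\star _n\in {\mathcal {S}}\) that maximizes the in-sample mean-variance objective function, i.e.,
$$\begin{aligned} w^\star _n:=\arg \max _{w\in {\mathcal {S}}}\;{\tilde{w}}'\mu _n-\frac{1}{2}\,{\tilde{w}}'\Sigma _n{\tilde{w}}. \end{aligned}$$
5.1 Small-sample properties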
5.1.1 Existence and uniqueness
Let \({\mathbf {R}}=\big [R_1~R_2~\ldots ~R_n\big ]\) be the sample of risky asset returns. A4 implies that we cannot find any \({\tilde{w}}\in \mathbb {R}^N\) with \({\tilde{w}}\ne {\varvec{0}}\) such that \({\mathbf {R}}'{\tilde{w}}={\varvec{0}}\). Hence, we have that
$$\begin{aligned} {\tilde{w}}'\Sigma _n{\tilde{w}}=\frac{1}{n}\,{\tilde{w}}'{\mathbf {R}}{\mathbf {R}}'{\tilde{w}}=\frac{1}{n}\,\Vert {\mathbf {R}}'{\tilde{w}}\Vert ^2>0 \end{aligned}$$
for all \({\tilde{w}}\in \mathbb {R}^N\) with \({\tilde{w}}\ne {\varvec{0}}\), which means that \(\Sigma _n\) is positive definite.
The following corollary is a straightforward consequence of Theorem 8 and thus its proof can be skipped.
Corollary 3
The MVE exists and is unique. It is characterized by \(w^\star _n\in {\mathcal {S}}\) such that the ith component of \(\mu _n-\Sigma _n{\tilde{w}}^\star _n\) is
Numerical procedures for solving quadratic optimization problems exist in abundance and so it is easy to compute \(w^\star _n\) even if the number of dimensions is high. Two points, which are discussed in more detail in Sect. 6, are worth emphasizing:

(i)
The estimates \(w^*_{in}\) and \(w^\star _{in}\) are indistinguishable in most real-life situations.^{Footnote 13} Put another way, the MVE leads to a very good approximation of the BCRP.

(ii)
Cover’s algorithm (1984) for \(w^*_n\) is slow compared to quadratic optimization algorithms for \(w^\star _n\). In particular, this holds true in the high-dimensional case.
5.1.2 Finite-sample bias
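To make the computation of the MVE concrete, the following Python sketch maximizes \({\tilde{w}}'\mu _n-\frac{1}{2}\,{\tilde{w}}'\Sigma _n{\tilde{w}}\) over the simplex by projected gradient ascent. This is a swapped-in textbook method chosen because it needs no external solver; it is not the quadratic optimizer used in the original study. All function names, the step size rule, and the iteration count are assumptions.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w >= 0, sum(w) = 1} (sort-based method)."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        candidate = (cumulative - 1.0) / j
        if u_j - candidate > 0.0:
            theta = candidate  # the last index that passes the test gives the shift
    return [max(x - theta, 0.0) for x in v]


def moment_estimators(returns):
    """Sample mean and matrix of second non-central moments of the returns."""
    n, n_assets = len(returns), len(returns[0])
    mu_n = [sum(r[i] for r in returns) / n for i in range(n_assets)]
    sigma_n = [
        [sum(r[i] * r[j] for r in returns) / n for j in range(n_assets)]
        for i in range(n_assets)
    ]
    return mu_n, sigma_n


def mean_variance_estimator(mu, sigma, iterations=5000):
    """Return w = (w_0, w_1, ..., w_N) in the simplex; w_0 is the riskless weight."""
    n_assets = len(mu)
    # trace(Sigma) is an upper bound on the largest eigenvalue, so this step is conservative.
    step = 1.0 / sum(sigma[i][i] for i in range(n_assets))
    w = [1.0 / (n_assets + 1)] * (n_assets + 1)
    for _ in range(iterations):
        w_tilde = w[1:]
        # The riskless asset has zero return, hence a zero gradient entry.
        grad = [0.0] + [
            mu[i] - sum(sigma[i][j] * w_tilde[j] for j in range(n_assets))
            for i in range(n_assets)
        ]
        w = project_to_simplex([w_k + step * g_k for w_k, g_k in zip(w, grad)])
    return w


if __name__ == "__main__":
    # Hypothetical moments; in practice use moment_estimators(sample).
    print(mean_variance_estimator([0.001, 0.001], [[1e-4, 0.0], [0.0, 1e-4]]))
```

For high-dimensional problems a dedicated quadratic programming solver is preferable; the sketch only illustrates that the optimization problem behind the MVE is an ordinary concave quadratic program over the simplex.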
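Let \(w_n\) be any portfolio that is constructed on the basis of the asset returns \(R_1,R_2,\ldots ,R_n\). We know that the quantity \({\tilde{w}}'_nR_{n+1}-\frac{1}{2}\big ({\tilde{w}}'_nR_{n+1}\big )^2\) approximates the out-of-sample log-return on \(w_n\) and thus we call
$$\begin{aligned} \phi _{n+1}(w_n):={\mathbf {E}}\left( {\tilde{w}}'_nR_{n+1}-\frac{1}{2}\big ({\tilde{w}}'_nR_{n+1}\big )^2\right) \end{aligned}$$
the expected out-of-sample performance of \(w_n\). As already mentioned before, it is reasonable to presume that \(w_n\) is stochastically independent of \(R_{n+1}\). Otherwise, the investment decision at time t would depend on some asset returns that occur one day later, which is usually considered implausible in finance theory. Thus, we obtain the conditional expectation
$$\begin{aligned} {\mathbf {E}}\left( {\tilde{w}}'_nR_{n+1}-\frac{1}{2}\big ({\tilde{w}}'_nR_{n+1}\big )^2\,\Big |\,w_n\right) ={\tilde{w}}'_n\mu -\frac{1}{2}\,{\tilde{w}}'_n\Sigma \,{\tilde{w}}_n, \end{aligned}$$
which can be viewed as the out-of-sample performance of \(w_n\). Correspondingly, due to the Law of Total Expectation, its expected out-of-sample performance is
$$\begin{aligned} \phi _{n+1}(w_n)={\mathbf {E}}\left( {\tilde{w}}'_n\mu -\frac{1}{2}\,{\tilde{w}}'_n\Sigma \,{\tilde{w}}_n\right) . \end{aligned}$$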
The latter expectation is a basic performance measure in portfolio optimization (see, e.g., Frahm 2015; Kan and Zhou 2007; Markowitz and Usmen 2003).^{Footnote 14} Hence, as already mentioned before, it is an implicit assumption of portfolio theory that \(w_n\) is stochastically independent of \(R_{n+1}\).
If \(w_n\equiv w\) is a fixed portfolio, we have that \(\phi _{n+1}(w)={\tilde{w}}'\mu -\frac{1}{2}\,{\tilde{w}}'\Sigma \,{\tilde{w}}\). In this case we may drop the prefix “expected out-of-sample” and just say that \(\phi _{n+1}(w)\) is the performance of w. Further, then we can simply write \(\phi (w)\) instead of \(\phi _{n+1}(w)\). In particular,
$$\begin{aligned} \phi (w^\star )={\tilde{w}}^{\star \prime }\mu -\frac{1}{2}\,{\tilde{w}}^{\star \prime }\Sigma \,{\tilde{w}}^\star \end{aligned}$$
represents the performance of the MVOP.
Hence, the following assumptions, which are analogous to A5 and A6, are made:
 B2.:

The MVE \(w^\star _n\) is stochastically independent of \(R_{n+1}\).
 B3.:

The MVE does not coincide with the MVOP, i.e., \({\mathbb {P}}(w^\star _n=w^\star )\ne 1\).
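Due to B2 the expected out-of-sample performance of the MVE amounts to
$$\begin{aligned} \phi _{n+1}(w^\star _n)={\mathbf {E}}\left( {\tilde{w}}^{\star \prime }_n\mu -\frac{1}{2}\,{\tilde{w}}^{\star \prime }_n\Sigma \,{\tilde{w}}^\star _n\right) . \end{aligned}$$
Finally, \({\tilde{w}}'_n\mu _n-\frac{1}{2}{\tilde{w}}'_n\Sigma _n{\tilde{w}}_n\) represents the in-sample performance of the portfolio \(w_n\) and thus
$$\begin{aligned} \phi _n(w^\star _n):={\mathbf {E}}\left( {\tilde{w}}^{\star \prime }_n\mu _n-\frac{1}{2}\,{\tilde{w}}^{\star \prime }_n\Sigma _n{\tilde{w}}^\star _n\right) \end{aligned}$$
is the expected in-sample performance of the MVE.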
The following theorem is similar to Theorem 3.
Theorem 9
\(\phi _{n+1}(w^\star _n)<\phi (w^\star )<\phi _n(w^\star _n)\)
Proof
By definition, \(w^\star \) is the portfolio that maximizes the performance. Due to B2 and B3, we conclude that
Moreover, since \(w^\star _n\) is unique and does not coincide with \(w^\star \), we have that
and
which means that
\(\square \)
Theorem 9 shows that we still suffer from the same problems that we have already found for the BCRP. This means that the in-sample performance of the MVE typically overestimates its expected out-of-sample performance and even the performance of the MVOP.
5.2 Large-sample properties
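The ordering in Theorem 9 can be illustrated by a small Monte Carlo experiment. The following Python sketch considers the simplest possible case with one risky asset and the riskless asset, where the MVE of the risky weight is \(\mu _n/\Sigma _n\) clipped to the interval [0, 1]. The parameter values, the seed, and all function names are hypothetical choices made for illustration.

```python
import random


def performance(w, mu, sigma):
    """Mean-variance performance w*mu - 0.5*w^2*sigma for a scalar risky weight w."""
    return w * mu - 0.5 * w * w * sigma


def clipped_weight(mu, sigma):
    """Maximizer of the concave objective over the interval [0, 1]."""
    return min(1.0, max(0.0, mu / sigma))


def simulate_bias(mu=5e-5, sd=0.01, n=250, reps=1000, seed=1):
    """Return (mean in-sample performance, phi(w_star), mean out-of-sample performance)."""
    rng = random.Random(seed)
    sigma = sd * sd + mu * mu  # second non-central moment of the return
    phi_star = performance(clipped_weight(mu, sigma), mu, sigma)
    in_sample, out_of_sample = 0.0, 0.0
    for _ in range(reps):
        sample = [rng.gauss(mu, sd) for _ in range(n)]
        mu_n = sum(sample) / n
        sigma_n = sum(r * r for r in sample) / n
        w_n = clipped_weight(mu_n, sigma_n)
        in_sample += performance(w_n, mu_n, sigma_n)    # evaluated with the estimates
        out_of_sample += performance(w_n, mu, sigma)    # evaluated with the true moments
    return in_sample / reps, phi_star, out_of_sample / reps


if __name__ == "__main__":
    mean_in, phi_star, mean_out = simulate_bias()
    print(f"in-sample {mean_in:.3e} > MVOP {phi_star:.3e} > out-of-sample {mean_out:.3e}")
```

Under these assumed parameters the standard error of \(\mu _n\) is much larger than \(\mu \) itself, so the in-sample performance is dominated by estimation noise and exceeds the performance of the MVOP by a wide margin.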
5.2.1 Consistency
The next assumption requires that \(\big \{R_t\big \}\) and \(\big \{R_tR'_t\big \}\) obey the Strong Law of Large Numbers. This holds true under very mild regularity conditions. If \(R_1,R_2,\ldots \) are serially independent, B1 is already sufficient. However, there exist much weaker mixing conditions, which guarantee that the Strong Law of Large Numbers is satisfied both for \(\big \{R_t\big \}\) and for \(\big \{R_tR'_t\big \}\). These mixing conditions are typically discussed in ergodic theory (see, e.g., Davidson 1994).
 B4.:

The estimators \(\mu _n\) and \(\Sigma _n\) are strongly consistent for \(\mu \) and \(\Sigma \), i.e., \(\mu _n\rightarrow \mu \) and \(\Sigma _n\rightarrow \Sigma \).
Theorem 10
\(w^\star _n\rightarrow w^\star \)
Proof
Note that
represents a function of \(\mu _n\) and \(\Sigma _n\). Since \({\mathcal {S}}\) is convex, this function is continuous in \(\mu _n\) and \(\Sigma _n\). From B4 and the Continuous Mapping Theorem it follows that \(w^\star _n\rightarrow w^\star \). \(\square \)
The next theorem is analogous to Theorem 5.
Theorem 11
\(\phi _{n+1}(w^\star _n)\rightarrow \phi (w^\star )\)
Proof
The objective function \(w\mapsto {\tilde{w}}'\mu -\frac{1}{2}\,{\tilde{w}}'\Sigma \,{\tilde{w}}\) is continuous in \(w\in {\mathcal {S}}\) and the set \({\mathcal {S}}\) is compact. From the Extreme Value Theorem we conclude that it has a minimum, a, and a maximum, b. Hence, \(w\mapsto \max \big \{|a|,|b|\big \}\) is a dominating function and it is clearly integrable. We already know that \(w^\star _n\rightarrow w^\star \) and from the Dominated Convergence Theorem it follows that \(\phi _{n+1}(w^\star _n)\rightarrow \phi (w^\star )\).
\(\square \)
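Moreover, analogous to Theorem 6, the Continuous Mapping Theorem immediately implies that
$$\begin{aligned} {\tilde{w}}^{\star \prime }_n\mu _n-\frac{1}{2}\,{\tilde{w}}^{\star \prime }_n\Sigma _n{\tilde{w}}^\star _n\rightarrow \phi (w^\star ), \end{aligned}$$
i.e., the in-sample performance of the MVE converges to the performance of the MVOP.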
5.2.2 Asymptotic distribution
Now, the asymptotic distribution of \(\sqrt{n}\,(w^\star _n-w^\star )\) is derived. If some portfolio weight \(w^\star _i\) is bounded by \({\mathcal {S}}\) it must be zero and the associated MVE is superconsistent, i.e., \(\sqrt{n}\,w^\star _{in}\overset{{\tiny {\text{ p }}}}{\rightarrow }0\). Hence, in order to derive the asymptotic distribution of \(\sqrt{n}\,(w^\star _n-w^\star )\), we must guarantee that no component of the MVOP \(w^\star \) is bounded by \({\mathcal {S}}\). According to Corollary 2, this holds true if and only if \(\mu -\Sigma \,{\tilde{w}}^\star ={\varvec{0}}\), i.e., \({\tilde{w}}^\star =\Sigma ^{-1}\mu \). However, in practical situations it often happens that the weight of the riskless asset, \(w^\star _0\), is bounded by \({\mathcal {S}}\), which means that the Lagrange multiplier \(\lambda \) in Theorem 8 is positive. In this case, we must remove the riskless asset from our asset universe and focus on the risky assets. Then the MVOP is simply characterized by \({\tilde{w}}^\star \!\in {\mathcal {S}}\) such that the ith component of \(\mu -\Sigma \,{\tilde{w}}^\star \) is
with \(\lambda >0\). Thus, in the case in which the riskless asset has been removed, we assume that the remaining asset universe is such that \(\mu -\Sigma \,{\tilde{w}}^\star =\lambda {\varvec{1}}\) for some \(\lambda >0\).
Consider the family \({\mathcal {F}}=\big \{f_w\big \}_{w\in {\mathcal {S}}}\) with
for all \(w\in {\mathcal {S}}\) and \(r\in \mathbb {R}^N\). Further, define the functions
and
It is obvious that the function F can be locally approximated at \(w^\star \) by
where \(\Sigma \) is positive definite.
The next regularity conditions are analogous to A9 and A10:
 B5.:

The function \(f_w\) can be locally approximated at \(w^\star \) by
$$\begin{aligned} f_w(R_t) = f_{w^\star }(R_t) + ({\tilde{w}}-{\tilde{w}}^\star )'\big (R_t-R_tR'_t{\tilde{w}}^\star \big ) + \Vert {\tilde{w}}-{\tilde{w}}^\star \Vert \,r(R_t;{\tilde{w}}), \end{aligned}$$where the process \(\big \{r(R_t;{\tilde{w}})\big \}\) is stochastically equicontinuous.
 B6.:

We have that
$$\begin{aligned} \sqrt{n}\,\big (\mu _n-\mu \big ) - \sqrt{n}\,\big (\Sigma _n-\Sigma \big ){\tilde{w}}^\star \rightsquigarrow {\mathcal {N}}\big ({\varvec{0}},B\big ). \end{aligned}$$
Once again, B5 guarantees that the remainder \(r(R_t;{\tilde{w}})\) of the linear approximation becomes negligible as \(n\rightarrow \infty \). Further, B6 requires the joint asymptotic normality of the given estimators for \(\mu \) and \(\Sigma \) after the usual standardization. Since \(\mu _n\) and \(\Sigma _n\) represent the moment estimators of \(\mu \) and \(\Sigma \), basically it states that \(\big \{R_t-R_tR'_t{\tilde{w}}^\star \big \}\) should satisfy the Central Limit Theorem.^{Footnote 15}
The latter assumption indicates that we can decompose the estimation risk into two parts:

(i)
\(\sqrt{n}\,\big (\mu _n-\mu \big )\) represents the estimation risk that can be attributed to \(\mu \), whereas

(ii)
\(\sqrt{n}\,\big (\Sigma _n-\Sigma \big ){\tilde{w}}^\star \) stands for the estimation risk that is related to \(\Sigma \).
Note that such a risk decomposition cannot be accomplished for the BCRP.
In some cases it is possible to calculate the asymptotic covariance matrix B in B6. For example, if \(R_1,R_2,\ldots \) are serially independent and normally distributed, we have that
where \(\Gamma =\Sigma -\mu \mu '\) denotes the covariance matrix of R.^{Footnote 16} More precisely, we can apply the decomposition \(B=B_\mu +B_\Sigma \), where
quantifies the estimation risk that is associated with \(\mu \) and
measures the estimation risk related to \(\Sigma \). Similar results can be obtained if we assume that R has an elliptical distribution possessing heavy tails and tail dependence. Alternatively, we could apply a (block) bootstrap (see, e.g., Politis 2003) in order to approximate B, or even \(B_\mu \) and \(B_\Sigma \), without making any parametric assumption.
Consider any random vector \(Z\sim {\mathcal {N}}\big ({\varvec{0}},B\big )\). Now, we may define
with \(\varsigma =(\varsigma _0,\varsigma _1,\ldots ,\varsigma _N)\) and \({\tilde{\varsigma }}=(\varsigma _1,\varsigma _2,\ldots ,\varsigma _N)\). The (unique) maximizer of \(\Phi _Z\) is given by
The following theorem clarifies the asymptotic behavior of the MVE. This result follows by the same arguments that were used for Theorem 7 and so the proof can be skipped.
Theorem 12
We have that
In the case in which the riskless asset has been removed from the asset universe, we may consider the (unique) maximizer
and then Theorem 12 reads
6 Some practical remarks
6.1 Computational issues
Cover’s (1984) algorithm for the BCRP is simple and works like this:

(i)
Choose any initial portfolio \(w^{(0)}\in {\mathcal {S}}\) and set \(k\leftarrow 0\).

(ii)
Update the portfolio according to
$$\begin{aligned} w^{(k+1)} = w^{(k)}\!\odot \frac{1}{n}\sum _{t=1}^n\frac{X_t}{w^{(k)\prime }X_t} \end{aligned}$$and set \(k\leftarrow k+1\).^{Footnote 17}

(iii)
Repeat the second step until the largest component of the vector
$$\begin{aligned} \frac{1}{n}\sum _{t=1}^n\frac{X_t}{w^{(k)\prime }X_t} \end{aligned}$$falls below a critical threshold just above 1.
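The three steps above can be sketched in a few lines of Python. The function names `cover_bcrp` and `average_log_return`, the equally weighted starting portfolio, the iteration cap `max_iter`, and the sample data are assumptions made for illustration; the stopping threshold defaults to \(\exp (10^{-6})\), a value just above 1.

```python
import math


def average_log_return(w, gross_returns):
    """In-sample average log-return of the constant rebalanced portfolio w."""
    return sum(
        math.log(sum(w_i * x_i for w_i, x_i in zip(w, x))) for x in gross_returns
    ) / len(gross_returns)


def cover_bcrp(gross_returns, threshold=math.exp(1e-6), max_iter=100_000):
    """Multiplicative fixed-point iteration for the BCRP.

    gross_returns : list of n observations, each a list of positive gross
                    returns X_t (the riskless asset may be included as 1.0).
    """
    n, dim = len(gross_returns), len(gross_returns[0])
    w = [1.0 / dim] * dim  # step (i): any initial portfolio in the simplex
    for _ in range(max_iter):
        # Vector (1/n) * sum_t X_t / (w' X_t) used in steps (ii) and (iii).
        ratio = [0.0] * dim
        for x in gross_returns:
            portfolio_return = sum(w_i * x_i for w_i, x_i in zip(w, x))
            for i in range(dim):
                ratio[i] += x[i] / portfolio_return
        ratio = [r / n for r in ratio]
        if max(ratio) < threshold:  # step (iii): stopping rule
            break
        # Step (ii): componentwise (Hadamard) update.
        w = [w_i * r_i for w_i, r_i in zip(w, ratio)]
    return w


if __name__ == "__main__":
    # Hypothetical gross returns: riskless asset plus two risky assets.
    data = [[1.0, 1.10, 0.95], [1.0, 0.92, 1.08], [1.0, 1.05, 0.99], [1.0, 0.97, 1.02]]
    print(cover_bcrp(data))
```

The update preserves the budget constraint automatically, because the components of the new portfolio sum to \(\frac{1}{n}\sum _{t=1}^n w^{(k)\prime }X_t/w^{(k)\prime }X_t=1\). Every iteration passes through the whole sample, which is one reason why the procedure becomes slow when n and N are large.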
The computations made during this work are based on MATLAB. The critical threshold for the BCRP is \(\exp (10^{-6})\). Further, the MOSEK optimization toolbox for MATLAB is used in order to compute the MVE, which proves to be very fast and reliable. It turns out that the BCRP, \(w^*_n\), and the MVE, \(w^\star _n\), are almost identical. However, computing \(w^\star _n\) by quadratic optimization is much faster. In order to demonstrate these statements, we can simulate n independent and identically distributed vectors of daily asset returns \(R_1,R_2,\ldots ,R_n\sim {\mathcal {N}}\big (\mu ,\Gamma \big )\) with
Let us assume that the number of risky assets is \(N=100\) and the number of daily observations is \(n=250\). In this case, both \(w^*_0\) and \(w^\star _0\) are bounded by \({\mathcal {S}}\), i.e., \(w^*_0=w^\star _0=0\). Thus, we remove the riskless asset from the asset universe.
The numerical simulations are done 100 times. Each time, Cover’s algorithm for \({\tilde{w}}^*_n\) and the quadratic optimizer for \({\tilde{w}}^\star _n\) are applied. On average, Cover’s algorithm needs 5.5914 s, whereas MOSEK takes only 0.0103 s.^{Footnote 18} The supremum norm of \({\tilde{w}}^*_n-{\tilde{w}}^\star _n\) is 0.0173. Although Cover’s algorithm is much slower than the quadratic optimizer, the outcome of the latter turns out to be slightly better: The quadratic optimizer leads to an annualized average log-return of 0.4892, whereas Cover’s algorithm yields only 0.4890 per year. That is, the quadratic optimizer comes even closer to the (true) BCRP than Cover’s algorithm. In fact, the average log-returns produced by the quadratic optimizer are always better than those of Cover’s algorithm. Hence, \({\tilde{w}}^\star _n\) dominates \({\tilde{w}}^*_n\) in a numerical sense. Moreover, Cover’s algorithm is very slow in high dimensions, whereas the quadratic optimizer works well even for \(N=1000\) and \(n=2500\), in which case the computational time for \({\tilde{w}}^\star _n\) is still below 1 s.
There is another computational issue. For applying the asymptotic results derived in Sect. 4.2.2 we have to simulate the random vector \(Y\sim {\mathcal {N}}({\varvec{0}},A)\), where the covariance matrix A appears in A10. The problem is that A is singular. More precisely, we have that
which means that A is not positive definite. Thus, we have to apply a matrix decomposition in order to simulate Y. This issue does not arise when applying the asymptotic results derived in Sect. 5.2.2, in which case we must simulate the random vector \(Z\sim {\mathcal {N}}({\varvec{0}},B)\). As already mentioned in Sect. 5.2.2, we can even provide a closed-form expression for B in many standard situations. The principal approach is demonstrated in the “Appendix”.
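One possible matrix decomposition for a singular covariance matrix is a Cholesky factorization that skips zero pivots. The following Python sketch illustrates this idea; the source does not state which decomposition was actually used, so the method, the function names `psd_cholesky` and `sample_gaussian`, the relative tolerance, and the example matrix are all assumptions.

```python
import math
import random


def psd_cholesky(a, rel_tol=1e-12):
    """Lower-triangular L with L L' = a for a positive semidefinite matrix a.

    Columns that belong to a (numerically) zero pivot are left at zero,
    which is what makes the factorization work for singular matrices.
    """
    dim = len(a)
    tol = rel_tol * max(a[i][i] for i in range(dim))
    lower = [[0.0] * dim for _ in range(dim)]
    for j in range(dim):
        pivot = a[j][j] - sum(lower[j][k] ** 2 for k in range(j))
        if pivot > tol:
            lower[j][j] = math.sqrt(pivot)
            for i in range(j + 1, dim):
                residual = a[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))
                lower[i][j] = residual / lower[j][j]
    return lower


def sample_gaussian(lower, rng):
    """Draw one vector Y = L z with independent standard normal z."""
    dim = len(lower)
    z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    return [sum(lower[i][k] * z[k] for k in range(dim)) for i in range(dim)]


if __name__ == "__main__":
    # Hypothetical singular covariance matrix whose null space contains (1, 1).
    a = [[1.0, -1.0], [-1.0, 1.0]]
    y = sample_gaussian(psd_cholesky(a), random.Random(0))
    print(y, "sum =", y[0] + y[1])
```

In the example every draw satisfies \(Y_1+Y_2=0\), which mirrors the degeneracy of a covariance matrix that is not positive definite. An eigenvalue decomposition would be a numerically more robust alternative.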
To sum up, the quadratic approximation proposed at the beginning of Sect. 5 works very well and, in contrast to the BCRP, the MVE does not suffer from computational issues. For this reason, we focus on \({\tilde{w}}^\star _n\) in the following discussion.
6.2 Statistical inference
Let us assume that the elements of \(\big \{R_t\big \}\) are serially independent and normally distributed. To keep things as simple as possible, we may choose the parameterization in Eq. 3. Further, let the number of risky assets be \(N=2\) and the number of observations be \(n=250\).^{Footnote 19} Once again, we generate 100 samples and with each one we compute a realization of \({\tilde{w}}^\star _n\). On the upper left of Fig. 1 we can see that most of the estimates are far away from \({\tilde{w}}^\star =(0.5,0.5)\). The vast majority of the estimates are boundary solutions. More precisely, we have 50 estimates that equal (0, 1) and 41 that correspond to (1, 0). The given result does not improve, essentially, if we increase the number of observations to \(n=2500\) and it is still sobering even for 1 million observations. By contrast, if we assume that \(\mu \) was known, the estimates turn out to be much better (see the lower part of Fig. 1). In particular, there is no more estimate at the boundary of the simplex, and in the case of \(n=10^6\) observations the estimates are almost identical with \({\tilde{w}}^\star \).
Are we able to replicate the finite-sample results by a large-sample approximation? For this purpose we could use Theorem 12 and the expressions for \(B_\mu \) and \(B_\Sigma \) presented in Sect. 5.2.2. The corresponding realizations of the synthetic estimator \({\tilde{w}}^\infty _n:={\tilde{w}}^\star +{\tilde{\varsigma }}^\star /\sqrt{n}\,\) are depicted in Fig. 2. The upper left of this figure indicates that there are 90 realizations outside the simplex. This is because the large-sample approximation is based on the maximizer \({\tilde{\varsigma }}^\star \), which belongs to the tangent cone of \({\mathcal {S}}\) at \({\tilde{w}}^\star \). Hence, the support of \({\tilde{w}}^\infty _n\) does not correspond to \({\mathcal {S}}\). Similarly, there are 82 realizations of \({\tilde{w}}^\infty _n\) missing in the simplex on the upper center. By contrast, the simplex on the upper right contains all 100 realizations of \({\tilde{w}}^\infty _n\). The picture changes essentially on the lower part of Fig. 2, where it is assumed that \(\mu \) is known. In this case, we cannot find any realization of \({\tilde{w}}^\infty _n\) outside \({\mathcal {S}}\). Moreover, the large-sample approximation satisfactorily reproduces the finite-sample results that are depicted on the lower part of Fig. 1.
The problem is that the expected asset returns are unknown in real life. However, we can essentially improve the large-sample approximation by applying a finite-sample correction in order to guarantee that the realizations always belong to the simplex. We know from Theorem 12 that, if the sample size is large, \(\sqrt{n}\,\big ({\tilde{w}}^\star _n-{\tilde{w}}^\star \big )\) behaves essentially like the maximizer, \({\tilde{\varsigma }}^\star \), of \({\tilde{\varsigma }}\mapsto {\tilde{\varsigma }}'Z-\frac{1}{2}\,{\tilde{\varsigma }}'\Sigma \,{\tilde{\varsigma }}\) over the tangent cone of \({\mathcal {S}}\) at \({\tilde{w}}^\star \). Hence, since the sample size is not large enough, we may substitute \({\tilde{\varsigma }}^\star \) with
$$\begin{aligned} {\tilde{\varsigma }}^\star _n:=\arg \max _{{\tilde{\varsigma }}\in \sqrt{n}\,({\mathcal {S}}-{\tilde{w}}^\star )}\;{\tilde{\varsigma }}'Z-\frac{1}{2}\,{\tilde{\varsigma }}'\Sigma \,{\tilde{\varsigma }}. \end{aligned}$$^{Footnote 20}
The corrected version of \({\tilde{w}}^\infty _n\) reads \({\tilde{w}}^\diamond _n:={\tilde{w}}^\star +{\tilde{\varsigma }}^\star _n/\sqrt{n}\,\), which always belongs to the simplex.
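For \(N=2\) risky assets and no riskless asset, the finite-sample correction has a closed form, which the following Python sketch illustrates. It assumes that the maximization of \({\tilde{\varsigma }}'Z-\frac{1}{2}\,{\tilde{\varsigma }}'\Sigma \,{\tilde{\varsigma }}\) is restricted to \({\tilde{\varsigma }}\ge -\sqrt{n}\,{\tilde{w}}^\star \) and \({\varvec{1}}'{\tilde{\varsigma }}=0\), so that \({\tilde{\varsigma }}=(s,-s)\) and the problem reduces to clipping a one-dimensional maximizer. The function name `corrected_draw` and all numbers are hypothetical.

```python
import math


def corrected_draw(z, sigma, w_star, n):
    """Finite-sample corrected realization w_star + s_n / sqrt(n) for N = 2.

    z      : one realization of Z (length 2)
    sigma  : 2 x 2 matrix of second non-central moments
    w_star : risky part of the MVOP (weights sum to one)
    n      : sample size
    """
    # With s_vec = (s, -s) the objective becomes s*(z1 - z2) - 0.5*q*s^2.
    q = sigma[0][0] - 2.0 * sigma[0][1] + sigma[1][1]
    s_free = (z[0] - z[1]) / q
    root_n = math.sqrt(n)
    # s_vec >= -sqrt(n) * w_star translates into lower <= s <= upper.
    lower, upper = -root_n * w_star[0], root_n * w_star[1]
    s = min(upper, max(lower, s_free))
    return [w_star[0] + s / root_n, w_star[1] - s / root_n]


if __name__ == "__main__":
    sigma = [[1e-4, 0.0], [0.0, 1e-4]]  # hypothetical daily moments
    print(corrected_draw([0.01, -0.01], sigma, [0.5, 0.5], 250))   # hits the boundary
    print(corrected_draw([1e-4, 0.0], sigma, [0.5, 0.5], 250))     # interior solution
```

Because the objective is concave in s, clipping the unconstrained maximizer to the admissible interval yields the constrained maximizer, and the resulting portfolio lies in the simplex by construction.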
In order to verify that the finite-sample correction works fine, we may compare the empirical distribution functions of 10,000 realizations of \(w^\star _{1n}\) and \(w^\diamond _{1n}\), where \(\mu \) is assumed to be unknown. We still have only \(N=2\) risky assets and the parameterization is the same as before (see Eq. 3). The results are given in Fig. 3. Obviously, the finite-sample correction serves its purpose. Indeed, the corrected large-sample approximation is very accurate for all sample sizes.
Figure 3 reveals that most realizations of \({\tilde{w}}^\diamond _n\) are either (0, 1) or (1, 0) unless the sample size equals \(n=10^6\). The LOP corresponds to \({\tilde{w}}^\star =(0.5,0.5)\) and thus it is precisely in between (0, 1) and (1, 0). It seems that estimating the LOP is a mission impossible in real-life situations, at least without any prior information about \(\mu \). Table 1 contains the probability that the realization of the MVE is a single-asset portfolio for different numbers of assets (\(N=5,50,100,500,1000\)) and observations (\(n=250,2500,5000,10{,}000,10^6\)). The results are based on 1000 realizations of \({\tilde{w}}^\diamond _n\) for each combination of N and n. Note that the LOP always corresponds to the equally weighted portfolio, i.e., \({\tilde{w}}^\star ={\varvec{1}}/N\). The table shows that, in all practical applications, the MVE proposes a single-asset portfolio with high probability although the LOP is well-diversified. It is worth emphasizing that the results would not change essentially if we substitute the MVE with the BCRP, since these estimators for the LOP are almost identical.
Now, in principle, we are able to construct hypothesis tests and compute confidence regions. For example, we could try to apply a hypothesis test of the form \(H_0\!:{\tilde{w}}^\star ={\tilde{w}}^\star _0\) vs. \(H_1\!:{\tilde{w}}^\star \ne {\tilde{w}}^\star _0\) for any \({\tilde{w}}^\star _0\in {\mathcal {S}}\) even in the case of \(N>2\).^{Footnote 21} However, in the light of the previous results, we may doubt that any hypothesis test will ever lead to a rejection or that a confidence region will ever be sufficiently small in real-life situations. This conclusion might appear negative to the reader, but the author fears that this is the price we sometimes have to pay in science.
7 Conclusion
A quadratic approximation of log-returns works very well on a daily basis. Thus, in order to find the BCRP, we may focus on the MVE, which can easily be computed. The corresponding algorithm is very fast even if the number of dimensions is high and the results are even better compared to Cover’s algorithm for the BCRP. However, in most practical applications, we typically overestimate the expected out-of-sample performance of the MVE and even the performance of the MVOP. The same holds true for the expected out-of-sample log-return on the BCRP and the expected log-return on the LOP.
Both the BCRP and the MVE exist and are unique under mild regularity conditions. Moreover, they are strongly consistent. Analogously, both their out-of-sample performance measures and their in-sample performances converge to the performance of the LOP or the MVOP, respectively, as the number of observations grows to infinity. The given estimators for the LOP are even \(\sqrt{n}\)-consistent. In principle, the asymptotic results derived in this work can be used for constructing hypothesis tests and for computing confidence regions, but for this purpose one should apply a finite-sample correction, which substantially improves the large-sample approximation.
However, it turns out that the impact of estimation risk concerning \(\mu \) is tremendous in most real-life situations. Estimating the LOP without having any prediction power seems to be a futile undertaking. The estimators often lead to a single-asset portfolio even if the LOP corresponds to the equally weighted portfolio and thus is well-diversified. The given results confirm a general rule, which has become folklore during the last decades, namely that portfolio optimization typically fails on estimating expected asset returns.
Notes
Actually, it is assumed that \(n>N\) and so there should be no confusion.
See also Chapter 15 in Cover and Thomas (1991).
See Karatzas and Kardaras (2007) for a detailed analysis of the GOP.
Note that, after any local change, w must still belong to \({\mathcal {S}}\).
This follows from Cochran’s Theorem and represents a special result of Basu’s Theorem.
See also Example 19.8 in van der Vaart (1998).
Here, “\({\mathcal {P}}_\varepsilon \)” stands for “\(\varepsilon \)-periphery.” Note that it contains only those w with distance \(\varepsilon \) to \(w^*\) that belong to \({\mathcal {S}}\).
Here, “\(\rightsquigarrow \)” denotes convergence in distribution.
Remember that \({\mathbf {E}}\big (X_t/w^{*\prime }X_t\big )={\varvec{1}}\) holds true by construction.
We can imagine viewing \(w^*\!\in {\mathcal {S}}\) through a microscope and gradually increasing the magnification. The visible part of \({\mathcal {S}}\) around \(w^*\) converges to \({\mathcal {T}}_{{\mathcal {S}}}(w^*)\), i.e., \(\tau \big ({\mathcal {S}}-w^*\big )\rightarrow {\mathcal {T}}_{{\mathcal {S}}}(w^*)\), in the Painlevé-Kuratowski sense.
Remember that the risk-free interest rate is supposed to be zero without loss of generality.
Note that the domain of the objective function is \({\mathcal {S}}\) but its value is determined only by the “risky part” of w, i.e., \({\tilde{w}}\).
When using daily asset returns, the portfolio weights typically differ only from the fourth digit onward.
Some authors use the covariance matrix of R instead of \(\Sigma ={\mathbf {E}}(RR')\).
See also the explanations about the Central Limit Theorem regarding A10.
The derivation of B can be found in the “Appendix”.
Here, “\(\odot \)” denotes the Hadamard, i.e., componentwise, matrix product.
The computations are done on a Windows laptop with an Intel Core i7-5500U CPU (2.4 GHz).
The weight of the riskless asset is still bounded by \({\mathcal {S}}\) and thus \(w^\star _0=0\).
The constraint \({\tilde{\varsigma }}\in \!\sqrt{n}\,({\mathcal {S}}-{\tilde{w}}^\star )\) can simply be implemented, numerically, by setting \({\tilde{\varsigma }}\ge -\sqrt{n}\,{\tilde{w}}^\star \) and \({\varvec{1}}'{\tilde{\varsigma }}=0\).
Note that \({\tilde{w}}^\star _0\) is not the weight of the riskless asset but some portfolio of N risky assets.
References
Algoet P, Cover T (1988) Asymptotic optimality and asymptotic equipartition properties of log-optimum investment. Ann Probab 16:876–898
Bell R, Cover T (1980) Competitive optimality of logarithmic investment. Math Oper Res 5:161–166
Bradley R (2005) Basic properties of strong mixing conditions. A survey and some open questions. Probab Surv 2:107–144
Breiman L (1961) Optimal gambling systems for favorable games. In: Proceedings of the 4th Berkeley symposium on mathematical statistics and probability, pp 63–68
Christensen M (2005) On the history of the growth optimal portfolio, Technical report, University of Southern Denmark
Cover T (1984) An algorithm for maximizing expected log investment return. IEEE Trans Inf Theory IT-30:369–373
Cover T, Thomas J (1991) Elements of information theory. Wiley, New York
Davidson J (1994) Stochastic limit theory. Oxford University Press, Oxford
Frahm G (2015) A theoretical foundation of portfolio resampling. Theory Decis 79:107–132
Frahm G (2016) Pricing and valuation under the real-world measure. Int J Theor Appl Finance. https://doi.org/10.1142/S0219024916500060
Frahm G (2018) Arbitrage pricing theory in ergodic markets. Int J Theor Appl Finance. https://doi.org/10.1142/S021902491850036X
Frahm G, Memmel C (2010) Dominating estimators for minimum variance portfolios. J Econom 159:289–302
Geyer C (1994) On the asymptotics of constrained M-estimation. Ann Stat 22:1993–2010
Hakansson N (1971) Capital growth and the mean-variance approach to portfolio selection. J Financ Quant Anal 6:517–557
Kan R, Zhou G (2007) Optimal portfolio choice with parameter uncertainty. J Financ Quant Anal 42:621–656
Karatzas I, Kardaras C (2007) The numéraire portfolio in semimartingale financial models. Finance Stoch 11:447–493
Kelly J (1956) A new interpretation of information rate. Bell Syst Tech J 35:917–926
MacLean L, Thorp E, Ziemba W (eds) (2011) The Kelly capital growth investment criterion. World Scientific, Singapore
Magnus J, Neudecker H (1979) The commutation matrix: some properties and applications. Ann Stat 7:381–394
Markowitz H, Usmen N (2003) Resampled frontiers versus diffuse Bayes. J Invest Manag 1:1–17
Memmel C (2004) Schätzrisiken in der Portfoliotheorie, PhD thesis, University of Cologne
Merton R, Samuelson P (1974) Fallacy of the log-normal approximation to optimal portfolio decision-making over many periods. J Financ Econ 1:67–94
Neudecker H (1969) Some theorems on matrix differentiation with special reference to Kronecker matrix products. J Am Stat Assoc 64:953–963
Platen E, Heath D (2006) A benchmark approach to quantitative finance. Springer, Berlin
Politis D (2003) The impact of bootstrap methods on time series analysis. Stat Sci 18:219–230
van der Vaart A (1998) Asymptotic statistics. Cambridge University Press, Cambridge
Acknowledgements
Open Access funding provided by Projekt DEAL.
The asymptotic covariance matrix
Here, the asymptotic covariance matrix B, which occurs in Sect. 5.2.2, is derived. We assume that \(R_1,R_2,\ldots \) are serially independent and normally distributed. Write
where \(\Gamma \) is the covariance matrix of R. The empirical covariance matrix of \(R_1,R_2,\ldots ,R_n\) is
Thus, we obtain
Note that
where \(\sqrt{n}\,\big (\mu _n-\mu \big )\big (\mu _n-\mu \big )'\) vanishes (in probability) as \(n\rightarrow \infty \) and
Thus, we conclude that
where \(\sqrt{n}\,\big (\mu _n-\mu \big )\) and \(\sqrt{n}\,\big (\Gamma _n-\Gamma \big )\) are asymptotically independent. Moreover, the given terms converge to a joint normal distribution. The asymptotic covariance matrix of \(\sqrt{n}\,{\text{ vec }}\big (\Gamma _n-\Gamma \big )\) is \(\big ({\mathbf {I}}_{N^2}+K_{N^2}\big )\big (\Gamma \otimes \Gamma \big )\), where the vec operator stacks the columns of a matrix on top of one another, \({\mathbf {I}}_{N^2}\) is the \(N^2\times N^2\) identity matrix, \(K_{N^2}\) is the \(N^2\times N^2\) commutation matrix, and “\(\otimes \)” denotes the Kronecker matrix product. According to Magnus and Neudecker (1979, Eq. 2.1) we have that
which means that
is the asymptotic covariance matrix of \(\sqrt{n}\,\big (\Gamma _n-\Gamma \big ){\tilde{w}}^\star \). Due to Neudecker (1969, Eq. 2.2) we obtain
and from Theorem 3.1 in Magnus and Neudecker (1979) it follows that
Hence, the asymptotic covariance matrix of \(\sqrt{n}\,\big (\Gamma _n-\Gamma \big ){\tilde{w}}^\star \) is
It remains to calculate the asymptotic covariance matrix of
The asymptotic covariance matrix of \(\sqrt{n}\,(\mu _n-\mu )\) is \(\Gamma \), which leads to
Thus, the asymptotic covariance matrix of \(\sqrt{n}\,\big (\Sigma _n-\Sigma \big )\) is
which quantifies the estimation risk if the parameter \(\mu \) were known to the investor. However, in real life the expected asset returns are unknown and so B equals the asymptotic covariance matrix of
which can be rewritten as
By using the above arguments we conclude that
Now, the reader can verify that the impact of estimating the expected asset returns is
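The commutation-matrix step of this derivation can be checked numerically. The following is a minimal sketch under stated assumptions: the function `commutation_matrix`, the randomly generated positive-definite matrix `Gamma`, and the weight vector `w` (standing in for \({\tilde{w}}^\star \)) are hypothetical and serve illustration only. It builds \(K_{N^2}\), forms \(\big ({\mathbf {I}}_{N^2}+K_{N^2}\big )\big (\Gamma \otimes \Gamma \big )\), and maps it to the covariance of \(\big (\Gamma _n-\Gamma \big )w\) via \({\text{vec}}(Mw)=(w'\otimes {\mathbf {I}}_N)\,{\text{vec}}(M)\). The compact form \((w'\Gamma w)\,\Gamma +\Gamma ww'\Gamma \) used for comparison follows from standard Kronecker identities; it is our own simplification and not necessarily the expression displayed in the original derivation.

```python
import numpy as np

def commutation_matrix(m, n):
    """Return K with K @ vec(A) = vec(A.T) for any m x n matrix A,
    where vec stacks the columns of a matrix (column-major order)."""
    K = np.zeros((m * n, m * n))
    for i in range(m):
        for j in range(n):
            # A[i, j] sits at position j*m + i in vec(A)
            # and at position i*n + j in vec(A.T).
            K[i * n + j, j * m + i] = 1.0
    return K

rng = np.random.default_rng(1)
N = 4

# Hypothetical positive-definite covariance matrix and portfolio weights.
A = rng.normal(size=(N, N))
Gamma = A @ A.T + N * np.eye(N)
w = rng.dirichlet(np.ones(N))

K = commutation_matrix(N, N)

# Asymptotic covariance matrix of sqrt(n) * vec(Gamma_n - Gamma).
V = (np.eye(N * N) + K) @ np.kron(Gamma, Gamma)

# J = (w' kron I_N), so that J @ vec(M) = M @ w for any N x N matrix M.
J = np.kron(w[None, :], np.eye(N))

lhs = J @ V @ J.T
rhs = (w @ Gamma @ w) * Gamma + np.outer(Gamma @ w, Gamma @ w)

print(np.allclose(lhs, rhs))
```

The check confirms that the \(N^2\times N^2\) expression collapses to an \(N\times N\) matrix that can be computed without ever forming a Kronecker product, which matters when the number of assets is large.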
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Frahm, G. Statistical properties of estimators for the log-optimal portfolio. Math Meth Oper Res 92, 1–32 (2020). https://doi.org/10.1007/s00186-020-00701-1
DOI: https://doi.org/10.1007/s00186-020-00701-1
Keywords
 Best constant rebalanced portfolio
 Estimation risk
 Growth-optimal portfolio
 Log-optimal portfolio
 Mean-variance optimization
JEL Classification
 C13
 G11