Signal-to-noise matrix and model reduction in continuous-time hidden Markov models

Continuous-time regime-switching models are a very popular class of models for financial applications. In this work the so-called signal-to-noise matrix is introduced for hidden Markov models where the switching is driven by an unobservable Markov chain. Its relations to filtering, i.e. state estimation of the chain given the available observations, and portfolio optimization are investigated. A convergence result for the filter is derived: The filter converges to its invariant distribution if the eigenvalues of the signal-to-noise matrix converge to zero. This matrix is then also used to prove a mutual fund representation for regime-switching models and a corresponding market reduction which is consistent with filtering and portfolio optimization. Two canonical cases for the reduction are analyzed in more detail, the first based on the market regimes and the second depending on the eigenvalues. These considerations are presented both for observable and unobservable Markov chains. The results are illustrated by numerical simulations.


Introduction
Regime-switching models are a very popular class of models in the field of mathematical finance. They describe return processes with time-changing drift or volatility parameters. Thus, they are a possible way to generalize the classical Black-Scholes lognormal stock price model by making the parameters dependent on a Markov chain with finitely many states. The switching parameters allow for flexible and realistic fits to observed market data. In the continuous-time model with switching volatility, the underlying Markov chain is observable (in theory) due to this stochastic volatility and no estimation (filtering) of it is needed. We call this model the Markov-switching model (MSM). In the model with constant volatility one has to filter for the underlying hidden Markov chain, i.e. to compute the conditional probability for the underlying state given the observed stock returns. Therefore, this model is called the hidden Markov model (HMM). The filtering problem was solved in the continuous-time model in Wonham (1965) and Elliott (1993). It was discretized and robustified in James et al. (1996) in the sense of Clark (1978), being consistent with filtering in discrete time as in Hamilton (1989). Portfolio optimization in the continuous-time HMM covering logarithmic and power utility was solved e.g. by Sass and Haussmann (2004) using Malliavin calculus and by Bäuerle and Rieder (2005) following an HJB approach. Also BSDE methods can be applied to portfolio optimization under partial information, see Papanicolaou (2019).
In portfolio optimization, estimating parameters or the true value of a system from "noisy" real-world observations can lead to poor portfolio performance already in the one-period model, e.g. when constructing a minimum-variance portfolio or applying mean-variance optimization. This is especially true for markets with a large number of assets (DeMiguel et al. 2007). There is an extensive literature on how to approach such estimation problems, mainly in the one-period model. One may cluster according to the correlations and invest, in the spirit of DeMiguel et al. (2007), with equal weights in the representatives (e.g. Sass and Thös 2022). Alternatively, as in Zhao et al. (2019), one may split the eigenvectors of the covariance matrix into well-estimated and poorly-estimated ones and use these to construct portfolios. Chen and Yuan (2016) restrict the investment in the mean-variance analysis to a subspace, spanned e.g. by the leading eigenvectors of the covariance matrix. Avellaneda et al. (2021), Avellaneda and Lee (2010) and Boyle (2014) also use a principal component analysis of the correlation matrix, leading to so-called eigenportfolios (see Remark 5.12 for how this relates to our setting).
Mutual funds are a well-known concept from classical finance and financial mathematics and have been studied for a long time. Mutual fund separation theorems imply that it is optimal to trade in a portfolio that is a linear combination of single assets. The separation can e.g. be studied in the context of mean-variance analysis (Tobin 1958; Merton 1972) or using an expected utility setting (Cass and Stiglitz 1970; Schachermayer et al. 2009). The approach of Chamberlain (1988) uses martingale representation theory. General applicability of the Mutual Fund Theorem was discussed in Schachermayer et al. (2009) for classes of utility functions under some completeness condition. There, the mutual fund depends on time and there is only one fund of risky assets next to the riskfree bond.
In this work we bring both concepts, regime-switching models and mutual funds, together by introducing a reduced market model where one can trade in several mutual funds. In this reduced market we derive a result which provides a representation by as many mutual funds as we have market regimes. Each of these mutual funds would correspond to the classical mutual fund separation theorem if we were in a model with static parameters corresponding to that regime. The reduced and the original model are connected by what we call the signal-to-noise matrix. It depends on the parameters of the HMM, namely it relates the states of the drift to the volatility. We prove that this matrix plays a central role in both portfolio optimization and filtering. Its name is motivated by the expression "signal-to-noise ratio" known from classical filtering theory and signal analysis. As an indicator of how much information about the signal can be present in the observation, the signal-to-noise matrix is also connected to the possible performance of the filter. The invariant distribution of the underlying Markov chain is always of significance for the filter as well, since it is approximately its expected value and also often chosen as the starting value.
We prove that if the signal-to-noise matrix vanishes in terms of its eigenvalues, the filter indeed converges to this invariant distribution. The convergence result implies that in this case the observations no longer contain significant information about the signal. We then derive a market reduction from the signal-to-noise matrix that depends on a condition on its eigenvalues. We establish the equivalence of the reduced market to the original model and the equivalence of the filters, depending on this condition. The funds are linear combinations of the original assets and do not change their composition over time. We also provide an explicit calculation for the composition of the funds. We show that portfolio optimization in the reduced market leads to an optimal wealth process that is identical to the optimal wealth in the original model. Furthermore, we consider two canonical cases for the reduced model: The first is the so-called "reduced regime representation model" (RRRM), where it turns out to be log-optimal to invest portions of wealth into the funds according to the filtered state probabilities. The second case is the "eigenvalue representation" (REVM), which is based on a principal component analysis of the signal-to-noise matrix. In this setting, the log-optimal strategy not only depends on the filter, but is also scaled by the eigenvalues of the signal-to-noise matrix. If the signal-to-noise matrix is singular, the market reduction looks different. Then, we still arrive at an optimal terminal wealth that is identical to the original model, but the optimal strategy depends on the original observations. The theoretical results are underlined by a numerical evaluation of the optimization problem. The simulations suggest that the investor finds herself in the classical dilemma of risk versus gain. Again the signal-to-noise matrix assigning the funds is of great importance: controlling its values decides in which direction the portfolio is led.
To summarize, in this work we prove a mutual fund representation and market reduction that is derived from the so-called signal-to-noise matrix. We investigate the relation of this matrix to filtering and portfolio optimization and prove convergence of the filter to the invariant distribution for vanishing eigenvalues. We present two canonical cases for the market reduction, the regime representation and the eigenvalue representation. For the latter case, a distinction is made depending on the number of non-zero eigenvalues of the signal-to-noise matrix. Furthermore, we present numerical simulations to illustrate our results. The main innovation is to base our analysis on the signal-to-noise matrix which is more informative in the filtering setting than the correlation matrix. This allows for new and more explicit results than focusing on the risk premium (cf. our discussion at the end of Sect. 4.1).
This work is organized as follows: In Sects. 2 and 3 we collect well-known results on filtering and portfolio optimization in regime-switching models. In Sect. 4 we formally introduce the signal-to-noise matrix and prove that a vanishing signal-to-noise matrix leads to convergence of the filter to the invariant distribution. In Sect. 5 we use a decomposition of the signal-to-noise matrix to obtain one of the main results of this work: A mutual fund theorem and corresponding model reduction. The effects of these model reductions on both portfolio optimization and filtering are investigated. As it turns out, the case of a singular signal-to-noise matrix has to be handled separately, which we do in Sect. 5.4. In Sect. 5.5 we then discuss corresponding results for the MSM and an HMM with non-constant volatility which can be seen as a model that lies between MSM and HMM. The conclusion in Sect. 6 summarizes our contributions.
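In the following, we will use some abbreviations and acronyms, in particular HMM (hidden Markov model), MSM (Markov-switching model), RRRM (reduced regime representation model) and REVM (reduced eigenvalue model).

We consider an $n$-dimensional return process $R$ driven by an $n$-dimensional Brownian motion $W$ and a continuous-time Markov chain $Y$ with states $e_1, \dots, e_d$, the unit vectors in $\mathbb{R}^d$,

$$dR_t = \mu_t\,dt + \sigma_t\,dW_t, \qquad \mu_t = B Y_t, \qquad \sigma_t = \sigma(Y_t), \tag{2.1}$$

where $B \in \mathbb{R}^{n \times d}$ is any matrix while the matrices $\sigma(e_k) \in \mathbb{R}^{n \times n}$ are supposed to be non-singular. A popular intuition behind $Y$ is that it reflects the current underlying state of the economy. To describe its dynamics we additionally need the rate matrix $Q \in \mathbb{R}^{d \times d}$, for which the negative of the diagonal element, $-Q_{kk}$, provides the exponential rate for leaving state $e_k$, $k = 1, \dots, d$, and the ratio $-Q_{kl}/Q_{kk}$ is the transition probability from $e_k$ to $e_l$, $l \neq k$, if the chain jumps at all. We assume that the chain is irreducible. Therefore, under our condition of having finitely many states, a unique stationary distribution $\nu$ exists and is given by $\nu^\top Q = 0$, $\nu^\top \mathbf{1} = 1$.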
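The stationary distribution can be computed numerically from the two conditions $\nu^\top Q = 0$ and $\nu^\top \mathbf{1} = 1$. The following is a minimal sketch in Python with NumPy; the function name and the 3-state rate matrix are hypothetical illustrations and are not taken from the paper.

```python
import numpy as np

def stationary_distribution(Q):
    """Solve nu^T Q = 0 together with nu^T 1 = 1 for an irreducible rate matrix Q."""
    d = Q.shape[0]
    # Stack the balance equations Q^T nu = 0 with the normalization 1^T nu = 1.
    lhs = np.vstack([Q.T, np.ones((1, d))])
    rhs = np.concatenate([np.zeros(d), [1.0]])
    # The stacked system is consistent, so least squares returns its exact solution.
    nu, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)
    return nu

# Hypothetical 3-state rate matrix (rows sum to zero); an assumption for illustration.
Q_example = np.array([[-2.0, 1.0, 1.0],
                      [1.0, -1.5, 0.5],
                      [0.5, 0.5, -1.0]])
nu_example = stationary_distribution(Q_example)
```

Solving the stacked system avoids having to pick which balance equation to drop; for larger chains an eigenvector computation for the eigenvalue 0 of $Q^\top$ would be an alternative.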
The prices of the $n$ stocks are then given by $dS_t = \mathrm{Diag}(S_t)\,dR_t$, where $\mathrm{Diag}(y)$ denotes the diagonal matrix with diagonal $y$. Further, there is a money market account for which we assume interest rate $r = 0$ to keep notation simple. The results for filtering and for portfolio optimization below can be adapted to non-zero $r$ easily.

MSM and HMM
In a suitable probability space $(\Omega, \mathcal{A}, P)$ for the model above we shall distinguish two filtrations. On the one hand we have $\mathcal{F} = (\mathcal{F}_t)_{t \in [0,T]}$, which is generated by $Y$ and $W$, and augmented by the null sets. This corresponds to full information. For convenience we assume $\mathcal{A} = \mathcal{F}_T$. On the other hand, in real-world applications an investor typically can only rely on the observed stock prices or stock returns (this is equivalent in this model). Her information thus is given by $\mathcal{F}^R = (\mathcal{F}^R_t)_{t \in [0,T]}$, which is the filtration generated by $R$, again augmented by the null sets. We say that an investor with information $\mathcal{F}^R$ has partial information.
Let us denote $\Sigma(e_i) = \sigma(e_i)\sigma(e_i)^\top$. In the continuous-time regime switching model, $\Sigma(Y_t)$ can in theory be observed from the quadratic covariation of the stock returns and thus one can observe the jumps of the underlying chain if the matrices $\Sigma(e_i)$, $i = 1, \dots, d$, are pairwise different. Then, in theory, an investor with partial information has in fact full information, since $Y_t$ can be obtained from $\Sigma(Y_t)$. For a full discussion and more details, see Elliott et al. (2008), Krishnamurthy et al. (2018). Note that in reality with discrete-time observations of the continuous process this would not be true. Nevertheless, in the theory we have to distinguish the following cases: the MSM, in which the matrices $\Sigma(e_i)$ are pairwise different, and the HMM, in which $\sigma(e_i) = \sigma$ is the same for all states. MSM and HMM are the extreme cases. For settings in which only some of the $\Sigma(e_i)$ agree, the subsequent results can be adapted.

As discussed before, in the MSM an investor has full observation and thus she knows $Y_t$ and hence $\mu_t = BY_t$ and $\Sigma_t = \Sigma(Y_t)$ at $t$. An investor with partial information has to estimate the underlying drift. The usual approach is to use the $L^2$-optimal estimate for $\mu_t$ at $t$. This quantity is called the filter and is defined by

$$\hat\mu_t = E[\mu_t \mid \mathcal{F}^R_t] = B\hat Y_t, \tag{2.2}$$

where $\hat Y_t = E[Y_t \mid \mathcal{F}^R_t]$ is the well-known continuous-time Wonham filter for $Y_t$ (Wonham 1965), see Sect. 2.3 below. Up to the beginning of Sect. 5.5 we shall concentrate on the HMM since filtering issues play an important role in our considerations.

Filtering in the HMM
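We consider the HMM $dR_t = \mu_t\,dt + \sigma\,dW_t$, $\mu_t = BY_t$, i.e. $\Sigma = \sigma\sigma^\top$ is constant and the filtering problem is non-trivial. To find the filter $\hat Y_t = E[Y_t \mid \mathcal{F}^R_t]$, which yields $\hat\mu_t$ by (2.2), we can use a change of measure to $\tilde P \sim P$ with Radon-Nikodym derivative

$$\frac{d\tilde P}{dP} = Z_T, \qquad Z_t = \exp\Big(-\int_0^t (\sigma^{-1}BY_s)^\top dW_s - \frac{1}{2}\int_0^t \|\sigma^{-1}BY_s\|^2\,ds\Big). \tag{2.3}$$

Under $\tilde P$, $\tilde W_t = \sigma^{-1}R_t$ is a Brownian motion independent of $Y$. $\tilde P$ is called reference measure in filtering and for interest rate 0 it corresponds to the risk neutral or equivalent martingale measure in finance, see e.g. Elliott (1993). By $\tilde E$ we denote the expectation under $\tilde P$. This reference measure is used to introduce the unnormalized filter $\rho_t := \tilde E[Z_t^{-1} Y_t \mid \mathcal{F}^R_t]$, which satisfies the Zakai equation (Elliott 1993)

$$d\rho_t = Q^\top \rho_t\,dt + \mathrm{Diag}(\rho_t)\,B^\top\Sigma^{-1}\,dR_t. \tag{2.4}$$

The Zakai equation is linear in $\rho_t$ and driven by the observations. Using Bayes' formula,

$$E[Z_t \mid \mathcal{F}^R_t] = \big(\tilde E[Z_t^{-1} \mid \mathcal{F}^R_t]\big)^{-1}. \tag{2.5}$$

By the definition of $\rho_t$ this implies $E[Z_t \mid \mathcal{F}^R_t]^{-1} = \mathbf{1}^\top\rho_t$ and thus by (2.4) and by $Q\mathbf{1} = 0$ we get

$$d(\mathbf{1}^\top\rho_t) = \rho_t^\top B^\top\Sigma^{-1}\,dR_t. \tag{2.6}$$

Using (2.5), Bayes' formula for conditional expectations, also called Kallianpur-Striebel formula in this context, reads

$$\hat Y_t = \frac{\rho_t}{\mathbf{1}^\top\rho_t}. \tag{2.7}$$

This implies that knowing $\rho_t$, the filter $\hat Y_t$ can be calculated directly. Thus, in filtering, one typically tries to compute $\rho_t$. By (2.4) and (2.6) and applying Itô's formula to (2.7) we get the Kushner-Stratonovich equation

$$d\hat Y_t = Q^\top\hat Y_t\,dt + \big(\mathrm{Diag}(\hat Y_t) - \hat Y_t\hat Y_t^\top\big)B^\top\Sigma^{-1}\big(dR_t - B\hat Y_t\,dt\big). \tag{2.8}$$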
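To make the filtering recursion concrete, the following Python sketch simulates a small HMM on a time grid and updates the state probabilities step by step. It is a simple discrete-time approximation (a Gaussian likelihood correction followed by a first-order transition step with $I + Q\,\Delta t$), assumed here for illustration only; it is not the robust discretization of James et al. (1996). All function names and parameter values are hypothetical.

```python
import numpy as np

def filter_step(y_hat, dR, Q, B, Sigma_inv, dt):
    """One approximate update of the normalized filter from a return increment dR."""
    G = B.T @ Sigma_inv                        # d x n matrix B^T Sigma^{-1}
    a_diag = np.einsum("ij,ji->i", G, B)       # diagonal of B^T Sigma^{-1} B
    # Gaussian likelihood of dR under each state's drift, up to a common factor.
    weights = np.exp(G @ dR - 0.5 * a_diag * dt)
    corrected = y_hat * weights
    # First-order transition step: apply (I + Q dt)^T to the corrected weights.
    predicted = corrected + dt * (Q.T @ corrected)
    return predicted / predicted.sum()

def simulate_and_filter(Q, B, sigma, y0, T=1.0, steps=1000, seed=0):
    """Simulate chain and returns on a grid and run the approximate filter."""
    rng = np.random.default_rng(seed)
    n, d = B.shape
    dt = T / steps
    Sigma_inv = np.linalg.inv(sigma @ sigma.T)
    P = np.eye(d) + Q * dt                     # one-step transition matrix (needs small dt)
    state = rng.choice(d, p=y0)
    y_hat = np.array(y0, dtype=float)
    path = np.empty((steps + 1, d))
    path[0] = y_hat
    for k in range(steps):
        dR = B[:, state] * dt + sigma @ rng.standard_normal(n) * np.sqrt(dt)
        y_hat = filter_step(y_hat, dR, Q, B, Sigma_inv, dt)
        path[k + 1] = y_hat
        state = rng.choice(d, p=P[state])
    return path

# Hypothetical parameters: one asset, two regimes.
Q_demo = np.array([[-1.0, 1.0], [2.0, -2.0]])
B_demo = np.array([[0.2, -0.1]])
sigma_demo = np.array([[0.2]])
filter_path = simulate_and_filter(Q_demo, B_demo, sigma_demo, y0=[0.5, 0.5])
```

Each row of `filter_path` is a probability vector approximating $\hat Y_t$ on the grid. Normalizing after every step plays the role of (2.7) and keeps the recursion numerically stable.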

Trading and portfolio optimization
Remember that we consider one money market account with interest rate 0 and $n$ stocks with returns $dR_t = \mu_t\,dt + \sigma_t\,dW_t$, where $\sigma_t$ is switching with $Y_t$ in the MSM and constant in the HMM. We may also allow for suitable $\mathcal{F}^R$-adapted volatility processes as we use them in Sect. 5.5. We set $\Sigma_t = \sigma_t\sigma_t^\top$. In all cases, the trading strategy of an investor can be described by her initial capital $x_0 > 0$ and the risky fraction process $\pi = (\pi_t)_{t \in [0,T]}$ if the wealth stays strictly positive (which is the case for the utility functions we will consider). The wealth process $(X^\pi_t)_{t \in [0,T]}$ then follows

$$dX^\pi_t = X^\pi_t\,\pi_t^\top dR_t, \qquad X^\pi_0 = x_0. \tag{3.1}$$

So $\pi^i_t$ denotes the fraction of wealth $X_t$ invested in stock $i$. For given $x_0 > 0$ the admissible $\pi$ are required to be $\mathcal{F}^R$-adapted; in particular, an investor can at time $t$ only use the information $\mathcal{F}^R_t$ obtained from observing the stock returns. However, for the MSM this is equivalent to having full information while for the HMM this is a case with strictly partial information, cf. the discussion in Sect. 2.2.
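Note that (3.1) has the explicit solution

$$X^\pi_t = x_0 \exp\Big(\int_0^t \pi_s^\top dR_s - \frac{1}{2}\int_0^t \pi_s^\top\Sigma_s\pi_s\,ds\Big).$$

We evaluate the terminal wealth by a utility function $U$. We will focus on power and logarithmic utility functions, $U_\alpha(x) = x^\alpha/\alpha$ for $\alpha < 1$, $\alpha \neq 0$, and $U_0 = \log$, which are utility functions with constant relative risk aversion (CRRA).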
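The problem of maximizing expected utility of terminal wealth then is:

$$\text{Maximize } E[U(X^\pi_T)] \text{ over all admissible } \pi \text{ with } E[U^-(X^\pi_T)] < \infty, \tag{3.2}$$

i.e. over the set of risky fraction processes admissible for $U$, with $U^-$ its negative part. The problem (3.2) has been solved in Sass and Haussmann (2004) for the HMM with partial information. For general $U$, it is quite straightforward to show that the optimal terminal wealth has the form $X^*_T = I(\lambda\,E[Z_T \mid \mathcal{F}^R_T])$, where $I = (U')^{-1}$ and $\lambda > 0$ is determined by $E[Z_T X^*_T] = x_0$. The difficulty lies then in finding the strategy as explicitly as possible. We cite the following result.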
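Theorem 3.1 In the HMM for $U = U_\alpha$, $\alpha < 1$, the optimal risky fraction $\pi^*_t$ can be expressed through conditional expectations given $\mathcal{F}^R_t$ that depend on the unnormalized filter $\rho_t$. In particular, $\pi^*_t = \Sigma^{-1}B\hat Y_t$ for $U = U_0 = \log$.

Proof This is a special case of Theorem 4.5 in Sass and Haussmann (2004) which uses the linearity of the Zakai equation (2.4) in order to show existence of the Malliavin derivative of $X^*_T$, cf. Corollaries 4.8, 4.9 and Proposition 4.10 in Sass and Haussmann (2004).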
Note that the result also states that ρ t is a sufficient statistic to compute the conditional expectations. This is due to the fact that ρ t satisfies the stochastic differential equation (2.4) which is driven by the observations and allows to derive the corresponding Markov property and thus to simplify the initial conditional expectation given F R t . Practically it means that the strategy can be computed efficiently from the unnormalized filter ρ t .
For the MSM, Bäuerle and Rieder (2004, Theorems 2 and 3) provide optimal policies for CRRA utility functions:

Theorem 3.2 In the MSM, for $U = U_\alpha$, $\alpha < 1$, the optimal risky fraction is $\pi^*_t = \frac{1}{1-\alpha}\Sigma_t^{-1}\mu_t = \frac{1}{1-\alpha}\Sigma(Y_t)^{-1}BY_t$.

In the following we concentrate on the HMM since it involves the filtering problem, and discuss the MSM afterwards in Sect. 5.5 again.

Signal-to-noise matrix and convergence
The main idea for the model reduction in Sect. 5 is the observation that filter and portfolio optimization essentially depend on a lower-dimensional matrix which we introduce in Sect. 4.1 and whose influence we illustrate by a convergence result in Sect. 4.2.

Signal-to-noise matrix
In the fundamental results on filtering and portfolio optimization as presented in Sects. 2.3 and 3, for the $n$-dimensional HMM $dR_t = BY_t\,dt + \sigma\,dW_t$, the dependency on $\sigma$ and $B$ is only via the signal-to-noise matrix $A$ or its "root" $\sigma^{-1}B$,

$$A := B^\top\Sigma^{-1}B = (\sigma^{-1}B)^\top(\sigma^{-1}B) \in \mathbb{R}^{d \times d}. \tag{4.1}$$

This allows for a reduction of the model dimension in case $d < n$. Before we introduce this in Sect. 5, we first show in the following section that by decreasing the signal-to-noise matrix, we end up with a trivial filter which corresponds to having no information at all. This motivates our name for $A$ and underlines the intuition that the relation between drift and volatility parameters is decisive for the performance of the filter. Note that $\sigma^{-1}BY_t$ is the market price of risk or the risk premium, and $Y_t^\top AY_t$ (or their counterparts using the filter $\hat Y$ instead of $Y$) may be called the risk premium function. This quantity plays a prominent role in financial applications, e.g. in the analysis of robust continuous-time mean-variance problems, cf. Pham et al. (2022). However, since the filter $\hat Y$ depends on $A$, we can get more explicit structural results in our filtering setting by concentrating on $A$ instead of the risk premium.
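As a small numerical illustration, the signal-to-noise matrix and its "root" can be computed as follows, assuming $A = B^\top\Sigma^{-1}B$ with $\Sigma = \sigma\sigma^\top$. This is only a sketch; the function name and the parameter values for $n = 4$, $d = 3$ are hypothetical and not those used in the paper's examples.

```python
import numpy as np

def signal_to_noise(B, sigma):
    """Return A = B^T (sigma sigma^T)^{-1} B and its root sigma^{-1} B."""
    root = np.linalg.solve(sigma, B)     # sigma^{-1} B, shape n x d
    A = root.T @ root                    # d x d, symmetric positive semi-definite
    return A, root

# Hypothetical parameters: n = 4 assets, d = 3 regimes (columns of B are the drift states).
B_ex = np.array([[0.10, 0.02, -0.05],
                 [0.08, 0.00, -0.10],
                 [0.12, 0.03, -0.02],
                 [0.05, 0.01, -0.08]])
sigma_ex = 0.2 * np.eye(4) + 0.05 * np.ones((4, 4))
A_ex, root_ex = signal_to_noise(B_ex, sigma_ex)
```

Working with the root via a linear solve avoids forming $\Sigma^{-1}$ explicitly, and the resulting $A$ is only $d \times d$ regardless of the number of assets.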

Convergence of the filter
As pointed out in Sect. 4.1, in (2.4) the dependence of the filter on σ and B is only indirectly through the matrix A. In the following we want to study the influence of changes in A on the behaviour of the filter. Intuitively, A describes a proportion between volatility and drift in the observations, and indirectly between volatility and Markov chain Y . This relation can also be seen as an indicator for how much information is present in the observation. Thus, in the following numerical example we consider a setting where the eigenvalues of A tend to zero and see how the average performance of the filter changes.

Example 4.1
We consider a sequence of 1-dimensional HMMs, where the eigenvalues of $A$ decrease. For all iterations of the model, we choose the same rate matrix $Q$ with invariant distribution $\nu$ and the same state matrix $B$ as later on in Example 5.19. The decrease in $A$ is achieved by increasing the volatility, i.e. we choose a sequence of volatilities $\sigma_n = 0.2 \cdot n$. The expected squared distance between the filter and the invariant distribution integrated over time, $E[\int_0^T \|\hat Y_t - \nu\|^2\,dt]$, is plotted in Fig. 1. We clearly see that for parameter choices where the largest eigenvalue of $A$ is close to 0, the filter $\hat Y_t$ is close to the vector $\nu$ of the invariant distribution, where closeness to $\nu$ is not in a distributional sense but in $L^2(\mathbb{R}^d)$-distance. Since $E[\hat Y_t] = \nu$, this means the filter does not contain much information about the true state of $Y$.
If the eigenvalues of A tend to 0, we can imagine that the volatility dominates the drifts, so the information about Y encoded in the returns is overlaid by too much noise. We have less information compared to a model where the ratio of drift to volatility is higher. Note that on the one hand, in the filtering equations we need σ to move away from the non-informative starting value of the invariant distribution in the first place to learn dynamically. On the other hand, as we see it here, "too much" σ compared to B means of course losing information.
We will formalize this intuition by proving that the distances between the filters and $\nu$ in a sequence of models converge to 0 if the eigenvalues of the corresponding $A$ converge to 0. We first prove a stability result for SDEs using Doob's martingale inequality and Gronwall's lemma and then apply this result to the SDE of the normalized filter $\hat Y$. For the detailed proof see Appendix A.1.
Theorem 4.2 Let $X^n$, $X$ be $d$-dimensional processes bounded by 1 and satisfying

Now we consider a series of parameters $\sigma^m$, $B^m$ giving rise to a series of HMMs $R^m$ with respect to the same Markov chain and Brownian motion, $dR^m_t = B^mY_t\,dt + \sigma^m\,dW_t$. Recall that the corresponding normalized filters $\hat Y^m$ are then given by the Kushner-Stratonovich equation (2.8) with $B$, $\sigma$ and $R$ replaced by $B^m$, $\sigma^m$ and $R^m$. Our aim is to prove the convergence of $\hat Y^m_t$ for "too much" volatility. Using Theorem 4.2 we can show that this "too much" is governed by the behaviour of the eigenvalues of $A^m$. This is exactly what we have seen in Example 4.1. For the proof see Appendix A.2.

Theorem 4.3 Consider the series of HMMs $R^m$ as above and let $\lambda_m = (\lambda_{\max}(A^m))^{1/2}$ be the sequence of square roots of the largest eigenvalue of $A^m$. Assume that $\lim_{m\to\infty}\lambda_m = 0$. Then we have for all $t$ that

$$\lim_{m\to\infty} E\big[\|\hat Y^m_t - \nu\|^2\big] = 0,$$

that is, for all $t > 0$ the sequence of normalized filters converges in $L^2$ to the probability vector of the invariant distribution.
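The quantity $\lambda_m$ that drives the convergence is easy to track numerically. The sketch below does this for a sequence of one-asset models with volatilities $\sigma_m = 0.2 \cdot m$ as in Example 4.1, assuming $A^m = (B^m)^\top(\Sigma^m)^{-1}B^m$. The state matrix `B_row` is a hypothetical choice, since the values of Example 5.19 are not reproduced here.

```python
import numpy as np

# Hypothetical drift states for one asset and three regimes (not the paper's values).
B_row = np.array([[0.1, 0.0, -0.1]])

ms = np.arange(1, 11)
sigmas = 0.2 * ms                       # sigma_m = 0.2 * m, as in Example 4.1

lams = []
for s in sigmas:
    A_m = B_row.T @ B_row / s**2        # A^m = B^T Sigma^{-1} B with Sigma = s^2
    lam_max = np.linalg.eigvalsh(A_m)[-1]   # eigvalsh returns ascending eigenvalues
    lams.append(np.sqrt(lam_max))
lams = np.array(lams)                   # lambda_m = sqrt(lambda_max(A^m))
```

For a single asset, $A^m$ has rank one and $\lambda_m = \|B\|/\sigma_m$, so $\lambda_m$ decays like $1/m$ here, which is the regime in which the filter is expected to collapse towards $\nu$.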

Model reduction in the HMM
For a model reduction, we want to arrive at a d-dimensional model with the same performance and filter dynamics as the original n-dimensional model. To achieve this we should aim for a model with the same signal-to-noise matrix, as pointed out in Sect. 4.1. This observation inspires the definition of the model reduction that we discuss in the following.

HMM with non-singular signal-to-noise matrix
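We consider the HMM

$$dR_t = BY_t\,dt + \sigma\,dW_t \tag{5.1}$$

and first look in this section at the main case that the signal-to-noise matrix $A = B^\top\Sigma^{-1}B$ has full rank. This is the typical case if $n \geq d$, e.g. if we model a market with $n$ risky assets for high $n$ by an underlying Markov chain with $d$ states corresponding to a few market regimes. The idea is to find a $d$-dimensional return process

$$d\breve R_t = CY_t\,dt + \delta\,d\breve W_t, \tag{5.2}$$

where $\breve W$ and $\delta^{-1}\breve R$ are $d$-dimensional Brownian motions under $P$ and $\breve P$, respectively. Let us denote by $\breve{\mathcal{F}} = (\breve{\mathcal{F}}_t)_{t \in [0,T]}$ the filtration generated by $Y$ and $\breve W$, and by $\mathcal{F}^{\breve R} = (\mathcal{F}^{\breve R}_t)_{t \in [0,T]}$ the one generated by $\breve R$, both augmented by the null sets. Since the dimension of $\breve R$ is $d \leq n$, we define:

Definition 5.1 We call a model with returns satisfying (5.2), where $C$ and $D = \delta\delta^\top$ are non-singular matrices in $\mathbb{R}^{d \times d}$ with

$$C^\top D^{-1}\,d\breve R_t = B^\top\Sigma^{-1}\,dR_t, \tag{5.3}$$

a reduced model of the model with signal-to-noise matrix $A = B^\top\Sigma^{-1}B$.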

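Theorem 5.2 (i) In a reduced model for $C$ and $\delta$, the Brownian motion $\breve W$ in (5.2) is given by $\breve W_t = \delta^\top(C^\top)^{-1}B^\top(\sigma^\top)^{-1}W_t$, and $\breve{\mathcal{F}}_t \subseteq \mathcal{F}_t$.
(ii) The reference measures $\tilde P$ and $\breve P$ of the original and the reduced model coincide on $\breve{\mathcal{F}}_T$.
(iii) The normalized filters in the original and in the reduced model are indistinguishable. The same is true for the unnormalized filter.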
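Proof (i) Solving $C^\top D^{-1}\,d\breve R_t = B^\top\Sigma^{-1}\,dR_t$ for $\breve W$ using (5.1), (5.2), we see that $\breve W_t = \delta^\top(C^\top)^{-1}B^\top(\sigma^\top)^{-1}W_t$ provides the only possible candidate for $\breve W$. Then, by Lévy's characterization of Brownian motion, $\breve W$ is a Wiener process, since it is a continuous martingale with $\langle\breve W\rangle_t = \delta^\top(C^\top)^{-1}A\,C^{-1}\delta\,t = I_d\,t$. Vice versa, for $dR_t = BY_t\,dt + \sigma\,dW_t$ and $d\breve R_t = CY_t\,dt + \delta\,d\breve W_t$ we then have $C^\top D^{-1}\,d\breve R_t = B^\top\Sigma^{-1}\,dR_t$. So we have a reduced model in the sense of Definition 5.1. In this model, $\breve{\mathcal{F}}$ is generated by $Y$ and $\breve W$, augmented by the null sets. Since $\breve W_s$ is a function of $W_s$ for all $s \leq t$, we have $\breve{\mathcal{F}}_t \subseteq \mathcal{F}_t$.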
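(ii) The reference measure $\breve P$ in the reduced model is given by, cf. (2.3),

$$\frac{d\breve P}{dP} = \breve Z_T, \qquad \breve Z_t = \exp\Big(-\int_0^t (\delta^{-1}CY_s)^\top d\breve W_s - \frac{1}{2}\int_0^t \|\delta^{-1}CY_s\|^2\,ds\Big).$$

Since $(\delta^{-1}C)^\top\breve W_t = (\sigma^{-1}B)^\top W_t$, we have by strong uniqueness or directly by the explicit representations of $Z_T$ and $\breve Z_T$ that $Z_T = \breve Z_T$. Therefore, $\tilde P$ and $\breve P$ coincide on $\breve{\mathcal{F}}_T$.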
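(iii) By the Zakai equation (2.4), the unnormalized filter $\breve\rho$ in the reduced model satisfies

$$d\breve\rho_t = Q^\top\breve\rho_t\,dt + \mathrm{Diag}(\breve\rho_t)\,C^\top D^{-1}\,d\breve R_t = Q^\top\breve\rho_t\,dt + \mathrm{Diag}(\breve\rho_t)\,B^\top\Sigma^{-1}\,dR_t.$$

Therefore, we have the same dynamics (under $\tilde P$ in the original model) and by strong uniqueness of this linear SDE we have that the continuous processes $\rho$ and $\breve\rho$ are indistinguishable. Here we use that the corresponding reference measures $\tilde P$ and $\breve P$ are equivalent by (ii). By (2.7), also the corresponding normalized filters are indistinguishable.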
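The algebra behind parts (i) and (ii) of the proof can be checked numerically. The sketch below assumes the relations $C^\top D^{-1}C = A$ and $\breve W_t = \delta^\top(C^\top)^{-1}B^\top(\sigma^\top)^{-1}W_t$, picks the particular choice $C = D = A$ with $\delta$ a Cholesky factor of $A$, and verifies that the transformation matrix has orthonormal rows, so that it maps an $n$-dimensional Brownian motion to a $d$-dimensional one. All parameter values are hypothetical.

```python
import numpy as np

# Hypothetical parameters: n = 4 assets, d = 3 regimes.
B = np.array([[0.10, 0.02, -0.05],
              [0.08, 0.00, -0.10],
              [0.12, 0.03, -0.02],
              [0.05, 0.01, -0.08]])
sigma = 0.2 * np.eye(4) + 0.05 * np.ones((4, 4))

root = np.linalg.solve(sigma, B)          # sigma^{-1} B
A = root.T @ root                         # signal-to-noise matrix, assumed full rank

# One admissible choice for the reduced model: C = D = A, delta delta^T = D.
C = A.copy()
delta = np.linalg.cholesky(A)
D = delta @ delta.T

# Matrix M with W_breve = M W, i.e. M = delta^T (C^T)^{-1} B^T (sigma^T)^{-1}.
M = delta.T @ np.linalg.solve(C.T, root.T)

gram = M @ M.T                            # should be the d x d identity
lhs = np.linalg.solve(delta, C).T @ M     # (delta^{-1} C)^T M
rhs = root.T                              # (sigma^{-1} B)^T
```

Here `gram` being the identity corresponds to the quadratic variation condition in Lévy's characterization, and `lhs == rhs` is the identity $(\delta^{-1}C)^\top\breve W_t = (\sigma^{-1}B)^\top W_t$ used to show that the Radon-Nikodym densities agree.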
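For $C$ and $\delta$ as given in Definition 5.1 we now choose $\breve W$ as in Theorem 5.2 (i), i.e., $\breve W_t = \delta^\top(C^\top)^{-1}B^\top(\sigma^\top)^{-1}W_t$. The last statement is a consequence of Theorem 5.2 (ii) since $\mathcal{F}^{\breve R}_T \subseteq \breve{\mathcal{F}}_T$.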

Remark 5.4
In the case $d < n$ we clearly have strictly less information from observing $\breve R$ than from observing $R$: When observing $\breve R_t$ only, we cannot distinguish between original returns $R_t$ and $R_t + K_t$, where $K_t$ lies in the kernel $\ker(B^\top\Sigma^{-1}) \subseteq \mathbb{R}^n$. This kernel is at least 1-dimensional and since $A$ is assumed to have full rank in this section, it is in fact $(n - d)$-dimensional here. This is true for any choice of $C$ and $D$ according to Definition 5.1.
For example, in the simple case $n = 2$, $d = 1$, with diagonal $\sigma$ and assuming $B_{21} \neq 0$, we would have that

$$\ker(B^\top\Sigma^{-1}) = \Big\{(x, y)^\top \in \mathbb{R}^2 : y = -\frac{B_{11}\sigma_{22}^2}{B_{21}\sigma_{11}^2}\,x\Big\},$$

i.e. in the reduced model we cannot distinguish between original returns whose difference lies on the line $x \mapsto -B_{11}\sigma_{22}^2\,x/(B_{21}\sigma_{11}^2)$. Theorem 5.2 shows that, interestingly, the loss of information pointed out in Remark 5.4 does not affect the filter. Theorem 5.5 will show that this is also true for the optimal portfolio value.
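The line in this two-asset example can be verified directly. The following sketch, with hypothetical values for $B$ and the diagonal $\sigma$, checks that the direction $(1, -B_{11}\sigma_{22}^2/(B_{21}\sigma_{11}^2))$ is mapped to zero by $B^\top\Sigma^{-1}$, so that adding any multiple of it to the original returns leaves the reduced returns unchanged.

```python
import numpy as np

# Hypothetical parameters for n = 2 assets and d = 1 regime.
B2 = np.array([[0.10],
               [0.05]])                      # B_11 = 0.10, B_21 = 0.05
sigma2 = np.diag([0.2, 0.3])                 # sigma_11 = 0.2, sigma_22 = 0.3

G2 = B2.T @ np.linalg.inv(sigma2 @ sigma2.T)  # 1 x 2 matrix B^T Sigma^{-1}

# Slope of the kernel line: -B_11 sigma_22^2 / (B_21 sigma_11^2).
slope = -B2[0, 0] * sigma2[1, 1] ** 2 / (B2[1, 0] * sigma2[0, 0] ** 2)
kernel_direction = np.array([1.0, slope])

residual = G2 @ kernel_direction             # should vanish
```

With these numbers the slope is $-4.5$, and `residual` is zero up to rounding.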
By Theorem 5.2 and Corollary 5.3 we can from now on use the same notation for the filters $\hat Y$ and the unnormalized filters $\rho$ in the original and the reduced models. We will also use the same notation $\tilde P$ for the reference measures, but have to keep in mind that these only agree on $\breve{\mathcal{F}}_T$ (and thus on $\mathcal{F}^{\breve R}_T$).

Theorem 5.5
The optimal risky fraction process $\breve\pi^*$ for maximizing expected utility of terminal wealth for utility functions $U = U_\alpha$, $\alpha < 1$, leads to the same optimal wealth process as obtained in the original model, i.e., $\breve X^*_t = X^*_t$ for all $t \in [0,T]$, where $\breve X^*$ is the wealth process for $\breve\pi^*$ in the reduced model and $X^*$ is the wealth process in the original model when following the optimal strategy $\pi^*$ for maximizing expected utility of terminal wealth.
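Proof Following the martingale approach, in the original model we have

$$X^*_T = I\big(\lambda\,E[Z_T \mid \mathcal{F}^R_T]\big),$$

where $I = (U_\alpha')^{-1}$ and $\lambda > 0$ is uniquely determined by $E[Z_T X^*_T] = x_0$, cf. Sass and Haussmann (2004) and Theorem 3.1. Analogously, we obtain in the reduced model, since the Radon-Nikodym derivatives of the reference measures agree on $\mathcal{F}^{\breve R}_T$ by Theorem 5.2 (ii),

$$\breve X^*_T = I\big(\breve\lambda\,E[Z_T \mid \mathcal{F}^{\breve R}_T]\big).$$

In the original model, $E[Z_T \mid \mathcal{F}^R_T] = (\mathbf{1}^\top\rho_T)^{-1}$, and we get the analogous result in the reduced model, i.e., $E[Z_T \mid \mathcal{F}^{\breve R}_T] = (\mathbf{1}^\top\breve\rho_T)^{-1}$. But by Theorem 5.2 (iii), $\rho$ and $\breve\rho$ are the same, thus $\lambda$ and $\breve\lambda$ are given uniquely by the same equation and hence agree. Therefore, $\breve X^*_T = X^*_T$. This implies that also $X^*_T$ is $\mathcal{F}^{\breve R}_T$-measurable and thus we get the same wealth processes for the replicating strategies $\breve\pi^*$ and $\pi^*$ by martingale representation.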
As outlined in the introduction there is a long history on mutual fund separation theorems. In continuous time, we could adapt Definition 2.4 for the more general model in Schachermayer et al. (2009) to our setting as follows: We say that the mutual fund theorem holds for a class of utility functions $\mathcal{U}$, if there exists a traded portfolio with values $M = (M_t)_{t \in [0,T]}$ and corresponding return process $R^M$ such that for the optimal terminal wealth $X^U_T$ under $U \in \mathcal{U}$ there exists an $\mathcal{F}^R$-adapted, progressively measurable process $\eta^U$ satisfying (for interest rate $r = 0$)

$$X^U_T = x_0 + \int_0^T \eta^U_t\,X^U_t\,dR^M_t. \tag{5.4}$$

For example, in the Black-Scholes model, i.e. having constant parameters $\mu$, $\sigma$ in (2.1), $\Sigma = \sigma\sigma^\top$, we would get in the class of CRRA utility functions $U_\alpha$ the optimal $\pi^{U_\alpha}_t = \frac{1}{1-\alpha}\Sigma^{-1}\mu$ and $dX^{U_\alpha}_t = X^{U_\alpha}_t\,\frac{1}{1-\alpha}\,\mu^\top\Sigma^{-1}\,dR_t$. Therefore, the portfolio given by returns $dR^M_t = \mu^\top\Sigma^{-1}\,dR_t$ is the mutual fund, using $\eta^{U_\alpha}_t = \frac{1}{1-\alpha}$ here. So the mutual fund theorem holds in the Black-Scholes model in the class of CRRA functions. This was shown in Merton (1971) already.

While we will refer to a representation like (5.4) in Remark 5.16 below, we want to point out here that in the following we think of building portfolios of mutual funds in the following sense: By (3.1) and Theorem 3.1 we have for logarithmic utility

$$dX^*_t = X^*_t\,(\pi^*_t)^\top dR_t = X^*_t \sum_{i=1}^d \hat Y^i_t\,\big(B^\top\Sigma^{-1}\,dR_t\big)_i, \tag{5.5}$$

i.e. we can think of the optimal terminal wealth coming from investing in $d$ mutual funds $(B^\top\Sigma^{-1}\,dR_t)_i$. These are chosen with the weights $\hat Y^i_t$ which are the conditional probabilities for being in state $i$. So instead of using one mutual fund given by returns $dR^M_t = \hat Y_t^\top B^\top\Sigma^{-1}\,dR_t$, here we rather think of a representation as in (5.5) with $d$ funds that have some correspondence to the states. In particular, in case of $\hat Y = e_k$ this would boil down to investing in fund $b^\top\Sigma^{-1}R_t$, where $b$ is column $k$ of $B$, i.e. the drift vector in state $e_k$. But the latter is the mutual fund in the Black-Scholes model with $\mu = b$. This way, we may think of (5.5) as a representation by mutual funds that would satisfy the mutual fund theorem in the degenerate cases that $\hat Y_t = e_i$ for all $t$.
Our model reduction argument allows us to introduce different decompositions into mutual funds $\breve R^1, \dots, \breve R^d$ as discussed in the following remark. In Sect. 5.2 we introduce two canonical cases, one corresponding to (5.5) above.
Remark 5.6 (Interpretation of components of $\breve R$ as mutual funds) By (5.2) we can interpret the components of $\breve R$ as $d \leq n$ asset returns which yield the same filters by Theorem 5.2 and lead to the same optimal portfolio value when building a portfolio only of these funds. Theorem 5.5 shows that it is sufficient to invest in risky assets $\breve R^1, \dots, \breve R^d$. Note that these funds are given by

$$d\breve R_t = D(C^\top)^{-1}B^\top\Sigma^{-1}\,dR_t,$$

which shows that the composition of fund $i$ in terms of the original $n$ assets is given by row $i$ of $D(C^\top)^{-1}B^\top\Sigma^{-1}$. In particular, it is time-independent. This makes its interpretation more straightforward than a single mutual fund as in (5.4) with an (in our case) time-dependent composition.
Since in the original model the log-optimal risky fraction is $\pi^*_t = \Sigma^{-1}B\hat Y_t$ and since $B^\top\pi^*_t = A\hat Y_t = C^\top\breve\pi^*_t$, we also have the representation $\breve\pi^*_t = (C^\top)^{-1}B^\top\pi^*_t$, which allows us to compute the optimal fund investment directly from the optimal portfolio in the original model. Note that this does not work in the opposite direction, since $B$ is not a square matrix for $d < n$. Clearly, several models can lead to the same reduced model but not vice versa.
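The mapping from the original log-optimal portfolio to the fund investment can be illustrated numerically. The sketch below assumes $\pi^*_t = \Sigma^{-1}B\hat Y_t$, $\breve\pi^*_t = (C^\top)^{-1}B^\top\pi^*_t$ and fund returns $d\breve R_t = D(C^\top)^{-1}B^\top\Sigma^{-1}\,dR_t$ with $C^\top D^{-1}C = A$. The matrices, the filter value and the return increment are hypothetical, and $D$ is constructed from an arbitrary non-singular $C$ so that the constraint holds.

```python
import numpy as np

# Hypothetical parameters: n = 4 assets, d = 3 regimes.
B = np.array([[0.10, 0.02, -0.05],
              [0.08, 0.00, -0.10],
              [0.12, 0.03, -0.02],
              [0.05, 0.01, -0.08]])
sigma = 0.2 * np.eye(4) + 0.05 * np.ones((4, 4))
Sigma_inv = np.linalg.inv(sigma @ sigma.T)
A = B.T @ Sigma_inv @ B

# Arbitrary non-singular C; D is chosen such that C^T D^{-1} C = A.
C = np.array([[1.0, 0.2, 0.0],
              [0.0, 1.0, 0.3],
              [0.1, 0.0, 1.0]])
D = C @ np.linalg.inv(A) @ C.T

y_hat = np.array([0.5, 0.3, 0.2])            # hypothetical filter value
dR = np.array([0.01, -0.02, 0.005, 0.0])     # hypothetical return increment

pi_star = Sigma_inv @ B @ y_hat                       # original log-optimal fractions
pi_fund = np.linalg.solve(C.T, B.T @ pi_star)         # (C^T)^{-1} B^T pi*
dR_fund = D @ np.linalg.solve(C.T, B.T @ Sigma_inv @ dR)   # fund return increment

gain_original = pi_star @ dR
gain_reduced = pi_fund @ dR_fund
```

Both strategies produce the same relative wealth increment, and `pi_fund` agrees with $D^{-1}C\hat Y_t$, the log-optimal strategy computed directly in the reduced model.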
Finally note that we have as many risky mutual funds as we have market regimes (states of the Markov chain), where the remainder is put in the riskfree asset. In particular, in case of d = 1 we obtain the classical mutual fund theorem in the Black-Scholes model.

Canonical cases of reduced models in the HMM
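There are two canonical cases for choosing $C$ and $\delta$ (or $D = \delta\delta^\top$) in (5.2). By its definition, according to (5.3) any choice has to satisfy $C^\top D^{-1}C = A$. The first case is the reduced regime representation model (RRRM), where we choose $C = D = A$. This RRRM yields returns

$$d\breve R_t = AY_t\,dt + \delta\,d\breve W_t = B^\top\Sigma^{-1}\,dR_t.$$

For $U = \log$ the optimal strategy $\breve\pi$ in the lower-dimensional model is $\breve\pi_t = D^{-1}C\hat Y_t = \hat Y_t$. Note that indeed (as proved in Theorem 5.5) the corresponding optimal wealth $\breve X$ is pathwise the same as in the original $n$-dimensional model, since $\breve\pi_t^\top\,d\breve R_t = \hat Y_t^\top B^\top\Sigma^{-1}\,dR_t = (\pi^*_t)^\top dR_t$.

The interpretation of the RRRM is according to Remark 5.6 that we invest in $d$ funds, where fund $k$ has returns evolving as $\breve R^k_t$ and a proportion $\hat Y^k_t = P(Y_t = e_k \mid \mathcal{F}^R_t)$ of our money is invested in fund $k$. So in the degenerate case $\hat Y_t = e_k$ we would invest everything in fund $k$. This extreme case cannot happen for $t > 0$, due to the dynamics of $\hat Y$. However, we see that the component $\breve R^k$ corresponds to the mutual fund $k$ that would be optimal if we knew with certainty we were in state $e_k$.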
The more involved case is the second one which is based on a principal component analysis of the signal-to-noise matrix A.

Definition 5.8
The reduced model is a reduced eigenvalue model (REVM), if we choose $C$ and $D$ in Definition 5.1 as follows: Since $A$ is non-singular and positive definite, we can decompose $A$ as $A = V\Lambda V^\top$, where $\Lambda = \mathrm{Diag}(\lambda_1, \dots, \lambda_d)$, $V = (v_1, \dots, v_d)$ is orthogonal, $\lambda_1 \geq \dots \geq \lambda_d > 0$ and $Av_k = \lambda_kv_k$. So $\lambda_k$ is the $k$th eigenvalue of $A$ and $v_k$ a corresponding eigenvector. Then we choose $C = V^\top$ and $D = \Lambda^{-1}$.
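In the REVM we get returns $d\breve R_t = V^\top Y_t\,dt + \Lambda^{-1/2}\,d\breve W_t$. Now the interpretation of the mutual funds represented by $\breve R$ is that we would invest optimally in the $k$th fund only if $\hat Y_t = v_k$. But the filter does not stay constant. In general, for logarithmic utility we invest according to $\breve\pi_t = \Lambda V^\top\hat Y_t$. In the following, we will consider this setting in more detail.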
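The eigenvalue representation is straightforward to compute with a symmetric eigendecomposition. The sketch below assumes $A = V\Lambda V^\top$ with $C = V^\top$, $D = \Lambda^{-1}$ and the log-optimal fund strategy $\breve\pi_t = \Lambda V^\top\hat Y_t$; the parameter values and the filter value are hypothetical. It also checks the elementary bound $|\breve\pi^k_t| \leq \lambda_k$, which follows from $\|v_k\| = 1$ and $\|\hat Y_t\| \leq 1$.

```python
import numpy as np

# Hypothetical parameters: n = 4 assets, d = 3 regimes.
B = np.array([[0.10, 0.02, -0.05],
              [0.08, 0.00, -0.10],
              [0.12, 0.03, -0.02],
              [0.05, 0.01, -0.08]])
sigma = 0.2 * np.eye(4) + 0.05 * np.ones((4, 4))
root = np.linalg.solve(sigma, B)
A = root.T @ root

# eigh returns eigenvalues in ascending order; reverse to get lambda_1 >= ... >= lambda_d.
w, U = np.linalg.eigh(A)
lam = w[::-1]
V = U[:, ::-1]

C_rev = V.T                       # C = V^T
D_rev = np.diag(1.0 / lam)        # D = Lambda^{-1}

y_hat = np.array([0.5, 0.3, 0.2])             # hypothetical filter value
pi_rev = lam * (V.T @ y_hat)                  # Lambda V^T y_hat, one entry per fund
```

Since the investment in fund $k$ is scaled by $\lambda_k$, funds belonging to small eigenvalues receive little weight whatever the filter says.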

Mutual funds in the REVM
While the RRRM has a straightforward interpretation, the REVM is more sophisticated. In the following, we shall therefore have a more detailed look at this decomposition. The results in the next section provide a more fundamental interpretation on the structure of the mutual funds in the REVM.
Example 5.9 (REVM for n = 4, d = 3)
In this example we consider an HMM where the chain has 3 states. For the continuous-time model, returns from real-world applications are still sampled in discrete time. For discretizing the filters, as presented in Sect. 2.3, we use a robust scheme as introduced in James et al. (1996) (see also Sass and Haussmann (2004) for a multivariate version). The rate matrix $Q$ of the chain, with invariant distribution $\nu$, is chosen such that we see a suitable number of jumps in the graphs, but $Q$ has no influence on the signal-to-noise matrix and thus is irrelevant for the subsequent results. For the 4-dimensional return process we further fix the remaining model parameters. An example path of the filters and the log-optimal strategy in the full model is given in Fig. 2. In Fig. 3 we see the optimal strategies in the reduced model. It can be seen that investment in the first fund fluctuates much more than in the other funds, contrary to the strategy in the full model. There, the wealth invested in the single assets fluctuates for all assets.
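To give an impression of how filter paths for discretely sampled returns can be produced, here is a rough sketch. It uses a plain discrete-time Bayes update (in the spirit of a Hamilton-type filter) with a first-order transition matrix, not the robust scheme of James et al. (1996). All parameter values (`Q`, `B`, `sigma`) are hypothetical stand-ins and not the values of the example.

```python
import numpy as np

def simulate_and_filter(Q, B, sigma, nu, T=1.0, steps=2000, seed=0):
    """Simulate chain and returns on a grid and run a simple Bayes filter."""
    rng = np.random.default_rng(seed)
    n, d = B.shape
    dt = T / steps
    P = np.eye(d) + Q * dt                    # first-order one-step transition matrix
    Sigma_inv = np.linalg.inv(sigma @ sigma.T)
    A = B.T @ Sigma_inv @ B                   # signal-to-noise matrix
    state = rng.choice(d, p=nu)               # start in the invariant distribution
    y_hat = nu.copy()
    path = np.empty((steps, d))
    for i in range(steps):
        state = rng.choice(d, p=P[state])
        dR = B[:, state] * dt + sigma @ rng.normal(size=n) * np.sqrt(dt)
        # Gaussian log-likelihood of dR in each state, up to a common constant.
        loglik = B.T @ Sigma_inv @ dR - 0.5 * np.diag(A) * dt
        y_hat = (P.T @ y_hat) * np.exp(loglik - loglik.max())
        y_hat /= y_hat.sum()                  # normalize to a probability vector
        path[i] = y_hat
    return path

# Hypothetical parameters (3 states, 4 assets).
Q = np.array([[-8.0, 4.0, 4.0], [3.0, -6.0, 3.0], [5.0, 5.0, -10.0]])
M = np.vstack([Q.T, np.ones(3)])
nu = np.linalg.lstsq(M, np.array([0.0, 0.0, 0.0, 1.0]), rcond=None)[0]
nu = np.clip(nu, 0.0, None)
nu /= nu.sum()                                # invariant distribution: nu^T Q = 0
B = np.array([[0.3, 0.0, -0.2], [0.1, 0.2, -0.1], [-0.1, 0.1, 0.3], [0.2, -0.2, 0.0]])
sigma = 0.2 * np.eye(4)

filter_path = simulate_and_filter(Q, B, sigma, nu)
```

Each row of `filter_path` is a probability vector over the three states; multiplying it by $(\sigma\sigma^\top)^{-1}B$ or by $\Lambda V^\top$ would give strategy paths in the full and in the reduced model, respectively.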
Proposition 5.10
Consider the returns $\breve R$ in the REVM.

Proof
In the reduced eigenvalue model, let $\Lambda$ denote the diagonal matrix of the eigenvalues $\lambda_k > 0$ and $V$ the matrix of eigenvectors. Using the independence of $Y$ and $\breve W$, which follows from the independence of $Y$ and $W$, and that $E[Y_t] = \nu$ by starting with the invariant distribution $\nu$, the claims in (i) can be derived straightforwardly (cf. Elliott et al. 2008).
For (ii) note that by (5.9) we have $\breve\pi_t = \Lambda V^\top \hat Y_t$ and thus $\breve\pi^k_t = \lambda_k v_k^\top \hat Y_t$. This yields $|\breve\pi^k_t| \le \lambda_k \|v_k\| \, \|\hat Y_t\| \le \lambda_k$, since $\|v_k\| = 1$ and $\|\hat Y_t\| \le 1$.
(iii) By Theorem 5.5 we have $\breve X^*_T = X^*_T$; thus (4.1) and an application of the Tonelli theorem yield the result.

Corollary 5.11
For the optimal wealth under $U = \log$, a lower and an upper bound hold.
The bounds in Corollary 5.11 have the following interpretation. Remember that $X^*_T$ corresponds to investing according to the optimal $\breve\pi^*_t = \Lambda V^\top \hat Y_t$ under partial information in the reduced model. The upper bound corresponds to the optimal value for an investor with full information who uses $\breve\pi^{\text{full}}_t = \Lambda V^\top Y_t$ in the reduced model, and the lower bound to an investor who uses no further information and just invests according to the strategy based on the mean of the trend, i.e. $\breve\pi^{\text{no-info}}_t = \Lambda V^\top \nu$. In the original model, we would obtain the same optimal value by Theorem 5.5 using the optimal strategy $\pi^*_t = (\sigma\sigma^\top)^{-1} B \hat Y_t$ for partial information, and the same bounds, corresponding to full information via $\pi^{\text{full}}_t = (\sigma\sigma^\top)^{-1} B Y_t$ and to assuming constant parameters via $\pi^{\text{no-info}}_t = (\sigma\sigma^\top)^{-1} B \nu$.

Remark 5.12
In the case of the REVM, we can now be more specific about the structure of the $d$ mutual funds than in Remark 5.6. Remember that $\lambda_1 \ge \dots \ge \lambda_d$ are the eigenvalues of the signal-to-noise matrix $A$. The funds are ordered according to the sizes of the eigenvalues. Proposition 5.10 (i) shows that the variances are decreasing of order $t$ (the second term is of order $t^2$). By part (ii) this leads to a possibly higher investment in fund $i$ than in fund $j$ if $i < j$, since the invested fraction in fund $i$ is bounded by $\lambda_i$. So a fund with lower index may lead to a higher investment and thus is more attractive. From the estimate (5.10) we see that this happens in particular if $\|\hat Y_t\|$ is close to 1, which is the case if and only if $\hat Y_t \approx e_k$ for some $k$, i.e. if the filter is quite informative. Because the funds are conditionally independent by part (i), for diversification we should still invest in all funds. This will always happen, since $\hat Y_t = e_k$ is not possible due to the dynamics of the filters.
These relations also imply that by using an approximate strategy which invests only $\pi^1_t = \breve\pi^1_t$ and $\pi^i_t = 0$ for $i = 2, \dots, d$, i.e. investing in fund 1 only, we can get a quite good approximation to the optimal value as long as $\lambda_1 \gg \lambda_2$. This is rather the typical case since the eigenvalues result from a principal component analysis providing the most influential investment direction. The approach in the REVM is similar to the idea of eigenportfolios. While we decompose the signal-to-noise matrix, these are based on a decomposition of the correlation matrix and choosing the portfolio corresponding to the principal eigenvector, see e.g. Avellaneda et al. (2021), Avellaneda and Lee (2010), Boyle (2014), Chen and Yuan (2016) and the references therein.
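The bound on the fund investments and the role of the leading eigenvalue can be explored numerically. The sketch below uses hypothetical random parameters and samples filter values uniformly from the simplex; it checks $|\lambda_k v_k^\top y| \le \lambda_k \|y\| \le \lambda_k$ and the spectral identity $y^\top A y = \sum_k \lambda_k (v_k^\top y)^2$. Reading the first summand as the contribution of fund 1 is an interpretation added here, not a statement from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 4, 3                                   # hypothetical dimensions
B = rng.normal(size=(n, d))                   # hypothetical drift matrix
sigma = np.eye(n) + 0.1 * rng.normal(size=(n, n))  # hypothetical volatility
A = B.T @ np.linalg.inv(sigma @ sigma.T) @ B
lam, V = np.linalg.eigh(A)
lam, V = lam[::-1], V[:, ::-1]                # descending eigenvalues

Y = rng.dirichlet(np.ones(d), size=1000)      # sampled filter values (rows on the simplex)
fund_fractions = (Y @ V) * lam                # entry (t, k) = lam_k * v_k^T y_t
norms = np.linalg.norm(Y, axis=1)             # Euclidean norms, at most 1 on the simplex

bound_ok = np.all(np.abs(fund_fractions) <= lam * norms[:, None] + 1e-12)
quad_full = np.einsum("ti,ij,tj->t", Y, A, Y) # y^T A y
quad_spectral = ((Y @ V) ** 2) @ lam          # sum_k lam_k (v_k^T y)^2
first_fund_share = lam[0] * (Y @ V[:, 0]) ** 2 / quad_full
```

The array `first_fund_share` shows, sample by sample, how much of the quadratic form is carried by the first eigen-direction; it is close to 1 whenever $\lambda_1$ dominates and the filter is not nearly orthogonal to $v_1$.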

Example 5.13
Let us illustrate the preceding remark. In the setting of Example 5.9, the signal-to-noise matrix has eigenvalues $\lambda_1 = 82.85$, $\lambda_2 = 1.22$, $\lambda_3 = 0.07$. So $\lambda_1$ is clearly dominant, which leads to the strong investment in the first fund as seen in Fig. 3. $\lambda_2$ is much smaller than $\lambda_1$, but still more than an order of magnitude larger than $\lambda_3$; thus we see more investment in the second fund compared to the third.
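The dominance of the first eigenvalue can be quantified with a few lines of arithmetic. The eigenvalues are those quoted in the example; using the share of the trace as a measure of dominance is an illustrative choice made here.

```python
# Eigenvalues of the signal-to-noise matrix as quoted in the example.
lam = [82.85, 1.22, 0.07]

total = sum(lam)                              # trace of A
shares = [l / total for l in lam]             # share of each eigenvalue in the trace
ratios = [lam[i] / lam[i + 1] for i in range(len(lam) - 1)]  # successive ratios

for k, (l, s) in enumerate(zip(lam, shares), start=1):
    print(f"lambda_{k} = {l:6.2f}, share of trace = {s:.4f}")
print("successive ratios:", [round(r, 1) for r in ratios])
```

The first eigenvalue carries about 98% of the trace, and each eigenvalue exceeds the next one by more than a factor of ten.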
Let us point out one relation to the convergence result Theorem 4.3. Note first that trivially, when the largest eigenvalue λ 1 of the signal-to-noise matrix A converges to 0 we have by (4.1) that the optimal expected utility converges to log(x 0 ), i.e., there is approximately no gain from investing in the stocks. However, based on the results of this section, we can utilize Theorem 4.3 for a more subtle argument on the relation of the funds in the reduced model as we outline in the following remark.
Remark 5.14
Let us consider a sequence of models with signal-to-noise matrices $A^{(n)} = \frac{1}{n} A = \frac{1}{n} B^\top (\sigma\sigma^\top)^{-1} B$. Then the eigenvalues satisfy $\lambda^{(n)}_i = \frac{1}{n} \lambda_i$ but the eigenvectors remain unchanged, i.e., $\Lambda^{(n)} = \frac{1}{n} \Lambda$, $V^{(n)} = V$ in the decomposition (5.8) of $A^{(n)}$.
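That scaling the signal-to-noise matrix rescales the eigenvalues and leaves the eigenvectors unchanged is easy to confirm numerically. The sketch below uses a hypothetical random matrix and writes the scaling index as `m` to avoid a clash with the number of assets; eigenvectors are compared up to sign.

```python
import numpy as np

rng = np.random.default_rng(4)
n_assets, d = 4, 3                            # hypothetical dimensions
B = rng.normal(size=(n_assets, d))            # hypothetical drift matrix
sigma = np.eye(n_assets) + 0.1 * rng.normal(size=(n_assets, n_assets))
A = B.T @ np.linalg.inv(sigma @ sigma.T) @ B  # signal-to-noise matrix

lam, V = np.linalg.eigh(A)                    # reference decomposition

results = []
for m in (1, 10, 100):
    lam_m, V_m = np.linalg.eigh(A / m)        # decomposition of the scaled matrix
    same_values = np.allclose(lam_m, lam / m)
    # Eigenvectors are unique only up to sign, so compare |V_m^T V| with the identity.
    same_vectors = np.allclose(np.abs(V_m.T @ V), np.eye(d), atol=1e-6)
    results.append((m, same_values, same_vectors))
```

Each entry of `results` records whether the eigenvalues scale by $1/m$ and whether the eigenvectors coincide for that `m`.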
As discussed in Remark 5.12, the REVM decomposition may be used to invest only in the first $k$, $k < d$, portfolios of the decomposition. Analogously to Proposition 5.10 (ii) this would yield the expected utility (5.11), where $\hat Y^{(n)}$ are the filters computed in the model with $A^{(n)}$ and where we used $\lambda^{(n)}_i = \frac{1}{n} \lambda_i$. We can compare this with the performance of the portfolio not taking the information into account, i.e. using the strategy $\breve\pi^{\text{no-info}}_t = \Lambda V^\top \nu$ as discussed after Corollary 5.11, leading to the expected utility (5.12) as stated in that corollary. Example 5.13 shows that often the performance in (5.11) for $k < d$ can be expected to be better than in (5.12). Formally, by taking derivatives in (5.11) and (5.12) we see that this is true if a certain inequality holds whose left-hand side converges to 0 due to the $L^2$-convergence in Theorem 4.3, while its right-hand side is strictly positive and constant in $n$. This means that, if the eigenvalues become too low, then a portfolio of a strict subset of the mutual funds can no longer dominate the constant portfolio which takes no information via filtering into account.

Singular signal-to-noise matrix in HMM
Let $A$ now be singular. A typical example is when $d > n$, i.e., we have a Markov chain with more states than risky assets. This already occurs when we have only one risky asset and consider $d \ge 2$ market regimes. The RRRM as in Definition 5.7 does not work in this case since it requires $A$ to be non-singular for computing its inverse. Using the REVM directly as in Definition 5.8 does not work either, since it requires the diagonal matrix $\Lambda$ of the eigenvalues to be non-singular, but now at least one eigenvalue is 0.
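Utilizing the same idea, we now try to reduce the model to a dimension corresponding to the number of strictly positive eigenvalues. More precisely, having $p$ strictly positive eigenvalues, $1 \le p < d$, we can order the eigenvalues of $A$ as follows ($A$ is at least positive semi-definite): $\lambda_1 \ge \dots \ge \lambda_p > 0 = \lambda_{p+1} = \dots = \lambda_d$, i.e., we assume that $A$ has rank $p$ with $1 \le p < d$. We denote as in Definition 5.8 $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_d)$ and $V = (v_1, \dots, v_d)$, where $v_i$ is a normalized eigenvector for $\lambda_i$ and $V$ is orthogonal. Then we have $A = V \Lambda V^\top$ as in Definition 5.8 again, but we cannot proceed as we did in Theorem 5.2: $\Lambda$ is singular, so we cannot define $\delta\delta^\top$ by $\Lambda^{-1}$ in order to introduce $\breve W$ as we did in that theorem.
Instead, we reduce the dimension even further to $p$, keeping only the strictly positive eigenvalues $\lambda_1, \dots, \lambda_p$ and the corresponding eigenvectors.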

Now we can define a Brownian motion $\breve W^{(p)}$. The same arguments as in the proof of Theorem 5.2 show that $\breve W^{(p)}$ is a $p$-dimensional Brownian motion. For the $p$-dimensional model with returns $R^{(p)}$ we then have the representation (5.14).
Theorem 5.15
For maximizing power or logarithmic utility we get the same optimal terminal wealth as in the original model from investing in only $p$ mutual funds with returns $R^{(p)}$. But to compute the filters and thus the optimal strategy $\breve\pi^{(p)}$ we need the observation from all $d$ assets on the right-hand side of (5.14). In particular, $\breve\pi^{(p)}$ is in general not $\mathcal F^{R^{(p)}}$-adapted.
Proof
Adapting the argument in the proof of Theorem 5.2, by (5.14) we get the same filters when we use the whole information from the right-hand side. Then we can also use the arguments in the proof of Theorem 5.5 to conclude that the optimal wealth processes agree.
Note that the second term in (5.14) really adds information. We can see this by rewriting (5.14) as
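The reduction to the strictly positive eigenvalues can be sketched numerically for a case with more regimes than assets. The parameters below are hypothetical, the names `V_p` and `lam_p` are introduced here for the truncated eigen-decomposition, and the formula for the $p$ fund fractions is assumed by analogy with the non-singular REVM.

```python
import numpy as np

# Hypothetical model with n = 2 assets and d = 3 regimes, so A is singular.
B = np.array([[0.3, 0.0, -0.2],
              [0.1, -0.1, 0.2]])
sigma = np.array([[0.20, 0.00],
                  [0.05, 0.25]])
Sigma_inv = np.linalg.inv(sigma @ sigma.T)
A = B.T @ Sigma_inv @ B                       # 3 x 3, rank at most 2

lam, V = np.linalg.eigh(A)
lam, V = lam[::-1], V[:, ::-1]                # descending eigenvalues

tol = 1e-10 * lam[0]                          # numerical threshold for "strictly positive"
p = int(np.sum(lam > tol))                    # number of positive eigenvalues
lam_p, V_p = lam[:p], V[:, :p]                # truncated decomposition

A_rebuilt = (V_p * lam_p) @ V_p.T             # V_p diag(lam_p) V_p^T

y_hat = np.array([0.5, 0.3, 0.2])             # some filter value
pi_p = lam_p * (V_p.T @ y_hat)                # assumed fractions in the p funds
pi_full = Sigma_inv @ B @ y_hat               # log-optimal fractions, original model
```

The truncated decomposition reproduces `A`, and `B.T @ pi_full` agrees with `V_p @ pi_p`, i.e. both strategies lead to the same value of $A \hat Y_t$.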

Model reduction in MSM and filter-based HMM
Consider the MSM as we introduced it in Sect. 2.2, where $\sigma(e_1), \dots, \sigma(e_d)$ are pairwise different. As discussed there, $Y_t$ then is observable from the returns, i.e. it is $\mathcal F^R_t$-measurable, and thus we know the current parameters $B e_k$, $\sigma(e_k)$ if $Y_t = e_k$, cf. Krishnamurthy et al. (2018). For optimization problem (3.2) with $U = U_\alpha$, the optimal strategies are given by Theorem 3.2. The signal-to-noise matrix then depends on time via $Y_t$. We can apply the same decompositions as in the HMM, but now depending on $Y$.
Since Y is observable, this can be calculated based on the available information. We discuss the details in the following two remarks.
Remark 5.16 (Mutual funds and RRRM in the MSM)
In analogy to the RRRM in Definition 5.7 we can introduce a $d$-dimensional reduced MSM. Applying Theorem 3.2 to the reduced MSM shows that, just as in the RRRM for the HMM, fund $\breve R^k_t$ will be chosen for $Y_t = e_k$. While we speak here of $d$ funds, note that the MSM allows for the two-fund separation in the sense of Schachermayer et al. (2009, Definition 2.4) in the class of CRRA utility functions, cf. (5.4) above. Indeed, according to Theorem 3.2, the term $Y_t^\top B^\top \Sigma^{-1}(Y_t) \, dR_t$ would correspond to the risky portfolio in which the fraction $\eta = \frac{1}{1-\alpha}$ of the current wealth would be invested.
Remark 5.17 (REVM in the MSM)
In the MSM, in analogy to the REVM in Definition 5.8, we can also introduce for each value $e_k$ of $Y_t$, $k = 1, \dots, d$, an eigenvalue decomposition of $A(e_k)$ and a corresponding reduced model driven by a suitable Brownian motion. By Theorem 3.2 we then obtain the strategy (5.15), which is optimal for logarithmic utility. But in fact, in the MSM we can simplify (5.15) considerably since $Y_t$ is a unit vector and $\Lambda(e_k)$ is diagonal. In terms of mutual funds, we can look at this result as setting up, for each state of $Y_t$, $d$ funds ordered according to the eigenvalues of $A(Y_t)$. Then $M_{ik}$ provides the fraction to be invested in fund $i$, of the $d$ funds for this state, if the chain is in state $Y_t = e_k$.
So in both cases we have a decomposition independent of time t, choosing the corresponding funds based on the observable state of Y t .
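The state-dependent decompositions in the MSM can be sketched as follows. Everything specific here is an assumption made for illustration: the parameters are hypothetical, the state-dependent signal-to-noise matrix is taken as $A(e_k) = B^\top (\sigma(e_k)\sigma(e_k)^\top)^{-1} B$, and the matrix `M` is filled with $M_{ik} = \lambda_i(e_k)\,(v_i(e_k))_k$, i.e. the $i$th component of $\Lambda(e_k) V(e_k)^\top e_k$.

```python
import numpy as np

rng = np.random.default_rng(5)
n, d = 4, 3                                   # hypothetical dimensions
B = rng.normal(size=(n, d))                   # hypothetical drift matrix
# Hypothetical, pairwise different volatility matrices sigma(e_k), one per state.
sigmas = [(0.15 + 0.1 * k) * np.eye(n) + 0.02 * rng.normal(size=(n, n)) for k in range(d)]

M = np.empty((d, d))                          # assumed: M[i, k] = fraction in fund i in state e_k
state_funds = []                              # log-optimal asset weights per state
checks = []
for k, s in enumerate(sigmas):
    Sigma_inv_k = np.linalg.inv(s @ s.T)
    A_k = B.T @ Sigma_inv_k @ B               # assumed state-dependent signal-to-noise matrix
    lam_k, V_k = np.linalg.eigh(A_k)
    lam_k, V_k = lam_k[::-1], V_k[:, ::-1]    # descending eigenvalues
    M[:, k] = lam_k * V_k[k, :]               # Lambda(e_k) V(e_k)^T e_k
    state_funds.append(Sigma_inv_k @ B[:, k]) # regime-type fund for state e_k
    # Consistency: V(e_k) M[:, k] reproduces A(e_k) e_k.
    checks.append(np.allclose(V_k @ M[:, k], A_k[:, k]))
```

Since the state is observable, only column `k` of `M` (or, in the regime parametrization, only `state_funds[k]`) is used while the chain is in state $e_k$, so the decomposition itself does not depend on time.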
Typically a continuous-time MSM has better econometric properties than the HMM, e.g. it allows for volatility clustering. The reason for considering a continuous-time MSM (or HMM) is that we obtain more explicit results than in discrete time, e.g. in Theorems 3.1 and 3.2, where the corresponding discrete-time results are not explicit. But a continuous-time MSM may be a poor approximation for a discrete-time MSM: in continuous time no filtering problem exists, whereas in the discrete-time MSM the underlying Markov chain cannot be observed and its states have to be estimated by the corresponding filters. Therefore, solving e.g. portfolio optimization problems in the continuous-time MSM as in Theorem 3.2 provides a poor approximation for the discrete-time model. This is not the case for the HMM, where the discretization of continuous-time filters yields the filters which are optimal in the corresponding discrete-time HMM, see James et al. (1996).
This motivates introducing an HMM with non-constant volatility as an approximation of the MSM. This yields consistent approximations since the filtering problem is non-trivial. Filters can be computed similarly to Sect. 2.3. A choice of $f$ which satisfies $f(e_k) = \sigma(e_k)$ for $k = 1, \dots, d$ is possible, cf. Krishnamurthy et al. (2018) for details in the one-dimensional case.
The best model in an MSE sense is given by (5.16). Using this parametrization, we speak of the filter-based hidden Markov model (FB-HMM).
Remark 5.18 (Optimization and mutual funds in the FB-HMM)
Portfolio optimization also works for the FB-HMM (5.16), cf. Haussmann and Sass (2004), where filtering would have to be addressed as in Krishnamurthy et al. (2018). A reduced model can be introduced, but due to the relation (5.16) we have a signal-to-noise matrix depending on the filter $\hat Y_t$. Thus, in the reduced model the composition of the mutual funds would then also depend on time and state via the filter. This could still be used to define an RRRM similar to Definition 5.7, yielding the optimal $\breve\pi_t = \hat Y_t$, but, as pointed out above, the composition of the funds would depend on the filter value. For the FB-HMM other decompositions of $A_t$ are also reasonable, but they would be filter-dependent as well.
In summary, we have seen that for the HMM we have a very good interpretation of the reduced models in terms of $d$ funds which have a time-independent composition, while for the MSM this composition depends on the state of $Y_t$ and for the FB-HMM on the filter $\hat Y_t$. We shall close this section with a comparison of the three models which makes in particular evident why there is no reasonable other choice for the MSM than the one in Remark 5.16.
For the filters in the MSM we refer to Elliott et al. (1995), for the HMM to Sect. 2.3, and for filters in the FB-HMM see Krishnamurthy et al. (2018).
The returns, the drift and the filter for the drift, $B \hat Y_t$, are plotted in Fig. 4 for one simulation of $Y$ and $W$. We clearly see that the filter in the MSM provides the true state quite exactly, as expected, since in continuous time the chain is observable. Therefore, the only reasonable decomposition into funds is the regime parametrization as discussed in Remark 5.16, since by $\breve\pi_t = Y_t$ it leads to choosing the best portfolio in the current state. One also sees that, regarding filtering and volatility clustering, the FB-HMM lies between the other two models and provides a good compromise between the more realistic filtering in the HMM and the more realistic econometric properties of the MSM.

Conclusion
In the context of hidden Markov models we showed that the signal-to-noise matrix plays a prominent role for portfolio optimization as well as for filtering. The convergence result in Sect. 4 gives an exact formulation of the intuition that we can retrieve less information on the underlying chain from observing the stock prices when the signal (drift) is small compared to the noise (volatility). This is shown by proving that for decreasing eigenvalues of the signal-to-noise matrix, the filters converge uniformly in $L^2$ to the invariant distribution of the chain. Since the latter is the distribution of the chain, we gain no additional information in the limit.
The important role of the signal-to-noise matrix, which is of dimension $d$, and of its $d$ eigenvalues then motivated us to reduce the dimension (if $d \le n$) of the model by decomposing the signal-to-noise matrix and setting up a $d$-dimensional model based on this decomposition in (5.3). The returns in the reduced model can be seen as $d$ mutual funds. We proved that portfolio optimization and filtering in the reduced model yield pathwise the same optimal wealth and filter processes.
Two special cases were introduced, using in the RRRM a decomposition which yields the optimal portfolios in the single states as funds, and in the REVM an eigenvalue decomposition which leads to funds which contribute according to the corresponding eigenvalues more or less to the optimal portfolio.
To complete the survey we looked at the case of a singular signal-to-noise matrix (e.g. when $d > n$) and at model reduction in related models such as the MSM and the filter-based HMM.
Our analysis showed that in the standard case of a non-singular signal-to-noise matrix in the HMM, while there is less information in the reduced model from observing the funds only, the filters and the optimization based on this observation still yield the same results. For future research it would be interesting to analyze further if this reduction helps to identify relevant model parameters better. Also the effect of including expert opinions, see e.g. Frey et al. (2012), on the model reduction would be of interest.

A.1 Proof of Theorem 4.2
For $t > 0$, using the Itô isometry componentwise, we see that