STABILITY OF TRIGONOMETRIC APPROXIMATION IN $L^p$ AND APPLICATIONS TO PREDICTION THEORY

Let $\Gamma$ be an LCA group and $(\mu_n)$ be a sequence of bounded regular Borel measures on $\Gamma$ tending to a measure $\mu_0$. Let $G$ be the dual group of $\Gamma$, $S$ be a non-empty subset of $G \setminus \{0\}$, and $[T(S)]_{\mu_n,p}$ the subspace of $L^p(\mu_n)$, $p \in (0,\infty)$, spanned by the characters of $\Gamma$ which are generated by the elements of $S$. The limit behaviour of the sequence of metric projections of the function 1 onto $[T(S)]_{\mu_n,p}$ as well as of the sequence of the corresponding approximation errors is studied. The results are applied to obtain stability theorems for prediction of weakly stationary or harmonizable symmetric $p$-stable stochastic processes. Along with the general problem, the particular cases of linear interpolation or extrapolation as well as of a finite or periodic observation set are studied in detail and compared with each other.


Introduction
If a time series is modelled on a weakly stationary process, its prediction is calculated using the spectral measure of the process, which in turn is estimated on the basis of observations. Therefore, similar to the Central Limit Theorem giving a theoretical justification of statistical inference, it would be desirable to prove assertions claiming that minimally differing spectral measures yield only slightly varying predictions. According to [26], cf. [28], Kolmogorov raised this problem at the VII-th Soviet Conference on Probability Theory and Mathematical Statistics in 1963. Let us formulate it more precisely.
As usual, denote the sets of positive integers, of non-negative integers, of integers, of real and of complex numbers by $\mathbb{N}$, $\mathbb{N}_0$, $\mathbb{Z}$, $\mathbb{R}$ and $\mathbb{C}$, respectively. Consider a weakly stationary process on $\mathbb{R}$ with a non-stochastic spectral measure $\mu_0$. Its linear prediction at the point zero on the basis of its observations at points of a set $S \subseteq \mathbb{R}$ is equivalent to the computation of the orthogonal projection $\varphi_{\mu_0,2}$ of the function that is identically equal to 1 onto the subspace of $L^2(\mu_0)$ spanned by the functions $e^{ix\cdot}$, $x \in S$. In most applications the measure $\mu_0$ is unknown and has to be replaced by its estimate $\mu$ obtained from data, cf. [2]. In fact, one computes the orthogonal projection $\varphi_{\mu,2}$ of the function equal to 1 in $L^2(\mu)$, and one hopes that $\varphi_{\mu_0,2}$ and $\varphi_{\mu,2}$ do not differ too much. Accordingly, four kinds of prediction errors arise:
(E1) the theoretical or "true" prediction error $d_2(\mu_0) := \int |1 - \varphi_{\mu_0,2}|^2 \, d\mu_0$,
(E2) the computed prediction error $d_2(\mu) := \int |1 - \varphi_{\mu,2}|^2 \, d\mu$,
(E3) the theoretical error of the estimated prediction $d_2(\mu,\mu_0) := \int |1 - \varphi_{\mu,2}|^2 \, d\mu_0$,
(E4) the estimated error of the theoretical prediction $d_2(\mu_0,\mu) := \int |1 - \varphi_{\mu_0,2}|^2 \, d\mu$.
Note that additional assumptions are necessary to define (E3) and (E4) correctly, which will be explained in Section 3. Moreover, the theoretical prediction is not known and, thus, does not play any role in practice. So the error (E4) will be discussed only marginally.
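The interplay of these errors can be made concrete in the simplest possible setting. The following minimal sketch assumes a single observation point $x$ and purely atomic measures, so that the orthogonal projection of 1 onto the span of $e^{ix\cdot}$ is $c\,e^{ix\cdot}$ with $c = \int e^{-ix\gamma}\,d\mu(\gamma)/\mu(\mathbb{R})$. The atoms, the masses and all function names are hypothetical and serve illustration only; the readings of (E1) to (E4) are those suggested by Sections 1 and 3.

```python
import cmath

def project_one_onto_character(measure, x):
    """Coefficient c minimising the integral of |1 - c*e^{ix.}|^2 over the atoms (point, mass)."""
    total = sum(mass for _, mass in measure)
    inner = sum(mass * cmath.exp(-1j * x * g) for g, mass in measure)
    return inner / total

def error(coeff, x, measure):
    """Integral of |1 - coeff*e^{ix.}|^2 with respect to the given atomic measure."""
    return sum(mass * abs(1 - coeff * cmath.exp(1j * x * g)) ** 2 for g, mass in measure)

# Hypothetical "true" spectral measure mu0 and estimate mu, given as (atom, mass) pairs.
mu0 = [(0.0, 0.5), (1.0, 0.3), (2.5, 0.2)]
mu = [(0.0, 0.45), (1.0, 0.35), (2.5, 0.2)]
x = 1.0

c0 = project_one_onto_character(mu0, x)   # theoretical prediction
c = project_one_onto_character(mu, x)     # estimated prediction

e1 = error(c0, x, mu0)   # (E1): true prediction error
e2 = error(c, x, mu)     # (E2): computed prediction error
e3 = error(c, x, mu0)    # (E3): true error of the estimated prediction
e4 = error(c0, x, mu)    # (E4): estimated error of the theoretical prediction
```

By the minimality of the orthogonal projection, `e3` can never fall below `e1` and `e4` can never fall below `e2`, whatever atoms are chosen; how `e2` compares with `e1` depends on the measures.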
As the next step we define what we mean by saying that $\mu_0$ and $\mu$ differ a little. Since both measures are bounded, it is natural to make use of the topology generated by the norm $\|\cdot\|$ on the linear space of $\mathbb{C}$-valued measures, where for a $\mathbb{C}$-valued measure $\nu$, the symbol $\|\nu\|$ denotes the total variation. Indeed, soon after A. N. Kolmogorov had raised the stability problem, Yu. A. Rozanov published the following partial answer in [28]. We always denote by lim the limit of a sequence indexed by the elements of $\mathbb{N}$ in their natural order.
Theorem 1.1. Let $(\mu_n)$ be a sequence of bounded Borel measures on $\mathbb{R}$. If $\mu_0 \le \mu_n$ for all $n \in \mathbb{N}$ and $\lim \|\mu_0 - \mu_n\| = 0$, then lim

In the course of his proof Yu. A. Rozanov obtained the following result:

Theorem 1.2. Let $(\mu_n)$ be a sequence of bounded Borel measures on $\mathbb{R}$. If $\lim \|\mu_0 - \mu_n\| = 0$, then $\limsup d_2(\mu_n) \le d_2(\mu_0)$.

Note that among the four prediction errors (E1)-(E4), the error (E2) is the only one that can be computed. The preceding theorem as well as Theorem 3.1 and Example 3.2 below indicate that (E2) tends to be smaller (at least, not larger) than the true prediction error. We are in the uncomfortable situation that, when doing prediction, we must be afraid that the true prediction error is much larger than the prediction error we compute.
The error (E3) gives the most objective idea of the correctness of the prediction. It was the behaviour of (E3) which was investigated in early papers on stability of prediction, cf. [21,28]. Several results, e.g. Theorems 4.9 and 5.10 and inequality (6.2) below, reveal that, unlike (E2), the error (E3) tends to be larger than (E1), although this is not always the case, cf. Example 3.6(a). This observation initiated the development of minimax robust methods, where the maximum of (E3) is computed under the condition that $\mu$ runs through a set of certain spectral measures, cf. [11] for a survey of early results. Such an approach to stability as well as the papers of M. Moklyachuk, e.g. [24], are not in the focus of the present paper.
The preceding remarks make clear that it is of both theoretical and practical importance to state sufficient conditions for the equalities $\lim d_2(\mu_n) = d_2(\mu_0)$ or $\lim d_2(\mu_n,\mu_0) = d_2(\mu_0)$, in the style of Theorem 1.1. From the point of view of applications it is probably even more important to prove convergence results for the sequence $(\varphi_{\mu_n,2})$ of the predictions themselves. To the best of our knowledge, this question has not yet been discussed in the literature.
To point out the variety of possible assertions we decided to deal with further topologies on the space of $\mathbb{C}$-valued measures along with the norm topology. Section 2 is devoted to an overview of the corresponding modes of convergence we are interested in. However, for convenience of presentation we confine ourselves to the study of sequences $(\mu_n)_{n \in \mathbb{N}}$ of measures, although this is not quite the adequate approach in the case of a non-metrizable topology.
Stability of the prediction problem is equivalent to stability of the corresponding trigonometric approximation problem in $L^2(\mu_0)$. It is natural to ask for stability results of analogous approximation problems in $L^p(\mu_0)$, $p \in (0,\infty]$. Since Example 3.5 indicates that one cannot hope for non-trivial affirmative assertions if $p = \infty$, this case will be excluded from our investigations. If $p \in (1,2]$, the results could be useful in prediction of harmonizable symmetric $p$-stable processes, cf. [3,31].
For the other values of $p$ the results could be of interest from the point of view of trigonometric approximation. Recall in this context that Szegö had studied his celebrated "Szegö infimum" as a pure trigonometric approximation problem, cf. [29], and only after the spectral theory of weakly stationary processes had been invented did its applicability to prediction theory become clear, cf. [16,17].
Another natural extension is the study of a more general set of parameters than $\mathbb{R}$. Since there exists a well-developed Fourier theory on locally compact abelian groups, we shall be concerned with such a group as a parameter set of the process. In particular, our results are applicable to stochastic fields. To avoid non-essential technical complications, we do not investigate multivariate processes, although it is not hard to see that part of the results can be generalized from the univariate to the multivariate case in a straightforward way.
In Section 3 we give precise formulations of the stability problems we wish to discuss, and we try to establish interrelations between them. We prove general stability results, among them amplifications of Theorems 1.1 and 1.2, cf. Theorem 3.10. The remainder of the paper is devoted to particular prediction problems.
In Section 4 we consider stability of interpolation of one missing value, which is one of the simplest prediction problems and was already studied by A. N. Kolmogorov in [16,17]. Corollary 4.10 shows that for the interpolation problem Theorems 1.1 and 1.2 remain valid if convergence in norm is replaced by a much weaker form of convergence, cf. Definition 2.5. Remarkably, it seems not so easy to prove stability results for the interpolation problem if more than one value is missing.
Section 5 deals with the $m$-steps-ahead prediction problem, $m \in \mathbb{N}$, of a sequence of random variables. For $m = 1$ it is closely related to Szegö's infimum problem, cf. [29]. Although the $m$-steps-ahead prediction problem is probably the most important and most extensively studied prediction problem, several questions remain unanswered. For example, at present a complete extension of the results in the case $m = 1$ to arbitrary values of $m \in \mathbb{N}$ is not known.
In Section 6 we briefly discuss the case in which the set $S$ of observation points is finite. Since the approximating linear subspace is finite dimensional, rather strong stability results can be obtained, cf. Theorem 6.2. However, to prove them we had to suppose that the dimension of the approximating subspace is equal to the number of elements of the set $S$. It would be interesting to explore what happens if this dimension is less than the cardinality of $S$. In any case, the results of Section 6 suggest that from the point of view of stability it is reasonable to compute the prediction of a discrete time series on the basis of a finite observation set.
The final Section 7 is devoted to a periodic set of observation points, i.e. $S$ is a translate of a closed subgroup $H$ of the parameter group. Problems of this type are related to the famous Whittaker-Shannon-Kotel'nikov theorem, which is of great importance in information theory. Although progress has been made in recent years, cf. [22], almost all known facts pertain to the stationary or Hilbert space case $p = 2$. As Theorem 7.8 shows, under the assumption that the annihilator group of $H$ is at most countable, the prediction has a rather nice limit behaviour. The reason is that the prediction is bounded by 1 independently of the underlying spectral measure.
Along with affirmative claims there exist many negative assertions, and examples and counterexamples constitute an essential part of the present paper. To point out the variety of results let us mention some of them. It is not surprising that there is a strong dependency on $p$, in general. It is more remarkable that according to Szegö's infimum theorem the one-step-ahead prediction error does not depend on $p \in (0,\infty)$. For $m$-steps-ahead prediction, the independence of $p \in [1,\infty)$ can be proved, cf. Corollary 5.12. Clearly, the results may depend on the mode of convergence and on the observation set $S$. But there is also an interplay between them, compare Theorem 4.6 and Corollaries 4.7 and 4.8 with Example 6.4. There may also occur a dependency on the direction of convergence. Theorem 1.1 and Corollaries 3.3(ii) and 5.7 seem to indicate that convergence from above provides better stability properties than convergence from below. Thus, if one has to choose a spectral density among several candidates, one may reasonably choose the one whose minimum modulus is the largest. It is also worth mentioning that strong stability results can be obtained as soon as the observations at the points of $S$ give full information (in the linear sense) on the underlying process, cf. Corollary 3.3(i), Proposition 3.8, and the end of Section 3.

Modes of Convergence
Let Γ be a locally compact topological space with Hausdorff topology.For a subset B ⊆ Γ, denote by 1 B its indicator function and set 1 := 1 Γ .Let B be the σ-algebra of Borel subsets of Γ and M be the set of all non-zero finite non-negative regular measures on B. If ν is a C-valued measure on B, denote its total variation by ν .Recall, that • is a norm on the space of all C-valued Borel measures on Γ.If µ ∈ M and ν is absolutely continuous with respect to µ, we write ν << µ and denote by dν dµ a Radon-Nikodym derivative.Recall, that where here and henceforth integration is over Γ in case the domain of integration is not indicated.For p ∈ (0, ∞], let L p (µ) denote the common L p space of (µequivalence classes of) C-valued functions related to µ.Throughout the present paper, let µ 0 ∈ M and let (µ As a consequence of (2.1) the equalities lim µ 0 − µ n = 0 and |w 0 − w n | dν = 0 are equivalent.Moreover, one easily concludes that lim µ 0 − µ n = 0 if and only if lim µ n (B) = µ 0 (B) uniformly for all B ∈ B.
It is natural to discuss certain modes of weak convergence along with norm convergence.
Definition 2.1.We say that (µ n ) converges to µ 0 weakly and write w-lim µ n = µ 0 if the sequence (w n ) tends to w 0 with respect to the weak topology of The following lemma gives a characterization of weak convergence and unveils that Definition 2.1 does not depend on the choice of the measure ν ∈ M.
Lemma 2.2. The sequence $(\mu_n)$ converges weakly to $\mu_0$ if and only if
(2.3) $\lim \mu_n(B) = \mu_0(B)$
for all $B \in \mathcal{B}$.
Since the linear space of simple functions is dense in L ∞ (ν), (2.2) follows from (2.3).
Definition 2.3. We say that $(\mu_n)$ converges to $\mu_0$ in the weak* sense and write $w^*\text{-}\lim \mu_n = \mu_0$ if $\lim \int f \, d\mu_n = \int f \, d\mu_0$ for any bounded continuous $\mathbb{C}$-valued function $f$ on $\Gamma$.
Remark 2.4. In probability theory the mode of convergence introduced in Definition 2.3 is often called "weak convergence". Since $M$ is a subset of the dual space of the Banach space of bounded continuous functions, the differing name used here is in accordance with the terminology of functional analysis.
Since in practice the estimated spectral measure of the modelling stationary process is often absolutely continuous with respect to the Lebesgue measure $\lambda$ on the interval $[0, 2\pi)$, cf. [2], one could also study convergence in measure $\lambda$. To apply the concept of convergence in measure to general sequences of bounded measures we give the following definition.

Definition 2.5. We say that $(\mu_n)$ converges to $\mu_0$ in measure and write $m\text{-}\lim \mu_n = \mu_0$ if $(w_n)$ tends to $w_0$ in measure $\nu$.
n = w (1) 0 , we can choose a number n 0 ∈ N such that ν 1 (|w Conversely, let ν 2 -lim w (2) 0 . To prove the equality ν 1 -lim w n = w (1) 0 it is sufficient to show it on the set B := {γ ∈ Γ : v(γ) > 0}. Since the measures ν 1 and ν 2 are equivalent on B, the assertion follows from the first part of the proof. Finally, if ν 1 and ν 2 are arbitrary, we have ν 1 -lim w (1)

Note that Lemma 2.6 fails if the measures ν 1 and ν 2 are allowed to be σ-finite. For example, if Γ := N, µ n := δ n is the Dirac measure at the point n ∈ N, n does not exist.

Of course, all the convergence notions introduced above are equivalent if Γ is a finite set. In general, weak convergence does not imply convergence in measure, which in turn does not yield weak* convergence. Norm convergence is stronger than weak convergence as well as stronger than convergence in measure, and weak convergence is stronger than weak* convergence. In general, none of these implications can be reversed. However, there exist interesting particular cases in which some of these notions are equivalent. For example, if Γ = [0, 1] and the measures µ k , k ∈ N 0 , are absolutely continuous with respect to the Lebesgue measure λ, then weak and weak* convergence coincide. To see this, note that Γ = [0, 1] is normal and compact as a topological space. Furthermore, any open subset U of Γ is an at most countable union of open intervals, which implies that the boundary of U has µ 0 -measure 0. Therefore, w*-lim µ n = µ 0 if and only if lim µ n (U ) = µ 0 (U ) for any open subset U of Γ. According to a theorem by Jean Dieudonné, cf. [5, IV.16], this is equivalent to the condition lim µ n (B) = µ 0 (B) for any B ∈ B. So w-lim µ n = µ 0 by Lemma 2.2.

Proposition 2.7. Let Γ be a discrete space, i.e. all subsets of Γ are open. Then weak* convergence implies convergence in norm. If Γ is an infinite set, there exists a sequence converging in measure, but not converging in the weak* sense.
Proof.We can assume that Γ is infinite since the finite case is known.Select a countable subset D := {γ i : i ∈ N} such that µ k (Γ \ D) = 0 for all k ∈ N 0 .Identify µ k with the sequence s k := (µ k ({γ i })) i∈N .The sequence s k is an element of the space l 1 of absolutely summable sequences, and lim µ 0 − µ k = 0 is equivalent to lim s k = s 0 with respect to the norm topology of l 1 .Similarly, w * -lim µ n = µ 0 is equivalent to lim s n = s 0 with respect to the weak topology of l 1 , because the space l ∞ and the space of bounded continuous functions on N coincide.Therefore, the first assertion is a consequence of the fact that on the space l 1 norm convergence and weak convergence of sequences are equivalent, cf.[23, pp. 218-220].To prove the second assertion, set µ n ({γ i }) := δ n,i for any n, i ∈ N. We obtain m-lim µ n = 0.Moreover, lim 1 D dµ n = 1 and lim 1 {γ} dµ n = 0 for any γ ∈ Γ, what implies the non-existence of a weak* limit of the sequence (µ n ).
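The second assertion of Proposition 2.7 can be traced numerically. The sketch below models $\mu_n$ as the unit point mass at $\gamma_n$ and takes, as an assumption made only for this illustration, the dominating finite measure $\nu$ with $\nu(\{\gamma_i\}) = 2^{-i}$, so that the density $w_n = d\mu_n/d\nu$ equals $2^n$ at $\gamma_n$ and 0 elsewhere. All function names and the truncation of the index set are hypothetical.

```python
def density(n, i):
    """Density w_n(i) = dmu_n/dnu at the point gamma_i, where nu({gamma_i}) = 2**-i."""
    return 2.0 ** i if i == n else 0.0

def nu_measure_of_large_density(n, eps, size):
    """nu({i : |w_n(i) - 0| > eps}), computed on the truncated index set 1..size."""
    return sum(2.0 ** -i for i in range(1, size + 1) if abs(density(n, i)) > eps)

def integrate(f, n):
    """Integral of f with respect to mu_n, the unit point mass at gamma_n."""
    return f(n)

size = 60
indices = (1, 5, 20)

# nu-measure of the set where w_n deviates from 0 by more than 0.5: tends to 0.
in_measure = [nu_measure_of_large_density(n, 0.5, size) for n in indices]

# Integral of the indicator of D stays 1, integral of a single-point indicator vanishes.
mass_on_D = [integrate(lambda i: 1.0, n) for n in indices]
mass_on_point = [integrate(lambda i: 1.0 if i == 3 else 0.0, n) for n in indices]
```

The values in `in_measure` shrink like $2^{-n}$, which is convergence in measure to the zero measure. A weak* limit would have to vanish on every single point, hence on $D$, which is incompatible with `mass_on_D` being constantly 1.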
It is worth mentioning that there exist sequences (µ n ) on Γ = [0, 1] such that both the limits w * -lim µ n and m-lim µ n exist, but are unequal.To give an example, set . This gives m-lim µ n = 0 and w * -lim µ n = δ 0 , the Dirac measure at zero.Choosing a subsequence (µ m ) of (µ n ) with lim m→∞ w m = w (2) 0 ν-a.e. and applying Egorov's theorem, we find a set C ⊆ B satisfying ν(C) > 0 and (2.5) lim 0 ν-a.e. on C, a contradiction to (2.4).In [6, Lemma 1 of Ch. 1] it was shown that for µ ∈ M, p ∈ (0, ∞), For future use we state a slight generalization of this assertion.
Proof. Assume the contrary, i.e. there exist $c \in (0,\infty)$ and a subsequence $(f_m)$ of the sequence $(f_n)$ such that
(2.9) $\int |f_0 - f_m|^p \, d\mu \ge c$
for any $m$. Again, choose a subsequence $(f_l)$ of the sequence $(f_m)$ such that $\lim f_l = f_0$ $\mu$-a.e. From the result cited above one derives $\lim_{l \to \infty} \int |f_0 - f_l|^p \, d\mu = 0$, which contradicts (2.9).
Applying Lemma 2.9 with p = 1 and f k = w k we get the following result.

General results
Let $G$ be an LCA group, i.e. a locally compact abelian group with Hausdorff topology. Let $\Gamma$ denote its dual group of continuous characters. The group $G$ can be identified with the dual of $\Gamma$, and the character on $\Gamma$ generated by $x \in G$ is denoted by $e_x$. Group operations are written additively, and the letter $\lambda$ stands for a Haar measure on $\Gamma$, which in the case of compact $\Gamma$ is presumed to be normalized, i.e. $\lambda(\Gamma) = 1$. Recall that $M$ denotes the set of all non-zero finite non-negative regular measures on the Borel $\sigma$-algebra $\mathcal{B}$ of $\Gamma$.
Let $S$ be a non-empty subset of $G \setminus \{0\}$ and $T(S)$ be the linear space of all complex-valued trigonometric $S$-polynomials, i.e. the linear space of all finite sums of the form $\sum_j a_j e_{x_j}$ with $a_j \in \mathbb{C}$, $x_j \in S$. If $p \in (0,\infty)$ and $\mu \in M$, define the distance of the function 1 to $T(S)$ with respect to the metric of $L^p(\mu)$ by
$d_p(\mu; S) := \inf \{ \int |1 - \tau|^p \, d\mu : \tau \in T(S) \}$;
for $p \ge 1$ it would be more correct to call it the $p$-th power of the distance.
Let $[T(S)]_{\mu,p}$ be the closure of $T(S)$ in $L^p(\mu)$. If there exists a unique element $\varphi \in [T(S)]_{\mu,p}$ with $\int |1 - \varphi|^p \, d\mu = d_p(\mu; S)$, it is called the metric projection of 1 onto $[T(S)]_{\mu,p}$ and denoted by $\varphi_{\mu,p}(S)$.

Let $\nu \in M$ and let $\mu \in M$ be such that $\mu \ll \nu$. Then $\nu$-equivalence yields $\mu$-equivalence, and if $\varphi_{\nu,p}(S)$ exists, the integral
(3.2) $d_p(\nu, \mu; S) := \int |1 - \varphi_{\nu,p}(S)|^p \, d\mu$
is well defined. If $\mu$ is not absolutely continuous with respect to $\nu$, there exists a $B \in \mathcal{B}$ with $\mu(B) > 0$ and $\nu(B) = 0$. Since $\varphi_{\nu,p}(S)$ can be arbitrarily chosen on $B$, the integral on the right-hand side of (3.2) cannot be given a sense. Therefore, whenever we shall be concerned with $d_p(\nu, \mu; S)$ we always suppose that $\varphi_{\nu,p}(S)$ exists and $\mu \ll \nu$, although this will not be mentioned explicitly each time. If $\nu$ is the estimate of the spectral measure $\mu$ of a harmonizable symmetric $p$-stable process, then $d_p(\nu, \mu; S)$ can be interpreted as the true error of the estimated prediction $\varphi_{\nu,p}(S)$. We should conclude that $\varphi_{\nu,p}(S)$ is an unsuitable prediction if it does not belong to $L^p(\mu)$.

Let $\mu_k \in M$, $k \in \mathbb{N}_0$. As described in the introduction, from the point of view of prediction theory it is of interest to study the behaviour of the sequence $(\varphi_{\mu_n,p})_{n \in \mathbb{N}}$ if $(\mu_n)$ tends to $\mu_0$ in one or another way. More precisely, we try to give necessary and sufficient conditions for any of the following relations:
(R1) $\lim d_p(\mu_n; S) = d_p(\mu_0; S)$,
(R2) $\lim d_p(\mu_n, \mu_0; S) = d_p(\mu_0; S)$,
(R3) $\lim \int |\varphi_{\mu_0,p}(S) - \varphi_{\mu_n,p}(S)|^p \, d\mu_0 = 0$.
Of course, in the cases (R2) and (R3) we presume that the respective metric projections exist and $\mu_0 \ll \mu_n$ for any $n \in \mathbb{N}$. To simplify the notation we set $\varphi_{k,p}(S) := \varphi_{\mu_k,p}(S)$, and frequently we shall not indicate the dependency on the set $S$.
Our first result is a generalization of Theorem 1.2.
for all S. Proof.
Since the function |1 − τ ε | p is continuous and bounded, there exists for any n ≥ n 0 , which yields (3.3) by the arbitrarity of ε > 0.
To see that inequality (3.3) can be sharp consider the interpolation problem for weakly stationary sequences where 0 is the only missing value.Thus, let G = Z and S = Z \ {0} for the moment.Defining e x (γ) := e ixγ , x ∈ Z, γ ∈ [0, 2π), the dual group Γ of Z can be identified with the interval [0, 2π), where the group operation is addition modulo 2π and the set of all open subintervals together with all sets of the form [0, a) ∪ (b, 2π) form a basis of its topology.Let λ be the normalized Lebesgue measure on [0, 2π).Assume, that µ k << λ and denote , and consequently, lim d 2 (µ n ) = 0. Therefore, for any b ∈ [0, d 2 (µ 0 )] there exists a sequence (µ n ) with lim µ 0 − µ n = 0 and lim d 2 (µ n ) = b.Note, that (w n ) tends to w 0 even uniformly and that, additionally, lim log w0 min(w0,wn) dλ = 0. Compare this with Theorem 5.5.The following corollary is a straightforward consequence of Theorem 3.1.Its first assertion claims that as soon as the observations at the points of S contain the whole (linear) information on the process, then the computed prediction error tends to be equal to the theoretical prediction error.
Corollary 3.4.Let S ⊆ G \ {0} be an arbitrary non-empty set and µ 0 ∈ M. The following two assertions are equivalent: , where ess sup µ denotes the essential supremum with respect to the measure µ ∈ M. For the sake of completeness we show by example that for p = ∞ the inequality (3.3) is not true, in general.
We give further sufficient conditions for the implication (R2) → (R3) that are closer to prediction theory in spirit.Proposition 3.8 tells us that (R3) follows from (R2) if the observations at the points of S contain the whole information on the process.The proof of Theorem 3.9 is based on the uniform rotundity of the Recall that • 0 is a uniformly rotund norm, cf.[23, p. 441].By Proposition 3.8 we can assume that d p (µ 0 ) > 0 and set Using elementary properties of norms we obtain (3.5)

) lim
n,j→∞ ). and since φn,p+φj,p 2 and an application of (3.5), (3.6) and (3.8) yields (3.4), what means that for δ ∈ (0, ∞), there exists n 0 ∈ N such that 1 2 If it would exist a positive number ε and a subsequence (f nr ) r∈N such that then for any δ ∈ (0, ∞) there would exist elements f nr and f nr+1 satisfying (3.9) and the inequality 1 is a Cauchy sequence with respect to the norm • 0 .Since lim f n 0 = 1, the sequence (φ n,p ) is a Cauchy sequence as well.Its limit lim φ n,p =: φ p satisfies the equality what implies that φ p = φ 0,p since φ p ∈ [T ] µ0,p and the metric projection is unique.Note, that in case p = 2 the assertion of the preceding theorem can be easily derived from the Pythagorean theorem.Indeed, if (1 − φ n,2 ) is the orthogonal sum of (1 − φ 0,2 ) and (φ 0,2 − φ n,2 ), one has Summarizing our extensions of Theorem 1.1 we state the following theorem.Its assertions fail if weak* convergence is replaced by convergence in measure, cf.Example 5.2.
Lemma 3.11.Let µ 0 << µ n , n ∈ N. Assume that the set S, p ∈ (0, ∞) and µ k are such that the metric projection φ k,p (S) exists, k ∈ N 0 .Assume further that the following conditions are satisfied: would not be true, by (ii) there would exist a subsequence (µ n ′ ) satisfying (3.10) lim From (iii) we derive the existence of another subsequence (n ′′ ) of (n ′ ) with Proof.The suppositions of the lemma imply that the sequence ess sup µ0 |φ 0,p (S) − φ n,p (S)| p ) is a bounded sequence and (R3) follows from Lebesgue's dominated convergence theorem.The second assertion then follows from Corollary 3.12.
The strong boundedness condition on the sequence (φ n,p (S)) is the main obstacle to apply the preceding lemma.For the by far most important case p = 2, now we state a result which somewhat widens its field of applications.Let S be an arbitrary non-empty subset of G \ {0} and x ∈ G \ {0} be such that e x ∈ [T (S)] µ0,2 .Denote by Proposition 3.14.Let µ 0 << µ n , n ∈ N 0 , and lim µ 0 − µ n = 0. Assume that for all y ∈ G, the function P k e y has the following two properties: • Then for all y ∈ G, the function Q k e y has the same properties.
To prove Proposition 3.14 we show the following lemma: Lemma 3.15.Let µ 0 << µ n and lim µ 0 − µ n = 0.If (f n ) denotes a sequence of complex-valued measurable functions such that Proof.We have We conclude the present section with a few remarks on the estimated error of the theoretical prediction d p (µ o , µ n ) := |1 − φ 0,p | p dµ n , of course, presuming that φ 0,p exists and µ n << µ 0 , n ∈ N. From the mere definition of weak* convergence we can derive that, in case φ 0,p is continuous and bounded, then lim It is also worth mentioning that if the observation at the points of S give full information on the process, i.e. if d p (µ 0 ; S) = 0, or equivalently, φ 0,p (S) = 1, then φ 0,p (S) = 1 µ-a.e., since µ n << µ 0 .Thus, if d p (µ 0 ; S) = 0, then d p (µ 0 , µ n ; S) = 0 always.Conversely, if d p (µ 0 ; S) > 0, under a slight technical proviso on µ 0 one can construct a sequence (µ n ) satisfying mlim µ n = µ 0 and lim d p (µ 0 , µ n ; S) > d p (µ 0 ; S).To do this note first that there exist B n ∈ B and c ∈ (0, ∞) such that µ 0 (B) > 0 and |1 − φ 0,p (S)| p ≥ c µ 0 -a.e. on B. If we assume now that there exists a sequence (B n ) of Borel subsets of B with µ 0 (B n ) > 0 and lim µ 0 (B n ) = 0, we can set dµ n :=

Interpolation of one missing value
Throughout this section, S denotes the subset S = G \ {0} of an LCA group G, and the dependence on S will not be indicated in most cases.
Proof.Since G is the dual group of its dual group Γ, the family of subsets U (K, δ) := {x ∈ G : |e 0 (γ) − e x (γ)| < δ for all γ ∈ K}, where K runs through the compact subsets of Γ and δ through the positive real numbers constitutes a basis of neighbourhoods of the zero element 0 of G. Let ε ∈ (0, ∞).By the regularity of the measure µ, there exists a compact subset p and obtain |1 − e x | dµ < ε for all x ∈ U (K, δ).Since G is assumed not to be discrete, there exists an x ∈ U (K, δ) which is different from 0. The assertion follows.
To prove Proposition 4.2 we need a characterization of the set $[T]_{\mu,1}$ in $L^1(\mu)$. The following description is more general and of interest in its own right.
, then all functions of L p (µ) are integrable with respect to λ and [T ] µ,p is exactly the subspace of all f ∈ L 1 (µ) with f dλ = 0.
Proof.Let p ∈ (1, ∞).If w − q p ∈ L 1 (µ), then for f ∈ L p (µ) Hölders inequality yields Recall, that a subspace L of L p (µ) is a Chebyšev subspace if for arbitrary f ∈ L p (µ) the metric projection onto L exists.Chebyšev subspaces of L 1 (µ) with finite codimension were described by Garkavi [8,Thm. 3].From our results we can derive the following corollary: if and only if ess inf λ w = 0, where w denotes the Radon-Nikodym derivative of the absolutely continuous part of µ.
In the remaining part of this section some of the general results of Section 3 are specified to the case $S = G \setminus \{0\}$. Let us assume that $\mu_k \in M$ is absolutely continuous and set $w_k := \frac{d\mu_k}{d\lambda}$, $k \in \mathbb{N}_0$.

Theorem 4.5. Let $p \in (1,\infty)$. The relations (R2) and (R3) are equivalent for $S = G \setminus \{0\}$.
Proof. According to Theorem 3.9 we have only to prove that in the case $d_p(\mu_0) = 0$ and $\lim d_p(\mu_n, \mu_0) = d_p(\mu_0)$, one has $\varphi_{n,p} \in [T]_{\mu_0,p}$ for all $n$ large enough. Since the just mentioned conditions imply that $[T]_{\mu_n,p} = L^p(\mu_n)$ and $\varphi_{n,p} \in L^p(\mu_0)$ for all $n$ large enough, we get $\int \varphi_{n,p} \, d\lambda = 0$, hence $\varphi_{n,p} \in [T]_{\mu_0,p}$ by Lemma 4.3.
Note that the cases $d_p(\mu) > 0$ and $d_p(\nu, \mu) = 0$ as well as $d_p(\nu, \mu) = \infty$ can occur, cf. Examples 3.2 and 3.6. If both $d_p(\mu)$ and $d_p(\nu, \mu)$ are positive real numbers, the left inequality of (4.12), which was obtained for $p = 2$ by Taniguchi [30, p. 57] in a slightly more general form, is a simple consequence of Hölder's inequality. To see this, multiply the left inequality of (4.12) by $\left(\int w^{-q/p} \, d\lambda\right)^{p/q} \left(\int v^{-q/p} \, d\lambda\right)^{p}$ and take into account (4.3) and (4.11). We get the inequality
$\left(\int v^{-q/p} \, d\lambda\right)^{p} \le \int v^{-q} w \, d\lambda \cdot \left(\int w^{-q/p} \, d\lambda\right)^{p/q}$,
which can be proved by applying Hölder's inequality to the integral $\left(\int (v^{-q/p} w^{1/p}) \, w^{-1/p} \, d\lambda\right)^{p}$. Here we give a different proof of Theorem 4.9 applying Lemma 4.3.
2, reveals that the first assertion of the preceding corollary is false for p = 1.Moreover, the following example shows that the convergence in measure cannot be replaced with weak convergence.

Prediction m steps ahead
The goal of the present section is m-steps-ahead prediction, which has played a central role from the very beginning of the prediction theory.To avoid several transformations to complex conjugates it is more convenient to study the m-steps backwards prediction problem instead.So, let G = Z, Γ = [0, 2π), λ be the normalized Lebesgue measure on Γ, and the character e x for x ∈ Z be defined as described in Section 3.For m ∈ N, denote by S m the set S m := (m − 1) + N = {m, m + 1, ...}.Let µ ∈ M and d µ = w dλ + µ σ be its decomposition into its absolutely continuous and singular parts.Similarly to (4.1) the equality A different proof and at the same time an extension to all p ∈ (0, ∞) can be derived from [13], see the remarks after formula (4.1) of the present paper.Similarly to Section 4, we assume µ σ = 0 throughout the present section.
The celebrated Szegö theorem asserts that for all p ∈ (0, ∞), d p (µ; S 1 ) is equal to the geometric mean of w, i.e.
(5.2) d p (µ; S 1 ) = exp log w dλ , where the right hand side of (5.1) has to be understood as 0 if log w ∈ L 1 (λ), in particular, if w = 0 on a set of positive λ-measure, cf.[ ) .Obviously, w n ≥ w 0 , m-lim µ n = µ 0 , and (5.2) yields d p (µ 0 ; S 1 ) = 0 and It is a simple consequence of (5.2) that the distance d p (µ; S m ) is equal to zero if and only if log w ∈ L 1 (λ), p ∈ (0, ∞), m ∈ N. The principal tool to study the case d p (µ; S m ) > 0 is Hardy space theory, cf. the excellent books [6], [10], [18].If log w ∈ L ! (λ), the function f defined by f (z) := 1 2 e1+z e1−z log w dλ has a Taylor series where a j is the j-th Fourier coefficient of log w, i.e. a j = e −j log w dλ.The function the function h has a Taylor expansion h(z) = ∞ j=0 b j z j , |z| < 1.For m ∈ N and a power series g(z) = ∞ j=0 c j z j denote by Π (m) (g) the polynomial From (5.4) it is clear that Π (m) (h) is defined by Π (m) (f ) unambiguously.The following lemma implies conversely that Π (m) (h) defines Π (m) (f ) uniquely.
Lemma 5.3.For any r ∈ N 0 , the r-th Fourier coefficient a r of log w is uniquely defined by the first r + 1 Taylor coefficients b 0 , ..., b r of h.
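The triangular dependence asserted in Lemma 5.3 can be seen from a standard power-series recursion. The sketch below assumes that $h = \exp f$, as relation (5.4) suggests, and that $f$ has the Taylor coefficients $f_0 = a_0/2$ and $f_j = a_j$ for $j \ge 1$. Differentiating $h = \exp f$ gives $h' = f' h$, that is $r\,b_r = \sum_{k=1}^{r} k f_k b_{r-k}$, which can be solved for $b_r$ or, conversely, for $f_r$. The function names are hypothetical, and real coefficients are assumed for simplicity.

```python
import math

def exp_series(f, count):
    """First `count` Taylor coefficients of h = exp(f), given those of f (f[0] real)."""
    b = [math.exp(f[0])] + [0.0] * (count - 1)
    for r in range(1, count):
        # r * b_r = sum_{k=1}^{r} k * f_k * b_{r-k}
        b[r] = sum(k * f[k] * b[r - k] for k in range(1, r + 1)) / r
    return b

def log_series(b, count):
    """Recover the first `count` coefficients of f from those of h = exp(f), b[0] > 0."""
    f = [math.log(b[0])] + [0.0] * (count - 1)
    for r in range(1, count):
        # Same identity, solved for f_r; only b_0, ..., b_r and f_1, ..., f_{r-1} enter.
        partial = sum(k * f[k] * b[r - k] for k in range(1, r))
        f[r] = (r * b[r] - partial) / (r * b[0])
    return f

f_coeffs = [0.5, 1.0, -0.25, 2.0]          # hypothetical coefficients of f
b_coeffs = exp_series(f_coeffs, 4)         # coefficients b_0, ..., b_3 of h
recovered = log_series(b_coeffs, 4)        # round trip back to f
```

In `log_series` the value `f[r]` is computed from `b[0]`, ..., `b[r]` alone, which mirrors the statement that $a_r$ is determined by the first $r + 1$ Taylor coefficients of $h$.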
where g 1 has a Taylor series of the form g 1 (z) = ∞ j=m c j z j , |z| < 1.The function exp Π (m) (f ) is continuous and root-free, which yields for some positive constants c and C. From (5.4) we obtain Taking into account (5.9), (5.10) and (5.11) we derive (5.12) Since h is outer, (5.11) implies that exp g is outer as well.From (5.10) we conclude that the metrics of L p (λ) and of L p (| exp Π (m) (f )| 2 dλ) are equivalent.Therefore, by (5.5), (5.9) and the outerness of exp( 2 p g).Using Lemma 5.3 we can give a sufficient condition for (R1) being true in case S = S m .To state the result we introduce a function w n := min(w 0 , w n ), n ∈ N, and make the following convention: for a, b ∈ [0, ∞), a ≥ b, we set log Proof of Theorem 5.5.If log w 0 ∈ L 1 (λ), then d p (µ o ; S m ) = 0 and the result follows from Corollary 3.3(i).Let log w 0 ∈ L 1 (λ).Relation (5.15) implies that log w n ∈ L 1 (λ) for all suffiently large n.Define f n by f n (z) := 1 2 e1+z e1−z log w n dλ, |z| < 1, and Π (m) (f n ) for the corresponding power series, cf.(5.3) and (5.7).From (5.15) it follows that the sequence uniformly on the unit circle.Thus, (5.10) yields the existence of a sequence (c n ) of positive numbers such that lim inf c n ≥ 1 and An application of Lemma 5.4 and Corollary 3.3(ii) completes the proof.
The preceding assertion does not remain true if the norm convergence of $(\mu_n)$ is replaced by convergence in measure or by weak convergence, as can be seen from Example 5.2 and from the following example, respectively.
since the summands with odd indices give 0. Moreover, our Theorem 5.11 below states a result on $d_2(\mu_n, \mu_0; S_m)$, whose analogue in the case $G = \Gamma = \mathbb{R}$ was also claimed in [21].
If the metric projection of the function 1 onto the space $[T(S_m)]_{\mu,p}$ exists, it is denoted by $\varphi^{(m)}_{\mu,p}$. From [6, Thm. 8.1] it can be derived that $\varphi^{(m)}_{\mu,p}$ exists for all $m \in \mathbb{N}$ and $p \in [1, \infty)$. To study the theoretical error of the estimated prediction it would be helpful to have an expression for $\varphi^{(m)}_{\mu,p}$. To the best of our knowledge, Theorem 5.10 summarizes the information on $\varphi^{(m)}_{\mu,p}$ known at present. In case $p = 2$ the assertion is an old result of prediction theory and can be found e.g. in [4]. It was extended to $p \in (1, 2]$ by Cambanis and Soltani [3, Thm. 5.1] and, with a different proof, to $p \in (1, \infty)$ by Rajput and Sundberg in [27, Thms. 2 and 4]. Recently, the result was re-discovered, cf. [19].
Proof. To prove the first assertion we have only to show that if both $\log w$ and $\log w_\nu$ are integrable with respect to $\lambda$, then the left inequality of (5.23) is satisfied. According to (5.2) and (5.22) one has to derive the inequality
$\exp\left(\int \log w \, d\lambda\right) \le \int \frac{w}{w_\nu} \, d\lambda \cdot \exp\left(\int \log w_\nu \, d\lambda\right)$
or, equivalently,
$\exp\left(\int \log \frac{w}{w_\nu} \, d\lambda\right) \le \int \frac{w}{w_\nu} \, d\lambda$,
which is Jensen's inequality. The second assertion merely expresses the condition for equality in Jensen's inequality.
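The inequality used in this proof can be checked numerically. The following is a minimal sketch, assuming that $\lambda$ is the normalized Lebesgue measure on the unit circle; the two densities `w` and `w_nu` are sample choices made only for illustration, and the helper names are hypothetical. For $w(t) = 5 + 4\cos t = |2 + e^{it}|^2$ the geometric mean $\exp \int \log w \, d\lambda$ equals 4, since $\log|2 + z|$ is harmonic on the closed unit disc.

```python
import math

def mean(fn, n=4096):
    """Average of fn over [0, 2*pi) on an equispaced grid; this approximates
    the integral with respect to normalized Lebesgue measure (an assumption)."""
    return sum(fn(2 * math.pi * k / n) for k in range(n)) / n

def geometric_mean(w, n=4096):
    """exp of the mean of log w, i.e. the right-hand side of (5.2)."""
    return math.exp(mean(lambda t: math.log(w(t)), n))

# Sample densities (illustrative choices, not taken from the paper).
w = lambda t: 5 + 4 * math.cos(t)   # equals |2 + e^{it}|^2
w_nu = lambda t: 2 + math.cos(t)

lhs = geometric_mean(w)
rhs = mean(lambda t: w(t) / w_nu(t)) * geometric_mean(w_nu)
```

Here `lhs` does not exceed `rhs`, and the two sides coincide when `w_nu` is replaced by a constant multiple of `w`, in line with the equality condition in Jensen's inequality.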
We slightly extend Theorem 5.10 showing that condition (5.18) is not a necessary assumption for the equality (5.20).

Proof. Since relation (5.13) is a consequence of condition (v), cf. Remark 5.9, conditions (iii) and (v) yield (5.15), which implies that the $j$-th Taylor coefficient of $h_n$ tends to the $j$-th Taylor coefficient of $h_0$ for $n \to \infty$, $j \in \mathbb{N}_0$. Therefore, from (5.19) $|h_n|^2$ $\lambda$-a.e. for some positive constant $c$ and all $n$ large enough. We derive hence, $\limsup d_p(\mu_n, \mu_0; S_m) \le d_p(\mu_0; S_m)$ by condition (v) and Theorem 5.5. To complete the proof take into account condition (ii).
As the following example reveals, condition (v) of the preceding theorem cannot be replaced by the weaker condition that $\lim \int \frac{|w_0 - w_n|^a}{w_n^a} \, d\lambda = 0$ for some $a \in (0, 1)$. Moreover, it cannot be replaced by condition (5.13).
From Theorem 5.10 a function-theoretic inequality can be derived, which does not seem to be so easy to prove by purely function-theoretic means.
Proposition 5.15. Let $h, g \in H^2$ and $g$ be an outer function. Then for all $m \in \mathbb{N}$, where $(b_j)$ is defined by (5.6).

Finite set of observations
In this section $G$ is an arbitrary LCA group and $S_k := \{x_1, \dots, x_k\}$, $k \in \mathbb{N}$, is a finite subset of $G \setminus \{0\}$. Write $T := T(S_k)$. For $\mu \in M$ and $p \in (0, \infty)$, the space $[T]_{\mu,p} = T$ is a finite-dimensional linear space not depending on $p$. However, note that its dimension can be less than $k$ since $\mu$-equivalent functions are identified. To emphasize that we are concerned with $\mu$-equivalence classes of functions we use the notation $[T]_\mu$ instead of $T$.
Let $\nu \in M$ satisfy $\mu \ll \nu$ and let $p \in (0, \infty)$ be such that the metric projection $\varphi_{\nu,p}$ exists. Since $\varphi_{\nu,p} \in [T]_\mu$ we have (6.1)

We recall some facts on subadditive functionals which are absolutely homogeneous of order $p$. Let $L$ be a finite-dimensional linear space over $\mathbb{C}$ equipped with a norm Denote the set of all subadditive and absolutely homogeneous of order $p$ functionals on $L$ by $F_L$. Note that subadditivity yields Moreover, if $f \in F_L$ then $f$ is continuous and there exists a constant $C$ with If additionally $f(u) = 0$ only for $u = 0$, then $\rho(u, v) := f(u - v)$ defines a metric on $L$, and there exists a constant $c \in (0, \infty)$ with From (6.3) and (6.4) we derive that the topology generated by the metric $\rho$ is equivalent to the norm topology on $L$ and that the set

Proof. Since the sequence $(f^{(j)})_{j \in \mathbb{N}}$, $f^{(j)} := \sup\{f_n : n \ge j\}$ is increasing and converges to $f_0$ pointwise, it converges to $f_0$ uniformly on $K_0$ by Dini's theorem, in particular, $f$ for all $u \in D$ and any $n \ge n_0$. From (6.2), (6.5), (6.6) and (6.7) the inequality for all $\tau \in L$ and all $n \ge n_0$. Taking the infimum over all $\tau$ of the form $\tau$ was arbitrary. Taking into account Theorem 3.1, the proof is completed for $p \in [1, \infty)$. In the case $p \in (0, 1)$ a similar proof works applying Lemma 6.1 to the functionals $f_j(\tau) := \int |\tau|^p \, d\mu_j$ with $j \in \mathbb{N}$.
Proof. By Corollary 3.4 one has only to prove (R1) for a sequence $(\mu_n)$ satisfying $\lim \|\mu_0 - \mu_n\| = 0$ and $\mu_n \le \mu_0$, $n \in \mathbb{N}$. Assume without loss of generality that the family of functions $e_{x_1}, \dots, e_{x_j}$, $j \le k$, is a basis of $[T]_{\mu_0}$. Since $\mu_n \le \mu_0$ these functions span $[T]_{\mu_n}$ and the claim is proved by Theorem 6.2(i).
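For a single observation point and $p = 2$ the approximation error has a closed form, which gives a simple way to observe the stability discussed in this section. The sketch below makes several assumptions that are not taken from the text: $G = \Gamma = \mathbb{R}$ with $e_x(\gamma) = e^{ix\gamma}$, the measure is a finite discrete measure given by `points` and `weights`, and the error is understood as $\inf_c \int |1 - c\, e_x|^2 \, d\mu = \mu(\Gamma) - |\int \overline{e_x} \, d\mu|^2 / \mu(\Gamma)$. The function name `d2_singleton` is hypothetical.

```python
import cmath

def d2_singleton(points, weights, x):
    """Least-squares error of approximating the constant 1 by multiples of
    e_x in L^2 of the discrete measure sum_j weights[j] * delta_{points[j]}."""
    total = sum(weights)                      # mu(Gamma) = ||e_x||^2 = ||1||^2
    moment = sum(wt * cmath.exp(-1j * x * g)  # scalar product (1, e_x)
                 for g, wt in zip(points, weights))
    return total - abs(moment) ** 2 / total

def squared_error(points, weights, x, c):
    """Integral of |1 - c * e_x|^2 for a given coefficient c."""
    return sum(wt * abs(1 - c * cmath.exp(1j * x * g)) ** 2
               for g, wt in zip(points, weights))
```

Scaling the weights by $1 - 1/n$ produces measures $\mu_n \le \mu_0$ that converge to $\mu_0$ in norm, and in this sketch the corresponding errors are simply $(1 - 1/n)$ times the error for $\mu_0$.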
If $p = 2$ and $\lim \|\mu_0 - \mu_n\| = 0$, the condition of Theorem 6.2(ii) can be weakened slightly. Note that for a singleton $S = S_1$ the conditions (i) and (ii) of Proposition 3.14 are satisfied. Thus, we can state the following corollary.

Periodic observations
This section is devoted to a rather incomplete discussion of the observation set $S = x + H$, where $H$ is a closed subgroup of an LCA group $G$ and $x$ is a given element of $G \setminus H$. Since little is known for $p \ne 2$, most of our results pertain to the stationary case $p = 2$. Let $A := \{\gamma \in \Gamma : e_y(\gamma) = 1 \text{ for all } y \in H\}$ be the annihilator of $H$. Recall that $A$ is a closed subgroup of $\Gamma$, and thus, an LCA group with respect to the induced topology. Until the end of this section we assume that $A$ is at most countable and denote the number of its elements by card $A$. It follows that $A$ is discrete, hence, metrizable.
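As a concrete instance of the annihilator, one may look at a finite cyclic group. The following sketch assumes $G = \mathbb{Z}_n$, identified with its own dual via the characters $e_y(\gamma) = \exp(2\pi i\, y\gamma / n)$; this finite setting and the function names are illustrative assumptions only. In that case $e_y(\gamma) = 1$ holds exactly when $y\gamma \equiv 0 \pmod n$, so the annihilator can be found by integer arithmetic.

```python
def subgroup(n, d):
    """Subgroup H of Z_n generated by d, as a sorted list."""
    return sorted({(d * k) % n for k in range(n)})

def annihilator(n, H):
    """All gamma in Z_n with e_y(gamma) = 1 for every y in H, where
    e_y(gamma) = exp(2*pi*i*y*gamma/n) equals 1 iff y*gamma = 0 mod n."""
    return [g for g in range(n) if all((y * g) % n == 0 for y in H)]

H = subgroup(12, 3)
A = annihilator(12, H)
```

In this example card $A$ multiplied by the number of elements of $H$ equals the order of the group.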
φ µ,2 (S) = 1 otherwise, the following results can be obtained by simple computations: a) If w 0 = 1 and w n

where $|\mu_0 - \mu_n|$ denotes the variation of the real-valued measure $\mu_0 - \mu_n$. Since $\mu_0 \ll \mu_n$, condition (3.13) implies that $\sup\{\operatorname{ess\,sup}_{\mu_0} |f_n| : n \in \mathbb{N}\} \le c$. Thus, if $n \to \infty$ the first summand on the right-hand side of (3.14) tends to zero by Lebesgue's dominated convergence theorem. To see that the second summand tends to zero as well, note first that $|\mu_0 - \mu_n| \ll \mu_n$. In fact, if $B \in \mathcal{B}$ with $\mu_n(B) = 0$ then $\mu_n(C) = \mu_0(C) = 0$ for any $C \in \mathcal{B}$, $C \subseteq B$, which yields $|\mu_0 - \mu_n|(B) = 0$. Therefore, $\sup\{\operatorname{ess\,sup}_{|\mu_0 - \mu_n|} |f_n| : n \in \mathbb{N}\} \le c$ by (3.13), and hence, $\lim \int |f_n| \, d|\mu_0 - \mu_n| \le c \lim \|\mu_0 - \mu_n\| = 0$.

Proof of Proposition 3.14. Denote by $(\cdot, \cdot)_k$ and $\|\cdot\|_k$ the scalar product and the norm, respectively, of $L^2(\mu_k)$. For $y \in G$, under the assumption $\|e_x - P_k e_x\|_k \ne 0$ we can write

(3.15) $Q_k e_y := P_k e_y + \left(e_y, \frac{e_x - P_k e_x}{\|e_x - P_k e_x\|_k}\right)_k \frac{e_x - P_k e_x}{\|e_x - P_k e_x\|_k}$.

The claim of the proposition can be derived from (3.15) applying Lemma 3.15 twice. First, setting $\varphi_k := |e_x - P_k e_x|^2$ in Lemma 3.15, we get $\lim \|e_x - P_n e_x\|_n = \|e_x - P_0 e_x\|_0$, in particular, $\|e_x - P_n e_x\|_n \ne 0$ for any sufficiently large $n$ since the assumption $e_x \notin [T(S)]_{\mu_0,2}$ yields $\|e_x - P_0 e_x\|_0 > 0$. Now, setting $f_k := \frac{e_y \overline{(e_x - P_k e_x)}}{\|e_x - P_k e_x\|_k}$ in Lemma 3.15, we obtain $\lim \left(e_y, \frac{e_x - P_n e_x}{\|e_x - P_n e_x\|_n}\right)_n = \left(e_y, \frac{e_x - P_0 e_x}{\|e_x - P_0 e_x\|_0}\right)_0$, and consequently, $\sup\{\operatorname{ess\,sup}_{\mu_n} |Q_n e_y| : n \in \mathbb{N}\} < \infty$ by condition (i) and (3.15). Finally, $\lim \int |Q_0 e_y - Q_n e_y|^2 \, d\mu_0 = 0$ follows from condition (ii) and (3.15).
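Formula (3.15) is the usual one-step extension of an orthogonal projection by a single new vector. The following sketch illustrates it in a finite-dimensional setting, under assumptions not taken from the text: the measure is a finite discrete measure on $\mathbb{R}$ described by `points` and `weights`, $e_x(\gamma) = e^{ix\gamma}$, and the subspace onto which $P$ projects is spanned by a single character `e_s`. All function names are hypothetical.

```python
import cmath

def inner(f, g, weights):
    """Scalar product (f, g) = sum_j f_j * conj(g_j) * weight_j."""
    return sum(a * b.conjugate() * wt for a, b, wt in zip(f, g, weights))

def char(x, points):
    """Values of the character e_x(gamma) = exp(i*x*gamma) on the support."""
    return [cmath.exp(1j * x * g) for g in points]

def project_one(f, e, weights):
    """Orthogonal projection P f of f onto the span of the single function e."""
    c = inner(f, e, weights) / inner(e, e, weights)
    return [c * v for v in e]

def extend_projection(f, e_s, e_x, weights):
    """Q f = P f + (f, u) u with u = (e_x - P e_x) / ||e_x - P e_x||,
    the pattern of (3.15) for a one-dimensional range of P."""
    Pf = project_one(f, e_s, weights)
    Pex = project_one(e_x, e_s, weights)
    r = [a - b for a, b in zip(e_x, Pex)]
    norm = inner(r, r, weights).real ** 0.5
    if norm == 0:
        raise ValueError("e_x lies in the range of P; (3.15) does not apply")
    u = [v / norm for v in r]
    coef = inner(f, u, weights)
    return [p + coef * v for p, v in zip(Pf, u)]
```

The residual $e_y - Q e_y$ computed this way is orthogonal to both `e_s` and `e_x`, which is what characterizes the projection onto their span.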
R2) and (R3) are satisfied for $S = S_k$.

Proof. (i) According to Corollary 3.3(i) we have only to consider the case $d_p(\mu_0) > 0$. Let $p \in [1, \infty)$ and set L