Generalized approach to the problem of regression

As a result of studying and solving certain extremal problems defined on finite- and infinite-dimensional pseudo-Hilbert spaces, the authors present a generalization of the classical idea of regression. Given discrete or continuous empirical data, the class of solutions is uniquely determined and expressed in a new form of regression function sequences, both of the asynchronous and of the synchronous type, in a given pseudo-Hilbert space. With the help of this new technique, a large variety of phenomena observed in different areas of the practical and theoretical sciences can be precisely described and investigated.


Introduction
The contemporary world is characterized by an increasing influx of information. Some of it can be acquired at selected moments of time, whereas other information can be observed in a continuous-time setting. In this avalanche of information, one has to know how to find both relational and functional dependences. The latter are often too complicated to capture and describe by means of a simple mathematical expression.
One should therefore seek approximate functional dependences which describe a number of interesting phenomena with an assigned accuracy. The study of appropriately constructed approximating functions can lead to the detection of dependences not yet discovered, as well as to the assessment of the separate and combined effects caused by several observed variables. This is of great significance, especially in situations where the dependence between the observed parameters, expressed in terms of physical, chemical or biological laws, is unknown.
A particular, although simplified, example of a solution to such a set of scientific problems is the method of linear regression with its various modifications, formulated on the basis of probability calculus; cf. [1,11,14,15,17]. This method has led to a series of implementations and has undergone numerous theoretical modifications, crucial in view of the seriousness of the implementation problem; cf. [4–6,13,15,16].
The regression problem considered in this paper takes the form of the solution of a properly formulated extremal problem, well defined and stated in both the finite- and the infinite-dimensional pseudo-Hilbert space environment.
A short presentation of the classical approach to the regression problem can be formulated as follows.
Let F be the family of all functions R ∋ t → at + b, where a, b ∈ R, and let x, y : Z_{0,n} → R be arbitrarily given sequences. It is well known that if x is not a constant sequence, then there exists a unique f_0 ∈ F satisfying the following condition:

∑_{k=0}^{n} (y_k − f_0(x_k))² = min{ ∑_{k=0}^{n} (y_k − f(x_k))² : f ∈ F },  (1.1)

where f_0(t) = a_0 t + b_0 with

a_0 = (∑_{k=0}^{n} (x_k − x̄)(y_k − ȳ)) / (∑_{k=0}^{n} (x_k − x̄)²),  b_0 = ȳ − a_0 x̄,  (1.2)

x̄ and ȳ being the means of the sequences x and y; cf. e.g. [17] and [4]. The function f_0 is usually said to be the regression line for the empirical sequences x, y : Z_{0,n} → R. In view of (1.1), the function f_0 has a natural interpretation as an optimal function with the smallest quadratic deviation from the empirical observations {(x_k, y_k) : k ∈ Z_{0,n}}. The function f_0 plays a very essential role in different areas of applied mathematics; cf. e.g. [6] and [4]. It turns out that the extremal problem mentioned above can be considerably generalized and solved, which is the subject of this paper. To this end we introduce regression structures R := (A, B, δ; x, y), where A and B are nonempty sets, x : Ω₁ → A and y : Ω₂ → B are empirical data functions defined on nonempty sets Ω₁ and Ω₂, and δ is a criterion of deviation between functions with values in B. Given a regression structure R and a functional model F of R, we seek the optimal theoretic functions f_0 ∈ F which are best fitted to the empirical data, represented by the empirical data functions x and y, with respect to the criterion δ. To be more precise, we consider the extremal problem of determining all functions f_0 ∈ F minimizing the functional

F ∋ f → δ(f ∘ x, y),  (1.3)

i.e. all functions f_0 ∈ F satisfying the following inequality:

δ(f_0 ∘ x, y) ≤ δ(f ∘ x, y),  f ∈ F.  (1.4)

The set of all f_0 ∈ F satisfying the inequality (1.4) will be denoted by Reg(F, R). Each function f_0 ∈ Reg(F, R) is said to be a regression function in F with respect to R. The problem of describing all regression functions in F with respect to R is called the regression problem for F with respect to R.

Example 1.1 Consider an electric circuit with direct current. According to Ohm's law the voltage V depends on the intensity I by the equality V = RI, where the multiplier R is the resistance of the circuit. We want to determine the parameter R by means of measurement samples of intensity and voltage represented by a sequence Z_{0,n} ∋ k → (i_k, v_k). To this end we consider the regression structure R, where A := R, B := R, the empirical data functions are defined by Z_{0,n} ∋ k → x(k) := i_k and Z_{0,n} ∋ k → y(k) := v_k, and, as a criterion of deviation δ, we take that of the least squares method, i.e.

δ(f, g) := ∑_{k=0}^{n} (f(k) − g(k))²,  f, g : Z_{0,n} → R.  (1.5)

The theoretic functional model F is represented by the linear functions R ∋ t → rt for r ∈ R. Calculating the critical point of the function R ∋ r → ∑_{k=0}^{n} (v_k − r i_k)² we obtain

r_0 = (∑_{k=0}^{n} i_k v_k) / (∑_{k=0}^{n} i_k²).  (1.6)

In what follows we shall study the regression problem for a wide range of theoretic functional models F and regression structures R involving a generalized variant of the quadratic deviation applied in (1.1). The main idea of our approach was presented in the paper [12], where we confined ourselves to the case where the theoretic functional model F is a finite-dimensional linear set with respect to the standard operations of adding and multiplying complex-valued functions. In the present paper we consider the general case where F is an arbitrary linear set. We also provide various examples motivating our general approach to the regression problem and discuss the so-called diagonal case, closely related to the classical approach to the regression problem.
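Before proceeding, a minimal numerical sketch of the classical case just recalled may be helpful; it computes the regression-line coefficients (1.2) and the one-parameter estimator (1.6) in Python with NumPy (the helper names are ours, not the paper's):

    import numpy as np

    def regression_line(x, y):
        # slope a_0 and intercept b_0 of (1.2); x must not be constant
        x, y = np.asarray(x, float), np.asarray(y, float)
        a0 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
        b0 = y.mean() - a0 * x.mean()
        return a0, b0

    def ohm_resistance(i, v):
        # critical point of r -> sum_k (v_k - r*i_k)^2, i.e. formula (1.6)
        i, v = np.asarray(i, float), np.asarray(v, float)
        return np.dot(i, v) / np.dot(i, i)

    # illustrative data: noisy samples of V = R*I with R = 2.0
    rng = np.random.default_rng(0)
    i = np.linspace(0.1, 1.0, 10)
    v = 2.0 * i + rng.normal(0.0, 0.01, i.size)
    print(regression_line(i, v))
    print(ohm_resistance(i, v))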
Definition 2.1 Any regression structure R satisfying the conditions II.1 and II.2 is said to be an asynchronous regression structure; a real asynchronous regression structure if B = R and a complex asynchronous regression structure if B = C.
Example 2.2 Consider a regression structure R defined as follows. Given n, m ∈ N let Ω₁ := Z_{0,n} and Ω₂ := Z_{0,m}. Let B be the family of all subsets of the Cartesian product Ω₁ × Ω₂. Obviously, the family B is a σ-field of subsets of Ω₁ × Ω₂, and hence we can define a unique measure μ : B → [0; +∞) satisfying the condition

μ({(k, l)}) = ρ_{k,l},  (k, l) ∈ Ω₁ × Ω₂,  (2.3)

where Ω₁ × Ω₂ ∋ (k, l) → ρ_{k,l} ∈ R is a given non-negative function. Then, for any functions u : Ω₁ → B and v : Ω₂ → B,

δ(u, v) = ∑_{k=0}^{n} ∑_{l=0}^{m} ρ_{k,l} |u(k) − v(l)|².  (2.4)

In particular, assuming m = n and setting

ρ_{k,l} := 1 if k = l and ρ_{k,l} := 0 if k ≠ l,  (2.5)

we conclude from (2.4) that

δ(u, v) = ∑_{k=0}^{n} |u(k) − v(k)|².  (2.6)

Combining then (1.3) with (2.4) and (2.5) we can see that

δ(f ∘ x, y) = ∑_{k=0}^{n} (f(x_k) − y_k)²

for given empirical data functions Z_{0,n} ∋ k → x_k ∈ A and Z_{0,n} ∋ k → y_k ∈ B, where A := R, B := R and F is the family of all functions R ∋ t → at + b with a, b ∈ R. Therefore δ is exactly the classical square deviation used in (1.1).
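As a quick illustration of Example 2.2, the following sketch (our notation, assuming the reconstruction of (2.4) above) evaluates the generalized quadratic deviation for a weight matrix ρ and checks that the diagonal weights (2.5) reduce it to the classical square deviation (2.6):

    import numpy as np

    def generalized_deviation(u, v, rho):
        # delta(u, v) = sum_{k,l} rho[k, l] * |u(k) - v(l)|**2, cf. (2.4)
        u = np.asarray(u, dtype=complex)
        v = np.asarray(v, dtype=complex)
        diff = u[:, None] - v[None, :]        # all pairwise differences u(k) - v(l)
        return float(np.sum(np.asarray(rho) * np.abs(diff) ** 2))

    u = np.array([1.0, 2.0, 3.0])
    v = np.array([1.1, 1.9, 3.2])
    # diagonal weights (2.5) recover the classical square deviation (2.6)
    assert np.isclose(generalized_deviation(u, v, np.eye(3)),
                      np.sum(np.abs(u - v) ** 2))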
Fix a regression structure R satisfying the properties II.1–II.2. We consider the family L₁(R) of all functions f : A → B such that f ∘ x is a B-measurable function and the integrability condition (2.8) holds. We shall also consider the family L₂(R) of all functions g : B → B such that g ∘ y is a B-measurable function and the integrability condition (2.9) holds. From (2.8) and the Schwarz inequality it follows that the pseudo-inner product ⟨u|v⟩ is well defined for u, v ∈ L₁(R). Hence ⟨u|u⟩ ≥ 0 for u ∈ L₁(R), and so the functional ‖u‖ := ⟨u|u⟩^{1/2} is also well defined. We write H(R) for the space L₁(R) endowed with the pseudo-inner product ⟨·|·⟩.
It remains to prove the completeness of H(R). The mapping x : Ω₁ → A induces the σ-field B_x ⊂ 2^A, where 2^A denotes the family of all subsets of the set A, and the measure μ_x. Fix u ∈ L₁(R). Since the function u ∘ x is B-measurable, we see that the preimage of every Borel set U ⊂ B under u ∘ x belongs to B, and hence u⁻¹(U) ∈ B_x. Thus u is B_x-measurable. Moreover, from (2.12) and Lemma 7.3 it follows that the pseudo-norm of u coincides with its norm in L₂(A, B_x, μ_x). Consider a sequence N ∋ n → f_n ∈ L₁(R) satisfying the Cauchy condition (2.14). From (2.17) it follows that this sequence is also a Cauchy sequence in the space L₂(A, B_x, μ_x), which is complete; thus the sequence converges there. From (2.20) it follows that the functional ‖·‖ is a pseudo-norm on the linear space (L₁(R), +, ·), i.e. the pseudo-norm axioms hold for any u, v ∈ L₁(R) and λ ∈ C; cf. e.g. [8]. Consequently, (L₁(R), +, ·, ‖·‖) is a pseudo-Banach space.
By (2.8)–(2.10) we see that for each g ∈ L₂(R) the functional g* is well defined.

Lemma 2.5
The structure (L₂(R), +, ·) is a complex (resp. real in the case where B = R) linear space and for each g ∈ L₂(R) the functional g* is bounded on H(R) and its supremum norm satisfies the inequality (2.23).

Proof From the inequality (2.16) and by (2.9) we see that for all λ₁, λ₂ ∈ B and g₁, g₂ ∈ L₂(R) the function λ₁g₁ + λ₂g₂ satisfies (2.9) as well. Thus λ₁g₁ + λ₂g₂ ∈ L₂(R) for all λ₁, λ₂ ∈ B and g₁, g₂ ∈ L₂(R). Hence L₂(R) is a linear set with respect to the standard operations of adding and multiplying functions. Then the structure (L₂(R), +, ·) is a linear space as a linear subspace of the linear space (B → B, +, ·). The linearity of the functional g* follows from the algebraic properties of the Lebesgue integral. From Schwarz's integral inequality and from (2.22), (2.8), (2.9) and (2.12) it follows that for all f ∈ L₁(R) and g ∈ L₂(R) the quantity |g*(f)| admits the required bound, which yields (2.23).

Solution of the regression problem
Given a regression structure R := (A, B, δ; x, y) satisfying the properties I.1–I.3, we see that R_g := (A, B, δ; x, g ∘ y) is a regression structure for each function g : B → B. From now on we shall study the regression problem for F with respect to R_g, where R is an arbitrarily given regression structure satisfying the assumptions II.1–II.2, g : B → B is a fixed function and F is a functional model of R which is linear with respect to the standard operations of adding and multiplying functions, i.e. f + g ∈ F and λf ∈ F for any f, g ∈ F and λ ∈ B. If additionally F ⊂ L₁(R) and g ∈ L₂(R), then the regression problem means the extremal problem of determining all functions f_0 ∈ F minimizing the functional F_g defined by

F_g(f) := δ(f ∘ x, g ∘ y),  f ∈ F.  (3.1)

We shall start our research with the following basic characterization of the regression functions.

Lemma 3.1 If F ≠ ∅ is a linear set in H(R) and g ∈ L₂(R), then for every f ∈ F the following property holds:

f ∈ Reg(F, R_g) ⟺ g*(h) = ⟨f|h⟩ for all h ∈ F.  (3.2)

Proof Fix f, h ∈ F and λ ∈ B. By (3.1), (2.22) as well as by (2.11) and (2.12) we get an expansion of F_g(f + λh), and by this and by (2.13) we obtain (3.3). Assume now that f ∈ F satisfies the right-hand side condition in (3.2). Then, setting λ := 1, we deduce from (3.3) that (3.4) holds. Replacing h by (−h) in (3.4) we get (3.5). Combining (3.4) and (3.5) we obtain (3.6), where α(λ) ∈ [0; 2π) is the unique number satisfying the equality λ = |λ|e^{iα(λ)}. Thus (3.6) yields, in the limiting case as |λ| → 0, the corresponding equality. Choosing α appropriately we can see that g*(h) = ⟨f|h⟩ for every h ∈ F, which completes the proof.
By the properties of a pseudo-norm we see that the set

Θ := {u ∈ L₁(R) : ‖u‖ = 0}

is linear. We call it the null set of H(R). As a matter of fact, the set Θ is the closed ball with radius 0 and center at the zero function θ, defined by θ(t) := 0 for t ∈ A. We may extend the standard operations of adding and multiplying functions by a constant to arbitrary sets F₁, F₂ ⊂ (A → B) as follows:

F₁ + F₂ := {f₁ + f₂ : f₁ ∈ F₁, f₂ ∈ F₂}  and  λF₁ := {λf₁ : f₁ ∈ F₁},  λ ∈ B.

Corollary 3.2
If F ≠ ∅ is a linear set in H(R) and g ∈ L₂(R), then the equality (3.7) holds. If additionally F ⊂ Θ, then Reg(F, R_g) = F.
Proof Fix f, h ∈ L₁(R). If ‖h‖ = 0, then from the Schwarz inequality (2.20) and Lemma 2.5 it follows that

g*(h) = 0 and ⟨f|h⟩ = 0.  (3.8)

Assume that f ∈ Reg(F, R_g) and h ∈ Θ + F. Then h = h₀ + h₁ for some h₀ ∈ Θ and h₁ ∈ F. Applying now Lemma 3.1 and (3.8) we see that g*(h) = ⟨f|h⟩. By definition, f ∈ F ⊂ Θ + F. Applying Lemma 3.1 once more, we obtain f ∈ Reg(Θ + F, R_g), and consequently the inclusion (3.9) holds. Conversely, assume now that f ∈ Reg(Θ + F, R_g) ∩ F and h ∈ F. Since h ∈ Θ + F, we conclude from Lemma 3.1 that g*(h) = ⟨f|h⟩. Then Lemma 3.1 says that f ∈ Reg(F, R_g), and so Reg(Θ + F, R_g) ∩ F ⊂ Reg(F, R_g). Combining this with the inclusion (3.9) we obtain the equality (3.7). Since Θ ⊂ L₁(R), the equalities in (3.8) hold for all f, h ∈ Θ. Then Lemma 3.1 yields F ⊂ Reg(F, R_g) whenever F ⊂ Θ. If now F ⊂ Θ, then the equality (3.7) takes the form Reg(F, R_g) = F, which proves the corollary. Moreover, for any subsets S₁ and S₂ of L₁(R), the properties (3.12)–(3.14) hold. Given a nonempty set S ⊂ L₁(R) we denote by S^⊥ the orthogonal complement of S in the space H(R), i.e.

S^⊥ := {u ∈ L₁(R) : ⟨u|h⟩ = 0 for all h ∈ S}.
Theorem 3.3 Let F ≠ ∅ be a linear set in H(R), let g ∈ L₂(R) and let S := {h ∈ L₁(R) : g*(h) = 0}. Then the properties (i)–(iv) hold.

Proof Assume that Reg(F, R_g) ≠ ∅. Then f ∈ Reg(F, R_g) for some f ∈ F, and by Lemma 3.1, g*(h) = ⟨f|h⟩ for h ∈ F. Thus f ∈ (Θ ∩ F) + f, and then one of the inclusions in (i) holds. Conversely, if f′ ∈ (Θ ∩ F) + f, then by (3.16) and by (2.20) we see that for every h ∈ F, ⟨f′|h⟩ = ⟨f|h⟩. Hence and by (3.16), g*(h) = ⟨f′|h⟩ for h ∈ F. Applying Lemma 3.1 once again we see that f′ ∈ Reg(F, R_g), and so the inverse inclusion holds as well. Both the inclusions yield the equality in (i). Assume now the hypothesis of (ii) and consider any fixed f ∈ (F ∩ S)^⊥ ∩ F \ Θ. Then ‖f‖ > 0, and supposing g*(f) = 0 we get f ∈ F ∩ S, whence ‖f‖² = ⟨f|f⟩ = 0. This means that f ∈ Θ, which contradicts our assumption. Therefore g*(f) ≠ 0, and consequently f ∉ Reg(F, R_g). Then Lemma 3.1 shows that no such f is a regression function, which together with the property (i) yields (3.15). This proves the property (ii). If Reg(F, R_g) = Θ ∩ F, then θ ∈ Reg(F, R_g), and applying Lemma 3.1 we see that g*(h) = 0 for every h ∈ F, which shows, by Lemma 3.1, that θ ∈ Reg(F, R_g).
Hence and by the property (i), the required equality follows, which yields the property (iii).
Assume now that f ∈ Reg(F, R_g) for some f ∈ F. From (3.16) it follows that g*(f) = ⟨f|f⟩. Consider first the case where f ∉ Θ and suppose that g*(f) = 0. Then f ∈ F ∩ S, and by (3.19), ‖f‖² = ⟨f|f⟩ = 0. Hence f ∈ Θ, which contradicts our assumption. Thus g*(f) ≠ 0, and so for each h ∈ F the property (3.20) holds, which leads to the inclusion (3.21), provided f ∉ Θ. If f ∈ Θ, then from the properties (i) and (iii) it follows that Reg(F, R_g) = Θ ∩ F, and obviously the inclusion (3.21) holds. This way the property (iv) is proved in the direction (⇒). Conversely, assume that the inclusion (3.21) holds. If F ⊂ Θ, then by the property (ii), Reg(F, R_g) ≠ ∅. Therefore we may confine ourselves to the case where F ⊄ Θ. By (3.8) we see that Θ ⊂ S, which implies the required inclusion. Hence F ⊂ S, and by the property (iii) we see that Reg(F, R_g) ≠ ∅. Thus we have shown that the inclusion (3.21) implies that Reg(F, R_g) ≠ ∅, which completes the proof of the property (iv).

Corollary 3.4 If F ≠ ∅ is a closed linear set in H(R) and g ∈ L₂(R), then the equalities (3.22) and (3.23) hold.

Proof Since g ∈ L₂(R), it follows from Lemma 2.5 that g* is a continuous functional on H(R). Hence S is also a closed set in H(R), and so F ∩ S is a closed set in H(R). Then each h ∈ F has an orthogonal projection h_S onto F ∩ S, i.e.
cf. [8, Sect. 13.3] and [10]. Hence we obtain the inclusion (3.21). Applying now (iv) of Theorem 3.3 we conclude that Reg(F, R_g) ≠ ∅. If F ⊂ Θ, then from (i) of Theorem 3.3 and the equality (3.23) it follows that (3.22) holds. If F ⊂ S, then from (iii) of Theorem 3.3 and the equality (3.23) we see that (3.22) holds as well. Otherwise there exists h ∈ F for which the property (3.24) holds. From (3.8) it follows that Θ ⊂ S.
If h − h_S ∈ Θ, then by (3.24) we get the required equality. Then the condition (ii) of Theorem 3.3 and the equality (3.23) imply the equality (3.22), which completes the proof.
We end this section with an important application of Corollaries 3.2 and 3.4.

Corollary 3.5
If F ≠ ∅ is a finite-dimensional linear set in H(R) and g ∈ L₂(R), then Reg(F, R_g) ≠ ∅.

Proof Given a linear set F in H(R) and f ∈ cl(F) there exists a sequence N ∋ n → f_n ∈ F such that ‖f − f_n‖ → 0 as n → ∞; call this property (3.25). We first show the inclusion

cl(F) ⊂ Θ + F.  (3.26)

Assume first that dim(F) = 1, where dim(X) stands for the dimension of a linear set X. Then F = lin({h}) for a certain h ∈ L₁(R). If ‖h‖ = 0, then obviously F ⊂ Θ, which gives cl(F) ⊂ Θ + F. Otherwise ‖h‖ > 0, and there exists a sequence N ∋ n → λ_n ∈ B such that f_n = λ_n h for n ∈ N. By (3.25) the sequence of coefficients converges. Hence |λ − λ_n| → 0 as n → ∞ for a certain λ ∈ B, which together with (3.25) gives f − λh ∈ Θ. We conclude from this that f ∈ Θ + F, and finally that the inclusion (3.26) holds in the one-dimensional case. Fix n ∈ N and suppose that (3.26) holds provided dim(F) ≤ n. Assume now that dim(F) ≤ n + 1. If F ⊂ Θ, then the inclusion (3.26) evidently holds. Otherwise there exists h ∈ F \ Θ, and we may consider suitable approximating sequences.

From (3.25) and the Schwarz inequality (2.20) it follows that the corresponding coefficient sequence converges, and consequently the remaining part converges as well. Then, applying the first part of the proof, we deduce the inclusion for the one-dimensional component, and by the induction assumption the inclusion for the remaining component. Combining this with (3.27) we obtain (3.26). Applying now mathematical induction we conclude that the inclusion (3.26) holds for every finite-dimensional linear set F in H(R). The inverse inclusion is obvious, because Θ = cl({θ}) ⊂ cl(F) and F ⊂ cl(F). Therefore cl(F) = Θ + F. By Corollary 3.4, the equality (3.22) holds for the closed set cl(F). Applying now Corollary 3.4 once more we obtain the desired conclusion, which completes the proof.

Orthogonal decompositions of regression functions
In this section we establish various results dealing with the orthogonality properties of regression functions. Given f, g ∈ L₁(R) we will write f ⊥ g whenever ⟨f|g⟩ = 0. We extend this relation to any nonempty sets F, G ⊂ L₁(R) in the following manner: F ⊥ G whenever f ⊥ g for all f ∈ F and g ∈ G. Given p, q ∈ Z, p ≤ q, and a sequence Z_{p,q} ∋ k → F_k of nonempty sets in the space H(R), we write ∑_{k=p}^{q} F_k for the set of all sums f_p + f_{p+1} + ⋯ + f_q, where f_k ∈ F_k for k ∈ Z_{p,q}. In the sequel we will use the auxiliary properties (4.2a)–(4.2f), which hold for any nonempty subsets S₁ and S₂ of L₁(R) such that S₁ ⊥ S₂. The proofs of these properties are slight modifications of the proofs of the well-known corresponding properties that hold in a Hilbert space; cf. e.g. [8] and [10]. In particular, we set F := ∑_{k=p}^{q} F_k. We first prove the following two auxiliary lemmas.
Lemma 4.1 Given p ∈ Z and q ∈ Z_{p,∞} ∪ {∞} let Z_{p,q} ∋ k → F_k be a function such that F_k is a nonempty linear set in the space H(R) for every k ∈ Z_{p,q} and the condition (4.3) holds. Then ∑_{k=p}^{q} cl(F_k) is a linear and closed set in H(R) and the equalities (4.4) hold.

Proof In the case q ∈ Z_{p,∞} the first equality in (4.4) holds. Therefore we may assume that q = ∞. Let now f be fixed. Hence f ∈ ∑_{k=p}^{n} F_k for a suitable n, and consequently the inverse inclusion holds, which together with the inclusion (4.5) leads to the first equality in (4.4). It remains to prove the second equality in (4.4). From the property (4.2b) it follows that the second equality in (4.4) holds for every q ∈ Z_{p,∞}.
Consider now the infinite case q = ∞. Assume that f belongs to the closure of the sum. Then there exists a sequence approximating f as in (4.7). From (3.14c) and (3.14a) we deduce that each set cl(F_k), k ∈ Z_{p,∞}, is linear and closed in H(R), and hence there exists an orthogonal projection f_k of f onto the set cl(F_k). Given m ∈ Z_{p,∞} and h ∈ ∑_{k=p}^{m} cl(F_k) there exists a sequence Z_{p,m} ∋ k → h_k ∈ cl(F_k) such that h = ∑_{k=p}^{m} h_k. From this and (4.9) we obtain the corresponding estimate. Given ε > 0, we conclude from (4.7) that ‖f − f_n‖ < ε/2 for some n ∈ Z_{p,∞}. Hence, for a certain m_ε ∈ Z_{p,∞}, the tail estimate holds. Combining this with (4.11) we obtain one of the inclusions. Conversely, the remaining case together with the inclusion (4.12) leads to the second equality in (4.4).
Since lin(⋃_{k=p}^{q} F_k) is a linear set in H(R), we deduce from (3.14c), (3.14a) and (4.4) that ∑_{k=p}^{q} cl(F_k) is a linear and closed set in H(R) for every q ∈ Z_{p,∞} ∪ {∞}, which is the desired conclusion.

Lemma 4.2 Given p ∈ Z and q ∈ Z_{p,∞} ∪ {∞} let Z_{p,q} ∋ k → F_k be a function such that F_k is a nonempty linear set in the space H(R) for every k ∈ Z_{p,q}. If g ∈ L₂(R), Reg(F_k, R_g) ≠ ∅ for every k ∈ Z_{p,q} and the condition (4.3) holds, then the inclusion (4.13) holds.

Proof Fix f as in (4.14). Applying Lemma 3.1, with F replaced by any F_k, we see that (4.15) holds. We will show that (4.16) holds. Consider first the case where q ∈ Z. By (4.3) and (4.15) we get (4.16). Applying the Schwarz inequality (2.20) we conclude from (4.3) that for all n, m ∈ Z_{p,∞} with n < m the estimate (4.17) holds. Applying again the Schwarz inequality (2.20) we see that (4.18) holds for every n ∈ Z_{p,∞}. An analysis similar to that in (4.17) shows that (4.19) holds. By Lemma 2.5, the functional g* is continuous on H(R), and so (4.20) holds. Combining this with (4.18), (4.19) and (4.20) as well as applying the Schwarz inequality (2.20) we obtain (4.16). Hence Lemma 3.1 shows that f ∈ Reg(F, R_g). Therefore the inclusion (4.13) holds, which is our claim.
The basic result in this section is the following orthogonal decomposition theorem in the countable case.

Theorem 4.3 Given p ∈ Z and q ∈ Z_{p,∞} ∪ {∞} let Z_{p,q} ∋ k → F_k be a function such that F_k is a nonempty closed linear set in the space H(R) for all k ∈ Z_{p,q}. If g ∈ L₂(R) and the condition (4.3) holds, then the equality (4.21) holds.

Proof Since the sets F_k, k ∈ Z_{p,q}, are closed in H(R), we conclude from Corollary 3.4, with F replaced by each F_k, that Reg(F_k, R_g) ≠ ∅. Hence and by Lemma 4.2 it follows that f ∈ Reg(F, R_g). By Lemma 4.1 we deduce that F is a linear and closed set in H(R). Then by Corollary 3.4 we see that the equality (4.22) holds.

From this and (4.22) we conclude that the equality (4.21) holds, which is our claim.
We now extend the decomposition (4.21) to an arbitrary family of closed linear sets in the space H(R). To this end we write A′ for the set of those indices α ∈ A which contribute nontrivially to the decomposition. Then the following theorem holds.

Theorem 4.4 Given a set A of indices, let F_α, α ∈ A, be nonempty closed linear sets in the space H(R) and let g ∈ L₂(R). Then A′ is a countable set and, for every q ∈ N ∪ {∞} and every injective mapping σ of Z_{1,q} onto A′, the equality (4.24) holds provided A′ ≠ ∅.

Proof Since F and F_α, α ∈ A, are closed sets in H(R), the corresponding orthogonal projections exist. Hence and by Corollary 3.4 we deduce that there exist f ∈ F and a function assigning to each α ∈ A a regression function f_α ∈ Reg(F_α, R_g). Given p, q ∈ N with p < q, let us consider an injective sequence γ of indices. From (4.23) and (3.14c) it follows that the assumptions of Theorem 4.3 are satisfied, and applying now Theorem 4.3 we obtain the corresponding decomposition for some f_γ ∈ Reg(F_γ, R_g). From this, by (4.23) and (4.2f), we obtain the inequality (4.29). Fix now m ∈ N and assume that the set A_m is not finite. Then there exists an injective sequence σ : N → A_m. Applying the inequality (4.29) with p := 1 and γ replaced by σ restricted to the set Z_{1,q}, we arrive at a contradiction; it follows that A′ is a countable set. From Lemma 3.1 and (4.26) we then obtain the required orthogonality relations. The remaining part of the proof will be divided naturally into three cases. In the first case, applying Theorem 4.3 we can see that the equality (4.24) holds. In the second case, for each n ∈ N there exist a suitable index and a sequence of elements of A′ with the required properties, and then there exists a corresponding sequence N ∋ n → h_n. Applying now (4.30) and the Schwarz inequality (2.20) we obtain g*(h_n) = 0 for every n ∈ N. Since g* is a continuous functional on the space H(R), we conclude from (4.31) that g*(h) = 0 for every h ∈ F. Then Lemma 3.1 shows that θ ∈ Reg(F, R_g). Hence and by Corollary 3.4 the equality (4.24) holds in this case as well. Moreover, by (3.12) and (4.2b) we obtain the equality (4.32). Applying our theorem in the already proved case I we have the equality (4.33), where q ∈ N ∪ {∞} and σ is an injective mapping of Z_{1,q} onto A′. Applying once again our theorem in the already proved case II we obtain

Reg(cl(F′), R_g) = Θ ∩ cl(F′).  (4.34)

Combining the equalities (4.32), (4.33) and (4.34) we conclude from Corollary 3.4 that the equality (4.24) holds in the third case, and the proof is complete.

Corollary 4.5
Given q ∈ N ∪ {∞} let Z_{1,q} ∋ k → F_k be a sequence such that F_k is a nonempty closed linear set in the space H(R) for every k ∈ Z_{1,q}. If F is a closed linear set in the space H(R), g ∈ L₂(R), F_k ⊂ F for every k ∈ Z_{1,q} and the condition (4.3) holds, then the equality (4.35) holds.

Proof Since F is a nonempty closed linear set in the space H(R), we conclude from (3.14a) that F₀ is also a nonempty closed linear set in the space H(R). Moreover, F₀ ⊥ F_k for every k ∈ Z_{1,q} and, by (3.14c), the condition (4.3) remains valid for the extended sequence. Then Theorem 4.3 shows that the corresponding decomposition holds, and the equality (4.35) is proved.

The procedure for calculating the regression functions
Given an asynchronous regression structure R, let F be a nonempty linear set in the space H(R) and g ∈ L₂(R). If Reg(F, R_g) ≠ ∅, then Theorem 3.3 enables us to find regression functions in F with respect to R_g, provided we can determine the linear set (F ∩ S)^⊥ ∩ F, which is a rather difficult task in general. However, in the case where F is spanned by a finite system of functions we can effectively calculate all the regression functions in F with respect to R_g in terms of these functions; cf. [12, Corollary 3.2].
Obviously, this case is the most essential one from the practical point of view.
In what follows we will show how to use Lemma 3.1 directly in order to determine all regression functions in F with respect to R_g, provided F = lin({h_k : k ∈ Z_{1,p}}) for a given sequence Z_{1,p} ∋ k → h_k ∈ L₁(R) and p ∈ N. Let f = ∑_{k=1}^{p} λ_k h_k ∈ Reg(F, R_g). From Lemma 3.1 it follows that f satisfies the condition (3.2), which leads to the equalities ⟨f|h_l⟩ = g*(h_l) for l ∈ Z_{1,p}. This way we obtain the following linear equation system of Gram type,

∑_{k=1}^{p} λ_k ⟨h_k|h_l⟩ = g*(h_l),  l ∈ Z_{1,p},  (5.2)

with respect to the variables λ₁, λ₂, …, λ_p ∈ B; a numerical sketch of solving such a system is given after Theorem 5.1. All solutions of the equation system (5.2) determine the set Reg(F, R_g). The system (5.2) simplifies considerably if we assume that the sequence Z_{1,p} ∋ k → h_k satisfies the orthogonality condition (5.3). Then ⟨h_k|h_l⟩ = 0 for k ≠ l, and by (5.2) we obtain the unique solution

λ_l = g*(h_l) / ⟨h_l|h_l⟩,  l ∈ Z_{1,p}.  (5.4)

Using the orthogonal decomposition properties from the previous section we can develop this idea in the form of the following theorem.
Theorem 5.1 Given p ∈ N ∪ {∞} let Z_{1,p} ∋ k → h_k ∈ L₁(R) \ Θ be an orthogonal sequence in the space H(R) and g ∈ L₂(R). If p ∈ N, then the equality (5.6) holds, and if p = ∞, then the equality (5.7) holds.

Proof Assume first that p = 1. Applying (ii) of Theorem 3.3 we derive the equality (5.6). Thus the equality (5.6) holds for every one-dimensional linear set F ⊂ L₁(R).
Assume now that p ∈ N or p = ∞. Setting F_k := lin({h_k}) for k ∈ Z_{1,p} we see that each set F_k is a one-dimensional linear subset of L₁(R). If p ∈ N, then from Lemma 4.2 we conclude that the corresponding decomposition holds, and consequently the equality (5.6) follows from (i) of Theorem 3.3. Thus it remains to consider the case where p = ∞. We deduce from (5.8) and Corollary 3.2 that for every k ∈ Z_{1,∞} the set Reg(F_k, R_g) is nonempty. Then Corollary 3.4 yields the equalities (5.9). Since the sequence Z_{1,∞} ∋ k → h_k ∈ F is orthogonal, we deduce from Lemma 4.1 and (4.2a) that (5.10) holds. Applying now Theorem 4.3 we conclude from (5.9) and (5.10) that the equality (5.7) holds, which completes the proof.
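The computation announced after (5.2) can be sketched as follows: given any routine for the pseudo-inner product ⟨·|·⟩ and for g*, the Gram-type system (5.2) is a finite linear system, and in the orthogonal case it collapses to (5.4). The code below is a sketch under these assumptions; the callables inner and g_star stand for data-dependent quantities that the paper leaves abstract.

    import numpy as np

    def regression_coefficients(h, inner, g_star):
        # solve the Gram-type system (5.2):
        #   sum_k lam_k <h_k|h_l> = g*(h_l),  l = 1..p
        p = len(h)
        G = np.array([[inner(h[k], h[l]) for k in range(p)] for l in range(p)])
        rhs = np.array([g_star(h[l]) for l in range(p)])
        # lstsq rather than solve: the Gram matrix may be singular, since the
        # pseudo-norm may vanish on a part of lin({h_1, ..., h_p})
        lam, *_ = np.linalg.lstsq(G, rhs, rcond=None)
        return lam

    # usage sketch: diagonal discrete structure, model F = lin({1, t})
    x = np.array([0.0, 1.0, 2.0, 3.0])
    y = np.array([1.0, 2.9, 5.1, 7.0])
    h = [np.ones_like(x), x]                   # h_1, h_2 sampled at the data x
    inner = lambda u, v: float(np.dot(u, v))   # <u|v> for the counting measure
    g_star = lambda v: float(np.dot(y, v))     # g*(v), with g the identity
    print(regression_coefficients(h, inner, g_star))   # ~ [1.0, 2.0]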
As far as applications are concerned, we will study theoretic models spanned by sequences Z_{1,p} ∋ k → h_k which in general are not orthogonal in the space H(R), because the pseudo-inner product ⟨·|·⟩ depends on the empirical data function x : Ω₁ → A and the measure μ. Then we cannot apply Theorem 5.1 directly. However, in such cases we can orthogonalize these sequences. To this end we recall that, for a given p ∈ N ∪ {∞}, a sequence Z_{1,p} ∋ k → h′_k is said to be an orthogonalization of a sequence Z_{1,p} ∋ k → h_k if the conditions (5.11) hold, where H₀ := {θ} and H_k := lin({h_l : l ∈ Z_{1,k}}) for k ∈ Z_{1,p}. The following lemma gives a sufficient and necessary condition for a sequence to admit an orthogonalization.

Lemma 5.2 A sequence Z_{1,p} ∋ k → h_k ∈ L₁(R) is linearly independent and satisfies Θ ∩ H_p = {θ} if and only if the condition (5.12) holds.
Proof Fix p ∈ N and a sequence Z_{1,p} ∋ k → h_k ∈ L₁(R). Assume first that the condition (5.12) holds. Then there exists a sequence Z_{1,p} ∋ k → h′_k ∈ L₁(R) satisfying (5.11). Taking a vanishing linear combination and computing its pseudo-inner product with h′_l, we see that for every l ∈ Z_{1,p} the coefficient λ_l satisfies λ_l ‖h′_l‖² = 0, and hence λ_l = 0, because ‖h′_l‖ > 0 by (5.11). Moreover, the equality dim(H_p) = p implies that the sequence Z_{1,p} ∋ k → h_k is linearly independent. This proves the lemma in the direction (⇐), provided p ∈ N.
Conversely, assume that the sequence Z_{1,p} ∋ k → h_k is linearly independent and the equality Θ ∩ H_p = {θ} holds. If ‖h_k‖ = 0 for a certain k ∈ Z_{1,p}, then h_k ∈ Θ ∩ H_p, and so h_k = θ. This is impossible, because the sequence Z_{1,p} ∋ k → h_k is linearly independent. Therefore

‖h_k‖ > 0,  k ∈ Z_{1,p}.  (5.14)

In particular, the property (5.12) holds in the case where p = 1. Therefore we may assume that p ≥ 2. Suppose that the condition (5.12) does not hold. Hence there exists a suitable index q, and we can define the functions h′_k accordingly. Since H_{k−1} ⊂ H_k for k ∈ Z_{1,q}, the sequence Z_{1,q−1} ∋ k → h′_k is orthogonal, and consequently the corresponding equalities hold for each l ∈ Z_{1,q−1}. Therefore the sequence Z_{1,q} ∋ k → h′_k is orthogonal, and following the first part of the proof we see that this sequence is linearly independent. Hence dim(lin({h′_k : k ∈ Z_{1,q−1}})) = q − 1.
Since h′_k ∈ H_k ⊂ H_{q−1} for k ∈ Z_{1,q−1}, we see that lin({h′_k : k ∈ Z_{1,q−1}}) ⊂ H_{q−1}. On the other hand, the sequence Z_{1,q−1} ∋ k → h_k is also linearly independent, and consequently dim(H_{q−1}) = q − 1.
This is impossible, because the sequence Z_{1,q} ∋ k → h_k is linearly independent. Therefore ‖h′_q‖ > 0, which together with (5.18) implies a conclusion contrary to (5.15). This means that the condition (5.12) holds, which proves the lemma in the direction (⇒), provided p ∈ N.
It remains to prove the lemma in the case where p = ∞. Then Z_{1,p} = N. Assume first that a sequence N ∋ k → h_k ∈ L₁(R) is linearly independent and the equality Θ ∩ H_∞ = {θ} holds. Then for every n ∈ N the sequence Z_{1,n} ∋ k → h_k is also linearly independent and Θ ∩ H_n = {θ}. Applying now the already proved finite part (⇒) of the lemma we see that the condition (5.12) holds for every n ∈ N. Conversely, assume now that a sequence N ∋ k → h_k satisfies the condition (5.12). From the already proved finite part (⇐) of the lemma it follows that for every n ∈ N the sequence Z_{1,n} ∋ k → h_k is linearly independent and H_n ∩ Θ = {θ}. Hence each finite subsequence of the sequence N ∋ k → h_k is linearly independent, and hence the sequence N ∋ k → h_k is linearly independent as well. Moreover, Θ ∩ H_∞ = {θ}. Thus the lemma holds also in the case where p = ∞, which completes the proof.
From Lemma 5.2 it follows that each linearly independent sequence Z_{1,p} ∋ k → h_k ∈ L₁(R) satisfying the condition Θ ∩ H_p = {θ} has an associated sequence being the result of its orthogonalization. Such a sequence Z_{1,p} ∋ k → h′_k may be determined by the Gram–Schmidt recursive method, by setting

h′_1 := h_1,  h′_k := h_k − ∑_{l=1}^{k−1} (⟨h_k|h′_l⟩ / ⟨h′_l|h′_l⟩) h′_l,  k ∈ Z_{2,p}.  (5.19)

Corollary 5.3 Given p ∈ N ∪ {∞} suppose that Z_{1,p} ∋ k → h_k ∈ L₁(R) is a linearly independent sequence satisfying Θ ∩ H_p = {θ}, that Z_{1,p} ∋ k → h′_k is a sequence satisfying (5.11), and that g ∈ L₂(R). If p ∈ N, then the finite-case conclusion of Theorem 5.1 holds with h_k replaced by h′_k; the analogous conclusion holds for p = ∞. In particular, the sequence Z_{1,p} ∋ k → h′_k may be defined by (5.19).
Proof Fix p ∈ N ∪ {∞} and consider any sequences satisfying the assumptions. By Lemma 5.2, the condition (5.12) holds. Therefore the latter sequence exists; it may be defined, for instance, by (5.19). From the property (5.11) it follows that ‖h′_k‖ ≠ 0 for k ∈ Z_{1,p} and h′_k ⊥ h′_l for k, l ∈ Z_{1,p} such that k ≠ l. Moreover, by definition, lin({h′_k : k ∈ Z_{1,p}}) = lin({h_k : k ∈ Z_{1,p}}). Thus, applying Theorem 5.1 with the sequence Z_{1,p} ∋ k → h_k replaced by its orthogonalized associate Z_{1,p} ∋ k → h′_k, we derive the assertion.
The following example shows how to apply Lemma 5.2 and Corollary 5.3 to the numerical computation of the regression functions.

Example 5.4 Given p ∈ N ∪ {∞} suppose that Z_{1,p} ∋ k → h_k ∈ L₁(R) is a sequence admitting an orthogonalization satisfying (5.11) and that g ∈ L₂(R). Let us consider the sequence of functions v_k defined below. This together with (5.24) yields μ = 0, and consequently v_n ∈ H_{n−1}. Moreover, it follows that each v_k is an orthogonal projection of h_k onto H_{k−1}. Then the relations (5.25) and, consequently, (5.27) and (5.28) hold. We now define the two matrices (5.29) and (5.30). From this, (5.24) and (5.30) we conclude that the identity (5.31) holds for all k ∈ Z_{2,p} and l ∈ Z_{1,p}. From this, (5.28) and (5.31) it follows that for every k ∈ Z_{2,p} the relation (5.32) holds, which together with (5.32) leads to (5.33). Now we wish to find a sequence Z_{1,p} ∋ k → λ_k of coefficients of the regression function. Then (5.25) and (5.27) imply an identity which together with (5.37) yields (5.38). Applying the initial formulas (5.29) and the recursive ones (5.33), (5.35) and (5.38) we can directly compute the coefficients λ_k, k ∈ Z_{1,p}, of the regression function f in the basis Z_{1,p} ∋ k → h_k of F. Therefore these formulas are suitable for the numerical computation of the regression functions.
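For completeness, here is a small sketch of the orthogonalization step (5.19) with a caller-supplied pseudo-inner product, in the spirit of Lemma 5.2 and Corollary 5.3; the tolerance test is our stand-in for the condition (5.12).

    import numpy as np

    def gram_schmidt(h, inner, tol=1e-12):
        # orthogonalize h_1, ..., h_p by the recursion (5.19)
        ortho = []
        for hk in h:
            v = np.asarray(hk, dtype=complex).copy()
            for e in ortho:
                v = v - (inner(v, e) / inner(e, e)) * e   # remove projection onto e
            if abs(inner(v, v)) <= tol:
                raise ValueError("condition (5.12) fails: ||h'_k|| collapses to 0")
            ortho.append(v)
        return ortho

    # usage: ordinary dot product, three sampled functions 1, t, t^2
    t = np.linspace(0.0, 1.0, 5)
    inner = lambda u, v: complex(np.dot(u, np.conj(v)))
    hp = gram_schmidt([np.ones_like(t), t, t * t], inner)
    # the resulting h'_k are pairwise orthogonal:
    print(abs(inner(hp[0], hp[1])), abs(inner(hp[1], hp[2])))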
In certain cases the following corollary can be a useful tool for determining the regression functions.

Corollary 5.5 Given nonempty linear sets F′ and F in H(R) with F′ ⊂ F and g ∈ L₂(R), the inclusion (5.39) holds.

Proof Setting F₁ := F′ and F₂ := F ∩ F′^⊥ we deduce from the assumption F′ ⊂ F that the assumptions of Lemma 4.2 are satisfied. Then Lemma 4.2 shows that the corresponding inclusion holds, which proves the inclusion (5.39).
We end this section with a few simple observations, arising from Corollary 5.3, on the structure of the set consisting of all regression functions.
In view of Corollary 5.3 the equality (5.40) holds, provided p ∈ N. Since Θ ∩ F is a linear set, the second equality in (5.40) shows that the class Reg(F, R_g) forms an affine variety in the space H(R). Moreover, from (5.40) we can easily deduce that the following properties are pairwise equivalent: (i) f is the unique regression function in F with respect to R_g. If additionally the sequence Z_{1,p} ∋ k → h_k ∈ F \ Θ satisfies the orthogonality condition (5.3), then the formulas (5.19) yield h′_k = h_k, and consequently the property (5.40) remains valid after replacing h′_k by h_k for k ∈ Z_{1,p}.
According to (5.40) the class Reg(F, R_g) is determined by the sequence of coefficients appearing there. We call it the regression functions sequence (RFS) generated by a linearly independent sequence Z_{1,p} ∋ k → h_k ∈ L₁(R) satisfying the equality Θ ∩ H_p = {θ}, where F := lin({h_k : k ∈ Z_{1,p}}) is a functional model of R_g.

Examples
It is worth noting that our approach to the regression theory is very flexible. We provide a universal and simple theory covering the classical cases of regression, where the theoretic functional model F is spanned by polynomials, trigonometric polynomials and other specific functions; cf. e.g. [17] and [4]. Moreover, we study the regression functions with respect to a wide range of regression structures R, involving the generalized quadratic deviation (2.2) by means of certain measures μ. This considerably simplifies theoretical considerations in the setting of pseudo-Hilbert spaces. On the other hand, we gain the possibility of using a modified least squares method, which can be more adequate in more specific situations. In Example 1.1 the classical least squares method was used. According to the equality (2.4) in Example 2.2, this is a special case of the criterion δ with the measure μ satisfying (2.3) and (2.5). In what follows we present an example which motivates the use of a more sophisticated measure μ.

Example 6.1 Following Example 1.1, we now want to determine the electric circuit resistance R by means of measurement samples of intensity and voltage represented by two sequences Z_{0,n} ∋ k → i_k and Z_{0,m} ∋ l → v_l for some n, m ∈ N. Assume that all measurements were made independently. Given a precision rate, let ρ′_k be the probability that the intensity sample i_k satisfies the precision rate for k ∈ Ω₁ := Z_{0,n}, and let ρ″_l be the probability that the voltage sample v_l satisfies the precision rate for l ∈ Ω₂ := Z_{0,m}. As in Example 1.1 we consider the regression structure R, where A := R, B := R, the empirical data functions are defined by Z_{0,n} ∋ k → x(k) := i_k and Z_{0,m} ∋ l → y(l) := v_l, and as the deviation criterion δ we take the generalized quadratic deviation given by (2.2). Following Example 2.2 we define the measure μ as the unique measure satisfying the equalities (2.3). Now we need to define the numbers ρ_{k,l} for k ∈ Ω₁ and l ∈ Ω₂. Obviously, we can do it in many ways. In our particular case all measurements were made independently, so it seems natural to set

ρ_{k,l} := ρ′_k · ρ″_l,  k ∈ Ω₁, l ∈ Ω₂.  (6.1)

Then each coefficient ρ_{k,l} is equal to the probability of the event that both the measurement samples i_k and v_l simultaneously satisfy the prescribed precision rate. As a matter of fact, the coefficient ρ_{k,l} reflects the accuracy of the measurement samples i_k and v_l, and thereby the accuracy of the measurement devices used for obtaining these samples. If a coefficient ρ_{k,l} is closer to 1, then intuitively the corresponding pair (i_k, v_l) of samples is more valuable for us. Therefore the generalized quadratic deviation criterion δ, defined by (6.1), seems to be more natural in this case as compared to the classical least squares method, where all samples (i_k, v_k) are treated equivalently and the samples of the form (i_k, v_l) with k ≠ l are not considered at all.
As in Example 1.1, we consider the theoretic functional model F represented by the linear functions R ∋ t → rt for r ∈ R. Then F = lin({h₁}), where h₁ is the identity mapping on R, i.e. h₁(t) = t for t ∈ R. Thus we can apply our theory from the previous sections in order to determine all regression functions in F with respect to R. The condition (2.8) obviously holds for every function f : A → B, which means that L₁(R) = (R → R). From (2.12) we have

‖h₁‖² = ∑_{k=0}^{n} ∑_{l=0}^{m} ρ_{k,l} i_k².  (6.2)

It is also easily seen that each function g : B → B satisfies the condition (2.9), and so L₂(R) = (R → R). From (2.22) it follows that

g*(h₁) = ∑_{k=0}^{n} ∑_{l=0}^{m} ρ_{k,l} i_k g(v_l).  (6.3)

Assume that ‖h₁‖ = 0. Then F ⊂ Θ, which implies, by Corollary 3.2, that Reg(F, R) = F. Suppose for simplicity that ρ_{k,l} > 0 for k ∈ Ω₁ and l ∈ Ω₂. From (6.2) we see that ‖h₁‖ = 0 iff i_k = 0 for k ∈ Ω₁. Thus the equality ‖h₁‖ = 0 means that the current intensity vanishes (current does not flow) or the current intensity is below the sensitivity of the intensity measurement devices. In both cases we are not able to determine the resistance R. This provides a natural interpretation of the equality Reg(F, R) = F. Assume now that ‖h₁‖ > 0. Then h₁ ∈ F \ Θ, and so Θ ∩ F = {θ}. Theorem 3.3 (or directly Theorem 5.1) now shows that the set Reg(F, R) consists of the unique regression function

f(t) = (g*(h₁)/‖h₁‖²) t,  t ∈ R.  (6.4)

Combining this with (6.2) and (6.3) we can uniquely determine the resistance

R = (∑_{k=0}^{n} ∑_{l=0}^{m} ρ_{k,l} i_k g(v_l)) / (∑_{k=0}^{n} ∑_{l=0}^{m} ρ_{k,l} i_k²).  (6.5)

Note that if n = m, g is the identity function and the coefficients ρ_{k,l} are defined by (2.5), then (6.5) yields (1.6). Such a situation naturally corresponds to the sequence Z_{0,n} ∋ k → (i_k, v_k) of n + 1 simultaneous measurements of the current intensity and voltage with the same precision.
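A direct numerical transcription of the estimator (6.5) (with g the identity, and under our reconstruction of (6.2) and (6.3) above) might look as follows; with the diagonal weights (2.5) it reproduces (1.6). The reliability values below are illustrative.

    import numpy as np

    def weighted_resistance(i, v, rho):
        # R = sum_{k,l} rho[k,l]*i_k*v_l / sum_{k,l} rho[k,l]*i_k**2, cf. (6.5)
        i, v, rho = np.asarray(i, float), np.asarray(v, float), np.asarray(rho, float)
        num = np.einsum("kl,k,l->", rho, i, v)
        den = np.einsum("kl,k->", rho, i * i)
        return num / den

    # independent reliabilities, rho_{k,l} = rho'_k * rho''_l as in (6.1)
    rho1 = np.array([0.9, 0.8, 0.95])          # precision probabilities of i_k
    rho2 = np.array([0.85, 0.9, 0.8, 0.7])     # precision probabilities of v_l
    i = np.array([0.5, 1.0, 1.5])
    v = np.array([1.0, 2.0, 3.0, 2.9])         # voltages, measured separately
    print(weighted_resistance(i, v, np.outer(rho1, rho2)))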
The following example illustrates the usage of Corollary 5.3 in the case where the theoretic functional model F is spanned by two functions.

Example 6.2 Given a regression structure R with g ∈ L₂(R), let us consider the case where the functional model F = lin({h₁, h₂}) is spanned by two linearly independent functions h₁, h₂ ∈ L₁(R) such that F ∩ Θ = {θ}. Applying Corollary 5.3 we can write the regression functions in the orthogonalized basis, where, according to (5.19),

h′₁ := h₁  and  h′₂ := h₂ − (⟨h₂|h₁⟩/⟨h₁|h₁⟩) h₁.  (6.6)

Hence h′₂ ⊥ h′₁, and consequently the coefficients (6.7) can be computed. With the notation (6.8) we conclude from (6.6) and (6.7) the representation (6.9). Combining (6.9) with (6.7) and (6.8) we obtain the formulas (6.11) for the coefficients. In particular, if h₂(t) = t, h₁(t) = 1 and g(t) = t for t ∈ R, and the regression structure R is defined as in Example 2.2 under the assumption that m = n and the coefficients ρ_{k,l} satisfy (2.5), then the equalities in (6.11) yield a₂ = a₀ and a₁ = b₀, where a₀ and b₀ are defined in (1.2).
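In the diagonal discrete case the two-function model of Example 6.2 can be computed directly; the sketch below (our sample data; g the identity) carries out the orthogonalization (6.6) and recovers the classical coefficients a₀ and b₀ of (1.2):

    import numpy as np

    x = np.array([0.0, 1.0, 2.0, 3.0])
    y = np.array([1.0, 3.1, 4.9, 7.0])
    inner = lambda u, v: float(np.dot(u, v))   # <u|v> for the diagonal structure

    h1 = np.ones_like(x)                       # h_1(t) = 1, sampled at the data
    h2 = x                                     # h_2(t) = t
    h2p = h2 - inner(h2, h1) / inner(h1, h1) * h1    # (6.6): h'_2 = t - mean(x)

    b = inner(y, h2p) / inner(h2p, h2p)        # coefficient of h'_2 (the slope)
    a = inner(y, h1) / inner(h1, h1) - b * inner(h2, h1) / inner(h1, h1)
    print(a, b)    # f(t) = a + b*t, the classical regression line (a_0, b_0)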
The next example deals with applications to econometric sciences.

Example 6.3 Consider a linear econometric model of the following form:

Y = a₀ + ∑_{k=1}^{p} a_k X_k,  (6.12)

for some p ∈ N and a sequence of coefficients Z_{0,p} ∋ k → a_k ∈ R. This model describes the theoretic dependence of the variable Y on the variables X₁, X₂, …, X_p, which represent certain economic parameters; cf. e.g. [7, pp. 89–130], [9, pp. 127–208], [3, pp. 73–93]. We seek the best such model for a given n ∈ N and experimental data series Z_{1,n} ∋ k → ỹ_k ∈ R and Z_{1,n} ∋ k → x̃_{k,l} ∈ R for l ∈ Z_{1,p}, with respect to the classical quadratic deviation. To be more precise, we seek a sequence of coefficients Z_{0,p} ∋ k → â_k ∈ R satisfying the condition (6.13). Then for each sequence of coefficients Z_{0,p} ∋ k → a_k ∈ R the function f := ∑_{k=0}^{p} a_k h_k belongs to F, and by (6.13) we see that f ∈ Reg(F, R) iff Z_{0,p} ∋ k → â_k ∈ R is the best choice of coefficients. This way the optimal linear model (6.12) can be expressed by (p + 1)-dimensional regression functions f̂ ∈ Reg(F, R).
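A sketch of the fitting step in Example 6.3 under the classical quadratic deviation: the model (6.12) becomes an ordinary least-squares problem for the design matrix built from the data series (the arrays below are illustrative, not from the paper).

    import numpy as np

    rng = np.random.default_rng(1)
    n, p = 50, 3
    X = rng.normal(size=(n, p))                     # samples of X_1, ..., X_p
    y = 1.0 + X @ np.array([0.5, -2.0, 0.7]) + rng.normal(0.0, 0.1, n)

    H = np.column_stack([np.ones(n), X])            # h_0 = 1 and the p regressors
    a_hat, *_ = np.linalg.lstsq(H, y, rcond=None)   # coefficients a_0, ..., a_p
    print(np.round(a_hat, 2))                       # ~ [1.0, 0.5, -2.0, 0.7]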
The next example deals in a very natural way with a theoretic functional model F supported by complex-valued functions.
Example 6.4 Assume that a point P runs along an elliptic trajectory with a constant angular speed ω. We want to describe the trajectory by means of location measurement samples of the point P. It is quite convenient to use here the complex plane C, because the elliptic trajectory has the simple representation

f(t) := a + be^{iωt} + ce^{−iωt},  t ∈ R,  (6.14)

for certain a, b, c ∈ C. Consider an asynchronous regression structure of the form R := (A, B, δ; x, y), where A := R and B := C. As a functional model F of R we set F := lin({h₀, h₁, h₂}), where h₀(t) := 1, h₁(t) := e^{iωt} and h₂(t) := e^{−iωt} for every t ∈ R. From (6.14) it follows that the optimal trajectory is a 3-dimensional regression f ∈ Reg(F, R). In particular, if the deviation criterion δ is defined by the formula (2.6), then the optimal trajectory f ∈ Reg(F, R) is best fitted to the empirical data functions x : Z_{0,n} → R and y : Z_{0,n} → C with respect to the classical square deviation. Note that a 2-dimensional regression f ∈ Reg(F′, R), where F′ := lin({h₀, h₁}), is the best chosen circular trajectory.
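Since the model of Example 6.4 is linear over C in the unknowns a, b, c, the optimal trajectory is again a least-squares solution; the following sketch (synthetic data, ω assumed known) fits (6.14) to noisy complex positions.

    import numpy as np

    rng = np.random.default_rng(2)
    w = 1.3                                         # known angular speed omega
    t = np.linspace(0.0, 6.0, 40)
    z = (1 + 2j) + 2.0 * np.exp(1j * w * t) + 0.5 * np.exp(-1j * w * t)
    z = z + rng.normal(0.0, 0.02, t.size) + 1j * rng.normal(0.0, 0.02, t.size)

    # design matrix with columns h_0 = 1, h_1 = e^{iwt}, h_2 = e^{-iwt}
    H = np.column_stack([np.ones_like(t, dtype=complex),
                         np.exp(1j * w * t),
                         np.exp(-1j * w * t)])
    a, b, c = np.linalg.lstsq(H, z, rcond=None)[0]  # the 3-dimensional regression
    print(np.round([a, b, c], 3))                   # ~ [1+2j, 2+0j, 0.5+0j]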

The synchronous regression structures
We have considered so far the asynchronous regression structures of the form R := (A, B, δ; x, y), where the empirical data functions x and y depended on two different parameters t₁ and t₂, respectively. In other words, the functions x and y were defined asynchronously. The dependence between these functions was given by a measure μ; see Example 6.1. In this section we define another type of regression structures where, roughly speaking, the empirical data functions x and y depend on one parameter t, i.e. they are defined synchronously. To be more precise, we consider regression structures satisfying the following three conditions:

III.3 There exist a σ-field A of subsets of the set Ω := Ω₁ and a measure ν : A → [0; +∞] such that the function δ satisfies the equality

δ(u, v) = ∫_Ω |u(t) − v(t)|² dν(t)  (7.1)

provided the function |u − v| is A-measurable, and δ(u, v) = +∞ otherwise.
Then the regression problem for R means the extremal problem of determining all functions f₀ ∈ F minimizing the functional satisfying, according to (1.3) and (7.1), the following equality:

F ∋ f → δ(f ∘ x, y) = ∫_Ω |f(x(t)) − y(t)|² dν(t).  (7.2)

Definition 7.1 Any regression structure R satisfying the conditions III.1, III.2 and III.3 is said to be a synchronous regression structure; a real synchronous regression structure if B = R and a complex synchronous regression structure if B = C.

Remark 7.2
The above type of regression structures corresponds to the classical regression theory, where Ω := Z_{0,n} for a certain n ∈ N and the measure ν, defined on the family A of all subsets of Ω, is such that ν({k}) = 1 for k ∈ Ω. Then by (7.2),

δ(f ∘ x, y) = ∑_{k=0}^{n} |f(x_k) − y_k|²,

where x_k := x(k) and y_k := y(k) for k ∈ Ω. Thus the deviation criterion δ coincides with the classical quadratic deviation.
Analyzing our considerations from the previous sections, it is easy to see that we could develop, in much the same way, an adequate theory of the synchronous regression structures. However, we will handle this case in a different, more interesting manner. We will show that, in fact, all the properties of the regression functions with respect to a synchronous regression structure R can be derived from those with respect to a certain asynchronous regression structure R* associated with R. Therefore the regression theory for asynchronous regression structures embraces, in this sense, the one for synchronous regression structures.
In the discrete case described in Remark 7.2 we can easily associate with R the asynchronous regression structure considered in Example 2.2 by putting (2.5). Then (2.6) yields (7.1). Therefore in such a case synchronous regression structures can be treated as a special case of asynchronous regression structures, where the measure μ is concentrated on the diagonal of Ω₁ × Ω₂. This method of reducing a synchronous regression structure to an asynchronous one will therefore be called the diagonal method. It is possible to find a variant of the diagonal method in the non-discrete case as well, but it is a more difficult task. We will handle this problem in what follows.
First of all we recall the following fact, useful in the sequel; cf. e.g. [2, Thm. 1.6.12].

Moreover, the corresponding integral transformation formulas hold for every measurable function. Consider a synchronous regression structure R = (A, B, η; x, y), where the deviation criterion η satisfies the equality (7.1) in place of δ for a given measurable space (Ω, A, ν). The mapping φ : Ω ∋ t → (t, t) is an injective mapping of Ω onto the diagonal D := {(t, t) : t ∈ Ω}. Consider the family B of subsets of Ω × Ω defined by the condition (7.5), where D′ := (Ω × Ω) \ D.

Lemma 7.4 The set B is the smallest σ-field in Ω × Ω containing the diagonal D and the family A ⊗ A. Moreover, the equality (7.7) holds, and for every set S ⊂ Ω the implication (7.8) holds.
Proof We show first that B is a σ-field in Ω × Ω. Given a sequence N ∋ k → S_k ∈ B, we deduce from (7.5) that there exist suitable sequences of sets in A ⊗ A, and by (7.5) we can see that ⋃_{k=1}^{∞} S_k ∈ B. Fix now S ∈ B. Then S admits the representation required in (7.5) for some S′, S″, S‴ ∈ A ⊗ A. Since the complement admits an analogous representation, we deduce from (7.5) that (Ω × Ω) \ S ∈ B. Thus we have shown that B is a σ-field in Ω × Ω. Moreover, each S ∈ A ⊗ A belongs to B, and so the inclusion (7.6) holds. Suppose now that B′ is a σ-field in Ω × Ω satisfying (7.6) with B replaced by B′. Then D ∈ B′. From (7.5) we conclude that each S ∈ B admits the required representation for some S′, S″, S‴ ∈ A ⊗ A. Since A ⊗ A ⊂ B′, it follows that S ∈ B′, and so B ⊂ B′. Thus B is the smallest σ-field in Ω × Ω containing the diagonal D and the family A ⊗ A.
For the proof of the equality (7.7) we consider the family A_φ. Then for every sequence N ∋ n → S_n ∈ A_φ we have ⋃_{n=1}^{∞} S_n ∈ A_φ. Furthermore, for every S ∈ A_φ the complement also belongs to A_φ. Thus A_φ is a σ-field in Ω × Ω. Since for all U, V ∈ A the product U × V belongs to A_φ, the inclusion (7.12) follows, because A ⊗ A is the smallest σ-field in Ω × Ω containing the sets U × V with U, V ∈ A. Furthermore, D ∈ A_φ. This, together with (7.12), yields the inclusion (7.6) with B replaced by A_φ. Therefore B ⊂ A_φ, because B is the smallest σ-field in Ω × Ω containing the family A ⊗ A ∪ {D}. Then by (7.10) we obtain one of the inclusions in (7.13). On the other hand, S × S ∈ B for any S ∈ A. From (7.11) it follows that the inclusion inverse to that in (7.13), and consequently the equality (7.7), holds. It remains to prove the implication (7.8). To this end fix a set S ⊂ Ω. If S × Ω ∈ B, then from (7.11) and (7.7) we obtain one of the implications. In the same manner we can see that the second implication holds. Both the implications lead to (7.8), which completes the proof.

Lemma 7.5 The function μ defined by (7.14) is well defined, the structure (Ω × Ω, B, μ) is a measurable space and the equality (7.15) holds.

Proof Since ν is a measure on the σ-field A, we deduce from (7.7) that μ is a well-defined function on B. Let N ∋ k → S_k ∈ B be a sequence of pairwise disjoint sets. Then for all k, l ∈ N, k ≠ l, the corresponding preimages are disjoint, and so we obtain a sequence of pairwise disjoint sets in A. Since ν is a measure on the σ-field A, we deduce from (7.14) and (7.7) that μ is countably additive. Moreover, μ(∅) = 0. Thus μ is a measure on the σ-field B and the equality (7.15) holds.

Lemma 7.6 Given any functions h : Ω → B and H : Ω × Ω → B, suppose that one of the conditions (i)–(iii) relating h and H holds. Then h ∈ L₁(Ω, A, ν) ⟺ H ∈ L₁(Ω × Ω, B, μ).

Proof By (7.18), (7.14) and the properties of the Lebesgue integral we deduce that the corresponding integrals coincide. This shows, by Lemma 7.4, that the condition (i) implies (7.16). In much the same way we show that the condition (ii) also implies (7.16). This gives (7.17), and the proof is complete.
We are now in a position to show the fundamental result in this section.
Theorem 7.7 Suppose that a measurable space (Ω, A, ν) determines a synchronous regression structure R = (A, B, η; x, y). Then the measurable space (Ω × Ω, B, μ), defined in Lemma 7.5, determines an asynchronous regression structure R* = (A, B, δ; x, y), and the corresponding properties hold. Moreover, for each g ∈ L₂(R*) and any nonempty linear set F ⊂ L₁(R*), the equality (7.24) holds.

Proof Applying the implication (i) ⇒ (7.16) from Lemma 7.6 with h replaced by |h|², we get the first required equality. Given h, g ∈ L₂(A, A_x, ν_x), we conclude from (2.10) that the relevant product is integrable. Then Lemma 7.3 shows that the corresponding integrals coincide, and by the implication (i) ⇒ (7.16) from Lemma 7.6 we see that H ∈ L₁(Ω × Ω, B, μ).
which together with (7.25) yields (7.21). By this we have the corresponding identities. Applying the implication (ii) ⇒ (7.16) from Lemma 7.6 with h replaced by |h|², we get the analogous equality, and by (2.10) we see the required integrability. Then, by the implication (ii) ⇒ (7.16) from Lemma 7.6, and next by the implication (iii) ⇒ (7.16) from Lemma 7.6, we obtain one of the inclusions in (7.24).
In the same manner we can show the inverse inclusion. Both the inclusions yield the equality (7.24), which completes the proof.
As an application of Theorem 7.7 we will show the following counterpart of Corollary 5.3 for synchronous regression structures.

Corollary 7.8 Let R = (A, B, η; x, y) be a synchronous regression structure determined by a measurable space (Ω, A, ν) and let g ∈ L₂(B, A_y, ν_y). Then the conclusions of Corollary 5.3 hold with respect to R. Moreover, the sequence involved may be orthogonalized in L₂(A, A_x, ν_x), which is a direct consequence of (7.21).

Remarks on approximation
In this section we will point out that the synchronous regression structures are applicable in the theory of approximation. To this end let R = (A, B, η; x, y) be a synchronous regression structure determined by a measurable space (Ω, A, ν), where A = B = Ω and x, y are the identity mappings on B. Then ν_x = ν_y = ν, and thereby the underlying function space is L₂(B, A, ν).
Let us consider a sequence N ∋ k → h_k ∈ L₂(B, A, ν) and fix g ∈ L₂(B, A, ν). From Corollary 3.5 and Theorem 7.7 it follows that for every n ∈ N, Reg(F_n, R_g) ≠ ∅, where F_n := lin({h_k : k ∈ Z_{1,n}}) for n ∈ N. Therefore there exists a sequence N ∋ n → f_n ∈ Reg(F_n, R_g) which can approximate the function g, as stated in the following theorem, where F := lin({h_k : k ∈ N}). In particular, if g ∈ cl_ν(F), then g ∈ Reg(cl_ν(F), R_g) and ‖g − f_n‖_ν → 0 as n → ∞.
Proof Assume that f ∈ Reg(cl_ν(F), R_g) is arbitrarily fixed. By the definition of the regression function, f minimizes the quadratic deviation from g over cl_ν(F), and consequently f is an orthogonal projection of g onto the linear and closed set cl_ν(F) in the space L₂(B, A, ν). Since F = ⋃_{n=1}^{∞} F_n, we see that there exists a sequence N ∋ n → f_n ∈ F_n satisfying the property (8.1). By assumption, f_n ∈ Reg(F_n, R_g) for n ∈ N, and therefore (8.1) leads to (8.2), which completes the proof.
The following example illustrates how to apply synchronous regression structures in the theory of approximation. Let N ∋ k → h_k ∈ L₂(B, A, ν) be an orthogonal sequence in the space L₂(B, A, ν) such that ‖h_k‖_ν > 0 for k ∈ N. Then for every g ∈ L₂(B, A, ν),

‖ f − ∑_{k=1}^{n} (⟨g|h_k⟩_ν / ‖h_k‖²_ν) h_k ‖_ν → 0 as n → ∞,  (8.8)

where f is an orthogonal projection of g onto cl_ν(F) in the space L₂(B, A, ν) and F := lin({h_k : k ∈ N}). Moreover, the Bessel equality (8.9) holds. By the orthogonality of the sequence N ∋ k → h_k, we deduce from (8.10) the corresponding expansion. Combining this with (8.11) and (8.12) we get (8.9).
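As a concrete instance of (8.8), the sketch below projects a square wave onto finitely many sines; the grid plays the role of the measure ν, the coefficients are ⟨g|h_k⟩_ν/‖h_k‖²_ν, and the L₂ error decreases as more terms are taken (the choice of the system is our assumption for illustration).

    import numpy as np

    t = np.linspace(0.0, 2.0 * np.pi, 2048, endpoint=False)
    inner = lambda u, v: np.mean(u * np.conj(v))    # discrete <.|.>_nu

    g = np.sign(np.sin(t))                          # the function to approximate
    approx = np.zeros_like(t)
    for k in range(1, 20):
        hk = np.sin(k * t)                          # orthogonal system on the grid
        approx = approx + (inner(g, hk) / inner(hk, hk)).real * hk

    print(np.sqrt(np.mean((g - approx) ** 2)))      # L2 error of the projection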

The regression in probabilistic spaces
Regression functions are very often used in the context of probability theory. Note that this case corresponds to a synchronous regression structure R = (A, B, η; x, y) determined by a probability space (Ω, A, P), where ν = P, A = B and x, y : Ω → B are random variables, i.e. they are A-measurable. Therefore all the facts discussed in Sect. 7 remain valid, in particular, in the probabilistic case. In this section we rewrite the formulas (7.28) and (7.29) in terms of expected values of random variables and distribution functions, which are specific to probability theory. First we note that usually, instead of the probability measure P, one considers the distribution P_{x,y} generated by P and the random variables x and y. We recall that P_{x,y} is the unique probability measure on the σ-field B(B) ⊗ B(B) satisfying the condition (9.1). Lemma 9.1 below asserts that, for Borel functions g, h : B → B,

∫_Ω (g ∘ x)(h ∘ y) dP = ∫_{B×B} g(t₁)h(t₂) dP_{x,y}(t₁, t₂),  (9.2)

provided one of the functions involved is integrable. From (9.1) it follows that

∫_{B×B} I_{U_k×V_l}(t₁, t₂) dP_{x,y}(t₁, t₂) = P_{x,y}(U_k × V_l) = P(x⁻¹(U_k) ∩ y⁻¹(V_l)).

Combining this with (9.3) we deduce that

∫_{B×B} g(t₁)h(t₂) dP_{x,y}(t₁, t₂) = ∫_{B×B} ∑_{k=1}^{n} ∑_{l=1}^{m} λ_k μ_l I_{U_k}(t₁) I_{V_l}(t₂) dP_{x,y}(t₁, t₂) = ∑_{k=1}^{n} ∑_{l=1}^{m} λ_k μ_l ∫_{B×B} I_{U_k×V_l}(t₁, t₂) dP_{x,y}(t₁, t₂),

and so the lemma holds for any simple Borel functions g and h.
Assume now that g, h : B → B are arbitrary Borel functions. Then there exist sequences N ∋ n → g_n and N ∋ n → h_n of simple Borel functions such that

|g_n(t)| ≤ |g(t)| and |h_n(t)| ≤ |h(t)|,  t ∈ B, n ∈ N,  (9.4)

as well as, for every t ∈ B,

g_n(t) → g(t) and h_n(t) → h(t) as n → ∞.  (9.5)

By the already proved equality (9.2) for simple Borel functions we see that the equalities

∫_Ω (g_n ∘ x)(h_n ∘ y) dP = ∫_{B×B} g_n(t₁)h_n(t₂) dP_{x,y}(t₁, t₂)  (9.6)

as well as

∫_Ω |(g_n ∘ x)(h_n ∘ y)| dP = ∫_{B×B} |g_n(t₁)h_n(t₂)| dP_{x,y}(t₁, t₂)  (9.7)

hold for every n ∈ N. Assume that (g ∘ x)(h ∘ y) ∈ L₁(Ω, A, P).
Since each function B × B ∋ (t₁, t₂) → |g_n(t₁)h_n(t₂)| is B(B) ⊗ B(B)-measurable, we conclude from (9.5) that the pointwise limit function B × B ∋ (t₁, t₂) → |g(t₁)h(t₂)| is also B(B) ⊗ B(B)-measurable. Applying now Fatou's lemma, (9.7) and (9.4), we have

∫_{B×B} |g(t₁)h(t₂)| dP_{x,y}(t₁, t₂) = ∫_{B×B} lim inf_{n→∞} |g_n(t₁)h_n(t₂)| dP_{x,y}(t₁, t₂) ≤ lim inf_{n→∞} ∫_{B×B} |g_n(t₁)h_n(t₂)| dP_{x,y}(t₁, t₂) ≤ ∫_Ω |(g ∘ x)(h ∘ y)| dP < +∞.

This means that the function B × B ∋ (t₁, t₂) → g(t₁)h(t₂) is integrable with respect to the distribution P_{x,y}, provided (g ∘ x)(h ∘ y) ∈ L₁(Ω, A, P). In much the same way we can justify the inverse implication. It remains to show the equality (9.2), provided (g ∘ x)(h ∘ y) ∈ L₁(Ω, A, P).
Applying now Lebesgue's dominated convergence theorem we conclude from (9.4), (9.5) and (9.6) that the equality (9.2) holds, which is the desired conclusion.
Since our structure R is a synchronous regression structure, we can use Corollary 7.8 directly in order to compute the regression functions. However, from the practical point of view it seems more convenient to compute them in terms of the distribution P_{x,y}. To this end we introduce the two functions (9.8) and (9.9) for f ∈ L₂(B, A_x, P_x), g ∈ L₂(B, A_y, P_y) and h ∈ L₂(B, A_x, P_x) \ Θ_{P_x}, where E is the expected value operator for the probability space (Ω, A, P), i.e.

E(f) := ∫_Ω f dP,  f ∈ L₁(Ω, A, P).  (9.10)

By Lemma 9.1 we obtain (9.11) and

E_{y,x}(g|h) = (∫_{B×B} g(t₂)h(t₁) dP_{x,y}(t₁, t₂)) / (∫_{B×B} |h(t₁)|² dP_{x,y}(t₁, t₂)).

Combining (9.9) with (9.19) and (9.17) we obtain the corresponding expressions for g ∈ L₂(B, B(B), P_y) and h ∈ L₂(B, B(B), P_x) \ Θ_{P_x}. Using (9.18) and (9.19) we deduce from Corollary 7.8 that the equalities (9.13), (9.14) and (9.15) hold, which is the desired conclusion.
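Finally, a hedged Monte Carlo sketch of the probabilistic formulas above: for a one-dimensional model F = lin({h}), the regression coefficient E_{y,x}(g|h) is a ratio of expectations, which Lemma 9.1 allows one to compute either from P or from the distribution P_{x,y}; here both expectations are estimated by sample means over synchronous observations (all names and the data-generating process are illustrative).

    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.normal(size=10_000)                     # random variable x
    y = 2.5 * x + rng.normal(0.0, 0.5, x.size)      # dependence to be recovered

    g = lambda s: s                                 # g the identity
    h = lambda s: s                                 # model function h(t) = t

    # E_{y,x}(g|h) = E((g o y)(h o x)) / E(|h o x|^2), estimated empirically
    coef = np.mean(g(y) * h(x)) / np.mean(np.abs(h(x)) ** 2)
    print(coef)                                     # ~ 2.5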