The Value of the High, Low and Close in the Estimation of Brownian Motion: Extended Version

The conditional density of Brownian motion is considered given the max, B(t|\max), as well as those with additional information: B(t|close, max), B(t|close, max, min) and B(t|max, min) where the close is the final value: B(t=1)=c and t in [0,1]. The conditional expectation and conditional variance of Brownian motion are evaluated subject to one or more of the the close (final value), the high (maximum), the low (minimum). Computational results displaying both the expectation and variance in time are presented and compared with the theoretical values. We tabulate the time averaged variance of Brownian motion conditional on knowing various extremal properties of the motion. The final table shows that knowing the high is more useful than knowing the final value among other results. Knowing the open, high, low and close reduces the time averaged variance to 42% of the value of knowing only the open and close (Brownian bridge).


Introduction
Classically, most of the financial forecasting based on charts uses at most four pieces of information for each day, the opening price (open), the closing price (close), the maximum price (high) and the minimum price, (low) [18]. We address the issue of how much additional information the high and low carry beyond that of the open and close. We measure the "value" by the reduction of the variance of the Brownian motion given one or both of the high and low.
In today's financial markets, every tick is archived. In analyzing events in the ancient past (1970s) or less automated markets like credit default swaps or emerging market bonds (roughly pre-2013), the only data that typically is available is open, high, low, close data. An entire field, chartist analysis, uses these descriptors as the "sufficient statistics" for prediction. This paper defines the probability distribution of B(t|high, low, close) and calculates its expectation. Our formulas allow us to interpolate the price signal as E[B(t|open, high, low, close)] over all time in [0, 1] given any data source that only has open, high, low, close data.
There have been a number of studies that use the open, high, low and close to improve the estimate of the volatility (standard deviation) of the Brownian motion [11,16,17,20]. In contrast, we assume that the variance is given and standardized to one. In reality, the volatility of financial time series are unknown, bursty, and temporally non-uniform on many time scales. Given a model of the time dependence of the volatility, one must transform time to an equal volatility time. For this paper, we neglect this difficult problem and proceed with the studying standardized Brownian motion.
Let B(t) be the standard Brownian motion on [0, 1] and B c (t) be the Brownian motion restricted to B c (t = 1) = c. Our notation tracks the excellent compendium of results by Devroye [6]. Many of the results summarized in Sections 2 can be found there. The variance of the path of a Brownian motion is V (t) = t which integrates to 1 0 V (s) = 1/2. For the Brownian bridge pinned to B(t) = c, the variance is independent of the terminal value, c, and satisfies V (t) = t(1 − t). with an average variance of .125. This shows that one minimizes the uncertainty during the day by recording not the final value of the Brownian motion but the value at time = .75. The 25% reduction in the variance by shifting the time point to t o = .75 is much smaller than the increase in variance for t o < .5. For finance, we are often interested in forecasting the overnight return. In these cases, the closing value may be more valuable than having the value at time equal .75. We consider the distribution of B(t) conditioned on one or more of B(t = 1) = c, max t∈[0,1] B(t) = h and min t∈[0,1] B(t) = . We evaluate the conditional density of B(t|close, max) and B(t|close, max, min) using Chapman-Kolmogorov type calculations. The conditional densities of B(t| max) and B(t| max, min) are found by integrating the earlier densities over c. Our primary goal is to evaluate the conditional mean and conditional variance of B(t) in these cases. For several cases, explicit formulas for the moments are given. The location of the minimum and the location of the maximum are unknown in our analysis.
Section 2 reviews results on the density/distribution of extrema of Brownian motion.

Distributions of Brownian Extrema
The study of Brownian extrema date back to the founders of the field [15]. Our brief discussion follows [6] with additional results taken from [21], [8], [7] and [13]. We denote the Gaussian density by φ s (x) = (2πs) −.5 exp(−x 2 /2s). The density of the high, (maximum of B(t)), h is that of the half normal: [21,14], the joint distribution of the close, c, the high, h, and the location of the high, θ, is shown to be: The marginal density of the maximum, h, and c = B(1) is obtained by integrating (2.1) over θ: where h ≥ 0, h ≥ c. [22,12]. The conditional density, p(c|h), is given by 3) The distribution of the high, given the closing value, B(1) = c, is The density for (2.4) can be computed using [16,6]: A result that goes back to Levy [15,5], if not earlier, is where ∆ ≡ (h − ). The symmetric form, (2.9), not only treats h and symmetrically, but also shows the series is in an alternating form. There is a third form of (2 To evaluate the density, p(h, l), we integrate p(h, , c) from L to H and then set L = and H = h: where h k ≡ h − 2k∆ and k ≡ − 2k∆.

Density Given High and Close
We derive the density, P (B(t) = x|B(1) = c, max{B(s)} = h) and then compute the first and second moments. Clearly, The divisor, p(h, c), is the well-known probability and given by (2.2). We decompose P ( The sum of the terms in the square brackets of (3.2) is just ∂F ∂h (x, t, c, h). Note that F (x, t, h, c) is the difference of four Gaussian. We write (3.7) as All four terms, f i , are of the form: . The equality of the four terms at x = h will allow us to cancel terms when we integrate by parts. To simplify our calculations, observe Note f 1 is independent of h and therefore may be ignored. Derivatives of F (x, t, h, c) with respect to h only enter through h dependencies in µ i and g i . Let s i = −1 for i = 2, 3 and s i = 1 for i = 1, 4. We define τ i ≡ .5 * ∂ h µ i so that τ 2 = t, τ 3 = 1 − t, τ 4 = 1. Checking the normalization: (3.14) We use H to denote the upper limit which is evaluated at H = h. The point of using a differrent symbol is that when we differentiate with respect to h, the upper limit H is not differentiated. Now we compute the moments To go from (3.15) to (3.16), we use that the three terms evaluated at x = h again cancel. We evaluate M 1 (t, h, c) and M 2 (t, h, c) in the Appendix and present the results here:

Numerical Methods
Simply put, we generate a large number of Brownian paths, bin the paths in (close, max, min) space and calculate the mean and variance for each time and bin. We order the coordinates of phase space, (q 1 , q 2 , q 3 ), so that q 1 = B(1), q 2 = max 0≤t≤1 B(t) and q 1 = min 0≤t≤1 B(t). We also consider the case where we replace one or more of these operators with argmax or argmin. The results for the argmax case are found in [19]. A very straightforward algorithm is 1) Specify a timestep, dt, a number of bins in each direction nbins, and a number of sample pathe, N samp with typically N samp ≈ κ nbins 3 where κ denotes the typical number of simulations in a bin. More generally, for any choice of grids for the bins, we want at least κ simulations in each bin where κ is a large number. Generate a large array of scaled Gaussian random variables, size (N samp , 1/dt). Cumsum them to generate an array of Brownian paths. We often use a nonuniform time step where the time step is smaller near t = 0 and near t = 1.
2) In the first phase space direction, compute bin boundaries so that the number of curves are roughly equal in each bin. For each one dimensional bin, compute bin boundaries in the second coordinate direction so that the number of bins is roughly equal. Finally, for each of the two dimensional bins, compute bins in the third direction.
3) For each of the nbin 3 bins, assign a triple index, (i, j, k) bins, compute the mean of the coordinates, (q 1 ,q 2 ,q 3 ), and compute the mean, µ(t;q 1 ,q 2 ,q 3 ), and variance, V (t;q 1 ,q 2 ,q 3 ), of {B(t)} in the bin. 4) Test for convergence of µ(t;q 1 ,q 2 ,q 3 ) and V (t;q 1 ,q 2 ,q 3 ) in N samp , nbins, and dt. This involves interpolation as grids boundaries are random functions of the particular ensemble of paths. Note that the grid boundaries for the first coordinate direction are independent of the second two coordinate directions but that the average value of q 1 will depend on all three indices, (i, j, k). We find that interpolation from one grid to another grid broadens the width of the peaked functions especially when argmax is one of the given variables.
There is a bias versus variance tradeoff. If the bins are too large, the variation of the mean and variance will be obscured. If the bins are too small, there will be too few curves in each bin and the sample variance will dominate. Each of the close, max and min have a Gaussian or half Gaussian distribution individually so the tails of the distribution will be spread out. The situation is actually somewhat better as the high and low are exponentially distributed given the closing value. Nevertheless, exponential distributions have very few points in the tail of the distribution. Again, a low density of curves will significantly inflate the size of the tail bins and thereby add larger bias to the the computation of the bin variance. Thus convergence of the mean and variance on the outermost bins is tenuous. When we compute population average variance, we are tempted to downweight or even exclude the outer bins. While this is probably a smart thing to do, we report the simple ensemble average instead of a more complex limit reducing the underweighting as the bin size goes to zero.
Assume that we wish to generate bins in theq direction. We sort the Brownian realization in theq 1 direction. To generate the grid boundaries, we initially tried equi-spaced quantile bins. This results in very large bins in the low density region. These large bins result in bias to our estimates for both the expectation and variance estimates. Let the density of points/curves be n(q). To reduce the the size of the largest bins, we select bin boundaries to keep q k+1 q k n(q) α dq to be approximately equal where {q k } are the bin boundaries. We use α = .7 − .75 while α = 1 corresponds to equal quantiles. We find that first and last bins converge much very slowly in (nSim, nbin) space especially using a quantile based gridding. Using equal bins of n(q) α partially but not completely alleviates this problem.
Wiener's Fourier representation of Brownian paths on [0, 1] is allows us to take one set of Brownian paths and use them on a grid of final values. This significantly reduces the number of realizations we need to cover phase space. Thus if the closing value is the first parameter direction that we examine, a 3-dimensional parameterization is reduced to a sequence of two-dimensional parameterizations.

Brownian Bridge
We begin with plots of our simulation for the Brownian bridge case, i.e. Brownian motion constrained to a given closing value. For this simulation, we use 15 million simulations with nsteps=1500.

Given High
To calculate the probability, p( The theoretical values of E[B(t)|h] and V ar[B(t)|h] can be calculated by computing moments with respect to (5.1) and then dividing by p(h) = 2 * φ(t), h ≥ 0.
Unfortunately, we have not found a tractable analytic form from the integrals and therefore we compute them numerically [9]. between .68 < h < .95. Using (2.6), we see the precise value is .7517915247. Figure 3 plots the variance of a bin as a function of time. Again, the computed variance includes the squared bias from effectively assuming that expectation is constant in each bin. Since f (t, h) varies from the smallest value of h in the bin to the largest value of h in the bin, we are systematically overestimating the variance. For this particular computation, we define vrAvg to be the time and ensemble average of the variance. Looking at the dependence as a function of nbins, the number of bins, we find vrAvg(nbins = 80) = 0.16033, vrAvg(nbins = 160) = 0.16021, vrAvg(nbins = 320) = 0.16018 and vrAvg(nbins = 480) = 0.16017. Knowing the value of the high is slightly better at reducing the time averaged variance since vrAvg < 1/6. Returning to Figure 3, we see that that var(t, h) ≡ V ar[B(t)| max B = h] is monotonically increasing for small values of h, up to at least h = .67. For larger values of h, the variance is non-monotone. This non-monotonicity occurs because at large values of h , the maximum of B(t) is likely to be near t = 1.   For each of the ten values of the high, we display both the simulation curve and the analytic curve from numerically computing the moments of (5.1). The simulated curves have the symbols overstruck on them. The point is the match of simulation with (5.1) is very good.

Feller Range
To look at convergence, we examine the distribution of the range as a function of the number of steps in the Brownian motion simulation. The theoretical distribution was calculated by Feller in [10]: The range R t ≡ max 0≤s≤t B(s) − min 0≤s≤t B(s) at time t is distributed like √ tR 1 and the density of , where φ denotes the standard normal density [10]. As noted by Feller: "In this form it is not even obvious that the function is positive".     We compute Feller's formula for the density of the range of a Brownian motion. It converges very slowly near zero. To evaluate f (x = .005), we need between 300 and 400 terms. The formula is useful to compare our Brownian motion computations with the theoretical results. Figure 6 compares the empirical density with Feller's result. The blue curve is computed from Feller's expansion, the black curve is the empirical density from four million realizations with 2000 time steps. The green curve uses only 500 time steps. We see very good agreement. The main difference is that the empirical distribution is shifted slightly to the left. There is less than 0.1% of the distribution below range < 0.7. In Section 8, the density given high and low bounds involves an expansion in k exp(−k 2 (h − l) 2 ). This expansion converges very quickly for vast majority of the ensemble of Brownian paths. We see that the shift of the empirical distribution decreases as the step size decreases. For a step size of .0005, the shift of the center of mass of the distribution is .0066 from the theoretical result. Using a timestmp four times larger doubles the shift.
The distribution of the range is very small for range < .5 and this region is poorly approximated by the Feller expansion. The is the clear opportunity for an asymptotic expansion in the region of small range. For these plots, we use 1530 time steps on each simulation for a total of 18 million simulations with 100 bins in each parameter direction. The curves overstruck by symbols are the simulation curves. The analytic formula curves have the same color but no symbol. Figure 7 shows that the expectation is nearly monotonically decreasing for strongly negative values of the close and near zero values of the high. We say nearly decreasing because we have not examined the behavior near time = 0. For large value of the high, the high peaks near the middle of the time interval. Figure 8 shows the expectation is nearly symmetric in time when the close is near zero.    .18) and (3.19). In many cases, the variance is multimodal in time.

Comparison of Theory and Simulation Given Close and High
In this section, we plot the simulation and theoretical calculation given by (3.18) and (13.4). for this comparison, we use 30 million realizations each with 1530 steps. The results are then binned in 120 bins in each direction for a total of 1.73 million bins. We compute the MSE for each bin and sort them. We then display the fits for the worst .05, .02, .01 and .002 quantiles of the bins. To put the curves to scale, we plot all the curves together. The curves overstruck by symbols are the simulation curves. The analytic formula curves have the same color but no symbol. We now display the comparisons for each  Figure 14 compares the simulated variance in four separate bins with the analytic expression in (13.4). Here again, we compute the squared error for each of the one million bins. We then plot the fits for the worst .05, .02, .01 and .002 quantiles of the bins. The worst fits for the variance have different parameters than the parameters for the worst fits to the empirical mean. To put the curves to scale, we plot all the curves together.

Density Given High, Low and Close
To derive the the density of p(x; t, c, h, ), we need to consider four terms, the probability that both the high and low are to the left of t, the probability that just the low is to the left of t, the probability that just the high is to the right of t and the probability that both the high and the low are to the right of t.
which we denote as Q R (x, t, h, , c). As in (3.9), we decompose the generator, G(x, t, h, , c), into a sum of Gaussians in x. Here This allows us to integrate by parts and drop terms.
We define the moments M m where the limits of integration, H and L, are to be set to h and after we differentiate ∂ h ∂ G.
The (i, j, k)th term inside the integral satisfies Here H ijk (z) is defined as To simplify the moment calculation, we evaluate the derivatives by h and and recast them as derivatives with respect to x so that we can integrate by parts: We integrate by parts and find from (8.12): where B ijk is the boundary term. In the Appendix, we show that the boundary terms sum to zero. Also in the Appendix, we define the functions G mn (). We then have the repesentation: For M 0 , only the last term is nonzero and the sum reduces to To simplify the first sum, we used To sum these terms, we reparametrize k(j,k). For i = 1, 4, we set k =k − j,k = k + j. For i = 2, 3, we set k = j −k,k = k − j. With these transformations, v j,k = j −kt,ṽ j,k = j −kt, w jk = 2k,w jk = 2k, µ 1,jk = ct +ṽ j,k ∆, g 1 = (c − w j,k ∆) 2 /2, µ 2 = (2h − c)t + v j,k ∆, g 2 = (2h − c −w j,k ∆) 2 /2. Since the g i depend only onk and not j, so do the Γ .jk . The double sum splits into a single sum (8.15) where we have dropped the j dependence on g and Γ. We use that for a given k, the sum of the integrals for (1, j,k) and (4, j,k) cover the region from −∞ to ∞. This allows us to collapse the sum over i ∈ (1, 4), j. Similarly, the sums over (2, j,k) and (3, j,k) collapse. We recognize the expression in (8.15) to precisely correspond to M 0 (t, h, , c) = p(h, , c) as given by (2.10).
We would very much like to have expressions for the first and second moment that reduce the double sum to a single sum. This does not appear possible because the a ijk and a ijk do not vanish. For m ≤ 2, . When m > 2, the terms multiplying E σ (h − µ ijk ) and E σ ( − µ ijk ) may be different. For m = 1, the coefficients are In the Appendix, we further simplify the expressions in (8.16)-(8.17). We are unable to collapse the two dimensional sum over j and k to a single infinite sum as was possible in the m = 0 case.
To evaluate (8.14) numerically, we need to truncate the expansion in j and k. Luckily, the Feller distribution of 5.3 shows that very few realizations have small values of ∆. Thus the double expansion for j and k converges quickly for the vast majority of the Brownian realizations.
A second method to evaluate the probability of (8.1) is to define Q(x, t, h, , c) ≡ P (B(t) = x, ≤ B(t) ≤ h) by (8.3) and to numerically evaluate Q, ∂ h Q, ∂ Q and ∂ ∂ h Q. Similarly, we define Q R (x, t, h, , c) ≡ (8.6) and numerically evaluate Q R , ∂ h Q R , ∂ Q R and ∂ ∂ h Q R .
We then numerically integrate the moments of (8.19). (8.19) times x m from x = to x = h. Each of the terms in the integral involves truncating only in one of j or k. Thus the additional work involved in evaluating Q and Q R at many points to evaluate the integral is partially compensated by the single infinite sums as opposed to a doubly infinite sum.  Figures 15-17 show E[B(t|c, h, )] for c = −1.011. Note that maximum of the expectation is less the expectation of the maximum. The curves on the three plots have a similar shape as the value of the low is varied. This may indicate a somewhat weaker dependence on high than on the low when the close equals -1. However, a stronger factor is that the curves in 'low' the low coordinate vary more since we sample 10 values from the second smallest bin value of to the second largest value of given (c, h).   In Figure 20, the curves for large high and small low are not very symmetric, but this may be due to fewer curves in the bin due to our adaptive binning. We have also plotted E[B(t|c, h, )] for c ≈ 1. These

Comparison of Theory and Simulation Given Close, High and Low
In this section, we plot the simulation and theoretical calculation given by (8.14). For this comparison, we use 30 million realizations each with 1530 steps. The results are then binned in 120 bins in each direction, thus a total of 1.73 million bins. We compute the MSE for each bin and sort them. We then display the fits for the worst .05, .02, .01 and .002 of the bins. To put the curves to scale, we plot all the curves together.   (13.4). Here again, we compute the squared error for each of the 1.73 million bins. We then plot the fits for the worst .05, .02, .01 and .002 quantiles of the bins. The worst fits for the variance have different parameters than the parameters for the worst fits to the empirical mean. To put the curves to scale, we plot all the curves together.

Distribution Given High and Low
We evaluate the distribution p(x, t, h, ) by integrating over the closing value in p(x, t, h, , c) using 8.7. As before, the limits of integration, H and L, are to be set to h and after differentiation. The generating function is ) .
The density satisfies P (x; t, h, ) = −∂ ∂ h G(x, t, h, ; H, L). To get the density conditional on the high and low, we divide P (x; t, h, ) by p(h, ) as given by (

Summary
By calculating E[B(t)| max, min, close], we are able to interpolate in time any dataset where only the open, high, low and close are given. In practice, we interpolate on the log scale using the logarithms of the open, high, low and close. For most applications, we are interested in relative price chances so the log scale is appropriate. If one is truly interested in the actual price, our formulas need to be modified for log Brownian motion. Our simulations have calculated the ensemble average of the mean square error in Brownian motion for a variety of different givens. The time dependence of the variance is displayed in Figure 31. The variance is symmetric in time when final value is specified. If just the high or the high and low are specified, the variance is nonmonotonic in time. The time averaged variance is displayed in Table 1. It is slightly better to know the high than the closing value. It is better to know the close and the high than the close and time of the high. It is better to know the close, high and low than to know the close high and time of the high. The values for Table 1 and Figure 31 are from the simulation. We plan to compute these ensemble averages using the analytic results in Sections 3 and 8.

Close and High
We now evaluate the integrals M m in (3.15)- (3.17).
For the second moment, (3.15) reduces to . For (13.7), we use . (13.12) These formulas have been verified by numerically integrating the moments of ∂ h F (x, t, h, c) from −∞ to h.