# The exact bootstrap method shown on the example of the mean and variance estimation

## Abstract

The bootstrap method is based on resampling of the original random sample drawn from a population with an unknown distribution. In the article it is shown that, because of the progress in computer technology, resampling is actually unnecessary if the sample size is not too large. It is possible to automatically generate all possible resamples and calculate all realizations of the required statistic. The obtained distribution can be used in point or interval estimation of population parameters or in testing hypotheses. We should stress that in the exact bootstrap method the entire space of resamples is used and therefore there is no additional bias which results from resampling. The method was used to estimate the mean and variance. The comparison of the obtained distributions with the limit distributions confirmed the accuracy of the exact bootstrap method. In order to compare the exact bootstrap method with the basic method (with random sampling), the probability that 1,000 resamples would allow for estimating a parameter with a given accuracy was calculated. There is little chance of obtaining the desired accuracy, which is an argument supporting the use of the exact method. Random sampling may be interpreted as discretization of a continuous variable.

### Keywords

Bootstrap, Nonparametric estimation, Discrete random variables, Mean and variance estimation

## 1 Introduction

Consider random variable \(X\) with unknown distribution \(F\). We are interested in the distribution parameter denoted by \(\theta \). If the parameter cannot be calculated directly, it is necessary to draw a random sample and select an appropriate estimator of parameter \(\theta \). The estimator is a statistic defined on a sample space. The random sample is denoted by \({\mathbf X} = (X_{1}, X_{2},{\ldots }, X_{n})\), its realization by \({\mathbf x} = (x_{1}, x_{2}, {\ldots }, x_{n})\), and the estimator of parameter \(\theta \) by \(\hat{{\theta }}=t({\mathbf X})\).

Efron (1979) proposed what he called the bootstrap method. It is based on a random selection of resamples (bootstrap samples) of size \(n\) from the obtained sample (original sample) **x**. The random selection is done with replacement and is assumed to have identical probabilities equal to 1/\(n\) of randomly selecting each of the values \(x_{k}\), for \(k=1, {\ldots }, n\). Thus, distribution \(\hat{{F}}\), also known as the bootstrap distribution, is generated.

The bootstrap sample is denoted by \({\mathbf X}^{*}=\left( {X_1^*,X_2^*,\ldots ,X_n^*} \right)\), and its arbitrary realization by \({\mathbf x}^{*}=\left( {x_1^*,x_2^*,\ldots ,x_n^*} \right)\). Estimator \(\hat{{\theta }}\) for the bootstrap sample is denoted by \(\hat{{\theta }}^*=t\left( {{\mathbf X}^*} \right)\).

Approximation of the distribution of statistic \(\hat{{\theta }}\) by the bootstrap statistic \(\hat{{\theta }}^*\) is the essence of this method. If Monte Carlo approximation is used to construct distribution \(\hat{{\theta }}^*\), it is necessary to determine the number of the randomly selected bootstrap samples \(B\).

Using the bootstrap variance, Efron (1987) states that a small number of random samplings is sufficient to achieve adequate accuracy. Booth and Sarkar (1998) disagree with this statement. They used the distribution approximation of relative bootstrap variance. This allowed for the estimation of \(B\) for the given error level at the assumed confidence level. It showed that achieving an error lower than 10 % at the 0.95 confidence level requires \(B\) to be around 800. Efron and Tibshirani (1993, p. 52) believe that the estimation of standard error rarely requires more than 200 replications (repeated random samplings), while estimating the confidence interval requires 1,000 replications (Efron and Tibshirani 1993, p. 162).

Considering the bootstrap method, one may ask whether random sampling of the bootstrap sample \({\mathbf X}^{*}\) from the previously obtained original sample **x** is necessary. Random sampling is necessary if examining the entire population is impossible or too costly. Using a sample instead of the population has significant implications, which are the subject of mathematical statistics.

Note that the fundamental sample property is its finite size. The given bootstrap distribution \(\hat{{F}}\) for this sample is a simple discrete distribution. Distribution of any given statistic determined for \(n\) discrete random variables with a finite number of realizations does not have to be estimated as it may simply be calculated. The only question that remains open is how many calculations are required in this approach, which will be discussed below.

Consider the case of the two-element sample (\(x_{1}\), \(x_{2})\). The possible resamples that may be obtained are: (\(x_{1}\), \(x_{1})\), (\(x_{2}\), \(x_{2})\), (\(x_{1}\), \(x_{2})\), (\(x_{2}\), \(x_{1})\). The probability of randomly selecting each of them is the same and equals 1/4. For the three-element sample (\(x_{1}\), \(x_{2}\), \(x_{3})\) there are \(3^{3}\) = 27 resamples: (\(x_{1}\), \(x_{1}\), \(x_{1})\), (\(x_{1}\), \(x_{1}\), \(x_{2})\), (\(x_{1}\), \(x_{1}\), \(x_{3})\), (\(x_{1}\), \(x_{2}\), \(x_{1})\), (\(x_{1}\), \(x_{2}\), \(x_{2})\), (\(x_{1}\), \(x_{2}\), \(x_{3})\), (\(x_{1}\), \(x_{3}\), \(x_{1})\), (\(x_{1}\), \(x_{3}\), \(x_{2})\), (\(x_{1}\), \(x_{3}\), \(x_{3})\), (\(x_{2}\), \(x_{1}\), \(x_{1})\), (\(x_{2}\), \(x_{1}\), \(x_{2})\), (\(x_{2}\), \(x_{1}\), \(x_{3})\), (\(x_{2}\), \(x_{2}\), \(x_{1})\), (\(x_{2}\), \(x_{2}\), \(x_{2})\), (\(x_{2}\), \(x_{2}\), \(x_{3})\), (\(x_{2}\), \(x_{3}\), \(x_{1})\), (\(x_{2}\), \(x_{3}\), \(x_{2})\), (\(x_{2}\), \(x_{3}\), \(x_{3})\), (\(x_{3}\), \(x_{1}\), \(x_{1})\), (\(x_{3}\), \(x_{1}\), \(x_{2})\), (\(x_{3}\), \(x_{1}\), \(x_{3})\), (\(x_{3}\), \(x_{2}\), \(x_{1})\), (\(x_{3}\), \(x_{2}\), \(x_{2})\), (\(x_{3}\), \(x_{2}\), \(x_{3})\), (\(x_{3}\), \(x_{3}\), \(x_{1})\), (\(x_{3}\), \(x_{3}\), \(x_{2})\) and (\(x_{3}\), \(x_{3}\), \(x_{3})\). The probability of selecting each of the aforementioned resamples is also the same and equals 1/27. The fact that some resamples include the same elements in a permuted order has no significance, as each of them has a defined (identical) probability.

If the original sample is an \(n\) element sample, then the number of equally probable resamples equals \(BE = n^{n}\). The probability of randomly selecting each of the resamples is equal to 1/*BE*. It is necessary to stress that the space of resamples is a finite space of size *BE*, and such is the size of the exact (ideal) bootstrap sample. If it is not too large, all realizations of the estimator may be calculated. These realizations may be interpreted as realizations of a given discrete random variable. Since the number of the realizations is finite, it is necessary to use descriptive statistics tools for their analysis (estimation error is then calculated using formula (3)). If it is impossible to generate the entire space of resamples because \(n^{n}\) is too large, random sampling, that is the classical bootstrap, is then necessary (estimation error is described by formula (4)). It is worth noting that if the estimator is the mean and the sample is large, then according to the Central Limit Theorem its asymptotic distribution is normal.
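The enumeration described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's implementation: the function name `exact_bootstrap_distribution` and the three sample values are hypothetical, and exact fractions are used so that the probabilities add up to exactly 1.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def exact_bootstrap_distribution(sample, statistic):
    """Enumerate all n**n equally likely ordered resamples.

    Returns a dict mapping each realization of the statistic to its probability.
    """
    n = len(sample)
    weight = Fraction(1, n ** n)  # every ordered resample has probability 1/BE
    dist = defaultdict(Fraction)
    for resample in product(sample, repeat=n):
        dist[statistic(resample)] += weight
    return dict(dist)


def mean(values):
    """Arithmetic mean as an exact fraction."""
    return Fraction(sum(values), len(values))


# Hypothetical three-element sample: BE = 3**3 = 27 resamples.
dist = exact_bootstrap_distribution([1, 2, 4], mean)
for value in sorted(dist):
    print(value, dist[value])
```

Any other statistic of the resample can be passed in place of `mean`; the cost grows as \(n^{n}\), so this direct form is only practical for very small \(n\).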

The bootstrap method which uses the entire space of resamples may be called the exact bootstrap method. The claim that the method is exact only pertains to resampling. With regard to the original sample, its adequacy in relation to the original variable \(X\) is based on the Glivenko-Cantelli Theorem.

## 2 The exact bootstrap method

Let us assume that from the population described by random variable \(X\) with an unknown distribution of probability \(F\), an \(n\) element primary sample \({\mathbf x} = (x_{1}, x_{2}, {\ldots }, x_{n})\) was drawn. Because for some \(i\ne j\) it is possible that \(x_{i}=x_{j}\), we should reduce the size of the random sample to \(k\) different values (see footnote 1). The probabilities \(p_{i}\) of obtaining the realization \( x_{i}\), for \(i=1, 2, {\ldots }, k\), do not have to be identical for all \(i\) (as is the case with the classical bootstrap).

The correct algorithm should satisfy the condition: \(\sum \nolimits _{b=1}^{BE} p_{b}^{\mathrm{D}} =1\).
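A small sketch of the reduced problem may help here. It assumes that each \(p_{i}\) is the relative frequency of the distinct value in the original sample and that the probability of a resample is the product of the probabilities of its elements; the function name `weighted_resamples` and the sample values are hypothetical. The final check corresponds to the condition that the resample probabilities sum to 1.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import prod


def weighted_resamples(sample):
    """Yield (resample, probability) over the k**n resamples of distinct values."""
    n = len(sample)
    counts = Counter(sample)
    values = sorted(counts)  # the k distinct values
    p = {v: Fraction(c, n) for v, c in counts.items()}  # assumed p_i = count / n
    for resample in product(values, repeat=n):
        yield resample, prod(p[v] for v in resample)


# Hypothetical sample with a repeated value: n = 4, k = 3.
sample = [2, 2, 5, 7]
resamples = list(weighted_resamples(sample))
total = sum(prob for _, prob in resamples)
print(len(resamples), total)  # number of resamples (k**n) and the probability check
```

With the repeated value merged, the loop visits \(k^{n} = 81\) resamples instead of \(n^{n} = 256\), which is the dimension reduction the footnote refers to.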

Formula (6) describes the distribution of estimator \(\hat{{\theta }}^\mathrm{D}\), which is used to approximate the distribution of estimator \(\hat{{\theta }}\). It is a discrete distribution with a finite number of realizations although in most cases the number is very high. This distribution may be used in point or interval assessment of parameter \(\theta \) or in testing hypotheses.

Note that in essence the entire operation is based on the approximation of an unknown continuous distribution of a certain random variable \(\hat{{\theta }}\) using discrete random variable \(\hat{{\theta }}^\mathrm{D}\) with a distribution which may be generated based on a sample. Through random sampling we actually conduct discretization of a certain continuous phenomenon. We attempt to approximate the continuous random variable \(X\) by a sequence of its realizations \({\mathbf x} = (x_{1}, x_{2}, {\ldots }, x_{n})\). Knowing the distribution of the discrete random variable, we may automatically calculate the distribution of a function of this variable. In the case of continuous random variables there is no automatic method which would allow for calculating the distribution of such functions.

When using the bootstrap methods it is worth comparing the value of *BE* (the number of all resamples) with the prescribed number of resamplings \(B\) in the classical bootstrap. For example, for \(k=15\) and \(n=18\) we obtain \(BE =15^{18}\), which is a very large number, significantly greater than the typical number of resamples \(B=1{,}000\). Nowadays such a great number of repetitions can be generated. The pioneering work of Efron dates back to 1979. At that time conducting such a great number of calculations within a reasonable time was impossible. Since it was impossible to examine the entire “population” represented by the original sample, it was necessary to draw resamples from a “sample functioning as a population.”

In the bootstrap method the obtained values of the estimator are sorted from the lowest to the highest, which allows one, for example, to set confidence intervals using the percentile method (Efron and Tibshirani 1993). In theory, the exact bootstrap method also permits it. The number of the possible realizations of statistic \(\hat{{\theta }}^\mathrm{D}\) is very high. However, firstly, some of the realizations of the discrete statistic will certainly be repeated and, secondly, it is advisable to group the results in a histogram. Creating a histogram is necessary for large problems, as the number of the possible estimator realizations is very large. However, this may cause a loss of information. We should also stress that in spite of this, a very accurate estimation of the confidence intervals may be achieved through the exact bootstrap method, as the widths of the intervals in the histogram do not have to be identical. In ranges that require exact probabilities (or values of the cumulative distribution function), the width of the interval may be very small. Limited accuracy may only result from the density of the individual realizations of the estimator and the probability of their selection.
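To make the percentile idea concrete for an exact discrete distribution, the following sketch accumulates probabilities over the sorted realizations and reads off the boundaries. The tail rule used here (the first realization at which the cumulative probability reaches the tail level) is one common convention and an assumption of this example, as are the function name `percentile_interval` and the sample values; no histogram grouping is applied.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product


def percentile_interval(dist, level):
    """Return (lower, upper) realizations cutting off (1 - level) / 2 in each tail.

    `dist` maps each realization of the estimator to its exact probability.
    """
    tail = (1 - level) / 2
    lower = upper = None
    cumulative = Fraction(0)
    for value in sorted(dist):
        cumulative += dist[value]
        if lower is None and cumulative >= tail:
            lower = value
        if upper is None and cumulative >= 1 - tail:
            upper = value
    return lower, upper


# Exact distribution of the mean for a hypothetical sample (5**5 = 3125 resamples).
sample = [1, 2, 3, 4, 5]
n = len(sample)
dist = defaultdict(Fraction)
for resample in product(sample, repeat=n):
    dist[Fraction(sum(resample), n)] += Fraction(1, n ** n)

lower, upper = percentile_interval(dist, Fraction(95, 100))
coverage = sum(p for v, p in dist.items() if lower <= v <= upper)
print(lower, upper, float(coverage))
```

Because the distribution is discrete, the interval covers at least the requested level rather than exactly that level, which is the limited accuracy mentioned at the end of the paragraph above.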

The easiest method of generating all resamples for a discrete distribution is the recursive drawing of sequential elements from the original sample of size \(n\) (or \(k\) if there are repetitions in the original sample). Such an algorithm belongs to the brute force category. Algorithms of this type are considered inefficient.

The number of generated realizations of the bootstrap samples may be reduced, as in resampling some values will be repeated (random sampling with repetitions). Feller (1950, p. 38) presents a similar problem. Fisher and Hall (1991) presented an algorithm which allows for generating all resamples. Both works pertain to the situation when the probability of drawing every element from a sample is the same.
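The reduction can be illustrated as follows for the equal-probability case. This is only a sketch of the counting idea, not the algorithm of Fisher and Hall (1991): each unordered resample is generated once and weighted by its number of orderings, \(n!/(c_{1}!\cdots c_{n}!)\), divided by \(n^{n}\). The function name `multiset_resamples` and the sample are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations_with_replacement
from math import comb, factorial


def multiset_resamples(sample):
    """Yield (unordered resample, probability), assuming each element has probability 1/n."""
    n = len(sample)
    total = n ** n
    for indices in combinations_with_replacement(range(n), n):
        orderings = factorial(n)
        for c in Counter(indices).values():
            orderings //= factorial(c)  # multinomial coefficient n! / (c_1! ... c_n!)
        yield tuple(sample[i] for i in indices), Fraction(orderings, total)


sample = [1, 2, 3, 4, 5]
n = len(sample)
resamples = list(multiset_resamples(sample))
print(len(resamples), comb(2 * n - 1, n), n ** n)  # unordered count, its formula, ordered count
print(sum(prob for _, prob in resamples))          # probabilities still sum to 1
```

The number of unordered resamples is \(\binom{2n-1}{n}\), which grows far more slowly than \(n^{n}\); for statistics that do not depend on the order of the elements, such as the mean and variance, nothing is lost.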

## 3 The limit distribution of the bootstrap sample mean and variance estimator

The exact bootstrap method will be used to estimate the mean and variance. The verification of accuracy will be made possible through the limit distributions which may be used when the sample is large (\(n \ge 30\)).

Consider an \(n\) element random sample \({\mathbf X} = (X_{1}, X_{2},{\ldots }, X_{n})\). Variables \(X_{i}\), for \(i=1, 2, {\ldots }, n\), have the same distribution \(F\), with the expected value \(\mu \) and standard deviation \(\sigma \).

## 4 A comparison of the exact and basic bootstrap

From *BE* resamples, \(B\) samples may be selected in \(BE^{B}\) ways. Even if *BE* is not large, \(BE^{B}\) will be a very large number, which makes it impossible to calculate the distribution of the mean. However, because \(B\) is large, the limit distribution, which is the normal distribution, may be used.
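The formula for this limit distribution is not reproduced in this text, so the sketch below rests on an assumption: the average of \(B\) independent bootstrap realizations is treated as normal, centred on the exact value, with standard deviation equal to the estimator's standard deviation divided by \(\sqrt{B}\). The function name `prob_within` is hypothetical; the standard deviations 0.44664 and 0.36468 are the values reported for the mean estimator in Section 5.1.1.

```python
from math import erf, sqrt


def prob_within(accuracy, estimator_sd, B=1000):
    """P(|average of B bootstrap realizations - exact value| < accuracy), normal limit."""
    sd_of_average = estimator_sd / sqrt(B)  # assumed: independent realizations
    z = accuracy / sd_of_average
    return erf(z / sqrt(2))  # equals 2 * Phi(z) - 1


for sd in (0.44664, 0.36468):  # n = 20 and n = 30
    for accuracy in (0.1, 0.01, 0.001, 0.0001):
        print(sd, accuracy, round(prob_within(accuracy, sd), 4))
```

Under this assumption the computed probabilities agree with those tabulated for the mean in Section 5.1.1 to about three decimal places, which suggests the assumption matches the calculation used in the article.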

## 5 Results

### 5.1 Example 1

Table 1 Distribution of random variable \(X^{D}\)

\(x_{i}\) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|
\(p_{i}\) | 0.010 | 0.050 | 0.180 | 0.253 | 0.040 | 0.127 | 0.210 | 0.100 | 0.020 | 0.010 |
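As a quick check on the distribution tabulated above, the following sketch computes its mean and variance and, from them, the standard deviation of the mean of \(n\) independent draws, \(\sigma ^{D}/\sqrt{n}\). The variable names are illustrative; only the tabulated values come from the article.

```python
from math import sqrt

x = list(range(1, 11))
p = [0.010, 0.050, 0.180, 0.253, 0.040, 0.127, 0.210, 0.100, 0.020, 0.010]

mu = sum(xi * pi for xi, pi in zip(x, p))                # expected value
var = sum((xi - mu) ** 2 * pi for xi, pi in zip(x, p))   # variance

print("sum of p:", round(sum(p), 10))
print("mean:", round(mu, 5), "variance:", round(var, 5))
for n in (20, 30):
    # Standard deviation of the mean of n independent draws from this distribution.
    print("n =", n, "sd of mean estimator:", round(sqrt(var / n), 5))
```

These quantities correspond to the mean and standard deviation rows reported for the mean estimator in Section 5.1.1.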

Two variants of the distributions of the mean and variance estimators, calculated using the exact bootstrap method and the limit distributions, are provided. The first one assumes that \(n = 20\) and the second one \(n= 30\). The samples were generated according to the algorithm proposed by Fisher and Hall (1991).

#### 5.1.1 Mean estimation for \(n = 20\) and \(n = 30\)

Fig. 1 shows the mean estimator distribution calculated using the exact bootstrap method (DBA) and the limit distribution GA defined by (19), in the interval within 4 standard deviations of the mean. The bootstrap distribution for the mean is given as the probability of assuming individual values. In the case of the limit distribution, however, it is the probability of assuming a value in an interval (from the midpoint between a value and its left neighbour to the midpoint between that value and its right neighbour). For both \(n = 30\) and \(n = 20\) the diagrams are nearly identical. The probability values agree to three decimal places.

Table 2 Confidence intervals of the mean computed using the exact bootstrap method (DBA) and the limit distribution (GA)

Estimator distribution | DBA, \(n=20\) | GA, \(n=20\) | DBA, \(n=30\) | GA, \(n=30\) |
---|---|---|---|---|
Mean of mean estimator | 5.17400 | 5.17400 | 5.17400 | 5.17400 |
Standard deviation of mean estimator | 0.44664 | 0.44664 | 0.36468 | 0.36468 |
Lower boundary, confidence level \(1-\alpha = 0.95\) | 4.2750 | 4.2986 | 4.4500 | 4.4592 |
Upper boundary, confidence level \(1-\alpha = 0.95\) | 6.0750 | 6.0494 | 5.9000 | 5.8888 |
Width, confidence level \(1-\alpha = 0.95\) | 1.8000 | 1.7508 | 1.4500 | 1.4296 |
Lower boundary, confidence level \(1-\alpha = 0.99\) | 4.0500 | 4.0235 | 4.2333 | 4.2346 |
Upper boundary, confidence level \(1-\alpha = 0.99\) | 6.3500 | 6.3245 | 6.1333 | 6.1134 |
Width, confidence level \(1-\alpha = 0.99\) | 2.3000 | 2.3009 | 1.9000 | 1.8788 |
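The GA columns above can be reproduced approximately from a normal distribution with the tabulated mean and standard deviation. The sketch below assumes equal-tailed intervals and uses `statistics.NormalDist` from the Python standard library; the function name `normal_interval` is hypothetical, and the DBA columns (which require the full enumeration) are not recomputed here.

```python
from math import sqrt
from statistics import NormalDist


def normal_interval(mean, sd, level):
    """Equal-tailed interval of a normal limit distribution at the given confidence level."""
    tail = (1 - level) / 2
    limit = NormalDist(mean, sd)
    return limit.inv_cdf(tail), limit.inv_cdf(1 - tail)


mean_d, var_d = 5.174, 3.98972  # mean and variance of the tabulated distribution
for n in (20, 30):
    for level in (0.95, 0.99):
        low, high = normal_interval(mean_d, sqrt(var_d / n), level)
        print(n, level, round(low, 4), round(high, 4), round(high - low, 4))
```

Small differences in the last digit relative to the table can arise from rounding of the inputs.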

Table 3 Probability that the mean calculated based on 1,000 bootstrap samples will be computed with the given accuracy

Accuracy | Intervals | \(n=20\) | \(n=30\) |
---|---|---|---|
0.1 | (\(\text{ mean}-0.1\); \(\text{ mean}+0.1\)) | 1.0000 | 1.0000 |
0.01 | (\(\text{ mean}-0.01\); \(\text{ mean}+0.01\)) | 0.5211 | 0.6141 |
0.001 | (\(\text{ mean}-0.001\); \(\text{ mean}+0.001\)) | 0.0564 | 0.0691 |
0.0001 | (\(\text{ mean}-0.0001\); \(\text{ mean}+0.0001\)) | 0.0056 | 0.0069 |

#### 5.1.2 Variance estimation for \(n = 20\) and \(n = 30\)

Table 4 contains the distributions of the variance estimator determined using the exact bootstrap method (DBV) and the limit distribution GV defined by formula (20). The number of different realizations of the variance estimator is significantly higher than that of the mean estimator. Therefore the probabilities for intervals rather than for individual values are shown in the table. One may use them to calculate the parameters of the distributions in the same way as for grouped data; the results may then differ from the exact values.

The method of selecting the width of the intervals also requires some comment. For the limit (continuous) distributions and for each arbitrarily small interval the probability that the random variable will assume the values of this interval is \(>\)0. For the discrete distribution, and such is the variance estimator distribution calculated using the exact bootstrap method, the case is different. The smaller the interval width, the greater the number of intervals where the probability is equal to 0.

Moreover, in Table 4 the expected values of variance estimator distributions and their standard deviations are presented. These are exact values calculated based on all the generated realizations.

The expected values of limit distribution GV are equal to sample variance. In the case of the DBV distribution the expected value was also equal to the variance, which attests to the accuracy of the applied algorithm. The exact bootstrap method does not introduce additional estimator bias (contrary to the bootstrap method with random sampling).

Also, note that the standard deviation of the DBV distribution is equal to the standard deviation of the GV distribution with an accuracy to five decimal places.

Fig. 2 shows the distributions of the variance estimators for \(n=20\) and \(n=30\). They indicate that the GV distributions constitute a good approximation of the DBV distributions for the random variable whose distribution is presented in Table 1.

Table 4 Distributions of the variance estimator: obtained using the exact bootstrap method (DBV) and the limit distributions of the variance estimator (GV)

Parameters | DBV, \(n=20\) | GV, \(n=20\) | DBV, \(n=30\) | GV, \(n=30\) |
---|---|---|---|---|
Mean of variance estimator | 3.98972 | 3.98972 | 3.98972 | 3.98972 |
Standard deviation of variance estimator | 0.91790 | 0.91790 | 0.73650 | 0.73650 |
Intervals | | | | |
[0.00; 0.25) | 2.77E-08 | 2.31E-05 | 8.73E-12 | 1.91E-07 |
[0.25; 0.50) | 9.47E-07 | 4.87E-05 | 1.52E-09 | 8.87E-07 |
[0.50; 0.75) | 5.73E-06 | 1.36E-04 | 3.14E-08 | 4.36E-06 |
[0.75; 1.00) | 2.91E-05 | 0.000355 | 3.66E-07 | 1.92E-05 |
[1.00; 1.25) | 9.40E-05 | 0.000856 | 2.84E-06 | 7.50E-05 |
[1.25; 1.50) | 0.000361 | 0.001921 | 2.03E-05 | 0.000262 |
[1.50; 1.75) | 0.001463 | 0.004003 | 0.000110 | 0.000817 |
[1.75; 2.00) | 0.003616 | 0.007749 | 0.000607 | 0.002272 |
[2.00; 2.25) | 0.009544 | 0.013933 | 0.002482 | 0.005634 |
[2.25; 2.50) | 0.021507 | 0.023274 | 0.008270 | 0.012467 |
[2.50; 2.75) | 0.037907 | 0.036112 | 0.022408 | 0.024610 |
[2.75; 3.00) | 0.063318 | 0.052051 | 0.045941 | 0.043341 |
[3.00; 3.25) | 0.073398 | 0.069692 | 0.076109 | 0.068095 |
[3.25; 3.50) | 0.097465 | 0.086681 | 0.110997 | 0.095448 |
[3.50; 3.75) | 0.121196 | 0.100148 | 0.124357 | 0.119359 |
[3.75; 4.00) | 0.102042 | 0.107484 | 0.139018 | 0.133161 |
[4.00; 4.25) | 0.102419 | 0.107159 | 0.126058 | 0.132538 |
[4.25; 4.50) | 0.093798 | 0.099241 | 0.107751 | 0.117691 |
[4.50; 4.75) | 0.074792 | 0.085377 | 0.085236 | 0.093235 |
[4.75; 5.00) | 0.062212 | 0.068230 | 0.059421 | 0.065896 |
[5.00; 5.25) | 0.040503 | 0.050651 | 0.038703 | 0.041549 |
[5.25; 5.50) | 0.032387 | 0.034929 | 0.024514 | 0.023373 |
[5.50; 5.75) | 0.024813 | 0.022375 | 0.013139 | 0.011729 |
[5.75; 6.00) | 0.013530 | 0.013314 | 0.007679 | 0.005251 |
[6.00; 6.25) | 0.009445 | 0.007359 | 0.003776 | 0.002097 |
[6.25; 6.50) | 0.006109 | 0.003779 | 0.001875 | 0.000747 |
[6.50; 6.75) | 0.003479 | 0.001802 | 0.000887 | 0.000238 |
[6.75; 7.00) | 0.002140 | 0.000799 | 0.000377 | 6.74E-05 |
[7.00; 7.25) | 0.001051 | 0.000329 | 0.000159 | 1.7E-05 |
[7.25; 7.50) | 0.000650 | 1.26E-04 | 6.56E-05 | 3.85E-06 |
[7.50; +\(\infty )\) | 0.000724 | 4.46E-05 | 3.83E-05 | 7.74E-07 |

Table 5 Confidence intervals of the variance constructed using the exact bootstrap method (DBV) and the limit distributions (GV)

Estimator distribution | DBV, \(n=20\) | GV, \(n=20\) | DBV, \(n=30\) | GV, \(n=30\) |
---|---|---|---|---|
Lower boundary, confidence level \(1-\alpha = 0.95\) | 2.3685 | 2.1907 | 2.6715 | 2.5462 |
Upper boundary, confidence level \(1-\alpha = 0.95\) | 5.9565 | 5.7888 | 5.0845 | 5.4332 |
Width, confidence level \(1-\alpha = 0.95\) | 3.5880 | 3.5981 | 2.4130 | 2.8870 |
Lower boundary, confidence level \(1-\alpha = 0.99\) | 1.9575 | 1.6254 | 2.3265 | 2.0926 |
Upper boundary, confidence level \(1-\alpha = 0.99\) | 6.6945 | 6.3541 | 6.1185 | 5.8868 |
Width, confidence level \(1-\alpha = 0.99\) | 4.7370 | 4.7287 | 3.7920 | 3.7942 |

In Table 5 there are confidence intervals of the variance when using the exact bootstrap method (the DBV distribution) and the limit distribution GV.

Comparing the widths of the intervals, we can state that the estimation obtained using the exact bootstrap method (which is an exact estimation) is more precise than that obtained using the limit distribution GV, with the exception of \(n=20\) and \(1-\alpha = 0.99\), for which the confidence interval of the GV distribution was narrower than that of DBV.

Table 6 Probability that the variance calculated based on 1,000 bootstrap samples will be computed with the given accuracy

Accuracy | Intervals | \(n=20\) | \(n=30\) |
---|---|---|---|
0.1 | (\(\text{ variance}-0.1\); \(\text{ variance}+0.1\)) | 0.9994 | 1.0000 |
0.01 | (\(\text{ variance}-0.01\); \(\text{ variance}+0.01\)) | 0.2695 | 0.3323 |
0.001 | (\(\text{ variance}-0.001\); \(\text{ variance}+0.001\)) | 0.0275 | 0.0342 |
0.0001 | (\(\text{ variance}-0.0001\); \(\text{ variance}+0.0001\)) | 0.0027 | 0.0034 |

### 5.2 Example 2

The second simulation experiment was conducted for a small sample consisting of the values {1, 2, 3, 4, 5}, assuming the same probability of randomly selecting each element, equal to 0.2. This distribution is represented by discrete random variable \(X^{D}\) with the expected value and variance equal to \(\mu ^{D}= 3\) and \(\left( {\sigma ^\mathrm{D}} \right)^{2}= 2\), respectively. The mean is estimated first, for \(n = 5\).

Table 7 Probability that the mean computed based on 1,000 bootstrap small samples will be calculated with the given accuracy

Accuracy | Intervals | Probability |
---|---|---|
0.1 | (\(\text{ mean}-0.1\); \(\text{ mean}+0.1\)) | 1.0000 |
0.01 | (\(\text{ mean}-0.01\); \(\text{ mean}+0.01\)) | 0.3829 |
0.001 | (\(\text{ mean}-0.001\); \(\text{ mean}+0.001\)) | 0.0399 |
0.0001 | (\(\text{ mean}-0.0001\); \(\text{ mean}+0.0001\)) | 0.0040 |

#### 5.2.1 Variance estimation for \(n = 5\)

Table 8 Probability that the variance calculated based on 1,000 bootstrap small samples will be computed with the given accuracy

Accuracy | Intervals | Probability |
---|---|---|
0.1 | (\(\text{ variance}-0.1\); \(\text{ variance}+0.1\)) | 0.9988 |
0.01 | (\(\text{ variance}-0.01\); \(\text{ variance}+0.01\)) | 0.2531 |
0.001 | (\(\text{ variance}-0.001\); \(\text{ variance}+0.001\)) | 0.0257 |
0.0001 | (\(\text{ variance}-0.0001\); \(\text{ variance}+0.0001\)) | 0.0026 |

Table 8 presents the probability that the variance calculated based on 1,000 bootstrap samples will be computed with the given accuracy. The probabilities were calculated using limit distribution \(N\)(2, 0.0310). As was the case with the mean, only for the 0.1 accuracy is the probability close to 1. In all other cases the use of the exact bootstrap method is recommended, since the probability of exact variance estimation using the basic bootstrap method is small.
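For this small example the whole space of \(5^{5} = 3125\) resamples can be enumerated directly. The sketch below does so and computes the exact expected value and variance of the mean and variance estimators. It assumes that the variance estimator uses the divisor \(n-1\); this is not stated explicitly in the text, but it is consistent with the reported limit distribution \(N\)(2, 0.0310). The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

sample = [1, 2, 3, 4, 5]
n = len(sample)
weight = Fraction(1, n ** n)  # probability of each ordered resample


def mean(values):
    return Fraction(sum(values), len(values))


def unbiased_variance(values):
    """Sample variance with divisor n - 1 (an assumption of this sketch)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def moments(statistic):
    """Exact expected value and variance of a statistic over all resamples."""
    realizations = [statistic(r) for r in product(sample, repeat=n)]
    expectation = sum(realizations) * weight
    variance = sum((t - expectation) ** 2 for t in realizations) * weight
    return expectation, variance


for name, statistic in (("mean", mean), ("variance", unbiased_variance)):
    expectation, variance = moments(statistic)
    # Standard deviation of the average of B = 1,000 independent realizations.
    print(name, expectation, variance, round(sqrt(variance / 1000), 4))
```

Under these assumptions the expected values equal the mean and variance of the sample, and the standard deviation obtained for the variance estimator with \(B = 1{,}000\) is in line with the value 0.0310 quoted above.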

## 6 Conclusion

In the article the exact bootstrap method was discussed. This method can be used to estimate the parameters of random variables with an unknown distribution. The method allows for determining the estimate of an arbitrary parameter, the error of this estimation, the distribution of the estimator or confidence intervals. Traditionally, this problem is solved using the bootstrap method, which consists in resampling of the original random sample. Random sampling is used in statistics if the entire population cannot be examined or the study would be too problematic. However, the original sample is finite and its distribution is known: it is the empirical distribution. Instead of resampling, one can generate the entire space of resamples and determine all the realizations of the statistic which is the estimator of the unknown parameter.

The method was used to estimate mean and variance. It was shown that the expected values of the estimators are equal to the mean and variance of the sample. The method, therefore, does not introduce bias resulting from resampling as it may occur in classical bootstrap.

The estimator distributions calculated using the exact bootstrap method were compared with the limit distributions. The similarity between the distributions indicates that the “exact” distribution can be approximated by the limit distribution if the sample is not too small.

In order to assess the effectiveness of the traditional bootstrap method, the limit distribution of the mean calculated from \(B\) realizations of the estimator of a given parameter (every realization is calculated based on a single bootstrap sample) was used. This distribution allows for calculating the probability of obtaining the assumed accuracy of estimation. Although the number of bootstrap resamples was large (\(B=1{,}000\)), the probabilities for both the mean and the variance decreased rapidly as the required accuracy increased. This shows that it is worth using the exact method, which guarantees that there will be no additional bias at the resampling stage.

The conducted simulation experiments have revealed that in the case of small samples (\(n\le 15\) and \(k=n\)) the time necessary to generate the entire space of resamples is short (\(<10\) s using an average-quality computer). This means that there is no need for resampling. For larger samples, the time is much longer and requires several hours of computation (for \(n=20\) and \(k=n\) the calculations lasted 5 h and 30 min). An increase in the size of the sample significantly lengthens the computation time. On the other hand, we should remember that the bootstrap method is used for small samples; for larger samples the limit distribution of estimators may be used. However, considering the progress in computer technology, the exact bootstrap method will also be used for larger samples in the future.

What are the consequences of the possibility of generating the complete information contained in the sample, as presented in the article? The fundamental issue is the much greater flexibility in constructing estimators, as there is no need to assume the form of the distribution in order to determine the distribution of the estimator. One should attempt to make the estimator unbiased, consistent and efficient. The exact bootstrap method may also be useful in this matter.

Drawing a random sample may be seen as the replacement of a continuous random variable with an unknown distribution by a discrete variable with known distribution—the bootstrap distribution. Transformations of discrete variables are easier than transformations of continuous variables since the distribution of discrete variable statistics can be calculated automatically. In reality, due to the finite accuracy of all measurements we can only observe discrete variables. We may suppose that with the increasing power of computers their role in statistics will also increase.

## Footnotes

1. This reduction is not necessary but recommended, as it reduces the dimension of the problem.

## Notes

### Open Access

This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.

### References

- Booth JG, Sarkar S (1998) Monte Carlo approximation of bootstrap variances. Am Stat 52(4):354–357
- Efron B (1979) Bootstrap methods: another look at the jackknife. Ann Stat 7(1):1–26
- Efron B (1987) Better bootstrap confidence intervals (with discussion). J Am Stat Assoc 82(397):171–185
- Efron B, Tibshirani RJ (1993) An introduction to the bootstrap. Chapman & Hall, London
- Feller W (1950) An introduction to probability theory and its applications. Wiley, New York, London, Sydney
- Fisher NI, Hall P (1991) Bootstrap algorithms for small samples. J Stat Plan Inference 27:157–169
- Smirnow NW, Dunin-Barkowski IW (1973) Kurs rachunku prawdopodobieństwa i statystyki matematycznej dla zastosowań technicznych. Państwowe Wydawnictwo Naukowe, Warsaw