Algorithm for error-free determination of the variance of all contiguous subsequences and fixed-length contiguous subsequences for a sequence of industrial measurement data

The article presents an algorithm for fast and error-free determination of statistics such as the arithmetic mean and variance of all contiguous subsequences and fixed-length contiguous subsequences for a sequence of industrial measurement data. Additionally, it shows that both floating-point and integer representations can be used to perform this kind of statistical calculation. The author proves a theorem on the number of bits of precision that an arithmetic type must have to guarantee error-free determination of the arithmetic mean and variance. The article also presents an extension of Welford's formula for determining variance to the sliding window method, that is, to determining the variance of fixed-length contiguous subsequences. The section dedicated to implementation tests shows the running times of individual algorithms depending on the arithmetic type used. The research shows that the use of integers in calculations makes the determination of the aforementioned statistics much faster.


Introduction
Correspondence: Andrzej Chmielowiec, achmie@prz.edu.pl, Faculty of Mechanics and Technology, Rzeszow University of Technology, Kwiatkowskiego 4, 37-450 Stalowa Wola, Poland.

The development of measurement, computational and data storage technologies has provided us with huge amounts of information that can form the basis for various analyses. Statistics is one of the basic tools used for this purpose. The need to process ever-increasing sets, however, means that methods developed a few decades ago are no longer useful today due to their low efficiency. Consequently, a need arises to constantly improve the existing algorithms and propose new solutions.

Data from the production processes of enterprises serve as a good example of an area where the amount of data collected has been growing rapidly. Currently, we are able to measure the manufactured goods and the condition of the production line at almost every stage of the process. This way, huge amounts of numerical data are collected and can be used, among others, in advanced methods of quality control and predictive maintenance. In this context, the concept of statistical process control created by Shewhart (1931) and Deming (1986) takes on a whole new dimension: a statistical analysis of large data sets.

In general, analysis of the statistical properties of a sequence of measurement data is a well-known issue. There are many proven methods to verify specific statistical hypotheses (Selvamuthu and Das 2018). They all assume, however, that we have a well-defined data sequence that we verify. The situation is completely different when we look for contiguous subsequences having specific statistical properties within a certain data sequence. The number of potential possibilities poses a problem of high computational complexity for the approach consisting in independent analysis of all possible contiguous subsequences.
To illustrate this fact, we will use the example of the arithmetic mean calculated for all possible contiguous subsequences of a sequence consisting of 10^5 measurements. This task will take approximately 1.7·10^14 floating-point operations. Assuming that a single core of a typical processor is now capable of performing about 5·10^9 operations per second, it would take about 9.5 hours to make these calculations. Consequently, it is reasonable to look for algorithmic solutions that will allow us to accelerate this type of analysis.
The question obviously arises: why compute statistics for all contiguous subsequences? This is due to the fact that the analysis of statistics for the entire measurement series does not tell us whether any situations occurred during production that should be checked more closely. However, if we analyze all measurement contiguous subsequences, then we are able to indicate the time periods in which something unexpected happened. In this way one can, for example, try to detect employee misconduct, technical problems with machines, or problems with the quality of raw material. In addition, the analysis of individual contiguous subsequences gives us accurate information about the time periods in which interesting events took place. To better present this idea, we decided to examine a series of 10^5 measurements from the production line of one of the factories. This series corresponds to about 30 days of production. The measured value was the diameter of the aluminum piston of an internal combustion engine. The measurements were carried out for each item produced. Ideally, these measurements should have a normal distribution. Therefore, we decided to test about 10^6 overlapping measurement contiguous subsequences of lengths 10^3, 2·10^3, 3·10^3, ..., 10·10^3. Each of the sequences (x_1, x_2, ..., x_n) was subjected to a quick statistical test, which verified that all elements of the sequence belong to a specific range [μ − d, μ + d]. The range was chosen in such a way that, assuming normality of the sample distribution, all samples fall in the range with probability 1 − α. Verification of this property is very quick, because it comes down to checking the rejection conditions min{x_i} < μ − d and max{x_i} > μ + d with d = σ√2 erf⁻¹((1 − α)^(1/n)). Each time the test rejected, we increased the counter of the participating samples by 1. Each sample participated in approximately 50,000 tests. The results obtained are presented in Fig. 1.
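The window test described above can be sketched as follows (a minimal illustration; the function names are ours, and Python's statistics.NormalDist quantile replaces erf⁻¹ via the identity √2·erf⁻¹(p) = Φ⁻¹((1 + p)/2)):

```python
from statistics import NormalDist

def interval_half_width(sigma, alpha, n):
    """Half-width d such that n independent normal samples all fall
    inside [mu - d, mu + d] with probability 1 - alpha:
    d = sigma * sqrt(2) * erfinv((1 - alpha)**(1/n))."""
    p = (1.0 - alpha) ** (1.0 / n)
    return sigma * NormalDist().inv_cdf((1.0 + p) / 2.0)

def window_suspicious(window, mu, d):
    """Reject (return True) when any sample leaves [mu - d, mu + d];
    only the minimum and maximum need to be inspected."""
    return min(window) < mu - d or max(window) > mu + d
```

For n = 1 and α = 0.05 the half-width reduces to the familiar 1.96σ two-sided bound, and it grows with n, as every sample of a longer window has to stay inside the range.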
It can be deduced from this plot that potentially interesting phenomena occurred around samples 0.6·10^4, 2.5·10^4, 3.2·10^4, 4.9·10^4, 5.8·10^4, 6.1·10^4 and 9·10^4, which correspond to specific periods during production. Figures 2 and 3 show histograms for samples in the ranges (57500, 58500) and (90000, 91000). In addition, there is a graph showing the histogram of all tested samples (10^5 measurements) scaled by a factor of 0.01. The above ranges were chosen because they show, in the most obvious way, evident deviations from what can be observed for the entire data set. The real values taken by the samples were linearly converted into numbers in the range [−15, 15]. This was necessary due to the inability to disclose the real dimensions of the measured object. The presented histograms show an unusually large number of products with dimensions matching the design specification. This can be caused, for example, by the replacement of the machining tool with a new one, or by deliberate action of employees who modified the settings of the measuring machine in order to achieve the appropriate effect during quality control. The search for such regularities in the production process is the main reason for the development of the calculation methods presented in this article.
It should be emphasized here that the problem of finding anomalies in time series has been the subject of intense research for the last twenty years. There are many algorithms and techniques used to search for anomalies, provided that we know the details of their location or the size of the time window to be analyzed (Keogh et al. 2006; Senin et al. 2015).
The current methods of finding anomalies in time series can be divided into three basic categories in terms of the type of results obtained. We distinguish algorithms for finding point anomalies, structural anomalies and anomalous series (in the case of many series). By point anomalies we mean single, significant deviations from the values typical for a given time series (Hawkins 1980; Evans et al. 2015). Structural anomalies, on the other hand, are contiguous time series subsequences that deviate from the rest of the series (Zhang and Meratnia 2010; Senin et al. 2015). In contrast, by anomalous series we mean whole series that are somehow different from other observed series (Hyndman et al. 2015; Laptev et al. 2015). In many cases, one of the most important elements in the search for structural anomalies is determining the value of appropriate statistics for a large class of contiguous subsequences. Therefore, the algorithms presented in this article may be of considerable importance for such methods. Anomaly search algorithms using artificial intelligence methods are currently very popular. The publication by Wang et al. (2018), based on experience gathered at Intel, is highly interesting from the point of view of the production process. Besides, the article by Chalapathy and Chawla (2019) presents a comprehensive overview of anomaly detection methods using machine learning. For this type of method, it is also very important to be able to quickly determine the value of statistics and divide the data set into appropriate clusters.
The article by Ding et al. (2016) introduces a division of anomaly-detection algorithms by the type of mechanisms used to detect anomalies. The authors distinguish four basic methods: classification, nearest neighbor, clustering and statistical. It should be emphasized that the last three methods more or less directly use functions that are either statistics or certain measures originating in probability theory. Quick determination of various types of statistics is therefore crucial from the point of view of the computational complexity of these algorithms.
The next sections of this article focus primarily on determining the arithmetic mean μ_n = (1/n) Σ_{i=1}^{n} x_i and variance ν_n = (1/n) Σ_{i=1}^{n} (x_i − μ_n)², but the methods proposed therein can also be used to determine other statistics. It should be emphasized that the use of the biased estimator of the variance is appropriate in this case. This is due to its computational properties, which allow for more effective determination of its numerical value. Once its value is determined, it is very easy to obtain the unbiased estimator by multiplying the result by n/(n − 1). This is the reason why we will consistently use the biased variance estimator later in this article. Before we get to that, however, we briefly summarise the current state of knowledge on the calculation of variance, because it is undoubtedly one of those parameters whose calculation poses a lot of numerical problems. The first algorithm which significantly reduces the numerical error in the determination of variance was proposed by Welford (1962). His approach is modelled on the idea put forward by Box and Hunter (1959), who proposed a similar technique for determining residual sums. The Welford formula was improved by Knuth (1981), who gave it a very elegant and computationally effective form. Next, Neely (1966) developed the idea presented by Kahan (1965) and proposed an algorithm that needs double access to the data; its main element was the correction of input data by the mean computed during the first round. This way, he developed a procedure highly resistant to numerical errors in many practical applications. In search of solutions offering the highest resistance to rounding errors, Rodden (1967) presented the concept of replacing floating-point numbers with integers. We will elaborate on his proposal later on in this article, adding estimates of the number of bits necessary for error-free computation of variance.
Then, Van Reeken (1968) and Youngs and Cramer (1971) extended the set of algorithms described by Neely. Ling (1974) summarised the methods described above and compared their operation for many different input data distributions. Next, Hanson (1975), West (1979) and Cotton (1975) added an option to determine weighted variance to the existing methods and increased the numerical stability of the results obtained. A substantial contribution to the estimation of numerical errors for individual methods was made by Chan and Lewis (1979), Chan et al. (1982) and Chan et al. (1983). Recent years have seen a fairly intense development of algorithms and methods focused on the processing of large data sets. Two basic directions of research work are worth mentioning. The first one includes online calculation methods developed by Kreinovich et al. (2007), Pébay (2008), Villaverde and Xiang (2008) and Terriberry (2008). The second one involves parallel data processing methods proposed by Schubert et al. (2015), Schubert and Gertz (2018) and Gouin et al. (2016). Pébay, Terriberry, Kolla and Bennett published very valuable results that combine both of these research directions: in the article Pébay et al. (2016) they defined general recursive formulas for determining higher-order moments. Their approach can be the basis for many parallel and numerically stable algorithms. This article focuses on computationally effective methods for determining statistics for all possible contiguous subsequences or for a specific group of contiguous subsequences.

Computation of the arithmetic mean and variance after conversion of measurement data into integer values
A classic algorithm for determining variance based on the relation ν_n = (1/n) Σ_{i=1}^{n} x_i² − μ_n² poses a multitude of numerical problems. In production control, calculation errors grow along with rising product quality. This is due to the fact that the expected value is calculated by adding numbers that are very similar in value. Moreover, calculation of the variance under the aforesaid method involves subtracting the square of the arithmetic mean from the mean of squares, which poses a number of problems connected with calculation precision. The arithmetic and logic unit of modern processors usually supports both integer and floating-point operations (mathematical coprocessor). Nowadays, a 64-bit application processor is able to perform very fast operations on 32- and 64-bit integers, and on 32- and 64-bit floating-point numbers (single and double precision). Floating-point numbers are given by x = (−1)^S · M · 2^C, where S is a sign bit, M is the mantissa and C is the exponent. In the case of floating-point numbers, the IEEE-754 standard (IEEE 2008) specifies that in the single-precision format the mantissa and the exponent have 23 and 8 bits, respectively. In the double-precision format, the mantissa and the exponent have 52 and 11 bits, respectively. The speed of multiplication of numbers in particular formats depends on the type of processor. Table 1 presents the times of scalar multiplication of vectors consisting of 10^8 elements of a given type.
This table clearly shows that integer multiplication is more efficient than floating-point multiplication. Therefore, by switching to integer arithmetic, we can achieve two goals: speed up calculations and eliminate rounding errors. Such an approach may raise doubts, as the measurement results are, by their nature, floating-point values. However, if we assume that we know the nominal value δ_0 of the measured element, the measurement precision ε_0, and that x_i = δ_0 + m_i ε_0 is the actually measured value, then by determining m_i from the above formula we get m_i = (x_i − δ_0)/ε_0. The great advantage of this representation is that it allows for computing the expected value and variance without rounding errors. Some of them cannot be avoided in the case of floating-point representation, because if, for example, a measuring device produces results with a precision of 0.1, then the floating-point representation of the number 1/10 in the binary system is infinite and is given by 0.000110011001100...₂ = 0.0(0011)₂. We can avoid such problems by switching to integer representation. Its correct operation, however, is conditional on precise control of the factor by which the integers being processed had been previously multiplied. This approach will be most effective for sets that have a well-defined and appropriately small range of possible values, and measurement results are exactly such a set. If we perform the measurement using a 16-bit sensor, we can be sure that we will get no more than 65536 possible values. However, if variables can potentially take arbitrary floating-point values, then the transition to an integer representation will not make much sense.
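The conversion and the resulting error-free computation can be sketched as follows (function names and sample values are ours; the integer sums are exact, and the scale factors δ_0 and ε_0 are applied only once at the end):

```python
def to_counts(xs, delta0, eps0):
    # x_i = delta0 + m_i * eps0  =>  m_i = (x_i - delta0) / eps0.
    # Rounding absorbs the tiny float error left by the division.
    return [round((x - delta0) / eps0) for x in xs]

def mean_var_exact(ms, delta0, eps0):
    # All sums below are integer sums, hence free of rounding errors;
    # floating point enters only in the final scaling.
    n = len(ms)
    s1 = sum(ms)                  # exact integer sum
    s2 = sum(m * m for m in ms)   # exact integer sum of squares
    mean = delta0 + eps0 * s1 / n
    var = eps0 * eps0 * (n * s2 - s1 * s1) / (n * n)
    return mean, var
```

Note that the variance is computed from the shift-invariant integer expression n·Σm_i² − (Σm_i)², so the nominal value δ_0 never enters it.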
Later on in the article we focus on determining the number of bits of the mantissa necessary for correct calculation of the arithmetic mean μ_n and variance ν_n. In doing so, we assume that only integer numbers are used, i.e. that the exponent has zero bits. Under these assumptions, the following theorem is true:

Theorem 1 Let δ_0 be the nominal value of the measured quantity and ε_0 the calculation precision. We also assume that the measuring range is set and the results fall within the range [δ_0 − r_0, δ_0 + r_0). If n is the number of measurements, A_n(ε_0, r_0) is the number of integer bits necessary to correctly determine the arithmetic mean and V_n(ε_0, r_0) is the number of integer bits necessary for correct determination of the variance, then:

A_n(ε_0, r_0) = ⌊log_2(n · r_0/ε_0)⌋ + 2,
V_n(ε_0, r_0) = ⌊log_2(n · (r_0/ε_0)²)⌋ + 2.

Proof To prove the theorem it suffices to show that the largest values appearing in the calculation of μ_n and ν_n are contained in bit registers of lengths A_n(ε_0, r_0) and V_n(ε_0, r_0), respectively. Note that if the calculation precision is ε_0, then there are 2r_0/ε_0 possible measurement results in the interval [δ_0 − r_0, δ_0 + r_0). This means that the measurement results x_i stand in a one-to-one relation with integers m_i = (x_i − δ_0)/ε_0 ∈ [−r_0/ε_0, r_0/ε_0). Observe that μ_n = δ_0 + (ε_0/n)(m_1 + ··· + m_n). Therefore, to precisely calculate the arithmetic mean it is necessary to precisely compute the sum m_1 + ··· + m_n, satisfying the relation |m_1 + ··· + m_n| ≤ n · r_0/ε_0. Remembering that the number of bits necessary to represent the nonnegative integer m is expressed by the formula ⌊log_2 m⌋ + 1 and considering the sign bit, we obtain A_n(ε_0, r_0) = ⌊log_2(n · r_0/ε_0)⌋ + 2. To calculate the variance, we use the fact that it is invariant to shifts by a fixed value and can be expressed as ν_n = ε_0² [(1/n) Σ_{i=1}^{n} m_i² − ((1/n) Σ_{i=1}^{n} m_i)²]. From Jensen's inequality (Jensen 1906) for a convex function it follows that (1/n) Σ_{i=1}^{n} m_i² ≥ ((1/n) Σ_{i=1}^{n} m_i)², therefore, to compute the variance precisely, it is necessary to accurately determine the sum of squares m_1² + ··· + m_n², satisfying the relation m_1² + ··· + m_n² ≤ n · (r_0/ε_0)², which gives V_n(ε_0, r_0) and ends the proof.
Corollary 1 If we have a sequence of measurement data (x_1, ..., x_n) and there exist constants δ_0, ε_0 and r_0 such that x_i = δ_0 + m_i ε_0 ∈ [δ_0 − r_0, δ_0 + r_0) for certain integers m_i, then the arithmetic mean μ_n and variance ν_n of the sequence of measurement data can be computed precisely using the formulas μ_n = δ_0 + (ε_0/n) Σ_{i=1}^{n} m_i and ν_n = ε_0² [(1/n) Σ_{i=1}^{n} m_i² − ((1/n) Σ_{i=1}^{n} m_i)²], on condition that for the purpose of computing the arithmetic mean we use an integer data type with a length of at least A_n(ε_0, r_0) bits, and for the purpose of computing the variance we use an integer data type with a length of at least V_n(ε_0, r_0) bits.
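The register-length bounds can be evaluated with a small helper (function names ours; the formulas A_n = ⌊log_2(n·r_0/ε_0)⌋ + 2 and V_n = ⌊log_2(n·(r_0/ε_0)²)⌋ + 2 follow the bounds used in the proof of Theorem 1):

```python
import math

def bits_for_mean(n, r0, eps0):
    # The sum m_1 + ... + m_n is bounded in absolute value by n * r0/eps0,
    # so it needs floor(log2(n * r0/eps0)) + 1 bits plus a sign bit.
    return math.floor(math.log2(n * r0 / eps0)) + 2

def bits_for_variance(n, r0, eps0):
    # The sum of squares is bounded by n * (r0/eps0)**2.
    return math.floor(math.log2(n * (r0 / eps0) ** 2)) + 2
```

For instance, with a resolution of ε_0 = 2⁻²⁰ and a range of r_0 = 2⁻⁴ (i.e. 2¹⁶ distinct count values in each direction), a 64-bit register for the sum of squares accommodates about 2³⁰ measurements.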
To illustrate the importance of this result, imagine an optical measuring system on a production line. Let us assume that it is able to measure manufactured elements with an accuracy of 2^−20 [m] (about 1 [µm]) and that their dimensions stay within the measuring range; a 64-bit integer type then suffices for about 2·10^9 measured elements. As for the production process, this number is so large that it gives a real opportunity to analyze the process over a fairly long time horizon.

The algorithm for computing the arithmetic mean and variance of all contiguous subsequences
In this section we assume that we have a certain sequence of n observations t = (x_1, x_2, ..., x_n). We define a contiguous subsequence s of sequence t to be a sequence s = (x_{l+1}, x_{l+2}, ..., x_{l+k}) such that 0 ≤ l and l + k ≤ n. We also assume that the order of elements of contiguous subsequence s is the same as in sequence t. The contiguous subsequence can therefore be treated as a kind of time window. Because a contiguous subsequence is precisely defined by the indices of its first and last terms, we also use the notation s_{l+1,l+k} to denote the contiguous subsequence (x_{l+1}, ..., x_{l+k}). Let t_k signify the number of k-long contiguous subsequences of sequence t. Note that this number is given by t_k = n + 1 − k. Therefore, the number of all possible contiguous subsequences T of sequence t equals T = Σ_{k=1}^{n} (n + 1 − k) = n(n + 1)/2. Now, let us assume that we want to calculate a certain statistic for all possible contiguous subsequences of an n-element sequence. Let the statistics be, for example, the arithmetic mean and variance. Note that if a k-element contiguous subsequence s = (x_{l+1}, x_{l+2}, ..., x_{l+k}) is given, then the time needed to compute both the arithmetic mean and the variance is proportional to the number of its elements. This arises directly from the definitions of these values: μ(s) = (1/k) Σ_{i=1}^{k} x_{l+i} and ν(s) = (1/k) Σ_{i=1}^{k} x_{l+i}² − μ(s)². In order to compute the arithmetic mean, one needs k − 1 additions and one division (k operations in total), and to compute the variance, one needs k − 1 additions, k + 1 squarings, one subtraction, one division and the operations needed to determine the arithmetic mean (3k + 2 operations in total). We can therefore assume that the number of floating-point operations needed to calculate the arithmetic mean and the variance is limited by Ck, where C is a constant and k is the number of elements of the contiguous subsequence.
Consequently, the complexity of the algorithm for computing the arithmetic mean and variance of all possible contiguous subsequences of an n-element sequence may be expressed by Σ_{k=1}^{n} C k (n + 1 − k) = O(n³). Note that the operations of determining the sum and the sum of squares materially affect the complexity of computing the arithmetic mean and variance. These are the only operations that depend on the number of elements in the contiguous subsequence. The number of the remaining operations is constant and independent of the length of the processed sequence. Therefore, the only way to reduce the computational complexity of the algorithm referred to above is to speed up the calculation of sums and sums of squares. It turns out that if the contiguous subsequences are processed in the right order, it takes a constant amount of time to compute the values of the arithmetic mean and variance, and similarly for other statistics. Consequently, the whole algorithm will be presented in the most general form possible. Let us set all contiguous subsequences in the order shown below:

s_{1,1}, s_{1,2}, ..., s_{1,n},
s_{2,2}, s_{2,3}, ..., s_{2,n},
...,
s_{n,n}.

Note that each row begins with a contiguous subsequence of length 1 and that the difference in length between neighbouring contiguous subsequences in one row is also 1. This means that the time needed to compute the expected value and variance for the first contiguous subsequence in a row is constant. In the case of neighbouring contiguous subsequences located in one row, the expected value and variance of the next contiguous subsequence can also be computed in constant time.
Definition 1 Let F be a function defined for every contiguous subsequence s = (x_{l+1}, ..., x_{l+k}). We say that F is a forward computationally separable function if it meets the condition F(s_{l+1,l+k+1}) = h(F(s_{l+1,l+k}), x_{l+k+1}) for a certain function h having constant computational complexity. Moreover, we say that F is a backward computationally separable function if it meets the condition F(s_{l+2,l+k}) = g(F(s_{l+1,l+k}), x_{l+1}) for a certain function g having constant computational complexity. If a function is both forward and backward computationally separable, then it is a computationally separable function.

Corollary 2 Functions M_1(s) = x_{l+1} + x_{l+2} + ··· + x_{l+k} and M_2(s) = x_{l+1}² + x_{l+2}² + ··· + x_{l+k}² are computationally separable functions.
Proof In the case of M_1 we simply assume that h(x, y) = x + y and g(x, y) = x − y. For M_2 we assume that h(x, y) = x + y² and g(x, y) = x − y².
Before we proceed any further, it is worth showing that there are functions that are either only forward or only backward computationally separable. We can use continued fractions as an example. If F_1 and F_2 are functions defined as follows

F_1(s) = 1/(x_{l+k} + 1/(x_{l+k−1} + ··· + 1/x_{l+1})),
F_2(s) = 1/(x_{l+1} + 1/(x_{l+2} + ··· + 1/x_{l+k})),

then F_1 is a forward computationally separable function and F_2 is a backward computationally separable function. This results directly from the following relations

F_1(s_{l+1,l+k+1}) = 1/(x_{l+k+1} + F_1(s_{l+1,l+k})),
F_2(s_{l+2,l+k}) = 1/F_2(s_{l+1,l+k}) − x_{l+1}.

In other words, function h for F_1 takes the form h(x, y) = 1/(y + x), while function g for F_2 takes the form g(x, y) = 1/x − y.
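These two continued fractions can be checked numerically; the sketch below (function names ours) uses exact rational arithmetic to verify the forward step for F_1 and the backward step for F_2:

```python
from fractions import Fraction

def f1(seq):
    # F1 nests from the end of the subsequence:
    # f1((x1, ..., xk)) = 1/(xk + 1/(x_{k-1} + ... + 1/x1))
    acc = Fraction(0)
    for x in seq:
        acc = 1 / (Fraction(x) + acc)
    return acc

def f2(seq):
    # F2 nests from the front:
    # f2((x1, ..., xk)) = 1/(x1 + 1/(x2 + ... + 1/xk))
    acc = Fraction(0)
    for x in reversed(seq):
        acc = 1 / (Fraction(x) + acc)
    return acc

def h(value, y):
    # Forward step for F1: append y at the end in O(1).
    return 1 / (Fraction(y) + value)

def g(value, y):
    # Backward step for F2: drop the first element y in O(1).
    return 1 / value - y
```

Exact fractions are used so that the separability identities hold as equalities rather than up to floating-point error.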
Let R be any statistic such that every contiguous subsequence s = (x_{l+1}, x_{l+2}, ..., x_{l+k}) meets the condition R(s) = f_s(F(s)), where f_s is a function dependent on s having constant computational complexity and F(s) = (F_1(s), ..., F_m(s)) is a forward computationally separable function. Since s is explicitly designated by l and k, we can also denote f_s as f_{l+1,l+k}.
Using these designations, we can define Algorithm 1, which computes statistic R for all contiguous subsequences.

Data: Sequence t = (x_1, ..., x_n), statistic R meeting the condition R(s) = f_s(F(s)) = f_s(F_1(s), ..., F_m(s)) for F being a forward computationally separable function.
Result: Sequence of statistics r_{i,j} = R(s_{i,j}) computed for all possible contiguous subsequences of t.

Theorem 2 Algorithm 1 correctly computes statistic R for all contiguous subsequences of sequence t and has computational complexity O(n²).

Proof First, we prove that the algorithm operates correctly. For this purpose, it suffices to prove that r_{i,j} = R(s_{i,j}) = R((x_i, ..., x_j)). Note that lines (2) and (5) of Algorithm 1 define the following recursive relation: u_{i,i} = F((x_i)) and u_{i,j} = h(u_{i,j−1}, x_j) for j > i. We prove by induction on j that u_{i,j} is computed correctly. The first induction step is true for every i, because this is how the values u_{i,i} are initiated. In the second step we assume that for j − 1 the following is true: u_{i,j−1} = F(s_{i,j−1}). Consequently, we obtain u_{i,j} = h(F(s_{i,j−1}), x_j) = F(s_{i,j}). Therefore, R is computed correctly, which results directly from step (6) of Algorithm 1, because r_{i,j} = f_{i,j}(u_{i,j}) = f_{i,j}(F(s_{i,j})) = R(s_{i,j}). This ends the first part of the proof. In order to estimate the computational complexity, note that the assumptions show that steps (2), (3), (5) and (6) of the algorithm have constant computational complexity O(1). Consequently, the determination of every r_{i,j} also has constant computational complexity O(1). Finally, we conclude that the computational complexity of the entire algorithm equals the number of computed r_{i,j} values, which is exactly the number of pairs (i, j) with 1 ≤ i ≤ j ≤ n, that is n(n + 1)/2. Therefore, we proved that the algorithm has complexity O(n²), which ends the proof.
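A sketch of Algorithm 1 for a forward computationally separable F might look as follows (names ours; init plays the role of line (2), h of line (5) and f of line (6)):

```python
def all_subsequence_stats(t, init, h, f):
    # For each starting index i the accumulator u = F(s_{i,j}) is
    # extended one element at a time in O(1) by u = h(u, x_j), and
    # r_{i,j} = f(i, j, u) is emitted: O(n^2) total work.
    r = {}
    n = len(t)
    for i in range(n):
        u = init(t[i])          # u_{i,i} = F((x_i))
        r[(i, i)] = f(i, i, u)
        for j in range(i + 1, n):
            u = h(u, t[j])      # u_{i,j} = h(u_{i,j-1}, x_j)
            r[(i, j)] = f(i, j, u)
    return r
```

For the arithmetic mean (Theorem 3), init is the identity, h is addition and f divides by the window length j − i + 1.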
Using these designations, Algorithm 1 takes the form presented as Algorithm 2. First, we prove that this algorithm works correctly. For this purpose, it suffices to prove that r_{i,j} contains the arithmetic mean of the elements of sequence (x_i, x_{i+1}, ..., x_j). To do this, let us note that for a given i the elements u_{i,j} meet the following recursive relation: u_{i,i} = x_i and u_{i,j} = u_{i,j−1} + x_j. By expanding this recursion, we obtain that u_{i,j} = x_i + x_{i+1} + ··· + x_j. But r_{i,j} = u_{i,j}/(j − i + 1), which proves that r_{i,j} is the arithmetic mean of sequence (x_i, x_{i+1}, ..., x_j). The estimation of the computational complexity arises directly from Theorem 2, because the individual computations of u_{i,j} and r_{i,j} have constant computational complexity.
Using these designations, Algorithm 1 takes the form presented as Algorithm 3. First, we prove that this algorithm works correctly. For this purpose, it suffices to prove that r_{i,j} contains the variance of the elements of sequence (x_i, ..., x_j). To do this, let us note that for a given i the elements u_{i,j} = (M_1(s_{i,j}), M_2(s_{i,j})) meet the following recursive relation: u_{i,i} = (x_i, x_i²) and u_{i,j} = (M_1(s_{i,j−1}) + x_j, M_2(s_{i,j−1}) + x_j²). By expanding this recursion, we obtain that u_{i,j} = (x_i + ··· + x_j, x_i² + ··· + x_j²). But r_{i,j} = M_2(s_{i,j})/(j − i + 1) − (M_1(s_{i,j})/(j − i + 1))², which is the variance of sequence (x_i, ..., x_j). The estimation of the computational complexity arises directly from Theorem 2, because the individual computations of u_{i,j} and r_{i,j} have constant computational complexity.
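The resulting procedure can be sketched as follows (a minimal illustration, names ours), carrying the pair (M_1, M_2) and deriving the mean and the biased variance in O(1) per subsequence:

```python
def all_means_variances(t):
    # Carry u = (M1, M2), the running sum and sum of squares, and derive
    # the mean and biased variance of every s_{i,j} in O(1) per pair.
    out = {}
    n = len(t)
    for i in range(n):
        m1 = m2 = 0.0
        for j in range(i, n):
            m1 += t[j]
            m2 += t[j] * t[j]
            k = j - i + 1
            mu = m1 / k
            out[(i, j)] = (mu, m2 / k - mu * mu)
    return out
```

In floating point this formula is not numerically stable; with the integer representation of Theorem 1 the sums m1 and m2 are exact and the instability disappears.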
Data: Sequence t = (x_1, ..., x_n). Result: Sequence of variances r_{i,j} = ν(s_{i,j}) computed for all possible contiguous subsequences of t.

The variance computation method used in Theorem 4 is not numerically stable. That is why it should be applied only in cases where the values of functions M_1 and M_2 are determined precisely (i.e. are free from numerical errors). The conditions that must be met in this regard have been set out in Theorem 1 and Corollary 1. When numerical errors cannot be avoided, it is advisable to compute the variance using the method proposed by Welford (1962), which also defines a forward computationally separable function.
The Welford formulas shown in Eqs. 4 and 6 confirm that s_{i,j} → (μ(s_{i,j}), ν(s_{i,j})) is a forward computationally separable function. This means that Algorithm 1 takes the form presented as Algorithm 4, which proves the correctness of the presented method. The complexity O(n²) results from the fact that all operations in the loop have complexity O(1). This ends the proof.
Algorithm 4: Computation of the arithmetic mean and variance for all contiguous subsequences using Welford's formula.
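Welford's recurrences and the corresponding variant of the all-subsequences algorithm can be sketched as follows (names ours; ν(s) = S_k/k is the biased variance):

```python
def welford_all_subsequences(t):
    # Welford's recurrences
    #   mu_k = mu_{k-1} + (x_k - mu_{k-1}) / k
    #   S_k  = S_{k-1} + (x_k - mu_{k-1}) * (x_k - mu_k)
    # make s -> (mu(s), nu(s)) forward computationally separable.
    out = {}
    n = len(t)
    for i in range(n):
        mu, s = 0.0, 0.0
        for j in range(i, n):
            k = j - i + 1
            delta = t[j] - mu
            mu += delta / k
            s += delta * (t[j] - mu)
            out[(i, j)] = (mu, s / k)   # biased variance
    return out
```

Unlike the sum-of-squares variant, this update avoids the cancellation between M_2/k and μ², which is what makes it the recommended choice when rounding errors cannot be ruled out.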

The algorithm for computing the arithmetic mean and variance for fixed-length contiguous subsequences
Theorem 2, which has been proven above, is general in nature and applies to many different statistics. Theorems 3, 4 and 5, which use Algorithm 1 to compute the arithmetic mean and variance for all contiguous subsequences of a given sequence t = (x_1, x_2, ..., x_n), serve as examples. However, it is not always necessary to calculate statistics for all possible contiguous subsequences. Sometimes we just want to scan a series of observations using a window of a predefined width. If such a window covers w observations, then the analysed contiguous subsequences form the following series: (x_1, ..., x_w), (x_2, ..., x_{w+1}), ..., (x_{n−w+1}, ..., x_n).
In order to effectively compute statistics for such a set of contiguous subsequences, it is necessary to assume that the statistic is built from computationally separable functions. The weaker assumption of forward computational separability alone is insufficient. Let R be any statistic such that every contiguous subsequence s = (x_{l+1}, x_{l+2}, ..., x_{l+k}) meets the condition R(s) = f_s(F(s)), where f_s is a function dependent on s having constant computational complexity and F(s) = (F_1(s), ..., F_m(s)) is a computationally separable function. Since s is explicitly designated by l and k, we can also denote f_s as f_{l+1,l+k}. Using these designations, we can define Algorithm 5, which computes statistic R for all contiguous subsequences of length w.
Theorem 6 If R(s) = f_s(F(s)) = f_s(F_1(s), ..., F_m(s)) for a computationally separable function F, then Algorithm 5 correctly computes statistic R for all contiguous subsequences of length w of sequence t = (x_1, ..., x_n) and has computational complexity O(n).

Data: Sequence t = (x_1, ..., x_n), window of width w, statistic R such that R(s) = f_s(F(s)) = f_s(F_1(s), ..., F_m(s)) for a computationally separable function F.
Result: A sequence of statistics r_i = R(s_{i,i+w−1}) computed for all contiguous subsequences of t of length w.
Proof Loop (2)-(6) of Algorithm 5 computes u = F(s_{1,w}) by repeated application of function h. Consequently, we proved that r_1 = f_{1,w}(F_1(s_{1,w}), ..., F_m(s_{1,w})) = R(s_{1,w}). Now, we assume that r_{i−1} was calculated correctly. By the definition of function R it follows that r_{i−1} = f_{i−1,i+w−2}(F(s_{i−1,i+w−2})). Considering the properties of function g, we conclude that step (8) of Algorithm 5 defines g(F(s_{i−1,i+w−2}), x_{i−1}) = F(s_{i,i+w−2}). Then, using the properties of function h, in step (9) we obtain h(F(s_{i,i+w−2}), x_{i+w−1}) = F(s_{i,i+w−1}).

This means that r_i = f_{i,i+w−1}(F(s_{i,i+w−1})) = R(s_{i,i+w−1}).

Using inductive reasoning, we proved that Algorithm 5 correctly determines statistics r_i for i ∈ {1, ..., n − w + 1}. In order to estimate the computational complexity, note that the assumptions show that steps (1), (3), (5-6) and (8-10) of the algorithm have constant computational complexity O(1). Consequently, the total complexity of the algorithm equals the total number of iterations of loops (2) and (7) and amounts to O(w + (n − w)) = O(n), which ends the proof. The proofs of the above corollaries are very similar to the proofs of Theorems 3 and 4. It must be stressed that the method presented in Corollary 4 should be used to compute the variance only when we are certain that the calculations will be free from numerical errors. Otherwise, we advise using the method presented below.
Lemma 1 If s = (x_1, ..., x_m) and s' = (x_2, ..., x_{m+1}), then the following relations are true:

μ(s') = μ(s) + (x_{m+1} − x_1)/m,   (7)
ν(s') = ν(s) + (x_{m+1} − x_1)(x_{m+1} + x_1 − μ(s) − μ(s'))/m.   (8)

Proof Equation 7 can be easily proved using the definition of the arithmetic mean: μ(s') = (x_2 + ··· + x_{m+1})/m = (x_1 + ··· + x_m)/m + (x_{m+1} − x_1)/m = μ(s) + (x_{m+1} − x_1)/m. Equation 8 is slightly more complex, and so to prove it we use alternative expressions for the variance: ν(s) = (x_1² + x_2² + ··· + x_m²)/m − μ(s)² and ν(s') = (x_2² + ··· + x_m² + x_{m+1}²)/m − μ(s')². Using the above relations, we can write that ν(s') − ν(s) = (x_{m+1}² − x_1²)/m − (μ(s')² − μ(s)²) = (x_{m+1} − x_1)(x_{m+1} + x_1)/m − (μ(s') − μ(s))(μ(s') + μ(s)). Using Eq. 7 proved above, we obtain that ν(s') − ν(s) = (x_{m+1} − x_1)(x_{m+1} + x_1)/m − ((x_{m+1} − x_1)/m)(μ(s') + μ(s)) = (x_{m+1} − x_1)(x_{m+1} + x_1 − μ(s) − μ(s'))/m. This ends the proof of the lemma.
Proof If s_{i,j} = (x_i, ..., x_j), then the Welford formulas can be used to compute the arithmetic mean and variance in loop (2)-(4) of Algorithm 5. Since in this loop we calculate μ(s_{1,w}) and ν(s_{1,w}), the formulas are reduced to

μ(s_{1,k}) = μ(s_{1,k−1}) + (x_k − μ(s_{1,k−1}))/k,
ν(s_{1,k}) = ν(s_{1,k−1}) + ((x_k − μ(s_{1,k−1}))(x_k − μ(s_{1,k})) − ν(s_{1,k−1}))/k.

In turn, for loop (7)-(11) of Algorithm 5 we use the formulas from Lemma 1, which for a window of width w are given by

μ(s_{l+2,l+w+1}) = μ(s_{l+1,l+w}) + (x_{l+w+1} − x_{l+1})/w,
ν(s_{l+2,l+w+1}) = ν(s_{l+1,l+w}) + (x_{l+w+1} − x_{l+1})(x_{l+w+1} + x_{l+1} − μ(s_{l+1,l+w}) − μ(s_{l+2,l+w+1}))/w.

The above equations prove that function R(s) = (μ(s), ν(s)) is determined correctly for every s = s_{l+1,l+w} = (x_{l+1}, ..., x_{l+w}) using Algorithm 5, which takes the form presented as Algorithm 6. Since all operations within the two loops have complexity O(1), by Theorem 6 we conclude that the complexity of the presented algorithm is O(n), which ends the proof.
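The complete sliding-window procedure can be sketched as follows (names ours): Welford's update fills the first window, after which the relations of Lemma 1 slide it in O(1) per step.

```python
def sliding_mean_variance(t, w):
    """For each window of width w return (mean, biased variance).
    First window: Welford's update. Sliding step (Lemma 1):
      mu' = mu + (x_in - x_out) / w
      nu' = nu + (x_in - x_out) * (x_in + x_out - mu - mu') / w
    """
    n = len(t)
    mu, m2 = 0.0, 0.0
    for k in range(w):                    # Welford over the first window
        delta = t[k] - mu
        mu += delta / (k + 1)
        m2 += delta * (t[k] - mu)
    nu = m2 / w
    out = [(mu, nu)]
    for i in range(1, n - w + 1):         # slide: drop x_out, add x_in
        x_out, x_in = t[i - 1], t[i + w - 1]
        mu_new = mu + (x_in - x_out) / w
        nu += (x_in - x_out) * (x_in + x_out - mu - mu_new) / w
        mu = mu_new
        out.append((mu, nu))
    return out
```

The whole scan is O(n) regardless of the window width, since each step touches only the incoming and outgoing sample.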
Result of Algorithm 7: a sequence of statistics r_i = (μ(s_{i,i+w−1}), ν(s_{i,i+w−1})) determined for all contiguous subsequences of length w.

Bearing in mind the application of Algorithm 7 to the analysis of large sets of measurement data, we focused on the int64_t and long double arithmetic types. These are the only types allowing us to precisely determine the variance for samples consisting of more than 10^6 elements and represented with 16-bit precision. However, in order to compare all the options, each of the previously mentioned data types has been subjected to a performance analysis. Table 2 presents the time needed to perform the following operations: determination of extremes, determination of the arithmetic mean, determination of the variance, and determination of all these statistics simultaneously.
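The four timed operations can be gathered into a single pass over the data, e.g. (a hypothetical sketch; the struct and function names are ours, not the article's):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Extremes, mean and variance computed together in one pass, with the
// sums kept in exact int64_t accumulators.
struct Stats { int64_t min, max; double mean, var; };

Stats all_stats(const std::vector<int64_t>& x) {
    int64_t lo = x[0], hi = x[0], S = 0, Q = 0;
    for (int64_t v : x) {
        lo = std::min(lo, v);
        hi = std::max(hi, v);
        S += v;
        Q += v * v;
    }
    double n  = double(x.size());
    double mu = double(S) / n;
    return {lo, hi, mu, double(Q) / n - mu * mu};
}
```

Fusing the passes keeps the number of memory reads minimal, which matters for the large data sets considered in the benchmarks.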
An analysis of the running times showed that int64_t is more than twice as fast as long double. Due to the nature of our calculations (determination of extremes, expected value and variance), both types guarantee virtually identical precision. In addition, the integer type allows us to avoid rounding errors when the precision of measurements is based on powers of 10. Table 3 presents the running times of Algorithm 7 and of the algorithm determining the statistics for each contiguous subsequence separately (denoted as std.). Figure 4 gives a more accurate picture of the speed of the algorithms for particular arithmetic types. It clearly shows the difference in the complexity of the two algorithms. The time needed to determine the basic statistics grows quickly with the size of the analysed data. For a sequence of around 10^6 elements, the running time of Algorithm 7 is approximately 30 hours. For comparison, it would take almost 4 years to perform this analysis using the non-optimised algorithm.

The basic condition for the reliability of Algorithm 7 is the selection of an arithmetic type that allows for error-free calculation of sums and sums of squares for a large number of elements. That is why we suggest using integer types instead of floating-point types. Figure 5 compares the running times of the proposed algorithm for two data types: long double and int64_t. Note that in terms of speed, the advantage of the integer type over the floating-point type has been significantly reduced. There are three reasons behind this:

1. a large number of short contiguous subsequences for which the statistics are calculated,
2. the need to store a large number of calculated statistics, which translates into a large number of memory read and write operations,
3. a larger number of conditional instructions in the loops determining individual statistics (the processor spends more time evaluating conditions than making calculations).
Naturally, a question arises as to how the running time will change for sequences consisting of 10^5 or 10^6 elements. Another important issue is how to optimise memory access, or how to structure the algorithm so that it does not store the statistics themselves, but only the locations that should be analysed more closely. Figure 6 presents the ratio of the running time of the algorithm determining the statistics for each subsequence separately to the running time of Algorithm 7 for two arithmetic types: long double and int64_t. The plot shows that the integer type guarantees much higher calculation efficiency than the floating-point type; it can also be seen that this disproportion increases with the size of the processed input sequence.

Conclusions
The article proposes a new algorithm for determining statistics such as the arithmetic mean and variance for all contiguous subsequences and fixed-length contiguous subsequences. Its main element is the reduction of floating-point data to an integer representation. As a result, an algorithm was obtained that works without numerical error in many practical applications, especially the analysis of industrial measurement data. The developed method significantly speeds up the calculations and makes its effective application in machine learning and accurate statistical inference possible.
The first section of this article presents a brief introduction to the issues of statistical analysis of measurements in manufacturing enterprises and a description of the current state of knowledge on the determination of the arithmetic mean and variance.
The second section outlines calculation methods based on measurement data with the use of both floating-point representation and integer representation. In Theorem 1 we laid down the conditions that an arithmetic type (its mantissa) has to meet so that it can be used to precisely determine statistics such as the expected value and variance. Additionally, this section presents a method of converting measurement data to a corresponding sequence of integers that can be used to determine the said statistics, as expressed in Corollary 1.
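For instance, if the measurements are recorded with d decimal digits after the point, the conversion could look as follows (a sketch; the function name and the llround-based rounding are our choices, not necessarily the article's):

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Scale measurements with d decimal digits of precision by 10^d and
// round, so every value becomes an exact int64_t; sums and sums of
// squares of the scaled sequence then carry no rounding error.
std::vector<int64_t> to_integers(const std::vector<double>& x, int d) {
    double scale = std::pow(10.0, d);
    std::vector<int64_t> out;
    out.reserve(x.size());
    for (double v : x)
        out.push_back(static_cast<int64_t>(std::llround(v * scale)));
    return out;
}
// Statistics of the scaled sequence map back to the original units:
// mu_x = mu_int / 10^d and nu_x = nu_int / 10^(2d).
```

The mapping back to original units is a single division at the very end, so no precision is lost along the way.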
The third section presents an algorithm for fast statistical analysis of all contiguous subsequences of a certain numerical sequence. Algorithm 1 has been presented in a general version using the proposed concept of computationally separable functions. Theorem 2 proves the correctness of the proposed method and determines its computational complexity at O(n²), where n is the number of elements of the analysed sequence. Theorems 3 and 4 prove the usefulness of Algorithm 1 for calculating the arithmetic mean and variance. It is worth stressing that these methods have been presented in a form that allows error-free calculations, provided that the assumptions of Theorem 1 are met. For numerically unstable cases, we presented Theorem 5, which uses Algorithm 1 to determine arithmetic means and variances based on the Welford formula. This theorem applies when the mantissa of a number is too small to enable the calculations to be free from numerical errors.
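The idea behind the separable-function approach can be illustrated with prefix sums (an illustrative sketch with our own names; Algorithm 1 itself is stated more generally): after O(n) preprocessing, the mean and variance of any contiguous subsequence become an O(1) query, so all O(n²) subsequences cost O(n²) in total rather than O(n³).

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Exact prefix sums over an integer sequence: P[i] holds the sum of
// x[0..i-1] and Q[i] the sum of their squares.
struct PrefixStats {
    std::vector<int64_t> P, Q;
    explicit PrefixStats(const std::vector<int64_t>& x)
        : P(x.size() + 1, 0), Q(x.size() + 1, 0) {
        for (std::size_t i = 0; i < x.size(); ++i) {
            P[i + 1] = P[i] + x[i];
            Q[i + 1] = Q[i] + x[i] * x[i];
        }
    }
    // Mean and variance of the subsequence x[i..j) in O(1).
    std::pair<double, double> stats(std::size_t i, std::size_t j) const {
        double m  = double(j - i);
        double mu = double(P[j] - P[i]) / m;
        double nu = double(Q[j] - Q[i]) / m - mu * mu;
        return {mu, nu};
    }
};
```

Because the prefix sums are exact integers, every per-subsequence query inherits the error-free property guaranteed by Theorem 1.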
The fourth section deals with the problem of determining statistics for a set of fixed-length contiguous subsequences. In this case, we proposed Algorithm 5, which was proved to operate correctly in Theorem 6. The theorem also proved that if the statistics can be defined using computationally separable functions, then the determination of statistics for all w-element contiguous subsequences of an n-element sequence has complexity O(n) regardless of the value of w. Next, Corollaries 3 and 4 show how to apply Algorithm 5 to determine arithmetic means and variances while avoiding numerical errors. Lemma 1 established the correctness of the new formula for calculating the arithmetic mean and variance for fixed-length contiguous subsequences. The lemma was used to prove Theorem 7, which shows that Algorithm 5 operates correctly based on the proposed relations.
The last section of the article compares the running times of the proposed algorithms for various data types. The conclusion is that by converting measurement data from floating-point to integer representation, we can not only speed up calculations, but also make them more precise. Although it is possible to achieve this goal only when the conditions of Theorem 1 are met, practice shows that it is very often feasible, especially in the case of a well-defined set of measurement data.