Detection of Derivative Discontinuities in Observational Data

This paper presents a new approach to the detection of discontinuities in the n-th derivative of observational data. This is achieved by performing two polynomial approximations at each interstitial point. The polynomials are coupled by constraining their coefficients to ensure continuity of the model up to the (n-1)-th derivative; while yielding an estimate for the discontinuity of the n-th derivative. The coefficients of the polynomials correspond directly to the derivatives of the approximations at the interstitial points through the prudent selection of a common coordinate system. The approximation residual and extrapolation errors are investigated as measures for detecting discontinuity. This is necessary since discrete observations of continuous systems are discontinuous at every point. It is proven, using matrix algebra, that positive extrema in the combined approximation-extrapolation error correspond exactly to extrema in the difference of the Taylor coefficients. This provides a relative measure for the severity of the discontinuity in the observational data. The matrix algebraic derivations are provided for all aspects of the methods presented here; this includes a solution for the covariance propagation through the computation. The performance of the method is verified with a Monte Carlo simulation using synthetic piecewise polynomial data with known discontinuities. It is also demonstrated that the discontinuities are suitable as knots for B-spline modelling of data. For completeness, the results of applying the method to sensor data acquired during the monitoring of heavy machinery are presented.


Introduction
In the recent past physics informed data science has become a focus of research activities, e.g., [9]. It appears under different names e.g., physics informed [12]; hybrid learning [13]; physics-based [17], etc.; but with the same basic idea of embedding physical principles into the data science algorithms. The goal is to ensure that the results obtained obey the laws of physics and/or are based on physically relevant features. Discontinuities in the observations of continuous systems violate some very basic physics and for this reason their detection is of fundamental importance. Consider Newton's second law of motion, Any discontinuities in the observations of m(t),ṁ(t), y(t),ẏ(t) orÿ(t) indicate a violation of some basic principle: be it that the observation is incorrect or something unexpected is happening in the system. Consequently, detecting discontinuities is of fundamental importance in physics based data science. A function s(x) is said to be C n discontinuous, if s ∈ C n−1 \ C n , that is if s(x) has continuous derivatives up to and including order n − 1, but the n-th derivative is discontinuous. Due to the discrete and finite nature of the observational data, only jump discontinuities in the n-th derivative are considered; asymptotic discontinuities are not considered. Furthermore, in more classical data modelling, C n jump discontinuities form the basis for the locations of knots in B-Spline models of observational data [15].

State of the Art
There are numerous approaches in the literature dealing with estimating regression functions that are smooth, except at a finite number of points. Based on the methods, these approaches can be classified into four groups: local polynomial methods, splinebased methods, kernel-based methods and wavelet methods. The approaches vary also with respect to the available a priori knowledge about the number of points of discontinuity or the derivative in which these discontinuities appear. For a good literature review of these methods, see [3]. The method used in this paper is relevant both in terms of local polynomials as well as spline-based methods; however, the new approach requires no a priori knowledge about the data. In the local polynomial literature, namely in [8] and [14], ideas similar to the ones presented here are investigated. In these papers, local polynomial approximations from the left and the right side of the point in question are used. The major difference is that neither of these methods use constraints to ensure that the local polynomial approximations enforce continuity of the lower derivatives. Additionally, it is not clear whether only co-locative points are considered as possible change points, or interstitial points are also considered. Furthermore, both papers use different residuals to determine the existence of a change point. The latter also focuses on optimal subset selection, which is not the focus of this paper.
In [11] on the other hand, one polynomial instead of two is used, and the focus is mainly on detecting C 0 and C 1 discontinuities. Additionally, the number of changepoints must be known a-priori, so only their location is approximated; the required apriori knowledge make the method unsuitable in real sensor based system observation.
In the spline-based literature there are heuristic methods (top-down and bottom-up) as well as optimization methods. For a more detailed state of the art on splines, see [2]. Most heuristic methods use a discrete geometric measure to calculate whether a point is a knot, such as: discrete curvature, kink angle, etc, and then use some (mostly arbitrary) threshold to improve the initial knot set. In the method presented here, which falls under the category of bottom-up approaches, the selection criterion is based on calculus and statistics, which allows for incorporation of the fundamental physical laws governing the system, in the model, but also ensures mathematical relevance and rigour.

The New Approach
This paper presents a new approach to detecting C n discontinuities in observational data. It uses constrained coupled polynomial approximation to obtain two estimates for the n th Taylor coefficients and their uncertainties, at every interstitial point. These correspond approximating the local function by polynomials, once from the left f(x, α) and once from the right g(x, β). The constraints couple the polynomials to ensure that α i = β i for every i ∈ [0 . . . n−1]. In this manner the approximations are C n−1 continuous at the interstitial points, while delivering an estimate for the difference in the n th Taylor coefficients. All the derivations for the coupled constrained approximations and the numerical implementations are presented. Both the approximation and extrapolation residuals are derived. It is proven that the discontinuities must lie at local positive peaks in the extrapolation error. The new approach is verified with both known synthetic data and on real sensor data obtained from observing the operation of heavy machinery.

Detecting C n Discontinuities
Discrete observations s(x i ) of a continuous system s(x) are, by their very nature, discontinuous at every sample. Consequently, we will require some measure for discontinuity, with uncertainty, which provides the basis for a statistical hypothesis test.
The observations are considered to be the co-locative points, denoted by x i and collectively by the vector x; however, we wish to estimate the discontinuity at the interstitial points, denoted by ζ i and collectively as ζ. Using interstitial points, one ensures that each data point is used for only one polynomial approximation at a time. Furthermore, in the case of sensor data, one expects the discontinuities to happen between samples. Consequently the data is segmented at the interstitial points, i.e. between the samples. This requires the use of interpolating functions and in this work we have chosen to use polynomials.
Polynomials have been chosen because of their approximating, interpolating and extrapolating properties when modelling continuous systems: The Weierstrass approximation theorem [16] states that if f (x) is a continuous real-valued function defined on the real interval x ∈ [a, b], then for every ε > 0, there exists a polynomial p(x) such that for all That is any function f (x) can be approximated by a polynomial to an arbitrary accuracy ε given a sufficiently high degree.
The basic concept (see Figure 1) to detect a C n discontinuity is: to approximate the data to the left of an interstitial point by the polynomial f(x, α) of degree d L and to the right by g(x, β) of degree d R , while constraining these approximations to be C n−1 continuous at the interstitial point. This approximation ensures that, while yielding estimates for f (n) (ζ i ) and g (n) (ζ i ) together with estimates for their variances λ f (ζ i ) and λ g(ζ i ) . This corresponds exactly to estimating the Taylor coefficients of the function twice for each interstitial point, i.e., once from the left and once from the right. It they differ significantly, then the function's n th derivative is discontinuous at this point. The Taylor series of a function f (x) around the point a is defined as, for each x for which the infinite series on the right hand side converges. Furthermore, any function which is n + 1 times differentiable can be written as wheref(x) is an n th degree polynomial approximation of the function f (x), and R(x) is the remainder term. The Lagrange form of the remainder R(x) is given by where ξ is a real number between a and x. A Taylor expansion around the origin (i.e. a = 0 in Equation 3) is called a Maclaurin expansion; for more details, see [1]. In the rest of this work, the n th Maclaurin coefficient for the function f (x) will be denoted by The coefficients of a polynomial f(x, α) = α n x n + ... + α 1 x + α 0 are closely related to the coefficients of the Maclaurin expansion of this polynomial. Namely, it's easy to prove that A prudent selection of a common local coordinate system, setting the interstitial point as the origin, ensures that the coefficients of the left and right approximating polynomials correspond to the derivative values at this interstitial point. Namely, one gets a very clear relationship between the coefficients of the left and right polynomial approximations, α and β, their Maclaurin coefficients, t From equation 9 it is clear that performing a left and right polynomial approximation at an interstitial point is sufficient to get the derivative values at that point, as well as their uncertainties.

Constrained and Coupled Polynomial Approximation
The goal here is to obtain ∆t g via polynomial approximation. To this end two polynomial approximations are required; whereby, the interstitial point is used as the origin in the common coordinate system, see Figure 1. The approximations are coupled [6] at the interstitial point by constraining the coefficients such that α i = β i , for every i ∈ [0 . . . n − 1]. This ensures that the two polynomials are C n−1 continuous at the interstitial points. This also reduces the degrees of freedom during the approximation and with this the variance of the solution is reduced. For more details on constrained polynomial approximation see [4,7].
To remain fully general, a local polynomial approximation of degree d L is performed to the left of the interstitial point with the support length l L creating f(x, α); similarly to the right d R , l R , g(x, β). The x coordinates to the left, denoted as x L are used to form the left Vandermonde matrix V L , similarly x R form V R to the right. This leads to the following formulation of the approximation process, A C n−1 continuity implies α i = β i , for every i ∈ [0 . . . n − 1] which can be written in matrix form as We obtain the task of least squares minimization with homogeneous linear constraints, Clearly γ must lie in the null-space of C; now, given N , an ortho-normal vector basis set for null {C}, we obtain, γ = N δ.
Back-substituting into Equation 13 yields, The least squares solution to this problem is, and consequently, Formulating the approximation in the above manner ensures that the difference in the Taylor coefficients can be simply computed as

Covariance Propagation
Defining, K = N (V N ) + , yields, γ = K y. Then given the covariance of y, i.e., Λ y , one gets that, Additionally, from equation 19 one could derive the covariance of the difference in the Taylor coefficients Keep in mind that, if one uses approximating polynomials of degree n to determine a discontinuity in the n th derivative, as done so far, Λ ∆ is just a scalar and corresponds to the variance of ∆t

Error Analysis
In this paper we consider three measures for error: 1. the norm of the approximation residual; 2. the combined approximation and extrapolation error; 3. the extrapolation error.

Approximation Error
The residual vector has the form The approximation error is calculated as

Combined Error
The basic concept, which can be seen in Figure 2, is as follows: the left polynomial f (x, α), which approximates over the values x L , is extended to the right and evaluated at the points x R . Analogously, the right polynomial g (x, β) is evaluated at the points x L . If there is no C n discontinuity in the system, the polynomials f and g must be equal and consequently the extrapolated values won't differ significantly from the approximated values.
Analytical Combined Error The extrapolation error in a continuous case, i.e. between the two polynomial models, can be computed with the following 2-norm, Given, the constraints which ensure that α i = β i i ∈ [0, . . . , n − 1], we obtain, Expanding and performing the integral yields, Given fixed values for x min and x max across a single computation implies that the factor, is a constant. Consequently, the extrapolation error is directly proportional to the square of the difference in the Taylor coefficients, Numerical Combined Error In the discrete case, one can write the errors of f(x, α) and g(x, β) as e f = y − f(x, α) and e g = y − g(x, β) respectively. Consequently, one could define an error function as where z x.ˆn. From these calculations it is clear that in the discrete case the error is also directly proportional to the square of the difference in the Taylor coefficients and that E fg ∝ ε x . This proves that the numerical computation is consistent with the analytical continuous error.

Extrapolation Error
One could also define a different kind of error, based just on the extrapolative properties of the polynomials. Namely, using the notation from the beginning of Section 3, one defines and then calculates the error as In the example in section 5, it will be seen that there is no significant numerical difference between these two errors.

Numerical Testing
The numerical testing is performed with: synthetic data from a piecewise polynomial, where the locations of the C n discontinuities are known; and with real sensor data emanating from the monitoring of heavy machinery.

Synthetic Data
In the literature on splines, functions of the type y (x) = e −x 2 are commonly used. However, this function is analytic and C ∞ continuous; consequently it was not considered a suitable function for testing. In Figure 3 a piecewise polynomial with a similar shape is shown; however, this curve has C 2 discontinuities at known locations. The algorithm g , together with the two identified peaks.
was applied to the synthetic data from the piecewise polynomial, with added noise with σ = 0.05 and the results for a single case can be seen in Figure 3. Additionally, a Monte Carlo simulation with m = 10000 iterations was performed and the results of the algorithm were compared to the true locations of the two known knots. The mean errors in the location of the knots are: µ 1 = (5.59 ± 2.05) × 10 −4 with 95% confidence, and µ 2 = (−4.62±1.94)×10 −4 . Errors in the scale of 10 −4 , in a support with a range [0, 1], and 5% noise amplitude in the curve can be considered a highly satisfactory result.

Sensor Data
The algorithm was also applied to a set of real-world sensor data 1 emanating from the monitoring of heavy machinery. The original data set can be seen in Figure 4 (top). It has many local peaks and periods of little or no change, so the algorithm was used to detect discontinuities in the first derivative, in order to determine the peaks and phases. The peaks in the Taylor differences were used in combination with the peaks of the fg calculated at every interstitial point. The red and blue circles mark the relevant local maxima and minima of the difference respectively. According to this, the red and blue lines are drawn in the top-most graph. The bottom graph shows the approximation error evaluated at every interstitial point. extrapolation error to determine the points of discontinuity. A peak in the Taylor differences means that the Taylor coefficients are significantly different at that interstitial point, compared to other interstitial points in the neighbourhood. However, if there is no peak in the extrapolation errors at the same location, then the peak found by the Taylor differences is deemed insignificant, since one polynomial could model both the left and right values and as such the peak isn't a discontinuity. Additionally, it can be seen in Figure 5 that both the extrapolation error and the combined error, as defined in Section 4, have peaks at the same locations, and as such the results they provide do not differ significantly. One can see that the location of the peaks doesn't change, and the two errors don't differ significantly.

Conclusion and Future Work
It may be concluded, from the results achieved, that the coupled constrained polynomial approximation yield a good method for the detection of C n discontinuities in discrete observational data of continuous systems. Local peaks in the square of the difference of the Taylor polynomials provide a relative measure as a means of determining the locations of discontinuities.
Current investigations indicate that the method can be implemented directly as a convolutional operator, which will yield a computationally efficient solution. The use of discrete orthogonal polynomials [5,10] is being tested as a means of improving the sensitivity of the results to numerical perturbations.