Experimental Astronomy, Volume 39, Issue 1, pp 1–10

Kolmogorov-Smirnov like test for time-frequency Fourier spectrogram analysis in LISA Pathfinder

  • Luigi Ferraioli
  • Michele Armano
  • Heather Audley
  • Giuseppe Congedo
  • Ingo Diepholz
  • Ferran Gibert
  • Martin Hewitson
  • Mauro Hueller
  • Nikolaos Karnesis
  • Natalia Korsakova
  • Miquel Nofrarias
  • Eric Plagnol
  • Stefano Vitale
Original Article

Abstract

A statistical procedure for the analysis of time-frequency noise maps is presented and applied to LISA Pathfinder mission synthetic data. The procedure is based on a Kolmogorov-Smirnov like test that is applied to time-frequency noise maps produced with the spectrogram technique. The influence of finite-size windowing on the test statistic is calculated with a Monte Carlo simulation for four different window types. This calculation demonstrates that the test statistic is modified by the correlations introduced in the spectrum by the finite size of the window and by the correlations between different time bins that originate from the overlap between windowed segments. The application of the test procedure to LISA Pathfinder data demonstrates the capability of the test to detect non-stationary features in a noise time series that simulates low-frequency non-stationary noise in the system.

Keywords

Kolmogorov-Smirnov test, Spectrogram, Noise analysis, Time-frequency map, LISA Pathfinder, Gravitational waves, eLISA, LISA

1 Introduction

The Kolmogorov-Smirnov test is a well-known statistical tool for data analysis: it allows one to verify with what probability an empirical distribution tends to a given cumulative distribution function as the number of data points goes to infinity [1]. The great advantage of the Kolmogorov-Smirnov test is its flexibility, since the test statistic does not depend on the particular distribution of the test data. The aim of the present work is to develop a procedure based on the Kolmogorov-Smirnov test for the analysis of time-frequency maps of noisy data in the framework of the LISA Pathfinder mission. LISA Pathfinder (LPF) is a European Space Agency mission that will characterize and analyze all possible sources of disturbance that perturb free-falling test masses from their geodesic motion [2, 3, 4, 5, 6]. One of the final outcomes of the mission will be the definition of a noise model for free-falling test masses that will be used as a reference for the design and realization of future space-based gravitational wave detectors. This will require a technique to quantitatively analyze noise data and to assess the differences between noise measurements and models. Moreover, the analysis of noise is typically performed in the frequency or time-frequency domain; therefore we aim to develop a noise analysis tool that is suited to such data. The problem of the statistical analysis of noise in the frequency domain was already formulated in [7], where a number of data analysis strategies were developed and compared. In this paper we present a further refinement of the Kolmogorov-Smirnov test presented in [7] and we extend its range of application to the time-frequency domain. The analysis of time-frequency data is particularly interesting since it allows one to identify and characterize non-stationary noise.
In LISA Pathfinder non-stationary noise can be the result of natural processes such as random charging of the test masses due to high-energy particles and thermal drift in the electronics. In Section 2 the statistical properties of the time-frequency spectrogram are discussed, while in Section 3 the Kolmogorov-Smirnov test is introduced and applied to the analysis of time-frequency spectrogram data. The influence on the test statistic of the correlations introduced by the data windowing process is analyzed in detail for four different windows. In Section 4 the test is applied to LISA Pathfinder synthetic noise. The noise series is made non-stationary by assuming that the noise provided by the capacitive sensor has an energy that increases quadratically with time. Once applied to the time-frequency noise spectrogram, the Kolmogorov-Smirnov test unambiguously detects the increase of the noise excess with time. In this paper we simulated an example of non-stationary noise by adding a non-stationary term (quadratic in time) to one of the LPF subsystems (electrostatic actuators). The global noise model used is representative of the current expectations of LPF performance. The non-stationary scenario that has been chosen is only one of the possibilities; currently it is not possible to predict whether LPF noise will be non-stationary, by what amount and in which subsystem. We know that some subsystems are sensitive to thermal drifts, but the true environment in space will be known only when the mission flies. In any case the method presented here does not depend on the model assumed. The method detects differences in the underlying distributions of the noise sample spectrum at two different times, independently of the underlying model.

2 Statistical properties of the noise spectrogram

The spectrogram is a time-frequency map of the power content of a time series \(x_{0}, \ldots, x_{N-1}\) that is based on the application of a short-time Fourier transform. A segment of data of length \(M < N\) is selected by a window function \(w\), and the spectrogram elements for a time \(t_{i}\) and a frequency \(f_{j}\) are calculated by:
$$ S\left(t_{i},f_{j}\right) = T\left|\sum\limits_{h=p}^{p+M-1}{w_{h-p} x_{h} e^{- \imath 2 \pi f_{j} h T}}\right|^{2}. $$
(1)
Here \(T\) is the sampling time of the data, \(p\) is the starting point of the data segment of length \(M\), the window function \(w\) (see footnote 1) is defined over a segment of length \(M\) starting at 0, the frequency is \(f_{j} = j/(TM)\) with \(j = 0, \ldots, M/2\), and \(t_{i}\) is the time corresponding to the center of the interval \(\left [x_{p}, x_{p+M-1}\right ]\). It is worth noting that the frequency series defined by the spectrogram data for a given \(t_{i}\) is the sample spectrum (sample periodogram) of the reduced time series \(x_{p}, \ldots, x_{p+M-1}\). If the data series \(x_{0}, \ldots, x_{N-1}\) consists of non-stationary noise, then the spectrogram provides the evolution of the spectral noise power with time.
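Equation (1) can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: the function name `spectrogram`, its arguments and the use of NumPy's real FFT are assumptions made here. The FFT phase is referred to the start of each segment rather than to the start of the series, which differs from equation (1) only by a unit-modulus factor and leaves the squared modulus unchanged.

```python
import numpy as np

def spectrogram(x, M, T, window=None, overlap=0.5):
    """Sketch of equation (1): windowed periodograms of overlapping segments.

    x       : time series x_0 ... x_{N-1}
    M       : segment length (M < N)
    T       : sampling time in seconds
    window  : window samples of length M (rectangular if None)
    overlap : fractional overlap between consecutive segments (assumed parameter)
    """
    x = np.asarray(x, dtype=float)
    w = np.ones(M) if window is None else np.asarray(window, dtype=float)
    # Square-normalise the window so that sum(w**2) == 1 (footnote 1).
    w = w / np.sqrt(np.sum(w ** 2))
    step = max(1, int(round(M * (1.0 - overlap))))
    starts = np.arange(0, len(x) - M + 1, step)   # starting index p of each segment
    freqs = np.arange(M // 2 + 1) / (T * M)       # f_j = j / (T M), j = 0 ... M/2
    times = (starts + (M - 1) / 2.0) * T          # centre of [x_p, x_{p+M-1}]
    S = np.empty((len(starts), M // 2 + 1))
    for i, p in enumerate(starts):
        segment = w * x[p:p + M]
        S[i] = T * np.abs(np.fft.rfft(segment)) ** 2
    return times, freqs, S
```

Each row of `S` is the sample spectrum of one segment; for unit-variance white noise sampled at `T = 1` the expectation value of every element is 1, which gives a quick sanity check of the normalisation.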
The statistics of the sample spectrum were analyzed in detail in [7], where it was demonstrated that the elements of the sample spectrum follow a Gamma distribution if the elements of the time-domain stochastic process are independent and Gaussian distributed. The statistical properties of the elements of the spectrogram can be obtained in analogy with the results presented in [7] for the sample spectrum. In particular, when the elements of the noise time series are independent and Gaussian distributed, the elements of the spectrogram follow a Gamma distribution:
$$ f\left(z; k, \theta\right) = \frac{z^{\left(k-1\right)} e^{-\frac{z}{\theta}}}{\theta^{k} {\Gamma}\left(k\right)}. $$
(2)

where \(k = \nu/2\), \(\nu = 2\), \(\theta = 2\lambda\), \(\lambda = E\left [S\left (t_{i},f_{j}\right )\right ]/\nu \), \(z = S\left (t_{i},f_{j}\right )\) and \(E\left [S\left (t_{i},f_{j}\right )\right ]\) is the expectation value of the spectrogram element.
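As a quick check of equation (2), substituting \(\nu = 2\) gives \(k = 1\), \(\theta = 2\lambda = E\left [S\left (t_{i},f_{j}\right )\right ]\) and \({\Gamma}\left (1\right ) = 1\), so the Gamma density reduces to an exponential one. The shorthand \(\mu \) for the expectation value is introduced here only for readability and is not part of the original notation:

$$ f\left(z; 1, \mu\right) = \frac{1}{\mu} e^{-\frac{z}{\mu}}, \qquad \mu = E\left[S\left(t_{i},f_{j}\right)\right]. $$

Each spectrogram element therefore has mean \(\mu \) and variance \(\mu ^{2}\), and the rescaled variable \(2z/\mu \) follows a chi-square distribution with \(\nu = 2\) degrees of freedom.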

It is worth noting that the statistics of the sample spectrum at each frequency bin depend on the expectation value at that frequency (equation (2)); therefore the random variables that constitute the sample spectrum do not share the same cumulative distribution. As a consequence the Kolmogorov-Smirnov test is, in principle, not applicable to the sample spectrum in a given frequency band, which is the case we are interested in here. If instead we consider a normalized spectrum, obtained by dividing the sample spectrum by its expectation value (pre-whitening), the statistics of each frequency bin become the same and equation (2) simplifies to a chi-square distribution. Under these conditions the Kolmogorov-Smirnov test can be applied to the data in a given frequency band. All these considerations are valid in the case of vanishing correlation among the different elements of the spectrogram \(S\left (t_{i},f_{j}\right )\). In practice the calculation of the spectrogram introduces two types of correlation affecting \(S\left (t_{i},f_{j}\right )\), along the frequency axis \(f_{j}\) and along the time axis \(t_{i}\) respectively. The correlations along the frequency axis are introduced by the data windowing process, which naturally correlates different frequency bins since it is a convolution in the frequency domain between the data and the window function. The correlation as a function of the frequency bin separation \({\Delta} f\) can be written as [8]:
$$ R\left({\Delta} f\right) = \left|\sum\limits_{h=0}^{M-1}{w_{h} e^{- \imath 2 \pi {\Delta} f h T}}\right|^{2}. $$
(3)
The result of applying equation (3) to a selection of window functions [9] is reported in Fig. 1a. It is worth noting that for the rectangular window the correlations are negligible on the standard Fourier frequency grid, in accordance with the well-known result of Fourier theory [8]. The Blackman-Harris window is instead the worst performing of the set, but it remains one of the most appealing window functions for application to high dynamic range signals thanks to its efficiency in suppressing spectral leakage. Correlations along the time axis are introduced by the overlap of the data segments in the spectrogram estimation; such correlations can be calculated as [8]:
$$ Q\left(k\right) = \frac{1}{2}\left|\sum\limits_{h=0}^{M-1}{w_{h} w_{h + k}} \right|^{2}. $$
(4)
where \(k\) is an overlap shift factor. The expected values for the different windows of our set are reported in Fig. 1b, where it can be seen that the Blackman-Harris window is the best performing in terms of suppressing correlations for a given overlap. This property is particularly advantageous for spectrogram estimation since it allows a finer time grid to be obtained without excessively increasing the degree of correlation between the different time bins.
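Equations (3) and (4) can be evaluated directly for a given window, as in the sketch below. Several choices here are assumptions rather than statements from the text: \(k\) is interpreted as the shift in samples between two segments, the window is taken to vanish outside its \(M\) samples, the Blackman-Harris window is the minimum 4-term form with the coefficients tabulated in [9] written in its periodic (DFT-even) version, and all function names are hypothetical.

```python
import numpy as np

def blackman_harris(M):
    """Minimum 4-term Blackman-Harris window, periodic (DFT-even) form."""
    a0, a1, a2, a3 = 0.35875, 0.48829, 0.14128, 0.01168
    arg = 2.0 * np.pi * np.arange(M) / M
    return a0 - a1 * np.cos(arg) + a2 * np.cos(2 * arg) - a3 * np.cos(3 * arg)

def square_normalise(w):
    """Scale the window so that sum(w**2) == 1, as assumed in footnote 1."""
    w = np.asarray(w, dtype=float)
    return w / np.sqrt(np.sum(w ** 2))

def frequency_correlation(w, bin_sep):
    """R(Delta f) of equation (3) for Delta f = bin_sep / (T M).

    The sampling time T cancels because Delta f * h * T = bin_sep * h / M.
    """
    M = len(w)
    h = np.arange(M)
    return np.abs(np.sum(w * np.exp(-2j * np.pi * bin_sep * h / M))) ** 2

def overlap_correlation(w, shift):
    """Q(k) of equation (4) for two segments displaced by `shift` samples."""
    M = len(w)
    if shift >= M:
        return 0.0  # segments do not overlap
    return 0.5 * np.abs(np.sum(w[:M - shift] * w[shift:])) ** 2

if __name__ == "__main__":
    M = 1024
    for name, win in (("rectangular", np.ones(M)), ("Blackman-Harris", blackman_harris(M))):
        w = square_normalise(win)
        rel = frequency_correlation(w, 1) / frequency_correlation(w, 0)
        print(name, "R(1 bin)/R(0) =", rel, " Q(50% overlap) =", overlap_correlation(w, M // 2))
```

Dividing \(R\) by its value at zero separation gives a relative correlation that is easier to compare between windows. With these definitions the rectangular window shows a vanishing correlation between adjacent Fourier bins but the larger overlap correlation, while the Blackman-Harris window behaves the other way round, in line with the discussion of Fig. 1.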
Fig. 1

a) Correlation between the frequency bins of a spectrogram at a given time bin. The reported values refer to a series containing \(10^{5}\) data points sampled at 1 Hz. As a consequence the minimum difference between two contiguous frequency bins in the sample spectra is \({\Delta} f = 1 \times 10^{-5}\) Hz. The definition of the different windows can be found in [9]. b) Overlap correlation between contiguous time bins of the spectrogram. The different segments of the time series are assumed to contain \(10^{5}\) data points.

3 Kolmogorov-Smirnov test

Let \(X_{1}, \ldots, X_{n}\) be a set of independent random variables with cumulative distribution function \(F(x)\), and let \(\bar {X}_{1}, \ldots , \bar {X}_{n}\) be the same set sorted in ascending order. We define the empirical distribution of the sample as:
$$ F_{n}(x) = \left\{\begin{array}{ll} 0 & \textnormal{for } x < \bar{X}_{1} \\ \frac{k}{n} & \textnormal{for } \bar{X}_{k} \leq x < \bar{X}_{k+1} \\ 1 & \textnormal{for } x \geq \bar{X}_{n}. \end{array}\right. $$
(5)
As \(n \to \infty \) we expect that \(F_{n}(x) \to F(x)\). The Kolmogorov-Smirnov test provides a statistical tool to verify whether an empirical distribution is compatible with a given cumulative distribution function [1]. Moreover, the test can be used to verify whether two empirical distributions share the same asymptotic cumulative distribution function. In this case, given two empirical distributions \(F_{n_{1}}\left (x\right )\) and \(F_{n_{2}}\left (x\right )\), we test the hypothesis that they share the same cumulative distribution function \(F(x)\) by defining a distance in the space of the cumulative functions:
$$ d_{K}\left(x\right) = \left|F_{n_{1}}\left(x\right) - F_{n_{2}}\left(x\right)\right|. $$
(6)

where \(d_{K}\left (x\right )\) takes values in the interval \(\left [0,1\right ]\) and \(K = \left (n_{1} n_{2}\right ) /\left (n_{1} + n_{2}\right )\) [1]. The statistical properties of \(d_{K} = \textnormal {max}\left [d_{K}\left (x\right )\right ]\) are independent of the particular distribution \(F(x)\) being tested. This flexibility represents the major advantage of the Kolmogorov-Smirnov test and allows the test to be implemented for spectrogram data in a straightforward way.
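The two-sample statistic built from equations (5) and (6) can be computed in a few lines. The sketch below is illustrative only; the function name `ks_two_sample` is an assumption. It evaluates both empirical distributions at every observed data point, which is sufficient because step functions can only change value there.

```python
import numpy as np

def ks_two_sample(a, b):
    """Maximum distance between the empirical distributions of two samples.

    Implements d_K = max_x |F_{n1}(x) - F_{n2}(x)| with F_n defined as in
    equation (5): F_n(x) is the fraction of sample points <= x.
    """
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])                      # all possible jump points
    F1 = np.searchsorted(a, grid, side="right") / a.size
    F2 = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(F1 - F2)))

if __name__ == "__main__":
    print(ks_two_sample([1, 2, 3, 4], [3, 4, 5, 6]))
```

Identical samples give a distance of 0 and samples with no overlap in their ranges give 1, the two extremes of the interval on which the distance is defined.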

As already discussed, the statistics of the sample spectrum at each frequency bin depend on the expectation value at that frequency (equation (2)). This problem is solved if we consider a normalized (pre-whitened) spectrum, which is obtained by dividing the sample spectrum by its expectation value. In this case the statistics of each frequency bin become the same and the Kolmogorov-Smirnov test can be applied to the data. Therefore, given a normalized spectrum or white noise, the Kolmogorov-Smirnov test can be easily applied to the analysis of non-stationary noise in time-frequency maps obtained with the Fourier spectrogram technique. The expectation value of the sample spectrum that is used for its normalisation (pre-whitening) is typically not known a priori; therefore it has to be estimated from the data themselves or from a previous noise run. As a consequence the model used for the normalisation cannot be an exact representation of the expectation value but, since the same model is used for all the sample spectra corresponding to the different time bins of the spectrogram, the effect of the model inaccuracy cancels out and the test results remain reliable.

Given a spectrogram, we select the sample spectrum corresponding to the first time bin as reference and construct the reference cumulative distribution from it according to equation (5). We then compare the spectra corresponding to the other time bins with the reference using the Kolmogorov-Smirnov test as formulated in equation (6).
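The comparison against the first time bin can be sketched as below. The array layout (time bins along the first axis, frequency bins along the second), the band selection arguments and the function names are assumptions made for this illustration; the spectrogram is assumed to be already normalized by the expected spectrum.

```python
import numpy as np

def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov distance, as in equation (6)."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])
    F1 = np.searchsorted(a, grid, side="right") / a.size
    F2 = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(F1 - F2)))

def ks_versus_first_bin(S_norm, freqs, f_lo, f_hi):
    """Compare every time bin of a normalized spectrogram with the first one.

    S_norm : array of shape (n_time_bins, n_freq_bins), already pre-whitened
    freqs  : frequency of each column of S_norm
    f_lo, f_hi : edges of the frequency band of interest (inclusive)
    Returns one distance per time bin after the first.
    """
    S_norm = np.asarray(S_norm, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    band = (freqs >= f_lo) & (freqs <= f_hi)
    reference = S_norm[0, band]                 # first time bin is the reference
    return np.array([ks_distance(reference, S_norm[i, band])
                     for i in range(1, S_norm.shape[0])])
```

Each returned value would then be compared with a critical value that accounts for the window and the overlap used to build the spectrogram, rather than with the textbook value for independent samples.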

As discussed above, the application of a finite-time window to the data introduces two sources of correlation: one is connected with the convolution in the frequency domain, the other is caused by the overlap between segments. These two sources of correlation affect the Kolmogorov-Smirnov test statistic in opposite directions. As can be seen in Fig. 2, the frequency convolution tends to enlarge the possible fluctuations of the empirical distribution, and as a consequence the expected critical value (see footnote 2) is enlarged proportionally. The overlap, instead, reduces such fluctuations since the overlapping time series share a given fraction of the data points. As a consequence a large overlap correlation tends to decrease the expected critical values for the test. This can be easily seen in Fig. 3, where we report the calculated critical values at 95 % confidence for the Kolmogorov-Smirnov test, obtained with a Monte Carlo simulation over 5000 independent realizations of white noise. The critical values are shown for four different data windows as a function of the overlap and the number of samples in the data series.
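A Monte Carlo estimate of this kind can be sketched as follows. The details are assumptions and not the authors' procedure: each trial draws one Gaussian white noise series, cuts two segments of even length `M` with the requested overlap, compares their windowed sample spectra (zero-frequency and Nyquist bins excluded) and records the Kolmogorov-Smirnov distance; the critical value is then taken as the requested quantile over all trials.

```python
import numpy as np

def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov distance, as in equation (6)."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])
    F1 = np.searchsorted(a, grid, side="right") / a.size
    F2 = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(F1 - F2)))

def mc_critical_value(M, overlap, window, n_trials=5000, conf=0.95, seed=0):
    """Monte Carlo critical value of the KS distance between two sample spectra.

    M        : segment length (assumed even, so that [1:-1] drops DC and Nyquist)
    overlap  : fractional overlap between the two segments (0 = disjoint)
    window   : window samples of length M
    n_trials : number of independent white noise realizations
    conf     : confidence level (0.95 corresponds to the 95 % threshold)
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(window, dtype=float)
    w = w / np.sqrt(np.sum(w ** 2))              # square-normalised window
    shift = int(round(M * (1.0 - overlap)))      # displacement of the second segment
    distances = np.empty(n_trials)
    for trial in range(n_trials):
        x = rng.standard_normal(M + shift)
        s1 = np.abs(np.fft.rfft(w * x[:M]))[1:-1] ** 2
        s2 = np.abs(np.fft.rfft(w * x[shift:shift + M]))[1:-1] ** 2
        distances[trial] = ks_distance(s1, s2)
    return float(np.quantile(distances, conf))

if __name__ == "__main__":
    print(mc_critical_value(128, 0.0, np.ones(128), n_trials=500))
```

Because white noise has a flat expected spectrum, no pre-whitening step is needed in the simulation. Repeating the call for different windows, overlaps and segment lengths gives a table of thresholds of the kind shown in Fig. 3.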
Fig. 2

Window induced correlation (above) and Kolmogorov-Smirnov critical values (below) for four different window functions. Calculations are done assuming 0 % overlap and 5000 points in the time series

Fig. 3

Critical values for a 95 % confidence level for the Kolmogorov-Smirnov test on two spectral data series as defined in equation (6). The values are obtained with a Monte Carlo calculation over 5000 independent white noise realizations and are displayed as a function of the segment overlap percentage and the number of samples (Nsamp) in the test data; that is, the critical values are calculated for two sample spectra obtained from overlapping segments and containing Nsamp data points each

4 Application to LISA Pathfinder synthetic data

LISA Pathfinder is a controlled three-body system composed of two test masses and the enclosing spacecraft. One test mass is free falling along the principal measurement axis and is used as the reference for the drag-free controller of the spacecraft. The second test mass is actuated at very low frequencies (below 1 mHz) in order to follow the free-falling test mass. This actuation scheme provides a measurement bandwidth 1 ≤ f ≤ 100 mHz in which both test masses can be considered effectively free falling. The system has two output channels along the principal measurement axis: one measures the displacement of the spacecraft relative to one free-falling test mass and the other measures the relative displacement between the test masses. From the displacement signals an effective force-per-unit-mass, \(a_{\textnormal {eff}}\), acting on the test masses can be extracted by a data reduction procedure that projects displacement data into force-per-unit-mass using a model for the spacecraft dynamics [11]. Thanks to the common mode rejection between the two test masses, the differential force-per-unit-mass is not affected by the spacecraft noise, while it is largely dominated by test mass noise at frequencies f < 10 mHz and by the interferometer readout noise for f > 10 mHz. The test mass noise is in turn determined by the combination of different contributions such as magnetic noise, thermal gradients, test mass charging and capacitive actuation noise.

In order to simulate a case in which the test mass is affected by non-stationary noise, we generated a set of LISA Pathfinder synthetic data in which the capacitive actuation noise is characterized by a power that increases quadratically with time while the other noise sources are kept stationary. At the beginning of the time series we have the nominal capacitive actuation noise, while at the end of the time series the average noise power is 6 times the nominal one. The displacement time series at the output of the interferometer differential channel is reported in Fig. 4, together with the corresponding force per unit of mass. In the displacement time series the presence of an increasing low-frequency noise is clearly visible, while the force per unit of mass time series seems unaffected by that noise. In reality the force per unit of mass is obtained by a data reduction procedure that involves a second derivative, which largely enhances the high-frequency noise component; this component therefore appears dominant on visual inspection.
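The quadratic growth of the noise power can be illustrated with a toy model. The sketch below is not the LISA Pathfinder simulator: it replaces the capacitive actuation noise with plain Gaussian white noise, and the specific law \(1 + 5\,(t/t_{\textnormal {end}})^{2}\) is an assumption chosen only because it starts at the nominal power and reaches 6 times the nominal power at the end of the series. Function names are hypothetical.

```python
import numpy as np

def quadratic_power_gain(n, final_ratio=6.0):
    """Power gain growing quadratically from 1 to `final_ratio` over n samples."""
    if n < 2:
        raise ValueError("need at least two samples")
    u = np.arange(n) / (n - 1)                  # normalised time in [0, 1]
    return 1.0 + (final_ratio - 1.0) * u ** 2

def nonstationary_white_noise(n, sigma=1.0, final_ratio=6.0, seed=0):
    """White Gaussian noise whose variance follows quadratic_power_gain.

    The amplitude is scaled by the square root of the power gain, so the
    variance (power) of the series increases quadratically with time.
    """
    rng = np.random.default_rng(seed)
    gain = quadratic_power_gain(n, final_ratio)
    return sigma * np.sqrt(gain) * rng.standard_normal(n)

if __name__ == "__main__":
    x = nonstationary_white_noise(200000)
    print(np.var(x[:20000]), np.var(x[-20000:]))
```

In a fuller simulation this modulated term would be added to stationary contributions from the other noise sources, so that the excess seen in the total spectrum is smaller than the growth of the single non-stationary term.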
Fig. 4

Displacement time series (upper plot) as obtained at the interferometer differential channel in our simulation. These data are then processed in order to obtain the force per unit of mass time series (lower plot)

As we have already noted, the test mass noise dominates the noise budget only for frequencies f < 10 mHz; combining this range with the measurement bandwidth we get a frequency band of interest for our experiment of 1 ≤ f ≤ 10 mHz. We adopted the procedure reported in [12] for the generation of two-channel cross-correlated data series. We then converted the raw displacement time series into effective force-per-unit-mass and calculated the spectrogram of the differential force-per-unit-mass using a Blackman-Harris data window and 50% overlap between different segments. We then divided the sample spectra at each time bin by an expected model (see footnote 3) for the acceleration noise in order to obtain a normalized time-frequency map, as reported in Fig. 5.
Fig. 5

Time-frequency spectrogram of the synthetic data series. The data series represents a 3-day noise run sampled at 10 Hz. The frequency band of interest is 1 ≤ f ≤ 10 mHz, as marked in the figure. For the calculation of the spectrogram, we converted the raw data series into force-per-unit-mass and split it into 50 overlapping segments (50% overlap). The spectrogram is obtained by calculating the sample spectrum for each of the segments and then normalizing by the expected value, which is calculated assuming all the noise sources to be stationary and at their nominal values

The Kolmogorov-Smirnov test can be applied to the spectrogram data in order to perform a quantitative assessment of the noise evolution with time in the frequency band of interest. As discussed in Section 3, we use the sample spectrum corresponding to the first time bin as the reference. All the sample spectra corresponding to the other time bins of the spectrogram are then compared against the reference using the Kolmogorov-Smirnov test. The quantity \(d_{K}\) reported in Fig. 6 corresponds to the Kolmogorov-Smirnov statistic \(d_{K} = \textnormal {max}\left [d_{K}\left (x\right )\right ]\), where \(d_{K}\left (x\right )\) is defined in equation (6).
Fig. 6

Kolmogorov-Smirnov statistic as a function of time. The statistic is calculated for the time-frequency data reported in Fig. 5 in the frequency band of interest 1 ≤ f ≤ 10 mHz. We also report the threshold lines corresponding to different confidence levels; such levels are obtained by a Monte Carlo calculation over 5000 independent white noise realisations. The graph should be interpreted in the sense of a statistical test in which one rejects the null hypothesis if the test statistic is larger than the critical value. Here the null hypothesis is that the noise level in the given segment of the spectrogram is compatible with that of the first segment. If one of the threshold levels is exceeded, then the two noise levels should be considered not compatible at the confidence level associated with that threshold

As can be observed in Fig. 6, the Kolmogorov-Smirnov statistic shows an increasing trend with time, which indicates a departure from the statistics of the spectrum corresponding to the first time bin. The dashed lines on the plot are the thresholds corresponding to different confidence levels. As can be seen, all the lines are crossed as time increases, with the exception of the 99.99 % confidence line. In order to have a quantitative comparison we can look at the values of the in-band energy excess for the different time bins. In particular, at the times \(1.5 \times 10^{5}\), \(2 \times 10^{5}\) and \(2.5 \times 10^{5}\) seconds we observe an excess energy with respect to the first time bin of 12, 16 and 21%, respectively.

5 Conclusions

A procedure for the statistical analysis of time-frequency noise maps was presented and applied to LISA Pathfinder synthetic data. The procedure is based on the Kolmogorov-Smirnov test that, thanks to its flexibility, can be applied in a straightforward way to the analysis of time-frequency maps. The influence of the correlations introduced by the data windowing process was classified and quantified by means of a Monte Carlo calculation over 5000 independent realizations of a Gaussian white noise process. The application of the test to LISA Pathfinder synthetic noise data has demonstrated its capability to detect non-stationary features in the noise data series. The proposed experiment simulated a failure in the capacitive actuation hardware that introduced a quadratically increasing power term into the test mass noise time series. The test applied to a normalized time-frequency map has unambiguously demonstrated its capability to detect non-stationary behavior in noise data series. In fact, the Kolmogorov-Smirnov statistic clearly shows an evolution with time that is a consequence of the change in the power content of the noise time series. In particular we have observed, in our test, that the Kolmogorov-Smirnov statistic convincingly crosses the 95 % confidence threshold for an in-band energy excess greater than 12 %.

Footnotes

  1. \(w\) is assumed to be square normalized to 1 so that \(\sum _{i} {w_{i}^{2}} = 1\).

  2. Critical values are cut-off values that define regions into which the test statistic falls with a probability lower than α if the null hypothesis is true. α is the significance level, such that the confidence level is 1−α. The null hypothesis is rejected if the test statistic lies within this region, which is often referred to as the rejection region [10].

  3. The expected model was obtained by fitting a sample spectrum realized with all the noise sources kept stationary at their nominal values.

Notes

Acknowledgements

This research was supported by the Centre National d’Études Spatiales (CNES).

References

  1. Feller, W.: Ann. Math. Statist. 19(2), 177 (1948)
  2. Congedo, G., Ferraioli, L., Hueller, M., De Marchi, F., Vitale, S., Armano, M., Hewitson, M., Nofrarias, M.: Phys. Rev. D 85(12), 122004 (2012)
  3. Antonucci, F., et al.: Class. Quantum Gravity 28(9), 094002 (2011)
  4. Antonucci, F., et al.: Class. Quantum Gravity 28(9) (2011)
  5. Antonucci, F., et al.: Class. Quantum Gravity 28(9), 094006 (2011)
  6. Armano, M., et al.: Class. Quantum Gravity 26(9), 094001 (2009)
  7. Ferraioli, L., Congedo, G., Hueller, M., Vitale, S., Hewitson, M., Nofrarias, M., Armano, M.: Phys. Rev. D 84, 122003 (2011)
  8. Percival, D.B., Walden, A.T.: Spectral Analysis for Physical Applications. Cambridge University Press, Cambridge, UK (1993)
  9. Harris, F.: IEEE Proc. 66(1), 51 (1978)
  10. NIST/SEMATECH e-Handbook of Statistical Methods, http://www.itl.nist.gov/div898/handbook/ (2013)
  11. Ferraioli, L., Hueller, M., Vitale, S.: Class. Quantum Gravity 26(9), 094013 (2009)
  12. Ferraioli, L., Hueller, M., Vitale, S., Heinzel, G., Hewitson, M., Monsky, A., Nofrarias, M.: Phys. Rev. D 82(4), 042001 (2010)

Copyright information

© Springer Science+Business Media Dordrecht 2014

Authors and Affiliations

  • Luigi Ferraioli (1, 6; corresponding author)
  • Michele Armano (2)
  • Heather Audley (3)
  • Giuseppe Congedo (4)
  • Ingo Diepholz (3)
  • Ferran Gibert (5)
  • Martin Hewitson (3)
  • Mauro Hueller (4)
  • Nikolaos Karnesis (5)
  • Natalia Korsakova (3)
  • Miquel Nofrarias (5)
  • Eric Plagnol (1)
  • Stefano Vitale (4)

  1. APC, Université Paris Diderot, CNRS/IN2P3, CEA/Ifru, Observatoire de Paris, Sorbonne Paris Cité, Paris Cedex 13, France
  2. SRE-OD ESAC, European Space Agency, Madrid, Spain
  3. Albert-Einstein-Institut, Max-Planck-Institut fuer Gravitationsphysik und Universität Hannover, Hannover, Germany
  4. University of Trento and INFN, Povo (Trento), Italy
  5. Facultat de Ciències, Institut de Ciències de l’Espai, (CSIC-IEEC), Bellaterra, Spain
  6. Institut für Geophysik, ETH Zürich, Zürich, Switzerland
