Design and implementation of a seismic Newtonian-noise cancellation system for the Virgo gravitational-wave detector

Terrestrial gravity perturbations caused by seismic fields produce the so-called Newtonian noise in gravitational-wave detectors, which is predicted to limit their sensitivity in the upcoming observing runs. In the past, this noise was seen as an infrastructural limitation, i.e., something that cannot be overcome without major investments to improve a detector's infrastructure. However, it is possible to have at least an indirect estimate of this noise by using the data from a large number of seismometers deployed around a detector's suspended test masses. The noise estimate can be subtracted from the gravitational-wave data; a process called Newtonian-noise cancellation (NNC). In this article, we present the design and implementation of the first NNC system at the Virgo detector as part of its AdV+ upgrade. It uses data from 110 vertical geophones deployed inside the Virgo buildings in optimized array configurations. We use a separate tiltmeter channel to test the pipeline in a proof-of-principle. The system has been running with good performance over months.


I. INTRODUCTION
The detection of gravitational waves (GWs) from a binary black hole coalescence in 2015 [1] by the Advanced LIGO detectors [2] marked the start of a new era in GW astrophysics.Ever since, the Advanced Virgo (AdV) and Advanced LIGO detectors [2,3] have detected more than a hundred GW signals [4] spanning over three observing runs.Each of the observing runs were followed by a period of instrument upgrade and commissioning [5] aimed at improving the sensitivity and the duty-cycle of the detectors.The AdV detector achieved a binary neutron star range of about 60 Mpc towards the end of the third observing run which lasted until March, 2020 [6].Following this, a series of instrument upgrades were planned to achieve the Advanced Virgo Plus (AdV+) sensitivity [7].Design and implementation of an online Newtonian noise cancellation (NNC) system was one of the planned activities aimed at improving the low-frequency sensitivity of the detector during the first phase of AdV+ upgrades.From an astrophysical standpoint, improving the lowfrequency sensitivity would increase the possibilities of detecting GW signals from stellar-mass black hole mergers and also increase the rate of detection of intermediatemass binaries.Additionally, better constrained estimates of parameters like the chirp mass and effective spin of the binaries is also expected [8].
Terrestrial gravity noise also known as Newtonian noise (NN) originates due to the gravitational coupling of ambient density fluctuations to the suspended test masses of the interferometer [9].These density fluctuations can be either atmospheric due to pressure and temperature fluctuations [10,11], or of the subsurface due to the propagation of seismic waves [12,13].The former is referred to as atmospheric NN, while the latter is referred to as seismic NN.In this article, we address the cancellation strategies concerning seismic NN.
Newtonian noise is expected to be one of the major fundamental limits to the sensitivity of the AdV+ detector in the frequency band 10-20 Hz. Figure 1 shows the contribution of the several fundamental sources of noise to the AdV+ design sensitivity [7].The contribution of NN to the low-frequency sensitivity of the detector can be estimated by either using analytical models [14] or by finite element simulations of the seismic wavefield in the vicinity of the test-mass [15][16][17].Both approaches require surface-seismic array studies aimed at deciphering the dominant wave type at the site (surface or body waves) and also quantifying the contribution of each of the wave-types from the different anthropogenic sources of noise.Prior to the design of the NNC for AdV+, several surface-seismic array studies have been conducted inside each of the end buildings [16,18] and outside the interferometer arms [19].
Based on the understanding of the propagation characteristics of the seismic waves near the test-masses of the interferometer, it is possible to design an optimal surfacearray of seismometers for NNC.A first such NNC system was proposed in [20] which makes use of the correlation between the ground motion measured by seismometers near the test-masses and the main interferometer signal.The underlying principle for NNC systems makes use of the linear relation between the measured ground motion and the expected Newtonian noise in order to design a Wiener filter corresponding to each of the seismometers [21,22].Application of such a subtraction scheme was fully simulated in time domain [23].In cases when the NN originates due to a pure Rayleigh wavefield, studies by [24] have shown that NNC by even one tiltmeter would achieve NN residuals that would be limited only by the tiltmeter self-noise.The study also shows that for a more pessimistic scenario, when the seismic noise is a mixture of body and surface waves, a modest cancellation by about a factor two would be possible.However, before a noise-cancellation scheme can be implemented and tested, the positions of seismometers near the testmasses need to be determined for optimal NNC.Determination of the optimal locations of the seismometers for NNC is an optimization problem that minimizes a residual, which can be estimated by making use of the crosscorrelations between seismometers and that between the seismometers and the expected Newtonian noise.For AdV+, based on prior estimates of correlations between seismometer channels, a Particle Swarm optimizer was used to determine the optimal geometry of the NNC arrays corresponding to each of the end buildings [25].
In this paper, we present the results of the cancellation of the tilt signal measured at the North End Building (NEB) of the AdV+ detector [26] by using the seismic noise data measured by the NEB NNC array.In section II, we present the seismic wavefield characteristics at the NEB and prove that it is dominated by Rayleigh waves, a case in which the tilt signal can be used as a proxy for the expected NN.In Section III, we present the NN estimates for the AdV+ detector based on array studies and finite element simulations.In Section IV we present the optimization results that helped in designing the surface array of seismometers for the NNC system.In Section V we derive the expressions corresponding to the time-domain implementation of the Wiener filter for a Multiple-Input-Single-Output (MISO) system and detail the several signal processing steps implemented in the NNC pipeline.The noise-cancellation performance of the system when using the tilt signal as the target is presented in Section VI.Finally, we present the conclusions of our work in Section VIII.

II. SITE CHARACTERISTICS
The AdV+ NNC array comprises a total of 110 seismic sensors, with 55 sensors deployed at the Central Building (CEB), and 30 sensors each at the NEB and the West End Building (WEB).These sensors were deployed in 2020 in their optimal positions (see Section IV) with some refinements of the CEB array a year later.Each sensor  is equipped with a vertical geophone with a resonance frequency of 4.5 Hz and a data acquisition system.Sensors creating an array are connected to an SPU (signal processing unit), providing sensor communication, time synchronization, and power.The sensor data acquisition system samples the seismic signal from the geophone at 500 samples/s and sends data after time synchronization to the storage server through the SPU.SPU provides sensors with a modified ethernet standard, which uses a communication speed of 100 Mb/s and only two pairs of ethernet cables.Another pair is used to provide time synchronization pulses generated every 1 s.The last pair provides the power supply.These installations follow an initial site-characterization phase with deployments of temporary arrays in 2018 as reported in [18].Figures 2 (a), (b), and (c) show the positions of the sensors at the CEB, NEB, and WEB, respectively.In this Section, we present the amplitude and the phase characteristics of the seismic noise corresponding to the layout after the optimization studies were done.The amplitude characteristics of the seismic noise data presented here were acquired between April 01, 2023 and May 07, 2023 at a sampling frequency of 500 Hz.Seismic data corresponding to each of the geophones are first divided into 1200 s long segments, and the instrument response is deconvolved, which applies a correction to the amplitude and the phase of the seismic data and converts it from voltage to ground velocity.The (single-sided) power spectral densities (PSDs) are then computed as where X m (f ) is the Fourier transform of the deconvolved seismic data at frequency f of geophone m, and T is the segment duration.The discrete Fourier transform is calculated using a Hamming spectral window [27].The estimated spectral densities for every 1200 s segment are then used to generate histograms with a bin size of 0.5 dB (1 dB = 20log 10 (m/s/ √ Hz)).Next, the 10 th , 50 th , and 90 th percentile PSDs are extracted from the histogram.The process is then repeated for all the geophones in the NNC array.
The black, blue, and red shaded regions in Figure 3(a) show the maximum and minimum of the 10 th , 50 th , and the 90 th percentiles of the PSDs for the CEB geophones.The solid black, blue, and red curves show the average of the 10 th , 50 th , and 90 th percentile PSDs.Figures 3(b) and (c) show the same corresponding to the WEB and NEB, respectively.Seismic noise at each of the buildings vary between 3×10 −14 -10 −18 m 2 /s 2 /Hz.A spatial variability of about 20 dB is observed for frequencies between 10 -15 Hz and it increases to about 40 dB for frequencies above 20 Hz.Besides broadband noise, several peaks are observed in the PSDs.These (nearly) monochromatic peaks are observed at the rotation frequency (or their harmonics) of the fans, motors, and pumps that constitute the heating, ventilation, and air conditioning system (HVAC) [28].The HVAC system is necessary for operating the experimental equipments and the clean rooms near the test-masses.
The spatial variation of PSDs inside the end buildings can be used to point to sources of noise.However, it gives very limited information on the propagation characteristics of the noise.For that we use the phase difference between the noise measured at different geophones.Here we study the phase characteristics of the noise measured at the NEB, since we want to establish that the dominant wave type propagating at the NEB is the Rayleigh wave.
The first metric we present is the propagation velocities of the seismic waves for different frequency bands.We use a plane wave beamformer [29], and estimate the frequency-domain beampower (FDB) BP as a function of slowness p (inverse of speed) and direction of propagation ϕ, which is measured anticlockwise from an eastward direction.The FDB for N concurrent segments of seismic data measured at M geophones is estimated as where S(f ) is the matrix of cross-spectral densities with where '*' denotes the complex conjugate operator.The vector w(p, ϕ, f ) has m = 1, . . ., M components exp(−2jπf τ m (p, ϕ, f )) representing the phase delays for a plane wave to reach geophone m, and j = √ −1.The time delay for a plane wave can be expressed as τ m (p, ϕ, f ) = x m p cos(ϕ)+y m p sin(ϕ), where (x m , y m ) are  the coordinates of the m th geophone.Following equation (2), we estimate the FDB for every 1200 s stretch of data during the period April 01 -May 07, 2023 corresponding to v ∈ [100, 1500] m/s at an interval of 10 m/s and ϕ ∈ [0 • , 356 • ] (measured anticlockwise from an eastward direction) at an interval of 4 • .We divide the 1200 s of data into N = 12 segments of length 100 s, which means that the FDB is estimated with frequency bins of width 10 mHz.An average over 20 adjacent bins is subsequently calculated to reduce the data volume.Figure 4 shows the FDB averaged in the frequency band 10 -10.2 Hz over all such 1200 s windows during the entire measurement period.This band is characterized by a noise peak at 10.1 Hz, which coincides with the rotation frequency of a fan, which is a part of the air handling unit (AHU) located in the technical room north of the NEB.The direction estimate from the FDB (Figure 4) matches with the location of the AHU, and the propagation speed of about 250 m/s is evidence for a surface wave.
In order to generate statistics about the wavepropagation attributes, the velocity and direction of propagation corresponding to the maximum FDB is stored for every 0.2 Hz bin and histograms are generated using all 1200 s windows.Later, we generate the probability density functions (PDFs) of the velocity and direction of propagation.The results are shown in Figures 5(a) and (b).We observe that the seismic waves dominantly propagate with speeds between 100 -250 m/s, which is characteristic of slowly propagating Rayleigh waves.The velocity PDFs for some frequency bands (Figure 5  of the NEB, and our analysis also points to the same direction. The second metric we present are the normalized crosscorrelations C m1m2 (f ) between geophones m 1 and m 2 which is computed as for each of the N time segments, and ℜ represents the real part of a complex number.Figure 6 shows the estimated cross-correlations between all 435 station pairs at the NEB.A strong positive correlation is observed for frequencies below 2 Hz, since these frequencies are characterized by surface waves with large wavelengths [19].
For the NNC frequency band between 10-30 Hz, several peaks in the cross-correlation spectra are observed.These show both positive and negative cross-correlations and are characteristic of plane waves with wavelengths that are shorter than the array aperture (≈ 25 m for the NEB).These cross-correlations can be explained with an anisotropic plane wave (APW) model.The theoretical cross-correlation value C APW i,j (f ) between geophones located at position vectors ⃗ r i and ⃗ r j corresponding to a propagation speed v and angle of propagation between ϕ 1 and ϕ 2 is expressed as where Several frequency bands do not show the strong positive and negative cross-correlations, and can be explained as a mixture of an APW and a Gaussian correlation model.The Gaussian correlation model between stations with position vectors ⃗ r i and ⃗ r j can be expressed as where σ(v, f ) = v πf , v is the speed of the propagating wave at frequency f , and A G is the amplitude of the source.Gaussian correlation models are used to model the effect of sources of noise within the array and the effect of wave reflection and scattering, which suppress correlations over larger distances compared to APW models.An application of Gaussian correlation models to the Advanced LIGO seismic data has been shown in [30].
Figure 8(a) shows the spatial distribution of the observed cross-correlations averaged in the frequency band 11 -13 Hz.We try to model the observed crosscorrelations using a mixture of APW and Gaussian correlation models.].We observe that not all of the crosscorrelations are reconstructed accurately using these models and is testament to the complexity of the seismic field inside the Virgo buildings.
In summary, the two metrics presented for interpreting the phase characteristics of the seismic noise at the NEB point to a dominant contribution from Rayleigh waves for the noise peaks.However, for the broadband noise, analytical models cannot reconstruct the observed correlations and only a part of the seismic field can be explained with a plane wave approach.Although we did not show any results from the CEB and WEB, seismic noise characteristics at these two buildings are similar to the NEB.

III. SIMULATIONS OF VIRGO NN SPECTRA
The Virgo detector incorporates open spaces or recesses under the test masses as part of its clean room system.First calculations of Virgo's NN spectra relied on analytical equations that assume a flat surface, leaving room for improvement in accurately modelling the effects of recesses.The proper dimensions of the recesses under the input and end test mirrors in Virgo's central and end buildings were taken into account for NN estimation in [15,16].Here we summarize the main results of these studies.
To assess the impact of the recesses, simulations were performed of an isotropic distribution of Rayleigh-wave propagation directions in the vicinity of the test-masses.The speed of Rayleigh waves is an important parameter, which was taken from an analysis similar to what is shown Figure 5(a).The slower the waves (the shorter the wave length) the more effective the recess to reduce NN.Using a finite-element model, the resulting gravity perturbation caused by these Rayleigh waves was integrated.The mathematical formulation and parameters of the Rayleigh-wave field used in the simulation can be found in [14].
Simulations were done for frequencies between 5 Hz and 25 Hz.The integration over finite-element displacements, which gives rise to gravity perturbations δa(⃗ r 0 , t) at the position ⃗ r 0 of the test-mass, can be expressed as follows: In this equation, ⃗ r i represents the position of the i th finite element of volume V i , ⃗ ξ(⃗ r i , t) is its seismic displacement, and ⃗ e i is the unit vector pointing along ⃗ r i − ⃗ r 0 .The finite-element model allows us to consider both the gravity perturbations resulting from vertical surface displacement and the compression or decompression of the underlying ground medium by summing over these effects.
In the estimation of the total NN with contributions from four test masses, as presented in Figure 9, the assumption was made that the NN in the 5-25 Hz band is uncorrelated between the test masses.This assumption is certainly valid for NN correlations between the two test masses of an interferometer arm, but it might not be valid between the two 3 km distant test masses inside the CEB.Compared to earlier analyses, it was found that the recess causes NN to be reduced by about a factor of 4 within the frequency range 10-20 Hz when accounting for the observed Rayleigh-wave dispersion.
FIG. 9. Comparison of Virgo NN estimates summing contributions from all four test masses as published in [16].For reference, the low-noise models of the O4 and O5 observation runs are shown [31].The blue curve represents the NN for a flat surface.The NN estimate with constant velocity of 250 m/s is shown in green.The NN estimate considering the observed Rayleigh-wave dispersion is shown in red.
Accordingly, seismic NN is expected to largely fall below the sensitivity targets set for the upcoming observation runs, O4 and O5, with the exception of a few peaks.However, this analysis does not consider the contribution of NN transients associated with stronger transient waves of the seismic field.In fact, given the highly non-stationary character of the seismic field, one should expect frequent perturbation of Virgo data by NN transients in O5 [32].

IV. SYSTEM DESIGN
The main goals of a NNC system design are to reduce NN in the GW detector data to an acceptable level and to do so reliably over months and years of operation.NNC systems will typically require a large number of sensors, which means that sensor failures and problems with data quality, e.g., introduced by electromagnetic disturbances or by the data-acquisition system, must be extremely rare.A single sensor whose data quality degrades for some reason while the NNC system is operational can spoil the NNC performance.
Concerning the NNC performance, as long as the goal is to reduce the NN spectral density (instead of subtracting NN transients, which is a different problem not specifically addressed with the current Virgo NNC design), it is determined by the correlations between all the seismometers and between seismometers and the Virgo GW channel, and by the signal-to-noise ratios of the seismic measurements.The correlations depend on what is actually measured, e.g., horizontal or vertical seismic displacement.If NN is not (yet) observed, the correlations between seismometers and GW detector data must be modeled.We describe below how these correlations are used to calculate optimal array configurations.
Finally, it must be decided what type of noisecancellation filter is used.In section VI, we will present results for a time-invariant, finite-impulse response (FIR) Wiener filter, but adaptive Wiener filters, Kalman filters and other types of time-variant filters can be considered.In fact, we will argue in section VI and VII that adaptive Wiener filters should not only be expected to achieve better average NNC performance, but also to solve practical issues.

A. Instrumentation
Assuming that seismic NN is dominated by contributions from Rayleigh waves, the natural choice for efficient NNC would be to monitor seismic displacements along the horizontal directions of the interferometer arms since this would lead to higher correlations between seismometers and seismic NN compared to vertical seismometers [24].However, the seismic sensors deployed for the NNC system at Virgo are vertical geophones.The reason is that while our array measurements presented in section II provide strong evidence that Rayleigh waves make the dominant contribution to vertical surface displacement and seismic NN, horizontal surface displacement is expected to have significant contributions from Love waves.Love waves can only produce NN through inhomogeneous geology and non-planar surfaces, which means that in the presence of a dominant Rayleigh-wave field, the main effect of Love waves is to reduce correlations between horizontal seismometers and seismic NN.It is an immense benefit for NNC to deal with only one type of seismic wave [33], and so the choice of vertical geophones measur-ing Rayleigh-wave displacement is justified.As a note, a tiltmeter was deployed at the NEB to investigate a potential utilization for NNC [24,26].It must be emphasized that the discussion so far is accurate only if the surface is flat, which is not the case at Virgo around its test masses [15].It has never been analyzed whether a combination of horizontal and vertical sensors or tiltmeters might lead to improved performance of the Virgo NNC system.
Another technical design choice was to digitize the geophone data at the sensors, and to send the digitized data to a central data-acquisition unit.The rationale for this decision was that it avoids excess noise from ambient electromagnetic fluctuations coupling into cables and connectors transmitting the seismic data.This design comes with its own risks.For example, the timing and digitization of data at the sensors is a source of noise, and in fact, during the commissioning of the arrays, data-quality issues were noticed and eventually solved by modifying the sensor housing (increasing distance and adding EM shielding between geophone and digitizer).Also, receiving and packaging timed digitized data from many sensors is a complex operation, which can fail.Issues with this operation were identified in the early phase of the commissioning of the NNC system and had to be solved.The only remaining known data-quality issue is coming from loss of a few samples per day created by the digitizers of the individual sensors.However, these sample losses have a negligible impact on noise-cancellation performance.

B. Optimizing the array configuration
Virgo presents a complicated structure: the ground is not homogeneous and there is a basement under each test mass whose floor is 3.5 m below the surface.At the end buildings, the walls of the basement are disconnected by a thin gap (5 cm) from the main building floor, which reduces the transmission of external seismic disturbances [18].The entire structure supported by the basement is called tower platform and it is anchored with 52 m deep pillars to a more stable gravel layer beneath the clay (there are many gravel layers, which alternate with clay in the substrate of soil beneath Virgo [34]).These pillars are meant to prevent the sinking of the basement.This complex structure and the presence of local seismic sources entail a seismic field that is not describable with analytical models.Therefore, unlike LIGO, finding the optimal array to cancel NN in Virgo is not a trivial task and it is necessary to rely on measured seismic correlations [30].Correlation measurements can only be done between a finite set of points on the surface, and the full correlation function between any two points of the seismic field needs to be properly reconstructed.To search for the optimized array configuration two things are necessary: the reconstructed correlation function between seismometers and the correlation vector between seismometers and GW detector noise.The latter can be either modeled or measured.
In the following, we summarize the main results of [25].Any optimization needs a cost function to be minimized.For NNC, a commonly used cost function is the spectral density of the residual noise E(f ) left after NN subtraction in the GW data.Expressed as a relative reduction of the NN spectral density N (f ), the cost function can be written as [20]: where S(f ) is the seismometer cross-power spectral density matrix of the seimometers and P(f ) is the crosspower spectral density vector of seismometers and seismic NN.
The optimization can be performed by minimizing −P † (ω)S −1 (ω)P(ω), which is the term depending on the positions of the seismometers.The optimization for the NNC has to find the global minimum of R(f ) with respect to the seismometer positions.This can be done with a stochastic optimization algorithm, such as Particle Swarm or a Genetic Algorithm.At each step of the stochastic optimization, the value of the cost function (the residual) relative to randomly sampled array configuration is evaluated.The next configuration is then chosen following some criteria, which depend on the chosen algorithm [35][36][37].This means that the optimizer must be able to calculate the residual, and therefore S(f ) and P(ω), for any possible array configuration.
Site-characterization measurements provide correlations only between a finite set of seismometer positions.Some form of interpolation of the correlation measurements is needed to carry out an array optimization.Standard interpolation techniques (linear, cubic, spline) are not accurate enough and in any case cannot be used to extrapolate to sensor positions outside the convex hull of the site-characterization array.More sophisticated Bayesian methods are computationally very expensive.A solution to this problem was presented in [25].Instead of performing an interpolation of measured correlations S(x i , y i , x j , y j , f ) between sensors i, j, it is possible to interpolate the Fourier transform of the signals recorded by all seismic sensors, which only depends on two coordinates.It is then possible to evaluate S(x i , y i , x j , y j , f ) for any pair of sensor positions by exploiting the convolution theorem (see [25] for further details).To further accelerate the optimization process, one first evaluates S(x i , y i , x j , y j , f ) on a denser grid with the method discussed above, and then uses a standard interpolation technique for a rapid evaluation of S(x i , y i , x j , y j , f ) for arbitrary sensor locations.One thereby obtains a surrogate model of seismic correlations, which are also required for a model of P(ω) [14].This means that the full cost function R(f ) is now given as a surrogate model and optimization can be performed.The results of such an optimization are shown in Figure 2. The optimization is performed at a specific frequency for an arbitrary (but fixed during optimization) number of sensors.It is also possible to perform an array optimization for broadband NNC.This can be done by building a cost function, for example, as a sum of residuals at different frequencies or by minimizing the maximum of residuals at different frequencies.

V. WIENER FILTERING
The filtering of seismic data for NNC in the time domain is formulated as a MISO system where the multiple inputs comprise the data from the geophones (reference channels) and the target is the GW channel of the Virgo detector, or in our case, for test purposes, the tiltmeter signal measured at the NEB (see section VI).Given the linear relation between the measured seismic data and the expected NN and since the cost function is quadratic in the residuals, the objective is to estimate the optimal linear filter, i.e., Wiener filter, mapping samples of the geophones to a combined NN estimate.Following the Wiener theory [38], the k th sample of the error signal in the time domain is expressed as, where y(k) is the target signal, ) is a column vector of the past L samples of the data measured at each of the M reference channels, and ⊤ is the transpose sign.Hence, every element of the vector x(k) is a column vector of the form x m (k) = [x m (k), x m (k − 1), ..., x m (k − L + 1)] ⊤ (L×1) .Similarly, every element of can be expanded as . Thus, given the past L samples of the reference data, the optimal impulse response per reference channel can be used to estimate the present sample of the target signal.The optimal set of impulse responses is obtained by solving ∂E{e 2 (k)}/∂h(k) = 0 (M L×1) , and it yields The matrix S and row vector P can be expressed as , and (11) Each submatrix Φ ij of the block matrix S can be further written as, where c ij (τ ) is the cross-correlation between the reference data measured at the i th and j th channels corresponding to the time lag τ .Each element Ψ ym of the row vector P can be expanded as, ].It should be noted that the cross-correlations c ij and c ym used in the Wiener filter calculation are typically averaged over a day of data.This makes the crosscorrelations less sensitive to the temporal variability of the seismic data, and the performance of the Wiener filter becomes more robust.
The NNC system aims at removing the contribution of seismic NN in the frequency band 10-30 Hz.Hence, for the real-time implementation, several stages of signal preconditioning are implemented to each of the reference channels before the Wiener output ŷ(k) = h(k)x(k) can be subtracted from the target channel.At the first stage, the reference data acquired at a sampling frequency of f R = 500 Hz are decimated to f D = 100 Hz by using a Hamming window FIR (Finite Impulse Response) lowpass filter of order M A = 100 and stopband frequency f S ≥ 35 Hz.It is important to note that a M th order FIR filter has (M + 1) coefficients.At the second stage, the 100 Hz reference data are high-pass filtered using a Hamming window FIR filter of order M B = 50 and passband frequency f p ≥ 10 Hz.At the third stage, the Wiener filter of order L = 100 is applied to the low-pass and high-pass filtered data.However, before the Wiener output can be subtracted from the target data, it needs to be upsampled to the sampling frequency of the target data f T = 10 kHz (for the tiltmeter signal) or 20 kHz (for the Virgo strain signal).At the final stage, in order to remove the aliasing effect in the upsampled data, the data is low-pass filtered with a Hamming window FIR filter of order M C = 5000 and f S ≥ 35 Hz.The upsampled Wiener output is finally subtracted from the target data and we produce the NN cancelled target data.
For the real-time implementation of the abovementioned steps, two things must be addressed.Firstly, a circular buffer of the reference data needs to be maintained.This is due to the fact that the application of a FIR filter of order M to a time series data produces the filtered output starting at the (M + 1) th sample of the data.Secondly, an FIR filter of order M introduces a time delay of (M/2) samples (M ∈ even positive integer), hence the Wiener output needs to be aligned in time with the target data before subtraction.The NNC application acquires data from the Virgo data stream every second and creates a buffer of N B samples per channel.The length of this buffer can be expressed as Corresponding to f R = 500 Hz, f T = 10 kHz, and the filter orders mentioned previously, a buffer of 1600 samples or 3.2 s (corresponding to f R = 500 Hz) is necessary.Hence, the first second of output is only produced at the end of the first 4 s.The process then repeats and the buffer is replenished with new data every second.Next, we align the starting point of the one-second long Wiener output which was obtained by processing the 3.2 s of the reference data.The starting sample N S of the 3.2 s long target data that aligns with the first sample of the upsampled Wiener output is expressed as Using the sampling frequencies at the different stages of processing and the respective filter orders, the starting sample equals 16001 or 1.6001 s.Hence, the Wiener output is subtracted from the 10 kHz target data between samples 16001 and 26000.This also implies that the delay introduced in the one second long NN canceled output is 6000 samples or 0.6 s.Figures 10(a)-(e) show the shifts in the data at each stage of the NNC application.

VI. PROOF OF PRINCIPLE
The performance of the NNC system was tested by using the tiltmeter signal measured at the NEB as the tar-get data and the geophone signals as the reference data.The location of the tiltmeter inside the NEB is shown with a red star in Figure 2(c).The tiltmeter was initially developed within the Archimedes experiment [39] as a beam-balance prototype and essentially functions as a rotational sensor.The resonance frequency of the tiltmeter is about 25 mHz corresponding to a center of mass positioning within 10 µm of the bending point.It is equipped with two different optical readout systems comprising a Michelson interferometer for higher sensitivities and an auxiliary optical lever capable of handling a larger dynamic range.A detailed description of the tiltmeter and an assessment of its sensitivity in a quiet seismic environment can be found in [40] and [26], respectively.Figure 11(a) shows the 10 th , 50 th , and 90 th percentiles of the PSDs of the tilt signal.These were estimated by dividing the data into 1000 s long segments and corresponds to the period May 01 -08, 2023.The measured tilt in the 10 -20 Hz band is about 10 −11 -10 −10 rad/ √ Hz and is comparable to that measured at the LIGO Hanford site [21].The sharp peaks that appear in the PSDs coincide with Rayleigh waves originating from the HVAC system of the NEB, and was previously established in section II.Consequently, strong positive or negative cross-correlations between the tiltmeter and geophone signals are observed at these peaks (figure 11 The first step in the estimation of the Wiener filter for every reference channel is signal preconditioning.Following the steps and the filter orders mentioned in section V, the tiltmeter signal is first downsampled from 10 kHz to 100 Hz, and the geophone signals are downsampled from 500 Hz to 100 Hz.Next, the data are high-pass filtered with a passband frequency ≥ 10 Hz.The mean of the 10 -35 Hz signals are then subtracted and the zeromean signals are used to estimate the cross-correlations between the geophones and that between the tiltmeter and the geophones.We use the data measured on May 01, 2023 to estimate the cross-correlations.Finally, following equation (10), the Wiener filter is estimated using all geophones as reference channels.Figure 12(a) shows the amplitudes of the Wiener filter for one of the geophone channels.Filter amplitudes for other geophones look similar.We show the unwrapped phase of filters of all geophones in the frequency domain in figure 12(b).Filters of different geophones have different phase characteristics depending on the geophone locations and capture the propagation characteristics of the seismic noise at different points inside the NEB.The Wiener filter estimated from one day of cross-correlations (May 01, 2023) is then used to reconstruct the tiltmeter signal for the next seven days (May 02 -08, 2023).The blue and red curves in figure 13(a) show the PSDs of the tiltmeter signal and the signal estimated by applying the Wiener filter to the geophone data for a 1000 s stretch.We denote the error between the measured tilt signal τ (k) and the reconstructed tilt signal τ (k) as e(k) = τ (k) − τ (k).
In the frequency domain, the noise cancellation factor in decibels is then defined as where E(f ), T (f ) represent the PSDs of the error signal e and tiltmeter signal τ at frequency f .Since the composition of the seismic field is dependent on the frequency, we further average R dB (f ) for different frequency bands of interest.Figure 13(b) shows the temporal evolution of the noise cancellation factor for five different frequency bands.These bands are chosen such that they don't overlap with the sharp spectral peaks.
The best noise cancellation factor of about 10 -15 dB is observed for the bands 13.3 -15, 15.5 -19, 25 -30 Hz during the day time.The cancellation factor is less than 5 dB during the night time.It is worth noting that the Wiener filter that is estimated using a day of data is dominated by strong noise cross-correlations that occur during the day, hence a better cancellation is observed during the day time.However, the lack of noise cancellation during the night does not add noise to the subtracted signal.The worst performance is observed for the band 21 -25 Hz, where the noise cancellation factor is slightly above 0 dB, implying that it adds little noise to the output.Similar to the broadband case, the evolution of the noise cancellation factors for the spectral peaks are shown in figure 13(c).As expected, we observe a better noise cancellation factor of more than 15 dB and is in accord with the strong Rayleigh wave content of these signals.Unlike the broadband case, little diurnal variation is observed.These signals are characterized by a strong SNR and a stationary phase, and are affected little due to interference from local transient noise sources.
The noise cancellation results shown in Figures 13(b) and (c) point to moderate temporal variation in the seismic field characteristics at the site.In particular, the noise cancellation in the band 21 -25 Hz is poor (even enhancing noise).Hence we assessed the performance of the Wiener filter for two cases.In the first case, the Wiener filter was calculated every day and applied to the same day of data.In the second case we calculated the Wiener filter every 1000 s and applied it to the same data stretch.The blue curve in Figure 14 shows that no excess noise is added to the output data for the band 21 -25 Hz when the Wiener filter was updated every day.The performance is further improved by about 5 dB in the case when the filter was updated every 1000 s.This points to variability in the origin and the propagation characteristics of the noise in this band, and that the static Wiener filter although calculated using a full day of crosscorrelations is not optimal.The variability in the direc- tion of propagation of the noise in the 21 -25 Hz band is also observed in Figure 5(b), where the histograms of the estimated direction of propagation does not point to a persistent source of noise.Although the performance of the static Wiener filter for other frequency bands is satisfactory, it must be noted that updating the Wiener filter a few times every day would further improve the cancellation performance.This pattern is also reflected in the temporal evolution of the filter amplitudes for every channel.If no variation in the amplitude and phase characteristic of the filter is observed, that would imply a stationary seismic field and the noise cancellation would not vary with time.We estimate the Wiener filter for the same week during which noise cancellation results were shown earlier.Figure 15 shows the temporal evolution of the magnitude of the Fourier transform of the Wiener filter corresponding to one of the geophones at the NEB.We observe a diurnal variation with higher amplitudes during the day.Hence, the application of a static Wiener filter which has been estimated using a certain period of the data is not the best solution when the noise varies significantly between days.In such cases the optimal filter needs to be adaptive and should be calculated for every new data sample or data stretch, depending on the needs of the cancellation system.A performance analysis of adaptive Wiener filters is beyond the scope of this work, but such schemes are currently under study and their suitability for a real-time application are being tested.

VII. PREPARATIONS FOR FUTURE PERFORMANCE IMPROVEMENTS
A question that needs to be answered as part of the risk management and design phase of the Virgo NNC system is what to do if its performance is not good enough, or not as good as expected.The sensor array, dataacquisition system, and data-processing pipeline are well characterized at this point and function as foreseen.The noise-cancellation performance with the tiltmeter as target channel is as expected, i.e, similar performance was observed at the LIGO Hanford site with a tiltmeter and temporary array deployment [21].This means that if the performance of the NNC system is not as good as expected, then the most likely explanation is that the sensors do not provide all the required information about environmental fields to do efficient noise cancellation.Site characterization, NN modeling, and array optimization were the three most important steps to predict NNC performance.Structural vibrations near the test masses were studied with vibration measurements and NN modeling with the conclusion that they make a small contribution to NN. Arrays of microphones were deployed in all buildings (more than 70 microphones in total).These microphones were planned from the beginning as part of the NNC system to cancel NN from the acoustic field [11].They turned out to be less important to NNC after the detector infrastructure team managed to reduce the level of acoustic noise by making changes to the ventilation system [41].Only after these changes, Rayleigh-wave NN became the clearly dominant predicted contribution to NN.Nevertheless, the microphones are being used for site characterization, and they might become valuable in the future to improve the NNC performance.Construction of finite-element models has begun for dynamical simulations of the seismic field, implementing all we know about the structure of buildings, surface, and geology.If NNC does not perform as expected, these simulations would provide important information about missed properties of the seismic field and how to adapt the NNC array to improve performance.
Another design modification that promises performance improvements is to switch to time-variant filters, e.g., adaptive Wiener filters.These can take the form of recursive least squares filters, or Kalman filters, etc.We have some indication that correlations of the seismic field change during the day and during the week, and implementing adaptive Wiener filters might improve performance [18].Studies are already underway to explore time-variant filters for noise cancellation and to assess their robustness and effectiveness with respect to static Wiener filters.
More important design upgrades have been discussed.As can be seen in figure 2, the distance of some of the test masses to the building walls is only several meters.The array optimization does not suggest sensor placements outside the buildings, but the calculated optimized arrays are not expected to provide a broadband NN reduction by more than a factor 3 in amplitude [25].For greater noise reduction, it might be necessary to add outdoor sensors to the array.It will also be investigated whether seismic tiltmeters can improve NNC performance as expected for sites with flat surfaces [24].
Finally, the most advanced design upgrade might come from a robotic sensor array currently under development [42].A pilot project called Flexible Grid Mapping Tool (FGMT) is being carried out at the Virgo interferometer site with the collaboration of the European Gravitational Observatory and the Gran Sasso Science Institute.The FGMT is part of the European research project AHEAD-2020.The idea is to move the array optimization from a simulated environment to the real system.The robots will move the sensor to their optimal locations, and after a data-taking phase at these positions, an improved array configuration will be calculated and the robots will move to their next locations.This process is meant to repeat until the performance of the NNC system converges to its optimum.The main challenges of this system are to manage the robot charging cycles, to navigate with high accuracy inside the buildings, to provide good ground connection of the accelerometers during measurement phases, and to realize a low-latency communication with the Virgo data-acquisition system and timing signal.

VIII. CONCLUSION
In this paper, we have presented the design and implementation of the Newtonian-noise cancellation (NNC) system for the Virgo detector as part of its AdV+ technological upgrades [7].It is the first such system in the current global network of GW detectors.The main steps that led to the design are (1) selecting sensors, (2) designing the array data-acquisition system, (3) site characterization, (4) Newtonian-noise (NN) modeling, (5) array optimization, (6) design of the noise-cancellation filter, and (7) defining data-processing steps for the online implementation.The design phase started in 2018 and was completed in 2023 soon followed by the completion of the commissioning of the system.The AdV+ NNC system consists of 110 vertical geophones whose data are digitized directly at the sensor and transmitted to a central data-acquisition unit at each of the three Virgo stations (two end buildings and the central building).These units communicate with the Virgo data-acquisition system and share its timing signal, which is propagated to all the sensors.More than 70 microphones (the number is increasing steadily due to the interest of the noise-hunting team) have been deployed as well to cancel NN from the acoustic fields.The data of these microphones are not yet included in the online NNC pipeline since acoustic NN is predicted to be a smaller contribution to the total NN.
The first implementation of the NNC pipeline uses a time-invariant, time-domain (FIR) Wiener filter.We studied its performance in a proof-of-principle with a tiltmeter as target channel.We assessed performance limitations and studied their variations with time.The Wiener filter models the PSD of the tiltmeter signal accurately above 15 Hz, but this does not necessarily mean good coherent noise-cancellation performance.For example, in the 21-25 Hz band, the cancellation performance diminishes significantly within a few hours after the data stretch used to calculate the filter.In this band, the performance does not get better again later on, which is different from the clear day-night cycle in performance seen in other bands.A careful study of this interesting observation is needed.In any case, it is to be expected that a time-variant Wiener filter will significantly improve the average noise-cancellation performance coming from these temporal changes.
According to our predictions, the AdV+ NNC design meets the requirements for a factor 3 NN reduction in average [25].The predicted performance depends on a model of seismic NN, which has limitations since the site characterization only produced measurements of vertical surface displacement.These limitations could be overcome by doing new measurements with three-axis seismometers; some of those deployed inside boreholes.Simulations based on refined finite-element models will be important as well for future improvements of NNC performance.The impact of NN transients has not been analyzed yet.While future NN observations might lead to better NNC designs, improving NNC designs beyond state-of-the-art will become an ever more challenging problem.An increasing amount of details concerning geology, topography and more extensive surveys of the seismic field, and possibly other NN contributions from structural vibrations and the atmosphere will have to be considered.The experience of the next years will be crucial to assess the true complexity of NNC also with respect to proposed next-generation detectors like the Einstein Telescope [43] or Cosmic Explorer [44].

FIG. 2 .
FIG. 2. (a) The blue and red solid circles show the positions of the geophones at level 1 and 2 of the CEB, respectively.(b) The blue solid circles show the locations of the geophones and the red star shows the location of the tiltmeter in the NEB.(c) The blue solid circles show the locations of the geophones in the WEB.Note that the origin of the coordinate system corresponds to the location of the beamsplitter at the CEB, and 'north' corresponds to the direction of the north arm of the interferometer which is oriented 20• clockwise with respect to the geographic north.

FIG. 3 .
FIG. 3. (a)The black, blue, and red shaded regions show the maximum and minimum of the 10 th , 50 th , and the 90 th percentiles of the PSDs for the CEB geophones.The solid black, blue, and red curves show the average of the 10 th , 50 th , and 90 th percentile PSDs.(b) and (c) are same as (a), but corresponds to the WEB and NEB, respectively.
(a)) show multiple peaks between 100 -250 m/s and it corresponds to the different modes of Rayleigh wave propagation.Most of the noise originates either north or south of the array (Figure 5(b)).For example, it is well known that the origin of the noise peaks at 10.1 Hz, 15.2 Hz, and 20.1 Hz is the AHU located north

FIG. 4 .
FIG. 4. Frequency-domain beampower estimated using the NNC NEB array seismic noise data in the band 10-10.2Hz showing seismic noise propagating at speeds of about 250 m/s and propagating from the north.
the amplitude as a function of source azimuth, and x, ŷ represent the unit vectors along the east and north directions, respectively.An amplitude normalization factor A P (f ) = ϕ2 ϕ=ϕ1 dϕ A APW (ϕ, f ) is applied in order to set cross-correlation values in the range [-1,1].Figure 7(a) shows the observed cross-correlations at 10.1 Hz as a function of the relative position vector between the geophones.An APW model that reproduces the observed cross-correlations using Eq.(5) with v = 250 m/s and ϕ = [80 • , 110 • ] is shown in Figure 7(b).

FIG. 6 .
FIG. 6. Normalized frequency domain cross-correlation corresponding to all the 435 station pairs at the NEB.

FIG. 7 .
FIG. 7. (a) Observed cross-correlations between geophone pairs at the NEB shown as a function of the relative position vector between them at a frequency of 10.1 Hz.(b) Theoretical cross-correlations at a frequency of 10.1 Hz computed using Eq. 5 corresponding to a velocity of 250 m/s and direction of propagation between 80 • -100 • .
FIG. 10.(a) A test signal sampled at 500 Hz composed of Ricker wavelets with peak frequencies 20 Hz, 40 Hz, and 150 Hz.The double arrow shows the length of the buffer.(b) Low-pass filtered with a Hamming window FIR filter of order MA = 100, fS ≥ 35 Hz, and decimated to 100 Hz.(c) High-pass filtered signal with a Hamming window FIR filter of order MB = 50, and fp ≥ 10 Hz.(d) Wiener filtered signal with L = 100.(e) Wiener output upsampled to 10 kHz and low-pass filtered using a a Hamming window filter of order MC = 5000, and fS ≥ 35 Hz.At each stage of processing, the signals are delayed and reduced in length which equals the filter order.
(b)).The broader peaks centered at 15, 17, 20, 27, and 34 Hz, show moderate correlations between 0.2 and 0.4.The frequency-domain crosscorrelations shown in figure 11(b) are estimated using equation (4), and by dividing an entire day of data into 30 s long segments.Consequently, the minimum value of significant cross-correlation is 1/ √ 2880 ≈ 1.81 × 10 −2 , which is represented with the red dashed line in the figure.

FIG. 11 FIG. 12 .
FIG. 11.(a) The blue solid line shows the 50 th percentile of the PSDs measured by the tiltmeter during the period May 01 -07, 2023.The 10 th , and 90 th percentiles are shown with the blue band.(b) Normalized cross-correlations between the 30 geophones at the NEB and the tiltmeter for the frequency band 10 -35 Hz corresponding to May 01, 2023.The red dashed lines show the level of significant cross-correlation.

FIG. 13 .
FIG. 13.(a) The blue and the red curves show the PSDs of the tiltmeter signal and the signal reconstructed by applying the Wiener filters on the geophone data.(b) Noise cancellation in units of decibels achieved for the tiltmeter signal corresponding to five frequency bands.(c) Noise cancellation in units of decibels achieved for the tiltmeter signal corresponding to four of the sharp spectral peaks.

FIG. 14 .
FIG.14.Noise cancellation in units of decibels achieved for the tiltmeter signal for the frequency band 21 -25 Hz corresponding to the two cases: i) when the Wiener filter is updated daily (blue curve), and ii) when the filter is updated every 1000 s (red curve).

FIG. 15 .
FIG. 15.Temporal evolution of magnitude of the Fourier transform of the Wiener filter corresponding to one of the geophones at the NEB.