1 Introduction

Coherence scanning interferometry (CSI) is a non-contact optical technique for measuring areal surface topography [1, 2]. CSI uses spatially extended, spectrally broadband illumination, such that interference fringes occur only in a small region around the surface along the axial direction, depending on the coherence length of the source and the numerical aperture (NA) of the microscope objective [3, 4]. CSI can measure a variety of surface types, from optically smooth to rough, as well as surfaces that have large height variations or discontinuities, without the 2π ambiguity that can occur with phase-shifting interferometry [1, 3]. CSI has found broad application in the semiconductor, optics, biomedical, automotive, and aerospace industries, for example in the measurement of fuel injection systems [5] and additively manufactured components [6, 7], as well as in applications that involve transparent film structures and dissimilar materials [8, 9]. Surface measurements with nanometer or even sub-nanometer accuracy are required in some applications, as is the case for extreme-ultraviolet lithography [10]. In other applications, it is desirable to measure the surface of a part directly on or close to the production line on the shop floor, e.g., on-machine and in-process surface metrology, as in the precision manufacturing of micro lenses [11].

CSI is a wide-field imaging technique. The images that contain interferograms are sequentially recorded by a camera during the axial scanning process. Each pixel of the camera records a low-coherence interference signal, from which the surface height is calculated corresponding to the lateral position defined by that pixel. Noise in the interferogram at each pixel is translated into height variations in the measured surface topography and is an influence factor that contributes to the uncertainty of surface measurement [12, 13]. Weak signals below the digitization limit can also result in lost or missing data points.

Random topographic measurement noise in CSI in an ideal environment is ultimately limited by the electronic noise in the camera [1, 3]. In practice, it is also important to consider environmental noise generated by floor vibrations, air turbulence, temperature fluctuations, and acoustics [13, 14]. Other sources of noise originate from the positioning uncertainty of the piezoelectric scanner [15] and from fluctuations in the intensity of the light source [16]. Although CSI can measure flat surfaces with sub-nanometer precision along the direction of surface height [17], the topographic measurement noise may increase when measuring steeply sloped surfaces [18, 19], or surfaces with low reflectance or significant surface roughness [20, 21]. Low-pass, field-averaging filters can reduce topography noise; however, this practice compromises lateral resolution, and filtering is ineffective at recovering lost data points attributable to weak signals [22].

In this work, we focus on two physical methods to improve the signal strength and to reduce topographic measurement noise in CSI at the expense of longer data acquisition times: (1) averaging a sequence of repeated surface topography measurements [14], here simply called averaging, and (2) sampling the raw signal data more densely during a single data acquisition, referred to in this paper as oversampling [1, 23]. Although the averaging and oversampling methods are established techniques, there is currently scarce quantitative information available in the literature to guide users as to when to use one or the other of these techniques. The literature is equally silent as to the expectations for improvement for specific environments and part types. It can come as a surprise, for example, that under certain conditions, oversampling of the interferometry signal has no tangible effect on the random noise in the final topography map. In other situations, averaging the topography improves the noise level but does little to capture more information from weakly reflecting surface areas.

The goal of the present work is to fill a gap between the theoretical benefits of noise reduction methods and the practical application of these methods in real-world circumstances. An effective illustration of the differences between averaging and oversampling is obtained by measuring a flat surface at different tilt angles. We provide an explanation for the different behaviors observed in the two methods when the surface is tilted. We also developed a simple model of the measurement that takes the environment-induced vibration into account. This paper provides guidance for CSI users in choosing the appropriate method to optimize measurement accuracy.

2 Methods

2.1 Topography Averaging

A measured surface topography map \(M\) as a function of the spatial coordinates \(\left( {x,y} \right)\) contains the topographic information \(S\left( {x,y} \right)\) and the noise contribution to surface height \(f_{n} \left( {x,y} \right)\), i.e.

$$M\left( {x,y} \right) = S\left( {x,y} \right) + f_{n} \left( {x,y} \right).$$
(1)

Here we assume that the systematic measurement errors are either negligible or can be completely corrected, and the distribution of \(f_{n} \left( {x,y} \right)\) has a zero mean. Moreover, the surface measurement is assumed to be an ergodic random process, which means the ensemble average and time average are equal.

The averaging method is based on the creation of a mean surface topography map \(\overline{M} \left( {x,y} \right)\) from a sequence of \(N\) repeated surface topography measurements \(M_{i} \left( {x,y} \right)\), each acquired at the same position on the sample, one after another in quick succession:

$$\overline{M} \left( {x,y} \right) = \frac{{\mathop \sum \nolimits_{i = 1}^{N} M_{i} \left( {x,y} \right)}}{N}.$$
(2)

Substituting Eq. (1) into Eq. (2), and considering that the surface is unchanged during the measurement, we have

$$\overline{M} \left( {x,y} \right) = S\left( {x,y} \right) + \frac{{\mathop \sum \nolimits_{i = 1}^{N} f_{ni} \left( {x,y} \right)}}{N}.$$
(3)

The variance of the difference between the mean surface map \(\overline{M}\) and the true topography \(S\) gives the variance of the residual surface height deviation,

$${\text{Var}}\left[ {\overline{M} \left( {x,y} \right) - S\left( {x,y} \right)} \right] = {\text{Var}}\left[ {\frac{{\mathop \sum \nolimits_{i = 1}^{N} f_{ni} \left( {x,y} \right)}}{N}} \right].$$
(4)

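Assuming that the noise source is unchanged and that the noise contributions \(f_{ni}\) of the individual measurements are statistically independent, we have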

$${\text{Var}}\left[ {\frac{{\mathop \sum \nolimits_{i = 1}^{N} f_{ni} \left( {x,y} \right)}}{N}} \right] = \frac{{{\text{Var}}\left[ {f_{n} \left( {x,y} \right)} \right]}}{N}.$$
(5)

This result can be understood by referring to the central limit theorem [24]. Taking the square root of Eq. (5), the measurement noise of the mean surface topography is calculated as the standard deviation of the residual surface height deviation,

$$\sigma = \frac{{\sigma_{n} }}{\sqrt N }$$
(6)

where \(\sigma_{n} = \sqrt {{\text{Var}}\left[ {f_{n} \left( {x,y} \right)} \right]}\) is the topographic noise of a single surface measurement, i.e., \(N = 1\). The topographic measurement noise is thus inversely proportional to the square root of the number of measurements [25].

Since the total data acquisition time for a given averaged measurement is equal to the acquisition time \(t_{0}\) for a single data acquisition multiplied by \(N\), the topographic measurement noise will be reduced at the expense of longer data acquisition times. Consequently, the topographic noise \(\sigma\) will have an inverse square root dependence on the data acquisition time, i.e.,

$$\sigma \propto \frac{1}{\sqrt t }$$
(7)

where \(t = N \cdot t_{0}\) is the total data acquisition time. This behavior is a well-known characteristic of electronic devices and distance sensors [26].
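The \(1/\sqrt N\) scaling of Eq. (6) can be checked numerically with a minimal sketch. It assumes zero-mean Gaussian noise that is independent between repeated measurements and between image points. The function name `averaged_noise`, the map size, and the 0.2 nm single-measurement noise are illustrative choices and are not taken from the instrument software.

```python
import numpy as np

def averaged_noise(sigma_n, n_maps, shape=(200, 200), seed=0):
    """RMS residual of the mean of n_maps simulated maps of a flat surface."""
    rng = np.random.default_rng(seed)
    surface = np.zeros(shape)                       # true topography S(x, y)
    # M_i = S + f_ni, with independent Gaussian noise (assumption)
    maps = surface + rng.normal(0.0, sigma_n, size=(n_maps, *shape))
    mean_map = maps.mean(axis=0)                    # Eq. (2)
    return float(np.std(mean_map - surface))        # residual noise, Eq. (6)

# Compare the simulated residual with sigma_n / sqrt(N)
for n in (1, 2, 4, 8):
    print(n, averaged_noise(0.2, n), 0.2 / np.sqrt(n))
```

If the noise were correlated between successive measurements, the simulated residual would fall more slowly than the prediction of Eq. (6).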

2.2 Oversampling

In CSI, a digital camera captures the height-dependent interference data during an axial scan of the interference objective with respect to the object surface. A typical interferogram recorded by a pixel is shown in Fig. 1a. Usually, four sample points are used for sampling a fringe [27], equivalent to four camera frames per fringe and a sampling distance of λ/8.

Fig. 1 CSI signal acquired (a) with a typical sampling rate of four camera frames per fringe and (b) with oversampling at eight frames per fringe (\(N = 2\))

When using oversampling, the number of camera frames per fringe along the axial direction is increased by sampling the fringe at smaller phase increments (see Fig. 1b) [1]. Reducing the sampling distance means a slower scan speed. In the case of oversampling, the integer \(N\) is the “oversampling factor” and increases the total data acquisition time in such a way that \(t = N \cdot t_{0}\). The number of camera frames per fringe is then \(N\) times the number of camera frames corresponding to a data acquisition that takes \(t_{0}\) to complete.
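As a short worked check of the sampling distance, assume (as is usual in reflection interferometry, though not stated explicitly above) that one fringe corresponds to a change in surface height of \(\lambda /2\). The axial sampling distance \(\Delta z\) for an oversampling factor \(N\) is then

$$\Delta z = \frac{\lambda /2}{4N} = \frac{\lambda }{8N},$$

which reduces to \(\lambda /8\) for \(N = 1\) and gives \(\lambda /16\) for eight frames per fringe (\(N = 2\)). The symbol \(\Delta z\) is introduced here for illustration only.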

Sampling the interferogram with more sample points means the signal can be reconstructed with lower uncertainty, according to the central limit theorem. As with averaging, the measurement noise is expected to be reduced. More importantly, oversampling in CSI enhances the signal-to-noise ratio, which can significantly increase the number of valid data points for CSI measurements of surfaces with low reflectance or high roughness and steep slopes [6, 23].

2.3 Instrument and Materials

A Zygo NexView™ NX2 interference microscope performed the measurements presented in this work. The instrument was located in a metrology laboratory with a controlled temperature of (20 ± 1) °C. The specifications of the investigated objective lenses are shown in Table 1.

Table 1 Optical parameters for the investigated objective lenses

The object surface for evaluating the measurement noise was a SiC reference flat with a certified root mean square (RMS) roughness of 0.1 nm over a 40-mm aperture. The reference flat was measured at tilt angles \(\theta = 0^\circ , 2^\circ , 4^\circ\). The axial scan range was 100 µm for all the measurements. A single data acquisition was completed in approximately 7.5 s at the nominal scan speed of 13.4 µm/s. The surface topography was reconstructed using the frequency domain analysis method, which uses both the coherence envelope and the phase of the interference fringes to locate the surface [1, 2].
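As a quick consistency check of the quoted acquisition time, dividing the scan range by the nominal scan speed gives

$$t_{0} = \frac{{100\;\mu {\text{m}}}}{{13.4\;\mu {\text{m/s}}}} \approx 7.46\;{\text{s}},$$

in line with the stated value of approximately 7.5 s. This estimate ignores any overhead for scanner acceleration or data transfer, which is an assumption.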

2.4 Evaluation of Measurement Noise

In order to minimize the impact of other systematic errors in CSI, such as lateral distortion [28] and retrace error [29], we extracted the central areas of the surface topography maps for noise evaluation. The extracted area comprises 500 × 500 image points, corresponding to (779 × 779) µm for the 5.5× lens and (217 × 217) µm for the 20× lens.

The topographic measurement noise was estimated by the subtraction method [14, 30]. The subtraction method requires two surface topography maps, \(M_{1}\) and \(M_{2}\), respectively, each acquired at the same position on the sample with the shortest possible time difference between measurements. The resulting difference topography map \(\Delta M = M_{1} - M_{2}\) should only contain information about the measurement noise. A high-pass areal Gaussian filter [31] with a cut-off spatial wavelength of 80 µm is applied to \(\Delta M\) to reduce the residual systematic form errors. The experimental topographic measurement noise \(\sigma_{\text{e}}\) is calculated as the standard deviation of \(\Delta M\), which is then divided by the square root of 2, thus

$$\sigma_{\text{e}} = \frac{1}{\sqrt 2 }\sqrt {\frac{1}{{L_{i} L_{j} }}\mathop \sum \limits_{i = 1}^{{L_{i} }} \mathop \sum \limits_{j = 1}^{{L_{j} }} \Delta M\left( {x_{i} ,y_{j} } \right)^{2} } ,$$
(8)

where \(L_{i}\) and \(L_{j}\) are the total numbers of image points in the \(x\) and \(y\) directions, respectively.
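The subtraction method of Eq. (8) can be sketched in a few lines. This is a minimal illustration, not the evaluation code used for the results: the function name `subtraction_noise` is hypothetical, the two maps are synthetic, and the high-pass areal Gaussian filter is omitted, so the input maps are assumed to be free of residual form.

```python
import numpy as np

def subtraction_noise(m1, m2):
    """Estimate topographic noise from two repeated maps following Eq. (8)."""
    delta = np.asarray(m1, dtype=float) - np.asarray(m2, dtype=float)
    if delta.ndim != 2:
        raise ValueError("expected two 2D topography maps of equal shape")
    rms = np.sqrt(np.mean(delta ** 2))   # RMS of the difference map
    return float(rms / np.sqrt(2.0))     # noise of a single map

# Synthetic example: a flat surface measured twice with 0.2 nm Gaussian noise
rng = np.random.default_rng(1)
surface = np.zeros((500, 500))
m1 = surface + rng.normal(0.0, 0.2, surface.shape)
m2 = surface + rng.normal(0.0, 0.2, surface.shape)
print(subtraction_noise(m1, m2))
```

The division by \(\sqrt 2\) reflects that the variance of the difference of two independent maps is twice the variance of a single map.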

When the averaging method was used, 16 individual measurements were performed at the nominal scan speed of the instrument. N measured surfaces were averaged to calculate the mean measurement, where \(N = 2, 3, \ldots , 8\). Two sets of mean measurements were generated to evaluate measurement noise using Eq. (8). When the oversampling method was used, a series of values for the factor N were considered, where \(N = 2, 3, \ldots , 8\). For each oversampling factor, two repeated measurements were taken and the topographic measurement noise was evaluated using Eq. (8).

2.5 Noise Density

By using the concept of noise density for surface measurement, it is possible to evaluate performance independently of the scan speed and areal filtering [26]. The noise density is defined as

$$\eta_{M} = \frac{{\sigma_{\text{e}} }}{{\sqrt {P/t} }} ,$$
(9)

where \(\sigma_{\text{e}}\) is the topographic measurement noise that can be obtained using Eq. (8), P is the number of uncorrelated image points in the field of view and t is the total data acquisition time. If a 3 × 3 pixel denoising filter is used, the nine neighboring pixels will be correlated, and the number of uncorrelated image points P will be reduced by a factor of nine. From Eq. (9), we have that

$$\sigma = \frac{{\beta_{M} }}{\sqrt t },$$
(10)

where \(\beta_{M} = \eta_{M} \sqrt P\). The coefficient \(\beta_{M}\) and noise density \(\eta_{M}\) have the same unit \({\text{nm}}/\sqrt {\text{Hz}}\).
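Equations (9) and (10) can be illustrated with a short sketch. The helper names `noise_density` and `fit_beta` are hypothetical, and the input data are idealized values built from the approximately 0.2 nm single-measurement noise and 7.5 s acquisition time quoted in this paper, with all 500 × 500 unfiltered image points assumed to be uncorrelated. The least-squares estimate follows from minimizing \(\sum_{k} (\sigma_{k} - \beta_{M} /\sqrt {t_{k} } )^{2}\) with respect to \(\beta_{M}\).

```python
import numpy as np

def noise_density(sigma_e_nm, n_uncorrelated_points, t_s):
    """Eq. (9): noise density in nm/sqrt(Hz)."""
    return sigma_e_nm / np.sqrt(n_uncorrelated_points / t_s)

def fit_beta(t_s, sigma_nm):
    """Least-squares estimate of beta_M in sigma = beta_M / sqrt(t), Eq. (10)."""
    t_s = np.asarray(t_s, dtype=float)
    sigma_nm = np.asarray(sigma_nm, dtype=float)
    basis = 1.0 / np.sqrt(t_s)
    return float(np.sum(sigma_nm * basis) / np.sum(basis ** 2))

# Idealized data (assumption): noise falls exactly as 1/sqrt(N) from 0.2 nm
t0 = 7.5                          # single acquisition time in s
n = np.arange(1, 9)               # averaging or oversampling factor
t = n * t0                        # total acquisition time
sigma = 0.2 / np.sqrt(n)          # noise in nm
beta = fit_beta(t, sigma)
eta = beta / np.sqrt(500 * 500)   # beta_M = eta_M * sqrt(P)
print(beta, eta)
```

With measured rather than idealized data, the fitted \(\beta_{M}\) summarizes the whole noise-versus-time curve in one coefficient.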

3 Results Over a Range of Surface Tilts

3.1 Noise as a Function of Data Acquisition Time

We started with measuring the SiC flat at a 0° tilt angle. The topographic measurement noise shows an inverse square root dependence on the data acquisition time (Fig. 2) for both averaging and oversampling methods, in agreement with the result recently reported elsewhere [26]. The measurement noise for both objectives when \(N = 1\) is of the order of 0.2 nm. The observed measurement noise levels are almost the same for both averaging and oversampling methods.

Fig. 2 Experimental results: measurement noise for the SiC flat measured at 0°, 2°, and 4° tilt angles. Each line corresponds to the least squares fit using Eq. (10). The coefficient \(\beta_{M}\) for each fitted line is shown in the legend

The surface topography repeatability specification of the CSI instrument is 0.12 nm for a scan speed of 7.2 µm/s and 1 million image points, with a 3 × 3 pixel denoising filter engaged. A consistent comparison between this performance specification and our results, expressed in terms of noise density \(\eta_{M}\), is shown in Table 2. The small difference in noise level between the 5.5× and 20× lenses is likely attributable to differences in instrument setup and in the characteristics of the two lenses, including, for example, the fringe contrast.

Table 2 Noise density for the SiC flat measured at a 0° tilt angle

In this study, we disengaged the default 3 × 3 denoising filter of the instrument. In principle, a 3 × 3 filter reduces random pixel noise by a factor of three. Therefore, the noise density for our results is approximately three times the noise density of the manufacturer’s specification.

When measuring the flat at 2° and 4° tilt angles using simple averaging, the measurement noise shows an inverse square root dependence on the data acquisition time similar to that of the 0° tilt scenario, although at a slightly higher level. In contrast, when using oversampling in place of averaging with the flat tilted, the measurement noise no longer improves with measurement time. The observed phenomena are discussed further in the following sections.

3.2 Correlated and Uncorrelated Noise

The discrepancy in the results for averaging and oversampling is best understood by examining the difference topography maps (i.e., \(\Delta M\)). Figure 3a shows the result of subtracting two successive individual images with an oversampling factor \(N = 8\), with the sample flat at 0° tilt. Figure 3b shows the difference map for the same data acquisition parameters but with 4° tilt along the horizontal axis. Figure 3a shows essentially random noise from pixel to pixel, whereas Fig. 3b shows stripes that are clearly correlated. While we might expect the random noise to be reduced with oversampling, the patterns in Fig. 3b for a tilted flat are unlikely to be reduced and are the result of a disturbance to the scanning motion caused by vibration, acoustics, or air turbulence. To test this hypothesis, we measured the vibration separately and simulated the results.

Fig. 3 Difference between two successive individual measurements acquired using oversampling for the SiC flat at (a) a 0° tilt angle and (b) a 4° tilt angle along the horizontal axis

3.3 Evaluation of Environment-Induced Vibration

The environment-induced vibration was evaluated using the SiC flat as the reference surface. By nulling the fringes and using a carrier fringe technique [32] built into the instrument, the instantaneous position of the reference surface is determined as a function of time. The fastest camera mode available in the instrument was used for the evaluation, which allows a sampling frequency of 800 Hz.

In total, 40 repeated tests were performed at the beginning and at the end of each experimental measurement session. Each test consists of 1024 sampling intervals, recording a time lapse of 1.28 s, as shown in Fig. 4a–c. The Fourier transforms of the environment-induced vibration profiles in the time domain provide the corresponding vibration amplitude spectra (see Fig. 4d–f). The vibration amplitude spectra show a resonance spike at 84 Hz, which corresponds to the vibration produced by the cooling fans of other systems in the laboratory. Lower spikes were identified at 18, 198, and 336 Hz, possibly corresponding to the vibration of motorized equipment and machinery operating in the building.
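The conversion of a recorded vibration profile into an amplitude spectrum can be sketched as follows. This is an illustration under stated assumptions rather than the processing used for Fig. 4: the function name `amplitude_spectrum` is hypothetical, no window function is applied, and the test signal is a synthetic sinusoid at 84 Hz with an arbitrary 1 nm amplitude.

```python
import numpy as np

def amplitude_spectrum(profile, fs_hz=800.0):
    """Single-sided amplitude spectrum of a position record sampled at fs_hz."""
    profile = np.asarray(profile, dtype=float)
    n = profile.size
    spectrum = np.fft.rfft(profile - profile.mean())   # remove the static offset
    # Factor 2 folds negative frequencies into the positive half; for even n
    # this overestimates the Nyquist bin, which is ignored in this sketch.
    amplitude = 2.0 * np.abs(spectrum) / n
    freqs = np.fft.rfftfreq(n, d=1.0 / fs_hz)
    return freqs, amplitude

# Synthetic record: 1024 samples at 800 Hz, i.e. 1.28 s
t = np.arange(1024) / 800.0
record = 1.0 * np.sin(2.0 * np.pi * 84.0 * t)          # 1 nm at 84 Hz (assumed)
freqs, amplitude = amplitude_spectrum(record)
print(freqs[np.argmax(amplitude)])                     # peak lies near 84 Hz
```

With 1024 samples at 800 Hz the frequency resolution is about 0.78 Hz, so a spike that does not fall on a frequency bin is spread over neighboring bins and its peak amplitude is underestimated.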

Fig. 4 Evaluation of the environment-induced vibration. (a–c) Example measured environment-induced vibration profiles; (d–f) the corresponding amplitude spectra

3.4 Simulation of Surface Measurement and Environment-Induced Vibration

We simulate the surface measurement by considering the measured environment-induced vibration to verify the observed different behaviors of the averaging and oversampling methods for tilted surfaces. As a first step, the mean vibration amplitude spectrum is calculated by averaging the 40 vibration amplitude spectra (see Fig. 5a). Assuming there are no significant low-frequency contributions, the mean vibration amplitude spectrum is interpolated. Then, we give a random phase (distributed between − π and π) to each frequency component of the interpolated spectrum. The inverse Fourier transform of the complex-valued spectrum gives a simulated environment-induced vibration profile with increased length in the time domain. This process is repeated to generate a series of simulated environment-induced vibration profiles which have the same spectrum but are different in the time domain.
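The random-phase synthesis described above can be sketched as follows. The function name `simulate_vibration` and the Gaussian-shaped example spectrum are hypothetical. How the per-bin amplitudes should be rescaled when the record is lengthened is not specified in the text; this sketch simply keeps the interpolated single-sided amplitudes unchanged, which is an assumption.

```python
import numpy as np

def simulate_vibration(freqs_hz, mean_amplitude, n_out, fs_hz=800.0, rng=None):
    """Synthesize a vibration profile of n_out samples from an amplitude spectrum."""
    rng = np.random.default_rng() if rng is None else rng
    out_freqs = np.fft.rfftfreq(n_out, d=1.0 / fs_hz)
    # Interpolate the mean spectrum onto the finer frequency grid
    amplitude = np.interp(out_freqs, freqs_hz, mean_amplitude)
    # Assign a random phase in [-pi, pi) to every frequency component
    phase = rng.uniform(-np.pi, np.pi, size=amplitude.size)
    spectrum = amplitude * np.exp(1j * phase) * n_out / 2.0
    spectrum[0] = 0.0                                   # no static offset
    return np.fft.irfft(spectrum, n=n_out)

# Example with a made-up spectrum peaked at 84 Hz (illustrative only)
freqs = np.fft.rfftfreq(1024, d=1.0 / 800.0)
mean_amp = np.exp(-((freqs - 84.0) / 5.0) ** 2)
profile_a = simulate_vibration(freqs, mean_amp, 4096, rng=np.random.default_rng(1))
profile_b = simulate_vibration(freqs, mean_amp, 4096, rng=np.random.default_rng(2))
print(profile_a.shape, np.allclose(profile_a, profile_b))
```

Repeated calls give profiles that share the same amplitude spectrum but differ in the time domain, because only the phases change between calls.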

Fig. 5 Simulation of surface measurement. (a) The mean and interpolated vibration amplitude spectra; (b) an arbitrary simulated environment-induced vibration profile; (c) the (simulated) measured surface profile; (d) the (simulated) areal surface topography measurement

The simulated environment-induced vibration profile in the time domain is then converted to a spatial signal by multiplying its time axis by the axial scan speed of 13.4 µm/s, which gives the environment-induced vibration profile as a function of the axial scan position \(z\) (Fig. 5b).

To simulate the surface measurement, the nominal surface profile is defined by a sequence of coordinate points \(\left( {x,z} \right)\) along the horizontal and vertical axes, respectively. For a flat surface, \(z = ax\), where \(a\) is the slope and equal to \(\tan \theta\). Then, the environment-induced vibration profile is added to the nominal surface profile (see Fig. 5c).

The areal surface measurement is simulated by synthesizing the replicated simulated profiles along the \(y\) direction (see Fig. 5d). Here, we assume that the surface points at the same height position experience the same environment-induced vibration. The difference between the two methods is perhaps best understood by emphasizing the difference between noise contributions that are fully random between image points, which is the case for example by the noise contribution from the camera, and noise that is correlated over many pixels. The environment-induced vibration tends to generate correlated noise that may not be improved by lateral filtering or oversampling.

The averaging and oversampling methods were evaluated using the simulated surface measurement. The simulation helps us understand the phenomenon that was observed in Sect. 3.1. To keep the simulation consistent with the experimental method, we considered 500 × 500 surface points, which corresponds to (779 × 779) µm for the case of the 5.5× lens, and (217 × 217) µm for the case of the 20× lens. The same high-pass areal Gaussian filter (i.e., with a cut-off spatial wavelength of 80 µm) was applied to the resulting difference topography maps (i.e., \(\Delta M\)) before using Eq. (8) for the evaluation of measurement noise.

In the case of a non-tilted flat (i.e. \(\theta = 0^\circ\)), the nominal surface height would have a constant height value, e.g., \(z = 0\), and the simulated environment-induced vibration would also have a constant \(z\) value, since it is a function of the vertical scan position \(z\). Disregarding the effects of camera electronic noise, the surface topography measurement noise would be zero in the simulation of a non-tilted flat.
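The mapping from a vibration record to a simulated surface profile can be sketched as below. The function name `simulated_profile` is hypothetical, the vibration record is white Gaussian noise rather than a measured spectrum, and each surface point is assumed to pick up the vibration value at the instant the scan reaches its nominal height.

```python
import numpy as np

def simulated_profile(x_um, tilt_deg, vibration_nm, fs_hz=800.0, scan_speed_um_s=13.4):
    """Return (measured profile in nm, height error in nm) for a tilted flat."""
    x_um = np.asarray(x_um, dtype=float)
    vibration_nm = np.asarray(vibration_nm, dtype=float)
    z_um = x_um * np.tan(np.radians(tilt_deg))        # nominal flat, z = a x
    t_s = (z_um - z_um.min()) / scan_speed_um_s       # time the scan reaches each height
    t_vib = np.arange(vibration_nm.size) / fs_hz
    if t_s.max() > t_vib[-1]:
        raise ValueError("vibration record is shorter than the scan")
    error_nm = np.interp(t_s, t_vib, vibration_nm)    # vibration seen by each point
    return z_um * 1e3 + error_nm, error_nm

rng = np.random.default_rng(0)
vibration = rng.normal(0.0, 1.0, 8000)                # 10 s of synthetic vibration (assumed)
x = np.linspace(0.0, 779.0, 500)                      # lateral positions in µm
for tilt in (0.0, 2.0, 4.0):
    _, error = simulated_profile(x, tilt, vibration)
    print(tilt, np.std(error))
```

At zero tilt every point shares the same scan instant, so the error is a constant offset and its standard deviation is zero, as stated above.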

When simulating the averaging method for a flat tilted at 2° and 4° (see Fig. 6), the measurement noise results show an inverse square root dependence on the data acquisition time, in good agreement with the experimental results. When simulating the oversampling method, the measurement noise does not follow the inverse square root dependence on the data acquisition time, again similar to the experimental results.
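The contrast between the two methods for a tilted flat can be reproduced with a strongly simplified sketch. It is not the simulation used for Fig. 6: the vibration is modeled as white Gaussian noise of 1 nm standard deviation sampled at 800 Hz, camera noise and areal filtering are ignored, and the names `vibration_error`, `noise_averaging`, and `noise_oversampling` are hypothetical.

```python
import numpy as np

FS_HZ = 800.0            # vibration sampling frequency
SCAN_SPEED_UM_S = 13.4   # nominal scan speed

def vibration_error(z_um, speed_um_s, rng, sigma_nm=1.0):
    """Height error of each surface point for one scan (white vibration assumed)."""
    t_s = (z_um - z_um.min()) / speed_um_s            # time the scan reaches each height
    idx = np.round(t_s * FS_HZ).astype(int)           # nearest vibration sample
    vibration = rng.normal(0.0, sigma_nm, idx.max() + 1)
    return vibration[idx]

def noise_averaging(z_um, n, rng):
    """Subtraction-method noise for the mean of n scans at nominal speed."""
    def mean_scan():
        scans = [vibration_error(z_um, SCAN_SPEED_UM_S, rng) for _ in range(n)]
        return np.mean(scans, axis=0)
    return float(np.std(mean_scan() - mean_scan()) / np.sqrt(2.0))

def noise_oversampling(z_um, n, rng):
    """Subtraction-method noise for single scans slowed down by a factor n."""
    def slow_scan():
        return vibration_error(z_um, SCAN_SPEED_UM_S / n, rng)
    return float(np.std(slow_scan() - slow_scan()) / np.sqrt(2.0))

rng = np.random.default_rng(0)
z = np.linspace(0.0, 779.0, 500) * np.tan(np.radians(4.0))   # flat tilted at 4 degrees
for n in (1, 2, 4, 8):
    print(n, noise_averaging(z, n, rng), noise_oversampling(z, n, rng))
```

In this model each surface point samples the vibration only once per scan, so slowing the scan changes which instant is sampled but not the spread of the error, whereas averaging independent scans reduces it.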

Fig. 6 Simulation results: measurement noise for a flat surface measured at 2° and 4° tilt angles. Each line corresponds to the least squares fit using Eq. (10). The coefficient \(\beta_{M}\) for each fitted line is shown in the legend

4 Conclusions

This paper contributes to the understanding of the mechanisms of the two noise-reduction methods and compares their effects on surface topography measurement. It is clear from both experimental and simulation results that the measurement noise levels rise with increasing surface tilt and height variation.

The topography averaging method is effective for reducing all sources of noise, such as environment-induced vibration and camera noise, regardless of surface tilt. Our experiments and simulations confirm that the noise can be reduced at a rate given by the square root of the number of averages, or equivalently, by the square root of the total data acquisition time.

The signal oversampling method achieves the same noise reduction, but in the presence of vibration this holds only when the part is a flat with zero tilt.

However, these two methods are not simply competing ways to reduce measurement noise. The averaging method reduces noise, but does little to capture more data points. The oversampling method allows us to pull weak signals out of noise for each individual data acquisition, e.g., for surfaces with high slopes and roughness, and materials with low reflectivities [6]. Although for such surfaces or a tilted flat the noise reduction effect is compromised for the oversampling method, the benefit of capturing weak signals is preserved.

This knowledge is important, as CSI technologies are increasingly applied to difficult surface structures and in environments that resemble production areas more than metrology labs.