1 Introduction

Terahertz (THz) waves are noninvasive, sensitive to free carriers and biomolecular vibrations, and able to penetrate many optically opaque materials. They have been widely applied in fields including cultural relics conservation [1], agriculture [2], food security [3], communication [4, 5], and biomedicine [6]. THz imaging, often realized with a THz time-domain spectroscopy (THz-TDS) system, combines THz spectroscopy and imaging such that every spatial pixel contains a coherently detected complex spectrum. It provides multidimensional data in the spatial, temporal, and frequency domains, revealing lateral and axial properties of long-range intermolecular interaction, hydrogen bond rearrangement, and free carrier movement. As a result, THz-TDS imaging is an important technique in chemical recognition [7], medical diagnosis [8, 9], substance identification [10, 11], and defect detection [12, 13].

Due to the lack of efficient matrix detectors, raster scanning is the primary imaging mode for THz-TDS. In this mode, the THz optics or the sample is mounted on a translational X–Y stage [14, 15], and a signal is recorded at each spatial position to form an image. Apart from raster scanning, focal plane imaging, either by employing a CCD camera in the electro-optic sampling system or by using compressed-sensing techniques, can form coherent THz spectroscopic images without mechanical movement [16, 17, 18]. However, these approaches require either an intense pump laser or an efficient spatial modulator. In terms of signal-to-noise ratio, bandwidth, and field-of-view, raster scanning is still the best and most widely used mode for THz imaging.

Signals in THz-TDS are obtained by rapidly oscillating the optical delay line to sample the electric field in the time domain. The scanning speed, or equivalently the signal acquisition rate, ranges from a few hertz to tens of hertz, depending on the stage type and scanning length. As a result, the most accurate way to perform an image scan is the “move-stop” mode. In this scheme, the X–Y stage moves to each target position and stops, waits until the signal has been fully sampled, and moves to the next position. However, such a scanning scheme is very inefficient, as most of the time is spent on acceleration and deceleration without data acquisition. To obtain a high imaging speed, the continuous scanning mode is more widely used, in which the stage moves continuously along the X direction at a constant speed without a dwell time for every pixel. This requires synchronization between the optical delay line and the X stage, as shown in Fig. 1. The delay line sends a trigger signal when it starts sampling a THz signal, and the X stage accelerates immediately. Because the acceleration time will not be exactly an integer multiple of the signal acquisition time, there is a waiting range in which the X stage moves at a constant speed without signal acquisition until the next full THz signal arrives. After that, \(M\) signals are continuously recorded with \(M=\left\lfloor\frac{\left|{X}_{{\text{end}}}-{X}_{{\text{start}}}\right|}{{X}_{{\text{resolution}}}}\right\rfloor-1\). \(M\) determines the number of columns, and this definition ensures that the X stage is still moving at a constant speed after the last signal has been obtained. The stage then moves by a small amount to \({X}_{{\text{end}}}\), decelerates, stops, and moves to the next row by moving in the Y direction by the amount specified by the Y resolution.
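As a quick check of the column count defined above, the following minimal sketch evaluates \(M\) and the recording time of one row. The function names, the use of integer micrometre units, and the example values are illustrative assumptions, not settings taken from the instrument software.

```python
def columns_per_row(x_start_um, x_end_um, x_resolution_um):
    """Number of THz signals M recorded in one row.

    Positions are integer micrometres (an assumption made here so that the
    floor division is exact and free of floating-point rounding).
    """
    return abs(x_end_um - x_start_um) // x_resolution_um - 1


def row_recording_time_s(m_signals, acquisition_rate_hz):
    """Time spent recording one row when one signal is acquired per pixel."""
    return m_signals / acquisition_rate_hz


# Hypothetical example: an 11.1 mm row sampled every 0.1 mm at 40 signals/s.
m = columns_per_row(0, 11_100, 100)
print(m, row_recording_time_s(m, 40.0))
```

Subtracting one signal from the floor value is what leaves the short constant-speed margin before \({X}_{{\text{end}}}\) described in the text.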

Fig. 1

The scanning and synchronizing mechanism of THz imaging in THz-TDS systems

Two scanning modes are possible after that. In the one-way scanning mode, the X stage moves back to the start point at \({X}_{{\text{start}}}-{X}_{{\text{acc}}}\) and scans the next row in the same direction as the previous row. This method is the most reliable, as the trigger signal aligns different rows without any displacement, and it is adopted in most commercial THz-TDS imaging products. The major limitation of the one-way scan is its low efficiency: no signal is recorded while the X stage moves back to the starting position; hence, nearly half of the scanning time is wasted. In the two-way scanning mode shown in the lower part of Fig. 1, the X stage moves backward to scan the next row in the opposite direction. This is the most efficient mode, as it saves the time spent moving back to the start point. However, displacements between rows occur because the position at which the first signal of a row is recorded does not match the position at which the last signal of the previous row was recorded. The different scanning modes have been compared extensively in ref. [19], where a calibration method was also proposed; however, it requires accurate knowledge of the physical parameters of the stage and the delay line.

In practical experiments, other factors such as command sending delays or unstable stage acceleration can also lead to row displacements [19]. In addition, a precise trigger requires a synchronization connection between the optical delay line and the X–Y stage, which is not accessible in many commercial THz-TDS systems. In that case, even the one-way scanning mode results in row displacements. In this article, we present a robust displacement correction method based on a genetic algorithm (GA) for THz imaging. We verify its performance with both numerical simulation and experiments. The algorithm enables the use of the most efficient two-way scanning mode without a trigger, and it is insensitive to technical delay factors. This paper is organized as follows. The theory and algorithm design are introduced in Sect. 2. The numerical simulation and experimental verification are presented in Sect. 3. Conclusions are provided in Sect. 4.

2 Methods

2.1 Theory

According to Fourier theory, the frequency spectrum of a signal indicates the frequency components needed to compose the signal in time or space. Fast-varying signals contain more high-frequency components and vice versa. Row displacements distort the image, and a direct consequence is increased variation along the column direction (i.e., between rows). Figure 2a plots a simulated image containing a rectangle, a triangle, and a circle. The number of rows is \(N=100\) and the number of columns is \(M=110\). Random displacements were applied to all rows, and the result is shown in Fig. 2b. Figure 2c and d show the natural logarithm of the amplitude of the fast Fourier transform (FFT) spectra of all columns of Fig. 2a and b, respectively. An obvious increase in the amplitude at high frequencies is observed in Fig. 2d, verifying the prediction that row displacements increase the rapidly varying components along the column direction.

Fig. 2

Comparison of the Fourier amplitude variations induced by row displacements. a Original and b displaced images by numerical simulation. Logarithmic amplitudes of the Fourier spectra along columns for the c original and d displaced images

In light of this, row displacements can be corrected by minimizing the high-frequency amplitude of the FFT spectrum. Therefore, we define function \({f}_{1}\) in Eq. 1 to evaluate the average high-frequency amplitude of the corrected image, which is normalized to that of the original image as:

$${f}_{1}=\frac{\sum_{k\ge N/20}{A}_{{\text{corr}}}\left(k\right)}{\sum_{k\ge N/20}{A}_{{\text{org}}}\left(k\right)}$$
(1)

where

$${A}_{{\text{corr}}}\left(k\right)=\frac{1}{M}\sum_{j=1}^{M}{\left|{\text{FFT}}\left\{{I}_{{\text{corr}}}\left(j\right)\right\}\left(k\right)\right|}^{2}$$
$${A}_{{\text{org}}}\left(k\right)=\frac{1}{M}\sum_{j=1}^{M}{\left|{\text{FFT}}\left\{{I}_{{\text{org}}}\left(j\right)\right\}\left(k\right)\right|}^{2}$$

Here \(k\) is the frequency index, \(j\) is the column index, and \({I}_{{\text{org}}}\left(j\right)\) and \({I}_{{\text{corr}}}\left(j\right)\) are the original and corrected image values of the \(j\)th column. The FFT is taken along each column, and \({A}_{{\text{org}}}\left(k\right)\) and \({A}_{{\text{corr}}}\left(k\right)\) are the squared FFT magnitudes at frequency index \(k\) averaged over the \(M\) columns. \(N/2\) is the number of data points of the frequency spectrum. The sum is taken over the frequency components with \(k\ge N/20\), an empirical threshold that takes the upper 90% of the frequencies into consideration. Figure 3 shows the averaged spectral amplitudes of the aligned and the displaced images shown in Fig. 2a and b, respectively. The gray area indicates the region with \(k\ge N/20\) that is evaluated by Eq. 1.

Fig. 3

Averaged spectral amplitudes of the aligned and the displaced images shown in Fig. 2 a and b, respectively. The gray area indicates the region with \(k\ge N/20\)
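The evaluation of Eq. 1 can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, rows are shifted circularly with `np.roll` (how the paper treats pixels shifted past the row ends is not specified here), and the toy image contains only a rectangle.

```python
import numpy as np


def column_power_spectrum(image):
    """A(k): squared FFT magnitude along each column, averaged over the columns.

    Only the first N/2 frequency points are kept, as in the text.
    """
    n_rows = image.shape[0]
    spectrum = np.fft.fft(image, axis=0)[: n_rows // 2]
    return np.mean(np.abs(spectrum) ** 2, axis=1)


def f1(corrected, original):
    """High-frequency content of `corrected` relative to `original` (Eq. 1)."""
    k_min = int(np.ceil(original.shape[0] / 20))  # threshold k >= N/20
    a_corr = column_power_spectrum(corrected)[k_min:]
    a_org = column_power_spectrum(original)[k_min:]
    return a_corr.sum() / a_org.sum()


def shift_rows(image, shifts):
    """Shift row i by shifts[i] pixels. np.roll wraps around (an assumption)."""
    return np.stack([np.roll(row, int(s)) for row, s in zip(image, shifts)])


# Toy data: a rectangle on a 100 x 110 grid, displaced by random shifts in +/-3.
rng = np.random.default_rng(0)
clean = np.zeros((100, 110))
clean[20:81, 30:81] = 1.0
true_shifts = rng.integers(-3, 4, size=100)
measured = shift_rows(clean, true_shifts)
restored = shift_rows(measured, -true_shifts)

print(f1(measured, measured))  # no correction applied
print(f1(restored, measured))  # undoing the shifts lowers the ratio
```

Because \({f}_{1}\) is normalized to the measured image, a candidate correction that leaves the image unchanged scores exactly 1, and a candidate that removes the jagged edges scores below 1.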

The alignment can be done by searching for the displacement index \(x\left(i\right)\) of every row that minimizes \({f}_{1}\), where \(x\left(i\right)\) refers to the number of pixels by which the \(i\)th row was displaced. The positive or negative sign of \(x\left(i\right)\) indicates the displacement direction towards the start or the end of the row. To improve the robustness, we further define a second evaluation function \({f}_{2}\) as:

$${f}_{2}=\frac{\sum_{i=1}^{N}{\left|{\text{MovAve}}\left(x\left(i\right),11\right)\right|}^{2}}{N}$$
(2)

The key principle of this function is that the displacements \(x\) should have an average value of about 0 because of their random nature. However, we do not directly calculate the average of \(x\). Instead, we use \({\text{MovAve}}\left(x\left(i\right), 11\right)\) to calculate the moving average of \(x\left(i\right)\) over every 11 adjacent data points and use the average of the absolute values to evaluate the random distribution, where 11 is an empirical parameter. The difference from directly averaging all data points is that this method tends to reduce the average value of every 11 data points, so that every small section of the row displacements is randomly distributed. Figure 4 shows a simulated example, in which \(x\) represents a potential solution to correct a misaligned image. Overall, \(x\left(i\right)\) is randomly distributed over 50 rows with an average value very close to 0, as indicated by the dashed black line. However, the moving average of every 11 values is not always close to 0; hence, the average of the absolute value (i.e., the dashed yellow curve) will be large. The given example tends to move some sections of rows leftward and other sections rightward, which is unlikely to happen in practical situations; hence, it results in a relatively large value of \({f}_{2}\).

Fig. 4

A simulated example of displacement distribution \(x\left(i\right)\) and the comparison of direct average and moving average
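The behavior of Eq. 2 can be illustrated with a short NumPy sketch. It is written under assumptions: the function name is hypothetical, the moving average is zero-padded at both ends (`mode="same"`) because the edge handling is not stated in the text, and the two candidate displacement vectors are made up for illustration.

```python
import numpy as np


def f2(shifts, window=11):
    """Mean squared moving average of the row displacements (Eq. 2).

    mode="same" zero-pads the ends so that N values are returned; the edge
    handling is an assumption of this sketch.
    """
    x = np.asarray(shifts, dtype=float)
    moving_average = np.convolve(x, np.ones(window) / window, mode="same")
    return np.mean(np.abs(moving_average) ** 2)


# Two made-up 50-row candidates, both with an overall mean of exactly 0.
alternating = np.tile([1, -1], 25)   # changes sign on every row
sectioned = np.repeat([2, -2], 25)   # drifts one way, then the other

print(f2(alternating), f2(sectioned))
```

Both candidates pass a plain "mean equals zero" check, but only the sectioned one is penalized, which is the distinction the moving average is meant to capture.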

The total evaluation function is a weighted sum of \({f}_{1}\) and \({f}_{2}\):

$$f={f}_{1}+\alpha {f}_{2}$$
(3)

where \(\alpha\) is a weighting coefficient that balances the contribution of the two functions and is set to 0.2 in this work.

2.2 Algorithm

The image correction thus becomes a problem of searching for \(N\) unknown displacement values \(x\left(i\right)\) that minimize \(f\). This is a multivariate optimization problem, and because \(N\) is typically large, depending on the image size, it suffers from the “curse of dimensionality” [20]. Meta-heuristic algorithms are suitable for large-scale multivariate optimization. GA is one such algorithm derived from the theory of evolution [21] and has been widely applied in THz image denoising, characterization, and adaptive sampling [22,23,24,25]; hence, it is applied in this work.

Table 1 gives the steps of the algorithm. The first step of GA is initialization, which randomly assigns \(P\) groups (i.e., \(P\) populations) of displacement vectors \({x}_{p}\left(i\right)\), with \(p\) (\(1\le p\le P\)) indicating the group index. We define \(P=40\sqrt{N}\), proportional to the square root of the row number, such that the population size is automatically adjusted according to the number of rows to be corrected. \({x}_{p}\left(i\right)\) is randomly assigned within \(\pm d\), where \(d\) is an integer defining the maximum displacement. The values of \({x}_{p}\) represent the genes of a population. As our system does not provide a trigger signal to align all rows, \(d=3\) is set in the experiment to allow a larger displacement correction. The second step evaluates the populations. The original image is corrected according to \({x}_{p}\left(i\right)\) by shifting the \(i\)th row by \({x}_{p}\left(i\right)\), resulting in \(P\) corrected images \({I}_{p}\left(i,j\right)\). \({I}_{p}\left(i,j\right)\) and \({x}_{p}\left(i\right)\) are evaluated by Eq. 3, with \({f}_{p}\) representing their fitness. Ranking, crossover, and mutation are then the standard GA steps that mimic natural evolution to produce better solutions over the iterations. In detail, ranking sorts the populations in ascending order of \({f}_{p}\). Crossover randomly exchanges a part of the genes (i.e., \({x}_{p}\left(i\right)\)) between every two adjacent populations in the top half, producing a new offspring generation. The offspring and the retained top half are sent to the next evaluation loop, so that the population size is kept unchanged. Mutation randomly changes some of the genes in the offspring to maintain the diversity of the optimization, which avoids being trapped in local minima. Steps 2–6 repeat until the maximum iteration number is reached. The population with the minimum \({f}_{p}\) in the last iteration is exported as the optimal solution of the algorithm.

Table 1 Process of the displacement correction algorithm
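The loop described above (initialization, evaluation, ranking, crossover, mutation) can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' code: all function names are hypothetical, rows are shifted circularly with `np.roll`, the moving average is zero-padded, crossover is a uniform gene swap, \(P=40\sqrt{N}\) is rounded down to a multiple of 4, and the iteration count and mutation rate are arbitrary illustrative values.

```python
import numpy as np


def shift_rows(image, shifts):
    """Shift row i by shifts[i] pixels (np.roll wraps around; an assumption)."""
    return np.stack([np.roll(row, int(s)) for row, s in zip(image, shifts)])


def high_freq_power(image):
    """Sum over k >= N/20 of the column-averaged squared FFT magnitude."""
    n_rows = image.shape[0]
    spectrum = np.fft.fft(image, axis=0)[: n_rows // 2]
    power = np.mean(np.abs(spectrum) ** 2, axis=1)
    return power[int(np.ceil(n_rows / 20)):].sum()


def fitness(image, shifts, reference_power, alpha=0.2, window=11):
    """f = f1 + alpha * f2 (Eq. 3); moving-average edges are zero-padded."""
    f1 = high_freq_power(shift_rows(image, shifts)) / reference_power
    moving_average = np.convolve(shifts, np.ones(window) / window, mode="same")
    return f1 + alpha * np.mean(moving_average ** 2)


def correct_rows(image, d=3, n_iter=50, mutation_rate=0.02, seed=0):
    """Return (best shifts, best fitness recorded at each evaluation)."""
    rng = np.random.default_rng(seed)
    n_rows = image.shape[0]
    half = int(40 * np.sqrt(n_rows)) // 4 * 2  # even half of P = 40*sqrt(N)
    reference_power = high_freq_power(image)
    # Initialization: random displacement vectors within +/-d.
    population = rng.integers(-d, d + 1, size=(2 * half, n_rows))
    history = []
    for _ in range(n_iter):
        # Evaluation and ranking in ascending order of f.
        scores = np.array([fitness(image, x, reference_power) for x in population])
        order = np.argsort(scores)
        history.append(scores[order[0]])
        parents = population[order[:half]]
        # Crossover: adjacent pairs in the top half swap a random subset of genes.
        children = parents.copy()
        swap = rng.random((half // 2, n_rows)) < 0.5
        children[0::2][swap] = parents[1::2][swap]
        children[1::2][swap] = parents[0::2][swap]
        # Mutation: applied to offspring only; the retained top half is untouched.
        mutate = rng.random(children.shape) < mutation_rate
        children[mutate] = rng.integers(-d, d + 1, size=int(mutate.sum()))
        population = np.vstack([parents, children])
    scores = np.array([fitness(image, x, reference_power) for x in population])
    history.append(scores.min())
    return population[np.argmin(scores)], history


# Toy run on a small made-up image (24 rows x 40 columns).
rng = np.random.default_rng(1)
clean = np.zeros((24, 40))
clean[4:20, 12:28] = 1.0
measured = shift_rows(clean, rng.integers(-3, 4, size=24))
best_shifts, history = correct_rows(measured, n_iter=30)
print(history[0], history[-1])
```

Because the top half is carried over unchanged, the best fitness in this sketch can never get worse from one iteration to the next; how far it improves depends on the image and on the assumed iteration count and mutation rate.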

3 Verification

3.1 Simulation

We first verified the proposed method with the numerically simulated image shown in Fig. 2. The proposed algorithm was applied to the misaligned image and the result is shown in Fig. 5a. The corrected image coincides well with the original image, showing that the algorithm has successfully compensated for the displacements. Figure 5b gives the quantitative comparison between the simulated \(x\left(i\right)\) applied to the misaligned image and the values found by the algorithm (for rows with \(20\le i\le 80\), which contain amplitude variations). All values match exactly, demonstrating that the evaluation function has successfully guided the optimization towards the correct row positions and that the algorithm is efficient in multivariate optimization.

Fig. 5

Numerical verification of the algorithm. a Corrected image of the displaced image shown in Fig. 2b. b \(x\left(i\right)\) applied to the displaced image and the values found from the algorithm

3.2 Experiments

The algorithm was further verified experimentally. In the first experiment, transmission images of a metal resolution panel with different patterns were used, containing circles, rectangles, and a radial pattern. These images are quasi-binary, as the metal has a transmission of 0 and the air pattern region has a transmission of 1, providing a high contrast such that even a small displacement can be clearly observed. In the second experiment, a transmission image of a tooth slice was used to verify the performance on samples with irregular shapes and gradual spatial variation. All the experiments were conducted with a THz-TDS system (Menlo Systems TeraSmart). Samples were mounted on the X–Y stage (OptoSigma, OSMS20-85(XY)-M6).

Figure 6a shows the measured THz image of the radial pattern formed by six triangles, and Fig. 6c shows the image of two closely spaced circles with a diameter of 1 mm. The scanning step was set to 0.1 mm in both the X and Y directions. We plot the images using the amplitudes at 1.3 THz, which provides a sufficiently high spatial resolution to clearly resolve the displacements. Misaligned rows can be clearly seen due to the two-way scanning mode and the lack of a trigger, as explained above. Figure 6b and d show the images corrected by the proposed algorithm. The jagged borders in the original images have been mostly eliminated, and the shapes of the triangles and circles are well reconstructed. A very small misalignment can be seen in the weakly distorted vertical line at the center of Fig. 6b, which is likely due to the limited spatial amplitude variation along the row direction; hence, the displacements there have limited influence on the evaluation function. Overall, the results demonstrate that our algorithm works well for correcting images of different spatial shapes. The algorithm takes 23 s and 2 s on a general-purpose PC (Intel i5-10500, 16 GB RAM) for the radial-pattern image (101 rows) and the circle-pattern image (31 rows), respectively, which is sufficiently fast for data analysis.

Fig. 6

THz images of radial and circular patterns at 1.3 THz. a Measured and b corrected radial-pattern images. c Measured and d corrected circular-pattern images

We further applied the algorithm to the images shown in Fig. 7a and c, which contain rectangular bars with edges perpendicular to and at 45° to the X axis, respectively. The corrected results shown in Fig. 7b and d verify the stable performance of the algorithm for different sample orientations. Being insensitive to image rotation is important in practical imaging applications. However, slight distortion can be observed, for example, the slightly curved edges near the top left and bottom right of Fig. 7d. The parameter setting of the evaluation function could be the main reason, and the correction may be improved by optimizing these parameters.

Fig. 7

THz images of rectangular bars perpendicular and 45° to the X axis at 1.3 THz. a Measured and b corrected perpendicular bar images. c Measured and d corrected 45° bar images

In multivariate optimization, robustness, i.e., the variation of the optimal solution across independent runs, is an important aspect of an algorithm. To investigate the robustness, we performed 10 independent runs for each of the four images in Figs. 6 and 7 and analyzed the variation of the optimal solutions. Let \({x}_{r}\left(i\right)\) denote the optimal solution in the \(r\)th run and \(\overline{x }\left(i\right)\) be the average over the 10 runs. Equation 4 defines the standard deviation \(\sigma \left(i\right)\) of \({x}_{r}\left(i\right)\) as:

$$\sigma \left(i\right)=\sqrt{\frac{1}{10}\sum_{r=1}^{10}{\left({x}_{r}\left(i\right)-\overline{x }\left(i\right)\right)}^{2}}$$
(4)

where

$$\overline{x }\left(i\right)=\frac{1}{10}\sum_{r=1}^{10}{x}_{r}\left(i\right)$$

The average of \(\sigma \left(i\right)\) over the \(\left|I\right|\) rows in the set \(I\) was calculated as:

$$\tilde{\sigma }=\frac{1}{\left|I\right|}\sum_{i\in I}\sigma \left(i\right)$$
(5)

where \(I\) contains rows with a minimum amplitude variation greater than 0.1. This is because some rows, for example, the first and the last few rows in Fig. 7a, have a constant amplitude along the row direction; the displacement has no influence on the image and cannot be accurately determined. \(\tilde{\sigma }\) of images with different patterns is shown in Fig. 8, which has values all below 0.15, corresponding to an average positioning error less than 0.15 × 0.1 mm = 0.015 mm. The values for the circular and the rectangular bar images are even close to zero, showing a high robustness of the optimization algorithm.

Fig. 8

Average standard deviation of \({x}_{r}\left(i\right)\) for images with different patterns
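The robustness metric can be computed in a few lines of NumPy. The sketch below is illustrative only: the function name is hypothetical, the set \(I\) is passed in as a list of row indices rather than derived from the image, and the 10-run data are made up. It averages \(\sigma \left(i\right)\) over the rows in \(I\), as described in the text.

```python
import numpy as np


def mean_row_std(solutions, valid_rows):
    """Average over the rows in I of the per-row standard deviation across runs.

    solutions: array of shape (runs, N) holding x_r(i) for each run r.
    valid_rows: indices of the rows that belong to the set I.
    """
    solutions = np.asarray(solutions, dtype=float)
    sigma = solutions.std(axis=0)  # ddof=0, i.e. the 1/10 prefactor of Eq. 4
    return sigma[np.asarray(valid_rows)].mean()


# Made-up example: 10 runs, 3 rows; only rows 0 and 1 belong to I.
runs = np.zeros((10, 3))
runs[:, 0] = 2                    # all runs agree on row 0
runs[0, 1] = 1                    # one run disagrees by one pixel on row 1
runs[:, 2] = np.arange(10) % 3    # featureless row, excluded from I

sigma_tilde = mean_row_std(runs, [0, 1])
step_mm = 0.1                     # scanning step used in the experiments
print(sigma_tilde, sigma_tilde * step_mm)
```

Multiplying \(\tilde{\sigma }\) by the scanning step converts the pixel-level spread into a positioning error in millimetres, which is the conversion used in the text.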

We further applied the algorithm to a tooth slice sample. The optical image is shown in Fig. 9a. The sample contains dentin, enamel, and the epoxy resin embedding material. It has a thickness of about 400 μm and was fixed on a low-density sponge that has a THz transmission of ~1 owing to its 99% porosity. Figure 9b shows the THz transmission image of the sample at 1.3 THz. Compared to the standard metal resolution panel, the sample is less regular in shape and its transmission values vary more gradually in space. The misalignment has led to jagged, irregular borders, resulting in difficulties in margin recognition and region segmentation, especially for the enamel area. Figure 9c plots the image after correction, showing continuous and smooth borders. The sample structure and borders coincide well with the optical image, and the boundaries between different regions are clearly distinguished. The narrow enamel region is now well resolved and continuous in space, and it has a much better contrast to the dentin region than in the optical image owing to the high sensitivity of THz light to the degree of mineralization [26]. The results demonstrate the good performance of the algorithm on practical samples with arbitrary shapes and transmission values.

Fig. 9

Tooth slice sample. a Optical image. b Measured and c corrected THz images at 1.3 THz

4 Conclusions

In this article, we propose a row displacement correction algorithm for accurate and high-speed THz raster scanning imaging. The algorithm is built on two theoretical assumptions. First, row displacements generate high-frequency components in the Fourier spectrum along the column direction. Second, the displacements of different rows are random and should have an average value close to 0. We defined the evaluation function according to these assumptions and determined the displacements using a GA. We verified the algorithm performance with both numerically simulated and experimental images. The simulation quantitatively verifies that the image is reconstructed exactly. The experimental results for the metal panel demonstrate the effectiveness for samples of different shapes and orientations. The application to a tooth slice further verifies the good performance on practical irregular samples. The statistical analysis of multiple algorithm runs validates the robustness of the algorithm.

The algorithm can correct row displacements regardless of their cause, including the two-way scan, the lack of a trigger, and any other technical misalignment factors. In particular, using the two-way scanning mode without the need for a trigger signal maximizes the scanning speed and the versatility of the algorithm. It removes the need to synchronize the stage with the delay line and simplifies the program design. Therefore, it can be widely applied in various THz systems to provide high scanning efficiency and image quality. The proposed method also has some limitations. One is the loss of pixels, and hence a reduced field-of-view, due to the need for edge-cutting after the alignment. This is because row boundaries are no longer located within the same column after the rows are shifted by different numbers of pixels, and columns with incomplete pixels are discarded. However, we note that it only takes an additional 0.15 s per row to compensate for the maximum of 6 removed columns (assuming a 40-Hz signal acquisition rate). Another major limitation is the parameter optimization for the evaluation function, including the high-frequency threshold \(N/20\), the moving average number, and the weight \(\alpha\) of function \({f}_{2}\), which are determined empirically. Although they proved adequate for most examples demonstrated in this work, they could be further optimized using a larger image dataset to provide a more objective setting and better robustness.
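The edge-cutting step and the associated time cost can be sketched as follows. This is a minimal illustration with hypothetical function names; it assumes rows are shifted circularly with `np.roll`, so the wrapped-around pixels are exactly the incomplete columns to discard.

```python
import numpy as np


def align_and_crop(image, shifts):
    """Shift each row, then drop the columns that are incomplete in any row."""
    shifts = np.asarray(shifts, dtype=int)
    shifted = np.stack([np.roll(row, s) for row, s in zip(image, shifts)])
    left = max(int(shifts.max()), 0)     # columns invalidated by rightward shifts
    right = max(int(-shifts.min()), 0)   # columns invalidated by leftward shifts
    return shifted[:, left: image.shape[1] - right]


def extra_time_per_row_s(columns_removed, acquisition_rate_hz):
    """Extra recording time needed per row to regain the discarded columns."""
    return columns_removed / acquisition_rate_hz


# Made-up 3 x 8 image with the extreme shifts +3 and -3 (d = 3).
image = np.arange(24).reshape(3, 8)
cropped = align_and_crop(image, [3, 0, -3])
print(cropped.shape, extra_time_per_row_s(8 - cropped.shape[1], 40.0))
```

With \(d=3\), at most \(2d=6\) columns are lost, which at 40 signals per second corresponds to the 0.15 s per row quoted above.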