1 Introduction

Over the last decade, ground-penetrating radar (GPR) technology has become a popular research topic. The fields in which GPR applications are being considered, or are already successfully deployed, are quite diverse: the construction industry, archeology, sedimentology and military technology, to mention a few [11, 21, 24].

It is worth explaining that there are three main types of GPR images (radargrams). The simplest variant is an A-scan, a single GPR profile defined over the time axis only (directed into the ground). A linear collection of A-scans along some direction forms a B-scan. A collection of A-scans over a certain area, which can also be treated as a linear collection of B-scans, forms a C-scan, i.e., a three-dimensional image, with a coordinate system typically defined as across track \(\times\) along track \(\times\) time. The time axis can be intuitively associated with depth.
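In code, a C-scan is simply a 3D array from which B-scans and A-scans are recovered as slices. The following minimal NumPy sketch is our own illustration (the array contents and variable names are assumptions; the shape matches the image resolution reported later in Sect. 3.3):

```python
import numpy as np

# Hypothetical C-scan: 92 positions across track, 91 along track, 512 time samples
# (the resolution of our images, see Sect. 3.3); zeros stand in for real data.
c_scan = np.zeros((92, 91, 512))

a_scan = c_scan[10, 20, :]      # one profile over the time axis only
b_scan = c_scan[:, 20, :]       # a linear collection of A-scans (across track x time)
time_slice = c_scan[:, :, 100]  # a horizontal slice at a fixed time index

print(a_scan.shape, b_scan.shape, time_slice.shape)  # (512,) (92, 512) (92, 91)
```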

Fig. 1

Two examples of landmine detections in our GPR system: metal AT mine (top row), plastic AT mine (bottom row). The scene with the plastic mine additionally contains a metal box with cables, serving as a distraction object for the detector

In general, any buried objects that are non-transparent to GPR waves produce, in C-scans, patterns that are combinations of hyperboloids (resembling bowl-like shapes). For metal landmines at least two strong hyperboloids are usually visible, related to the top and bottom surfaces of the mine casing. Plastic mines, on the other hand, are typically less clearly visible in the image; they produce thinner and more subtle shapes in radargrams. Sometimes, more details of a mine and its casing can be seen (rendered as smaller hyperboloids), but this depends on several factors, such as the antenna system, GPR bandwidth, C-scan resolution, soil type and humidity, mine size and ground clutter. Figure 1 presents two examples of C-scans generated by our GPR system and detections of antitank (AT) landmines in them.

When surveying algorithmic approaches to the mine detection task in the literature, one should keep in mind the distinction between two stages: (1) feature extraction and (2) learning and classification algorithms. As regards the latter, quite a few state-of-the-art methods have been tried, e.g., Naive Bayes and LVQ in [6], neural networks in [10], least squares curve fitting in [9, 26], HMMs in [14, 17, 26] or ensemble classifiers in [12, 22]. Yet, in general, it seems that the final success depends less on the choice of learning algorithm and more on the quality of the images and the features extracted from them.

It is worth mentioning that the process of feature extraction for GPR applications is often accompanied by auxiliary techniques, such as hyperbola or ellipse detection. In order to reduce hyperboloids to hyperbolas or ellipses, the C-scans must be preprocessed and suitable B-scans or time slices must be selected. For example, Milisavljević et al. [16] detected hyperbolas via the Hough transform. Zhu and Collins [26] used polynomial curve fitting. Later, hyperbola characteristics or polynomial coefficients served as features for machine learning. In [26], the authors also measured (as additional features) the intensities of diagonal and antidiagonal edges of hyperbolas. As regards features that can be extracted from ellipses in time slices, Yarovoy et al. [25] measured, e.g., the horizontal position (from the ellipse center), the dielectric permittivity of the ground (from the increase in ellipse size) and the depth of burial (from the time delay and the calculated ground permittivity).

1.1 Motivation and contribution

The main motivation for our research was to work directly on C-scans and thereby to focus on features describing three-dimensional shapes. Obviously, a dense scanning/detection procedure carried out over a high-resolution 3D image is computationally expensive, because for every position of the scanning window the calculations related to feature extraction and classification must be performed.

In this paper we make an attempt to apply statistical moments as features. Various applications of 2D statistical moments are known from computer vision, many of them in the field of optical and handwritten character recognition, see e.g., [1, 4], but also in general object detection settings [7, 13]. We want to check the applicability of 3D statistical moments to landmine detection.

The main contribution of the paper is an idea to speed up the extraction of moments for each image window by means of multiple integral images, calculated once, prior to the detection procedure. One may come across publications where a similar idea is applied in 2D cases, especially in the context of variance or covariance calculations [19, 20]. Yet, for some reason, integral-image-based computation of statistical moments of still higher orders is hardly ever encountered, although the technique can be extended in a straightforward manner. We derive the suitable formulas in this paper.

1.2 Organization of this paper

The rest of this paper is organized as follows. Section 2 pertains to computational aspects of statistical moments in the context of detection tasks. In Sect. 2.1 we briefly review the so-called central statistical moments and define their 3D variant suitable for our images. Section 2.2 presents the main contribution of the paper, namely the technique to extract the moments fast, in constant time. Sections 2.3 and 2.4 discuss technical details related to the contribution: the preparation of integral images and the generation of features by window partitioning, respectively. Section 3 is the experimental section. It describes an application of the proposed method to GPR-based landmine detection, in particular: the hardware of our prototype radar, the measurements collected from different scene variations, the feature spaces and data sets, and the machine learning setup. Finally, the section discusses the results of tests (10-fold cross-validation) with a focus on error rates, ROC curves and time performance. Section 4 summarizes the paper.

Additionally, we encourage the reader to study Appendix 1, in which we compare our results against those obtained on the same learning material by a benchmark method due to Torrione et al. [22].

2 Statistical moments and integral images

2.1 3D statistical moments

A good intuition about statistical moments (serving as image features in recognition or detection tasks) can be gained by first considering moments of continuous probability distributions. For the 2D case the central continuous moments weighted by a density function f are

$$\begin{aligned} \mu ^{p,q}=\int \limits _{-\infty }^\infty \int \limits _{-\infty }^\infty \left( x-\mu ^{1,0}\right) ^p \left( y-\mu ^{0,1}\right) ^q f(x,y) \,dx \,dy, \end{aligned}$$
(1)

where \(p,q\) define the moment order variable-wise, and here \(p+q\ge 2\); the moments of order one are

$$\begin{aligned} \mu ^{1,0}&=\int \limits _{-\infty }^\infty \int \limits _{-\infty }^\infty x f(x,y) \,dx \,dy,\end{aligned}$$
(2)
$$\begin{aligned} \mu ^{0,1}&=\int \limits _{-\infty }^\infty \int \limits _{-\infty }^\infty y f(x,y) \,dx \,dy. \end{aligned}$$
(3)

As regards moments for images, the integrals are replaced by sums weighted by pixel intensities (instead of a density).

In the setting of our landmine problem, we first need to account for the 3D nature of our data, and second we need to define moments for image windows (cuboids), not whole images. Thus, we shall define 3D normalized central moments that are independent of the window position and size.

Let i denote the 3D image function (a C-scan). The point value \(i(x,y,t)\) represents the image intensity at coordinates (x, y) for the time moment t. For a window spanning from \((x_1,y_1,t_1)\) to \((x_2,y_2,t_2)\) we define the moments of interest as follows

$$\begin{aligned}&\mu ^{p,q,r}_{\begin{array}{c} x_1,y_1,t_1\\ x_2,y_2,t_2 \end{array}}= \sum _{x_1\le x \le x_2} \sum _{y_1\le y \le y_2} \sum _{t_1\le t \le t_2} \left( \frac{x-x_1}{x_2-x_1} - \mu ^{1,0,0}_{\begin{array}{c} x_1,y_1,t_1\\ x_2,y_2,t_2 \end{array}}\right) ^p\nonumber \\&\quad \left( \frac{y-y_1}{y_2-y_1} - \mu ^{0,1,0}_{\begin{array}{c} x_1,y_1,t_1\\ x_2,y_2,t_2 \end{array}}\right) ^q \left( \frac{t-t_1}{t_2-t_1} - \mu ^{0,0,1}_{\begin{array}{c} x_1,y_1,t_1\\ x_2,y_2,t_2 \end{array}}\right) ^r \cdot i(x,y,t)/S, \end{aligned}$$
(4)

where \(S=\sum _{x_1\le x \le x_2} \sum _{y_1\le y \le y_2} \sum _{t_1\le t \le t_2} i(x,y,t)\), and the moments of order one are

$$\begin{aligned} \mu ^{1,0,0}&=\sum _{x_1\le x \le x_2} \sum _{y_1\le y \le y_2} \sum _{t_1\le t \le t_2} \frac{x-x_1}{x_2-x_1}\cdot \frac{i(x,y,t)}{S}, \end{aligned}$$
(5)
$$\begin{aligned} \mu ^{0,1,0}&=\sum _{x_1\le x \le x_2} \sum _{y_1\le y \le y_2} \sum _{t_1\le t \le t_2} \frac{y-y_1}{y_2-y_1}\cdot \frac{i(x,y,t)}{S}, \end{aligned}$$
(6)
$$\begin{aligned} \mu ^{0,0,1}&=\sum _{x_1\le x \le x_2} \sum _{y_1\le y \le y_2} \sum _{t_1\le t \le t_2} \frac{t-t_1}{t_2-t_1}\cdot \frac{i(x,y,t)}{S}. \end{aligned}$$
(7)

We remark that the aforementioned normalization is due to the presence of the terms \((x-x_1)/(x_2-x_1)\) (and similarly for y and t), owing to which our moments take values in the \([-1, 1]\) interval.
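A direct (naive) implementation of definitions (4)-(7), iterating over all pixels of the window, may help to clarify the notation. The NumPy sketch below is our own illustration (function names are assumptions); it is exactly this per-pixel summation that the technique of Sect. 2.2 avoids:

```python
import numpy as np

def moments_order_one(i, x1, y1, t1, x2, y2, t2):
    """First-order moments (5)-(7) of the window [x1..x2] x [y1..y2] x [t1..t2]."""
    w = i[x1:x2 + 1, y1:y2 + 1, t1:t2 + 1]
    S = w.sum()
    X, Y, T = np.meshgrid(np.arange(x1, x2 + 1), np.arange(y1, y2 + 1),
                          np.arange(t1, t2 + 1), indexing='ij')
    mu100 = ((X - x1) / (x2 - x1) * w).sum() / S
    mu010 = ((Y - y1) / (y2 - y1) * w).sum() / S
    mu001 = ((T - t1) / (t2 - t1) * w).sum() / S
    return mu100, mu010, mu001

def central_moment_naive(i, x1, y1, t1, x2, y2, t2, p, q, r):
    """Normalized central moment (4), computed directly over every pixel."""
    w = i[x1:x2 + 1, y1:y2 + 1, t1:t2 + 1]
    S = w.sum()
    mu100, mu010, mu001 = moments_order_one(i, x1, y1, t1, x2, y2, t2)
    X, Y, T = np.meshgrid(np.arange(x1, x2 + 1), np.arange(y1, y2 + 1),
                          np.arange(t1, t2 + 1), indexing='ij')
    return ((((X - x1) / (x2 - x1) - mu100) ** p
             * ((Y - y1) / (y2 - y1) - mu010) ** q
             * ((T - t1) / (t2 - t1) - mu001) ** r) * w).sum() / S
```

On a uniform window the first-order moments equal 0.5 and odd central moments vanish, which provides a convenient sanity check.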

2.2 Calculations of moments via integral images

Let us now reformulate the moments in terms of integral images and their growths. First, we define a general 3D integral image \(ii^{p,q,r}\) (of order \(p + q + r\)) as

$$\begin{aligned} ii^{p,q,r}(x,y,t)=\sum _{1\le j \le x} \sum _{1\le k \le y} \sum _{1\le l \le t} j^p k^q l^r \, i(j,k,l). \end{aligned}$$
(8)

We give an induction algorithm to calculate \(ii^{p,q,r}\) in the next subsection.

Next, it is useful to define the growth operation for integral images. Growths shall later serve as an economical (constant-time) way to calculate sums of suitable moment-related terms weighted by pixel intensities within image windows. In the 3D case, a growth can be expressed using only 8 elements of the integral image. For a window spanning from \((x_1,y_1,t_1)\) to \((x_2,y_2,t_2)\) the growth can be defined, e.g., as

$$\begin{aligned}&\varDelta _{\begin{array}{c} x_1,y_1,t_1\\ x_2,y_2,t_2 \end{array}}(ii)=ii(x_2,y_2,t_2)-ii(x_1-1,y_2,t_2) \nonumber \\&\quad -ii(x_2,y_1-1,t_2)+ii(x_1-1,y_1-1,t_2)\nonumber \\&\quad -\Bigl (ii(x_2,y_2,t_1-1)-ii(x_1-1,y_2,t_1-1)\nonumber \\&\quad -ii(x_2,y_1-1,t_1-1)+ii(x_1-1,y_1-1,t_1-1)\Bigr ), \end{aligned}$$
(9)

where ii stands for some integral image.
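The two ingredients translate into code as follows. In this sketch (our own illustration, with assumed names) the integral image (8) is built with cumulative sums and padded with zeros so that the indices \(x_1-1\), \(y_1-1\), \(t_1-1\) of (9) are always valid:

```python
import numpy as np

def integral_image(i, p=0, q=0, r=0):
    """3D integral image ii^{p,q,r} of (8), 1-based pixel coordinates,
    zero-padded in front so the x1-1, y1-1, t1-1 lookups of (9) are valid."""
    nx, ny, nt = i.shape
    xw = np.arange(1, nx + 1, dtype=float) ** p
    yw = np.arange(1, ny + 1, dtype=float) ** q
    tw = np.arange(1, nt + 1, dtype=float) ** r
    weighted = i * xw[:, None, None] * yw[None, :, None] * tw[None, None, :]
    ii = weighted.cumsum(axis=0).cumsum(axis=1).cumsum(axis=2)
    return np.pad(ii, ((1, 0), (1, 0), (1, 0)))

def growth(ii, x1, y1, t1, x2, y2, t2):
    """Growth operator (9): the sum over a cuboid from 8 entries of ii."""
    return (ii[x2, y2, t2] - ii[x1 - 1, y2, t2]
            - ii[x2, y1 - 1, t2] + ii[x1 - 1, y1 - 1, t2]
            - (ii[x2, y2, t1 - 1] - ii[x1 - 1, y2, t1 - 1]
               - ii[x2, y1 - 1, t1 - 1] + ii[x1 - 1, y1 - 1, t1 - 1]))
```

Note that `growth(integral_image(i), x1, y1, t1, x2, y2, t2)` then equals the plain intensity sum over the cuboid, i.e., the normalizer S, regardless of the cuboid size.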

The following proposition constitutes the main algorithmic contribution of the paper.

Proposition 1

Given a maximum order \(N\ge 0\) of moments, suppose the set of integral images \(\{ii^{p,q,r}\}\), \(0\le p,q,r\le N\), defined as in (8), has been calculated prior to the detection procedure. Then, for any cuboid in the image, spanning from \((x_1,y_1,t_1)\) to \((x_2,y_2,t_2)\), each of its statistical moments can be extracted in constant time, \(O(1)\), regardless of the number of pixels within the cuboid, as follows:

$$\begin{aligned} \mu ^{p,q,r}_{\begin{array}{c} x_1,y_1,t_1\\ x_2,y_2,t_2 \end{array}}&=\frac{1}{\varDelta _{\begin{array}{c} x_1,y_1,t_1\\ x_2,y_2,t_2 \end{array}}(ii^{0,0,0}) (x_2-x_1)^p(y_2-y_1)^q(t_2-t_1)^r}\nonumber \\&\cdot \sum _{j=0}^p \sum _{k=0}^q \sum _{l=0}^r (-1)^{p+q+r-j-k-l}\left( {\begin{array}{c}p\\ j\end{array}}\right) \left( {\begin{array}{c}q\\ k\end{array}}\right) \left( {\begin{array}{c}r\\ l\end{array}}\right) \nonumber \\&\quad \cdot \left( x_1+\mu ^{1,0,0}_{\begin{array}{c} x_1,y_1,t_1\\ x_2,y_2,t_2 \end{array}}(x_2-x_1)\right) ^{p-j} \nonumber \\&\quad \cdot \left( y_1+\mu ^{0,1,0}_{\begin{array}{c} x_1,y_1,t_1\\ x_2,y_2,t_2 \end{array}}(y_2-y_1)\right) ^{q-k} \nonumber \\&\quad \cdot \left( t_1+\mu ^{0,0,1}_{\begin{array}{c} x_1,y_1,t_1\\ x_2,y_2,t_2 \end{array}}(t_2-t_1)\right) ^{r-l} \nonumber \\&\quad \cdot \underbrace{\sum _{x_1\le x \le x_2} \sum _{y_1\le y \le y_2} \sum _{t_1\le t \le t_2} x^j y^k t^l i(x, y, t)}_{\varDelta _{\begin{array}{c} x_1,y_1,t_1\\ x_2,y_2,t_2 \end{array}}(ii^{j,k,l})}. \end{aligned}$$
(10)

Proof

The proof is in fact a straightforward derivation from formula (4). First, the means (moments of order one) that appear under the powers should be multiplied by suitable unity terms: \(\mu ^{1,0,0}_{\cdot } \cdot \frac{x_2-x_1}{x_2-x_1}\), \(\mu ^{0,1,0}_{\cdot } \cdot \frac{y_2-y_1}{y_2-y_1}\), \(\mu ^{0,0,1}_{\cdot } \cdot \frac{t_2-t_1}{t_2-t_1}\). This allows us to extract the denominators and form the normalizing constant \(1/\left( (x_2-x_1)^p(y_2-y_1)^q(t_2-t_1)^r\right)\) in front of the summation. Then, the powers are expanded by means of the binomial theorem, grouping the terms into those dependent on the current pixel index (x, y, t), namely the terms \(x^j y^k t^l\), and those independent of it. Finally, by changing the order of summation one arrives at the equivalent formula (10). The underbrace indicates how the expensive summation over all pixels in the cuboid is replaced by the cheap, constant-time computation of the growth of a suitable integral image: \(\varDelta _{\begin{array}{c} x_1,y_1,t_1\\ x_2,y_2,t_2 \end{array}}(ii^{j,k,l})\). Note also that the required normalizer S is obtained as the growth of the zero-order integral image: \(S=\varDelta _{\begin{array}{c} x_1,y_1,t_1\\ x_2,y_2,t_2 \end{array}}(ii^{0,0,0})\). \(\square\)

For the sake of rigor, we should remark that though the calculations involved in (10) are constant time with respect to the number of pixels in a cuboid, they are polynomial with respect to the given moment order, represented by \(p,q,r\). More precisely, the total number of operations is proportional to \((p+1)(q+1)(r+1)\) times the seven additions (or subtractions) involved in the growth operator \(\varDelta _{\begin{array}{c} x_1,y_1,t_1\\ x_2,y_2,t_2 \end{array}}(ii^{j,k,l})\) as defined in (9).
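Proposition 1 translates directly into code. The following self-contained Python sketch (our own illustration with assumed names; it includes its own integral-image and growth helpers) implements formula (10) and can be checked numerically against the naive definition (4):

```python
import numpy as np
from math import comb

def integral_image(i, p, q, r):
    # ii^{p,q,r} of (8), zero-padded so 1-based coordinates x1-1 = 0 etc. are valid
    nx, ny, nt = i.shape
    xw = np.arange(1, nx + 1, dtype=float) ** p
    yw = np.arange(1, ny + 1, dtype=float) ** q
    tw = np.arange(1, nt + 1, dtype=float) ** r
    weighted = i * xw[:, None, None] * yw[None, :, None] * tw[None, None, :]
    return np.pad(weighted.cumsum(0).cumsum(1).cumsum(2), ((1, 0), (1, 0), (1, 0)))

def growth(ii, x1, y1, t1, x2, y2, t2):
    # growth operator (9): 8 lookups, constant time
    return (ii[x2, y2, t2] - ii[x1 - 1, y2, t2] - ii[x2, y1 - 1, t2]
            + ii[x1 - 1, y1 - 1, t2]
            - (ii[x2, y2, t1 - 1] - ii[x1 - 1, y2, t1 - 1]
               - ii[x2, y1 - 1, t1 - 1] + ii[x1 - 1, y1 - 1, t1 - 1]))

def moment(iis, x1, y1, t1, x2, y2, t2, p, q, r):
    """Constant-time normalized central moment via formula (10).
    iis[j][k][l] must hold the precomputed ii^{j,k,l}."""
    g = lambda j, k, l: growth(iis[j][k][l], x1, y1, t1, x2, y2, t2)
    S = g(0, 0, 0)
    # x1 + mu^{1,0,0}(x2-x1) is simply the mean pixel coordinate, itself a growth ratio
    ax, ay, at = g(1, 0, 0) / S, g(0, 1, 0) / S, g(0, 0, 1) / S
    total = 0.0
    for j in range(p + 1):
        for k in range(q + 1):
            for l in range(r + 1):
                total += ((-1) ** (p + q + r - j - k - l)
                          * comb(p, j) * comb(q, k) * comb(r, l)
                          * ax ** (p - j) * ay ** (q - k) * at ** (r - l)
                          * g(j, k, l))
    return total / (S * (x2 - x1) ** p * (y2 - y1) ** q * (t2 - t1) ** r)
```

Building `iis[j][k][l] = integral_image(img, j, k, l)` once for all \(0\le j,k,l\le N\) makes every subsequent window query cost only \((p+1)(q+1)(r+1)\) growth evaluations, as stated above.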

2.3 Derivation of integral images: induction

Algorithm 1, presented below, is a form of induction and calculates any wanted 3D integral image \(ii^{p,q,r}\) from (8) in a single image pass, i.e., in \(O({n_x} {n_y} {n_t})\) time, where \(n_x\times n_y \times n_t\) represents the resolution of a C-scan. Therefore, if one imposes on the moments a maximal order N variable-wise, i.e., \(0\le p,q,r\le N\), then there are \((N+1)^3\) integral images to be calculated, and the overall cost becomes \(O\left( (N+1)^3 {n_x} {n_y} {n_t}\right)\).

figure a
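Such a single pass can be sketched with the standard 3D inclusion-exclusion recurrence. The code below is our own reconstruction under that assumption (the exact form of Algorithm 1 may differ), with assumed names:

```python
import numpy as np

def integral_image_single_pass(i, p, q, r):
    """Single-pass computation of ii^{p,q,r} from (8) via the standard 3D
    inclusion-exclusion recurrence (our reconstruction; Algorithm 1 may differ
    in form). 1-based pixel coordinates, zero-padded in front."""
    nx, ny, nt = i.shape
    ii = np.zeros((nx + 1, ny + 1, nt + 1))
    for x in range(1, nx + 1):
        for y in range(1, ny + 1):
            for t in range(1, nt + 1):
                ii[x, y, t] = (x ** p * y ** q * t ** r * i[x - 1, y - 1, t - 1]
                               + ii[x - 1, y, t] + ii[x, y - 1, t] + ii[x, y, t - 1]
                               - ii[x - 1, y - 1, t] - ii[x - 1, y, t - 1]
                               - ii[x, y - 1, t - 1] + ii[x - 1, y - 1, t - 1])
    return ii
```

Each entry is obtained from the current weighted pixel and seven previously computed entries, so the whole image is processed in exactly one pass.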

2.4 Introducing more features by partitioning image windows

Up to now, we have formulated (for simplicity) the moments as being extracted from whole 3D windows. Given the maximal order N, this approach implies that the total number of features is \((N+1)^3\). Unfortunately, that is also the number of integral images to be calculated, which for a larger N may constitute a considerable time cost. Recall that the calculation (10) of a single moment, though independent of the number of pixels, scales with the values of p, q, r. On the other hand, in practice we would like to have many features for learning and for the final description of objects, e.g., on the order of \(10^3\) or \(10^4\), as is common in computer vision applications (for example in face detectors). To resolve this problem we propose a simple operation of window partitioning.

Imagine a 3D window is partitioned into a regular \(m\times m \times m\) grid of cuboids (later on in our GPR experiments, we try out \(m=3\) and \(m=5\)). The moments from now on shall be extracted from each cuboid. This will allow us to have a greater number of features, namely:

$$\begin{aligned} n=m^3 (N+1)^3, \end{aligned}$$
(11)

while keeping N (and the implied extraction costs) fairly small. An illustration of the partitioning operation is shown in Fig. 2. Looking back at formula (10), one should understand that, with the partitioning applied, the coordinates \(x_1,y_1,t_1\) and \(x_2,y_2,t_2\) represent the bounding coordinates of a single cuboid within the grid (not of the whole 3D window).
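The partitioning and the feature count (11) can be sketched as follows; the tiling via rounded equal splits is our own illustrative choice (the paper does not prescribe how non-divisible window sizes are split), as are the function names:

```python
import numpy as np

def feature_count(N, m):
    """Total number of features for maximal order N and an m x m x m grid, formula (11)."""
    return m ** 3 * (N + 1) ** 3

def grid_cuboids(x1, y1, t1, x2, y2, t2, m):
    """Partition a 3D window into an m x m x m grid of near-equal, disjoint
    cuboids; returns their (x1, y1, t1, x2, y2, t2) bounds."""
    xs = np.linspace(x1, x2 + 1, m + 1).astype(int)
    ys = np.linspace(y1, y2 + 1, m + 1).astype(int)
    ts = np.linspace(t1, t2 + 1, m + 1).astype(int)
    return [(xs[a], ys[b], ts[c], xs[a + 1] - 1, ys[b + 1] - 1, ts[c + 1] - 1)
            for a in range(m) for b in range(m) for c in range(m)]

# The four parameterizations tested later, in Sect. 3.3:
for name, N, m in [("A", 2, 3), ("B", 3, 3), ("C", 2, 5), ("D", 3, 5)]:
    print(name, feature_count(N, m))  # A 729, B 1728, C 3375, D 8000
```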

Fig. 2

Illustration of the partitioning grid for \(m=5\)

3 Measurements, experiments and results

3.1 GPR system and laboratory test stand

In our research project we have constructed a mobile GPR platform, shown in Figs. 3 and 4. The platform carries the antenna system and a standard VNA (Agilent E5071C, inside the black case) as the core of the GPR. Successive B-scans are performed by the platform perpendicular to its direction of movement. The motion of the platform is remotely controlled by a joystick. Raw data from the scanning are transferred to a host computer through WiFi. The host is a standard PC with a server configuration (Xeon 2.4 GHz \(2\times 8\)-core, 64-bit, 24 GB RAM, 2 TB of disk space), also equipped with an nVidia Tesla Quadro 6000 for extra computing power and graphics acceleration.

Fig. 3

Mobile GPR platform on the indoor laboratory test stand

Fig. 4

Outdoor test lane and exemplary along-track B-scans of a mine collected over gravel

Stepped-frequency continuous-wave (SFCW) modulation was realized by sequentially generated S-parameter measurement commands transmitted to the VNA for each successive frequency. Typically for SFCW radars [18], the amplitude/phase responses are gathered for each discrete frequency transmitted. An appropriate number of these frequencies, covering an effective bandwidth, is needed to achieve the required resolution of an A-scan. In our case the effective bandwidth was 12.7 GHz and was limited by the antenna system.
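As a rough sanity check of what such a bandwidth offers, the standard SFCW range-resolution relation \(\Delta R = c/(2B\sqrt{\varepsilon_r})\) can be evaluated as below; the soil permittivity value is purely illustrative (an assumption, not a measured property of our test stand):

```python
# Theoretical range resolution of an SFCW radar: dR = c / (2 * B * sqrt(eps_r)).
# B is the effective bandwidth quoted above; eps_r below is an assumed,
# purely illustrative permittivity for moist soil.
c = 299_792_458.0   # speed of light, m/s
B = 12.7e9          # effective bandwidth, Hz

dR_air = c / (2 * B)             # resolution in air, ~1.2 cm
eps_r = 9.0                      # assumed relative permittivity of moist soil
dR_soil = dR_air / eps_r ** 0.5  # resolution in such a soil, ~0.4 cm
print(f"{dR_air * 100:.2f} cm in air, {dR_soil * 100:.2f} cm in soil")
```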

We use our own original antenna system. The transmitting antenna is of the Vivaldi type [2], and the receiving antenna has a shielded-loop form [3]. The Vivaldi antenna provides good efficiency and directivity, with an aperture large enough to cover a sufficient area with homogeneous microwave illumination. The loop antenna acts as a point field sensor with little internal ringing.

All the software for control, communication, learning and detection has been implemented by us in the C# programming language.

3.2 Measurements and scene variations

For convenience, the main series of measurements, meant to constitute the learning material, was performed indoors over a container (of area \(\approx 1\,\text {m}^2\)) filled with garden-type soil. Nevertheless, we should remark that our GPR vehicle has also been tested in outdoor conditions, performing scans over four types of soil: peat, garden soil, sand and gravel. In all cases we managed to obtain suitably clear images; see the example in Fig. 4.

The objects of interest were two AT landmines: a metal one (TM-62M, height \(128\,\text {mm}\), diameter \(320\,\text {mm}\)) and a plastic one (PT-Mi-Ba III, height \(110\,\text {mm}\), diameter \(330\,\text {mm}\)). In the measurements we also included negative objects, such as metal cans and boxes, a large metal box with cables, a large round metal disk, a long metal shaft, a wooden box and building bricks. They were meant as disruptions and potential sources of mistakes for the detector. Examples of scanned scenes are depicted in Fig. 5.

Fig. 5

Examples of scanned scenes before burial and variations on soil surface (bottom row)

The elevation of the antenna over the ground varied from 10 to 15 cm. As regards mine placements in the scenes, we varied the depths of burial from \(\approx 0\,\text {cm}\) (flush buried) up to \(15\,\text {cm}\) and the inclination angles approximately in the range \(0^\circ \pm 45^\circ\) in different directions. Mines lying flat or almost flat (\(0^\circ \pm 15^\circ\)) were, however, the most frequent in the collected material (as this is their natural way of placement).

We should mention that we additionally experimented with different variations of the ground surface after the objects were buried. Most of the scans were taken with the surface naturally shaped, but we also included two other extremes: some scenes with the surface flattened down unnaturally with a shovel, and some scenes with an unnaturally uneven surface containing multiple holes, knolls or canyon-like shapes. Some of these variations are shown in the bottom row of Fig. 5. It is known in GPR studies that strong surface variations may cause significant changes in the image (especially at high resolutions), propagating onto deeper time slices. In extreme cases, some of these image changes might even be mistaken for an actual object.

3.3 Experimental setup, data sets, learning algorithm

The learning material collected was a set of 210 C-scans with a physical resolution of \(1\,\text {cm}\) (distance between two adjacent A-scans) and an image resolution of \(92\times 91 \times 512\) (covering an area of about \(1\,\text {m}^2\)). The whole material (210 scans) consisted of three groups: 70 scans with the metal mine (and possibly other objects), 70 scans with the plastic mine (and possibly other objects), and 70 scans with non-mine objects only.

After some preliminary experiments, we decided to thoroughly test four sets of features (3D statistical moments), implied by the following parameterizations:

A.: \(N=2\), \(m=3\) (total no. of features: \(n=729\)),

B.: \(N=3\), \(m=3\) (total no. of features: \(n=1728\)),

C.: \(N=2\), \(m=5\) (total no. of features: \(n=3375\)),

D.: \(N=3\), \(m=5\) (total no. of features: \(n=8000\)).

We shall use the A, B, C, D naming of the feature sets when reporting the results.

A 10-fold cross-validation scheme was introduced. In every fold the testing pack consisted of 7 metal mine scans, 7 plastic mine scans and 7 non-mine scans. Training packs were accordingly 9 times larger, each containing 189 scans. Before the actual learning, each training pack was processed in a batch manner (images traversed with a scanning 3D window) and transformed into a data set consisting of multiple examples of positive and negative windows. The scanning window was of dimensions \(w_x\times w_y\times w_t=67\times 67 \times 39\), and the traversal procedure was of full density, i.e., with one-pixel shifts of the window (\(dx=dy=dt=1\)). Additionally, the window was allowed to move partially outside the image along the x and y variables, so that hyperbolic patterns of mines located near image borders could be sampled more appropriately (more centrally). Such overlaps onto the margins were limited to at most \(15\%\) of the scanning window widths. Positive windows were memorized on the basis of positive object coordinates in the images, which we kept registered in an auxiliary file; the process of marking (determining) these coordinates was done visually by a human after each C-scan was taken (supervised learning). We introduced a 2-pixel tolerance around a target for each of the x, y, t variables when memorizing positive windows. For negative window examples we had to use undersampling due to their great number; note that the majority of negative windows are repeated, mutually similar examples of the ground background, so there is no need to memorize all of them. This procedure resulted in large training sets with approximately \(10\,000\) positive and \(90\,000\) negative window examples for each cross-validation fold.

The large number of learning examples and the large feature spaces (up to 8000 features) made our machine learning setting similar to those known, for example, from the training of face or body detectors. Therefore, we limited our selection of a learning algorithm to boosting methods with simple weak learners. It is known that boosting is well suited to large-scale data; its properties, such as stagewise progression and mathematical connections to logistic regression, make it strongly resistant to overfitting [8].

After initial experiments with algorithms such as AdaBoost + decision stumps, AdaBoost + bins, RealBoost + normal approximations, RealBoost + bins and RealBoost + decision trees, we finally settled on the last variant. The observed error rates and ROC characteristics indicated that RealBoost with decision trees was best suited to the characteristics of our GPR data. We implemented shallow trees with at most 4 terminal nodes, trained by means of the well-known Gini index as the impurity criterion [8, 15, 23]. The final ensembles (for each CV fold) consisted of 600 weak classifiers. This potentially made an ensemble use at most 1800 features, since each 4-terminal tree involves three inequality tests. In practice we observed that about 1500 distinct features were present in an ensemble after the learning was finished.

To speed up the boosting procedure itself, we also implemented the weight trimming technique described in [8]. After this modification the learning times for the largest feature set D (8000 features) were \(\approx 2.0\,\text {h}\) per fold, as opposed to \(\approx 35\,\text {h}\) without weight trimming.

3.4 Results

For each set of features we trained two separate detectors: one aimed at detecting only the metal mine, referred to in short as the “metal detector,” and the second aimed at detecting only the plastic mine, the “plastic detector.” At the testing stage, both detectors were run separately on every test image.

Table 1 Cross-validation results for different sets of features

Detailed results of the cross-validation are gathered in Table 1. In the table, cells reporting percentages of correct positive detections are marked with a “sensitivity” label, while cells reporting percentages of incorrect positive detections are marked with a “FAR” label (false alarm rate). We distinguish two types of false alarms: (1) proper false alarms, e.g., when a plastic mine or a non-mine object is incorrectly detected as a metal mine (or vice versa); (2) side false alarms, e.g., when some window in the image is detected as positive but is not correctly focused on a positive object, only on its side traces or deeper time slices (“echoes” of a mine). The second type can be regarded as non-dangerous, since such alarms accompany the actual correct detections.

In Fig. 6 we show a comparison of ROC curves for different sets of features, averaged over all CV folds. The curves are calculated at the window level of detail, i.e., each window example is treated as a separate object under detection. In the plot, the ranges of both the sensitivity and FAR axes were purposely narrowed to better show the differences between the curves. In the plot legend we also report the AUC (area under curve) measures. AUC\(_{0.1}\) represents the normalized area obtained up to a FAR of 0.1; this indicates how fast the curve grows in its initial stage (a property important in detection tasks). The AUC notation without a subindex stands for the area over the whole [0, 1] range of FAR.

Fig. 6

Comparison of ROC curves (averaged over 10 CV folds) for different sets of features and both types of detectors. The sensitivity and FAR are calculated at the window level of detail

The following remarks can be made about the results. Overall, the results of the “metal detectors” are noticeably better than those of the “plastic detectors.” This is intuitive: responses of plastic mines produce more subtle and weaker traces in the images. Also as expected, the more numerous feature sets C and D performed much better than sets A and B. In the case of the “metal detectors,” this difference is visible mainly in the FAR values (the sensitivities remained close to equal). For the “plastic detectors,” the difference can be seen both in sensitivity and FAR.

For the largest feature set D the results were as follows. The “metal detector” yielded a sensitivity of \(\mathbf {68/70\approx 97.14\,\%}\) and in total \(\mathbf {6/210\approx 2.86\,\%}\) false alarms. The “plastic detector” yielded a sensitivity of \(\mathbf {65/70\approx 92.86\,\%}\) and \(\mathbf {13/210\approx 6.19\,\%}\) false alarms. The misdetections were mainly caused by mines placed at a significant inclination angle (close to \(45^\circ\)). As regards the false alarms, their most frequent source was the large metal disk (see the fourth row in Fig. 5), which is similar in size and shape to actual mines. A few further false alarms occurred due to particular arrangements of other objects that generated some resemblance to mines in the image. It is also fair to note that the material we collected did not contain “empty square meters,” as is often the case on outdoor test lanes, i.e., square meters with no objects in the scans, just soil clutter. In our experiments, every scan (every square meter) contained some buried object. Therefore, the calculated false alarm rates should be regarded as overly pessimistic; they would be smaller in more realistic conditions with fewer disruptive objects.

Considering for a moment the recognition task (rather than detection), the results can be assessed as satisfactory. The “metal detector” mistook a plastic mine for a metal one only once (1/70), while the “plastic detector” was slightly worse in this respect: 4/70 and 5/70 mistakes for feature sets C and D, respectively.

3.5 Time performance of dense detection procedure

Table 2 summarizes the time performance of our detectors measured on our CPU (Xeon \(2.4\,\text {GHz}\) \(2\times 8\) core). In the table we report times both without and with parallelization.

Table 2 Time performance for different feature sets (Xeon \(2.4\,\text {GHz}\) \(2\times 8\) core)

We should remark that at the detection stage images were scanned less densely (window jumps set to \(dt=1\), \(dx=dy=2\)) than at the data acquisition stage (\(dt=dx=dy=1\)). Moreover, we restricted the analysis to a subinterval of time slices, \(t=381,\ldots ,480\), related to potential mine locations (subsurface or flush buried) with some overhead; the unnecessary slices were discarded. Despite these reductions, our scanning loop should still be regarded as computationally expensive, close to an exhaustive C-scan traversal. The loop involved an analysis of approximately \(34\,000\) 3D windows, and for each window the extraction of features and the classification calculations (by 600 boosted decision trees) were performed. For the richest feature set D, the overall detection procedure took on average \(13.7\,\text {s}\) per C-scan. Given the number of windows analyzed (\({\approx} 3.4\cdot 10^4\)), this yields a mean analysis time for a single window of approximately \(\mathbf {0.40}\) ms, which in our opinion is a satisfactorily fast result.

The parallelization led to roughly a sevenfold speedup with respect to sequential calculation. The parallelized elements were the calculation of the multiple integral images and the main scanning loop (coarse-grained). Yet, it is fair to add that parallelization as such plays a minor role in the algorithmic sense; the crucial element is the integral images. We recall that they allow each statistical moment to be extracted in constant time, without the need to iterate over all pixels in each 3D window (formulas (9), (10)).

It is possible to give an estimate on time performance if the calculations were to be carried out without integral images. In that case the computational complexity is

$$\begin{aligned}&O\Bigl ((n_x-w_x+1)(n_y-w_y+1)(n_t-w_t+1)\nonumber \\&\quad \cdot n_f (w_x w_y w_t c_{fe/p} + c_{d/f})\Bigr ) , \end{aligned}$$
(12)

where \((n_x-w_x+1)(n_y-w_y+1)(n_t-w_t+1)\) accounts for the maximum number of window positions, \(n_f\) is the number of features to be extracted, \(c_{fe/p}\) represents the cost of extraction per single pixel and \(c_{d/f}\) represents the cost of the detection (classification) procedure per single feature. Now, even with an optimistic setup of \(c_{fe/p}=10^{-9}\,\text{s}\) and \(c_{d/f}=10^{-9}\,\text{s}\) and for \(3.2\cdot 10^{4}\) windows, one may check that the total detection time for a single C-scan becomes approximately \(8\,900\,\text {s}\), thus almost 2.5 h. Even after parallelization this time is unacceptable in practice. Note that owing to integral images one simply avoids in (12) the term \(w_x w_y w_t\), which is proportional to the number of pixels in a window.

4 Summary

We have reported experimental results obtained by our prototype GPR system for automatic landmine detection. In this paper, we have focused more on the computational aspects of the application than on the hardware ones. Particular attention has been given to fast extraction of features.

As the key contribution we regard the technique based on multiple integral images allowing for constant-time calculation of 3D statistical moments. The technique is general and may be applied in computer vision applications (detection tasks) other than ours.

As regards our specific GPR experiments, the technique is helpful in two places. First, at the data acquisition stage, it allows us to generate very large sets of features to learn from. In other words, the learning algorithm is given a rich multitude of features and can look for a relevant subset among them, i.e., the features that best describe the hyperboloids related to mines. Second, at the detection stage, we perform a dense traversal of a C-scan (analyzing over \(34\,000\) windows per \(\approx 1\,\text {m}^2\)), and the constant-time extraction of each statistical moment allows us to carry out the procedure within a reasonable time. Note that we purposely perform no auxiliary operations such as preliminary segmentation, hyperbola detection or prescreening.

The future research direction for us is more thorough experimental work. Up to now our results are promising, but it is fair to remark that they have been obtained on a fairly small GPR material and with only two types of antitank mines. We have strived to make this material more difficult by introducing many disruptive objects. Future experiments should include more mine types (antipersonnel mines in particular), more soil types and various weather/humidity conditions.