Hidden Object Detection and Recognition in Passive Terahertz and Mid-wavelength Infrared

The study presents a comparison of the detection and recognition of concealed objects covered with various types of clothing using passive imagers operating in the terahertz (THz) range at 1.2 mm (250 GHz) and in the mid-wavelength infrared (MWIR) at 3–6 μm (50–100 THz). During this study, a large dataset of images presenting various items covered with various types of clothing has been collected. The detection and classification algorithms aim to operate robustly at high processing speed across these two spectra. Properties of both spectra, theoretical limitations, the performance of the imagers and the physical properties of fabrics in both spectral domains are described. The paper presents a comparison of two deep learning-based processing methods, together with original results of various experiments in the two spectra.


Introduction
Non-intrusive detection and recognition of objects concealed under clothes remains challenging in terms of sensors and processing algorithms. Two spectra, namely thermal infrared (IR) and terahertz (THz), may provide images visualising objects placed on a human body covered with clothes. However, both spectra rely on different phenomena and provide images of different qualities.
The most useful property of imagers operating in the terahertz range is the ability to see through clothing, thanks to the high transmission of terahertz waves through popular textiles. Thermal infrared imagers are able to detect small temperature differences on an object's surface. Both properties can be used to detect metallic or non-metallic, potentially dangerous objects hidden under clothes.
Terahertz imagers, which are among the most popular non-intrusive imagers for concealed object detection, provide images with low signal-to-noise ratio and low spatial resolution. The thermal infrared imager provides higher spatial resolution but relies on thermal contrast, which decreases due to the thermalisation effect.
The paper presents a study on passive imaging of concealed objects with automatic detection and classification algorithms.
The study contains a theoretical analysis of concealed object visualisation, including the relationship between spectral bands and the theory of heat transfer through clothing and between a hidden object and a human body. The paper briefly reports values of transmittance of radiation through several textiles in both spectral bands. The selected deep learning methods are meant to combine real-time processing capability with high recognition rates. The study concludes with results and an analysis of the proposed processing methods.

Related Works
Terahertz (THz) and millimetre-wave (MMW) are the most frequently explored spectra for detection of objects hidden under clothing, with most commercial scanners operating in the MMW band [1,2]. However, mid-wavelength infrared is another promising spectral domain, since it provides much more detail than THz or MMW images [3]. Most systems for detection of concealed objects operate in the MMW or THz spectra, using various processing methods to detect objects automatically. Other related methods to reveal concealed items are based on THz spectroscopy [4][5][6], but their practical application in real life is limited.
Various methods have been proposed for detection of concealed objects. Initially, objects were detected by combining various feature descriptors with basic learning schemes. The most popular methods to create object representations include local binary patterns (LBP) [7,8], Gabor Jet Descriptors [9,10], histograms of Weber Linear Descriptor features [10] and histograms of oriented gradients (HOG) [11]. In recent years, machine learning (ML) methods, mostly based on convolutional neural networks (CNN), have become increasingly popular. The state-of-the-art CNN architectures include the following: (1) single-pass approaches that perform detection within a single step (single shot multi-box detector (SSD) [12], you only look once (YOLO) [13]) and (2) region-based approaches that exploit a bounding box proposal mechanism prior to detection (faster regional-CNN (R-CNN) [14], region-based fully convolutional networks (R-FCN) [15], lightweight deep neural networks for real-time object detection (PVANET) [16], local R-CNN [17]).
Several approaches to hidden object detection have been proposed. Haworth et al. proposed several methods based on shape analysis. In [18], Active Shape Models (ASMs) have been combined with K-means to segment images into three regions: background, body and threats. Since the method was not accurate for body segmentation, another work [19] attempted to outperform it using Gaussian mixture models. The results were better; however, unconnected body segments remained unresolved. Shen [20] combined various methods, including anisotropic diffusion for denoising, and a mixture of Gaussian densities and isocontours for temperature modelling and segmentation. In [21], Martinez presented an effective but time-consuming method based on non-local means (NLM) and iterative steering kernel regression (ISKR) for denoising, and local binary fitting (LBF) for segmentation. Yeom proposed a method using global and local segmentation, a Gaussian mixture model with parameters initialised using vector quantisation [22] and additional k-means clustering [23]. A method proposed by Agarwal et al. [24] uses a mean standard deviation-based segmentation technique. Gómez et al. [25] developed a two-step algorithm based on denoising and mathematical morphology. Kumar et al. [26] used singular value decomposition and discrete wavelet transform for object detection and a convolutional neural network for classification.
Mohammadzade et al. [27] proposed an algorithm based on principal component analysis and a two-layer classification algorithm. Lopez-Tapia [28,29] proposed a machine learning detection approach that learns from the spatial statistics of grey levels. Liand et al. used mask conditional generative adversarial nets (CGANs) for real-time segmentation of weapons.
The paper concerns two deep learning methods, namely You Only Look Once 3 (YOLO3) and the region-based fully convolutional network (R-FCN). The presented methods aim to detect and recognise (classify) hidden objects in THz and MWIR images.

Theory of Passive Concealed Objects Detection
The study concerns two spectra of complementary properties. The terahertz band spans frequencies from 0.1 to 10 THz (wavelengths from 3 mm to 30 μm) and is also commonly referred to as far-infrared. This radiation is low-energetic and does not ionise matter, so it is not harmful to people. Infrared radiation typically covers the range between 0.78 and 30 μm (430 THz down to 10 THz) [47], divided into subbands: near-, short-wavelength, mid-wavelength, long-wavelength and far-infrared.
Mid-wavelength infrared, also referred to as thermal infrared, covers wavelengths from 3 to 8 μm (100 down to 37.5 THz). MWIR imagers are able to distinguish small changes of temperature [48] on an object's surface and are therefore useful in detection of objects covered with various types of fabrics [31,49,50].
Terahertz-based hidden object detection systems typically operate below 1 THz, due to the high transmission through clothes in that range [51][52][53][54]. The ability to penetrate fabrics and other materials [55][56][57] with low water content is the basis for concealed object detection in the terahertz range.
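As a quick sanity check on the band limits quoted above, the frequency-wavelength relation λ = c/ν can be sketched as follows (the band-edge values are round numbers for illustration):

```python
# Convert between frequency and wavelength (lambda = c / nu) to check
# the band limits quoted above.
C = 299_792_458.0  # speed of light, m/s

def freq_to_wavelength_m(freq_hz: float) -> float:
    return C / freq_hz

def wavelength_to_freq_hz(wavelength_m: float) -> float:
    return C / wavelength_m

print(freq_to_wavelength_m(0.1e12) * 1e3)       # ~3 mm  (0.1 THz)
print(freq_to_wavelength_m(10e12) * 1e6)        # ~30 um (10 THz)
print(wavelength_to_freq_hz(1.2e-3) / 1e9)      # ~250 GHz (1.2 mm)
```

The last line reproduces the 1.2 mm / 250 GHz operating point of the THz imager used in this study.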
It should be added that thermal infrared imagers, contrary to imagers operating in the THz spectrum, provide high-quality images with a higher spatial resolution. Spatial resolution in this context defines the smallest object size the imager is able to resolve.
The transmittance depends on the fabric type, its basic weight and its thickness (number of layers) [58][59][60]. The transmittance of terahertz radiation through a fabric is significantly greater than that of infrared and decreases with increasing frequency, number of layers and basic weight. Figure 1 presents the graphs of transmittance through selected clothes: shirts and a sweater made of cotton and polyester with different basic weights, and a leather jacket, within the MWIR (3–6 μm) and THz (150 GHz–2.5 THz) ranges.
This study concerns passive imaging of concealed objects, which relies on radiation emitted by the objects themselves. Passive cameras assign temperature differences to differences in the energies radiated within their spectral ranges per unit surface. The detection process relies on searching for temperature differences on the object's surface. Those differences should correspond to the differences in energy radiated over the whole radiation range. Larger differences may be classified as anomalies and, potentially, hidden objects.
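A minimal sketch of this anomaly-search idea (not the authors' algorithm) on a synthetic apparent-temperature image; the temperatures and threshold are illustrative assumptions:

```python
import numpy as np

# Flag pixels whose apparent temperature deviates from the scene mean
# by more than k standard deviations as potential anomalies.
def anomaly_mask(temp_image: np.ndarray, k: float = 2.0) -> np.ndarray:
    mu, sigma = temp_image.mean(), temp_image.std()
    return np.abs(temp_image - mu) > k * sigma

# Synthetic 100x100 "body" at 305 K with a 10x10 cooler object at 302 K
scene = np.full((100, 100), 305.0)
scene[40:50, 40:50] = 302.0
mask = anomaly_mask(scene, k=2.0)
print(int(mask.sum()))  # 100 flagged pixels (the hidden object)
```

In a real image the decreasing thermal contrast during thermalisation shrinks the deviation of the object pixels, which is exactly why detection degrades over time.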
The ability of terahertz and infrared imagers to differentiate temperatures is the main measure of their performance. It is described by two parameters: the minimum resolvable temperature difference (MRTD) and the noise-equivalent temperature difference (NETD), with average NETD values between 40 and 130 mK for cameras equipped with uncooled infrared detectors and between 20 and 70 mK for cooled infrared detectors [50]. The NETD of THz imagers is higher, typically in the range between 0.5 and 2 K.
The purpose of this study is to detect and classify potentially dangerous objects including guns and knives placed on the human body covered with the clothing. In the adopted measurement scenario, an item is placed on a human body with constant direct contact. The direct contact results in a heat energy transfer according to Fourier's law [60,61].
q⃗ = −k∇T

where q⃗ is the local heat flux density, k is the material's thermal conductivity (temperature dependent) and ∇T is the temperature gradient. Fourier's law is often simplified to its one-dimensional form [61]:

q = −k dT/dx

According to Planck's law, both bodies radiate energy in a broad spectral range, including THz and IR frequencies.
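A numeric illustration of the one-dimensional Fourier law, approximating the gradient across a clothing layer of thickness d; the conductivity, temperatures and thickness below are illustrative assumptions, not measured values:

```python
# One-dimensional Fourier's law, q = -k * dT/dx, approximated across a
# clothing layer of thickness d as q = k * (T_hot - T_cold) / d.
def heat_flux_w_per_m2(k: float, t_hot: float, t_cold: float, d: float) -> float:
    return k * (t_hot - t_cold) / d

# e.g. cotton-like fabric, k ~ 0.04 W/(m*K), 1 mm thick, 5 K difference
q = heat_flux_w_per_m2(0.04, 310.0, 305.0, 1e-3)
print(q)  # 200.0 W/m^2
```

The flux scales linearly with the temperature difference, so as the hidden object thermalises toward body temperature the conducted heat, and with it the observable surface signature, decays.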
The energy transfer between the object and the body is visualised in Fig. 2.
The total radiant emission is given by the following formula:

Φ = τ_C Φ_H + Φ_C + ρ_C Φ_A

where ρ_C is the reflection coefficient of a clothing fabric, τ_C is its transmittance coefficient, Φ_B is the radiant exitance of the human skin, Φ_C is the radiant exitance of a clothing fabric, Φ_A is the irradiance of the clothing surface and Φ_H is the radiant exitance of a hidden object; in regions without a hidden object, Φ_H is replaced by Φ_B. During the analysis of the energy transfer between the body, object and clothing, it was assumed that the temperature of the external clothing surface equalled the temperature of the object. The analysis does not take into account the radiation coming from the human body that was transmitted, absorbed and re-emitted by the hidden object.

Fig. 1 Transmittance of (a) mid-wave infrared and (b) terahertz radiation through different clothes
The spectral contrast c_p(λ, T) calculated for a given temperature (T) has been described by the following equation [13]:

c_p(λ, T) = [φ_B(λ, T) − φ_H(λ, T)] / [φ_B(λ, T) + φ_H(λ, T)]

where φ_B(λ, T) and φ_H(λ, T) are the spectral radiant exitances of the human body and the hidden item, respectively.
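The spectral quantities above can be sketched numerically from Planck's law. The difference-over-sum contrast used here is one common definition and may differ from the paper's exact formula; the wavelength and temperatures are illustrative assumptions:

```python
import math

H = 6.62607015e-34   # Planck constant, J*s
C = 299_792_458.0    # speed of light, m/s
KB = 1.380649e-23    # Boltzmann constant, J/K

def spectral_exitance(wavelength_m: float, temp_k: float) -> float:
    """Planck spectral radiant exitance M(lambda, T), W / (m^2 * m)."""
    a = 2.0 * math.pi * H * C**2 / wavelength_m**5
    b = H * C / (wavelength_m * KB * temp_k)
    return a / math.expm1(b)

def spectral_contrast(wavelength_m: float, t_body: float, t_object: float) -> float:
    """Difference-over-sum contrast between body and hidden-object exitance."""
    mb = spectral_exitance(wavelength_m, t_body)
    mh = spectral_exitance(wavelength_m, t_object)
    return (mb - mh) / (mb + mh)

# body skin ~305 K vs. an object thermalising from ~300 K, at 4 um (MWIR)
print(spectral_contrast(4e-6, 305.0, 300.0))
```

Running the last line for object temperatures approaching 305 K shows the contrast collapsing toward zero, which is the thermalisation effect observed in the experiments.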

Experiment Protocol
The experiment concerned collecting images presenting a subject wearing various clothing with various objects hidden underneath. In order to perform more detailed investigations, the set of objects to be concealed contained various dangerous objects as well as typical everyday objects, including a wallet and a mobile phone. During the experiments, images of a subject carrying various items were simultaneously acquired in both spectra. The dataset of collected images has been used to train and test the proposed algorithms. The experiment has been divided into sessions lasting 30 min each. The aim of the long-lasting measurement sessions was to collect images presenting various concealed objects with various contrasts and to assess the impact of decreasing contrast on detection and recognition.
The session duration is a result of initial experiments showing that the concealed objects used during this study reach thermal equilibrium after 23–26 min, as presented in Fig. 3. The time intervals apply to measurements performed indoors at a constant ambient temperature. The duration of each measurement session has been adjusted to capture the changes that may result from the thermalisation process. During the experiment, the object is heated by the body until both reach thermal equilibrium. Measurement data (images, air temperature, humidity and pressure, values of the body and object temperatures) were collected every 5 min.

Fig. 2 (a) Thermal image of the clothing surface with and without an object; (b) thermal radiation model
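The thermalisation behaviour above can be modelled as an exponential approach to equilibrium (a Newton-style cooling/heating law). The body temperature, initial temperature difference and time constant below are hypothetical values chosen so that equilibrium is reached in roughly the observed 23–26 min:

```python
import math

def object_temp(t_min: float, t_body: float = 305.0, dt0: float = 10.0,
                tau_min: float = 5.4) -> float:
    """Object surface temperature t_min minutes after being concealed:
    exponential approach T(t) = T_body - dT0 * exp(-t / tau)."""
    return t_body - dt0 * math.exp(-t_min / tau_min)

# first whole minute at which the object is within 0.1 K of the body
t_eq = next(t for t in range(60) if 305.0 - object_temp(t) < 0.1)
print(t_eq)  # 25 -> consistent with the observed 23-26 min
```

With these assumed values the residual contrast halves roughly every 3.7 min, which illustrates why images acquired at 5-min intervals capture the degradation well.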
During each measurement session, a set of clothes and various metallic and non-metallic objects have been employed, including a metal knife, a plastic pistol, a ceramic knife and an object mimicking dynamite. The set of objects is presented in Fig. 4. It has been taken into account that some of the objects may be mistaken for different ones because their shapes are not unique. The set of items also included a leather wallet and a mobile phone.
All the measurement sessions were conducted indoors in a stand-off scenario with the subject standing in front of the imagers. The ambient temperature was kept constant (294 K, controlled by air conditioning) and the relative humidity varied by no more than 3%. The experiments have been performed using the MWIR and THz imagers with parameters provided in Table 1.

Method for Automatic Object Classification
This study involves two automatic methods for detection and classification of concealed objects. Both methods are based on convolutional neural networks and perform the detection and classification tasks within a single architecture. The two deep learning-based methods, namely You Only Look Once 3 (YOLO3) and R-FCN, are described in the following subsections. They follow two different approaches to detection and classification of objects in an image: YOLO3 is designed for fast object detection and eliminates the use of a dedicated region proposal network (RPN), whereas R-FCN uses a region proposal network to detect objects and a softmax function for classification.

YOLO3
You Only Look Once 3 (YOLO3) is known as a very fast object detection algorithm. The entire architecture contains 106 fully convolutional layers, of which 53 come from the Darknet architecture trained on ImageNet. The architecture of YOLO3 is shown in Fig. 8.
The architecture performs both object detection and classification. Darknet-53, a fully convolutional network with 53 layers, is used for feature extraction. The first detection is made by the 82nd layer. The objectness score for each bounding box is calculated using logistic regression at three scales. To determine the priors, YOLO3 applies the k-means clustering algorithm. The 9 priors are grouped into 3 groups according to their scale, and each group is assigned to a specific feature map.

Fig. 4 Test objects used during the experiments: (a) plastic pistol, (b) metal pistol, (c) bombs, (d) ceramic knife, (e) metal knife, (f) leather wallet, (g) mobile phone
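The prior-selection step described above can be sketched as k-means over the (w, h) sizes of training bounding boxes, using a 1 − IoU style distance as in YOLO3. The box data below are synthetic and the initialisation is simplified for the sketch:

```python
import numpy as np

def iou_wh(box, centroids):
    """IoU between one (w, h) box and an array of centroid (w, h) pairs,
    treating all boxes as co-centred (only sizes matter)."""
    w, h = box
    cw, ch = centroids[:, 0], centroids[:, 1]
    inter = np.minimum(w, cw) * np.minimum(h, ch)
    return inter / (w * h + cw * ch - inter)

def kmeans_priors(boxes, k, iters=50):
    # deterministic spread-out initialisation for this sketch
    centroids = boxes[np.linspace(0, len(boxes) - 1, k).astype(int)].copy()
    for _ in range(iters):
        # assign each box to the centroid with the highest IoU (lowest 1-IoU)
        assign = np.array([np.argmax(iou_wh(b, centroids)) for b in boxes])
        for j in range(k):
            if np.any(assign == j):
                centroids[j] = boxes[assign == j].mean(axis=0)
    return centroids[np.argsort(centroids[:, 0] * centroids[:, 1])]

# synthetic (w, h) box sizes: a small-object and a large-object cluster
rng = np.random.default_rng(1)
boxes = np.vstack([rng.normal([20, 30], 2, size=(50, 2)),
                   rng.normal([120, 80], 5, size=(50, 2))])
priors = kmeans_priors(boxes, k=2)
print(priors.round(1))  # roughly [[20, 30], [120, 80]]
```

In YOLO3 itself, k = 9 and the resulting priors are split into three scale groups of three, one group per detection feature map.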
The classification is done with independent logistic classifiers that calculate the likelihood of the input belonging to a specific label. Moreover, YOLO3 uses a binary cross-entropy loss for each label, which reduces the computational complexity.

R-FCN
R-FCN is built on the ResNet-101 [38] backbone. The architecture computes candidate regions with a fully convolutional region proposal network (RPN). The adopted region proposal and region classification approach provides both detection and classification capability. The last convolutional layer produces a bank of k² position-sensitive score maps for each of the C object categories (plus background). The R-FCN architecture is presented in Fig. 9.
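The position-sensitive scoring can be sketched as follows. This is a simplified average-pooling version on synthetic score maps, not the exact R-FCN implementation:

```python
import numpy as np

def ps_roi_pool(score_maps: np.ndarray, roi, k: int, num_classes: int):
    """Position-sensitive RoI pooling sketch: pool each of the k*k RoI bins
    from its own dedicated score map, then average ("vote") across bins.
    score_maps: (k*k*(C+1), H, W); roi: (x0, y0, x1, y1) in pixels."""
    x0, y0, x1, y1 = roi
    bw, bh = (x1 - x0) / k, (y1 - y0) / k
    scores = np.zeros(num_classes + 1)  # C categories + background
    for c in range(num_classes + 1):
        for i in range(k):          # vertical bin index
            for j in range(k):      # horizontal bin index
                m = score_maps[c * k * k + i * k + j]
                ys = slice(int(y0 + i * bh), int(y0 + (i + 1) * bh))
                xs = slice(int(x0 + j * bw), int(x0 + (j + 1) * bw))
                scores[c] += m[ys, xs].mean()
    return scores / (k * k)

k, C = 3, 2
maps = np.random.default_rng(0).random((k * k * (C + 1), 32, 32))
scores = ps_roi_pool(maps, (4, 4, 22, 22), k, C)
print(scores)  # one pooled score per category (+ background)
```

Because each bin reads from a map trained for that relative position (e.g. "top-left of an object"), the pooled vote encodes spatial layout while all heavy computation stays fully convolutional and shared across RoIs.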

Results
The results are grouped into two categories: detection and classification. During the 30-min acquisition sessions, the performance of object detection and object classification has been evaluated at 5-min intervals. At each measurement point, both sensors acquired at least 250 images each. The dataset is composed of subsets containing images acquired during the respective measurement sessions and has been split into train, test and validation subsets. Both models require a large dataset to train on; the train-test-validation split ratio has been set to 75%, 10% and 15%, respectively, and data augmentation has been applied. Our previous studies [60,62] show that the ability to detect a concealed object with infrared and terahertz cameras decreases during long-lasting experiments. The detection algorithm provides a bounding box for every detected object in the image. The detection rate depends on the observed object's temperature and size, and on the basic weight and type of clothing the subject is wearing.
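The 75/10/15 split described above can be sketched as follows (indices stand in for image files; the shuffling seed is arbitrary):

```python
import numpy as np

def split_dataset(n_items: int, seed: int = 0):
    """Shuffle item indices and split them 75% / 10% / 15%
    into train / test / validation subsets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_items)
    n_train = int(0.75 * n_items)
    n_test = int(0.10 * n_items)
    return (idx[:n_train],                      # train, 75%
            idx[n_train:n_train + n_test],      # test, 10%
            idx[n_train + n_test:])             # validation, remaining ~15%

train, test, val = split_dataset(1000)
print(len(train), len(test), len(val))  # 750 100 150
```

In practice the split would be applied per measurement session rather than per image, so that near-duplicate frames from one session do not leak between subsets.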
Comparison of the images indicated that the contrast (the change in the normalised pixel intensity of the object) between the concealed object and the body decreases more slowly in the terahertz images than in the MWIR images. A summary of the performance indicators is presented in Table 2.
The algorithms were trained to detect and classify certain objects with specific shapes. However, non-dangerous items that were not used to train the algorithm were also detected. The detection rates for such other objects (false acceptance rate, FAR) are considerably lower than for the desired items, as presented in Table 2.
The results show that the presented detection method works better for MWIR images, but only at the beginning of the experiment; after several minutes, the detection rates for MWIR images decrease. The region-based method outperformed the single-shot detector; however, it is slower, with detection running at 11 and 7 frames per second for YOLO3 and R-FCN, respectively.
The maximum detection rates of 86% and 94% have been achieved for images registered at the beginning of the experiment with the metal gun hidden under a T-shirt. This is the result of the relatively high temperature difference between the object and the subject's body at the beginning of the acquisition session. Moreover, infrared images provide more accurate edges than the respective THz images due to their higher spatial resolution. However, the detection performance of the R-FCN for THz images was more stable, providing almost the same performance throughout the acquisition session.

Fig. 8 Architecture of YOLO3
Fig. 9 Architecture of R-FCN

Classification
The classification performance has been calculated only for images with correctly detected objects of interest. Similar to the previous task, several experiments have been made. Performance curves for the classification task in various configurations are presented in Figs. 13, 14 and 15. Both algorithms have been trained to classify the following objects: guns, knives and objects mimicking dynamite. The mean classification performance is lower than that of detection. However, the general trend is similar to the detection task: classification in the MWIR domain outperformed the THz range at the beginning of the experiment and showed decreasing rates along the experiment, finally performing worse than the THz domain at the end of the acquisition sessions. Moreover, similar to the detection task, R-FCN outperformed YOLO3. Classification rates of both methods for the THz images decreased only slowly, providing almost the same performance along the acquisition session. A summary of the classification performance of the presented algorithms is given in Table 3.

Recognition rates vary depending on the type of item and clothing. Detection and classification of concealed items depend on a number of factors, including the temperature difference between the object and the human body, and the thickness, material type and transmission of the clothes. The general trend shows that large objects are easier to detect and classify than small ones. Moreover, loose clothes may not adhere perfectly to the object, so the heat transfer may be imperfect. This problem is of particular importance for imaging objects in the infrared range, where the transmission of clothes is very low and detection relies mostly on heat transfer. When the contact between the object and the clothing is not uniform, the thermal signature of the concealed object is not uniform and its shape may not be correctly classified.

Similar experiments performed with active THz imaging are presented in Table 4.
The presented methods have been implemented on an NVIDIA 1080 Ti graphics processing unit and process a maximum of 14 and 7 frames per second for YOLO3 and R-FCN, respectively.

Summary
The process of hidden object detection and recognition has a different physical basis in the terahertz and infrared ranges and may be influenced by various factors. Terahertz cameras can visualise a hidden object mainly due to the non-zero transmission through clothing, whereas thermal cameras cannot exploit this property because the transmission through clothing is negligible. However, analysing anomalies in the temperature distribution on the clothing surface in the thermal infrared image can reveal the object. Generally, the classification task achieved lower performance than detection in both spectra, with the highest rates achieved by the region proposal network-based method. The experiments showed that changes in the temperature of the human body and of the concealed object affect the detection ability of imagers operating in both the terahertz and infrared ranges. The compared methods are, to a limited extent, capable of real-time processing.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.