1 Introduction

Facial skin temperature is a physiological index that varies with skin blood flow controlled by autonomic nervous system activity [1]. The facial skin temperature can be remotely measured using infrared thermography, and it has recently attracted attention as a remote biomarker [2,3,4].

It is known that human physiological and psychological states are related to skin temperature at particular anatomical sites, but the appropriate size and position of the region of interest (ROI) have not yet been clarified. Moreover, when physiological and psychological states are evaluated from facial skin temperature, an appropriate baseline must be set according to the purpose of the experiment. Furthermore, many issues stand in the way of practical application of facial skin temperature, such as the slow response of skin temperature to changes in subcutaneous blood flow [1]. Many studies have tried to solve such issues with methods such as pattern identification; for example, studies have estimated human emotions [5,6,7,8], drowsiness [9,10,11], and mental stress [12]. However, it is impossible to build a machine that can discriminate among the infinite variety of physiological and psychological states. From a practical standpoint, a machine that can determine whether facial skin temperature is in its normal state may be sufficient. For example, in driver drowsiness estimation it is enough to determine that the driver is not sleepy; in stress assessment, we only need to know that the person is not stressed; and in a simple health check, we only need to know that the person is healthy.

In this study, we propose a new approach that incorporates the concept of anomaly detection [13] into the analysis of physiological and psychological states from facial skin temperature. The proposed anomaly detection algorithm separates normal facial skin temperature from anomalous facial skin temperature, such as that of a person who is “sleepy”, “stressed”, or “unhealthy”.

In the anomaly detection field, often only normal data, which can be collected easily, are used, because it is difficult to cover all anomalous states with data. We therefore focus on anomaly detection using unsupervised learning [14]. Unsupervised learning in image space was long considered difficult because of the curse of dimensionality, but deep generative models [15] can deal with this problem. Typical deep generative models are the autoencoder (AE) [16], the variational autoencoder (VAE) [17], and the generative adversarial network (GAN) [18]. Many derivative techniques have also been reported, such as the vector quantized variational autoencoder 2 (VQ-VAE-2) [19, 20], anomaly detection with GANs (ADGAN) [21], and efficient GAN [22], and anomaly detection in image space has made remarkable progress.

Among these algorithms, we investigate a VAE-based anomaly detection method for the following reasons. (1) A VAE does not reproduce the features of parts it has not learned. In contrast, a standard AE only learns to reduce the dimension of the learned data, so its behavior on unlearned parts cannot be guaranteed. (2) Its training converges more reliably and is easier than that of a GAN. (3) Its latent variables can be reduced to a low dimension, so it is easier to handle than a GAN. (4) It is easy to interpret because it has a clear mathematical structure [23]. VAE has also been applied to images in various wavelength bands, for example to detect cancerous tissue in chest X-ray images [24] and to detect anomalies in industrial products from visible images [25]. However, whether VAE can be applied to infrared images, which are the focus of our work, still needs to be studied.

In this study, we propose an anomaly detection model for facial skin temperature using VAE, toward the development of a machine that can judge the normal state of facial skin temperature. The objective of this study is to detect anomalous facial thermal images among multiple facial thermal images. In this paper, we investigate a method for separating normal and anomalous facial thermal images with this anomaly detection model in order to evaluate the applicability of VAE to facial thermal images.

2 Anomaly detection using VAE

2.1 Overview of the VAE algorithm

VAE is a generative model based on deep learning. Figure 1 shows an overview of the VAE algorithm, where X, \(\widetilde{X}\), and NN represent the input, the output, and a neural network, respectively. The VAE network is divided into an encoder and a decoder. Given observations \(X = \{\overrightarrow{x_1}, \overrightarrow{x_2},\ldots , \overrightarrow{x_N}\}\), VAE estimates the probability distribution \(p(\overrightarrow{x_*}|X)\) that generates an unobserved value \(\overrightarrow{x_*}\). VAE assumes that the latent variables, which serve as explanatory variables, are normally distributed. The encoder compresses the dimension of X and computes the mean vector \(\overrightarrow{\mu _{\phi }}\left( x\right) \) and the variance \(\Sigma _{\phi }\left( x\right) \), which are the parameters of this normal distribution. The blue part in Fig. 1 indicates that sampling is performed from the standard normal distribution; this sample is then shifted and scaled by \(\overrightarrow{\mu _{\phi }}\left( x\right) \) and \(\Sigma _{\phi }\left( x\right) \) to obtain a point \(\overrightarrow{z}\) from the latent distribution. The decoder computes the parameters of the model likelihood \(\overrightarrow{\eta _{\theta }}\left( z\right) \), from which the reconstruction error is computed, and the loss is finally backpropagated through the network. Because the loss to be optimized includes, in addition to the reconstruction error, a regularization term that pulls the mean \(\overrightarrow{\mu _{\phi }}\left( x\right) \) toward 0 and the variance \(\Sigma _{\phi }\left( x\right) \) toward the identity matrix, the distribution of the latent variable \(\overrightarrow{z}\) is close to a standard normal distribution. Bringing the distribution returned by the encoder close to the standard normal distribution regularizes the organization of the latent space; for this reason, VAE is less prone to overfitting than a standard autoencoder and can achieve a higher recall.
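To make the sampling and regularization steps concrete, the following minimal NumPy sketch shows the reparameterized sampling of \(\overrightarrow{z}\), the Kullback–Leibler (KL) regularization term toward the standard normal distribution, and a Gaussian reconstruction term. It assumes a diagonal covariance parameterized by log-variances and a Gaussian decoder likelihood; the function names are hypothetical, and the encoder and decoder networks themselves are omitted. It illustrates the standard VAE objective and is not the authors' implementation.

```python
import numpy as np


def reparameterize(mu_z, log_var_z, rng):
    """Sample z = mu + sigma * eps, with eps drawn from the standard normal N(0, I)."""
    eps = rng.standard_normal(mu_z.shape)  # the sampling step drawn in blue in Fig. 1
    return mu_z + np.exp(0.5 * log_var_z) * eps


def kl_to_standard_normal(mu_z, log_var_z):
    """KL(N(mu, diag(sigma^2)) || N(0, I)) per sample: pulls mu toward 0 and sigma^2 toward 1."""
    return 0.5 * np.sum(mu_z**2 + np.exp(log_var_z) - log_var_z - 1.0, axis=-1)


def gaussian_nll(x, mu_x, log_var_x):
    """Reconstruction term: negative Gaussian log-likelihood of x under the decoder output."""
    return 0.5 * np.sum(
        np.log(2.0 * np.pi) + log_var_x + (x - mu_x) ** 2 / np.exp(log_var_x), axis=-1
    )


def vae_loss(x, mu_x, log_var_x, mu_z, log_var_z):
    """Per-sample VAE loss (negative ELBO): reconstruction term plus KL regularization term."""
    return gaussian_nll(x, mu_x, log_var_x) + kl_to_standard_normal(mu_z, log_var_z)
```

During training, this loss would be averaged over a mini-batch and minimized with a gradient-based optimizer such as Adam, with gradients flowing through `reparameterize` because the randomness is isolated in `eps`.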

Fig. 1 Overview of VAE

Fig. 2 Conceptual diagram of anomaly detection using VAE

2.2 Anomaly detection using VAE

We now explain the concept of anomaly detection in thermal face images using VAE; Figure 2 shows a conceptual diagram. Supervised machine learning usually requires a dataset with a sufficient number of samples for each class. However, it is difficult to collect data while a person is in an abnormal state (e.g., data on people who are not feeling well). Therefore, in this study, we evaluated a model constructed using only data in the normal condition, which are relatively easy to collect. We define thermal face images in the normal state as Normal and those in an anomalous state as Anomaly. Only Normal is used to build the anomaly detection model. First, the anomaly detection model was constructed by training a VAE on a large amount of Normal data. Second, testing data (Anomaly and Normal) were input to the model. If the pattern of a test image was similar to Normal, it was judged as Normal; otherwise, it was judged as Anomaly.
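The normal-only workflow can be illustrated with a deliberately simple stand-in. In this hypothetical sketch, the "model" is only the per-pixel mean and variance of the Normal training images rather than a VAE, the score is a variance-scaled squared error, and the threshold is a hypothetical choice (the 99th percentile of the training scores); this study itself does not define a threshold. Only the structure, fitting on Normal alone and then comparing each test image against what was learned, reflects the procedure described above.

```python
import numpy as np


def fit_normal_model(normal_images):
    """Stand-in 'model' (not a VAE): per-pixel mean and variance of Normal training images."""
    mean = normal_images.mean(axis=0)
    var = normal_images.var(axis=0) + 1e-6  # small floor avoids division by zero
    return mean, var


def anomaly_score(image, model):
    """Sum of squared deviations from the Normal mean, scaled by the per-pixel variance."""
    mean, var = model
    return 0.5 * np.sum((image - mean) ** 2 / var)


def classify(image, model, threshold):
    """Judge a test image Normal if its score does not exceed the (hypothetical) threshold."""
    return "Normal" if anomaly_score(image, model) <= threshold else "Anomaly"


# Usage with synthetic Normal images only; no Anomaly data are needed for fitting.
rng = np.random.default_rng(0)
train = rng.normal(0.5, 0.05, size=(200, 16, 16))
model = fit_normal_model(train)
threshold = np.percentile([anomaly_score(im, model) for im in train], 99)
```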

3 Proposed algorithm

3.1 Collection of anomaly and normal

Anomaly detection using VAE requires a large amount of normal data. In this study, we verify whether the proposed algorithm is effective for anomaly detection in facial skin temperature. Anomaly consists of thermal face images in which the skin temperature was forcibly changed by breath holding, and Normal consists of thermal face images obtained while the subject was in the normal state. The image size is 640 \({\times }\) 480 pixels. The Normal set contained 4976 images, of which 90% were used as learning data and 10% as testing data. The number of Anomaly images, which serve only as testing data, was 195, set in accordance with the Normal testing data. The 640 \({\times }\) 480 pixel thermal image output from the thermographic device was cropped to 180 \({\times } \)180 pixels, leaving the face area; this cropped image is defined as a thermal face image. Because the differences in facial skin temperature within a single image are small, and thermal images may be biased by room temperature and other disturbances, each thermal face image was normalized so that its maximum temperature was 1 and its minimum temperature was 0, allowing the model to learn the relative skin temperature within the face. The normalization was applied independently to each thermal face image.
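The cropping and per-image normalization steps can be sketched as follows. The crop offsets are hypothetical (the paper does not say where the 180 × 180 window was placed), the frame is assumed to be a NumPy array of temperatures with 480 rows and 640 columns, and the guard against a constant image is an added assumption.

```python
import numpy as np


def crop_face(frame, top, left, size=180):
    """Cut a size x size window (the thermal face image) out of a 480 x 640 thermal frame."""
    return frame[top:top + size, left:left + size]


def minmax_normalize(face):
    """Map this image's minimum temperature to 0 and its maximum to 1 (per image)."""
    t_min, t_max = face.min(), face.max()
    if t_max == t_min:  # constant image: avoid division by zero (assumption)
        return np.zeros_like(face, dtype=float)
    return (face - t_min) / (t_max - t_min)


# Usage on a synthetic frame (temperatures in degrees C); the offsets are hypothetical.
rng = np.random.default_rng(0)
frame = 30.0 + 5.0 * rng.random((480, 640))
face = minmax_normalize(crop_face(frame, top=150, left=230))
```

Normalizing each image separately, rather than with a global range, is what removes the offset caused by room temperature, at the cost of discarding absolute temperature information.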

Fig. 3 Samples of learning data

Fig. 4 Overall structure of the anomaly detection model

3.2 Construction of the anomaly detection model

During VAE learning, 8 \(\times \) 8 pixel patches are randomly cut out of the thermal face images and used as learning data, as shown in Fig. 3. In this way, the local thermal images of the skin area were augmented to 10,000 patches. Figure 4 shows the overall structure of the anomaly detection model. Convolutional layers were placed before the fully connected (FC) layer of the encoder to extract features of the skin temperature pattern produced by the skin blood vessels, and, correspondingly, deconvolutional (transposed convolutional) layers were placed after the FC layer of the decoder. The encoder consisted of two convolutional layers and one FC layer; its construction is listed in Table 1, where Conv, BatchNorm, and FC indicate the convolutional, batch normalization, and fully connected layers, respectively. The mean vector and variance were output from the FC layer. The decoder mirrors the encoder, with its layers in reverse order; its construction is listed in Table 2, where ConvTrans indicates a transposed convolution. The VAE parameters were learned by gradient descent using the Adam optimizer. The number of dimensions of the latent variable \(\overrightarrow{z}\) was searched over 4, 5, 6, 7, 8, and 10, and the best model was selected. The number of epochs was 20 and the batch size was 128.
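The random patch extraction can be sketched as below. The sketch draws patch positions uniformly over each 180 × 180 thermal face image and picks source images with replacement; the paper states only that the patches come from the skin area, so this placement rule is an assumption (uniform placement may include some non-skin pixels), and `sample_patches` is a hypothetical helper.

```python
import numpy as np


def sample_patches(faces, n_patches=10_000, patch=8, rng=None):
    """Randomly cut patch x patch crops out of randomly chosen thermal face images."""
    rng = np.random.default_rng() if rng is None else rng
    n, h, w = faces.shape
    idx = rng.integers(0, n, size=n_patches)               # source image of each patch
    rows = rng.integers(0, h - patch + 1, size=n_patches)  # top-left row (upper bound exclusive)
    cols = rng.integers(0, w - patch + 1, size=n_patches)  # top-left column
    out = np.empty((n_patches, patch, patch), dtype=faces.dtype)
    for k, (i, r, c) in enumerate(zip(idx, rows, cols)):
        out[k] = faces[i, r:r + patch, c:c + patch]
    return out


# Usage: 50 synthetic normalized face images -> 10,000 patches of 8 x 8 pixels.
rng = np.random.default_rng(0)
faces = rng.random((50, 180, 180))
patches = sample_patches(faces, rng=rng)  # shape (10000, 8, 8)
```

With a batch size of 128, one epoch over 10,000 patches comprises 79 mini-batches if the final partial batch is kept.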

At test time, the spatial unregularized anomaly score was calculated for all test data following [25] and used as an index for detecting facial thermal images with some kind of abnormality among multiple facial thermal images. The unregularized anomaly score represents the error between the input vector and the reconstructed vector. Using this score to decide whether an image is Anomaly would normally require a threshold; however, no threshold is defined in this study. Instead, we calculated statistics of the unregularized anomaly scores of Normal and Anomaly to see whether the two can be separated. The spatial unregularized anomaly score is given by

$$\begin{aligned} L_{\text{VAE}}\left( x\right) = \left. \sum _{i=1}^{N_x}\dfrac{1}{2}\times \dfrac{\left( \mu _{x_i}-x_{i}\right) ^{2}}{\sigma ^{2}_{x_i}} \right| _{z=\mu _{z}}. \end{aligned}$$
(1)

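This equation is directly related to the reconstruction error. \(N_x\) and \(x_{i}\) represent the number of pixels in the image and the value of the i-th pixel, respectively. \(\mu _{x_i}\) and \(\sigma ^{2}_{x_i}\) are the mean and variance output by the decoder for the i-th pixel, evaluated at \(z=\mu _{z}\), that is, at the maximum a posteriori estimate of the latent variable \(\overrightarrow{z}\) given by the encoder mean. The term "unregularized" reflects that the regularization (KL divergence) term of the VAE loss is not included in the score, and the individual per-pixel terms of the sum form the spatial score map.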
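Equation (1) and its spatial (per-pixel) version can be computed directly once the decoder outputs are available. In this sketch, `mu_x` and `var_x` are assumed to be the per-pixel mean and variance returned by a trained decoder when it is fed \(z=\mu_z\); the networks themselves are not shown, and the small offset added before taking the logarithm for display is an assumption.

```python
import numpy as np


def spatial_unregularized_score(x, mu_x, var_x):
    """Per-pixel terms of Eq. (1): 0.5 * (mu_x - x)^2 / var_x, with mu_x, var_x decoded at z = mu_z."""
    return 0.5 * (mu_x - x) ** 2 / var_x


def unregularized_score(x, mu_x, var_x):
    """L_VAE(x) of Eq. (1): the per-pixel terms summed over all N_x pixels."""
    return float(np.sum(spatial_unregularized_score(x, mu_x, var_x)))


def log_score_map(score_map, eps=1e-12):
    """Log-scale score map for display; eps keeps pixels with zero error finite."""
    return np.log10(score_map + eps)
```

Dividing by \(\sigma^2_{x_i}\) means that a given squared error counts more in pixels the decoder reconstructs confidently (small variance) than in pixels it considers inherently variable.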

Table 1 Construction of encoder
Table 2 Construction of decoder

4 Experiment

4.1 Experiment system

The experimental system consisted of an infrared thermography device (FLIR A600-Series, FLIR Systems Co., Ltd). The size of the thermal image was 640 \(\times \) 480 pixels, and the temperature resolution was less than \(0.05\,^\circ \)C. The infrared emissivity of skin was \(\varepsilon \) = 0.98. The viewing angle of the infrared thermography device was \(60^\circ \).

4.2 Procedure and condition

One healthy young male subject (aged 24) participated in the experiments. The subject was informed about the experiments and the objectives of this study and provided informed consent before participating. To reflect real-life conditions, the subject was asked to participate as usual, without modifying daily activities such as food intake, sleep, and smoking habits. The experiment was conducted in an experimental room (24.7 \(\pm \,{6.0}\,^\circ \)C). The ultimate goal of this study is to construct a system that recognizes the normal state of facial skin temperature for estimating driver drowsiness and checking stress in daily life. Therefore, to simulate a daily-life environment, we conducted the experiment in a room where the ambient temperature was not controlled. The facial skin temperature was measured in two sessions at a sampling frequency of 10 Hz, with the infrared thermograph placed 1 m in front of the subject. In the first session, the subject sat in a chair for about 10 min without any intervention, keeping his face as still as possible; the thermal face images measured during this period were regarded as Normal. In the second session, the subject held his breath for 50 s, which raised his blood pressure and thereby promoted fluctuations in facial skin temperature. This measurement was performed in iterations of 30–50 s, and the thermal face images obtained were regarded as Anomaly.
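As a rough, illustrative estimate of the spatial scale of the data, the following assumes that the \(60^\circ\) viewing angle is the horizontal field of view spanning the 640 image columns and that the face lies in a plane at the 1 m camera distance; neither assumption is stated in the paper, and if \(60^\circ\) is instead the diagonal angle, all values below become smaller.

$$\begin{aligned} W &= 2d\tan \left( \tfrac{\theta }{2}\right) = 2 \times 1\,\mathrm{m} \times \tan 30^\circ \approx 1.155\,\mathrm{m},\\ \Delta &= \frac{W}{640} \approx 1.8\,\mathrm{mm\ per\ pixel},\\ 180\,\Delta &\approx 0.32\,\mathrm{m}, \qquad 8\,\Delta \approx 14\,\mathrm{mm}. \end{aligned}$$

Under these assumptions, each 180 \(\times \) 180 thermal face image covers roughly 32 cm on a side, and each 8 \(\times \) 8 learning patch covers roughly 14 mm of skin on a side.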

5 Result and discussion

5.1 Anomaly and normal

Figure 5 shows examples of normalized thermal face images.

Fig. 5 Samples of thermal face images

5.2 Anomaly detection

Figures 6 and 7 show the unregularized anomaly scores of two arbitrary samples, mapped on a log scale. The blue color indicates the degree of abnormality. When the unregularized anomaly scores of all test data were inspected, the maps fell into two types, represented by Figs. 6 and 7. In Fig. 6, the input thermal images are samples whose facial skin temperature varies significantly. The score map shows no abnormal part anywhere on the face for the Normal image, whereas abnormal parts appear on the nose, the left and right cheeks, and around the mouth for the Anomaly image.

This suggests that the VAE learned a large number of samples in which no part of the face showed a significant change in temperature. In Fig. 7, the nose was marked as abnormal even in the Normal image, but no other parts were recognized as abnormal. These results indicate that the VAE recognized regions where the skin temperature decreased as abnormal and regions where the skin temperature did not change as normal; that is, the VAE algorithm was effective in capturing temperature changes within the face. Because the anomaly detection model was constructed using only Normal thermal face images, the unregularized anomaly score calculated with this model may be able to identify the state of a test image (Normal or Anomaly) without knowing its label in advance.

Fig. 6 Unregularized anomaly score of sample 1

Fig. 7 Unregularized anomaly score of sample 2

Figures 8 and 9 show histograms of the unregularized anomaly scores in Figs. 6 and 7, respectively. The horizontal axis represents the unregularized anomaly score, and the vertical axis represents its frequency on a log scale. For both samples, the overall shape of the histogram differed markedly between Normal and Anomaly. Since the abnormality patterns of all test data fall into these two types, each of which shows different distributions for Normal and Anomaly, the same difference is expected to hold for all test data. Figure 10 shows the average and variance of the unregularized anomaly score for all test data. Because the Anomaly scores deviate from the Normal scores, the proposed algorithm is considered useful for detecting abnormal fluctuations in skin temperature.
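The per-image statistics and histograms can be reproduced from the spatial score maps. This sketch takes one plausible reading of Fig. 10, namely the mean and variance of the per-pixel scores of each test image, which is an assumption; the bin edges in the usage are also illustrative. For a log-scale vertical axis as in Figs. 8 and 9, the counts can be plotted with, for example, Matplotlib's `set_yscale("log")`.

```python
import numpy as np


def score_statistics(score_maps):
    """Mean and variance of the per-pixel scores of each image (one reading of Fig. 10)."""
    flat = np.asarray(score_maps).reshape(len(score_maps), -1)
    return flat.mean(axis=1), flat.var(axis=1)


def score_histogram(score_map, bins):
    """Frequency of the per-pixel scores in one map (cf. Figs. 8 and 9)."""
    counts, edges = np.histogram(np.ravel(score_map), bins=bins)
    return counts, edges


# Usage with two toy maps standing in for a Normal and an Anomaly image.
maps = np.stack([np.full((4, 4), 0.1), np.full((4, 4), 2.0)])
means, variances = score_statistics(maps)
```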

Fig. 8 Histogram of unregularized anomaly score in sample 1

Fig. 9 Histogram of unregularized anomaly score in sample 2

Fig. 10 Average and variance of unregularized anomaly score

6 Conclusion

The objective of this study is to detect anomalous facial thermal images among multiple facial thermal images, toward the development of a machine that can judge the normal state of facial skin temperature. We proposed an anomaly detection model for thermal face images using VAE and, to evaluate the applicability of VAE to facial thermal images, investigated a method for separating normal and anomalous facial thermal images with this model. We experimentally collected Normal and Anomaly facial skin temperature data. When the unregularized anomaly score was calculated with the anomaly detection model, Normal and Anomaly yielded different score distributions for all test data. These results show that VAE can be used to detect abnormalities in facial skin temperature. In the future, we plan to use probabilistic models to discriminate between Normal and Anomaly and to perform quantitative evaluations using discrimination rates. We also plan to increase the number of samples and assess the generality of the method.