Human Re-Identification with a Robot Thermal Camera Using Entropy-Based Sampling

Human re-identification is an important feature of domestic service robots, in particular for elderly monitoring and assistance, because it allows them to perform personalized tasks and human-robot interactions. However, vision-based re-identification systems are subject to limitations due to human pose and poor lighting conditions. This paper presents a new re-identification method for service robots using thermal images. In robotic applications, as the number and size of thermal datasets are limited, it is hard to use approaches that require a huge number of training samples. We propose a re-identification system that can work using only a small amount of data. During training, we perform entropy-based sampling to obtain a thermal dictionary for each person. Then, a symbolic representation is produced by converting each video into sequences of dictionary elements. Finally, we train a classifier using this symbolic representation and geometric distribution within the new representation domain. The experiments are performed on a new thermal dataset for human re-identification, which includes various situations of human motion, poses and occlusion, and which is made publicly available for research purposes. The proposed approach has been tested on this dataset and its improvements over standard approaches have been demonstrated.


Introduction
The ageing population and increased life expectancy of people worldwide have motivated a growing number of wellbeing and health monitoring applications for personal and domestic use. Service robotics is a promising research field that contributes to the creation of new solutions for elderly care. In recent years, service robots have become very popular by accomplishing various tasks, from guiding visitors in public environments to assisting elderly people at home. For the latter, in particular, a robust human re-identification system is needed in order for the robot to distinguish between two or more users in the household and provide personalised services (e.g. medication reminders).
Serhan Coşar (scosar@lincoln.ac.uk), Nicola Bellotto (nbellotto@lincoln.ac.uk). Lincoln Centre for Autonomous Systems (L-CAS), School of Computer Science, University of Lincoln, LN6 7TS Lincoln, UK
Despite its importance, human re-identification has not been sufficiently investigated in robotic applications. A very large body of work has focused on recognizing people across a network of RGB cameras in surveillance systems [2,26]. In most of these applications, re-identification is performed by extracting appearance features from RGB images [5,10,16]. On the other hand, by exploiting RGB-D cameras, anthropometric features (e.g., limb lengths) extracted from skeleton data [1,21], point cloud information [20] and volumetric features extracted from depth images [8] can be used for re-identification in service robot applications.
However, for long-term applications of domestic service robots, many existing approaches have strong limitations. For instance, appearance-based approaches are not applicable as people often change their clothes. In addition, in poorly illuminated or dark environments (at night), which are typical in domestic settings, RGB images provide very little information (Fig. 1d). Skeletal data is not always available because of self-occluding body motion (e.g., a person facing away from the camera, Fig. 1b) or objects occluding parts of the body (e.g., passing behind a table, see Fig. 1c). In order to deal with the above limitations, in this paper we propose the use of a thermal camera, which provides clear images in the infrared spectrum, even in darkness (Fig. 1e). The camera is mounted on top of an interactive service robot (Fig. 1a) used in the ENRICHME project to monitor and assist elderly people with mild cognitive impairments at home.
In this kind of robotics application, it is very hard to collect large amounts of thermal data because, unlike static cameras, the robot would have to move among people for an extensive period of time and collect a multitude of views, which can be technically infeasible. Therefore, deep-learning approaches based on large amounts of thermal data are not suitable for our human re-identification system. Moreover, in domestic environments, people can move freely and can be observed by the robot from many different views and distances. Thus, it is essential to implement a robust re-identification system that can cope with these variations and the uncertainty introduced by occlusions and human pose.
Our approach builds a dictionary of thermal features at different views. However, instead of sampling at predefined angles, we perform an entropy-based sampling that automatically selects the observations providing more information. We then transform each video to a new sequence of dictionary elements (symbols). In this new representation, we use the geometric distribution among symbols as features and train a support vector machine (SVM) classifier. As our approach performs entropy-based sampling, it uses a small amount of data, which fits well with the requirements of service robots.
Although thermal images are widely used in computer vision, especially for face recognition, there is no benchmark dataset for re-identification. To our knowledge, there are two datasets about people walking i) along a corridor [19] and ii) outside a building [22], both recorded by a fixed camera in a surveillance setup, not covering the case of domestic environments from a robot perspective. Thus, we have collected a thermal re-identification dataset, which is publicly available, with the camera mounted on a mobile service robot (see Fig. 1a). We have recorded thermal images of people in various domestic situations, such as walking, sitting, and occlusion, from different views.
The contributions of this paper are threefold:
- a new entropy-based sampling to build thermal dictionaries and symbolic representations of humans in thermal images;
- a full software pipeline, implemented in ROS, for thermal-based human re-identification with service robots;
- a new publicly available thermal dataset for human re-identification with such robots.
The remainder of this paper is organized as follows. Related work on re-identification approaches is presented in Section 2. Section 3 explains the details of our approach and how entropy-based sampling is used to create thermal dictionary models (TDM). The symbolic representation and classification in the TDM domain are described in Section 4. The software pipeline is described in Section 5. Experimental results with our new public dataset are presented in Section 6. Finally, we conclude this paper in Section 7, discussing achievements and current limitations.

Related Work
The main goal of re-identification is to establish a consistent labeling of the observed people across multiple cameras or in a single camera in non-contiguous time intervals [2].
The approach of [10] on RGB cameras focuses on an appearance-based method, which extracts the overall chromatic content, spatial arrangement of colors and the presence of recurrent patterns from the different body parts of the person. In [17], the authors propose a deep architecture that automatically learns features for optimal re-identification. However, the problem of these methods is the use of color, which is not discriminative for long-term applications.
In [1], re-identification is performed on soft biometric traits extracted from skeleton data and geodesic distances extracted from depth data. These features are weighted and used to extract a signature of the person, which is then matched with training data. The methods in [20,21] tackle the problem by applying features based on the extracted skeleton of the person. This is used not only to calculate distances between the joints and their ratios, but also to map the point clouds of the body to a standard pose of the person. This allows the use of a point cloud matching technique, typical of object recognition, in which the objects are usually rigid. However, as skeleton data is not robust to body motion and occlusion, these approaches have strong limitations. In addition, point cloud matching has a high computational cost.
In [25], a multi-modal dissimilarity representation is obtained by combining appearance and skeleton data. Similarly, in [24], an ensemble of distance functions, each one learned using a single feature, is built in order to exploit multiple appearance features. While in other works the weights of such functions are pre-defined, in the latter they are learnt by optimizing the evaluation measures. Although these ensembles of state-of-the-art approaches can improve the accuracy of human re-identification, their dependency on color and/or skeletal data poses strong limitations on the type of environment and sensing capabilities of a mobile robot.
In [3], human recognition is performed by fusing a histogram-based human clothes classification, which takes into account the uncertainty of the human position to select relevant image regions, and a simple face recognition algorithm. Then, the outputs of the recognizers are integrated with multi-sensor detectors to perform simultaneous tracking and recognition. Wengefeld et al. [30] present a combined system on a mobile robot using both a laser and a 3D camera for detection, tracking and visual appearance-based re-identification. Similarly, [14] presents a method for person identification and tracking with a mobile robot. The person is recognized using height, gait, and appearance features. The tracking information is also used in [29], where the identification is based on an appearance model, using particle swarm optimization to combine a precise estimation of the upper body pose with appearance. In these approaches, re-identification is mainly used to recover human IDs during people tracking. In this case, appearance-based features are enough for human re-identification over a short period, but not to identify people in the long term.
In the last decade, thermal images have become increasingly popular for solving standard computer vision problems, in particular face recognition [7,11,12,31]. In [7], local binary patterns (LBP) are extracted from thermal images. Then, given a feature vector from a test sample, the authors use partial least squares-discriminant analysis (PLS-DA) to perform face recognition. Wu et al. [31] presented a convolutional neural network (CNN) architecture that can automatically learn effective features from thermal data and perform face recognition with a softmax classification layer. Although there are many thermal face recognition approaches that achieve good results, they are typically not suitable for domestic service robot applications because they require a clear frontal image of the face, which is often not possible to obtain with a mobile robot.

Entropy-Based Sampling
The proposed approach for human re-identification uses images acquired by a thermal camera, shown in Fig. 3a. It performs head segmentation, extracts thermal features and creates thermal dictionary models from a training sequence. Symbolic representations of the thermal features are then used to train an SVM classifier. The flow diagram of the system and the respective sub-modules are depicted in Fig. 2. The following subsections explain each part of our approach in detail.

Head Segmentation and Thermal Feature Extraction
The image acquired from the thermal camera provides the temperature of objects in the field of view of the camera. Since the temperature of humans lies within a specific interval, it is possible to segment people in the thermal image by thresholding the temperature data. Face and body provide important features to recognize people. However, the temperature data obtained by observing the human body is largely dependent on the type of clothes the person wears (Fig. 3b). Therefore, we focus on the segmentation of the head region only.
We first perform thresholding on the thermal image (Fig. 3b) in the interval [32 °C, 39 °C] and obtain a binary image (Fig. 3c). Then, we apply connected component analysis on the binary image. We filter the components based on area and width, keeping the ones whose area and width are both larger than pre-defined values. Among the remaining components, we select the region of the binary image with the smallest width (Fig. 3). After the head region on the thermal image is segmented, we extract features from it. The temperature data of the head region (i.e., the whole 3D head, not just the 2D face) provide important information. We therefore calculate the temperature histogram of the current head region (Fig. 3e), which is normalized to obtain the distribution of the temperatures (Fig. 3f). The concatenation of temperature histograms from different points of view will provide the temperature characteristic of the person's head. In [27], it is noted that the temperature of the skin surface varies with the environmental temperature, the body temperature, and the conditions of the skin and the structures beneath it. However, the head is one of the regions of the human body where the skin temperature remains more or less constant despite temperature changes in the environment [23]. Our hypothesis is that the head temperature distribution can be used to distinguish people's identities. This is verified experimentally in Section 6.
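The segmentation and feature extraction steps above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the function names, the 4-connectivity, and the `min_area` and `min_width` parameters are assumptions, since the text only says that the limits are pre-defined.

```python
from collections import deque

T_MIN, T_MAX = 32.0, 39.0  # temperature interval in deg C, taken from the text


def threshold(temps):
    """Binary mask: True where the temperature lies inside [T_MIN, T_MAX]."""
    return [[T_MIN <= t <= T_MAX for t in row] for row in temps]


def connected_components(mask):
    """4-connected components (an assumption), each a list of (row, col)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def width(pixels):
    """Horizontal extent of a component in pixels."""
    xs = [x for _, x in pixels]
    return max(xs) - min(xs) + 1


def select_head(components, min_area, min_width):
    """Keep components above both limits, then pick the narrowest one."""
    kept = [p for p in components if len(p) > min_area and width(p) > min_width]
    return min(kept, key=width) if kept else None


def temperature_histogram(temps, pixels, bins=10):
    """Normalized histogram of the head temperatures over [T_MIN, T_MAX]."""
    counts = [0] * bins
    for y, x in pixels:
        idx = int((temps[y][x] - T_MIN) / (T_MAX - T_MIN) * bins)
        counts[min(idx, bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts]
```

Chaining `threshold`, `connected_components`, `select_head` and `temperature_histogram` on a frame of temperatures in degrees Celsius yields one normalized histogram per frame, which plays the role of the thermal feature in the rest of the pipeline.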

Thermal Dictionary Models
In real-world scenarios, where people freely move in the environment, service robots require a view-independent re-identification approach. Considering single-shot re-identification, most of the view-independent methods are based on full person models built from various views [6,33]. Although they can typically achieve very good results, they also have a high computational cost. In addition, finding
A model representing the person from different points of view can be embedded in a dictionary of thermal features. Assuming we have thermal data of a person turning around, we can extract a sequence of features obtained at different angles (Fig. 4). However, choosing the sampling angles is not easy. The representation could be too coarse or too fine, depending on the pre-defined angle intervals. In our approach, instead, we let the data determine which features are worth keeping by performing an entropy-based sampling on the sequence of features. The features that provide sufficient information gain are included in the dictionary.
In information theory, the information gain (relative entropy) is a measure of the difference between two probability distributions, which can be measured by the Kullback-Leibler (KL) divergence [15]. We determine the information gain between features using the latter, calculated as follows:
$$D_{KL}(P \,\|\, Q) = \sum_{i} P(i) \log \frac{P(i)}{Q(i)} \tag{1}$$
where P and Q are features extracted from the thermal data, i.e., normalized temperature histograms, and i indexes the histogram bins. Using Eq. 1, we perform an entropy-based sampling as follows: first, we calculate the KL divergence between each element in the dictionary model and a new thermal feature; then, if the information gain is bigger than a pre-defined threshold, we add the new feature to the dictionary model. The procedure is summarized in Algorithm 1. Then, we convert the sampled features into symbols and include them in a thermal dictionary model (TDM) (Fig. 5). As TDMs are individually processed and generated for each person, we associate histograms from different people with the same symbols. In our approach, we do not assume a pre-defined number of samples or any particular (fixed) orientation of the user. Thanks to our entropy-based sampling scheme, the system automatically selects the most informative observations. During training, the user is only asked to turn around in front of the robot, so the system can measure the information gain and automatically generate the TDMs. These are used for the classification step of the re-identification, as explained in the next section.
Algorithm 1 Thermal dictionary models are obtained by entropy-based sampling based on the Kullback-Leibler (KL) divergence [15].
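The body of Algorithm 1 is not reproduced in this text, so the following is a minimal sketch of the sampling procedure as described in the paragraph above, not the authors' listing. Several details are assumptions: the argument order of the KL divergence, the small `eps` added to avoid taking the logarithm of empty bins, and the rule that a new feature must exceed the threshold with respect to every stored element.

```python
import math


def kl_divergence(p, q, eps=1e-9):
    """KL divergence D(P || Q) between two histograms.

    eps is an assumption of this sketch: it keeps empty bins from
    producing log(0) or a division by zero.
    """
    p = [x + eps for x in p]
    q = [x + eps for x in q]
    sp, sq = sum(p), sum(q)
    return sum((a / sp) * math.log((a / sp) / (b / sq)) for a, b in zip(p, q))


def build_dictionary(features, threshold):
    """Entropy-based sampling over a sequence of thermal features."""
    dictionary = []
    for feature in features:
        # Information gain of the new feature against each stored element.
        gains = [kl_divergence(feature, element) for element in dictionary]
        # Assumption: keep the feature only if it is informative with
        # respect to all elements already in the dictionary.
        if not dictionary or min(gains) > threshold:
            dictionary.append(feature)
    return dictionary
```

With this rule the first feature is always stored, near-duplicates of stored elements are skipped, and the dictionary size follows from the data and the threshold rather than from a fixed set of viewing angles.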

Symbolic Representation
For each person, we extract a TDM using a training sequence, i.e., T_c, for 1 ≤ c ≤ C, where C is the number of people (classes). Then, we obtain a symbolic representation of each test sequence by converting the thermal features into symbols using the TDM of each person. For each feature extracted, we find the most similar element in the TDM and assign its symbol to the feature. Here, we use the KL divergence (Eq. 1) as a measure to evaluate the similarity between features and TDM elements. We calculate the KL divergence between a feature and each TDM element and assign the symbol of the most similar element. As a result of this operation, we obtain a symbol S for each TDM (Eq. 2), where T_i represents the i-th element of the TDM, F is the extracted feature, and m is the index of the most similar element.
An important aspect of the symbolic representation lies in taking into account how similar a feature is to a dictionary element. S represents the most similar element of each feature, but it does not contain this information. Thus, we extend the symbolic representation by including the similarity measure between features and dictionary elements (Eq. 3).
In conclusion, we obtain a combined representation for each thermal feature, consisting of the symbol and its similarity measure (Eq. 4).

Classification
For each class c, the combined representation describes the feature vector F in the domain of samples from class c. This representation contains answers to the following questions about the feature vector: i) what is the most similar dictionary element in class c and ii) how similar are they? Geometrically, the representation is shown in Fig. 6a. If we concatenate the symbols computed for each class into a single vector, we obtain a new feature vector with respect to the whole training space (Fig. 6b; Eq. 5). With this representation, we can encode any new feature vector in the (previously trained) feature space. We assume that features from the same class (i.e., person) will have similar representations, so we can match the representation of a test sample to the representation of training samples from the same class.
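The encoding described above can be illustrated with a short sketch. It is not the authors' formulation, whose equations are not reproduced in this text: the function names are hypothetical, the symbol is taken to be the index of the closest TDM element under the KL divergence, and each class is assumed to contribute one (symbol, similarity) pair to the final vector. The layout of the 120-dimensional vector mentioned later in the paper may differ.

```python
import math


def kl_divergence(p, q, eps=1e-9):
    """KL divergence D(P || Q); eps (an assumption) guards empty bins."""
    p = [x + eps for x in p]
    q = [x + eps for x in q]
    sp, sq = sum(p), sum(q)
    return sum((a / sp) * math.log((a / sp) / (b / sq)) for a, b in zip(p, q))


def encode(feature, tdm):
    """Return (m, d): index of the most similar TDM element and its divergence."""
    divergences = [kl_divergence(feature, element) for element in tdm]
    m = min(range(len(tdm)), key=divergences.__getitem__)
    return m, divergences[m]


def represent(feature, tdms):
    """Concatenate the (symbol, similarity) pairs over the TDMs of all classes."""
    vector = []
    for tdm in tdms:  # one dictionary per person (class)
        m, d = encode(feature, tdm)
        vector.extend([float(m), d])
    return vector
```

A feature that belongs to person c is expected to land close to some element of that person's TDM, so the similarity entry for class c tends to be small, which is the pattern the classifier can learn.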
As a result of the features extracted by Eq. 5, we obtain a high-dimensional data representation for the next classification stage. We train an SVM classifier [9], which has been shown to work very well for high-dimensional data [13]. We use the similarity measure to train the SVM classifier.

Software Pipeline and Implementation
The full software pipeline of the proposed human re-identification approach is implemented as a ROS node for applications with domestic mobile robots. This ROS node assumes that the SVM classifier is already trained and the corresponding models for thermal dictionary and classifier are present. The software pipeline is illustrated in Fig. 7.
The blue and yellow boxes present the main and auxiliary functions implemented in the software, respectively. The green box represents the ROS driver of the Optris thermal camera. The following subsections explain the details of the pipeline.

Thermal Image Acquisition
In this paper, the thermal images are acquired using the thermal camera Optris PI-450 (Fig. 3a).This thermal camera offers a temperature range of −20

Software Implementation
The thermal re-identification software is encapsulated into a ROS package, which can be easily installed thanks to its catkin compatibility. The ROS package developed for re-identification includes a one-click roslaunch file with a YAML file containing the parameters for re-identification shown in Table 1. This ROS node subscribes to both raw and color thermal images published by the Optris driver and performs the following operations. First, following the PI imager library, raw image data (data) are converted to temperature values in floating point format (t) as follows:
t = (data − 1000) / 10.0
Then, as described in Section 3, the ROS node performs thermal feature extraction and creates the symbolic representation of a new thermal feature. Finally, the SVM classifier is used to predict the label of the new feature. The SVM classifier is implemented using the libSVM library [4]. This ROS node assumes that the model files for thermal dictionary and classifier are present under the "config" directory of the ROS package.
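As a small illustration of the conversion formula above, the sketch below applies it to a frame stored as nested lists. The function name and the list-of-lists layout are assumptions made for the example; the actual node works on ROS image messages.

```python
def raw_to_celsius(raw_frame):
    """Convert raw pixel values to degrees Celsius: t = (data - 1000) / 10.0."""
    return [[(value - 1000) / 10.0 for value in row] for row in raw_frame]


# A raw value of 1365 corresponds to 36.5 deg C, inside the head interval.
frame = raw_to_celsius([[1365, 1000], [1200, 800]])
```

Under this encoding one raw unit corresponds to 0.1 °C, and raw values below 1000 map to temperatures below 0 °C.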
The name of the recognized person is published as a ROS topic together with a confidence level. In addition, the result of the re-identification is visualized on a color-converted thermal image and published as a separate ROS topic.

Training
The training of the SVM classifier is performed by an offline procedure using separate software. We set the parameters of libSVM [4] to perform multi-class support vector classification. Linear, polynomial, sigmoid and radial basis functions were tried as kernels. The radial basis function (RBF) was empirically selected as it provided the best classification results. The gamma of the RBF is set to 1/120, where 120 is the size of our feature vector. Different values of the cost (C) and epsilon (eps) have been tried.
Based on these classification tests, we chose the C and eps that gave the best results, i.e. 1000 and 0.01, respectively.To reduce the training time, we also enabled the shrinking parameter of libSVM and set the kernel cache to 4,000 MB.
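For readers who want to reproduce a comparable configuration, the settings above can be written as a libSVM option string. This is a sketch under assumptions rather than the authors' training script: the helper name is hypothetical, and eps is assumed to be the termination tolerance (`-e`). The flags themselves are standard `svm-train` options: `-s 0` selects C-SVC, `-t 2` the RBF kernel, `-g` gamma, `-c` cost, `-h 1` enables shrinking and `-m` sets the kernel cache size in MB.

```python
def libsvm_options(n_features=120, cost=1000, eps=0.01, cache_mb=4000):
    """Build a libSVM option string matching the settings reported in the text."""
    gamma = 1.0 / n_features  # gamma = 1/120 for the 120-dimensional vector
    return "-s 0 -t 2 -g {:.6f} -c {} -e {} -h 1 -m {}".format(
        gamma, cost, eps, cache_mb
    )


options = libsvm_options()
```

The resulting string can be passed to the `svm-train` command-line tool or to libSVM's Python training function. Publishing a confidence level, as the ROS node does, would additionally require probability estimates (`-b 1`), which is not stated in the text and is therefore left out here.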

Experimental Results
We evaluated the performance of our approach under various real-world conditions such as walking, sitting, and occlusion of the face. As one of the contributions of this paper, we have recorded a novel thermal dataset for human re-identification. The details of this dataset together with the obtained results are presented in the following subsections.

Thermal Re-identification Dataset
We have recorded a publicly available thermal dataset for human re-identification. The dataset was recorded in a laboratory environment using an Optris PI-450 thermal camera mounted on a Kompaï robot. Thermal images were recorded with a resolution of 382 × 288 at 10 fps. Our dataset covers different challenges in real-world scenarios, such as observing people from different points of view, while walking and sitting. It also includes people wearing accessories such as a hat, glasses, and a scarf that occlude part of the face. The dataset is available as ROSbag files and it can be directly used in a ROS environment.
In particular, the dataset consists of sequences of four categories: 1) person turning around on the spot, 2) person walking, 3) person sitting on a sofa, and 4) person turning around while wearing a hat, glasses, or a scarf. In total, 15 people were recorded in the dataset. Sample images from the dataset are depicted in Fig. 8.
For the turn around and occlusion sequences (categories 1 and 4), the participants were asked to turn to four different directions (frontal, left side, back, right side) and remain still for about 3 seconds. In the occlusion sequences, they were also asked to wear accessories occluding different parts of the face. We asked people to repeat the sequences 4 times and 6 times in the occlusion and turn around sequences, respectively. On average, each sequence lasted about 15 seconds, resulting in 600 and 900 total thermal images per person in occlusion and turn around sequences, respectively. The walking sequences (category 2) lasted about 30 seconds. They contain images of non-frontal face views of people walking freely in front of the robot. The sitting sequence (category 3) contains images acquired while the participants were sitting on a sofa at three different distances (2 m, 3.5 m, and 5 m) for about 5 seconds. For the walking and sitting sequences, we asked people to repeat the same activity twice, resulting in 600 and 300 thermal images per person, respectively. In total, our dataset contains around 2,400 thermal images per person, resulting in 36K images in total (Table 2).
Dataset URL: https://lcas.lincoln.ac.uk/wp/research/data-sets-software/l-cas-rgb-d-t-re-identification-dataset/
Thermal image datasets are relatively new in the computer vision and robotics communities. Most of the existing datasets are for face recognition and include only frontal images [7]. To our knowledge, there are only two thermal datasets for human re-identification that do not focus on faces [19,22]. However, these datasets only contain thermal images of people walking along a corridor or in front of a building. Therefore, they are not suitable to represent real-world situations typical of domestic environments (e.g., sitting). To the best of our knowledge, ours is the first thermal dataset recorded using a robot in challenging real-world scenarios.

Experimental Setup
In our experiments, we used the sequences in which people turn around on the spot for learning the thermal dictionary models, and the rest for testing. We took 2/3 of the turn around set for training. The rest is included in the testing set. Hence, on average, 600 thermal images per person were used for training.
For the thermal features, we calculated histograms (10 bins) of the head region in the same temperature interval used for thresholding, i.e. [32 °C, 39 °C]. The number of bins was selected empirically from real tests. With a higher number of bins, the histograms become sparse, and the TDMs very large, decreasing the re-identification performance. If we select a smaller number of bins, the histograms become flat and look mostly alike, generating very few TDMs, which are not enough to distinguish people.

Results
We evaluated the system on single frames, comparing the recognized class of each frame in a test sequence with the ground truth. We compared our approach ("Symbolic Rep.") to a standard SVM classifier using the whole training set without entropy sampling ("Whole Training Set"). In order to analyse the advantages of the symbolic representation, we also compared our approach to an SVM classifier that uses entropy-based sampling without the symbolic representation ("Entropy-based Samp."). We calculated recall, precision, accuracy and F1 score for every subject individually, averaging the results across all the test frames. We also computed the Cumulative Matching Characteristic (CMC) curve, which is commonly used for evaluating re-identification methods [28]. For every k ∈ {1, ..., N_train}, where N_train is the number of training subjects, the CMC expresses the average person recognition rate computed when the correct person appears among the k best classification scores (rank-k). A standard way to evaluate CMC is to calculate the rank-1 recognition rate and the normalized Area Under Curve (nAUC), which is the integral of the CMC.
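The CMC and nAUC definitions above can be made concrete with a short sketch. The function names and the input layout (one row of per-class scores per test frame) are assumptions for illustration, and the nAUC is approximated here by the mean of the CMC values, which is one common discrete form of the normalized integral.

```python
def cmc_curve(score_rows, true_labels):
    """CMC values for ranks 1..N.

    score_rows[i][c] is the classifier score of test sample i for class c,
    and true_labels[i] is the index of the correct class.
    """
    n_classes = len(score_rows[0])
    hits = [0] * n_classes
    for scores, label in zip(score_rows, true_labels):
        ranking = sorted(range(n_classes), key=lambda c: scores[c], reverse=True)
        rank = ranking.index(label)  # 0 means the correct class scored best
        for k in range(rank, n_classes):
            hits[k] += 1  # a hit at rank r also counts for every rank above r
    return [h / len(true_labels) for h in hits]


def normalized_auc(cmc):
    """Discrete approximation of the normalized area under the CMC curve."""
    return sum(cmc) / len(cmc)
```

The first entry of the returned list is the rank-1 recognition rate, and the last entry is always 1.0 because the correct class must appear somewhere in the full ranking.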

Turn Around Sequences
First, we evaluated the system on turn around sequences. As the type of motion in the test set is similar to that in training, the re-identification problem is relatively easy and we would expect very good results. This is indeed confirmed by Table 3. It can be seen that our symbolic representation achieves the best results with over 85% recall, precision and accuracy. This is much better than the 75% obtained with the other approaches. We should also note that our symbolic representation achieves 10% better performance on average compared to the SVM classifier without symbolic representation ("Entropy-based Samp."). Figure 9 shows the CMC curve of all the approaches for the turn around sequences. Again, this shows that our symbolic representation outperforms the other approaches.

Occlusion Sequences
The results for the occlusion scenario are presented in Table 4. This experiment was designed to understand the effect of occlusions that commonly occur in real-world scenarios when people wear accessories such as glasses, a hat, or a scarf. The results show a decrease in performance compared to the turn around case, which was expected due to the more challenging nature of the experiment. However, our approach still achieves the highest re-identification rates among all methods. We can also see that our symbolic representation achieves 5% better performance on average compared to the classifier without symbolic representation. This is also confirmed by the CMC curve in Fig. 10. Although the results look similar in higher ranks, we can clearly see that our approach obtains the highest recognition rate in lower ranks, especially in rank-1.
Fig. 9 The Cumulative Matching Characteristic (CMC) curve of all approaches for turn around sequence

Walking Sequences
Table 5 presents the re-identification results for the walking scenario. This includes people walking freely, observed from various angles and distances that are very different from the training set. Thus, the complexity of this scenario is higher than in the previous cases, as can be observed from the performance drop for all the methods. Nevertheless, our approach again achieves the best performance, in particular thanks to the improvement introduced by the entropy-based sampling stage. This can also be observed from the CMC curves in Fig. 11, showing that our approach obtains the highest recognition rate in most of the lower ranks.

Sitting Sequences
Finally, in the sitting sequences, we evaluated the performance under different human poses. This scenario is designed to isolate the effect of the observation distance.
The results are presented in Table 6. We can say that our approach is not much affected by distance. It clearly outperforms the other approaches. Compared to the classifier without symbolic representation, we can see that our final classification achieves 5% better performance on average. Again, when we look at the CMC curve in Fig. 12, we can see that our approach achieves the best re-identification rates in almost all ranks.

Overall Performance
For a comparison of overall re-identification performance, we have measured the accuracy of all approaches on a large testing set that consists of all testing sequences. Table 7 presents the overall results and Fig. 13 displays the overall CMC curve. Once again, we can clearly see that our approach outperforms the others by a wide margin. We can also see that our symbolic representation achieves 10% better accuracy on average compared to the SVM classifier using only plain TDMs (without symbolic representation). This shows the superiority of the symbolic approach.
Fig. 10 The Cumulative Matching Characteristic (CMC) curve of all approaches for occlusion sequence
Bold values represent the best accuracy rates

Discrimination and Rejection
In this subsection, we present the experiments that are performed to evaluate the discrimination and rejection properties of our approach.
To have a better understanding of the classification results, we calculated the confusion matrix for the turn around sequence in Fig. 14. We can see that, except for a couple of people (person 10 and 14), our approach achieves at least 80% recognition accuracy. This also shows that our symbolic representation, even if based on simple temperature histograms, enables a powerful and discriminative re-identification of humans with a robot thermal camera.
We have also tested the rejection property of our approach by analysing the confidence level of the samples that were correctly classified (true positives). In Fig. 15, we present the histogram of all the confidence levels in the testing sequences. We can see that more than 90% of the true positives have a confidence level greater than 60%, suggesting that a threshold on the confidence level can be used to reject unknown people in most cases.
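A rejection rule of this kind can be sketched as follows. The function names are hypothetical and confidence levels are assumed to be expressed in the range [0, 1]; the 0.6 default mirrors the 60% level discussed above.

```python
def fraction_above(confidences, threshold=0.6):
    """Share of samples whose confidence level exceeds the threshold."""
    return sum(c > threshold for c in confidences) / len(confidences)


def accept(label, confidence, threshold=0.6):
    """Return the predicted label, or None when the prediction is rejected."""
    return label if confidence > threshold else None
```

Applying `fraction_above` to the confidence levels of the true positives gives the quantity read from the histogram, while `accept` shows how the same threshold would be used at run time to treat a low-confidence prediction as an unknown person.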

Effects of Head Segmentation Performance
The head segmentation process is the first step of our algorithm, and its failure may compromise the re-identification performance. To understand the effect on our system, we simulated several levels of failure in the head segmentation by artificially occluding the binary image from four directions (Fig. 16b-c), replicating potential problems due to poor thresholding or connected component analysis. In particular, we tested the classification performance by occluding 10% to 90% (at 10% intervals) of the binary image in the turn around sequence.
Figure 17 shows the recognition rate of our approach with various levels of occlusion. Similar to the previous occlusion experiments (Section 6.3.2), we see that there is a decrease in the recognition rate as the face gets more and more occluded. However, we can also see that our approach still works in challenging cases, achieving a 60% recognition rate with 30% occlusion. Notice that the recognition performance with occlusions on the y-axis (top-to-bottom and bottom-to-top) is slightly worse than on the x-axis, showing a higher dependency of our approach on the temperature at the top and bottom of the head.
An error in the measurement of the temperature from thermal images can negatively affect the head segmentation step. To evaluate the effect of this error, we tested our approach with several levels of temperature error. We simulate this by randomly removing some pixels from the binary image (Fig. 16a). In particular, we tested the classification performance on the turn around sequence by removing 10% to 90% (at 10% intervals) of the binary image.
Fig. 11 The Cumulative Matching Characteristic (CMC) curve of all approaches for walking sequence
Fig. 13 The overall Cumulative Matching Characteristic (CMC) curve of all approaches
Fig. 14 The confusion matrix of our re-identification system for the turn around sequence
Fig. 15 The histogram of confidence levels of our approach for the true positive samples in the testing sequences. 90% of the true positives have a confidence level greater than 60% (marked by a red line)

Fig. 16 The effect of head segmentation performance is tested by applying several levels of occlusion error on the binary image, some examples of which are shown here

Figure 18 shows the recognition rate of our approach with various levels of temperature error. Our system can still achieve a recognition rate of 77.8% even with a 50% temperature error, which is higher than the recognition rate achieved by other (non-symbolic) methods without temperature noise (Table 3). This shows that our approach is also robust to this type of error in the head segmentation step.

Experiments on a Mobile Robot
We further evaluated our approach on a different platform, a TIAGo mobile robot (Fig. 19a), also used in the ENRICHME project. The system works in real-time on the robot using our ROS software implementation (Section 5). An Optris PI-450 thermal camera was mounted on the robot's head, slightly higher than in the previous case. Figure 19 illustrates some outputs of the ROS re-identification module. The images show the segmented head with a white bounding-box and the recognized person together with the confidence value. Following our analysis of the confidence level (Section 6.4), we set the rejection threshold on the confidence level to 60%. It can be seen that there are some failures, mostly because of wrongly detected head regions (Fig. 19b). We can also notice that the confidence level of the classifier drops when the person is too close to or too far from the robot (Fig. 19c). This is due to out-of-focus thermal images affecting the extracted features. Nevertheless, in general the proposed approach can recognize the person with high confidence and, importantly, it works successfully on a real service robot.
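The rejection rule with a 60% threshold can be sketched as follows. This is a hedged illustration only: it assumes the classifier exposes one confidence value in [0, 1] per known person, and the function name `identify`, the label `"unknown"` and the example scores are hypothetical rather than part of the ROS module.

```python
def identify(scores, threshold=0.6):
    """Pick the most confident label, or 'unknown' if it falls below the threshold.

    scores: dict mapping a person label to a confidence value in [0, 1] (assumption).
    Returns a (label, confidence) tuple.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    label = max(scores, key=scores.get)
    confidence = scores[label]
    if confidence < threshold:
        return "unknown", confidence  # reject: not confident enough
    return label, confidence


# Made-up confidence values for two enrolled users.
accepted = identify({"person_1": 0.82, "person_2": 0.18})
rejected = identify({"person_1": 0.55, "person_2": 0.45})
```

With such a rule, out-of-focus frames that lower the confidence are reported as unknown rather than assigned to the wrong user.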

Conclusion
This paper presented a new re-identification system for service robot applications using thermal images. A view-independent approach, using entropy-based sampling and a symbolic representation, has been described. The method is suitable for mobile robots monitoring and assisting elderly people at home, in particular to distinguish the actual user in case of two or more occupants. Our solution requires a relatively small amount of training data, which is an advantage in many real-world applications. To achieve this, we extracted a thermal dictionary model of the person by sampling over a single rotation sequence of the head. Then, we transformed each thermal frame into a new set of dictionary elements (symbols). In this new symbolic representation, we exploited the geometric distributions of its symbols as features for classification. The proposed approach was evaluated under various real-world conditions, including people walking, sitting, and under occlusion. Quantitative results were presented on a new dataset and qualitative results on a real mobile robot. Despite some limitations in the case of walking or sitting people, the experimental results showed the good performance of our re-identification system in several challenging situations and indicated that it can be used for companion robots assisting the elderly in daily life. Future work will consider temporal models and multi-sensor extensions of our solution to improve the robustness of the re-identification in case of different human poses and motion behaviors. We will also look into on-line learning approaches [18,32] to incrementally improve the re-identification performance over time as the robot keeps track of people and collects more and more data about the target users.

Fig. 1 Examples of a human observed by a Kompai service robot (a): from the back (b), under occlusion (c), with poor lighting conditions (d), and on thermal camera (e)

Fig. 2 The flow diagram of our approach for thermal-based human re-identification

Fig. 5 Using entropy-based sampling, we create a thermal dictionary model of each person and represent it in symbols

Fig. 6 Symbolic representation of a feature vector in class 1 (i.e., person 1) (a) and the whole training space (b). Light and dark colored stars represent all the test samples and the dictionary elements of each class obtained by entropy-based sampling (e.g., T_1), respectively

Fig. 8 The dataset consists of four parts: people standing still and turning around (a-b), people walking freely (c), people sitting on a sofa recorded from three different views (d-f), and people occluding parts of their face while wearing a hat, glasses, and a scarf (g-i)

Fig. 19 Some examples of the experiments on a TIAGo mobile robot (a) with two subjects walking (b) and sitting (c). The red marks indicate cases of unsuccessful re-identification

The image resolution is 382 × 288 and the maximum frame rate is 80 fps. Our model is equipped with a 15 mm lens, which provides a field of view (FOV) of 38° × 29° […] °C up to 900 […] range of 7.5 to 15 μm. The ROS driver of the Optris thermal camera provides images in two different formats: i) raw image data in unsigned short format, ii) color thermal image in (BGR8) format. Using this driver, both raw and color thermal images are acquired and made directly available to our ROS software module for thermal re-identification.

Table 1
Description

Table 6
The re-identification accuracy rates of our approach (Symbolic Rep.), SVM with