
1 Introduction

Among the technologies for biometric identification and verification, face recognition has become widely used, as it is non-intrusive and reliable.

However, the robustness of a facial recognition system is constrained by the degree of head pose involved, and the recognition rates of state-of-the-art systems drop significantly for large pose angles. Pose-invariant face recognition is crucial especially for tasks where a great variation in head pose has to be expected, as in human-robot interaction. The rise of collaborative and assistive robots in industrial and home environments will increase the demand for algorithms that can adapt to changing settings and uncontrolled conditions. A humanoid robot’s cognitive ability to interpret non-verbal communication during conversations also relies heavily on the face of the human counterpart.

The impact of head pose on the face analysis performance can be minimised by using normalisation techniques that transform non-frontal faces into frontal representations. This type of image pre-processing can loosely be classified into two categories:

  • 2D methods (cylindrical warping, AAM (active appearance models) [7], 2D warping [2])

  • 3D methods (3DMM, GEM (generic elastic models) [22], mean face shape [3])

Because the 2D methods are trained on 2D data, the warping only approximates the underlying 3D rotation. They also do not explicitly model imaging parameters like the pose of the face and illumination, or face parameters like expressions; instead, these parameters are implicitly encoded in the model parameters. When dealing with pose ranges larger than around ±40\(^\circ \) in yaw angle, a single 2D model is often no longer sufficient and a multi-model approach must be used (e.g. [8, 13]).

The use of a 3D face model has several advantages, especially when further analysis involving the shape of the face is required, as in emotion detection. To correct the pose for face recognition, a 3D representation of the face can easily be rotated into a frontal view. At close range, a 3D or RGB-D sensor can provide the additional depth information for the model. However, since the error of depth measurements increases quadratically with distance for RGB-D sensors like the Microsoft Kinect [17], they are not suitable for acquiring image data from afar. 2D cameras, on the other hand, are more widely used in existing platforms and more cost-efficient, so a method is needed that combines the advantages of 2D image acquisition with analysis based on a 3D model.

The 3D Morphable Model (3DMM) [6] satisfies this need by providing a parameterised Principal Component Analysis (PCA) model for shape and albedo, which can be used for the synthesis of 3D faces from 2D images. A Morphable Model can be used to reconstruct a 3D representation from a 2D image through fitting. Fitting is the process of adapting the 3D Morphable Model in such a way that the difference to a 2D input image is as small as possible. Depending on the implementation, the fitter’s cost function uses different model parameters to iteratively minimise the difference between the modelled image and the original input image.

Existing fitting algorithms minimise a complex cost function, using the image information to perform shape from shading and edge detection, and solving for both shape and texture coefficients [11, 15, 24, 25, 26]. Due to the complexity of the cost functions, these algorithms require several minutes to fit a single image. While the execution time may not be crucial for offline applications, real-time robotics applications require faster solutions. More recently, new methods were introduced that use local features [16] or geometric features (landmarks, edges) with a fitting algorithm similar to the iterative closest point algorithm [4].

In this paper, we present and evaluate a complete, fully automatic face modelling framework consisting of landmark detection, landmark-based 3D Morphable Model fitting, and pose normalisation for face recognition purposes. The whole process is designed to be lightweight enough to run in real-time on a standard PC and is especially intended for human-robot interaction in uncontrolled, in-the-wild environments. The theory is explained in Sect. 2. In Sect. 3, we show how the approach improves the recognition rates of a regular COTS (commercial off-the-shelf) face recognition system when it is used as a pre-processing method for pose normalisation on non-frontal and uncontrolled images from two image databases. Finally, we show how our approach is suitable for real-time robotics applications by presenting its integration into the HMI framework of our SCITOS robot platform in Sect. 4. Section 5 concludes the paper and gives an outlook on future work.

All parts of the framework are publicly available to support other researchers: the C++ implementations of the landmark detection (Footnote 1), the fitting algorithm and the face model (Footnote 2), as well as a demo app that combines both (Footnote 3).

2 Landmark-Based 3D Morphable Model Fitting

In this section, we give a brief introduction to the 3D Morphable Model and then introduce our algorithm to recover pose and shape from a given 2D image. We then present our pose-invariant texture representation in the form of a so-called isomap.

2.1 The 3D Morphable Model

A 3D Morphable Model consists of a shape and albedo (colour) PCA model constructed from 169 3D scans of real faces. These meshes first have to be brought into dense correspondence, that is, vertices with the same index in the mesh correspond to the same semantic point on each face. The model used for this implementation consists of 3448 vertices. A 3D shape is expressed in the form of \(\mathbf{v} = [x_1, y_1, z_1, \dots, x_V, y_V, z_V]^\mathrm{T}\), where \([x_v, y_v, z_v]^\mathrm{T}\) are the coordinates of the vth vertex and V is the number of mesh vertices. The RGB colour values are stacked in a similar manner. PCA is then applied separately to the vertex and colour data matrices, each consisting of m stacked 3D face meshes, resulting in \(m-1\) shape eigenvectors \(\mathbf{S}_i\), their variances \(\sigma_{S,i}^2\), and a mean shape \(\mathbf{\bar{s}}\), and similarly for the colour model (\(\mathbf{T}_i\), \(\sigma_{T,i}^2\) and \(\mathbf{\bar{t}}\)). A face can then be approximated as a linear combination of the respective basis vectors:

$$\begin{aligned} \mathbf{s} = \mathbf{\bar{s}} + \sum_{i=1}^{m-1}\alpha_i \mathbf{S}_i, \quad \mathbf{t} = \mathbf{\bar{t}} + \sum_{i=1}^{m-1}\beta_i \mathbf{T}_i, \end{aligned}$$
(1)

where \(\varvec{\alpha} = [\alpha_1, \dots, \alpha_{m-1}]^\mathrm{T}\) and \(\varvec{\beta} = [\beta_1, \dots, \beta_{m-1}]^\mathrm{T}\) are vectors of shape and colour coefficients respectively.
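In code, Eq. (1) amounts to a single matrix-vector product per model. The following is a minimal sketch in C++ with Eigen; the function name and data layout are illustrative, not the API of our released implementation:

```cpp
// Sketch of Eq. (1): synthesise a face shape as the mean plus a linear
// combination of the PCA basis vectors (assumed data layout: coordinates
// stacked as [x1, y1, z1, ..., xV, yV, zV]).
#include <Eigen/Core>

// mean: 3V x 1 mean shape, basis: 3V x (m-1) matrix of eigenvectors S_i,
// alphas: (m-1) x 1 shape coefficients. The colour model works identically.
Eigen::VectorXf synthesise_shape(const Eigen::VectorXf& mean,
                                 const Eigen::MatrixXf& basis,
                                 const Eigen::VectorXf& alphas)
{
    return mean + basis * alphas; // s = s_bar + sum_i alpha_i * S_i
}
```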

2.2 Shape and Pose Reconstruction

Given an image with a face, we would like to fit the 3D Morphable Model to that face and obtain a pose-invariant representation of the subject. At the core of our fitting algorithm are an affine camera model, shape reconstruction from landmarks, and a pose-invariant textural representation of the face. We thus only use the shape PCA model from the 3DMM and not the colour PCA model, and instead use the original texture from the image to obtain the best possible image quality for subsequent face analysis steps.

Similar to Aldrian and Smith [1], we obtain a linear solution by decomposing the problem into two steps which can be alternated. The first step in our framework is to estimate the pose of the face. Given a set of 2D landmark locations and their known correspondences in the 3D Morphable Model, we compute an affine camera matrix. The detected 2D landmarks \( x_i \in \mathbb{R}^3 \) and the corresponding 3D model points \( X_i \in \mathbb{R}^4 \) (both represented in homogeneous coordinates) are normalised by similarity transforms that translate the centroids of the image and model points to the origin and scale the points so that their root-mean-square distance from the origin is \(\sqrt{2}\) for the landmarks and \(\sqrt{3}\) for the model points respectively: \( \tilde{x}_i = \text{U}x_i \) with \( \text{U} \in \mathbb{R}^{3\times 3}\), and \( \tilde{X}_i = \text{W}X_i \) with \( \text{W} \in \mathbb{R}^{4\times 4} \). Using \(\ge \)4 landmark points, we then compute a normalised camera matrix \( \tilde{\text{C}} \) using the Gold Standard Algorithm [14] and obtain the final camera matrix after denormalising: \( \text{C} = \text{U}^{-1}\tilde{\text{C}}\text{W} \).
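With the points normalised, the Gold Standard Algorithm for the affine case reduces to an ordinary least-squares problem for the eight unknown entries of the first two camera rows (the third row of an affine camera is \([0, 0, 0, 1]\)). A hedged C++/Eigen sketch with illustrative names, omitting the normalisation and denormalisation steps described above:

```cpp
// Least-squares estimation of a normalised affine camera matrix from
// >= 4 point correspondences (cf. the Gold Standard Algorithm for affine
// cameras in Hartley & Zisserman [14]). Inputs are assumed normalised.
#include <Eigen/Dense>
#include <vector>

Eigen::Matrix<float, 3, 4> estimate_affine_camera(
    const std::vector<Eigen::Vector2f>& image_pts,  // normalised landmarks
    const std::vector<Eigen::Vector3f>& model_pts)  // normalised model points
{
    const int n = static_cast<int>(image_pts.size()); // requires n >= 4
    Eigen::MatrixXf A = Eigen::MatrixXf::Zero(2 * n, 8);
    Eigen::VectorXf b(2 * n);
    for (int i = 0; i < n; ++i) {
        const Eigen::Vector4f X(model_pts[i].x(), model_pts[i].y(),
                                model_pts[i].z(), 1.0f);
        A.block<1, 4>(2 * i, 0) = X.transpose();     // equation for x
        A.block<1, 4>(2 * i + 1, 4) = X.transpose(); // equation for y
        b(2 * i) = image_pts[i].x();
        b(2 * i + 1) = image_pts[i].y();
    }
    const Eigen::VectorXf p = A.colPivHouseholderQr().solve(b);
    Eigen::Matrix<float, 3, 4> C = Eigen::Matrix<float, 3, 4>::Zero();
    C.row(0) = p.head<4>().transpose();
    C.row(1) = p.segment<4>(4).transpose();
    C(2, 3) = 1.0f; // affine camera: last row is [0, 0, 0, 1]
    return C;
}
```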

The second step in our framework consists of reconstructing the 3D shape using the estimated camera matrix. We estimate the vector of PCA shape coefficients \(\varvec{\alpha }_s\):

$$\begin{aligned} \mathop{\mathrm{arg\,min}}\limits_{\varvec{\alpha}_s} \sum_{i=1}^{3N} (x_{m,i} - x_{i})^{2} + \lambda \lVert \varvec{\alpha}_s \rVert^{2}, \end{aligned}$$
(2)

where N is the number of landmarks and \(\lambda \) is a weighting parameter for the regularisation that is needed to restrict the reconstruction to plausible shapes. \( x_i \) are the detected landmark locations and \( x_{m,i} \) is the projection of the 3D Morphable Model shape to 2D using the estimated camera matrix. Subsequently, the camera matrix can be re-estimated using the now obtained identity-specific face shape instead of only the mean face. Both steps are iterated a few times; each of them only involves solving a small linear system of equations.
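Because Eq. (2) is linear in \(\varvec{\alpha}_s\), it has the closed-form solution of a ridge regression. A minimal sketch, assuming the projected shape basis has already been assembled into a matrix A (rows: 2D landmark coordinates; columns: basis vectors) and b holds the residual between the detected landmarks and the projected mean shape:

```cpp
// Regularised linear solve for the shape coefficients in Eq. (2):
// alpha = (A^T A + lambda * I)^-1 A^T b. Names are illustrative.
#include <Eigen/Dense>

Eigen::VectorXf solve_shape_coeffs(const Eigen::MatrixXf& A,
                                   const Eigen::VectorXf& b, float lambda)
{
    const Eigen::MatrixXf AtA = A.transpose() * A;
    const Eigen::MatrixXf lhs =
        AtA + lambda * Eigen::MatrixXf::Identity(AtA.rows(), AtA.cols());
    return lhs.ldlt().solve(A.transpose() * b); // small symmetric system
}
```

Alternating this solve with the camera estimation above for a handful of iterations yields the final pose and shape.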

In the experiments with automatically detected landmarks, we use a cascaded regression based approach similar to Feng et al. [12]. Starting from an initial estimate, i.e. landmarks placed in the region found by a face detector, we extract local features (HOG [9]) and update the landmark locations until they converge towards the actual positions. The detected 2D landmarks have known corresponding points in the 3D Morphable Model, and these points are then used to fit the model. The combination of the regression based landmark detection and the landmark-based 3D Morphable Model fitting results in a lightweight framework that is feasible to run on videos (all components run in the order of milliseconds on a standard CPU).
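Conceptually, each cascade stage applies a learned linear regressor to local features extracted around the current landmark estimate. The following sketch illustrates this idea; the feature extractor and regressor are placeholders, not the interface of the released detector:

```cpp
// Conceptual cascaded-regression landmark update in the spirit of [12].
#include <Eigen/Core>
#include <vector>

struct Image {}; // placeholder for the input frame type

// Placeholder for HOG feature extraction around the current landmarks.
Eigen::VectorXf extract_hog(const Image&, const Eigen::VectorXf& landmarks)
{
    return Eigen::VectorXf::Zero(landmarks.size() * 16); // dummy features
}

struct CascadeStage {
    Eigen::MatrixXf R; // learned linear regressor of this stage
};

// landmarks: stacked 2D coordinates [x1, y1, ..., xN, yN], refined in
// place, initialised from the face detector's bounding box.
void run_cascade(const std::vector<CascadeStage>& stages, const Image& frame,
                 Eigen::VectorXf& landmarks)
{
    for (const auto& stage : stages) {
        const Eigen::VectorXf features = extract_hog(frame, landmarks);
        landmarks += stage.R * features; // additive update per stage
    }
}
```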

2.3 Pose-Independent Face Representation

After fitting the pose and shape, a correspondence between the 3D mesh and the face in the 2D image is known for each point in the image. We use this correspondence to remap the original face texture from the image onto a pose-invariant 2D surface that covers the face from all angles. We create such a generic representation with the isomap algorithm [27]: it finds a projection from the 3D vertices to a 2D plane that preserves the geodesic distance between the mesh vertices. Our mapping is computed with the algorithm from Tena [23].
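The remapping itself can be implemented as a piecewise-affine warp: each mesh triangle, projected into the image with the estimated camera, is warped to its fixed triangle in the isomap. A condensed OpenCV sketch for a single triangle, without the per-pixel visibility handling of the full implementation:

```cpp
// Warp the image texture of one mesh triangle into the isomap. img_tri:
// the triangle's 2D vertex positions in the input image (projected model
// vertices); map_tri: its fixed isomap coordinates. Both must contain
// exactly 3 points. The isomap is initialised to black (= not covered).
#include <opencv2/imgproc.hpp>
#include <vector>

void remap_triangle(const cv::Mat& image, cv::Mat& isomap,
                    const std::vector<cv::Point2f>& img_tri,
                    const std::vector<cv::Point2f>& map_tri)
{
    // Affine transform mapping the image triangle onto the isomap triangle:
    const cv::Mat warp = cv::getAffineTransform(img_tri, map_tri);
    cv::Mat warped(isomap.size(), image.type());
    cv::warpAffine(image, warped, warp, warped.size());
    // Copy only the pixels inside the target triangle:
    cv::Mat mask = cv::Mat::zeros(isomap.size(), CV_8U);
    const std::vector<cv::Point> poly(map_tri.begin(), map_tri.end());
    cv::fillConvexPoly(mask, poly, cv::Scalar(255));
    warped.copyTo(isomap, mask);
}
```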

The isomaps of different persons are in dense correspondence with each other, meaning each location in the map corresponds to the same physical point in the face of every subject (for example, a hypothetical point \( x = [100, 120] \) is always the centre of the right eye). The isomap consequently also visualises the accuracy of the fitting and is therefore a suitable representation of the result.

Figure 1 shows an isomap and a frontal rendering side by side. The isomap captures the whole face, while the rendering only shows the frontal parts of the face. As the face in the input image is frontal, with most parts of the face visible, there is little self-occlusion. In the case of large poses, a major part of the isomap cannot be filled with face texture, and these regions are marked black (for example, a small black spot next to the nose can be observed in the figure).

Fig. 1. Example picture from PaSC (left), frontal rendering (middle) and pose-invariant isomap representation (right).

3 Face Recognition Experiments

To analyse the performance of our approach, we ran experiments on two different image databases (MUCT and the Point and Shoot Challenge (PaSC)). Our experiments are face verification experiments, which means that the system has to verify the identity of probe images against a set of gallery images via matching. The results are then statistically analysed regarding the trade-off between the two possible error types, the false acceptance rate (FAR) and the false rejection rate (FRR), and plotted as detection error tradeoff (DET) curves [19], using the ground-truth subject-ID information provided in the metadata of the image databases. In the experiments on the PaSC database (Sect. 3.2), we filter the probe and gallery images according to different head poses.
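For reference, one operating point of such a DET curve is obtained by thresholding the similarity scores and counting the two error types. A small sketch, assuming the genuine/impostor labels have been derived from the subject-ID metadata:

```cpp
// Compute (FAR, FRR) at one score threshold; sweeping the threshold over
// the score range yields the full DET curve.
#include <utility>
#include <vector>

struct Score {
    float similarity;
    bool genuine; // true if probe and gallery share the same subject-ID
};

std::pair<double, double> error_rates(const std::vector<Score>& scores,
                                      float threshold)
{
    long fa = 0, fr = 0, impostors = 0, genuines = 0;
    for (const auto& s : scores) {
        if (s.genuine) {
            ++genuines;
            if (s.similarity < threshold) ++fr; // genuine pair rejected
        } else {
            ++impostors;
            if (s.similarity >= threshold) ++fa; // impostor pair accepted
        }
    }
    return {static_cast<double>(fa) / impostors,  // FAR
            static_cast<double>(fr) / genuines};  // FRR
}
```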

We used a market-leading, commercial face recognition engine for the process of enrolment and matching (Footnote 4). As a reference measure, all experiments were also conducted using the original, unprocessed images from the databases (denoted as org). With this reference, we are able to analyse the impact of our approach compared to a conventional face recognition framework without 3D modelling.

3.1 Experiments on the MUCT Image Database

The MUCT database consists of 3755 faces of 276 subjects and is published by the University of Cape Town [20]. For our experiments, only images of persons without glasses and with closed mouths were used; after this filtering, 1221 pictures remained. The pictures were taken with 5 different cameras, resulting in different pose angles, and a total of 10 different lighting schemes were applied. Moreover, the faces within the images are quite large and the subjects were recorded in a controlled lab environment. MUCT comes with 76 manually labelled landmarks for every image, of which we used 16 for the initialisation of our landmark-based 3DMM fitting (LM).

We used both the unprocessed images (org) and the pose-normalised renderings produced by our fitting method (LM). We ran a verification experiment on these images, leading to similarity matrices with 1 490 841 scores for each method. Using these scores, a DET curve was plotted, of which a relevant part is depicted in Fig. 2.

Fig. 2. DET curve showing fewer recognition errors when using our landmark-based fitting method (LM) on the MUCT image database. The dotted line shows the result for the experiment with the original images without pose normalisation (org).

The verification errors of the face recognition drop significantly when our landmark-based modelling approach is used to correct the head pose. This experiment shows that when pose is the major uncontrolled factor, as in MUCT, using our LM approach clearly improves the face recognition performance.

3.2 Experiments on the Point and Shoot (PaSC) Image Database

For our proposed use case in assistive or humanoid robotics, MUCT is not representative of the typical in-the-wild images we encounter. To verify the performance in heavily uncontrolled settings, we conducted further investigations on a larger, more challenging database. The Point and Shoot Challenge (PaSC) is a performance evaluation challenge for developers of face recognition systems, initiated by Colorado State University [5].

As the name suggests, the images and videos used in the challenge were taken with consumer point-and-shoot cameras rather than professional equipment. The database offers 9376 still images of varying resolution up to \(4000 \times 3000\) pixels and labelled metadata. The composition of the scenes poses some major challenges for the automated recognition of faces. The pictures were taken in various locations, indoors and outdoors, with complex backgrounds and harsh lighting conditions, accompanied by low image quality due to blur, noise or incorrect white balance. Additionally, the regions of interest for face recognition are relatively small and of low resolution because the photographed people are not the main subject of the scene.

In a direct comparison, the verification rate on PaSC is only 0.22 at 0.001 FAR, in contrast to the more controlled databases MBE (0.997), GBU (0.8) and LFW (0.54) [5].

Because of these harsh, uncontrolled conditions, which resemble a real-life scenario, we chose this image database for our experiments and applied our fitting algorithm (LM) to the images. As in the previous experiment on MUCT (Sect. 3.1), it was a verification experiment using the same COTS face recognition algorithm.

Influence of the Number of Landmarks. Our experiments on PaSC are intended to be as close to a realistic scenario as possible. Therefore, we used the automatic landmark detection of [12] instead of manually annotated facial landmarks. The detector and our fitting algorithm (LM) allow the use of different landmark schemes, which vary in the number of landmarks. We tested three different sets of 7, 13 and 49 landmarks, which are visualised on the 3D Morphable Model in Fig. 3(a). To compare the different numbers of landmarks, their corresponding results for the face recognition experiment on PaSC are shown in Fig. 3(b). It can be observed that a higher number of facial landmarks leads to a higher quality 3DMM fitting, which in turn leads to a better verification performance.

Fig. 3. Evaluation of different sets of facial landmarks. (a) Visualises the landmark sets on the mean face of the 3D Morphable Model. red: 7; red + blue: 13; red + blue + green: 49. (b) Shows a DET curve for face recognition on PaSC when using the different sets of landmarks (7, 13 and 49) for our LM fitting algorithm. (Color figure online)

Performance Evaluation Across Head Pose. One of the key questions of this work is whether our approach can improve the performance of face recognition under uncontrolled conditions, especially when dealing with head pose.

To allow a more detailed interpretation of the results, we generated head pose annotations for the PaSC database. We used OpenCV’s implementation of the POSIT algorithm [10] to calculate the yaw angles for each image in the database. POSIT estimates the pose of an object from an image using a 3D reference model and corresponding feature points of the object in the image. In our case, the 3D model was the 3D Morphable Model and the feature points were facial landmarks.
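POSIT itself is part of OpenCV's legacy C API; the following hedged sketch shows an equivalent yaw estimation with the current cv::solvePnP interface, assuming the landmark/model correspondences and the camera intrinsics are given (the Euler decomposition and its sign depend on the model's coordinate convention):

```cpp
// Estimate the head yaw angle from 2D-3D landmark correspondences.
#include <opencv2/calib3d.hpp>
#include <cmath>
#include <vector>

double estimate_yaw_degrees(const std::vector<cv::Point3f>& model_pts,
                            const std::vector<cv::Point2f>& image_pts,
                            const cv::Mat& camera_matrix)
{
    cv::Mat rvec, tvec;
    cv::solvePnP(model_pts, image_pts, camera_matrix, cv::noArray(),
                 rvec, tvec);
    cv::Mat R;
    cv::Rodrigues(rvec, R); // rotation vector -> 3x3 rotation matrix
    // Rotation about the vertical axis (ZYX Euler convention assumed):
    const double yaw = std::atan2(-R.at<double>(2, 0),
                                  std::hypot(R.at<double>(0, 0),
                                             R.at<double>(1, 0)));
    return yaw * 180.0 / CV_PI;
}
```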

We used this yaw angle annotation to filter the scores into different groups of yaw angles. The results obtained with our method were then compared to the results without any face modelling. For comparison, we also tested the performance of a commercial face modelling system (CFM) using the same workflow. The CFM uses additional modelling capabilities like mirroring (to reconstruct occluded areas of the face) and deillumination. Figure 4 depicts frontalised renderings of the commercial system and our fitter. In total, we compared all images of PaSC against each other and the yaw angle filter separated the scores into 10\(^{\circ }\) bins that span from −70\(^{\circ }\) to +70\(^{\circ }\).
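The grouping of the verification scores by yaw can be sketched as a simple binning step (assumed data layout; bin width and range follow the experiment description above):

```cpp
// Bin verification scores by probe yaw angle into the 14 ten-degree bins
// spanning [-70, +70) degrees used in Figs. 5 and 6.
#include <array>
#include <vector>

struct ProbeScore {
    double yaw_deg; // yaw annotation of the probe image
    float score;
    bool genuine;
};

std::array<std::vector<ProbeScore>, 14>
bin_by_yaw(const std::vector<ProbeScore>& scores)
{
    std::array<std::vector<ProbeScore>, 14> bins;
    for (const auto& s : scores) {
        if (s.yaw_deg < -70.0 || s.yaw_deg >= 70.0) continue; // out of range
        const int idx = static_cast<int>((s.yaw_deg + 70.0) / 10.0);
        bins[idx].push_back(s);
    }
    return bins;
}
```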

Fig. 4. From left to right: example images from PaSC (cropped to the face area for this figure) and their frontalised renderings from the commercial face modelling solution and our landmark-based fitter.

Fig. 5. Face recognition improvement of two face modelling approaches over the baseline COTS system. Different probe angles are compared against a frontal gallery. (blue): our 3DMM landmark fitting (LM), (black): commercial face modelling system (CFM). (Color figure online)

To better see how the pose normalisation methods improve the performance, the resulting curves show the absolute difference in false rejection rate relative to the baseline without face modelling (org) at a specific operating point. This type of diagram gives a simpler and clearer representation of the results at different head poses.

In Fig. 5, such a curve from the experiment is shown. A set of gallery images of 0\(^{\circ }\) yaw is compared to sets of probe images from 0\(^{\circ }\) to 70\(^{\circ }\). As gallery databases in face recognition applications often contain frontal images, this experiment reflects a common use case. At the operating point of 0.01 FAR, we then plot the improvement for each group of yaw angles. The commercial face modelling solution shows its first improvements over the conventional face recognition starting at 30\(^{\circ }\) yaw. Surprisingly, our 3D Morphable Model based method is even able to slightly improve the matching performance for frontal images: by using the original image’s texture, our fitter (LM) does not alter the face characteristics and only corrects rotations in the pitch or roll angles. Although the COTS face recognition system was used outside its specification for the large yaw angles, we are able to improve the performance over the whole pose range.

Fig. 6. Face recognition improvement of two face modelling approaches over the baseline COTS system. Different probe angles are compared against a 20\(^\circ \) and a 40\(^\circ \) gallery. (blue): our 3DMM landmark fitting (LM), (black): commercial face modelling system (CFM). (Color figure online)

In the next two experiments, we look at how well the system operates when the gallery is also not frontal. Figure 6 shows the results for the experiments with a 20\(^{\circ }\) and a 40\(^{\circ }\) gallery. In this case, matching the faces is a much more difficult task, especially when the subjects look in different directions and large areas of the faces are not visible due to self-occlusion. While we are still able to improve the face recognition capabilities in the experiment with a 20\(^{\circ }\) gallery, our method (LM) struggles to keep up with the commercial solution for a 40\(^{\circ }\) gallery. The ability to reconstruct parts of the non-visible texture, which our system currently does not offer, is a key component of the commercial modelling system (CFM) in such a use case. However, for the robotics application presented in Sect. 4, it is likely that frontal images are used for the gallery enrolment.

4 Application on a Robot System

Our fast and accurate 3D face modelling method is particularly suitable for face analysis tasks in assistive or humanoid robotics. In the following, we demonstrate our efforts in this field by illustrating how we integrate our approach into a mobile robot’s software framework (Fig. 7).

Fig. 7. Schematic overview of the framework for human-robot interaction on our SCITOS G5 platform. Our proposed landmark-based 3D Morphable Model fitter acts as a pre-processing method for various face analysis tasks.

A mobile robot platform combined with an HMI system is the basis for our research on human-robot interaction. The robot is based on a SCITOS G5 drive system with an onboard industrial PC and is able to navigate autonomously inside buildings. For mapping, localisation and collision avoidance, two laser scanners are attached to the base (SICK S300 and Hokuyo URG-04LX). Using this base platform, the robot is able to approach humans safely. The HMI part of the robot consists of a touch-based display and a head with moveable, human-like eyes. On top of the robot, we mounted a pan-tilt unit (PTU; Directed Perception D46-17), where additional sensors and cameras can be installed. The PTU adds an additional degree of freedom which is used for camera positioning to enhance the tracking process.

To align the cameras to the region of interest, we track people with an adaptive approach that uses a particle filter to track the position and size of the target and estimates the target motion using an optical flow based prediction model. Furthermore, an SVM-based learning strategy is implemented to calculate the particle weights and to prevent bad updates while staying adaptive to changes in object pose [18, 21]. At this point, the 3DMM pose normalisation of our approach can be applied to the image of the tracked face, either continuously or only when the head pose of the subject exceeds a certain threshold (using the knowledge obtained in the experiments of Sect. 3.2).
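The threshold-based variant can be sketched as a small decision step in the tracking loop. All names below are placeholders for the framework components, and the threshold value is an assumption to be tuned per application:

```cpp
// Run the 3DMM pose normalisation only when the tracked face deviates
// from frontal by more than a threshold; otherwise pass the crop through.
#include <cmath>

struct Image {}; // placeholder for the tracked face crop

// Placeholders standing in for the framework components:
double estimate_yaw_degrees(const Image&) { return 0.0; }
Image fit_and_render_frontal(const Image& in) { return in; } // LM fitting
void run_face_recognition(const Image&) {}

constexpr double kYawThresholdDeg = 20.0; // assumed value

void process_tracked_face(const Image& face_crop)
{
    if (std::abs(estimate_yaw_degrees(face_crop)) > kYawThresholdDeg) {
        run_face_recognition(fit_and_render_frontal(face_crop));
    } else {
        run_face_recognition(face_crop); // near-frontal: use original image
    }
}
```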

Fig. 8. Frames from a video of a subject turning from left profile to right profile view. The bottom row shows the pose-invariant isomap representations after fitting. Black regions depict self-occluded/invisible parts of the face and thus cannot be drawn. On the two most extreme pose images, the initialisation with the face detector fails.

Key frames of an example video taken from our robot system are presented in the upper row of Fig. 8. The OpenCV object detection with a face cascade [28] was used to initialise the regression based landmark detector, which then automatically found 15 facial landmarks. The fitting results, represented as isomaps, are shown in the second row of Fig. 8. On the robot, we use the frontal renderings as input for the same face recognition engine that was also used in the experiments. To verify the person’s identity properly, the system should generate a high score for this positive match, and the pose-normalised renderings allow the face recognition algorithm to achieve a clearly higher score than the original, unprocessed images. Possible use cases for a robot of this kind are industrial mobile robotics, shop or museum tour guides, or a surveillance system (e.g. in a supermarket). In all these cases, the robot has to interact naturally with technically unskilled people. It is therefore convenient to be able to analyse people independently of their pose, without requiring them to look at the robot directly. Identification, age and gender estimation on pose-normalised renderings offer the possibility to personalise the robot’s behaviour.

5 Conclusion and Further Work

We proposed a powerful method for head pose correction using a fully automatic landmark-based approach for fitting a 3D Morphable Model. Thanks to its efficient fitting algorithm, our approach can be used for tasks that require a fast real-time solution and can also be applied to live video streams. In contrast to existing work, we focus on an evaluation with automatically found landmarks and in-the-wild images, and publish the whole 3DMM fitting pipeline as open source software (see Sect. 1).

An experimental evaluation on the two commonly used image databases MUCT and PaSC demonstrated the benefit of our approach as a pre-processing step for face recognition in both controlled and heavily unconstrained settings. For recognition algorithms that are mostly trained on frontal faces, a rotation of the head means a loss of information that results in a lower matching score. We showed that our methodology is capable of improving the recognition rates for larger variations in head pose. Compared to a commercial face modelling solution, our approach keeps up well and even outperforms it in certain scenarios. The addition of 3DMM pose normalisation brings clear advantages over a conventional face recognition framework when the setting is uncontrolled, as in robotics, surveillance or consumer electronics.

Furthermore, we presented our software framework on a mobile robot platform, which uses our pose normalisation approach to allow a more reliable face analysis. We are currently doing further research on age and emotion estimation systems to expand the abilities of our robot. In the future, we also plan to statistically evaluate the effect of our landmark-based pose normalisation on their performance, as we did for face recognition in this paper.