1 Introduction

We have been developing human head pose detection systems based on the positions of the pupils and nostrils, which are detected with a video camera and active near-infrared illumination [1–3]. These methods need no learning process, yet they measure the position and angle of the head continuously and with high resolution, in contrast to the various computer vision methods that do not use active lighting, such as appearance template methods, detector array methods, and geometric methods [4].

The light sources alternately produce so-called bright and dark pupil images (Fig. 1). In the difference image of consecutively obtained bright and dark pupil images, the two pupils of a user are easily detected because the image content other than the pupils is cancelled out. Once the pupils are detected, a small window is assigned to each pupil in the following frames, and the pupils are detected within these windows. In addition, a large window for nostril detection is placed below the pupils in both the bright and dark pupil images, and small dark regions within this window are detected as nostrils. The nostrils are thus detected relatively easily. Once the nostrils are detected, a small window is likewise assigned to each nostril, and the nostrils are then detected within these small windows.

Fig. 1. Bright and dark pupil images obtained from camera and their difference image
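The image difference step described above can be illustrated with a minimal sketch. It assumes grayscale frames stored as nested lists; the function names, the fixed threshold, and the centroid-based detector are illustrative assumptions and not the detector used in [1–3].

```python
def difference_image(bright, dark):
    """Subtract the dark pupil image from the bright one, clipping negatives to 0."""
    return [[max(b - d, 0) for b, d in zip(row_b, row_d)]
            for row_b, row_d in zip(bright, dark)]


def detect_pupil(diff, threshold, window=None):
    """Centroid (x, y) of pixels above `threshold`, or None if there are none.

    `window` is (x0, y0, x1, y1) with exclusive upper bounds; None means the
    whole image. Thresholding plus a centroid is an assumed, simplified detector.
    """
    height, width = len(diff), len(diff[0])
    x0, y0, x1, y1 = window if window is not None else (0, 0, width, height)
    sum_x = sum_y = count = 0
    for y in range(max(y0, 0), min(y1, height)):
        for x in range(max(x0, 0), min(x1, width)):
            if diff[y][x] > threshold:
                sum_x += x
                sum_y += y
                count += 1
    return (sum_x / count, sum_y / count) if count else None


# Toy 5x5 frames: uniform background, pupil at pixels (2, 1) and (3, 1).
bright = [[100] * 5 for _ in range(5)]
dark = [[100] * 5 for _ in range(5)]
for x in (2, 3):
    bright[1][x] = 220  # pupil is bright in the bright pupil image
    dark[1][x] = 30     # and dark in the dark pupil image
diff = difference_image(bright, dark)
pupil = detect_pupil(diff, threshold=50)
```

Because the background is identical in both toy frames, only the pupil pixels survive the subtraction. Passing a `window` restricts the search in the same way as the small tracking windows.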

In [1], the 3D positions of the pupils and nostrils are detected by stereo-matching with two optical systems, each consisting of a calibrated video camera with near-infrared sensitivity and light sources attached to the camera. The face direction was estimated as the average of the normal vectors of the two triangular planes, each passing through the two pupils and one of the two nostrils. In [2], by assuming that the mutual distances between the pupils and nostrils are constant and by giving these distances in advance, the 3D positions of the pupils and of the midpoint between the nostrils (internostril midpoint) are estimated with a single optical system. The normal vector of the triangular plane formed by the two pupils and the internostril midpoint, and the center of gravity of that triangle, are taken as the direction and position of the head, respectively.
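The two geometric definitions above can be sketched as follows. This is a minimal illustration that assumes the 3D feature points are already available in millimetres; the function names and the sign convention of the normal (set by the argument order) are assumptions, not taken from [1] or [2].

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)


def triangle_normal(p0, p1, p2):
    """Unit normal of the plane through three 3D points."""
    return normalize(cross(sub(p1, p0), sub(p2, p0)))


def face_direction_stereo(pupil_a, pupil_b, nostril_a, nostril_b):
    """Average of the normals of the two pupil-pupil-nostril triangles."""
    n1 = triangle_normal(pupil_a, pupil_b, nostril_a)
    n2 = triangle_normal(pupil_a, pupil_b, nostril_b)
    return normalize(tuple((x + y) / 2 for x, y in zip(n1, n2)))


def head_pose_one_camera(pupil_a, pupil_b, internostril_mid):
    """Direction = triangle normal, position = center of gravity of the triangle."""
    direction = triangle_normal(pupil_a, pupil_b, internostril_mid)
    position = tuple(sum(c) / 3 for c in zip(pupil_a, pupil_b, internostril_mid))
    return direction, position


# Hypothetical frontal face lying in the plane z = 600 mm.
pupils = ((-30, 0, 600), (30, 0, 600))
nostrils = ((-8, -40, 600), (8, -40, 600))
direction = face_direction_stereo(*pupils, *nostrils)
_, position = head_pose_one_camera(*pupils, (0, -40, 600))
```

For this toy face both triangles lie in the same plane, so the averaged direction is perpendicular to that plane.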

When users move their head quickly, the pupil positions differ between the bright and dark pupil images, and the image difference method becomes less effective. This causes pupil detection to fail, and head pose detection fails in turn. To address this problem, [3] proposed an image difference method with positional compensation based on head pose detection (positionally compensated image difference based on head pose, PCID). In [3], the pose of the triangle is detected in every frame with the one-camera head pose detection method [2]. The pupils and the internostril midpoint, whose mutual distances are constant, are treated as a rigid triangle that translates and rotates, and the pose of the triangle in the current frame is predicted from its poses in the latest two frames using a model of constant translational and angular velocity. From the predicted pose, the 3D positions of the pupils and nostrils in the current frame are estimated and then projected onto the camera image using the pinhole model. The projected positions serve as the predicted positions of the pupils and nostrils in the current frame image. Finally, the small image region containing the pupil in the latest frame is shifted so that its pupil center coincides with the predicted pupil center in the current image, the difference between the two small regions is computed, and the pupil is detected.
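The prediction and projection steps of PCID can be sketched as follows. The sketch assumes that a pose is represented as a centroid plus a 3×3 rotation matrix and that the pinhole camera has focal length `f` (in pixels) and principal point `(cx, cy)`; these representations and all names are illustrative assumptions, not the formulation of [3].

```python
import math


def rot_z(deg):
    """Rotation matrix about the z axis (used only to build the toy poses)."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def matvec(m, v):
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def predict_pose(prev, last):
    """Extrapolate one frame ahead with constant translational and angular velocity."""
    (c0, r0), (c1, r1) = prev, last
    c_pred = [2 * b - a for a, b in zip(c0, c1)]  # c1 + (c1 - c0)
    delta = matmul(r1, transpose(r0))             # rotation from prev to last
    return c_pred, matmul(delta, r1)              # apply the same rotation again


def project(point, f, cx, cy):
    """Pinhole projection of a 3D point given in camera coordinates."""
    x, y, z = point
    return (f * x / z + cx, f * y / z + cy)


def predict_image_points(prev, last, local_points, f, cx, cy):
    """Predicted image positions of points fixed to the rigid triangle."""
    c, r = predict_pose(prev, last)
    return [project([ci + vi for ci, vi in zip(c, matvec(r, q))], f, cx, cy)
            for q in local_points]


prev = ([0.0, 0.0, 600.0], rot_z(10))
last = ([5.0, 0.0, 600.0], rot_z(20))
c_pred, r_pred = predict_pose(prev, last)
uv = predict_image_points(prev, last, [[0.0, 0.0, 0.0]], f=1000.0, cx=320.0, cy=240.0)
```

In the toy data the head moved 5 mm and rotated 10° between the two latest frames, so the model predicts another 5 mm and another 10° for the current frame. The offset between a feature's position in the latest frame and its predicted position gives the shift applied to the small image region before the difference is taken.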

Although, as mentioned above, the large window placed below the pupils is effective for searching for the nostrils, the following problem exists. To capture the nostrils, the optical systems are installed tilted strongly upward. As a result, the nose readily casts shadows beside itself, especially when the user rotates the face horizontally, and these shadows tend to be misdetected as nostrils. When the nostrils are misdetected, PCID malfunctions, and the pupils in turn tend to be misdetected. Robust nostril detection is therefore important not only for head pose detection but also for pupil detection. In the present paper, we propose a method that increases the robustness of nostril detection.

2 Proposed Methods

Figure 2 shows the appearance of our head pose detection system. The inclination angle of the cameras was approximately 30°. The world coordinate system is defined in the figure. In this system, the 3D positions of the pupils and nostrils are detected by stereo-matching [1], and the PCID method, originally proposed for the one-camera method [2], is applied to the stereo camera-based method [1]. As shown in Fig. 3, a 3D domain that is certain to contain the nostrils was defined with reference to the positions of the two pupils. Relative to the nostrils, the pupils move up and down (b and c) or left and right (±a) as the eyeballs rotate. In addition, the nostrils rotate around the line passing through the two pupils as the head tilts (+60° to −15°). The extent of the domain was determined by considering these movements. Furthermore, when the line connecting the two pupils inclines because of head roll, the domain rotates together with the line. When at least one of the 2D nostril candidates obtained from the two cameras is misdetected, e.g., when one camera detects a true nostril while the other detects a false one, the nostril position obtained by stereo-matching tends to fall outside the 3D domain. The domain can therefore be used to select the true nostrils from several candidates that include false nostrils, both when searching for and when tracking the nostrils.

Fig. 2. Optical systems each including video camera, near-infrared light sources, lens, and near-infrared pass filter.

Fig. 3. Definition of nostril existable 3D domain for nostrils detection (hatched region)
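The role of the 3D domain as a filter on stereo-matched candidates can be illustrated with a simplified sketch. The sketch approximates the domain by a box expressed in a frame attached to the pupil line, so that the box rolls together with that line. The box shape, its bounds, the use of a world "up" axis to complete the frame, and all names are assumptions for illustration; the actual hatched region in Fig. 3 is defined from a, b, c and the tilt range and is not reproduced here.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v):
    n = math.sqrt(dot(v, v))
    return tuple(x / n for x in v)


def pupil_frame(pupil_r, pupil_l, up=(0.0, 1.0, 0.0)):
    """Origin at the pupil midpoint, x axis along the pupil line.

    The y axis is the world `up` with its component along x removed
    (assumes the pupil line is not parallel to `up`).
    """
    origin = tuple((a + b) / 2 for a, b in zip(pupil_r, pupil_l))
    x_axis = normalize(sub(pupil_l, pupil_r))
    d = dot(up, x_axis)
    y_axis = normalize(tuple(u - d * x for u, x in zip(up, x_axis)))
    z_axis = cross(x_axis, y_axis)
    return origin, (x_axis, y_axis, z_axis)


def in_domain(point, pupil_r, pupil_l, bounds):
    """True if `point` lies inside the box `bounds` given in the pupil frame."""
    origin, axes = pupil_frame(pupil_r, pupil_l)
    rel = sub(point, origin)
    coords = [dot(rel, axis) for axis in axes]
    return all(lo <= c <= hi for c, (lo, hi) in zip(coords, bounds))


# Hypothetical bounds in mm: (x range, y range, z range) in the pupil frame.
BOUNDS = ((-35.0, 35.0), (-70.0, -10.0), (-40.0, 40.0))
pupil_r, pupil_l = (-30.0, 0.0, 600.0), (30.0, 0.0, 600.0)
true_candidate = (8.0, -40.0, 590.0)
mismatched = (8.0, -40.0, 700.0)  # e.g. a true/false 2D pair matched in stereo
kept = [c for c in (true_candidate, mismatched)
        if in_domain(c, pupil_r, pupil_l, BOUNDS)]
```

A candidate produced by matching a true nostril in one camera with a shadow in the other typically ends up at an implausible depth, which is what the containment test rejects.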

For searching for the nostrils, the proposed method makes the size and position of the large window variable. The vertices of the 3D domain (A–H in Fig. 3) were transformed from the world coordinate system to the camera image coordinate system. In each camera image, the smallest rectangle containing all vertices was determined and used as the large window. However, when this rectangle contained the contour of the user's face, the contour tended to induce false detections. To prevent this, the left and right sides of the large window were trimmed at the horizontal positions of the left and right pupils in the image (Fig. 4). In addition, one of the two nostrils sometimes cannot be detected, especially when the user rotates the head strongly in the horizontal (yaw) direction. To address this, as shown in Fig. 4, second-order nostril candidates were placed on both the right and left sides of each detected nostril candidate. The line segment connecting a detected nostril candidate to each of its second-order candidates was set parallel to the line segment connecting the two pupils, and the ratio of the lengths of these two segments was set equal to the ratio determined in a calibration procedure in which subjects were asked to face the front. All nostril candidates were stereo-matched in a round-robin manner, producing many 3D nostril candidates, and the candidates outside the 3D domain were removed. The mutual distances among all retained candidates were then calculated, and the pairs whose mutual distance was close (within ±3 mm) to the corresponding distance measured in the calibration procedure were retained as nostril pair candidates. Finally, the single pair for which the angle between the vector connecting its two nostrils and the vector connecting the two pupils was smallest was taken as the true nostril pair (nostril confirmation method).
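The final two steps of the nostril confirmation method (distance check and minimum-angle selection) can be sketched as follows. The sketch assumes the 3D candidates have already been filtered by the 3D domain; the function name and the choice to ignore the sign of the pair direction when computing the angle are assumptions.

```python
import math
from itertools import combinations


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(v):
    return math.sqrt(dot(v, v))


def select_nostril_pair(candidates, pupil_r, pupil_l, calib_dist_mm, tol_mm=3.0):
    """Return the candidate pair most parallel to the pupil line, or None.

    Only pairs whose mutual distance is within `tol_mm` of the calibrated
    internostril distance are considered.
    """
    pupil_vec = sub(pupil_l, pupil_r)
    best_pair, best_angle = None, math.inf
    for a, b in combinations(candidates, 2):
        pair_vec = sub(b, a)
        dist = norm(pair_vec)
        if abs(dist - calib_dist_mm) > tol_mm:
            continue
        # Angle between the two lines; the direction sign is ignored (assumption).
        cos_angle = abs(dot(pair_vec, pupil_vec)) / (dist * norm(pupil_vec))
        angle = math.acos(min(1.0, cos_angle))
        if angle < best_angle:
            best_pair, best_angle = (a, b), angle
    return best_pair


pupil_r, pupil_l = (-30.0, 0.0, 600.0), (30.0, 0.0, 600.0)
left = (-8.0, -40.0, 590.0)
right = (8.0, -40.0, 590.0)
shadow = (8.0, -50.0, 602.0)  # hypothetical false candidate inside the domain
pair = select_nostril_pair([left, right, shadow], pupil_r, pupil_l, calib_dist_mm=16.0)
```

In the toy data the pair (`right`, `shadow`) also passes the distance check, but it is nearly perpendicular to the pupil line, so the minimum-angle rule selects the true pair.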

Fig. 4. Setting of second-order nostril candidates and setting of large window in image
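The window setting and the placement of second-order candidates in the image can be sketched as follows. The sketch assumes that the eight domain vertices have already been projected to pixel coordinates and that `ratio` is the calibrated internostril-to-interpupil length ratio; the names and the toy coordinates are illustrative assumptions.

```python
def large_window(projected_vertices, pupil_r_img, pupil_l_img):
    """Bounding rectangle of the projected vertices, trimmed at the pupil columns.

    Returns (left, top, right, bottom) in pixels.
    """
    xs = [u for u, _ in projected_vertices]
    ys = [v for _, v in projected_vertices]
    pupil_min_x, pupil_max_x = sorted((pupil_r_img[0], pupil_l_img[0]))
    left = max(min(xs), pupil_min_x)    # trim the side beyond one pupil
    right = min(max(xs), pupil_max_x)   # trim the side beyond the other pupil
    return (left, min(ys), right, max(ys))


def second_order_candidates(candidate, pupil_r_img, pupil_l_img, ratio):
    """Two extra candidates on either side of `candidate`.

    Each offset is parallel to the pupil line, with length equal to `ratio`
    times the interpupil length in the image.
    """
    dx = ratio * (pupil_l_img[0] - pupil_r_img[0])
    dy = ratio * (pupil_l_img[1] - pupil_r_img[1])
    cx, cy = candidate
    return [(cx - dx, cy - dy), (cx + dx, cy + dy)]


vertices = [(100, 200), (260, 200), (100, 300), (260, 300),
            (120, 210), (240, 210), (120, 290), (240, 290)]
pupil_r_img, pupil_l_img = (130, 150), (230, 152)
window = large_window(vertices, pupil_r_img, pupil_l_img)
extra = second_order_candidates((170, 250), pupil_r_img, pupil_l_img, ratio=0.25)
```

When only one nostril is visible, one of the two second-order candidates stands in for the hidden nostril. The one on the wrong side is expected to be discarded later by the 3D domain and pair checks.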

When the nostrils were being tracked with the small windows, whether the 3D nostrils were true or false was likewise checked with the nostril confirmation method described above. If they were judged to be false, tracking with the small windows was stopped and searching with the large window was started instead.
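The switching between tracking and searching can be sketched as a per-frame step function. The callables `search`, `track`, and `confirm` are hypothetical placeholders for the large-window search, the small-window tracking, and the nostril confirmation method. Restarting the search within the same frame is an assumption, since the text does not state when the search begins.

```python
def nostril_step(previous_pair, search, track, confirm):
    """Process one frame and return the confirmed nostril pair, or None.

    previous_pair: pair confirmed in the preceding frame, or None.
    search():      large-window search, returns a pair or None.
    track(pair):   small-window tracking, returns a pair or None.
    confirm(pair): nostril confirmation, returns True for a plausible pair.
    """
    if previous_pair is not None:
        tracked = track(previous_pair)
        if tracked is not None and confirm(tracked):
            return tracked
        # Tracking failed or was judged false: stop tracking and search instead.
    found = search()
    if found is not None and confirm(found):
        return found
    return None


# Toy stand-ins: tracking has drifted onto a shadow, searching finds the nostrils.
calls = []


def toy_search():
    calls.append("search")
    return "nostrils"


def toy_track(pair):
    calls.append("track")
    return "shadow"


def toy_confirm(pair):
    return pair != "shadow"


result = nostril_step("nostrils", toy_search, toy_track, toy_confirm)
```

Without the confirmation check in the tracking branch, the step would keep returning the shadow, which corresponds to the mistracking behaviour that the method is intended to stop.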

3 Experiments

Experiment 1: Five healthy university students participated. They were seated 70 cm from the display screen (Fig. 1). Each subject's chin was placed on a chin rest, and the head direction was set in 10° steps between −30° and +30° horizontally and between −20° and +20° vertically. In each head direction, the subjects closed their eyes three times. Immediately after the eyes were opened, the first detected nostrils were visually examined to determine whether they were true or false. In Method 1, the large window based on the 3D domain was not used for searching for the nostrils; instead, a large window of fixed, appropriate size was placed at a fixed position below the pupils in each camera image. In Method 2, neither the second-order nostril candidates nor the nostril confirmation method using the 3D domain was used in the searching process. Figure 5 compares the correct detection ratios (mean of five subjects). The proposed method showed higher correct detection ratios than Methods 1 and 2 for all horizontal head directions within ±40°. These results indicate the benefit of the large window based on the nostril existable 3D domain, of the nostril confirmation process during searching, and of the second-order nostril candidates. The benefit was most apparent when the subjects rotated their head strongly. For vertical directions within ±20°, the proposed method showed no marked superiority.

Fig. 5. Comparison of correct detection ratio when head rotates horizontally

Experiment 2: Six healthy university students participated. Method 3 was Method 2 with the nostril confirmation method using the 3D domain additionally removed from the tracking process. With the head direction at 0°, each subject moved a palm up and down and covered the nose ten times for ten seconds. The palm cast a shadow on the face, and shadows also appeared between the fingers. Immediately after the nostrils had fully reappeared, the first detection was examined to determine whether it was a true or a false nostril. Figure 6(a) and (b) show that, compared with Method 3, the proposed method dramatically decreased the false detection ratio of the nostrils (43.8 % → 4.0 % on average) and greatly increased the correct detection ratio (57.3 % → 87.2 %), indicating that the nostril confirmation method for tracking functioned well. This is because Method 3 tended to keep tracking a false nostril (shadow) misdetected in the searching process, whereas the proposed method was able to stop the mistracking.

Fig. 6. Nostril detection ratios when subject covered nose by palm ten times for ten seconds

Experiment 3: Two healthy university students participated. Each subject was asked to rotate the head slowly in the horizontal direction. Figure 7 shows that, with Method 3, false detection and false tracking occurred (see circles A and B) when the head direction angle became large, whereas the proposed method tended to prevent the mistracking.

Fig. 7. Time course of detected horizontal head direction when subject rotates head in horizontal

4 Discussion

The nostril existable 3D domain functioned well in both the searching and the tracking of the nostrils. The proposed method is a kind of geometric method. In the present study, we used non-coaxial irradiation to produce the dark pupil image, which made shadows more likely to be cast on the face. On the other hand, the non-coaxial irradiation produced the dark pupil image effectively, which made the pupils easy to detect. Since the detected pupils support nostril detection, the occurrence of shadows and the ease of pupil detection are in a trade-off relationship. Although the pupils were detected in the present study by taking the difference between the bright and dark pupil images, the proposed method would also be useful with pupil detection by the dark pupil method using non-coaxial irradiation.

5 Conclusions

The present paper shows that, in the head pose detection method based on 3D pupil and nostril detection by stereo-matching, setting an existable 3D domain for the nostrils prevented erroneous nostril detection. In the present experiments, the subjects kept their eyes open because searching for the nostrils is impossible unless the pupils are detected. Dealing with blinks is left for future work.