Background

Alternative access pathways

Individuals with severe physical impairments who are unable to communicate through speech or gestures require an alternative means to convey their intentions. In the rehabilitation engineering context, these alternative channels are called access pathways, and they constitute the critical front end of an access solution [1]. Some recent efforts have set out to non-invasively translate physiological signals, such as the electrical [2, 3] and hemodynamic [4–6] activity of the brain or the electrodermal response of the skin [7, 8], into functional communication. A comprehensive review of emerging access technologies can be found in [1].

Biomedical applications of thermal imaging

Infrared thermography refers to the measurement of the radiation emitted by the surface of an object in the infrared range of the electromagnetic spectrum, i.e., between wavelengths of 0.8 μm and 1.0 mm [9]. Infrared cameras use specialized lenses manufactured from materials such as germanium to focus thermal radiation onto a focal plane array of infrared detectors [10]. Thermal cameras yield an image that is a spatial, two-dimensional (2-D) map of the 3-D temperature distribution of the object [11].

Infrared thermography has been widely applied in health research, including, for example, breast cancer detection [12, 13], brain surgery [14, 15], heart surgery [16], diagnosis of vascular disorders [17], arthritis [18], pain assessment [19] and post-surgical follow-up in ophthalmology [20].

Recently, Murthy and Pavlidis non-invasively measured human breathing using infrared imaging and a statistical methodology based on multinormal distributions, the method of moments, and the Jeffreys divergence measure [21]. Their study relied on the fact that exhaled gases have a higher temperature than the typical background of indoor environments. They achieved high detection accuracy on a small set of subjects and suggested potential applications in polygraphy, sleep studies, sport training, and patient monitoring [21].

Thermal imaging as an access pathway

The goal of this paper is to investigate the potential of thermal imaging as an access pathway. In particular, we introduce a thermographic binary switch activated by voluntary mouth opening. Expired air and the oral cavity are generally warmer than the surrounding tissue and environment, while cyclic jaw movements do not cause significant increases in facial temperature over time [22]. Therefore, localized temperature changes due to mouth opening and closing may be detectable using video and image processing of thermographic data. Examples of patient groups that may benefit from this access pathway are people with high-level spinal cord injuries resulting in quadriplegia and individuals with spastic quadriplegic cerebral palsy or general hypotonia.

Like computer vision-based access pathways [23], thermal imaging is non-invasive and does not require any sensor attachment to the user. However, thermography overcomes some of the major limitations of conventional computer vision-based access pathways. Firstly, thermography is invariant to skin colour, since there is no difference in emissivity between black, white and burnt skin, in vivo or in vitro [24]. Human skin has an emissivity of about 0.98. Thermal radiation from the skin originates in the epidermis and is independent of race; it therefore depends only on the surface temperature [9, 11]. Secondly, thermal image quality is independent of ambient lighting conditions, so thermography can be effective both night and day. Conceivably, this non-contact, non-invasive access pathway could be tailored to the user's unique motor capacity, whether that be mouth opening, eye blinking or simply deep breathing, all of which are motor activities that may generate measurable, local temperature changes. Furthermore, given that the key information is thermal variation, a frontal view of the user may not be necessary, facilitating more flexible and unobtrusive placement of the camera.

Methods

Participants

Eight able-bodied participants and two individuals with quadriplegia (one with a C1-C2 incomplete spinal cord injury and the other with severe spastic quadriplegic cerebral palsy) participated in this study. All participants provided written consent. The experimental protocol was approved by the research ethics board of the university and affiliated hospital.

Instrumentation and setup

A THERMAL-EYE 2000B thermal video camera by L-3 Communications with thermal sensitivity ≤100 mK [25] was connected via an NTSC to USB TV convertor (Dazzle Multimedia). Videos were recorded as 240 × 320 AVI files (30 fps) and processed offline in MATLAB & Simulink (version R2007b).

Participants were comfortably seated in a laboratory environment. Those with disability remained in their wheelchairs. The thermal camera was positioned anterior and lateral to the participant at a 45° angle. This camera location was chosen over the often-used frontal view, keeping in mind the eventual application as an access switch, where the user's field of view ought to be unobstructed. At a 45° viewing angle, infrared thermograms exhibit only a small error in recorded temperatures [9]. Each participant was cued to open his or her mouth and to hold it ajar for one second before closing it. Participants were given an auditory prompt for every opening and closing action. Each mouth closing was followed by a 3-second rest before the onset of the next mouth opening. The participants were instructed to maintain a constant head position so that their mouth movements stayed within the camera's field of view.

The thermal sensitivity of our infrared camera was well beyond what was needed to detect the temperature change due to mouth opening. The temperature difference between the closed and the open mouth is about 1.5 to 3°C, i.e., at least 15 times the camera's thermal sensitivity of ≤100 mK.

Thermal video processing

Figure 1 shows a schematic of our algorithm for detecting mouth openings from the thermal video data. The system consisted of three main components, namely face segmentation, thermal intensity-motion filtering and false positive removal. Each component is described below. To begin, the boundary pixels of each video frame (the first and last pixels of every column and every row) were set to zero to detach objects that may be connected to the borders.

Figure 1

Components of the proposed mouth opening detection algorithm.

Face segmentation

In addition to the participant's head and facial region, other body parts such as the participant's neck, thorax and upper limbs also appeared in the videos. For the participants with disability, parts of their wheelchairs were also captured on thermal video. Objects in the background and, in a couple of instances, people moving around the participant were also recorded. It was thus essential to segment the participant's face region from all other non-target body parts and objects. Each frame of the video was binarized. Given that facial temperature distributions vary within and among individuals [26], we adopted Otsu's method to determine an adaptive rather than fixed intensity threshold, which minimized, on a frame-by-frame basis, the intra-class variance of the grayscale values of the pixels to be binarized [27].
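The per-frame adaptive binarization can be sketched in Python with NumPy as follows. This is a minimal illustration, not the authors' MATLAB code; it assumes 8-bit grayscale frames, and the function names `otsu_threshold` and `binarize` are hypothetical. It scans every candidate threshold and picks the one that maximizes the between-class variance, which is equivalent to minimizing the intra-class variance described above.

```python
import numpy as np

def otsu_threshold(frame):
    """Otsu threshold for an 8-bit frame: pixels with value > t are foreground."""
    hist = np.bincount(np.asarray(frame, dtype=np.uint8).ravel(), minlength=256)
    p = hist / hist.sum()                      # grey-level probabilities
    omega = np.cumsum(p)                       # weight of the class {<= t}
    mu = np.cumsum(p * np.arange(256))         # unnormalised mean of that class
    mu_total = mu[-1]
    # Between-class variance for every candidate t. Maximising it is
    # equivalent to minimising the weighted intra-class variance.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b2 = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b2 = np.nan_to_num(sigma_b2, nan=0.0, posinf=0.0, neginf=0.0)
    # If several thresholds tie, take the middle of the plateau.
    best = np.flatnonzero(np.isclose(sigma_b2, sigma_b2.max(), rtol=1e-12, atol=0.0))
    return float(best.mean())

def binarize(frame):
    """Per-frame adaptive binarization with Otsu's threshold."""
    frame = np.asarray(frame)
    return frame > otsu_threshold(frame)
```

Because the threshold is recomputed for every frame, a gradual drift in facial temperature moves the threshold with it instead of defeating a fixed cut-off.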

The binarized frames were then morphologically opened with a disk structuring element of radius 5 pixels to remove small objects, break thin connections, remove thin protrusions, and smooth object contours [28]. In the resulting image, the object with the maximum area (presumably the face region) was retained, and its interior holes were filled by morphological closing with a disk structuring element of radius 20 pixels. The appropriate sizes of these structuring elements depend on the camera-user distance and the user's head size. In a real-life application, the camera will be mounted on the user's wheelchair at a fixed distance from the user's face. Hence, once the appropriate parameters are selected during initial calibration, they do not need to be changed for subsequent use. An example of a segmented face region is depicted in Figure 2(b).
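A minimal sketch of the border clearing and face segmentation steps, using SciPy's `ndimage` morphology on an already binarized frame, is given below. The helper names `disk` and `segment_face` are hypothetical. The disk is an exact Euclidean disk, which may differ slightly from MATLAB's `strel('disk', r)`. The borders are cleared on the binary mask rather than on the grayscale frame, which should have the same detaching effect whenever the threshold is positive.

```python
import numpy as np
from scipy import ndimage

def disk(radius):
    """Flat Euclidean disk structuring element of the given radius."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return x * x + y * y <= radius * radius

def segment_face(binary, open_radius=5, close_radius=20):
    """Return a mask of the largest object (presumed face) in a binarized frame."""
    mask = np.asarray(binary, dtype=bool).copy()
    # Zero the first/last row and column to detach objects touching the border.
    mask[[0, -1], :] = False
    mask[:, [0, -1]] = False
    # Opening removes small objects, thin bridges and thin protrusions.
    mask = ndimage.binary_opening(mask, structure=disk(open_radius))
    labels, n = ndimage.label(mask)
    if n == 0:
        return mask
    areas = np.bincount(labels.ravel())
    areas[0] = 0                       # ignore the background label
    face = labels == np.argmax(areas)  # keep the object with maximum area
    # Closing fills interior holes; pad first so that the erosion step
    # does not eat into the face near the image border.
    pad = close_radius
    closed = ndimage.binary_closing(np.pad(face, pad), structure=disk(close_radius))
    return closed[pad:-pad, pad:-pad]
```

In a calibrated wheelchair setup, `open_radius` and `close_radius` would be the two parameters fixed once at calibration.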

Figure 2

The action of the different modules of the mouth opening detection algorithm. (a) Input thermal video frame, (b) Segmented face region, (c) Warm facial zones, (d) Moving facial zones, (e) Intersection of warm and moving objects within the face region, (f) After morphological, size variation, and anthropometric filtering, (g) Final output; detected mouth open is highlighted on the original video with a hollow box.

Thermal intensity-motion filtering

All subsequent processing was applied to the intensity image and confined to the identified face region. The region of interest (ROI) was the participant's mouth, and the task of interest was mouth opening. A combination of temperature thresholding and motion tracking was used to detect mouth opening. Warm zones inside the facial region were extracted by thresholding the segmented face with a scaled version of Otsu's threshold [27] to favour higher-intensity (i.e., warmer) pixels. The scale factor was empirically derived as

(1)

and typically ranged from 2.5 to 3. This segmentation yielded a warm zone mask, which served to detect instances of mouth opening. However, nearby facial regions occasionally had temperatures similar to that of the oral cavity, so a corroborating cue was required to pinpoint a mouth opening event accurately.
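The warm zone extraction can be sketched as follows. Since Eq. (1) is not reproduced here, the scale factor is passed in directly as `scale` (the text reports values of about 2.5 to 3). The sketch also assumes that Otsu's threshold is computed on the frame with everything outside the face set to zero, so that the threshold stays low enough for scaling to make sense; both this and the function names are assumptions rather than the authors' exact procedure.

```python
import numpy as np

def otsu_threshold(img):
    """Otsu threshold of an 8-bit image (pixels > t form the upper class)."""
    p = np.bincount(img.ravel(), minlength=256) / img.size
    omega, mu = np.cumsum(p), np.cumsum(p * np.arange(256))
    with np.errstate(divide="ignore", invalid="ignore"):
        s = (mu[-1] * omega - mu) ** 2 / (omega * (1.0 - omega))
    s = np.nan_to_num(s, nan=0.0, posinf=0.0, neginf=0.0)
    # Middle of the plateau of maximal between-class variance.
    return float(np.flatnonzero(np.isclose(s, s.max(), rtol=1e-12, atol=0.0)).mean())

def warm_zone_mask(frame, face_mask, scale):
    """Face pixels brighter (warmer) than `scale` times Otsu's threshold."""
    face_only = np.where(face_mask, frame, 0).astype(np.uint8)  # zero non-face pixels
    t = scale * otsu_threshold(face_only)
    return face_mask & (face_only > t)
```

Scaling the threshold upward trades recall for precision: only the warmest facial pixels survive, which is why the motion cue below is still needed to separate the mouth from other warm regions.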

Since mouth opening involves motion, optical flow was utilized to estimate the direction and speed of motion from one video frame to the next using the Horn-Schunck method [29]. Motion vectors in each frame of the video sequence were computed by solving the optical flow constraint equation

$$I_x u + I_y v + I_t = 0 \qquad (2)$$

where $I_x$, $I_y$ and $I_t$ are the spatiotemporal image brightness derivatives, $u$ is the horizontal optical flow and $v$ is the vertical optical flow. By assuming that the optical flow is smooth over the entire image, the Horn-Schunck method computes an estimate of the velocity field, $[u\ v]^T$, that minimizes the following energy:

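$$E = \iint \left(I_x u + I_y v + I_t\right)^2 dx\,dy + \alpha \iint \left[\left(\frac{\partial u}{\partial x}\right)^2 + \left(\frac{\partial u}{\partial y}\right)^2 + \left(\frac{\partial v}{\partial x}\right)^2 + \left(\frac{\partial v}{\partial y}\right)^2\right] dx\,dy \qquad (3)$$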

In this equation, $\partial u/\partial x$ and $\partial u/\partial y$ are the spatial derivatives of the optical velocity component $u$, and $\alpha$ scales the global smoothness term [29]. In each frame, motion vectors whose velocity magnitude exceeded the mean velocity magnitude (averaged over the five most recent frames) were retained, yielding a motion mask. The intersection of this motion mask with the warm zone mask introduced above yielded all regions of the face that were both warm and moving.
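The flow estimation and motion mask can be sketched in Python as below, assuming frames are 2-D NumPy arrays. This is the classic iterative Horn-Schunck update, not the MATLAB implementation the authors used. Spatial derivatives come from `np.gradient` on the average of the two frames, a simplification of Horn and Schunck's original 2×2×2 derivative stencil. The values of `alpha` and `n_iter` and the `MotionMasker` helper are illustrative assumptions. The mask keeps pixels whose speed exceeds the mean speed over the five most recent frames, which is one reading of the rule above.

```python
import numpy as np
from collections import deque
from scipy import ndimage

# Neighbourhood-averaging kernel from Horn and Schunck's discretisation.
HS_KERNEL = np.array([[1 / 12, 1 / 6, 1 / 12],
                      [1 / 6, 0.0, 1 / 6],
                      [1 / 12, 1 / 6, 1 / 12]])

def horn_schunck(f0, f1, alpha=1.0, n_iter=100):
    """Iteratively estimate the optical flow (u, v) from frame f0 to frame f1."""
    f0 = np.asarray(f0, dtype=float)
    f1 = np.asarray(f1, dtype=float)
    Iy, Ix = np.gradient(0.5 * (f0 + f1))  # axis 0 = rows (y), axis 1 = columns (x)
    It = f1 - f0
    u = np.zeros_like(It)
    v = np.zeros_like(It)
    # alpha > 0 weights the smoothness term of Eq. (3); the discretisation
    # constant of the Laplacian is absorbed into it.
    denom = alpha + Ix ** 2 + Iy ** 2
    for _ in range(n_iter):
        u_bar = ndimage.convolve(u, HS_KERNEL, mode="nearest")
        v_bar = ndimage.convolve(v, HS_KERNEL, mode="nearest")
        # Residual of the constraint Ix*u + Iy*v + It = 0 at the local averages.
        r = (Ix * u_bar + Iy * v_bar + It) / denom
        u = u_bar - Ix * r
        v = v_bar - Iy * r
    return u, v

class MotionMasker:
    """Hypothetical helper: keep pixels moving faster than the recent mean speed."""

    def __init__(self, history=5):
        self.recent_means = deque(maxlen=history)

    def __call__(self, u, v):
        speed = np.hypot(u, v)
        self.recent_means.append(speed.mean())
        return speed > np.mean(self.recent_means)
```

The candidate mouth pixels for a frame would then be `face_mask & warm_mask & masker(u, v)`, with a single `masker = MotionMasker()` kept alive across frames so that its five-frame history accumulates.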

False positive removal

Despite the combination of motion and thermal cues, the processed frames occasionally contained non-mouth objects (false positives) such as parts of the chin, forehead and periorbital regions. These objects were also warm and moving and therefore passed the thermal intensity and motion filters. An example is the forehead, which, according to the literature, is the warmest part of the human body, with a temperature (34.5°C) close to that inside the mouth [30]. Therefore, motion of the forehead may result in a false positive.

To deal with these false positives, we deployed a series of additional filters based on morphology, size variation between frames, and facial anthropometry. Objects that did not meet the following morphological conditions were deemed false positives and removed:

  1. 30 pixels < Area < 150 pixels
  2. Eccentricity ≤ 0.9
  3.

The first condition rejects objects that are either too small or too large to be candidate mouth openings. Likewise, the second condition removes regions that are too elongated to qualify as mouth regions, while the third condition eliminates hollow regions, as the open mouth is expected to appear solid. The constants in these morphological filters were defined empirically to match the shape of the open mouth. In addition, objects whose size varied by less than 25% between the current frame and the frame ten frames earlier were considered static warm facial regions (e.g., forehead, chin, around the eyes, neck) and were also discarded. This constitutes the size variation filter in Figure 1.
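The morphological and size variation filters can be sketched as below with SciPy's connected-component labelling. The area and eccentricity limits come from the text. Because the third (hollowness) condition is not reproduced above, the sketch substitutes a hypothetical stand-in that rejects any object with interior holes. Eccentricity uses the usual definition based on the ellipse with the same second moments. How objects are matched between the current frame and the frame ten frames earlier is not specified, so matching by pixel overlap, and keeping objects with no earlier counterpart, are assumptions; all function names are hypothetical.

```python
import numpy as np
from scipy import ndimage

def eccentricity(region):
    """Eccentricity of the ellipse with the same second central moments as `region`."""
    rows, cols = np.nonzero(region)
    cov = np.cov(np.vstack([rows, cols]).astype(float), bias=True)
    cov += np.eye(2) / 12.0            # treat each pixel as a unit square, not a point
    lam_min, lam_max = np.linalg.eigvalsh(cov)  # ascending order
    return float(np.sqrt(max(0.0, 1.0 - lam_min / lam_max)))

def morphological_filter(candidates, min_area=30, max_area=150, max_ecc=0.9):
    """Keep objects satisfying the area, eccentricity and (stand-in) solidity rules."""
    labels, n = ndimage.label(candidates)
    keep = np.zeros(candidates.shape, dtype=bool)
    for k in range(1, n + 1):
        region = labels == k
        area = int(region.sum())
        if not (min_area < area < max_area):
            continue                   # too small or too large
        if eccentricity(region) > max_ecc:
            continue                   # too elongated
        # Hypothetical stand-in for condition 3: reject objects with holes.
        if int(ndimage.binary_fill_holes(region).sum()) != area:
            continue
        keep |= region
    return keep

def size_variation_filter(current, earlier, min_change=0.25):
    """Drop objects whose area changed by < 25% versus the mask from ten frames earlier."""
    labels, n = ndimage.label(current)
    earlier_labels, _ = ndimage.label(earlier)
    keep = np.zeros(current.shape, dtype=bool)
    for k in range(1, n + 1):
        region = labels == k
        matches = np.unique(earlier_labels[region])
        matches = matches[matches > 0]
        if matches.size == 0:
            keep |= region             # no earlier counterpart: not a static region
            continue
        area_now = int(region.sum())
        area_then = int(np.isin(earlier_labels, matches).sum())
        if abs(area_now - area_then) / area_then >= min_change:
            keep |= region
    return keep
```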

Finally, we exploited the fact that facial anatomy is static (i.e., unlikely to change over time). Based on human face anthropometry, the mouth is located in the lower half of the menton-sellion length [31, 32]. When we partitioned the facial ROI along its major axis into four strips, we observed that the mouth was usually located in the second strip from the bottom. With this anthropometric filter, we dismissed candidate ROIs outside this second quarter from the bottom of the face. Figures 2(c)–(g) demonstrate the action of the different processing modules.
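The anthropometric filter can be sketched as follows. The face mask's major axis is taken from the eigenvectors of its pixel-coordinate covariance, the candidate's centroid is projected onto that axis, and the candidate is kept only if it falls in the second quarter of the face length measured from the bottom. The function name is hypothetical, and the sketch assumes a roughly upright head, so that the end of the major axis with the larger row index is the chin end.

```python
import numpy as np

def in_second_quarter_from_bottom(face_mask, point):
    """True if `point` (row, col) lies in the second facial quarter from the bottom."""
    rows, cols = np.nonzero(face_mask)
    coords = np.column_stack([rows, cols]).astype(float)
    centre = coords.mean(axis=0)
    # Major axis = eigenvector of the coordinate covariance with largest eigenvalue.
    evals, evecs = np.linalg.eigh(np.cov(coords.T, bias=True))
    axis = evecs[:, np.argmax(evals)]
    # Orient the axis towards larger row indices (downwards in the image),
    # assuming an upright head so that this end is the chin.
    if axis[0] < 0:
        axis = -axis
    proj = (coords - centre) @ axis
    p = (np.asarray(point, dtype=float) - centre) @ axis
    lo, hi = proj.min(), proj.max()
    # Position along the face, as a fraction measured upwards from the chin end.
    from_bottom = (hi - p) / (hi - lo)
    return bool(0.25 <= from_bottom < 0.5)
```

Working along the major axis rather than the image rows keeps the test meaningful when the head is tilted within the frame, as long as the chin end is still the lower one.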

Algorithm evaluation

To facilitate algorithm evaluation, a truth set was prepared manually for each recorded thermal video. The truth set contained the frame numbers corresponding to the beginning and end of each mouth opening, the endpoints of the line maximally spanning the width of the mouth at the onset of opening, and the endpoints of the line maximally spanning the height of the mouth when fully ajar. This truth set served as the gold standard for automatic algorithm evaluation. A true positive (TP) was defined as the detection of an ROI that fell temporally within the range of frames of a gold-standard mouth opening and spatially within the bounding box defined by the endpoints described above. All other detected objects were considered false positives (FP). A mouth opening missed by the algorithm was counted as a false negative (FN). A true negative (TN) occurred when there was no mouth opening and the algorithm concluded the same. Sensitivity, TP/(TP + FN), and specificity, TN/(TN + FP), were then estimated.
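The true-positive test and the two summary statistics can be written as a small Python sketch. The `GoldOpening` record layout and the use of a detection's centroid as its location are assumptions. The text does not specify how true negatives were counted, so the counts are simply taken as inputs.

```python
from dataclasses import dataclass

@dataclass
class GoldOpening:
    """One manually annotated mouth opening (hypothetical record layout)."""
    first_frame: int
    last_frame: int
    width_line: tuple   # ((r1, c1), (r2, c2)) at the onset of opening
    height_line: tuple  # ((r1, c1), (r2, c2)) when the mouth is fully ajar

    def box(self):
        """Bounding box (row_min, row_max, col_min, col_max) of the four endpoints."""
        pts = list(self.width_line) + list(self.height_line)
        rows = [p[0] for p in pts]
        cols = [p[1] for p in pts]
        return min(rows), max(rows), min(cols), max(cols)

def is_true_positive(frame, row, col, truth):
    """A detection is a TP if it lies in some opening's frame range and bounding box."""
    for g in truth:
        r0, r1, c0, c1 = g.box()
        if g.first_frame <= frame <= g.last_frame and r0 <= row <= r1 and c0 <= col <= c1:
            return True
    return False

def sensitivity(tp, fn):
    return tp / (tp + fn)

def specificity(tn, fp):
    return tn / (tn + fp)
```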

Results and discussion

The performance of the proposed algorithm on the thermal videos of the ten participants is summarized in Table 1. Mouth opening is generally detected with very high sensitivity and specificity. The exception is the poorer result for participant 10, which is mainly due to the participant's posture, frequent involuntary head rotation away from the camera, and suboptimal camera placement. This participant sat in an awkward position in his wheelchair (see Figure 3(b)), which forced us to position the thermal camera at an angle and distance from the participant that were not consistent with those used for the other participants. Several improvements could enhance the results in situations like this: (1) the algorithm can be updated to track and focus on the region of interest (the participant's face) more accurately; (2) multiple cameras can be used to capture the participant's facial region from different angles, mitigating the problem of the mouth leaving the camera's field of view; and (3) the user can be trained. The results reported in the present paper come from a single test session, and training is expected to improve user performance.

Specificity is generally higher than sensitivity as the algorithm was tuned to minimize false positives, again keeping in mind the alternative access application where inadvertent switch activations are arguably more costly than missed activations. Most of the false positives were repeated detections of the same non-mouth object in multiple frames. The chin was the source of the majority of the false positives, which tended to occur during actual mouth openings. This is perhaps not surprising given that the chin is proximal to the mouth and moves as the jaw descends to open the mouth. Further, the chin is reportedly the warmest facial area after the forehead [33] when measured by thermography.

The proposed algorithm is robust against participant motion and changes to the background scene. Figure 3(a) demonstrates an example of one of the participants moving his arm towards his face. Although the arm is both warm and moving, and even touches the participant's face in some frames, it was correctly disregarded by the algorithm. Figure 3(b) depicts an example of a person entering and leaving the background scene. The algorithm successfully rejected the background activity and did not generate any false positives.

The proposed combination of filters is invariant to head location and body position: regardless of where the user moves his or her head within the camera's field of view, and regardless of whether the user is sitting or semi-supine, the mouth opening could generally be located relative to the segmented face region.

Table 1 Performance of the proposed mouth opening detection algorithm
Figure 3

Robustness of the proposed algorithm to motion artefacts and changes in the background. (a) Robustness to motion artefacts. Top row from left to right shows input thermal video of an able-bodied participant moving his arm to his head (frames 63, 66, 70, and 74). Bottom row depicts face segmentation in the corresponding frames. (b) Robustness to changes in the background. Top row from left to right is an input thermal video of a participant with disability while a passerby traverses the scene in the background (frames 1759, 1765, 1779, 1790). The corresponding face segmentation results are presented in the bottom row.

If a user can voluntarily control mouth opening and closing, sip-and-puff technology, EMG-based switches and computer vision-based switches can also be used. The advantage of the proposed thermography-based access pathway over sip-and-puff and EMG-based switches is that it is non-invasive and non-contact, i.e., it does not require attaching any sensor or external object to the user. Hence it is more hygienic and safer, since it also eliminates the risk of choking. Its advantage over visible-light computer vision-based access pathways is that it is independent of lighting and skin colour and can thus be used both night and day, indoors and outdoors.

Despite these encouraging findings, thermal imaging does have its limitations. Infrared thermal cameras are more expensive than conventional (visible light) cameras. However, recent innovations in affordable, pocket-sized, portable thermal cameras [34] may eventually eliminate the cost issue. Thermal image quality is susceptible to fluctuations in ambient temperature, humidity and regional air circulation [9]. A robust thermographic access pathway may need to dynamically compensate for changes in these contextual factors. A final limitation of thermal imaging is the relatively low resolution of infrared cameras and the inherent difficulty in discriminating between fine facial features. These issues may be mitigated by fusing thermal videos with simultaneously recorded visible spectrum imagery [35].

Conclusion

We have demonstrated that infrared thermography can be used as a non-contact and non-invasive access pathway for individuals who retain voluntary mouth opening and closing. Our analyses suggest that the thermographic access pathway may be robust to various lighting levels, different body postures, extraneous user movements, and background variations.