1 Introduction

Facial expression analysis is a substantial research area, as it plays an essential role in psychology and human interaction, including modern-day affective computing systems and processes designed to be more emotionally aware [18, 23]. Facial expressions are the facial changes driven by a person’s internal emotions and intentions [49]. The most commonly used descriptor of human facial expressions is the Facial Action Coding System (FACS), which has been used intensively in various aspects of facial expression analysis over the past 30 years [40].

Essentially, facial expressions comprise macro- and micro-expressions. Macro-expressions, commonly known as normal expressions, are voluntary facial expressions that last between 0.5s and 4s [17]. In contrast, micro-expressions are spontaneous, involuntary expressions that last less than 0.5s [56]. Facial expressions are not only the basis of facial expression recognition but can also be exploited in fields such as smart cities, human-computer and human-robot interaction, and medicine [6, 7, 55]. Studying facial expressions is therefore important for realizing the applications in these domains.

Over the years, micro-expression recognition has received increasing attention because micro-expressions carry vital cues for applications such as lie detection, clinical diagnosis and social interaction. This is primarily because micro-expressions are part of human reflexive behaviour [23] and therefore inadvertently reveal one’s true emotions [58]. Several works, such as [21, 43, 53], focus on using the full face for micro-expression recognition, although [13, 22] advocate looking at the regions most characteristic of each emotion to achieve higher recognition precision. In spite of this, region-based micro-expression recognition remains less common than techniques that use the entire face.

The action units (AUs) defined in the FACS describe the muscle-based facial actions triggered by facial expressions. These AUs are often provided in emotion datasets to support the detection and understanding of emotions. Therefore, the reliability of the AUs in FACS-coded datasets can significantly impact emotion analysis. Although AUs are commonly used for recognizing normal macro-expressions, they have not been fully analyzed for micro-expressions [23]. The main factor limiting the applicability of AUs to micro-expressions is the small number of video frames in micro-expression clips [23]. To label the AUs, the face video should be divided into onset (increasing AU intensities), apex (maximum intensities), offset (decreasing intensities) and neutral (minimum intensities) phases. Moreover, the AUs exhibited in micro-expressions have much lower intensity. Given the brevity and low intensity of micro-expressions, it is challenging to identify their AUs precisely. In spite of these challenges, the AUs provided by the FACS-coded datasets are used directly in micro-expression-related problems, such as micro-expression recognition and spotting. Consequently, recognition and spotting performance are affected by inaccurate AUs encoded in the micro-expression datasets.

Furthermore, all of the FACS-coded micro-expression datasets only report the average reliability across all AUs [47]. Without considering the reliability of individual AUs, this practice may mask the low reliability of certain AUs in these datasets [17] and correspondingly distort the actual recognition outcome.

Moreover, studies have shown inconsistencies in how humans are trained and validated to become certified FACS coders [8]. The reliability of certified FACS coders is also arguable, because even certified coders require additional training to code micro-expressions reliably. Research teams may also be regarded as having passed the FACS Final Test when only one or more of their coders has demonstrated validated reliability [8]. Besides, it is commonly accepted that certain emotions are reliably revealed when specific combinations of AUs suggested by the FACS occur [4]. However, this may not always hold, as the way people express themselves varies substantially across cultures and situations [4]. This is all the more reason to validate the effectiveness of the FACS-based AUs reported in each dataset’s ground truth before using them for micro-expression-related problems.

Therefore, in this work, we focus on analyzing the occurrence and impact of AUs in micro-expressions regardless of their intensities, as micro-expressions are known to have low AU intensities [1]. In more detail, we focus our discussion on the 24 main AUs and a few relevant gross behaviours that occur in the recent mainstream micro-expression datasets, namely the CASME II, SAMM and CAS(ME)\(^2\) datasets. These main AUs describe the muscle-based facial actions induced by facial expressions. This paper provides the following contributions in the realm of micro-expression recognition:

1. We first assign specific facial landmarks to each AU based on the FACS action descriptors. These facial landmarks, which represent the AUs, then serve as the central points of AU-based regions of interest (RoIs) used to perform the independent AU analysis.

2. The independent AU analysis then yields our proposed sets of AUs, which are shown to be more relevant to each dataset considered in this paper, i.e., CASME II, SAMM and CAS(ME)\(^2\), for micro-expression recognition.

3. We then revisit the existing AUs, including the ground truth AUs human-coded by the datasets’ designers. We assess the effectiveness of the existing AUs encoded in the widely used micro-expression datasets, i.e., CASME II, SAMM and CAS(ME)\(^2\), by comparing the micro-expression recognition performance in terms of accuracy and F1-score.

4. In addition, we suggest universal AUs applicable to specific emotions, based on our proposed emotion-specific AUs obtained from the evaluated datasets.

The findings from our analysis show that the proposed AUs better describe the micro-expressions in CASME II, SAMM and CAS(ME)\(^2\), resulting in higher recognition accuracy and F1-scores. To elaborate, the proposed RoIs achieve F1-scores of 0.6083, 0.4476 and 0.5037 for CASME II, SAMM and CAS(ME)\(^2\), respectively, when implemented with the state-of-the-art AU-based technique. Note that the analysis in this paper aims to recommend the most effective AU-based RoIs for each micro-expression dataset, which will be helpful in AU-based studies, particularly AU-based micro-expression recognition.

The rest of the paper is organized as follows. In Section 2, we study the importance of emotion recognition and the works related to FACS. The use of AUs in micro-expression-related problems is also reviewed. We also discuss the reliability of the AUs encoded by the dataset designers. In Section 3, we present our analysis approaches employed in this work. In Section 4, we present the experimental results obtained from independent AU analysis and then formulate the proposed emotion-specific AUs. The performance of the proposed AUs is evaluated in the standard multi-class micro-expression recognition. The robustness of the proposed AUs in the state-of-the-art methods is also discussed in this section. Concluding remarks are given in the last section.

2 Literature survey

2.1 Emotion recognition

Emotion recognition has been of broad interest due to its usefulness in the medical, security and automotive fields. Studies such as [3, 12, 29] have been conducted to analyze and better understand emotional states. Barra et al. [3] design an algorithm that recognizes emotions by analyzing facial landmark points through a virtual spider web on the face. To improve emotion recognition systems, Khattak et al. [29] propose an optimized convolutional neural network (CNN)-based model after experimenting with different machine learning and deep learning models.

Facial expressions can be used to express and detect emotions; therefore, facial expression recognition is often related to emotion recognition. For instance, the emotion recognition system suggested in [12] is evaluated on facial expression datasets. Yan et al. [55] demonstrate a hybrid neural-network-based facial expression recognition system for smart cities. Introducing facial expression recognition enables equipment in the smart city to capture a user’s instant facial changes, allowing the equipment to accommodate the user’s needs accordingly. Chen et al. [7] suggest a multi-modal emotion recognition algorithm involving human facial expressions and speech to aid human-robot interaction. Besides that, emotion recognition is also helpful in the healthcare industry: Bisogni et al. [6] propose a CNN-based facial expression recognition system to identify patients’ emotions in a real-time healthcare framework. Hence, the study of emotion and expression recognition systems is crucial to realizing the benefits in the fields mentioned above.

2.2 Facial action units

The FACS, a comprehensive and anatomically based measurement system, was introduced in 1978 [14, 15] and later updated in 2002 [16] by psychology researchers trained in understanding human emotions. Using the FACS, facial activity can be decomposed into facial action units (AUs), including 24 main AUs that must be considered when scoring. Besides the main AUs, the FACS also provides miscellaneous and optional AUs. The miscellaneous AUs describe movements of the lower face, but the FACS does not define specific behaviours for them. The optional AUs, which correspond to eye-blinking and winking movements, are usually excluded unless they reach a certain level of intensity. According to the FACS, each action unit has a numeric code, an action description, and the involved muscles.

The FACS could be used for emotion measurement, since a facial expression is caused by a single AU or a combination of multiple AUs [17]. A combination of multiple AUs may be additive or non-additive. Additive AU combinations maintain the movement of all involved AUs, whereas non-additive AU combinations modify each other’s appearance [9]. For instance, the combination of AU1 (i.e., inner brow raiser) and AU2 (i.e., outer brow raiser) is often shown in surprise. The inner and outer brow-raising movements of AU1 and AU2 remain the same regardless of whether the AUs appear separately or together; hence, the combination of AU1 and AU2 is additive. On the other hand, the combination of AU1 (i.e., inner brow raiser) and AU4 (i.e., brow lowerer) is non-additive, because when both AUs occur together, the upward movement of AU1 changes the downward action of AU4. Although the FACS has effectively described macro-expressions with AUs, the study of micro-expressions is not as developed, due to the diversity of micro-expression categories in different datasets [5].

Since manual FACS coding is performed by trained human raters, the ratings are subjective and prone to bias. Research has therefore been conducted to automate the FACS rating process [8, 24, 40]. However, this field is still underdeveloped, as many problems remain open [40]. One problem with automating FACS is the quality of the face video: since AUs only cause local appearance changes, slight occlusion of the face can lead to inaccurate results. It is also difficult to treat each AU combination as a single class, as there are more than 7000 possible combinations [40]. Because fully automatic, real-time FACS coding has yet to be adopted, the recent mainstream micro-expression datasets, notably CASME II, CAS(ME)\(^2\) and SAMM, base their ground truth AU information on manual FACS coding, which therefore differs across datasets.

2.3 AUs for micro-expressions

As AUs can be used for emotion measurement, several studies have incorporated them into micro-expression recognition systems [33, 42, 54, 60, 63]. The AUs serve as guidance on which areas to focus on, since micro-expressions consist of local facial movements [26]. The FACS proposes the AUs commonly triggered when certain expressions occur [40]. Meanwhile, Davison et al. [11] suggested a set of dataset-dependent AUs (the objective classes) that apply to the micro-expressions in both CASME II and SAMM, to eliminate the bias of human reporting in each micro-expression sample. Table 1 compares the AUs suggested by the FACS and by the objective classes for specific micro-expressions. However, these suggested AUs are generic, as the same AUs are applied across different datasets. Hence, more relevant AUs assigned specifically to each dataset are needed for better precision. Recently, the work in [62] extracted optical flow from specific AU regions for micro-expression recognition; the regions focus on the brows and mouth, as most micro-expressions occur there.

Table 1 Emotion-specific AUs in FACS [40] and Objective Classes [11]

Researchers have also designed different Regions of Interest (RoIs) to study the impact of specific regions on micro-expressions, and [37, 42, 50, 51, 52, 65] show that AU-based RoIs give more discriminative features than those from fixed facial blocks. Therefore, the RoIs need to be determined so that they correspond to the AUs involved. In more detail, Merghani and Yap [42] conduct region-based micro-expression recognition by extracting features from 14 AU-based regions. The associated AUs for these regions are selected by observing the most frequently occurring AUs in the two standard micro-expression datasets, CASME II and SAMM. Including the regions covering the most frequently occurring AUs ensures that only the features of the relevant movements are considered in the micro-expression classification. To the best of our knowledge, the work in [42] is the state of the art for region-based micro-expression recognition with hand-crafted techniques, and it forms one of our baselines in this paper.

Meanwhile, Zong et al. [65] designed a hierarchical division scheme that divides the face image into blocks covering all the critical AU regions associated with micro-expressions. The micro-expression video clips are first divided into non-overlapping, equal-sized blocks based on different grid sizes. The grid size then increases iteratively, allowing the number of blocks to grow and forming the AU-based blocks for micro-expression recognition.

Fig. 1 (a) The template face and the 16 RoIs defined in [50]. (b) The 36 RoIs defined using 66 facial landmark points [37]

Furthermore, Liu et al. [37] propose 36 AU-based RoIs for region-based micro-expression recognition. The 68 facial landmarks are detected using the DRMF method [2] on the first neutral frame of each micro-expression video clip, and only 66 of the 68 DRMF landmark points are used. The RoIs are determined according to two guidelines: (1) the RoI partitioning should be refined, avoiding many AUs located at the same or overlapping portion of the face; and (2) the partitioning should be sparse, with each RoI containing at least one AU. Figure 1(b) illustrates the 36 RoIs covering the facial AUs. These AU-based RoIs also serve as a baseline in this paper.

Wang et al. [50] performed region-based micro-expression recognition using textures extracted from 16 AU-based RoIs. A frontal neutral facial image is used when deciding the RoIs, which are designed with minimal overlap by grouping nearby AUs into the same RoI. The 16 independent RoIs are shown in Fig. 1(a). The RoIs of [50] are commonly used AU-based RoIs, adopted also in [51, 52]. Hence, these AU-based RoIs serve as one of the baselines in this paper.

Although the existing AU-based RoIs have achieved promising results in classifying micro-expressions, there is room for improvement. In particular, the impact of each single AU has not been explored in this context, even though different RoIs have been designed to facilitate AU-based micro-expression recognition. We address this gap by performing a single-AU analysis in our work. Fan et al. [20] associate facial landmarks with the AUs that occur in the BP4D [59] and DISFA [41] datasets to compute the correspondence between the AUs; this inspired us to assign specific facial landmarks to the AUs for our single-AU analysis.

AUs are commonly used as indicators of the facial regions to focus on for facial expression recognition. Recently, deep learning approaches such as graph convolutional networks (GCNs) have incorporated the AUs into graph structures for micro-expression recognition tasks [33, 34, 38, 54, 63]. Hence, it is crucial to inspect the effectiveness of the AUs assigned to the emotion classes, as incorrectly encoded AUs can significantly impact the recognition performance.

2.4 Reliability of AUs

As AU coding depends on the FACS manual, the reliability of the AUs in FACS-coded datasets determines their efficacy in AU-based micro-expression recognition. AU reliability is determined from inter-observer agreement: the reliability of AU occurrence and intensity is decided based on the mutual agreement between two or more human observers [9]. Thus, the reliability of the AUs may be subjective and differ across datasets. A review by Clark et al. [8] shows that FACS coding can be performed inconsistently across different studies. Since FACS ratings are prone to bias and the AUs depend heavily on the FACS, it is crucial to review the ground truth AUs labelled in each dataset before using them for micro-expression recognition.

In addition, reliable coding of spontaneous facial expressions is harder to achieve due to the low intensity of the AUs [9]. To the best of our knowledge, it cannot be explicitly identified which AUs have been accurately coded, as existing studies only report the average reliability over all AUs [9]. Therefore, it is of particular concern to analyze the effect of human-coded AUs in spontaneous micro-expression recognition research. In the following section, we present a comprehensive analysis of the effects of the human-coded AUs in each standard micro-expression dataset, namely CASME II, CAS(ME)\(^2\) and SAMM.

3 Proposed architecture

This section outlines the AU analysis conducted in this work. We first perform an independent AU analysis for each emotion in each dataset (i.e., CASME II, SAMM and CAS(ME)\(^2\)). The analysis yields an ordering of the AUs that give the best results for each emotion class in each dataset. We subsequently propose new sets of AUs that are more relevant to the emotion classes in each micro-expression dataset. The RoIs are formed by combining the k best-performing AUs of each emotion for each dataset; the selection of the AUs is further discussed in Section 4.2. The effectiveness of our proposed AU-based RoIs is then compared with existing AU-based RoIs in the literature. Figure 2 illustrates the block diagram of the proposed architecture.

Fig. 2 (a) Overview of the proposed architecture. (b) The Analysis Module that performs emotion-specific classification with the region formed by one FACS AU at a time

3.1 Temporal interpolation model (TIM)

The short and varied frame lengths in the micro-expression datasets make micro-expression recognition highly challenging and obstruct the discovery of AUs in them [23, 35]. The short frame length of micro-expression samples also limits the application of the spatiotemporal descriptor (i.e., Local Binary Pattern from Three Orthogonal Planes (LBP-TOP)) in this work, as the radius, r, can only be 1 if a sample has fewer than 6 frames. To overcome this, the temporal interpolation model (TIM) [64] is applied to the datasets.

The TIM learns the patterns of the frames through a sequence-specific mapping and generates a continuous and deterministic curve function of a variable t in the range [0, 1]. Since the curve function describes the temporal relations between the frames, it can predict the characteristics of unseen frames. Therefore, the desired number of frames can be generated through TIM by controlling the variable t of the curve function. In this work, the micro-expression samples are interpolated to 10 frames based on the finding in [35] that a frame length of 10 gives the most stable performance across different datasets.
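For illustration, the sketch below resamples a micro-expression clip to a fixed number of frames by simple linear interpolation of every pixel trajectory over t in [0, 1]. It is only a stand-in for the graph-embedding TIM of [64], not its implementation; the array shapes and function name are our own assumptions.

```python
# Minimal sketch, NOT the TIM of [64]: resample a clip to a fixed frame count
# by linear interpolation over the normalized time variable t in [0, 1].
import numpy as np
from scipy.interpolate import interp1d

def resample_clip(clip: np.ndarray, target_frames: int = 10) -> np.ndarray:
    """clip: array of shape (n_frames, H, W); returns (target_frames, H, W)."""
    t_src = np.linspace(0.0, 1.0, clip.shape[0])   # original frame positions
    t_dst = np.linspace(0.0, 1.0, target_frames)   # desired frame positions
    # Interpolate every pixel trajectory along the time axis.
    return interp1d(t_src, clip.astype(np.float64), axis=0)(t_dst)
```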

3.2 Emotion-specific AU-based RoIs

Inspired by [20], we assign facial landmark(s) to each AU to identify the locations of the AUs. This is crucial in analyzing the significance of each AU in certain micro-expressions. The facial landmark points of each AU are located using a pre-trained facial landmark detector [28, 30]. Apart from the facial landmarks adopted from [20] for some AUs, the facial landmarks for the remaining targeted AUs are derived by mapping the 68 facial landmark points to the action descriptor of FACS.
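As a concrete illustration of this mapping, the snippet below shows a hypothetical dictionary from a few AUs to indices of the 68-point landmark scheme, derived by reading each AU's action descriptor. The exact assignments used in this work are those in Table 3, so the indices here are assumptions for illustration only.

```python
# Hypothetical AU -> 68-point landmark indices, for illustration only;
# the assignments actually used in this work are listed in Table 3.
AU_TO_LANDMARKS = {
    "AU1":  [21, 22],   # inner brow raiser -> inner eyebrow points
    "AU2":  [17, 26],   # outer brow raiser -> outer eyebrow points
    "AU9":  [31, 35],   # nose wrinkler     -> outer nostril points
    "AU12": [48, 54],   # lip corner puller -> mouth corners
}
# Each listed landmark later serves as the centre of one AU-based RoI.
```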

It can be observed from Table 3 that the miscellaneous (i.e., AU8, AU38 and AU39) and optional AUs (AU43 and AU45) are also considered in the AU analysis for the respective datasets. Although they are not main AUs, they are included in this work because they are annotated in the ground truth of CASME II, SAMM and CAS(ME)\(^2\). This helps to study the AU co-occurrence in the targeted FACS-coded datasets. Table 3 also tabulates the AUs annotated in the ground truth of each dataset.

In addition, the numeric codes of the AUs considered in this work are provided in Table 3 along with their descriptions and the muscles involved. Note that all of the listed main AUs are considered in the AU analysis for CASME II, SAMM and CAS(ME)\(^2\), whereas the miscellaneous and optional AUs are only involved in the analysis for particular datasets. Since the emotion-specific AU-based RoIs are formed by assigning the facial landmarks to the AUs based on each AU’s description, the same RoIs can be formed using different facial landmark detectors. MediaPipe [39] is a comprehensive framework that provides up to 468 facial landmark points; the MediaPipe facial landmarks for each AU are provided in Appendix Table 19 to offer more flexibility in the choice of face detector.

To observe the texture changes caused by each AU, we determine an AU-based RoI with each predefined AU’s facial landmark acting as the central point. This is crucial, as the regions should be large enough to cover the facial texture changes caused by the AU movement but not so large that they cover facial features irrelevant to the emotions. Given a \(256 \times 256\) image, I, we experimented with RoIs of different sizes (\(8 \times 8\), \(16 \times 16\), \(24 \times 24\), \(32 \times 32\) and \(48 \times 48\)) centred on the facial landmarks listed in Table 3. The experiment stops at the region size of \(48 \times 48\), as further enlarging the AU-based patch would cover facial parts whose movements are governed by different AUs (i.e., eyes and eyebrows). Table 2 tabulates the average multi-emotion results of CASME II, SAMM and CAS(ME)\(^2\) when the features are extracted from AU-based regions of different sizes, in order to determine the optimum region size. Based on Table 2, the size of \(48 \times 48\) gives consistently the best performance across the datasets.

Table 2 Performance achieved with different RoI sizes

Given a \(256 \times 256\) image, I, the face image can be divided into \(16 \times 16\) blocks to obtain the \(48 \times 48\) RoI shown in Fig. 3. To ensure the same RoI ratio is maintained for different input sizes, the region size can be computed as follows:

$$\begin{aligned} Region_{h,w} = I_{M,N} \times \frac{48}{256} \end{aligned}$$
(1)

where h and w denote the height and width of the region; M and N denote the height and width of the face image.
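A minimal sketch of this scaling and cropping step is given below; the function name and the assumption that landmarks are given as (x, y) pixel coordinates are ours.

```python
# Sketch of Eq. (1): crop a square AU-based RoI centred on a landmark,
# keeping the 48/256 ratio for an arbitrary face image size.
import numpy as np

def au_roi(face: np.ndarray, center_xy, base_size: int = 48, base_dim: int = 256):
    M, N = face.shape[:2]                        # image height (M) and width (N)
    h = int(round(M * base_size / base_dim))     # Region_h = M * 48 / 256
    w = int(round(N * base_size / base_dim))     # Region_w = N * 48 / 256
    cx, cy = center_xy                           # landmark acting as RoI centre
    x0 = int(np.clip(cx - w // 2, 0, N - w))     # keep the patch inside the image
    y0 = int(np.clip(cy - h // 2, 0, M - h))
    return face[y0:y0 + h, x0:x0 + w]
```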

Fig. 3 Each cross represents an AU and the highlighted region is a \(48 \times 48\) AU-based RoI with an AU at the centre

Table 3 FACS and Ground Truth AUs of Each Dataset Together with Their Corresponding Facial Landmarks

In reality, some facial muscles exist symmetrically on both sides of the face [19]. Based on the facial landmark points assigned in [20], some AUs could have more than one side of the facial muscle activated; thus, the AU could be observed on either one or both sides. In this work, we use the terms defined in the FACS, i.e., unilateral and symmetric AUs, to refer to cases where AUs are observed on either or both sides. In more detail, a unilateral AU only has one side of the facial muscle activated, whereas a symmetric AU has both sides of the muscles activated. For instance, AU6 (cheek raiser) involves the muscles on both sides of the cheek. Thus, this AU6 comprises the unilateral AU\(6_1\) (left side) and unilateral AU\(6_2\) (right side); or is collectively known as the symmetric AU6, where both AU\(6_1\) and AU\(6_2\) are considered.

As AUs are formed from facial muscle movements, some degree of overlap occurs between certain AUs. A few AUs listed in Table 3 share the same facial landmark points and thus form the same AU-based RoI. The k AUs with the best F1-scores in our analysis are chosen as the proposed AUs; the determination of the value of k is further discussed in Section 4. AUs with common facial landmarks are activated if either of them is among the top-k AUs in our analysis.

Fig. 4 (a) Proposed Unilateral AUs, P\(_1\) (b) Proposed AUs, including their symmetric sides, P\(_2\)

3.3 Analysis module

The Analysis Module performs the independent AU analysis by considering each AU-based RoI in turn. Note that these RoIs are formed by using the facial landmarks explicitly assigned to each FACS AU listed in Table 3 as the central points. This module aims to obtain our proposed effective emotion-specific RoIs for each class of the CASME II, SAMM and CAS(ME)\(^2\) datasets.

As shown in Fig. 2(b), each AU-based RoI overlays the input images, forming independent emotion-specific RoI-activated images. The features of the activated images are extracted using LBP-TOP, for consistency with the benchmarking of the micro-expression dataset designers; the parameters of LBP-TOP are discussed in Section 3.4. The LBP-TOP features are used for emotion-specific classification. Subsequently, the relevance order of the AUs for each micro-expression is obtained from the analysis. The k AUs with the best F1-scores in each micro-expression class form the proposed emotion-specific AU-based RoIs, where k is determined from the results of the top 10%, 20% and 30% of the best AUs. The top-k AUs are also used to form the proposed unilateral AUs, \(P_1\), and the proposed symmetric AUs, \(P_2\), for each micro-expression class. Figure 4 shows examples of \(P_1\) and \(P_2\) for the class “disgust” in CASME II. The unilateral AUs only involve those listed in the top-k, whereas the symmetric AUs consider the top-k AUs and their corresponding symmetric sides. The emotion-specific RoIs formed from the \(P_1\) and \(P_2\) AUs are then evaluated in a micro-expression recognition system as shown in Fig. 2(a).
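The sketch below outlines this analysis loop under our own assumptions: per-AU feature matrices have already been extracted from the RoI-activated clips, the labels are binary (target emotion vs rest), subject identities are available for the leave-one-subject-out splits, and the C value is fixed for brevity. It is a minimal illustration, not the exact pipeline.

```python
# Sketch of the independent AU analysis: one-vs-rest classification per AU-based
# RoI under the LOSO protocol, then ranking the AUs by per-emotion F1-score.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.metrics import f1_score

def rank_aus(features, y, groups, top_percent=0.3, C=1.0):
    """features: dict AU -> (n_samples, d) LBP-TOP matrix; y: 1 = target emotion,
    0 = rest; groups: subject id per sample (for leave-one-subject-out)."""
    scores = {}
    for au, X in features.items():
        preds = np.empty_like(y)
        for tr, te in LeaveOneGroupOut().split(X, y, groups):   # LOSO splits
            preds[te] = LinearSVC(C=C).fit(X[tr], y[tr]).predict(X[te])
        scores[au] = f1_score(y, preds)                         # Eq. (6) per AU
    ranked = sorted(scores, key=scores.get, reverse=True)
    k = max(1, round(top_percent * len(ranked)))
    return ranked[:k], scores          # top-k AUs for this emotion, plus all F1s
```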

3.4 Feature extraction

In this work, we focus on the Local Binary Pattern from Three Orthogonal Planes (LBP-TOP) [61] as it is the benchmarked feature extraction technique used by the designers of the datasets (i.e., CASME II, SAMM and CAS(ME)\(^2\)). Furthermore, LBP-TOP is also used in the baselines reported in this analysis [42, 50]. LBP-TOP is an extension of the original LBP [44] that describes the dynamic features in the spatial-temporal domain. The LBP-TOP views the video sequence from three aspects: (a) a stack of XY planes along the time (T) dimension; (b) a stack of XT planes along the Y dimension; (c) a stack of YT planes along the X dimension. The temporal planes (i.e., XT and YT planes) contain information about the space-time transition, whereas the XY plane contains the appearance information of the image frame. The LBP code of every pixel from the XY, XT and YT planes is computed as follows:

$$\begin{aligned} f_{j,P,R}(x,y,t) = \sum _{p=0}^{P-1}s(g_p-g_c)2^p \end{aligned}$$
(2)

where \(f_{j,P,R}(x,y,t)\) represents the LBP code of the centre pixel (x, y, t) in the \(j^{th}\) plane (XY plane when j = 0; XT plane when j = 1; YT plane when j = 2); P and R represent the number of neighbouring pixels and the radius, respectively; \(g_c\) represents the intensity of the centre pixel; \(g_p\) (\(p=0,\dots , P-1\)) represents the intensity of the \(p^{th}\) neighbouring pixel on the radius R; and \(2^{p}\) represents the weight corresponding to the neighbouring pixel location. The function s(x) is a piecewise function defined as follows:

$$\begin{aligned} s(x) = {\left\{ \begin{array}{ll} 1 &{} x\ge 0 \\ 0 &{} x< 0 \end{array}\right. } \end{aligned}$$
(3)

The histogram recording the LBP codes for each plane in LBP-TOP can be calculated as:

$$\begin{aligned} H_{i,j} = \sum _{x,y,t} I\{f_j(x,y,t)=i\}, i=0,...,n_j-1; j=0,1,2. \end{aligned}$$
(4)

where \(n_j\) represents the number of different labels generated by the LBP operator in the \(j^{th}\) plane (XY plane when j = 0; XT plane when j = 1; YT plane when j = 2). The function \(f_j(x,y,t)\) gives the LBP code of the central pixel (x, y, t) of each plane. The function \(I\{A\}\) is defined as:

$$\begin{aligned} I\{A\} = {\left\{ \begin{array}{ll} 1 &{} \text {if } A \text { is true,} \\ 0 &{} \text {if } A \text { is false.} \end{array}\right. } \end{aligned}$$
(5)

Figure 5 shows the LBP-TOP feature extraction process, where the histograms from each plane are concatenated and expressed as an LBP-TOP feature vector. In this work, the radii along x and y vary from 1 to 4, whereas the radius along t varies from 2 to 4, as these are the common ranges used for micro-expression recognition [37, 46, 57]. The block size of LBP-TOP is set to 5, and an 8-point neighbourhood is used for the comparison with neighbouring points. The best performance of each experiment over these parameter settings is reported in this work.
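As a rough illustration of the three-plane coding in Eqs. (2)-(4), the sketch below computes standard 2D LBP codes slice by slice on the XY, XT and YT stacks and concatenates the per-plane histograms. It ignores the block subdivision and the separate spatial and temporal radii used in the actual experiments, so it is an approximation rather than the implementation used here.

```python
# Simplified single-block LBP-TOP sketch: 2D LBP per slice of each orthogonal
# plane stack, one normalised histogram per plane, concatenated at the end.
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_top(clip: np.ndarray, P: int = 8, R: int = 1) -> np.ndarray:
    """clip: 8-bit grayscale video of shape (T, H, W)."""
    plane_stacks = [
        [clip[t, :, :] for t in range(clip.shape[0])],   # XY slices along T
        [clip[:, y, :] for y in range(clip.shape[1])],   # XT slices along Y
        [clip[:, :, x] for x in range(clip.shape[2])],   # YT slices along X
    ]
    hists = []
    for slices in plane_stacks:
        codes = np.concatenate(
            [local_binary_pattern(s, P, R).ravel() for s in slices])
        h, _ = np.histogram(codes, bins=2 ** P, range=(0, 2 ** P))
        hists.append(h / h.sum())               # per-plane normalised histogram
    return np.concatenate(hists)                # final LBP-TOP feature vector
```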

Fig. 5 Three histograms are produced from the XY, XT and YT planes in LBP-TOP. The histograms are then concatenated into a single histogram as the final LBP-TOP feature vector

3.5 Classification

The linear Support Vector Machine (LSVM) is used as the classifier for micro-expression classification, as it has been demonstrated in several studies [25, 27, 32, 35, 37, 46] to be effective for this problem. For the C parameter of each experiment, the range \([10^{-1}, 1, 2, 10, \dots , 10^4]\) is used consistently across the datasets considered in this paper, and the value with the best performance is chosen.

In the independent AU analysis, emotion-specific classification is performed to obtain the relevance order of the AUs. Each micro-expression is classified using the one-vs-rest (OvR) method [32]. The F1-score is computed for the binary classification to avoid biased results due to class imbalance in the datasets [48]. More precisely, the F1-score, \(F1_c\), of each class is computed as follows:

$$\begin{aligned} F1_c = \frac{2P_cR_c}{P_c + R_c} \end{aligned}$$
(6)
$$\begin{aligned} P_c = \frac{TP_c}{TP_c + FP_c}, \quad R_c = \frac{TP_c}{TP_c+FN_c} \end{aligned}$$
(7)

where \(P_c\) and \(R_c\) are the precision and recall of each class; TP represents the true positive samples; FP represents the false positive samples and FN represents the false negative samples.

The output of the AU analysis forms our proposed \(P_1\) and \(P_2\) AUs, which are subsequently used for micro-expression recognition to compare their efficacy against other AUs in the literature. In the multi-emotion classification problem, recognition is performed simultaneously among the multiple emotion classes considered by the dataset, for which the average accuracy and F1-score are computed. As the number of samples in each dataset varies across the emotion classes, the F1-score is a better metric than accuracy for measuring the classification performance in this work. The Leave-One-Subject-Out (LOSO) protocol is implemented to prevent subject identity from interfering with micro-expression recognition [32].
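A sketch of this evaluation protocol is given below, under the assumption that the LBP-TOP features of the RoI-activated samples are stacked into a single matrix and the emotion labels are integer-encoded; the grid written out for C is only an expansion of the range quoted above.

```python
# Sketch of the multi-class recognition protocol: LOSO cross-validation with a
# linear SVM, selecting C by the best macro F1-score over the quoted grid.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.metrics import accuracy_score, f1_score

def loso_recognition(X, y, groups, C_grid=(0.1, 1, 2, 10, 100, 1000, 10000)):
    best = None                                   # (C, accuracy, macro F1)
    for C in C_grid:
        preds = np.empty_like(y)
        for tr, te in LeaveOneGroupOut().split(X, y, groups):
            preds[te] = LinearSVC(C=C).fit(X[tr], y[tr]).predict(X[te])
        acc = accuracy_score(y, preds)
        f1 = f1_score(y, preds, average='macro')  # unweighted mean over classes
        if best is None or f1 > best[2]:
            best = (C, acc, f1)
    return best
```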

4 Experimental results and analysis

This section first introduces the micro-expression datasets used in our experiments and then discusses the experimental results obtained on these datasets by applying our emotion-specific AU-based RoI analysis technique. The datasets (i.e., CASME II, SAMM and CAS(ME)\(^2\)) are standard FACS-coded datasets covering subjects of diverse heritages and cultures and are therefore used for this analysis. The performance of our proposed emotion-specific RoIs is then compared with the existing AU-based RoIs.

4.1 Dataset profile

This section introduces the standard FACS-coded micro-expression datasets evaluated in this work. The proposed architecture is applied individually to the CASME II, SAMM and CAS(ME)\(^2\) datasets.

CASME II [57]. The dataset consists of 249 spontaneous micro-expression samples collected from 26 subjects at a sampling rate of 200 fps. CASME II has micro-expression samples from seven classes, namely disgust, happiness, others, repression, surprise, fear, and sadness. As demonstrated in the database’s baseline experiment [57], only the first five of these classes have sufficient samples and are considered in our experiments. Two trained coders labelled the AUs of the micro-expressions based on the FACS investigator’s guide [19] and produced an average reliability score of 0.846. The average AU reliability, R, of the database is computed as:

$$\begin{aligned} R = \frac{2 \times AU(C_1, C_2)}{\#All\_AU} \end{aligned}$$
(8)

where \(AU(C_1, C_2)\) represents the number of AUs agreed by both coders and \(\#All\_AU\) represents the total number of AUs labelled by both coders across the micro-expression samples. In this work, for the FACS and objective-class AUs, we only consider the micro-expression classes that are also labelled in the CASME II ground truth. Hence, only the disgust, happiness and surprise emotion classes are considered for both the FACS and objective-class AUs.

SAMM [10]. The dataset consists of 159 spontaneous micro-expression samples collected from 29 subjects at a sampling rate of 200 fps. The dataset categorizes the micro-expressions into eight classes, namely anger, contempt, disgust, fear, happiness, sadness, surprise and others. As demonstrated in [31], only five classes (i.e., anger, contempt, happiness, others and surprise) with sufficient micro-expression samples are considered. Therefore, a total of 136 micro-expression samples are used in our experiments. Unlike CASME II, the AUs of the SAMM dataset were coded by three certified coders and achieved an overall AU reliability, R, of 0.82. The reliability is computed as:

$$\begin{aligned} R = \frac{3 \times AU(C_1, C_2, C_3)}{\#All\_AU} \end{aligned}$$
(9)

where \(AU(C_1, C_2, C_3)\) is the number of AUs agreed by all three coders and \(\#All\_AU\) is the total number of AUs coded by all coders. Similar to CASME II, for a consistent comparison, we only consider the micro-expression classes of the FACS and objective-class AUs that overlap with the SAMM ground truth classes. Hence, only anger, happiness and surprise are considered for both the FACS and objective classes in the SAMM dataset analysis.
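A small sketch of this agreement measure, assuming each coder's annotation for a sample is available as a set of AU codes, is shown below; the function name and input layout are our own.

```python
# Sketch of Eqs. (8)-(9): inter-coder AU reliability, with each coder giving one
# set of AU codes per micro-expression sample.
def au_reliability(coders):
    """coders: list (one entry per coder) of lists of AU sets, one set per sample."""
    n_coders = len(coders)
    agreed = sum(len(set.intersection(*sample_sets))       # AUs agreed by all coders
                 for sample_sets in zip(*coders))
    total = sum(len(s) for coder in coders for s in coder)  # all AUs labelled
    return n_coders * agreed / total                        # R = n x agreed / #All_AU

# Example with two coders and one sample: R = 2 * 2 / 5 = 0.8
# au_reliability([[{1, 2, 4}], [{1, 2}]])
```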

CAS(ME)\(^2\) [46]. The dataset consists of two parts (i.e., Parts A and B). Part A has 87 long videos containing macro- and micro-expressions that can be used for expression spotting, whereas Part B has a total of 357 videos consisting of macro-expression and micro-expression samples. As our work focuses on micro-expression recognition, this paper only uses the micro-expression samples from Part B; hence, 54 micro-expression samples are used in our experiments [36]. The micro-expressions are categorized into three classes: anger, happiness and disgust. The average reliability of the dataset is 0.8, computed by (8). Similar to CASME II, the AUs occurring in CAS(ME)\(^2\) are labelled by two well-trained coders based on the FACS.

4.2 Independent AU analysis to propose relevant AUs

The independent AU analysis is conducted by applying the module described in Section 3.3 to CASME II, SAMM and CAS(ME)\(^2\) to determine each AU’s relevance to the micro-expression classes of the datasets. In particular, the AU-based RoI formed by each FACS AU listed in Table 3 is considered individually in the emotion-specific classification, i.e., classifying a specific emotion against all other emotions. This helps to gather the most relevant AUs for each micro-expression class within the three datasets. As the performance of each class affects the results of the average multi-emotion classification, it is essential to form the RoIs based on the AUs relevant to each micro-expression. To the best of our knowledge, this is the first work that explores the micro-expressions of the common FACS-coded datasets (i.e., CASME II, SAMM and CAS(ME)\(^2\)) in a one-vs-rest classification setting. As a result of the analysis, each dataset yields an AU relevance order for each micro-expression by arranging the obtained F1-scores in descending order. The complete lists of F1-scores for each dataset can be found in the Appendix, Tables 20, 21 and 22.

The proposed AUs for each dataset are determined by performing the standard multi-class micro-expression recognition that involves different percentages of the AUs (i.e., 10%, 20% and 30%). Based on Table 4, the most relevant AUs for CASME II, SAMM and CAS(ME)\(^2\) are the top-15 (30%), top-5 (10%) and top-11 (20%), respectively. The selected AUs for each dataset are listed in Tables 5, 6 and 7. These AUs then form the Proposed \(P_1\) and \(P_2\) AU-based RoIs for each emotion class.

Table 4 Results with different number of AUs

4.3 Classification results and discussion

This section compares the regions formed by our proposed AUs with existing AU-based RoIs. The robustness of the proposed AUs is also evaluated in this section using benchmarking techniques.

Table 5 Top 30% (15 AUs) F1-score (F1) of each Class in CASME II
Table 6 Top 10% (5 AUs) F1-score (F1) of each Class in SAMM

4.3.1 Comparison with existing AU-based RoIs

To compare the performance of our proposed \(P_1\) and \(P_2\) AUs against other state-of-the-art AU-based RoIs in the literature, we apply all these AUs as an overlay for feature extraction and subsequent multi-emotion classification. The main focus of this work is to revisit the effectiveness of the ground truth AUs which had been human-coded for CASME II, SAMM and CAS(ME)\(^2\). Therefore, the ground truth AUs of those datasets will serve as our main baselines. Table 3 shows each dataset’s ground truth AUs. The proposed AUs are the union of the emotion-specific AUs obtained from our Analysis Module. Similarly, the general sets of AUs proposed in Table 1 by FACS and the objective classes are also gathered from all samples of the corresponding emotion. They are then evaluated and compared with the proposed \(P_1\) and \(P_2\) AUs.

Table 7 Top 20% (11 AUs) F1-score (F1) of each Class in CAS(ME)\(^2\)
Table 8 Differences of the existing AU-based RoIs

The FACS and objective-class AUs do not cover all the micro-expression classes in the CASME II, CAS(ME)\(^2\) and SAMM datasets. Hence, when evaluating the FACS and objective-class AUs on each dataset, we only consider the micro-expression classes that are reported for both. In addition, the ground truth of the micro-expression datasets explicitly reports the sides of the AUs involved in each micro-expression clip, whether unilateral or symmetric. However, we only consider the symmetric AUs for the dataset ground truth, because different sides of the same AU occur in different clips of the same micro-expression class. We subsequently analyse both unilateral and symmetric AUs to discover the most relevant ones for each micro-expression class.

Table 9 Multi-Emotion Results of the Existing AUs and Union of the Proposed AUs (Unilateral AUs, P\(_1\), and Symmetric AUs, P\(_2\)) on CASME II

In addition, the proposed AUs are compared with the existing AU-based RoIs suggested by [37, 42, 50]. Table 8 shows the differences between the existing AU-based RoIs in terms of regional shapes and the AUs considered for micro-expression recognition. It can be observed that several AUs are excluded when constructing the RoIs in the existing techniques. Interestingly, these excluded AUs achieved high F1-scores in our Analysis Module. The RoIs recommended by Merghani et al. [42] cover most of the relevant AUs. However, the size of RoIs for some AUs is small and might not capture all of the facial texture changes induced by the AU movement.

Table 10 Multi-Emotion Results of the Existing AUs and Union of the Proposed AUs (Unilateral AUs, P\(_1\), and Symmetric AUs, P\(_2\)) on SAMM

Table 9 shows the multi-emotion classification results for the CASME II dataset achieved by our proposed AUs, the ground truth AUs of the dataset designers, and other existing state-of-the-art AU-based RoIs [37, 42, 50]. These are the top performance rates achieved on a TIM-interpolated CASME II with LBP-TOP features, based on the same pre-processing, feature extraction and classification techniques as those used by the dataset designers in their benchmarking experiments. The best performance is achieved when the RoIs are formed by the proposed \(P_2\) AUs, surpassing the recognition rate of the benchmark techniques by at least \(2\%\). Interestingly, both proposed AU sets surpass the ground truth’s accuracy by at least \(4\%\). The better results achieved by our proposed RoIs indicate that some emotion-relevant AUs may have been unintentionally left out by the human coders when forming the ground truth of the CASME II dataset.

Table 10 shows the multi-emotion classification results for the SAMM dataset. The proposed unilateral \(P_1\) set of AUs improves the performance achieved by the existing AUs and achieves the best accuracy of \(31.62\%\) and F1-score of 0.2129 in comparison to the other techniques. Table 11 shows the results for the CAS(ME)\(^2\) dataset. The proposed \(P_1\) and \(P_2\) AUs significantly improve the performances obtained by the baselines of CAS(ME)\(^2\). This performance improvement indicates that the proposed AUs are more relevant to the micro-expression classes than the baseline AUs in the CAS(ME)\(^2\) dataset. Based on the performance improvement achieved across the three datasets, the additional AUs considered in our proposed AUs contain meaningful features that are helpful in the micro-expression recognition task.

4.3.2 Effect of proposed AUs on state-of-the-art methods

This section compares our proposed RoIs with other AU-based micro-expression recognition methods. Table 12 shows the results on CASME II, SAMM and CAS(ME)\(^2\) obtained by different AU-based approaches. The results from Leong et al. [34], Merghani et al. [42] and Zong et al. [65] are considered in the comparison. To our knowledge, [34] is the state-of-the-art technique involving AUs in micro-expression recognition. The method proposed in [34] models the AU relationships of the micro-expressions based on the AUs annotated in the ground truth. The AUs are then combined with features randomly extracted from the face region. Instead of using features from random face areas, we re-implemented the method with the features extracted from our proposed emotion-specific AU-based regions. To examine the effectiveness of the proposed AUs under different classifiers, the results are obtained using the classifiers of the benchmarking techniques (i.e., the Graph Neural Network (GNN) in [34] and Sequential Minimal Optimization (SMO) in [42]). However, the method of [34] does not apply to CAS(ME)\(^2\), as the technique requires at least 16 frames per micro-expression sample. As a result, we improve the F1-scores for CASME II (+1.33%) and SAMM (+0.66%) when the proposed \(P_2\) AUs are considered.

Table 11 Multi-Emotion Results of Existing AUs and Union of the Proposed AUs (Unilateral AUs, P\(_1\), and Symmetric AUs, P\(_2\)) on CAS(ME)\(^2\)
Table 12 Effectiveness of Proposed P\(_1\) and P\(_2\) AUs on State-of-the-art AU-based Micro-expression Recognition
Table 13 Emotion-specific AUs proposed for each micro-expression
Table 14 Facial regions involved in each micro-expression

The effect of the proposed AUs is also evaluated on the handcrafted technique suggested in [42]. More specifically, we apply the Gaussian smoothing operator to the images of the datasets as proposed in [42]. The optical flows of the smoothed input images are then computed and overlaid with the original input images. The features from the RoIs proposed in [42] are then extracted using the LBP-TOP technique. The LBP-TOP features obtained from the smoothed optical flow images are classified with SMO [45]. We compare the results achieved by extracting the features from our proposed AU-based RoIs with Merghani et al.’s technique. Although our proposed AUs do not further improve the performance of Merghani et al.’s technique on CASME II, they achieve an accuracy of \(78.07\%\), which is close to the \(80.64\%\) of the original work. On top of that, the proposed \(P_2\) AUs for SAMM improve the recognition rate from \(30.15\%\) to \(33.82\%\) and the F1-score from 0.1908 to 0.2198. The proposed \(P_2\) AUs for CAS(ME)\(^2\) significantly improve the recognition rate, by \(5\%\) over the \(61.11\%\) of Merghani et al.’s RoIs, and the F1-score from 0.4284 to 0.5037. Given the performance improvements obtained in the state-of-the-art methods once our proposed AUs are introduced, the robustness of the proposed AUs persists when more advanced techniques are implemented.

4.3.3 Generic AUs for specific emotions

This section suggests generic AUs for different micro-expressions based on our proposed emotion-specific AUs to ensure the usability of these AUs in future emotion studies. Since the emotion-specific AUs are obtained through the emotion-specific classification performed in the Analysis Module, these proposed AUs contain the features most relevant to the respective micro-expressions. Hence, the emotion-specific AUs of each dataset obtained through the Analysis Module are essential for deriving the universal AUs for each emotion. The AUs of the same micro-expression class across CASME II, SAMM and CAS(ME)\(^2\) are unified to obtain more universal AUs that apply to the corresponding micro-expressions of any dataset. Binary classification of a specific emotion is preferred over standard multi-class classification because the main metrics of the latter are averaged over all involved micro-expression classes; hence, the effectiveness of the proposed emotion-specific AUs is more prominent in binary classification settings. Table 13 shows the emotion-specific AUs proposed for the micro-expressions occurring in multiple datasets.

Table 15 F1-score comparison of the anger-specific classification results on cross-dataset validation between SAMM and CAS(ME)\(^2\), i.e., train dataset \(\rightarrow \) test dataset
Table 16 F1-score comparison of the surprise-specific classification results on cross-dataset validation between CASME II and SAMM, i.e., train dataset \(\rightarrow \) test dataset
Table 17 F1-score comparison of the disgust-specific classification results on cross-dataset validation between CASME II and CAS(ME)\(^2\), i.e., train dataset \(\rightarrow \) test dataset
Table 18 F1-score comparison of the happiness-specific classification results on CASME II, SAMM and CAS(ME)\(^2\), i.e., train dataset \(\rightarrow \) test dataset

Table 14 compares our proposed AUs with the generic AUs suggested by the FACS and the objective classes based on the facial regions associated with a specific emotion. The distribution of the AUs is categorized by the upper (i.e., eyes and brows), middle (i.e., cheeks and nose) and lower (i.e., mouth and chin) parts of the human face. It can be observed that our proposed emotion-specific AUs are in line with the regions of the FACS AUs. It is also worth mentioning that the AUs proposed for surprise and anger appear in the same areas as those suggested by the FACS and objective classes. Therefore, our proposed AUs exhibit the characteristics of universal AUs for disgust, happiness, surprise and anger.

The generic AUs listed in Table 13 are used to evaluate the emotion-specific classification on CASME II, SAMM and CAS(ME)\(^2\) in cross-dataset validation settings. The classification results for anger, surprise, disgust and happiness are tabulated in Tables 15, 16, 17 and 18. The F1-scores are reported for the binary emotion-specific classifications to give unbiased results regardless of class imbalance.

Based on the results, the AUs proposed for anger and surprise are robust for the respective micro-expressions in CASME II, SAMM and CAS(ME)\(^2\), as they achieve better F1-scores than the FACS and objective-class AUs. In contrast, the proposed AUs for disgust and happiness achieve slightly lower F1-scores than those from the FACS. Nevertheless, our proposed AUs are shown to be more effective than the AUs from the objective classes.

Fig. 6 (a) AUs used in the independent AU analysis. (b) AUs annotated in the ground truth. (c) Proposed unilateral \(P_1\) AUs. (d) Proposed symmetric \(P_2\) AUs. Facial landmark points for all AUs involved in the independent AU analysis are annotated with blue crosses; facial landmark points corresponding to the dataset ground truth AUs are annotated with red dots; facial landmark points of our proposed AUs are annotated with green dots

4.4 Discussion

Figure 6 illustrates the facial landmarks of the encoded AUs from the different categories across all datasets. Figure 6(a) annotates the AUs listed in Table 3, whereas Fig. 6(b) overlays the ground truth AUs of each dataset with red dots. This shows that some AUs are excluded from the datasets’ ground truth. Comparing Fig. 6(b) with our proposed AUs shown in Fig. 6(c) and (d), some AUs that are excluded from the ground truth appear among our \(P_1\) or \(P_2\) AUs. This indicates that the datasets’ ground truth has likely left out a few important AUs that are helpful for the classification. In addition, the proposed AUs omit some AUs encoded in the ground truth that were shown to be less beneficial for the classification. As such, the classification performance is improved by using more relevant AU regions, since the redundant features are excluded.

As discussed in Section 4.3.1, the RoIs formed by our proposed AUs provide more relevant features for CASME II, SAMM and CAS(ME)\(^2\) than the existing state-of-the-art AU-based RoIs in multi-emotion classification. The proposed RoIs are also evaluated with the state-of-the-art AU-based method and reported in Table 12 to ensure that they remain effective when applied to different techniques. As a result, the proposed AUs are shown to be robust, as they improve the recognition rates when the feature extraction technique is replaced with the state-of-the-art approach.

In Section 4.3.3, we also suggest the generic AUs for specific emotions based on the findings of the Analysis Module. The respective emotion-specific classifications are performed to validate the universality of the suggested AUs. Based on the results achieved in Tables 15, 16, 17 and 18, our suggested AUs for anger and surprise are applicable for the corresponding emotion-specific studies regardless of the dataset under consideration.

Table 19 FACS and Ground Truth AUs of Each Dataset Together with Their Corresponding Mediapipe Facial Landmarks

5 Conclusion

This work presents an independent AU analysis that revisits the human-encoded ground truth AUs of three established micro-expression datasets: CASME II, SAMM and CAS(ME)\(^2\). The regions associated with these AUs contain more important cues than other facial parts and should be focused on, especially when the full face is unavailable (e.g., when the face is partially occluded). Along the way, this paper also provides an AU ranking for each micro-expression class of the datasets. Based on this ranking, we propose emotion-specific AUs for more relevant emotion classification.

As the independent AU analysis results in the best AU-based RoIs for each emotion, we further evaluate the effectiveness of these emotion-specific AU-based RoIs in AU-based micro-expression recognition. The experimental results show that our proposed AU RoIs can perform better than existing ground truth or state-of-the-art AUs for the CASME II, SAMM and CAS(ME)\(^2\) datasets.

Table 20 F1-Scores (F1) of each Emotion Class in CASME II
Table 21 F1-Scores (F1) of each Emotion Class in SAMM
Table 22 F1-Scores (F1) of each Emotion Class in CAS(ME)\(^2\)
Table 23 Comparison of AUs Involved for each Micro-expression in CASME II Ground truth, FACS, Objective and Proposed AUs
Table 24 Comparison of AUs Involved for each Micro-expression in SAMM Ground truth, FACS, Objective and Proposed AUs
Table 25 Comparison of AUs Involved for each Micro-expression in CAS(ME)\(^2\) Ground truth, FACS and Proposed AUs

As a direction for future work, we hope to revisit the use of facial AUs in the psychological literature to analyze whether differences in results exist between micro-expressions recognized by human psychologists and those recognized by pattern recognition-based techniques. Subsequently, we aim to investigate more human-like recognition of micro-expressions, since emotions are ultimately experienced and perceived by human beings. Facial AUs are a good starting point, since humans have the innate ability to extract the main characteristics of any face they see, much like an artist drawing a caricature by exaggerating distinctive facial features.