
1 Introduction

Physical exercise reduces the risk of developing and/or dying from cardiovascular disease by maintaining various physiological parameters (heart rate, blood pressure, etc.) and blood components (blood sugar, cholesterol, triglycerides, etc.). It enhances and maintains physical fitness, increases muscle strength, reduces depression and anxiety, and lowers the risk of various diseases [1, 2]. The benefits of physical exercise can be increased by proper real-time monitoring [3]. Exercise by the elderly and during rehabilitation can lead to accidents, many of which are caused by a lack of proper real-time monitoring of the exercise [4]. Intensity (how hard the exercise feels to the person performing it) can be measured in subjective or objective ways. Basically, there are three ways of monitoring physical exercise intensity: by extracting or monitoring physiological parameters, such as heart rate (HR) and respiratory rate (RR); by the rated perceived exertion scale; and by the talk test (how hard it is for the subject to talk). Exercise intensity usually rises and falls with heart rate; therefore, by measuring heart rate we can define the level of physical exercise.

In a submaximal graded exercise, intensity is generally perceived as “very light” at the beginning and “very hard” at the end. The perceived exertion depends on many overall body responses, including heart rate (HR), respiratory rate (RR), blood lactate, physical status, mood state, etc. Therefore, proper measurement or estimation of these parameters during exercise helps to monitor it.

The Borg scale is a common way to classify the feeling of hardness during physical exercise [5]. Borg proposed a subjective technique to classify perceived exertion during exercise, called the rating of perceived exertion (RPE). Measuring these levels during exercise is challenging because the feeling is individual and the subject must be familiar with the scale, which makes the measurement difficult for people without sufficient knowledge of the Borg scale. Nowadays, physical exercise can be monitored by extracting physiological features using invasive or non-invasive techniques. The subjective way of defining the level of exercise intensity has been used for a long time and has considerable validity [5]. Several instruments use invasive techniques or contact-sensor technology to measure physiological signals, including heart rate, respiratory rate, and blood lactate [6, 7]. By measuring these parameters, we can correlate them with the exercise intensity level. Non-invasive identification/classification of exercise intensity can consist of measuring physiological data and converting it into exercise intensity levels or classes, or of directly recognizing exercise intensity using computer vision techniques. It is commonly observed that when a person gets tired, his/her facial expression and facial color change, which can be an important cue for classifying the intensity level.

Most recent research on the measurement of physical exercise intensity involves facial image analysis using feature point analysis [8, 9], facial color analysis, mouth and eye blink analysis [10], body movement tracking [11], etc. In the literature, we can find various ways to measure physical exercise intensity using non-invasive methods. Fatigue can be detected by analyzing muscle movement patterns [12]. Head motion and pose can be measured by tracking feature points, which can then be analyzed using statistical and machine learning algorithms [8]. Haque [12] presented an efficient non-contact system for detecting non-localized physical fatigue from maximal muscle activity using facial videos recorded in a realistic environment. Salik [9] proposed exercise intensity classification using facial feature point analysis.

In computer vision, deep learning is an emerging approach to image classification. The classification of physical exercise intensity lends itself to facial expression analysis, since facial expression changes when a person experiences higher exercise intensity [9]. Nowadays, facial emotion analysis using deep learning is also very common and has achieved better results than traditional machine learning techniques [13,14,15,16]. Deep learning can also be applied to analyze or monitor physical exercise from body parameters. Gordienko [17] proposed a multimodal approach to estimate fatigue using deep learning, where the input parameters were extracted using wearable sensors.

Exercise intensity levels have long been classified using subjective techniques, but achieving this with objective techniques remains challenging. In this paper, an objective (quantitative) technique is proposed to classify exercise intensity using computer vision. The ground-truth class/level of exercise intensity was defined according to the incremental HR: the class at the beginning of the exercise (minimum HR) is the initial class, ‘light’, and the class at the end of the exercise (maximum HR) is the final class, ‘hard’. The intermediate classes or levels are likewise defined by the HR. A deep learning approach using a convolutional neural network was applied to classify the facial images, which were extracted from videos collected during submaximal exercise.

2 Methods

2.1 Dataset Description

Twenty university students (mean age = 26.88 ± 6.01 years, mean weight = 72.56 ± 14.27 kg, mean height = 172.88 ± 12.04 cm; 14 males and six females, all white Caucasian) participated in the study. An informed consent form was signed by each participant prior to data collection, and they were informed of the study protocol before the recordings. The test consisted of a submaximal ramp exercise protocol on a Wattbike cycloergometer (Wattbike Ltd, Nottingham, UK), after a 5-min warm-up at a constant power output of 60 W. The initial power output was 75 W and was increased by 15 W min−1 until participants reached 85% of their maximal heart rate (calculated as 208 − (0.7 × age)) or until they were unable to maintain the cadence needed to generate the required power output throughout the stage. Heart rate data was collected at 100 Hz using a Polar T31 cardiofrequencimeter (Polar Electro, Kempele, Finland), synchronized to the Wattbike load cell for power output measures, also sampled at 100 Hz. For facial tracking, facial video (25 Hz, spatial resolution of 1080 × 1920 pixels) was recorded during the test using a video camera placed in the frontal plane view (at a 90° angle between the face and the camera) to capture the participants’ faces while performing the exercise. The participants were not allowed to talk during the test but could express their feelings freely through facial expression throughout.

For the purpose of this study, a dataset containing classes of images with different levels of tiredness was prepared. The image frames were manually assigned to categories according to the heart rate. For two classes, the initial 500 frames of each video were considered class one (not-tired faces) and the last 500 frames class two (tired faces). Since there were 20 subjects, the total number of images per class was 10,000, and the total number of images in the dataset was 10,000 times the number of classes. The dataset was prepared for two, three, and four classes separately (see Fig. 1).

Fig. 1.
figure 1

Allocation of time slots to extract images with the initial class (Minimum exercise intensity), the intermediate classes, and the final class (Maximum exercise intensity).

The allocation of time slots in a video was based on the incremental HR value. For classifications with more than two classes, the middle classes were defined according to the heart rate value. For instance, if the minimum heart rate is 80 bpm and the final heart rate is 180 bpm, then the image frames recorded around 130 bpm are considered the second (middle) class. Likewise, the time slots for additional classes are obtained by synchronizing the heart rate with the frame number.
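The mapping described above can be sketched with two small helpers (hypothetical code written for illustration; the paper does not provide an implementation, and the function and variable names are ours). Each class is assigned a target HR evenly spaced between the minimum and maximum HR, and the time slot for that class is centered on the frame whose synchronized HR sample is closest to the target:

```python
def class_target_hrs(hr_min, hr_max, n_classes):
    """Evenly spaced target heart rates, one per class, from start to end HR."""
    step = (hr_max - hr_min) / (n_classes - 1)
    return [hr_min + step * i for i in range(n_classes)]

def slot_center_frame(hr_series, target_hr):
    """Index of the frame-synchronized HR sample closest to the target HR."""
    return min(range(len(hr_series)),
               key=lambda i: abs(hr_series[i] - target_hr))

# The example from the text: HR rises from 80 to 180 bpm; with three
# classes, the middle class is anchored at 130 bpm.
targets = class_target_hrs(80, 180, 3)  # [80.0, 130.0, 180.0]
```

In this sketch, the 500 frames around each center frame would then be labeled with that class, matching the per-class frame counts given in Sect. 2.1.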

2.2 Pre-processing

Before feeding the neural network with inputs, several image pre-processing steps were applied; the full pipeline is shown in Fig. 2. Since the images were extracted from video of a moving subject (head movement), the extracted frames may be blurred. Therefore, the first pre-processing step consists of detecting and removing any blurred image frames. In the second step, the face was detected in the frame so that we could analyze the face specifically, rather than the whole frame. The well-known Viola-Jones algorithm [18] was applied to detect the face. After detecting the face, we cropped it and down-sampled it to 96 × 96, the size of the input layer. One of the basic purposes of this research is to find the best color channel for classifying physical exercise intensity; therefore, the experiments were performed with a separate raw 2D image for each color channel (red, green, and blue) and for grayscale. So, after cropping and resizing the face, the RGB frames were split into R, G, and B channels and also converted to grayscale images.

Fig. 2.
figure 2

Block diagram of the detailed preprocessing, with output images.

2.3 Proposed CNN Architecture

A deep neural network based on a Convolutional Neural Network (CNN), or ConvNet, was designed with five hidden layers and two fully connected layers, as shown in Fig. 3. Three main types of layers are used to build ConvNet architectures: the convolutional layer, the pooling layer, and the fully connected layer (exactly as in regular neural networks). These layers were stacked to form the full ConvNet architecture:

Fig. 3.
figure 3

Proposed convolutional neural network architecture.

  • Input layer [96 × 96]: holds the raw pixel values of the 2D image of the face.

  • CONV layer: computes the output of neurons that are connected to local regions in the input, each computing a convolution between its weights and the small region it is connected to in the input volume.

  • Fully connected layer: computes the class scores, resulting in a volume of size [1 × 1 × n], where n is the number of classes. As with ordinary neural networks, and as the name implies, each neuron in this layer is connected to all the activations in the previous layer.

  • The activation function chosen was ReLU.

  • Maxpooling with Pool size (2, 2).

  • 25% dropout for regularization.

Each hidden layer starts with a convolutional layer (Conv2D), followed by spatial batch normalization, max-pooling, dropout, and ReLU activation; each hidden layer consists of these five operations. After the five convolutional layers, the network leads into two fully connected layers, each consisting of an affine operation and ReLU activation.

We implemented this architecture with the well-known Python library Keras. The experiments were carried out on a Google Colab GPU.

2.4 Experiments

The first convolutional layer consists of 64 3 × 3 filters; the second had 128 3 × 3 filters; the third 256 3 × 3 filters; the fourth 512 3 × 3 filters; and the last also had 512 3 × 3 filters. All the hidden layers used a stride of 1, batch normalization, max-pooling of size 2 × 2, dropout of 0.25, and ReLU as the activation function. These five hidden layers are followed by two fully connected layers with 256 and 512 neurons, respectively. Both fully connected layers used batch normalization, dropout, and ReLU with the same parameters. A softmax output layer was used with the classification loss. Figure 3 shows our deep neural network architecture.
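The architecture above could be realized in Keras roughly as follows (a sketch, not the authors' code: the filter counts, kernel sizes, pooling, dropout, and dense layer sizes follow the text, while the padding, L2 strength, and choice of optimizer are unstated in the paper and are our assumptions):

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

def build_model(n_classes, input_shape=(96, 96, 1)):
    """Five conv blocks + two dense layers, as described in Sect. 2.3-2.4."""
    inputs = keras.Input(shape=input_shape)
    x = inputs
    # Five hidden layers: Conv2D -> BatchNorm -> ReLU -> MaxPool -> Dropout.
    for filters in (64, 128, 256, 512, 512):
        x = layers.Conv2D(filters, (3, 3), strides=1, padding="same",
                          kernel_regularizer=regularizers.l2(1e-4))(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)
        x = layers.MaxPooling2D(pool_size=(2, 2))(x)
        x = layers.Dropout(0.25)(x)
    x = layers.Flatten()(x)
    # Two fully connected layers with 256 and 512 neurons.
    for units in (256, 512):
        x = layers.Dense(units)(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)
        x = layers.Dropout(0.25)(x)
    # Softmax output over the exercise-intensity classes.
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

With a 96 × 96 input, the five 2 × 2 poolings reduce the spatial size to 3 × 3 before the flatten, so the first dense layer sees a 3 × 3 × 512 volume.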

The training was performed for 75 epochs with a batch size of 64. The dataset of 10,000 images per class was randomly split into training, validation, and test sets in the ratio 80:10:10. For two classes (tired and not tired) the total number of images was 20,000, of which 16,000 were used for training, 2,000 for validation, and 2,000 for testing. Experiments with two, three, and four classes were performed. To reduce overfitting, we used dropout and batch normalization in addition to L2 regularization.
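The 80:10:10 random split can be sketched as follows (our helper; the paper does not state which routine was used for the split):

```python
import numpy as np

def split_indices(n_samples, seed=0):
    """Shuffled index arrays for an 80:10:10 train/val/test split."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(0.8 * n_samples)
    n_val = int(0.1 * n_samples)
    return (idx[:n_train],
            idx[n_train:n_train + n_val],
            idx[n_train + n_val:])

# Two-class dataset: 20,000 images -> 16,000 train, 2,000 val, 2,000 test.
train_idx, val_idx, test_idx = split_indices(20_000)
```

The three index sets are disjoint and together cover every image, so no frame appears in more than one partition.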

3 Experimental Results and Discussion

Separate experiments were carried out to classify into two, three, and four classes, and the accuracy of each case was analyzed. In the experiments, the color images were split into their red, green, and blue components, and the original RGB images were also converted into grayscale. The green color component provided the best classification accuracy. The confusion matrix was computed in each case. In most cases, the accuracy of the two-class classification was more than 99% (see Table 1). Overall, classification into two and three classes resulted in very high classification accuracy, whereas classification into four classes had lower performance for every channel.

Table 1. The average accuracy of classification into two, three, and four classes, using red, green, blue and gray channels.

Based on the results presented in the table, the classification into two classes reaches 100% accuracy in all cases. It also shows that the best raw color channel is green, which obtained average accuracies of 100%, 99.86%, and 99.75% for two-, three-, and four-class classification, respectively. From these results, we conclude that the level of tiredness, or physical exercise intensity, is best reflected by the green color channel. Therefore, in the remainder of this article, all experimental results and plots are based only on the green color channel.

The accuracy and loss histories during the 75 training epochs are shown in Fig. 4(a) and (b), respectively. Only the plot for the green channel is shown, since the green channel yielded the best average prediction accuracy among all the color channels (Tables 2, 3 and 4).

Fig. 4.
figure 4

Training and validation accuracy and loss vs. epoch of green color for four class classification. (Color figure online)

Table 2. Classification accuracy of each class in the classification of physical exercise intensity into two classes.
Table 3. Classification accuracy of each class in the classification of physical exercise intensity into three classes.
Table 4. Classification accuracy of each class in the classification of physical exercise intensity into four classes.

The confusion matrices of the green color channel for all the classification settings are presented in Figs. 5, 6 and 7. The accuracy of the two-class classification is 100%, which shows that a convolutional neural network can very easily separate normal and fully tired faces. The test set contained 2,000 randomly selected images, of which 1,037 were normal faces and 963 were tired faces.

Fig. 5.
figure 5

Confusion matrix of the two-class classification of physical exercise intensity.

Fig. 6.
figure 6

Confusion matrix of the three-class classification of physical exercise intensity.

Fig. 7.
figure 7

Confusion matrix of the four-class classification of physical exercise intensity.

Similarly, the confusion matrix of the three-class classification is shown in Fig. 6. In this case, misclassification occurs only between the first and second classes. The last class is 100% accurate: none of the other classes was classified into it, nor was it classified into any other class. For class one, only one of 1,025 images was misclassified as class two. Likewise, for class two, three images out of 1,018 were misclassified as class one.

Likewise, recognizing fully tired faces was easier than recognizing the other classes. The misclassification rate was always greatest for the nearest class: for example, the first class, normal (not tired) faces, was mostly misclassified as the second class; the second class was misclassified as the first and third classes; and so on.

4 Conclusion

Based on experiments with various types of image datasets, the deep learning approach to exercise intensity classification based on facial expression is a potential method for classifying exercise intensity into two, three, four, or more levels. For the two-class classification the accuracy is 100%, and even for the three-class classification it is around 99%. From all the experiments, it can be concluded that the best color channel for the raw input image, in terms of classification accuracy, is green. The training and testing datasets were randomly prepared from the same subjects; therefore, this approach is most appropriate for personalized physical exercise monitoring.

Future work can extend the classification to more than four classes. The experiments were done with only 20 subjects, with little diversity in age and origin. To generalize this model, it can be trained with a greater number and diversity of subjects in order to improve the test accuracy. Considering that training and testing were performed on the same subjects, this approach might be most appropriate for personalized exercise monitoring systems, where the system can be trained on the same subject with image datasets taken in various exercise sessions.