Random forest-based approach for physiological functional variable selection for driver’s stress level classification

  • Neska El Haouij
  • Jean-Michel Poggi
  • Raja Ghozi
  • Sylvie Sevestre-Ghalila
  • Mériem Jaïdane
Original Paper

Abstract

This paper addresses the selection of physiological functional variables for classifying a driver’s stress level using random forests. The analysis is performed on experimental data extracted from the drivedb open database, available on the PhysioNet website. The physiological measurements of interest are electrodermal activity captured on the driver’s left hand and foot, electromyogram, respiration, and heart rate, collected from ten driving experiments carried out on three types of routes (rest area, city, and highway). The contributions of this work concern both the method and the application. From a methodological viewpoint, the physiological signals are treated as functional variables, decomposed on a wavelet basis, and then analyzed in search of the most relevant variables. On the application side, the proposed approach provides a “blind” procedure for driver’s stress level classification whose performance, when applied to the drivedb database, is close to that of the expert-based approach. It also suggests new physiological features based on the wavelet levels of the functional variables’ wavelet decomposition. In addition, the proposed approach provides a ranking of physiological variables according to their importance in stress level classification. For the case under study, the results suggest that the electromyogram and heart rate signals are less relevant than the electrodermal and respiration signals. Furthermore, the electrodermal activity measured on the driver’s foot was found to be more relevant than that captured on the hand. Finally, the proposed approach also provides an order of relevance of the wavelet features.
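
The wavelet step described in the abstract can be illustrated with a minimal sketch. The abstract names neither the wavelet family nor the software used, so the Haar wavelet, the function names `haar_dwt` and `level_energies`, and the toy signal below are assumptions made purely for illustration and should not be read as the authors' implementation. The sketch decomposes one sampled signal into per-level detail coefficients, which is the kind of level-wise grouping that a grouped variable importance measure could then rank.

```python
import math


def haar_dwt(signal):
    """Full orthonormal Haar decomposition of a signal of length 2**J.

    Returns (approx, details): approx is the single coarsest
    approximation coefficient (as a one-element list) and details is a
    list of detail-coefficient lists, with details[0] the finest level.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    approx = [float(x) for x in signal]
    details = []
    root2 = math.sqrt(2.0)
    while len(approx) > 1:
        # Pair neighbouring samples: (x0, x1), (x2, x3), ...
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Scaled differences capture the fluctuation at this level.
        details.append([(a - b) / root2 for a, b in pairs])
        # Scaled sums form the smoother signal passed to the next level.
        approx = [(a + b) / root2 for a, b in pairs]
    return approx, details


def level_energies(details):
    """Sum of squared detail coefficients at each wavelet level."""
    return [sum(c * c for c in level) for level in details]


# Hypothetical toy signal standing in for one physiological recording.
toy_signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, details = haar_dwt(toy_signal)
energies = level_energies(details)
```

Because the transform is orthonormal, the squared approximation coefficient plus the per-level energies equals the energy of the original signal, so each level's energy can be read as that level's share of the signal's variation.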

Keywords

Physiological signals, Functional data, Random forests, Recursive feature elimination, Wavelets, Grouped variable importance

Mathematics Subject Classification

62H30 62P30 

Notes

Acknowledgements

The authors gratefully acknowledge Dr. Chiraz Ben Abdelkader and Dr. Hassine Saidane for proofreading the paper. They also thank the anonymous referees for their useful suggestions and meaningful comments which led to a considerable improvement of this paper.

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. National Engineering School of Tunis, University of Tunis El Manar, Tunis, Tunisia
  2. Paris-Sud University, Orsay, France
  3. CEA-Tech (CEA-LinkLab), CEA, Tunis, Tunisia
  4. Paris Descartes University, Paris, France
