Multimedia Tools and Applications

Volume 76, Issue 6, pp 7699–7730

EmoAssist: emotion enabled assistive tool to enhance dyadic conversation for the blind

  • AKM Mahbubur Rahman
  • ASM Iftekhar Anam
  • Mohammed Yeasin

Abstract

This paper presents the design and implementation of EmoAssist, a smart-phone based system to assist in dyadic conversations. The main goal of the system is to give people who are blind or visually impaired access to more non-verbal communication cues. The key functionalities of the system are to predict behavioral expressions (such as a yawn, a closed-lip smile, an open-lip smile, looking away, sleepiness, etc.) and 3-D affective dimensions (valence, arousal, and dominance) from visual cues in order to provide appropriate auditory feedback or responses. A number of challenges related to the data communication protocols, efficient tracking of the face, modeling of behavioral expressions/affective dimensions, the feedback mechanism, and system integration were addressed to build an effective and functional system. In addition, orientation-sensor information from the smart-phone was used to correct image alignment and improve robustness in real-world use. Empirical studies show that EmoAssist can predict affective dimensions with acceptable accuracy (maximum correlation coefficients of 0.76 for valence, 0.78 for arousal, and 0.76 for dominance) in natural dyadic conversation. The overall minimum and maximum response times are 64.61 milliseconds and 128.22 milliseconds, respectively. Integrating sensor information to correct image orientation improved the accuracy of recognizing behavioral expressions by 16 % on average. A usability study with ten blind people in social interaction shows that EmoAssist is highly acceptable, with an average acceptability rating of 6.0 on a Likert scale (where 1 and 7 are the lowest and highest possible ratings, respectively).
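
To make the orientation-correction step concrete, the following is a minimal sketch (not the authors' implementation) of how a camera frame might be de-rotated using the phone's sensor-reported roll angle before face detection. The OpenCV-based helper functions, the sign convention for the roll angle, and the stock Haar-cascade detector are all assumptions for illustration; EmoAssist uses its own face-tracking pipeline.

    # Minimal sketch, assuming an OpenCV-based pipeline: de-rotate each camera
    # frame by the device roll reported by the orientation sensor, so that the
    # downstream face detector/tracker sees an approximately upright face.
    import cv2

    def correct_orientation(frame, roll_degrees):
        # Rotate the frame about its center by the negative of the device roll.
        # The sign convention is an assumption; it depends on the sensor API.
        h, w = frame.shape[:2]
        rotation = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), -roll_degrees, 1.0)
        return cv2.warpAffine(frame, rotation, (w, h))

    def detect_face(frame):
        # Stand-in detector: a stock OpenCV Haar cascade (the paper's system
        # relies on its own deformable-model face tracking instead).
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        cascade = cv2.CascadeClassifier(
            cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
        return cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

    if __name__ == "__main__":
        frame = cv2.imread("frame.jpg")             # placeholder camera frame
        upright = correct_orientation(frame, 25.0)  # e.g. phone tilted by 25 degrees
        print(detect_face(upright))

De-rotating before detection costs only a single affine warp per frame, which is consistent with the millisecond-level response-time budget reported above.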

Keywords

Human computer interaction · Feature extraction · Mobile-server communication · Video feed · Multimedia for blind · Social interaction · Affective dimensions


Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • AKM Mahbubur Rahman (1)
  • ASM Iftekhar Anam (2)
  • Mohammed Yeasin (2)
  1. Eyelock, Lawrenceville, USA
  2. University of Memphis, Memphis, USA