Abstract
We developed and evaluated a multimodal affect detector that combines conversational cues, gross body language, and facial features. The detector uses feature-level fusion to combine the sensory channels and linear discriminant analyses to discriminate between naturally occurring experiences of boredom, engagement/flow, confusion, frustration, delight, and neutral. Training and validation data for the affect detector were collected in a study where 28 learners completed a 32-minute tutorial session with AutoTutor, an intelligent tutoring system with conversational dialogue. Classification results supported a channel × judgment type interaction, where the face was the most diagnostic channel for spontaneous affect judgments (i.e., at any time in the tutorial session), while conversational cues were superior for fixed judgments (i.e., every 20 s in the session). The analyses also indicated that the accuracy of the multichannel model (face, dialogue, and posture) was statistically higher than the best single-channel model for the fixed but not the spontaneous affect expressions. However, multichannel models reduced the discrepancy (i.e., variance in the precision of the different emotions) of the discriminant models for both judgment types. The results also indicated that the combination of channels yielded superadditive effects for some affective states, but additive, redundant, and inhibitory effects for others. We explore the structure of the multimodal linear discriminant models and discuss the implications of some of our major findings.
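The pipeline the abstract describes, feature-level fusion followed by linear discriminant classification, can be sketched in a few lines. This is an illustrative reconstruction on synthetic data, not the authors' implementation: the channel names, feature dimensions, and three-class labels are all placeholders standing in for the paper's dialogue, posture, and facial features and its six affective states.

```python
import numpy as np

rng = np.random.default_rng(0)
n_per = 40  # synthetic observations per class

# Hypothetical per-channel feature vectors (dimensions are illustrative only);
# each class's features are shifted so the toy problem is separable
def make_channel(dim):
    return np.vstack([rng.normal(loc=float(c), size=(n_per, dim))
                      for c in range(3)])

dialogue = make_channel(4)  # stands in for conversational cues
posture = make_channel(3)   # stands in for gross body language
face = make_channel(5)      # stands in for facial features
y = np.repeat(np.arange(3), n_per)

# Feature-level fusion: concatenate the channels into one vector per observation
X = np.hstack([dialogue, posture, face])

# Minimal linear discriminant classifier: assign each observation to the class
# whose mean is nearest in Mahalanobis distance under a pooled covariance
means = np.array([X[y == c].mean(axis=0) for c in range(3)])
pooled = sum(np.cov(X[y == c].T) for c in range(3)) / 3
inv = np.linalg.inv(pooled)

def predict(x):
    dists = [(x - m) @ inv @ (x - m) for m in means]
    return int(np.argmin(dists))

preds = np.array([predict(x) for x in X])
acc = (preds == y).mean()
```

Because fusion happens before classification, the discriminant weights can trade features off across channels, which is what makes the superadditive and inhibitory channel combinations reported in the paper observable in the model structure.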
Cite this article
D’Mello, S.K., Graesser, A. Multimodal semi-automated affect detection from conversational cues, gross body language, and facial features. User Model User-Adap Inter 20, 147–187 (2010). https://doi.org/10.1007/s11257-010-9074-4