1 Introduction

Human communication is multifaceted: information is conveyed between humans on many channels in parallel. For a machine to become an efficient and accepted social companion, it must understand interactive cues that carry not only direct communicative information, such as spoken words, but also nonverbal behavior. Hence, technologies that recognize nonverbal communication and place it in the context of the ongoing interaction are essential for the advancement of human-machine interfaces [3, 4].

Multimodal behavior analytics—a transdisciplinary field of research—aims to close this gap by enabling machines to automatically identify, characterize, model, and synthesize individuals’ multimodal nonverbal behavior within both human-machine and machine-mediated human-human interaction.

The emerging technology of this field is relevant for a wide range of interaction applications, including but not limited to healthcare and education. For example, characterizing nonverbal behavior and associating it with underlying clinical conditions, such as depression or post-traumatic stress, holds transformative potential and could significantly change treatment and the efficiency of healthcare systems [6].

Within the educational context, assessing the proficiency and expertise of individuals’ social skills, in particular for those with learning disabilities or social anxiety, can help create individualized education scenarios [2, 8]. Machine-assisted training for individuals with autism spectrum disorder (ASD), for example, could have far-reaching impacts on our society.

In the following, I highlight two behavior analytics approaches that were investigated in my PhD dissertation [3] and summarized in a multimodal framework for human behavior analysis [4].

2 Multimodal Behavior Analytics

Laughter Detection One of the most iconic human behaviors is laughter; it is universally understood across cultures and yet is immensely versatile in its meaning and variable in its expression (e.g. inhaled, exhaled, or snort-like laughs, as well as laughter bearing various meanings, e.g. humorous, nervous, or social laughter [1]). Due to the relative importance of laughter and its potential impact on human-machine interaction, we investigated the capability of multimodal sequential classifiers to spot laughter in natural multiparty conversations [5]. Drawing on all available data channels, we extracted three independent feature streams, including frequency- and spectrum-based features from the audio stream and coarse movement-related features from the video stream. Using multimodal sequence classifiers such as hidden Markov models (HMM) and echo state networks (ESN), we achieved considerable accuracies in recognizing this challenging human paralinguistic behavior (\(F_1\) = 0.72 for the HMM, with 0.8 recall and 0.64 precision; \(F_1\) = 0.63 for the ESN, with 0.81 recall and 0.52 precision).
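To make the ESN approach concrete, the following is a minimal sketch of an echo state network driven by a multimodal feature sequence: a fixed random reservoir, scaled to a spectral radius below 1 for the echo state property, with a linear readout trained by ridge regression. The feature dimensions, reservoir size, and the synthetic frame labels are illustrative stand-ins, not the actual setup of [5].

```python
import numpy as np

rng = np.random.default_rng(0)

def make_reservoir(n_in, n_res, spectral_radius=0.9, seed=1):
    """Random input and recurrent weights; recurrent weights rescaled
    so the largest eigenvalue magnitude equals spectral_radius."""
    r = np.random.default_rng(seed)
    W_in = r.uniform(-0.5, 0.5, (n_res, n_in))
    W = r.uniform(-0.5, 0.5, (n_res, n_res))
    W *= spectral_radius / max(abs(np.linalg.eigvals(W)))
    return W_in, W

def run_reservoir(X, W_in, W):
    """Drive the reservoir with a (T, n_in) feature sequence;
    return the (T, n_res) sequence of reservoir states."""
    states = np.zeros((len(X), W.shape[0]))
    x = np.zeros(W.shape[0])
    for t, u in enumerate(X):
        x = np.tanh(W_in @ u + W @ x)
        states[t] = x
    return states

# Toy sequence: 3 feature streams per frame, binary frame-level target
# (a stand-in for laughter / non-laughter, not real annotations).
T, n_in, n_res = 200, 3, 100
X = rng.normal(size=(T, n_in))
y = (X[:, 0] > 0).astype(float)

W_in, W = make_reservoir(n_in, n_res)
S = run_reservoir(X, W_in, W)

# Linear readout via closed-form ridge regression on the reservoir states.
ridge = 1e-2
W_out = np.linalg.solve(S.T @ S + ridge * np.eye(n_res), S.T @ y)
pred = (S @ W_out > 0.5).astype(float)
```

The reservoir is never trained; only the readout weights are fit, which keeps the temporal modeling cheap compared to a fully trained recurrent network.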

Voice Quality Recognition Voice quality, a term that refers to the timbre or coloring of the voice, serves many purposes in human-human communication. In particular, the dynamic use of voice qualities in spoken communication informs us about the attitude, mood, social status, and affective state of the speaker. Yet voice quality is very difficult to identify and is often confused even by human experts. To investigate the usefulness of uncertain or fuzzy information provided by human experts, we analyzed the classification performance of fuzzy-input fuzzy-output support vector machines (F\(^2\)SVM) [7]. These F\(^2\)SVM significantly outperformed other state-of-the-art approaches by utilizing solely the information provided by the human experts’ fuzzy annotations during training, on a subset of the voice quality data for which the majority vote of the human annotators always coincided with the actual target label. The F\(^2\)SVM classified the voice quality samples with an error rate of 13.88 % (\(\sigma\) = 3.89) in a speaker-independent classification experiment and 17.66 % in a cross-corpus experiment. These experiments show that the use of fuzzy and uncertain information has the potential to improve classification results.
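One simple way to feed fuzzy annotator information into a standard SVM—a rough approximation of the fuzzy-input idea, not the actual F\(^2\)SVM formulation of [7]—is to replicate each training sample once per class and weight each copy by its class membership, so that samples with high annotator disagreement contribute weakly to both classes. The feature dimensions and the synthetic memberships below are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Toy two-class voice-quality features; mu mimics a fuzzy membership
# toward class 1 derived from annotator votes (0.5 = full disagreement).
n = 60
X = rng.normal(size=(n, 4))
true_y = (X[:, 0] + 0.3 * rng.normal(size=n) > 0).astype(int)
mu = np.clip(0.5 + 0.4 * np.sign(true_y - 0.5) * rng.uniform(0.2, 1.0, n), 0, 1)

# Replicate each sample once per class, weighted by its membership
# in that class; ambiguous samples get ~0.5 weight on both sides.
X_dup = np.vstack([X, X])
y_dup = np.concatenate([np.ones(n), np.zeros(n)])
w_dup = np.concatenate([mu, 1 - mu])

clf = SVC(kernel="rbf", C=1.0)
clf.fit(X_dup, y_dup, sample_weight=w_dup)
pred = clf.predict(X)
```

The sample weights let uncertain annotations soften the margin constraints instead of forcing a hard label, which is the intuition behind training on fuzzy expert input.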

3 Concluding Remarks

To date, we have only scratched the surface of understanding human nonverbal communicative behavior with these novel objective and quantitative behavior analytics approaches. Yet this vibrant and highly multidisciplinary research, which integrates the fields of psychology, machine learning, multimodal sensor fusion, and pattern recognition, is emerging as an essential field of investigation for computer science. A thorough understanding of the underlying mechanisms of human behavior will advance the development of technology that cooperates tightly with human interactants and has the potential to improve human performance and well-being alike.