Virtual meeting rooms: from observation to simulation
Much working time is spent in meetings and, as a consequence, meetings have become the subject of multidisciplinary research. Virtual Meeting Rooms (VMRs) are 3D virtual replicas of meeting rooms, where various modalities such as speech, gaze, distance, gestures and facial expressions can be controlled. This allows VMRs to be used to improve remote meeting participation, to visualize multimedia data and as an instrument for research into social interaction in meetings. This paper describes how these three uses can be realized in a VMR. We describe the process from observation through annotation to simulation and a model that describes the relations between the annotated features of verbal and non-verbal conversational behavior. As an example of social perception research in the VMR, we describe an experiment to assess human observers’ accuracy for head orientation.
Keywords: Virtual Environment, Angular Distance, Head Orientation, Meeting Room, Conversational Agent
1 Introduction
Much working time is spent in meetings and, as a consequence, meetings have become the subject of multidisciplinary research. The introduction of technology in meetings offers new perspectives on, amongst others, communication and language, human perception and social interaction. In this paper we describe how VMRs can be used to improve remote meeting participation, to visualize multimodal data and as an instrument for research into social interaction in meetings.
The research reported in this paper has been carried out in the context of AMI (Augmented Multi-party Interaction), a European research project that aims at developing new technologies for supporting meeting activities, such as meeting browsers, and technology that makes remote meeting participation easier, more effective and more natural. The Human Media Interaction (HMI) group of the University of Twente is one of the AMI partners (Nijholt et al. 2004). The HMI group has a tradition in research into interaction with embodied conversational agents, computer graphics for virtual environments and machine learning techniques for recognition of higher level features (e.g., dialogue acts, gestures and emotions) from lower level features (e.g., words, hand movements and facial features).
The paper is organized as follows. First, we will summarize how advances in technology allow for new opportunities in supporting meetings through the use of virtual reality. Next, we present a schematic overview of the process from observation to simulation underlying our concept of the VMR. We discuss several possible uses of VMRs in Sect. 4. As an illustration of the scheme we then focus on an experiment we did with an implementation of a VMR.
2 Meetings in virtual reality
In a general sense a meeting is any coming together, willingly or unwillingly, of two or more people at such a close distance to each other that they are aware of each other’s presence and, willingly or unwillingly, react to it. The concept of distance, and with it the concept of being in the same meeting room, has been strongly shaped and is still being renewed by the development of technology in the last few centuries, in particular by developments in communication and information technology. This is really a process of conceptual development, in which the concept of sharing the same space evolves from physically sharing the same space to mentally sharing the same space, such as the mentally “shared space” in a chat system or the visually shared space of an immersive meeting room. Throughout, we identify a number of central themes: the struggle for individual privacy, respect for each other’s private space, the need to be respected by others, and the will to express oneself and one’s ideas and to realize individual goals. In a more restricted sense a meeting is an organized process of people coming together, focusing on a common topic or task. Meeting, in this sense, is one of the characteristics of the modern way we organize our work in all kinds of organizations. However professionalized and organized a meeting may be, it is still a gathering of people. All the themes that play a role in the more general sense of meeting can be identified in these meetings as well, albeit often in more organized, conventional forms, and mediated by rules of good conduct: turn-taking behavior, addressing behavior, politeness rules, and dominance relations.
The impact of technology on meetings cannot be described adequately in terms of the quantitative, measurable effects it has on properties of processes that occur in meetings in their existing forms. Technology develops the very idea of meeting itself, and it has impact on how people realize the idea of meeting. Moreover, what is essential for meetings is that technology offers new perspectives on, amongst others, communication and language, human perception and social interaction. These new perspectives may help to gain more insight into the essential qualities of these aspects of social reality. A discussion of the relation between meetings and technology in general, from the viewpoint of the three meeting concepts resources, processes and roles, is presented by Rienks et al. (2006b).
This paper focuses on three categories of VMR applications:
- A communication means for real-time remote meeting participation, in the form of an (immersive) virtual environment. Conducting remote meetings in a virtual environment allows enhanced visualizations of features in order to stress certain elements in the communicative behavior of the participants, such as direction and level of attention, or agreement and disagreement, which are often not very clear in video-based remote meetings.
- Presentation of multimedia information about meetings. Information can be directly obtained from recordings of behaviors in real meetings (e.g., tracking of head or body movements, voice), from annotations or from machine learning models that induce higher level features from recordings. 3D virtual replay of meetings allows us to have, for example, restructured and coherent summarization of a topic, even when it was discussed in a disjointed and fragmentary manner in the original meeting, while still capturing many salient (non-verbal) details.
- Research into social interaction, and recognition and interpretation of visualized information. Virtual environments allow for tight stimulus control of various independent factors (such as voice, gaze, distance, gestures and facial expressions) and can be used to study how they influence features of social interaction and social behavior. Conversely, the effect of social interaction on these factors can be studied adequately in virtual environments as well.
3 From observation to simulation in a VMR
3.1 Annotations of behavior in meetings
The first step in realizing the process described in the previous section is annotation of recordings containing human-human interactions. Within the AMI meeting project we see a huge effort in meeting data collection, meeting data annotation and dissemination of these data for various multidisciplinary research purposes inside and outside the project. About 100 h of meetings have been recorded, of which about 60% are scenario-based meetings in which four people meet four times. These meetings are part of a design project, in which the participants have to work on a prescribed task: developing a TV remote control. Participants have various roles in this scenario and, in order to approximate reality as closely as possible, external events and information are brought in that may influence the decision-making process as well as the outcome of the meetings (Post et al. 2004).
The recordings have been annotated in varying levels of detail for different dimensions (Carletta et al. 2006). There are several reasons for creating manual annotations of corpus material. In the first place, ground truth knowledge is needed in order to evaluate (new) techniques for automatic recognition of those same aspects from lower level information. In the second place, as long as the quality of the automatic recognition results is not high enough, only manual annotations provide the quality of information needed to do research on higher levels of interpretation such as human-human interaction patterns (see also Sect. 4.3). A few examples of layers that are annotated are hand and head gestures, speech transcription, communicative acts, argument structures, topics and summaries.
The annotations can be organized in layers of increasing complexity. The lowest layers describe mostly the form of the interactions, or the observable events. The higher layers describe interpretations of these observable events, giving the function of the interactions. Consider for example the situation where a participant raises a hand. The form of this gesture can be observed and annotated as “raised hand”. On an interpretation layer, this event may be annotated with the function of this gesture, such as “request for a dialogue turn” or “vote in a voting situation”.
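The distinction between a form layer and a function layer can be illustrated with a small data structure. This is only a sketch under our own assumptions: the class names, fields, participant label and timestamps below are hypothetical and do not reflect the actual annotation schema of the corpus.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FormAnnotation:
    """Lower layer: an observable event, with no interpretation attached."""
    participant: str
    start: float  # seconds from the start of the recording (hypothetical unit)
    end: float
    label: str    # e.g. "raised hand"


@dataclass
class FunctionAnnotation:
    """Higher layer: an interpretation that points back to the observed form."""
    form: FormAnnotation
    function: str  # e.g. "request for a dialogue turn"


def functions_of(form: FormAnnotation, layer: List[FunctionAnnotation]) -> List[str]:
    """Return every interpretation attached to one observed event."""
    return [a.function for a in layer if a.form is form]


# One observed gesture, interpreted on the higher layer.
raised = FormAnnotation("A", 12.4, 13.1, "raised hand")
interpretations = [FunctionAnnotation(raised, "request for a dialogue turn")]
print(functions_of(raised, interpretations))
```

Keeping the interpretation as a separate record that refers to the form, rather than overwriting it, means the same observed event can carry a different function in a different context (for example, a vote).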
Once the annotations have been produced they can be analyzed. One of the results of such analyses consists of models of human interaction on varying levels of abstraction. Lower level models might describe how people generally realize certain communicative goals (e.g., how to express the addressee of utterances (Jovanovic and Op den Akker 2004), or how to show disagreement or agreement). Higher level models might describe aspects such as what interaction patterns characterize efficient meetings.
3.2 Regeneration of behavior in meetings
Furthermore, the replay can either be a direct replay of observed behavior, or an interpreted replay, starting from a high level interpretation of what happened during the meeting. In the latter case, appropriate behavior is generated that expresses the right content but in a potentially different form. The rules for generation of communication are derived from domain knowledge (models and theories of human interaction) collected through the analysis of large amounts of data from real world examples. Examples are models for choosing modalities, realizing gestures or speech, formulating sentences, deciding on communicative goals given beliefs and intentions, choosing communicative actions based on goals, etc. Interpreted replay in its most complete form allows for restructured and coherent summarization of a topic, even when it was discussed in a disjointed and fragmentary manner in the original meeting, while still capturing many salient (non-verbal) details.
4 Uses of the VMR
In Sect. 2 it was already mentioned that this paper focuses on three categories of VMR applications: an environment for teleconferencing that provides a sense of immersion and presence; visualization of multimedia information from meetings for several purposes; and an instrument for elicitation and validation of models for social interaction.
4.1 Remote participation and enhancement of meetings
A VMR can be used as an environment for teleconferencing, as described by Greenhalgh and Benford (1995). In addition to the usual advantages of remote meeting participation, it offers control over some features that are problematic in traditional video-based conferencing (e.g., natural visualization of gaze direction cues). But there are more opportunities for influencing the remote interaction during a teleconference in a VMR.
In the first place, different meeting participants need not necessarily have the same view of the virtual environment. This simple fact introduces a lot of possibilities worth investigating. Participants can adapt the virtual environment in which the meeting takes place to their own preferences and comforts without disturbing the other people. Each person can be given his or her own perception of the seating arrangements. Since it is known that some positions are more advantageous in terms of discussion impact than others, it might be sensible to give each participant such a view of the seating that he or she never feels placed in the most disadvantageous position, so that all participants feel more comfortable during the meeting. Another way of adapting the meeting to one’s own preferences involves Transformed Social Interaction, which allows a participant to influence the way that he or she is presented remotely (Bailenson et al. 2004).
A virtual teleconferencing environment also offers the possibility to introduce autonomous agents that have the same communicative channels at their disposal as the human participants (Embodied Conversational Agents or ECAs). This gives opportunities for designing experiments to discover regularities in human social interaction, as will be described in Sect. 4.3. It also facilitates the introduction of helper agents or pro-active meeting assistants into an actual meeting (Rienks et al. 2006a). Existing work by Slater and Steed (2001) has already shown that people can be influenced in their behavior as well as in their assessment of a situation by the presence of autonomous ECAs and their behavior, even if they know that the agents are not representing a real human. Therefore, ECAs can be used, given the emergence of advanced recognition technology for human interaction, partly developed from extensively annotated corpora, to influence the course of the meeting. A simple example would be the introduction of a virtual chairman in the meeting room with a regulating task. Based on an analysis of what is going on in the meeting, the virtual chairman can influence the progress of the meeting (request a vote, encourage silent people to speak, mention gaps in the argumentation). An enhanced version of this chairman becomes possible if the recognition technology is advanced to the point where potentially tense situations can be detected automatically: The virtual chairman could try to defuse such situations by making a joke, or changing the subject of discussion. This topic has been investigated in more detail by Rienks et al. (2006a).
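As a rough illustration of one regulating task mentioned above (encouraging silent people to speak), the following sketch selects participants whose share of the total speaking time is low. It is not the assistant described by Rienks et al. (2006a); the function name, the 10% threshold and the example figures are assumptions made purely for illustration.

```python
from typing import Dict, List


def silent_participants(speaking_seconds: Dict[str, float],
                        min_share: float = 0.1) -> List[str]:
    """Return participants whose share of total speaking time is below min_share.

    min_share is a hypothetical threshold, not a value taken from the paper.
    """
    total = sum(speaking_seconds.values())
    if total == 0:
        # Nobody has spoken yet, so everybody counts as silent.
        return sorted(speaking_seconds)
    return sorted(p for p, s in speaking_seconds.items() if s / total < min_share)


# Hypothetical speaking times (in seconds) for a four-person meeting.
speaking = {"A": 300.0, "B": 250.0, "C": 20.0, "D": 230.0}
for name in silent_participants(speaking):
    print(f"Chairman: {name}, what is your view on this?")
```

A real assistant would of course need far richer input than speaking time, such as the recognized dialogue acts and argument structure discussed elsewhere in this paper.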
It will be clear that all these kinds of support build on knowledge about what types of events and behaviors in the real meeting are essential to be presented in the virtual meeting in order to maximize the quality of those impressions that are required by the user given his task and role in the meeting, such as the feeling of presence, and the possibility of mutual gaze.
4.2 Re-visualization of meetings
With a general implementation of a VMR, it is also possible to re-visualize the contents of a previously recorded meeting. This can be done literally, by trying to stay as close to the original recordings as possible, or more conceptually, by aiming for a visualization that shows an impression of the most important contents of the meeting (rather than the actual form). The re-visualization process traces a path through Fig. 1 that starts at the bottom left corner (real world/video recordings), and first goes upwards through various stages of observation and interpretation. At a certain point the transition to the right part of the model is made (in a sense “copying” the information present on one level from the left hand side to the same level on the right hand side), after which the generation flow is followed down to produce a replay of the meeting in the VMR (bottom right).
Transition at the lowest levels is already interesting. For example, replaying recognized 3D joint angles in a VMR in parallel to the original video offered a kind of quick validation of the pose recognition process, which helped spot recognition errors. If the recognition is good enough to use as input for a gesture labeling algorithm but not good enough to give convincing replay results, the transition could be made at a higher level. After interpreting the movements as labeled gestures, the replay could be created from these gesture types rather than directly from the body poses, leading to an animation that is not an exact copy of the original video but does express the meaning of the movements possibly more clearly.
Another possible level where the transitions can be made is the level of communicative actions such as contributions to or judgments about the current topic of discussion. The simulation on this level might be created using different realizations for the same communicative actions. This can be useful for applying appropriate culturally determined gestures, or to highlight aspects of the contributions in relation to social conventions. These possibilities also apply to the use of the VMR as a remote meeting facility.
The final and more complex possibility discussed here deals with summarized replay of a meeting or set of meetings. If a discussion about a certain issue is spread over fragments of several meetings, at a certain level of interpretation the main structure of the arguments can be found. By making the transition at this level, selective replay enables a new cohesive and interpreted replay of the discussions. If the models for simulating the different individual participants are accurate, the main points of the original meeting will stay intact (who proposed what, who was for or against, who used or supported which arguments, etc.), without the redundant information that was conveyed in reality. This form of simulation will deviate much from the original recordings, but the relevant content (the function) remains the same.
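The path described in this section (upwards through interpretation, across at a chosen level, downwards through generation) can be sketched as follows. The list of levels and their ordering are our own reading of the examples in this section, and the function name is hypothetical; the sketch only enumerates the steps and does not perform any recognition or generation.

```python
from typing import List

# Assumed ordering of interpretation levels, from low to high.
INTERMEDIATE_LEVELS = [
    "body poses",
    "labeled gestures",
    "communicative actions",
    "argument structure",
]


def replay_path(transition_level: str) -> List[str]:
    """List the steps from video recordings to a replay in the VMR.

    Interpretation climbs up to transition_level; generation then descends
    through the levels below it. Raises ValueError for an unknown level.
    """
    top = INTERMEDIATE_LEVELS.index(transition_level)
    path = ["observe: video recordings"]
    path += [f"interpret: {level}" for level in INTERMEDIATE_LEVELS[: top + 1]]
    path += [f"generate: {level}" for level in reversed(INTERMEDIATE_LEVELS[:top])]
    path.append("replay: VMR")
    return path


for step in replay_path("labeled gestures"):
    print(step)
```

The higher the transition level, the more generation steps lie between the interpretation and the replay, which is why a replay made at a high level can deviate strongly in form from the original recordings.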
4.3 Validation of models of social interaction
If autonomous agents are to display believable social behavior, there are many communicative aspects to be taken care of. For such aspects models are needed. Which communicative actions are desirable, in which circumstances? How does a person show whom (s)he is addressing? Does it depend on status differences? What is acceptable behavior for an ECA to show that (s)he is listening to the speaker and interested in what the speaker says? How do people exhibit and perceive signals related to relative status? Such models are also needed for effective automatic analysis of meetings for other purposes such as retrieval or meeting support. The VMR provides ways to both elicit and validate such models. The following paragraphs give a few examples of this. A few other experiments that use virtual environments for elicitation and/or validation of models of (social) interaction are given by Bailenson et al. (2001) and Pertaub et al. (2002).
4.3.1 VMR Turing test
The VMR Turing test (adapted from Bailenson et al. 2004) allows one to validate a complex set of models, testing whether they result in convincing, natural social interaction by ECAs. It works as follows: a human subject is shown a VMR containing ECAs, as well as avatars controlled by other humans. From the human avatars, all communication channels that the ECA does not have (for example facial expressions) are removed. The subject is asked to judge which participant is an ECA and which is actually the avatar of a human. For example, one can validate models of listening behavior by having the subject talk to two humanoids, of which one is an ECA and one is operated by a human. The aim is to find out whether the subject can tell which is which if neither is allowed to talk back.
4.3.2 Validating models of conversational behavior
Besides the fact that models of conversational behavior should lead to natural looking behavior, as described in the previous section, it is also important that the behavior transmits the intended conversational cues. This can also be evaluated in a VMR. For example, a possible way to validate models of addressing behavior is to have an ECA simulate a fragment of conversation, expressing the addressee of utterances in one of the many ways allowed by the model (using vocatives, gaze, etc.). A human participant, immersed in the VMR, will then be asked to assess who is the addressee of each utterance. This experiment can show whether a model of addressing behavior is good enough to use in an ECA, insofar as a human will understand its addressing cues. The same type of setup can be used to validate many more models of conversational behavior for their suitability.
4.3.3 Eye contact and intention to interact
Gaze and mutual gaze are powerful elements of human–human interaction. They play a role in many aspects of communication and communication regulation, such as turn taking, backchannelling and determining salience and information status. One of the communicative functions where gaze is an important mechanism is signaling and detecting intention to interact [see for example the work of Cary (1978) and Kendon (1990)]. This has been taken up in the work on BodyChat by Vilhjálmsson et al. (1998), where intention-to-interact is signaled using gaze in a graphical chat environment, and the work of Peters (2005), in which agents calculate the perceived level of interest from potential conversation partners based on gaze behavior, among other cues.
We intend to use the VMR to experiment with models that simulate “intention-to-interact” in interaction and coordination with user behavior and test whether these models are adequate for evoking appropriate reactions from human users. Such models can then be used to enhance the visualization of participants in a remote meeting setting in order to facilitate smooth interaction processes.
5 Experiment in the VMR: perception of head orientation
As an example of perception research in the VMR, we summarize an experiment that we performed to assess human observers’ accuracy for head orientation. There is an obvious relation between head orientation and gaze or focus of attention. Perception of gaze has been well studied. One of the first experiments is due to Gibson (1963), who measured the accuracy for observing gaze direction in dyadic situations. In these situations, a human observer has to assess where the sender is looking, relative to himself. Triadic situations are different since an observer has to report where a sender is looking, not relative to himself. This was found to be a more difficult task due to the more unfavorable position of the observer (Krüger and Hückstedt 1969). Our interest is to determine how factors such as distance and viewing angle play a role in observation accuracy. The experiment described here is a preliminary investigation to obtain an estimate of accuracy, to be used in further experiments.
Compared to using recorded settings, the use of a virtual environment differs in that our avatar representation is an abstraction of the real persons. The presented avatar might be too simplistic to reliably determine its head orientation. However, Sagiv and Bentin (2001) found that schematic faces are capable of producing similar effects to real faces. This finding is supported by Wilson et al. (2000), who found that perception of head orientation was high, even for low resolution images.
[Table: Performance scores for ball identification with different angular ball distance]
The results indicate that decreasing the angular distance between the balls increases the judgment error. One quarter of the stimuli are judged incorrectly when the angular distance is only 15°. With an angular distance of 30°, our results indicate that discrimination in this situation is possible with an accuracy of 97.92%. Analysis of the scores for individual balls revealed differences. Due to limited space, we do not discuss these differences here. The interested reader is referred to Poppe et al. (“Accuracy of head direction perception in triadic situations: experiments in a virtual environment”, in preparation).
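For readers who want to reproduce stimulus geometry of this kind, the angular distance between two targets can be computed as the angle between the two directions from the head to each target. This is a minimal sketch that assumes the angle is measured from the position of the head; the function name and coordinates are hypothetical and are not taken from the experimental setup.

```python
import math
from typing import Sequence


def angular_distance_deg(head: Sequence[float],
                         target_a: Sequence[float],
                         target_b: Sequence[float]) -> float:
    """Angle in degrees between two targets as seen from the head position."""
    va = [a - h for a, h in zip(target_a, head)]
    vb = [b - h for b, h in zip(target_b, head)]
    norm_a = math.sqrt(sum(x * x for x in va))
    norm_b = math.sqrt(sum(x * x for x in vb))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a target coincides with the head position")
    cos_angle = sum(x * y for x, y in zip(va, vb)) / (norm_a * norm_b)
    # Clamp to [-1, 1] to guard against rounding errors before acos.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.degrees(math.acos(cos_angle))


# Two hypothetical balls at equal distance from the head, 30 degrees apart.
head = (0.0, 0.0, 0.0)
ball_1 = (1.0, 0.0, 0.0)
ball_2 = (math.cos(math.radians(30)), math.sin(math.radians(30)), 0.0)
print(round(angular_distance_deg(head, ball_1, ball_2), 2))
```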
6 Conclusions and further research
The VMR may add value to the already existing technological means people have to meet and communicate. The various modalities such as speech, gaze, distance, gestures and facial expressions can be controlled, which allows VMRs to be used to improve remote meeting participation, to visualize multimedia data and as an instrument for research into social interaction in meetings. We described the process from observation through annotation to simulation and a model that describes the relations between the annotated features of verbal and non-verbal conversational behavior. This model can be used to relate various research tasks in the field of meeting research. An experiment was conducted in the VMR where we assessed human observers’ accuracy for the perception of head orientation. Use of the VMR allowed for good stimulus control and we demonstrated that we could use this virtual environment instead of video recordings. Regarding our experiment, ongoing work is focused on determining what factors play a role in the assessment of head orientation. Furthermore, we will pursue our work on meeting modeling and see how we can present real meetings in an effective way by means of a virtual representation that shows the most informative view on the meeting.
A lot of research remains to be done to see how people perceive and interpret meeting situations and how they react to them in a VMR. Results of such research are necessary to see what information channels and modalities are important to effectively perform the various tasks in a meeting. This concerns not only the transfer of task-based information, but also issues such as maintaining a good feeling of social presence by representing the appropriate communicative cues.
This work was partly supported by the European Union 6th FWP IST Integrated Project AMI (Augmented Multi-party Interaction, FP6-506811, publication AMI-187).
- Carletta JC et al. (2006) The AMI meeting corpus: a pre-announcement. In Proceedings of the MLMI’05 workshop, pp 28–39, Edinburgh. LNCS 3869, Springer. ISBN 3-540-32549-2
- Greenhalgh CM, Benford SD (1995) Virtual reality tele-conferencing: implementation and experience. In Proceedings of the fourth European conference on computer supported cooperative work, pp 163–178
- Jovanovic N, op den Akker R (2004) Towards automatic addressee identification in multi-party dialogues. In M Strube and C Sidner (eds.) Proceedings of the 5th SIGdial workshop on discourse and dialogue, pp 89–92, Association for Computational Linguistics, Cambridge, MA
- Kendon A (1990) A description of some human greetings. In Conducting interaction: patterns of behavior in focused encounters. Studies in interactional sociolinguistics, Cambridge University Press, London
- Krüger K, Hückstedt B (1969) Die Beurteilung von Blickrichtungen. Zeitschrift für experimentelle und angewandte Psychologie 16:452–472
- Nijholt A, op den Akker R, Heylen D (2004) Meetings and meeting modeling in smart surroundings. In A Nijholt and T Nishida (eds.) Social intelligence design. Proceedings third international workshop, pp 145–158, Enschede, The Netherlands. CTIT workshop proceedings series WP04-02, ISBN 90-75296-12-6
- Peters C (2005) Foundations for an agent theory of mind model for conversation initiation in virtual environments. In D Heylen and S Marcella (eds.) Proceedings of the AISB’05 symposium on virtual social agents: mind-minding agents, Hatfield, England
- Poppe R, Rienks R, Heylen D (2007) Accuracy of head orientation perception in triadic situations: experiment in a virtual environment. Perception 36, ISSN 0301-0066 (in press)
- Post WM, Cremers AHM, Blanson-Henkemans OA (2004) A research environment for meeting behavior. In A Nijholt and T Nishida (eds.) Proceedings of the 3rd workshop on social intelligence design, pp 159–165, Enschede, The Netherlands. CTIT workshop proceedings series WP04-02, ISSN 0929-0672
- Reidsma D, Rienks R, Jovanovic N (2005) Meeting modelling in the context of multimodal research. In Proceedings of the MLMI’04 workshop, pp 22–35, Martigny. Volume 3361 of LNCS, Springer. ISBN 3-540-24509-X. ISSN 0302-9743
- Rienks RJ, Nijholt A, Barthelmess P (2006a) Pro-active meeting assistants: attention please! In Proceedings of social intelligence design (SID2006), pp 213–228, Osaka, Japan
- Rienks RJ, Nijholt A, Reidsma D (2006b) Meetings and meeting support in ambient intelligence. In Th-A Vasilakos and W Pedrycz (eds.) Ambient intelligence, wireless networking, ubiquitous computing. Artech House, Norwood, ISBN 1-58053-963-7
- Slater M, Steed A (2001) Meeting people virtually: experiments in shared virtual environments. In The social life of avatars: presence and interaction in shared virtual environments, pp 146–171. Springer, London. ISBN 1-85233-461-4
- Vilhjálmsson HH, Cassell J (1998) BodyChat: autonomous communicative behaviors in avatars. In Proceedings of the second international conference on autonomous agents, pp 269–276, Minneapolis, Minnesota