A multimodal corpus of simulated consultations between a patient and multiple healthcare professionals

Language resources for studying doctor–patient interaction are rare, primarily due to the ethical issues related to recording real medical consultations. Rarer still are resources that involve more than one healthcare professional in consultation with a patient, despite many chronic conditions requiring multiple areas of expertise for effective treatment. In this paper, we present the design, construction and output of the Patient Consultation Corpus, a multimodal corpus of simulated consultations between a patient portrayed by an actor and at least two healthcare professionals with different areas of expertise. As well as the transcribed text from each consultation, the corpus contains audio and video. For each consultation, the audio consists of individual tracks for each participant, allowing for clear identification of speakers, and the video consists of two framings for each participant (upper body and face), allowing for close analysis of behaviours and gestures. Having presented the design and construction of the corpus, we then go on


Introduction
The increased prevalence of long-term health conditions (LTCs) is one of the main challenges affecting modern-day healthcare systems (World Health Organization 2010). Approximately 40% of adults have an LTC, and 25% of adults can be considered to have multi-morbidity (defined as the presence of two or more LTCs) (Barnett et al. 2012). Most modern healthcare systems are predicated on a single-disease model with a lack of collaborative working between specialities. This can result in an inefficient use of resources, can be burdensome for patients and can ultimately result in poorer provision of care (Wolff et al. 2002). Interprofessional collaborative working between medical specialities may improve clinical care and is recommended by policy makers; however, there is a lack of robust evidence to assess the effect on clinical outcomes (Reeves et al. 2017).
Consultations involving multiple healthcare professionals have a different dynamic to those involving a single professional. Firstly, one-on-one consultations already have an imbalance between the roles (expert vs. layperson); adding multiple professionals (experts) will increase this imbalance. Secondly, an additional dimension is added to the interactions, viz., that between the professionals. Thus, before any conclusions can be drawn as to the efficacy of such consultations, we must first understand the effect of these dynamics. By far the best method of gaining this understanding is to analyse audio-visual recordings of multi-professional consultations, yet such consultations rarely, if ever, happen in real life. Even if they were commonplace, there would be significant ethical and practical considerations related to their capture, as is the case with one-on-one consultations (Martin and Martin 1984).
One method of overcoming these ethical and practical issues is to use realistic rather than real consultations. Such realism is achievable through the use of healthcare simulation, a common process used in medical training, underpinned by a number of educational theories (Ker and Bradley 2010). In such an approach, the patient is portrayed by an actor playing to a specified persona and associated medical history, and the healthcare professionals behave as they would if the actor were a real patient. Similar role-playing techniques have been successfully used as a data collection tool in other sensitive contexts such as dispute mediation (Janier and Reed 2016), in which attempting to record real consultations raises similar ethical and practical questions. Similarly, role-playing is a widely used tool for the creation and collection of multimodal language resources in general, such as in Brône and Oben (2015) and Paggio and Navarretta (2017).
We present in this paper the design, construction and output of the Patient Consultation Corpus, a multimodal corpus of consultations between patients, portrayed by actors, and at least two healthcare professionals. The corpus consists of: multiple video recordings of individual participants; separate audio recordings for each participant; combined audio recordings of each consultation; and written transcripts of each consultation. We then go on to describe how the design of the Patient Consultation Corpus will allow its material to be analysed from several different perspectives.
The paper proceeds as follows: in Sect. 2 we provide more in-depth background to healthcare simulation and its use; in Sect. 3 the iterative design process for the corpus is described, including the development of patient personas and associated medical history; in Sect. 4 we outline the creation of the corpus and summarise the resultant output; in Sect. 5 we briefly describe how the design of the Patient Consultation Corpus will allow its material to be analysed from several different perspectives; and in Sect. 6 we conclude the paper and provide directions for future work.

Background: healthcare simulation
Simulation within medical practice can be considered a process, rather than a specific technology (Gaba 2004), whereby a broad range of modalities can be used to recreate real-life clinical situations. These modalities range from highly sophisticated mannequin-based simulated situations to simple verbal role play. In the past 20 years, there has been a marked increase in the use of simulation within medical training, in response to a variety of factors, including competency-based training, clinical governance and societal expectations. Simulation as a training tool is underpinned by a number of educational theories (Ker and Bradley 2010). The fidelity of any given simulation can range from low to high levels of authenticity and is reliant on either (or both) psychological and environmental factors (Faison 1954). A simulated patient is a "normal person who has been...coached to accurately portray a specific patient...in a standardised, unvarying way" (Barrows 1993). A number of studies report improvements in participants' communication skills following work with simulated patients; however, there is a lack of good evidence assessing efficacy in terms of improved patient outcomes or health economic benefits (Kaplonyi et al. 2017). Despite this, the use of simulated patients is viewed as being essential in the development of communication and consultation skills for both novice and expert healthcare professionals (Ker and Bradley 2010).
In this study, simulated patients were used in preference to real patient consultations for a number of reasons. Firstly, many patients object to being the subject of a recorded consultation, citing misgivings around confidentiality and embarrassment (Martin and Martin 1984). Secondly, it was felt that the use of multiple cameras and headset microphones would not be conducive to a "typical" consultation with a real patient, thereby limiting internal validity (Coleman 2000). Lastly, the use of a simulated patient ensured a standardised response to the consultation (in terms of each individual patient persona), ensuring a rich dataset for the purposes of annotation.
As elaborated on in Sect. 2, healthcare simulations are a common process used in medical training. When designing such a simulation event, consideration should be given to the purpose, the process and the participants (Gaba 2004; Ker and Bradley 2010). In this case, the purpose was to accurately recreate a typical consultation involving a range of patients with diabetes and one or more healthcare professionals. The process involved simulated patients being provided with a patient persona in advance of the simulated consultation. These personas contained an overview of relevant medical and social history as well as a brief description of the patient's personality traits and motivations. The personas also included a summary of current patient concerns, based around diabetes management and/or acute and chronic complications of diabetes.
The design of our four personas followed several iterations. Firstly, an expert in persona design collaborated with a medical expert to design a realistic set of personas and scenarios. These were then shared with the set of co-authors and comments were invited. This resulted in changes being made, in particular to reflect greater diversity in the backgrounds. This process was iterated several times until each member of the study was satisfied. We then trialled the personas in our pilot study on Day 0. After discussion with one of the healthcare professionals during that day, we further revised them slightly.
The healthcare professional was provided with a brief description of the patient's medical history and a number of "health goals" designed to accurately reflect the aims of a real-life clinical consultation, e.g. encourage healthy diet and weight loss. The simulated consultation was unscripted and allowed to run until reaching a natural conclusion. This lack of time constraint and the presence of more than one member of the multidisciplinary team (MDT) meant that the simulation differed from a real-life consultation. This approach was chosen to maximise data capture, in the belief that it would not have an adverse impact on overall fidelity.
The participants included the simulated patient (professional actor) and healthcare professionals chosen to reflect the MDT involved in the care of a person with diabetes (physician, general practitioner, dietician, psychologist and podiatrist). Personality traits can predict diabetes glycaemic control (Lane et al. 2000); therefore, the personas were written to encompass a range of traits including disengaged/ambivalent, anxious/neurotic, engaged/conscientious and challenging/detached, traits that are regularly encountered in clinicians' everyday practice. While the challenging/detached combination might seem contradictory at first, this represents a patient who thinks that they are quite knowledgeable about diabetes themselves (challenging) and who is therefore not open to suggestions from the healthcare professionals (detached). Other aspects that were added to the personas were occupation and social situation (family). An overview of the general characteristics for the personas can be found in Table 1. A summary of the medical characteristics can be found in Table 2.

Ethical considerations
Ethical approval to conduct the consultations was sought and obtained from the School of Science and Engineering at the University of Dundee. This included information sheets and consent forms, for both healthcare professionals and actors, to allow researchers access to the recorded data.

Corpus creation

Practicalities
All of the simulated consultations were recorded at the Clinical Skills Centre, Ninewells Hospital, Dundee. The room in which the recordings took place is equipped with cameras and microphones (in addition to those described below) which allowed the consultations to be live streamed to a second room, thus making it possible for the researchers to monitor the consultations without actually being in the room and therefore not affecting the dynamic.
On each recording day, the researchers set up the room by placing the chairs in the correct place, arranging the cameras and testing the wireless microphones. A spotlight was positioned in the room, pointing at the ceiling so as to provide the best possible lighting conditions without being intrusive. Before each consultation was recorded, the researchers and the actor discussed the persona so as to address any questions or issues the actor might have had (e.g. clarifying a biographical detail). All participants were provided with the information sheet and consent form described in Sect. 3.2, and asked to sign the latter. Participants were all debriefed at the end of the day and given the opportunity to learn more about the project.

Video
The primary aim of the video recording was to allow subsequent annotation and analysis of the participants' upper-body movements. Each participant in each consultation was recorded with two cameras: one capturing a close-up of their face, the other showing their entire upper body, including their arms and hands. An additional camera in each consultation captured a view of the entire scene. Figures 1 and 2 provide an overview of the positions of the participants and the cameras for, respectively, a three- and four-person consultation. For clarity in both diagrams, the fields of view of only one set of cameras are shown (denoted by a dashed line). Figure 3 provides a screenshot taken from the full-scene camera and shows the setup for a four-person consultation, while Figs. 4 and 5 show, respectively, the framing of the face and upper-body cameras.

Audio
Each participant was equipped with a wireless headset microphone. One audio recorder was used per two microphones, with the left and right channels carrying different participants. Post-processing split the channels into separate tracks, which were then converted into artificial stereo. As well as being retained individually, the separate tracks were combined into a single track containing all participants.
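
The post-processing steps described above can be illustrated with a minimal Python sketch. It is not the pipeline actually used for the corpus: the function names are hypothetical, samples are modelled as plain integers, and combining tracks by sample-wise averaging is an assumption, since the mixing method is not specified here.

```python
from typing import List, Tuple

Sample = int
StereoFrame = Tuple[Sample, Sample]  # (left, right), one participant per channel


def split_channels(frames: List[StereoFrame]) -> Tuple[List[Sample], List[Sample]]:
    """Split a two-channel recording into one mono track per participant."""
    left = [l for l, _ in frames]
    right = [r for _, r in frames]
    return left, right


def to_artificial_stereo(mono: List[Sample]) -> List[StereoFrame]:
    """Duplicate a mono track onto both channels ("artificial stereo")."""
    return [(s, s) for s in mono]


def mix_tracks(tracks: List[List[Sample]]) -> List[Sample]:
    """Combine mono tracks into one track.

    Assumption: tracks are averaged sample by sample, and shorter
    tracks are padded with silence (zeros).
    """
    if not tracks:
        return []
    length = max(len(t) for t in tracks)
    mixed = []
    for i in range(length):
        total = sum(t[i] if i < len(t) else 0 for t in tracks)
        mixed.append(total // len(tracks))
    return mixed
```

In practice the samples would be read from and written to audio files; the sketch only shows the order of operations: split, duplicate, then mix.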

Corpus output
The Patient Consultation Corpus consists of nine consultations recorded over three days, involving five different healthcare professionals and three different actors (playing to multiple personas). The healthcare professionals consisted of:
- A general practitioner (physician), with no particular specialisation,
- A diabetes expert, a general practitioner with a specialisation in Type 2 diabetes,
- A podiatrist, to discuss foot-related issues,
- A dietician, to discuss diet-related issues,
- A motivational interviewer, for directive, client-centred counselling.
As well as the main discussion with the patient, some consultations also include pre- and post-consultation discussion between the healthcare professionals. Some of these pre-consultations also involve an additional professional in the role of a general practitioner (GP) who has referred the patient to the specialists; the GP provides some of the patient's background then introduces them, before leaving the room when the main consultation starts.
Table 3 provides summary statistics of the consultations recorded. The word count for each consultation was obtained from the transcript; "turns" refers to the number of individual statements made, with a statement being a span of text associated with an identified speaker.
In addition to the seven consultations comprising the core corpus, a pilot study was conducted that followed the same role-playing format as the main consultations. The purpose of this study was to determine the suitability of various camera and microphone setups. As a result, the data is not as rich: for instance, only one camera was used for each participant, and a single microphone was used to record all audio. Two consultations were recorded in the pilot study, which we have included in the corpus as Supplementary Material (D0.C1 and D0.C2).

The multi-modal nature of the Patient Consultation Corpus allows its data to be analysed from a variety of different perspectives. This not only has significant value within individual research areas, but also provides opportunities to examine connections between them. Here, we briefly outline four ways in which the data in the Patient Consultation Corpus can be analysed: from the perspectives of models of structured dialogue, virtual agent design, communication intent and style, and interpersonal stance. Note that for each perspective, we do not describe a full analysis nor discuss multiple alternative approaches because our intention is only to show that the Patient Consultation Corpus can be analysed in these ways; we leave full analyses to future work.
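
As an illustration of how word and turn counts of this kind could be derived from a transcript, the following Python sketch treats a transcript as a list of (speaker, statement) pairs. The representation, the function name, the speaker labels and the whitespace-based word splitting are all assumptions; the tokenisation used for Table 3 is not specified.

```python
from collections import Counter
from typing import Dict, List, Tuple

Turn = Tuple[str, str]  # (speaker label, statement text)


def transcript_stats(turns: List[Turn]) -> Dict[str, object]:
    """Count words and turns in a transcript.

    A turn is one span of text associated with an identified speaker.
    Assumption: words are whitespace-separated tokens.
    """
    words = sum(len(text.split()) for _, text in turns)
    per_speaker = Counter(speaker for speaker, _ in turns)
    return {
        "words": words,
        "turns": len(turns),
        "turns_per_speaker": dict(per_speaker),
    }


# Invented example; not taken from the corpus.
example = [
    ("GP", "How are you feeling today?"),
    ("Patient", "Not too bad."),
    ("GP", "Good."),
]
stats = transcript_stats(example)
```

For the invented three-turn example, the function counts nine words and three turns, two of them by the speaker labelled "GP".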

Models of structured dialogue
Analysing the dialogical structure of multi-party interactions can help understand how those interactions unfold and the strategies that participants adopt in order to reach different outcomes. Even exchanges that seem relatively trivial can contain linguistic and strategic nuances that only become apparent under close analysis. By analysing the Patient Consultation Corpus in this way, we can therefore obtain insights into the ways in which individual practitioners handled patients with different personality types.
Inference Anchoring Theory (IAT) is an analytical framework which enables the structure of dialogues to be represented by extracting the illocutionary force of the locutions (Budzynska and Reed 2011). The structure in IAT is described as "the shape of the discussion" and it aims to represent how the participants' dialogical moves combine to form an argument. Encompassing Speech Act Theory (Searle 1969), IAT also allows the relationship between speech acts to be represented. Using IAT to analyse the Patient Consultation Corpus reveals the dialogical structure of the individual consultations, thus providing an understanding of the ways in which they can unfold and the strategies the healthcare practitioners adopt. Furthermore, IAT analyses can feed into the design and development of reusable models of dialogue using processes such as those proposed by Snaith and Reed (2016). Such models can subsequently be used to underpin dialogue-based healthcare support systems.
An example IAT analysis, created using the Online Visualisation of Argument (OVA+) tool (Janier et al. 2012), is shown in Fig. 6. This example shows the analysis of a small (254-word) excerpt from the Patient Consultation Corpus, chosen to illustrate the core IAT concepts. The magnified section shows the connection between the dialogical process on the right and the resultant argument on the left. In a dialogue, individual utterances are connected by dialogical transitions, while transitions and utterances are connected to the argument structure by illocutionary forces (e.g. "Asserting", "Disagreeing"). In an argument, individual statements can support, attack or rephrase each other; these are represented by rule applications (e.g. "Default Inference"), conflict applications (e.g. "Default Conflict"), and rephrase applications (e.g. "Default Rephrase") respectively.

Fig. 6 Example IAT analysis
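
To make the relationships between these node types concrete, the following Python sketch represents an IAT-style analysis as a small directed graph. It is an illustrative data structure only, not the format used by OVA+ or by the analysis in Fig. 6; the class, method and node identifiers are hypothetical, and the two utterances are invented rather than taken from the corpus.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class IATGraph:
    """Minimal directed graph for an IAT-style analysis (hypothetical sketch)."""
    nodes: Dict[str, Tuple[str, str]] = field(default_factory=dict)  # id -> (kind, text)
    edges: List[Tuple[str, str]] = field(default_factory=list)

    def add(self, node_id: str, kind: str, text: str) -> None:
        self.nodes[node_id] = (kind, text)

    def connect(self, *path: str) -> None:
        """Add directed edges between consecutive node ids in the path."""
        for src, dst in zip(path, path[1:]):
            self.edges.append((src, dst))

    def forces_anchored_in(self, node_id: str) -> List[str]:
        """Illocutionary forces directly anchored in a locution or transition."""
        return [
            self.nodes[dst][1]
            for src, dst in self.edges
            if src == node_id and self.nodes[dst][0] == "illocution"
        ]


g = IATGraph()
# Dialogical side: two invented utterances joined by a transition.
g.add("L1", "locution", "Dietician: Cutting down on sugary drinks would help.")
g.add("L2", "locution", "Patient: That would not make any difference.")
g.add("T1", "transition", "Default Transition")
# Argument side: the propositions and the conflict between them.
g.add("P1", "proposition", "Cutting down on sugary drinks would help.")
g.add("P2", "proposition", "Cutting down would not make any difference.")
g.add("C1", "conflict", "Default Conflict")
# Illocutionary forces anchor the argument in the dialogue.
g.add("Y1", "illocution", "Asserting")
g.add("Y2", "illocution", "Asserting")
g.add("Y3", "illocution", "Disagreeing")

g.connect("L1", "T1", "L2")   # utterances linked by a transition
g.connect("L1", "Y1", "P1")   # each utterance asserts a proposition
g.connect("L2", "Y2", "P2")
g.connect("T1", "Y3", "C1")   # the disagreement is anchored in the transition
g.connect("P2", "C1", "P1")   # P2 attacks P1 via a conflict application
```

Here the force "Disagreeing" is anchored in the transition between the two utterances rather than in either utterance alone, mirroring the description above of transitions being connected to the argument structure by illocutionary forces.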

Virtual agent design
There are currently several applications being developed in the Intelligent Virtual Agents research domain where virtual agents are being utilised more as a coach or an assistant than just as a tool to provide information. Researchers are working towards making these agents as human-like as possible by advancing their communicative abilities and social behaviours. Non-verbal behavioural cues such as gaze, facial expressions, gestures and body postures indicate the attitude of a given individual in any social situation (Richmond et al. 1991) and convey information about affect, mental state, personality, and other traits (Vinciarelli et al. 2009). Studies involving human-human interaction can be used to understand the role of verbal and non-verbal behaviours in conversations and to incorporate the same into virtual agents.
The MUMIN multimodal scheme allows for the annotation of multimodal communicative behaviours from the perspective of three communicative functions, namely feedback, turn management and sequencing (Allwood et al. 2007). Feedback provides information about the interactions through signals such as facial expressions; turn management regulates the interaction flow (e.g. turn gain and turn hold); and sequencing deals with the organisation of a dialogue in meaningful sequences.
To facilitate such annotations, the video recording setup in the Patient Consultation Corpus was designed to capture behavioural cues on two levels. The first is at the individual level, where we aim to capture the non-verbal cues of a single individual, such as gaze behaviour, facial expressions, head movements, hand gestures and body movement. The second is at the group level, where we aim to capture the turn-taking behaviour: how and when individuals take turns to speak or facilitate others to speak, the interpersonal attitude, and the postural congruence. These behaviours help us in understanding the relationship, interpersonal attitude and role of the individuals in the group and can facilitate modelling virtual agents to fit a specific role, e.g. we can study the non-verbal behaviours of a human doctor and model a diabetic coach to emulate their nature.

Coaching communication intent and style, and interpersonal stance of coaches
When a medical practitioner communicates something to a patient, it is important to consider not only what they communicate, but also how they communicate it, and how it comes across. Furthermore, they need to be able to adjust to changes in stance of the patient. Figure 7 shows part of the VRM, IPA, and IPC (here LR) annotation for the same excerpt analysed using Inference Anchoring Theory (Sect. 5.1). It shows annotation for the behaviour of each coach, and for each annotation scheme. For some schemes, we made separate tracks for different categories of behaviour within the models they were based on. We plan to annotate more excerpts in the near future to gain more insight into interactions between coaches and their patient.

Conclusions and future work
We have in this paper presented a multimodal corpus of consultations between patients portrayed by actors, and two or more healthcare professionals. The corpus consists of seven consultations in which two or three healthcare professionals carry out a consultation with a patient that is being portrayed by an actor playing to a specified persona. This use of healthcare simulation overcame significant ethical and practical issues that would have arisen with using real consultations. Ethically, it is difficult to record patient consultations without affecting the process of the consultation itself. Practically, consultations between a patient and multiple healthcare professionals (at the same time) are rare, but are nevertheless useful, for instance, in identifying areas of overlap between two specialisations as and when they arise.
The personas portrayed by the actors were created using an iterative design process that took into account a range of factors to ensure that the patients were as realistic as possible. These included personality traits, as well as types of complications that might be faced by patients with their specific medical condition.
We also examined different perspectives from which the corpus can be analysed, thanks to its multi-modal nature. These perspectives are: models of structured dialogue, using Inference Anchoring Theory (IAT); virtual agent design, using the MUMIN annotation scheme; and coaching communication and interpersonal stance, using Verbal Response Modes (VRM), Interaction Process Analysis (IPA), and Interpersonal Circumplex (IPC). In future work, we intend to annotate the entire corpus from the three perspectives described above, including the use of other annotation schemes for these same purposes. This will further enrich the available data, but will also act as a catalyst for identifying overlapping areas between the different schemes. Furthermore, we intend to critically evaluate the quality of the corpus by using reflections from the participants that were captured informally between sessions.

Fig. 1 Recording setup for three participants

Fig. 3 Screenshot showing the room setup

Fig. 5 Screenshot showing the framing of the upper body cameras

The audio-visual setup in the Patient Consultation Corpus allows us to make use of annotation schemes that examine: the intent behind communication (e.g. Verbal Response Modes, VRM; Stiles 1992); the form of communication (e.g. Interaction Process Analysis, IPA; Bales 1951); and the interpersonal stance of participants (e.g. the Interpersonal Circumplex, IPC; Leary 1957). The VRM annotation is concerned with what people do by saying something, and not as much the content of what they say. It tries to describe the relation of the speaker to the other in a discourse. It was made to be a general-purpose tool to classify speech acts. The IPA annotation is focused on describing the kind of behaviour and the message it conveys. It originates from annotation of conversations had during group work. Broadly speaking, this concerns the type of communication that is being used and its classification as task-related communication versus social-emotional communication. The IPC annotation is more focused on the type of personality people convey through the stance they take during discourse. It focuses on the dominance versus submissiveness shown, and the hostility versus friendliness shown. It originates from observations made in psycho-therapeutic settings.

Fig. 7 Part of the VRM, IPA, and IPC (here LR) annotations for Excerpt 1

Table 1 The general characteristics of the four personas that were used in the recordings

Table 2 The medical characteristics of the four personas that were used in the recordings
Weight: on the heavy side, but not overweight.
Medication: required insulin dose has decreased and might not be needed in the future.
Diabetes: main goal is to balance blood glucose levels and lose weight, which is difficult to combine with social life. Worried about getting a hypo, so blood glucose tends to be high (12 mmol/l average for past 14 days). HbA1c is 75 mmol/mol.
Other