1 Introduction

In speech-based human-human interaction (HHI), semantic and prosodic cues effectively communicate certain dialog functions such as attention, understanding, or other attitudinal reactions of a speaker [1]. Furthermore, it is assumed that these short feedback signals are uttered in situations of higher cognitive load [4], where a more articulated answer cannot be given. Among these cues, discourse particles (DPs) such as “hm”, “uh”, and “uhm” have recently gained increased attention [2, 12], also in human-computer interaction (HCI) [25]. We distinguish two types of DPs: “hm” is used to provide the listener’s feedback, whereas “uh” and “uhm” are mostly used as filled pauses. Thus, we will briefly denote “hm” as feedback (signal) and “uhm” as well as “uh” as fillers. In HHI, the interaction partner is able to understand the DPs and uses them as well, whereas in HCI the technical system as dialog partner is mostly able neither to understand these cues nor to use them. Positive exceptions are presented, for instance, in [28]. Nevertheless, DPs are verifiably used in both HHI and HCI [22, 27], and specific form-function relations could be confirmed [15, 17]. In addition, influences of specific subject characteristics and certain personality traits on the use of prosodic cues, such as DPs, were uncovered [9, 24].

Enabling technical systems to understand and use these particles will help to detect crucial points within the interaction. A previous study that we rely on analyzed three different interactions [6]. The considered corpora comprise HHI, indirect HHI, where the subject knew that a wizard simulated the system, and HCI, but with different subjects. A functional analysis shows that even if DPs are used in structurally similar contexts as in HHI, they do not always serve the same purposes. For instance, feedback signals are often not directed to the hearer and are therefore unlikely to display perception and understanding [6]. The authors of [7] concluded that the use of partner-oriented signals (e.g. feedback signals) decreases while the number of signals indicating a talk-organizing, task-oriented, or expressive function (e.g. fillers) increases.

For this paper, we were in the advantageous situation to perform our investigations on a dataset providing both types of interaction, HHI and HCI. Thus, we could investigate which similarities in the use of DPs can be observed between these two different types of interaction. Furthermore, different styles of dialogs are present in HCI. As we also have additional knowledge about the subject’s age, biological gender, and personality traits, we incorporated this information as additional factors. In this paper, the focus is on identifying similarities and influencing factors on the use of DPs in HHI and HCI.

The remainder of the paper is structured as follows. Sect. 2 briefly describes the methods utilized in the present investigation. Then, Sect. 3 describes the utilized dataset providing HHI as well as HCI. Section 4 presents and discusses the results of our investigation on the use of DPs in HHI and HCI. Finally, Sect. 5 concludes the paper and provides an outlook on further research directions.

2 Methods

As DPs are verbalized, we took into account the number of tokens uttered by the subjects. These tokens are words and vocalizations. This measure is denoted as the number of verbalized tokens (\(\#\mathrm {Token}\)). It varies from subject to subject and influences the use of DPs. Therefore, we used the normalized DPs frequency (\(\Vert \mathrm {DP}\Vert \)): the subject’s number of uttered DPs is divided by the subject’s total number of uttered tokens during the (considered part of the) HCI or HHI interaction and multiplied by 100:

$$\Vert \mathrm {DP}\Vert = \frac{\#\mathrm {DPs}}{\#\mathrm {Token}}\cdot 100 \tag{1}$$
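The normalization in Eq. 1 can be illustrated with a minimal Python sketch. The DP inventory is taken from the particles named in the introduction; the function name `normalized_dp_frequency` and the example token list are hypothetical and only serve the illustration, they are not part of the corpus tooling.

```python
# Assumed DP inventory: the three particles named in the introduction.
DISCOURSE_PARTICLES = {"hm", "uh", "uhm"}


def normalized_dp_frequency(tokens):
    """Return ||DP|| = #DPs / #Token * 100 for one subject's token list."""
    if not tokens:
        raise ValueError("a subject needs at least one verbalized token")
    # Count every token that is one of the assumed discourse particles.
    dp_count = sum(1 for token in tokens if token.lower() in DISCOURSE_PARTICLES)
    return dp_count / len(tokens) * 100


# Hypothetical utterance of one subject (words and vocalizations).
example_tokens = ["uhm", "I", "would", "pack", "a", "jacket", "hm", "and", "two", "tops"]
example_frequency = normalized_dp_frequency(example_tokens)
```

Dividing by the subject's own token count makes talkative and quiet subjects comparable, which is the reason given above for normalizing.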

Afterwards, we averaged over all considered subjects and calculated mean and standard deviation. To analyze the impact of the different factors (user characteristics such as sociobiographic variables and personality traits, or the communication partner), we used a median split to obtain two groups of subjects. We then compared the two groups with the Mann-Whitney U-test [16], a non-parametric alternative to a two-group ANOVA.
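The median split and the subsequent group comparison can be sketched as follows. This is an illustrative sketch under stated assumptions, not the analysis code of the study: the function names and the example scores are hypothetical, subjects at the median are assigned to the high group, and the p-value uses the plain normal approximation without tie or continuity correction. For a real analysis, a library routine such as SciPy's `scipy.stats.mannwhitneyu` would be preferable.

```python
import math
from statistics import median


def median_split(factor_scores):
    """Split subject ids into (low, high); 'high' means at or above the median."""
    cut = median(factor_scores.values())
    low = [s for s, v in factor_scores.items() if v < cut]
    high = [s for s, v in factor_scores.items() if v >= cut]
    return low, high


def mann_whitney_u(sample_a, sample_b):
    """Return (U, two-sided p) using the normal approximation (assumption)."""
    pooled = sorted(list(sample_a) + list(sample_b))
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # Tied values share the mean of the ranks i+1 .. j.
        ranks[pooled[i]] = (i + 1 + j) / 2
        i = j
    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = sum(ranks[v] for v in sample_a)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u = min(u_a, n_a * n_b - u_a)
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u, p_two_sided


# Hypothetical trait scores and normalized DP frequencies per subject.
trait_scores = {"s1": 12, "s2": 18, "s3": 9, "s4": 21}
dp_frequency = {"s1": 3.1, "s2": 1.2, "s3": 2.8, "s4": 0.9}
low, high = median_split(trait_scores)
u_stat, p_value = mann_whitney_u(
    [dp_frequency[s] for s in low], [dp_frequency[s] for s in high]
)
```

With groups as small as in this toy example the normal approximation is unreliable; it is used here only to keep the sketch free of external dependencies.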

3 Dataset

For our study we utilize the LAST MINUTE Corpus (LMC) [20]. It contains 130 high-quality multi-modal recordings of German-speaking subjects during Wizard of Oz (WOZ) experiments. This part is referred to as the HCI-part. It has already been the object of examination regarding affective state recognition [8, 26] and linguistic turns [21]. Furthermore, 73 of these subjects underwent a semi-structured interview, which followed the HCI-part experiment. This part is referred to as the HHI-part.

The corpus was recorded with several opposing speaker groups, young (y) vs. elderly (e) speakers and male (m) vs. female (f) speakers. The young group was represented by subjects being 18 to 28 years old. The elderly group consists of subjects older than 60 years. The combination of age and gender led to four sub-groups (ym, em, yw, and ew). Additionally, several questionnaires had to be answered by the subjects after participating in the experiment, regarding personality traits (NEO-FFI [5], IIP [10]) and further psychological user characteristics including technical affinity (TA-EG [3]), attributional style (ASF-E [18]) and stress coping behavior (SVF [11]).

3.1 LMC’s HCI-parts – Personalization and Problem Solving

The setup of the HCI part revolves around an imaginary journey to the unknown place Waiuku. With the help of an adaptable technical system, the subjects have to prepare the journey by packing the suitcase and selecting clothing and other equipment using voice commands. Each experiment takes about 30 min. All experiments are transcribed according to the GAT-2 minimal standard [23], enabling the automatic extraction of speaker utterances. For a subset of 90 subjects all DPs are annotated. Furthermore, each HCI-part is divided into two modules with two different dialog styles [19].

The personalization module, being the first part of the experiment, has the purpose of making the subject familiar with the system and ensuring a more natural behavior. In this module the subject is asked a set of questions focused on personal details and on recent events which made them happy or angry.

During the problem solving module the subject is expected to pack the suitcase from several depicted categories, for instance “Tops” or “Jackets & Coats”. The dialog follows a specific structure of specific subject-action and system-confirmation dialogs. This conversation is task focused. This part of the experiment has a much more command-like regularized dialog style.

3.2 LMC’s HHI-part – the Interview

After the WOz experiment, nearly half of the subjects took part in a semi-structured interview. Therein, they were asked to describe their individual experience of the experimental interaction and the simulated system [13, 14]. The interview focused in particular on the subjects’ emotions occurring during the interaction, the subjects’ subjective ascriptions to the system and the subjects’ overall evaluation of the system.

The interview questions were formulated in a way that enabled the subjects to speak freely, in order to obtain narrations that allow examining the individual subject’s experiences using methods from qualitative social research. Hence, the subjects’ share of speech exceeded that of the interviewer. In order to ensure a naturalistic, comfortable, open and friendly dialog atmosphere, the interviewer gave feedback in terms of nodding and DPs, as well as further queries throughout the whole interview (no strict feedback policy). To extract all DPs in the interviews, a manual transcription and annotation has been performed. As this is quite time-consuming, not all interviews were transcribed completely and thus not all interviews could be used further. For a subset of 64 subjects the interviews are transcribed and all DPs are annotated.

3.3 Utilized Subset

A subset of 44 subjects, having transcribed interviews, transcribed experiment recordings and DPs-annotations for both interactions, was used for this study. This subset of the LMC has a total duration of approx. 30 hours of HCI data and approx. 35 hours of interview data. The age and biological gender distribution is nearly balanced, see Table 1.

Table 1. Distribution of speaker groups in the utilized subset of the LMC.

4 Results

At first, we analyzed the different use of DPs in HHI and HCI. Afterwards, we tried to explain these differences by examining different influencing factors.

4.1 Similarities Between HHI and HCI in the Use of DPs

To analyze differences between HHI and HCI the frequency of DPs-occurrence is of major interest. This aspect is depicted in Fig. 1. As noticeable differences in the various speaker groups were observable, the normalized DPs frequency (cf. Eq. 1) is depicted for all groups of speakers.

The interviewer’s mean DPs frequency is 0.088, with a standard deviation of 0.043. This is much higher than the observed subjects’ DPs frequency and can be explained by the fact that the interviewer’s aim is to keep the subject talking and to show interest and understanding in order to achieve a comfortable, open communication atmosphere. Thus, mostly short feedback signals, such as DPs, were used.

Fig. 1. Mean and standard deviation for the DPs regarding different speaker groups for the HHI and both HCI parts. The stars denote the significance level: * (\(p<0.05\)), \(\star \) denotes close proximity to significance level.

Regarding Fig. 1, it can be seen that a differentiation into speaker groups has to be used for HCI data, because the specific speaker group differences would otherwise be averaged out. Within each speaker group, the DPs frequency of the interview is quite similar to that of the personalization module. Although male subjects have a slightly higher DPs frequency within the interviews and female subjects’ DPs frequency is slightly higher within the personalization module, none of these differences is significant. In comparison with the problem solving module, however, there are remarkable differences. Male and young subjects used more DPs during the interview and the personalization, while female and elderly speakers used more DPs within the problem solving. For male, female and young subjects the difference between interview and problem solving module is close to significance (p < 0.07). Significant differences (p < 0.05) between interview and problem solving can be observed for elderly male subjects and elderly female subjects. Further significant differences are prevented by the high standard deviation of our samples.

Besides the average DPs frequency, the ratio between fillers and feedback signals is also of interest, as this analysis reveals further information on the behavior of the subjects in interactions with and without a human conversation partner. In Fig. 2 it can be observed that in the personalization module fillers are used significantly more often (p < 0.001) than feedback signals. In problem solving, feedback signals and fillers are used nearly equally often. Analyzing the HHI part, we observed a very different use of DPs: feedback signals occur nearly twice as often as fillers, which is highly significant (p < 0.005).

Fig. 2. Mean and standard deviation of fillers and feedback signals for the HHI part and both HCI parts. The stars denote the significance level: ** (\(p < 0.005\)) and *** (\(p < 0.001\)).
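The split into fillers and feedback signals follows directly from the definition given in the introduction (“hm” as feedback signal, “uh” and “uhm” as fillers) and can be sketched as below. The function name `dp_type_counts` and the example tokens are hypothetical; the actual annotation of the corpus may distinguish further variants that this sketch ignores.

```python
from collections import Counter

# Mapping taken from the definition in the introduction.
DP_TYPES = {"hm": "feedback", "uh": "filler", "uhm": "filler"}


def dp_type_counts(tokens):
    """Count feedback signals and fillers in one subject's token list."""
    counts = Counter({"feedback": 0, "filler": 0})
    for token in tokens:
        dp_type = DP_TYPES.get(token.lower())
        if dp_type is not None:
            counts[dp_type] += 1
    return counts


# Hypothetical interview snippet of one subject.
snippet = ["hm", "yes", "uh", "that", "was", "uhm", "strange", "hm", "hm"]
snippet_counts = dp_type_counts(snippet)
```

Per-subject counts of this kind can then be normalized by the subject's token count, as in Eq. 1, before the two DP types are compared.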

The frequent use of feedback signals in the interviews is to be expected, as feedback signals are used to minimally communicate certain speaker and dialog states and the interviewer, being present, is able to understand them. The lower use of feedback signals in the personalization module, although it is of a similar dialog style as the interview, indicates an influence of the absent human conversation partner. However, the higher use of feedback signals within the problem solving module in comparison to the personalization module indicates that the use of feedback signals also depends on additional factors, for instance in this case a challenging dialog (cf. Sect. 4.3). The distribution of the different DPs-types across the speaker groups is relatively balanced in the interview as well as in the two HCI-modules personalization and problem solving. Thus, group-specific ratios of feedback signals and fillers are not depicted.

4.2 Investigating Different Influencing Factors

As can be seen in Fig. 1, the standard deviation for all speaker groups is quite high. Thus, additional factors seem to influence the use of the DPs. We investigated the subjects’ individual attitude in using DPs and their psychological user characteristics.

Influence of the Subjects’ Individual Dialog Attitude. To analyze the individual dialog attitude, we distinguished two groups, low scorers and high scorers. Both groups are obtained by a median split regarding the normalized DPs occurrences on the particular parts. Low scorers have an individual normalized DPs frequency below the median and high scorers have an individual normalized DPs frequency at or above the median. For all interaction styles (interview, personalization, problem solving), the number of subjects in the two groups is nearly balanced.

Fig. 3. Mean and standard deviation of low and high scorers for the HHI part and both HCI parts.

The mean and standard deviation for all groups regarding the personalization and problem solving modules as well as the HHI part are depicted in Fig. 3. As the distribution of low and high scorers across the speaker groups according to age and gender is equal in the interview as well as in the two HCI-modules, it is not depicted. The correspondence of subjects between the low and high scorer groups of the personalization and the problem solving module is 79.55 %, and between both HCI modules and the interview the correspondence is 80.65 %. The difference in DPs use between low and high scorers is quite large. This substantiates the grouping into low and high scorers, as these groups produce verifiably different normalized DPs distributions. The subjects’ individual dialog attitude remains the same for all interaction styles. High scorers have very similar DPs frequencies. Low scorers use fewer DPs during problem solving than in the interview and personalization, but this difference is not significant.
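How the correspondence between two groupings was computed is not spelled out in the text. One plausible reading, assumed here, is the share of subjects who receive the same low/high label in both interaction parts. The sketch below implements this reading; the function name `group_correspondence` and the example labels are hypothetical.

```python
def group_correspondence(labels_a, labels_b):
    """Percentage of shared subjects with the same low/high label in both parts.

    Assumption: correspondence is the simple agreement rate over all subjects
    that appear in both label dictionaries.
    """
    shared = labels_a.keys() & labels_b.keys()
    if not shared:
        raise ValueError("no subject appears in both parts")
    same = sum(1 for subject in shared if labels_a[subject] == labels_b[subject])
    return same / len(shared) * 100


# Hypothetical labels for four subjects in two interaction parts.
personalization = {"s1": "low", "s2": "high", "s3": "low", "s4": "high"}
problem_solving = {"s1": "low", "s2": "high", "s3": "high", "s4": "high"}
agreement = group_correspondence(personalization, problem_solving)
```

Under this reading, a value of 79.55 % would correspond to 35 of 44 subjects keeping their label, since 35 / 44 · 100 ≈ 79.55.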

Influence of Personality Traits. To investigate the influence of different personality traits, we differentiated between subjects with traits below the median (low trait) and those at or above the median (high trait). We selected certain personality traits, where in previous studies a relationship with the use of DPs could be verified [24]:

  • SVF positive strategies (SVF pos)

  • SVF negative strategies (SVF neg)

  • IIP vindictive competing (IIP vin)

  • NEO-FFI Agreeableness (NEO agr)

Fig. 4. Mean and standard deviation for the DPs divided into the two dialog styles regarding different groups of personality traits. The stars denote the significance level: * (\(p<0.05\)), \(\star \) denotes close proximity to significance level.

The results are depicted in Fig. 4. Again, distinguishing according to age and gender of the subjects does not show significant differences and thus these results are not depicted. Analyzing the influence of personality traits on the use of DPs in the interview data, only subjects having an SVF pos trait below the median use significantly more DPs than subjects above the median (p < 0.05). This observation shows that subjects having lower skills in stress management with regard to positive distraction use substantially more DPs. For all other traits, no significant difference and thus no influence is detectable. The normalized DPs frequency is nearly equal for SVF neg, IIP vin and NEO agr.

Considering the two HCI parts, no significant differences are noticeable for personalization. For the problem solving module, the differences for SVF neg and NEO agr are only close to the significance level (p < 0.07). This is due to the fact that, on the one hand, we compare very few users within a very heterogeneous sample. On the other hand, the influence of psychological characteristics heavily depends on the situation in which the subject is located. The distinction between an open dialog and a command-like dialog may not be sufficient to describe the situation. Especially in the command-like problem solving module, very different situations are induced by the experimental design, which also produce partly contradictory user reactions. This will be analyzed in the next section.

4.3 Subject Behavior After Dialog Barriers

As discussed in Sect. 4.2, the distinction between an open dialog (personalization) and a command-like dialog (problem solving) may not be sufficient to describe the subject’s situation. Thus, we investigated the problem solving module in detail. Within this module, pre-defined barriers occur for all users at specific time points [8]. These barriers are intended to interrupt the dialog flow of the interaction and provoke significant dialog events. Although the Baseline (BSL) does not represent a barrier, it serves as an “interaction baseline” from which the other barriers are distinguished. At the Weight Limit Barrier (WLB), the system refuses to pack further items, since the airline’s weight limit for the suitcase is reached. Thus, the subject has to unpack things.

Fig. 5. Mean and standard deviation of fillers and feedback signals for the BSL and WLB dialog parts.

First, we investigated the ratio between fillers and feedback signals. As shown in Fig. 2, subjects use feedback signals and fillers nearly equally in the problem solving module. The same behavior can be seen in Fig. 5. The only difference is that the use of fillers and feedback signals is remarkably lower during BSL than after the WLB (Fig. 6).

Fig. 6. Mean and standard deviation of low and high scorers for the HHI part and both HCI parts. The stars denote the significance level: * (\(p<0.05\)).

Afterwards, we analyzed which influence the dialog situation has on the individual dialog attitude. Again, we distinguished two groups, low scorers and high scorers. Both groups are obtained by a median split regarding the normalized DPs occurrences on both dialog parts. The correspondence of subjects along the low and high scorers for BSL and WLB is 85.31 % and 81.34 % against the two HCI modules. Verifiably different normalized DPs distributions are produced by low and high scorers. It can be seen that low scorers do not differ much between the BSL and WLB dialog parts. But high scorers use significantly more DPs during the WLB dialog part than within the BSL part (p < 0.05).

Fig. 7. Mean and standard deviation for the DPs of the two barriers regarding different groups of user characteristics. The stars denote the significance level: * (\(p<0.05\)), \(\star \) denotes close proximity to significance level.

Finally, we analyzed the influence of personality traits on the use of DPs after the BSL and WLB events. From Fig. 7, the following conclusions can be drawn. Subjects having better skills in stress management with regard to positive distraction use substantially fewer DPs. The finding on SVF negative strategies (SVF neg) confirms this statement: subjects without good stress management, or even with negative stress management mechanisms, use more DPs. Subjects who use DPs more frequently are also more likely to have problems in trusting others or are suspicious and rather quarrelsome towards others (IIP vin). The interpretation of the NEO-FFI traits also confirms the IIP findings. Subjects using fewer DPs show less confidence in dealing with other people, which is determined by the factor agreeableness (NEO agr).

5 Outlook

Within this paper, we could show similarities and differences in the use of certain DPs within HHI and HCI by analyzing the same group of subjects. Therefore, we used the LMC containing both types of interactions. Furthermore, we distinguished two types of interaction within HCI (personalization and problem solving) and considered interrupted dialog flows after pre-defined barriers.

First of all, we observed similarities in the occurrence of DPs when similar dialog styles were used, but the type of DPs varied. In general, the number of DPs in the interview (HHI) and in the personalization module (HCI), two similar dialog styles, is roughly equal. But the ratio of the two types, fillers and feedback signals, is significantly different. This could be traced back to the absence of a proper conversation partner within the HCI. This result is in line with the findings of [7] that talk-organizing DPs increase in HCI in comparison to HHI.

Another similarity is the constant influence of low and high scorers. Subjects using only few DPs within HHI are also using few DPs during the HCI, with a similar frequency. Thus, the absence of a conversation partner does not lead per se to an increase or decrease of DPs.

Afterwards, we analyzed to what extent personality traits influence the use of DPs. In the interview only an influence of SVF positive strategies was shown. To analyze the influence of personality traits in HCI, we had to take into account the dialog barriers. In this case, a significant difference in the use of DPs for SVF positive strategies, SVF negative strategies, IIP vindictive competing, and NEO-FFI Agreeableness was shown. The use of DPs is mainly stimulated by “negative” psychological characteristics. Bad stress regulation capabilities will cause the use of DPs in situations of higher cognitive load. This supports the assumption that DPs are an important pattern to detect situations of higher cognitive load [4].

By analyzing the subjects’ use of DPs after the dialog barriers, we showed that both fillers and feedback signals are used more often in situations of higher cognitive load, although the conversation partner is not able to understand these cues. One possible explanation is that the subjects ascribe this ability to the system because they unconsciously attribute humanlike characteristics and mental states to it [13]. Another insight could be drawn from the comparison of the behavior of low and high scorers after the investigated barriers. High scorers used significantly more DPs during the distorted dialog than during a non-distorted dialog. This again shows the influence of the personality traits, as the frequent use of DPs is an indicator of bad stress regulation capabilities, which will lead to an increased use of DPs within the distorted dialog.

Our results are in line with the findings of [7] showing that the use of feedback signals and fillers is different between HHI and HCI. But we could also show that certain dialog situations influence the use of DPs. In the future, a deeper analysis of the functional meaning of the uttered DPs, although not directed to the artificial conversation partner, will be performed. Together with the knowledge that subjects in situations of higher cognitive load tend to use more DPs, this enables future technical systems to examine long-lasting natural interactions and dialogs and to identify critical situations.