1 Introduction

Weizenbaum’s computer program ELIZA [1], which represented a homage to person-centered psychotherapy according to Carl Rogers [2], was assumed by its users, after only a short time, to possess background knowledge and the capacity for logical reasoning, and it left them feeling understood. Users’ reactions revealed the human tendency to experience even simple computer systems as empathic and trustworthy after only marginal interaction. Nass and colleagues [3, 4] came to similar conclusions and pointed out that humans carry behavior patterns from face-to-face contact over into human-computer interaction. Many users interact with their computer systems as if they had human motives and intentions. With respect to individuality, it can be assumed that every single user experiences the same system in a different way. Therefore, people will show diversity in their interaction behavior. Those differences might be especially observable during challenging situations, at the moment of change from unobstructed interaction to interaction experienced as challenging or requiring the person to solve a task.

Accordingly, the present paper examines changes during an interaction with a simulated computer system controlled by natural language. The focus is on the impact of user characteristics on interaction behavior when faced with a challenging situation.

2 Problem Solving and User Characteristics

Before classifying users in their contact and behavior with technical devices, it is first necessary to detect suitable user characteristics. However, there is hardly any research in this field meeting empirical standards [5], as most surveys center on usage behavior. Studies conducted this way allow for only limited conclusions to be drawn as they do not directly evaluate the interaction process with computer systems. They usually use questionnaires where the participants have to rate factors such as interaction behavior, attitude towards technology or frequency of technology usage.

2.1 User Characteristics in Human-Computer Interaction

The aforementioned statements are not meant to imply a complete lack of consideration of this topic in the literature. The most frequently surveyed user characteristics regarding actual behavior are personality, age, gender, technological experience and technological affinity.

The impact of personality theories has been considered since the early days of the field, mainly with regard to the dimensions of the Big Five personality traits [6]. Currently they are used in the technology acceptance model (TAM) with focus on the dimension of extraversion [7, 8].

For various reasons, it is not possible to make a clear statement on the impact of age and gender as user characteristics. There are numerous studies examining the effect of age, but these are difficult to compare. Most of them focus only on one age cohort, and comparisons between cohorts are rare. For older users, it is not only age that has an effect on the use of computer systems, but also factors like socio-economic status, educational level, extent of computer anxiety and interest in computer usage [9].

Findings concerning gender are inconclusive and seem to change over time. For example, Howard [10] found that females showed a higher level of computer anxiety compared to males, whereas King and colleagues [11] identified the reverse effect. Likely on the basis of such results, some authors argue that gender has no effect on human-computer interaction [12]. However, it seems indisputable that females tend to have significantly less self-confidence regarding the usage of computer systems compared to males [13]. This can be explained by gender differences in attributional styles [14]. Females tend to see failures or problems in computer usage or interaction as their own fault, which is known as internal attribution, whereas males tend to blame someone or something else for their failures and problems (external attribution) [13].

Technological experience is closely associated with technological affinity. According to Hassenzahl [15], the motivational aspect is fundamental when examining the extent of computer experience, which should be considered when looking at user patterns and usage models [16]. Furthermore, correlations exist between computer experience and attitudes towards technology in general [17] as well as self-efficacy [18].

2.2 Findings Regarding User Characteristics and Problem Solving

According to Dörner [19], a problem arises when a barrier prevents a subject from (1) experiencing its present state as satisfying and (2) attaining a desired goal state. Here, a distinction needs to be made between simple and complex problems. A complex problem is characterized by the need for reduction to the essential interdependence of involved variables, situational change over time and a lack of transparency [20].

The ways in which humans cope with those complications are determined primarily by their personality characteristics. There are clear connections between the Big Five personality traits, coping with stress and handling situations experienced as challenging. People with high levels of openness and extraversion are less stress sensitive [21, 22]. Optimal performance is achieved at lower stress levels for introverts and higher stress levels for extraverts [22]. Higher levels of neuroticism increase perceived stress, which may lead to depressive withdrawal, melancholy and a reduction of self-esteem [21, 23]. With regard to conscientiousness, there is no explicit evidence for a stress-reducing effect.

Furthermore, the influence of age on performance should be considered. There is evidence that older people with higher domain knowledge tend to reach their limits of effectiveness faster than younger people with lower domain knowledge [24].

For gender, the aforementioned attributional styles still apply. It can be supposed that females use internal attribution with regard to problems or failures during an interaction with a computer system. According to Abramson and colleagues [25], this attributional style is the basis for the theory of learned helplessness. This means that people who tend to attribute their failures to internal, global and stable causes are more likely to feel helpless in challenging situations.

3 Methods

The current study addresses the effect of user variables on interaction behavior with a simulated, speech-controlled and automated computer system. More concretely, we investigated the impact of situations experienced as challenging on interaction behavior.

After examining the aforementioned findings from previous research, we generated and examined the following research questions: What impact do stressful situations, hereafter referred to as challenge situations, have on participants’ task performance while interacting with a computer system? Do correlations exist between user characteristics and performance?

3.1 Sample

We recruited participants aged between 18 and 29 as well as participants over 60 years old. We aimed for an equal distribution of age, gender and educational level. Distinctions were made between participants with a “higher educational level” (general matriculation standard, studies at a university or a university of applied sciences) and a “lower educational level” (secondary school or secondary modern school certificate, apprenticeship as highest educational/occupational qualification). Altogether we gathered 130 participants, one of whom could not be properly assigned to a level of education (Table 1).

Table 1. Sample distribution regarding age groups and gender

3.2 Wizard of Oz Experiment

We developed a Wizard of Oz (WoZ) experiment which suggested to participants that they were interacting with an autonomous and automated computer system. The simulation was controlled by operators who worked in a separate room. A brief explanation of the experimental setting follows [27, 28].

An experimental supervisor informed participants that they would be interacting with a computer system via speech input and output. At the beginning, the system asked for personal information to allow participants to become familiar with speech control and the operating mode. Subsequently, the system presented the story tasks and the restrictions to be observed. Participants had to pack their luggage for a trip with the help of the system under certain time restrictions. In doing so, they were able to choose items out of 12 categories (jackets, tops, trousers, shoes, etc.). The instructions given by the system encouraged participants to imagine that they needed luggage for a summer vacation. At this point, all restrictions were transparent and the task proceeded without limitations. This stage is called baseline (BSL). After selecting items from the eighth category, participants were confronted, without prior notice, with a limitation that required them not only to add but also to remove items. Participants received assistance from the system on how to unpack items. This stage is called the weight limit barrier (WLB). Two categories later, participants had to handle another challenge situation, called the weather information barrier (WIB): they were informed that this would be a winter vacation rather than a summer one as previously assumed. They needed to adapt their strategy to the new conditions under time restrictions that were not clearly defined. At this point, a randomly selected portion of our sample received an affect-oriented intervention. This empathic intervention was based on general factors of psychotherapy (resource activation, problem actualization, accomplishment and clarification) [29]. Previous studies have already shown that interventions given by computer systems can influence the interaction process [30, 31]. The remaining participants proceeded without any further intervention.
At the end, all participants were given the opportunity to change some items in their luggage (revision stage, RES). With reference to Funke [20], we can differentiate between a simple set of problems at WLB and a complex set of problems at WIB and RES.

Dialog success: Besides satisfaction, performance can be seen as the most important part of user experience [32] and is defined as follows: Performance “includes measuring the degree to which users can accomplish a task or set of tasks successfully. Many measures related to the performance of these tasks are also important, including the time it takes to perform each task, the amount of effort to perform each (such as number of mouse clicks or amount of cognitive effort), the number of error committed…” [32, p. 44]. Thus, performance can be evaluated without having to rely on the subjective appraisal of users or test supervisors, allowing it to be termed an “objective goal variable”. Because we used a speech-based control system, we were able to measure changes in interaction behavior at different experimental stages, which provides information about users’ performance. The performance dimension “dialog success” describes participants’ efforts to adapt to the altered conditions (challenge situations).

In terms of the technical implementation, participants’ interaction behavior first had to be operationalized, which was done using so-called “logs” [33]. During the experiment, all speech outputs of the computer system, including their exact times, were logged. Afterwards, those outputs were identified which represented a reaction to participants’ interaction contributions (e.g. phrases, single words or longer silence). This allowed for a categorization of participants’ interaction contributions without regard to the contents of transcripts, using two categories: “system is able to process contribution” (positive logs) and “system is not able to process contribution” (negative logs). Negative logs are characterized by synonym failures (e.g. the participant said stockings instead of socks) and by all utterances outside of a clearly defined domain [27]. The positive logs represent all participant contributions that could be processed by the system.

Using the experimental values, it is possible to generate a “log quotient” (\( \mathrm{Quotient}_{\mathrm{logs}} = \frac{N_{\text{positive logs}}}{N_{\text{all logs}}} \)) for each participant. The log quotient permits an intra- and inter-individual comparison of different time stages during the experiment. A high value indicates that a participant was more successful in adapting to the conditions.
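The computation of the log quotient can be illustrated with a minimal Python sketch. It assumes a hypothetical record layout in which each logged contribution is a pair of stage label and a flag stating whether the system could process it; this layout, the function name `log_quotients` and the example values are illustrative assumptions and are not taken from the study.

```python
from collections import defaultdict


def log_quotients(logs):
    """Return the share of positive logs per experimental stage.

    `logs` is an iterable of (stage, processed) pairs, where `processed`
    is True for a positive log and False for a negative log.
    This record layout is an assumption made for illustration only.
    """
    positive = defaultdict(int)
    total = defaultdict(int)
    for stage, processed in logs:
        total[stage] += 1
        if processed:
            positive[stage] += 1
    # Quotient = N_positive_logs / N_all_logs, computed per stage.
    return {stage: positive[stage] / total[stage] for stage in total}


# Hypothetical logs for one participant (not data from the study).
example_logs = [
    ("BSL", True), ("BSL", True), ("BSL", True), ("BSL", False),
    ("WLB", True), ("WLB", False),
]
quotients = log_quotients(example_logs)
print(quotients)  # {'BSL': 0.75, 'WLB': 0.5}
```

In this invented example, the quotient drops from 0.75 at BSL to 0.5 at WLB, which would correspond to a lower dialog success after the first barrier.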

3.3 Psychometric Questionnaires

Before the last-minute experiment, we collected data regarding socio-biographic variables and aspects of experience with and usage of technical devices. We also conducted a system evaluation upon experiment completion. At a separate appointment, participants completed various psychological questionnaires concerning coping with stress, interpersonal problems, attributional styles and technological affinity. In addition, they filled in a questionnaire regarding the Big Five personality traits, the NEO Five-Factor Inventory (NEO-FFI) [26], which measures the factors of neuroticism, extraversion, openness to experience, agreeableness and conscientiousness.

4 Results

We used repeated-measures ANOVAs to analyze the effects of different independent variables on dialog success. We first conducted a within-subject ANOVA to test only the effect of the different conditions (time). Here, we found a statistically significant effect of time (F(2.74, 20.54) = 138.19, p < 0.001). A second ANOVA was conducted with time as the within-subject factor and age (young vs. old) as the between-subject factor. This revealed a significant main effect of age (F(1, 126) = 8.75, p < 0.004), with young participants achieving more dialog success, while the significant interaction of time and age (F(3.74, 345.65) = 3.651, p = 0.016) showed that these differences occurred only in the phases WIB and RES, but not during the first two (BSL and WLB) (Fig. 1).

Fig. 1. Dialog success over time by age
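To make the within-subject test of time more concrete, the following Python sketch computes the F statistic of a one-way repeated-measures ANOVA from first principles. It is a simplified illustration, not the analysis pipeline of the study: the function name `rm_anova_f` and the toy values are hypothetical, and no sphericity correction is applied, whereas the fractional degrees of freedom reported here suggest that such a correction was used.

```python
def rm_anova_f(data):
    """One-way repeated-measures ANOVA without sphericity correction.

    `data` holds one row per participant and one value per condition
    (e.g. the log quotient at BSL, WLB, WIB and RES).
    Returns (F, df_condition, df_error).
    """
    n = len(data)       # number of participants
    k = len(data[0])    # number of conditions (time points)
    grand = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]

    # Partition the total variability into condition, subject and error.
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_cond - ss_subj

    df_cond = k - 1
    df_error = (k - 1) * (n - 1)
    f_value = (ss_cond / df_cond) / (ss_error / df_error)
    return f_value, df_cond, df_error


# Hypothetical scores of three participants at three time points
# (invented values, not data from the study).
toy = [[1, 2, 4], [2, 3, 3], [3, 5, 4]]
f_value, df1, df2 = rm_anova_f(toy)
print(round(f_value, 2), df1, df2)  # 3.5 2 4
```

Removing the between-participant variability (`ss_subj`) from the error term is what distinguishes the repeated-measures design from a between-subject ANOVA on the same values.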

We also looked for a main effect of psychotherapeutic-based intervention. Averaged over all test intervals, no statistically significant difference could be found (F(1, 124) = 1.03, p < 0.311) between the control and experimental groups regarding the dialog success. Neither was there a significant interaction effect of time and intervention (F(2.74, 339.44) = 0.027, p = 0.991). Therefore, it was not necessary to consider these groups separately in further measurements.

Several other ANOVAs were conducted with time as the within-subject factor and one of the Big Five factors or computer experience as the between-subject factor. To run the ANOVAs we had to split the sample for each independent variable (NEO-FFI scales, computer experience). We used a median split to subdivide these independent variables into two groups (lower vs. higher value). The following ANOVAs revealed statistically significant main effects for neuroticism (F(1, 124) = 5.94, p < 0.016), agreeableness (F(1, 124) = 5.274, p < 0.023) and computer experience (F(1, 124) = 4.58, p < 0.034). On closer examination of the descriptive statistics, it became obvious that participants with a lower value of neuroticism, a higher value of agreeableness and more computer experience (Fig. 2) showed better performance at WLB and WIB in particular. There were few differences between groups (lower vs. higher value) during BSL. At the last challenge (RES), dialog success in the examined groups converged again. In conclusion, participants who showed lower dialog success were older and had higher scores in neuroticism, lower scores in agreeableness and less computer experience. The measurements showed no significant main effects for gender or the NEO-FFI scales for extraversion, openness and conscientiousness.
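The median split used for grouping can be sketched in a few lines of Python. The function name `median_split`, the participant identifiers and the scores are hypothetical. The rule that values equal to the median fall into the lower group is an assumption, since the handling of ties is not specified here.

```python
from statistics import median


def median_split(scores):
    """Assign each participant to the 'lower' or 'higher' group.

    `scores` maps a participant id to a scale value (e.g. a NEO-FFI
    scale score). Values equal to the median go to the 'lower' group;
    this tie rule is an assumption made for illustration.
    """
    cut = median(scores.values())
    return {pid: ("higher" if value > cut else "lower")
            for pid, value in scores.items()}


# Hypothetical neuroticism scores (not data from the study).
neuroticism = {"p01": 12, "p02": 25, "p03": 18, "p04": 30, "p05": 18}
groups = median_split(neuroticism)
print(groups)
```

With ties at the median, the two groups can differ in size, as in this invented example, where three of five participants fall into the lower group.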

Fig. 2. Dialog success over time divided by NEO-FFI neuroticism (top left), NEO-FFI agreeableness (top right) and computer experience (bottom).

The three-way interaction between time, age and extraversion was statistically significant (F(2.74, 340.09) = 2.75, p < 0.047). A further significant effect was found for the interaction of computer experience with age and time (F(2.78, 344.18) = 3.89, p < 0.011). All other interaction effects were not significant.

5 Conclusion

The implementation of a computer system perceived to be trustworthy, available and able to adapt presupposes an individualization process. Relevant user characteristics need to be identified from the beginning for improved classification and prediction of usage behavior. Therefore, the present study aimed to analyze the influence of individual user characteristics on participants’ ability to deal with situations experienced as challenging during an interaction with a computer system. Challenging situations of different complexity levels had to be dealt with while handling tasks at determined points.

By using a largely standardized WoZ experiment, we were able to compare performance via dialog success in different challenging situations for 130 participants, who were chosen with regard to age, gender and educational level. Studies regarding problem-solving in connection with personality traits as well as socio-demographic variables allowed for the selection of potentially relevant user characteristics [21–25]. By using repeated-measures ANOVAs, we identified significant relationships between average dialog success during the course of interaction and age, computer experience and the Big Five dimensions of neuroticism and agreeableness [26] (tests of within-subject effects). Participants with less computer experience, higher scores in neuroticism and lower scores in agreeableness showed considerably less dialog success even at the beginning of a simple problem (WLB). Among older participants, however, this decrease did not occur until the problem complexity was increased. These results were achieved on the basis of actual interactions with a computer system and are in line with previous empirical findings in personality research. For extraversion, no significant differences could be detected, whereas a difference (tests of within-subject effects) did exist between measurement points and extraversion when age was considered. This may originate from the fact that the time intervals had different lengths, with the duration of BSL equal to the total duration of all three challenges (WLB, WIB, RES) combined. Furthermore, the comparatively small sample size may have introduced a certain variability.

Nevertheless, the results substantiate the need for an individualization process as a fundamental basis for the acceptance of automated computer systems as they advance toward companion systems.