1 Introduction

Increasing attention is being given to the experience of engagement in the design of new digital products and services. In the terminology of Quesenbery [1], engagement is defined as: “…the degree to which the tone and style of the interface makes the product pleasant or satisfying to use.” Product attributes such as ‘tone’ and ‘style’ can be seen as hedonic attributes relating to the intrinsic stimulation of a product, as distinct from pragmatic attributes relating to the proper functioning of a product [2]. These style aspects have become increasingly important since digital products are moving from professional environments into our everyday lives [3, 4]. However, engagement remains an elusive concept in view of the many different terms used to describe it [5, 6].

A framework of ‘Richness, Control and Engagement’ (RC & E) was developed to assess the qualities of the experience of engagement and to identify the factors that influence it [7]. The preliminary experiences with the framework were positive: the framework was found to be helpful in designing towards engagement in a video game due to its predictive power and the freedom it leaves in the creation of design solutions. However, the fact that the RC & E framework was found to be useful during experiential tasks (free play) does not necessarily mean that it will be useful during goal-directed tasks (use). In this study, the RC & E framework is applied in the domain of voicemail browsing as a means to (1) design digital products towards increased levels of experienced engagement, and (2) examine whether the RC & E framework holds its predictive power during goal-directed tasks.

In brief, the RC & E framework explains the levels of experienced engagement via the levels of experienced richness and control, which are shaped by the features of a product and the expertise of a person. Engagement assesses the extent to which an activity is intrinsically enjoyable, and arises when the activity supports the functioning and growth of an individual [8–10]. During engagement, a range of positive emotions can be experienced such as excitement, freedom and enjoyment, and time and energy are willingly invested [11–18]. Richness captures the growth potential of an activity by assessing the variety and complexity of thoughts, actions and perceptions as evoked during the activity [19–23]. The higher the levels of experienced variety and complexity, the higher the levels of experienced richness. Control captures the extent to which a person is able to achieve this growth potential by assessing the effort that is experienced in the selection and attainment of goals [24, 25]. The more effort is experienced, the lower the levels of experienced control. An activity is considered to be optimally engaging when it affords high levels of experienced richness and control, i.e. the activity provides growth potential that can be achieved. More specifically, the levels of engagement could be predicted by taking the square root of the product of experienced richness and control (Fig. 1).
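The prediction rule described in this paragraph can be written out as a short sketch. The function name `predicted_engagement` and the example scores are assumptions made for illustration; only the rule itself (the square root of the product of richness and control) comes from the framework.

```python
import math


def predicted_engagement(richness: float, control: float) -> float:
    """Predict engagement as the square root of richness times control.

    This is the geometric mean of the two scores, E = R^0.5 * C^0.5.
    """
    if richness < 0 or control < 0:
        raise ValueError("richness and control must be non-negative")
    return math.sqrt(richness * control)


# Illustrative scores only; these are not data from the study.
balanced = predicted_engagement(6.0, 6.0)       # equal richness and control
rich_but_hard = predicted_engagement(9.0, 4.0)  # high richness, lower control
no_control = predicted_engagement(9.0, 0.0)     # richness without any control
```

Because the rule is a geometric mean, a high score on one dimension cannot fully compensate for a very low score on the other: in this sketch, richness without any experienced control yields a predicted engagement of zero.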

Fig. 1

Visualization of the levels of experienced engagement (curved lines) as a function of the levels of experienced richness (x axis) and experienced control (y axis) according to the formula: $E = R^{0.5} C^{0.5}$. The levels of experienced engagement increased when either the levels of experienced richness or the levels of experienced control increased. Increasing the number of product features led to increased levels of experienced richness and reduced levels of control. Further, levels of control could decrease according to the extent to which the product physically constrained goal attainment. The levels of control could increase in time as users’ expertise increased

Experiences of richness and control are shaped by the features of a product and the expertise of a person [7, 26, 27]. In the current study, the focus is on the influence of product features on the levels of experienced richness and control. Product features can include functional aspects, i.e. the possibilities of the digital product; manipulation aspects, i.e. the responsiveness of the product to the actions of a person; and appearance aspects, i.e. the presentation of the product [28, 29]. Function, manipulation and appearance aspects can vary independently in digital products due to digital mediation [30].

Varying product features on the functional, manipulation and appearance aspects influences the levels of experienced richness at the mental, behavioral and sensorial levels, respectively. For example, the game of chess is considered to be mentally richer than the game of Tic-Tac-Toe, since chess allows more choice [31, 32]. Playing chess by moving physical pieces across a three-dimensional space is considered to be behaviorally richer than playing chess by typing in the coordinates using pushbuttons, since the former allows more expressiveness in physical action [33, 34]. Playing chess with chess pieces that are visually presented at a high level of detail is considered to be sensorially richer than using chess pieces represented in a minimal, abstract form since the former allows more sensorial stimulation [35–37]. An earlier study showed that the experienced behavioral and sensorial richness could be integrated into overall richness judgments by using an additive rule [23, 38].

Similarly, varying product features on functional, manipulation and appearance aspects influences the levels of experienced control at the mental, behavioral and sensorial levels. Product functions provide goals, product manipulation provides pathways to goals, and product appearance provides information about goals [24]. An earlier study showed that the levels of experienced control decreased in two different ways: objectively and subjectively [7]. First, the experienced control could decrease when the product physically constrained interaction, i.e. the product features did not offer the required functions, pathways and/or information to attain a goal or to attain a goal in an efficient manner. Secondly, the experienced control could decrease when a person lacked the expertise needed to attain goals with the product. He/she then had trouble understanding the product functions, performing the actions needed in product manipulation and attuning to the appropriate product information. Increased expertise resulted in increased levels of experienced control, provided that this increase was possible given the product’s physical constraints.

The RC & E framework was developed for games during experiential tasks. In these tasks a person can pursue goals freely as they emerge from the features of the product, and is free to start or stop playing at will [39, 40]. During goal-directed tasks, however, goals are bound by the purpose for which the product is used, and these goals should be met within a certain timeframe. In this situation, the levels of experienced control may have more influence on the levels of experienced engagement than the levels of experienced richness [41]. Further, the relationship between the number of product features and the levels of experienced richness and control may be affected by changes in goals [24], focus of attention [42, 43] and behavior [20].

When engagement is discussed within the domain of voicemail browsing, both types of tasks should be included since people may experience products as things in themselves, i.e. toys, or as things for something else, i.e. tools [44–46]. Further, consideration should be given to the effect that the user interface and the content of the product have on the levels of experienced engagement. Both the user interface and the content interact in terms of the overall product features, since both transform each other [47]. For example, a website of an online store might be considered engaging since sound and animation are used to represent the virtual products. However, as the number of products within the virtual store increases, the levels of engagement may decrease if the total amount of sounds and animations becomes overwhelming. In this case, a user interface that represents virtual products by visual images only may lead to higher levels of engagement.

To examine the extent to which the user interface and the content of a digital product affect the levels of experienced richness, control and engagement during experiential tasks versus goal-directed tasks, a prototype of a voicemail application was developed. Following the framework, voicemail browsing can be designed towards high levels of experienced engagement in play by designing products towards increased levels of experienced richness and control. However, it is not yet clear if the levels of engagement increase similarly during goal-directed tasks compared with experiential tasks, and how the user interface and content in combination affect the levels of experienced richness and control across both types of tasks.

2 Method

2.1 Prototype

A novel voicemail machine was developed that uses gestures and sound to access voicemails. Previous studies showed that various types of information could be represented in sound [48, 49]. Further, gestures combined with sound feedback can be used effectively to access digital information such as music [50–52]. An earlier study identified the three most relevant voicemail properties for voicemail browsing [53]. These were: (1) the sender of the voicemail message; (2) the time the voicemail message was sent; and (3) the level of urgency of the voicemail message. These properties were used to drive the design of the sounds in the current study. In terms of the sender property, three senders were defined: Henk, John and Anna. In terms of the reception date, three date/time categories were defined: yesterday, today, and most recent. In terms of the urgency of the message, three levels were defined: chitchat, informative and urgent. These voicemail properties formed the basis for distinguishing voicemails.

The voicemail machine consists of a physical product that serves as the basis to which gestures and sound are added (Fig. 2). The physical product has a circular form that is 230 mm in diameter and 70 mm in height, with 16 slots along the ridge and 1 slot at the center of the product. Voicemail messages can be stored virtually above the spaces of the physical slots along the ridge. Messages can be accessed by performing gestures with an input device above the slots. The input device has a rod-like form that is 40 mm in diameter and 55 mm in height. The slot at the center of the product serves as a base for the input device.

Fig. 2

Picture of the physical product within its physical context. The physical product has a circular form, with 16 slots along the ridge and one slot at the center of the product. Voicemail messages may be stored within one of the 16 voicemail slots, and can be accessed by performing gestures above the product. The center slot serves as a base for the input device

Three versions were designed that varied in the way voicemail properties were represented in sound and how gestures influenced these sounds. Sound feedback was used to inform the user about the voicemail properties defined above during browsing, and to inform the user about the locations of the physical slots in which voicemails are stored within the physical product. In this sense, the gestures (manipulation) and sound (appearance) can be considered as the user interface embodying the voicemail content (function). The three versions are described in more detail below (Table 1).

Table 1 Overview of user interfaces I–III
  1. With user interface I, spoken voicemail messages are heard when the input device is placed closely above the physical slots where the voicemails are stored. When this is the case, the voicemail message starts playing at the beginning of the message and stops playing when the input device is removed from above the physical slot. Voicemail properties such as sender, reception date and urgency are communicated directly through the spoken message by the voice of the sender and the subject of the message. In terms of sender, the voicemails were recorded using two male voices and one female voice. In terms of reception date, a second female voice states the time the message was received before the message starts, and in terms of urgency the actors carefully chose content and applied intonation to fit the message into three categories of urgency. This user interface can be characterized as a discrete selection tool, given that only one voicemail can be heard at a time.

  2. With user interface II, additional sounds can be heard that complement the spoken voicemail messages. These sounds involve material impact sounds, which are elicited when the input device enters virtual rod-like regions extending upwards from the physical slots. Voicemail properties (sender, time and urgency) are translated into timbre, volume and pitch, respectively. The sender was communicated by bell-like, wood-like and water-like impact sounds. The reception date was communicated by volume differences. The volume levels for older voicemails were lower than for the more recently received ones. Urgency was communicated by pitch differences. This user interface can be characterized as a discrete selection tool combined with a musical instrument, given that a variety of sounds can be created by gesturing above the product.

  3. With user interface III, multiple simultaneously sounding voices are added to the sound feedback described above. Voicemail properties such as sender, reception date and urgency are communicated directly through the spoken message. The volume levels of the voicemail samples are mapped to the distance between the input device and the physical slots. Thus, at a given location of the input device above the physical product, multiple spoken voicemail messages can be heard simultaneously at different volume levels. By moving the input device closer to a physical slot, the volume level of the voicemail stored at that location increases accordingly. This user interface can be characterized as a discrete selection tool and a musical instrument combined with a chattering box, given that multiple people can be heard chatting at the same time.
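The sound mappings of user interfaces II and III can be sketched roughly as follows. This is an illustration under stated assumptions, not the prototype's implementation: the pairing of senders with timbres, all numeric values, the direction of the pitch mapping, and the linear distance falloff are hypothetical; the text only specifies which property maps to which sound dimension, and that older messages are quieter.

```python
from dataclasses import dataclass

# Assumed pairing: the text lists three senders and three timbres, not which goes with which.
TIMBRE_BY_SENDER = {"Henk": "bell", "John": "wood", "Anna": "water"}
# Older messages are quieter (from the text); the numeric levels are assumptions.
VOLUME_BY_DATE = {"yesterday": 0.4, "today": 0.7, "most recent": 1.0}
# Assumed values and direction (more urgent = higher pitch), in Hz.
PITCH_BY_URGENCY = {"chitchat": 220.0, "informative": 440.0, "urgent": 880.0}


@dataclass(frozen=True)
class Voicemail:
    sender: str
    received: str
    urgency: str


def impact_sound(mail: Voicemail) -> dict:
    """Interface II: translate sender, time and urgency into timbre, volume and pitch."""
    return {
        "timbre": TIMBRE_BY_SENDER[mail.sender],
        "volume": VOLUME_BY_DATE[mail.received],
        "pitch_hz": PITCH_BY_URGENCY[mail.urgency],
    }


def voice_volume(distance_mm: float, max_distance_mm: float = 200.0) -> float:
    """Interface III: a spoken message gets louder as the input device nears its slot.

    A linear falloff clipped at zero is assumed; the source does not give the curve.
    """
    if distance_mm < 0:
        raise ValueError("distance must be non-negative")
    return max(0.0, 1.0 - distance_mm / max_distance_mm)


sound = impact_sound(Voicemail("Anna", "yesterday", "urgent"))
at_slot = voice_volume(0.0)
halfway = voice_volume(100.0)
out_of_range = voice_volume(300.0)
```

With interface III, every stored voicemail would be played at its own `voice_volume`, which is what lets several messages be heard at once at different levels.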

2.2 Experimental design

The experiment was set up as a full factorial design with three levels of User interface, three levels of Content and two types of Task. User interface and Content varied within participants and Task type varied between participants. As described earlier, the User interface was manipulated by simultaneously varying the number of gestures that resulted in sound feedback and the number of sounds in which the voicemails appeared. Content was manipulated by varying the number of voicemails stored within the product: one, six or 12 voicemails. Task type was manipulated by asking half of the participants to play freely with the product, and asking the other half to use the product in a search task. The nine experimental conditions were presented to each participant in a random order.
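The structure of this design can be made concrete with a small sketch. The names below (`all_conditions`, `session_order`) and the fixed random seed are assumptions for illustration; the factor levels are taken from the text.

```python
import itertools
import random

USER_INTERFACES = ("I", "II", "III")
CONTENT_LEVELS = (1, 6, 12)  # number of stored voicemails
TASK_TYPES = ("experiential", "goal-directed")

# Full factorial design: 3 x 3 x 2 = 18 cells.
all_conditions = list(itertools.product(USER_INTERFACES, CONTENT_LEVELS, TASK_TYPES))


def session_order(task_type: str, rng: random.Random) -> list:
    """Return the nine within-participant conditions for one task type, shuffled."""
    sessions = [(ui, n) for ui, n, task in all_conditions if task == task_type]
    rng.shuffle(sessions)
    return sessions


rng = random.Random(0)  # fixed seed only so that the sketch is repeatable
order = session_order("goal-directed", rng)
```

Each participant belongs to one Task type and therefore sees nine of the 18 cells, one session per combination of User interface and Content.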

2.3 Participants

Twenty-eight subjects participated in the experiment. The participants were randomly selected from a consumer evaluation panel database consisting of a diverse range of people living within the vicinity of the Delft University of Technology. Participants’ ages were between 24 and 49 years, with an average age of 41 years and a standard deviation of 7 years. Of the participants, 12 were women and 16 were men. Each person was paid seven euro for participating.

2.4 Procedure

Participants were told that the goal of the experiment was to evaluate a novel voicemail machine. Before the experiment started, it was explained to the participants that voicemails could be accessed by performing gestures above the tangible product to produce sound feedback. The voicemail properties, i.e. sender, reception date and urgency, and their representation in the spoken voicemails and material sounds, were explained to the participants. After that, participants were asked to interact with the voicemail machine in nine different sessions without prior training. In each experimental condition, the voicemails were randomly distributed around the tangible product. During experiential tasks, participants were free to play with the product and could stop playing at will. During goal-directed tasks, participants were given the task of locating a specific voicemail as efficiently as possible within 30 s. For instance, the experimenter could ask a participant to locate a voicemail message that Anna sent yesterday, or to locate John’s most urgent voicemail message. After each session, participants were asked to fill in a questionnaire, and the experiment was concluded with a 5-min interview.

2.5 Measures

In the current study, both behavioral and experiential measures were taken. In terms of behavior, playing time was measured during experiential tasks as an indicator of intrinsic interest. Playing time was defined as the time between the start of the session and the moment the participant willingly ended the session. Participants could play for a maximum of 3 min. Search time and the percentage of errors were measured during goal-directed tasks as indicators of performance. Search time was defined as the time between the beginning of the task and the moment the participant identified the target voicemail. Participants were given a maximum of 30 s per search task. An error was defined as the identification of the wrong voicemail, and the error percentage was calculated across participants.

In terms of experience, a questionnaire and a structured interview were used. The questionnaire was used to assess the levels of experienced richness, control and engagement via 17 items on a ten-point scale. Richness included items assessing the extent to which variety and possibilities are experienced [15, 20, 54]. Control included items assessing clarity, ease and self-confidence [24, 25, 55–57]. Engagement included items assessing excitement, challenge, enervation, stimulation, enjoyment, fun, motivation, freedom and personal style [11–13, 15, 17, 58, 59]. After each experimental condition, the questionnaire was filled in. The experiment was concluded with a structured 5-min interview in which participants could elaborate on their experiences.

3 Results

3.1 Behavior

Two separate analyses of variance were conducted to examine the extent to which the User interface and the Content manipulations affected playing time during Experiential tasks, and the extent to which search time and percentage of errors were affected during Goal-directed tasks. During Experiential tasks, increased amounts of voicemail Content led to increased playing time (Fig. 3). No significant effect of the User interface on playing time was found. User interface version III appeared to reduce playing time when Content increased from 6 to 12 voicemails (Fig. 3), but this effect was not significant (Table 2). During Goal-directed tasks, increased amounts of voicemail Content led to increased search time (Fig. 4) and to an increased percentage of errors (Fig. 5). These effects were significant (Table 3), and indicated that search behavior became less effective and less efficient as voicemail Content increased.

Fig. 3

Mean scores of playing time across the User interface and voicemail Content conditions during Experiential tasks

Fig. 4

Mean scores of search time across the User interface and voicemail Content conditions during Goal-directed tasks

Fig. 5

Mean percentage of the number of errors across the User interface and voicemail Content conditions during Goal-directed tasks

Table 2 Effects of User interface and Content during Experiential tasks on playing time
Table 3 Effects of User interface and Content during Goal-directed tasks on search time and the number of errors

3.2 Principal component analysis

To investigate participants’ experiences of the experimental conditions, a principal component analysis was performed to find the components underlying the assortment of questionnaire items. Three components were extracted that explained about 72% of the variance (Table 4). The first component was interpreted as a component related to engagement. The engagement component captured groups of items assessing enjoyment, excitement and freedom. The second component was interpreted as a component related to richness. The richness component captured variety, possibilities and enervation. The third component was interpreted as a component related to control, and captured self-confidence, ease and clarity. The group of enjoyment and excitement items loaded on both the engagement and richness components. The items assessing freedom and personal style loaded on both the engagement and control components.

Table 4 Results of principal component analysis with Varimax rotation given the items measuring richness, control and engagement

For experienced richness, control and engagement, sum scales were developed by grouping the individual items according to their highest loadings. For each sum scale, Cronbach’s alpha was calculated as a measure of internal consistency. For the component assessing richness, alpha measured 0.83 (N = 252, 4 items); for the component assessing control, alpha measured 0.90 (N = 252, 4 items); and for the component assessing engagement, alpha measured 0.93 (N = 252, 9 items). For all components, alpha measured above the critical threshold of 0.70, indicating that each group of items measured a similar construct and that they could be grouped [60].
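For readers unfamiliar with the measure, Cronbach’s alpha for a sum scale can be computed as in the following sketch, which uses the standard definition based on item variances and the variance of the summed scores. The function name and the toy scores are assumptions for illustration; they are not the questionnaire data of this study.

```python
from statistics import variance


def cronbach_alpha(item_scores: list) -> float:
    """Cronbach's alpha for a scale.

    item_scores holds one list per item; each list has one score per observation.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of summed scores)
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("at least two items are required")
    n = len(item_scores[0])
    if any(len(item) != n for item in item_scores):
        raise ValueError("all items need the same number of observations")
    totals = [sum(observation) for observation in zip(*item_scores)]
    item_variance_sum = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Toy scores only: two items answered by four hypothetical respondents.
identical_items = cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4]])
noisier_items = cronbach_alpha([[1, 2, 3, 4], [2, 1, 4, 3]])
```

Items that move together push alpha towards 1, whereas items that disagree lower it; the 0.70 threshold cited above is the conventional cut-off for treating a set of items as one scale.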

3.3 MANOVA

A multivariate analysis of variance was conducted to examine the extent to which User interface, Content and Task type influenced the levels of experienced richness, control and engagement. Results showed that by increasing the amount of voicemail Content, the levels of experienced richness increased and the levels of experienced control decreased (Figs. 6, 7). No significant differences were found for the Content manipulations on engagement (Fig. 8). Further, using the product in a Goal-directed manner led to lower levels of experienced control compared with Experiential tasks (Fig. 7). Additionally, Task type and Content were found to affect the levels of experienced richness in combination. While the levels of experienced richness increased with increased amounts of voicemail Content during Experiential tasks, they did not increase with increased amounts of voicemail Content during Goal-directed tasks (Fig. 6). All of these effects were significant (Table 5). No significant effects were found for the User interface manipulations on any of the experiential measures: the contribution of the user interface to the overall manifestation of the product involved transforming voicemail content rather than adding something by itself.

Fig. 6

Figures showing the levels of experienced richness as a function of Content, User interface and Task type. Experiential tasks (left) and Goal-directed tasks (right)

Fig. 7

Figures showing the levels of experienced control as a function of Content, User interface and Task type. Experiential tasks (left) and Goal-directed tasks (right)

Fig. 8

Figures showing the levels of experienced engagement as a function of Content, User interface and Task type. Experiential tasks (left) and Goal-directed tasks (right)

Table 5 Effects of User interface, Content and Task type on experienced richness, control and engagement

3.4 Correlation

Figure 9 shows the levels of experienced engagement (curved lines) as a function of the levels of experienced richness (x axis) and experienced control (y axis) according to the formula: $E = R^{0.5} C^{0.5}$. The mean scores of all 18 experimental conditions are plotted in this space. When the RC & E framework is applied to the scores, it appears that the levels of engagement could not be predicted by the formula: r = 0.237, p = 0.343. However, closer examination revealed that when regression lines are drawn between the actual levels of experienced engagement and the predicted levels of experienced engagement for experiential and goal-directed tasks separately, the resulting regression lines were nearly parallel (Fig. 10). This indicates that the levels of experienced engagement could be predicted by the addition of a constant factor to the formula. This factor was calculated for Experiential tasks at 0.10 and for Goal-directed tasks at 1.42. When the factor was included in the formula, almost 80% of the total variance could be explained: r = 0.886, r² = 0.785, p < 0.001.

Fig. 9

Figure showing the levels of experienced engagement (curved lines) as a function of the levels of experienced richness (x axis) and experienced control (y axis) according to the formula: $E = R^{0.5} C^{0.5}$. The mean scores of all 18 experimental conditions (User interface × Content × Task type) are plotted in this space. The numbers in the plots represent the assessed levels of engagement, not the predicted ones based on the above formula

Fig. 10

Figure showing the scores of User interface, Content and Task type on the actual levels of experienced engagement (x axis) and the predicted levels of experienced engagement (y axis). For both types of tasks, regression lines are calculated. The figure shows that the regression lines appear to be parallel, indicating that the effect of Task type on the model is an additive one. At a more detailed level, however, it appears that products with less voicemail content appear to slightly increase in levels of engagement during goal-directed tasks

4 Discussion

The results showed that the levels of experienced engagement in voicemail browsing could be predicted based on the levels of experienced richness and control with the addition of a task factor. According to the RC & E framework, the levels of experienced engagement increased with increased levels of experienced richness and increased levels of experienced control. Further, with increased number of product features, the levels of experienced richness were found to increase and the levels of experienced control were found to decrease.

During goal-directed tasks, the predicted levels of experienced engagement following the formula $E = R^{0.5} C^{0.5}$ were found to be systematically lower than the actual levels of experienced engagement, indicating that the task may have a level of engagement in itself that should be added to the formula: $E = R^{0.5} C^{0.5} + \text{task}$. The value of this task factor could depend on the levels of experienced control, which decreased due to the addition of external demands. Decreased levels of experienced control based on the difficulty of meeting goals resulted in higher levels of experienced engagement. However, it can be expected that engagement may not increase and may even decrease when the levels of control are reduced further, given that products with less voicemail content appeared to increase the levels of engagement during goal-directed tasks, while products with more voicemail content were not found to increase engagement.

Further, the task was found to influence the levels of experienced richness based on the number of product features. Participants may have been more sensitive to the variety of product features during experiential tasks than during goal-directed tasks, since the focus of attention was broader and behavior more explorative. Some support was found for this explanation since some participants who browsed voicemails during goal-directed tasks expressed during the interview that they were unaware of the material sounds. Either they did not explore this sound feedback, resulting in a lower elicitation of these sounds, or they paid no attention to them during goal pursuit.

In addition to the influence of the number of product features on the levels of experienced richness and control, the randomness of the product and the playfulness of the participants should be taken into account. Distributing voicemails randomly over the physical voicemail slots caused confusion since participants searched for a systematic ordering. Further, minor fluctuations in the tracking technology meant that the spatial locations in which sound feedback was given were not always directly aligned with the spatial locations of the corresponding physical slots. Since sound was the only source of information to identify the location of a voicemail, both forms of randomness were experienced as very confusing. The levels of experienced control might increase considerably when these two manifestations of randomness are removed. Finally, individuals’ playfulness was found to influence the levels of experienced engagement. It was observed that some participants were very performance-oriented during experiential tasks, while others remained relaxed. The opposite was found during goal-directed tasks. It appears that people are able to vary how they respond to the given context, indicating that their behavior may still be playful even during highly demanding tasks [61].

5 Conclusions

This study showed that the RC & E framework could be applied to design engagement in voicemail browsing by taking into account the levels of experienced richness and control based on product features and the type of task. Product features may lead to experiences of engagement in voicemail browsing by providing rich and varied experiences and by providing ways to make it easy to find voicemails. Voicemail browsing can lead to higher levels of experienced engagement when a clear goal has been set and when difficulty is experienced in goal attainment, compared with the situation when no goal has been set. This may explain why a game that is experienced as not very engaging during experiential tasks based on its product features may be experienced as engaging when a game is played in competition, either with oneself or with others.

The results appear to be difficult to generalize to product use in which the outcome of the activity can have serious consequences, such as products used in surgery or money transactions. It can be questioned whether in those cases engagement increases when the level of experienced control decreases. An alternative explanation could be that the task factor affects the relative weight of richness and control in influencing engagement. The relative weight of control may increase with the importance of meeting goals.

Some comments can be made regarding the applicability of the RC & E framework to interaction design practice. By providing insight into how product and task relate to engagement, designers can foresee pitfalls and opportunities, thereby increasing the chance of attaining engagement through design. At the same time, the framework may also hinder designers since conceptual frameworks limit one’s experiential field. The realities in which designers operate are often more complex than frameworks are able to capture. Designers’ intuition should therefore be exploited, especially within the initial stages of design [62]. This means that the challenge in developing the RC & E framework for design practice will be to point out relevant factors of engagement while still allowing designers to use their intuition and creativity.

Future studies should be conducted to develop the framework further by simultaneously including all three relevant factors, i.e. the number of product features, the person’s expertise and the type of task in a study. In this way, the interactions between these factors at the levels of richness, control and engagement can be investigated. Further, the framework should be investigated within realistic settings to capture the actual concerns people have when a product is used to achieve a specific goal, and to examine when people are in fact playing with the product. Longitudinal studies are needed to investigate the effect of a person’s expertise in more detail.