1 Introduction: Five Basic Requirements for a Standardized Assessment of UX

User Experience (UX) is nowadays regarded as a key factor for the successful development of interactive systems. According to the norm ISO 9241-210 (2010), it covers “all the perceptions and reactions of a user before, during and after interacting with a product or service” [1]. While the traditional view on human-computer interaction has a strong focus on instrumental aspects, such as usability and usefulness, User Experience Design explicitly aims to provide positive experiences and positive user emotions by satisfying psychological needs [2, 3]. Achieving these objectives requires assessing UX as early as the first phases of User Centered Design (UCD) and monitoring it throughout the whole development process. In addition to qualitative methods, such as the Valence Method [4] and the Laddering Technique [5], which provide in-depth information about personal needs, values, judgments, and individual meaningfulness, quantitative methods are needed that allow for standardized comparisons [6] and for verifying specific requirements (e.g., benchmarks) with respect to UX. For this purpose, we developed the meCUE questionnaire [7,8,9,10,11,12]. It aims to fulfill five basic requirements (R1 to R5) that we consider central for a standardized and lean UX measurement and which are only partially met by other UX questionnaires, such as AttrakDiff [13] or the User Experience Questionnaire UEQ [14]:

  • R1 Comprehensiveness: UX is a complex construct and its measurement should be as comprehensive as possible. Hence, the questionnaire should comprise all aspects that are characteristic for experiencing a product, such as the perception of particular product qualities, emotional reactions during usage, behavioral consequences and the forming of an overall opinion about the product.

  • R2 Efficiency: The questionnaire should support a lean and fast assessment of UX. Therefore, it should consist of as few items as possible – but without neglecting relevant UX aspects.

  • R3 Intelligibility: Items of a questionnaire should be short, unambiguous and easy to understand. They should all adhere to the same format to spare respondents the mental effort of switching between different formulations and scales.

  • R4 Psychometric quality: The questionnaire should fulfill the central psychometric quality criteria. Empirical studies should guarantee that it measures UX in a valid, reliable and objective way. Moreover, they should ensure that it is suitable for a number of different application domains.

  • R5 Adaptability: Not all UX aspects might be equally relevant for all iteration cycles of the UCD process, or for all kinds of users, systems and application contexts. Therefore, it should be possible to discard parts of the questionnaire which are not adequate for a particular research question or in a particular phase of development. Since the omission of items or scales usually harms the psychometric quality of a questionnaire, adaptability calls for an instrument which consists of modules that have been validated independently from each other. Such modules could be freely combined, i.e., any module could be left out without harming the psychometric quality of any other module or of the configuration that is chosen for a study.

The development of the meCUE questionnaire aimed to account for all five requirements. It started from a theoretical framework which specifies UX key components and offers a sound basis for modularity and adaptability.

2 Theoretical Foundation of the MeCUE Questionnaire and Its Structure

Demanding comprehensiveness (R1) for the assessment of a complex construct such as UX calls for a psychological theory which postulates basic sub-constructs for this construct. The degree of comprehensiveness that a questionnaire can achieve can be judged against this specific theoretical background. A high degree requires that the theoretical concepts are reflected by the dimensional structure of the questionnaire and addressed by its items.

The structure of meCUE is based on the Components model of User Experience (CUE model) by Thüring and Mahlke [15]. This model distinguishes between the perception of instrumental and non-instrumental qualities. While instrumental qualities comprise particular aspects of usability and usefulness, non-instrumental qualities include features like visual aesthetics and identification (compare also [16]). The perception of both types of qualities is directly influenced by interaction characteristics (i.e., product features, user characteristics, and the context of use). It has to be emphasized that the term perception does not only refer to sensation and the forming of a coherent percept, but also includes immediate judgment processes (e.g., goal conduciveness, compatibility with standards). Emotions are an important component of UX, since positive emotions ensure that the overall user experience assumes a positive shape [17]. In the CUE model, their relevance is acknowledged by their central position and their relation to both perceptions of product qualities (see Fig. 1). As the bidirectional relationships indicate, emotions result from these perceptions, but may also react back upon them.

Fig. 1. Components model of User Experience (CUE). Reprinted from [12].

All three components together (i.e. perception of both qualities and emotions) determine the consequences of use, such as the overall product judgment, acceptance and intentions of future use.

The UX components of the model as well as the consequences of use are reflected by four modules in the meCUE questionnaire (see Fig. 2). The dimensions, which structure the modules in more detail, correspond to selected sub-constructs of the CUE model. Altogether, the questionnaire has 34 items. Each item consists of a statement (e.g., “The product is stylish.”) in combination with a 7-point Likert Scale reaching from 1 (“strongly disagree”) to 7 (“strongly agree”). The only exception from this format is the single-item of module IV (overall evaluation). It is formulated as a question at the end of the questionnaire (“Finally, how would you rate the product overall?”) and can be answered on a semantic differential, ranging from −5 (“bad”) to 5 (“good”) with an increment of .5, respectively.

Fig. 2. Modular structure of the meCUE questionnaire.

The grouping of dimensions and their associated items into modules addresses the requirement of adaptability (R5). Depending on the research question, user group, system type, context of use or iteration cycle in the UCD process, it should be possible to choose any combination of modules that is considered adequate. The prerequisite for this freedom is that the psychometric quality criteria are fulfilled appropriately (R4). This goal was pursued by validating the modules independently from each other in the course of developing the questionnaire.
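The item format and the module grouping described in this section can be illustrated with a minimal scoring sketch. It is not part of the published instrument: the item identifiers, the shortened layout, and the choice to form scale scores as item means are assumptions made here for illustration only.

```python
from statistics import mean

# Hypothetical layout: module -> scale -> item ids. The ids are placeholders
# and the item lists are shortened; this is not the published item numbering.
LAYOUT = {
    "I": {"usefulness": ["u1", "u2", "u3"], "usability": ["s1", "s2", "s3"]},
    "III": {"positive_emotions": ["p1", "p2"], "negative_emotions": ["n1", "n2"]},
}


def score_scales(responses, modules):
    """Return the mean rating per scale, for the chosen modules only."""
    scores = {}
    for module in modules:
        for scale, items in LAYOUT[module].items():
            ratings = [responses[item] for item in items]
            for rating in ratings:
                # Likert items accept whole numbers from 1 to 7.
                if rating not in range(1, 8):
                    raise ValueError(f"rating {rating!r} is outside 1..7")
            # Assumption: a scale score is the mean of its item ratings.
            scores[scale] = mean(ratings)
    return scores


def valid_overall(value):
    """Check the overall-evaluation item: -5 to 5 in steps of 0.5."""
    return -5 <= value <= 5 and value * 2 == int(value * 2)
```

Calling `score_scales(responses, ["I"])` scores only the first module and ignores any other answers, which mirrors the idea that modules can be left out without touching the remaining ones.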

3 Development and Validation of MeCUE

The creation of meCUE started with a German version for which a pool of 67 items was initially generated. Item selection and validation of the questionnaire were based on five data collections [7,8,9,10,11,12]. Three surveys were conducted online (n1 = 238, n2 = 238, n3 = 237) and two in a laboratory setting (n4 = 67, n5 = 24). In all studies, participants rated a wide range of different interactive products (e.g., electronic devices, mobile applications, software, and home appliances). The first two online studies focused on determining those items which loaded high on the scales of the questionnaire. The data were analyzed using principal component analyses, which resulted in the selection of 33 items measuring nine scales in three different modules. The constructed scales showed a high internal consistency (Cronbach’s alpha values between .69 and .83) [7, 11]. Proportions of explained variance were acceptable for all modules (module I: 69.9%, module II: 57.4%, module III: 63.5%). The final set of items and the structure of the questionnaire were replicated under laboratory conditions (Cronbach’s alpha values of the scales between .76 and .94, proportions of explained variance for module I: 81.1%, for module II: 74.3%, and for module III: 74.1%) [7, 11]. In order to assess the judgment of a product as a whole, meCUE was supplemented by a fourth module which consists of the single semantic differential described above [9, 11].
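For readers who want to reproduce the internal-consistency figures on their own data, the following is a minimal sketch of Cronbach’s alpha in its standard textbook form. The function name and the input layout (one list of item ratings per respondent) are assumptions of this sketch; it does not reproduce the analysis scripts of the studies cited above.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for one scale.

    rows: one list of item ratings per respondent, items in the same order.
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the respondents' sum scores.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Alpha is computed per scale, so a questionnaire with nine scales yields nine values, each based only on the items of that scale.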

The validity of the final version was tested using three different approaches. First, the correlations between meCUE and the dimensions of other questionnaires measuring similar constructs were examined (convergent validity). It was found that meCUE consistently led to comparable values [7, 8, 11]. Strong correlations (r > .7) were observed between meCUE’s ‘usability’ and AttrakDiff’s ‘pragmatic quality’ as well as ‘perspicuity’ and ‘dependability’ of the UEQ. Ratings of ‘visual aesthetics’ (meCUE) were highly correlated with ‘classical’ and ‘expressive aesthetics’ [18] (r > .7), whereas correlations between ‘visual aesthetics’ and ‘status’, ‘commitment’ and especially ‘pragmatic qualities’ were on a more modest level (.4 < r < .56). With respect to emotions, strong correlations were obtained between positive affect of PANAS [19] and meCUE (r = .51) as well as between the dimensions for negative affect (resp. emotions, r = .63). Moreover, valence ratings captured by the Self-Assessment Manikin (SAM [20]) highly correlated with meCUE (r = .66 for positive and r = −.65 for negative emotions). Finally, the overall evaluation (module IV) was significantly correlated with ‘attraction’ of AttrakDiff (r = .559) [9, 11], AttrakDiff mini (r = .919) [10] and UEQ (r = .887) [9, 11].

A second approach to test the validity was to correlate subjective ratings with the number of completed tasks within a given time interval of five minutes (TTS = total tasks solved) as external criterion (criterion-related validity). In this study, 67 participants worked with two versions of text-editing software [7, 11]. TTS was significantly correlated with ratings of ‘usefulness’ (r = .32, p < .01) and ‘usability’ (r = .34, p < .01), whereas non-significant correlations were obtained between TTS and ‘visual aesthetics’ (r = .03, ns), ‘status’ (r = .04, ns) and ‘commitment’ (r = .14, ns).
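The criterion-related check above boils down to correlating a scale score with an objective performance measure. A minimal sketch of the Pearson correlation follows; the function name and the plain-list inputs are assumptions of this sketch, and the significance test reported in the study is not included.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equally long lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

In the setting described above, `x` would hold one ‘usability’ scale score per participant and `y` the corresponding number of tasks solved.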

Finally, it was investigated whether the assessment of UX with the meCUE questionnaire leads to results that are comparable to those of an expert review (discriminative validity). The comparison showed that the dimensions of the meCUE questionnaire replicated the results of the experts’ evaluation very well and even minimal variations of usability and visual aesthetics were consistently captured [11, 12].

Since the evaluations reported so far had demonstrated that the German version of meCUE was well suited for measuring the main components of UX, it was decided to create an English version. Three native speakers who had been working as professional translators or language teachers for several years translated and retranslated the items independently from each other [12]. The factorial structure of the English version was then examined in an online survey. Fifty-eight participants rated their experience with an interactive product of their own choice. For each module of the questionnaire, a principal component analysis was calculated. The results replicate the structure that underlies the German version. Proportions of explained variance were acceptable for all modules (module I: 79.5%, module II: 59.3%, module III: 74.5%) and the scales showed a high internal consistency (Cronbach’s alpha values between .76 and .91) [12].

4 Insights from Using MeCUE in Practice

Since its development, the meCUE questionnaire has been successfully used in multiple studies (e.g., measuring UX of consumer products, digital devices, software, mobile applications). For instance, Lebedev et al. [21] used the questionnaire to test temporal changes in UX when interacting with a mobile health application. In their study, children with sickle cell disease rated the app regularly over five weeks with an online version of the questionnaire. Klenk et al. [22] employed meCUE to investigate the UX of a fitness app. Even for business software that enables crowd-based requirements engineering, meCUE has been successfully applied [23].

However, there are also some limitations and critical remarks that must be regarded. For example, Doria et al. [24] refer to a study that aimed at measuring the subjective quality of lower limb orthoses. For this very special type of device, it was found that meCUE was not able to cover all relevant aspects of product use. Missing issues in this case concerned safety, hardware ergonomics, or wearing comfort. Furthermore, some dimensions of meCUE might get an entirely different meaning in such a medical context and thus call its validity into question. For example, ‘status’ with respect to medical products might be a problematic scale. It could be argued that the item “By using the product, I would be perceived differently” is rather a measure of stigmatization and social isolation than of social affiliation. There might be other contexts in which specific dimensions of meCUE cannot be clearly evaluated by users, e.g., ‘commitment’ to industrial software that aims at supporting product development processes [25]. In summary, it must be emphasized that the free combinability of the meCUE modules gives investigators many degrees of freedom. However, it is still their responsibility to check the completeness and the appropriateness of the scales carefully in each individual case.

This requirement is closely linked to the fact that - under some circumstances - non-instrumental product qualities might not be applicable to business applications. In a recent study by Lallemand and Koenig [26], it was reported that a participant raised the question how an intranet could be like a friend to her, when she was about to rate the meCUE item “The product is like a friend to me”. At first sight, this remark seems surprising since a low acceptance of such applications often seems to be caused by an insufficient consideration of positive UX. Numerous examples have demonstrated that especially business applications benefit from the creation of hedonic experiences. Barnickel [27], for instance, proposed a number of design solutions for a time tracking tool that aimed at satisfying basic psychological needs, such as autonomy, relatedness, and competence.

Barnickel’s work nicely illustrates that the more important issue might be how an intranet can be designed that actually feels like a friend to its users. Nevertheless, merging items for instrumental and non-instrumental qualities into a common module is probably a limitation with respect to adaptability (R5). As a consequence, it seems reasonable to eliminate this restriction by splitting up Module I (product perceptions) into two modules, thus separating the items for instrumental qualities from those for non-instrumental qualities. This separation makes it possible to use the new modules independently from each other, e.g., to refrain from the non-instrumental items when they seem to be inadequate for the system or the usage context under investigation.

To ensure that the change of module I does not impair the psychometric quality of the questionnaire (R4), two data sets were re-analyzed to validate the new structure – one for the German version, the other one for the English version of meCUE.

5 Re-analyzing the Factorial Structure of the English Version

For the English version, the data of an online survey were processed which had served to validate the prior English version [12]. In this study, fifty-eight native speakers from the United Kingdom and the United States (ages between 23 and 56, M = 32.6 years) had rated their experience with an interactive technical product from their personal environment (e.g., mobile devices, laptop, TV, software, mobile application, household appliances). Participants had been free to choose which product they wanted to rate. On average, they had owned the selected device for 9.4 months. Table 1 summarizes the characteristics of the sample. While the four modules of the questionnaire had been presented in a fixed order (i.e., product perceptions, emotions, consequences of use, and overall evaluation), the sequence of items within the respective modules had been randomized.

Table 1. Characteristics of the sample and the rated products.

Based on the Minimum Average Partial (MAP) Test [28], a principal component analysis was calculated for each module of the questionnaire to check the factor loadings. In contrast to the prior study [12], two separate component analyses were now performed for the items measuring the instrumental and the non-instrumental product qualities.
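For orientation, the MAP criterion as it is commonly defined in the literature can be written out as follows. This is a sketch of the standard formulation, not a transcription of the computations used in the study; all symbols are introduced here as assumptions: R is the p × p item correlation matrix, V_m and Λ_m hold its first m eigenvectors and eigenvalues λ_1 ≥ … ≥ λ_m, and r*_{ij,m} are the partial correlations that remain after the first m components have been removed.

```latex
\begin{align}
A_m &= V_m \Lambda_m^{1/2}, \\
C_m &= R - A_m A_m^{\top}, \\
R^{*}_m &= D_m^{-1/2}\, C_m\, D_m^{-1/2}, \qquad D_m = \operatorname{diag}(C_m), \\
\mathrm{MAP}_m &= \frac{1}{p(p-1)} \sum_{i \neq j} \bigl(r^{*}_{ij,m}\bigr)^{2}, \qquad
\mathrm{MAP}_0 = \frac{1}{p(p-1)} \sum_{i \neq j} r_{ij}^{2}.
\end{align}
```

Under this formulation, the number of components to retain is the m at which MAP_m reaches its minimum, and the proportion of variance explained by the first m components of a correlation matrix is (λ_1 + … + λ_m) / p.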

The analysis of the items measuring instrumental product perceptions revealed two independent components (see Table 2). This finding corresponds to the result that was obtained for the German items.

Table 2. Module I (Product perceptions: instrumental qualities). Factor loadings > .4

For the items measuring non-instrumental qualities, the principal component analysis identified the expected three independent components (see Table 3).

Table 3. Module II (Product perceptions: non-instrumental qualities). Factor loadings > .4

The items and factor loadings of module III (User emotions) are listed in Table 4. The analysis showed two independent factors measuring positive and negative emotions.

Table 4. Module III (User emotions). Factor loadings > .4

Finally, a two-dimensional structure was found for module IV (Consequences of use). The pattern of factor loadings is equivalent to the German version with the two dimensions “product loyalty” and “intention to use” (Table 5).

Table 5. Module IV (Consequences of use). Factor loadings > .4

Cronbach’s alpha values for each scale were used as a measure of internal consistency (see Table 6). All values are between .76 (acceptable) and .91 (excellent). Table 6 also shows the proportion of explained variance for each scale and the cumulative proportions for each module. All values are comparable to those achieved for the German version.

Table 6. Proportions of explained variance and Cronbach’s alpha for all scales.

6 Re-analyzing the Factorial Structure of the German Version

Analogous to the re-analysis of the English items, data of an online survey were chosen which formerly had been conducted to test the reliability and the validity of the final German meCUE [11]. The procedure of that study had been identical to the procedure of the study validating the English meCUE. Two hundred thirty-seven German native speakers (139 women and 98 men, with an average age of M = 29.8 years) had rated their experience with an interactive technical product from their personal environment. Again, participants had been free in their choice of the product. On average, they had owned the selected device for 23.6 months. Table 7 summarizes the detailed characteristics of the sample.

Table 7. Characteristics of the sample and the evaluated products.

In the re-analysis, a principal component analysis based on the Minimum Average Partial (MAP) Test [28] was calculated for each module of the questionnaire. With regard to product perceptions (module I), two separate component analyses for the items measuring subjective ratings of instrumental and non-instrumental product qualities were calculated. The analysis of the six items measuring the perception of instrumental product qualities revealed the expected two independent components “usefulness” and “usability” (see Table 8).

Table 8. Module I (Product perceptions: instrumental qualities). Factor loadings > .4

The items measuring non-instrumental product qualities and their factor loadings are displayed in Table 9. Here, the principal component analysis showed the expected three independent components measuring “visual aesthetics”, “status”, and “commitment”.

Table 9. Module II (Product perceptions: non-instrumental qualities). Factor loadings > .4

Table 10 lists the items and factor loadings of module III (User emotions). The analysis indicated two independent dimensions. According to the corresponding items, these dimensions differ in the quality of emotional reactions, namely negative and positive valence.

Table 10. Module III (User emotions). Factor loadings > .4

Finally, a two-dimensional structure was found for module IV (Consequences of use). The pattern of factor loadings represents the two expected dimensions “product loyalty” and “intention to use” (Table 11).

Table 11. Module IV (Consequences of use). Factor loadings > .4

As a measure of internal consistency, Cronbach’s alpha values were determined for each scale (see Table 12). All values are between .70 (acceptable) and .91 (excellent). Table 12 also shows the proportion of explained variance for each scale and the cumulative proportions for each module.

Table 12. Proportions of explained variance and Cronbach’s alpha for all scales.

7 Version 2.0 of the MeCUE Questionnaire

The results of both re-analyses show that the former module I (product perceptions) can be divided into two sub-modules, separating the items for measuring the perception of instrumental product qualities (new module I) from those measuring the non-instrumental qualities (new module II), without reducing the psychometric quality of the questionnaire. This applies to both the German and the English version. Based on our results, we suggest version 2.0 of meCUE with a revised modular structure (see Fig. 3).

Fig. 3. The structure of the meCUE (2.0) questionnaire.

In addition to the structural change, we propose to emphasize the phenomenological nature of the questionnaire more explicitly than before. First, this concerns the instruction which precedes the items. Instead of requesting a rating of product features, the new instruction explicitly asks respondents to rate how these features are experienced. The appendix provides detailed information on the modified instruction. Second, this also has consequences for the single item of module V. The new item asks “How do you experience the product as a whole?” (In German: “Wie erleben Sie das Produkt insgesamt?”). As previously, a semantic differential is offered to answer the question. It ranges from “as bad” (−5) to “as good” (5) in the English version and from “als schlecht” to “als gut” in the German one.

8 Discussion

To what extent does meCUE 2.0 meet the basic requirements we proposed for a standardized assessment of UX? With respect to the CUE model, both the old and the new version are highly comprehensive (R1) at the level of UX components. They address all three components postulated by the model as well as the consequences of usage and the user’s overall judgment. At the level of sub-constructs, however, not all product qualities are accounted for. So far, no items have been developed for the perception of haptic or acoustic quality, nor are all needs addressed which might contribute to a positive user experience when they are satisfied [2, 29]. While accounting for these aspects would certainly increase the comprehensiveness of the questionnaire, it would also decrease its efficiency. Since we aim at providing a lean assessment of UX which is also suitable for practitioners and companies, we had to limit the range of aspects and the number of needs that are covered.

Moreover, a theoretically based questionnaire like meCUE can only be as comprehensive as the theory from which it is derived. At the level below the UX components, perceptions of further product qualities might be relevant that the CUE model does not account for. Three issues seem to be especially important in that respect:

  • Safety critical applications might be experienced differently than innocuous ones. Backhaus and Thüring [30], for instance, discussed a number of pros and cons for cloud services from the user perspective. Such services may differ with respect to perceived trustworthiness and thus may induce trust or distrust which in turn may impact emotional reactions, behavioral consequences and overall judgments.

  • The CUE model focuses on the experience of users who interact with a technical device in a rather isolated manner. Interpersonal relations and social influences that shape experiences in social media are neither explicitly addressed by the model nor by the questionnaire [31].

  • In addition to technical products, technology-based services are experienced as well. This may shift the focus of research from user experience (UX) to customer experience (CX). According to Bruhn and Hadwich [32], CX comprises “all individual perceptions, interactions as well as the quality of an offered service that a customer experiences during his interaction with a company” (p. 10). Gentile, Spiller and Noci [33] proposed six dimensions of CX. Four of them are sensory, emotional, cognitive and behavioral in nature and thus correspond to UX components as specified by the CUE model. CX, however, is the broader construct since it also incorporates lifestyle characteristics and social features.

Future research has to clarify whether these issues should be accounted for by the CUE model and an extended version of meCUE, or if it were more appropriate to address them in a different framework and by another questionnaire.

Like the first version of meCUE, the second one consists of 34 items in total. At first sight, it therefore appears as less lean than the AttrakDiff questionnaire or the UEQ, with 28 and 26 items, respectively. It must be noted though, that 12 of the meCUE items address users’ emotions which are not accounted for in the other two questionnaires. If investigators apply meCUE, they need no additional tool to assess users’ emotions. Moreover, they are free to leave out all modules which they consider as irrelevant for their research, thus adapting meCUE to their specific needs and increasing the efficiency of their survey (R2). In addition to the obvious benefit that adaptability offers for efficiency, the time that is necessary to fill in the questionnaire indicates too that meCUE is a lean assessment tool. On average, it usually takes between 3 and 7 min and thus requires only little effort on behalf of the participants of a study.

To ensure a high degree of intelligibility (R3), the items of meCUE were formulated in a short and unambiguous way with a clear reference to the product and the respondent of the questionnaire. Compared to the old version, meCUE 2.0 puts special emphasis on the experiential character of the UX assessment. This is reflected by the reformulation of the instruction and by the revised item for the users’ overall evaluation. All items – with the exception of the single item in module IV – have the same format: a statement combined with a 7-point Likert Scale. This format was chosen to offer an alternative to the semantic differentials of the AttrakDiff questionnaire and the UEQ. Homogeneity of items cannot be taken for granted in UX studies. Investigators who aim to assess a variety of UX aspects must often rely on different types of questionnaires, such as semantic differentials, Likert Scales or grids. Hence, their participants must read and understand several instructions and adapt to a number of divergent formats, which might be confusing and requires additional mental effort. For meCUE, this drawback is avoided since its modules cover a wide array of UX components in a uniform way.

The psychometric quality (R4) of meCUE has been successfully checked in a number of studies for the German as well as for the English version [7,8,9,10,11,12]. In summary, the results of these studies demonstrated its high internal consistency, its good validity and its ability to explain large proportions of variance. This also applies to meCUE 2.0. As the re-analysis of two datasets showed, the structural change accomplished by splitting module I into two independent parts neither harms the internal consistency nor impairs the factor loadings within any module.

Compared to other UX questionnaires, the modular structure of meCUE is one of its major advantages because it ensures a high degree of efficiency (R2) and adaptability (R5). The modified structure of meCUE 2.0 opens up even more possibilities for its future use. It provides more flexibility in combining modules according to a research goal and the kind of product, user group or context of use that is investigated. Moreover, the questionnaire is now more suitable for comparing different design options at all stages of the user-centered design process and for a lean detection of changes in experience during long-term usage.

Although meCUE 2.0 fulfills the basic requirements which we proposed for tools that assess UX, previous experience has demonstrated that investigators still have the responsibility to check carefully whether the subjective perspective of their participants is sufficiently addressed by the questionnaire. In particular, they must decide which of the modules are appropriate for the purpose of their evaluation. A high degree of standardization is a crucial aspect for the quality of any method. However, standardized instruments do not always fit the user group or the system under investigation perfectly [26], and must therefore be carefully selected and handled.