1 Introduction

In recent years, studies on how cultural differences influence user experience (UX) have begun to be published. Most of these studies concluded that cultural differences do affect UX, whether in product development [1], systems such as internet banking [2] or games [3]. These studies were conducted in different countries, using different methods, and were led by researchers from different cultures. Given these points, the main objective of this review is to map the work in this area, listing the methods and results from around the world, and to suggest a research guideline for similar work.

A systematic literature review is a method to identify, evaluate and interpret all relevant research available for a specific research question, subject area or phenomenon of interest. The main reasons and advantages for conducting a systematic review are: to summarize the information about a particular topic, to find gaps in subject areas or to provide a background to a study [4]. Therefore, a systematic literature review was conducted to answer the following questions:

  • How do the user experience studies define “cultural difference”?

  • Which methods exist to assess whether cultural differences influence UX?

  • What are the results of this type of study?

Based on the results, we propose a guideline to replicate this kind of work in other scenarios. This guideline may represent a significant contribution to the area, perhaps enabling an increase in the number and standardization of cross-cultural studies, the development of new techniques and greater attention to this subject during UX projects.

2 Methodology

The review was conducted in five academic databases: IEEE Digital Library, ACM Digital Library, Capes Periodicals, Capes Thesis and USP Thesis.

Twelve keywords in English and two keywords in Portuguese were used in the search engines (Table 1).

Table 1. Keywords used on the systematic literature review

Search strings were built from the keywords and submitted to the search engines of the databases mentioned above. After reading the abstracts, the inclusion and exclusion criteria listed below were applied.

Inclusion criteria:

  • Works that address cultural differences related to usability or synonyms were included.

Exclusion criteria:

  • Papers that present ratings without presenting the method used were excluded.

The full text of the papers included through these criteria was read and the criteria for final selection were reapplied. Then the following information was extracted from the selected works:

  • Definition of “cultural difference”;

  • Method to evaluate cultural difference;

  • Method to evaluate user experience;

  • Result of the study;

  • Countries that the study covers;

  • Audience;

  • Number of participants;

  • Statistical methods used.

Finally, the extracted information was analyzed both quantitatively and qualitatively.

3 Results

The search results for each of the sources of research were recorded and are shown in Table 2.

Table 2. Results by search source

After reading the abstracts, the 45 accepted papers were classified along two categories, type and media, as shown in Table 3. The type category refers to the phase of the product lifecycle in which the cultural influence is analyzed: development, testing, final user experience or research of new methods. The media category refers to the type of interface explored in the study.

We then decided to focus on the papers with the UX type. Among the 28 papers included in this type, 5 were excluded because they studied user experience with objects instead of software, focused on reliability instead of a broader definition of UX, or were precursors of a final study already present in this review. The 23 remaining papers were then analyzed in full [5–27].

In order to answer the first research question, “How do the user experience studies define cultural difference?”, we examined how each study categorized the culture of the participants and how it defined this concept. As shown in Fig. 1, the work of the anthropologist Hofstede [28] predominates as the basis of the definition of culture, followed by the study by Nisbett [29] with only three entries.

Table 3. Accepted papers categorized by type and media

Fig. 1. Cultural difference definition models

The Hofstede perspective is methodologically advantageous for work that aims to map the behavior of cultural groups, especially because of its focus on measuring cultural traits. Hofstede follows a particular line of thought in anthropology, according to which there are generalizable cultural traits [30].

Although Hofstede’s analysis is based on cultural dimensions, the dimensions are calculated by territory and, therefore, the analysis still uses political divisions as part of its foundation. Some studies [5, 7, 21] considered the birthplace and the time and place of residence to determine the culture of the individual, while other studies [13, 16, 23, 27] used only the birthplace. It is appropriate to also consider where the individual lives, since the current environment may have influenced the culture derived from the birthplace.

Then we analyzed the second research question “Which methods exist to assess whether cultural differences influence UX?”. As shown in Fig. 2, the most widely used method to evaluate the influence of cultural differences in user experience is the use of questionnaires, followed by performance measurement in predefined tasks.

Fig. 2. Evaluation methods

Questionnaires play a key role in usability evaluation [31]. In the papers evaluated in this review, questionnaires were very popular and were used to identify the importance of usability attributes, such as effectiveness, efficiency and satisfaction, in finished products [5, 6, 11, 24, 25] and to investigate the user experience [10, 14, 16–21, 26, 27]. Questionnaires were also used in less conventional ways: in tests in which users completed sentences according to their understanding [22], in the initial mapping of user culture [7, 8] and in icon recognition [23].

The second most used method, performance measurement in tasks, evaluates task success in terms of completion time and/or number of hits. This method, used in several studies [7, 10, 13, 17, 18, 21, 26, 27], asks users to execute activities or perform tasks in different interfaces in order to correlate performance in a given interface with the culture of a certain group.
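As a minimal sketch of this kind of measurement, the Python snippet below summarizes completion time and success rate per cultural group. The record layout (`group`, `seconds`, `success`), the function name and the sample values are assumptions made for illustration; they are not data or procedures taken from the reviewed studies.

```python
from collections import defaultdict

# Hypothetical task log: one record per participant attempt (illustrative values only).
attempts = [
    {"group": "A", "seconds": 42.0, "success": True},
    {"group": "A", "seconds": 55.0, "success": False},
    {"group": "A", "seconds": 38.0, "success": True},
    {"group": "B", "seconds": 61.0, "success": True},
    {"group": "B", "seconds": 70.0, "success": False},
    {"group": "B", "seconds": 64.0, "success": False},
]


def summarise_by_group(records):
    """Return sample size, mean completion time and success rate for each group."""
    by_group = defaultdict(list)
    for record in records:
        by_group[record["group"]].append(record)

    summary = {}
    for group, rows in by_group.items():
        summary[group] = {
            "n": len(rows),
            "mean_seconds": sum(r["seconds"] for r in rows) / len(rows),
            # True counts as 1 and False as 0, so the mean is the share of successes.
            "success_rate": sum(r["success"] for r in rows) / len(rows),
        }
    return summary


summary = summarise_by_group(attempts)
for group, stats in sorted(summary.items()):
    print(group, stats)
```

Such per-group summaries are only descriptive; whether the difference between groups is statistically meaningful has to be checked separately.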

Interviews, used in some papers [8, 16, 26], serve the same purpose as questionnaires, eliciting the perceptions of the user, but rely on personal interaction as a strategy to capture more information.

The last method listed, monitoring, analyzes the user experience by capturing user signals. This set of papers contains two examples: traditional eye-tracking [16], which monitors eye movements and gaze direction in order to assess several factors, including user satisfaction, and the less conventional Fun Toolkit [19], which analyzes children’s facial expressions and smiles to measure satisfaction.

The number of evaluation methods exceeds the number of papers because some papers combined more than one method. Table 4 shows the combinations of methods that were used.

Three of the papers included did not perform user experiments themselves [9, 12, 15], but based their conclusions only on literature reviews and annotations.

The third research question, “What are the results of this type of study?”, focuses on how many studies indicated that cultural differences have an impact on user experience and how many reached the opposite result. The vast majority of studies, 87 % (20 of 23), concluded that there is an influence.

For a better analysis of this result, a comparative analysis of the papers follows, highlighting some of the strengths and weaknesses in their methods and definitions, in order to indicate which results inspire greater confidence.

Table 4. Combinations of methods

Among the 20 articles that claim that culture influences UX, three performed no experiment [9, 12, 15] and three did not discuss which statistical methods were used [21, 26, 27]. All other studies conducted user experiments and showed the statistical methods used, including the three that could not find evidence of culture influencing UX.

Of the papers that performed experiments and showed their statistics, 16 adopted young people, such as students, trainees and gamers, as their target group. The only exception [19] used children as its target audience. It is appropriate to choose target users of the same general age group, or to control for age, since age is one of the factors that influence culture.

One study selected numerically equal groups of men and women [17]. This control is important since gender may interfere with both culture and UX. The number of participants in the studies ranged from 40 [7, 8, 19] to 5000 [22]. Studies using online questionnaires had larger samples.

Among the papers that claim to have found a significant influence of culture on UX, two stood out for demonstrating experimentally that performance in the activities effectively improved when a culturally appropriate interface was used [7, 13]. This type of analysis is particularly valuable because it shows not only that performance differs between cultures but also that using a culturally appropriate interface can improve both performance and UX.

These studies measured user experience and performance on tasks by exposing the same interfaces to groups with different cultures, and analyzed these measurements with statistical methods and methodologies typical of reliable research.

On the other hand, the three papers that found no influence [10, 18, 19] may present small methodological divergences. One of them [19] reported that the experiment was inconclusive, but it used a relatively small sample of young children as subjects in two different countries, with considerably different sample sizes in each country, which complicates data analysis. It also used the Fun Toolkit to measure satisfaction, along with self-reported questionnaires (which may also be more problematic for children to answer). The other two articles [10, 18] used a different research procedure: instead of subjecting users of different cultures to the same interface, as most of the articles did, they chose to show two significantly different user interfaces to users of a single culture, and were then unable to detect any difference.

4 The Guideline

Based on the results and on the best practices adopted in the studies we analyzed, a guideline was created to help researchers in future studies that aim to investigate whether cultural differences between two or more specific groups are deep enough to affect user experience.

The first step is to choose the population for the study. It is recommended to choose a population with very similar characteristics, in which only one factor influences cultural differences, to minimize the interference of other factors (or one must be aware of these other factors and control for them during data analysis). When analyzing differences in regional culture, for instance, which is a very common type of study, one should attempt to choose people within the same ranges of age, income and education. Profession (or general professional area) may also play a large role in culture, particularly in aspects related to that profession, so it should also be as uniform as possible or controlled for. Gender may or may not have an influence, but since analyzing a single gender is usually undesirable, we recommend controlling for it; choosing the same number of men and women in each cultural group may also alleviate this problem. Finally, for regional differences, it is advisable to confirm that the study participants live in the same region where they were born and have never lived for long periods in other regions, to avoid the adoption of other cultural traits.
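The screening described in this step can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions: the field names (`born`, `lives`, `age`, `gender`), the age range and the participant records are hypothetical, and a real study would also need residence history, income, education and profession.

```python
from collections import Counter

# Hypothetical participant records; field names and values are assumptions for illustration.
participants = [
    {"id": 1, "born": "North", "lives": "North", "age": 22, "gender": "F"},
    {"id": 2, "born": "North", "lives": "North", "age": 24, "gender": "M"},
    {"id": 3, "born": "South", "lives": "North", "age": 23, "gender": "F"},  # moved region
    {"id": 4, "born": "South", "lives": "South", "age": 21, "gender": "F"},
    {"id": 5, "born": "South", "lives": "South", "age": 25, "gender": "M"},
    {"id": 6, "born": "South", "lives": "South", "age": 41, "gender": "M"},  # outside age range
]


def eligible(person, min_age=20, max_age=25):
    """Keep people who still live where they were born and fall within the age range."""
    return person["born"] == person["lives"] and min_age <= person["age"] <= max_age


def gender_balance(people):
    """Count genders per region of birth so imbalances are visible before the study."""
    counts = {}
    for person in people:
        counts.setdefault(person["born"], Counter())[person["gender"]] += 1
    return counts


selected = [p for p in participants if eligible(p)]
balance = gender_balance(selected)
print([p["id"] for p in selected])
print(balance)
```

If the counts per group turn out to be unequal, one option is to recruit further participants; another is to keep the imbalance and control for gender in the analysis.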

The second step is to evaluate whether the study groups really show cultural differences. In the attempt to pick a population as uniform as possible aside from one factor such as regional culture, one might end up picking from a group, such as “computer science students aged between 20 and 25”, which has its own more uniform subculture that supplants the influence of other factors, such as where they live. To verify whether the different cultural groups in the population do show cultural differences, we recommend the use of Hofstede’s Values Survey Module (VSM) [28], because it is methodologically advantageous, giving clear, numerical results through the application of a questionnaire, and is easily replicable.

The third step is to choose the system or interface that will serve as a basis for the UX evaluation. It is important to equalize or control for previous knowledge and experience with the system, so that users with more familiarity do not obtain results significantly different from those of users with no practice. Therefore, we recommend creating a new system or interface if feasible, used only for the experiment, ensuring that no user has any familiarity with it, unless one wishes to test experienced users, in which case training before the experiment may alleviate these differences in experience.

The fourth step is to choose the methods to evaluate the user experience and satisfaction with the system. We recommend the use of questionnaires and task performance measurements, frequently used in the studies analyzed in this review, as ways to get quantitative results to compare. Standard questionnaires such as QUIS [32] or others could be used, and are preferable since they have already been vetted by the research community. If possible, we also recommend the application of some form of interview to get further results and explanations, including qualitative results.

The fifth step is to define the tasks to be measured. It is recommended to choose different system features and, in some of these, place some “traps” (i.e. instances where users may easily commit errors, or purposefully ill-designed elements of interaction) to analyze user reaction to these elements.

The sixth step is to define questions for the interview. Common options are asking about what users liked best and liked least, what they would change in the interface and letting them make general comments.

The seventh, and last, step is to apply the experiment and to compute the results. If the VSM results and the UX test results show statistically significant differences between the different cultural groups but converge statistically among the users of the same group, this strongly suggests that the cultural differences between these groups are deep enough to impact the UX. It is then possible to analyze which aspects of the experience differ most between groups and to begin to understand and design for these differences.
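The guideline does not prescribe a particular statistical test. As one possible sketch, a two-sided permutation test on the difference of group means can check whether a UX measure differs between two cultural groups. The test choice, the function name, the number of resamples and the sample task times (in seconds) are all assumptions made for illustration.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def permutation_test(group_a, group_b, n_resamples=5000, seed=0):
    """Estimate a two-sided p-value for the difference of two group means.

    Group labels are shuffled repeatedly; the p-value is the share of
    shuffles whose absolute mean difference is at least the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1

    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (extreme + 1) / (n_resamples + 1)


# Hypothetical task completion times in seconds for two cultural groups.
times_group_a = [41, 44, 39, 47, 43, 45, 40, 42]
times_group_b = [58, 61, 55, 63, 57, 60, 59, 62]

p_value = permutation_test(times_group_a, times_group_b)
print(f"p-value between groups: {p_value:.4f}")
```

A small p-value between groups, together with low spread inside each group, corresponds to the pattern described above. The same check would be applied to the VSM scores and to each UX measure of interest, ideally with a correction for multiple comparisons.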

5 Conclusion

Our goal was to map the work in this area, discussing the methods and results around the world, and to suggest a research guideline for similar work.

Our results indicate that Hofstede has the most accepted definition of culture and that his method of calculating cultural differences using cultural dimensions is methodologically advantageous, easily replicable and widely used in this kind of study.

Questionnaires and task performance measurements are the most used methods to evaluate UX in this kind of study.

Most of the studies analyzed in this review, 87 %, concluded that cultural differences do affect UX.

To help researchers who want to replicate this kind of study with other groups of users, we elaborated a guideline with seven steps:

  1. Choose the population

  2. Evaluate cultural differences

  3. Choose a new system/interface

  4. Choose the methods to evaluate UX

  5. Define the tasks

  6. Define the interview questions

  7. Apply the experiment and compute the results

We believe that culture is an important part of UX and that the more studies are performed, the more the techniques that account for culture in the UX process will evolve. With this work, we contribute to future studies by making experimental design less complex through a guideline based on successful studies.