1 Introduction

In recent years, the relationship between interface aesthetics and usability has become a major topic of interest in human-computer interaction, especially in user experience research [2]. Abundant research indicates a strong connection between interface aesthetics and perceived usability. Further experiments, however, have called the strength of this relationship into question [13, 18, 24].

Whether interface aesthetics is strongly tied to perceived usability remains under debate, and the answer is likely to vary with the circumstances. In this research, we therefore focus on how users perform with two different screen designs that present the same content and offer the same functions.

Given this debate, it is important to test whether aesthetics has a positive effect on usability. Investigating this question will provide researchers with insights for further reference and will give designers requirements and guidelines to work from.

For this study, we chose the tool page of DIA2 as the object of study. DIA2 (Deep Insights Anytime, Anywhere) is a web-based visual analytics platform [11]. One of DIA2’s target groups is professors in STEM (science, technology, engineering, and mathematics). The tool page is where users perform their tasks, and a refined design for it had just been completed. Two different designs of the same page were therefore available, which made the tool page suitable for this study.

In the next section, we summarize the related literature to give an impression of how academia and industry view the relationship between aesthetics and usability.

2 Literature Review

In industry, [22] found that good interface design can have a positive effect on usability, and that this relationship still holds after actual use of the system [23]. [6] supported the importance of aesthetics and further suggested a theoretical framework for future practice. In product design, the aesthetics of vehicle interiors has been examined, and the author indicated that the impact of aesthetics cannot be ignored [7]. The aesthetics of different product forms has also been studied, with the conclusion that product aesthetics has a clear positive effect [17]. In human-computer interaction, the correlation between aesthetics and usability has been studied for a long time, and numerous results support the principle “what is beautiful is usable” [15, 16, 20, 21, 27]. Studies of web user interfaces likewise found a correlation between perceived usability and aesthetics [4, 5, 9]. A tight aesthetics-usability relation thus became widely accepted in several academic areas, and the overall attitude toward the correlation of aesthetics and usability is positive and supportive.

Further experiments, however, have called the strength of the relationship between interface aesthetics and perceived usability into question. [13] concluded that web page aesthetics has a positive effect on usability, but that the effect is small. [24] argued that the visual attractiveness construct contributes more to enjoyment than to usefulness. [18] concluded that the type of aesthetics that is relevant to users’ perceptions appears to depend on the application domain. [14] indicated that multiple mechanisms could be responsible for the relationship between attractiveness and usability. Moreover, some findings showed that, although the essential result supported the correlation of aesthetics and usability, differences between objective usability (reported problems) and subjective usability ratings were affected by multiple factors, such as the halo effect and the content provided [5, 25, 26].

The existing controversy in the area of Web design and usability points to the need for further research. We explore this topic in the context of an online visual analytics platform. Unlike a regular website, a visual analytics platform has a mainly functional role. Its goal is to provide clear information to users, and it has no hedonic or entertainment purpose. We can therefore think of such a platform as a critical case: if aesthetics has an effect on perceived usability for this utilitarian product, it will likely have an effect on other products that have broader purposes.

To test the correlation between aesthetics and usability, we pose the following research question: does the refined design of the DIA2 tool page have a positive effect on usability for professors in STEM?

In the following section, we explain the method we used in this study, including sampling strategy, participants and the procedure of data gathering.

3 Methodology

3.1 Design

Two screen designs of the DIA2 tool page were chosen for this study (see Fig. 1). The two screens differ only in the design of two parts of the interface. In the dashboard area on the left side of the screen, we reduced the space between dashboards and added a blue bar to indicate which dashboard is active. The “Add a dashboard” button also changed: the refined version emphasizes it with a blue background. In the tool bar area at the top of the screen, we moved all the icons toward the right side of the screen so that users would notice them more easily. Participants were manually assigned to one of the two screens to perform the provided tasks.

Fig. 1. Current version (left) and refined version (right) of the DIA2 tool page

3.2 Sampling Strategy

The participants were professors in science, technology, engineering, and mathematics (STEM) who have been awarded, are seeking, or have sought funding from the National Science Foundation (NSF). This criterion was used because the visual analytics platform presents data from the NSF. The sampling strategy was a mix of criterion sampling and snowball sampling, in which one participant helps to find another [10]. These strategies are well suited to studying participants who meet some criterion [12] and who have special experiences [3], and participants may be able to recommend useful candidates for the study [8].

3.3 Participants

The sample consisted of 8 participants, 2 female and 6 male. All of them were professors at Purdue University. We knew in advance that all of them had experience seeking funding from the National Science Foundation, and some of them have projects that were successfully funded by the National Science Foundation. None of them had used DIA2 before this study.

3.4 Measures

Perceived Interface Attractiveness.

Participants were asked to choose which of the two screen designs they liked more.

User Performance.

Both quantitative and qualitative measures of user performance were recorded. The quantitative measure was task completion time, that is, the time needed to accomplish a task. The qualitative measures were the verbal and non-verbal expressions recorded during the tasks.

3.5 Procedure

The study was conducted in participants’ offices. A laptop was provided. The whole process lasted for approximately half an hour.

Data gathering consisted of four sections. The first section was a pre-task, semi-structured interview covering demographic information and previous experience with NSF-related activities. In the second section, participants were asked to perform 8 tasks on the computer provided by the researcher. These tasks were selected because they require many interactions with the dashboard and the tool bar, the two main areas in which the two versions of the tool page differ. After finishing the tasks, participants filled out a questionnaire about the usability of the tool page. Finally, participants answered a list of open-ended questions about their experience with the tool page. With the participants’ permission, the whole process was recorded, and their facial expressions and screen activity during the second section were captured with a software tool called Silverback. Please see Appendix A for the interview questions and task list.

3.6 Data Analysis

The researcher calculated the time each participant spent on individual tasks. Participants’ performance of each task was summarized and compared. Verbal and non-verbal expressions were categorized into three categories (positive, neutral and negative expressions) based on the scale provided by [1]. Answers to follow-up interviews were summarized based on each question.
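The aggregation described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the record layout, the function names, and every number in `TIMINGS` are hypothetical placeholders, not the study's data or the researchers' actual procedure. The three expression categories are taken from the text; assigning an expression to a category remains a manual coding step.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (participant, version, task number, seconds).
# Placeholder values for illustration only; these are NOT the study's data.
TIMINGS = [
    ("P1", "current", 1, 30.0),
    ("P1", "current", 2, 50.0),
    ("P2", "refined", 1, 20.0),
    ("P2", "refined", 2, 40.0),
]

# The three categories named in the text.
CATEGORIES = ("positive", "neutral", "negative")


def mean_time_per_version(timings):
    """Average of each participant's total task time, grouped by version."""
    totals = defaultdict(float)  # (version, participant) -> total seconds
    for participant, version, _task, seconds in timings:
        totals[(version, participant)] += seconds
    by_version = defaultdict(list)
    for (version, _participant), total in totals.items():
        by_version[version].append(total)
    return {version: mean(values) for version, values in by_version.items()}


def mean_time_per_task(timings):
    """Average time for each (task, version) pair."""
    by_key = defaultdict(list)
    for _participant, version, task, seconds in timings:
        by_key[(task, version)].append(seconds)
    return {key: mean(values) for key, values in by_key.items()}


def tally_expressions(coded):
    """Count manually coded expressions per category."""
    counts = {category: 0 for category in CATEGORIES}
    for category in coded:
        if category not in counts:
            raise ValueError(f"unknown category: {category!r}")
        counts[category] += 1
    return counts


if __name__ == "__main__":
    print(mean_time_per_version(TIMINGS))
    print(mean_time_per_task(TIMINGS))
    print(tally_expressions(["neutral", "negative", "neutral"]))
```

Summing per participant before averaging keeps the overall comparison and the per-task comparison consistent with each other, since both are derived from the same per-task records.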

4 Results

4.1 Perceived Website Attractiveness

The result of the attractiveness evaluation of the two screen designs is presented in Table 1.

Five of the 8 participants chose the refined version as their preferred design, and 2 thought the current version was better. One participant could not choose between the two designs. She explained:

I like the consistency of this bar (the current version). The “add a dashboard” really fits into the whole screen. The top banner tells me what this site is about. But on the other hand, here (the refined version) has more room. This white space. The dashboard looks better in here. Oh I didn’t notice “my project” in here (the refined version). Sorry, I…I can’t choose between. They all seem fine to me.

As an expression of aesthetic preference, this result supports the hypothesis that the refined version of the tool page is more attractive to participants than the current version.

Table 1. The result of website attractiveness
Table 2. Comparisons of time spent on 8 tasks
Table 3. Comparison of time spent on each task

4.2 Time Spent on Tasks

Time Spent on Completing 8 Tasks.

Table 2 compares the average time participants spent on the two screens. The time spent on the refined version of the tool page was shorter than the time spent on the current version.

Time Spent on Each Task.

Table 3 shows the tasks for which completion times differed notably among the eight participants. The four dots on the left represent participants who used the current version, and the four dots on the right represent participants who used the refined version. Green lines represent the average time spent on each task. For most of these tasks, participants using the current version took longer than those using the refined version.

4.3 Website Credibility

Table 4. Website credibility

The result of website credibility is presented in Table 4. Four of the participants did not consider it possible that aesthetics would affect their judgment of such a website. One participant described:

I know it’s a NSF website. I won’t question it, because I already know the organization behind it is NSF.

Three participants chose to trust the refined version more because of the “clear look” of the design. Two of them acknowledged that aesthetics was one of the main factors affecting their judgment. Only one participant chose to trust the current version. He described:

This (the current version) is a more, how can I say, academic look to me? Yes, I feel that this (the current version) is an academic website, instead, this one (the refined version) is too clean.

4.4 Attitude Toward Aesthetics Before Task Performance

During the pre-task interview, we asked participants about their attitudes toward aesthetics versus function. In other words, did they consider function the only factor that affects usability, or did they think aesthetics can make a difference to usability? Surprisingly, 7 out of 8 participants thought only function matters. One participant said:

I don’t think myself sensitive to, you know, design? I can’t really tell which design is better. And personally I don’t care. To this system, because there are a lot of data involved, if it is easy to use, I can’t see how many difference aesthetics can help with me using this system.

Another participant explained his opinion from another perspective:

I mean, aesthetics is not the main focus of this website, right? All I need is to find the right information. No matter how beautiful, how nice this website is, as long as I can find the information, it’s useful. Aesthetics? Aesthetics can’t help me find the information, right?

In sum, most participants believed aesthetics would not affect usability, which contradicts the task performance data.

4.5 Task Performance

Participants’ performance on each task is described below.

Task One.

For task one, participants were asked to change the first dashboard’s name to “My first dashboard”. This task was designed to test whether participants could interact with the dashboard features easily. All the participants followed the same process to name the first dashboard: they hovered the mouse over the text that says “name the dashboard”. After a grey shadow appeared, they clicked on the text. After an input box appeared, they typed “my first dashboard” into it and clicked the “save” button. Before beginning the task, three participants asked whether they should rename the dashboard. One of them used the current version, and the other two used the refined version.

Task Two.

Task two asked participants to find the profile of a specific individual who is currently working at the University of Michigan. We designed this task because completing it requires participants to interact with the tool bar. The individual to be found was chosen because (1) the name of the individual is of a reasonable length, (2) the spelling of the name is comparatively easy, and (3) the name can be found within the “people explorer” tool. As in the first task, all the participants followed the same process and completed the task successfully. First, participants clicked on the “people explorer” tool icon, and then they entered the name of the individual. However, two participants using the current version experienced a noticeable usability issue. When they clicked the “people explorer,” they did not click right on the icon. Instead, they clicked the area around the icon. Because in the current version the tool is activated only when the icon itself is clicked, one participant clicked the “people explorer” twice and the other clicked three times before the tool showed up. This behavior indicated that participants expected the entire area between the separators to be clickable.

Task Three.

For task three, participants were asked to choose a different dashboard and name it “my second dashboard”. All the participants completed this task by simply clicking on another dashboard and then repeating the process performed for the first task. Yet, before starting this task, two participants using the refined version asked which dashboard they should choose. We responded by asking why they had asked this question. One participant answered:

The dashboard looks like a three-step process. You see, with all these numbers and, yeah, like this is step one, step two, and step three. They are too close, maybe.

The other one provided a similar explanation:

I firstly thought these tabs here, I don’t know why, but these seem like steps, you know, like you need to finish these tabs then you can find what you want.

Task Four.

Participants were asked to find the graph that shows the collaboration within Purdue University. The task requires interactions with the tool bar but with a different tool. All the participants followed the same process and successfully finished the task. The usability issue here, however, was the same as during task two. Two participants clicked the empty area around the icon instead of clicking right on the icon itself.

Task Five.

Task five asked participants to find a specific program categorized by the National Science Foundation. To finish this task, participants had to understand the definition of “program” and know which tool was the correct one. All the participants clicked the “NSF program explorer” tool without hesitating. However, the same issue that occurred during tasks two and four also occurred during this task. One participant using the current version clicked the tool area twice and showed a little bit of impatience. On the other hand, one participant who also used the current version chose to move the mouse comparatively carefully toward the icon when choosing the tool.

Task Six.

Participants were asked to add two more dashboards in this task. The feature they needed to interact with was the “add dashboard” button placed in the dashboard area. Participants simply clicked the “add dashboard” button twice.

Task Seven.

For this task, participants needed to delete the fifth dashboard, which is the dashboard with a “5” in the top-left corner. All the participants deleted the assigned dashboard by clicking the “delete” button in the bottom-right corner of the fifth dashboard. However, all of the participants using the current version had to scroll the screen or drag it to the bottom in order to delete the fifth dashboard, because in the current version the browser window is not tall enough to show all five dashboards at once.

Task Eight.

Participants were asked to find the profile of another specific individual at Purdue University. Unlike in task two, this profile cannot be found through the “people explorer”; instead, the “institution explorer” is the correct tool for searching for such information. Thus, this task requires more sustained interaction with the tool bar. All the participants first clicked on the “people explorer” and typed the name of the individual into the input box. After receiving incorrect information, they clicked the “institution explorer” icon, found the profile for Purdue University, and typed the name into the collaborator search bar.

4.6 Expressions During the Task Performance

Because of the design of the data-gathering process, not many comments were gathered. However, several noticeable negative verbal and non-verbal expressions were captured during the process, and a comparatively large number of neutral verbal comments were recorded. We first describe the neutral expressions and then the negative expressions.

Neutral Expressions.

During task performance, participants frequently used interjections and short confirmations to indicate that they had finished a task. “ok.” “oh, I see.” “let’s see.” “Done.” were used often. Beyond these expressions, participants rarely said or did anything that could be placed in the “neutral expression” category.

Negative Expressions.

Several noticeable verbal comments were captured during task performance. During task 4, two participants using the current version of the tool page said “come on!” after clicking an icon twice with no result. During task 7, all 4 participants muttered slightly when they could not find the fifth dashboard on the screen.

Obvious facial expressions were also recorded during task performance. During task 4, one participant using the current version of the tool page frowned slightly when clicking twice did not trigger anything. During task 5, all four participants using the current version of the tool page frowned slightly when the same situation occurred.

4.7 Attitude Toward the Website After Task Performance

After the two screen designs were shown to participants, most of the feedback on the refined version was positive.

I like the clean look of this one (refined version) rather the other website.

For this website (refined version), I have a bigger space to put this small windows.

Cool, I like this one better. The tabs are more concise.

Suggestions were also brought up. One participant said that the dashboard looks like “a three-step process instead of individual dashboard because they are too close to each other”. Another suggestion is that the tab “add a dashboard” is not consistent with the dashboard above.

Although two participants preferred the current version, they did not give negative feedback on the refined version. When asked why they preferred the current version, one participant said that the current version looks like an academic website to him. The other participant could not explain why. In his words, he was used to the dark blue color, which might have affected his judgment in some ways.

5 Discussion

The findings show that aesthetics has a positive effect on the usability of DIA2, which supports the hypothesis. There were notable differences in performance between the two screen designs: participants using the refined version of the tool page performed better than participants using the current version. Moreover, because of the design differences, two usability problems were solved in the refined version. Although there were not many verbal or non-verbal expressions, negative expressions were concentrated in performance with the current version of the tool page, which indicates that using the current version was less enjoyable than using the refined version. In this particular case, aesthetics helped with usability.

On the other hand, the pre-task interviews and the post-task feedback show that most participants did not think aesthetics could help with usability, yet the results show the opposite. The refined version, which is considered better designed, yielded better performance than the current version. Given this conflict, it is not surprising that much of the debate centers on whether aesthetics matters. People do not realize its effect until they perform tasks. They tend to trust their own experience or knowledge and to judge unfamiliar situations by their conscious beliefs. Although existing conclusions hold that whether aesthetics helps usability depends on the circumstances, in this case aesthetics does have a positive effect on usability. Specifically, this study’s results suggest that, even for a very functional product, and even when participants claim that aesthetics would not influence usability, aesthetics does improve both actual and perceived usability.

This study is subject to a number of limitations. The number of participants is not ideal: 8 is too few for quantitative data gathering. The study environment was also not ideal. Although we met in each participant’s office, the researcher provided the device used to perform the tasks, and unfamiliarity with the device may have affected task performance. In addition, one of the participants had just had eye surgery, which seemed to affect his performance noticeably.
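To make the sample-size limitation concrete, the following sketch runs an exact two-sided permutation test on the difference in group means. It is offered only as an illustration, not as an analysis performed in this study: the function name and all timing values are hypothetical placeholders. With four participants per design there are only 70 ways to split eight participants into two groups of four, so the smallest two-sided p-value such a test can produce is 2/70, or about 0.029, even when the two groups are completely separated.

```python
from itertools import combinations
from math import comb
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    total = 0
    # Enumerate every way of relabeling n_a of the pooled values as group A.
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point rounding.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical total completion times in seconds.
# Placeholder values for illustration only; these are NOT the study's data.
current = [310.0, 295.0, 330.0, 305.0]
refined = [240.0, 255.0, 230.0, 260.0]

if __name__ == "__main__":
    print("number of splits:", comb(8, 4))
    print("p-value:", permutation_p_value(current, refined))
```

A larger sample would increase the number of possible splits and therefore allow finer-grained conclusions about any difference between the two designs.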

6 Conclusion

Although it is widely accepted in several academic areas that aesthetics has a positive effect on perceived usability, we consider that this effect may vary with the circumstances. For the DIA2 web platform in particular, we therefore conducted a study to find out whether the refined version of the DIA2 tool page has a positive effect on usability. Eight participants took part in the study; they were interviewed and asked to perform 8 tasks. The results demonstrated that the refined version has a positive effect on usability, which supports the hypothesis that aesthetics helps with perceived usability when using DIA2. Future work can focus on identifying specific aspects of aesthetics and their individual influence on usability, in order to produce a deeper understanding of how aesthetics affects usability.