
1 Introduction

1.1 Automation System

Today, technological progress offers many opportunities to use automation systems such as cleaning robots and automated driving systems in everyday life. An automation system is a technology that autonomously conducts a task on behalf of humans [1]. Their use is expected to increase further in the near future.

Automation systems provide a completely new form of interaction between people and artifacts. Classical first-generation artifacts, such as knives and hammers, are used to support users’ physical activities. The physical structures of such artifacts are usually simple, and their functions are easy to understand. Therefore, in the interaction between people and classical artifacts, the artifacts normally perform in accordance with the users’ intentions [2]. Furthermore, cognitive artifacts, such as computers and smartphones, are used to support users’ cognitive activities by storing, expressing, and manipulating information [3]. Such artifacts are considered second-generation artifacts. As the inner structures of such artifacts are complex, their functions are difficult to comprehend. Therefore, in the interaction between people and cognitive artifacts, discrepancies between the users’ intentions and the artifacts’ performance often occur [4].

Automation systems, in contrast, autonomously conduct tasks within the users’ physical and cognitive activities. Moreover, when classical or cognitive artifacts are used, the central player of each task is the user, and the artifacts support the user. However, when automation systems are employed, the central player of each task is the system, and the users monitor and supervise the systems. Thus, automation systems are considered third-generation artifacts. Additionally, because automation systems conduct far more complex information processing than first- and second-generation artifacts, users generally cannot understand the internal functions of the systems. However, if discrepancies between the users’ intentions and the automation systems’ performance occur, they could lead to fatal accidents [1].

1.2 Usability Test

Artifacts used in everyday life are often evaluated through usability tests. Usability is defined as “the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use” [5]; this is the international standard definition of usability [6]. The evaluation targets are usually cognitive artifacts, mainly computer software systems, that are to be developed and improved [6].

On the other hand, automation systems are usually evaluated based on the users’ trust in the systems or their assigned workload [1]. The evaluation targets, in this case, are automated control systems in nuclear and manufacturing plants and automated operating systems of airplanes and ships. With the increased use of automation systems in daily life, their usability must be evaluated as well. Objective data on the use of automation systems may be obtained in multiple ways. However, automation systems cannot be evaluated using existing usability test questionnaires, such as SUS (system usability scale) [7], SUMI (software usability measurement inventory) [8], and QUIS (questionnaire for user interaction satisfaction) [9], because these questionnaires do not consider the defining features of automation systems, namely their autonomy and their complex internal information processing. Therefore, a new evaluation index that considers the features of automation systems is required.

In this study, we developed a generalized usability questionnaire for evaluating automation systems, which can be applied under different environmental settings.

2 Questionnaire Design

The usability elements defined in ISO 9241-11 [5], namely effectiveness, efficiency, and satisfaction, were reflected in the usability questionnaire for automation systems. Effectiveness is “the accuracy and completeness with which users achieve specified goals,” efficiency is “the resources expended in relation to the accuracy and completeness with which users achieve goals,” and satisfaction is “the freedom from discomfort, and positive attitudes towards the use of the product” [5].

We added three new usability elements to the questionnaire to evaluate automation system usability. The first element is understandability: the comprehensibility of the intentions of automation systems. Automation systems autonomously conduct tasks; however, as users usually cannot comprehend the internal functions of the systems, it is desirable for users to understand the intentions of the systems without discrepancies [10]. The second element is discomfort: the absence of comfort or ease in using automation systems. Because automation systems receive information from the external environment and make decisions using this information, they sometimes achieve tasks with behaviors the users did not predict [10]. Furthermore, the task-performing process influences users’ subjective evaluations of automation systems [11]; users feel comfort or discomfort with the task-performing process, regardless of the task performance of the systems. The third element is motivation: the users’ desire to conduct the tasks by themselves. Because automation systems autonomously conduct tasks instead of the users, users tend to depend heavily on the systems [1, 12]. Additionally, users’ overreliance on the systems reduces their motivation and ability to conduct tasks by themselves [13]. Automation systems should not take away the users’ motivation and ability to conduct tasks.

Based on these considerations, we developed a usability questionnaire for automation systems comprising six questions for each of the six elements (effectiveness, efficiency, satisfaction, understandability, discomfort, and motivation), for a total of 36 questions, rated on a five-point scale: 1. Strongly disagree, 2. Disagree, 3. Neither agree nor disagree, 4. Agree, 5. Strongly agree.
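As a concrete illustration, scoring such a Likert questionnaire typically averages the items retained for each element, after flipping any reverse-scored items (marked “(R)” in Tables 1 and 2). The sketch below is not part of the study materials; the item identifiers and groupings are hypothetical placeholders.

```python
# Hypothetical sketch of scoring a 5-point Likert questionnaire with
# reverse-scored items; item IDs and element groupings are illustrative only.

REVERSE = {"d2"}  # e.g., suppose discomfort item 2 is reverse scored ("(R)")

ELEMENTS = {
    "effectiveness": ["e1", "e2", "e3"],
    "discomfort": ["d1", "d2", "d3"],
}

def score(responses: dict) -> dict:
    """Average the items of each element; reverse items are flipped (6 - x)."""
    out = {}
    for element, items in ELEMENTS.items():
        vals = [6 - responses[i] if i in REVERSE else responses[i] for i in items]
        out[element] = sum(vals) / len(vals)
    return out

answers = {"e1": 4, "e2": 5, "e3": 4, "d1": 2, "d2": 4, "d3": 1}
print(score(answers))  # effectiveness ≈ 4.33, discomfort ≈ 1.67
```

On a five-point scale, flipping a reverse item maps 1↔5 and 2↔4 via 6 − x, so all items point in the same direction before averaging.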

3 Experiment

We conducted an experiment in which participants used automation systems and evaluated them with the designed usability questionnaire, in order to investigate its reliability and validity.

3.1 Experimental Task

Line-Tracing Task. We used a line-tracing task as the experimental task, a modified version of the line task used by Maehigashi et al. [14]. In this task, participants operated a circular vehicle to trace a line that scrolled downward in the task window. When the vehicle veered off the line, the performance score was reduced according to the operational error. The participants were allowed to switch between the auto-tracing mode (operation performed entirely by the system) and the manual mode (operation performed by the participants using the left and right arrow keys). The circular vehicle had a diameter of 24 pixels, the window scrolling speed was 24 pixels per second, and the line was five pixels wide. When operational errors occurred, a flashing red square frame appeared around the task window as error feedback.

Auto-tracing Systems and Task Situations. We set up two types of auto-tracing systems: the operational pathway-displaying system and the non-displaying system (Fig. 1). The operational pathway-displaying system indicated the pathway (light green) along which the system tried to operate the vehicle. The non-displaying system, in contrast, did not indicate the pathway. The participants used and evaluated these systems in various situations: (1) the system accurately traced the line or failed to trace it, (2) the vehicle stably traced the line or unstably snaked along it in the auto-tracing mode, and (3) switching from the manual to the auto-tracing mode required either pressing the spacebar or typing the command “switch”.

Fig. 1. Two types of auto-tracing systems: operational pathway-displaying system (left) and non-displaying system (right).

The auto-tracing system performance was controlled through the success rate of the operation commands. In the experimental task, the system’s operation in the auto-tracing mode was reflected in the vehicle movements at a 50 Hz sampling frequency. We controlled the percentage of the system’s operation commands that were reflected in the actual vehicle movements. In one situation, the system’s operation, which was always perfect, was reflected in 100 % of the vehicle’s movements; therefore, the vehicle did not veer off the line. In another situation, however, the system’s operation was reflected in only 40 % of the vehicle’s movements; the remaining 60 % of the operation commands were treated by the experimental task system as no operation command. Therefore, the vehicle did not trace the line appropriately.
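The success-rate manipulation described above can be sketched as follows. This is an illustrative reconstruction, not the authors’ task code: the 50 Hz loop comes from the description, while the random dropping of individual commands is an assumption about how the acceptance percentage was realized.

```python
import random

HZ = 50  # sampling frequency of the auto-tracing operation (from the paper)

def simulate(success_rate: float, seconds: int = 5, seed: int = 0) -> float:
    """Accept each (perfect) steering command with probability `success_rate`.

    Returns the fraction of samples on which a command was actually applied.
    Hypothetical stand-in for the task system's command-acceptance logic.
    """
    rng = random.Random(seed)
    applied = 0
    for _ in range(HZ * seconds):
        # The system always issues a perfect command; the task system
        # reflects it in the vehicle movement only with probability
        # `success_rate`, otherwise it is treated as "no operation command".
        if rng.random() < success_rate:
            applied += 1
    return applied / (HZ * seconds)

print(simulate(1.0))  # 1.0: every command applied, vehicle never veers off
print(simulate(0.4))  # roughly 0.4: vehicle fails to trace the line accurately
```

With `success_rate=1.0` the vehicle tracks the line perfectly; with `success_rate=0.4` about 60 % of the commands are discarded, reproducing the degraded-performance condition.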

The auto-tracing system behavior was controlled through the observed vehicle movement, independently of the system performance. In one situation, the system performed perfectly and the center of the circular vehicle stayed on the line. In another situation, although the system performed perfectly, the vehicle unstably snaked along the line; however, the circular vehicle always remained on the line.

Switching from the manual to the auto-tracing mode was controlled by the command input method. The participants either pressed the spacebar or typed a command to switch from the manual to the auto-tracing mode. Switching back from the auto-tracing to the manual mode was performed by pressing the spacebar.

3.2 Method

Participants. Forty-eight university students participated in the experiment.

Procedure. Half of the participants used the operational pathway-displaying system, and the other half used the non-displaying system. Each participant conducted the task in the eight different situations created by combining the task situations described above. Performing the task in each situation took five minutes. After the participants completed the task in each situation, they evaluated the system with the usability questionnaire and then took a two-minute break. Before they engaged in the task in each situation, they were instructed to achieve as high a score as possible and to use the auto-tracing system so that they could evaluate it.

Results. First, factor analysis based on the maximum likelihood method with varimax rotation was performed on the 384 data points collected from the 48 participants. As a result, six factors were extracted that explained 73.77 % of the variability in the dataset (Table 1).

The first factor loadings exceeded .50 in six questions for effectiveness and in two questions for efficiency, and were below .50 in one question for motivation. Therefore, the first factor is considered to indicate effectiveness. The second factor loadings exceeded .50 in four questions for efficiency. Consequently, the second factor is considered to indicate efficiency. The third factor loadings exceeded .50 in six questions for satisfaction. Therefore, the third factor is considered to indicate satisfaction. Regarding the fourth factor, the factor loadings exceeded .50 in six questions for understandability and in one question for discomfort. Therefore, the fourth factor is considered to indicate understandability. Regarding the fifth factor, because the factor loadings exceeded .50 in five questions for discomfort, it is considered to indicate discomfort. Finally, the sixth factor loadings exceeded .50 in five questions for motivation; thus, the sixth factor is considered to indicate motivation.
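The extraction-and-rotation procedure can be sketched as below. This is an illustrative, numpy-only reconstruction: it uses principal-component loadings plus a standard varimax rotation, whereas the study used maximum-likelihood extraction, and the synthetic data stand in for the 384 actual response vectors.

```python
import numpy as np

def varimax(loadings: np.ndarray, n_iter: int = 100, tol: float = 1e-8) -> np.ndarray:
    """Standard varimax rotation of a p-by-k loading matrix (Kaiser criterion)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    var = 0.0
    for _ in range(n_iter):
        rotated = loadings @ rotation
        # SVD step of the classic varimax update
        grad = loadings.T @ (rotated ** 3
                             - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(grad)
        rotation = u @ vt
        if s.sum() < var * (1 + tol):  # criterion stopped improving
            break
        var = s.sum()
    return loadings @ rotation

# Placeholder data: 384 "respondents", 36 "questions", 6 latent factors.
rng = np.random.default_rng(0)
true_loadings = np.kron(np.eye(6), np.ones((6, 1)))  # 36 x 6 block structure
data = rng.normal(size=(384, 6)) @ true_loadings.T \
       + rng.normal(scale=0.5, size=(384, 36))

corr = np.corrcoef(data, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
top = np.argsort(eigvals)[::-1][:6]                   # retain six factors
loadings = eigvecs[:, top] * np.sqrt(eigvals[top])    # unrotated loadings
rotated = varimax(loadings)
print(rotated.shape)  # (36, 6): one loading per question per factor
```

Because varimax is an orthogonal rotation, each question’s communality (row sum of squared loadings) is unchanged; only the distribution of loadings across factors is simplified, which is what makes the “exceeds .50” reading of Tables 1 and 2 possible.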

Next, we selected the three questions with the highest factor loadings for each factor and prepared a questionnaire comprising these 18 selected questions. We performed a factor analysis using the maximum likelihood method with varimax rotation on the collected data for these 18 questions. As a result, six factors were extracted, and the validity of the questionnaire was confirmed (Table 2). The accumulated proportion of the six factors was 72.30 %. In each factor, the factor loadings exceeded .50 for the three questions of the corresponding element.

Furthermore, we calculated Cronbach’s alpha coefficient among the three questions for each factor. The coefficient was .959 for the first factor, effectiveness; .833 for the second, efficiency; .872 for the third, satisfaction; .836 for the fourth, understandability; .911 for the fifth, discomfort; and .933 for the sixth, motivation. Since the alpha coefficient for each factor exceeded .80, the reliability of the questionnaire was confirmed.
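Cronbach’s alpha for a three-item scale can be computed as below; this is a minimal numpy sketch with made-up response data, not the study’s actual responses.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) score matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

# Made-up 5-point responses of six respondents to three items of one element.
scores = np.array([
    [5, 4, 5],
    [4, 4, 4],
    [2, 1, 2],
    [3, 3, 4],
    [5, 5, 5],
    [1, 2, 1],
])
print(round(cronbach_alpha(scores), 3))  # → 0.967, high internal consistency
```

Values above .80, as reported for all six factors, are conventionally taken to indicate good internal consistency of the items within each factor.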

Table 1. Thirty-six questions and factor loadings. Factor loadings greater than .50 are in gray. (R) indicates reverse scored question.
Table 2. Eighteen questions and factor loadings. Factor loadings greater than .50 are in gray. (R) indicates reverse scored question.

4 General Discussion

Based on the conducted experiment, a usability questionnaire with 18 questions was developed, and its reliability and validity were confirmed (Appendix A). In the following, we examine the relationships between the six usability elements: effectiveness, efficiency, satisfaction, understandability, discomfort, and motivation.

We developed a structural equation model to investigate these relationships. The model is based on two assumptions: (1) the users’ evaluations of effectiveness, efficiency, discomfort, and understandability directly reflect the automation system behaviors, and (2) the users’ evaluations of satisfaction and motivation are influenced by their evaluations of the other four elements. Therefore, to develop the model, we separated the usability elements into high-level elements (satisfaction and motivation) and low-level elements (effectiveness, efficiency, discomfort, and understandability).

Based on the answer data for the 18 questions in the questionnaire, we performed a covariance structure analysis using the proposed model. First, the model fit test results showed that the proposed model is sufficient to explain the data: CFI (comparative fit index) \(=\) .93, TLI (Tucker-Lewis index) \(=\) .92, RMSEA (root mean square error of approximation) \(=\) .09, and SRMR (standardized root mean square residual) \(=\) .05. Second, the analysis results showed interactive influences among the low-level elements: effectiveness and efficiency \((r = .78, p < .001)\), effectiveness and discomfort \((r = .66, p < .001)\), effectiveness and understandability \((r = .58, p < .001)\), efficiency and discomfort \((r = .71, p < .001)\), efficiency and understandability \((r = .68, p < .001)\), and discomfort and understandability \((r = .70, p < .001)\).
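The two-layer structure described above can be expressed in lavaan-style SEM model syntax. This is a hedged sketch of the structural part only, not the authors’ exact specification; the measurement-model lines linking each latent element to its three observed questionnaire items are elided.

```text
# Structural part: high-level elements regressed on low-level elements
satisfaction ~ effectiveness + efficiency + discomfort + understandability
motivation   ~ effectiveness + efficiency + understandability

# Low-level elements are allowed to covary (the reported r values)
effectiveness ~~ efficiency + discomfort + understandability
efficiency    ~~ discomfort + understandability
discomfort    ~~ understandability
```

In this syntax family, `~` denotes a regression path and `~~` a covariance; fitting such a model yields the path coefficients and fit indices (CFI, TLI, RMSEA, SRMR) reported above.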

The analysis results also showed that the four low-level elements influence one of the high-level elements, satisfaction. The path coefficients indicating the influence of effectiveness, efficiency, discomfort, and understandability on satisfaction were .43 \((p < .001)\), .53 \((p < .001)\), −.13 \((p < .001)\), and −.09 \((p < .001)\), respectively. The effects of discomfort and understandability were weak but negative. Therefore, users’ satisfaction was negatively influenced not only by their discomfort but also by their understandability. In this experiment, the participants used the operational pathway-displaying system or the non-displaying system to conduct the line-tracing task in various situations. When the operational pathway-displaying system was used, in particular in a situation where the auto-tracing system did not trace the line accurately, users’ satisfaction was markedly reduced because the displayed pathway and the actually traced pathway were obviously different. Consequently, understandability can affect satisfaction negatively.

Moreover, the analysis results showed that the low-level elements effectiveness, efficiency, and understandability affect the other high-level element, motivation. The path coefficients indicating the influence of effectiveness, efficiency, and understandability on motivation were −.35 \((p < .001)\), −.22 \((p < .001)\), and −.06 \((p < .005)\), respectively; discomfort did not affect motivation. That is, higher effectiveness, efficiency, and understandability of the system reduced users’ motivation to conduct tasks by themselves. Additionally, there was no correlation between satisfaction and motivation \((r = -.01, p = .20)\).

In summary, we considered automation systems as third-generation artifacts and developed a generalized usability questionnaire for such systems. This questionnaire can be used to evaluate various types of automation systems and is useful for developing and improving automation systems as artifacts of everyday use.