Background

Usability evaluation is considered critical for the successful implementation and optimization of an information system. According to the International Organization for Standardization (ISO) [1], usability is the extent to which a product can be used by specified users to achieve specific goals in particular environments with effectiveness, efficiency, and user satisfaction. Nielsen [2] defines usability as “a quality attribute which assesses how easy user interfaces are to use” and characterizes it by five components: learnability, efficiency, memorability, error prevention, and user satisfaction. Usability evaluation aims to improve a software system by identifying its usability problems and prioritizing them based on their impact on users. In healthcare, usability concerns the understandability, learnability, acceptability, attractiveness, usefulness, and performance of healthcare information systems. It also evaluates how easily these systems can be operated by users and to what extent they support users in providing effective healthcare services to patients.

Several usability evaluation studies [3,4,5] have focused on identifying usability problems in health information systems that affect users and healthcare organizations. Over the past few decades, developers have emphasized the evaluation of healthcare information systems in order to support their users, since usability evaluation is regarded as an important component of the information system development process.

A large number of users with different backgrounds interact with these systems in clinical and administrative environments. Given the limited time that healthcare providers, especially nurses, have to learn a new information system on the one hand, and the high cost of training on the other, an appropriate evaluation method is required to determine the usability of such systems and to help reduce the time and cost of training [6].

In general, user-based testing and expert-based inspection are the two main types of usability evaluation methods [6]. Different methods play distinct roles in detecting problems [7, 8], and each has its advantages and disadvantages. For example, user-based methods mostly detect specific problems that prevent users from performing tasks, while expert-based methods often identify general user interface problems. Think Aloud (TA) and Heuristic Evaluation (HE) are the most common representatives of these two types, respectively [6, 9]. The TA method, which originated in cognitive psychology, encourages users to express out loud what they are looking at, thinking, doing, and feeling as they perform tasks [10]. TA is considered the gold standard of usability evaluation methods, since it provides in-depth insight into the problems arising during user-system interaction [6]. Data obtained from this type of evaluation offer a valuable opportunity to identify the specific problems users experience in their workflow [11]. HE is an informal usability inspection technique in which experts evaluate whether the user interface elements of a system adhere to a set of usability principles known as heuristics [12]. HE is a simple and cost-effective method that identifies both “Minor” and “Major” problems in a system's user interface. Both methods can be employed in formative and summative evaluation of a system [6]. Given the potential of user-based and expert-based methods for identifying different kinds of problems, previous studies have emphasized using a combination of evaluation methods [6, 13]. Additionally, according to some studies [9, 14,15,16], combining HE and TA can pave the way for designing user interfaces that are appropriate for novice and less experienced users. However, previous studies [9, 14,15,16,17] that utilized the TA and HE methods neither combined the results of both methods nor applied statistical analysis to compare them. In addition, they focused on evaluating a single system with a small sample of evaluators or participants. Conducting a study with a sufficient number of evaluators and users and utilizing both quantitative and qualitative analysis can reveal the potential of each method, and of their combination, for detecting different types of usability problems.

The Social Security Electronic System (SSES) is one of the widely used Hospital Information Systems in Iran and has recently been implemented in all hospitals affiliated with the Social Security Organization (SSO). According to a previous study [18], users were not completely satisfied with this information system. Since satisfaction is regarded as one of the main usability components, a usability evaluation of this system can reveal the problems diminishing it and other usability components such as effectiveness. Hence, the results of a usability evaluation using the above-mentioned methods can help improve user acceptance of the information system in administrative and clinical environments, thereby supporting the main goals of the SSO, such as improving the health and safety of patients.

Previous similar studies in Iran used a standard checklist or questionnaire [19, 20], TA followed by a questionnaire [21], and the HE method [22, 23] to evaluate the usability of health information systems. To our knowledge, no evaluation study worldwide has specifically investigated the effectiveness of combining user-based and expert-based usability methods. Accordingly, the present study examined the potential of combining the TA and HE methods to evaluate two main administrative and clinical modules of the SSES (the inpatient admission and nursing information systems). The results of the present study are expected to help designers improve the user interfaces of health information systems.

Methods

The aim, design and setting of the study

The current study, conducted in 2018, evaluated the usability of the inpatient admission and nursing information modules of the SSES in Iran by combining the TA and HE methods.

This study was performed at Payambare-Aazam Hospital in Kerman, the largest social security institution in the southeast of Iran, which is ranked sixth in number of beds among the social security institutions of the country. The inpatient admission module of the SSES is mainly used for admitting inpatients, transferring patients from the emergency department to an inpatient ward, allocating patients to clinical wards, producing statistical reports, and managing files, insurance claims, and patient discharges. The nursing information system of the SSES is used for procedures such as requesting laboratory tests, medications, and other paraclinical materials, as well as recording consultations, physician visits, and all clinical procedures. In the present study, TA was performed on the nursing information system used in the Intensive Care Unit (ICU).

The characteristics of participants

The user interfaces of the two modules were evaluated by eight medical informatics specialists trained in heuristic evaluation. In addition, 18 senior nursing students and 17 undergraduate and postgraduate students in health information technology and medical informatics were invited to participate in the TA tests as potential users of the nursing information and inpatient admission systems. None of the participants had working experience with the nursing information or inpatient admission modules of the SSES. The TA tests were conducted under laboratory conditions, away from the actual clinical environment, in order to preserve patient safety.

The description of materials

Heuristic evaluation

Eight evaluators independently examined the design of the user interfaces of both the nursing information and inpatient admission modules against Nielsen's 10 usability heuristics [6] in three to four sessions. Each session lasted approximately two hours; the evaluators recorded each violation of a heuristic as a usability problem and entered it into a list.
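As an illustration only, and not part of the original study protocol, a heuristic-evaluation log of this kind can be represented as one structured record per detected violation. The data structure and the example problem text below are hypothetical; the heuristic titles follow Nielsen's published list.

```python
from dataclasses import dataclass

# Nielsen's 10 usability heuristics (titles only)
NIELSEN_HEURISTICS = [
    "Visibility of system status",
    "Match between system and the real world",
    "User control and freedom",
    "Consistency and standards",
    "Error prevention",
    "Recognition rather than recall",
    "Flexibility and efficiency of use",
    "Aesthetic and minimalist design",
    "Help users recognize, diagnose, and recover from errors",
    "Help and documentation",
]

@dataclass
class HeuristicViolation:
    evaluator_id: int   # 1..8 in this study
    system: str         # "nursing" or "inpatient admission"
    heuristic: str      # one of NIELSEN_HEURISTICS
    description: str    # free-text description of the observed problem

# Hypothetical example entry from one evaluator's list
example = HeuristicViolation(
    evaluator_id=1,
    system="inpatient admission",
    heuristic="Consistency and standards",
    description="Buttons for the same action use different labels on different screens",
)
print(example)
```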

Think aloud

To perform the user testing, a number of scenarios covering most of the users' tasks were first designed in consultation with the end users and the heads of the inpatient admission wards and nursing departments.

Figures 1 and 2 illustrate the six and ten most common scenarios used for evaluating the inpatient admission information system and the nursing information system, respectively.

Fig. 1 Six scenarios comprising 10 tasks in the admission department

Fig. 2 Ten scenarios containing 15 tasks in the ICU

Then, all interactions of the users with the systems, including their speech, gestures, and actions on the screen, were captured using Morae Recorder version 3.3 (TechSmith Corp.) in 35–45 min sessions. Next, eight independent evaluators reviewed all recordings using Morae Manager to identify the problems the users encountered during their interaction with the systems. These evaluators then independently assigned each identified problem a severity score ranging from 0 to 4 [24] (Table 1) based on the three criteria proposed by Nielsen: frequency, impact, and persistence [25]. Finally, all problems were classified according to a combination of the usability attributes proposed by ISO and Nielsen [1, 2]: satisfaction, effectiveness, efficiency, learnability, memorability, and error prevention. The memorability attribute could not be evaluated, since the participants interacted with each system only once, and was therefore removed from the classification.
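The following is a minimal sketch of how the independent 0–4 severity ratings might be combined and each problem tagged with a usability attribute. Aggregation by the arithmetic mean is an assumption made for illustration (the paper reports mean severities but does not spell out the aggregation step), and the example problem and ratings are hypothetical.

```python
from statistics import mean

# Severity scale [24]: 0 = not a problem ... 3 = major problem, 4 = usability catastrophe
ATTRIBUTES = {"satisfaction", "effectiveness", "efficiency", "learnability", "error prevention"}

def aggregate_severity(ratings):
    """Combine the evaluators' independent 0-4 ratings into a single score (mean, by assumption)."""
    assert all(0 <= r <= 4 for r in ratings)
    return mean(ratings)

# Hypothetical problem with ratings from the eight evaluators
problem = {
    "description": "No feedback is shown after saving a medication order",
    "attribute": "effectiveness",          # one of ATTRIBUTES
    "ratings": [3, 4, 3, 3, 4, 3, 3, 4],   # each rating reflects frequency, impact, and persistence
}
problem["severity"] = aggregate_severity(problem["ratings"])
print(problem["severity"])  # 3.375, i.e., around the "Major" level (cf. Table 1)
```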

Table 1 The rating of problems based on their severity

Data analysis and comparisons

Qualitative analysis

Duplicate problems were eliminated in three stages. First, all evaluators met in two sessions to review the individual lists of problems identified by each method in each system (four lists in total) and to remove duplications within each list. Second, the duplicate problems between the two lists produced by each method for the two systems were eliminated in a session, yielding a single list of problems for each method. At this stage, the problems were categorized into the five groups of ISO-Nielsen usability attributes. Finally, the duplicate problems between the lists of the two methods were removed in order to integrate the problems identified by both methods, and the evaluators approved the final list of usability problems in a session.
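In the study itself, duplicates were removed by evaluator consensus in joint sessions. The sketch below only illustrates the set-style merging logic (within each method across systems, then across methods) using a naive normalized-text key; the key-based matching and all example entries are assumptions for illustration.

```python
def normalize(description: str) -> str:
    """Naive key for matching duplicate problem descriptions."""
    return " ".join(description.lower().split())

def merge_unique(*problem_lists):
    """Merge lists of problem descriptions, keeping the first wording of each duplicate."""
    seen, merged = set(), []
    for problems in problem_lists:
        for p in problems:
            key = normalize(p)
            if key not in seen:
                seen.add(key)
                merged.append(p)
    return merged

# Hypothetical per-system lists (in the study these came from the evaluators' session notes)
he_nursing   = ["Inconsistent button labels", "No confirmation before deleting an order"]
he_admission = ["inconsistent button labels", "Required fields are not marked"]
ta_nursing   = ["No search field for laboratory tests"]
ta_admission = ["Required fields are not marked", "Too many steps to admit a patient"]

# Stage 2: one list per method across the two systems
he_unique = merge_unique(he_nursing, he_admission)
ta_unique = merge_unique(ta_nursing, ta_admission)
# Stage 3: integrate both methods into the final combined list
combined = merge_unique(he_unique, ta_unique)
print(len(he_unique), len(ta_unique), len(combined))
```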

Quantitative analysis

Data related to all three methods (i.e., TA, HE, and the combined method) were analyzed using SPSS version 25 (SPSS Inc., Chicago, IL, USA). The Chi-square test [26] was used to compare the total number of problems identified by the TA and HE methods, as well as the number of problems in each usability category between the methods. Finally, the difference between the mean severity scores of the problems identified by TA and HE was evaluated using the Mann-Whitney U test, since the data were not normally distributed.
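For readers who prefer an open-source workflow, the same tests can be run in Python with SciPy (the study used SPSS). This is a minimal sketch of the form of the analyses only: the contingency counts and severity scores below are placeholders, not the study's data, and the exact test configuration in SPSS is not specified in the paper beyond the test names.

```python
import numpy as np
from scipy.stats import chi2_contingency, mannwhitneyu

# Placeholder 2 x 5 table: problems per usability attribute for HE vs. TA
# (columns: satisfaction, effectiveness, efficiency, learnability, error)
counts = np.array([
    [40, 20, 30, 45, 28],   # HE (illustrative values only)
    [15, 30, 45, 20, 17],   # TA (illustrative values only)
])
chi2_attr, p_attr, dof, expected = chi2_contingency(counts)
print(f"Attributes: chi-square = {chi2_attr:.2f}, p = {p_attr:.4f}")

# Placeholder severity scores (0-4) of the problems found by each method;
# the Mann-Whitney U test makes no normality assumption.
he_severity = [3, 3, 4, 2, 3, 4, 3, 3]
ta_severity = [4, 3, 3, 3, 4, 2, 3, 4]
u_stat, p_sev = mannwhitneyu(he_severity, ta_severity, alternative="two-sided")
print(f"Severity: U = {u_stat:.1f}, p = {p_sev:.3f}")
```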

Results

Table 2 shows the number of problems identified by TA, HE, and the combination of the two methods, as well as the number of problems common to both methods, per ISO-Nielsen usability attribute. In total, 423 problems remained after removing duplicate problems. HE identified 268 problems in the nursing information system and 180 problems in the inpatient admission system; eliminating duplicates yielded 163 unique problems detected by HE. TA identified 72 and 88 problems in the nursing and inpatient admission systems, respectively; after eliminating duplicates between these two groups, 127 unique problems remained. Finally, 45 problems were identified identically by both methods.

Table 2 The number of identified problems per method and usability group

Based on the results of the Chi-square test, a significant difference was observed between the numbers of problems identified by the two methods (P ≤ 0.0001). Furthermore, significant differences were found between the numbers of problems identified by the two methods for each usability attribute (P < 0.0001, P = 0.034, P < 0.0001, P < 0.0001, and P < 0.0001 for satisfaction, effectiveness, efficiency, learnability, and error, respectively). Moreover, of the total of 423 problems in the combined method (TA + HE), 39, 36, and 25% were detected by HE, TA, and both methods (TA & HE), respectively (Table 2).

Table 3 presents the mean severity level of the problems identified by each method per the five usability attributes. The mean severity level of the problems detected by each method and by the combined method was at the “Major” level (3.34, 3.25, and 3.26 for TA, HE, and TA + HE, respectively).

Table 3 Mean and Standard Deviation of the severity scores of identified problems per method and usability attribute

Overall, the results of the Mann-Whitney U test indicated no significant difference between the mean severities of the problems identified by the two methods for the effectiveness (P = 0.44), learnability (P = 0.41), and error (P = 0.11) attributes. Likewise, no significant difference was observed between the overall mean severities of the problems detected by the two methods (P = 0.43). However, significant differences were found between the mean severities of the problems identified by the two methods for the satisfaction and efficiency attributes (P = 0.001 and P = 0.01, respectively).

Additionally, Table 4 summarizes some of the most important problems identified by TA and HE. These problems were categorized in terms of the usability attributes.

Table 4 Problems detected by the two methods in terms of usability attributes

Discussion

The results of the present study demonstrated that the numbers of problems identified by the Think Aloud (TA) and Heuristic Evaluation (HE) methods were different. In addition, both methods identified various problems related to each of the five usability attributes. Further, the mean severity of the problems identified by both methods was at the “Major” level, and no significant difference was detected between the mean severities of the problems identified by the two methods; a significant difference was observed only for the satisfaction and efficiency attributes.

Consistent with the results of the studies by Karat [7] and Jeffries [8], in this study HE identified a significantly higher number of problems than TA. Conversely, in two previous studies that compared the effectiveness of TA with Cognitive Walkthrough (CW) [13], and HE with CW [27], no significant difference was found between the numbers of problems identified by the two methods. Contrary to the study by Hasan [28], in which the HE and TA methods identified a higher number of “Minor” and “Major” usability problems, respectively, in the present study the mean severity of the problems identified by both methods was at the “Major” level. Similarly, Khajouei [13] reported no significant difference between the mean severity scores of the problems identified by the two methods. In the current study, a significant difference was detected between the two methods in the number of problems identified for each usability attribute. The TA method identified more problems concerning the effectiveness and efficiency attributes, while the HE method identified more problems related to the satisfaction, learnability, and error attributes. In a previous study [27], HE identified a higher number of problems related to the satisfaction attribute compared to CW. HE identified problems with “Major” and “Catastrophe” severity, such as the inconsistency of buttons, fields, and the color of links; the use of the same icons for different tasks and vice versa; and the system's failure to respond when wrong information was entered, whereas TA fell short of finding these problems. Furthermore, TA identified high-severity problems such as the need to take multiple steps to perform a task, the lack of feedback in response to users' actions, and the absence of a search field; using only HE would miss such important problems.

TA mostly identified interaction problems that users encounter while completing tasks, whereas HE missed these problems. Consistent with the study by Doubleday [29], each of the HE and TA methods identified many distinctive problems that were not identified by the other method. Based on these results, using only one of these methods in the development process of a system cannot guarantee its usability. Therefore, combining the two methods is recommended to identify all types of problems and improve the usability of the system.

The results of this study highlighted that the HE method mostly identifies problems concerning the inappropriate design of user interface components. In line with this finding, a previous study reported that this method often identifies common and general problems in the design of system user interfaces [9]. The TA method, however, identifies problems that hinder users from accomplishing specific tasks due to the lack of necessary features in the system. Examples of these problems are the impossibility of searching for patients on the home page of the inpatient admission system, the lack of functionality to retrieve laboratory tests and medications in the nursing information system, failure to display information needed by clinicians, the lack of system help and a breadcrumb element, and failure to provide feedback in response to users' actions. Given that none of the HE principles cover these problems, the results of HE alone may not fully meet the cognitive needs of users. Considering the limited scope of the problems identified by each method, applying a combination of the two methods (TA + HE) is recommended to effectively evaluate a health information system.

Our work has some limitations. First, evaluating all modules of the Social Security Electronic System was not possible, since this Hospital Information System is large and has multiple modules. To examine as many functionalities of the system as possible, we evaluated one clinical and one administrative module (the nursing information system and the inpatient admission information system). Second, to avoid interfering with the provision of healthcare services to patients and to adhere to patient-safety regulations, the usability tests were performed in a laboratory setting. To simulate the real working environment without threatening patient safety, we used dummy patient information. In addition, the scenarios were designed to cover all real user tasks, including simple, medium, and complex tasks. Finally, since the users could accomplish each task only once, it was not possible to examine potential problems related to the memorability attribute. Nevertheless, users emphasized their need for training to learn how to use the system effortlessly and looked for system help, which indicates potential memorability problems. Future studies can identify memorability problems by conducting the tests at appropriate intervals.

The previous studies [14, 30,31,32,33,34] that compared the effectiveness of one or both of the methods used here recruited fewer evaluators or users than the present study, evaluated only a single system, and did not use statistical analysis to compare the methods. In the present study, significant differences were found between the two methods in the number and type of usability problems. Consistent with the results of previous studies [14, 30, 35, 36], our results support using the two methods in combination as complements to each other. The results of the present study can help decision-makers and information technology managers of hospitals and clinical centers select an appropriate method for evaluating and improving the usability of health information systems, so that the end users of these systems, especially nurses and physicians, can interact with them easily and successfully.

Conclusion

The results demonstrated that the Think Aloud (TA) and Heuristic Evaluation (HE) methods each identify different usability problems. The HE method mostly detected problems related to the satisfaction, learnability, and error prevention attributes, while the TA method mainly identified problems related to the effectiveness and efficiency attributes. Since the problems detected by each method were at a “Major” severity level, using only one of these methods can result in missing important problems that are detectable only by the other. Because a combination of user-based and expert-based methods can identify almost all usability problems, such a combination is recommended for evaluating the usability of healthcare information systems. In the present study, we combined two of the most common user-based and expert-based methods. Since various user-based and expert-based methods exist, future studies can examine the effect of combining other methods, which can provide good insight for selecting the most appropriate methods to evaluate specific systems.