Usability Evaluation of in-Vehicle AR-HUD Interface Applying AHP-GRA

Usability is regarded as a fundamental requirement for in-vehicle HMIs, and usability evaluation reflects the impact of the interface and its acceptance by users. This study introduces a usability evaluation model for the AR-HUD interface that combines the analytic hierarchy process (AHP) with grey relational analysis (GRA). First, based on an improved PSSUQ (Post-Study System Usability Questionnaire), the usability evaluation system was modified and optimized according to the characteristics of AR-HUD. On this basis, the preference weights of the evaluation indexes were calculated by AHP combined with group decision-making. Finally, the usability criteria were integrated into a grey relational degree by applying GRA to identify the optimal design. A case study was conducted to demonstrate the applicability of the developed model to the usability evaluation of AR-HUD interface design. Based on existing AR-HUD interface designs, 7 dimensions of design elements (A-G) were identified and 18 interface prototypes (S1-S18) were selected by Taguchi orthogonal array testing (TOAT). The results indicated that the grey relational degree of S5 was 0.923, making it the optimal sample; the results were also compared with entropy-TOPSIS to verify the feasibility of the proposed method. The grey-based AHP evaluation model can be used to evaluate the usability level of the AR-HUD interface effectively, which may help designers gain insights for the design process and for sample decision-making.


Introduction
According to data released by the Ministry of Public Security of China in 2016, the main cause of traffic accidents is the driver's inattention and distraction while driving [1], which leads to a large number of traffic crashes and vehicle collisions [2]. With the increasing use of smartphones, central control displays and navigation systems, distraction has become even more severe [3]. Mimura [4] found that if the driver's sight deviates from the driveway for more than 2 s, the risk of traffic accidents increases greatly. All-over displays are an important symbol of the digitalization of the future automobile [5]. In particular, 80% of driving information is visually perceived [6,7]. Therefore, visual interface design is of great significance to the usability of the vehicle human-machine interface (HMI).
Accordingly, providing essential information during driving while keeping the driver's field of view (FOV) on the road can be an effective way to reduce the risk of traffic accidents [8]. In recent years, first-tier automobile enterprises have begun to implement AR-HUD (Augmented Reality Head-Up Display), i.e. a "virtual display" that drivers can watch with their heads up, which projects real-time driving information such as speed, navigation and warning signals. The driver can obtain driving information while looking ahead, which could reduce the risk of traffic accidents caused by visual distraction [9]. The optical breakthrough of AR-HUD provides a more natural image processing method. This technology has also become the main development trend of automobile HMI [10]. With the whole windshield as the display medium, the displayed image and other information are fused with the real environment [11]. The AR-HUD system can not only reduce drivers' cognitive load, but also mitigate the danger of accidents caused by line-of-sight deviation [12]. AR graphics can increase drivers' forward situational awareness and cognition while minimizing distraction [13][14][15]. The design of AR-HUD-assisted driving systems should take the risk of inattention into account and propose countermeasures accordingly [16].
Research on the usability of the AR-HUD interface can be roughly divided into two categories. One studies how to improve the usability of the interface through the design of interface elements; the other comprehensively assesses the usability level of the interface and summarizes the findings to assist design decision-making. Focusing on elderly drivers, Alexandra [17] conducted a driving experiment to evaluate in-vehicle HMI usability and proposed a design strategy for promoting HMI usability and acceptance. Toffetti [18] performed a user test to assess the usability of an HMI prototype, and the findings were used during the subsequent human-centric design phase. Park [19] analyzed the AR-HUD interface through different methods and proposed design principles of interface usability for reference. Li [20] measured the impacts of three types of HUDs on experienced and inexperienced drivers and assessed their driving performance. On this basis, an optimized design strategy for the HUD interface was put forward. Summative research mainly focuses on mobile applications, website design or product design, while work on in-vehicle HMI design is relatively rare. The usability of mobile app design and product design is very different from that of in-vehicle HMI design. This study explores the usability evaluation method of in-vehicle AR-HUD from the perspective of summative research.
Grey relational analysis (GRA) is a quantitative multi-criteria decision-making (MCDM) method for analyzing the relationship of discrete arrays [21]. This method is well suited to MCDM systems characterized by "small sample, poor information, clear extension and unclear connotation", and thus fits the features of interface usability. In addition, since the various components of usability are not equally weighted, it is also necessary to consider the user's preferences for different evaluation factors when analyzing the evaluation information. The Analytic Hierarchy Process (AHP) is regarded as an effective method that can convert users' subjective attitudes towards different evaluation factors into quantitative weights [22]. At present, few studies have integrated GRA and AHP to evaluate the usability of in-vehicle AR-HUD interface design.
This study firstly proposed an evaluation model of vehicle AR-HUD interface usability by applying grey-based AHP. To begin with, AHP was used to obtain the preference weights of users for different evaluation indexes. Then, the performance of all evaluation indexes was integrated into grey relational degree by GRA method, on the basis of which the design optimization was made. A prototype interface was taken as an example for case study and method verification.

Usability
Usability is an important quality index for assessing HMI, and many research results have been accumulated in interaction design, experience design and other related fields [23]. Simply put, usability means meeting the users' needs while providing maximum convenience and minimum errors. The ability to perform functional tasks is the core of usability. Usability evaluation is an important part of the overall user interface design process, which consists of iterative cycles of designing, prototyping, and evaluating [24][25]. A wide range of usability evaluation techniques have been proposed, and a subset of these is currently in common use [26]. For example, Thacker and Tullis [27] extracted four indicators for static digital interface evaluation, namely interface information density, object clustering degree, layout complexity and user extraction time, and then determined a prediction model of user search times and interface satisfaction. Streveler and Wasserman [28] also proposed a quantitative evaluation system for interface layout to determine the indicators and algorithmic accuracy of expert users' evaluation of screen layout.
The PSSUQ (Post-Study System Usability Questionnaire) is a 19-item instrument designed for assessing users' perceived satisfaction with their HCI systems [29]. The PSSUQ has been used by Walch et al. [31] to evaluate in-vehicle HMI. The present study thus examined the applicability of this questionnaire to the evaluation of in-vehicle HMIs. Based on the PSSUQ, this study established an evaluation system suitable for automotive AR-HUD interface usability, and then applied AHP and GRA to design evaluation.

AHP
AHP, proposed by Saaty, is a multi-objective decision analysis method which combines qualitative analysis and quantitative analysis [31]. Its main idea is to decompose a complex system into several levels and elements, compare the importance of the elements in pairs, establish the judgment matrix, and obtain the weights of the different factors by calculating the maximum eigenvalue and the corresponding eigenvector of the judgment matrix. AHP is an effective analysis method that can transform users' qualitative attitudes into quantitative weights, and is thus widely used in MCDM problems [32]. The speciality of AHP lies in its flexibility to be integrated with different techniques like Linear Programming, Quality Function Deployment, Fuzzy Logic, etc. This enables the user to extract benefits from all the combined methods, and as a result, achieve the desired goal in a better way [33].
In the comparative judgment process of AHP, there may be some limitations in individual experience cognition of interface usability, which affects the objectivity of evaluation results. Group decision-making method can effectively compensate for the shortcomings of individual user evaluation process by integrating cognitive attitudes from different users [34]. In this study, the idea of group decision was integrated into AHP to analyze the preference weight of vehicle AR-HUD interface usability.

Grey Relational Analysis
Grey system theory mainly solves problems with incomplete information, uncertain behavior modes and unclear operation mechanisms [35]. GRA in grey system theory transforms the discrete behavior of system factors into piecewise continuous broken lines through linear interpolation, constructs the sample relational degree according to the geometric characteristics of the broken lines, and then evaluates the overall level of the system according to the relational degree [36][37]. At present, GRA has been successfully applied to many research fields such as manufacturing processes [38] and decision-making [39]. In this study, GRA was used as the basis for the usability evaluation of the in-vehicle AR-HUD interface. The main reasons are as follows.
(1) Grey system theory is ideally suited for the problems of clear external information and unclear internal information [40]. Usability system is a typical grey system. This study attempts to explore the usability evaluation of vehicle AR-HUD interface with uncertainty through known subjective evaluation and objective performance. (2) Evaluating interface usability through GRA usually does not require a large sample size, and designers can rely on only a small and discrete amount of evaluation information to evaluate usability and obtain important design clues. (3) By using GRA, designers can infer the level of grey relation of samples based on the accessible information and understand the relative degree of relationship between samples, which can effectively compensate for the lack of subjective judgment.

Proposed Method
In this study, a usability evaluation model of the AR-HUD interface was proposed based on AHP and GRA, and its research architecture is shown in Fig. 1. Firstly, the usability evaluation system was constructed. Then, the weights of the indicators in the evaluation system were analyzed by AHP integrated with group decision-making. On this basis, according to the weights of the evaluation indicators, the data of each indicator in the usability evaluation test were integrated into the grey relational degree through GRA. According to the grey relational degree, the samples were prioritized to obtain the optimal sample prototype.

Construction of Vehicle AR-HUD Usability Evaluation System
In this study, the evaluation system of interface usability was constructed based on the PSSUQ usability evaluation model. Since the PSSUQ is mainly aimed at two-dimensional display interfaces such as mobile apps and website pages, whereas the automotive AR-HUD interface is three-dimensional and interactive, visual interference and visual fatigue were added [41] so that usability evaluation indexes specific to AR-HUD, such as occlusion of the real environment, could be studied comprehensively; meanwhile, the indexes in the original evaluation system were revised to fit the design evaluation of the interface.

Weight Analysis of Evaluation Index
The weight analysis of the evaluation index includes five steps [42]: designing the questionnaire and survey, establishing the judgment matrix, calculating the eigenvalues and their eigenvectors, consistency checking and the weight calculation of the index fused with group decision.
(1) The scales in the questionnaire consist of the following five classes: equally important, slightly important, quite important, extremely important, and absolutely important. The five classes correspond to the evaluation values 1, 3, 5, 7, and 9, while 2, 4, 6, and 8 are the intermediate values between adjacent classes; similarly, a comparison in the opposite direction (less important) is also divided into 9 scales, which are assigned the reciprocal values from 1/9 to 1.
(2) According to the evaluation results, the judgment matrix of n evaluation indicators can be obtained, as shown in Eq. (1):

$$A = (a_{ij})_{n \times n} = \begin{bmatrix} 1 & a_{12} & \cdots & a_{1n} \\ a_{21} & 1 & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & 1 \end{bmatrix} \quad (1)$$

where $C_1, C_2, \cdots, C_n$ are the usability evaluation indexes of the vehicle AR-HUD interface, which label the rows and columns of the matrix; $a_{ij}$ represents the quantitative judgment of the relative importance of $C_i$ compared with $C_j$; each diagonal element is a comparison of an indicator with itself and is therefore equal to 1; and $a_{ij} = 1/a_{ji}$, $a_{ij} > 0$.
(3) On the basis of the judgment matrix, the weight vector $W = (W_i)_{1 \times n}$ and the corresponding maximum eigenvalue $\lambda_{\max}$ are calculated by using the normalized mean value of the row vectors, as given in Eq. (2).
(4) The consistency check uses the consistency index (C.I.) and the consistency ratio (C.R.); R.I. denotes the random index (the R.I. value for the corresponding matrix order can be found by looking up the table [43]). When the ratio of C.I. to R.I. is less than 0.10, the consistency meets the requirement [44], as shown in Eq. (3) and Eq. (4):

$$C.I. = \frac{\lambda_{\max} - n}{n - 1} \quad (3)$$

$$C.R. = \frac{C.I.}{R.I.} < 0.10 \quad (4)$$

(5) All index weights ($W_i^k$) that pass the consistency check are grouped and integrated to obtain the final weight coefficient of each index ($W_i^G$). The aggregation of individual priorities (AIP) method is used, which integrates the weights by taking the geometric mean of the weight vectors across decision makers [45][46], as shown in Eq. (5) and Eq. (6).
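The five steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: Eq. (2) is read here as "normalize each column, then average each row", the AIP step assumes equally important decision makers (a plain normalized geometric mean), and the function names and the two example matrices are hypothetical.

```python
import numpy as np

# Saaty's random index (R.I.) by matrix order n
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(judgment):
    """Return (weights, lambda_max, C.R.) for one reciprocal judgment matrix."""
    a = np.asarray(judgment, dtype=float)
    n = a.shape[0]
    # Assumed reading of Eq. (2): normalize each column, then average each row.
    w = (a / a.sum(axis=0)).mean(axis=1)
    # Estimate the largest eigenvalue from A w = lambda w.
    lambda_max = float(np.mean((a @ w) / w))
    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri > 0 else 0.0
    return w, lambda_max, cr


def aggregate_individual_priorities(weight_vectors):
    """AIP with equal decision-maker importance: normalized geometric mean."""
    w = np.asarray(weight_vectors, dtype=float)
    geo_mean = np.exp(np.log(w).mean(axis=0))
    return geo_mean / geo_mean.sum()


# Hypothetical judgments from two subjects on three indexes.
subject_matrices = [
    [[1, 3, 5], [1 / 3, 1, 3], [1 / 5, 1 / 3, 1]],
    [[1, 2, 4], [1 / 2, 1, 2], [1 / 4, 1 / 2, 1]],
]
results = [ahp_weights(m) for m in subject_matrices]
passed = [w for w, _, cr in results if cr < 0.10]  # keep consistent subjects only
group_weights = aggregate_individual_priorities(passed)
print(group_weights)
```

Filtering on C.R. before aggregation mirrors step (5), where only weight vectors that pass the consistency check enter the group result.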

Generation of Test Samples
Based on the main driving task, the functional requirements of vehicle AR-HUD in the use scenarios were analyzed by the focus group, and the corresponding design parameters were summarized. Referring to the relevant literature and practical design specifications [47][48][49][50], the AR-HUD interface design parameters table was obtained. Finally, the Orthogonal Array (OA) design based on Taguchi Orthogonal Array Testing (TOAT) method was used to determine the representative combination from the type combination of all design parameters. With reference to the mainstream products in the current market, the typical samples were selected as the test sample.

Usability Evaluation Test
The driver's subjective perception of the human-machine interface of the AR-HUD system is an important part of evaluating the AR-HUD auxiliary driving system. Drivers are invited to conduct interface usability evaluation experiments, and usability is measured according to the constructed evaluation system. The subjective evaluation items are measured on a 7-level Likert scale [51] (1 representing "strongly disagree", and 7 "strongly agree").

GRA of Test Results
GRA of the experimental results includes five steps [35]:
(1) Define the sample comparison sequence. The results of the interface usability evaluation test are defined as a comparison sequence $X_i$ according to the sample number, as shown in Eq. (7):

$$X_i = (X_i(1), X_i(2), \cdots, X_i(n)) \quad (7)$$

where i denotes the number of the sample, and $X_i(k)$ represents the evaluation value of the i-th sample under the k-th secondary index.
(2) Conduct dimensionless processing of the comparison sequences. Since the dimensions of the interface usability indexes may differ, dimensionless processing is necessary, i.e., the sequences are rescaled column by column by division so that the scores of all sequences lie in the interval [0,1], as shown in Eq. (8).
(3) Define the sample reference sequence. According to the features of each evaluation index, the ideal value is found among the comparison sequences to form a new sequence, which is defined as the reference sequence $X_0$.
(4) Calculate the grey relational coefficient $\xi_i(k)$ with respect to $X_0$. The grey relational coefficient reflects the degree of relation between the dimensionless values of each design sample and the optimal values, and is calculated by Eq. (9), where $\rho$ is the resolution coefficient, and usually $\rho = 0.5$.
(5) Calculate the grey relational degree of each sample. In order to compare the design samples as a whole, the grey relational coefficients are integrated into a single value, i.e. the grey relational degree $r_i$, as shown in Eq. (10).
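The GRA steps can be illustrated with the sketch below. It is a hedged reading rather than the authors' exact formulas: step (2) is assumed to divide each column by its maximum, the coefficient uses the widely used Deng form with resolution coefficient rho, and the degree is assumed to be the weighted sum of coefficients using the index weights from AHP. The function name, the `is_benefit` flags and the example scores are hypothetical.

```python
import numpy as np


def grey_relational_analysis(scores, weights, is_benefit, rho=0.5):
    """Return (coefficients, degrees) for an m-samples x n-indexes score table."""
    x = np.asarray(scores, dtype=float)
    w = np.asarray(weights, dtype=float)
    is_benefit = np.asarray(is_benefit, dtype=bool)

    # Step (2), assumed form: divide every column by its maximum.
    x_norm = x / x.max(axis=0)
    # Step (3): ideal value per index (largest if higher is better, else smallest).
    x0 = np.where(is_benefit, x_norm.max(axis=0), x_norm.min(axis=0))
    # Step (4): absolute gaps to the reference, then the grey relational coefficient.
    delta = np.abs(x_norm - x0)
    d_min, d_max = delta.min(), delta.max()
    if d_max == 0:  # all samples identical: every coefficient is 1
        xi = np.ones_like(delta)
    else:
        xi = (d_min + rho * d_max) / (delta + rho * d_max)
    # Step (5): weighted sum of coefficients (weights assumed to sum to 1).
    degrees = xi @ w
    return xi, degrees


# Hypothetical mean scores of three samples on two indexes (higher is better).
example_scores = [[6.0, 6.0], [3.0, 6.0], [3.0, 3.0]]
xi, r = grey_relational_analysis(example_scores, [0.5, 0.5], [True, True])
ranking = np.argsort(-r)  # sample positions from best to worst
print(r, ranking)
```

For an index where a lower score is better (for example an interference item), the corresponding `is_benefit` flag would be set to `False` so that the column minimum becomes the reference value.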

Determination of Optimal Sample
To obtain the overall optimal design sample, the grey relational degrees of all samples are calculated by Eq. (10). The samples are then ranked by grey relational degree. If $r_i \geq r_j$, sample $X_i$ is superior to sample $X_j$ [52]. The optimal sample can be determined in accordance with this logic. The design sample with the maximum grey relational degree is the optimal one among all tested samples, as it has the highest main effect on usability evaluation [53].

Case Study
According to the policies issued by the State Council in 2021, it is necessary to promote the application of intelligent Internet-connected vehicles (intelligent vehicles, autonomous driving and vehicle-road coordination) [54]. With the support of macro policies, more and more automobile enterprises have invested in the research and development of vehicle intelligent technology, such as intelligent voice interaction and AR-HUD. Based on practical work, this study takes the AR-HUD interface as a case study.

Construction of Vehicle AR-HUD Usability Evaluation System
Based on the usability evaluation model proposed [31][41], and combined with the driver's physiological and psychological demands and the characteristics of the automotive AR-HUD system, the usability evaluation system of the vehicle AR-HUD interface was constructed from the perspective of effectiveness, interference and reliability. The system contains a total of 13 evaluation indicators, as shown in Fig. 2. The reliability analysis of the above evaluation system shows that its Cronbach's α = 0.928, which means the reliability meets the requirement [55]. Therefore, the evaluation system is reliable and can be used for the usability evaluation of AR-HUD interface design.
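For readers who want to reproduce the reliability check, Cronbach's α can be computed from a respondents-by-items score table as sketched below. The function name and the toy data are hypothetical and are not the study's data; the formula is the standard one, α = k/(k-1) · (1 - Σ item variances / variance of total score).

```python
import numpy as np


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a table with one row per respondent, one column per item."""
    x = np.asarray(item_scores, dtype=float)
    k = x.shape[1]                               # number of items
    item_var_sum = x.var(axis=0, ddof=1).sum()   # sum of per-item sample variances
    total_var = x.sum(axis=1).var(ddof=1)        # variance of respondents' total scores
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical 7-point Likert answers of four respondents on three items.
toy_answers = [[6, 7, 6], [4, 4, 5], [2, 3, 2], [5, 5, 6]]
print(cronbach_alpha(toy_answers))
```

Values above roughly 0.7 are commonly treated as acceptable, so a reported α of 0.928 would sit well above that rule of thumb.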

Weight Analysis of Usability Evaluation Metrics
A total of 30 subjects were invited, including 15 males and 15 females (Mean = 29.4, SD = 4.07). All the subjects were in good health, with normal vision (or corrected) and no reported achromatopsia or tritanopia. First, the subjects were asked to compare the evaluation indicators at the same level in pairs to obtain the judgment matrix of the evaluation indicators; then, according to Eq. (2), the weight vector and its eigenvalues of the evaluation index could be obtained; on this basis, the eigenvalues were substituted into Eq. (3) and Eq. (4) for consistency check, as shown in Table 1; next, according to Eq. (5) and Eq. (6), the weights that passed the consistency check were grouped and integrated, and the relative weights of the evaluation indicators were obtained; and finally, according to the hierarchical structure, the relative weight value of the evaluation index was multiplied by the relative weight value of the upper dimension to obtain the absolute weight of the evaluation index. The results are shown in Table 2.
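The last step, turning relative weights into absolute weights through the hierarchy, amounts to one multiplication per index. The sketch below uses made-up dimension and index names and made-up weights purely for illustration; the actual values are those of Table 2.

```python
def absolute_weights(dimension_weights, local_weights):
    """Multiply each index's relative weight by the weight of its parent dimension."""
    result = {}
    for dimension, dim_weight in dimension_weights.items():
        for index, local_weight in local_weights[dimension].items():
            result[index] = dim_weight * local_weight
    return result


# Hypothetical two-level hierarchy (not the values reported in Table 2).
dimension_weights = {"A": 0.5, "B": 0.5}
local_weights = {"A": {"A1": 0.25, "A2": 0.75}, "B": {"B1": 1.0}}
weights = absolute_weights(dimension_weights, local_weights)
print(weights)
```

If the weights at every level sum to 1, the absolute weights of all indexes also sum to 1, which is a convenient sanity check.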

Generation of AR-HUD Interface Test Samples
Combined with the constructed evaluation system and related literature [23,26,29], the application scenarios and usage scenarios of the vehicle AR-HUD were analyzed through the focus group method and causal analysis, and a total of 7 design parameters (A-G) were summarized, each containing 3 levels, as shown in Table 3.
The full factorial of this combination would have required 2187 ($3^7$) samples. Taguchi's orthogonal array testing (TOAT) ensures that the testing scenarios provide good statistical information with a minimum number of scenarios in the uncertain operating space, which significantly reduces the testing burden [56]. TOAT has been proven to be able to select optimally representative scenarios for testing from all possible combinations [57]. By using the TOAT method, 18 representative design samples were selected from all combinations through the $L_{18}(3^7)$ orthogonal table, as shown in Table 4.
According to the above orthogonal table, 18 different design samples were obtained, as shown in Fig. 3.
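The reduction from 2187 combinations to 18 rests on the balance property of an orthogonal array: for any two columns, every pair of levels occurs equally often. The sketch below counts the full factorial and checks that property. Because the paper's $L_{18}(3^7)$ table is not reproduced in the text, the check is demonstrated on the smaller standard $L_9(3^4)$ array instead; the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations, product


def full_factorial_size(levels, factors):
    """Count all level combinations by explicit enumeration."""
    return sum(1 for _ in product(range(levels), repeat=factors))


def is_orthogonal(array, levels):
    """True if every pair of columns contains each level pair equally often."""
    rows = [tuple(row) for row in array]
    expected = len(rows) / levels ** 2
    for c1, c2 in combinations(range(len(rows[0])), 2):
        counts = Counter((row[c1], row[c2]) for row in rows)
        if len(counts) != levels ** 2:
            return False
        if any(count != expected for count in counts.values()):
            return False
    return True


# Standard L9(3^4) array, used here only as a small stand-in for L18(3^7).
L9 = [
    (1, 1, 1, 1), (1, 2, 2, 2), (1, 3, 3, 3),
    (2, 1, 2, 3), (2, 2, 3, 1), (2, 3, 1, 2),
    (3, 1, 3, 2), (3, 2, 1, 3), (3, 3, 2, 1),
]
print(full_factorial_size(3, 7), is_orthogonal(L9, 3))
```

Applied to an 18-row, three-level array, the same check would expect each level pair to appear twice in every pair of columns.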

AR-HUD Interface Usability Evaluation Test
A total of 30 subjects were invited, including 15 males and 15 females (Mean = 29.4, SD = 4.07), and all subjects had more than 2 years of driving experience. The subjects were asked to evaluate the 18 samples in random order to balance the impact of the practice effect in the experiment [58]. During the test, the subjects were asked to evaluate these samples in the order of Validity, Intrusion and Reliability. Then the subjects evaluated the usability of information quality and interface quality according to the degree of task completion. The results of the AR-HUD interface usability evaluation test are shown in Table 5.

GRA of Experimental Results
The experimental data of the 18 AR-HUD interface design samples were converted into comparison sequences by Eq. (7), and the comparison sequences were nondimensionalized according to Eq. (8) to obtain $X_1 \sim X_{18}$, as shown in Table 6. Then, according to the properties of the evaluation indexes, the most ideal value was selected from all the dimensionless series to form a reference sequence $X_0$. Finally, according to Eq. (9), the grey relational coefficients ($\xi_1 \sim \xi_{18}$) of the 18 interface design samples were calculated respectively, as shown in Table 7. Based on Eq. (10), the grey relational coefficients of each sample were converted to a single value, and the calculation results are shown in Table 8.

Optimization of Vehicle AR-HUD Interface Design
The GRA results in Table 9 show that sample 5 has the highest grey relational degree (0.923), representing the best interface usability, followed by sample 8 (0.810) and sample 9 (0.789). The worst interface usability among all the samples was that of sample 14 (0.477), followed by sample 3 (0.506) and sample 7 (0.509). As a result, sample 5 is the optimal design. Since the usability evaluation of the in-vehicle HMI differs from that of general VDTs (visual display terminals), its evaluation criteria are mainly reflected in the visual display effect and the perception of information content [59][60][61]. Therefore, the correlation between the grey relational coefficient and the evaluation factors related to information display (A1 ~ A8, B1 ~ B3) was analyzed, and the Pearson test results are shown in Table 10. The grey relational coefficient showed a significant positive correlation with A1, A2, A4, A5, A6 and A8, a significant negative correlation with B1, B2 and B3, and no significant correlation with A3 and A7.
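The correlation analysis can be reproduced with a plain Pearson coefficient, as in the sketch below. The data are invented for illustration and are not taken from Table 10, and the significance test that accompanies the coefficient is not shown here.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()  # centre both series
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))


# Hypothetical per-sample values: overall grey relational result vs. two factors.
overall = [0.92, 0.81, 0.79, 0.60, 0.48]
readability_like = [0.95, 0.85, 0.70, 0.65, 0.40]  # rises with the overall result
occlusion_like = [0.35, 0.50, 0.55, 0.70, 0.90]    # falls as the overall result rises
print(pearson_r(overall, readability_like), pearson_r(overall, occlusion_like))
```

A coefficient near +1 corresponds to the positive relation reported for factors such as A1, and a coefficient near -1 to the negative relation reported for B1 to B3.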

Result Interpretation
The grey relational coefficient of readability (A1) is in the order of S5 > S8 > S9 > S18 > S11, which is basically consistent with the overall grey relational ranking of the samples. Also, the Pearson analysis showed a correlation coefficient of 0.891 for readability, indicating that readability has the strongest association with interface usability. The grey relational coefficient of occlusion of the real environment (B1) is ranked as S18 > S9 > S8 > S11 > S5, which is the opposite of the order of user evaluation, as B1 showed a negative correlation with the grey relational coefficients. Combined with the samples, it can be seen that when the form of the speedometer is digit or digit & round, the interface usability is relatively high because drivers can read the specific number directly without any further cognitive processing [63]. However, because the round speedometer either does not display the number directly or displays it too small, more effort is required to identify the speed. Accordingly, its usability is poor, and it should be avoided in practical design.
Overall ranking of the samples by grey relational degree: S5 > S8 > S9 > S11 > S18 > S1 > S15 > S13 > S16 > S2 > S17 > S12 > S6 > S10 > S4 > S7 > S3 > S14.
In addition, by summarizing the oral descriptions given by the subjects after the experiment, several clues were found that might be useful for design. For example, for navigation icons, most participants indicated that arrow-shaped navigation icons would, under certain circumstances, overlap with the pointing arrows on the road or be too similar to them, resulting in cognitive confusion and thus affecting the main task of driving. This is similar to the conclusion drawn in the previous literature [64], so participants were more inclined to the boomerang-type navigation icon. When the speed font is sans serif, the visual clarity and readability are the best [65]. Although the LED font has a strong sense of fashion and artistic design, it does not match the speed signs commonly seen in daily driving and requires additional cognitive processing in reading.

Methods Validation
To verify the validity of the grey-based AHP evaluation model constructed in this study for sample optimization and usability evaluation, the proposed method was compared with another decision-making method, i.e. entropy-TOPSIS [66]. Since, as stated in Sect. 5.1, the purpose of this study is to identify reference-worthy samples, the top 5 samples (S5, S8, S9, S11, S18) were selected for method validation. The evaluation process of the entropy-TOPSIS method is shown in Fig. 5, and the evaluation values and ranking results obtained are shown in Table 11.
It can be seen that although the two methods yield different evaluation orders, the selected optimal sample remains the same. This also verifies the feasibility of the decision model constructed in this study for the usability evaluation and optimization of AR-HUD interface design. However, the grey-based AHP-GRA evaluation model is more credible than the entropy-TOPSIS method in determining the index weights; it also has the advantages of lower data requirements, less workload, and being less affected when facing the problem of evaluating and deciding among multiple samples [37,39,45], advantages that the entropy-TOPSIS evaluation method does not offer.
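For comparison, a textbook form of entropy-TOPSIS is sketched below. The paper's own procedure is given in Fig. 5 and is not reproduced in the text, so the details here (proportion-based entropy weights, vector normalization, Euclidean distances) are assumptions, and the function names and scores are hypothetical.

```python
import numpy as np


def entropy_weights(scores):
    """Objective index weights from the information entropy of each column."""
    x = np.asarray(scores, dtype=float)   # assumes positive scores
    m = x.shape[0]
    p = x / x.sum(axis=0)                 # share of each sample per index
    safe_p = np.where(p > 0, p, 1.0)      # log(1) = 0 handles zero shares
    entropy = -(p * np.log(safe_p)).sum(axis=0) / np.log(m)
    diversity = 1.0 - entropy             # assumes at least one index varies
    return diversity / diversity.sum()


def topsis_closeness(scores, weights, is_benefit):
    """Relative closeness of each sample to the ideal solution (1 is best)."""
    x = np.asarray(scores, dtype=float)
    is_benefit = np.asarray(is_benefit, dtype=bool)
    v = x / np.sqrt((x ** 2).sum(axis=0)) * np.asarray(weights, dtype=float)
    best = np.where(is_benefit, v.max(axis=0), v.min(axis=0))
    worst = np.where(is_benefit, v.min(axis=0), v.max(axis=0))
    d_best = np.sqrt(((v - best) ** 2).sum(axis=1))
    d_worst = np.sqrt(((v - worst) ** 2).sum(axis=1))
    return d_worst / (d_best + d_worst)


# Hypothetical scores of three samples on two benefit-type indexes.
demo_scores = [[5.0, 5.0], [3.0, 4.0], [1.0, 1.0]]
demo_weights = entropy_weights(demo_scores)
closeness = topsis_closeness(demo_scores, demo_weights, [True, True])
print(demo_weights, closeness)
```

Unlike AHP, the entropy weights are derived only from the spread of the scores, which is one reason the two methods can rank mid-field samples differently while still agreeing on the best one.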

Limitations
Given that the purpose of this study is mainly to present a new methodology, some aspects of the case study may appear incomplete or neglected. For example, 13 usability evaluation indicators were used; yet some specific characteristics of the in-vehicle AR-HUD interface, such as safety and information intensity, were not covered. This may be improved in future studies. Although an in-vehicle AR-HUD interface was used as an example, the proposed method is applicable to other HMIs as well. However, it is necessary to re-analyze the design parameters of the HMI based on the specific characteristics of the interface, and to revise the usability evaluation system.

Conclusion
Usability is an important issue of user experience in the automobile industry. To evaluate interface usability, a grey-based AHP-GRA evaluation model was constructed in this study by combining AHP and GRA to compensate for the deficiencies of a single evaluation method, and the applicability of the AHP-GRA evaluation model was verified through the analysis of eighteen selected design samples of in-vehicle AR-HUD interface prototypes. The following conclusions are drawn in this study: (1) The AHP-GRA hybrid evaluation model can compensate for the lack of accuracy and objectivity of a single evaluation model, and it can evaluate interface usability in a more scientific and reasonable way.
(2) The AHP-GRA evaluation model is adopted for usability evaluation, which helps designers find the optimal design prototype in the candidate samples, and obtain valuable design clues.
(3) With minor modifications on the evaluation system, the proposed method can be used to compare the usability of various HMIs.

Author Contributions
C made a significant contribution to the paper writing and data analysis, Z and Y assisted in paper editing and methodology, and T provided interface design support.

Conflict of Interest
The authors declare that they have no competing interests.

Ethics Approval
Not applicable.

Consent for Publication
Not applicable.

Open Access
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.