Does visual saliency affect decision-making?

In the present study, we explore potential effects of visual saliency on decision quality in the context of multi-criteria decision-making (MCDM). We compare two visualization techniques: parallel coordinates (PC) and scatterplot matrices (SPM). We investigate the impact of saliency facilitated by means of either color or size. Saliency and visualization technique were factors in our analysis, and effects were evaluated in terms of decision quality, attention, time on task, and confidence. Results show that the quality of choice and attention were comparable for all saliency conditions when SPM was used. For PC, we found a positive effect of color saliency both on the quality of choice and on attention. Different forms of saliency led to varying times on task in both PC and SPM; however, those variations were not significant. A comparison of PC and SPM shows that users spent less time on the task, obtained better decision quality, and were more confident in their decisions when using PC. To summarize, our findings suggest that saliency can increase attention and decision quality in MCDM for certain visualization techniques and forms of saliency. Another contribution of this work is a novel method for eliciting users' preferences; its potential benefits are discussed at the end of the paper.


Introduction
A number of studies (e.g., Jarvenpaa 1990; Glaze et al. 1992; Lohse 1997; Speier 2006; Lurie and Mason 2007) have shown that more vividly presented information is likely to be acquired and processed before less vividly presented information. Increasing the use of salient information may come at the expense of ignoring other relevant information (Glaze et al. 1992), which may have significant implications in the context of decision-making. As far as we know, though, there are no previous studies where the influence of visual saliency has been evaluated for its impact on performance, i.e., the quality of choice in multi-criteria decision-making (MCDM). Indeed, this is true not only for visual saliency, but for the impact of almost any aspect of visualization on MCDM. One of the few exceptions is the study by Dimara et al. (2018), where the authors attempt to evaluate three different visualization techniques (scatterplot matrix, parallel coordinates, and tabular visualization) for their ability to support decision-making tasks. They use a novel approach, defining the quality of decisions as the consistency between the choice made and the self-reported preferences for criteria. The authors observed no indication of differences between the visualization techniques. This, at least in part, may be due to the shortcomings of the method they used to elicit participants' preferences.

Objectives and research questions
The main goal of our study is to investigate potential effects of visual saliency on multi-criteria decision-making. Our first objective was to evaluate the effects of saliency on the outcome of a decision process, i.e., on the quality of decisions. The second objective was to evaluate in what way visual saliency may affect users' attention during the decision process. These objectives are achieved by answering the following research questions:
1. How do the introduced saliency modes (no saliency, color saliency, size saliency) compare with regard to quality of decisions?
2. How do the introduced saliency modes compare with regard to users' attention to the most preferred criterion?
3. How do the introduced saliency modes compare with regard to time spent on decision tasks?
4. How do the introduced saliency modes compare with regard to users' confidence in decisions?
To our knowledge, there are no previous studies on the impact of visual saliency on decision-making. In that respect, our study makes an important contribution to the research concerned with the role of visualization in the context of multi-criteria decision-making. Furthermore, we suggest an alternative method for elicitation of users' preferences, which we believe improves the reliability of the presumably accurate ranking of alternatives. We use the same approach as suggested in Dimara et al. (2018) to obtain an indicative measure of the quality of decisions. However, we use a different method, SWING weighting, to assess participants' preferences for criteria. In SWING weighting, preferences for criteria are obtained by considering the ranges of values in the criteria, instead of rating the importance of criteria without considering the values of actual alternatives.

Theoretical background
The terms necessary for understanding the concept of multi-criteria decision-making and decision tasks are explained in Sect. 2.1. In Sect. 2.2 we give some examples of how visualization is used in today's decision support systems, and in Sect. 2.3 we address relevant issues regarding the evaluation of visual decision support tools. We explain the concept of visual saliency and give a brief overview of studies concerning the impact of saliency on decision-making in Sect. 2.4.

Multi-criteria decision-making
The central task of multi-criteria decision-making, sometimes referred to as multi-criteria decision analysis (MCDA), is evaluating a set of alternatives in terms of a number of conflicting criteria (Zavadskas et al. 2014). Keeney and Raiffa (1993) define MCDA as "... a methodology for appraising alternatives on individual, often conflicting criteria, and combining them into an overall appraisal.", and summarize the paradigm of decision analysis in a five-step process:
1. Preanalysis. Identify the problem and the viable action alternatives.
2. Structural analysis. Create a decision tree to structure the qualitative anatomy of the problem: what are the choices, how they differ, what experiments can be performed, what can be learned.
3. Uncertainty analysis. Assign probabilities to the branches emanating from chance nodes.
4. Utility analysis. Assign utility values to consequences associated with paths through the tree.
5. Optimization analysis. Calculate the optimal strategy, i.e., the strategy that maximizes expected utility.
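As an illustration of steps 3 to 5, the following minimal Python sketch computes expected utilities for a toy decision tree and picks the action with the highest value. The actions, probabilities, and utilities are hypothetical and chosen only for illustration; they are not taken from Keeney and Raiffa (1993) or from our experiment.

```python
from typing import Dict, List, Tuple

# Each action leads to one chance node, given as (probability, utility) branches.
# All names and numbers below are hypothetical.
DecisionTree = Dict[str, List[Tuple[float, float]]]


def expected_utility(branches: List[Tuple[float, float]]) -> float:
    """Probability-weighted sum of the utilities of a chance node's branches."""
    total_probability = sum(p for p, _ in branches)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("branch probabilities must sum to 1")
    return sum(p * u for p, u in branches)


def optimal_action(tree: DecisionTree) -> str:
    """Return the action whose chance node has the highest expected utility."""
    return max(tree, key=lambda action: expected_utility(tree[action]))


tree: DecisionTree = {
    "book_early": [(0.8, 0.9), (0.2, 0.3)],
    "wait": [(0.5, 1.0), (0.5, 0.2)],
}

for action, branches in tree.items():
    print(action, expected_utility(branches))
print("optimal:", optimal_action(tree))
```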
Multi-criteria decision-making is often classified as either multi-attribute (MADM) or multi-objective (MODM). Colson and de Bruyn (1989) define MADM as "...concerned with choice from a moderate/small size set of discrete actions (feasible alternatives)" and MODM as the method that "... deals with the problem of design (finding a Pareto-optimal solution) in a feasible solution space bounded by the set of constraints". One of the most popular MADM methods is the Analytic Hierarchy Process (AHP) (Saaty 1980), a method based on decomposition of a decision problem into a hierarchy (goal, objectives, criteria, alternatives), pairwise comparisons of the elements on each level of the hierarchy, and synthesis of priorities. Ideal point methods, such as the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) (Hwang and Yoon 1981), evaluate alternatives in relation to a specific target or goal (ideal point). Another frequently used family of methods is outranking methods, such as ELECTRE (Benayoun et al. 1966) and PROMETHEE (Brans and Vincke 1985), which are based on pairwise comparison of alternatives for each criterion. Weighted Linear Combination (WLC) and its extension Ordered Weighted Averaging (OWA) are methods based on the simple additive summation of the products of criteria weights and criteria values for each alternative. It is important to emphasize that in this paper we use the term criteria weight for the weight coefficients of the utility functions of criteria. These criteria weights are scaling constants as described in Keeney and Raiffa (1993). The criteria weights are based on participants' preference evaluations of criteria ranges, and thus not on rankings of criteria or answers to questions about the importance of criteria. The calculation of weight coefficients of utility functions is explained in Sect. 3.7. In this paper, we refer to the criterion with the highest weight as the most preferred criterion.
In this study we are concerned with visualization as a support for multi-criteria decision-making, where visual features are used to represent the alternatives in the attribute space. Regardless of the decision method used in a particular decision task, visualization can help the decision-maker to get insight into the distribution of alternatives, to get better understanding of the relations between criteria and potential trends that are difficult to detect in raw data, to detect potential outliers which may lead to reassessment of the criteria weights, etc.

Use of visualization in decision support systems
Virtually all of today's decision support systems rely in one way or another on interactive visualizations to present not only a decision space with available alternatives or outcomes, but also more abstract variables, such as criteria weights, utility differences between different outcomes, and the decision-maker's preferences. Dimara et al. (2018) listed a number of decision support tools designed to aid multi-criteria choice using different visualizations, such as parallel coordinates (Riehmann et al. 2012; Pu and Faltings 2000; Pajer et al. 2017), scatterplots or scatterplot matrices (Pu and Faltings 2000; Ahlberg and Shneiderman 2003; Elmqvist et al. 2008), or tabular visualizations (Carenini and Loyd 2004; Gratzl et al. 2013). Many recently developed decision support tools use combinations of the mentioned visualizations for different purposes.
PriEsT (Siraj et al. 2015), based on the Analytic Hierarchy Process (AHP) (Saaty 1980), uses table views and graph views to show inconsistencies in the decision-maker's judgments regarding the importance of criteria (judgments which violate the transitive property of ratio judgments are considered inconsistent). Pareto Browser (Vallerio et al. 2015) uses three-dimensional graphs to visualize the Pareto front, two-dimensional graphs for states and controls, scatterplots for visualization of objective functions, and parallel coordinates for visualization of Pareto optimal solutions. Visual GISwaps (Milutinovic and Seipel 2018), a domain-specific tool for geo-spatial decision-making, uses interactive maps to visualize alternatives in geographical space, a scatterplot to visualize alternatives in attribute space, and a multi-line chart for visual representation of trade-off value functions. Apart from the mentioned visual representations, other visualizations have been used in the decision-making context. Decision Ball (Li and Ma 2008) is a model based on the even swaps method (Hammond et al. 1998); it visualizes a decision process as moving trajectories of alternatives on spheres. VIDEO (Kollat and Reed 2007) uses a 3D scatterplot to visualize up to four dimensions, where the fourth dimension is color coded, and in AHP-GAIA (Ishizaka et al. 2016), an n-star graph view is used to visualize the decision-maker's preferences.

Evaluation issues
Regardless of what method or tool is used as a support in a decision-making process, the outcome is ultimately dependent on the decision-maker's preferences, expectations, and knowledge. The fact that decision tasks by definition do not come with an objectively best alternative makes comparative evaluations of these tools and methods difficult, as there exists no generally best outcome, nor are there reliable metrics for measuring their efficiency. Evaluation of visual decision support tools is even more difficult, as evaluating visualizations is in itself a demanding task. This is one of the main reasons that such non-comparative evaluations are usually performed through qualitative studies, focusing on user opinion and perception (e.g., Pajer et al. 2017; Salter et al. 2009; Jankowski et al. 2001). A process tracing-based approach has been used to evaluate tools and techniques in CommonGIS, observing the participants while they worked with appropriate tools for different tasks. Arciniegas et al. (2011) performed an experiment to assess usefulness and clarity of tool information in a set of collaborative decision support tools. The assessment was based on participants' ratings of the experience with the tool as well as their answers to a number of questions related to their understanding of the tool. In Gratzl et al. (2013) an experimental study was used for qualitative evaluation of LineUp, a visualization technique based on bar charts. The tool was evaluated using a 7-point Likert scale, based on the questionnaire provided to the participants.
Use of quantitative evaluation methods is more common in comparative studies. For example, in Carenini and Loyd (2004), a quantitative usability study was performed to compare two different versions of ValueCharts based on user performance in terms of the completion time and the quality of choices on low-level tasks. Andrienko et al. (2002) performed a quantitative study to test five different geovisualization tools implemented in CommonGIS for learnability, memorability and user satisfaction.
Even when quantitative methods are used in evaluations of MCDM decision support tools and methods, objective measurement of performance is rarely used to assess the effectiveness of a tool, as there are no objective metrics for measuring the quality of a choice, and constructing reliable performance metrics is an extremely demanding and difficult task. The only study known to us in which such a performance metric was used to assess the impact of a decision support tool on the quality of decisions was presented in Arciniegas et al. (2013). In their study, the authors measured the impact on decisions of three different decision support tools. Quality of a choice was used as the metric to assess the impact on decisions. The quality of a choice was determined by comparing the choice made with the utility values of different choices based on expert judgment. However, one obvious problem with this approach is that the participants' preferences and knowledge were not taken into consideration. Instead, an objective ranking of the different choices, and thus the existence of an objectively best choice, is assumed. It may then be argued that the task performed by the participants was not a proper decision-making task; it was de facto to find the best solution, rather than to make an informed choice.

Visual saliency
Fig. 1 illustrates that attention will most certainly be drawn to the green circle in image 1 and to the larger circle in image 2. This is because those two visual elements differ from their surroundings: they pop out. Indeed, visual attention is attracted to parts of an image which differ from their surroundings, be it in color, contrast, intensity, speed or orientation of movement, etc. This attraction, which is the effect of bottom-up visual selective attention, is unrelated to the actual relevance of the salient object: it is not voluntary, but purely sensory-driven.
Psychophysical and physiological aspects of visual attention have been the subject of many studies (e.g., Koch and Ullman 1985; Moran and Desimone 1985; Treisman and Gelade 1980; Treisman 1988; Treisman and Sato 1990; Desimone and Duncan 1995). Koch and Ullman (1985) suggest that early selective visual attention emerges from selective mapping from the early representation into a non-topographic central representation. The early representation consists of different topographic maps, in which elementary features, such as color, orientation, direction of movement, etc., are represented in parallel. At any instant, the central representation contains the properties of a single location in the scene, the selected location.

Fig. 1 The green circle in image 1 and the larger circle in image 2 are likely to attract the viewer's attention

Saliency maps
The concept of the saliency map was first introduced in Koch and Ullman (1985), on the assumption that conspicuity of a location in a scene determines the level of activity of the corresponding units in the elementary maps. An early model of saliency-based visual attention for rapid scene analysis by Itti et al. (1998) was built on this strict hypothesis of a saliency map, that low-level visual features attract visual attention and determine eye movements in the initial inspection of a scene, regardless of cognitive demands. In Itti and Koch (2001), however, the authors argue that a more advanced attentional control model must also include top-down, i.e., cognition-based influences, as a simple architecture based solely on bottom-up selective attention can only describe the deployment of attention within the first few hundreds of milliseconds.
A majority of researchers today agree that both top-down and bottom-up processes influence the allocation of attention. However, there is no agreement on the extent to which those processes influence attentional selections. The results of the eye-tracking experiment presented in Underwood et al. (2006) confirmed that the observer's goals and expectations do influence the fixation patterns, and that the task demands can override the saliency map. The study presented in Donk and van Zoest (2008) showed similar results. The authors found that saliency is not persistently represented in the visual system, but only for a few hundreds of milliseconds. After this interval has passed, the visual system only holds information concerning object presence, but not information concerning the relative salience of objects, and top-down control overrides bottom-up control. The study by Parkhurst et al. (2002) showed different results. Namely, while attention was most stimulus-driven just after visual content was presented, it remained stimulus-driven to a smaller extent even after the activation of top-down influences. The analysis presented in Orquin et al. (2018) likewise showed that bottom-up and top-down processes do not operate in different time windows, but are active simultaneously.

Saliency and decision-making
In an early study concerning the impact of visual saliency on decision-making, Glaze et al. (1992) found that the vividness of graphic information may increase its use in decision-making, and that components of decision-making that are most accessible, i.e., most clearly addressed by the information, are likely to be the focus of decision-making. The assessment of the impact of framing effects on decision-making presented in Lurie and Mason (2007) showed that visual saliency moderates the effect of positive versus negative frames on judgment. An interesting finding presented in this study was that the attraction effect is more likely to influence decision-making if the visual representation used displays information by criteria, rather than by alternative. The influence of criteria saliency in graphical representations was also demonstrated in Sun et al. (2010). Kelton et al. (2010) found that information presentation can affect the decision-maker by influencing his/her mental representation of the problem, and by influencing his/her characteristics such as involvement and task knowledge. A study by Orquin et al. (2018) showed that visual biases such as saliency may lead decision-makers to focus their attention in ways that are arbitrary to their decision goals. The results of experiments presented in Lohse (1997) demonstrated the importance of attention for choice behavior. The author found that consumers choosing businesses from telephone directories viewed color ads 21% longer than non-color ones, and that they viewed 42% more bold listings than plain listings, spending on average 54% more time viewing ads for businesses they ended up choosing. Similar results were obtained in Milosavljevic et al. (2012), showing that, when making fast decisions, visual saliency influences choices more than preferences do, and that the bias is particularly strong when the preferences among the options are weak.

Methodology
The study is based on a user performance experiment, carried out in order to obtain data for rigorous quantitative analysis. Participants worked on a simple multi-criteria decision task using a web application developed for the purpose. In this section, we present the decision problem (3.1), experiment design (3.2), data sets (3.3), a brief overview of the web application structure and features (3.4), the type of collected data (3.5), the details of visual representations used in the evaluation (3.6), and the explanation of the performance metrics used to assess choice quality (3.7).

Decision problem scenario
When choosing a decision task for evaluation studies, it is first and foremost important to provide a task to which all participants can relate. The decision task we used in this study was to choose a hotel for a holiday stay. Participants were presented with 50 different alternatives, i.e., 50 hotels, and asked to choose the most preferred alternative. Regarding the complexity of the task in terms of number of criteria, we opted to keep it low, as increased complexity is shown to lead to the use of simplifying decision strategies (Timmermans 1993). Payne (1976) found that increased complexity often leads to decision-makers resorting to heuristics, such as elimination-by-aspects. In the present study, each alternative was described in terms of five criteria: Price, Distance to city center, Cleanliness, Service and Breakfast.

Experiment
The experiment was run on the Amazon Mechanical Turk crowd-sourcing platform. A total of 153 participants took part in the experiment. We did not impose any requirements regarding participants' background, knowledge or skills.
At the beginning of the experiment, participants were presented with an explanation of the process of assigning the SWING rating values to virtual alternatives (see Sect. 3.7). They were then asked to assign rating values to the virtual alternatives representative of both data sets. Those rating values were then used to calculate criteria weights based on participants' preferences. After completing the rating process, participants were presented with an explanation and examples of either parallel coordinates or scatterplot matrices, depending on which of the two techniques was randomly assigned first. After getting familiar with the technique, participants proceeded to the first task. After completing the first task, the participants were familiarized with the second technique and then performed the second task. After completing both tasks, the participants answered a questionnaire.
The experiment followed a two-factor design with visualization as a within-subject factor and saliency as a between-subject factor. In order to counterbalance the order of the within-factor and to maintain comparable group sizes across the between-factor, participants were quasi-randomly assigned to one of the following test sequences:
1. PC with no saliency (PC_N) followed by SPM with no saliency (SPM_N)
2. SPM with no saliency (SPM_N) followed by PC with no saliency (PC_N)
3. PC with color saliency (PC_C) followed by SPM with color saliency (SPM_C)
4. SPM with color saliency (SPM_C) followed by PC with color saliency (PC_C)
5. PC with size saliency (PC_S) followed by SPM with size saliency (SPM_S)
6. SPM with size saliency (SPM_S) followed by PC with size saliency (PC_S)
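One possible way to obtain such a counterbalanced, quasi-random assignment is block randomization: the six sequences are shuffled and handed out block by block, so that group sizes never differ by more than one. The sketch below is only an illustration under that assumption; the function name `assign_sequences` and the seed are hypothetical, and the procedure actually used in the web application may have differed.

```python
import random
from collections import Counter
from typing import List, Tuple

# The six test sequences listed above (first technique, second technique).
SEQUENCES: List[Tuple[str, str]] = [
    ("PC_N", "SPM_N"), ("SPM_N", "PC_N"),
    ("PC_C", "SPM_C"), ("SPM_C", "PC_C"),
    ("PC_S", "SPM_S"), ("SPM_S", "PC_S"),
]


def assign_sequences(n_participants: int, seed: int = 0) -> List[Tuple[str, str]]:
    """Assign sequences in shuffled blocks of six (assumed procedure)."""
    rng = random.Random(seed)
    assignments: List[Tuple[str, str]] = []
    while len(assignments) < n_participants:
        block = SEQUENCES[:]      # every sequence appears once per block
        rng.shuffle(block)        # random order within the block
        assignments.extend(block)
    return assignments[:n_participants]


assignments = assign_sequences(153)
counts = Counter(assignments)
for sequence, count in counts.items():
    print(sequence, count)
```

With this scheme, the order of techniques is balanced within each saliency condition, and the three saliency groups remain of comparable size regardless of when recruitment stops.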

Data sets
One potential issue with participants working on the same decision task using different visualization techniques is a possible impact of learning bias. In order to avoid it, we used two different data sets. The list of hotels, as well as the relevant information regarding price and location, was obtained through the Trivago website. Values in terms of price were stated in euros (the less, the better), and values in terms of distance were given in kilometers (the closer, the better). Values in terms of the remaining three criteria, obtained from the TripAdvisor website, were expressed as ratings on a scale from 1 to 10 (the higher, the better).
The first data set contained fifty alternatives (hotels) in Berlin, Germany, and it was used when the participants worked with parallel coordinates. The second set contained fifty hotels in London, UK, and it was used when the participants worked with scatterplot matrices. Minor adjustments to the values in the second data set were made, in order to fit them into the same ranges of values across the criteria as in the first data set.

Software
The web application used in this study was implemented using the D3.js JavaScript library. It consists of three conceptual units. The first unit is a preference assessment unit, used to elicit a participant's preferences, which are then used to calculate the weight for each criterion (Fig. 2). These weights are used to calculate utility values for the alternatives (see Sect. 3.7). The decision unit is the main unit, where participants make their choices. There are six different visual representations of the decision space: PC, PC with color saliency, PC with size saliency, SPM, SPM with color saliency, and SPM with size saliency. Finally, the choice assessment unit is used to obtain a participant's own subjective assessment of the choice made.

Data collection
Saved data for each participant include, among other items, how confident, on a scale of 1-10, the participant is that he/she:
- understood the decision task.
- understood the process of rating virtual alternatives.
- understood parallel coordinates and used them correctly.
- understood scatterplot matrices and used them correctly.
- made the best possible choice with parallel coordinates.
- made the best possible choice with scatterplot matrices.

Visual representation
In our implementation, we use the full matrix for scatterplot matrices, and Inselberg's (Inselberg 1985) original representation of parallel coordinates, where parallel axes represent criteria (dimensions) and polylines represent alternatives. The point in which a polyline intersects an axis represents the value of the alternative represented by the polyline in terms of the criterion represented by the axis. To avoid visual clutter and to utilize screen real estate, axes were automatically scaled to the value ranges in the data set, both for scatterplots and parallel coordinates. We used a static layout with no interactive reordering of axes, rows, and columns, not least to minimize biasing factors between subjects. Visual appearance is consistent in terms of size and color across all six different visualizations (compare Sect. 3.2). The default color for alternatives, polylines in PC and dots in SPM, was a medium light yellow with the coordinates [44°, 0.98, 0.55] in terms of the HSL color space. In the visualizations where saliency was used to emphasize the most preferred criterion, either a deviating color or a deviating size was used to mark alternatives along the corresponding criterion axes (both in PC and SPM).

Salient color
For the visualizations deploying color saliency, we chose to show values of alternatives with respect to the most preferred criterion in blue. The choice of blue as the salient color is motivated by the fact that it does not have any apparently misleading connotation in the context of the decision task. Also, according to opponent color theory, blue is well contrasted against the default color yellow. To assure comparable contrast with the white background, a lightness value close to that of the yellow was chosen for the blue. In the color literature, saturation is often discussed as a perceptual dimension of color that is associated with uncertainty of a variable (for a comprehensive overview, see, e.g., Seipel and Lim 2017). Therefore we also maintained almost equal saturation levels for the blue color of the salient criterion, which has the coordinates [240°, 0.97, 0.59] in terms of the HSL color space. For SPM, the alternatives (dots) are simply colored blue in scatterplots concerning that criterion. In PC, since the representation of an alternative is a continuous polyline, we opted for a linear transition from default dark yellow to blue color, starting from neighboring axes toward the axis representing the most preferred criterion (Fig. 3). We used the Data Visualization Saliency (DVS) model by Matzen et al. (2017) to assess whether our color enhanced representation is suitable for the purpose, i.e., whether visually emphasized areas would draw a viewer's attention as intended. We chose this model as it is tailored to perform well for abstract data visualization. It has also been shown to agree well with experimental validation using eye-tracking data (Matzen et al. 2017). Saliency maps of our color enhanced visualization obtained by applying the DVS model are shown in Fig. 4.
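The HSL coordinates reported above can be converted to RGB, and a linear yellow-to-blue transition can be sketched, with a few lines of Python. This is a minimal illustration, not the D3.js code of the web application: it assumes that the first HSL coordinate is a hue angle in degrees, and it assumes that the transition is interpolated in RGB space, which is not stated in the text. The helper names `hsl_to_rgb`, `rgb_to_hex`, and `blend` are hypothetical.

```python
import colorsys
from typing import Tuple

RGB = Tuple[float, float, float]

# HSL coordinates as reported in the text (hue assumed to be in degrees).
DEFAULT_YELLOW = (44.0, 0.98, 0.55)
SALIENT_BLUE = (240.0, 0.97, 0.59)


def hsl_to_rgb(h_deg: float, s: float, l: float) -> RGB:
    """Convert HSL to RGB in [0, 1]; note that colorsys expects H, L, S order."""
    return colorsys.hls_to_rgb(h_deg / 360.0, l, s)


def rgb_to_hex(rgb: RGB) -> str:
    """Format an RGB triple in [0, 1] as a #rrggbb string."""
    return "#" + "".join(f"{round(c * 255):02x}" for c in rgb)


def blend(t: float) -> RGB:
    """Linear RGB interpolation: t = 0 gives the yellow, t = 1 the blue (assumed)."""
    yellow = hsl_to_rgb(*DEFAULT_YELLOW)
    blue = hsl_to_rgb(*SALIENT_BLUE)
    return tuple((1.0 - t) * y + t * b for y, b in zip(yellow, blue))


# Colors along a polyline segment from a neighboring axis (t = 0)
# to the axis of the most preferred criterion (t = 1).
for step in range(5):
    t = step / 4
    print(t, rgb_to_hex(blend(t)))
```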

Salient size
For parallel coordinates, the most preferred criterion is accentuated by increasing its size by 100% compared with the size of the coordinates representing the other four criteria. For scatterplot matrices, for each scatterplot concerning the most preferred criterion, the axis on which that criterion is plotted is increased in length by 100% compared to the remaining axes. Furthermore, the size of dots in the plots concerning the most preferred criterion is set to 4 pixels, compared with a dot size of 3 pixels for the remaining plots. One example of each visualization is shown in Fig. 5.

Interaction
During the pilot studies prior to the experiment, we noticed that a majority of participants, regardless of the visualization technique and the saliency enhancement they were working with, concentrated almost exclusively on the filtering feature and made their choices by adjusting the thresholds until a single alternative was left. For that reason, although data filtering (PC and SPM) and dimension reordering (PC) are useful and frequently used interaction features, we opted not to enable them in the final version of the web application used in the experiment.

Performance metrics
Due to the subjective nature of decision-making, there is never an objectively best outcome, i.e., an outcome which would be best for every decision-maker. In addition, the quality of a choice is difficult to assess. Dimara et al. (2018) calculated desirability scores representing the consistency between a participant's choice and his/her self-reported preferences as an indicative measure of accuracy. We deploy the same principle; however, we use a different method to elicit participants' preferences. Dimara et al. (2018) used ratings of criteria importance (0-10) to calculate criteria weights. Comparing the importance of different criteria without considering the actual degree of variation among the alternatives has been criticized by many (e.g., Hammond et al. 1998; Keeney 2013; Korhonen et al. 2013). It introduces a level of abstraction which, together with the possibility of participants not being able to perfectly express their preferences, is likely to introduce further noise to the accuracy metrics, as pointed out by Dimara et al. (2018). To eliminate this level of abstraction and minimize risks of further biases, we use SWING weighting (Parnell 2009), which considers the value ranges in criteria, to collect data about participants' preferences. These data are then used to calculate weight coefficients for the utility functions (criteria weights) of all criteria (see Clemen and Reilly 2013). SWING weighting is based on comparison of n + 1 hypothetical alternatives, where n is the number of criteria. One of the alternatives, the benchmark alternative, has the worst value in terms of all n criteria, and its grading value is set to zero. Each of the remaining n alternatives has the best value in terms of one of the criteria, and the worst value in terms of the others. The decision-maker assigns a grading value of 100 to the most preferred alternative. In the example in Fig. 6, it is alternative A2.
Then the decision-maker assigns the grading values for the other alternatives in a way that reflects his or her preferences. In the example, the decision-maker assigned the following values: A1 : 85; A2 : 100; A3 : 60; A4 : 40; A5 : 75. Fig. 5 The decision unit using parallel coordinates (left) and scatterplot matrices (right) with size saliency Fig. 6 The preference assessment unit after a participant has assigned grading values to the alternatives G. Milutinović et al. The grading values of the virtual alternatives, g i , are used for calculations of weight coefficients by normalization (values between 0 and 1), where n is the number of criteria. For example, the weight coefficient for utility function of criterion Price in the example above is We assume that the utility is linear for all criteria and calculate the utility values of the actual alternatives by normalizing the criteria values, v i . For Cleanliness, Service and Breakfast, the utility values of the alternative a are obtained as and for Price and Distance to city center, which are ''the less, the better'' type of criteria, the rescaled values are calculated as where v i max is the maximum value for criterion i, and v i min is the minimum value for criterion i. The weighted summation method is then used to calculate the total utility value u for each alternative as For our evaluations we use two metrics, denoted as Q and R. The value of Q expresses how consistent the participant's choice is with his/her self-reported preferences. It expresses the closeness between the selected alternative A and the alternative H that has the highest utility value based on Eqs. (3)(4)(5). Q is calculated as the proportion of the total utility of the selected alternative, u A , out of the total utility of the best alternative, u H , according to participants' preferences, i.e., As such, Q is indicative of the quality of choice. The value of R is calculated considering only the most preferred criterion. 
It is based on the highest and the lowest values of that criterion, $v_i^{\max}$ and $v_i^{\min}$, respectively, and the value in terms of that criterion of the alternative the participant selected, $v_i(a)$. For example, if a participant chose the hypothetical alternative A2 from the example in Fig. 2 as the best one, R is calculated for the criterion Distance. The best value for Distance is the lowest value, $v_D^{\min} = 0.1$ km, and the worst value is the largest value, i.e., $v_D^{\max} = 13.2$ km. Suppose that the value for Distance is 0.8 km for the alternative which the participant selected, i.e., $v_D(a) = 0.8$ km. In this example R is calculated as

$$R = \frac{v_D^{\max} - v_D(a)}{v_D^{\max} - v_D^{\min}} = \frac{13.2 - 0.8}{13.2 - 0.1} \approx 0.947.$$

When the highest criterion value is the best one (Cleanliness, Service and Breakfast), R is calculated as

$$R = \frac{v_i(a) - v_i^{\min}}{v_i^{\max} - v_i^{\min}}.$$

In other words, R measures the score of the chosen alternative $a$ in terms of the most preferred criterion, and as such, it is indicative of the participant's attachment to that criterion. It is important to note that R does not tell us anything about the total utility of $a$.
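The computation of the two metrics described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the study: the grading values 85, 100, 60, 40 and 75 are taken from the example, but their assignment to criteria other than Distance, the three hotel alternatives, and all function names are hypothetical. The sketch also assumes that the weights are obtained by dividing each grading value by the sum of all grading values, and that R is the linear utility of the chosen alternative on the criterion with the highest weight.

```python
from typing import Dict, List, Tuple

Alternative = Dict[str, float]


def swing_weights(grades: Dict[str, float]) -> Dict[str, float]:
    """Normalize SWING grading values so that the weights sum to 1 (assumed normalization)."""
    total = sum(grades.values())
    return {criterion: g / total for criterion, g in grades.items()}


def criterion_ranges(alternatives: List[Alternative]) -> Dict[str, Tuple[float, float]]:
    """Return (min, max) of every criterion over the actual alternatives."""
    criteria = alternatives[0].keys()
    return {c: (min(a[c] for a in alternatives), max(a[c] for a in alternatives))
            for c in criteria}


def linear_utility(value: float, vmin: float, vmax: float, higher_is_better: bool) -> float:
    """Linear utility in [0, 1]; the direction depends on the type of criterion."""
    if vmax == vmin:
        raise ValueError("criterion has no variation among the alternatives")
    if higher_is_better:
        return (value - vmin) / (vmax - vmin)
    return (vmax - value) / (vmax - vmin)


def total_utility(alt: Alternative, weights: Dict[str, float],
                  ranges: Dict[str, Tuple[float, float]],
                  higher_is_better: Dict[str, bool]) -> float:
    """Weighted sum of the single-criterion utilities."""
    return sum(weights[c] * linear_utility(alt[c], *ranges[c], higher_is_better[c])
               for c in weights)


def q_value(chosen: Alternative, alternatives: List[Alternative],
            weights: Dict[str, float], higher_is_better: Dict[str, bool]) -> float:
    """Q: utility of the chosen alternative relative to the best available one."""
    ranges = criterion_ranges(alternatives)
    best = max(total_utility(a, weights, ranges, higher_is_better) for a in alternatives)
    return total_utility(chosen, weights, ranges, higher_is_better) / best


def r_value(chosen: Alternative, alternatives: List[Alternative],
            weights: Dict[str, float], higher_is_better: Dict[str, bool]) -> float:
    """R: utility of the chosen alternative on the most preferred criterion only."""
    top = max(weights, key=weights.get)
    vmin, vmax = criterion_ranges(alternatives)[top]
    return linear_utility(chosen[top], vmin, vmax, higher_is_better[top])


# Hypothetical mapping of the example grading values to criteria (assumption).
GRADES = {"Price": 85, "Distance": 100, "Cleanliness": 60, "Service": 40, "Breakfast": 75}
HIGHER_IS_BETTER = {"Price": False, "Distance": False,
                    "Cleanliness": True, "Service": True, "Breakfast": True}

# Hypothetical hotel alternatives (not data from the study).
hotels = [
    {"Price": 120, "Distance": 0.1, "Cleanliness": 7.0, "Service": 8.0, "Breakfast": 6.0},
    {"Price": 90, "Distance": 0.8, "Cleanliness": 9.0, "Service": 7.0, "Breakfast": 8.0},
    {"Price": 60, "Distance": 13.2, "Cleanliness": 8.0, "Service": 9.0, "Breakfast": 7.0},
]

weights = swing_weights(GRADES)
for hotel in hotels:
    q = q_value(hotel, hotels, weights, HIGHER_IS_BETTER)
    r = r_value(hotel, hotels, weights, HIGHER_IS_BETTER)
    print(f"Q = {q:.3f}, R = {r:.3f}")
```

With these hypothetical values, the second hotel reproduces the Distance example from the text (0.8 km within a range of 0.1 km to 13.2 km). Note that a high R does not imply a high Q: the first hotel is the closest one, yet it has the lowest total utility of the three.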

Results
Prior to the experiment, we carried out a pilot study. The results of the pilot study and post-experiment conversations with the pilot participants indicated that the minimum time needed to complete a task was twenty seconds per decision scenario. Based on that, we decided that the results for the 20 out of 153 participants who spent less than twenty seconds working on either of the two tasks could not be considered reliable and should be discarded. Of the remaining 133 participants, 44 worked with the plain representation, 45 worked with the representation with color saliency, and 44 worked with the representation with size saliency. Statistical analysis of the results was carried out using an estimation approach instead of the commonly used null hypothesis significance testing, offering nuanced interpretations of results (see Cumming 2014; Dragicevic 2016). Our estimations are based on confidence intervals and effect sizes. We followed the recommendations by Cumming (2014), based partly on Coulson et al. (2010), and neither reported nor drew any conclusions based on p values.
We used R for inferential statistics, with the bootES package (Kirby and Gerlanc 2013) for calculation of bootstrap confidence intervals. For calculations and plotting, we used modified R code developed by Dimara et al. (2018), available at https://aviz.fr/dm. Inferential statistics with regard to the decision quality are given in Sect. 4.1, with regard to participants' attention in Sect. 4.2, with regard to time in Sect. 4.3, and with regard to participants' perception of the techniques and confidence in Sect. 4.4.
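To illustrate the kind of estimate reported in the following sections, the sketch below computes a bootstrap confidence interval for a difference in group means. It is a minimal Python analogue, not the R code used in the study: it uses the simple percentile interval, which may differ from the interval type produced by bootES, and the two samples are synthetic numbers generated only for demonstration. The function name and its parameters are hypothetical.

```python
import numpy as np


def bootstrap_mean_diff_ci(a, b, n_boot=10_000, level=0.95, seed=0):
    """Percentile bootstrap CI for mean(a) - mean(b) of two independent groups."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Resample each group with replacement, n_boot times.
    idx_a = rng.integers(0, a.size, size=(n_boot, a.size))
    idx_b = rng.integers(0, b.size, size=(n_boot, b.size))
    diffs = a[idx_a].mean(axis=1) - b[idx_b].mean(axis=1)
    alpha = 1.0 - level
    lower, upper = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return float(a.mean() - b.mean()), float(lower), float(upper)


# Synthetic Q-values for two between-subject groups (NOT data from the study).
demo_rng = np.random.default_rng(42)
q_color = np.clip(demo_rng.normal(0.78, 0.20, size=45), 0.0, 1.0)
q_plain = np.clip(demo_rng.normal(0.64, 0.20, size=44), 0.0, 1.0)

estimate, ci_low, ci_high = bootstrap_mean_diff_ci(q_color, q_plain)
print(f"mean difference = {estimate:.3f}, 95% CI [{ci_low:.3f}, {ci_high:.3f}]")
```

In an estimation approach, the interval itself is interpreted: an interval that lies clearly above zero is read as evidence of a difference, and its width conveys the precision of the estimate.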

Decision quality
No noticeable difference in performance was observed between groups working with scatterplot matrices with different saliency modes. For parallel coordinates, the results showed clearly better performance in the group working with color saliency, compared to the group working with the basic visualization with no saliency and the group working with size saliency. For PC_C - PC_N, the average increase in decision quality was 0.135, and with 95% probability not lower than 0.043. For PC_C - PC_S, the average increase was 0.122, and with 95% probability not lower than 0.034 (Figs. 7, 8).
A comparison of the results for the within-subject variable visualization (parallel coordinates and scatterplot matrices) for all types of saliency reveals a clear difference. Participants performed noticeably better when working with parallel coordinates compared with using the scatterplot matrix (Figs. 9, 10).

Attention
Attention to salient parts of a visualization can be measured with gaze tracking. However, due to the design of our study as a web experiment, this is not a viable approach. We therefore characterize users' attention indirectly in terms of their attachment to the most preferred criterion (R), and by quantifying their interaction with the visualization close to this attribute.
Results for the R-value of the chosen alternative show a similar pattern as the results regarding the decision quality. There is a strong indication of a difference in R-value between participants working with parallel coordinates with color saliency and participants working with parallel coordinates with no saliency. The average increase in R for PC_C - PC_N is 0.092, and with 95% probability not lower than 0.002. However, there is no noticeable difference between color saliency and size saliency. For the visualizations with scatterplot matrices there are no clearly evident differences for different modes of saliency (Figs. 11, 12). A comparison of the results for the within-subject variable visualization (parallel coordinates and scatterplot matrices) for all types of saliency shows no notable differences (Figs. 13, 14).

Fig. 7 Q-value means for each saliency mode (no saliency, color saliency, and size saliency) for parallel coordinates (PC) and scatterplot matrices (SPM), respectively
To quantify users' interaction we analyzed the recorded mouse data, which comprised timestamps and positions of the mouse when clicked. Mouse interactions were classified as near the salient coordinate based on their spatial proximity to the visualized variable with the highest weight. The analysis of click tracking data for parallel coordinates shows an indication of a difference between participants working with color saliency and participants working with no saliency. Participants were more likely to concentrate clicks near the coordinate representing the most preferred criterion when working with color saliency. On average, 47% of all clicks in the PC_N group were near the coordinate with the highest weight, compared to 65% of clicks in the PC_C group. No noticeable difference was detected for different saliency modes for SPM (Figs. 15, 16). However, the percentage of clicks in a plot concerning the most preferred criterion when working with SPM is clearly higher than the percentage of clicks near the coordinate representing that criterion when working with PC (Figs. 17, 18).
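The click classification described above could be implemented along the following lines. This is a hedged sketch, not the procedure used in the study: it assumes vertical parallel-coordinate axes, so that proximity reduces to horizontal pixel distance, and the `Click` record, the function name, and the distance threshold `radius_px` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Click:
    t_ms: int   # timestamp of the click in milliseconds
    x: float    # horizontal mouse position in pixels
    y: float    # vertical mouse position in pixels


def share_of_clicks_near_axis(clicks: Sequence[Click], axis_x: float,
                              radius_px: float) -> float:
    """Fraction of clicks within radius_px (horizontally) of a vertical axis."""
    if not clicks:
        return 0.0
    near = sum(1 for c in clicks if abs(c.x - axis_x) <= radius_px)
    return near / len(clicks)


# Hypothetical click log for one participant; the most preferred criterion
# is assumed to be drawn as an axis at x = 100 px.
demo_clicks = [Click(0, 100, 50), Click(10, 110, 200), Click(20, 300, 80), Click(30, 95, 10)]
demo_share = share_of_clicks_near_axis(demo_clicks, axis_x=100, radius_px=15)
print(f"{demo_share:.0%} of clicks near the most preferred criterion")
```

For scatterplot matrices, the same idea would apply with a test of whether the click falls inside a plot that involves the most preferred criterion, rather than a distance to an axis.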

Time
In terms of the time spent on the task, the results indicated no difference between representation types for parallel coordinates. For participants working with scatterplot matrices, there is a weak indication that participants may tend to spend more time on a task when working with color saliency, compared to size saliency or no saliency (Figs. 19, 20). On average, participants spent 15% more time working with SPM, compared to working with PC (Figs. 21, 22).

Perception and confidence
Participants' ratings show that, on average, participants understand the parallel coordinates technique better than scatterplot matrices, and that they are more confident in their decisions when working with parallel coordinates (Figs. 23, 24, 25, 26). This is consistent with the results concerning the decision quality (Sect. 4.1).

Discussion and conclusion
Wouldn't it be appealing to use visual saliency in visualizations for MCDA to direct decision-makers' attention toward criteria of their highest preference, if that would help them to arrive at better decision outcomes? On the other hand, given humans' limited cognitive capacity, wouldn't too much attention on some preferred criteria also come with the risk of overlooking, or at least underestimating, the value of the remaining criteria for the total utility of the chosen alternative? The overarching goal of the study presented here was to investigate whether preference-controlled saliency in visualizations of multiple-attribute datasets affects decision outcomes.

Fig. 19 Means for time in seconds spent on the task for each saliency mode (no saliency, color saliency, and size saliency) for parallel coordinates (PC) and scatterplot matrices (SPM), respectively

Altogether, the results from our experiment show that the quality of decision outcomes differed not only depending on the mode of visual saliency used (or if no saliency was used), but also depending on the employed visualization technique. We feel confident in stating that visual saliency-based enhancement of the most preferred criterion did not lead to any adverse effect, i.e., decision quality did not degrade, no matter whether color or size was used as the facilitating visual variable and regardless of the chosen visualization technique (scatterplot matrices or parallel coordinates). On the other hand, we could observe favorable effects, i.e., improved decision quality, under certain conditions. More specifically, visual saliency, when facilitated by means of color, led to a substantial improvement of decision quality in terms of our quality metric Q, but only when parallel coordinates were used for visualization. Compared with that, scale as a visual variable to accomplish saliency hardly exhibited any positive effect on decision outcome in any of the visualizations in our study.
This is unexpected, considering that the attribute axes and scatterplots scaled up by 100% consumed more screen real estate, leading to less cluttered representations for these attributes. Evidently, the degree to which visual saliency influences the quality of the outcome in MCDA tasks, as studied here, varies depending on the visual variable used to facilitate visual saliency. Effect sizes in terms of increased decision quality are also most likely a matter of parameter tuning, i.e., optimal choices of chromaticity differences and scaling ratios. Regarding our choice of salient color, we made a perceptually informed best attempt by choosing opponent colors and considering other constraints. As for the chosen up-scale factor of 100%, there seems to be room for improvement. More research will be needed in the future to establish the relationship between those parameters and effect size, as well as their sensitivity to other factors such as task complexity.
The total absence of effects of visual saliency (both color and scale) in the scatterplot matrix visualizations may, at least to some extent, be explained by the observed longer task completion times (89 seconds on average for parallel coordinates, 123 seconds for scatterplot matrices). From a practical point of view, the increased times for SPM are most likely not relevant; however, they suggest that with scatterplot matrices users had to put more effort, in terms of both interacting and thinking, into the task. Indeed, the scatterplot matrices were also rated as more difficult to understand by subjects in our study (see also Sect. 4.4), users interacted more with them in terms of mouse clicks, and yet they reported being less confident in their choices. Altogether, this leads us to conclude that users spent more cognitive effort on the task when working with SPM. We believe that this increased amount of top-down processing, compared with parallel coordinates, is a factor that overrides, or at least counteracts, the effects gained from increased attention from visual saliency in the short bottom-up processing phase of visual stimuli, as discussed in Donk and van Zoest (2008). From this we lean toward the conclusion that visual saliency is probably more effective in multiple-criteria decision tasks that require a fast user response, such as in crisis management or alarm handling.
Decision quality Q in our study was measured in terms of how close (in percent) the subject's choice is to the best alternative based on the subject's own preferences. Except for the parallel coordinates visualization with color saliency, these values are around 0.62-0.66 on average (see Fig. 7). These numbers are surprisingly low, and they illustrate that choosing the best alternative is a difficult task even in limited multi-objective decision-making situations. For the parallel coordinates with color saliency, decision quality was close to 80% on average. For the chosen alternative, this is an improvement which in practice can indeed make a considerable difference in terms of criteria values. Therefore, and in light of the fact that none of the visualizations with saliency introduced any adverse effects on decision quality, we consider it a rational design choice to employ preference-controlled visual saliency in visual tools for multi-criteria decision-making.
Another result of our study relates to how visual saliency affects users' attention to the most preferred criterion. Due to the design of our experiment as a web-based experiment, the use of gaze tracking for validation of users' attention was not a viable option. Instead, we first used the Data Visualization Saliency (DVS) model by Matzen et al. (2017) to qualitatively assess whether the intended visual saliency is maintained in our visualizations. For the experimental evaluation, we devised two indirect measures to capture users' attention on their most preferred criterion. The R-value describes the chosen alternative's score only with respect to this criterion. In addition, we analyzed how much users interacted with visual elements representing this criterion by determining the percentage of mouse clicks near those elements. We note that visual saliency, regardless of the visualization method, led users to choices that favor the most preferred criterion (in terms of high R-values), which is consistent with a strategy of maximizing the score on this criterion. Significantly increased scores were, however, only observed for the parallel coordinates visualization with color saliency (see Fig. 12), which is consistent with the pattern already found for decision quality. Increased attention on the most preferred criterion under the use of visual saliency also became evident in terms of the percentage of mouse clicks near that attribute. However, although differences are on average as large as 20% (see Fig. 16), they are not significant in terms of a 95% confidence interval.
Assessing decision quality in MCDA tasks in an objective way is a delicate undertaking due to the inherently subjective nature of individuals' preferences. The approach chosen by Dimara et al. (2018), who suggested a metric based on subjects' compliance with their own preferences, is a very appealing approach to this problem. In their work the authors used rating on a normalized scale for direct elicitation of user preferences, and they point out the risks of bias caused by users' difficulties in expressing their criteria preferences. We agree with their reasoning, and we strongly believe that some of these difficulties arise from the abstraction induced by direct criteria rating using standardized (abstract) scales. To alleviate this, we suggested using an alternative approach, SWING weighting, as a method to elicit users' criteria preferences, whereby users had to relate to the real value ranges (and units) of the attributes. By that, we believe we remove one level of abstraction and thus reduce the inherent bias in the preference elicitation phase. However, based on the results of our study, we cannot rule out that participants in the study, knowingly or not, had difficulties using SWING weighting correctly to express their preferences. More work is needed, in the field of MCDA rather than within visualization, to study the sensitivity of alternative preference elicitation methods in the context of the assessment of decision quality. Another critical aspect of our methodology is the potential risk that participants, knowingly or not, would significantly reassess their preferences if the visualizations they worked on revealed unanticipated patterns in the data, which is usually the case in a real application. To prevent this, we designed decision scenarios which exhibited no unanticipated relations or trends between criteria, nor clear outliers in the data sets.
This supports our assumption that participants acted in agreement with their preferences, which is the basis of our quality metric.
Revisiting the questions at the beginning of this section, we conclude that in our study no adverse effects of using visual saliency in the form of color or size were observed, neither in terms of reduced decision quality nor in terms of efficiency (notably longer time on task). Instead, specific combinations of saliency form and visualization method seem to be favorable in terms of gained decision quality and attribute attachment. Without drawing too far-reaching conclusions, we consider the results very encouraging, and we assert that it is relevant to consider saliency in visualizations for MCDA in different ways. Firstly, by creating an awareness of saliency effects in visualizations using saliency analysis according to, e.g., Matzen et al. (2017), designers can reveal potential risks for biases in visual MCDA. Secondly, this research can inform the design of novel visual MCDA tools and their evaluation. In this context, devising general guidelines on how to design visualizations for saliency is an interesting direction for future research, which in a more general perspective should analyze the effects of spatial layout and the use of visual variables on saliency in visualizations.
Our findings can also inform the design of novel MCDA tools and visualizations for forthcoming research that evaluates the effectiveness of saliency in visualizations for other MCDA tasks.
Funding Open access funding provided by University of Gävle.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.