A real-time network-based approach for analysing best–worst data types

Best–worst scaling is a widespread approach in market research used for collecting data on people's needs and preferences. However, the preparation of its design and the analysis of the data currently depend on complex statistical methods. One of the most commonly used models for estimating individual preference probabilities is the hierarchical Bayes model, which can only be applied after the data collection phase. This type of calculation requires substantial infrastructure and a large sample to provide accurate estimations. Here, we introduce a new application that enables fast calculations and individual-level real-time estimations, and that also makes it possible to ask additional questions depending on the respondent's answers during live interviews. Our network-based approach (integrating the PageRank algorithm) works well for online surveys, supports our dynamic and adaptive real-time evaluation (DART) of best–worst data types, and results in more relevant decision making in marketing.


Introduction to best-worst scaling
Investigating human preferences is a constant challenge in market research, and best-worst scaling (BWS) is currently becoming increasingly popular across a wide range of topics. In the marketing world, best-worst scaling, also known as maximum difference scaling (MaxDiff), is one of the simplest ways of measuring preferences (Finn and Louviere 1992; Louviere et al. 2015). The measurement records the best and worst cases from a set of alternatives (brands, products or services, etc.). The term MaxDiff reflects the method of measuring the psychological process behind choosing the pair of alternatives that are at the farthest ends of a continuum of interest. Different subtypes of this method are available today, and in many studies the approach utilized is not a classical MaxDiff. Although MaxDiff is still widely used in market research, the original authors now suggest using BWS, which is a more general term for these methods and has become widely known in many fields of academic research (Louviere et al. 2015). Following these recommendations, we also use the term BWS in our study hereafter. Although BWS is a common method, we are experiencing an increasing need for new alternatives: alternatives that can deliver fast, individual-level estimations in real time.

Introduction to the basic methods and their possible shortcomings
The selected preference alternative pairs and the implied preferences are often coded as dummy or effects-coded variables. The analytical methods assume that the individuals derive a parameter for each alternative in a choice set, and they differ in their assumptions about how respondents select the best and worst alternatives. The models most commonly used to estimate preference choice parameters and probabilities are hierarchical Bayes, best minus worst counting, the multinomial logit/probit model, weighted least squares regression, latent class analysis, and max-diff scaling (Flynn 2008; Flynn et al. 2007; Hensher et al. 2015; Louviere et al. 2015). The commonly known general model of the random utility framework (McFadden 1974) assumes that parameters are valid only at an individual level; nevertheless, based on a sample of individuals, estimation of those parameters is also possible.
The usual models for estimating preference probabilities are generally based on specific statistical assumptions rather than behavioural assumptions. Although these methods are well-developed, they have some weaknesses. Accurate estimations of individual-level parameters require large sample sizes. Gathering the appropriate number of responses often results in a lengthier data collection phase. This, in turn, delays the onset of analysis, meaning that obtaining results is slower and often more expensive. Additionally, even with a large set of responses, estimations from commonly used models can be misleading. If we were to fill gaps in the individual-level parameters, it might lead to inadequate estimations, because, due to the design of BWS, the information needed to accurately predict them is simply incomplete.

Specific aims and expected applications
Our aims in this study are to (1) show examples of inadequate estimations using different datasets, (2) provide an alternative approach for analysing BWS data, and (3) demonstrate the additional benefits of using this new approach in market research. For these purposes, we calculated the hierarchical Bayes estimations of preference probabilities using our own and freely available datasets. We developed a strategy using network-based approaches that can be applied to BWS data recording and analysis, making the processes faster and more cost-effective. Additionally, our solution enables us to design dynamic and adaptive surveys with real-time analysis of best-worst investigation types to predict preference probabilities at the individual level. We also discuss how this real-time network-based evaluation of preference probabilities can improve the accuracy and thus the quality of estimations. Furthermore, we present our own experimental data collection, which demonstrates our approach, and we also discuss post-hoc analysis of preference probabilities and validity checking.

Materials and methods
Our research had two major stages. First, we investigated previously published databases and results to describe and present the existing issues with currently used designs and statistical calculations (Exploratory stage). Second, we created an experimental design for the introduction of a novel approach and the comparison of its performance with the common approaches (Experimental stage).

Data sources and basic analysis
We examined three databases and a self-developed experimental design using best-worst type inquiries. All online databases contained the designs and results of standard MaxDiff questionnaires. We then compared the estimations of individual preference choice probabilities given by the hierarchical Bayes method ('HB' hereafter) and by our network-based approach.
First, we examined a publicly (no-cost) available dataset (web1) based on 302 respondents asked about ten technology brands (section "Example: technology"). Each respondent received the same set of questions, where each question contained a combination of 5 brands. The corresponding design of the six blocks of questions (Table 1) could also be downloaded (web2). Here, we also indicated an example of best and worst choices (see section "Graphical representation of choices among alternatives").
Secondly, we used a dataset that included 20 ice cream alternatives (section "Example: ice creams"). Unlike the Technology dataset respondents, who received the same design, these 251 respondents received a variety of designs. Each of them included 12 blocks with 5 items per block.
Thirdly, we investigated a dataset that included 15 employee benefits (section "Example: employee benefits"), such as a discount at a local restaurant. The 301 respondents received different designs, each of which included 9 blocks with 5 items per block.
We estimated individual-level preference probabilities by applying the HB model implemented in Sawtooth (Chrzan and Orme 2019; Lipovetsky 2018; Lipovetsky and Conklin 2015; Orme 2009) to the three online databases. For our experimental dataset, we used the HB method available in Displayr (Q Research Software 2020), a tool that can also provide 'Sawtooth style' estimations, so the values can easily be compared.
Then, we recalculated the preference probabilities using the PageRank algorithm (Dode and Hasani 2017; Page 2006) to compare the results with our real-time evaluations. We provide a detailed description of our experimental design and network-based approach in the subsequent sections. We also calculated Pearson's correlation coefficients to measure the level of similarity between the HB and PageRank estimations. We used Mathematica© (Wolfram Research 2020) for the calculations and the preparation of the figures.
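As an illustration of the similarity measure, Pearson's correlation coefficient between an HB and a PageRank estimation vector can be computed as below (a minimal pure-Python sketch; the probability values are hypothetical, not taken from the datasets):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical HB and PageRank probability estimates for one respondent
hb = [0.40, 0.15, 0.10, 0.12, 0.05, 0.08, 0.03, 0.03, 0.02, 0.02]
pr = [0.37, 0.06, 0.06, 0.10, 0.06, 0.18, 0.04, 0.04, 0.05, 0.05]
print(round(pearson_r(hb, pr), 3))
```

In the study this coefficient is computed per respondent over the ten (or more) alternative-level probability estimates, and the means and standard deviations are then reported per dataset.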

Experimental design
We developed a possible strategy that could be used in real-time with many data collection tools, such as online surveys in market research (Fig. 1). The basic idea is to ask additional questions that facilitate real-time correction of the preference probability estimations. These additional questions are dynamically built based on the answers of the respondents.
The infrastructural background for our dynamic and adaptive real-time (DART) evaluation was provided by DataExpert Services Ltd (http://www.dataexpert.hu/). The aim was to test our solution in analysing best-worst data types. To achieve this goal, we picked a topic for investigating human preferences that is familiar to many people: dog breeds. The survey was hypothetical and cannot be used for predicting preferences for dog breeds. However, the answers were given by real people and represented their real preferences, corresponding to our behavioural assumptions. The HB method for analysing BWS data requires specific conditions to make appropriate predictions. As such, we recommend a new application whose estimates can be quickly calculated from individual data in real-time.
The online survey was programmed in the UNICOM® Intelligence v7.0 software family (web3). The questionnaire had three main parts: (1) demographic questions, (2) two rounds of BWS questions, and (3) open questions. The designs of the two BWS blocks were pre-calculated and programmed into the survey (see also Supplementary Information-BWS_design). After the first round of BWS questions, the respondents' answers were sent for calculation, and the estimated values were returned to the questionnaire in real-time. Based on the top 5 statements, the second BWS round was adaptively modified for each respondent. Finally, we repeated the same calculation after the second set of BWS questions and asked 3 open questions relating to the respondents' top 3 preferred statements.
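The adaptive step described above (selecting the top 5 statements for the second BWS round) can be sketched as follows; the statement ids and round-1 scores are hypothetical:

```python
def top_alternatives(scores, k=5):
    """Return the k alternatives with the highest estimated preference
    probabilities, ordered best-first; used to build the adaptive round-2 block."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical round-1 PageRank estimates for one respondent
# (statement id -> estimated preference probability)
round1 = {1: 0.05, 2: 0.22, 3: 0.04, 4: 0.09, 5: 0.07,
          6: 0.03, 7: 0.11, 8: 0.16, 9: 0.08, 10: 0.15}
round2_block = top_alternatives(round1, k=5)
print(round2_block)  # → [2, 8, 10, 7, 4]
```

In the live survey, this selection runs server-side between the two BWS rounds, and the resulting block is sent back to the questionnaire in real time.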
Fig. 1 The general steps for collecting survey and best-worst data (on the left) and our proposed solution and possible improvements (on the right)

To compare the changes in preference uncertainties, we calculated entropy for both rounds (Attneave 1959; Shannon and Weaver 1949). Entropy is a measure of information, choice and uncertainty. If the value of entropy decreased after a certain action (in our case, after asking a more targeted question), the information content of our data increased and the uncertainty decreased. Consequently, we obtained more accurate data and hence a clearer preference on the alternatives. With p_1, p_2, …, p_n as preference probabilities, entropy is calculated as

H = −∑ p_i log₂ p_i.

Under a uniform distribution, when the uncertainty is maximum, the entropy is also maximum. In contrast, if one of the preference probabilities approaches 1, then entropy approaches 0, which means that the uncertainty decreases and the preference choice becomes more certain. To further evaluate the validity of the extra question, we also measured the time (in seconds) spent on each task during the online interviews.
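The entropy measure can be sketched in a few lines; the two distributions below are illustrative only:

```python
from math import log2

def entropy(probs):
    """Shannon entropy H = -sum(p_i * log2(p_i)); lower values mean
    clearer, more certain preferences."""
    return -sum(p * log2(p) for p in probs if p > 0)

uniform = [0.1] * 10           # maximal uncertainty over 10 alternatives
peaked = [0.82] + [0.02] * 9   # one clearly dominant preference
print(entropy(uniform))  # ≈ 3.322, the maximum (log2 of 10) for n = 10
print(entropy(peaked))   # much lower: the preference is clearer
```

A drop in this value between round 1 and round 2 is what the study reads as the targeted questions having reduced preference uncertainty.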
Moreover, in market research, one of the most important next steps is segmentation (assigning respondents to groups) based on their preferences, i.e., on the BWS results. Therefore, we applied hierarchical cluster analysis (Kaufman and Rousseeuw 1990) with the Ward linkage method on our BWS data (Louviere et al. 2015).
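A minimal sketch of such a segmentation, assuming synthetic preference data and using SciPy's Ward-linkage implementation (SciPy was not the software used in the study):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical matrix of estimated preference probabilities:
# one row per respondent, one column per alternative.
rng = np.random.default_rng(0)
prefs = rng.dirichlet(np.ones(10), size=30)   # 30 respondents, 10 alternatives

Z = linkage(prefs, method="ward")                  # Ward-linkage hierarchy
segments = fcluster(Z, t=5, criterion="maxclust")  # cut into at most 5 segments
print(segments)                                    # segment label per respondent
```

The resulting labels play the role of the groups compared between rounds in Table 11.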
We established the real-time connection between the survey and a server with R v4.0.3 (R Core Team 2020), set up in the Amazon Web Services (web4) environment. We built and installed a custom R package, specifically for processing the data and preparing the inputs for visualising the results.
In the final step, we redirected the respondent to a dynamic online dashboard created in Microsoft Power BI v2.87.1061 (web5), where the results appeared immediately after finishing the survey.

Graphical representation of choices among alternatives
Let A = {a_1, a_2, …, a_n}, n ≥ 2, denote the choice set: the set of all possible alternatives. An element a_i ∈ A may denote, for example, a brand or a profile of a brand, depending on the context of the inquiry. A subset B ⊂ A denotes a block of alternatives from which a respondent chooses one best and one worst alternative. Suppose we are given a set of blocks B_1, B_2, …, B_m, and for each block the task of a respondent is to choose one best and one worst element. Given the choices, we may code them in the standard graph terminology of vertices and edges: the alternatives correspond to the vertices, and the preferences implied by the choices correspond to directed edges between the alternatives. For simplicity, suppose that the number of alternatives is 10, B_1 = {a_1, a_2, a_3, a_4, a_10}, and that B_1 was presented to the respondent, who selected a_1 as best and a_10 as worst. The preference of a_i over a_j will be denoted as a_j → a_i, and the set of implied preferences {a_2 → a_1, a_3 → a_1, a_4 → a_1, a_10 → a_1, a_10 → a_2, a_10 → a_3, a_10 → a_4} may be represented by a graph in which only the indices are used to denote the corresponding vertices (Fig. 2).
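The implied-preference coding described above can be sketched as follows (pure Python; alternatives are denoted by their indices, as in the text):

```python
def implied_edges(block, best, worst):
    """Directed preference edges implied by one best-worst choice.
    An edge (j, i) means 'i is preferred over j': every other item in the
    block points to the best, and the worst points to every other item."""
    edges = set()
    for a in block:
        if a != best:
            edges.add((a, best))   # everything in the block loses to the best
        if a != worst:
            edges.add((worst, a))  # the worst loses to everything in the block
    return edges

# Block B1 = {1, 2, 3, 4, 10}; the respondent picked 1 as best and 10 as worst.
print(sorted(implied_edges([1, 2, 3, 4, 10], best=1, worst=10)))
# → [(2, 1), (3, 1), (4, 1), (10, 1), (10, 2), (10, 3), (10, 4)]
```

Accumulating these edge sets over all of a respondent's blocks yields the individual preference graph analysed in the next section.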

Network-based analysis of the preferences
The orderings of the alternatives based on best minus worst counting give rather raw information about respondents' preference values and probabilities. However, if we have a sample of respondents and their choices, specific statistical modelling may give a more sophisticated estimation of the preference probabilities of individuals or classes of individuals. Unfortunately, these statistical models (mostly versions of the random utility model family) require a considerable number of technical assumptions, such as error distributions, whose validity is critical. Some popular statistical models, such as the HB model, use the specific individual choices together with the full sample data when estimating individual preference utilities/probabilities. Regarding preference choices, this can be problematic because it implicitly assumes that individuals influence each other.
As a result, we recommend using the PageRank algorithm (Dode and Hasani 2017; Page 2006), which can provide a vector for the estimation of individual preference choice probabilities. It also rests on some general behavioural assumptions about how people traverse a preference network, namely the network represented by a graph based on their own preference choices (see Fig. 3). PageRank is a centrality index of a graph: it represents the likelihood of a respondent randomly following their directed preference links, measured by the number of visits to vertices/alternatives. This kind of behaviour may correspond to how we think about, for example, brands: we may 'surf' a hypothetical cognitive network represented by our preference graph. Consequently, the probabilities of preference alternatives correspond to the likelihood of visiting the vertices/alternatives. PageRank uses only the linkage connections among vertices/alternatives; the meaning of the alternatives is negligible (Dode and Hasani 2017; Page 2006). An alternative becomes preferred if preferred alternatives link to it, which is recursive because the preference of an alternative refers back to the preference of other alternatives that link to it.
Any vertex can be a starting vertex (in most applications, with uniform probability), and if a link arrives at a vertex that has no out-link (generally referred to as a dangling node), the respondent is randomly taken to another vertex/alternative. Another situation is when a respondent 'spends a lot of time' following preference links, loses interest, moves to another random vertex/alternative, and continues following preference links until they once again lose interest. The so-called damping parameter (between 0 and 1, 0.85 by default), which may be interpreted as a kind of decay factor, refers to the perseverance with which preference links are followed. A low damping parameter indicates a respondent jumping between vertices/alternatives randomly and, as a consequence, an almost uniform distribution of preference probabilities. However, if we suppose that respondents persist in following preference links, then we should use a higher damping parameter. It is worth noting here that PageRank has no algorithmic limitation, and by setting additional parameters it can be made more flexible. However, since this approach is not a statistical method, we are unable to draw statistical conclusions; therefore, significance testing is currently not available. Figure 4 shows PageRank preference probabilities based on the graph in Fig. 2. The actual probabilities are {0.369, 0.055, 0.060, 0.101, 0.056, 0.180, 0.042, 0.042, 0.047, 0.047}, respectively.
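A minimal power-iteration sketch of PageRank as described above, with the damping parameter and dangling-node mass redistributed uniformly (one common convention), applied to the seven edges of the single example block; the exact values are illustrative and depend on these conventions:

```python
def pagerank(nodes, edges, d=0.85, iters=100):
    """Power-iteration PageRank: dangling-node mass is spread uniformly,
    and d is the damping parameter (perseverance of following links)."""
    n = len(nodes)
    out = {v: [w for (u, w) in edges if u == v] for v in nodes}
    pr = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        dangling = sum(pr[v] for v in nodes if not out[v])
        new = {v: (1 - d) / n + d * dangling / n for v in nodes}
        for u in nodes:
            for w in out[u]:
                new[w] += d * pr[u] / len(out[u])
        pr = new
    return pr

# The seven edges implied by the example block (alternative 1 best, 10 worst)
nodes = list(range(1, 11))
edges = {(2, 1), (3, 1), (4, 1), (10, 1), (10, 2), (10, 3), (10, 4)}
pr = pagerank(nodes, edges)
# Alternative 1 collects the most preference mass; alternatives with no
# incoming preference links (5-9 and the worst, 10) share the lowest score.
```

Because the damping parameter keeps a small amount of 'movement' between all alternatives, every alternative receives a nonzero probability even when it was never chosen as best.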
To better understand the idea behind the PageRank vector of preference probabilities, Fig. 5 shows several examples, including the graph and the corresponding PageRank vector of preference choices.
Links in Fig. 5a and b are equal, except that (b) includes all the links of (a) plus the link 2 → 7. This new link causes a change in the PageRank preference probabilities of alternatives 5, 6, and 7; alternative 7 becomes the most important because it gets an extra link from alternative 2. For this reason, alternative 5 drops a little, and because alternative 2 now shares its out-degree between 5 and 7, the importance of links from alternative 2 also drops a little. Alternative 6 gains a little because all its previous links have remained while alternative 5 dropped. The links in Fig. 5c and d differ by the extra link 1 → 3 in (d), and this again causes changes in the PageRank preference probabilities, for the same reasons that applied to (a) and (b). It is important to note that alternative 1 gains a bit of importance, as it now has an additional, longer path which influences the preferences of other alternatives: from alternative 1 there are more possibilities to reach alternatives 3, 4, 5, and 6.

Example: technology
The correlation between the individual HB and PageRank estimations is high (mean ± SD = 0.848 ± 0.079), but at the same time it differs enough to warrant a deeper examination of the correspondences (Fig. 6). Regarding the alternatives, the correlations between the HB and PageRank estimations (e.g., alt1: 0.888, alt2: 0.834) are also high but need more investigation to determine their differences. We calculated the absolute value differences between the estimations for each individual (Fig. 7). The histogram illustrates well that the mean value (0.119 ± 0.028) of the differences is not negligible, meaning that it is necessary to also check the observed individual responses and compare them to the estimations.
Firstly, we examined CASE ID 21 (the maximum of the differences between the two estimations is approximately 0.150), which interestingly illustrates the differences between the basic ideas of the PageRank and HB estimations (Tables 2 and 3).
The PageRank preference choice probabilities are shown in the graph, where sizes of red colour vertices correspond to the probabilities (Fig. 8).
In CASE ID 113, the maximum of the differences between the two estimations is 0.128, which shows that in some cases the HB estimation gives an acceptable result. However, the PageRank results fall more in line with our natural view of preferences (Fig. 10, Tables 4 and 5). At this point, the HB estimations are important because an anomaly can be detected, especially regarding the estimations of alternatives 2 and 3 (Table 5). The discrepancy between the two estimations (i.e., alternative 2 = 0.149 and alternative 3 = 0.012) is too large: because the in- and out-degrees are equal, these values should be much closer. Moreover, neither alternative 2 nor alternative 3 was chosen in any of the blocks as best or worst; therefore, there is no reason for these findings. Alternatives 1 and 6 also show unexpected HB estimations. Their values are almost equal, but alternative 1 was chosen as the best in block 6 (Table 4), hence the preference value of alternative 1 should be significantly higher than that of alternative 6 (Table 5). Regarding PageRank estimations, no anomaly can be found, and all the estimated preference choice probabilities are understandable and acceptable.

Example: ice creams
We calculated both the HB and PageRank estimations of the preference choice probabilities based on the ice cream dataset, and here, found even more discrepancies between them compared to the estimations of the earlier Technology example. The results of the two methods were correlated again (0.790 ± 0.160) and the maximum differences between the two estimations were found to be a bit higher (0.225 ± 0.110).
Of course, we do not expect the estimations to be fully synchronised with the observed best-worst choices (as sometimes happens with PageRank estimations as well), but we can highlight (CASE ID 19) several anomalies we found in the HB individual estimations of preference probabilities. The design for CASE ID 19 is presented in Table 6. From the design we can see that ice creams 2 and 11 are not found in the same block, i.e., they were not compared at all.
We can see that ice cream 2 is the most preferred one regarding the estimated HB preference probabilities (0.714) and also the in-degree (12) and out-degree (0) values. However, what can we say about ice cream 11 (Table 7)? The HB estimation for ice cream 11 (0.071) and its in-degree (9) and out-degree (1) values together do not seem to be in sync with the HB estimation for ice cream 2. Moreover, as mentioned, due to the design (Table 6), the two ice creams were not compared. Consequently, there seems to be no reason for the high discrepancies between the observed implied preferences and the HB estimations, especially as ice cream 2 appears approximately 10 times more preferred than ice cream 11. Regarding the PageRank estimations, the values correspond more closely to the implied preferences and, because of the damping parameter, the discrepancies (between ice creams 2 and 11, for example) are more clear-cut.

Example: employee benefits
Using the employee benefits dataset, we calculated again both the individual HB and PageRank estimations of the preference choice probabilities, which were correlated (0.756 ± 0.177). The maximum differences between the two estimations for each respondent were similar to the previous examples (0.119 ± 0.052).
We present an example CASE ID 431 because it clearly shows the problems with individual HB estimations with the usual BWS designs. The design and choices are given in Table 8.
We can see that benefit 6 was always selected as best, its in-degree is 12 and its out-degree is 0, and it was never in the same block as benefits 1 and 12 (Table 9). The individual HB estimation of the preference probability for benefit 6 is 0.469, which seems to be a good estimation. However, we can also see that benefit 6 appeared in blocks containing low-preferred benefits (2, 4, 5, 7, 8, 9, 11, 13, 14, 15), whose total preference probability is 0.334. As a result, benefit 6 is better than the 10 lower-preferred benefits but was not compared to the remaining 4 benefits (1, 3, 10, 12), which have a total probability of 0.196. In addition, the second most preferred alternative is benefit 5, but it is also not in the same block as benefits 1 and 12. However, as the PageRank damping parameter provides some 'movement' between the alternatives, it enables some comparison of all the benefits.
We also found a significant difference in entropy values between round 1 and round 2 of BWS questions, which indicates that asking more targeted questions (either a single or a series) can improve the results. Entropy values were separately calculated for round 1 and round 2 PageRank values of all respondents. Lower values correspond to stronger and clearer preferences. The mean entropy values were 3.428 with a standard deviation of 0.213 for round 1, and 3.317 with a standard deviation of 0.215 for round 2. Since entropy significantly decreased after round 2 (S = 32, p = 0.011), it indicates that preference probabilities became significantly clearer and more accurate. Furthermore, considering a raw estimation of sample means, the order of the statements was 2, 8, 10 in round 1 but changed to 2, 10, and 8 after round 2.
Based on the results of the cluster analysis (Table 11), it can be seen that groups 1 and 5 are the same in both rounds, and group 2 differs little between the rounds; however, groups 3 and 4 share fewer common items. Also, considering the decrease in entropy values after round 2, these groups could be hiding more relevant information. Consequently, concerning important marketing strategies, different conclusions could be drawn after this second round.
The time spent on the first task during round 1 was high (mean 41.670 s), as respondents were still familiarising themselves with the task; as a result, we excluded it from the analyses. The average time spent on round 1 was 16 s with a standard deviation of 13 s, but for round 2 it was 23.210 s with a standard deviation of 23.880 s. The time values for round 1 and round 2 are significantly different (S = 34, p = 0.002), which supports the hypothesis that respondents think significantly more during the comparison of the top 5 statements, i.e., while deciding the best order. This may also indicate the validity of the results.

Discussion
Best-worst scaling is a commonly used technique in market research; however, the analytical approaches used to estimate respondents' preference probabilities often require a specific technical background and considerable computation. We demonstrated some examples of possibly misleading outcomes using the hierarchical Bayes method (Lipovetsky 2018; Lipovetsky and Conklin 2015) and introduced an alternative, network-based approach applying the PageRank algorithm (Dode and Hasani 2017; Page 2006). We provided, for the first time, a process in which all steps are performed in an online environment. Our solution is quick and can be used in dynamic and adaptive real-time surveys (DART) that adapt based on the respondents' answers. It also performs with similar efficiency whether applied to a single answer set or to any (small or large) number of responses.
Using the estimated values of the Technology dataset, let's take the example of four competing tennis players (in our case, alternatives 1, 2, 3 and 4). If one player (alternative 3) was defeated by the other two players (alternatives 2 and 4), then we can assume that alternative 3 is the worst. However, if alternative 3 plays and defeats the champion (alternative 1), and the other two players (alternatives 2 and 4) do not win any more games, then we can say that alternative 3 is better than the other two. This property of PageRank highlights the differences between the HB and PageRank estimations, and also explains the different ordering of alternatives 2, 3, and 4 in Table 3. This basic behaviour of the PageRank algorithm (Dode and Hasani 2017) allows us to model people's behaviour during BWS tasks more realistically.
In market research, it is a common requirement to perform calculations during live interviews; however, current practice offers applications only for rather simple tasks. Our initial aim was to provide a fast and effective solution for a complex problem, the accurate estimation of preference probabilities, in real-time. However, we found notable differences between the HB and network-based estimations presented above. Although the HB and PageRank estimations are highly correlated in all the examples (> 0.73 in all cases), the number of discrepancies raises the following question: how could so many inaccurately estimated values occur, when a good HB estimation requires more and more individualised designs (see the ice creams example)?
If we use the tennis analogy again, we can say that some players defeated the weaker players but were not tested against stronger ones. Examples can be seen in the employee benefits dataset regarding alternatives 5 and 6 (Tables 8 and 9). The combination of the HB estimation and the applied design is a possible weakness of the approach. One theory is that the inaccurately estimated values may have resulted from the HB estimation procedure for individual preference utilities/probabilities. Since HB partly uses the total sample properties for estimating the individual parameters, if the design is different for each respondent, then the estimation cannot be as effective as when the design is the same for each. Additionally, HB requires perfectly balanced designs (Chrzan and Orme 2019; Lipovetsky 2018; Lipovetsky and Conklin 2015; Orme 2009). For these reasons, we prefer our alternative approach, since the estimations can be calculated independently of any design. However, a more thorough investigation of this hypothesis is planned.
Our findings support the validity of asking extra questions on top alternatives depending on real-time individual answers. Respondents thought longer during the second exercise which indicates that solving a complex problem (choosing the best of bests) requires increased mental activity and hence, more time (Scherer et al. 2015). The data completed with extra answers can provide more accurate estimations. The improved results can lead to different conclusions in important marketing decisions and strategies.
We believe that our method performs better than previously used approaches. However, some challenges may occur during the setup of the entire system, albeit challenges that can be resolved. Finding such solutions will most likely require a high level of experience in survey programming and in the integration of multiple platforms. A more detailed evaluation of the effectiveness of this approach is already planned, using more data and a multi-method analysis to compare results. The real-time solution we presented for evaluating best-worst data types may pose some technical difficulties, because the service provider needs to establish the infrastructural background for real-time calculations: e.g., a server setup with an appropriate software environment and sufficient capacity for fast, simultaneous real-time calculations when multiple respondents are filling out the survey. Moreover, since the PageRank algorithm is not a statistical method, significance testing is currently not available for it, which is a valid limitation of this approach. Evaluating the performance of the proposed approach is one of our future research directions, and thus we are seeking every opportunity to implement our solution.
Based on the examples shown, we can conclude that the greatest advantage of our network-based approach is its real-time, live application. It significantly decreases the amount of time required for getting the results, because the customer does not need to wait until the entire fieldwork is completed and the answers are analysed.
The real-time, professional evaluation of individual-level preferences enables the preparation and positioning of additional questions, e.g., a second round of choice question(s) that helps to provide more accurate estimations of individual-level and community-shared preferences.
The fast calculation further facilitates the development of specific research designs focussing only on the most preferred brand and thus, allowing the preparation of dynamic and adaptive surveys which change based on the answers of the respondents during the live interview.