Decision makers typically gather information before making a choice or inference. Imagine someone who wants to find a reliable used car and chooses between, say, a 2008 Honda and a 2012 Toyota. To facilitate the choice, the person needs to gather relevant information, which usually takes the form of attributes of the alternatives, such as the mileage, model year, and accident histories of the cars. These attributes are often referred to as cues (e.g., Stewart, 1988). How do people use this information (i.e., cues) to make decisions? One approach is to posit that people are equipped with a variety of strategies that they can select adaptively to solve the decision problems they face (e.g., Beach & Mitchell, 1978; Gigerenzer & Selten, 2002; Lieder & Griffiths, 2017; Payne et al., 1993), and the selection of a particular strategy depends on many factors, such as time pressure (Payne et al., 1996; Rieskamp & Hoffrage, 2008), information cost (Beach & Mitchell, 1978; Bröder, 2000), feedback on decision outcomes (Rieskamp & Otto, 2006), and the cognitive cost associated with each strategy (Fechner et al., 2018). Given this repertoire of possible strategies, identifying which strategy or strategies people use in a given task has been a challenge for researchers.

Some strategy identification methods use only individuals’ choices to infer the strategy they may have applied (e.g., Bröder & Schiffer, 2003; Bröder, 2003; Hilbig & Moshagen, 2014; Lee, 2016). Other methods consider additional data, such as confidence ratings, decision time, and eye-tracking measures (e.g., Glöckner, 2009; Rieskamp & Hoffrage, 2008). Compared to relying on choices alone, strategies can be identified more reliably and accurately when a variety of behavioral measures are taken into account (e.g., Glöckner, 2009; Lee et al., 2019; Riedl et al., 2008).

The present study proposes a machine learning method for strategy identification that takes both choices and other behavioral data into consideration. Compared to existing methods, this method has fewer constraints on the form and the amount of behavioral data required, and it can identify strategies on a trial-by-trial basis. Thus, it can detect dynamic changes in strategy selection that elude other methods. The remainder of this article is structured as follows. We start by describing the decision strategies that we use to illustrate the machine learning approach. We then outline methods of strategy identification applied frequently in the literature, followed by an introduction to the novel machine learning approach that we call machine learning strategy identification (MLSI). After that, we describe the performance of MLSI in three experiments. To conclude, we discuss some limitations of MLSI and future directions.

Decision strategies

The set of decision strategies we focus on in this study has been studied extensively within the simple heuristics research framework, which posits that people are equipped with a toolbox of strategies that they can apply adaptively in different task environments (Gigerenzer & Goldstein, 1996; Gigerenzer & Selten, 2002; Gigerenzer et al., 1999). Consider the used car example shown in Fig. 1, in which five cues are available and are ranked according to their validities. In a paired-comparison task, where one must infer which alternative of a pair has the larger criterion value, a cue’s validity is defined as the probability that it leads to a correct inference given that the cue discriminates, that is, when the cue values differ between the two alternatives being compared. Knowing the cue information (i.e., cue values and cue validities), how do people use that information to make a decision? In other words, what are the decision strategies they may use? We describe four below.

Fig. 1

Two used cars described by five cues. Durability is the criterion variable whose values are not known to the decision maker. The decision maker needs to infer from the cues which car will be more durable. The cues are ordered by validity, a measure of a cue’s quality

A person using take-the-best decides by searching for a cue that discriminates between the alternatives (Gigerenzer et al., 1991). Specifically, take-the-best searches cues in the order of their validities. If a cue discriminates, it chooses the alternative that has a cue value associated with a higher criterion value. If the cue values of the alternatives are the same, then take-the-best moves on to the next cue. If no cue discriminates, it selects one of the alternatives randomly. Table 1 shows how take-the-best, as well as the three other strategies, would make the used-car decision in Fig. 1.

Table 1 Examples of strategy applications for the decision problem shown in Fig. 1

Δ-inference works similarly to take-the-best (Luan et al., 2014). However, instead of stopping search when the cue values differ between the alternatives, Δ-inference stops search and makes a decision when the cue value of one alternative exceeds that of the other by a threshold Δ. Take-the-best can be considered a special case of Δ-inference in which Δ is set to zero for all cues. For take-the-best and Δ-inference, a cue-wise information search is expected; that is, one would inspect both alternatives’ values on a cue before moving on to the next cue or making a decision. Because the two strategies are so similar, take-the-best and Δ-inference often lead to the same choices, posing a challenge to strategy identification. This challenge is especially pronounced when Δ is small.

The third strategy is weighted-additive (WADD), in which one weights an alternative’s cue values by each cue’s importance, adds up the weighted values for an overall score, and selects the alternative with the highest score (Payne et al., 1988). In a task with binary cues, cue values can be weighted by each cue’s validity. When cues are not binary, cue dichotomization may simplify the weighting-and-adding process. Specifically, one may first dichotomize a cue with a threshold, treating the higher or more favorable values as “1” and others as “0.” A weighted score for each alternative is then calculated based on the dichotomized cue values and cue validities.

Tallying is a special case of WADD in which one treats all cues as equally important (Payne et al., 1988). In doing so, tallying can reduce the amount of computation substantially. For WADD and tallying, an alternative-wise search is expected; that is, one would inspect all cue values of one alternative before checking those of the other alternative.

These four strategies can be grouped into two general categories based on how they use cue information. Take-the-best and Δ-inference are examples of non-compensatory strategies, in which favorable or unfavorable values on lower-ranked cues cannot compensate for favorable or unfavorable values on higher-ranked cues, and thus cannot overrule decisions made by higher-ranked cues. For example, one insists on buying a four-wheel drive Jeep no matter what discount the dealer offers on a two-wheel drive model. In contrast, WADD and tallying are compensatory strategies that allow trade-offs among cues, so that favorable values on lower-ranked cues can compensate for unfavorable values on higher-ranked cues.
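To make the four strategies concrete, the sketch below implements each of them for a single paired comparison. It is our own illustration rather than code from the cited work; it assumes that cue values have already been recoded so that larger values are more favorable, that cue lists are ordered by validity, and that WADD dichotomizes cues at given thresholds as described above.

```python
# A minimal sketch of the four strategies for one paired comparison with
# continuous cues. Cues are assumed to be recoded so that larger values are
# more favorable and listed in order of validity; returns 0 for the first
# alternative and 1 for the second.
import random

def take_the_best(a, b):
    """Decide by the first cue whose values differ between the alternatives."""
    for va, vb in zip(a, b):
        if va != vb:
            return 0 if va > vb else 1
    return random.choice([0, 1])          # no cue discriminates: guess

def delta_inference(a, b, deltas):
    """Like take-the-best, but a cue decides only if the values differ by more than its delta."""
    for va, vb, d in zip(a, b, deltas):
        if abs(va - vb) > d:
            return 0 if va > vb else 1
    return random.choice([0, 1])

def wadd(a, b, validities, thresholds):
    """Dichotomize each cue at its threshold, weight by validity, add, compare."""
    score_a = sum(w * (va >= t) for va, w, t in zip(a, validities, thresholds))
    score_b = sum(w * (vb >= t) for vb, w, t in zip(b, validities, thresholds))
    if score_a == score_b:
        return random.choice([0, 1])
    return 0 if score_a > score_b else 1

def tallying(a, b, thresholds):
    """WADD with equal weights: simply count favorable cue values."""
    return wadd(a, b, [1] * len(a), thresholds)
```

Setting every Δ to zero makes delta_inference behave exactly like take_the_best, which illustrates the special-case relation noted above.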

Inferring which strategies people use to make decisions, such as which used car to buy, is a problem that decision researchers have tried to solve for decades (e.g., Bröder & Schiffer, 2003; Glöckner, 2009; Hilbig & Moshagen, 2014; Lee et al., 2019). Before introducing the new method that we tested in this study, we review the main existing methods that have been applied to the problem of determining what strategies people are using. We contrast outcome-based methods, which use only the choices made, with process-based methods, which incorporate behavioral data leading up to the choice.

Outcome-based methods

Choice outcomes are frequently used to infer decision-making processes because they are readily observed. For instance, in structural modeling, researchers run multiple regressions between cues and choice outcomes, and take regression weights as indications of how heavily a person relies on each cue (e.g., Brehmer, 1994; Stewart, 1988). Because it only describes the statistical relations between cues and choice outcomes, this approach does not reveal much about the actual process of how a decision is made (Bröder, 2000).

In comparative model fitting, a metric, such as maximum likelihood, is calculated to gauge how well a strategy describes choice outcomes. For example, Bröder and Schiffer (2003) compared an individual’s choice outcomes with the predictions of several strategies and treated the strategy with the highest estimated likelihood as the one most likely to have produced the observed choice pattern. This method assumes that an individual uses only one strategy and applies it with a constant error rate across different combinations of alternatives, which they referred to as item types. For the method to work well, researchers need to carefully design a set of item types so that the strategies make markedly different outcome predictions across trials (Jekel et al., 2010). Comparative model fitting is not unique in this regard; diagnostic items are required for all identification methods to a greater or lesser extent. Hilbig and Moshagen (2014) further developed this method by using a multinomial processing tree formalism, which allows error rates to vary across item types (i.e., each type can have its own error rate) instead of assuming a fixed error rate for all item types. The error rates are further decomposed into random application errors and systematic errors associated with an item type, which helps increase the accuracy of strategy identification.
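To illustrate the logic of comparative model fitting, the following sketch computes, for each candidate strategy, the likelihood of a participant’s choices under a single constant application-error rate and returns the best-fitting strategy. It is a simplified illustration of the idea, not the procedure of Bröder and Schiffer (2003); the function names and the maximum-likelihood estimate of the error rate are our own choices.

```python
# A minimal sketch of likelihood-based strategy classification with one
# constant error rate per strategy. `choices` holds the observed decisions
# (0 or 1); `predictions[s]` holds strategy s's predicted decision per trial.
import numpy as np

def log_likelihood(choices, predicted, epsilon=None):
    choices, predicted = np.asarray(choices), np.asarray(predicted)
    mismatches = np.sum(choices != predicted)
    n = len(choices)
    if epsilon is None:                   # maximum-likelihood error rate;
        epsilon = mismatches / n          # in practice often constrained below .5
    epsilon = min(max(epsilon, 1e-6), 1 - 1e-6)
    return mismatches * np.log(epsilon) + (n - mismatches) * np.log(1 - epsilon)

def best_fitting_strategy(choices, predictions):
    """Return the strategy whose predictions maximize the likelihood of the choices."""
    return max(predictions, key=lambda s: log_likelihood(choices, predictions[s]))
```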

The comparative model fitting approach assumes that individuals use the same strategy over time, and it identifies strategies based on the overall choice outcomes in a task. However, studies have found that people may use a mixture of strategies over a sequence of decisions (e.g., Davis-Stober & Brown, 2011; Scheibehenne et al., 2013) and adapt strategies in response to environmental changes (Lee et al., 2014; Lieder & Griffiths, 2017; Rieskamp & Otto, 2006). A latent mixture model can accommodate these findings, because it allows for the possibility of incorporating multiple strategies into a single mixture model. Such models are amenable to various statistical methods, including Bayesian methods. For example, Scheibehenne et al. (2013) used a Bayesian framework to infer the strategy or combination of strategies most likely to have produced the outcome data by comparing the posterior probabilities of a single-strategy model and a multiple-strategy model (see also Lee, 2011, 2016).

Process-based methods

Outcome-based methods draw inferences about strategies from observed decisions. In contrast, process-based methods infer strategies from dynamic process data associated with each strategy. Process data can be collected while information is acquired, integrated, and evaluated, using methods such as mouse tracing, verbal reports, brain imaging, and eye tracking. Researchers analyze these process measures to infer individuals’ cognitive processes and, thereby, the decision strategies used. Various process-tracing techniques have been developed to collect data about how people acquire information (see Schulte-Mecklenbeck et al., 2017b, for a recent overview).

For example, information boards display cues and cue values in a matrix and are commonly used to track how individuals acquire information in an experiment (e.g., Bettman et al., 1990; Johnson, et al., 2008; Payne et al., 1993). In most computer-based information matrices (e.g., MouseLab), information about the cue values is hidden behind boxes. At the beginning of a decision trial, all boxes in the matrix are closed. As a participant moves the mouse over or clicks on a box, it opens to reveal the cue value. Figure 2 shows an example of how participants may move their mouse on an information board display when applying take-the-best.

Fig. 2

An example of how participants might move their mouse as they use take-the-best to make a decision based on cues displayed in an information board. In step 1, a participant clicks on the box of the highest-ranked cue (i.e., the mileage cue) for car A to reveal the cue value (i.e., 33,000), and then in step 2, clicks on the same cue for car B. Because values of the mileage cue discriminate between the cars, the participant in step 3 chooses car B, which is inferred to be the more durable car

A variety of process measures indicative of different decision strategies can be constructed from mouse movements on an information board. These measures include the total time spent on a trial, the proportion of information searched, the variability in the amount of information searched per alternative, and the ratio of cue-wise to alternative-wise transitions, to name a few (for a comprehensive list, see Riedl et al., 2008). These measures have also been extended to eye-tracking studies (Krol & Krol, 2017; Schulte-Mecklenbeck et al., 2017a).
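As an illustration, the sketch below computes two such measures from the sequence of opened cells on an information board: an index contrasting alternative-wise and cue-wise transitions, and the proportion of information searched. The function name and data format are our own; the index follows the familiar logic of comparing the two transition types.

```python
# A minimal sketch of two process measures derived from an acquisition
# sequence. Each acquisition is a (cue, alternative) pair in the order it was
# opened. The index ranges from -1 (purely cue-wise search) to +1 (purely
# alternative-wise search).
def search_measures(acquisitions, n_cues, n_alternatives):
    alt_wise = cue_wise = 0
    for (cue1, alt1), (cue2, alt2) in zip(acquisitions, acquisitions[1:]):
        if alt1 == alt2 and cue1 != cue2:
            alt_wise += 1          # same alternative, different cue
        elif cue1 == cue2 and alt1 != alt2:
            cue_wise += 1          # same cue, different alternative
    total = alt_wise + cue_wise
    index = (alt_wise - cue_wise) / total if total else 0.0
    proportion_searched = len(set(acquisitions)) / (n_cues * n_alternatives)
    return index, proportion_searched

# e.g., a cue-wise search over the two highest-ranked cues of both alternatives
print(search_measures([(1, "A"), (1, "B"), (2, "A"), (2, "B")], 5, 2))
```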

The conventional approach to strategy identification compares the observed process measures with the canonical process patterns of each candidate strategy (e.g., Day, 2010). Sometimes, analyses of process measures are used to bolster conclusions from outcome-based methods. For instance, Rieskamp and Hoffrage (2008) found that participants who were identified as using take-the-best or WADD based on their choices also differed significantly on six process measures. When different strategies arrive at the same choices, process measures provide additional information to assist strategy identification.

Combining outcome-based and process-based methods

Strategies make predictions not only about the decisions people make, but also about the information search and cognitive processing they undertake before reaching those decisions. Researchers have stressed the importance of combining outcome and process-tracing data to uncover human decision processes more accurately than either type of data allows alone (Costa-Gomes et al., 2001; Harte & Koele, 2001; Glöckner, 2009; Lee et al., 2019). For example, Riedl et al. (2008) developed a decision tree, called DecisionTracer, with three process measures and one outcome-based measure as the decision nodes. Glöckner (2009, see also Jekel et al., 2010) developed the multiple-measure maximum likelihood (MM-ML) method, which integrates choice outcomes, decision times, and confidence ratings to identify strategies based on the Bayesian information criterion. We use MM-ML as a benchmark in the present study because it relies on data that are typically collected in decision experiments.

Incorporating multiple sources of data may decrease the number of decisions required to identify the strategies people are using. Several recent attempts have been made to identify strategy switches, because decision makers can use different strategies over time, even when facing the same kind of decisions (Lee & Gluck, 2020). Brusovansky et al. (2018) proposed a model that deploys strategies in a trial-by-trial stochastic manner by using a probabilistic switching parameter. Lee et al. (2019) incorporated decision outcomes, verbal report data, and search behavior into a Bayesian hierarchical model to infer when individuals may have changed strategies and how often they did so. These existing methods infer strategy switches from observations across multiple trials. In many situations outside the laboratory, however, people do not make repeated decisions. Here, we propose machine learning techniques that can identify strategies on a trial-by-trial basis by integrating the process and outcome data collected in a single decision trial.

Machine learning strategy identification (MLSI)

The problem of strategy identification entails inferring individuals’ strategies based on behavioral data, such as choice outcomes and process measures. These data help differentiate strategies because each strategy is presumably associated with a signature pattern of data. Machine learning (ML) techniques, specifically supervised learning, solve similar classification problems. The goal of supervised learning applied to strategy identification is to find a good classifier that can distinguish between strategies based on the combination of outcome and process data in data sets where strategy labels are known.Footnote 1 The resulting classifier is then applied to assign strategy labels in data sets where individuals’ strategies are unknown. We next explain the detailed workflow of this method (see Fig. 3 for an overview).

1. The strategy identification problem

Fig. 3

The process of identifying strategies using machine learning algorithms

The ML method aims to find a function that maps behavioral traces (i.e., all data collected during an experiment) to strategies. We look for this function by training an ML algorithm on a data set with known trace–label associations. The trained model can then assign a strategy label, such as take-the-best or WADD, to a given behavioral trace.

2. Collect labeled data

The labeled data are relevant behavioral data engendered by a strategy, and their forms depend on the means of data collection (e.g., mouse movements or eye tracking). To increase the efficacy of trained ML algorithms, the labeled data should be representative of the unlabeled data that will need to be classified later.

3. Construct features

Even a single decision can produce a large amount of raw data. Using the raw data to differentiate strategies is, however, not ideal, because these data typically contain much noise and irrelevant information. Moreover, the high dimensionality of the raw data can result in computational inefficiency. Therefore, when building a classification model, it is crucial to derive a set of informative features from the raw data. For example, the process-based approaches we reviewed above suggest some potentially useful features, such as the time spent on reading information and the proportion of information searched. Furthermore, using meaningful features can help improve the interpretability of the resulting classifier and, in turn, provide a better understanding of the strategies people use to make decisions. With these goals in mind, we constructed features from the raw data.

4. Divide data into training and testing sets

Labeled data are divided into a training set and a testing set. An ML model is trained on the training set, while its performance is evaluated on the testing set. A model’s performance in the testing set is one way to measure its ability to generalize to unseen data, and one common metric to evaluate model performance is its identification accuracy on the testing set.

Some ML models have hyperparameters that control the learning process (e.g., the maximum depth of trees in random forest). These hyperparameters need to be tuned to optimize a model’s performance. A common approach is to test all combinations of hyperparameters in a predefined search space through K-fold cross-validation. Specifically, the training set is split evenly into K subsets. A model is trained on K − 1 subsets and evaluated on the remaining subset; the procedure is repeated for each subset, and the hyperparameter combination with the best average performance is selected.
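A minimal sketch of this tuning step uses scikit-learn’s GridSearchCV with ten-fold cross-validation on a random forest; the feature matrix, labels, and search space shown here are placeholders rather than the settings used in our experiments.

```python
# A minimal sketch of hyperparameter tuning via K-fold cross-validation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 26))        # placeholder feature matrix
y_train = rng.integers(0, 3, size=200)      # placeholder strategy labels

param_grid = {
    "n_estimators": [100, 300, 500],
    "max_depth": [None, 5, 10, 20],         # maximum depth of trees
}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=10,                                  # K = 10 folds within the training set
    scoring="accuracy",
)
search.fit(X_train, y_train)
best_model = search.best_estimator_         # refit on the full training set
print(search.best_params_)
```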

5. Select ML algorithms

To classify decision strategies, we use classic supervised learning algorithms, including K-nearest neighbors (KNN), random forest (RF), decision trees (DT), support vector machines (SVM), and multilayer perceptron (MLP). The algorithms are implemented in Python using the scikit-learn library (Pedregosa et al., 2011). A comprehensive review of these and other ML algorithms can be found elsewhere (e.g., Friedman et al., 2001; Kotsiantis et al., 2007).

The performance of an ML model is determined mainly by how accurately it classifies the testing set. If the performance does not meet preset criteria, we will return to the previous stage, trying to improve the diagnosticity of the features or include more ML algorithms (see Fig. 3). One criterion is the relative performance of a model compared to that of other models. We also consider the interpretability of a model and the ease of collecting the required behavioral data.
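The sketch below illustrates this comparison step with scikit-learn: the five algorithms named above are trained and then scored on a held-out testing set. The synthetic data stand in for the feature matrices described later and are not taken from our experiments.

```python
# A minimal sketch of comparing the five classifiers on a held-out testing set.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 26))              # placeholder feature matrix
y = rng.integers(0, 3, size=500)            # placeholder strategy labels
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0)

models = {
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "RF": RandomForestClassifier(random_state=0),
    "DT": DecisionTreeClassifier(random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "MLP": make_pipeline(StandardScaler(), MLPClassifier(max_iter=1000, random_state=0)),
}
for name, model in models.items():
    model.fit(X_train, y_train)                       # train on the training set
    print(name, round(model.score(X_test, y_test), 3))  # accuracy on the testing set
```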

A worked example

Here, we present an example of how the ML approach works for strategy identification, using KNN and an SVM with a linear kernel (linear SVM) as the classification models (see Fig. 4). We first simulated 40 take-the-best and 40 WADD trials. Each trial was labeled with the strategy that generated its data, and two features were recorded: total decision time and the proportion of information searched. The 80 trials were divided so that 60% were used for training and 40% for testing. The ML models learned the decision boundaries that separated the labeled trials in the training set, and each trial in the testing set was labeled according to the side of the boundary on which it fell. The learned decision boundary can be linear or nonlinear, depending on the ML algorithm. In our case, linear SVM builds a linear decision boundary, whereas KNN produces a nonlinear one.

Fig. 4

Simulated data for take-the-best (red dots) and WADD (weighted-additive; blue dots), and the accuracy of two machine learning algorithms. Each dot represents data from one decision trial. In the two panels in the leftmost column, the x-axis is the decision time in seconds, and the y-axis is the proportion of information searched; these were the two features processed by the machine learning algorithms. To test the accuracy of the algorithms, 60% of all data were used for training, and 40% for testing. The leftmost column shows data from the training set (top) and the testing set (bottom). The upper panels in the right two columns show the classification boundaries constructed by K-nearest neighbors and linear SVM, respectively, based on the training set data, and each model’s training accuracy is shown in the lower right corner. The lower panels in these columns show model predictions in the testing set data, and the identification accuracy is shown in the lower-right corner. The color shadings indicate confidences of model identifications, with darker shadings representing higher levels of confidence
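The structure of this worked example can be reproduced with a few lines of scikit-learn code. The simulated feature distributions below are illustrative assumptions and do not reproduce the exact data behind Fig. 4.

```python
# A minimal sketch of the worked example: simulated take-the-best and WADD
# trials described by decision time and proportion of information searched,
# a 60/40 train/test split, and KNN and linear SVM classifiers.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(1)
# take-the-best: shorter decision times, less information searched (assumed values)
ttb = np.column_stack([rng.normal(5, 1.5, 40), rng.normal(0.4, 0.10, 40)])
# WADD: longer decision times, most information searched (assumed values)
wadd = np.column_stack([rng.normal(12, 2.5, 40), rng.normal(0.9, 0.05, 40)])
X = np.vstack([ttb, wadd])
y = np.array([0] * 40 + [1] * 40)           # 0 = take-the-best, 1 = WADD

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=1)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
svm = SVC(kernel="linear").fit(X_train, y_train)
print("KNN test accuracy:", knn.score(X_test, y_test))
print("Linear SVM test accuracy:", svm.score(X_test, y_test))
```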

In sum, we have reviewed strategy identification methods that rely on choice outcomes, or process measures, or both, and introduced a new machine learning strategy identification (MLSI) method with a worked example. Next, we explore the potential of MLSI in three experiments.

Experiment overview

We evaluate the MLSI method by investigating how well it identifies the decision strategies individuals use in multi-attribute decision tasks. In Experiment I, we applied MLSI in a task environment where the cues took on continuous values. Such an environment represents a broad range of real-life decision tasks, in which Δ-inference and take-the-best often lead to the same decisions. In Experiment II, we compared the strategy identification accuracy of MLSI and the MM-ML (multiple-measure maximum likelihood) method. In Experiment III, we adapted the ML model that performed best in Experiment I to analyze participants’ decision processes in an environment that favors non-compensatory strategies. The goal was to examine whether and how participants might learn to adopt strategies that yield the highest rewards in such a task.

Experiment I

We explore the potential of the MLSI method in an environment in which cue values are continuous, consistent with how information is often presented in the real world. In the first part of the experiment, we taught participants how to use take-the-best, tallying, or Δ-inference. In the second part, we asked participants to make decisions using the strategy they had been taught, thus obtaining the labeled data needed to evaluate the MLSI method.

Participants

We recruited 180 undergraduate participants from the participant pool of the psychology department at Syracuse University. Participants were randomly assigned to one of the three strategy conditions with 60 participants in each. Informed consent was obtained before the experiment. This study was approved by the university’s institutional review board (IRB). The experimental session took approximately 30 min, and participants were given credits to fulfill a course research requirement. The data for ten participants in the tallying condition, two participants in the Δ-inference condition, and two participants in the take-the-best condition were excluded because their decisions deviated from the ones predicted by the strategy they had been taught in more than 20% of the trials. Overall, there were 58, 58, and 50 participants in the take-the-best, Δ-inference, and tallying conditions, respectively.

Materials

Participants were asked to take the role of a college student interested in purchasing one of two used cars, with the goal of choosing the car that would last longer. Each car was described by five cues: the car’s mileage, model year, the number of accidents, the number of previous owners, and the frequency of maintenance, as shown in Fig. 2. The values of these five cues for the two alternatives (cars) were displayed in a computerized information board, a matrix of five rows and two columns. The cue validities were shown in parentheses to the right of the cue names. The thresholds used to dichotomize cues were presented between the cue boxes for the tallying participants, and the Δ values were shown between the cue boxes for Δ-inference participants. Participants looked up cue values by moving the mouse over the corresponding box and clicking, and indicated their choices by clicking a button at the bottom of the screen. The experiment program recorded mouse locations at 5-ms intervals.

Decision trials varied in terms of search requirements for take-the-best. One to five cues had to be inspected before a cue discriminated between the alternatives and a decision could be made. The five different search patterns were repeated eight times, resulting in a total of 40 decision trials per participant. Cue configurations were constructed such that take-the-best and Δ-inference predicted different choices than did tallying on 20 trials. Take-the-best and Δ-inference made identical predictions on 30 trials. The presentation of the trials was randomized for each participant.

Procedure

Each participant went through a tutorial to learn a particular strategy according to the condition they were in. After the tutorial, they did five practice trials where they were given feedback on each of their decisions. A “correct” decision was one that matched the decision their assigned strategy would make. Participants needed to make correct decisions on at least 80% of the practice trials to proceed to the next step; otherwise, they repeated the tutorial from the beginning. After completing the tutorial, participants proceeded to the decision task that consisted of 40 trials where they were supposed to apply the strategy they had learned. There was no feedback during the decision task on whether their decisions matched their assigned strategy. During both the tutorial and the decision-making phases, participants were alerted when their behavior suggested that they were not on task. Such behavior included checking only the cues for one of the alternatives or making a choice without opening any box.

Results

We first report descriptive statistics about the participants’ overall performance (Table 2), before reporting the results of the MLSI analysis.

Table 2 Descriptive statistics for Experiment I

The descriptive results shown in Table 2 were in line with our expectations for the strategies. A Bayesian one-way ANOVA provided strong evidence that total decision time and the proportion of information searched differed among the strategies (BFs > 1000).

Feature selection

The goal of the feature selection step is to construct features from the labeled raw data, so that the ML algorithms can classify trials as coming from users of take-the-best, tallying, or Δ-inference. Participants interacted with the experiment interface as shown in Fig. 2. Their search behaviors were recorded as mouse coordinates every five milliseconds. Figure 5 shows the mouse traces of three representative trials in which take-the-best, tallying, and Δ-inference were applied to the same decision pair. The traces illustrate the cue-wise search typical of Δ-inference and take-the-best and the alternative-wise search expected from tallying. We then encoded the mouse trace data as features that are indicative of the strategies.

Fig. 5

Examples of mouse movement paths for participants trained to use take-the-best, Δ-inference, and tallying, respectively. In the experiment, participants had to click on boxes to see cue values (see the upper left panel and Fig. 2). The numbers on the boxes displayed in the upper left panel are used to identify features based on the mouse movement paths (see Table 3); they were not shown to the participants. Colors of a movement trace indicate how much time had elapsed since the trial started. The total time participants spent on the trial is shown next to the labels of the trained strategies

Table 3 summarizes the features that were selected. A primary feature is the decision time in each trial (\({x}_{1}\)). As discussed previously, non-compensatory strategies tend to search for less information, resulting in shorter decision times. Compensatory strategies integrate all cue values, which potentially takes longer. The proportion of cues searched (\({x}_{2}\)) reflects the number of boxes that have been opened. Participants using non-compensatory strategies typically need to open fewer boxes than those using compensatory strategies.

Table 3 Feature set for machine learning models in Experiment I

Each box on the information board was assigned a number (see Fig. 5). We recorded the total time taken to process each box, yielding ten features (i.e., \({x}_{3}\) to \({x}_{12}\)) for a five-cue decision task. Ten more features (\({x}_{13}\) to \({x}_{22}\)) represent the search order, that is, the order in which the boxes were opened. We recorded search order by entering the box number of each opened box into one of these ten features. If fewer than ten boxes were opened, zeros were recorded in the remaining search-order features. Non-compensatory and compensatory strategies are associated with cue-wise and alternative-wise search, respectively. An example search order for take-the-best is (1, 6, 2, 7, 0, 0, 0, 0, 0, 0), and one for tallying is (1, 2, 3, 4, 5, 6, 7, 8, 9, 10).

Participants’ choices were compared with the predictions of each strategy (\({x}_{23}\) to \({x}_{25}\)). If a participant’s decision was consistent with a particular strategy, the corresponding feature was coded as “1,” whereas an inconsistent decision was coded as “0.” For example, if a participant’s final choice is consistent with take-the-best and Δ-inference but not with tallying, the feature vector is (1, 1, 0). An additional feature, \({x}_{26}\), is designed to differentiate take-the-best from Δ-inference. It is a binary feature that indicates whether the two alternatives had the same values on the cue inspected just before the discriminating cue. If those values differed, the participant was likely using Δ-inference, and the feature is coded as “0”; otherwise, the participant was more likely using take-the-best, and the feature is coded as “1.” The intuition behind this feature is that when the cue values for both alternatives differ on the preceding cue, the participant could not have used take-the-best, because take-the-best would have stopped and decided on that cue already.
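The sketch below shows how one trial’s raw acquisitions could be encoded into the feature vector of Table 3. The input format, function name, and example values are our own; the choice-consistency flags and \({x}_{26}\) are assumed to have been computed beforehand.

```python
# A minimal sketch of building one trial's 26-element feature vector.
# `opens` is a list of (box_number, seconds_open) events in the order they
# occurred, with boxes numbered 1-10 as in Fig. 5.
def build_features(opens, decision_time, choice_flags, x26, n_boxes=10):
    x = [decision_time]                                   # x1: total decision time
    opened = {box for box, _ in opens}
    x.append(len(opened) / n_boxes)                       # x2: proportion of boxes opened
    time_per_box = [0.0] * n_boxes
    for box, dur in opens:                                # x3-x12: time spent on each box
        time_per_box[box - 1] += dur
    x.extend(time_per_box)
    order = [box for box, _ in opens][:n_boxes]           # x13-x22: search order,
    x.extend(order + [0] * (n_boxes - len(order)))        # zero-padded
    x.extend(choice_flags)                                # x23-x25: choice consistency
    x.append(x26)                                         # x26: TTB vs. delta marker
    return x

# e.g., a cue-wise search that stopped after the first cue discriminated
features = build_features([(1, 1.2), (6, 1.5)], decision_time=4.3,
                          choice_flags=[1, 1, 0], x26=1)
print(len(features))   # 26
```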

MLSI analysis

We randomly selected five participants from each strategy group to serve as the testing set, yielding 600 trials in total. We then used K-fold cross-validation (K = 10) to train the ML algorithms on the remaining participants’ trials (the training set) and to search for optimal hyperparameters.Footnote 2 Lastly, we trained the five ML models on the training set and evaluated their performance on the testing set. Table 4 shows the identification accuracy of the five trained ML models on the testing set. The best-performing model is random forest, with an identification accuracy of 93.8%. The data, the Python code for the analysis, and the confusion matrices can be found in the Supplementary Material.
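Because the testing set consists of whole participants rather than randomly sampled trials, the train/test split is made at the participant level. Below is a minimal sketch of such a split, assuming the trials are stored in a pandas DataFrame with participant, strategy, and feature columns (a data layout of our own choosing).

```python
# A minimal sketch of a participant-level train/test split: hold out
# n_test_per_group participants per strategy group as the testing set.
# `trials` is assumed to be a pandas DataFrame with one row per trial and
# columns "participant", "strategy", plus the feature columns.
import numpy as np

def split_by_participant(trials, n_test_per_group=5, seed=0):
    rng = np.random.default_rng(seed)
    test_ids = []
    for _, group in trials.groupby("strategy"):
        ids = group["participant"].unique()
        test_ids.extend(rng.choice(ids, size=n_test_per_group, replace=False))
    is_test = trials["participant"].isin(test_ids)
    return trials[~is_test], trials[is_test]    # training trials, testing trials
```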

Table 4 Strategy identification accuracy of machine learning models on test participants

The classification results for each test participant’s decision trials are plotted in Fig. 6. Tallying, the compensatory strategy in this experiment, is perfectly identified. The identification accuracy for take-the-best and Δ-inference is 91.5% and 89.5%, respectively. The misclassified Δ-inference trials are exclusively classified as take-the-best trials, and vice versa. This pattern was expected, because take-the-best and Δ-inference are both non-compensatory strategies that compare alternatives cue-wise.

Fig. 6

Results of strategy identification for test participants in Experiment I. Strategies were identified by random forest, the best-performing machine learning algorithm. The five participants at the top were trained to use take-the-best (TTB), the middle five Δ-inference (Delta), and the bottom five tallying. The colored boxes indicate strategies predicted by random forest for each participant in each decision trial, with blue for TTB, orange for Delta, and green for tallying. The column on the right, outlined in grey, shows the overall strategy identified for each participant

Feature importance

It is informative to have not only an accurate model, but also an interpretable one. In strategy identification, we also want to know which features were important for distinguishing the strategies. A better understanding of the model’s logic can help verify that the model behaves sensibly and potentially improve it by selecting the most relevant features. The random forest algorithm can be difficult to interpret because of its randomized nature, but it is possible to learn which features matter most to it.

Gini importance (or mean decrease in impurity) is a measure of feature importance for a random forest model. Random forest constructs a set of decision trees, and each tree has its own internal nodes and leaves. Each internal node uses a feature to split the data into two subsets, each more homogeneous in its responses. We can therefore measure how much each feature decreases the impurity of the splits and rank the features accordingly. Gini importance represents how much each feature decreases impurity, averaged over all trees in the forest (Archer & Kimes, 2008).
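With scikit-learn, Gini importances are available directly from a fitted random forest. The sketch below extracts and ranks them; the placeholder data and the feature names (x1 to x26, following Table 3) are ours.

```python
# A minimal sketch of extracting and ranking Gini importances.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 26))                      # placeholder features
y_train = rng.integers(0, 3, size=200)                    # placeholder strategy labels
forest = RandomForestClassifier(random_state=0).fit(X_train, y_train)

feature_names = [f"x{i}" for i in range(1, 27)]           # x1 ... x26
importances = pd.Series(forest.feature_importances_, index=feature_names)
print(importances.sort_values(ascending=False).head(5))   # most informative features
print(round(importances.sum(), 6))                        # Gini importances sum to 1
```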

Figure 7 shows the Gini importance of the 26 features (Table 3). The feature designed to differentiate between Δ-inference and take-the-best (\({x}_{26}\)) is the most important one. The second most important feature (\({x}_{14}\)) codes which box was opened second. If participants applied cue-wise search, the second box opened should correspond to the highest validity cue of the second alternative (box 6). In contrast, an alternative-wise search suggests that the second box opened should correspond to the second highest validity cue of the first alternative (box 2). Moreover, as suggested by the analysis of decision times, total decision time (\({x}_{1}\)) is also an important feature. The time to read the second alternative’s highest validity cue (\({x}_{8}\)) could help distinguish Δ-inference from take-the-best too, because it generally takes longer to assess whether two cue values differ by a Δ than simply to determine whether they are different.

Fig. 7

The Gini importance of each feature in Experiment I. Gini importance measures how much random forest relies on a particular feature in strategy identification. The sum of Gini importance is 1

Discussion

In this experiment, we tested MLSI in an environment with continuous cues. Random forest, the best-performing ML model, classified participants’ strategies on a trial-by-trial basis with an overall accuracy of 93.8%. The compensatory strategy tallying was perfectly differentiated from the non-compensatory strategies. Most of the misclassifications were confusions between take-the-best and Δ-inference, because both strategies search information cue-wise and usually lead to the same decisions. It is perhaps because their decisions and search patterns are so similar that random forest relied heavily on feature \({x}_{26}\), which is effective in differentiating take-the-best from Δ-inference.

Experiment II

In Experiment II, we compare MLSI with MM-ML in terms of strategy identification accuracy. Participants were trained to use take-the-best, WADD, or tallying, and then made a series of decisions using the strategy they were taught. We used the stimuli from Glöckner (2009) to ensure that the strategies make different predictions on the dependent variables, thereby making MM-ML feasible. Δ-inference was not included because all cue values in this experiment were binary, and take-the-best and Δ-inference behave identically in such a task environment.

Participants

Sixty undergraduates were recruited from the participant pool of the psychology department at Syracuse University. They were randomly allocated to one of the three strategy conditions with 20 participants in each. Informed consent was obtained before the experiment. The experimental session took approximately 30 min, and participants were given research credits to fulfill a course research requirement. Three participants in the WADD condition did not finish the experiment. After removing them from the analysis, we were left with data from 57 participants.

Materials

Participants were asked to take the role of a college student interested in purchasing a used car, with the aim of choosing the car that would last longer. Each cue took either a favorable (1) or an unfavorable (0) value, with favorable values associated with greater durability. Each car was described by four cues: mileage, model year, the number of previous owners, and the number of accidents. The cues were ranked by their validities, which were shown to the participants. MM-ML requires that different strategies make different predictions on multiple dependent variables. To meet this requirement, we used stimuli from Glöckner (2009, Table 1) that consist of six stimulus types. The six types were repeated ten times each, resulting in 60 trials for each participant. The order of decision trials was randomized for each participant.

Procedure

The procedure was identical to that in Experiment I. Specifically, participants were trained to apply a certain strategy and made decisions using a computerized information board. Every participant went through a tutorial on how to use one of the strategies and was given five practice trials, with feedback provided on each. Participants needed to make correct decisions on at least 80% of the practice trials to proceed to the decision task; otherwise, they repeated the tutorial from the beginning. After successfully learning the strategy, participants engaged in a 60-trial decision task in which their decision outcomes and mouse movements were recorded. A “correct” decision was defined as a choice in agreement with the strategy they were trained on. Participants were given one point for each correct decision, and their goal was to maximize total points. No monetary reward was given to the participants.

Results

The descriptive statistics shown in Table 5 were in line with our expectations for the strategies. A Bayesian one-way ANOVA provided strong evidence that total decision time and the proportion of information searched differed among the strategies (BFs > 1000).

Table 5 Descriptive statistics for Experiment II

Feature selection

Building on the feature set from Experiment I, we selected the 21 features shown in Table 6. The first two features (\({x}_{1}\), \({x}_{2}\)) are the decision time and the proportion of cues searched. Because there were four cues in this experiment, fewer features were needed for the time spent reading each box (\({x}_{3}\) to \({x}_{10}\)) and for the search order (\({x}_{11}\) to \({x}_{18}\)). Three features (\({x}_{19}\) to \({x}_{21}\)) record whether the final choice was consistent with each strategy. We removed the feature used in Experiment I to differentiate take-the-best from Δ-inference.

Table 6 Feature set for machine learning models in Experiment II

MLSI analysis

We first analyzed the performance of MLSI at the trial-by-trial level. We then compared the performance of MLSI and MM-ML at the individual level.

Strategy identification at the trial-by-trial level

We randomly selected five participants from each strategy group to form the testing set. For the remaining 42 participants, we used ten-fold cross-validation to train the ML models and search for optimal hyperparameters. Each participant made 60 decisions; therefore, there were 2520 training trials and 900 testing trials. We trained the five ML models on the training set and evaluated their performance on the testing set. Table 4 shows the identification accuracy of these ML models on the testing set. MLP performed best, with an identification accuracy of 91.8%. The data, the Python code for the analysis, and the confusion matrices can be found in the Supplementary Materials.

We report the results of random forest in more detail, because random forest is easier to interpret in terms of feature importance, and its identification accuracy of 91.3% is very close to that of MLP. We plot the trial-by-trial identification results of random forest in Fig. 8. It shows that random forest was best at discriminating take-the-best from tallying and WADD, yielding perfect identification accuracy for take-the-best. The identification accuracy for WADD and tallying was 85.6% and 88.3%, respectively. Because the search patterns of WADD and tallying are similar, the majority of the misclassifications were between these two strategies. The identification accuracy also differed among participants. For example, the trained random forest model misclassified 26 of participant 9’s trials but classified all trials correctly for six participants (participants 8, 11, 12, 13, 14, and 15).

Fig. 8

Results of strategy identification for test participants in Experiment II. Strategies were identified by random forest, the best-performing machine learning algorithm. The five participants at the top were trained to use take-the-best (TTB), the middle five weighted-additive (WADD), and the bottom five tallying. The colored boxes indicate strategies predicted by random forest for each participant in each decision trial, with blue for TTB, purple for WADD, and green for tallying. The two columns on the right show the overall strategy identified by the machine learning strategy identification (MLSI) method and the multiple-measure maximum likelihood (MM-ML) method, respectively, for each participant

Feature importance

Figure 9 shows the Gini importance of the 21 features. The time needed to read the bottom boxes of the first alternative (box 4, \({x}_{6}\)) and the second alternative (box 8, \({x}_{10}\)), and the total decision time (\({x}_{1}\)) were important in differentiating take-the-best, tallying, and WADD. A likely explanation is that WADD participants generally spent more time at the bottom boxes, because they would need to take some time to integrate cue values and calculate the overall score of an alternative. The search order features (\({x}_{11}\dots {x}_{18}\)) were also important, because they indicate whether a participant used cue-wise or alternative-wise search.Footnote 3

Fig. 9

The Gini importance of each feature in Experiment II. Gini importance measures how much random forest relies on a particular feature in strategy identification. The sum of Gini importance is 1

Classification by participants

We aggregated the trial-by-trial classification results for each participant to classify that participant as being best described by one of the three strategies. For MLSI, each participant was assigned the strategy under which the majority of their trials were classified. Figure 8 shows that random forest classified all fifteen test participants correctly at the individual level.
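A minimal sketch of this majority-vote aggregation, using hypothetical participant labels:

```python
# Aggregate trial-level strategy labels to an individual-level classification
# by majority vote. `trial_labels` maps each participant to the list of
# strategies identified for that participant's trials.
from collections import Counter

def classify_participants(trial_labels):
    return {p: Counter(labels).most_common(1)[0][0]
            for p, labels in trial_labels.items()}

# e.g., a participant whose trials were mostly identified as WADD
print(classify_participants({"p9": ["WADD"] * 34 + ["tallying"] * 26}))
```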

MM-ML classification

We used the R code provided by Jekel et al. (2010) to conduct the MM-ML analysis. Using a combination of decision outcomes, decision times, and confidence judgments, MM-ML estimates the likelihood that a participant used a particular strategy. We ran MM-ML with participants’ choice outcomes and decision times. With these two types of data as inputs, MM-ML classified 86.7% of the participants correctly. Figure 8 shows that of the 15 test participants, two tallying participants (i.e., participants 3 and 5) were misclassified as using take-the-best. MM-ML never misclassified a take-the-best participant as using WADD, and vice versa. MM-ML’s classification performance might have been even better had we also collected confidence ratings.

Discussion

We have shown that MLSI can identify the strategies participants used on each trial with a high level of accuracy. The most frequently misclassified trials were between WADD and tallying, because both are compensatory strategies and produce similar search patterns. The identification accuracy varied among participants; for example, it was relatively low for participant 9. A plausible reason is that we trained a model on data from a group of participants and applied that model to the idiosyncratic behavior of new participants. A future study could investigate whether a participant’s identification accuracy improves when the model is trained on that participant’s own data. At the individual level, MLSI classified every test participant correctly, and MM-ML accurately classified 86.7% of the test participants. That said, each approach has its own strengths and weaknesses. MM-ML classifies participants at the individual level using carefully designed stimuli and does not need training data, whereas MLSI can classify at both the trial and the individual level for a broad range of stimuli but does require training data.

Experiment III

In Experiments I and II, we trained the ML models on data from participants who were taught to use specific decision strategies and evaluated the performance of MLSI on test participants taught the same strategies. In this experiment, we assess how well MLSI can classify participants who have not been trained on any strategy. When applying MLSI to untrained participants, it is not possible to assess its identification accuracy directly, because there are no strategy labels for these participants. Instead, we evaluate identification performance by having participants make decisions under conditions that favor some strategies over others. Studies have shown that decision makers adapt to their environment by adopting strategies that offer higher rewards (Lieder & Griffiths, 2017; Rieskamp & Otto, 2006). Building on this finding, participants in this experiment were rewarded for responses that were in accordance with take-the-best. With this design, we examine to what extent MLSI detects the expected shifts in participants’ strategies over the course of the experiment. Moreover, we constructed the stimuli for this experiment from a used car website to make the task more ecologically valid.

Participants

Undergraduate students at Syracuse University participated in the experiment to fulfill a course requirement. The experiment took approximately 60 min, and the study was approved by the IRB office of Syracuse University.

Procedure and materials

We extracted information on 16 cars from cars.com and constructed 120 comparison pairs from these cars. Each car was described by five cues: mileage, model year, miles per gallon (MPG), engine size, and the average review of the dealership. Participants were instructed to take the role of a college student interested in purchasing a used car and to choose the more durable car in each pair. In addition to the cue values, they were also given the cue validities, whose values reflected how predictive the cues were in the data set. The order of decision trials was randomized for each participant. Participants received feedback each time they made a decision and earned a point when their decision was consistent with the decision take-the-best would make.

Results

Participants learned from the feedback. For the first 20 trials, they received 69.7% of the available points, and their performance improved throughout the experiment such that by the last 20 trials, they earned 90.1% of the available points.

Our analysis of the participants’ strategies relied on the random forest model developed for Experiment I. The model’s parameters were kept the same as in Experiment I. We ignored, however, whether the choices the participants made were consistent with a certain strategy (i.e., features \({x}_{23}\), \({x}_{24}\), and \({x}_{25}\); see Table 3). The reason is that the predictions of tallying depend on how participants dichotomize the continuous values, and the predictions of Δ-inference depend on how participants set the Δ’s for each cue; without knowing how participants dichotomized cues or set the Δ’s, we do not know which choices tallying and Δ-inference would make. Thus, we ignored the choices participants made by setting the three outcome features to 0 and relying on the other 23 features described in Table 3 for strategy identification.

The trained random forest model identified strategies on a trial-by-trial basis. As shown in Fig. 10, participants used all three strategies at the beginning of the experiment. Δ-inference was used most frequently, but some participants (e.g., 20 to 25) started using take-the-best consistently after a dozen or so trials. Only three participants (i.e., 5, 7 and 16) used tallying more than three times, and none used tallying after around trial 60. The computational difficulty associated with weighting or tallying continuous cue values (e.g., Payne et al., 1993) could be the reason why there were so few tallying trials.

Fig. 10

Results of strategy identification for participants in Experiment III. The colored boxes indicate strategies identified for each participant in each decision trial, with blue for take-the-best (TTB), orange for Δ-inference (Delta), and green for tallying. The participants are sorted from top to bottom, in descending order, by the proportion of trials in which they are identified as using take-the-best

The majority of the participants switched strategies multiple times. Most consistently used take-the-best in the latter part of the experiment but differed in when they made the switch. A couple of participants (i.e., 1 and 2) used Δ-inference throughout the experiment. A plausible explanation for the continued use of Δ-inference is that reinforcing decisions consistent with take-the-best, to a large extent, also reinforces decisions consistent with Δ-inference. Overall, the results of this experiment are in line with findings from previous studies (e.g., Rieskamp & Otto, 2006), showing that decision makers tend to try out a variety of strategies and examine more cues in a new task environment before settling on a particular strategy that is adaptive for that environment.

General discussion

We have demonstrated that MLSI can effectively identify strategies on a trial-by-trial basis. MLSI achieves this level of identification fidelity by relying on well-established ML (machine learning) algorithms to integrate multiple types of behavioral data. After first providing an overview of the main findings and contributions, we discuss some limitations and future directions.

Overview of main findings

The effectiveness of MLSI is demonstrated in three experiments. Because supervised ML algorithms require labeled data, participants were instructed to use specific strategies. After the participants’ behavioral data were processed into a set of features, the ML algorithms learned the relations between the features and the strategy labels. The trained ML models then classified test data that they had not been trained on. In Experiment I, which included the challenging task of distinguishing take-the-best, Δ-inference, and tallying, all the ML models performed well, led by random forest with an accuracy rate of 93.8%.

The goal of Experiment II was to compare MLSI with MM-ML, which we used as a benchmark method. The two methods classified whether participants were using take-the-best, tallying, or WADD when choosing between stimuli designed to work well with MM-ML. At the individual level, both methods achieved high levels of accuracy in identifying participants’ strategies. With respect to identification accuracy at the trial level, only MLSI was considered, because MM-ML was not designed to classify strategies at that level. MLP performed best, but the performance of the other ML models, especially random forest, was not far behind.

In Experiment III, participants received feedback according to the decisions that take-the-best would have made. We analyzed the data in this experiment with a random forest model trained on the data from Experiment I. The MLSI analysis shows that participants adapted to the task environment according to the feedback they received. Participants started by trying a variety of strategies, but most of them eventually converged to the non-compensatory strategies take-the-best and Δ-inference, which in this environment often lead to the same decisions.

With MLSI, we are able to identify strategies on a trial-by-trial basis. Some strategy identification approaches require the strong assumption that decision makers use one strategy throughout a sequence of decision tasks (e.g., Bröder, 2003). With a rich set of features that reflect the characteristics of different strategies on a single trial, and with ML algorithms’ ability to learn the relationship between these features and the strategies, we can identify strategies in a single decision trial based on a decision maker’s search behavior and choice. We expect that the ability to identify strategies on a trial-by-trial basis will help researchers investigate factors that influence strategy selection, such as cue correlations and information search costs, and lead to a better understanding of how decision makers adapt their strategies to the characteristics of task environments.

A classification method that exploits multiple sources of behavioral data

The trained ML models discriminate between different decision strategies based on mouse-tracking and choice outcome data. MLSI can accommodate a wide array of features and makes no distributional assumptions about the features. For example, the features can be quantitative, ordinal, categorical, or Boolean. They need not be statistically independent, and can come from any distribution. Many of the features used in the current study were derived from the mouse coordinates in information search. These features were bundled into vectors representing participants’ search patterns (e.g., the order in which they opened the boxes and how long the boxes were opened), along with participants’ choices and other characteristics of the task. Moreover, the features we selected were tailored to discriminate between possible strategies. Because the selection of features drives classification results, the process of transforming raw data into informative features is critical to the development of any ML application. As Locklin (2014) puts it, “Much of the success of machine learning is actually success in engineering features that a learner can understand.” The process of selecting good features is so important that there is a growing machine learning literature that is dedicated to feature engineering (e.g., Heaton, 2016; Zheng & Casari, 2018). Can discriminating features be automatically discovered based on the raw behavioral data alone? Although a full exploration of this question is beyond the scope of this paper, we next report some exploratory findings that show the promise of automatic feature learning.

Automatic feature learning

Feature learning algorithms help classification when useful features cannot be identified easily. Examples of such cases are image classification, object detection, and speech recognition (e.g., Deng & Li, 2013; Loussaief & Abdelkrim, 2016; Rawat & Wang, 2017). These tasks are complex and generally involve large amounts of unstructured data. For example, in a speech recognition task, the input is a time series of hundreds of phoneme segments; similarly, the mouse data in the current study are a time series of mouse coordinates. Automatic feature learning methods can discover hidden patterns in such data without human intervention, combine them, and build efficient classification rules. Some patterns that are not captured in the feature sets we constructed may thus be found by automatic feature learning methods.

Two widely used feature extraction methods are principal component analysis and linear discriminant analysis. Both extract useful information from high-dimensional data and are also used for dimensionality reduction. We applied principal component analysis to the mouse coordinate data, ignoring the choice outcomes for demonstration purposes. The extracted principal components then served as features for training the ML models. The models trained on these components classified the test trials from Experiment I as resulting from either an alternative-wise search strategy (i.e., tallying) or a cue-wise strategy (i.e., take-the-best or Δ-inference). Based only on features extracted from the mouse coordinates, a multilayer perceptron model was correct on 86.6% of the trials, demonstrating the potential of automatic feature learning to support strategy identification. One direction for future work is a systematic investigation of feature representation, encoding more raw data (e.g., choice outcomes and cue values) for automatic feature learning.
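
A pipeline of this kind can be sketched with off-the-shelf scikit-learn components. In the sketch below, the data are random placeholders standing in for resampled mouse trajectories, and the numbers of trials, components, and hidden units are illustrative assumptions rather than the settings used in our analysis.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# Placeholder data: 500 trials, each trajectory resampled and flattened to
# 200 time points x 2 coordinates = 400 values per trial.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 400))
y = rng.integers(0, 2, size=500)   # 0 = alternative-wise, 1 = cue-wise search

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

clf = make_pipeline(
    PCA(n_components=20),                     # principal components become features
    MLPClassifier(hidden_layer_sizes=(50,),   # multilayer perceptron classifier
                  max_iter=1000, random_state=0),
)
clf.fit(X_train, y_train)
print("trial-level accuracy:", clf.score(X_test, y_test))
```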

Deep neural networks (i.e., deep learning) have gained popularity partly due to their remarkable ability to learn and discover features (Zhong et al., 2016). The networks consist of multiple layers of interconnected nodes, with activation functions connecting the input layer to the output layer through intermediate hidden layers. These hidden layers compose lower-level feature representations into more complex ones. For example, the output of a hidden layer that is activated by edges and lines in an image may feed into subsequent layers that identify faces. To apply deep neural networks to Experiment I, we fed the raw mouse coordinate data to a network, which encoded the learned features in the weights of its hidden layers. When classifying whether a decision was the result of a compensatory or a non-compensatory strategy, the network’s identification accuracy was 86.4%.
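
A minimal sketch of such a network, written here in PyTorch, is shown below. The placeholder data, layer sizes, and training settings are assumptions for illustration, not the architecture we actually trained.

```python
import torch
from torch import nn

# Placeholder tensors standing in for raw, fixed-length mouse-coordinate vectors.
torch.manual_seed(0)
X = torch.randn(500, 400)            # 500 trials x 400 resampled coordinate values
y = torch.randint(0, 2, (500,))      # 0 = compensatory, 1 = non-compensatory

model = nn.Sequential(
    nn.Linear(400, 128), nn.ReLU(),  # lower-level feature representations
    nn.Linear(128, 32), nn.ReLU(),   # composed into higher-level ones
    nn.Linear(32, 2),                # two strategy classes
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

accuracy = (model(X).argmax(dim=1) == y).float().mean().item()
print(f"training accuracy: {accuracy:.3f}")
```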

Enriching the feature sets

Another approach to improve identification accuracy is to enrich the feature sets. We currently use mouse data to derive our process measures, but the types of useful data could be expanded. Many studies have shown that eye-tracking measures are helpful in identifying the cognitive processes underlying decision making (e.g., Krol & Krol, 2017; Schulte-Mecklenbeck et al., 2017a, b). In addition to scan paths, fixations and gazes can also be informative in showing how decision makers process information. The development of webcam eye-tracking techniques (e.g., Papoutsaki et al., 2016) promises to make eye tracking a ubiquitous technology that will help researchers collect large amounts of visual attention data cheaply and rapidly. These data should facilitate the training of ML models.

Neurological data are another source of process data. Machine learning techniques have been applied to fMRI data to classify patients with schizophrenia (Bleich-Cohen et al., 2014) and major depression (Sato et al., 2015), among other conditions (e.g., Pereira et al., 2009). Incorporating such data into ML models has the potential to further improve the models’ performance. Studies have identified correlations between neural activation patterns and the strategies people use (Dimov et al., 2019; Fechner et al., 2016; Khader et al., 2016). For example, Volz et al. (2006) measured brain activity with fMRI while participants did and did not use the recognition heuristic, and found increased activation within the anterior frontomedian cortex (aFMC) when the heuristic was used. These findings indicate that brain activations could serve as features to facilitate ML-based strategy identification.

Limitations and future directions

One future line of work is to increase the number of candidate strategies. We assumed that participants could use four strategies—take-the-best, Δ-inference, tallying, and WADD—that are often considered in studies of simple heuristics. Still, there are more strategies that people may use, such as the extensions of take-the-best (Heck et al., 2017) and WADD (Hilbig & Moshagen, 2014). With more strategies being tested, it is critical to construct a rich set of features to describe and distinguish these strategies.

We only considered strategies that are not too difficult for participants to learn and execute. Some strategies are more cognitively demanding, and participants may find them challenging to learn and use. For example, in Experiment III, an environment with continuous cues, it could be difficult for participants to apply WADD, as its execution may exceed the working memory capacity of most participants (Fechner et al., 2018). As a consequence, labeled training data for some strategies in some environments may be hard to come by, hindering strategy identification. On the other hand, strategies that are difficult to learn are probably used less frequently than other strategies (Marewski & Schooler, 2011).

In addition, decision makers in real-world environments may not use exactly the same strategies that participants were taught in our study. For example, people using a particular strategy may deviate from its idealized search pattern; take-the-best users, for instance, may sometimes look up more cues even after finding a discriminating cue. In this case, the search pattern for take-the-best could mimic that of Δ-inference, adding ambiguity to strategy identification. The ML models can tolerate, and perhaps even benefit from, a certain amount of noise and error in the training data, but too much could hinder a model’s ability to learn the right parameters. The question is to what extent the ML models can handle such variability. The identification accuracy in practice may not be as high as what we have achieved here. That said, even if we cannot make fine-grained distinctions among all the strategies, our results suggest that MLSI can reliably identify whether people are using a compensatory or a non-compensatory strategy.

Our analyses do not include a guessing strategy, where “each alternative is chosen with probability one-half on each trial, independent of the cue information” (Lee & Gluck, 2020). A guessing strategy defined across trials is incompatible with MLSI, because MLSI is applied on a trial-by-trial basis. Here we propose a working definition of guessing that could be used to classify single trials as guesses: a decision is classified as a guess when the probabilities MLSI assigns to all candidate strategies fall below a threshold. For example, suppose that for a certain trial MLSI assigned probabilities of 34%, 33%, and 33% to take-the-best, Δ-inference, and tallying, respectively. Without a guessing threshold, the current version of MLSI would identify take-the-best as the strategy used on this trial, even though the assigned probabilities of all the strategies are nearly the same. With a threshold (e.g., 50%), the model would instead classify this trial as a guess. This working definition of guessing, like the definition adopted by Lee and Gluck (2020), leaves open the question of what processes actually underlie decisions that are classified as “guesses.”
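
A minimal sketch of how such a threshold could be layered on top of a fitted classifier is shown below. It assumes a scikit-learn model with a predict_proba method (e.g., a random forest); the function name and the 50% threshold are illustrative.

```python
import numpy as np

def classify_with_guess_threshold(clf, X_trials, strategy_names, threshold=0.5):
    """Return the most probable strategy for each trial, or 'guess' when no
    strategy's predicted probability reaches the threshold. strategy_names must
    follow the order of clf.classes_; the 0.5 default is the illustrative value
    from the text, not a recommendation."""
    probs = clf.predict_proba(X_trials)                           # (n_trials, n_strategies)
    labels = np.array(strategy_names, dtype=object)[probs.argmax(axis=1)]
    labels[probs.max(axis=1) < threshold] = "guess"               # below-threshold trials
    return labels
```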

Lastly, the core methodologies in MLSI can be applied to study cognitive processes beyond multi-attribute choice, such as risky choice, cognitive development, function learning, and perceptual categorization. In each domain, either simulated or real behavioral data generated under candidate theories or models can be used to train ML models, whose predictions then indicate which theory or model provides a better account of participants’ data. In risky choice, for instance, the method could be adapted to test whether an expected-value-based model, such as prospect theory (Tversky & Kahneman, 1992), or the lexicographic priority heuristic (Brandstätter et al., 2006) better describes how people process and integrate information about probabilities and consequences, and which pieces of information matter most in determining choice outcomes.

Outlook

Our results demonstrate that off-the-shelf ML models are up to the task of strategy identification. Currently, we favor random forest, because the Gini importance analyses make its results more interpretable than those of the other models we investigated and show how we might further tune the feature sets. Our preliminary explorations with a few off-the-shelf automatic feature construction algorithms suggest that researchers may not even need to develop the initial feature sets. Relying too heavily on automatic feature construction, however, runs the risk of making the ML models uninterpretable. Rahwan et al. (2019) argue that the behavior of ML algorithms can be difficult to analyze formally; they therefore call for investigating ML models with experimental and behavioral methods, much as these methods are used to study the behavior of humans and other animals. Going forward, careful experimentation will be key to improving the MLSI paradigm and generalizing it to other cognitive domains. Our supplementary materials include examples of how to run the key analyses presented in this paper; they should help researchers examine whether the MLSI approach could work for their data and extend the approach to address other questions related to strategy identification and discovery.
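
As an illustration of the interpretability analysis mentioned above, the sketch below reads Gini importances from a fitted scikit-learn random forest. The data, feature names, and hyperparameters are placeholders, not those used in our experiments.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Placeholder data and feature names; a real analysis would use the
# trial-level feature vectors and strategy labels described in this paper.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = rng.integers(0, 3, size=300)
feature_names = ["n_boxes_opened", "prop_cuewise_transitions",
                 "prop_altwise_transitions", "mean_open_duration", "choice"]

forest = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
ranked = sorted(zip(feature_names, forest.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")   # Gini importance of each feature
```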