# Predicting similarity judgments in intertemporal choice with machine learning

## Abstract

Similarity models of intertemporal choice are heuristics that choose based on similarity judgments of the reward amounts and time delays. Yet, we do not know how these judgments are made. Here, we use machine-learning algorithms to assess what factors predict similarity judgments and whether decision trees capture the judgment outcomes and process. We find that combining small and large values into numerical differences and ratios and arranging them in tree-like structures can predict both similarity judgments and response times. Our results suggest that we can use machine learning to not only model decision outcomes but also model how decisions are made. Revealing how people make these important judgments may be useful in developing interventions to help them make better decisions.

## Keywords

Classification tree, Decision tree, Intertemporal choice, Judgment, Machine learning, Similarity

## Introduction

Would you prefer to receive $100 today or $105 in one month? Intertemporal choices such as these involve trading off smaller rewards available sooner with larger rewards available later. The temporal discounting approach to intertemporal choice models these tradeoffs by assuming that people subjectively devalue future rewards based on the time delay to receiving those rewards. Therefore, discounting models integrate the reward amount with the time delay to generate a discounted value for each option.

1. *How do the small and large values of the reward amounts and time delays combine to predict similarity judgments?* Rubinstein (1988) proposed that either the numerical difference (large value − small value) or the numerical ratio (small value / large value) between values could be used to make similarity judgments. For example, when comparing $3 vs. $4, one could focus on the difference of 1 or the ratio of 3/4. Stevens (2016) measured similarity judgments and found that both difference and ratio independently accounted for these judgments. Here, we test which mathematical operations combine small and large values to predict similarity judgments. We use classification algorithms from machine learning to predict people's similarity judgments based on numerical difference and ratio or other psychophysical and decision-making functions (Table 1). This will tell us how small and large values combine to generate similarity judgments.

2. *Do trees capture similarity judgments?* Researchers often use regression models to investigate what factors classify responses. We propose an alternative classification method used in machine learning: *classification trees* (Breiman, Friedman, Olshen, & Stone, 1984). These algorithms produce *decision trees*: sequential decision rules for classifying outcomes based on a set of predictors. A tree consists of a node for each relevant predictor (e.g., difference or ratio) and a threshold for each predictor that divides it into branches (Fig. 1b). One moves down the tree by determining whether the threshold of a predictor is met for a particular pair of values (e.g., $3 vs. $4). Eventually, the tree ends in a terminal node that classifies the response. An advantage of decision trees is that they can make predictions not only for outcome data (e.g., choices, judgments) but also for process data (e.g., response times), which is useful for assessing decision strategies. In this study, we evaluate whether decision trees produced by machine-learning algorithms can model how similarity judgments are made by predicting both the judgment outcomes and response times.
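To make the tree idea concrete, here is a minimal sketch of how a classification-tree algorithm chooses a node: for each candidate predictor, it searches for the threshold that best separates similar from dissimilar judgments (here by minimizing Gini impurity, as in CART). Our analyses used R's rpart package; this Python sketch, with invented value pairs and judgments, illustrates only the core splitting step.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 similarity judgments."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(x, y):
    """Return (threshold, impurity) of the best binary split on predictor x."""
    best = (None, float("inf"))
    for t in sorted(set(x)):
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        imp = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if imp < best[1]:
            best = (t, imp)
    return best

# Hypothetical value pairs (S, L) and judgments (1 = similar, 0 = dissimilar)
pairs = [(3, 4), (6, 8), (1, 9), (45, 50), (10, 90), (70, 75)]
judgments = [1, 1, 0, 1, 0, 1]
difference = [L - S for S, L in pairs]
ratio = [S / L for S, L in pairs]

# The predictor whose best split has the lowest impurity becomes the node
for name, x in [("difference", difference), ("ratio", ratio)]:
    t, imp = best_split(x, judgments)
    print(name, t, round(imp, 3))
```

A full CART algorithm repeats this search recursively within each branch, which is how the multi-node trees in the response-time analysis arise.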

Table 1. Predictors

| Predictor name | Value/Function | Source |
|---|---|---|
| Small value | \(S\) | |
| Large value | \(L\) | |
| Difference | \(L-S\) | Rubinstein (1988) |
| Ratio | \(\frac{S}{L}\) | Rubinstein (1988) |
| Mean ratio | \(\frac{S}{\frac{S+L}{2}}\) | Eisler and Ekman (1959) |
| Log ratio | \(\log(\frac{S}{L})\) | Künnapas and Künnapas (1974) |
| Relative difference | \(\frac{L-S}{L}\) | González-Vallejo, Reid, and Schiltz (2003) |
| Disparity ratio | \(\frac{L-S}{\frac{S+L}{2}}\) | Boysen, Berntson, Hannan, and Cacioppo (1996) |
| Salience | \(\frac{L-S}{S+L}\) | Bordalo, Gennaioli, and Shleifer (2012) |
| Discriminability | \(\log(\frac{L}{L-S})\) | Welford (1960) |
| Logistic | \(\frac{1}{1+e^{L-S}}\) | |
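Each function in Table 1 collapses a small value S and a large value L into a single number. For reference, here is a direct Python translation of the table (the value pair 3 vs. 4 is just an example):

```python
import math

def predictors(S, L):
    """Compute the Table 1 predictor functions for small value S and large value L."""
    return {
        "small": S,
        "large": L,
        "difference": L - S,
        "ratio": S / L,
        "mean_ratio": S / ((S + L) / 2),
        "log_ratio": math.log(S / L),
        "relative_difference": (L - S) / L,
        "disparity_ratio": (L - S) / ((S + L) / 2),
        "salience": (L - S) / (S + L),
        "discriminability": math.log(L / (L - S)),
        "logistic": 1 / (1 + math.exp(L - S)),
    }

print(predictors(3, 4))
```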

To explore these questions, we used classification-tree algorithms from machine learning to assess what predictors best accounted for participants’ similarity judgments and whether the resulting decision trees predicted judgments better than regression analyses. Combined, these findings reveal what cognitive processes influence similarity judgments.

## Methods

### Data sets

We tested our research questions on two data sets. Data set 1 was collected from 65 participants (29 males and 36 females) with a mean ±SD age of 30.3 ± 9.1 (range 22-72) years recruited from the Adaptive Behavior and Cognition Web Panel at the Max Planck Institute for Human Development in Berlin, Germany, in August 2011. Participants received a flat fee of €3 for completing the survey. Web panel participants made similarity judgments between 50 pairs of amount values (e.g., €6 vs. €8) and 50 pairs of delay values (e.g., 6 days vs. 8 days): “Please decide whether the numbers are similar”. This research was approved by the Max Planck Institute for Human Development’s Ethics Committee.

Data set 2 was collected from 90 participants (29 males and 61 females) with a mean ±SD age of 20.0 ± 1.6 (range 18-26) years recruited from the University of Nebraska-Lincoln Department of Psychology undergraduate participant pool in December 2014. Participants received course credit for their participation. Participants started by making 20 intertemporal choices before rating the similarity of 43 pairs of reward amount values and 43 pairs of time delay values: “Do you consider receiving [small amount] and [large amount] to be similar or dissimilar?” and “Do you consider waiting [short delay] and [long delay] to be similar or dissimilar?”. The intertemporal choices used the same value pairs as the similarity judgments and were included first to expose participants to the range of amount and delay magnitudes and to provide the overall decision context before they made similarity judgments. This research was approved by the University of Nebraska-Lincoln Institutional Review Board (IRB Approval # 20130313118EP).

We chose the sample sizes of 65 and 90 because they were comparable to or greater than the sizes used in Stevens (2016), which detected medium-sized effects in the intertemporal choice model selection analyses. For both data sets, we recorded the similarity judgments for each question and demographic information, including age and gender. For data set 2, we also recorded response time and included attention checks with the same small and large value (10 vs. 10) or with very large differences between large and small values (1 vs. 90).

### Classification trees

Prior to the classification-tree analysis, we removed participants who (1) made the same similarity judgment in over 95% of the trials, (2) judged 10 vs. 10 to be dissimilar, (3) judged 1 vs. 90 to be similar, or (4) showed inconsistencies in judgments. To measure inconsistencies, we included sets of questions in which the large value was fixed and was paired with at least 10 different small values. We removed participants with more than three switches between dissimilar and similar in at least one of these sets. In all, we removed 31 of the 155 participants, leaving 124 (Data set 1: *n* = 50; Data set 2: *n* = 74).
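These exclusion rules are simple filters over each participant's judgments. A sketch in Python (the data layout and field names are invented for illustration; the actual pipeline was written in R):

```python
def count_switches(judgments_ordered):
    """Count switches between similar (1) and dissimilar (0) in a question set
    ordered by the small value, with the large value held fixed."""
    return sum(a != b for a, b in zip(judgments_ordered, judgments_ordered[1:]))

def exclude(participant):
    """Return True if the participant meets any exclusion criterion."""
    j = participant["judgments"]          # list of 0/1 similarity judgments
    same_rate = max(j.count(0), j.count(1)) / len(j)
    if same_rate > 0.95:                  # (1) same judgment in >95% of trials
        return True
    if participant["10_vs_10"] == 0:      # (2) judged 10 vs. 10 dissimilar
        return True
    if participant["1_vs_90"] == 1:       # (3) judged 1 vs. 90 similar
        return True
    for qset in participant["fixed_large_sets"]:
        if count_switches(qset) > 3:      # (4) inconsistent judgments
            return True
    return False

demo = {"judgments": [1, 0, 1, 1, 0, 1], "10_vs_10": 1, "1_vs_90": 0,
        "fixed_large_sets": [[1, 1, 1, 0, 0, 0, 0, 0, 0, 0]]}
print(exclude(demo))
```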

We included a set of 11 predictors of similarity judgments (Table 1) for both CART and multiple logistic regression models. To compare the model classes, we used cross-validation to calculate out-of-sample predicted accuracy: the proportion of out-of-sample judgments accurately classified by the models. First, we randomly split the data in half (training sample and test sample). We then fit each model with all predictors on the training sample, which generated model-specific parameters (regression weights for each predictor, or decision nodes and thresholds). Next, we used the fitted parameters to classify the test sample, which allowed us to calculate out-of-sample predicted accuracy. Finally, we switched the training and test samples and repeated the process. Model prediction occurred for each participant's data individually and separately for amounts and delays. Each participant's data were cross-validated 100 times for both decision-tree and regression models.
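The repeated split-half procedure can be sketched as follows. The toy "model" below is a one-node threshold rule rather than a full CART tree or logistic regression, so only the cross-validation skeleton mirrors our analysis; the data are invented:

```python
import random

def split_half_cv(data, fit, predict, reps=100, seed=1):
    """Repeated split-half cross-validation: fit on one half, score the other
    half, swap halves, and average out-of-sample accuracy across repetitions."""
    rng = random.Random(seed)
    accs = []
    for _ in range(reps):
        d = data[:]
        rng.shuffle(d)
        half = len(d) // 2
        for train, test in [(d[:half], d[half:]), (d[half:], d[:half])]:
            model = fit(train)
            correct = sum(predict(model, x) == y for x, y in test)
            accs.append(correct / len(test))
    return sum(accs) / len(accs)

def fit(train):
    """Toy one-node 'tree': threshold at the mean training difference."""
    return sum(x for x, _ in train) / len(train)

def predict(threshold, x):
    """Classify as similar (1) when the difference is at or below threshold."""
    return int(x <= threshold)

# (difference, judgment) pairs: small differences judged similar
data = [(1, 1), (2, 1), (3, 1), (8, 0), (9, 0), (10, 0)]
print(round(split_half_cv(data, fit, predict), 2))
```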

### Data analysis

For response time data, we removed outliers with modified Z scores greater than 3. We calculated Bayes factors (*BF*) to provide the weight of evidence for the alternative hypothesis relative to the null hypothesis (Wagenmakers, 2007). For example, *BF* = 10 means that the evidence for the alternative hypothesis is 10 times stronger than the evidence for the null hypothesis. Bayes factors between 1 and 3 provide only anecdotal evidence, those between 3 and 10 provide moderate evidence, those between 10 and 100 provide strong evidence, and those above 100 provide very strong evidence (Andraszewicz et al., 2015). Bayes factors associated with generalized linear mixed models were converted from the Bayesian Information Criterion (BIC) using *BF* = \(e^{\frac {BIC_{null} - BIC_{alternative}}{2}}\) (Wagenmakers, 2007). Bayes factors for t-tests were computed using noninformative priors (Rouder, Speckman, Sun, Morey, & Iverson, 2009).
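The BIC-to-Bayes-factor conversion is a one-line computation:

```python
import math

def bf_from_bic(bic_null, bic_alternative):
    """Approximate Bayes factor for the alternative over the null from a
    BIC difference (Wagenmakers, 2007): BF = exp((BIC_null - BIC_alt) / 2)."""
    return math.exp((bic_null - bic_alternative) / 2)

# If the alternative model's BIC is 10 points lower than the null's, the
# evidence for the alternative is very strong on the Andraszewicz scale:
print(round(bf_from_bic(210.0, 200.0), 1))  # prints 148.4
```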

When comparing measures within a participant, we calculated within-subjects 95% confidence intervals (Morey, 2008). For mixed-effects models, we calculated profile likelihood 95% confidence intervals for coefficients. Confidence intervals are presented in brackets after the parameter estimate.
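The within-subjects intervals follow Cousineau's normalization with Morey's (2008) correction. A compact sketch (a normal critical value of 1.96 is assumed here for brevity, and the data are invented):

```python
from statistics import mean, stdev

def normalized_scores(data):
    """Cousineau normalization: remove between-subject variability by
    subtracting each subject's mean and adding the grand mean.
    data[subject][condition] holds one score per subject x condition."""
    grand = mean(x for subj in data for x in subj)
    return [[x - mean(subj) + grand for x in subj] for subj in data]

def morey_half_width(data, crit=1.96):
    """Within-subject CI half-widths per condition, applying Morey's (2008)
    correction factor sqrt(M / (M - 1)) for M conditions."""
    norm = normalized_scores(data)
    m = len(norm[0])                      # number of conditions
    corr = (m / (m - 1)) ** 0.5
    widths = []
    for c in range(m):
        col = [subj[c] for subj in norm]
        sem = stdev(col) / len(col) ** 0.5
        widths.append(crit * corr * sem)
    return widths

data = [[2.0, 3.0], [4.0, 5.0], [3.0, 4.5]]   # 3 subjects x 2 conditions
print([round(w, 2) for w in morey_half_width(data)])
```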

We analyzed the data using R Statistical Software version 3.4.2 (R Core Team, 2017)^{1}. Data, R code, and supplementary tables and figures are available in the supplementary material and at the Open Science Framework (https://osf.io/ew8dc/).

## Results

### Predictors of similarity judgments

Stevens (2016) demonstrated that both difference and ratio independently influence similarity judgments. Here, we (1) attempt to replicate this finding on new data and (2) evaluate how difference and ratio combine to predict similarity judgments. To address this, we restricted our analysis to data set 2, where we specifically created value pairs that varied difference while holding ratio constant and vice versa.

Difference (*β* = -1.01 [-1.10, -0.91], *BF* > 100), ratio (*β* = 1.10 [0.51, 1.69], *BF* > 100), and type (*β* = 0.82 [0.68, 0.97], *BF* > 100) independently influenced similarity. Value pairs were judged as more similar with smaller differences, with larger ratios, and for delays compared to amounts. Furthermore, difference and ratio interacted (*β* = 0.53 [0.40, 0.66], *BF* > 100), with a weaker effect of difference at higher ratios. That is, as the ratio increased and values were more similar, the difference between values affected judgments less. People's judgments of similarity between two reward amounts or two time delays depended on *both* the numerical difference and numerical ratio.

*Thus, both difference and ratio contributed to similarity judgments.*

The fact that both difference and ratio predict similarity judgments suggests two possible explanations. First, difference and ratio may combine mathematically, meaning both are *simultaneously present* in a single predictor function (e.g., relative difference includes both difference and ratio in its expression; Table 1). Alternatively, difference and ratio may enter the tree *separately in sequence* (i.e., one predictor before the other). We tested these alternative hypotheses by classifying similarity judgments with classification trees that included our predictors. If ratio and difference combine mathematically, then one of the combined predictors should best predict judgments for both amounts and delays. If they combine sequentially, then the plain difference and ratio predictors should best predict judgments.

For each participant and judgment type, the classification-tree algorithm generated a decision tree with the single best predictor for classifying the judgments (i.e., the first node in the tree). For 95-98% of participants across both data sets, either difference or ratio was the best predictor for amount and delay judgments (Table 2). *Thus, difference and ratio combined sequentially in a tree-like way to influence similarity judgments rather than in a more complicated mathematical operation.*

Table 2. Best predictors for individual participant decision trees

| Data set | Judgment type | Large | Difference | Ratio | Relative difference | Logistic |
|---|---|---|---|---|---|---|
| 1 | Amount | 0 | 26 | 24 | 0 | 0 |
| 1 | Delay | 0 | 25 | 25 | 0 | 0 |
| 2 | Amount | 0 | 62 | 10 | 1 | 1 |
| 2 | Delay | 1 | 50 | 15 | 3 | 2 |
| All | Amount | 0.0% | 71.0% | 27.4% | 0.8% | 0.8% |
| All | Delay | 0.8% | 62.0% | 33.1% | 2.5% | 1.7% |

### Decision trees as process models

#### Decision trees predict similarity judgments

Decision trees predicted judgments more accurately than regression models, for both amount judgments (Cohen's *d* = 0.80, *BF* > 100) and delay judgments (mean difference in accuracy = 9.3% [8.3, 10.4], Cohen's *d* = 1.09, *BF* > 100) (Table 3).

*Thus, decision trees predicted similarity judgments better than regression models.*

Table 3. Mean percent predicted accuracy for models (with 95% confidence intervals)

| Judgment type | Model | Mean accuracy |
|---|---|---|
| Amount | Regression | 80.2 [79.2, 81.1] |
| Amount | Tree | 86.0 [85.0, 87.1] |
| Delay | Regression | 77.5 [76.5, 78.5] |
| Delay | Tree | 86.8 [85.9, 87.8] |

#### Decision trees track response time

Decision trees may be able to track differences in response time when judgments can be made after a single node versus after multiple nodes (Fig. 1b). If the judgment process follows a tree-like structure, then when a tree predicts that a judgment requires traveling further down the tree, participants' judgment times should increase because multiple nodes must be processed. An analogous result was demonstrated for the fast-and-frugal priority heuristic for risky choices, where gambles that should take only one step to resolve had shorter response times than gambles that took more than one step (Brandstätter, Gigerenzer, & Hertwig, 2006).

We tested this prediction with response times from data set 2 (Amount: *n* = 51; Delay: *n* = 52). For each value pair, we determined at which decision node the participant's tree predicted that the judgment would be made. We then calculated the median response time for each participant's judgments at each node level and for each judgment type. We conducted a linear mixed-effects model of median response time with number of node levels and judgment type as fixed factors and subject as a random factor (Fig. 5). Number of node levels positively predicted response times (*β* = 0.14 [0.09, 0.20], *BF* > 100), but judgment type did not (*β* = -0.14 [-0.30, 0.01], *BF* = 0.24), and there was no interaction (*β* = 0.02 [-0.06, 0.10], *BF* = 0.01). Judgment response time, therefore, increased as participants had to work their way down the trees.

*Thus, response time data were consistent with decision tree processing predictions.*

Table 4. Number of participants' trees with each number of nodes in data set 2

| Judgment type | 1 Node | 2 Nodes | 3 Nodes | 4 Nodes | 5 Nodes |
|---|---|---|---|---|---|
| Amount | 20 | 29 | 18 | 7 | 0 |
| Delay | 17 | 27 | 20 | 6 | 1 |
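The node-level predictor in the response-time analysis comes from tracing each value pair through a participant's tree and recording how many nodes are visited before a terminal classification. A sketch of that tracing step (the two-node tree below is hypothetical, not a fitted tree from the data):

```python
# Hypothetical two-node tree: test difference first, then ratio.
tree = {
    "test": lambda S, L: L - S <= 2,       # node 1: difference threshold
    "yes": "similar",                       # classified after 1 node
    "no": {
        "test": lambda S, L: S / L >= 0.8,  # node 2: ratio threshold
        "yes": "similar",                   # classified after 2 nodes
        "no": "dissimilar",
    },
}

def classify_depth(node, S, L, depth=1):
    """Return (judgment, number of nodes traversed) for a value pair."""
    branch = node["yes"] if node["test"](S, L) else node["no"]
    if isinstance(branch, str):             # reached a terminal node
        return branch, depth
    return classify_depth(branch, S, L, depth + 1)

for S, L in [(3, 4), (45, 50), (10, 90)]:
    print((S, L), classify_depth(tree, S, L))
```

The hypothesis in the text is that median response times should rise with the depth returned by this kind of trace.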

## Discussion

Our results reveal that numerical difference and ratio predict similarity judgments for amounts and delays. Classification-tree algorithms indicate that, rather than combining mathematically, difference and ratio predictors are used separately and sequentially to make these judgments. These trees outperform regression models in predicting similarity judgments, and response time data suggest that decision trees not only predict judgment outcomes but also hint at tree-like judgment processes: People may evaluate one predictor before moving to a second if the first fails to result in a judgment.

For most participants, small and large values combine in rather simple ways via numerical differences and ratios to generate similarity (Table 2). Although both difference and ratio influence similarity judgments (Fig. 3), they do so separately rather than via more complicated mathematical relationships. Thus, rather than previously proposed decision-making and psychophysical functions (Table 1), simple differences and ratios best predict similarity judgments.

The importance of difference and ratio in similarity judgments mirrors patterns observed in psychophysical domains, including brightness, loudness, weight, and length (Stevens, 1975). Likewise, both difference and ratio are critical to human (and nonhuman) number discrimination. This is evidenced by the numerical distance effect, which shows discrimination based on difference (Rilling & McDiarmid, 1965), and Weber’s law, which shows discrimination based on ratio (Mechner, 1958). Therefore, similarity judgments of monetary amounts and time delays follow core psychophysical principles of quantity judgments.

In this study, we used amount and delay magnitudes ranging from 0 to 100. Given that similarity judgments are context specific, the absolute magnitude of amounts and delays might influence how these judgments are made. First, the range of magnitudes assessed early in testing might set anchors that bias judgments. We included the intertemporal choice questions before asking participants to make similarity judgments to illustrate the range of magnitudes and reduce bias and order effects. Second, participants may use different predictors, thresholds, or even classification algorithms across different magnitude ranges. Further work is needed to determine whether these results generalize across different magnitude ranges.

We also observed small differences in similarity judgments across amount and delay judgment types (Fig. 3; Table 2). While it is possible that these are meaningful differences, we do not yet have strong evidence that delay pairs are generally judged as more similar than amount pairs or that difference and ratio are better predictors for one judgment type over another. Further work is needed to investigate whether there are robust differences between amount and delay judgments.

Rather than using classification-tree algorithms as only a statistical approach, we propose that these algorithms produce decision trees that might offer a process model of similarity judgments. Compared to regression models, decision trees use fewer predictors and compare predictor values to a threshold rather than weight them by a coefficient. Despite being simpler and more frugal in their information use, decision trees outperform regression models in predicting judgments.

Process data also support tree models: When decision trees predict the use of fewer nodes, participants indeed make judgments faster than when they are predicted to use more nodes. Both outcome and process data support decision trees as process models of similarity judgment. Since similarity judgments also apply to risky and strategic choice (Rubinstein, 1988; Leland, 2013), this approach can be extended to these choice domains, as well.

Understanding what factors influence similarity judgments is important because it provides opportunities to alter the “downstream” intertemporal choices. Therefore, these results not only give us insights into how people make these choices, but may also inspire interventions to help them make better decisions. Interventions that increase similarity judgments of time delays may focus attention on the reward amounts and nudge people into making more patient choices for their long-term benefit. This could help people improve their long-term health (diet, exercise, alcohol and drug consumption), financial stability (credit card debt reduction, retirement savings), and environmental sustainability (resource consumption, pollution reduction).

In conclusion, the similarity model can account for both outcome and process data in intertemporal choices (Leland, 2002; Rubinstein, 2003; Stevens, 2016), risky choices (Rubinstein, 1988; Leland, 1998), and strategic choice (Leland, 2013). This model moves the bulk of the decision process from the choice to the similarity judgment. Our work addresses how people make similarity judgments by showing that (1) rather simple combinations of small and large values (numerical differences and ratios) can predict similarity judgments and (2) decision trees capture both the outcome and process data. We used machine-learning algorithms to not only create statistical models of judgment outcomes but also develop process models that capture how decisions are made. Thus, machine-learning algorithms provide a useful set of tools for modeling judgment and decision making, with the potential to help people make better decisions.

## Footnotes

- 1.
We also used the BayesFactor, car, cowplot, dplyr, foreach, ggplot2, lattice, lme4, MBESS, papaja, plyr, rpart, rpart.plot, tidyr, and xtable packages (package usages and citations are provided in the supplementary material).

## Notes

### Acknowledgements

This research was funded by an Alexander von Humboldt Foundation TransCoop Grant and by National Science Foundation grants (NSF-1062045, NSF-1658837). We would like to thank Isabella Otto for collecting data in Germany; Duy Nguyen for developing the Java-based data collection program and helping analyze data; Nik Leger and Cherylynn Gibson for helping analyze data; Noah Svec for testing participants; and UNL’s CB3 Club for comments on an early draft.

## Supplementary material

## References

- Andraszewicz, S., Scheibehenne, B., Rieskamp, J., Grasman, R., Verhagen, J., & Wagenmakers, E.-J. (2015). An introduction to Bayesian hypothesis testing for management research.
*Journal of Management*,*41*(2), 521–543. http://dx.doi.org/10.1177/0149206314560412 - Bordalo, P., Gennaioli, N., & Shleifer, A. (2012). Salience theory of choice under risk.
*Quarterly Journal of Economics*,*127*(3), 1243–1285. http://dx.doi.org/10.1093/qje/qjs018 - Boysen, S.T., Berntson, G.G., Hannan, M.B., & Cacioppo, J.T. (1996). Quantity-based interference and symbolic representations in chimpanzees (
*Pan troglodytes*).*Journal of Experimental Psychology: Animal Behavior Processes*,*22*(1), 76–86. http://dx.doi.org/10.1037/0097-7403.22.1.76 - Brandstätter, E., Gigerenzer, G., & Hertwig, R. (2006). The priority heuristic: Making choices without trade-offs.
*Psychological Review*,*113*(2), 409–432. http://dx.doi.org/10.1037/0033-295X.113.2.409 - Breiman, L., Friedman, J.H., Olshen, R.A., & Stone, C.J. (1984).
*Classification and regression trees*. New York: Chapman & Hall. - Eisler, H., & Ekman, G. (1959). A mechanism of subjective similarity.
*Nordisk Psykologi*,*11*(1), 1–10. http://dx.doi.org/10.1080/00291463.1959.10780403 - Ericson, K.M.M., White, J.M., Laibson, D.I., & Cohen, J.D. (2015). Money earlier or later? Simple heuristics explain intertemporal choices better than delay discounting does.
*Psychological Science*,*26*(6), 826–833. http://dx.doi.org/10.1177/0956797615572232 - González-Vallejo, C., Reid, A.A., & Schiltz, J. (2003). Context effects: The proportional difference model and the reflection of preference.
*Journal of Experimental Psychology: Learning, Memory, and Cognition*,*29*(5), 942–953. http://dx.doi.org/10.1037/0278-7393.29.5.942 - Künnapas, T., & Künnapas, U. (1974). On the mechanism of subjective similarity for unidimensional continua.
*The American Journal of Psychology*,*87*(1/2), 215–222. http://dx.doi.org/10.2307/1422015 - Leland, J.W. (1998). Similarity judgments in choice under uncertainty: A reinterpretation of the predictions of regret theory.
*Management Science*,*44*(5), 659–672. http://dx.doi.org/10.1287/mnsc.44.5.659 - Leland, J.W. (2002). Similarity judgments and anomalies in intertemporal choice.
*Economic Inquiry*,*40*(4), 574–581. http://dx.doi.org/10.1093/ei/40.4.574 - Leland, J.W. (2013). Equilibrium selection, similarity judgments, and the nothing to gain/nothing to lose effect.
*Journal of Behavioral Decision Making*,*26*(5), 418–428. http://dx.doi.org/10.1002/bdm.1772 - Loh, W.-Y. (2011). Classification and regression trees.
*Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery*,*1*(1), 14–23. http://dx.doi.org/10.1002/widm.8 - Mechner, F. (1958). Probability relations within response sequences under ratio reinforcement.
*Journal of the Experimental Analysis of Behavior*,*1*(2), 109–121. http://dx.doi.org/10.1901/jeab.1958.1-109 - Morey, R.D. (2008). Confidence intervals from normalized data: A correction to Cousineau (2005).
*Tutorial in Quantitative Methods for Psychology*,*4*(2), 61–64. http://dx.doi.org/10.20982/tqmp.04.2.p061 - R Core Team (2017). R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. http://www.R-project.org/
- Rilling, M., & McDiarmid, C. (1965). Signal detection in fixed-ratio schedules.
*Science*,*148*(3669), 526–527. http://dx.doi.org/10.1126/science.148.3669.526 - Rouder, J.N., Speckman, P.L., Sun, D., Morey, R.D., & Iverson, G. (2009). Bayesian
*t*tests for accepting and rejecting the null hypothesis.*Psychonomic Bulletin & Review*,*16*(2), 225–237. http://dx.doi.org/10.3758/PBR.16.2.225 - Rubinstein, A. (1988). Similarity and decision-making under risk (Is there a utility theory resolution to the Allais paradox?).
*Journal of Economic Theory*,*46*(1), 145–153. http://dx.doi.org/10.1016/0022-0531(88)90154-8 - Rubinstein, A. (2003). Economics and psychology? The case of hyperbolic discounting.
*International Economic Review*,*44*(4), 1207–1216. http://dx.doi.org/10.1111/1468-2354.t01-1-00106 - Scholten, M., & Read, D. (2010). The psychology of intertemporal tradeoffs.
*Psychological Review*,*117*(3), 925–944. http://dx.doi.org/10.1037/a0019619 - Stevens, J.R. (2016). Intertemporal similarity: Discounting as a last resort.
*Journal of Behavioral Decision Making*,*29*(1), 12–24. http://dx.doi.org/10.1002/bdm.1870 - Stevens, S.S. (1975).
*Psychophysics: Introduction to its perceptual, neural and social prospects*. New York: Wiley. - Wagenmakers, E.-J. (2007). A practical solution to the pervasive problems of
*p*values.*Psychonomic Bulletin & Review*,*14*(5), 779–804. http://dx.doi.org/10.3758/BF03194105 - Welford, A.T. (1960). The measurement of sensory-motor performance: Survey and reappraisal of twelve years’ progress.
*Ergonomics*,*3*(3), 189–230. http://dx.doi.org/10.1080/00140136008930484