A remarkable feature of the human mind is its ability to improve itself continually. As helpless babies develop into mature adults, they not only acquire impressive perceptual and sensory-motor skills and knowledge about the world. They also acquire cognitive skills such as the abilities to perform mental arithmetic, plan, and problem-solve (van Lehn, 1996; Shrager & Siegler, 1998; Lieder & Griffiths, 2017; He et al., 2021; Jain et al., 2019). These abilities can be understood in terms of computational procedures that people perform on their mental representations of the external environment. Such computational procedures are known as cognitive strategies. Here, we focus on cognitive strategies for planning and refer to them as planning strategies. There are many different types of planning strategies that people can use. And as a person gains more experience they might switch from a less effective strategy to a more effective one. For instance, the first time a person plans a road trip they might start by thinking about which nearby location they might visit first, mentally simulating how good it would be to visit that location, then think about where they might go next, mentally simulating what it would be like to be there, and so on. By the time that this person plans their tenth road trip, she might start by mentally simulating especially attractive distant locations that the road should be designed to lead to. These two examples illustrate that people’s planning strategies draw on a shared set of elementary planning operations that mentally simulate states and actions but differ in what planning operation they perform under which conditions.

Developmental and learning-induced changes in how people think and decide are collectively known as cognitive plasticity. Just like the acquisition of perceptual skills (Hubel & Wiesel, 1970), the acquisition of cognitive skills requires specific experiences and practice (van Lehn, 1996; Ericsson et al., 1993). Despite initial research on how people acquire cognitive skills such as the abilities to perform mental arithmetic, plan, and problem-solve (van Lehn, 1996; Shrager & Siegler, 1998; Lieder & Griffiths, 2017; He et al., 2021; Jain et al., 2019), the underlying learning mechanisms are still largely unknown. Reverse-engineering how people discover effective cognitive strategies is very challenging. This is chiefly because it is impossible to observe directly people’s cognitive strategies or how people’s strategies and strategy choices change with experience – let alone the underlying learning mechanisms. Instead, cognitive plasticity has to be inferred from observable changes in behavior. This is difficult because any observed behavior could have been generated by many different cognitive mechanisms. This problem is pertinent to all areas of cognition.

We assume that each planning strategy performs a sequence of internal information gathering operations (Callaway et al., 2022b). Concretely, we assume that each of these planning operations mentally simulates what might happen if one took a particular action in a particular situation. We assume that the outcome of each simulation is the reward that the person expects the action to generate. Furthermore, we treat the mental simulation of each state-action pair as a separate planning operation. These assumptions make it possible to measure planning by externalizing the process of information gathering that would otherwise occur through memory recall and mental simulation (Callaway et al., 2017; Callaway et al., 2018; Callaway et al., 2022b). Building on this theory and a previous method for studying how people choose between alternatives with multiple attributes (Payne et al., 1993), we introduce a process-tracing paradigm for revealing the sequence of information gathering operations people perform during planning (see Fig. 1) and a computational method for inferring the underlying planning strategies (see Fig. 2). We will refer to these methods as the Mouselab MDP paradigm and our computational microscope.

Fig. 1
figure 1

Illustration of the Mouselab-MDP paradigm. This figure shows a three-step planning task that can be created within the Mouselab-MDP paradigm. Here, the participant has to choose a series of three moves. Starting from the central location, the first decision is whether to move left, up, or right (Step 1); in each case there is only one option for the second move (Step 2), and then the spider can turn either left or right in the third step. Rewards are revealed by clicking, prior to selecting a path with the arrow keys. At each node each of the four possible rewards is equally likely to occur

Fig. 2
figure 2

Illustration of the basic idea of measuring people’s planning strategies. The Mouselab MDP paradigm is a process-tracing method that utilizes mouse tracking to measure which pieces of information people inspect during planning and in which order they inspect them. The computational microscope is a model-based inference method that determines which of 79 different planning strategies the participant is most likely to have used on a given trial

Our process-tracing method renders people’s behavior in a route planning task highly diagnostic of their planning strategies by

requiring them to click on locations they consider visiting to find out how costly or rewarding it would be to do so (see Figure 1). That is, when a person clicks on the state that they would get to by taking a certain action in a certain state, we treat it as an indication that they just performed the corresponding planning operation. The Mouselab-MDP paradigm poses people a series of planning problems (one in each trial). For each trial, it records the sequence of clicks (planning operations) that the participant performed, which information each click revealed, and the plan that the participant selected based on the resulting information (see Fig. 3). As Fig. 3 illustrates, this makes it possible to observe how the type of planning operations a person performs and the order in which she performs them change from each trial to the next. Our computational microscope uses the resulting process-tracing data to perform model-based inference on the trial-by-trial sequence of planning strategies the participant used to make his or her decisions. Together, these two methods allow researchers to specify a planning task and directly measure how people’s planning strategies change from one trial to the next (see Fig. 2). To facilitate adoption of the toolbox, we provide JavaScript and Python libraries for both components and a tutorial on how to use them. We hope that this toolbox will help researchers measure how people’s planning strategies change depending on their experience.

Fig. 3
figure 3

Illustration of the process-tracing data that can be collected with the Mouselab-MDP paradigm. The recorded interactions (clicks and moves) the participant made and the information the participant observed are enumerated in the order in which they occurred. In this example, the first participant started out with a short-sighted planning strategy and gradually discovered a more far-sighted one. On the first trial she made two clicks on immediate outcomes on their first trial and then selected a path. In the last trial the first participant inspected three final outcomes. The process-tracing data from the intermediate trials documents the participant’s transition between these two very different ways of planning

People changing their planning strategies in response to how well they worked is a prime example of what we call metacognitive reinforcement learning (Krueger et al., 2017; Lieder & Griffiths, 2017; Lieder et al., 2018c; Jain et al., 2019; He et al., 2021). Metacognitive reinforcement learning is set of mechanisms through which people learn when to perform which cognitive operations through trial and error. These mechanisms might play an important role in how people discover new cognitive strategies, adapt their strategies to the structure of their environment, and acquire cognitive skills (Lieder & Griffiths, 2017; Krueger et al., 2017; Jain et al., 2019; He et al., 2021).

Metacognitive learning is difficult to study because its effects and mechanisms cannot be observed directly. Throughout this article we will present a series of case studies to illustrate that our new computational method is useful for characterizing how people learn how to plan and elucidating metacognitive reinforcement learning more generally.

The plan for this paper is as follows: First, we summarize and illustrate the functionality offered by our toolbox for measuring how people learn how to plan and explain how it works. Next, we provide a practical step-by-step user’s guide on how to apply it. We then demonstrate the reliability and validity of the inferences of our computational microscope.

In closing, we discuss directions for future work enabled by the methodology introduced in this article.

New methods for measuring how people learn how to plan

Planning, like all cognitive processes, cannot be observed directly but has to be inferred from observable behavior. This is generally an ill-posed problem. In previous work, researchers have inferred properties of human planning from the decisions participants ultimately made or asked participants to verbalize their planning process. However, many different planning strategies can lead to the same final decision, and introspective reports can be incomplete or inaccurate. In the 1970s researchers studying how people choose between multiple alternatives (e.g., apartments) based on several attributes (e.g., rent, size, location, etc.) faced a similar problem (Payne, 1976). To overcome this problem, Johnson et al., (1989) developed a process-tracing paradigm that elicits and records behavioral signatures of people’s decision strategies. Concretely, in the Mouselab paradigm (Payne et al., 1993), the alternatives’ attribute values are initially concealed and the participant can make clicks with their computer mouse to reveal one attribute value at a time. The Mouselab paradigm allows researchers to trace people’s decision strategies by recording which attributes of which alternatives people inspect in which order (Payne et al., 1993). While these behavioral signatures are still indirect measures of cognitive processes, and the means of observation might disturb the normal processes of decision-making, they do at least provide additional information about potential underlying decision strategies.

The Mouselab paradigm has enabled an extremely productive stream of research on the processes of multi-attribute decision-making (Payne et al., 1988; Ford et al., 1989; Payne et al., 1993; Schulte-Mecklenbeck et al., 2011; Schulte-Mecklenbeck et al., 2019). Here, we introduce two new methods that extend the process-tracing methodology from the domain of multi-attribute decision-making to the domain of planning. We start by describing a new process-tracing paradigm for measuring individual planning operations (Section 5). Measuring planning operations can yield valuable insights into how people plan (Callaway et al., 2017; Callaway et al., 2022b). But most research questions, such as how human planning compares to planning algorithms used in artificial intelligence, are not formulated at the level of individual planning operations but instead at the level of planning strategies.

Analyzing the data collected with our process-tracing paradigm suggested that people use a wide range of different planning strategies. We found that which strategy people use does not only depend on the structure of the environment (Callaway et al., 2018; Callaway et al., 2022b) but also on the participant’s learning history and individual differences. Concretely, we found that people may use as many as 79 different planning strategies across different environments and different points in time. These strategies prioritize different types of information, such immediate outcomes versus long-term consequences, highly uncertain outcomes, or outcomes following gains rather than losses, and they also differ in when they stop collecting more information (e.g., upon uncovering a path yielding a reward of at least $48). The resulting set of strategies includes variants of classic planning algorithms, such as breadth-first search, depth-first search, and best-first search, as well as several novel strategies, such as first identifying the best possible final outcome and then planning backward from it. The 79 planning strategies can be grouped into 13 different types, including goal-setting strategies with exhaustive backward planning, forward-planning strategies similar to breadth-first search, and forward planning strategies similar to best-first search (see Section A for a list of all strategies grouped by strategy type).

To make it possible for researchers to measure which strategies were used, we developed a computational method that leverages each participant’s process-tracing data to infer which strategy he or she used on the first trial, the second trial, the third trial, etc. We introduce this method in Section 5. The basic idea is to invert a probabilistic model of how the participant’s process-tracing data was generated by a series of planning strategies through Bayesian inference. This is a challenging methodological problem because people rarely execute any given strategy perfectly. We solve this problem by explicitly modeling the variability in the strategy that people use, in their execution of the strategy, and in the way the execution of the strategy manifests in their process-tracing data. In addition, we also model that there might be trials on which people don’t use any particular strategy or a strategy that is still unknown.

Our computational microscope can be applied to reveal people’s planning strategies in a wide range of different task environments. Used in combination, our two methods can be used to characterize the cognitive mechanisms of human planning, investigate how a person’s planning strategies evolve across trials, and uncover how planning strategies are affected by contextual factors and differ between individuals. Our methods support this research by providing trial-by-trial measurements of four aspects of human planning: the series of planning operations they performed, which of the 79 different planning strategies was the most likely source of those planning operations, which type of strategy it was, and how different types of previously postulated mechanisms (e.g., habits vs. Pavlovian mechanisms vs. reasoning) might have shaped a person’s planning on a given trial.

Figure 4 summarizes the information that our computational microscope provides the user about how a given participant planned in a given Mouselab-MDP experiment. The following sections illustrate each of these functionalities in turn.

Fig. 4
figure 4

Illustration of the hierarchically nested information that our method provides about a participants planning throughout the n trials of a Mouselab-MDP experiment. The participant’s learning trajectory is characterized by the sequence of planning strategies that the participant used on trial 1, trial 2, ⋯, trial n, respectively. The strategy the participant used on a given trial is characterized by a procedural description, the general type of planning strategy it instantiates, the sequences of clicks it performed on that trial, the plan that they selected on that trial, and how the influences of different decision systems and other factors combine to generate that strategy. Each click sequence comprises a series of clicks. Each click is characterized by where the participant clicked and which information (reward) their click unveiled. Timing data is also available

In this section we give a brief high-level overview of the functionality offered by our methods. The technical details are presented in the following section.

Measuring individual planning operations with the Mouselab-MDP paradigm

To make individual planning operations measurable, we developed a process-tracing paradigm that externalizes people’s beliefs and planning operations as observable states and actions (Callaway et al., 2017). We refer to this paradigm as the Mouselab-MDP paradigm because it extends the approach of the Mouselab paradigm (Payne et al., 1993) to a general class of planning tasks known as Markov Decision Processes (MDPs) (Sutton and Barto, 2018). A Markov Decision Process comprises a series of decisions. Given the current state (e.g., location) the agent has to choose an action that, together with the current state, determines both an immediate reward and the next state. The task is to maximize the sum of all rewards over time. Inspired by the Mouselab paradigm (Payne et al., 1993), the Mouselab-MDP paradigm uses people’s mouse-clicking as a window into their planning. As illustrated in Fig. 1, this paradigm presents participants with a series of route planning problems. Each route planning problem is presented as a map where each location (the gray circles), harbors a gain or loss. These potential gains and losses are initially occluded, corresponding to a highly uncertain belief state. The participant can (expensively) reveal each location’s reward by clicking on it and paying a fee. This is similar to looking at a map to plan a road trip. Clicking on a circle corresponds to thinking about a potential destination, evaluating how enjoyable it would be to go there, or perhaps how costly it would be to go through there on the way to somewhere else, and then adjusting one’s assessment of candidate routes accordingly. The set of revealed rewards constitutes the state of the participant’s knowledge which we will refer to as the belief state. The tasks in this paradigm are designed such that each planning operation requires the participant to make a specific click and each click is the output of a specific planning operation. Participants can make as few or as many clicks as they like. After that the participant has to select a route through the environment using the arrow keys. For each location they visit, the corresponding reward is added to their score. The task is to maximize the money earned by traversing the environment minus the fees paid for collecting information.

The Mouselab-MDP paradigm can be used to create a wide range of environments that vary in size, layout (structure), and reward distribution. Figures 17a-c, and 9 illustrate the variety of task environments that can be created with this paradigm. Several of the illustrative examples below and the experiments used to validate our methods are based on the simple three-step planning task shown in Fig. 1. Here, the participant can earn money by navigating a money-loving spider through a “web of cash”. There are six possible paths the participant can choose between. Each path comprises three steps, starts from the gray node in the center of the web, and proceeds along the arrows. In the first step, the spider can go left, up, or right. In the second step, it has to continue in that direction. In the third step, it can choose to either turn left or right. Each node that the spider might visit along the chosen path harbors a gain of up to $48 or loss of up to $-48. The player earns a monetary bonus proportional to the sum of the three rewards along the chosen path minus the fees they paid for clicking. In the beginning all gains and losses are concealed. The participant can uncover them for a fee of $1 per click. The participant can make as many or as few clicks as they like. Once they are done collecting information (planning), they start acting by moving the spider with the arrow keys. The participant receives the gain or loss at a given location if and only if they move the spider there. Clicking on a node only reveals the information which gain or loss they would receive if they moved to the inspected location but does not collect that reward. Furthermore, whether or not a node has been inspected has no effect on the reward the participant receives when the spider enters that location. Critically, in this particular three-step planning task, the variance of the potential rewards is smallest for the nodes that can be reached within one step, larger for the nodes that can be reached within two steps, and largest for the potential final destinations that are three steps away from the spider’s starting position at the center of the web (see Figure 1). This captures a common feature of real-world planning problems, namely that long-term outcomes are more important than short-term rewards.

The Mouselab-MDP paradigm can be used to elicit information about people’s planning operations at a level of detail which was inaccessible with previous behavioral paradigms. It makes it possible to measure which information people’s planning strategies consider in which order and how this depends on the information revealed by previous planning operations. Figure 3 illustrates the kind of process-tracing data that can be obtained with the Mouselab-MDP paradigm. The data from any given trial traces the strategy that an individual participant used to reach their decision on that trial. Taken together, the data from a series of trials traces how the participant’s decision strategy changed along with the observations and experienced rewards that preceded each change. Concretely, the example illustrated in Fig. 3 what the data might look for a participant who starts out with a myopic planning strategy and gradually discovers the optimal far-sighted goal-setting strategy.

A computational microscope for inferring people’s planning strategies

The fine-grained information about the planning operations obtained from the Mouselab-MDP paradigm can be used to draw much richer inferences about how people plan and how the way they plan changes over time. However, the raw click sequences are difficult to analyze directly without sophisticated and typically theory-laden modeling tools. The computational microscope is a computational method that makes it possible to characterize how the participants of your experiment planned at the level of planning strategies, strategy types, and the contributions of different decision systems and other factors. In this section, we first give an overview of the computational microscope’s functionality. We then give a detailed account of how this functionality is implemented and close with an illustrative example of how the computational microscope can be used.

Overview of the computational microscope’s functionality

The computational microscope makes use of the information about people’s planning operations collected with the Mouselab-MDP process-tracing paradigm to help us better understand how people plan and how their planning changes over time. It makes it possible to infer which of 79 known planning strategies a participant used on a given trial from their clicks in the Mouselab-MDP paradigm. The set of 79 planning strategies includes the strategy that does not plan at all, a strategy that only inspects the immediate rewards, a strategy that inspects only the potential final outcomes and terminates planning once it discovers a large positive value, a variant of this strategy that plans backward from the preferred final outcome, search-based planning strategies (Russell & Norvig, 2016), such as breadth-first search (i.e. first explore nodes that are one step away, then explore nodes at are two steps away, and so on) and best-first search (i.e., explore nodes in decreasing order of the values of the paths they lie on), a strategy that explores all final nodes that are farthest away from the start node, and many others. For the hypothetical data set illustrated in Fig. 3, our computational microscope would likely infer that the participant started with the myopic planning strategy that terminates upon uncovering a positive value (Strategy 53 described in Section A) and eventually discover the optimal goal-setting strategy (Strategy 6 described in Section A).

In addition to fine-grained information about concrete planning strategies, the computational microscope also provides high-level information about which kind of planning strategy the person is using. Concretely, the microscope distinguishes between 13 types of planning strategies: four types of goal-setting strategies that explore potential final outcomes first, a strategy that explores immediate outcomes on the paths to the best final outcomes, a satisficing version of that strategy, forward-planning strategies (i.e strategies that start planning from nodes that are one step away from the start node) similar to Breadth First Search, middle-out planning (i.e the strategies that click the nodes in the middle of a path, then click the nodes that are nearest to the start node and then click nodes that are the farthest away), forward-planning strategies similar to Best First Search, local search strategies that focus on information about subtrees and next or previous steps along the paths that have received the most consideration so far, frugal planning strategies (i.e strategies that explore very little or not at all), myopic planning strategies (i.e. strategies that only explore nodes that are one step away from the start node) and a few other strategies that do not fit any of these categories. The four types of goal-setting strategies differ in how many potential goals they inspect (all vs. some), in how many and which earlier outcomes they inspect (all vs. some), and in when and how often they transition between inspecting goals versus earlier outcomes. For instance, goal-setting with exhaustive backward planning inspects all potential goals and all earlier outcomes. By contrast, frugal goal-setting strategies only explore some of the potential goals and none or only a small number of the earlier outcomes. Maximizing goal-setting with limited backward planning first identifies an optimal final outcome and then either terminates planning or inspects only the nodes on the path leading to the best final outcome. By contrast, maximizing goal-setting with exhaustive backward planning inspects the paths to all potential goals in the order of the goals’ rewards after having inspected all potential goals.

For the hypothetical data set illustrated in Fig. 3 our computational microscope would likely infer that the participant started with a frugal planning strategy and eventually discovered a maximizing goal-setting strategy with limited backward planning. The definitions of these strategy types are presented in Section A.

The computational microscope’s functionality is realized through model-based probabilistic inference. The model comprises three components: probabilistic models of 79 planning strategies, a probabilistic model of how planning strategies generate click sequences (observation model) and a probabilistic model of the sequence of planning strategies (prior on strategy sequences). As shown in Fig. 5, our method assumes that which planning strategy (St) a participant uses can change from each trial (t) to the next but remains constant within each individual trial. In other words, we assume that exactly one planning strategy is used in each trial and that this strategy may be different from the one that was used in the previous trial and the one that will be used in the following trial. Furthermore, our method assumes that the strategies themselves do not change. Therefore, the computational microscope infers the trial-by-trial sequence of planning strategies that the participant used in the experiment (i.e., which strategy her or she used in the first trial of the experiment, which potentially different strategy he or she used in the second trial of the experiment, etc.). This sequence of planning strategies is inferred from the corresponding sequence of trial-by-trial click sequences (i.e., one click sequence for each trial). The basic idea is to find the sequence of planning strategies that is most likely to have generated the observed sequence of click sequences. The trial-by-trial changes in the relative influences of different decision systems and other factors can then be read off from the inferred strategy sequence because we make the simplifying assumption that way in which those factors interact to generate the behavior of a given strategy does not change over time. The computational microscope requires access to a set of planning strategies which generate the planning operations in a trial and models transitions among these strategies using a prior. We first describe how we formally model the planning strategies. We then describe the generative model of clicks (planning operations) given a strategy and then discuss how the computational microscope performs model inversion by taking into consideration information about participants’ clicks obtained from the Mouselab-MDP and the prior on strategy sequences to make inferences about the most likely sequence of strategies that might have generated the data. Obtaining the most likely sequence of strategies also gives us information about the strategy types and the temporal evolution of relative influence of decision systems (see Section 5).

Fig. 5
figure 5

Overview of the computational microscope describing the Hidden Markov model that generates the observed process-tracing data as a graphical model

Modeling planning strategies

To make it possible to extract interpretable strategies from the raw click sequences, we formulated a set of 79 planning strategies (\(\mathcal {S}\)) through a data driven methodology. Concretely, we manually inspecting the process-tracing data from an experiment in which participants completed 31 trials of the 3-step planning task illustrated in Fig. 1 (for description, see Appendix A.1). We visually inspected this data one click sequence at a time. Each time, we checked whether the current click sequence could be an instance of an already identified strategy. When this was not the case, we manually added an additional strategy to account for this new pattern. We then proceeded to the next click sequence and repeated the same procedure. If there was no apparent pattern, we identified it as an instance of a strategy that clicks randomly. We continued this process until our strategies were able to account for all click sequences of every participant who participated in the experiment described in Appendix A.1.

We modelled each of these planning strategies as a stochastic procedure that generates a sequence of planning operations (clicks). That is, a planning strategy specifies a probability distribution over what the first click might be and conditional probability distributions over what each subsequent click might be depending on which clicks were made previously and which rewards they revealed. For instance, the best-first search strategy distributes the probability of the first click evenly among the immediate outcomes and concentrates the probability of subsequent clicks on proximal outcomes that follow the best immediate reward(s). Furthermore, the planning strategy also specifies the conditional probability to terminate planning and select an action based on the information that has been revealed so far. For instance, for many of our planning strategies, the probability of terminating planning increases with the sum of the rewards of the best path that has been identified so far. As detailed in the next section, each planning strategy (s) entails a probability distribution (P) over which process tracing data (d) might be observed if a participant used that strategy (P(d|s)). Different strategies differ in which planning operations they perform first, in how they use the revealed information to select the subsequent planning operations, and in when they terminate planning. We model each sequence of planning operations a participant performed from the beginning of a trial to the end of that trial as the manifestation of a single strategy.Footnote 1

According to our model, all strategies are probabilistic in the sense that they randomly select between all functionally equivalent planning operations that are consistent with what the strategy does in the current step. For instance, when the first step of a strategy is to inspect immediate outcomes until it uncovers a positive value, then our model assumes that the strategy chooses uniformly at random between all planning operations that inspect an uninspected immediate outcome. For more details about the strategies, please see Appendix A.4.

We found that, collectively, the 79 planning strategies can capture people’s click sequences much better than the random strategy. Concretely, we found that, on average, each click made by a participant is 3 to 6 times as likely under the best fitting strategy than under the random strategy. That is, for the environment with increasing variance, the maximum likelihood estimate of people’s strategies achieve an average click likelihood of 0.38 whereas the random strategy achieves an average click likelihood of only 0.10. For the environment with constant variance (Fig. 7b), the average per click likelihood is 0.50 whereas it is 0.09 for the random strategy. For the environment with decreasing variance (Fig. 7a), the average per click likelihood is 0.37 whereas it is 0.08 for the random strategy. And finally, for the environment used in the transfer task (Fig. 7c), the average per click likelihood is 0.19 whereas it is 0.03 for the random strategy.

Modeling how strategy sequences generate process-tracing data

To develop an efficient computational method for inferring the temporal evolution of people’s planning strategies, we make the simplifying assumption that the trial-by-trial sequence of peoples’ cognitive strategies (S1,S2,⋯ ,ST) forms a Markov chain whose hidden states emit the observed process tracing data collected on each trial (d1,⋯ ,dT). This hidden Markov model requires additional methodological assumptions about i) how cognitive strategies manifest in process-tracing data, ii) the set of cognitive mechanisms that can be learned (defined in Section 5), and iii) the nature and amount of cognitive plasticity that might occur. The following paragraphs detail our assumptions about the components i) and iii) in turn.

Observation model

To plan in the Mouselab-MDP paradigm participants have to gather information by making a sequence of clicks. Our observation model thus specifies the probability of observing a sequence of clicks dt on trial t if the strategy was St (i.e., P(dt|St)).

To achieve this, we quantify each planning strategy’s propensity to generate a click c (or stop collecting information) given the already observed rewards encoded in belief state b by a weighted sum of 51 features (f1(b,c),⋯ , f51(b,c)). The features describe the click c relative to this information (e.g., by the value of the largest reward that can be collected from the inspected location) and in terms of the action it gathers information about (e.g., whether it pertains to the first, second, or third step). A detailed description of the features and strategies is available in Appendix A.6.

The depth feature, for instance, describes each click by how many steps into the future it looks. The features and weights jointly determine the strategy’s propensity to make click c in belief state b according to

$$ P(\mathbf{d_{t}}|S_{t})=\prod\limits_{i=1}^{|\mathbf{d_{t}}|} \frac{\exp\left( \frac{1}{\tau} {\sum}_{k=1}^{|w^{(S)}|} w_{k}^{(S)} f_{k}^{(S)}(c_{t,i},b_{t,i}) \right)}{{\sum}_{c \in \mathcal{C}_{b_{t}}} \exp\left( \frac{1}{\tau} {\sum}_{k=1}^{|w^{(S)}|} w_{k}^{(S)} f_{k}^{(S)}(c,b_{t,i}) \right)} , $$

where dt,i is the ith click the participant made on trial t (or the decision to stop clicking and take action), the decision temperature τ was considered as a hyperparameter which was set by the inference procedure, and w(S) is the weight vector of strategy S. According to this probabilistic soft-max model, all clicks are possible under each strategy in each situation but their probability is higher the better they are aligned with the strategy.

The strategies differ in how much information they consider (ranging from none to all to exploring all the nodes), which information they focus on, and in the order in which they collect it. Building on the observation model in Eq. 1, we represent each strategy by a weight vector \(\mathbf {w}=\left (w_{1},\cdots ,w_{51} \right )\) that specifies the strategy’s preference for features such as more vs. less planning, exploring nodes with more uncertainty vs. less, considering immediate vs. long-term consequences, satisficing vs. maximizing, avoiding losses (cf. Huys et al.,, 2012), exploring paths that have a larger number of explored nodes, exploring nodes that are related to already observed nodes such as the ancestor nodes, successor nodes and siblings, and other desiderata. These weights are computed by generating data by simulating which clicks each strategy would make and then fitting the weights in Eq. 1 using Maximum Likelihood Estimation (MLE). These weights span a high-dimensional continuous space with many intermediate strategies and mixtures of strategies. Cognitive plasticity could be measured by tracking how those weights change over time. But this would be a very difficult ill-defined inference problem whose solution would depend on our somewhat arbitrary choice of features. As a first approximation, our method therefore simplifies the problem of measuring cognitive plasticity to inferring a time-series of discrete strategies. A detailed description of the features used in the observation model can be found in Appendix 5

Prior on strategy sequences

Inferring a strategy from a single click sequence could be unreliable. To smooth out its inferences, our method therefore exploits temporal dependencies between subsequent strategies by using a probabilistic model of strategy sequences.

Transitions from one strategy to the next can be grouped into three types: repetitions, gradual changes, and abrupt changes. While most neuroscientific and reinforcement-learning perspectives emphasize gradual learning (e.g., Hebb, 1949; Mercado, 2008; Lieder et al.,, 2018c), others suggest that animals change their strategy abruptly when they detect a change in the environment (Gershman et al., 2010). Symbolic models and stage theories of cognitive development also assume abrupt changes (e.g., Piaget, 1971; Shrager & Siegler, 1998), and it seems plausible that both types of mechanisms might coexist.

We considered three kinds of priors on the strategy transitions: gradual, abrupt and a combination of gradual and abrupt transitions. We did not find any significant relationship between the probability of transition from one strategy to the next and the distance between the strategies (see Appendix A.2.1). We found that the frequency of a transition from a strategy to itself was more likely than a transition from a strategy to some other strategy (t(975) = 7.55, p < 0.0001,BF > 1000). Model selection using either AIC (Akaike, 1974) or BIC (Schwarz & et al. 1978) values computed using the likelihood values of the maximum likelihood estimate of the strategy sequence also revealed the abrupt prior to be the best performing. Therefore, we use the abrupt prior for all our inferences. The gradual and the mixed priors are described in Section 5.

The abrupt changes prior assumes that transitions are either repetitions or jumps.

$$ \begin{array}{@{}rcl@{}} &&P(S_{t+1}=s|S_t,m_{\text{abrupt}}) = \\ &&\ \ p_{\text{stay}}\mathbb{I}(S_{t+1}=S_t)+(1-p_{\text{stay}})\frac{\mathbb{I}(s \neq S_t)}{|\mathcal{S}|-1}, \end{array} $$

where \(\mathcal {S}\) is the set of strategies, \(|\mathcal {S}|\) is the number of strategies and pstay is the probability of strategy repetitions.

We model the probability of the first strategy as a uniform distribution over the space of decision strategies (i.e., \(P(S_{1})=\frac {1}{|\mathcal {S}|}\)).

Together with the observation model and the strategy space described above, the prior defines a generative model of a participant’s process tracing data d; this model has the following form:

$$ \mathit{P}(\mathbf{d},S_{1},\cdots,\mathit{S}_{T}) = \frac{1}{|\mathcal{S}|} \prod\limits_{t=2}^{T} \mathit{P}(S_{t}| S_{t-1} | m_{\text{abrupt}}) \mathit{P}(\mathbf{d_{t}}|S_{t}). $$

Inverting this model gives rise to a computational method for measuring an important aspect of cognitive plasticity.

Inferring strategy sequence by model inversion

Our model describes how the sequences of planning strategies a participant uses across the different trials of the experiment manifests in their process-tracing data. To measure this sequence of planning strategies, we have to reason backwards from the process tracing data d to the unobservable cognitive strategies S1,⋯ ,ST that generated it. To achieve this, we first model the generation of process-tracing data using a Hidden Markov Model with the 79 planning strategies as the possible values of its latent states and the prior m +abrupt as its transition prior. Having modelled how likely alternative strategies are to generate a given sequence of clicks, we can apply Bayes theorem to compute how likely a person is to have used different planning strategies given the clicks that they have made. More concretely, the computational microscope computes the sequence of strategies s1,s2,⋯ ,sT that is most likely to have given rise to the process-tracing data observed on the corresponding T trials (d1,d2,⋯ ,dT). This is achieved by applying the Viterbi algorithm (Forney, 1973) to compute the maximum a posteriori (MAP) estimate \(\arg \max \limits _{s_{1},s_{2},\cdots ,s_{T}} P(s_{1},s_{2},\cdots , s_{T} | \mathbf {d}_{1},\mathbf {d}_{2},\cdots ,\mathbf {d}_{T})\) of the hidden sequence of planning strategies S1,⋯ ,ST given the observed process tracing data d, the measurement model mabrupt, and the parameter (pstay of Eq. 2 and the strategy temperature parameter τ of the observation model. This inference combines the likelihood that a possible strategy would generate an observed click sequence with how probable potential sequences of planning strategies are a priori. The prior probability of strategy sequences is assigned based on the knowledge that people are often somewhat more likely to repeat the strategy they used on the previous trial than to switch an arbitrary other strategy.

To estimate the model parameter pstay we perform grid search with a resolution of 0.02 over pstay ∈ [0,1]. The value of τ is set using 50 iterations of Bayesian Optimization, with the likelihood of MAP estimate of the click sequence as the objective it maximizes. We use the Tree-structured Parzen estimator approach to Bayesian Optimization implemented in the hyperopt Python package (Bergstra et al., 2013) for optimizing the parameter τ.

Inferring the hidden sequence of cognitive strategies in this way lets us see otherwise unobservable aspects of cognitive plasticity through the lens of a computational microscope.

Inference on strategy types and meta-control

To understand what types of strategies people use, we grouped our 79 strategies using hierarchical clustering on the distances between the strategies. Since the strategies are probabilistic, we defined the distance metric Δ(s1,s2) between strategy s1 and s2 as the Symmetrised Kullback-Leibler divergence

between the distributions of click sequences and belief states induced by strategies s1 and s2 respectively, that is

$$ \begin{array}{@{}rcl@{}} {\Delta}{ (s_1, s_2)} &=& \text{JD}\left[p(\mathbf{d}|s_1),p(\mathbf{d}|s_2)\right] \\ &=& \text{KL}\left[p(\mathbf{d}|s_1),p(\mathbf{d}|s_2)\right] \\&&+ \text{KL}\left[p(\mathbf{d}|s_2),p(\mathbf{d}|s_1)\right], \end{array} $$

and approximated it using Monte-Carlo integration.

Applying Ward’s hierarchical clustering method (Ward, 1963) to the resulting distances suggested 13 types of planning strategies described in Section 5.

As discussed in Section 5, we assume that people’s choice of planning operations is shaped by the interactions of multiple decision systems and other factors. To measure the contribution of each factor in a strategy, we first assigned each feature to one of the decision systems. Then, for each decision system, we added the weights of the features which belonged to that decision system if the feature represented an increase in that decision system and subtracted it if it represented a decrease in that decision system to give us a weight wds for a decision system. The relative influence of the decision system on a strategy is measured by:

$$ \begin{array}{@{}rcl@{}} \text{RI}_{ds} = \frac{|w_{ds}|}{\sum\limits_{ds \in D}{|w_{ds}|}} , \end{array} $$

where D is the set of all decision systems.

An example of applying the computational microscope

To illustrate the functionality of our computational microscope, we applied it to data from an experiment evaluating intelligent tutors that teach people effective planning strategies (i.e., the experiment described in Appendix A.1). In this experiment participants practiced planning in the three-step decision task illustrated in Fig. 1 (see Section 5) for 10 trials (training block) and were then tested on 20 more trials of the same task (test block). Participants in the experimental conditions received two different types of feedback during the training block. Participants in the control condition received no feedback.

Table 1 lists all strategies that people used on at least 2% of the trials ordered by strategy type and frequency. As can be seen, the most common strategy types were maximizing goal-setting with limited backward planning, frugal planning, local search, myopic planning, frugal goal-setting, and other miscellaneous strategies that don’t belong to any other strategy type. These 6 types of strategies jointly accounted for 96.5% of all strategies that people used in this environment. For more information about these strategy types and the corresponding planning strategies, please see Appendix A.4.

Table 1 Summary of the planning strategies that people used most frequently in the environment illustrated in Fig. 1

Measuring the relative contributions of different decision systems and other factors

How people plan is shaped by the interaction of multiple different types of mechanisms throughout the decision-making process (van der Meer et al.,, 2012; Huys et al.,, 2012, 2015; Dolan & Dayan, 2013; Cushman & Morris, 2015; Keramati et al.,, 2016; Daw, 2018). In most real-life decisions it is infeasible or unwise to consider all possible sequences of actions, states, and outcomes. To decide which alternatives to consider and which ones to ignore, the model-based system relies on the recommendations of simpler mechanisms such as Pavlovian impulses (Huys et al., 2012), value estimates learned through model-free reinforcement learning (Cushman & Morris, 2015), and simple heuristics (Huys et al., 2015). Furthermore, previous findings indicate the existence of an additional decision system that is specialized for deciding between continuing to gather information (e.g., by foraging) versus acting on the information that is already available (Rushworth et al., 2012). Since deciding how to plan is like foraging for information, the decision when to stop planning might also be made separately from the decision how to plan. This decision can be made by determining whether the best plan identified so far is already good enough (satisficing) or other stopping criteria. In addition, people are also known to engage in metareasoning (Ackerman & Thompson, 2017; Griffiths et al., 2019) – that is reasoning about reasoning – to figure out what is the best way to figure out what to do. Furthermore, all else being equal, the way in which people decide seems to follow the law of least mental effort (Patzelt et al., 2019; Balle, 2002; Kool et al., 2010), that is people seek to avoid mental effort.

We assume that all of these factors simultaneously influence how a person selects his or her individual planning operations while making a single decision (Keramati et al.,, 2016; Huys et al.,, 2012, 2015; Daw, 2018). To measure the relative contributions of these different types of factors to each of the 79 planning strategies, we divided the features whose weights determine the strategies’ preferences for alternative planning operations into five categories: Pavlovian, model-free values and heuristics, model-based metareasoning, mental effort avoidance, and satisficing and stopping criteria.

The Pavlovian features report how attractive or repelling it is to think about a state based on the rewards and losses that precede or follow it. The category model-free values and heuristics includes structural and relational features of state-action pairs that people might come to associate with rewarded versus unrewarded planning operations. The features in the category model-based metareasoning are derived from a model of how alternative planning operations reduce the decision maker’s uncertainty about which plan is best. The category mental-effort avoidance includes a single feature that distinguishes between performing a planning operation (more mental effort) versus acting without further planning (less mental effort). The features in the category satisficing and stopping criteria describe conditions under which specific stopping rules would terminate planning, such as whether there is a path whose expected return exceeds $48 which is an instance of satisficing (Simon, 1955). For a detailed definition of these categories in terms of the constituent features see Appendix A.6. To measure the relative influence of these five types of factors on how a person planned on a given trial, we first sum up the weights that the inferred strategy assigns to features of this type to get a total weight for the type and then normalize its absolute value by the sum of absolute values of total weights of all types. Performing this calculation separately for first, second, third, ⋯, last trial allows us to track how the relative influence of different decision systems (i.e., the model-based system, the Pavlovian system, and model-free systems) and other factors (i.e., mental effort avoidance and stopping criteria) changes as people learn how to plan.

For the hypothetical data set illustrated in Fig. 3 our computational microscope would likely infer that the participant started out relying primarily on structural features (a sub-category of model-free values and heuristics), satisficing features, and mental effort avoidance. Furthermore, it would most likely infer that the participant then transitioned to relying increasingly more on model-based metareasoning features.

Measuring cognitive plasticity

Our method makes it possible to measure how people’s approach to planning changes at multiple levels of resolution across time scales ranging from seconds to decades. It can resolve changes in people’s planning at the level of individual planning operations, planning strategies, strategy types, and the contributions of different decision systems and other factors. By default, our method’s temporal resolution is the amount of time that passes from one trial to the next. This makes it suitable for reverse-engineering the learning mechanisms through which people discover and continuously refine their planning strategies (Jain et al., 2019). It can also measure how people’s approach to planning evolves over longer time scales, such as blocks, sessions, years, and decades. This makes the computational microscope suitable for investigating how people learn how to plan and how they adapt their planning strategies to new environments. Figure 6 illustrates the computational microscope’s ability to reveal how people’s propensities towards different types of planning strategies evolve as they learn how to plan in the task illustrated in Fig. 1; to obtain these results we applied the computational microscope to the data from the control condition of the experiment described in Appendix A.1. The output of the computational microscope revealed that the strategies which explore the final outcomes first and terminate upon finding a high value became the most frequent strategy type. During this transition people shifted away from frugal planning strategies (i.e., strategies that explore only a few outcomes) which were the most common strategies at the start of the experiment along with the myopic planning strategies (strategies that explore immediate outcomes first). The miscellaneous strategies also decreased in frequency. The frequency of local search (i.e., the strategies that focus on information about subtrees or paths that have been explored the most so far) and frugal goal-setting strategies (i.e., strategies that start exploring from the final outcomes and only explore a few outcomes) initially became more frequent and then decreased again.

Fig. 6
figure 6

Measured time course of frequencies of strategy types in the experiment described in Appendix A.1.

In addition, the computational microscope can also be used to measure the transfer of learning from one task to another. Traditionally, transfer effects are established by demonstrating the training’s effect on people’s average performance in an untrained task. The computational microscope makes it possible to determine whether people transfer the specific strategies they learned in the training task to untrained tasks. To illustrate this, we applied the computational microscope to data from a transfer experiment in which participants practiced planning in a simple, small environment and were then tested on a larger and more complex environment. Concretely, the participants in the second experiment from Lieder (2018b) performed the five-step planning task illustrated in Fig. 7c after having practiced planning in the three-step planning task illustrated in Fig. 1 with optimal feedback (experimental condition) or without feedback (control condition). As shown in Fig. 8, the computational microscope revealed that participants from both conditions transferred the near-optimal goal-setting strategy they had learned in the three-step planning task to the five-step planning task.

Fig. 7
figure 7

Illustration of the environment with decreasing variance (a), the environment with constant variance (b), and the five-step version of the environment with increasing variance (c). In the environment with decreasing variance, the rewards at the first, second, and third step are sampled uniformly at random from the sets {− 48,− 24,+ 24,+ 48}, {− 8,− 4,+ 4,+ 8}, and {− 4,− 2,+ 2,+ 4}, respectively. In the environment with constant variance, the rewards at all locations are independently sampled from the same uniform distribution over the set {− 10,− 5,+ 5,+ 10}. In the five-step planning task with increasing variance the rewards at steps 1 to 4 are drawn from normal distributions with mean 0 and standard deviation σ1 = 20, σ1 = 21, σ1 = 22, and σ1 = 23, respectively, and the reward at step 5 is drawn from a normal distribution with mean 0 and standard deviation σ5 = 25

Fig. 8
figure 8

Comparison of frequencies of strategy types between the environment with increasing variance and transfer task. For a detailed description of the strategy types see Appendix A.4

Furthermore, our approach can also be used to characterize how people’s approach to planning changes across the lifespan (Das et al., 2019). Finally, our method can also be used to detect and compare the effects of (pedagogical) interventions on how people learn how to plan and to elucidate inter-individual differences in metacognitive learning (e.g., in psychiatric disorders).

A step-by-step guide to measuring how people learn how to plan

Experimenters can make use of our paradigm and our computational microscope very easily. In this section, we provide a tutorial like introduction for running experiments with the Mouselab-MDP paradigm and applying the computational microscope on data generated using the Mouselab- MDP paradigm.

A step-by-step guide to creating and running process-tracing experiments with the Mouselab-MDP paradigm

Having motivated the paradigm, we briefly describe both the interface through which experimenters specify experiments, and the interface through which participants engage in the task. Two screenshots of the paradigm are shown in Fig. 9, and a live demo can be viewed at The code for Mouselab-MDP and an example of how to use it are available at

Fig. 9
figure 9

Two example paradigms created with the Mouselab-MDP plugin for JsPsych: a) Each state is labeled with the reward for reaching that state; these rewards become visible after they are clicked, with a $0.10 fee per click. b) The reward for making a transition is revealed only while the mouse is hovering over the corresponding arrow

On each trial, an environment is conveyed by an intuitive visualization (see Fig. 9). Formally, each environment corresponds to a directed graph with states as nodes and actions as edges. The participant navigates through the graph using the keyboard, attempting to collect the maximal total reward. States or edges are annotated with the reward for reaching the state or taking the action. Crucially, these labels may not be visible when the trial begins. Rather, the participant may need to click or hover their mouse over a state or edge to see the associated reward. The timecourse of these information-gathering operations provides fine-grained information about the person’s planning strategy. Furthermore, our paradigm allows researchers to investigate how people negotiate the tradeoff between the cost of thinking and its benefits. This can be done by manipulating the cost of information gathering; for instance by charging participants a certain number of points per click.

With the Mouselab-MDP jsPsych plugin, experimenters can create a planning experiment by specifying the following critical components:

  1. 1.

    graph is a mapping sA from a state s to action contingencies A. Each action contingency is a mapping \(a \mapsto (r, s^{\prime })\) from an action to a reward r and the next state \(s^{\prime }\). The graph structure thereby specifies the actions a available in each state, as well as the reward r and resultant state \(s^{\prime }\) associated with each action.

  2. 2.

    initial is the state in which the participant begins the trial.

  3. 3.

    layout is a mapping s↦(x,y) that specifies the location of each node on the screen.

Specifying only these settings will result in a graph with rewards shown on the edges between nodes and no labels on the states.

To take advantage of additional Mouselab features, the user must specify at least one of the following optional properties:

  1. 1.

    stateLabels is a mapping s that specifies the labels to be shown on each state.

  2. 2.

    stateDisplay∈ { ‘never’, ‘hover’, ‘click’, ‘always’ } specifies when state labels are displayed. When set to ‘click’, clicking on the state causes the label to appear and remain visible until the end of the trial. The optional parameter stateClickCost specifies the cost (a negative number) for clicking on a single state. When set to ‘hover’, the label appears only while the mouse is hovering over the associated edge. There is no cost for this option because the participant’s mouse might pass over an edge by accident.

  3. 3.

    edgeLabels is analagous to stateLabels, except that it defaults to the rewards associated with each edge.

  4. 4.

    edgeDisplay is analagous to stateDisplay. edgeClickCost specifies the cost.

Using this concise yet flexible plugin, various state-transition and reward structures can be displayed automatically. This allows experimenters to quickly create a large number of highly variable stimuli. Our plugin thereby enables experimenters with only basic knowledge of JavaScript to create a wide range of qualitatively novel experiments that can be run online with crowd-sourcing services such as Amazon Mechanical Turk.

Step-by-step guide on using the computational microscope

Given a data set collected with the Mouselab-MDP paradigm with uniform click costs and no edge rewards, our computational microscope can be used to obtain a detailed analysis of how the participants learned how to plan without any additional programming . Here, we provide a step-by-step guide to applying the computational microscope. To help users get started with the computational microscope without having to collect data first, the computational microscope comes with data from four experiments using the tasks illustrated in Figs. 1 and 7a-c, respectively. The computational microscope provides information about the strategy sequence, the amount of noise in the application of the strategy, the sequence of strategy types and the change in the relative frequency of decision systems. The computational microscope requires git and Python3 to be installed on the user’s machine. The following steps describe how to apply the computational microscope to a data set and the output it provides.

  1. 1.

    Access data sets and the source code of the computational microscope by cloning the corresponding github repository using the command:

    git clone Enhancement/ComputationalMicroscope.git

    The repository includes four data sets that are contained in the folder data/human/. For a detailed description of these data sets, see Table 2

  2. 2.

    Navigate to src/ and install the package requirements running the following command in the cloned repository’s root directory:

    pip3 install -r requirements.txt

  3. 3.

    Apply the computational microscope on any of the 4 data sets described in Table 2 using the following command:

    python3 <dataset> <block> <condition>

    The values that the parameters in the above command take can be found out by using the command:

    python3 help

    Here, the parameters <dataset>, <block> and <condition> define the name of the dataset, the block of the experiment which generated the dataset, and the condition of the experiment, the computational microscope is to be run on. Upon successful completion, a dictionary with the participant IDs as keys and the strategy sequences as its values are stored as a pickle file in the path "results/inferred_sequences/<dataset> _<block>_<condition>_strategies.pkl" and the corresponding noise parameter values, in the same format, are stored in "results/inferred_sequences/<dataset> _<block>_<condition>_temperatures.pkl".

    For example, to run the computational microscope on the test block of the dataset with increasing variance for participants who belong to the condition without feedback, run the following command:

    python3 increasing_variance train none

  4. 4.

    Analyze the generated sequences by running the command:

    python3 <dataset> <block> <condition>

    This command produces plots of the trial-by-trial changes in the frequencies of the top-5 strategies and strategy types, and in the influence of different decision systems and other factors. It integrates the data from all participants into the plots in the "results/<dataset>_plots" directory.

    Table 2 Datasets included in the computational microscope repository
    Fig. 10
    figure 10

    Generated analysis plots for training block of the no feedback condition of the increasing variance data set. a Influence of different decision systems and other factors. b Trial-wise changes in strategy type frequencies. c Trial-wise changes in strategy frequencies

    For example, the following command generates the plots shown in Fig. 10.

    python3 increasing_variance test none

The computational microscope, in its current implementation, can be applied to task structures that are symmetric and do not have cycles. But the general approach described in this article works for arbitrary environments. The implementation and a detailed tutorial on applying the computational microscope to a custom dataset are available at

Does it work?

To test whether using the computational microscope in conjunction with the Mouselab-MDP paradigm is a reliable way to measure how people plan, we test this approach using simulations and empirical data. First, we perform simulations to test our hypothesis that the Mouselab-MDP paradigm yields so much information about how people plan that it becomes possible to accurately infer which planning strategy they used on a single trial and how that strategy differed from the strategies that the participant used on the preceding trial and on the following trial. In follow-up simulations we then assess whether this is also true for the relative contributions of different decision systems. Following these simulation studies, we test whether the inferences of our method are valid measures of planing and learning by applying it to empirical data from studies where planning and learning were experimentally manipulated.

Simulation studies

To test if our experimental paradigm makes it possible to infer people’s planning strategies on a trial-by-trial basis, we simulated which process-tracing data we would obtain in a Mouselab-MDP experiment depending on which strategies people use and how those strategies change from each trial to the next. We then applied our computational microscope to the simulated process-tracing data to test if that data would be sufficiently informative about the underlying planning strategies that we would be able to infer them correctly. Concretely, we report two sets of simulations suggesting that our method can accurately measure changes in people’s planning strategies and the relative influence of different decision systems, respectively.

Is the process-tracing data from the Mouselab-MDP paradigm sufficiently informative about people’s planning strategies?

We simulated a Mouselab-MDP experiment with 31 trials of the 3-step planning task illustrated in Fig. 1 and described in Section 5 for various different sequences of planning strategies. We derived six sets of sequences of planning strategies from five different models of how people might learn how to plan. To generate the first data set, we applied the rational model of strategy selection learning by Lieder and Griffiths (2017); the parameters of this model were fit to the data from 57 participants performing 31 trials of the 3-step planning task illustrated in Fig. 1 (i.e., the control condition of the experiment described in Appendix A.1). We created four additional data sets by modeling the temporal evolution of people’s planning strategies as gradual learning, insight-like learning, a mixture of both gradual and insight-like learning, or a random process that chooses the strategy on each trial independently at random (random model). In all cases, the generation of the strategy sequence and the generation of each click sequence given the sampled strategy involved a considerable amount of randomness that matched or exceeded the variability observed in human data. For a more detailed description of how the data was generated, please see Section A in the Appendix. To avoid bias towards any one of the five models, we used each of them to generate a data set with 500 simulated participants completing 31 trials each. We then combined the resulting five data sets into a single data set from 2500 simulated participants.

We then used our computational microscope to compute the maximum a posteriori estimate of the sequence of strategies for each participant and compared it to the ground truth sequence of strategies. We evaluated the informativeness of our process-tracing paradigm in terms of how accurately the strategies and strategy types could be inferred from the simulated process-tracing data. We found that the process-tracing data made it possible to infer the true strategy for 80 ± 0.01% of the trials and to infer the true strategy type for 92 ± 0.00% of them. These findings suggest that our experimental paradigm yields so much information that we can hope to be able to infer people’s planning strategies on a trial-by-trial basis. Furthermore, these results suggest that we have implemented our computational method correctly and that the 79 candidate strategies are different enough that it is possible to discern between them. For a detailed description of model-wise strategy and strategy type accuracies, please see Appendix A.3.

Validation of measuring the contributions of different decision systems and other factors

We validated our method’s ability to recover the trend in the relative influence of different decision systems and other factors across a series of 79 trials. Each simulation assumed one of three possible trends: increasing influence, decreasing influence, or constant influence. For each factor, for the increasing and decreasing trends, we created a sequence of 79 strategies in which each strategy appears only once and the order of the strategies in the sequence is the sorted order of the contribution of the factor to the corresponding strategy. We then generated a dataset of 500 sequences of click sequences. For the constant case, for each factor, we partitioned the set of strategies into up to 3 groups based on the 33rd, 67th and 100th percentiles of the relative influence of the factor across all strategies. We validated our microscope on 500 simulated sequences. To generate a sequence, we randomly selected one of the three groups to generate sequences from and then sampled 79 strategies from that group and arranged them in sequence. Figure 11 shows that our computational microscope recovered the trends in the relative influence of the decision systems and other factors very accurately.

Fig. 11
figure 11

Smoothed plots for comparison of the actual and inferred trends in the relative influence of different decision systems and other factors. The computational microscope was applied to click sequences generated from strategy sequences where the weight of one of the five factors was systematically increasing (top row), decreasing (center row), and constant (bottom row) respectively. Each line is based on a different strategy sequence

Validation on empirical data

We also validated our computational microscope on empirical data, that is we tested whether it can detect the effects of experimental manipulations and task structure on people’s planning strategies and metacognitive learning.

Detecting the effect of feedback on cognitive plasticity

To verify whether our computational microscope can detect the effect of an experimental manipulation expected to promote cognitive plasticity, namely feedback, we applied it to the Mouselab-MDP process-tracing data from the experiment described in Appendix A where 164 participants solved 30 different 3-step planning problems of the form shown in Fig. 1. Participants in the control condition received no feedback whereas participants in the first experimental condition received feedback on their actions (Action FB) and participants in the second experimental condition received feedback on how they made their decisions (Metacognitive FB). Action FB stated whether the chosen move was sub-optimal and included a delay penalty whose duration was proportional to the difference between the expected returns of the optimal move versus the chose one. In contrast to Action FB, Metacognitive FB pertains to how the decisions are made rather than to the decisions themselves. Metacognitive FB is given after every information gathering operation (click). It has two components that convey the informational value of the planning operation and the planning operation that the optimal strategy would have chosen, respectively.

This metacognitive feedback was designed to be more effective than action feedback at teaching people the optimal planning strategy for the task illustrated in Fig. 1. This strategy (Callaway et al., 2018) starts by searching the potential final destinations for the best possible outcome and terminates planning when it finds one of them.

Fig. 12
figure 12

Comparison of frequencies of forward-planning and near-optimal strategies across different types of feedback in the experiment described in Appendix A.1. The green, orange and the blue lines represent the metacognitive feedback, action feedback and the no feedback conditions respectively. The circles represent the forward planning strategies and the stars represent the near-optimal planning strategies

As Fig. 12 shows, the computational microscope correctly detected that feedback boosted metacognitive learning. Concretely, the computational microscope revealed that metacognitive feedback boosted the discovery of the optimal planning strategy (58% vs. 31% in the no feedback condition, z = 15.44,p < 0.0001, BF > 1000)Footnote 2 and decreased people’s propensity to start planning by considering immediate outcomes, i.e. forward planning (2% vs. 14% in the no feedback condition, z = − 13.27,p < 0.0001,BF > 1000) whereas action feedback reduced the frequency of the near-optimal planning strategy (24% vs. 31% in the no feedback condition, z = − 4.74,p < 0.0001,BF > 1000) and did not change the frequency of the forward planning strategies (15% vs. 16% in the no feedback condition, z = 1.00,p = 0.3193,BF = 0.10 ).

The computational microscope allows us to gain additional insights into how those changes in people’s strategies come about. Concretely, correcting for multiple comparisons (αsidak = 0.0034) and applying Wilcoxon-signed rank test, Fig. 13 shows that metacognitive feedback significantly accelerated people’s transition to choosing their planning operations increasingly more based on the model-based metareasoning system (T = 248,p = 0.0004,BF = 65.31), the Pavlovian system (T = 276,p = 0.0007,BF = 38.15), and the system for deciding when to stop planning (T = 82,p < 0.0001,BF = 23568.70). This makes sense because the structure of the environment makes it beneficial to inspect nodes that are most uncertain (a feat accomplished by the metareasoning system), explore nodes that lie on the path to the most valuable nodes (as recommended by the Pavlovian system), and to stop as soon as a very good path has been identified (a feat that accomplished by the system for deciding when to stop). Also, Metacognitive feedback, in general, drove people towards planning more by reducing the amount of mental effort avoidance (T = 1.0,p = 0.0001,BF = 167.25). Action FB, by contrast,

Fig. 13
figure 13

Temporal evolution of the relative influence of different decisions systems and other factors in the control condition without feedback (a), the experimental condition with metacognitive feedback (b), and the experimental condition with action feedback (c), respectively

drove people towards relying more on the Pavlovian system (T = 183,p = 0.0004,BF = 1236.80), and the decision system for deciding when to stop planning (T = 134,p = 0.0001,BF = 685.42) and relying less on the model-free values and heuristics (T = 229,p = 0.0004,BF = 172.56) decision system. In the condition without feedback, people relied increasingly more on the Pavlovian system (T = 148,p < 0.0001,BF = 1852.39), the system for deciding when to stop planning (T = 173,p < 0.0002,BF = 647.56) and on the model-based metareasoning system (T = 206,p = 0.0012,BF = 38.51) but less significantly when compared to the metacognitive feedback condition.

The computational microscope also provides insights into which unique strategy types people go through during learning (learning trajectories) and how this is affected by feedback. Overall, we found that 86% of people’s learning trajectories were unique. However, when we zoom out to the level of strategy types, the computational microscope reveals several common learning trajectories (see Table 3).

Table 3 Common trajectories of strategy types by the type of feedback participants received

We found that the number of strategy types people go through from their initial strategy to the final strategy was lower when participants received metacognitive feedback than when they received action feedback (t(107) = − 3.73, p = 0.0002,BF = 161.30) or no feedback (t(107) = − 2.65,p = 0.0046,BF = 8.77). We found no significant difference between the Action FB and the No Feedback conditions (t(106) = 1.46,p = 0.0737,BF = 0.09)

Measuring how people’s planning strategies differ depending on the structure of the environment

Previous work has shown that people adapt their cognitive strategies to the structure of the decision environment (Payne et al., 1993; Callaway et al., 2018; Lieder & Griffiths, 2017; Gigerenzer & Selten, 2002). Here, we verify that our method is able to detect differences in people’s strategies across the four environments described in Section 5. To do so, we applied the computational microscope to the process-tracing data participants generated in the test blocks of the corresponding experiments after they had learned about their respective environment in the training block (see Table 2). Because participants went through a sufficiently large number of training trials, we observed that participants’ planning strategies were stable. As shown in Table 4, the computational microscope revealed that people adapted their planning strategy to the structure of their environment. These differences are systematic in the sense that how people’s strategy choices differ across environments roughly corresponds to how the strategies’ performance differs across those environments. To quantify this, we report the relative performance (rrel) of the most common strategies relative to the best-performing strategy of each environment. The performance of each strategy (ri) was determined by running 100,000 simulations, and then normalized according to \(r^{\text {rel}}_{i}=\frac {r_{i}- \min \limits _{j} r_{j}}{\max \limits _{j} r_{j} - \min \limits _{j} r_{j}}\).

Table 4 Summary of the performance of the most frequent strategies across four different environments

For both environments with increasing variance, our computational microscope detected that the most common strategy was the near-optimal goal-setting strategy which exploits that the most distant rewards are most variable.

By contrast, people almost never used this strategy in any of the other environments. For the environment with decreasing variance, our computational microscope detected that people primarily use strategies that exploit the structure of this environment by prioritizing its immediate outcomes.

For the environment with constant variance, the computational microscope detected that after inspecting all immediate outcomes the second most frequent strategy performs Best-First Search with Satisficing, which is adaptive in this environment (Callaway et al., 2018), although the most commonly used strategy was not particularly adaptive.

These results show that the computational microscope can reliably reveal how the planning strategies people use differ depending on the structure of the environment. Furthermore, comparing the strategies the computational microscope inferred for the 5-step version of the increasing variance environment that was used as a transfer task to the 3-step version of that environment that was used as a training task suggests that the computational microscope can reveal the transfer of learning across environments.

Equally, the strategy types inferred by our computational microscopes were consistent with previous findings suggesting that people adapt their decision strategies to the structure of the environment (Payne et al., 1993; Callaway et al., 2018; Lieder & Griffiths, 2017; Gigerenzer & Selten, 2002). Table 5 shows the performance and frequency of the inferred strategy types in decreasing order of their frequency for each of the 4 environments. The performance of a strategy type was determined by the weighted average of the performances of the strategies belonging to that strategy type where the weight of a strategy is the relative frequency of the strategy among the strategies belonging to the cluster. As expected, we find that in both increasing variance environments, people primarily rely on strategies that prioritize the potential final outcomes. For the environment with decreasing variance, the computational microscope inferred that most people used the strategy type that is best adapted to this type of environment, namely myopic planning strategies. For the environment with constant variance, the computational microscope inferred that forward planning strategies similar to best first-search was the second most frequently type of planning strategies. The most common strategy type was “Myopic Planning” which includes several strategies that are similar to Best First Search (see Section 5).

Table 5 Summary of the performance of the most frequent strategy types for four different environments

Overall, the results in Tables 4 and 5 illustrate that our computational microscope makes it easy for researchers to describe both the adaptiveness of human planning and its limits.


We have developed a computational process-tracing method that allows us look at how people plan and how their planning strategies change over time. Our method extends the Mouselab paradigm for tracing people’s decision strategies (Payne et al., 1993) in three ways. First, it progresses from one-shot decisions to sequential decision problems. Second, it introduces computational methods for analyzing process tracing data in terms cognitive strategies. Third, we have extend the approach to measuring how people’s planning strategies change over time. Our method is easy to use and freely available. We have successfully evaluated our methods using simulations and human data. The results suggest that our computational microscope can measure cognitive plasticity in terms of the temporal evolution of people’s cognitive strategies and also provide us with valuable information about the trends in changes of strategies, strategy types and also how people change their strategies with changes in environments. We have applied our computational microscope to a number of data sets. The results of these analyses contribute to a more detailed understanding of how people plan and revealed some interesting empirical characteristics of metacognitive learning.

Our method can be used to study many different types of cognitive change across a wide range of different timescales. This makes it suitable for investigating learning, cognitive development, decision-making, individual differences, and psychopathology.

We are optimistic that computational microscopes will become useful tools for investigating the learning mechanisms that enable people to acquire complex cognitive skills and shape the way we think and decide. This will be an important step towards reverse-engineering people’s ability to discover and continuously refine their own algorithms. From a psychological perspective, this line of work might also help us understand why we think the way we do and lead us to rethink our assumptions about what people can and cannot learn. Developmental psychologists could use our method to trace the development of cognitive strategies across the lifespan and elucidate how learning contributes to those developmental changes. Similarly, clinical psychologists and computational psychiatrists could apply it to trace how person’s cognitive strategies changes as they develop and recover from different mental disorders. Importantly, our method can also be used to investigate how cognitive plasticity depends on the learning environment, individual differences, age (Das et al., 2019), time pressure, motivation, and interventions – including feedback, instructions, and reflection prompts. Using our method to measure individual differences in cognitive plasticity might reveal why equivalent experience can have fundamentally different effects on the psychological development of different people. This, in turn, can help us understand why some people are predisposed to develop certain cognitive styles, personalities, and mental disorders. Applications in computational psychiatry might use this approach to understand the development of mental disorders and to create computational assays for detecting whether a person is at risk for developing specific forms of psychopathology long before its symptoms occur.

To facilitate these applications, future work might extend the proposed measurement model to continuous strategy spaces, a wider range of tasks and strategies, and learning at the timescale of individual cognitive operations. In addition, future work will also leverage our computational microscope to elucidate individual differences in cognitive plasticity within and across psychiatric conditions and different age groups. We will also work on making our inferences more precise by learning models of strategies and strategy transitions from human data. To move towards a more naturalistic planning task, future versions of our method could present participants with fully-revealed environments and infer their planning strategies from eye-tracking data. The computational approach could be analogous to the one presented here instead that clicks are replaced by saccades.

The ideas of our approach are not entirely novel. Process-tracing has already been extensively used to study people’s decision strategies (Payne et al., 1993; Schulte-Mecklenbeck et al., 2011; Schulte-Mecklenbeck et al., 2019) and Bayesian inference has been used to infer which decision strategies are include in individual participants’ repertoire (Scheibehenne et al., 2013), when people switch between different decision strategies (Lee & Gluck, 2021), and which strategies people use in economic games Costa-Gomes and Crawford (2006), Crawford (2008), and Costa-Gomes et al., (2001). Our method has several advantages.

What differentiates our approach from the original Mouselab paradigm (Payne et al., 1993) is that it measures how people plan and that we infer people’s strategies from the process-tracing data. On a high level, the Bayesian Toolbox approach by Scheibehenne et al., (2013) also infers people’s strategies. Their approach infers which strategies are included in the person’s repertoire. However, it does not attempt to resolve which strategy was used on which trial. Instead, it makes the simplifying assumption that every decision is influenced by all strategies that are in the person’s toolbox. By contrast, our method makes the different assumption that on each trial each participant draws a single strategy from the toolbox. Based on this assumption, our method infers which individual strategy a participant used on the first trial, which individual strategy they used on the second trial, and so on.

The methods developed by Lee and Gluck (2021) and Lee et al., (2019) are more similar to our method in that they infer which strategy each participant used on each trial of the experiment. The main difference is that these methods were developed for studying multi-cue decision-making whereas our method was developed for studying planning. The method by Lee et al., (2019) has the advantage that it uses process-tracing data, verbal reports, and choices whereas our method exclusively relies on the process-tracing data. While our method and Lee et al., (2019) analyze the data of each participant individually, the method by Lee and Gluck (2021) additionally performs inference at the group level and constrains inferences about individual participants by the characteristics of the group. Furthermore, the method by Lee and Gluck (2021) additionally infers two aspects of the generative model of strategy sequences from the data, namely the probabilities of possible initial strategies and the probabilities of possible strategy transitions. The main advance of our method is that it differentiates between a much larger number of different strategies (79 vs. 4). Furthermore, we examined multiple alternative models of strategy transitions and validated our method on data from multiple different experiments that varied the decision environment and induced systematic learning-induced changes in people’s strategies over time.

Finally, the approaches that have been developed to infer which strategies people use in economic games (Costa-Gomes & Crawford, 2006; Crawford, 2008; Costa-Gomes et al., 2001) assume that each person always uses the same strategy and cannot measure how a person’s strategy changes over time. Furthermore, the strategies these methods measure are specific to strategic social interaction. The strategies people use in tasks such as planning a road trip or project are very different. Therefore, studying them requires a different methodology such as the one we have developed in this work.

In conclusion, the approach introduced in this article complements these existing approaches in useful ways that make it possible to measure people’s planning strategies and how they discover them.

Our methods are not without limitations. First and foremost, the Mouselab-MDP paradigm inherits at least one of the limitations of the Mouselab paradigm that it is based on. Concretely, the Mouselab-MDP paradigm might change how people plan by making information acquisition costlier than it might otherwise be. Previous research comparing Mouselab-based measures of people’s decision processes against equivalent measures based on eye-tracking found that the increased cost of information acquisition in the Mouselab paradigm led people to acquire less information and, to some extent, it also changed the order in which people acquire information (Lohse & Johnson, 1996). We believe that it is likely that similar differences also exist for the Mouselab-MDP paradigm. As Lohse and Johnson (1996) pointed out, such differences are more important for some research questions than for others. Following the logic of their analysis, we believe that there are many important questions about planning and metacognitive learning that are unaffected by such differences. Concretely, our method should be well-suited to characterize the qualitative effects of experimental manipulations on planning and learning as long as it can be expected that the qualitative effects would be the same if the cost of information acquisition was lower. Regardless thereof, we believe that comparing the process-tracing data collected with the Mouselab-MDP paradigm to corresponding process-tracing data based on eye-tracking is an interesting direction for future work.

A perhaps more provocative possibility is that the planning environment that the Mouselab-MDP paradigm seeks to emulate is one in which people cannot simply look up what the outcomes of their actions would be but have to estimate them through effortful mental simulations. In this sense, it is conceivable that the Mouselab-MDP paradigm is closer to the real-world problem that is designed to mimic than an equivalent eye-tracking paradigm would be. This suggest that future work should compare the plans that people arrive at when they have to rely on mental simulations to the plans that they arrive at when those mental simulations are externalized with the Mouselab-MDP paradigm.

One limitation of our computational microscope is that its current implementation requires that the task environment is symmetric and has no circular paths in it. This is because of the features defined in Eq. 1 are computable currently only for such structures. Generalizing the implementation of the computational microscope so that it can be applied to other kinds of environments may be a worthwhile direction for future work.

In summary, our method makes it possible to more directly observe the previously hidden phenomenon of cognitive plasticity in many of its facets – ranging from skill acquisition, learning to think differently, cognitive decline, self-improvement, changes in cognitive dispositions, and the onset, progression, and recovery from psychiatric symptoms and mental disorders. In conclusion, we believe that the method introduced in this paper can be used to advance cognitive science, psychology, and psychiatry in many promising ways.