States of nature and states of mind: a generalized theory of decision-making

Canonical economic agents act so as to maximize a single, representative, utility function. However, there is accumulating evidence that heterogeneity in thought processes may be an important determinant of individual behavior. This paper investigates the implications of a vector-valued generalization of the Expected Utility paradigm, which permits agents either to deliberate as per Homo economicus, or to act impulsively. This generalized decision theory is applied to explain the crowding-out effect, irrational educational investment decisions, persistent social inequalities, the pervasive influence of non-cognitive ability on socio-economic outcomes, and the dynamic relationships between non-cognitive ability, cognitive ability, and behavioral biases. These results suggest that the generalized decision theory warrants further investigation.

reject the hypothesis that thought-process heterogeneity can be adequately modeled as if it was taste heterogeneity (Swait and Bernardino 2000;Swait and Adamowicz 2001;Hess et al. 2012;Kaplan et al. 2012;Vij and Walker 2014), which suggests that some decision situations may admit multiple distinct utility representations. Indeed, mixture models have already been shown to significantly increase explanatory power by operationalizing heterogeneous decision-making between individuals, even accounting for their greater degrees of freedom (Conte et al. 2011). There is also considerable evidence that individual decision-makers do not consistently apply the same decision criterion, either across time or across decision situations (Luce and Raiffa 1957;Loewenstein et al. 2003;DellaVigna 2009). We recognize these empirical findings by analyzing the simplest possible generalization of Expected Utility Theory that can admit heterogeneous thought processes, both between individuals and within individuals.
Our generalized decision theory incorporates two distinct utility formulations that represent two distinct thought processes. On an individual level, this approach naturally operationalizes the quintessential human conflict between deliberative and impulsive thought processes, which motivates the extensive psychological 'dual-self' literature, and which has recently been popularized by Kahneman (2011) and Peters (2012). 1 Thus, under the proposed model, agents may deliberate as per Homo economicus, but they may alternatively act on impulse, and each agent's propensity to act deliberatively is modeled as an individual- and situation-specific probability distribution. This generalized decision theory sits within the class of general random utility models that was formally defined by Manski (1977), although it is distinguished from existing theory since the choice problem generating process is explicitly modeled. 2 We apply our generalized decision theory to explain the empirical phenomenon of crowding out, and to explain a series of interrelated empirical anomalies within the broad field of human capital development. The former application demonstrates the importance of situation-specific heterogeneity in thought processes, whilst the latter demonstrates the importance of individual-specific heterogeneity in thought processes. In particular, Sect. 2 shows that crowding out will arise whenever the provision of an additional extrinsic incentive raises the probability that an undesirable extrinsic thought process will be adopted rather than a desirable intrinsic thought process. Meanwhile, Sect. 4 shows that an individual who acts impulsively rather than deliberatively could achieve educational and employment outcomes that are divergently below normatively optimal predictions.
1 An exposition of the dual-self paradigm is provided by Kahneman and Frederick (2002); a review of psychological theories based upon it is provided by Alós-Ferrer and Strack (2014); and a review of the existing economic dual-self literature is provided by Embrey (2019), who concludes that most existing approaches are nested within the generalized model presented here.
2 In Manski (1977), the choice problem generating process is an arbitrary probability distribution over the set of possible (choice-set, decision-rule) pairs. When studies such as Costa-Gomes and Crawford (2006) empirically designate individuals with a particular decision criterion, they are implicitly assuming the existence of some individual- and situation-specific choice problem generating process. This paper explicitly models that same process, and allows it to evolve as children develop into adults.
Our generalized decision theory may be particularly salient when it is applied to the decision-making of children. Although each of us will, on occasion, act without first considering the consequences of that action, children are particularly likely to do so. 3 It has already been observed on an intuitive level that this tendency could lead to normatively suboptimal levels of educational investment (Lavecchia et al. 2015). Our results in Sects. 4 and 5 provide a theoretical basis for that intuition, and they also provide intuitive yet original explanations for several other empirical truths.
These include: chronic unemployment, strong inter-generational persistence of social inequalities, dynamic complementarity between cognitive and non-cognitive abilities, divergent developmental pathways dependent upon small changes in early-life experiences, and an explanation for the observed relationships between IQ, Cognitive Reflection (as measured by the Cognitive Reflection Test of Frederick 2005), behavioral biases (including present bias and risk aversion), and other social outcomes (such as health and financial decision-making). These findings suggest that the generalized decision theory has the potential not only to improve our understanding of specific behavioral anomalies, but also to bring together diverse strands from the existing literature.
The concept that deliberative thought processes may not always override individuals' impulsive responses is not new; it was discussed by Plato (ca. 380 B.C.E. 1906), Smith (1759), and Marshall (1890) amongst others. However, the psychological literature has only recently converged toward a default-interventionist paradigm to formalize that concept (Evans and Stanovich 2013). 4 The default-interventionist paradigm is also closely aligned with the perspective which Bechara (2005) distils from the neuroscientific literature; however, it is at odds with all existing economic dual-self theories (Embrey 2019). Moreover, the existing economic dual-self literature maintains the Neoclassical assumption of thought-process homogeneity, either by assuming that some meta-rational process mediates between the alternative utility formulations (e.g., Fudenberg and Levine 2006), or by assuming that context alone perfectly determines which utility formulation will predominate (e.g., Thaler and Shefrin 1981).
The proposed model is, therefore, most closely related to those of Laibson (2001) and Bernheim and Rangel (2004), since, although neither is framed as a dual-self theory, each describes addiction as an alternative, flawed, decision process. Nevertheless, in those models, addictive thinking is triggered whenever an external cue is received, which leads their authors to focus on a representative and completely informed agent's rational response to that situation. 5 By contrast, the present model emphasizes the dynamic consequences of individual heterogeneity in thought processes, for everyday situations where individuals may not even be aware that they have made a decision, much less possess complete knowledge of their own decision processes. Such situations cannot be characterized by any single representative agent, unless impulsive and deliberative decision processes happen to coincide.
3 Other common situations in which unconsidered decision-making is particularly likely include: intoxication, addiction, sleep deprivation, malnutrition, stress, poverty, and morbidity (Metcalfe and Mischel 1999; Donohew et al. 2000; Goldman 2012; Mani et al. 2013).
4 Under the default-interventionist description of dual-selves: individuals will act on impulse unless deliberative reason intervenes in their decision-making. Under the alternative, parallel-processing, description: individuals always determine both an impulsive and a deliberative optimum, and both thought processes influence every decision.
5 If one were to assume the existence of some representative agent with meta-rational self-knowledge of the generalized decision theory presented here, then their ex ante optimal decision-making could, indeed, be represented as Neoclassical utility maximization (Karni and Safra 2016). A more complete comparison of the theoretic merits of each approach is provided by Embrey (2019).
The main advantage of the Neoclassical assumption of a single representative agent is that it typically affords a mathematically elegant analysis. However, that mathematical elegance should not be mistaken for parsimony. The Neoclassical approach requires three layers of assumption: firstly, the set of relevant motivations is postulated; secondly, a functional form for each motivation is prescribed; finally, the functional form of a single-valued utility function is also prescribed, whereby those disparate motivations are assumed to be traded off against each other. The generalized approach typically also requires the first two layers of assumption, but it does not impose any homogeneous rule by which disparate motivations must be traded off. Thus, ceteris paribus, the law of parsimony would favor the generalized theory (Ockham ca. 1323, 1974); a conclusion which holds a fortiori since that generalized theory provides a unified explanation for a number of open empirical questions.
It is rare for modern decision theory to explicitly consider the validity of the above 'single-self' assumption set. This is because the revealed-preference paradigm of Samuelson (1938), and its formalization by Savage (1954), demonstrated that an expected utility representation must exist whenever a number of postulates are satisfied. The generalized theory proposed here is fully compatible with those seminal observations. Our contribution is to expand the applicability of Expected Utility theory from situations in which the Savage postulates apply globally, to situations in which they apply conditional upon the decision-maker's state of mind. Thus, we do not require preferences to be complete, transitive, and consistent through time, but only that these characteristics describe the agent's preferences under any given thought process.
This substantially more tenable assumption relaxes the neoclassical assumption in much the same way that conditional independence relaxes an econometric independence assumption.

An application to explain crowding out
Crowding out is a well-documented empirical phenomenon, but it is not a natural consequence of standard Expected Utility Theory. The phenomenon is observed when an incentive that is intended to induce some target action has little effect, or even the opposite effect, on behavior. A classic example is provided by Gneezy and Rustichini (2000), who find that the introduction of a charge for the late collection of children from nurseries led to a significant increase in late collections, but crowding out is also observed in many other environments (Gneezy et al. 2011).
Despite its empirical prevalence, there are few existing theoretical explanations for crowding out. This is because, under Expected Utility Theory, any additional incentive unambiguously makes the target action more attractive. One possible explanation is contributed by Bénabou and Tirole (2006), who derive a crowding-out effect by assuming the existence of a third utility component to enumerate individuals' social reputation. This approach is innovative, but it requires an additional layer of assumptions, and it is also highly situation specific. For example, a social reputation component does not explain crowding out in the educational environment (for examples, see Levitt et al. 2016), since in that environment, social reputation tends to push against, rather than for, the actions that policy-makers wish to incentivize.
Our generalized decision theory affords an intuitive yet novel explanation for crowding out that directly operationalizes the descriptive perspective of Gneezy et al. (2011). We propose that additional extrinsic payoffs could cue undesirable extrinsic thought processes, such that desirable intrinsic motivations are less likely to be considered. Our explanation is extremely parsimonious: it requires (i) that intrinsic and extrinsic thought processes exist, in the sense that they induce a strategy which is a probability measure on the set of possible acts, (ii) that the probability of the set of target acts is greater under the intrinsic thought process than under the incentivised extrinsic thought process, and (iii) that the introduction of an additional extrinsic incentive increases the probability that the extrinsic thought process will be adopted. Note that (i) is also assumed under existing approaches, although these additionally postulate a utility representation for each motivation as well as a functional form whereby those motivations will be spliced together; that (ii) characterizes precisely those situations in which crowding out is observed (Gneezy et al. 2011); and that (iii) has strong empirical support from the literature on framing and priming effects. No further assumptions are necessary: in particular, the present application requires no specialized utility components, and it requires none of the postulates that underpin expected utility theory.
Figure 1 applies our generalized decision theory to explain crowding out. Rather than imposing some functional form by which utility components are weighed against each other, we assume that one of two possible thought processes will predominate. 6 In any decision instance, agent $i$'s dominant thought process will be determined by their state of mind, which we can represent in reduced form by the probability $p_i$ that they will decide based upon their intrinsic thought process.
In the case of nursery pick-ups, relevant intrinsic motivations could include social norms and altruism towards nursery staff, and in the case of educational interventions, intrinsic motivations could include the benefits of learning. In both cases, extrinsic motivations are likely to include the direct and opportunity costs of the target action, and also the treatment incentive.
To analyze the implications of Fig. 1, we require some notation. Let $A$ be the set of possible acts, and $T \subset A$ be the set of acts targeted by an intervention. Let $\gamma_i : \wp(A) \to [0, 1]$ be the probability measure that represents the strategy induced by agent $i$'s intrinsic thought process, and let $\chi_i$ and $\tilde{\chi}_i$ be the probability measures that represent the strategies induced by their extrinsic thought process, respectively, before and after the intervention is introduced. Analogously, let $\tilde{p}_i$ be the probability that agent $i$ adopts their intrinsic thought process once the intervention has been introduced. Note that $\tilde{\gamma}_i = \gamma_i$ because an incentive does not affect the intrinsic value of any outcome. We can now derive the effect of the incentive on the probability that agent $i$ chooses a target act:

$$\underbrace{\left[\tilde{p}_i \gamma_i(T) + (1 - \tilde{p}_i)\tilde{\chi}_i(T)\right]}_{\text{after the incentive}} - \underbrace{\left[p_i \gamma_i(T) + (1 - p_i)\chi_i(T)\right]}_{\text{before the incentive}} = \underbrace{(1 - p_i)\left[\tilde{\chi}_i(T) - \chi_i(T)\right]}_{\text{direct incentive effect}} - \underbrace{(p_i - \tilde{p}_i)\left[\gamma_i(T) - \tilde{\chi}_i(T)\right]}_{\text{crowding-out effect}} \tag{1}$$

Equation 1 shows that the total effect of the incentive on the probability that the agent chooses a target action is given by its direct incentive effect less an indirect crowding-out effect. The direct incentive effect is simply the increase in the probability that a target action would be chosen under the extrinsic thought process, multiplied by the probability that the extrinsic thought process was already adopted. The crowding-out effect is the difference between the probability that a target action would be chosen under the intrinsic thought process and the probability that it would be chosen under the incentivised extrinsic thought process, multiplied by the probability that the act of providing the incentive would induce a switch from the intrinsic to the extrinsic thought process. We can, therefore, say that a crowding-out effect is present for individual $i$ if $(p_i - \tilde{p}_i)[\gamma_i(T) - \tilde{\chi}_i(T)] > 0$, because in that case, the intended direct effect of the incentive would be reduced by its unintended thought-process switching effect.
Moreover, we can say that a crowding-out effect is strong if $(p_i - \tilde{p}_i)[\gamma_i(T) - \tilde{\chi}_i(T)] > (1 - p_i)[\tilde{\chi}_i(T) - \chi_i(T)]$, because in that case, the intended effect would be entirely reversed by the crowding-out effect.
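The decomposition described above can be checked numerically. The following is a minimal sketch, assuming that the probability of a target act is the mixture of the intrinsic and extrinsic strategies weighted by the probability of adopting the intrinsic thought process; the function name and the illustrative numbers are hypothetical and are not taken from the paper.

```python
def incentive_effect(p, p_tilde, gamma_T, chi_T, chi_tilde_T):
    """Decompose the change in Pr(target act) caused by an incentive.

    p, p_tilde         : probability of adopting the intrinsic thought
                         process before / after the incentive
    gamma_T            : Pr(target act | intrinsic thought process)
    chi_T, chi_tilde_T : Pr(target act | extrinsic thought process)
                         before / after the incentive
    """
    before = p * gamma_T + (1 - p) * chi_T
    after = p_tilde * gamma_T + (1 - p_tilde) * chi_tilde_T
    # Direct effect: gain under the extrinsic process, weighted by the
    # probability that the extrinsic process was already adopted.
    direct = (1 - p) * (chi_tilde_T - chi_T)
    # Crowding-out effect: loss from switching intrinsic -> extrinsic.
    crowding_out = (p - p_tilde) * (gamma_T - chi_tilde_T)
    return {"total": after - before, "direct": direct, "crowding_out": crowding_out}


# Illustrative numbers only (assumed, not estimated from any data).
effect = incentive_effect(p=0.8, p_tilde=0.4, gamma_T=0.9, chi_T=0.2, chi_tilde_T=0.5)
print(effect)
```

With these assumed values the crowding-out term exceeds the direct term, so the total effect is negative, which corresponds to the strong case in which the incentive reduces the probability of a target act.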
We can now show Proposition 1, a proof of which is provided in the appendices:
Proposition 1
1. Assumptions i-iii suffice to generate the crowding-out effect.
2. The crowding-out effect will be strong (reverse the intended effect) if, additionally, the percentage increase in the probability of adopting an extrinsic thought process is larger than the increase in the probability of choosing a target action under the extrinsic thought process as a percentage of the increase that would be induced by a switch to the intrinsic thought process.
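One way to read part 2 of the proposition is as a rearrangement of the condition under which the crowding-out effect exceeds the direct incentive effect. The sketch below is not the appendix proof. It writes $\tilde{p}_i$ and $\tilde{\chi}_i$ for the post-intervention counterparts of $p_i$ and $\chi_i$ (a notational assumption), and it assumes $p_i < 1$ together with $\gamma_i(T) > \tilde{\chi}_i(T)$ from assumption (ii), so that both divisions preserve the inequality.

```latex
\begin{align*}
(p_i - \tilde{p}_i)\,\bigl[\gamma_i(T) - \tilde{\chi}_i(T)\bigr]
  &> (1 - p_i)\,\bigl[\tilde{\chi}_i(T) - \chi_i(T)\bigr] \\
\iff\quad
\frac{(1 - \tilde{p}_i) - (1 - p_i)}{1 - p_i}
  &> \frac{\tilde{\chi}_i(T) - \chi_i(T)}{\gamma_i(T) - \tilde{\chi}_i(T)}
\end{align*}
```

The left-hand side is the proportional increase in the probability of adopting the extrinsic thought process; the right-hand side is the extrinsic gain in the probability of a target act, expressed relative to the further gain that a switch to the intrinsic thought process would bring.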
Proposition 1 shows that our generalized decision theory can provide an intuitive yet original explanation for the phenomenon of crowding out. Our theory is also considerably more parsimonious than existing theories, since it does not require any additional specialized utility components with tightly constrained properties. Moreover, our theory operates on a primitive decision-theoretic level that does not even require the postulates of expected utility theory. It is nevertheless compatible with those postulates, provided that they are assumed conditional upon the decision-maker's state of mind, that is, for each thought process separately.
Our explanation of the crowding-out effect has important and novel policy implications within an educational development context. When incorporated into the model presented in Sect. 3, it suggests that short-run cognitive skill development could be ensured by a sufficiently large incentive, but that this benefit may be offset in the long run, since that same intervention could preclude the development of the child's non-cognitive skill (their propensity to deliberatively engage with educational opportunities). This explanation of the crowding-out effect, therefore, supports the stereotypical teacher's intuition that effective interventions should develop a child's conscientiousness, both by explicitly teaching and by implicitly demonstrating deliberative decision-making processes.

An application to human capital development
We now apply our generalized decision theory to the domain of human capital development. Within that broad domain, many outcomes arise as the cumulative consequence of a series of incremental participation decisions. For example, health outcomes could be determined by incremental decisions such as whether to: smoke another cigarette, avoid exercise today, or eat fast-food rather than cook; employment decisions include whether to: search for jobs this morning, apply for the present role, or prepare for an interview; and educational decisions include whether to: attempt today's classwork, study for tomorrow's test, or take up an extra-curricular opportunity. As these examples demonstrate, the complex decisions of traditional economic theory can often be broken down into a series of binary decisions, and so the model presented here will adopt the simplifying assumption that all such elemental decisions are binary.
We model each incremental participation decision through the extensive-form game presented in Fig. 2. Under this decision process, the agent's state of mind determines which of two standard expected utility maximization problems will be solved. Thus, as before, our generalization amounts to the admission of two possible utility representations, only one of which will be followed on any given occasion. Our generalized decision theory could, therefore, be reduced to a Neoclassical model by the imposition of any functional form by which agents should trade off the disparate motivations represented by each utility function. However, such an imposition would amount to an assumption that agents always act as if they were meta-rational, because it would imply that their state of mind p was a decision variable. We do not impose that assumption because it has little empirical support within the context of the incremental decisions described above (see the literature reviewed in Sect. 4). 7 12 I. P. Embrey

Fig. 2 A generalized decision framework, applied to human capital development
The intuition behind Fig. 2 is simple: agents will override their impulsive preference to impose their deliberative preference with probability $p_{it}$. An agent's state of mind is, therefore, determined according to the realization of $p_{it}$, which we model as a random draw from $P_{it}$, the pdf of agent $i$ exerting deliberative self-control across all potential decision circumstances in period $t$. The realized draw, therefore, manifests any decision-specific contextual factors, whilst the distribution $P_{it}$ describes an aspect of the agent's non-cognitive ability at time $t$. Conceptually, the non-cognitive ability to exert conscious deliberation over one's actions is closely aligned with the psychological trait of conscientiousness.
An agent's action under either thought process will be determined by the Bayesian Nash Equilibrium of its corresponding subgame, which may be contingent upon the agent's believed probability of success at the task in question. The true probability of success $\pi_{it}$ of agent $i$ at time $t$ will, for any given task, be drawn from $\Pi_{it}$, the agent's current cognitive ability distribution across possible tasks. An agent's believed probability of success is, therefore, given by some decision-weighting function $w_{it}(\pi_{it}, \Pi_{it})$, which represents their expected probability of success based upon their true probability-of-success distribution and any signal as to their realized probability of success at the task in hand. Thus, under either thought process, the task will be attempted if and only if the expected payoff of attempting it, evaluated at this believed probability of success, is at least the payoff of avoidance (Eq. 2), where we have normalized the payoff of avoidance to be 0 under either thought process. 8 Thus, for each thought process, if the payoffs to success and failure have the same sign, then that sign will determine participation. Otherwise, there will be a critical threshold level of believed cognitive ability which determines whether the agent will attempt the task. The threshold $\pi^*_{it}$ is well defined, provided the induced map $w_{it}(\pi_{it}) \mid \Pi_{it}$ is injective for any cognitive ability distribution $\Pi_{it}$. This condition is weak, since it is sufficient that $\partial w_{it}(\pi_{it}, \cdot) / \partial \pi_{it} \geq 0$, which will hold, provided that the agent's believed probability of success is a weakly increasing function of the signal that they receive.
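The participation rule described above can be sketched in a few lines. This is a minimal illustration, assuming that the agent compares a belief-weighted average of a success payoff and a failure payoff against the avoidance payoff of 0. The payoff names, the weak inequality at the margin, and the numbers are hypothetical; the paper's own payoff notation appears in Fig. 2 and is not reproduced here.

```python
def attempts_task(believed_success, payoff_success, payoff_failure):
    """Attempt iff the expected payoff is at least the avoidance payoff of 0."""
    expected = (believed_success * payoff_success
                + (1 - believed_success) * payoff_failure)
    return expected >= 0


def critical_belief(payoff_success, payoff_failure):
    """Belief at which the expected payoff of attempting is exactly zero.

    Returns None when the two payoffs do not have strictly opposite signs,
    because in that case their sign alone decides participation.
    """
    if payoff_success * payoff_failure >= 0:
        return None
    return -payoff_failure / (payoff_success - payoff_failure)


# Illustrative payoffs for one thought process (assumed values).
threshold = critical_belief(payoff_success=4.0, payoff_failure=-1.0)
print(threshold, attempts_task(0.3, 4.0, -1.0), attempts_task(0.1, 4.0, -1.0))
```

In this sketch the same two functions would be evaluated separately for the deliberative and the impulsive payoffs, which is one way of expressing that only one of the two utility representations is followed on any given occasion.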
Thus, $\Pi_{it}$ and $P_{it}$ parametrize the agent's current cognitive and non-cognitive abilities, respectively, and crucially affect their decision outcome in period $t$. In turn, that period $t$ decision outcome will affect $\Pi_{i,t+1}$ and $P_{i,t+1}$, according to its human capital development implications. The dynamic process by which cognitive and non-cognitive abilities are developed in our model is, therefore, built upon our assumptions regarding the feedback processes between decision outcomes and state variables. Let us first consider the implications of cognitive and non-cognitive abilities for the decision process.
In an educational context, the deliberative thought process will be dominated by the long-term returns to educational investment, since these considerably surpass any reasonable participation cost until well beyond the end of compulsory education (Oreopoulos 2007; Cunha and Heckman 2008). Since we apply our model to the period of compulsory education, we, therefore, simplify our analyses by assuming that participation will always be optimal under the deliberative thought process. Under the impulsive thought process, this will not be the case, since children may be deterred by factors such as: the cost of exerting effort, the opportunity cost of their outside option, or by a fear of perceived failure (for an insightful review of the behavioral economics of education see Lavecchia et al. 2015). However, we can be confident that the impulsive payoff will be greater for a child who perceives that s/he has been successful than for a child who perceives that s/he has been a failure (see, for example: Bénabou and Tirole 2002;Wang and Yang 2003). Thus, the cognitive ability threshold condition derived in Eq. (2) is likely to be binding under the impulsive thought process.
We now consider the consequences of each outcome for cognitive and noncognitive ability development. Our model attempts to characterize reality through the assumptions listed in Table 1.
Recall first that $\pi_{it}$ is defined as the probability that agent $i$ will consider themselves to be successful in the period $t$ task, and so to be precise, $\Pi_{it}$ describes an agent's cognitive ability relative to the expected standard for their age. The mechanism by which $\Pi_{it}$ could increase is through participation in educational tasks; indeed, this is the defining property of an educational task. We, therefore, capture (relative) cognitive ability development by an assumption that $\Pi_{i,t+1}$ will stochastically dominate $\Pi_{it}$ if agent $i$ participates in period $t$. If the agent avoids the task in period $t$, then it would be natural to assume that their cognitive ability would remain constant in absolute terms, and so decrease in relative terms. We capture this by an assumption that $\Pi_{i,t+1}$ will be stochastically dominated by $\Pi_{it}$ if agent $i$ avoids the period $t$ task. 9
Recall next that $p_{it}$ is the probability that agent $i$ will adopt a deliberative thought process in the period $t$ task. Since avoidance in period $t$ necessarily follows from an impulsive thought process in period $t$, it is, therefore, natural to assume that avoidance will increase the likelihood of impulsivity in future periods. This would certainly be the case if there is hysteresis in self-control, which could arise through confirmatory bias (Rabin and Schrag 1999), or through the experiential development of impulsive responses (Denes-Raj and Epstein 1994). Such hysteresis also suggests that success will increase the likelihood of deliberation in future periods, and this assumption is further supported by the existing literature where the experience of success is considered to make future period participation more likely (Bénabou and Tirole 2002; Wang and Yang 2003). Finally, we must consider the effect of failure on $p_{i,t+1}$. Intuitively, one might suppose that failure would hurt the non-cognitive ability to participate in order to reap long-term rewards.
This conclusion is supported by a large literature on attribution, where it has been established that young people who are unsuccessful are less likely than their peers to attribute their outcomes to their own decision-making (Gurney 1981;Furnham 1984;Määttä et al. 2002). A negative effect of failure on noncognitive ability could also be expected because psychological payoffs such as a fear of failure are likely to become more salient following negative experiences (Tversky and Kahneman 1974).
Given the dynamic consequences of Table 1, repeated implementation of the stage game depicted in Fig. 2 sets out a model of human capital development as a participation supergame. This model amounts to a production technology for the dynamic development of cognitive and non-cognitive skills throughout an individual's educational journey. We are primarily interested in applying this model to explain the empirical anomalies set out in Sect. 4, and so we will apply it to model agents who transition directly from compulsory education into the labor force. At this transition, the context of the participation decisions faced by our agents changes, but the consequences of those decisions change little. Participating in job-search activities still builds skills that make success more likely in future; there will still be hysteresis in task participation; and failure in job-search tasks is still likely to reduce future non-cognitive ability for the reasons discussed above. Our model, therefore, suggests that the mechanism by which educational success is produced might closely resemble the mechanism by which labor-market success is produced.
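The repeated stage game can be illustrated with a deliberately simplified simulation. The sketch below is an assumption-laden toy, not the paper's model: it collapses the cognitive and non-cognitive ability distributions to single numbers, sets the believed probability of success equal to the true one, uses a fixed hypothetical impulsive participation threshold, and moves each ability by a fixed hypothetical step in the directions described in the text (participation raises cognitive ability, avoidance lowers both abilities, success raises and failure lowers the propensity to deliberate).

```python
import random


def clamp(x):
    """Keep a probability inside [0, 1]."""
    return min(1.0, max(0.0, x))


def simulate(p0, pi0, periods=50, step=0.02, impulsive_threshold=0.5, seed=0):
    """Toy participation supergame with scalar abilities (all values assumed).

    p  : probability of adopting the deliberative thought process
    pi : probability of success at the current task
    """
    rng = random.Random(seed)
    p, pi = p0, pi0
    path = []
    for _ in range(periods):
        deliberative = rng.random() < p
        # Deliberation always participates; impulse participates only when
        # the (believed) chance of success clears the impulsive threshold.
        participates = deliberative or pi >= impulsive_threshold
        if participates:
            success = rng.random() < pi
            pi = clamp(pi + step)                          # participation builds skill
            p = clamp(p + step if success else p - step)   # success helps, failure hurts
        else:
            pi = clamp(pi - step)                          # relative skill falls behind
            p = clamp(p - step)                            # avoidance entrenches impulsivity
        path.append((p, pi))
    return path


low = simulate(p0=0.0, pi0=0.1)
high = simulate(p0=1.0, pi0=1.0)
print(low[-1], high[-1])
```

Under these assumptions an agent who starts with no propensity to deliberate and a low chance of success stays at the bottom, while an agent who starts at the top stays there; intermediate starting points can be explored by varying `p0`, `pi0`, and `seed`.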

Implications for human capital development
In this section, we synthesize the literature within the broad domain of human capital development, to identify five important empirical truths that are explained by our model. Several of these phenomena do not yet have any viable theoretical explanation, whilst others rely on highly specialized assumptions. However, in this section, we demonstrate that each of these phenomena arises as a natural consequence of our model. We, therefore, conclude that these phenomena may be interconnected, and that they are only anomalous under the assumption that there exists some single representative thought process.

Grossly suboptimal human capital investment
A substantial minority of individuals drop out of education considerably before it would be optimal for them to do so (Harmon et al. 2000; Oreopoulos 2007; Cunha and Heckman 2008). This represents a challenge both to society and to existing economic theory, because educational investment yields substantial benefits on a societal level and on an individual level, respectively. In particular, it is now well established that the financial returns to education appreciably surpass market rates of return (Cahuc et al. 2014), that non-pecuniary benefits of education may well surpass those financial returns (Oreopoulos and Salvanes 2011), and that social returns to education are probably of comparable magnitude to those personal benefits (McMahon 2004).
There are several valuable theoretical contributions which could explain the prevalence of educational under-investment, but none provides a convincing explanation for its magnitude. Oreopoulos (2007) estimates that, even as a purely financial proposition, extended educational participation is 3-6 times more beneficial than the outside option at any reasonable discount rate, and Cunha and Heckman (2008) incorporate a psychic cost of effort to explain that differential: they establish that the implicit value of unobserved participation costs would need to be in the order of $500,000 to explain observed U.S. college enrolment decisions. These findings are striking, because one would certainly expect that an incentive which either tripled an individual's life-time earnings or provided a lump-sum of $500,000 would be enough to induce almost any child to attend college.
The model proposed in Sect. 3 explains this anomaly by admitting impulsive decision-making as a distinct thought process. Whilst a deliberative thought process must account for the long-term returns to education, we propose that a child who acts impulsively might not even consider those returns. In effect, we capture the immediate components of a standard behavioral decision-utility function within a separate representative thought process. Although it seems unlikely that a high-stakes one-shot educational investment decision could be made without considering its long-term consequences, we propose that, in reality, the canonical educational investment decision of Becker (1962, 1964) is the cumulative outcome of a series of minor educational participation decisions that children face on a daily basis. Children could very well take such incremental investment decisions without first deliberating their long-term consequences. Existing theoretical predictions are bounded within the vicinity of the normatively optimal investment level, because they only augment the normatively optimal utility function. Nevertheless, such theories can provide useful insights into childhood decision-making. One such insight is that credit constraints and limited information regarding the returns to education could both reduce educational investment. However, these factors have little effect in the developed world (Rouse 2004; Oreopoulos 2007; Jensen 2010; Lavecchia et al. 2015). Additional insights can be gained by behavioral models which incorporate specific additional payoffs within their agents' utility functions.
For example, a payoff to self-worth has been shown to induce a fear of failure that would reduce equilibrium participation (Wang and Yang 2003;Köszegi 2006), and a payoff to social identity has also been shown to reduce participation, for agents who choose to fit in with the 'burnouts' because they are neither cool enough to fit in with the 'leading crowd' nor intelligent enough to fit in with the 'nerds' (Akerlof and Kranton 2002).
Present bias is likely to exert an additional negative effect on educational participation. This observation has both intuitive relevance and proven significance in the context of childhood decision-making (Lavecchia et al. 2015; Shoda, Mischel and Peake 1990); however, an extraordinary degree of present bias would be required to offset the overwhelmingly positive returns to schooling, at least when it is operationalized through the quasi-hyperbolic discounting of Laibson (1997). Our generalized decision theory suggests an alternative operationalization of present bias. Rather than assuming that individuals trade off deliberative and impulsive motivations in an idiosyncratic manner, we assume that they have an idiosyncratic propensity to act either impulsively or deliberatively. This subtle yet profound distinction is analogous to the distinction between a mixed-strategies agent who always acts as x% strategy X (and 100 − x% strategy Y), and an agent who acts as pure X on x% of occasions (and as pure Y otherwise). We formalize the latter approach, and in doing so, we demonstrate that it yields markedly distinct dynamic implications: that is, agents of the latter type do not act as if they were agents of the former type. The most important difference is that, under the generalized theory, the decision outcome for an agent who acts impulsively could be arbitrarily distant from the normatively optimal outcome; hence, grossly suboptimal decisions are to be expected.
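The distinction between the two kinds of agent can be illustrated with a minimal sketch. The payoffs below are hypothetical values chosen purely for illustration, and the function names are ours rather than part of the formal model. A 'blended' agent always maximizes an x-weighted combination of the deliberative and impulsive utilities, whereas a 'switching' agent applies the deliberative utility on a fraction x of occasions and the impulsive utility otherwise.

```python
# Hypothetical payoffs, chosen only for illustration (not taken from the paper).
IMMEDIATE = {"participate": -1.0, "avoid": 0.0}   # effort cost felt now
LONG_TERM = {"participate": 10.0, "avoid": 0.0}   # future return to education


def impulsive_utility(action):
    """Impulsive thought process: only immediate payoffs are considered."""
    return IMMEDIATE[action]


def deliberative_utility(action):
    """Deliberative thought process: immediate plus long-term payoffs."""
    return IMMEDIATE[action] + LONG_TERM[action]


def best_action(utility):
    """Return the action that maximizes the supplied utility function."""
    return max(IMMEDIATE, key=utility)


def blended_participation_rate(x):
    """Agent who always maximizes an x-weighted blend of the two utilities."""
    def blend(action):
        return x * deliberative_utility(action) + (1 - x) * impulsive_utility(action)
    return 1.0 if best_action(blend) == "participate" else 0.0


def switching_participation_rate(x):
    """Agent who is purely deliberative with probability x, else purely impulsive."""
    delib = 1.0 if best_action(deliberative_utility) == "participate" else 0.0
    imp = 1.0 if best_action(impulsive_utility) == "participate" else 0.0
    return x * delib + (1 - x) * imp


for x in (0.05, 0.3, 0.9):
    print(x, blended_participation_rate(x), switching_participation_rate(x))
```

Under these assumed payoffs, the blended agent participates on every occasion once x exceeds 0.1, whereas the switching agent participates on only a fraction x of occasions, so the two agents are not observationally equivalent.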
Proposition 2 formalizes this intuition, by establishing that, if an individual has sufficiently low cognitive and non-cognitive ability, then those abilities will decline further in expectation. Thus, if an individual is sufficiently unlikely to participate in any given educational opportunity, then they are even less likely to participate in any future opportunity, and so very low educational investment is a stable and attractive state for low-ability individuals.
Proposition 2 If the ability development consequences listed in Table 1 are balanced, then both π it and p it are expected to decrease through time if:

Definition 1
The ability development consequences listed in Table 1 are balanced if: is the effect of that outcome on the expected value of x. This definition provides the simplest parameterization of Table 1 that preserves median ability levels in the vicinity of one half.

Cognitive and non-cognitive skills
An extensive literature has demonstrated that non-cognitive skills are pervasively important in the determination of socio-economic outcomes (Heckman 2006; Moffitt et al. 2011; Koch et al. 2015). Nevertheless, there is little consensus over the definition of non-cognitive skills (Humphries and Kosse 2017), and no theoretical exposition of the mechanism by which they influence individual outcomes. The proposed decision theory addresses these open questions, by demonstrating that an individual's propensity to think deliberatively, P it , plays a pivotal role in the determination of their educational and other socio-economic outcomes. This suggests that P it represents a fundamental non-cognitive ability, a characterization which is closely compatible with the empirical literature surveyed by Humphries and Kosse (2017). Heckman and Cunha (e.g., 2007, 2010) have established that cognitive and non-cognitive skills must exhibit both self-productivity and cross-productivity, to explain nine key stylized facts of human capital development. However, their theoretical models take those dynamic relationships as primitive assumptions. This paper contributes to the literature on cognitive and non-cognitive skill formation by providing a mechanism which could generate both self-productivity and cross-productivity. Given the model set out in Fig. 2 and the dynamic consequences set out in Table 1, we have that:
Proposition 3 Provided that the agent's believed probability of success w(π it , it ) is a strictly increasing function of π it and it , and that its support is connected, open, and contains π * :
1. Cognitive ability will be self-productive provided that non-cognitive ability is not perfect:
2. With balanced ability development consequences, non-cognitive ability will be self-productive for reasonably able individuals:
3. Non-cognitive ability will be cross-productive:
4. With balanced ability development consequences, cognitive ability will be cross-productive for reasonably able individuals provided that non-cognitive ability is not perfect:
Proposition 3 sets out conditions under which the probability that cognitive and non-cognitive abilities will be increased this period is an increasing function of current ability levels. Whilst cognitive ability is likely to be self-productive, it is interesting to note that, under our model, non-cognitive ability production will only be increasing in current ability levels if cognitive ability is sufficiently high. This result suggests that an intervention which successfully increases educational participation could have a negative effect on non-cognitive ability if the individual concerned is unlikely to perceive that they have succeeded in the tasks that were attempted. Teachers and parents will recognize this phenomenon if they have ever persuaded a reluctant child to attempt something, only for that child to give up when they do not perceive that the attempt has been successful. Section 5 explores more thoroughly the effects of simulated interventions.

Persistence in social inequalities
Although the importance of persistent social inequality is well established (Hobcraft 2002), there is little consensus concerning the mechanisms by which it is perpetuated. Parents, teachers, schools, peers, neighborhoods, family structure, and family finances have all been found to affect socio-economic outcomes (e.g., Breen and Jonsson 2005;Carrell et al. 2010;Bradley and Nguyen 2004;Sacerdote 2011;Sparkes and Glennerster 2002;Kiernan and Hobcraft 2001;Gregg and Machin 2000). Each of those factors may, therefore, contribute toward the observed inter-generational persistence of social inequality. Nevertheless, convincing evidence in this area is limited, partly because each of these factors is closely co-determined with other socio-economic variables (Haveman and Wolfe 1995), but also because the observed effects are probably the compound result of many mechanisms (Koch et al. 2015).
The contribution of this paper is to describe one common mechanism which could contribute toward each of the aforementioned effects. Proposition 3 demonstrates the importance of current ability levels for future ability development, and Proposition 2 demonstrates that individuals with sufficiently low ability levels will, in expectation, be trapped into a cycle of low educational investment and decreasing relative ability levels. Moreover, Fig. 4 demonstrates that, for a robust set of functional form assumptions, the first few incremental educational experiences will have a substantial lasting effect on educational and employment outcomes. Taken together, these results suggest that a child who develops low levels of cognitive and non-cognitive ability during their early years is likely to diverge away from their high-performing peers thereafter.
Divergence in cognitive development is well established (Heckman 2006); however, under our model it is also entirely avoidable. If a disadvantaged child were to deliberate upon the future benefits of each educational opportunity, then s/he would participate regularly, and so, over time, his/her initial disadvantage in both cognitive and non-cognitive abilities could be overturned (see Fig. 5). Empirically, Heller et al. (2017) have recently found strong corroborating evidence for the importance of an individual's propensity to act deliberatively as a determinant of their life outcomes. However, a young child's propensity to act deliberatively is likely to be shaped by influences such as his/her parents, teachers, schools, peers, neighborhoods, family structure, and family finances. Thus, our model suggests that at least part of the effect of these determinants of social inequality will be mediated by their effect on children's non-cognitive propensity to deliberately participate in educational opportunities.

Relationships between abilities and behaviors
High school dropouts are 2.1 times less likely to earn over $25,000 per annum, 2.4 times more likely to be incarcerated, 3.0 times more likely to have a child before the age of twenty, 1.3 times less likely to report 'good' health outcomes, and 1.2 times less likely to report 'good' happiness outcomes. 10 These results outline the well-known correlation between educational outcomes and a wide range of normatively suboptimal behaviors, but they say nothing about what causal mechanisms drive these findings.
There is also a large literature connecting the Cognitive Reflection Test (CRT) of Frederick (2005) to a comparable range of social outcomes, as well as to decision metrics such as present bias and risk aversion (Frederick 2005;Oechssler et al. 2009;Toplak et al. 2011, data from Shenhav et al. 2017). The behavioral correlations in this literature are "so strong" as to be "begging for a theoretical explanation" (Frederick 2005, p. 26). Our generalized decision theory provides that explanation and also brings together these two bodies of literature.
Cognitive Reflection Test items are designed so that they have an intuitively appealing wrong answer, whilst the correct answer is counter-intuitive. As such, the CRT provides a direct empirical measure of an individual's propensity to act deliberatively, which is their fundamental non-cognitive ability under the decision theory presented in Fig. 2. The present section applies that model to incremental educational decisions, but we have already identified that incremental health decisions could equally be modeled in this way, and we note now that the incremental decisions that constitute a pathway to incarceration or teenage parenthood could also fall within the paradigm of Fig. 2. Similarly, we noted in Sect. 4.1 that our model amounts to an alternative conceptualization of present bias, since, under our model, an individual's propensity to act deliberatively is precisely their propensity to take due consideration of the long-term consequences of their actions. Crucially, this propensity to deliberate represents a closely related aspect of non-cognitive ability across any of these decision domains.
Whilst the nested utility formulations for impulsive and deliberative thought processes will be influenced by context-specific motivations, our generalized decision theory, therefore, implies that an individual's propensity to deliberate will be a fundamental non-cognitive ability with pervasive influence across decision domains. In turn, this implies that observable outcomes across those domains should be correlated, and that measures such as the CRT should mediate those correlations.

Chronic non-employment
Classically, non-employment is studied as a demand-side phenomenon. The theoretical literature emphasises its macroeconomic determinants, such as distortions or shocks that prevent the labor market from clearing (Pigou 1933; Keynes 1936), or structural factors, such as stochastic wage offers or imperfect matching technology (McCall 1970; Mortensen 1970; Pissarides 1990, 2000). More recently, economic theory has explained the observed negative employment outcomes for low-skilled population subgroups through skill-biased technological change (Autor et al. 2003), or trade liberalization (Wood 1995). However, none of these theories can explain why any particular individual should experience chronic non-employment, unless there is a persistent dearth of accessible job vacancies in their area. Empirically, this is not the case in the UK: only 0.4% of inactive UK individuals blame a lack of job vacancies, 11 and, over the most recently available 12 months of data, an average of 274 elementary vacancies per month were notified to job centers in each of the 297 UK Travel To Work Areas. 12 Yet around 4.7% of UK individuals fail to gain employment within 24 months of leaving education, 13 and these individuals face a substantial risk of chronic non-employment (Gregg 2001). It is, therefore, important to develop a supply-side theory which can explain these observations.
The canonical theory of labor supply (originating from Wicksteed 1910) can only explain non-employment as voluntary. However, it is doubtful whether 4.7% of society would deliberately choose the strikingly negative financial and non-pecuniary outcomes associated with social exclusion (Hills et al. 2002). In contrast to most existing theory, this paper emphasises that no individual chooses their employment outcomes, but rather they choose the effort which is put into the job-search process. The job-search process may, therefore, be modeled by the theory set out in Sect. 3, in which case individuals could remain chronically and involuntarily non-employed as a result of inadequate skill development. Inadequate non-cognitive ability implies a reduced frequency of identifying and engaging with application processes, and inadequate cognitive ability implies a reduced standard of applications, whenever they are attempted.
11 Jan-March 2017 Labour Force Survey (ONS 2017).
12 Moreover, only 28 of those 3574 month × travel-to-work-area observations reported no new elementary job vacancies (ONS 2012). This is a conservative estimate of the number of unskilled job vacancies, since only the Standard Occupational Classifications (SOC2000) 'elementary' classifications were included, and since "possibly less than half" of vacancies are reported to job centers (ONS n.d.).
13 A detailed derivation of these statistics is provided in the supplementary materials, as is the associated Stata code.
The direct effect of inadequate skill levels on an individual's job-search activity is likely to be compounded by an indirect effect through educational attainment. Taken together, it is possible that some individuals might never gain employment, because the signal that they send to any prospective employer implies that their marginal contribution to productivity will be below the statutory minimum wage rate. This possibility represents a natural extension to the standard labor-market models of either Spence (1973) or Pissarides (2000), given that this paper predicts grossly suboptimal levels of educational investment from a substantial minority of individuals. These implications will now be formally derived by simulating our model for a robust set of parametric assumptions.

Simulating the life course
To further investigate the implications of the proposed model, this section applies it to simulate individual developmental pathways. Relative levels of cognitive and non-cognitive ability are generated across multiple periods of educational tasks and job-search activity, according to the implications that Table 1 summarised for each potential task outcome. Since we are primarily interested in understanding social exclusion, we focus here on a uniform commencement of job-search activity at the end of a compulsory education period. Job search tasks are modeled by the same process as educational tasks, save that the probability of success is reduced, and the duration of each task is increased, to reflect their greater complexity. 14 Additionally, once success is achieved in the job market, the resulting employment is modeled as an absorbing state. Although, in reality, individuals may lose their jobs, the school-to-work transition is crucial in the determination of an individual's life course (Bradley and Nguyen 2004), and the specification of an absorbing employment state allows us to focus on that transition. Existing theory, such as Pissarides (2000), can provide a good description of the unemployment which results from subsequent job loss.
This section maintains the assumptions that were set out in Sect. 3. In addition, we require various distributional and effect magnitude assumptions to simulate the model. All of these assumptions are operationalized by Table 2, and they are discussed briefly below. The supplementary materials provide an extended rationale for our assumptions, and they also demonstrate that the qualitative results of our simulations are robust to changes in any of those assumptions. Furthermore, the code we supply is designed so that custom manipulation of the model parameters is straightforward.
The main assumption of Table 2 is that the probabilities of success and failure are Beta-distributed. The Beta distribution is a natural choice for distributions of probabilities since it is the conjugate prior of many probability distributions. It has a bell-shaped density on [0, 1] and it is also self-conjugate, which means that the consequences of each outcome as set out in Table 1 are straightforward to parametrize as shown in Table 2. In summary, participation in educational tasks increases cognitive ability whilst avoidance decreases it, and success in educational tasks increases non-cognitive ability, whilst both failure and avoidance decrease it. The other assumptions set out effect magnitudes, and most may be varied considerably without affecting our qualitative findings. The only parameter for which variation does have qualitatively important implications is the critical ability threshold π * . If this parameter drops too close to our agents' initial expected cognitive ability level, then participation becomes likely under either thought process, and so our model would generate little variation in outcomes. The qualitative significance of π * is, therefore, derived from the fact that we simulate outcomes for a set of counterfactual agents who all receive the same initial ability endowment, and who all make participation decisions based upon the same two representative thought processes. We assume homogeneous endowments and homogeneous tastes to allow us to identify cleanly the novel effects of heterogeneity in thought processes.
Table 2 Parametric assumptions for the simulations presented in this paper. In all cases, these were also my 'first guess' parameter values. A detailed discussion and robustness checks are provided in the supplement. The simplest possible parametrization of Table 1 that preserves 0.5 as median ability; robust to truncated normal adaptation.
Figure 3 illustrates the human capital development of 12 ex ante identical economic agents across 500 periods of educational decisions, followed by 201 periods of job search. Although a more extended time horizon would better reflect reality, Fig. 4 shows that 500 periods are adequate to establish developmental trends, and we are constrained here by legibility. The realized draws from agents' cognitive ability distributions {π it } t are plotted for each period until they gain employment, at which point their outcome will become fixed at Y . An individual's history of realized π values, therefore, captures the evolution of their full ability distribution through time. In Fig. 3, it can be seen, for example, that the variance of these distributions reduces across time for all individuals, and most markedly so during their early development. This phenomenon is reminiscent of the observations which motivated Case-Based Decision Theory (Gilboa and Schmeidler 1995). It can also be seen that many individuals (e.g., 5 and 9) remain close to the average ability level of 0.5, whilst others diverge towards much lower or higher rates of success (e.g., 1 and 12, respectively). Figure 3 also captures the evolution of agents' non-cognitive ability p it . This can be read off as the trend in the density of π it realizations, since these are only plotted for periods where tasks are attempted. Thus, the relationship between cognitive and non-cognitive skill levels is plain: those individuals with upward-trending cognitive ability also attempt tasks with increasing frequency, whereas those with downward-trending success probabilities also exhibit a deteriorating participation likelihood. This demonstrates that dynamic complementarity and self-productivity of cognitive and non-cognitive skills can indeed explain substantial levels of inequality in educational outcomes, even for individuals with identical initial ability endowments.
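The educational phase of such a simulation might be sketched as follows. This is an illustration under stated assumptions rather than the code supplied with the paper: unit pseudo-count increments stand in for the effect magnitudes of Table 2, forgetfulness and the job-search phase are omitted, and the participation rule (a deliberating agent always attempts, whereas an impulsive agent attempts only when the realized success probability exceeds the threshold) is a simplified reading of the model. The names simulate_agent, pi_star, and prior are hypothetical, as are their default values.

```python
import random


def beta_mean(counts):
    """Expected value of a Beta distribution with the given pseudo-counts."""
    return counts[0] / (counts[0] + counts[1])


def simulate_agent(seed, periods=500, pi_star=0.6, prior=(5.0, 5.0)):
    """Simulate one agent's educational phase under the assumed rules."""
    rng = random.Random(seed)
    cog = list(prior)      # Beta pseudo-counts for cognitive ability (success probability)
    noncog = list(prior)   # Beta pseudo-counts for the propensity to deliberate
    attempts = 0
    for _ in range(periods):
        pi = rng.betavariate(cog[0], cog[1])        # realized success probability
        p = rng.betavariate(noncog[0], noncog[1])   # realized propensity to deliberate
        deliberates = rng.random() < p
        # Assumed rule: deliberation always leads to an attempt; an impulsive
        # agent attempts only if the task looks easy enough (pi above pi_star).
        if deliberates or pi > pi_star:
            attempts += 1
            cog[0] += 1                  # participation raises cognitive ability
            if rng.random() < pi:
                noncog[0] += 1           # success raises non-cognitive ability
            else:
                noncog[1] += 1           # failure lowers non-cognitive ability
        else:
            cog[1] += 1                  # avoidance lowers cognitive ability
            noncog[1] += 1               # avoidance also lowers non-cognitive ability
    return {
        "cognitive": beta_mean(cog),
        "noncognitive": beta_mean(noncog),
        "attempts": attempts,
    }


# Twelve ex ante identical agents who differ only in their random draws.
for agent in range(1, 13):
    print(agent, simulate_agent(seed=agent))
```

Because every agent starts from the same prior, any dispersion in the printed outcomes arises solely from the sequence of thought processes and task outcomes that each agent happens to experience.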
Individuals' employment outcomes are captured in Fig. 3 by the number of periods between the start of job search (the vertical line) and the achievement of employment (whereafter outcomes ≡ Y ). It can be seen that individuals 1, 4, 7, 8, and 11 do not gain employment within 201 periods of job search, and furthermore that their attempts to do so are both irregular and of a low standard. These outcomes contrast with those of individual 3, who develops high levels of ability and gains a job immediately, and individual 12, who experiences unexpectedly poor job-market outcomes, but has developed sufficient non-cognitive ability that they persevere without any noticeable disengagement. The contrasting outcomes of individuals 1 and 12 substantiate the hypothesis of Duckworth et al. (e.g., 2007) that an individual's level of grit may be a key determinant of their life course.
To investigate the origins of the substantial heterogeneity evident above, Fig. 4 presents the average treatment effect for individuals whose first five outcomes are
I. P. Embrey
15 This panel validates the interpretation of realized π values as relative ability draws, since their expected value closely follows 0.5 across the educational phase. During job search, the cumulative employment rate of untreated individuals demonstrates that, whilst most quickly gain employment, 14% of the cohort remain unemployed after 300 periods of job search. This result reflects reality appropriately well; however, a precise interpretation is not intended as that would necessarily rely on some conjectured real-world duration for the model's time periods. Nevertheless, a precise comparison between cohorts is appropriate, since each individual in each cohort has a counterpart in the other cohorts who experiences identical stochastic circumstances. Thus, panel A provides a perfect counterfactual against which to compare the treated cohorts.
Panel B of Fig. 4 shows the average treatment effect of guaranteed success in each of the first five periods. This treatment not only produces rapid initial development, but also leads to a continuing upward trend in relative ability throughout the educational phase. The resulting improvement in employment outcomes is substantial: this cohort achieves 99.5% employment after 94 periods of job search, and full employment after 220 periods of job search.
Panel C shows that the effect of five periods of guaranteed failure is small in the long run; however, an initial spike in cognitive skill levels is evident. These observations are explained by the offsetting effects that experience of failure produces: cognitive ability is developed at the cost of a reduced likelihood of future participation. 16 This finding is of particular interest, since it exposes the ineffectiveness of any intervention which provides sufficient extrinsic motivation to ensure that an individual attempts tasks, but which fails to thereby improve their conscientiousness or intrinsic motivation. Such an intervention might improve that individual's academic attainment for as long as it is maintained, but at the cost of a commensurate reduction in relative noncognitive ability. It is possible that the modern era of high-stakes school competition might incentivise schools to supply such an intervention throughout a child's education, thereby improving their academic performance at the cost of their non-cognitive development. The detrimental impact of such an intervention would become manifest only upon school graduation.
Panel D of Fig. 4 shows that five periods of initial avoidance could have a devastating effect on an individual's developmental pathway. This cohort develops markedly lower skill levels during the educational phase, and only 58% of them have gained employment by the time that cohort B is fully employed. This is remarkable, since, at the commencement of job search, the initial five periods of treatment account for less than 0.2% of the agents' forgetfulness-adjusted memories. Nevertheless, the results are in keeping with the empirical literature, which has firmly established the importance of early-life investment in skill development (e.g., . It is also interesting to note that the developmental pathway of a 'median' individual who experiences initial disadvantage is observationally equivalent (after those few periods) to an individual who is endowed with lower ability levels. Thus, in this model, social factors could explain up to the entire variation in observed educational, employment, or health outcomes.
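The paired-cohort comparison behind these panels can be sketched as follows, under loudly stated assumptions. To let every treated individual share exactly the same stochastic circumstances as an untreated counterpart, this sketch uses expected ability levels rather than Beta draws and feeds each counterpart the same pre-drawn uniform shocks. The unit increments, the threshold value, the prior, and all function names are hypothetical and are not the parameterization of Table 2; job search is omitted.

```python
import random


def final_ability(shocks, forced=(), pi_star=0.75, prior=(5.0, 5.0)):
    """Return expected cognitive ability at the end of the educational phase.

    shocks: list of (u_think, u_task) uniform draws, shared across cohorts.
    forced: outcomes imposed in the first len(forced) periods
            ('success', 'failure' or 'avoid').
    """
    cog, noncog = list(prior), list(prior)
    for t, (u_think, u_task) in enumerate(shocks):
        pi = cog[0] / (cog[0] + cog[1])            # expected success probability
        p = noncog[0] / (noncog[0] + noncog[1])    # expected propensity to deliberate
        if t < len(forced):
            outcome = forced[t]                    # treatment overrides the agent
        elif u_think < p or pi > pi_star:          # deliberates, or task looks easy
            outcome = "success" if u_task < pi else "failure"
        else:
            outcome = "avoid"
        if outcome == "avoid":
            cog[1] += 1
            noncog[1] += 1
        else:
            cog[0] += 1
            noncog[0 if outcome == "success" else 1] += 1
    return cog[0] / (cog[0] + cog[1])


def average_treatment_effect(treatment, n_agents=200, periods=300, seed=1):
    """Mean difference between treated agents and their untreated counterparts."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_agents):
        shocks = [(rng.random(), rng.random()) for _ in range(periods)]
        treated = final_ability(shocks, forced=[treatment] * 5)
        untreated = final_ability(shocks)
        total += treated - untreated
    return total / n_agents


effects = {name: average_treatment_effect(name)
           for name in ("success", "failure", "avoid")}
print(effects)
```

Sharing the shocks means that any difference between a treated agent and its counterpart is attributable to the five treated periods alone. In this simplified setting the coupling also guarantees that initial success can never leave an agent worse off than its counterpart, and that initial avoidance can never leave an agent better off.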
The simulations presented in Fig. 5 investigate the extent to which an effective later intervention could counteract initial disadvantage. An intervention is conceptualized here as an exogenous force which ensures that individuals attempt all tasks. As such, intervention is guaranteed to improve cognitive ability (the probability of task success); however, its effect on non-cognitive ability (the probability of attempting tasks) is uncertain: individuals' participation likelihood will increase if and only if they experience some success. This conceptualization of an intervention is commensurate with the provision of support in the form of encouragement, mentoring, or academic/job-search assistance, but contrasts with enforced or 'bribed' participation which could reduce its subjects' independent participation likelihood. As noted in Sect. 2, and in the analysis of Fig. 4C above, any intervention which failed to develop the intrinsic participation likelihood of its subjects should, under the present framework, be modeled as guaranteed failure, because its short-run cognitive benefits would be offset by non-cognitive losses which would manifest only once the intervention was removed. 17
16 The corresponding plot for the evolution of E i ( p it ) shows a negative initial spike which is almost a precise mirror image of that displayed by {E i (π it )} t in Fig. 4C. Since this is the only instance in which any cohort's evolution of E i ( p it ) differs noticeably from its corresponding E i (π it ) pathway, the former are not reproduced here, but are available in the supplementary materials.
17 An alternative intervention which guarantees success would be straightforward to program, but is difficult to envisage in reality: for example, children who suspect that their success has been falsely inflated tend to suffer reduced confidence, and in any case one cannot sustain false appearances of success indefinitely.

Fig. 5 Average Treatment Effects for Interventions by Time Point
The thick black lines in Fig. 5 illustrate the baseline outcomes of a cohort of 200 individuals who are disadvantaged by five periods of initial task avoidance. Each additional line represents a counterfactual cohort which experiences the aforementioned disadvantage, but also 18 periods of intervention; hence, these cohorts follow the baseline until the onset of that intervention. It can be seen that an intervention starting at period 20 has considerably greater impact than an intervention starting at period 100, although further investigations suggest that the rate of decline in intervention efficacy is quite low thereafter. Where an intervention takes place during job search, it benefits from the added possibility that some of its subjects' attempts might successfully result in employment. Thus, the dashed gray intervention appears to be highly successful in the short term. However, post-intervention that cohort's employment rate increases markedly more slowly than that of the baseline, because the individuals who succeeded during the intervention were those who were already the most likely to gain employment. This is a manifestation of the dead-weight loss associated with an untargeted intervention (Besley and Kanbur 1991). Figure 5 demonstrates that, within the proposed model, intervention will be considerably more effectual if it takes place before individuals become trapped into the downward spiral of deteriorating participation and outcomes that was predicted by Proposition 2. This conclusion was also reached by Lavecchia et al. (2015) in their survey of the behavioral economics of education.
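The timing comparison can be sketched in the same spirit. In this hedged illustration, a disadvantaged agent avoids the first five tasks, and an intervention forces an attempt in each of 18 consecutive periods from a chosen start period. Expected ability levels are used in place of Beta draws so that baseline and intervention cohorts can share identical shocks. The function names, the threshold, the prior, and the unit increments are all assumptions made for illustration, and the job-search phase is omitted.

```python
import random


def run(shocks, start=None, length=18, pi_star=0.75, prior=(5.0, 5.0)):
    """Expected cognitive ability after all periods; the first five are avoided.

    start: first period of the intervention (None means no intervention).
    """
    cog, noncog = list(prior), list(prior)
    for t, (u_think, u_task) in enumerate(shocks):
        pi = cog[0] / (cog[0] + cog[1])            # expected success probability
        p = noncog[0] / (noncog[0] + noncog[1])    # expected propensity to deliberate
        supported = start is not None and start <= t < start + length
        if t < 5:
            attempts = False                       # initial disadvantage
        else:
            attempts = supported or u_think < p or pi > pi_star
        if attempts:
            cog[0] += 1
            noncog[0 if u_task < pi else 1] += 1   # success versus failure
        else:
            cog[1] += 1
            noncog[1] += 1
    return cog[0] / (cog[0] + cog[1])


def mean_gain(start, n_agents=200, periods=500, seed=2):
    """Average gain over the disadvantaged baseline, using shared shocks."""
    rng = random.Random(seed)
    gains = []
    for _ in range(n_agents):
        shocks = [(rng.random(), rng.random()) for _ in range(periods)]
        gains.append(run(shocks, start) - run(shocks))
    return sum(gains) / n_agents


for start in (20, 100, 300):
    print(start, round(mean_gain(start), 3))
```

Varying the start argument gives a simple way to compare earlier and later interventions within this simplified setting; whether the comparison reproduces the pattern of Fig. 5 depends on the assumed parameter values.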
It is also worth noting that any individual who grows up with a familial and social context that is sufficiently supportive so as to ensure perpetual participation would effectively be in a state of continual intervention. Thus, even an individual with a particularly low ability endowment would be likely to achieve a high level of success if they were socially advantaged. Indeed, further analyses suggest that a cohort with an ability endowment corresponding to 200 periods of initial avoidance could, by period 500, achieve an average ability level in excess of any in Fig. 4, if they were supported to attempt every educational task thereafter. Conversely, if an individual's familial and social circumstances constantly reduce their likelihood of participation, even a highly effective early intervention could be overturned if it is not followed up. Thus, the proposed model can explain high levels of intergenerational persistence in educational, employment, or health outcomes whenever an individual's background affects their task participation decisions.
Footnote 17 continued: Similarly, the act of providing an individual with a job would not, in itself, improve their ability to succeed in applying for future jobs (although the human capital gained whilst in artificial employment might improve future success likelihoods, precisely as per the intervention modeled here).
These simulations have demonstrated that the proposed model can explain substantial heterogeneity in individuals' outcomes as the result of small differences in their early-life experiences. Further, we have seen that an individual's propensity to apply an economically rational thought process could be at least as important as their cognitive ability level in the determination of those outcomes. Since these results were derived despite maintaining the strong assumption of taste homogeneity, they suggest that heterogeneity in thought processes may indeed be an important source of individual differences.
Our model, therefore, implies that the environment in which young people develop could have far-reaching consequences, independent of any innate individual characteristics. This conclusion has clear policy implications. One implication is that parents and educators should attempt to foster supportive home and educational environments, specifically by encouraging children to try their best and to view mistakes positively. Another implication is that effective support for new mothers and young children could represent an extremely worthwhile investment, and that such support would be effective whenever it develops those children's non-cognitive skill levels; specifically their conscientiousness, openness to experience, and resilience to disappointment. Furthermore, those non-cognitive skills should remain a key focus throughout the educational system. On a practical level, this could be achieved by an investigative pedagogy where open tasks predominate, and by an increased recognition of sports and the arts, since the intrinsic appeal of those activities often motivates participants to try their best to overcome failures.

Discussion and conclusion
This paper has responded to the accumulating evidence that Homo sapiens exhibit heterogeneous thought processes, by generalizing the Neoclassical modeling approach to admit two distinct utility formulations. Commensurate with the default-interventionist paradigm, agents' deliberative reasoning is assumed to override their impulsive response according to an individual- and situation-specific probability distribution. This approach is the theoretical dual of the common empirical practice that allows each individual to implement any of a finite mixture of possible decision processes, and it operationalizes the much older concept that individuals may act "variously and accidentally, depending on whether mood, inclination, or self-interest happens to be uppermost" (Smith 1759, p. 276). This paper has demonstrated that such thought-process heterogeneity can explain substantial individual differences, even for agents with homogeneous tastes. This finding suggests that the Neoclassical approach may sometimes misrepresent the former source of heterogeneity as the latter.
Thought-process heterogeneity is an essential feature of any decision situation wherein individuals may act without first considering the consequences of that action. Such situations include the class of minor decisions that incrementally affect individuals' educational, employment, or health outcomes. This paper applied the generalized decision theory to describe educational investment decisions, and thereby provided an explanation for several empirical puzzles. For example, a non-negligible proportion of society is observed to develop such low human capital that they become socially excluded: this paper suggests that those individuals never chose that outcome, but rather that it arose as the cumulative consequence of many minor decisions, any of which could have been determined according to impulsive rather than deliberative thought processes.
Our model, therefore, implies that an individual's propensity to act deliberatively should be considered their fundamental non-cognitive ability. That characterization contributes the first concrete theoretical description of non-cognitive ability, and it provides a mechanism by which non-cognitive ability would be a dynamic complement of cognitive ability. By endogenising the observed dynamic relationships between cognitive and non-cognitive ability, the proposed model is also able to explain each of the nine stylized facts of human capital development listed by , chief among which is a path dependence within human capital development. Such path dependence brings economic understanding closer to the health inequalities literature, which considers unhealthy decisions to be a downstream product of socioeconomic determinants, rather than a consequence of heterogeneity in individual tastes (Graham 2007;Watt 2007). Under our generalized decision theory, individuals who make normatively poor decisions need not undervalue their long-term consequences; rather, they may simply not consider those consequences in the moment of decision, because their childhood did not condition them to do so. This paper has also begun to explore the wider applicability of the generalized decision theory. This exploration has yielded a direct theoretical basis for the crowding-out hypothesis; namely that the provision of an additional, extrinsic, motivation may have the perverse effect of cueing an undesirable thought process. Similarly, the empirically puzzling relationships between cognitive ability, non-cognitive ability, behavioral biases, and socio-economic outcomes were seen to arise as direct consequences of the generalized decision theory. It, therefore, seems likely that other empirical puzzles might also be explained as consequences of that theory. 
For example, it is known that factors which appear to be orthogonal to individuals' tastes, such as the font size of experimental instructions and concurrent cognitive load, can have a significant influence over decision outcomes (Alter et al. 2007; Shiv and Fedorikhin 1999). Although such findings challenge the commonly maintained hypothesis that economic agents differ only in their tastes, they are commensurate with those orthogonal factors' likely influence on any individual's propensity to think deliberatively, a mechanism which this paper has demonstrated to be a credible source of individual differences. Further research is, therefore, needed to explore the implications of the generalized decision theory more fully.
In many situations, the generalized decision theory is likely to yield little additional insight over the Neoclassical approach. In particular, it seems reasonable to model individuals as agents who maximize one single, representative, utility function for applications within the traditional economic domain of profit maximization, or for any application wherein the theorist seeks to prescribe a normatively optimal trade-off between conflicting motivations. Furthermore, in those situations, the Neoclassical approach will provide a more mathematically elegant solution than its generalization. Nevertheless, this paper has derived original and intuitive explanations for individuals' observed decision-making in behavioral situations, as a result of the strictly weaker assumption set which underlies our generalized decision theory. Therefore, future theoretical research should explicitly consider whether the Neoclassical 'single-self' assumption is appropriate, and, if not, whether the generalized decision theory proposed in this paper could better describe their agents' decision-making.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

A1 Proof of proposition 1
For the first part, note that Assumption (ii) implies that $\gamma_i(T) > \tilde{\chi}_i(T)$, and that Assumption (iii) implies that $\tilde{p}_i < p_i$. Thus, these assumptions together ensure that $(p_i - \tilde{p}_i)[\gamma_i(T) - \tilde{\chi}_i(T)] > 0$, which is precisely the condition derived in the text for the crowding-out effect to exist.
For the second part, note that the percentage increase in the probability of adopting an extrinsic thought process due to the intervention is given by $(p_i - \tilde{p}_i)/(1 - p_i)$, and that the increase in the probability of choosing a target action under the extrinsic thought process, as a percentage of the increase that would be induced by a switch to the intrinsic thought process, is given by $[\tilde{\chi}_i(T) - \chi_i(T)]/[\gamma_i(T) - \tilde{\chi}_i(T)]$. Thus, if the former exceeds the latter, then we have $(p_i - \tilde{p}_i)[\gamma_i(T) - \tilde{\chi}_i(T)] > (1 - p_i)[\tilde{\chi}_i(T) - \chi_i(T)]$, provided that $(1 - p_i) \geq 0$, which is assured by its construction as the complement of a probability, and that $[\gamma_i(T) - \tilde{\chi}_i(T)] \geq 0$, which is provided by Assumption (ii). Thus, the condition stated in the proposition implies the condition derived in the text for the crowding-out effect to be strong.
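The strong crowding-out condition can be checked numerically. The following is a minimal sketch rather than part of the original analysis: the function names and parameter values are illustrative assumptions. Here `p` and `p_tilde` denote the probability of adopting the intrinsic thought process before and after the intervention, `gamma` the probability of the target action under the intrinsic thought process, and `chi` and `chi_tilde` the probability of the target action under the extrinsic thought process before and after the intervention. The sketch assumes that the overall probability of the target action is the probability-weighted mixture of the two thought processes.

```python
def target_probability(p, gamma, chi):
    """Overall probability of the target action (assumed mixture):
    intrinsic process with probability p, extrinsic with probability 1 - p."""
    return p * gamma + (1 - p) * chi


def strong_crowding_out(p, p_tilde, gamma, chi, chi_tilde):
    """Ratio condition from the proof; requires p < 1 and gamma > chi_tilde.

    True when the percentage rise in extrinsic adoption exceeds the relative
    rise in the target probability under the extrinsic thought process.
    """
    return (p - p_tilde) / (1 - p) > (chi_tilde - chi) / (gamma - chi_tilde)


# Hypothetical parameter values, chosen only for illustration.
p, p_tilde = 0.6, 0.3
gamma, chi, chi_tilde = 0.9, 0.2, 0.4

before = target_probability(p, gamma, chi)
after = target_probability(p_tilde, gamma, chi_tilde)
is_strong = strong_crowding_out(p, p_tilde, gamma, chi, chi_tilde)
print(before, after, is_strong)
```

With these values the ratio condition holds and the target action becomes less likely after the intervention. A smaller fall in `p` (for example `p_tilde = 0.55`) violates the condition, and the intervention then raises the target probability.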

A3 Proof of proposition 3
Let us begin by observing that cognitive ability is produced by participation in educational tasks. Specifically, in Sect. 3, it was assumed that $\Pi_{i,t+1}$ will stochastically dominate $\Pi_{it}$ if agent $i$ participates in period $t$, and conversely that $\Pi_{i,t+1}$ will be stochastically dominated by $\Pi_{it}$ if agent $i$ does not participate in period $t$. Thus, the expected value of that distribution will increase, that is $\Delta E(\pi)_{it} > 0$, if and only if the agent participates in period $t$.
For the first result, it will, therefore, suffice to show that an agent's probability of participation is an increasing function of their current cognitive ability level. Now if $E(p)_{it} = 1$ then this is not so, because agent $i$ would be guaranteed to participate in period $t$. However, if $E(p)_{it} < 1$ then there is some positive probability that the agent will adopt an impulsive thought process. If so, then we showed in Sect. 3 that they will participate if and only if their believed probability of success $w(\pi_{it}, \Pi_{it})$ exceeds their critical ability threshold $\pi^*_{it}$. Now, $w(\pi_{it}, \Pi_{it})$ is an increasing function of both its arguments, which in the case of the distribution $\Pi_{it}$ means that $w(\pi_{it}, \Pi_{it}) > w(\pi_{it}, \tilde{\Pi}_{it})$ if $\Pi_{it}$ stochastically dominates $\tilde{\Pi}_{it}$, and the converse. Thus, in expectation, the probability that $w(\pi_{it}, \Pi_{it}) > \pi^*_{it}$ is increasing in $E(\pi)_{it}$, because by construction all differences in $E(\pi)_{it}$ arise through stochastic dominance relations, and because stochastic dominance relations are transitive. Thus, we have demonstrated the first result of the proposition.
The third result is straightforward. Because $\pi^*_{it}$ is within the open support of $w(\pi_{it}, \Pi_{it})$, we have that $\mathrm{pr}(w(\pi_{it}, \Pi_{it}) > \pi^*_{it}) < 1$. Thus, there is some chance that, under the impulsive thought process, the present task would be avoided. Since participation is guaranteed under the deliberative thought process, an increase in $E(p)_{it}$, therefore, unambiguously increases the probability of developing cognitive ability in expectation.
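The logic of the third result can be illustrated with a small numerical sketch. It is an illustration under stated assumptions, not the paper's simulation code: `participation_probability` is a hypothetical helper, `p_deliberative` stands for the expected probability of adopting the deliberative thought process, and `q_impulsive` stands for the probability that the believed probability of success exceeds the critical threshold under the impulsive thought process, which the proof shows to be strictly below one.

```python
def participation_probability(p_deliberative, q_impulsive):
    """Participation is certain under deliberation and occurs with
    probability q_impulsive < 1 under the impulsive thought process."""
    return p_deliberative + (1 - p_deliberative) * q_impulsive


q = 0.4  # hypothetical impulsive participation probability
probs = [participation_probability(k / 10, q) for k in range(11)]

# Each 0.1 step up in p_deliberative adds (1 - q) * 0.1 > 0 to participation.
is_increasing = all(b > a for a, b in zip(probs, probs[1:]))
print(is_increasing)
```

Because the slope with respect to `p_deliberative` is `1 - q_impulsive`, participation rises with the propensity to deliberate for any `q_impulsive` strictly below one.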
Let us now recall that non-cognitive ability is produced by the perception of success in educational tasks. As before, this effect is realized by a stochastic dominance relation whereby $P_{i,t+1}$ will stochastically dominate $P_{it}$ if agent $i$ succeeds in period $t$, and conversely $P_{i,t+1}$ will be stochastically dominated by $P_{it}$ if agent $i$ does not succeed in period $t$. Thus, the expected value of that distribution will increase, that is $\Delta E(p)_{it} > 0$, if and only if the agent succeeds in period $t$.
For the second result, we build upon the third result, which established that an increase in $E(p)_{it}$ unambiguously increases the probability of task participation in expectation. Now, the expected effect of task participation on non-cognitive ability is given by:
$$\Delta E(p)_{it}(\text{participation}) = \pi_{it}\,\Delta E(p)_{it}(\text{success}) + (1 - \pi_{it})\,\Delta E(p)_{it}(\text{failure}),$$
and so, if $E(\pi)_{it} > \frac{1}{3}$, and with balanced ability development consequences such that $\Delta E(p)_{it}(\text{failure}) = -\frac{1}{2}\,\Delta E(p)_{it}(\text{success})$, the right-hand side equals $\frac{1}{2}(3\pi_{it} - 1)\,\Delta E(p)_{it}(\text{success})$, and we have that $\Delta E(p)_{it}(\text{participation}) > 0$ in expectation. Thus, under those conditions, since $E(p)_{it}$ increases the expected probability of task participation, it also increases $\mathrm{pr}(\Delta E(\pi)_{it} > 0)$.
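The one-third threshold can be verified with a short sketch. The helper name, the default `loss_ratio` of one half (the balanced case assumed in the proof), and the unit gain are illustrative assumptions: `pi` is the probability of success, `gain_success` is the increase in expected non-cognitive ability after a perceived success, and a failure is assumed to reduce it by `loss_ratio * gain_success`.

```python
def expected_noncognitive_change(pi, gain_success, loss_ratio=0.5):
    """Expected change in non-cognitive ability from participating:
    pi * gain_success + (1 - pi) * (-loss_ratio * gain_success)."""
    return pi * gain_success + (1 - pi) * (-loss_ratio * gain_success)


gain = 1.0  # hypothetical size of the gain after a perceived success
signs = {pi: expected_noncognitive_change(pi, gain) > 0 for pi in (0.2, 0.5, 0.8)}
print(signs)
```

With a loss ratio of one half, the expected change is negative at a success probability of 0.2 and positive at 0.5 and 0.8, consistent with a break-even point at one third.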
The fourth result follows from results (1) and (3). In the proof of (1) we established that participation probability is an increasing function of cognitive ability provided that $E(p)_{it} < 1$, and in the proof of (3) we established that the probability of positive non-cognitive development is an increasing function of participation probability provided that $E(\pi)_{it} > \frac{1}{3}$.