1 Introduction

Our paper experimentally investigates portfolio investment choices. We deal with this issue of vast empirical relevance within an evolutionary framework. Evolution relies on path dependence, e.g., on the average past success or fitness of the available options, and is usually justified by assuming an infinite population of short-lived actors who are randomly matched to interact with finitely many others. Optimal adaptation is also predicted by equilibrium behavior, whose justification, however, relies on forward-looking rational deliberations of all the interacting parties. Here, we explore whether mutual best responding could be learned. Based on the literature on phenotypical learning, one can distinguish learning in the form of best-reply dynamics, i.e., interacting agents best respond to the most recent constellation of circumstances beyond their own control, such as stochastic costs and others’ choices, from regret-driven behavioral adaptation.

We focus essentially on the distinction between equilibrium learning and (regret) impulse balancing since, for the setup given, evolutionary stability implies equilibrium behavior. Both concepts allow for static equilibria, based on the common knowledge assumptions of strategic equilibria, but differ in what the interacting parties are supposed to optimize across assets, namely maximizing expected profits or minimizing regret. Our experimental scenario modifies the evolutionary model of financial portfolio survival by Blume and Easley (1992), which, under quite strong assumptions, predicts that the share of capital invested proportionally to the stationary positive probabilities of the various states eventually converges to 1. Specifically, the state-specific assets pay (the same) positive dividend when their state is realized and otherwise become worthless. Thus, to preserve wealth across time, one has to engage in all assets and to invest proportionally to the state-specific probabilities, as predicted by probability matching.

Like Blume and Easley (1992), we let investors periodically determine their expenditure shares for the various assets; given the constant supply of assets, this allows us to derive the market-clearing prices. Whereas past average profits are the driving forces of evolutionary selection in Blume and Easley, our experiment replaces evolutionary selection by letting participants adapt their portfolio composition across time in light of feedback. Actually, the evolutionarily stable portfolio composition, namely that expenditure shares are proportional to asset probabilities, can also be justified as strategic equilibrium behavior, which requires commonly known rationality. Thus our experimental analysis confronts the joint hypothesis of evolutionarily stable or equilibrium behaviorFootnote 1 with that of impulse balancing, capturing in a quantitative way phenotypical learning via the average choice probabilities of cyclic directional learning (Selten and Buchta 1999).

Table 1 Blume and Easley’s model and our contribution

Both approaches provide a static and dynamic justification (see Table 1) where, of course, the dynamics generally do not have to converge. Instead, the static interpretations rely on mutual best responses by equalizing expected profits (strategic equilibrium) and expected regret (impulse balancing). Thus, our study confronts static concepts, based on common knowledge (the top row of Table 1), and their dynamic justifications (the bottom row of Table 1). The former are imported from evolutionary biology; the latter predict phenotypical learning but differ in their driving forces: past average profits (evolutionary stability) or retrospective analysis of own past choices in the light of individual experiences (direction learning).

Furthermore, we will categorize portfolio adaptation in the light of feedback information on past states, past prices, and own investment success. Is portfolio adaptation shaped by both anticipating its likely consequences (the shadow of the future, as in consequentialist logic) and learning from the past (the shadow of the past, as in learning and evolutionary theory)? In view of impulse balancing, we will specifically pay attention to alternative measures of regret (e.g., Avrahami et al. 2005).

The theoretical and empirical study of financial portfolio choice and adaptation has important real-world implications, even when based on stylized setups. For instance, one could easily weaken the assumption of stationary state probabilities by allowing for rare shocks, after which probabilities become stationary again until the next shock occurs. The model of Blume and Easley (1992) would then still predict the drift in portfolio adaptation during phases of constant probabilities. Whether and how at best boundedly rational participants anticipate such shocks seems so far to have been analyzed only exploratively.

Regarding the comparison of equilibrium behavior and impulse balancing, the experimental evidence so far mainly rests on experimental games with (mostly unique) equilibria in completely mixed (all possible strategies are used with positive probabilities) strategies. Here, the evidence speaks quite in favor of impulse balancing as its mixing predictions are based on payoffs of the mixing players, whereas strategic equilibrium mixing is based on rendering other players indifferent and therefore depends on others’ payoffs.

To assess our conclusions robustly, we vary the vector of stationary state probabilities, the number of assets, and the market size by doubling the number of interacting traders. We will describe how we disentangle between learning from the past and anticipation of changes in crucial parameters after introducing the market model with its benchmark solutions and stating hypotheses.

Altogether our data reveal strong anchoring of portfolio design on the success probabilities, in line with the intuition of probability matching. However, this behavior is less prominent in later rounds and when fewer options are available. Furthermore, reacting to own individual past success in the spirit of direction learning becomes less important over time. Among those who respond to past outcomes, different sources of regret seem to be relevant.

Section 2 describes the market model that is the basis of our experimental setting and offers the benchmark solutions for assessing actual behavior. Section 3 states some hypotheses, and Sect. 4 provides details of the experimental implementation. After the data analysis of Sect. 5, we discuss and summarize our results in the concluding Sect. 6.

2 The market model and its benchmark solution

Let \(m (\ge 2)\) denote the number of assets \(j=1,\ldots ,m\), whose positive success probabilities are stationary, and \(n (\ge 2)\) the number of investors \(i=1,\ldots ,n\). The same n investors face this stationary market environment repeatedly. In the evolutionary analysis, the focus is on the final wealth shares of stationary, e.g., genetically or phenotypically determined portfolio composition types. Reaching the end result of evolution may require infinitely many periods by any selection (see Blume and Easley 1992). However, in the experiment, the horizon of this repeated interaction, the “shadow of the future”, is captured by a finite number of successive investment periods.

Assets are state-specific: for each asset \(j=1,\ldots ,m\), there is a random state with positive probability \(w_j(>0)\). Evolutionary selection assumes repeated application of these stationary and state-specific probabilities, but does not require investors to know them. To allow for immediate or quick convergence to equilibrium behavior, the experiment, however, induces common knowledge of probabilities, respectively of their changes.Footnote 2 An asset j is only rewarded with return rate \(R(>0)\) when state j is realized; otherwise, asset j becomes worthless. Specifically, it will be assumed, and experimentally induced, that vector \(w=(w_1,\ldots ,w_m)\) is commonly known. Actually, w and m and n will be the conditions varied in our experimental analysis.

The only possibility to save wealth from one period to the next is by investing it, since what is not invested is lost. For all investors \(i=1,\ldots ,n\), let E(i) denote the wealth of investor i which i can invest in the m assets by choosing portfolio shares \(s(i)=(s_1(i),\ldots ,s_m(i))\), with \(0\le s_j(i) \le 1\) for \(j=1,\ldots ,m\) and \(s_1(i)+\cdots + s_m(i)=1\), i.e., \(s_j(i)E(i)\) is the amount which i invests in asset \(j=1,\ldots ,m\).Footnote 3

For each of the m assets, there exists a constant supply of \(S=100\) units. Thus, the market-clearing price \(p_j\) for asset \(j=1,\ldots ,m\) is determined by \(p_j S=\sum _{i=1}^n s_j(i) E(i)\). At this price \(p_j\), investor i buys, according to i’s hyperbolic demand function, \(\frac{s_j(i) E(i)}{p_j}\) units of asset j, which pay him \(\pi _j(i)=R \cdot \frac{s_j(i)E(i)}{p_j}\) in case of state j being selected and become worthless otherwise.Footnote 4 The expected payoff of choice s(i) by investor i is \(\sum _{j=1}^{m}w_j\pi _j(i)\).
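The market-clearing rule and the payoff definition above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the list-based data layout, and the value of R used in the example are assumptions and not part of the experimental software.

```python
def clearing_prices(shares, endowments, supply=100.0):
    """Market-clearing prices p_j from p_j * S = sum_i s_j(i) * E(i).

    shares[i][j] is the portfolio share s_j(i); endowments[i] is E(i).
    """
    m = len(shares[0])
    return [
        sum(s[j] * e for s, e in zip(shares, endowments)) / supply
        for j in range(m)
    ]


def state_payoffs(i, shares, endowments, R, supply=100.0):
    """Payoff pi_j(i) = R * s_j(i) * E(i) / p_j of investor i in each state j.

    If nobody invests in asset j, its price is zero and investor i holds
    no units of it, so the payoff is set to 0.0 (an assumption of this sketch).
    """
    prices = clearing_prices(shares, endowments, supply)
    return [
        R * shares[i][j] * endowments[i] / p if p > 0 else 0.0
        for j, p in enumerate(prices)
    ]


def expected_payoff(i, shares, endowments, w, R, supply=100.0):
    """Expected payoff sum_j w_j * pi_j(i) of investor i."""
    payoffs = state_payoffs(i, shares, endowments, R, supply)
    return sum(wj * pj for wj, pj in zip(w, payoffs))
```

As a hypothetical illustration, two investors with \(E=100\) each and portfolios (0.5, 0.5) and (0.7, 0.3) generate prices 1.2 and 0.8; with \(w=(0.6, 0.4)\) and an assumed \(R=1\), each of them has an expected payoff of 50.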

The equilibrium benchmark for this interactive portfolio choice is the portfolio \(s^*=(s_1^*,\ldots ,s_m^*)\) with \(s_j^*=w_j\) for \(j=1,\ldots ,m\) and \(s^*(i)=s^*\) for \(i=1,\ldots ,n\). In addition to its obvious prominence as the only possible anchor for participants in the experiment, the equilibrium benchmark can be justified by evolutionary stability.Footnote 5

If \(E(i)=E\) for all interacting investors \(i=1,\ldots , n\), as experimentally implemented, common reliance on benchmark behavior \(s(i)=s^*\) for \(i=1,\ldots ,n\) implies the equilibrium prices \(p_j=\frac{nE}{S}w_j\) for \(j=1,\ldots ,m\) and expected payoffs RS/n. Thus, investors would earn less in expectation when only n increases, but would earn the same when both S and n increase such that S/n remains constant. The effect of increasing R is obvious.
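The benchmark prices and payoffs follow directly from the market-clearing condition once every investor holds \(s^*=w\) with the same endowment E. The short derivation below uses only symbols already defined in this section and is included for convenience:

\[
p_j S=\sum_{i=1}^{n} w_j E = nEw_j \;\Longrightarrow\; p_j=\frac{nE}{S}\,w_j, \qquad
\pi_j(i)=R\cdot\frac{w_jE}{p_j}=\frac{RS}{n}, \qquad
\sum_{j=1}^{m} w_j\,\pi_j(i)=\frac{RS}{n}\sum_{j=1}^{m}w_j=\frac{RS}{n}.
\]

Under these assumptions, the payoff of a benchmark investor is the same in every state and depends on S and n only through the ratio S/n.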

In their innovative evolutionary justification, Blume and Easley assume each investor \(i=1,\ldots ,n\) to be only once positively endowed. Therefore, in later periods, one’s endowment is positive only when having positively invested in all previous winning states. In the experiment, we deviate from this to keep all participants “alive” in the sense of maintaining them as active investors. Specifically, we endow participants repeatedly in each period and also allow them to change their portfolio composition, i.e., to adapt their portfolio choice from period to period. Nevertheless, the intuition of the evolutionarily stable portfolios \(s^*=w\) provides a useful benchmark prediction about how portfolio choices might converge with more and more experience. We predict learning to let average portfolio choices s(i) converge to \(s^*=w\), eventually.

Evolutionarily stable choices often coincide with equilibrium behavior in a game-theoretic sense, i.e., when the game and rationality of all investors are commonly known.Footnote 6 The normative justification, based on common(ly known) rationality and risk neutrality, does not require learning or evolution, but is purely based on consequentialist forward-looking deliberation (the shadow of the future).Footnote 7 Crucially, observing convergence to \(s^*\) choices only after learning would support the evolutionary or learning rather than the normative \(s^*\) justification.

Even when participants frequently rely on \(s^*\) already initially, this may not necessarily support the equilibrium justification of \(s^*\), as initial reliance on \(s^*\) is predicted by anchoring in the form of probability matching. Actually, \(w=(w_1,\ldots ,w_m)\) is the only numerical anchor provided by the instructions. If at all, the only alternative anchor, namely \(s_j(i)=1/m\) for \(j=1,\ldots , m\) and \(i=1,\ldots ,n\), the so-called 1/m-heuristic (Benartzi and Thaler 2001) (often referred to as the Golden Mean, 1/m), is only implicitly provided.

However, neither the static equilibrium concept nor anchoring predicts whether and how participants will question the respective recommendation after receiving feedback information. If, by chance, the first randomly selected states are rather unexpected, e.g., by being very unlikely, one might question anchoring on w. Similarly, when frequently experiencing a state j with \(w_j\) exceeding 1/m considerably, the 1/m-heuristic might lose its appeal. Furthermore, when allowing subjective probability adaptation to past realizations of states in spite of the independent repeated application of w, one may engage in best-reply dynamics to past results, for instance, due to the gambler’s fallacy.

An alternative equilibrium concept presupposes regret balancing, for instance, impulse balance equilibria (Selten and Buchta 1999; Ockenfels and Selten 2005) capturing the average choice probabilities of cyclical direction learning. In our context, this means balancing the possible regret when investing in the m different assets. Here, we measure regret via investment shares rather than by expected monetary returns, which depend on market prices and others’ behavior. Specifically, when asset j was drawn, agent i regrets everything not invested in asset j, i.e., the portfolio share \(1-s_j(i)\). This occurs with probability \(w_j\) and amounts to the expected regret of \(w_j(1-s_j(i))\). When asset j was not drawn, i regrets everything invested in j, i.e., the portfolio share \(s_j(i)\). This occurs with probability \((1-w_j)\), amounting to \((1-w_j)s_j(i)\). Balancing these two amounts results in \(s_j(i)=w_j\), so that, in the case of monomorphic \(s^*=w\)-behavior, no investor would suffer from unbalanced regret. In the analysis of our experimental data, we also present other, more behaviorally motivated measures of regret and explore their relationship to dynamic portfolio adjustment (see Sect. 5.3.3).
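Written out, the balancing condition equates the two expected regret terms just described. The intermediate step is spelled out here for convenience, under the share-based regret measure used in this paragraph:

\[
w_j\bigl(1-s_j(i)\bigr)=(1-w_j)\,s_j(i)
\;\Longleftrightarrow\;
w_j-w_j s_j(i)=s_j(i)-w_j s_j(i)
\;\Longleftrightarrow\;
s_j(i)=w_j .
\]

As a purely hypothetical illustration, with \(w_j=0.6\) and \(s_j(i)=0.5\) the expected regret from under-investing is \(0.6\cdot 0.5=0.30\), whereas the expected regret from over-investing is \(0.4\cdot 0.5=0.20\), so the net impulse points towards a larger \(s_j(i)\).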

When considering only their final predictions, all four approaches (evolution, equilibrium, w-anchoring, and impulse balancing)Footnote 8 predict the same portfolio compositions, i.e., \(s^*=w\), whereas the 1/m-heuristic predicts, at least initially, w-insensitivity. For the experimental analysis, the research questions, therefore, concern the dynamics, i.e., the individual adaptation of portfolio composition, more than the final choices.

  • Will the same final \(s^*\)-predictions be confirmed by monotonic adaptations, possibly by moving away from \(s_j(i)=1/m\) choices, towards \(s^*\)

    • already initially,

    • only after learning,

    • not at all?

  • Will portfolio adaptation be cyclic as suggested by direction learning (Selten 2004)?

  • Will the experimentally observed adaptation steps allow us to distinguish between the static (anchoring, equilibrium, and impulse balancing) and the dynamic (evolutionary, learning, e.g., qualitative direction learning) \(s^*\)-justifications?

3 Hypotheses

From a behavioral perspective, (expected) regret may not be purely restricted to investment, but is probably also affected by wealth effects that depend on market-clearing prices. Therefore, for example, impulse balancing, the underlying intuition of directional learning, provides a clear hypothesis of adaptive behavior: when i has invested \(s_j(i) E(i)\) in asset j whose state j is not selected, we expect a likely \(s_j(i)\)-decrease in the next period. The effect should be regret related via \((1-s_j)\frac{s_j(i)E(i)}{p_j}\) or via \(s_k\frac{s_j(i)E(i)}{p_k}\), where k denotes the actually selected state in the past period. The first specification does not presuppose that one could have guessed correctly the actually realized state j; the second assumes that one could have guessed right. Of course, in the case of two states only, \(m=2\), which we also explore, the two specifications coincide.

The evolutionary analysis by Blume and Easley does not rely on the individual adaptation of portfolios but on individual wealth. To capture this intuition and test it with experimental data, one could explore whether, across time, those investors with past choices closer to \(s^*\) have on average also earned more. This would, at least qualitatively, confirm the intuition of evolutionary selection driving the result of Blume and Easley. For the static justifications of \(s^*\), one may at best observe some initial effects, e.g., when by chance the first random realizations differ considerably from w. Will this induce adaptation of subjective beliefs away from w, or will anchoring at w persist and be revealed by constant portfolio choices close to w?

In our view, the multi-dimensional action space for choosing s(i) and the rather fine grid, imposed by integer percentages from 0% to 100% when selecting \(s_j(i)\), provide an interesting setup for studying and comparing learning in a rather rich setup.

Behaviorally, we cannot hope for the same portfolio selection by all individuals, which renders the anticipation of market-clearing prices questionable. When analyzing how portfolios are adapted, one can use past instead of anticipated prices. From an evolutionary perspective, one may, for instance, rely on \(w_j/p_j\), with \(p_j\) denoting the past price of asset \(j=1,\ldots ,m\) to compare what a marginal token would achieve in different states j.
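The comparison based on \(w_j/p_j\) can be illustrated with a small Python sketch. The function name and the example numbers are hypothetical; the only input taken from the text is the ratio of state probability to past price.

```python
def expected_units_per_token(w, past_prices):
    """Ratio w_j / p_j for each asset j: the probability-weighted number of
    units of asset j that one marginal token buys at the past price p_j.

    Assumes all past prices are strictly positive.
    """
    if len(w) != len(past_prices):
        raise ValueError("w and past_prices must have the same length")
    return [wj / pj for wj, pj in zip(w, past_prices)]
```

At the benchmark prices \(p_j=\frac{nE}{S}w_j\), this ratio equals \(S/(nE)\) for every asset, so no asset looks more attractive than another. With hypothetical past prices (1.5, 0.5) and \(w=(0.6, 0.4)\), the ratios are about 0.4 and 0.8, which would favor shifting tokens towards the second asset.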

Concerning impulse balancing, we rely on the dynamic intuition of direction learning (see Selten and Buchta 1999) to investigate whether and how participants react to feedback information after each period. This implies conditioning on just the past state k, so that \(s_k(i)\) will not be increased when state k had not been drawn and that \(s_k(i)\) should not be decreased when state k was realized. Of course, how one reacts to (non) realized states may also depend on past prices due to impulses \(s_j(i)E(i)/p_j\) and \(s_k(i)E(i)/p_k\), a quantitative aspect of impulse balancing but not of purely qualitative direction learning.Footnote 9

Participants may over- or underestimate the probability of observing the same state in successive rounds, resulting in frequent restructuring of portfolios, which may call into question (convergence to) \(s^*\)-behavior. Behaviorally, participants may be influenced by past prices \(p_j\) and state-specific \(w_j\) in other ways than predicted by impulse balancing or equilibrium (evolution). As postulated by reinforcement learning, path dependence may also be based on longer memory length than just one round as supposed by direction learning.

The theoretical predictions do not depend on endowments and differences in individual endowments. Furthermore, probabilities of the m states can be rather similar or very different, which will be explored experimentally by varying the probabilities of the m states. If, for instance, the probabilities of the m different states do not differ much, this should enhance the support for the so-called “1/m” heuristic (Benartzi and Thaler 2001) predicting equal portfolio shares for all m assets. In the case of substantially different probabilities, even investors who fear the risk of neglecting low probability assets should learn to invest significantly less in substantially less likely states.

4 Experimental design and procedures

Participants i are asked to allocate 100 tokens to the available assets or prospects by choosing s(i), repeatedly. When a prospect is selected, its returns are positive; otherwise, its returns are null. Specifically, the returns from the prospect are given by €0.10 \(\times\) the number of units purchased of the selected prospect.Footnote 10

Overall, four different sets of prospects are considered, with either m=2 or m=4 states (see Table 2).

Table 2 Probability of success for each prospect (\(w_j\))

The experiment consists of two parts (see Fig. 1): Part 1 and Part 2. In Part 1, participants are matched together in groups of 4, while in Part 2, they are matched together in groups of 8. Each part has four phases: Phase 1, Phase 2, Phase 3, and Phase 4. In each phase, participants choose how to invest in the available prospects, facing one of the four sets in Table 2 over seven successive rounds. Which set is displayed in which phase is randomly determined at the matching group level, but all subjects see all sets, and within a phase, the set is kept constant. In Part 1, when groups are made of four participants, individuals interact in matching groups of size 8. In Part 2, the matching groups include 16 subjects each. The composition of the group does not change within a phase, but is randomly redefined from phase to phase.

Fig. 1 Experimental structure (Part 1 (2): 4 (8) investors)

Overall, participants face 56 successive decision tasks. The final payment in the experiment is given by randomly drawing one round from each of the four phases in Part 1 and one round from each of the four phases in Part 2.

The experiment was run at the laboratory of the Max Planck Institute Jena, Germany. The experiment was programmed and conducted using zTree software (Fischbacher 2007). A total of 128 participants took part in the experiment over four experimental sessions. In total, we collected 7168 individual portfolio choices, s(i).
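The reported counts follow from the design parameters given in this section. The following minimal Python check restates that arithmetic; the variable names are illustrative only.

```python
PARTS = 2              # Part 1 (groups of 4) and Part 2 (groups of 8)
PHASES_PER_PART = 4    # Phase 1 to Phase 4
ROUNDS_PER_PHASE = 7   # seven successive rounds per phase
PARTICIPANTS = 128     # over four experimental sessions

# Decision tasks faced by each participant
tasks_per_participant = PARTS * PHASES_PER_PART * ROUNDS_PER_PHASE

# Individual portfolio choices s(i) collected in total
total_choices = PARTICIPANTS * tasks_per_participant

# Rounds drawn for payment: one per phase in each part
paid_rounds = PARTS * PHASES_PER_PART
```

This yields 56 decision tasks per participant, 7168 portfolio choices overall, and 8 payoff-relevant rounds.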

5 Results

First, we analyze decision times for completing the task to understand how participants familiarize themselves with the task. Then, we present investment choices and their deviation from probability matching behavior. In Sect. 5.3, we investigate investment dynamics with specific attention to alternative regret measures. Finally, regression analysis provides us with a general assessment of our main results.

5.1 Decision times

The graph (Fig. 2) represents the distribution of median decision times in seconds in each round of the experiment.

Fig. 2 Decision time

Participants become much faster in choosing as they familiarize themselves with the task, as shown by the polynomial fit of the data (dashed line). Spikes correspond to periods \(1+7x\) with \(x \in \{0,1,\ldots ,7\}\) and highlight the adjustment needed when a new experimental condition, i.e., a new probability vector w and/or a larger number of investors, is announced and portfolios are adjusted, which, as such, indicates forward-looking inclinations. A Wilcoxon signed-rank test comparing choice times in the first and in the last period shows that choice times significantly decrease over time (p value\(<0.001\)).

Thus, our data show that proficiency in the task increases over rounds as measured by time to choose. Time spent in the choice is generally higher in the first round after announcing a new experimental condition.

5.2 Investment choices

Figure 3 displays how deviations from probability matching are distributed for each prospect whose probability is given below. The symbol “x” denotes the average deviations and medians are indicated by bold lines.

Fig. 3 Deviations from probability matching

The graphs show that median choices are generally close to probability matching. However, the distribution of tokens is widely dispersed, and several outliers are present in each condition of choice. Choices clearly reveal that commonly known numerical probabilities provide a much stronger cognitive anchor than potential alternatives, like the Golden Mean or the 1/m-heuristic, especially in the case of the prospect sets with probabilities differing considerably from 1/m. From this, one may infer that the 1/m-heuristic may require ambiguity of success probabilities to emerge.

A series of non-parametric tests shows that the central tendency of the token distribution in the prospect with the lowest probability is significantly different from probability matching, at the conventional 5% level, for Sets 2 and 3 when the group size is equal to 4 and for Set 1 when the group size is equal to 8. When pooling data irrespective of set and group size, a (positive) statistically significant difference is registered (p value<0.001).Footnote 11 In terms of investment dynamics, no significant differences are observed in token allocations when comparing the first and the last round in each block of seven successive choices (all p values \(\ge 0.152\)). Table 3 presents a few summary statistics of absolute deviations from equilibrium prices in each set and for each group size.

Table 3 Deviations from equilibrium prices

Individual heterogeneity in token allocation does not average out in terms of market prices. Actually, prices, a measure of group behavior, are far from their equilibrium levels in each set and for each group size. Furthermore, standard deviations are quite sustained in all experimental conditions. Deviations are generally larger for larger group sizes, as confirmed by a series of Wilcoxon rank-sum tests comparing averages at the matching group levels across the two group sizes (all p values \(\le 0.019\)).

As a complement to the "static" analysis presented above, Fig. 4 represents the evolution of probability matching over rounds. The graph reports the average absolute difference between the share of tokens invested in a prospect by a participant (\(s_{j}(i)\)) and the probability of the prospect (\(w_{j}\)), over the seven rounds.
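The distance measure just described can be sketched as follows. The function name is hypothetical, and averaging over the prospects of a single portfolio is an assumption of this sketch; the reported figure additionally averages over participants within each round.

```python
def mean_abs_distance(shares, w):
    """Average of |s_j(i) - w_j| over the m prospects of one portfolio.

    shares: portfolio shares s_j(i) of one participant in one round
    w:      success probabilities w_j of the prospects in the current set
    """
    if len(shares) != len(w):
        raise ValueError("shares and w must have the same length")
    return sum(abs(s - p) for s, p in zip(shares, w)) / len(w)
```

A portfolio that matches probabilities exactly has distance 0. A hypothetical portfolio (0.40, 0.30, 0.20, 0.10) facing four equally likely prospects has an average distance of about 0.10.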

Fig. 4 Average distance from probability matching over rounds

According to Fig. 4, the average distance from probability matching tends to increase over time. A Wilcoxon signed-rank test confirms this, showing that choices in the first and the last round are significantly different (p value\(<0.001\), test on averages at the matching group level).

5.3 Investment dynamics

5.3.1 Individual-level changes in drawn asset

Figure 5 shows the average variations at the individual level in the share of tokens allocated to the prospect drawn in the previous period \((\Delta s_{\widehat{k}})\). A positive (negative) value denotes individuals who, on average, increase (decrease) the amount allocated to the winning asset of the previous round. The vertical dashed line captures the average change.

Fig. 5 Changes in allocation to drawn prospect

According to Fig. 5, the average change is negative but small: most of the changes (78.1%) lie in the interval \([-0.05, +0.05]\). Non-parametric tests show that the central tendency of the distribution is negative, whether the tests are conducted at the matching group level or at the individual level (p value\(=0.001\) and p value\(<0.001\), respectively).

The pattern emerging from Fig. 5 suggests that more investors tend to be discouraged from investing in the previously drawn prospect, whereas only the opposite adjustment (or no adjustment at all) is compatible with the notion of directional learning. The observed adjustment may relate to a well-known bias in decision-making under uncertainty, i.e., the gambler’s fallacy, which denies the independence of successive iid-chance moves.

5.3.2 Directional learning

Figure 6 presents the change in the share invested in the winning prospect \({\hat{k}}\) of the previous round: \(\Delta s_{\widehat{k}}(i)=s_{\widehat{k}}^{t}(i)-s_{\widehat{k}}^{t-1}(i)\), where \(s_{\widehat{k}}^{t}(i)\) is the share of tokens invested by subject i at time t in the asset \(\widehat{k}\) that won at time \(t-1\), and \(s_{\widehat{k}}^{t-1}(i)\) is the corresponding share at time \(t-1\).Footnote 12 A change in the share invested in the previously drawn prospect is compatible with directional learning if the share allocated at time t is increased relative to the share at time \(t-1\). White bars capture adjustments compatible with directional learning,Footnote 13 i.e., positive changes (\(\Delta s_{\widehat{k}}(i)>0\)); light gray bars capture the absence of change (\(\Delta s_{\widehat{k}}(i)=0\)); and dark gray bars capture negative changes (\(\Delta s_{\widehat{k}}(i)<0\)).
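The three-way classification of \(\Delta s_{\widehat{k}}(i)\) can be sketched in Python as follows. The function names and labels are hypothetical; allocations are represented as integer token counts (0 to 100), in line with the integer percentages participants choose, so that comparisons are exact.

```python
from collections import Counter


def classify_change(tokens_prev, tokens_now):
    """Classify the change in tokens placed on the previously drawn prospect.

    tokens_prev: tokens on prospect k-hat at time t-1
    tokens_now:  tokens on the same prospect at time t
    """
    delta = tokens_now - tokens_prev
    if delta > 0:
        return "positive"  # compatible with directional learning
    if delta < 0:
        return "negative"
    return "none"          # inertia


def change_frequencies(pairs):
    """Relative frequency of each category over (tokens_prev, tokens_now) pairs."""
    counts = Counter(classify_change(prev, now) for prev, now in pairs)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one observation is required")
    return {label: counts[label] / total
            for label in ("positive", "none", "negative")}
```

For the hypothetical observations (40, 50), (40, 40), (40, 30), and (25, 25), the frequencies are 0.25 positive, 0.5 no change, and 0.25 negative.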

Fig. 6 Round-to-round changes in allocation to drawn prospect

Overall, the majority of changes from round to round are not compatible with directional learning. For the groups of size 4, the average frequency of positive changes is 26.43%; the average frequency of negative changes is 29.82%; the remaining 43.75% is due to inertia. A similar pattern emerges for group size 8, with positive changes covering 22.59% of the observations, negative changes 27.08%, and absence of change 50.33%. Altogether, the absence of adjustment dominates and increases its share as rounds progress. It seems that over time participants appreciate the true randomness of winning chances and become less sensitive to the last random event.

Concerning the evolution of changes over rounds, a series of non-parametric tests find a significant decrease in positive variations from the first available round to the last one, both for group size 4 and 8, as shown by Wilcoxon signed-rank tests at the matching group level (p value=0.021 and p value=0.014, respectively). At the same time, inertia significantly increases across rounds, both for group size 4 and 8 (p value=0.017 and p value=0.014, respectively). Furthermore, the difference between the share of negative changes in the first and last round is statistically significant for group size 8 (p value=0.014) but not for group size 4 (p value=0.417). Altogether, the general tendency is a progressive shift from active change, in both directions, to the invariance of choices.

5.3.3 Regret analysis

To better understand the adaptation dynamics of token allocation, we distinguish four alternative sources of regret which may correlate with a positive change in the token allocation to the winning asset \(\widehat{k}\) in the previous round (\(\Delta s_{\widehat{k}}>0\) or \(\Delta ^+ s_{\widehat{k}}\), see above). The analysis relies on 1506 out of the 6144 available observations (24.5%).

The first measure we adopt is \(regret\_payoff_t(i)=R \frac{1}{1+\sum _{j \ne i}s_{\widehat{k}}^{t-1}(j)} - R \frac{s_{\widehat{k}}^{t-1}(i)}{\sum _{j=1}^ns_{\widehat{k}}^{t-1}(j)}\), the difference between the payoff individual i would have earned had she invested all her resources in the previously drawn asset \(\widehat{k}^{t-1}\) and the amount actually earned by i in period \(t-1\). Second and third, we adopt \(regret\_tokens_t(i)=1-s_{\widehat{k}}^{t-1}(i)\), capturing the share not invested in the drawn asset, and \(regret\_others_t(i)=\frac{1}{n-1}\sum _{j\ne i}s_{\widehat{k}}^{t-1}(j)-s_{\widehat{k}}^{t-1}(i)\), the difference between the average share invested by the others in one’s market and the share invested by individual i in the drawn asset.Footnote 14 The fourth measure is \(regret\_probabilities_t(i)=w_{\widehat{k}}^{t-1}-s_{\widehat{k}}^{t-1}(i)\), the difference between the exogenously given probability \(w_{{\hat{k}}}\) and the share invested in the drawn asset.
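The four regret measures can be computed from the previous round's shares in the drawn asset, as in the following Python sketch. The function name, the dictionary keys, and the example values are hypothetical; the formulas are transcribed from the definitions above, and the treatment of a market in which nobody invested in the drawn asset is an assumption of this sketch.

```python
def regret_measures(i, shares_drawn, w_drawn, R=1.0):
    """Four regret measures of investor i for the previously drawn asset.

    shares_drawn: shares s_k(j) that all n investors of the market placed
                  on the drawn asset k-hat in round t-1 (n >= 2)
    w_drawn:      exogenous probability w_k of the drawn asset
    R:            return rate (value assumed for illustration)
    """
    own = shares_drawn[i]
    others = [s for j, s in enumerate(shares_drawn) if j != i]
    total = sum(shares_drawn)

    # Payoff actually earned; set to 0.0 if nobody invested in the asset
    earned = R * own / total if total > 0 else 0.0
    # Payoff had i invested everything in the drawn asset
    counterfactual = R / (1.0 + sum(others))

    return {
        "payoff": counterfactual - earned,
        "tokens": 1.0 - own,
        "others": sum(others) / len(others) - own,
        "probabilities": w_drawn - own,
    }
```

For a hypothetical market of four investors with shares (0.5, 0.6, 0.7, 0.6) in the drawn asset and \(w_{\widehat{k}}=0.6\), the first investor's token regret is 0.5 and her probability regret is about 0.1.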

Table 4 Potential regret sources of behavioral adaptation (Pearson’s product–moment)

According to Table 4, all regret measures are positively and significantly correlated to changes in token allocation to the last drawn assets \(\varvec{\Delta }^+ s_{\widehat{k}}\), except for \(regret\_payoff\). Table 4 also highlights strong correlations among alternative regret measures. In particular, \(regret\_others\) and \(regret\_probabilities\) are strongly correlated (rho=0.922). This is most likely due to the fact that the mean investment in a prospect is very close to the probability of the winning prospect (see Fig. 3).

5.4 Regression analysis

The multivariate regression analysis of Table 5 provides a more refined picture of the relative impact of the alternative regret sources. To corroborate results emerging from the analysis reported above, we present four regression estimates with different model specifications, both in terms of dependent and independent variables as well as in terms of sample size. Regressions (1), (2), and (4) of Table 5 report the outcome of linear mixed models (LMM). Regression (3) is a logit generalized linear mixed model (GLMM Logit). All models are estimated with random effect intercepts at individual and matching group levels.

In Regression (1), the dependent variable is the absolute difference between the share of tokens invested in a prospect by a participant (\(s_{j}(i)\)) and the probability of the prospect (\(w_{j}(i)\)), which we interpret as a direct measure of probability matching.

In Regression (2), the dependent variable is the expected payoff of a participant in each round of choice, given the actual choices of the others in the group (\(\xi [\Pi ]=\sum _{j=1}^m w_j\pi _j\), where m is the number of prospects in a set and \(\pi _j\) are the earnings in prospect j). This measures the expected profitability of one’s allocation strategy.
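The expected payoff can be illustrated with a short Python sketch. It assumes that the earnings \(\pi _j\) in prospect j follow the proportional rule used in the definition of \(regret\_payoff\) (Sect. 5.3), i.e., R times one's share of the total invested in j. The function name, the data layout, and the example numbers are hypothetical.

```python
from typing import Sequence


def expected_payoff(shares: Sequence[Sequence[float]], i: int,
                    w: Sequence[float], R: float) -> float:
    """Expected payoff sum_j w_j * pi_j of investor i.

    shares[k][j] is the share investor k places on prospect j,
    w[j] is the probability that prospect j is drawn.
    """
    total = 0.0
    for j, w_j in enumerate(w):
        invested = sum(row[j] for row in shares)
        # Assumed earnings rule: proportional claim on R if prospect j is drawn.
        pi_j = R * shares[i][j] / invested if invested > 0 else 0.0
        total += w_j * pi_j
    return total


# Hypothetical two-investor, two-prospect market with w = (0.75, 0.25).
market = [[0.75, 0.25],   # investor 0 matches the probabilities
          [0.50, 0.50]]   # investor 1 follows the 1/m heuristic
e0 = expected_payoff(market, 0, [0.75, 0.25], R=100.0)
e1 = expected_payoff(market, 1, [0.75, 0.25], R=100.0)
```

Under the assumed rule, the expected payoffs of all investors sum to R whenever every prospect receives some investment, so one investor's expected gain is another's expected loss.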

In Regression (3), the dependent variable \(I(\Delta ^+ s_{\widehat{k}}(i))\) is equal to 1 if the difference between the amount invested at time t and at time \(t-1\) in the last winning prospect \(\widehat{k}\) (\(\Delta s_{\widehat{k}}(i)=s_{\widehat{k}}^{t}(i)-s_{\widehat{k}}^{t-1}(i)\)) is strictly positive and thus compatible with directional learning. Otherwise, i.e., if \(\Delta s_{\widehat{k}}(i)\le 0\), the dependent variable is 0.

In Regression (4), the dependent variable is the difference between the amount invested at time t and at time \(t-1\) in the asset \(\widehat{k}\) that won in round \(t-1\) (\(\Delta s_{\widehat{k}}(i)=s_{\widehat{k}}^{t}(i)-s_{\widehat{k}}^{t-1}(i)\)). The dependent variable captures the investment reaction to the previous random event, i.e., the draw of \(\widehat{k}\) in \(t-1\). Only positive adjustments are considered, to shed light on regret balancing.
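The two adjustment-based dependent variables can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes that "only positive adjustments are considered" means that observations with a non-positive change are dropped from the Regression (4) sample (returned here as `None`).

```python
from typing import Optional, Tuple


def adjustment_outcomes(s_prev: float, s_now: float) -> Tuple[int, Optional[float]]:
    """Dependent variables of Regressions (3) and (4) for one observation.

    s_prev and s_now are the shares invested in the last winning prospect
    at time t-1 and at time t, respectively.
    """
    delta = s_now - s_prev
    # Regression (3): 1 if the share in the last winner strictly increased.
    indicator = 1 if delta > 0 else 0
    # Regression (4): keep the change only if strictly positive (assumed
    # sample restriction); otherwise the observation is excluded.
    positive_delta = delta if delta > 0 else None
    return indicator, positive_delta


print(adjustment_outcomes(0.25, 0.5))   # share raised after the asset won
print(adjustment_outcomes(0.5, 0.5))    # inertia: no change
```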

As explanatory variables, experimentally controlled manipulations are employed across the model specifications: Round captures the period of choice; \(Size_8\) is a dummy variable taking value 1 when the group size is 8 and value 0 when it is 4; \(Set_i\) with \(i \in \{1,2,3,4\}\) is a dummy variable equal to 1 when choices are from set i and equal to 0 otherwise.

In Regression (2), a measure of the absolute relative deviation from probability matching, computed for each portfolio of choices, is added as a regressor (\(Abs\_Rel\_Diff=\frac{|s_{j}(i)-w_{j}(i)|}{w_{j}(i)}\)). In Regression (4), a few additional controls are added to capture alternative sources of potential regret related to choices in the previous period. A detailed description of these measures is provided above (see Sect. 5.3).
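A minimal Python sketch of \(Abs\_Rel\_Diff\) is given below. It computes the deviation prospect by prospect for one participant's allocation; how these values are aggregated over a portfolio is not specified in the text, so no aggregation is imposed here. The function name and the example numbers are hypothetical.

```python
from typing import List, Sequence


def abs_rel_diff(shares: Sequence[float], w: Sequence[float]) -> List[float]:
    """Per-prospect |s_j - w_j| / w_j for one participant's allocation.

    Probabilities are assumed strictly positive, so the division is safe.
    """
    if len(shares) != len(w):
        raise ValueError("shares and w must have the same length")
    return [abs(s - p) / p for s, p in zip(shares, w)]


# Hypothetical set with two prospects and probabilities (0.25, 0.75):
# the 1/m allocation deviates most, in relative terms, on the unlikely prospect.
print(abs_rel_diff([0.5, 0.5], [0.25, 0.75]))
# Exact probability matching yields zero deviation on every prospect.
print(abs_rel_diff([0.25, 0.75], [0.25, 0.75]))
```

Dividing by \(w_j\) makes deviations comparable across prospects with very different probabilities, which the plain absolute difference used in Regression (1) does not.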

Table 5 Mixed models estimations (random effect intercepts at individual and matching group levels)

Regression (1) reveals that the distance from probability matching tends to increase over rounds. Interestingly, in Set 2, for which the 1/m-heuristic is much closer to probability matching, the shares are further away from probability matching than in Set 1. Furthermore, a linear hypothesis test shows that deviations are significantly larger in Set 3 than in Set 4 (Chi-square test, \(p=0.037\)).

Regression (2) shows that payoffs in Set 1 are lower than those in other sets. Furthermore, a larger distance from probability matching implies a considerable decrease in expected payoffs.

Our findings on the relevance of probability matching are summarized in the following result:

Result 1

Overall, probability matching seems to provide a strong anchor for investment decisions. However, deviations from probability matching increase over rounds and are larger for smaller sets of options. This affects outcomes, as larger deviations from probability matching reduce expected payoffs.

Regression (3) confirms that the likelihood of adjusting in line with directional learning decreases over rounds. The tendency towards directional learning is weaker in larger groups than in smaller ones (\(Size_8\)). Furthermore, sets with more investment options are less likely to induce behavior compatible with directional learning (\(Set_3\) and \(Set_4\)).

Our findings on the relevance of directional learning are summarized in the following result:

Result 2

Overall, directional learning is not a main behavioral driver, and it is even weaker in later rounds, in larger groups, and for larger sets of options.

Regression (4) highlights the impact of alternative regret measures on changing one’s allocation: both the amount of tokens not invested in the drawn prospect (\(regret\_tokens\)) and the distance between the probability of the drawn asset and the share invested in it (\(regret\_probabilities\)) positively affect how much is invested in the drawn asset. Interestingly, the distance from the behavior of others, which has to be inferred from past market-clearing prices, has no significant impact, which was to be expected since such inference is cognitively demanding. The difference between the payoff that one would have obtained by investing in the drawn asset and the actual payoff (\(regret\_payoff\)) negatively impacts behavior, which seems compatible with the intuition of directional learning. However, these results must be taken with caution because of the strong correlation between \(regret\_probabilities\) and \(regret\_others\) (see Table 4) and the potential issue of multicollinearity.
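To gauge how serious the multicollinearity concern might be, the following Python sketch converts the pairwise correlation reported in Table 4 into a variance inflation factor (VIF). This is an illustrative back-of-the-envelope calculation, not part of the reported analysis: it considers only the two correlated regressors, whereas Regression (4) includes further controls, so the value is a lower bound on the VIF from the full model. The function name is hypothetical.

```python
def pairwise_vif(rho: float) -> float:
    """VIF of a regressor whose only correlate has correlation rho with it."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    # With a single other regressor, the auxiliary R^2 equals rho squared.
    return 1.0 / (1.0 - rho ** 2)


# Correlation between regret_others and regret_probabilities (Table 4).
vif = pairwise_vif(0.922)
print(round(vif, 2))  # 6.67
```

A value of this size means that the sampling variance of each of the two coefficients is inflated roughly sevenfold relative to the uncorrelated case, which is consistent with the caution expressed above about interpreting their individual significance.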

Our findings on the relevance of alternative regret measures are summarized in the following result:

Result 3

Regret related to foregone payoffs impacts portfolio adjustments, in line with directional learning. Furthermore, the distance of the investment from the probability of the drawn asset and the distance from entirely investing in it also impact portfolio adjustments.

6 Discussion

Our analysis investigates a choice behavior of enormous field relevance, namely how to invest in different financial assets. But then, we seem to lose all field relevance, partly in line with the theoretical finance literature, by presupposing and experimentally inducing numerically specified and commonly known winning chances. If at all, winning chances in the field are ambiguous, and many professional investors claim individual superiority in knowing them. Interestingly, the evolutionary justification of the benchmark \(s(i)=s^*\) for \(i=1, \ldots , n\) requires only sufficient stationarity of winning chances to allow for evolutionary selection.

In our view, it is surprising that investors’ awareness of numerical winning chances \(w_j\) for \(j=1, \ldots , m\) seems to crowd out heuristics like, for instance, the Golden Mean (1/m). Although this heuristic appears to be an obvious default when winning chances are considerably blurred, it is even more astonishing that this crowding out is immediate. The 1/m anchoring is replaced by w-anchoring, at least when w is commonly known. But then, the common knowledge and cognitive demands of the two equilibrium justifications of \(s(i)=s^*\)-behavior for all \(i=1,\ldots ,n\) seriously question their behavioral relevance, especially their psychological validity. Could the average choice tendencies in our data, which are quite in line with this behavior, be due to reactions to myopic regret? This means retrospectively analyzing what would have been the best, or at least a better, choice in the last period and assessing how much one has actually lost compared to it. Obviously, what one actually earned can never exceed what would have been best, and a positive difference measures a loss, assessed via a retrospective choice analysis when knowing the last winning asset.

Compared to other investment scenarios, our setup is simple, owing to its block-wise stationary vector of state-specific probabilities across all seven periods of a phase, but also much more complex than usual portfolio-choice experiments, since investors endogenously determine the market-clearing prices for markets with 4 or 8 interacting agents. The initial steep decline in decision times reveals that the task is, at least initially, cognitively demanding. The prominent role of probability matching in explaining how wealth shares are allocated, for instance vis-a-vis the 1/m or Golden Mean heuristic, calls into question whether participants change their portfolios in the light of myopic feedback information about the most recent results, as postulated, for instance, by directional learning. Apparently, participants develop more stable investment behavior after more experience with random success, as revealed by increasing inertia across time. Less myopic path dependence, as in reinforcement learning, could account for such an increasing inertia share. In our regret analysis, we restricted ourselves to myopic path dependence but, as in directional learning, distinguished several obvious and direct regret measures whose effects on portfolio adjustments in the light of only the most recent results could be confirmed.

Altogether, our results support the prominent theoretical prediction of w-homogeneity in investing, a prediction supported by evolutionary stability, strategic equilibrium, anchoring, and regret balancing. It captures the main tendencies even after block-wise changes of the probability vector, but does not account for the persistent heterogeneity, which we could at least partly explain by myopic regret effects.

In conclusion, our stylized setup offers insights into policy interventions in real financial markets. For instance, one might pool all available information and provide it publicly, thereby enhancing the block-wise stationarity of winning chances. This would also render the assumption that all investors rely on the same winning chances, a crucial assumption of our setup, more realistic and could render financial markets less perilous by limiting exploitation by privately informed traders.