# Modeling judgment of sequentially presented categories using weighting and sampling without replacement

## Abstract

In a series of experiments, Kusev et al. (Journal of Experimental Psychology: Human Perception and Performance 37:1874–1886, 2011) studied relative-frequency judgments of items drawn from two distinct categories. The experiments showed that the judged frequencies of categories of sequentially encountered stimuli are affected by the properties of the experienced sequences. Specifically, a first-run effect was observed, whereby people overestimated the frequency of a given category when that category was the first repeated category to occur in the sequence. Here, we (1) interpret these findings as reflecting the operation of a judgment heuristic sensitive to sequential patterns, (2) present mathematical definitions of the sequences used in Kusev et al. (2011), and (3) present a mathematical formalization of the first-run effect, the judgment-relative-to-patterns (JRP) model, to account for the judged frequencies of sequentially encountered stimuli. The model parameter *w* accounts for the effect of the length of the first run on frequency estimates, given the total sequence length. We fitted the model parameter to data from Kusev et al. (2011); with increasing values of *w*, subsequent items in the first run have less influence on judgments. We see the model as essential for advancing knowledge in the psychology of judgments, as well as in other disciplines, such as computer science, cognitive neuroscience, artificial intelligence, and human–computer interaction.

### Keywords

Temporal-sequence patterns · Frequency judgments · First-run effect

Many events in life occur in temporal sequence, for example, sunny and rainy days. A long history of research has investigated memory for, and judgment of, the *frequency* of events encountered in temporal sequence (Brown, 1997; Hasher & Zacks, 1979, 1984). Several prominent cognitive theories have also analyzed how people reason about the processes underlying sequences and how they anticipate individual events in a sequence (Ayton & Fischer, 2004; Kusev et al., 2011; Kusev, van Schaik, Ayton, Dent, & Chater, 2009; Oskarsson, Van Boven, McClelland, & Hastie, 2009; Sedlmeier & Betsch, 2002). In this article, we offer a theoretical formalization to account for judgments of stimuli experienced in temporal sequences.

Such judgments may be informed by *runs*, that is, repetitions of a type of stimulus in a sequence¹ (Kusev et al., 2011). For example, across a wide range of sequences varying in the relative frequencies of their elements, one would be entitled to assume that, when one encountered a run of one type of stimulus, that stimulus type was likely to be more preponderant in the sequence.²

Consistent with this notion, Kusev et al. (2011) identified a *first-run effect*, whereby, after experiencing a sequence of stimuli, people give higher estimates of the frequency of a given category of event when that category is the first *repeated* category to occur in the sequence. For example, if “O” is the first repeated *type* of stimulus in a sequence of “O”s and “X”s, “O” is judged to be more frequent, even when the “O”s and “X”s are equally frequent.

It is important to note that, although the first run had a biasing effect on frequency judgments, the last run did not. Furthermore, the categories investigated by Kusev et al. (2011) were chosen to be abstract (checkerboard patterns, geometrical shapes, or sine-wave tones) in Experiments 1–5, or common (in Exp. 6), so that previous experience with the categories (e.g., in terms of familiarity or intuitiveness) would not affect people’s frequency judgments within each experiment. The finding of a first-run effect by Kusev et al. (2011) agrees with the results of other research on the perception of randomness (e.g., the “hot-hand fallacy”; Ayton & Fischer, 2004; Gilovich, Vallone, & Tversky, 1985), in which judgments were biased in favor of the category that was repeated in a sequence.

The study of memory has revealed that human judgments can be informed by memory for individual items or for their temporal order. For example, extensive research has indicated that, when presented with a list of objects, participants are likely to remember the items at the beginning (Anderson, 1965; Asch, 1946) and at the end (Miller & Campbell, 1959) of the list: the *primacy* and *recency* effects, respectively. In contrast, the results of Kusev et al. (2011) demonstrated that the judgment of the frequency of types of items in a sequence and a respondent’s recall of the number of individual items of each type in the sequence are dissociated. This finding could not be anticipated by theories that predict that frequency is assessed according to the ease with which individual instances can be brought to mind (e.g., Tversky & Kahneman, 1973).

We attribute this phenomenon to a simple memory heuristic: across a wide range of sequences varying in the likelihood of events, respondents assume that a sequence with a first run of a particular event is likely to be composed preponderantly of those items. The rationale for this interpretation is the assumption that, in making frequency judgments, people are constrained by information-processing limitations and hence tend to avoid cognitive load. According to Simon (1956), one way for people to achieve this is to simplify by using satisficing strategies rather than attempting to use optimal or normative strategies. We identified the first-run effect as one such possible strategy.

## Stimuli and method

The probabilities of first runs of different lengths *i* are given by Eqs. 2, 3, and 4 below (examples are presented in Table 1).

**Table 1.** Probabilities (%) of first runs of different lengths (calculated with the JRP model)

| Number of stimuli in sequence | Proportion (majority:minority) | Odds ratio | First run of length 2, majority category | First run of length 2, minority category | First run of length 3, majority category | First run of length 3, minority category | First run of length 6, majority category | First run of length 6, minority category |
|---|---|---|---|---|---|---|---|---|
| 28 | 50:50 | 1.00 | 24 | 24 | 11 | 11 | 1 | 1 |
| 28 | 60:40 | 2.25 | 32 | 15 | 17 | 5 | 2 | 0 |
| 28 | 70:30 | 5.44 | 45 | 7 | 30 | 2 | 7 | 0 |
| 28 | 80:20 | 16.00 | 61 | 3 | 47 | 0 | 20 | 0 |
| 28 | 90:10 | 81.00 | 79 | 0 | 70 | 0 | 47 | 0 |
| 30 | 50:50 | 1.00 | 24 | 24 | 11 | 11 | 1 | 1 |
| 30 | 60:40 | 2.25 | 35 | 15 | 20 | 5 | 3 | 0 |
| 30 | 70:30 | 5.44 | 48 | 8 | 33 | 2 | 9 | 0 |
| 30 | 80:20 | 16.00 | 63 | 3 | 50 | 0 | 23 | 0 |
| 30 | 90:10 | 81.00 | 81 | 1 | 72 | 0 | 50 | 0 |
| 42 | 50:50 | 1.00 | 24 | 24 | 12 | 12 | 1 | 1 |
| 42 | 60:40 | 2.25 | 35 | 14 | 20 | 5 | 3 | 0 |
| 42 | 70:30 | 5.44 | 47 | 8 | 32 | 2 | 9 | 0 |
| 42 | 80:20 | 16.00 | 61 | 3 | 48 | 0 | 21 | 0 |
| 42 | 90:10 | 81.00 | 77 | 1 | 68 | 0 | 44 | 0 |
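As a check on the balanced (50:50) rows of Table 1, the sketch below evaluates the product form of Eq. 4 for a first run starting at Position 1 and reports it as a rounded percentage. It assumes that a balanced sequence contains exactly *N*/2 items of each category, and the function name is hypothetical. It covers only the balanced rows, because the exact category counts behind the unbalanced rows are not stated in the text.

```python
from fractions import Fraction


def first_run_probability(n: int, N: int, i: int) -> Fraction:
    """Probability that the first i items, drawn without replacement from
    N items of which n are category A, are all category A: the product of
    (n - k) / (N - k) for k = 0, ..., i - 1 (Eq. 4, run starting at Position 1)."""
    p = Fraction(1)
    for k in range(i):
        p *= Fraction(n - k, N - k)
    return p


# Assumption: a balanced (50:50) sequence has exactly N // 2 items per category.
for N in (28, 30, 42):
    n = N // 2
    row = [round(100 * first_run_probability(n, N, i)) for i in (2, 3, 6)]
    print(N, row)  # prints 28 [24, 11, 1], then 30 [24, 11, 1], then 42 [24, 12, 1]
```

Exact fractions avoid any floating-point doubt in the rounding step; the printed values agree with the 50:50 rows for both the majority and the minority columns, since the two categories have equal counts.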

Each respondent was presented with one random sequence of stimuli from two categories (in a predefined proportion). Participants were instructed to try to remember as much as possible about the stimuli and were informed that they would be viewing checkerboard patterns or geometrical figures, or listening to tones. Immediately after the sequence had been presented, respondents were explicitly asked, via a visual message on the computer screen, to make one judgment of the frequency (as a percentage) of one of the stimulus categories experienced in the sequence.

During the presentation of the experimental trials, respondents were not required to make any explicit judgments of the stimuli; presentation of each stimulus was self-paced via the computer keyboard, and there was no limit on how much time participants could spend observing each stimulus. The next stimulus appeared without delay after a participant’s keypress.

## Judgment relative to patterns

In the interest of clarity, we offer mathematical definitions of the sequences used in this article (see Table 1) and a formalization of the first-run effect in the judgment-relative-to-patterns (JRP) model. Our starting point is the finding that judgments of frequency are informed by the apprehension of patterns: after experiencing a sequence of stimuli, people give higher estimates of the frequency of a particular category of event when that category is the first repeated category to occur in the sequence (the first-run effect). The model accommodates the first-run effect by assuming that the judged frequency of the category whose items make up the first run is directly influenced by the sequence pattern. Accordingly, the JRP model estimates the probability of the first run by assuming that the elements of a sequence are sampled without replacement from a finite population whose size equals the sequence length; thus, the more items from a category have already appeared in a run, the lower the probability that the run continues. Because Kusev et al. (2011) found no empirical effect of the length of the first run on frequency judgments, the model also weights, in a decreasing manner, each probability associated with consecutive items in the first run, so that the weight of each item decreases as the number of category repetitions increases. Defined in this way, the judged frequency of the first-run category produced by the JRP model is always greater than, or at least equal to, the actual predefined proportion.
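The sampling-without-replacement assumption can be illustrated with a small simulation. Shuffling a finite list of category labels is equivalent to drawing items one at a time without replacement, so the proportion of shuffles that begin with *i* items of category *A* should approach the product of the conditional probabilities (*n* − *k*)/(*N* − *k*). This is only an illustrative sketch under that assumption: the function names, the labels "A" and "B", and the example counts are hypothetical. It also lists the conditional continuation probabilities, which shrink as the run grows.

```python
import random


def continuation_probabilities(n: int, N: int, i: int) -> list[float]:
    """Conditional probability that item k + 1 of a run is category A,
    given that the first k items were all A (sampling without replacement)."""
    return [(n - k) / (N - k) for k in range(i)]


def simulate_run_start(n: int, N: int, i: int, trials: int = 50_000,
                       seed: int = 1) -> float:
    """Monte Carlo estimate of P(the first i items are all category A)."""
    rng = random.Random(seed)
    deck = ["A"] * n + ["B"] * (N - n)  # hypothetical category labels
    hits = 0
    for _ in range(trials):
        rng.shuffle(deck)  # one random order = one full draw without replacement
        if all(item == "A" for item in deck[:i]):
            hits += 1
    return hits / trials


# Hypothetical example: a balanced sequence of 28 items, run of length 3.
n, N, i = 14, 28, 3
probs = continuation_probabilities(n, N, i)
exact = 1.0
for p in probs:
    exact *= p
print(probs)  # decreasing values: each repetition leaves fewer A items
print(exact, simulate_run_start(n, N, i))
```

The simulated proportion should agree with the exact product up to sampling error. The decreasing continuation probabilities are the sense in which a run becomes less likely to continue the more items from its category have already appeared.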

## Derivation of the model

Let *N* be the total number of items in the sequence (the sequence length); *l* the length of the first run, in other words, the number of items within the first run; and *n* the number of items in the first-run category (the number of items in the non-first-run category is then *N* − *n*). \( FJ_l \) denotes the frequency-judgment estimate in the JRP model. Furthermore, let \( A_n^{(1)} \) denote the event of the first appearance of an item from category *A* in the first run, and \( A_n^{(2)} \) the second appearance of an item from the same category *A* in the first run. Similarly, we define the event \( A_n^{(i)} \) as the *i*th appearance of an item from category *A* in the first run. Thus, the probability of a category *A* item appearing for the first time within the first run in position *x* is

\[ P\left(A_{n_x}^{(1)}\right) = \frac{n_x}{N_x}, \tag{1} \]

where \( n_x \) is the number of remaining items of the first-run category before the first run, and \( N_x \) is the total number of remaining items (of both the first-run and the non-first-run categories) before the first run. For a first run starting at Position 1, \( n_1 = n \) and \( N_1 = N \). The probability of a first run of length 2 is the probability that the second item in the first run is from the same category as the first item:

\[ P_n^{(2)} = P\left(A_n^{(1)}\right) P\left(A_n^{(2)} \mid A_n^{(1)}\right) = \frac{n}{N} \cdot \frac{n-1}{N-1}. \tag{2} \]

The probability of a category *A* item appearing in Position 3 in the first run is then

\[ P_n^{(3)} = \frac{n}{N} \cdot \frac{n-1}{N-1} \cdot \frac{n-2}{N-2}, \tag{3} \]

and, in general, the probability of a category *A* item appearing in position *i* is

\[ P_n^{(i)} = \prod_{k=0}^{i-1} \frac{n-k}{N-k}. \tag{4} \]

Equation 4 gives the probability of *i* consecutive items of category *A* starting at Position 1, in other words, of having a first run of this category. If *x* > 1, then the numerator is \( n_x = n - (x-1)/2 \) if *x* is odd and \( n_x = n - (x-2)/2 \) if *x* is even, and the denominator is \( N_x = N - (x-1) \). These counts follow because no run occurs before the first run: the two categories alternate over the first *x* − 1 positions, and the item in position *x* − 1 belongs to the non-first-run category. In addition, the length of the first run, *l*, has to be less than or equal to the total number of items in the series, *N*, minus the starting position, *x* (i.e., \( l \le N - x \); for a first run starting at Position 1, \( l \le N - 1 \)). This constraint is necessary to guarantee the applicability of \( P_{n_x}^{(i)} \) to the special case in which the first run appears at the very end of the test sequence. What matters is that this run is the first appearance of a repeated sequence (pattern) in the experiment. The equations above can be generalized using the Gamma function, \( \Gamma(\alpha) = (\alpha - 1)! \) for positive integers \( \alpha \), as follows. By rewriting Eq. 4 with \( n_x \) and \( N_x \) in place of *n* and *N* (*n* and *N* being the special case \( x = 1 \)), we arrive at the probability of having *i* repeated consecutive same-category items (out of a total of *n* such items), \( P_{n_x}^{(i)} \), for the first time in a sequence of total length *N*:

\[ P_{n_x}^{(i)} = \prod_{k=0}^{i-1} \frac{n_x-k}{N_x-k} = \frac{\Gamma(n_x+1)\,\Gamma(N_x-i+1)}{\Gamma(n_x-i+1)\,\Gamma(N_x+1)}, \tag{5} \]

where \( i \le l \). Given the independence of the events \( \left(A_{n_x}^{(i)} \mid A_{n_x}^{(1)}, A_{n_x}^{(2)}, \ldots, A_{n_x}^{(i-1)}\right) \), the probability that any of these events occurred is the sum of the individual probabilities, \( \sum_{i=1}^{l} P_{n_x}^{(i)} \).

## Results and discussion

The JRP model formally estimates the relative frequency judgment of one of the categories as a function of the probability that the response category appears in a run of arbitrary length. We see the model as essential for advancing knowledge in the psychology of frequency estimation: It supports the transfer of psychological knowledge to related disciplines such as computer science, artificial intelligence, and human–computer interaction. Our work, formalized in the JRP model, contributes to the psychology of frequency estimation by highlighting and modeling the role of sequential patterns within stimuli in this estimation. Thus, these patterns need to be accounted for, in addition to stimulus characteristics, in research on frequency estimation. More broadly, our work exemplifies the ubiquity of sequence effects documented in other areas of research, such as psychophysical judgment, where it has been claimed that, because of sequence effects, none of the psychophysical laws (e.g., Weber’s law) is general (Lockhead, 2004).

The model accommodates the first-run effect by assuming that the judgment of the frequency of a given category of items appearing within the first run is directly determined by the sequence pattern. Accordingly, in addition to summing the probabilities \( P_{n_x}^{(i)} \), \( FJ_l \) weights each probability associated with consecutive items from the first run in a decreasing manner. That is, we assign a weight \( 1/i^{w} \) to each probability \( P_{n_x}^{(i)} \), for \( 1 \le i \le l \) and \( w \ge 1 \). Chosen in this way, the weight of each probability \( P_{n_x}^{(i)} \) decreases as *i* increases. This assumption captures the discriminability property of the JRP model: the later an item’s position within the first run, the smaller its influence on the judgment, because its weight in the sum in Eq. 6 below, \( 1/i^{w} \), tends to 0 as *i* increases. In other words, successive items within the first run make increasingly smaller contributions to the JRP judgment.

In the JRP model, \( FJ_l \) takes the following form:

\[ FJ_l = 100 \times \sum_{i=1}^{l} \frac{1}{i^{w}}\, P_{n_x}^{(i)}, \tag{6} \]

where \( 1/i^{w} \) is the weight of the probability associated with the *i*th item in the first-run pattern. Note that, defined in this way, \( FJ_l \) is always greater than or equal to the actual percentage of the judged category, given by the ratio \( 100 \times n_x / N_x \), as shown in Fig. 1. For example, the value of \( FJ_l \) predicted by the model in Eq. 6 will always be greater than or equal to 50 (corresponding to a frequency estimate of 50 %) for an experimental sequence of items with two equally represented categories (Fig. 1).

In Fig. 1, we depict the dependence of \( FJ_l \) (as given in Eq. 6) on the weight *w* and on the length of the first run *l*, represented as a two-dimensional surface in the space spanned by (*w*, *l*, *FJ*). Figure 1 shows that the frequency judgment approaches \( 100 \times n_x / N_x \) as *w* increases, and levels off to a constant as *l* increases.
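Assuming, as the text describes, that Eq. 6 is the weighted sum \( FJ_l = 100 \times \sum_{i=1}^{l} P_{n_x}^{(i)}/i^{w} \), the limiting behavior described for Fig. 1 can be checked numerically. The sketch below (function name hypothetical) evaluates \( FJ_l \) on a small grid of *w* and *l* for a hypothetical balanced sequence of 28 items with the first run starting at Position 1. As *w* grows, the value approaches \( 100 \times n_x/N_x = 50 \); as *l* grows, the added terms shrink, so the value levels off.

```python
def frequency_judgment(n_x: int, N_x: int, l: int, w: float) -> float:
    """FJ_l = 100 * sum_{i=1}^{l} P^{(i)} / i**w, where P^{(i)} is the
    probability of i consecutive first-run items drawn without replacement.
    Assumes the weighted-sum form of Eq. 6 described in the text."""
    total, p = 0.0, 1.0
    for i in range(1, l + 1):
        p *= (n_x - (i - 1)) / (N_x - (i - 1))  # extend the run by one item
        total += p / i ** w                      # weight 1 / i^w
    return 100 * total


# Hypothetical grid: balanced sequence of 28 items, first run at Position 1.
for w in (1.0, 1.73, 5.0, 20.0):
    print(w, [round(frequency_judgment(14, 28, l, w), 2) for l in (2, 3, 6, 10)])
```

Each printed row corresponds to one slice of the surface at fixed *w*; rows with larger *w* sit closer to 50, and within a row the values change less and less as *l* increases.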

In Kusev et al.’s (2011) Experiment 1, in which two stimulus categories occurred with equal frequencies, 47 of 78 participants judged the frequency of the first-run category to be > 50 %, consistent with the first-run effect. Another 23 correctly produced a frequency estimate of 50 %, and a further eight produced a frequency estimate < 50 %. In our model, the initial repeated items in the first run contribute more to the value of \( FJ_l \) than do later items, as described by Eq. 6. The position of each item within the first run is weighted via the parameter *w*: as *w* increases, later items in the first run have less influence on the judgment. Figure 1 depicts the relationship between the weight, *w*, the length of the first run, *l*, and frequency judgment in the model.

In the model fitting, the data were analyzed for the participants (*n* = 47) whose responses were frequency estimates > 50 %. For each participant’s frequency estimate, the model parameter was estimated using the Wolfram Mathematica 8 platform. For each of the 47 jack-knife samples (each leaving out one participant), and taking into account the starting position of the first run, *x*, the average model parameter of the jack-knife sample was then used to calculate the error in predicting the left-out participant’s frequency estimate. The mean value (with *SD*) of the model parameter was 1.73 (0.45), 95 % confidence interval [CI(*M*).95] = [1.60, 1.86], indicating high precision of the estimate. Therefore, in the modeled frequency estimate \( FJ_l \), the probability associated with position *i* of the first run is, on average, weighted by \( 1/i^{1.73} \). Within the jack-knife samples, the model fit was excellent, with a maximum error of < \( 10^{-11} \). For the left-out cases, the actual and predicted frequency judgments were substantially and significantly aligned: intraclass correlation coefficient = .53, *F*(46, 46) = 3.21, *p* < .001. The relative prediction error was small, on the order of 5 %: For the mean absolute error relative to the actual frequency of the left-out case, the mean value (*SD*) was .04 (.05), with CI(*M*).95 = [.03, .05], and for the mean absolute error relative to the predicted frequency of the left-out case, the mean value (*SD*) was .04 (.05), with CI(*M*).95 = [.03, .06].
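The fitting procedure can be sketched in Python as follows. The original analysis used Wolfram Mathematica 8; the bisection search, the function names, and the synthetic inputs below are hypothetical stand-ins, not the authors' code or data. The sketch assumes the weighted-sum form of Eq. 6 and relies on \( FJ_l \) decreasing monotonically in *w* (every weight \( 1/i^w \) with *i* ≥ 2 shrinks as *w* grows), so each participant's *w* can be found by bisection on [1, 50]. The starting position *x* enters through each participant's \( n_x \) and \( N_x \). Leave-one-out (jack-knife) predictions then use the mean *w* of all other participants.

```python
from statistics import mean


def frequency_judgment(n_x, N_x, l, w):
    """Weighted-sum form assumed for Eq. 6, on a percentage scale."""
    total, p = 0.0, 1.0
    for i in range(1, l + 1):
        p *= (n_x - (i - 1)) / (N_x - (i - 1))
        total += p / i ** w
    return 100 * total


def fit_w(n_x, N_x, l, estimate, lo=1.0, hi=50.0, tol=1e-10):
    """Solve FJ_l(w) = estimate for w in [lo, hi] by bisection.
    FJ_l decreases in w, so unreachable estimates are clamped to a bound."""
    if estimate >= frequency_judgment(n_x, N_x, l, lo):
        return lo
    if estimate <= frequency_judgment(n_x, N_x, l, hi):
        return hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if frequency_judgment(n_x, N_x, l, mid) > estimate:
            lo = mid  # judgment still too high: a larger w is needed
        else:
            hi = mid
    return (lo + hi) / 2


def jackknife(participants):
    """participants: list of (n_x, N_x, l, estimate) tuples.
    Returns each participant's fitted w and, for each left-out participant,
    the judgment predicted with the mean w of all the others."""
    ws = [fit_w(*p) for p in participants]
    predictions = []
    for k, (n_x, N_x, l, _) in enumerate(participants):
        w_others = mean(ws[:k] + ws[k + 1:])
        predictions.append(frequency_judgment(n_x, N_x, l, w_others))
    return ws, predictions


# Hypothetical synthetic participants (not data from Kusev et al., 2011):
# estimates are generated from known w values so the fit can be checked.
true_ws = [1.5, 1.7, 2.0, 2.4]
synthetic = [(14, 28, 3, frequency_judgment(14, 28, 3, w)) for w in true_ws]
ws, preds = jackknife(synthetic)
print([round(w, 3) for w in ws], [round(p, 2) for p in preds])
```

Because the synthetic estimates are generated by the model itself, the fitted values recover the known *w* values; with real judgments, the per-participant fit would be exact only within the reachable range, and the leave-one-out predictions would carry the prediction errors summarized above.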

## Conclusion

Our findings are consistent with the idea that people’s frequency judgments are made without recollecting individual items in the sequence and are instead influenced by specific properties of the sequence configuration. In particular, we propose a simple strategy that places minimal demands on our limited-capacity attentional mechanism, whereby respondents use the first run as a cue to frequency.

The basic finding of the first-run effect and its formalization in the present model could have implications for real-world phenomena. Applied research should therefore investigate how the effect can account for judgments in different domains (e.g., the weather, the outcome of sports matches, and visual search in human–computer interaction).

Another important consideration is the difference between judgments from memory and those made immediately after stimulus presentation (e.g., Dickert & Slovic, 2009). This distinction could also apply to the judgment of sequences of stimuli, where judgments from memory for sequentially presented stimuli (as in Kusev et al., 2011) and judgments based on simultaneous presentation might involve different types of processing. It could be expected that with simultaneous presentation, holistic processing (Hsiao & Cottrell, 2009) would be more likely, thereby reducing the biasing influence of the first run on judgment.

Our work shows that the frequency estimate for a run of symbols will be strongly affected by order effects. A potentially important implication concerns fragment-based approaches to learning, motivated by associative-learning theory. For example, artificial grammar learning (AGL) is a widely employed paradigm for studying learning processes. In an AGL task, participants are typically exposed to training sequences in a first phase and are subsequently asked to identify new sequences compatible with the old ones. Several theorists have proposed that participants can perform such tasks on the basis of knowledge derived from statistical information about symbol co-occurrence in the training items (for reviews, see Pothos, 2007, 2010). However, if such co-occurrence information is distorted by order effects, the corresponding models would need revision (cf. Pothos, 2007, 2010).

While other authors have proposed that frequency information is automatically encoded with minimal demand on attentional resources (Zacks & Hasher, 2002), our proposal does not address the issue of whether the process underlying this strategy is automatic or controlled (“System One” or “System Two”), although—plainly—this is open to investigation. In sum, however, our research provides a specification and formalization of a process by which judgments of the frequency of types of events might be made, with implications for descriptive theories of identification, categorization, and decision-making, as well as for their practical application.

¹ In this article, we use the term “run” when at least *two* consecutive occurrences of the same category appear in a sequence; accordingly, a single occurrence is not considered a run (Kusev et al., 2011).

² Of course, for those sequences in which the category with greater relative frequency is *not* signaled by the presence of a run, there will be bias.

## Author note

P.K., P.v.S., and N.C. are supported by Economic and Social Research Council Grant No. RES-000-22-1768. We also thank the Nuffield Foundation (Grant No. 50045SP) and the British Academy (Grant No. SG 47881) for supporting P.K. in his research.