This section illustrates the application of SweetPea to three experimental design problems. The first (a Stroop task) illustrates the use of basic and derived factors, the second (task switching) introduces factors that describe transitions between two trials, and the third (a 2-Back task) requires resolving sequential dependencies among more than two trials. We chose to focus on experiments that are commonly used to study cognitive control (i.e., the ability to flexibly pursue goal-directed behavior in the face of distraction). However, these experiments share features with experiments in other domains of psychology and in other empirical disciplines, such as neuroscience and machine learning. Before walking through the examples, we describe how to install SweetPea. For each experiment, we first describe the design problem, then show the code for expressing and solving it in SweetPea, and finally walk the reader through the code line by line. Each example also includes a solution to the design problem that was identified by SweetPea. Because of their medium complexity, the experiments described in this section are not amenable to uniform sampling, so we sampled solutions non-uniformly from the space of possible solutions. Examples that implement uniform sampling, as well as further examples (both proofs of concept and realistic experimental designs), are available online (https://sweetpea-org.github.io/). The materials for all walkthrough examples described in this section are available at https://osf.io/b4nsy/.
Installation of SweetPea
The current version of SweetPea requires Python 3.7 or later and can be installed in a local Python environment with the command pip install sweetpea.
Designing a Stroop Task
In previous sections, we described the Stroop task (Stroop, 1935), in which participants are typically asked to name the color in which a color word is displayed (e.g., say “green” to the word RED displayed in green). Here, we consider a Stroop color naming experiment with the two regular factors described above: the color factor, which represents the color in which the stimulus is displayed and has four levels (red, green, blue, brown), and the word factor, which represents the word itself and also has four levels, one for each color (Fig. 2a). On a given trial, one level from each factor is used to generate the stimulus, and the participant must respond to the color factor, indicating the color in which the word was displayed. For instance, they may be required to press the left arrow key if the word was displayed in red, the right arrow key if it was displayed in green, the up arrow key if it was displayed in blue, and the down arrow key if it was displayed in brown. This results in a response factor with four levels: left, right, up, and down.
In the general case, an experimenter may want to ensure that each color is paired with each word. This can be achieved by crossing all colors with all words. The full crossing of colors and words includes conditions in which the color and the word are the same (congruent; e.g., the word RED displayed in red), as well as conditions in which they differ (incongruent; e.g., the word RED displayed in green). Together, the two conditions define the congruency of a trial. In this particular experiment, we wish to pair each color with each word, subject to the constraint that only incongruent trials are included. Finally, we wish to generate at least 20 trials. For this first walkthrough example, we interleave each chunk of code with an explanation of what it does. The full code implementing the specified design is shown in Listing 1.
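Before turning to the SweetPea code, the crossing just described can be checked by brute force. The sketch below is plain Python rather than SweetPea (the names COLORS, WORDS, and incongruent are illustrative); it enumerates the 4 × 4 color-word combinations and keeps only the incongruent ones.

```python
from itertools import product

# Levels of the two regular factors (illustrative names, not SweetPea objects).
COLORS = ["red", "green", "blue", "brown"]
WORDS = ["red", "green", "blue", "brown"]

# Full crossing of color x word: every color paired with every word.
full_crossing = list(product(COLORS, WORDS))

# Congruent pairs have matching color and word; the design keeps only the rest.
incongruent = [(color, word) for color, word in full_crossing if color != word]

print(len(full_crossing), len(incongruent))  # 16 12
```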
We begin by importing the SweetPea modules (Listing 1, lines 1–4).
Modules in line 1 are needed to specify regular and derived factors. Modules in line 2 are required to implement design constraints, such as specification of the minimum number of trials and the exclusion of factor levels. The imported modules in lines 3 and 4 serve to generate, sample, print, tabulate and export the final experiment.
We continue by defining the two basic factors, color and word, as well as the response factor (Listing 1).
Levels of the response factor depend on the factor color. Each of its derived levels is defined by a predicate that takes as input the color factor describing the current stimulus (lines 13–20). For instance, the function is_response_left is used to define a leftward response.
This function returns True if the factor color on a given trial evaluates to the level red. It is used to specify the derived level left in the definition of the factor response (lines 22–27).
The first argument of derived_level specifies the level name left. The second argument is the function within_trial, described in Section Defining Factors. It passes the color factor of the current trial to the is_response_left predicate to determine whether the factor response evaluates to the level left. The congruency factor is defined in a similar fashion (Listing 1).
Both levels of the congruency factor depend on the color and word factors of the current trial. Thus, the predicates specifying these levels take the color and the word factors as arguments (lines 31–35). For instance, a trial is considered congruent if the color and the word of the current trial match.
Note that the definition of the corresponding derived level must include within_trial, with both color and word as arguments (lines 37–38).
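The predicates described above are ordinary functions of the current trial's levels. As a hedged, SweetPea-free sketch (the helper derive_trial and the dictionary layout are our own illustration, not part of SweetPea), the response and congruency levels of a single trial could be computed as follows; in Listing 1 the same logic is wrapped in derived_level and within_trial.

```python
# Response mapping from the text: red -> left, green -> right,
# blue -> up, brown -> down.
RESPONSE_KEY = {"red": "left", "green": "right", "blue": "up", "brown": "down"}


def is_response_left(color: str) -> bool:
    # Left response iff the word is displayed in red.
    return color == "red"


def is_congruent(color: str, word: str) -> bool:
    # Congruent iff the display color matches the written word.
    return color == word


def is_incongruent(color: str, word: str) -> bool:
    return not is_congruent(color, word)


def derive_trial(color: str, word: str) -> dict:
    """Attach the derived factors to one trial, using only that trial's
    color and word (the within-trial case). Hypothetical helper."""
    return {
        "color": color,
        "word": word,
        "response": RESPONSE_KEY[color],
        "congruency": "congruent" if is_congruent(color, word) else "incongruent",
    }


print(derive_trial("brown", "red"))
```

For example, derive_trial("brown", "red") yields the response down and the congruency incongruent, which matches the first trial in Table 1.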
The experiment code specifies two constraints:
First, it specifies the minimum number of trials (line 47). The minimum_trials constraint ensures that the experiment sequence includes at least 20 trials. Note that a full crossing of all valid trials (i.e., excluding the four congruent combinations) requires a multiple of 4 × 4 − 4 = 12 trials. Thus, a sequence of exactly the minimum length can be fully counterbalanced only if the specified minimum is a multiple of 12. To mitigate this issue, SweetPea successively samples trials without replacement from counterbalanced blocks. In this example, the first 12 trials are sampled from one block of 12 counterbalanced trials, and the remaining 8 trials are sampled without replacement from another counterbalanced block (Footnote 5). The second constraint defines an exclusion criterion by which the level congruent of the factor congruency is excluded from the experiment (line 48). All factors to be included in the design are listed in line 50. We then specify the full experiment.
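The block-wise sampling described above (12 counterbalanced trials, then 8 more drawn without replacement from a second block) can be mimicked in plain Python. This is only an illustration of that description, assuming shuffled blocks are concatenated and truncated; sample_min_trials is a hypothetical helper and not how SweetPea's solver actually constructs sequences.

```python
import random
from itertools import product

COLORS = ["red", "green", "blue", "brown"]
WORDS = COLORS  # the word levels name the same four colors


def counterbalanced_block():
    """One block containing each incongruent color/word pair exactly once."""
    return [(c, w) for c, w in product(COLORS, WORDS) if c != w]


def sample_min_trials(min_trials, rng):
    """Hypothetical helper: concatenate independently shuffled blocks and
    keep the first min_trials trials. Within each block, trials are drawn
    without replacement."""
    trials = []
    while len(trials) < min_trials:
        block = counterbalanced_block()
        rng.shuffle(block)
        trials.extend(block)
    return trials[:min_trials]


rng = random.Random(0)
sequence = sample_min_trials(20, rng)
print(len(sequence))  # 20
```

With 20 trials, the first 12 form one complete block and the last 8 are distinct pairs from a second block, so no pair appears more than twice.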
The entire experimental design is defined by the factors color, word, response, and congruency (line 54). The crossing between all colors and all words is specified in line 55. The design, crossing, and constraints are used to define a fully crossed experiment block (subject to these constraints; lines 56–57). However, a complete crossing of all colors and words is not possible, because we exclude all congruent trials, in which the color and the word match. Thus, fully_cross_block would return no solution unless we allow the crossing to be incomplete by setting require_complete_crossing to False. We can now generate, print, tabulate, and save a sequence of trials (lines 59–65).
Line 59 specifies how the experiment is generated: experiments = synthesize_trials_non_uniform(block, 1).
In this case, we sample the experiment block only once, hence the argument 1. The function synthesize_trials_non_uniform solves for experiment sequences without guaranteeing that they are sampled uniformly from the space of all possible solutions (see Section Solving Experimental Designs). We print the experiment in line 61, yielding the following output (only the first six lines are shown; see Table 1 for the full output):
1 trial sequences found.
Experiment 0:
color brown | word red | response down | congruency incongruent
color green | word red | response right | congruency incongruent
color brown | word blue | response down | congruency incongruent
color red | word blue | response left | congruency incongruent
Table 1 Example solution to a Stroop color naming design
To check the frequency of each factor level combination, we tabulate the generated experiment sequence for the factors specified in the crossing (line 63). The resulting table lists the frequency and proportion (in percent) of each factor level combination (only the first six lines of the output are shown; see Table 2 for the full output):
Experiment 0:
color red | word red | frequency 0 | proportion 0.0
color red | word green | frequency 2 | proportion 10.0
color red | word blue | frequency 1 | proportion 5.0
color red | word brown | frequency 2 | proportion 10.0
color green | word red | frequency 2 | proportion 10.0
Table 2 Frequencies and proportions of factor level combinations in an example solution to the Stroop experiment (cf. Table 1)
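The proportions in Table 2 are counts divided by the number of trials, expressed in percent (e.g., 2 of 20 trials gives 10.0). A minimal plain-Python sketch of such a tabulation, assuming each trial is stored as a dictionary of level names (an assumption about data layout, not SweetPea's internal representation), is:

```python
from collections import Counter
from itertools import product

COLORS = ["red", "green", "blue", "brown"]
WORDS = ["red", "green", "blue", "brown"]


def tabulate(trials, colors=COLORS, words=WORDS):
    """Return (color, word, frequency, percent) for every crossing cell,
    including cells that never occur (frequency 0)."""
    counts = Counter((trial["color"], trial["word"]) for trial in trials)
    n = len(trials)
    return [
        (color, word, counts[(color, word)], 100.0 * counts[(color, word)] / n)
        for color, word in product(colors, words)
    ]


# The four trials shown in the excerpt of Table 1.
trials = [
    {"color": "brown", "word": "red"},
    {"color": "green", "word": "red"},
    {"color": "brown", "word": "blue"},
    {"color": "red", "word": "blue"},
]
for color, word, freq, pct in tabulate(trials)[:3]:
    print(f"color {color} | word {word} | frequency {freq} | proportion {pct}")
```

Listing every cell of the crossing, including empty ones, is what makes a zero row such as color red | word red visible in the table.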
Finally, we export the generated experiment sequence to a CSV file named “experiment_0.csv” in the local folder (line 65; see Footnote 6).
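SweetPea performs this export itself, and the exact column layout of its CSV file is not described here. Purely to illustrate the idea, a sequence stored as dictionaries could be written with Python's standard csv module; write_trials_csv is a hypothetical helper, and the one-column-per-factor layout is an assumption.

```python
import csv
import io


def write_trials_csv(trials, handle):
    """Hypothetical export: one row per trial, one column per factor."""
    fieldnames = ["color", "word", "response", "congruency"]
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(trials)


trials = [
    {"color": "brown", "word": "red", "response": "down", "congruency": "incongruent"},
    {"color": "green", "word": "red", "response": "right", "congruency": "incongruent"},
]
buffer = io.StringIO()
write_trials_csv(trials, buffer)
print(buffer.getvalue())
```

To write to disk instead of a string buffer, one would pass a file opened with open("experiment_0.csv", "w", newline="").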
Designing a Task Switching Experiment
In many experiments, the sequence in which trials are presented is an important part of the design. For example, in task switching paradigms—used to study the flexibility with which people can adapt their behavior—the transition between trial types is an important factor. One example of this is the cued task switching paradigm, in which participants receive a task cue on every trial that instructs them which of two (or more) tasks they should perform on the current trial (Fig. 2b). The cue may instruct them either to repeat the task from the previous trial (task repetition) or to switch to a different task (task switch). A common measure in such designs is the cost associated with task switches—that is, reaction time or accuracy on switch relative to repetition trials. Thus, the design must consider task transition as a factor.
Here, we consider a task switching paradigm that builds on the Stroop experiment described in the previous example. In this experiment, participants are presented with a sequence of Stroop stimuli (i.e., color words displayed in a particular color). For simplicity, we consider only two colors (red and green), resulting in two levels for the color factor and two levels for the word factor. The task factor indicates which of the two tasks the participant is instructed to perform on a given trial and has two levels: color naming (respond to the color in which the word is displayed) and word reading (respond to the word). Thus, the correct response on every trial depends on both the stimulus and the relevant task. We assume that the participant responds by pressing one of two buttons, and that the stimulus-response mapping is reversed between the two tasks: for color naming, the left button should be pressed for the color red and the right button for the color green; conversely, for word reading, the right button should be pressed for the word “red” and the left button for the word “green”. Finally, there is a task transition factor with two levels: repeat (if the instructed task on the current trial is the same as on the previous trial) and switch (if it is different).
In addition to the factors described above, it may also be important to include a response transition factor, which determines whether the response required on the current trial (i.e., whether the left or right button is the correct response) is the same as or different from the one required on the previous trial. For example, response transitions have been shown to interact with the performance costs associated with task switches (Kiesel et al., 2010; Rogers & Monsell, 1995). To control for this, it may be important to ensure that response repetitions and response switches occur equally often in the task switch and task repetition conditions. This can be done by specifying a full crossing of the task transition, response transition, color, word, and task factors.
Fully crossing the factors described above ensures balanced sampling of all combinations of their levels, including types of task and response transitions, over the course of the experiment. However, it does not necessarily preclude undesired local sequences, or “runs”: sequences of trials in which some level of a factor (e.g., whether the current trial is a task repetition or a response repetition) remains the same, or alternates in a seemingly predictable but undesired way, for several trials in a row. Such runs could invite misleading expectations from the participant or otherwise affect performance in undesired ways. While these could also be specified as higher-level factors, it can be more convenient to control for them by constraining the number of times a particular level of a given factor, or a particular crossing of two or more factors, is allowed to occur in sequence. As an example, here we allow no more than four consecutive trials with the same type of task transition, and no more than four consecutive response repetitions or response switches. The code shown in Listing 2 implements this design.
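The run constraint can be stated precisely as a check on a finished sequence. The sketch below is plain Python; longest_run and at_most_k_in_a_row are hypothetical helpers, not the SweetPea constraint itself, which SweetPea imposes during synthesis rather than by checking afterwards. It tests whether any level occurs more than k times in a row.

```python
from itertools import groupby


def longest_run(levels):
    """Length of the longest stretch of identical consecutive levels."""
    return max((len(list(group)) for _, group in groupby(levels)), default=0)


def at_most_k_in_a_row(levels, k):
    """True if no level occurs more than k times in a row.

    Undefined entries (None, e.g. the transition on the first trial) are
    dropped before checking; this is an assumption of the sketch."""
    defined = [level for level in levels if level is not None]
    return longest_run(defined) <= k


transitions = [None, "repeat", "switch", "switch", "switch", "switch", "repeat"]
print(at_most_k_in_a_row(transitions, 4))  # True: the longest run is 4
```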
Lines 1–4 import the SweetPea modules needed to specify the experiment as described for the previous example. We also include transition (line 1) to define factor levels based on transitions between trials.
Lines 8–10 define the three regular factors, color, word, and task. Lines 14–25 define the derived factor congruency, as in the previous example (cf. Listing 1, lines 31–43). However, for conciseness, the derived_level calls are passed directly to factor here rather than being defined separately first.
Lines 29–41 define the derived factor response. The levels of this factor are defined by the two functions is_response_left and is_response_right. Note that each of these functions depends on the three factors color, word, and task. For instance, the is_response_left function implements the rule that the left response button should be pressed if the task is color naming and the color is red, or if the task is word reading and the word is green.
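The response rule can be written as a plain predicate over the three factors. In this sketch the level names "color naming" and "word reading" are assumptions (Listing 2 may spell them differently); because there are only two buttons, is_response_right is simply the complement of is_response_left.

```python
from itertools import product


def is_response_left(color, word, task):
    # Color naming: the color red -> left. Word reading: the word "green" -> left.
    return (task == "color naming" and color == "red") or (
        task == "word reading" and word == "green"
    )


def is_response_right(color, word, task):
    # Only two buttons, so every trial that is not "left" is "right".
    return not is_response_left(color, word, task)


# Print the full stimulus-task-response mapping (2 x 2 x 2 = 8 cases).
for color, word, task in product(["red", "green"], ["red", "green"],
                                 ["color naming", "word reading"]):
    side = "left" if is_response_left(color, word, task) else "right"
    print(f"{task:12} color {color:5} word {word:5} -> {side}")
```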
The task transition factor is defined in lines 45–56. The levels of this factor depend on the factor task on the previous and the current trial. The predicate is_task_repetition expresses this between-trial dependency to define the task repetition level.
In this example, the factor task is passed as an ordered list with two elements. The first and second elements of the list encode the task on the previous trial (task[0]) and on the current trial (task[1]), respectively. If the two are the same, the current trial is considered a task repetition; otherwise, it is a task switch. Note that the derived levels of task_transition are now defined using the transition function. For instance, the level repeat is defined by passing the predicate is_task_repetition to the transition function, which supplies it with the factor task as a list encoding the task on the previous and the current trial. Lines 60–71 define response_transition in an analogous manner.
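To show how a two-element window of the task factor yields transition levels across a whole sequence, here is a plain-Python sketch. label_task_transitions is a hypothetical helper (SweetPea's transition function does this bookkeeping internally), and the level strings are assumptions.

```python
def is_task_repetition(task):
    # task[0]: task on the previous trial; task[1]: task on the current trial.
    return task[0] == task[1]


def is_task_switch(task):
    return task[0] != task[1]


def label_task_transitions(tasks):
    """Derive the task transition level for every trial of a sequence.

    The first trial has no predecessor, so its transition is undefined (None)."""
    if not tasks:
        return []
    labels = [None]
    for previous, current in zip(tasks, tasks[1:]):
        labels.append("repeat" if is_task_repetition([previous, current]) else "switch")
    return labels


tasks = ["color naming", "color naming", "word reading", "word reading"]
print(label_task_transitions(tasks))  # [None, 'repeat', 'switch', 'repeat']
```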
Lines 75–77 define constraints for the experiment. The function no_more_than_k_in_a_row in line 76 implements the sequential constraint that no level of the factor task_transition may occur more than four times in a row; the same constraint is declared for the factor response_transition in line 77.
Analogous to Listing 1, the design, the crossing, and the constraints are integrated into an experiment (lines 81–84) that is first sampled non-uniformly (line 86) and then printed (line 88).
The output shown in Table 3 illustrates one solution to the specified design. Note that the factors task_transition and response_transition are not defined for the first trial of the experiment, simply because there is no preceding trial from which a task or response transition could be derived. Full counterbalancing of five factors, each with two levels, requires at least 2⁵ = 32 trials. SweetPea adds a filler trial at the beginning of the sequence because transition factors cannot be defined for the first trial, so the sequence contains 33 trials. We can check the generated sequence by breaking down the frequencies of every factor level combination in the crossing (lines 90–91; Table 4). Note that we instruct the tabulate_experiments function to consider only the trials from the second trial (index 1) through the last (33rd) trial, since the first filler trial can be ignored.
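The trial count follows from the size of the crossing. The sketch below enumerates it with itertools; the level names for response_transition are assumed here to be "repeat" and "switch", and the index range mirrors the tabulation described above (skipping the filler trial at index 0).

```python
from itertools import product

# Five two-level factors in the crossing (level names partly assumed).
crossing_factors = {
    "color": ["red", "green"],
    "word": ["red", "green"],
    "task": ["color naming", "word reading"],
    "task_transition": ["repeat", "switch"],
    "response_transition": ["repeat", "switch"],
}

cells = list(product(*crossing_factors.values()))
n_cells = len(cells)                 # 2 ** 5 = 32 combinations
n_trials = n_cells + 1               # one leading filler trial
analysed = list(range(1, n_trials))  # indices 1..32; index 0 is the filler

print(n_cells, n_trials, analysed[0], analysed[-1])  # 32 33 1 32
```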
Table 3 Example solution to a task switching design
Table 4 Frequencies and proportions of factor level combinations in an example solution to the task switching design (cf. Table 3)
Designing a 2-Back Task
Some experiments involve factors that depend on more than two consecutive trials. The function window allows the user to define such factors in SweetPea. Here, we illustrate its functionality in the design of an N-Back task, a psychological task commonly used to assess working memory performance, i.e., how well participants can update and maintain task-relevant information over time. In the N-Back task considered here, participants are presented with a sequence of letters, one letter per trial (Cohen et al., 1997). Participants are instructed to press a button if the letter on the current trial matches the letter N trials back. For simplicity, we consider a 2-Back task in which participants should press a button if the current letter was presented N = 2 trials back (Fig. 2c). The letter factor describes which of the following letters is presented on the current trial: “A”, “B”, “C”, “D”, “E”, “F”. The target factor determines whether the letter on the current trial matches the letter two trials back, in which case the participant has to press the button. Finally, we consider a trial to be a lure if (a) it is not a target trial but (b) the letter on the current trial matches the letter one trial back. Lure trials are important because they can help determine whether participants' mistakes arise from trouble remembering the letters themselves (e.g., a problem with maintenance) or the order in which they were presented (e.g., a problem with updating). Participants may also be better at responding to some letters than to others, and measures of working memory performance could be biased if those letters occur more often in the experiment. This bias can be avoided by ensuring that each letter appears equally often as a target and as a non-target. Previous studies address this issue by sampling letters randomly.
However, as discussed above, random sampling of individual trials is reliable only if the sample size is large. Experiments with a small number of trials therefore risk confounding the target factor with the letter factor. SweetPea can address this problem directly by crossing the letter factor with the derived target factor. Listing 3 shows SweetPea code that generates such an experiment sequence.
Lines 1–3 import relevant SweetPea modules, as described in the previous two examples. Note, however, that we also include window (line 1) to derive factor levels from sequences of more than two trials.
Lines 7–9 define the regular factor letter with its six levels.
The target factor is defined in lines 13–21. The predicates implementing each level of the factor target receive a list as input, similar to the levels of the transition factors described in the previous example. For instance, the predicate is_target expects a list labeled letter. Each element of the list refers to a different trial within a specified window of trials relative to the current trial. In the function is_target, the argument coding for the factor letter is treated as a window of size 3. The last element (letter[2]) refers to the letter on the current trial, and the first element (letter[0]) refers to the letter two trials back. A trial is considered a target if the letter on the current trial matches the letter two trials back. The window is specified in the declaration of the corresponding derived level of the target factor (line 19).
The window function receives the predicate is_target, which returns True if the sequence satisfies the requirements for this level, and passes the list of factors [letter] to it. The last two arguments of window define the window size (the number of trials to consider, including the current trial) and the stride. The stride determines how many trials the window advances between successive trials for which the derived level is determined. For instance, to determine the presence of a target on every other trial (instead of every trial), we could specify a stride of 2 (instead of a stride of 1).
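To make window size and stride concrete, the following plain-Python sketch evaluates a predicate over sliding windows of a sequence. apply_window is a hypothetical helper written for illustration; in particular, how SweetPea aligns strides greater than 1 with the start of the sequence is an assumption here (we start at the first trial with a complete window).

```python
from typing import Callable, List, Optional, Sequence


def apply_window(values: Sequence[str],
                 predicate: Callable[[List[str]], bool],
                 width: int, stride: int = 1) -> List[Optional[bool]]:
    """Evaluate predicate on windows of `width` trials ending at each trial.

    Trials without a complete window, or skipped by the stride, get None."""
    result: List[Optional[bool]] = [None] * len(values)
    for i in range(width - 1, len(values), stride):
        result[i] = predicate(list(values[i - width + 1:i + 1]))
    return result


def is_target(letter: List[str]) -> bool:
    # letter[0] is two trials back, letter[2] is the current trial.
    return letter[0] == letter[2]


letters = ["A", "B", "A", "C", "A", "B"]
print(apply_window(letters, is_target, width=3))            # stride 1
print(apply_window(letters, is_target, width=3, stride=2))  # every other trial
```

With stride 1 every trial from the third onward receives a level; with stride 2, every other one does.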
Lines 25–33 define the lure factor in an analogous manner, but using a window size of 2. Recall that a trial is considered a lure if it is not a target and the letter on the previous trial is the same as the letter on the current trial. The is_lure predicate, used to determine whether a trial is a lure, takes two arguments, letter and target, with both factors passed as a window of size two. Thus, target[1] and letter[1] refer to the target and letter factors, respectively, on the current trial, and letter[0] refers to the letter factor on the previous trial. The window size of two trials for the predicate is_lure is specified in line 31.
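The two 2-Back predicates can be written directly over their windows. In this sketch the target level is represented as a Boolean (True for a target, None where undefined), which is an assumption; SweetPea passes level names instead. label_sequence is a hypothetical helper that applies both predicates to a letter sequence.

```python
def is_target(letter):
    # Window of 3: letter[0] is two trials back, letter[2] is the current trial.
    return letter[0] == letter[2]


def is_lure(letter, target):
    # Window of 2: letter[0] is the previous trial, letter[1] the current one.
    # A lure repeats the previous letter but is not itself a target.
    # An undefined target (None) is treated as "not a target" (assumption).
    return letter[0] == letter[1] and target[1] is not True


def label_sequence(letters):
    n = len(letters)
    targets = [None] * min(n, 2) + [
        is_target(letters[i - 2:i + 1]) for i in range(2, n)
    ]
    lures = [None] * min(n, 1) + [
        is_lure(letters[i - 1:i + 1], targets[i - 1:i + 1]) for i in range(1, n)
    ]
    return targets, lures


targets, lures = label_sequence(["A", "B", "A", "A", "C", "A"])
print(targets)  # [None, None, True, False, False, True]
print(lures)    # [None, False, False, True, False, False]
```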
Finally, line 37 lists the factors to consider in the design, and line 38 defines the crossing between letter and target. The design and crossing are embedded, without constraints, in an experiment block (line 39). Line 41 generates an experiment sequence from the block with non-uniform sampling, and line 43 displays the output. Table 5 shows an example output. Note that target is not defined for the first two trials and lure is not defined for the first trial, because their windows span three and two trials, respectively.
Table 5 Example solution to a 2-Back task design
Finally, we validate the generated sequence by breaking down the frequencies of every combination of factor levels in the crossing (lines 45–46; Table 6). Here, we instruct the tabulate_experiments function to consider only the trials from the third trial (index 2) through the last (14th) trial, since the first two filler trials can be ignored.
Table 6 Frequencies and proportions of factor level combinations in an example solution to the 2-Back task design (cf. Table 5)
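The length of the 2-Back sequence and the analysed index range follow from the crossing: 6 letters × 2 target levels = 12 trials, plus two leading trials for which the 2-back window is incomplete. The sketch below (plain Python; is_balanced is a hypothetical helper, and the Boolean target encoding is an assumption) computes these numbers and checks whether every letter-target combination occurs equally often among the analysed trials.

```python
from collections import Counter
from itertools import product

LETTERS = ["A", "B", "C", "D", "E", "F"]
TARGET_LEVELS = [True, False]  # target / non-target (Boolean encoding assumed)

n_cells = len(LETTERS) * len(TARGET_LEVELS)  # 6 x 2 = 12
n_trials = n_cells + 2                       # two leading trials lack a 2-back window
analysed = list(range(2, n_trials))          # indices 2..13


def is_balanced(letters, targets, indices):
    """True if each (letter, target) cell occurs equally often at `indices`."""
    counts = Counter((letters[i], targets[i]) for i in indices)
    per_cell = len(indices) / n_cells
    return all(counts[cell] == per_cell for cell in product(LETTERS, TARGET_LEVELS))


# Labels supplied directly (not derived from the letters) to exercise the check.
letters = ["A", "B"] + [letter for letter in LETTERS for _ in range(2)]
targets = [None, None] + [True, False] * 6
print(n_trials, analysed[0], analysed[-1], is_balanced(letters, targets, analysed))
```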