Conditions
The study was conducted in Italian, specifically the variety spoken in Bari. As is the case with other languages, Bari Italian has a number of pitch contours for cueing positional information of items in sequences (Savino, 2001, 2004; Savino et al., 2006). Positions that are pre-final (penultimate in each triplet and in the list) and non-final (any other position that is not final in the list) are signalled by different types of rising contours, whereas final position is signalled by a fall (see Table 1 in the Appendix for details on all contour types). On the basis of this tonal inventory, two list types were compiled, Intonation Contour A and Intonation Contour B:
Intonation Contour A had an intonation contour at the end of the first and second triplets (Positions 3 and 6) signalling non-finality, and a final contour at the end of the entire list (Position 9).
Intonation Contour B additionally had a contour signalling pre-finality in each triplet and in the list (Positions 2, 5, and 8).
Two additional list types had a neutral falling contour on all digits. The Grouped-by-Pauses condition had a pause after Positions 3 and 6, whereas the Ungrouped condition had no pauses. For a schematic representation of these four prosodic patterns, see Fig. 3 in the Appendix.
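The four prosodic patterns can be summarized position by position. The following is a minimal sketch of that layout; the condition keys and contour labels are illustrative names rather than the notation of Table 1, and positions whose contour is not spelled out in this section are marked "unspecified".

```python
# Position-by-position layout of the four prosodic conditions.
# Condition keys and contour labels are hypothetical names for illustration.

N_POSITIONS = 9
TRIPLET_ENDS = (3, 6)  # last position of the first and second triplets


def layout(condition):
    """Return a list of (position, contour_label, pause_after) tuples."""
    rows = []
    for pos in range(1, N_POSITIONS + 1):
        pause_after = False
        if condition in ("intonation_a", "intonation_b"):
            if pos == N_POSITIONS:
                label = "final"
            elif pos in TRIPLET_ENDS:
                label = "non-final"
            elif condition == "intonation_b" and pos % 3 == 2:
                label = "pre-final"  # Positions 2, 5, and 8
            else:
                label = "unspecified"  # not detailed in this section
        elif condition in ("grouped_by_pauses", "ungrouped"):
            label = "neutral-fall"
            pause_after = condition == "grouped_by_pauses" and pos in TRIPLET_ENDS
        else:
            raise ValueError(f"unknown condition: {condition}")
        rows.append((pos, label, pause_after))
    return rows


if __name__ == "__main__":
    for name in ("intonation_a", "intonation_b", "grouped_by_pauses", "ungrouped"):
        print(name, layout(name))
```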
Preparation of stimuli
To construct the stimuli, we first produced sequences of the same digit in all nine positions with Intonation Contour A, Intonation Contour B, and with the neutral falling contour. For example, for digit uno (one), the sequence “uno uno uno uno uno uno uno uno uno” was produced once with Contour A, once with Contour B, and once with a neutral falling contour on each digit. In this way, all intonational realizations for each position in each prosodic condition were available for each digit, taking into account downtrends in fundamental frequency (F0) across stretches of natural speech (Ladd, 1984). All sequences were produced by a trained speaker of Bari Italian (author M.S.) in the same recording session. All digit renditions were saved as individual audio files and were used as “building blocks” for creating the stimuli for all experimental conditions, by concatenating the individual audio files into nine-digit sequences.
Spoken digit renditions with the neutral falling pitch shape were concatenated to create sequence stimuli for the Ungrouped (control) and Grouped-by-Pauses conditions. In the latter case, a 310-ms silence was inserted after the digits in Positions 3 and 6. Digits produced with Intonation Contour A and Intonation Contour B were used for creating sequences of these two intonation contour types, respectively. An example of a digit sequence for each of the four prosodic conditions is shown in Fig. 1.
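The concatenation step can be illustrated as follows. This is only a sketch of the logic, not the Praat procedure actually used: the function name, the representation of each rendition as a list of samples, and the sample rate passed in are all assumptions made for illustration.

```python
# Sketch of concatenating nine per-position digit renditions into one stimulus,
# with a 310-ms silence after Positions 3 and 6 when pauses are requested.
# Renditions are assumed to be plain lists of audio samples (hypothetical format).

PAUSE_MS = 310
PAUSE_AFTER = (3, 6)


def concatenate(renditions, sample_rate, with_pauses):
    """Join nine sample lists; optionally insert silence after Positions 3 and 6."""
    if len(renditions) != 9:
        raise ValueError("expected one rendition per list position (nine in total)")
    silence = [0] * round(sample_rate * PAUSE_MS / 1000)
    out = []
    for pos, samples in enumerate(renditions, start=1):
        out.extend(samples)
        if with_pauses and pos in PAUSE_AFTER:
            out.extend(silence)
    return out
```

With `with_pauses=False` the same function yields the Ungrouped layout, so the two neutral-contour conditions differ only in the inserted silences.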
We produced 17 stimuli for each experimental condition, for a total of 68, including eight stimuli to be used in the training session (two per prosodic condition). The duration of each stimulus sequence averaged 6.4 s. The concatenated nine-digit sequences were created on the basis of 68 nine-digit lists that we derived by pseudo-random permutation of the digits 1–9, avoiding two adjacent digits in ascending or descending order, or the same digit in an identical position in consecutive lists. All steps for the preparation of stimuli were carried out using Praat (Boersma, 2001).
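The list constraints can be illustrated with a simple rejection-sampling sketch. It is not the procedure used to generate the actual lists; it assumes that "adjacent digits in ascending or descending order" means neighbouring digits that differ by exactly 1 (e.g., 3 4 or 4 3), and the function names and seed are hypothetical.

```python
import random

DIGITS = list(range(1, 10))


def has_consecutive_neighbours(seq):
    """True if any two neighbouring digits differ by exactly 1 (assumed reading)."""
    return any(abs(a - b) == 1 for a, b in zip(seq, seq[1:]))


def shares_position(seq, previous):
    """True if any digit occupies the same position as in the previous list."""
    return previous is not None and any(a == b for a, b in zip(seq, previous))


def make_lists(n_lists, seed=0):
    """Draw random permutations of 1-9, rejecting those that violate a constraint."""
    rng = random.Random(seed)
    lists = []
    previous = None
    while len(lists) < n_lists:
        candidate = rng.sample(DIGITS, len(DIGITS))
        if has_consecutive_neighbours(candidate) or shares_position(candidate, previous):
            continue  # reject and redraw
        lists.append(candidate)
        previous = candidate
    return lists


if __name__ == "__main__":
    for digits in make_lists(5, seed=1):
        print(" ".join(str(d) for d in digits))
```

Rejection sampling is chosen here for transparency: each accepted list is a uniform draw from the permutations that satisfy both constraints given the preceding list.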
Participants
Seventy-eight participants (63 female, 15 male, Mage = 22.35 years, SD = 3.29 years) took part in the experiment for course credit. They were undergraduate and graduate students of psychology at the University of Bari, all born and living in the Bari dialectal area. Participants did not report any speech or hearing deficits, and they did not have any background in phonetics or speech science.
Procedure
Participants were tested individually in a quiet laboratory, sitting in front of a computer and wearing headphones. They were instructed to listen to each sequence and recall all nine digits in the same order in which they were presented (the importance of recalling in the correct order was emphasized in the instructions). Participants responded immediately after the presentation of the last digit. No grouping strategy was suggested.
Each list was preceded by an 890-ms tone (263 Hz), followed by 500 ms of silence. After each response, participants proceeded to the next sequence by pressing the space bar. They were allowed to pause whenever they wanted during the session, and they were encouraged to take a break after every block of 15 stimuli. Stimuli from the same condition were blocked, with block order balanced across participants. Before starting the task, participants were tested using the WAIS-R Digit Span test (Wechsler, 1987).
In contrast to the stimulus manipulation, which was within participants, the response modality manipulation was between participants. A group of 29 participants (23 female, six male, Mage = 22.8 years, SD = 4.55, digit span = 6.76, SD = 0.77) were asked to recall the lists orally. Participants in this condition wore a microphone for recording their responses. Another group of 24 participants (20 female, four male, Mage = 22.33, SD = 2.64, digit span = 6.5, SD = 0.96) were instructed to write down each sequence in a nine-box grid drawn on paper, from left to right (in contrast to Frankish, 1995, grids were not drawn in a way that overtly suggested grouping into triplets). They were instructed to fill all nine boxes in the grid even if they were unsure of the correct response. A third group of 25 participants (20 female, five male, Mage = 21.84, SD = 1.46, digit span = 6.48, SD = 0.81) performed the task by typing the digits on a computer keyboard and pressing the “return” key at the end of each recalled sequence. Each session (i.e., including the digit span test, and regardless of recall modality) lasted approximately 40 min. Trials were implemented and run using SuperLab 2.0 (Cedrus Corporation, 1991).
Statistical analysis
We used R (Version 3.6.0; R Core Team, 2019) and the package brms 2.9.0 (Bürkner, 2017) to compute a mixed Bayesian logistic regression model on the accuracy scores. The main fixed effects were response modality (spoken, keyboard, grid) and condition (Intonation A, Intonation B, pause, control). In addition, we included a fixed effect for “position within triplet,” which was added as a monotonic variable (see Bürkner & Charpentier, 2018). This variable codes the first, second, and third position within each of the three triplets (Positions 1 to 3, 4 to 6, and 7 to 9). Thus, the first “position within triplet” codes for Positions 1, 4, and 7; the second for Positions 2, 5, and 8; and the third for Positions 3, 6, and 9.
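The mapping from overall list position to “position within triplet” can be written compactly. The snippet below is an illustrative sketch in Python (the analysis itself was run in R), and the function name is hypothetical.

```python
def position_within_triplet(position):
    """Map an overall list position (1-9) to its position within a triplet (1-3)."""
    if not 1 <= position <= 9:
        raise ValueError("position must be between 1 and 9")
    return (position - 1) % 3 + 1


if __name__ == "__main__":
    for pos in range(1, 10):
        print(pos, position_within_triplet(pos))
```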
As fixed effects, we also included a Position Within Triplet × Condition interaction, as well as a Response Modality × Condition interaction. Digit span and overall position (1 to 9) were added as control variables. To account for primacy and recency effects, we added overall position also as a squared predictor, which models the parabolic shape seen in most serial recall curves. The random effects component included random intercepts for participant as well as random slopes for all within-participant variables (including random slopes for interactions) and correlation terms between all random effect components. Markov Chain Monte Carlo sampling was performed with 4,000 iterations for four chains (2,000 warm-up), resulting in 8,000 posterior samples. There was no indication of any convergence issues (all Rhat = 1.0). Posterior predictive checks indicated no issues.
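Two details of this specification can be checked with a short sketch: how a linear plus squared position term produces a bowed serial position curve on the logit scale, and how the number of posterior samples follows from the sampler settings. The coefficient values below are purely hypothetical and chosen only to show the shape; they are not estimates from the model, and the function names are illustrative.

```python
import math


def predicted_accuracy(position, intercept, b_linear, b_squared):
    """Inverse-logit of a quadratic function of overall position."""
    logit = intercept + b_linear * position + b_squared * position ** 2
    return 1.0 / (1.0 + math.exp(-logit))


def post_warmup_draws(chains, iterations, warmup):
    """Total posterior samples retained across all chains."""
    return chains * (iterations - warmup)


if __name__ == "__main__":
    # Hypothetical coefficients: a positive squared term gives a U-shaped curve.
    for pos in range(1, 10):
        print(pos, round(predicted_accuracy(pos, 2.0, -1.0, 0.1), 3))
    print(post_warmup_draws(chains=4, iterations=4000, warmup=2000))
```

With four chains of 4,000 iterations each and 2,000 warm-up iterations discarded per chain, the retained total matches the 8,000 posterior samples reported above.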
All data and code for the statistical analyses are available in the following OSF repository: https://osf.io/5b94c