Dyslexia is the most common developmental disorder in the western world, with a prevalence rate of around 5% (Shaywitz 1996). Typically, a child or teenager with dyslexia is identified through difficulties in reading, although the range of symptoms associated with dyslexia can be much broader, including difficulties following complex instructions, forgetting things such as PIN numbers, and poor organisation and planning of essays and reports (NHS 2018). The cognitive impairments frequently observed in dyslexia primarily affect the phonological domain, resulting in limited verbal short-term memory and poor phonological awareness. Further, there is evidence of poor long-term verbal learning, which manifests itself in difficulties learning sequences such as the days of the week or the months of the year (Snowling and Stackhouse 2006). The most commonly observed impairment, however, is in reading.

Acquiring mastery of reading requires the learner to establish the relationship between the letters (graphemes) in written words and the speech sounds (phonemes) that correspond to them. In the early stages of reading, most of the focus is on the phonological pathway, but as readers develop they increasingly rely on the lexicalised form of words to pronounce them (Harley 2014). Lexical retrieval is vitally important for the pronunciation of exception words such as ‘yacht’, for which the phonological pathway provides very little help. Children with dyslexia who have deficits at the phonological level are hindered in their reading development because they cannot use this pathway effectively and instead rely on reading words by rote (Snowling 2000). Rote learning is an inadequate strategy because it generalises poorly to other words, in particular to non-words (Rack et al. 1992). Given that the semantic skills of people with dyslexia are within the normal range, it has been proposed that they rely on these skills to develop their word reading (Nation and Snowling 1998).

The phonological deficit hypothesis (e.g. Snowling 2000) is arguably the most popular explanation for dyslexia. According to this hypothesis, the deficit exhibited by those with dyslexia resides in the processing and representation of phonological information. This explanation, however, is not immune from criticism. For example, some individuals with developmental dyslexia are unable to read fluently but show no phonological impairment (Paulesu et al. 2001), and some show deficits beyond reading and spelling, such as in working memory (Smith-Spark and Fisk 2007) and motor sequencing (De Kleine and Verwey 2009). Finally, studies with children and adults reveal that these deficits are not restricted to verbal materials but extend to non-verbal materials, such as in item delayed repetition (Martinez Perez et al. 2012, 2013). This suggests that alternative explanations of dyslexia may be useful.

Recent research suggests that the linguistic and non-linguistic difficulties that characterise dyslexia are better explained by a deficit in memory for order information, that is, in reproducing the correct order of sequential information (although see Staels and Van den Broeck 2014a, b). A vital process in language learning is the ability to segment fluent speech correctly so that individual words can be identified and learnt. This can be achieved by processing the statistical information contained in speech. Statistical regularities aid the identification of word boundaries because a syllable within a word is followed by its successor more reliably than the last syllable of one word is followed by the first syllable of the next. Take, for example, the phrase “creeping death”: the transitional probability from “cree” to “ping” is greater than the transitional probability from “ping” to “death”. Using the familiarisation-preference procedure (which compares listening times to familiar and unfamiliar sound sequences), Saffran and colleagues (1996) showed that infants as young as eight months old are able to extract this statistical information, as shown by significantly longer listening times to newly presented sequences of an artificial language than to previously presented ones.
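The transitional-probability idea can be made concrete with a short sketch. It assumes that a speech stream has already been split into syllables and computes TP(B | A) = count(A followed by B) / count(A followed by anything). The made-up trisyllabic words below are hypothetical toy stimuli, not the materials used by Saffran and colleagues.

```python
from collections import Counter

def transitional_probabilities(syllables):
    """Map each adjacent syllable pair (a, b) to TP(b | a).

    TP(b | a) = count(a immediately followed by b) / count(a followed by any syllable).
    """
    pair_counts = Counter(zip(syllables, syllables[1:]))
    # The final syllable is never followed by anything, so it is excluded.
    first_counts = Counter(syllables[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}

# Hypothetical toy "language": three made-up words concatenated without pauses.
words = ["bidaku", "padoti", "golabu", "bidaku", "golabu",
         "padoti", "bidaku", "padoti", "golabu"]
# Split each six-letter word into three two-letter syllables.
stream = [w[i:i + 2] for w in words for i in range(0, 6, 2)]

tp = transitional_probabilities(stream)
print(tp[("bi", "da")])  # within a word → 1.0
print(tp[("ku", "pa")])  # across a word boundary → about 0.67
```

Within-word transitions stay at 1.0, whereas transitions that span a word boundary fall below it; that dip is the cue a statistical learner could use to posit a boundary.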

These statistical regularities in language have been argued to underpin the classic Hebb effect (Hebb 1961). In this simple task, participants are presented with a series of verbal serial recall trials, such as “k, s, b, l, n”, each of which they must recall in order. Most lists are presented only once but, unbeknownst to the participant, one list is repeated at regular intervals. The classic finding is that the repeated list comes to be recalled significantly more accurately than the non-repeated lists, owing to the statistical regularity created by its repetition. Szmalec et al. (2011) showed that adults with dyslexia did not show the classic Hebb effect, despite recalling the non-repeated lists at a similar level to non-dyslexic adults. The memory mechanisms involved in reproducing a sequence in serial order are argued to underpin both immediate serial recall and the acquisition of novel word-forms. Acquiring a new word is essentially the process of learning a highly familiar sequence of discrete elements (such as letters). So, learning the sequence “k, s, b, l, n” in a short-term memory experiment in the laboratory is akin to learning the new word-form “kayessbeeellenn” in the real world (Page and Norris 2009). In this way, the Hebb paradigm is suggested to be a way of mimicking language learning in a laboratory environment (Page and Norris 2008, 2009).
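A Hebb-style session can be sketched as follows. The sketch assumes one common arrangement, in which the same list recurs on every third trial and every other trial is a fresh random ordering of the same items; the function name, parameters and defaults are illustrative rather than those of any particular study.

```python
import random

def hebb_schedule(items, n_trials, repeat_every=3, seed=0):
    """Build a hypothetical Hebb-style trial list.

    One fixed ordering of `items` (the Hebb list) recurs on every
    `repeat_every`-th trial; every other trial is a fresh random ordering
    that differs from the Hebb list.
    """
    if len(set(items)) < 2:
        raise ValueError("need at least two distinct items")
    rng = random.Random(seed)
    hebb_list = list(items)
    rng.shuffle(hebb_list)
    trials = []
    for trial in range(1, n_trials + 1):
        if trial % repeat_every == 0:
            trials.append(list(hebb_list))
        else:
            filler = list(items)
            rng.shuffle(filler)
            while filler == hebb_list:  # reshuffle in the rare case of a match
                rng.shuffle(filler)
            trials.append(filler)
    return trials, hebb_list

trials, hebb_list = hebb_schedule(["k", "s", "b", "l", "n"], n_trials=12)
for number, trial in enumerate(trials, start=1):
    marker = "  <- repeated" if trial == hebb_list else ""
    print(number, ", ".join(trial) + marker)
```

Comparing recall of the repeated list with recall of the fillers across the session is what reveals the Hebb effect: a learner who picks up the repetition should show steadily improving recall on the marked trials only.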

Szmalec et al. (2011) showed that individuals with dyslexia did not implicitly learn repeated long-term sequential information, and from their extensive literature review they argued that findings from previous studies also support this idea. For example, it is well known that dyslexics often make letter-order reversals when reading (e.g. confusing “was” with “saw”; Whitney and Cornelissen 2005). Further, dyslexics showed impaired implicit learning of serial information but not of spatial context (Howard et al. 2006).

Instead of requiring participants to learn new long-term information, as in the Hebb paradigm, the current study drew on participants’ existing long-term knowledge of sequential order in their language, namely syntax and bigram frequency. That is, we exploited participants’ knowledge of their language by asking them to recall adjective-noun (happy-chair) and noun-adjective (chair-happy) pairings, as well as familiar (RT) and unfamiliar (ZT) bigrams, in a verbal short-term memory task. Evidence that long-term knowledge affects short-term memory derives from many laboratory studies. Although the received view of short-term memory comprises concepts such as storage, decay and interference (e.g. Baddeley 1986; Neath 2000), another view proposes that it is actually parasitic on language processing and production (e.g. Gupta and MacWhinney 1997; Jones et al. 2006; Martin and Saffran 1997).

These latter accounts argue that short-term memory is intricately tied to the semantic and syntactic properties inherent in everyday language comprehension and production. Examples are abundant: recall is superior for real words than for non-words (Crowder 1978), for high-frequency than for low-frequency words (Watkins 1977), and for phonotactically legitimate consonant clusters than for impermissible ones (e.g. better for GH than for ZX; the bigram frequency effect, Mayzner and Schoenberg 1964). Further, word lists that more closely approximate English are better recalled than those that approximate it less closely (Miller and Selfridge 1951). Statistical regularities in sequence structure also improve short-term memory for artificial grammars (Botvinick and Bylsma 2005). Finally, Perham et al. (2009) demonstrated that participants benefit from knowledge of one such regularity, the syntactic rule that adjectives precede nouns in English, as recall of lists of adjective-noun pairings (“watery, lion, stormy, banana, defeated, coat”) was superior to that of noun-adjective pairings (“window, itchy, penguin, rainy, tonsil, misty”). (We recognise that nouns sometimes precede adjectives, as with postpositive adjectives, for example “president elect”, and in subject-verb-object-complement sequences, such as “she painted the door green”; however, these were not part of the stimuli used.) One might argue, however, that individuals who have difficulties processing long-term sequential order information, such as poor readers and individuals with dyslexia, would not benefit from such knowledge and thus would not show this syntax effect. Perham et al. (2013) tested this hypothesis with poor readers, as identified by the Revised Adult Dyslexia Checklist (Vinegrad 1994). As predicted, poor readers, unlike good readers, did not exhibit the syntax effect.
This suggests that individuals with a diagnosis of dyslexia may also fail to exhibit the syntax effect.

Diagnosing whether or not someone has dyslexia is vital for providing support to those who receive a positive diagnosis. Full diagnostic assessments can be obtained from educational and occupational psychologists but can be lengthy and costly. Non-diagnostic tools such as the Adult Checklist (Smythe and Everatt 2001) and Vinegrad’s (1994) Revised Adult Dyslexia Checklist are also available, but they are subjective in nature because they require the person to answer a series of questions about their own experiences. The serial recall task in the current study is an objective measure of short-term memory that is both robust (Jones 1999) and sensitive to subtle order manipulations (e.g. Perham et al. 2009), such as those involving syntax. One way to assess whether the task discriminates well between individuals who are dyslexic and those who are not is Receiver Operating Characteristic (ROC) analysis. An ROC curve plots a test’s sensitivity (true-positive rate) against 1 − specificity (false-positive rate) across every possible decision threshold. Its summary statistic, the area under the curve (AUC), equals the probability that a randomly selected case (dyslexic) will score higher on the diagnostic measure than a randomly selected non-case (non-dyslexic), and so reflects the test’s accuracy across the full range of thresholds. The AUC can range from 0 to 1: values approaching 1 indicate near-perfect diagnostic validity, an AUC of 0.5 suggests that the test has no diagnostic validity, and values below 0.5 indicate negative predictive validity (cases tend to score lower on the test than non-cases).
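The equivalence between the two descriptions of the AUC can be checked with a small sketch: one function counts case/non-case pairs directly (the Mann-Whitney formulation, with ties counted as one half), and the other sweeps a threshold to trace the ROC curve and integrates it with the trapezoidal rule. The recall scores are hypothetical. Because dyslexic participants are expected to recall less, the sketch negates recall scores so that a higher value means "more likely dyslexic"; swapping the group roles would work equally well.

```python
from itertools import product

def auc_pairwise(case_scores, noncase_scores):
    """P(random case scores higher than random non-case), ties counted as 0.5."""
    total = 0.0
    for c, n in product(case_scores, noncase_scores):
        if c > n:
            total += 1.0
        elif c == n:
            total += 0.5
    return total / (len(case_scores) * len(noncase_scores))

def roc_points(case_scores, noncase_scores):
    """(false-positive rate, true-positive rate) pairs.

    A score >= threshold is classified as a case; thresholds run from the
    highest observed score down to the lowest.
    """
    points = [(0.0, 0.0)]
    for t in sorted(set(case_scores) | set(noncase_scores), reverse=True):
        tpr = sum(s >= t for s in case_scores) / len(case_scores)
        fpr = sum(s >= t for s in noncase_scores) / len(noncase_scores)
        points.append((fpr, tpr))
    return points

def auc_trapezoid(points):
    """Area under a piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical total-recall scores, negated so that higher = more likely dyslexic.
dyslexic_recall = [70, 75, 82, 90]
nondyslexic_recall = [80, 88, 95, 99, 101]
cases = [-s for s in dyslexic_recall]
noncases = [-s for s in nondyslexic_recall]

print(auc_pairwise(cases, noncases))                 # ≈ 0.85
print(auc_trapezoid(roc_points(cases, noncases)))    # ≈ 0.85
```

The two numbers agree because each step of the ROC curve corresponds to a block of case/non-case comparisons; in practice a statistics package would also supply a confidence interval for the AUC.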

The aim of the current study was to explore whether a simple verbal short-term memory task has the potential to be used as a diagnostic tool for dyslexia, by examining whether it could discriminate individuals who were dyslexic from those who were not. The study differed from similar studies in one major way: the task drew on participants’ long-term knowledge of sequential order in their language rather than creating such knowledge artificially within the task by repeating sequences. It was anticipated that participants who were dyslexic would show reductions in both the syntax and bigram frequency effects. Further, if the task was able to discriminate between the two groups, then ROC curve analysis should reveal a large AUC whose lower confidence limit lies above 0.5 (no information).

Experiment 1

Method

Participants

Forty-six participants from a university in South Wales took part. During recruitment, potential participants from the undergraduate population were invited to take part if they had a diagnosis of dyslexia (dyslexic group) or if they were not dyslexic (non-dyslexic group). Those in the dyslexic group brought evidence of their diagnosis to the experiment, and this was noted to confirm their group membership. There were 20 participants in the dyslexic group and 26 in the non-dyslexic group. Of the 20 participants in the dyslexic group, 11 were male (mean age 24.27 years) and 9 were female (mean age 19.99 years). Of the 26 participants in the non-dyslexic group, 10 were male (mean age 20.3 years) and 16 were female (mean age 21.75 years).

Design

A mixed design was employed with the between-subject variable being group (dyslexic or non-dyslexic) and the within-subject variables being list type (adjective-noun (A-N) or noun-adjective (N-A)) and position (1–6). The dependent variable was the number of words recalled correctly in the order in which they were presented (ranging from 0 to 6 for each list).

Materials

The 24 six-item recall lists were taken from version A of Perham et al.’s (2009) study, and each comprised three phonologically dissimilar word pairings (A-N or N-A). To minimise the influence of background knowledge, each pairing was implausible, in that it was unlikely to occur in real life (e.g. ‘itchy window’, as opposed to a plausible pair such as ‘bright sun’). Adjectives and nouns were chosen on the basis of having one or two syllables and of being familiar to Perham et al. All list items were presented using Microsoft PowerPoint and were displayed in black 72-point Times New Roman against a white background. Each word was displayed at a rate of one every 700 ms, and a 1 s interval was placed between items 2 and 3 and between items 4 and 5 to encourage rehearsal of the words as pairs (items 1 and 2, 3 and 4, and 5 and 6). Following the last item, the word ‘RECALL’ appeared and participants were given 10 s to recall the list.
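If the PowerPoint presentation were reimplemented in experiment software, the timing described above could be expressed as an onset schedule. The sketch assumes that each word remains on screen for its full 700 ms with no blank between the two words of a pair, and that the 1 s interval is an additional blank after items 2 and 4; these are interpretations of the description, not documented settings.

```python
def presentation_schedule(n_items=6, word_ms=700, pair_gap_ms=1000,
                          recall_ms=10_000):
    """Return ([(item, onset_ms, offset_ms), ...], (recall_start, recall_end)).

    An extra blank of `pair_gap_ms` follows every second item except the last,
    so the words are grouped as pairs 1-2, 3-4 and 5-6.
    """
    events = []
    t = 0
    for item in range(1, n_items + 1):
        events.append((item, t, t + word_ms))
        t += word_ms
        if item % 2 == 0 and item < n_items:
            t += pair_gap_ms  # pause between pairs to encourage pairwise rehearsal
    return events, (t, t + recall_ms)

events, recall_window = presentation_schedule()
for item, onset, offset in events:
    print(f"item {item}: {onset}-{offset} ms")
print("RECALL window:", recall_window)
```

Under these assumptions each list takes 6.2 s to present, after which the 10 s recall window runs from 6200 ms to 16200 ms.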

Procedure

All participants were tested individually in a quiet laboratory. Each participant began by reading an information sheet explaining that they would see lists of six words and that their job was to recall each list in the order in which the items had been presented, once prompted to do so. Once the participant fully understood what participation involved and had signed the informed consent form, the experiment commenced. Each experiment began with three practice trials so that participants could become used to the pace of the task and understand exactly what it required.

Results

Figure 1 shows that the non-dyslexic group recalled more items in the A-N than in the N-A condition. Further, the A-N lists displayed a saw-tooth pattern, with performance higher for nouns than for adjectives. For the dyslexic group, the same saw-tooth pattern was visible in the A-N condition, yet overall performance was roughly the same in the two conditions.

Fig. 1 Non-dyslexics’ and dyslexics’ word recall by list type and position

A three-way mixed ANOVA revealed significant main effects of list type, F(1, 44) = 33.89, MSE = 1, p < .001, η2 = .44, position, F(5, 220) = 34.16, MSE = 1.28, p < .001, η2 = .44, and group, F(1, 44) = 18.06, MSE = 4.27, p < .001, η2 = .29. A-N lists were recalled better than N-A lists. The recall data followed a slightly flattened version of the serial position curve observed in many short-term memory studies: performance was best in the first position, decreased over the middle items and then increased slightly over the penultimate and final items. Finally, participants in the non-dyslexic group recalled more than those in the dyslexic group.
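The effect sizes reported for the main effects are consistent with partial eta squared, which can be recovered from an F ratio and its degrees of freedom as ηp² = (F × df_effect) / (F × df_effect + df_error). A minimal check, assuming that partial eta squared is the statistic reported:

```python
def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared from an F ratio.

    Because F = (SS_effect / df_effect) / (SS_error / df_error), this equals
    SS_effect / (SS_effect + SS_error).
    """
    return (f * df_effect) / (f * df_effect + df_error)

# Main effects from Experiment 1 as (F, df_effect, df_error).
main_effects = {
    "list type": (33.89, 1, 44),
    "position": (34.16, 5, 220),
    "group": (18.06, 1, 44),
}
for name, (f, df1, df2) in main_effects.items():
    print(name, round(partial_eta_squared(f, df1, df2), 2))
```

Each recomputed value rounds to the corresponding η2 given above (.44, .44 and .29).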

A significant two-way interaction between list type and position was observed, F(5, 220) = 4.75, MSE = .08, p < .001, η2 = .09, which can generally be attributed to the shape of the serial position curve typically observed in such studies. There was no interaction between group and position.

Finally, a three-way interaction between list type, group and position was observed, F(5, 220) = 2.52, MSE = .04, p < .05, η2 = .05. Pairwise comparisons revealed that, for participants in the non-dyslexic group, performance was significantly better for the A-N than for the N-A list type at every position (all p < .005 apart from position 1, where p < .05). In contrast, participants in the dyslexic group recalled significantly more A-N than N-A items only at the second position (p < .05), with no significant differences at any other position.

To explore the diagnostic accuracy of the syntax serial recall task, multireader ROC analysis was conducted (see Fig. 2). In general, an AUC of 0.5 suggests no discrimination (i.e. no ability to distinguish, on the basis of the test, between people with and without the condition), 0.7 to 0.8 is considered acceptable, 0.8 to 0.9 excellent, and more than 0.9 outstanding (Hosmer and Lemeshow 2000). An AUC of .88 was obtained, indicating that the task had an excellent ability to correctly classify participants as dyslexic or non-dyslexic.
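The rule of thumb cited above can be written as a small lookup. The bands follow those attributed to Hosmer and Lemeshow (2000) in the text; the boundary handling (lower bound inclusive) and the label for values between 0.5 and 0.7, which the text does not name, are assumptions of this sketch.

```python
def describe_auc(auc):
    """Verbal label for an AUC using the bands cited from Hosmer and Lemeshow (2000).

    Assumptions: lower bounds are inclusive, and 0.5 < AUC < 0.7 is labelled
    'below acceptable' because the cited bands do not name that range.
    """
    if not 0.0 <= auc <= 1.0:
        raise ValueError("AUC must lie between 0 and 1")
    if auc >= 0.9:
        return "outstanding"
    if auc >= 0.8:
        return "excellent"
    if auc >= 0.7:
        return "acceptable"
    if auc > 0.5:
        return "below acceptable"
    if auc == 0.5:
        return "no discrimination"
    return "negative predictive validity"

print(describe_auc(0.88), describe_auc(0.86))
```

On these bands, both AUCs reported in this study (.88 for words and .86 for letters) fall in the "excellent" range.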

Fig. 2 ROC curve for predicting whether participants were dyslexic or not depending on their word recall

Experiment 2

Method

Participants

The same forty-six participants that took part in Experiment 1 also took part in Experiment 2.

Design

A mixed design was employed, with the between-subjects variable being group (dyslexic or non-dyslexic) and the within-subjects variables being bigram frequency (high or low) and position (1–6). The dependent variable was the number of letters recalled correctly in the order in which they were presented (ranging from 0 to 6 for each list).

Materials

The 24 six-item recall lists were built from Jones and Mewhort’s (2004) corpus of uppercase-uppercase bigram frequencies. High-frequency bigrams (e.g. NG) had a mean frequency of 6.61 (SD = 1.93) and low-frequency bigrams (e.g. VJ) a mean frequency of 3.26 (SD = 2.02); an independent t-test showed that the mean frequency of the high-frequency bigrams was significantly higher than that of the low-frequency bigrams, t(58) = 6.57, p < .001. It was initially considered that high-frequency bigrams could be reversed to create low-frequency bigrams. However, this did not always guarantee low-frequency bigrams because, for example, RT and TR are both quite common. Thus, different letters were used for the high- and low-frequency bigrams. The presentation (timings, font size and colours) was identical to that in Experiment 1.
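The reported t value can be reproduced from the summary statistics with a pooled-variance (Student's) independent-samples t-test. The number of bigrams in each set is not stated; the sketch assumes 30 per set, which is what t(58) implies for two equal-sized groups.

```python
import math

def student_t_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Independent-samples t-test with pooled variance, from means and SDs.

    Returns (t, df).
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df

# High- vs low-frequency bigrams; n = 30 per set is an assumption inferred from df = 58.
t, df = student_t_from_summary(6.61, 1.93, 30, 3.26, 2.02, 30)
print(f"t({df}) = {t:.2f}")  # → t(58) = 6.57
```

That the assumed group sizes reproduce the reported statistic is a useful consistency check, though it does not establish how many bigrams were actually used.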

Procedure

The procedure for Experiment 2 was identical to that of Experiment 1, except that the order of the two experiments was counterbalanced: half of the participants completed Experiment 1 first and the other half completed Experiment 2 first.

Results

Figure 3 shows that participants in the non-dyslexic group recalled more letters from high-frequency than from low-frequency bigram lists. In contrast, there was very little difference between recall of the two list types for participants in the dyslexic group.

Fig. 3 Non-dyslexics’ and dyslexics’ letter recall by list type and position

Significant main effects of list type, F(1, 44) = 15.71, MSE = .63, p < .001, η2 = .26, position, F(5, 220) = 16.49, MSE = .31, p < .001, η2 = .27, and group, F(1, 44) = 10.46, MSE = 2.95, p < .05, η2 = .19, were observed: familiar bigrams were recalled better than unfamiliar bigrams, performance followed the typical serial position curve, and participants in the non-dyslexic group recalled more than those in the dyslexic group.

A two-way interaction was observed for list type by group, F(1, 44) = 18.53, MSE = .74, p < .001, η2 = .29. Pairwise comparisons revealed that for participants in the non-dyslexic group, recall of familiar bigram lists was significantly better than for unfamiliar bigram lists, p < .001. This difference was not apparent for participants in the dyslexic group. Pairwise comparisons also revealed that familiar lists were recalled significantly better for participants in the non-dyslexic, rather than the dyslexic, group (p < .001) but no difference was noted for unfamiliar lists. No other interactions were observed.

An AUC of .86 was obtained, indicating that, as in Experiment 1, the task had an excellent ability to correctly classify participants as having or not having dyslexia (see Fig. 4).

Fig. 4 ROC curve for predicting whether participants were dyslexic or not depending on their letter recall

Discussion

We report two experiments demonstrating that a simple verbal short-term memory task can discriminate between participants who have dyslexia and those who do not. That is, those with dyslexia failed to benefit from the long-term knowledge inherent in the English language, presumably because of their impairment. These findings are consistent with other recent research showing that individuals with dyslexia find it difficult to profit from knowledge of long-term sequential order information in language (Szmalec et al. 2011). We suggest that this task may be a useful screening tool for dyslexia.

In Experiment 1, the syntax effect was replicated with non-dyslexic participants (Perham et al. 2009, 2013), with adjective-noun pairs being recalled more accurately than noun-adjective pairs; participants who had dyslexia did not show this effect. Experiment 2 repeated Experiment 1 but used familiar and unfamiliar bigrams instead of adjective-noun and noun-adjective pairs. Once again, only those who did not have dyslexia showed a significant difference between the two sets of materials, with familiar lists recalled significantly better than unfamiliar lists. Although both groups were relatively small, the syntax effect in non-dyslexic participants is the second replication of Perham et al. (2009), and the group difference is the first replication of Perham et al. (2013). Thus, these experiments replicate and extend research on the role of long-term sequential order information in verbal short-term memory and, more importantly, demonstrate that individuals with dyslexia do not show these effects, that is, they do not appear to benefit from this knowledge during serial recall. The second finding was that the verbal serial recall task discriminated very well between the two groups, suggesting that it has the potential to be used as a screening tool for dyslexia.

Amongst the many characteristics that can comprise a diagnosis of dyslexia are a difficulty in reproducing the order of information (e.g. misreading ‘saw’ as ‘was’) and a difficulty in remembering information over the short term (Staels and Van den Broeck 2014a; Whitney and Cornelissen 2005). Szmalec et al. (2011) combined these features, demonstrating that a deficit in verbal short-term memory occurred only when sequential long-term information was involved. We further support this by showing that this deficit manifests itself both in poor readers (Perham et al. 2013, as identified through Vinegrad’s (1994) Revised Adult Dyslexia Checklist) and, in the current study, in individuals diagnosed with dyslexia. Furthermore, we extend Perham et al.’s (2013) study by observing the same impairment not just with words but also with pairs of letters.

The task used in the current study was verbal serial recall, referred to elsewhere as the digit span task or short-term memory span. Essentially, it tests one’s ability to retain and retrieve information over the short term and typically consists of a list of items (anywhere from 3 or 4 items for children up to 9 for adults) whose content can be digits, consonants, words or non-words (pictorial versions of the task use images; see Paivio et al. 1975). Order is maintained through rehearsal (seriation), as participants use their articulatory abilities to preserve the order of events. Imagine being given a telephone number with no means of writing it down: one must rely on one’s own vocal apparatus to encode the number and, hopefully, later retrieve it. Recalling any part of that sequence in the wrong order thwarts the goal of contacting the person whose number it is. This is essentially the verbal serial recall task, which is generally scored using a strict serial recall criterion whereby a point is given for every item correctly recalled in its presentation position. There is also a more sensitive measure of order information, whereby an item need only be recalled in the correct order relative to the item that preceded it (see Beaman and Jones 1998; Perham et al. 2007).
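The two scoring criteria can be sketched as follows. The strict function follows the description above; the lenient function is one plausible reading of the preceding-item criterion (a response earns credit if the response before it was its immediate predecessor in the presented list, or, for the first response, if it was the first presented item) and may differ in detail from the procedures of Beaman and Jones (1998) or Perham et al. (2007). Omissions are represented as None, and the function names are hypothetical.

```python
def strict_serial_score(presented, recalled):
    """One point for each item recalled in its original presentation position."""
    return sum(1 for p, r in zip(presented, recalled) if p == r)

def preceding_item_score(presented, recalled):
    """One point for each item whose preceding response was its presented predecessor."""
    position = {item: i for i, item in enumerate(presented)}
    score = 0
    for i, item in enumerate(recalled):
        if item not in position:  # omission (None) or intrusion
            continue
        if i == 0:
            if position[item] == 0:  # first response must be the first presented item
                score += 1
        else:
            prev = recalled[i - 1]
            if prev in position and position[prev] + 1 == position[item]:
                score += 1
    return score

presented = ["window", "itchy", "penguin", "rainy", "tonsil", "misty"]
shifted = ["itchy", "penguin", "rainy", "tonsil", "misty", None]
print(strict_serial_score(presented, shifted), preceding_item_score(presented, shifted))
```

Here forgetting the first word shifts every later word one place, so the strict criterion awards no points while the preceding-item criterion still credits the four correctly ordered transitions, which is why the latter is the more sensitive measure of order information.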

The more easily items can be rehearsed, the more likely they are to be recalled accurately. However, a number of factors make rehearsal more difficult: the number of words or syllables in the list (word length effect, Baddeley et al. 1975), the difference between one item and the others (distinctiveness effect, Hunt 1995; Perham and Newson 2008), having to repeatedly articulate a word such as “the” (suppression effect, Jones et al. 2004), the presence of acoustically varying background sound (irrelevant sound effect, Perham and Vizard 2010), and, in the case of the current study, having to rehearse a sequence that is incongruent with one’s long-term knowledge of the sequential order information inherent in English, namely syntax and bigram frequency. In contrast, if the task does not require rehearsal (such as recalling a list of categorical items by their categories, Perham et al. 2007, or identifying a missing item, Beaman and Jones 1997) or involves recalling a very familiar sequence (e.g. “1, 2, 3, 4, 5, 6, 7, 8, 9”), then accuracy in the presence of irrelevant sound is similar to that observed in quiet.

The novelty of the current study lies in its use of participants’ long-term knowledge of their own language within a test of short-term memory. Although serial recall tasks generally test the ability to recall information in its presentation order, they are largely independent of the content of the list; that is, it does not matter whether the items are words, non-words, consonants or digits (Jones 1999). Our paper is one of the few to manipulate the syntactic composition of the list in order to examine Szmalec et al.’s (2011) claim regarding the influence of long-term sequential order. Staels and Van den Broeck (2014a, b), however, disagreed with this claim. They refer to a lack of serial order learning impairment in children and adolescents, as demonstrated by no difference in performance between dyslexics and non-dyslexics in the serial recall of a list of one-syllable animal names (Staels and Van den Broeck 2014a; see McDonald 2009; McDonald et al. 2008, for similar findings). However, in none of these studies did the materials (recall of a list of seven one-syllable animal names, nine nonsense syllables, seven digits or seven novel symbols) actually tap into participants’ long-term memory for sequential information. That is, there would be no long-term representation whereby any sequence within those lists (e.g. “dog, horse, mouse” or “2, 7, 4”) was any more familiar than another. Indeed, when creating materials for the serial recall of digits, researchers deliberately avoid sequences beginning “1, 9…” or overly familiar sequences such as “3, 4, 5”, so that participants cannot use their long-term knowledge of those sequences to aid recall.

The second focus of this paper was to explore whether the serial recall task, replete with long-term sequential information, could serve as a screening test for dyslexia. That is, was it able to discriminate between individuals who did and did not have dyslexia? Firstly, a logistic regression analysis revealed that the model (with word and letter recall as predictors) significantly predicted whether or not a participant was dyslexic and explained between 46% (Cox & Snell R Square) and 62% (Nagelkerke R Square) of the variance. AUC analyses showed that the task discriminated successfully between the groups whether the content was word pairs or letter pairs. Further, we can calculate an index indicating whether an individual might be categorised as potentially dyslexic, by subtracting performance on incongruent trials (noun-adjectives and unfamiliar bigrams) from performance on congruent trials (adjective-nouns and familiar bigrams). For individuals with dyslexia in the current study, the index is approximately 0, that is, .0091 for recalling words and −.0058 for recalling letters; for poor readers recalling words it is .009 (Perham et al. 2013). In contrast, for individuals without dyslexia in the current study the index is .1638 for recalling words and .1429 for recalling letters, and for good readers recalling words it is .0838 (Perham et al. 2013). Converting these values to percentages reveals that, for the dyslexic and poor readers in all studies, the difference between the congruent and incongruent conditions is less than 1%. In stark contrast, the difference between the congruent and incongruent conditions for non-dyslexics and good readers is between 8% and 16%. Given that the smaller of these differences belongs to the good readers, the larger gap between non-dyslexics and dyslexics in the current study than between good and poor readers in Perham et al. (2013) is consistent with a greater impairment in those with diagnosed dyslexia.
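The index and the relationship between the two pseudo-R² values can be sketched as follows. The index function assumes that performance is expressed as proportion correct per trial, and the per-trial proportions shown are hypothetical. The Nagelkerke function rescales Cox and Snell's R² by its maximum attainable value, which for a binary outcome depends only on the sample size and the number of cases; with 20 dyslexic participants out of 46, a Cox and Snell value of .46 rescales to roughly .62, consistent with the values reported.

```python
import math
from statistics import mean

def congruency_index(congruent, incongruent):
    """Mean proportion correct on congruent trials minus that on incongruent trials."""
    return mean(congruent) - mean(incongruent)

def nagelkerke_from_cox_snell(r2_cs, n, n_cases):
    """Rescale Cox & Snell R^2 by its maximum for a binary outcome.

    The maximum is 1 - exp(2 * LL0 / n), where LL0 is the log-likelihood of
    the intercept-only model.
    """
    p = n_cases / n
    ll0 = n_cases * math.log(p) + (n - n_cases) * math.log(1 - p)
    r2_max = 1 - math.exp(2 * ll0 / n)
    return r2_cs / r2_max

# Hypothetical per-trial proportions correct for one participant.
adj_noun = [0.83, 0.67, 1.00, 0.83]  # congruent (A-N) trials
noun_adj = [0.67, 0.67, 0.83, 0.67]  # incongruent (N-A) trials
idx = congruency_index(adj_noun, noun_adj)
print(f"index = {idx:.4f} ({idx:.1%} congruency advantage)")

print(round(nagelkerke_from_cox_snell(0.46, n=46, n_cases=20), 2))  # → 0.62
```

An index near zero, as for the dyslexic and poor-reader groups, means the participant gained essentially nothing from congruent sequences.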

So it would seem that an index near 0 could indicate that an individual has difficulty processing long-term sequential order information and may need further assistance and support. These values are, however, based on only two studies, and further studies with larger samples would be needed to obtain more accurate norms. Nevertheless, the values suggest a promising starting point and, if the task were made into an app in which participants’ responses could be displayed immediately after performance, it would be very easy for an administrator and a participant to see a potential deficit. Given that the task is easy to manipulate in terms of the number of trials, the presentation duration of materials and the content of the to-be-recalled items, and that the current study lasted only about twenty minutes per participant, it is a quick, simple and flexible task that could form part of the armoury of diagnostic tests for dyslexia. Further, one might envisage the task being performed online in the privacy of a person’s home to provide an indication of potential issues.

In sum, the current study replicates our previous work with poor readers and extends the concept of long-term sequential order information from the syntax effect to bigram frequency. Each finding lends further support to the suggestion that individuals with dyslexia have an inherent difficulty with processing long-term sequential order information, which can manifest itself in poorer short-term recall of sequences of both words and letters. Finally, the findings provide a useful starting point for a quick and easy screening test for dyslexia.