Evaluation of the “Three Steps in Screening for Dyslexia” Assessment Protocol Designed for New Zealand Teachers

Traditionally, the New Zealand Ministry of Education opposed the recognition of dyslexia. However, since 2007, the Ministry of Education's position has started to change, as evidenced by the development of a working definition. In 2021, the Ministry of Education released Three Steps in Screening for Dyslexia (TSSD), an assessment protocol designed to support teachers to screen for dyslexia. The current research evaluated the TSSD with a sample of 209 children in Years 4 to 6 (8–10 years of age) from New Zealand. The research investigated whether children could be accurately classified using tests from the TSSD, whether the three-step protocol described in the TSSD was a valid assessment approach, and what effect operationalising the term average at different cut-off points had on dyslexia screening. Children were classified using two cluster analyses. The first analysis was based on tests from the Woodcock Johnson IV and the second analysis was based on tests from the TSSD. Subsequent analyses investigated specific aspects of the TSSD protocol, including its sequential design and the placement of cut-off points. Results revealed a number of limitations to the TSSD approach. The authors discuss three changes that could be made to improve the validity and reliability of the TSSD: a broader assessment of the decoding and language comprehension constructs; directing teachers to assess both decoding and language comprehension, irrespective of a child's language comprehension ability; and placing a greater emphasis on discrepancy bands over cut-off points.


Introduction
New Zealand has a relatively large proportion of children who exhibit reading difficulties (Ministry of Education, 2017; Tunmer et al., 2013). The 2016 PIRLS data showed that New Zealand's mean reading scale score was lower than those of similar English-speaking countries from the OECD, including Northern Ireland, the United States, Ireland, England, Canada, and Australia. Compared with these countries, a greater proportion of New Zealand's children also fell within the bottom two achievement bands (Ministry of Education, 2017). Some children at the lower end of the reading comprehension continuum exhibit difficulties consistent with a specific learning difficulty in reading, commonly called dyslexia. New Zealand has traditionally opposed the use of labels such as specific learning difficulty and dyslexia because of concerns that such labels may stigmatise some ethnic groups that are more likely to exhibit reading difficulties (Tunmer & Chapman, 2007). However, in the early 1990s, groups advocating for those with dyslexia began to place increased pressure on the Ministry of Education to formally recognise dyslexia (SPELD NZ, 2021).
The Ministry of Education formally recognised dyslexia in 2007 and provided a working definition. This definition was controversial because it omitted key components of existing, accepted definitions of dyslexia. It also implied that a child could have problems with musical notation but no reading difficulties, yet still be considered to have dyslexia. This view is not consistent with accepted views within the scientific community (American Psychiatric Association, 2013; International Dyslexia Association, 2002; Rose, 2009). Tunmer and Greaney (2010) recommended that a revised definition be developed that defined dyslexia in terms of four key components: (a) persistent literacy difficulties, (b) in otherwise typically developing children, (c) despite exposure to high-quality, evidence-based literacy instruction and intervention, (d) due to an impairment in the phonological processing skills required to learn to read and write. The Ministry of Education has since revised its definition to include these key components (Ministry of Education, 2016).
Whilst the Ministry of Education's definition of dyslexia is now consistent with contemporary research, considerable confusion remains about how classroom teachers can identify children with dyslexia. Nicholson and Dymock (2015) surveyed teachers on their ability to identify and support children with dyslexia. Most respondents (95%) believed there were children with dyslexia at their school; however, fewer believed they could identify these children (65%), and very few believed they were equipped to support them (12%). In recent years, the Ministry of Education has come under considerable pressure to provide more specific guidance on how classroom teachers can identify and support children with dyslexia (Ministry of Education, 2018).
Historically, New Zealand teachers have not had access to valid and reliable tools for screening for dyslexia. As a result, assessment for dyslexia is typically conducted by a psychologist or by a trained assessor associated with Specific Learning Difficulties New Zealand (New Zealand Qualifications Authority, n.d.-a). Because these assessments are undertaken outside the school system, families are usually required to cover the costs, and financial constraints have resulted in inequitable access to educational assessments (New Zealand Qualifications Authority, 2020). Assessors typically use the Wechsler scales or the Woodcock-Johnson batteries to test for dyslexia (New Zealand Qualifications Authority, n.d.-b). Most teachers are unable to administer tests from these batteries because they have not completed the training needed to obtain the registration level required to purchase and administer them (New Zealand Qualifications Authority, n.d.-c).
New Zealand Journal of Educational Studies (2022) 57:465-482
In 2017, the government responded to 46 recommendations made in the report of the Education and Science Select Committee, which inquired into the identification of and support for children with dyslexia, dyspraxia, and autism spectrum disorders in primary and secondary schools (New Zealand Parliament, 2017). The recommendations included the need for earlier school-based identification of dyslexia. In 2019, the Ministry of Education released the Learning Support Action Plan (Ministry of Education, 2019), which identified six priorities designed to drive progress towards an inclusive education system. The second priority focused on screening and early intervention and stated that, between July 2019 and December 2025, the Ministry of Education would develop evidence-based screening tools for dyslexia (Ministry of Education, 2019). In 2021, the Ministry of Education released Three Steps in Screening for Dyslexia (TSSD; Ministry of Education, 2021), a document describing an assessment protocol that teachers can use to screen for dyslexia.
The TSSD is based on the Simple View of Reading (SVR; Gough & Tunmer, 1986), which predicts that children can be assigned to one of three poor reader groups based on their proficiency in decoding and language comprehension. Within this model, the label dyslexia is applied to children who exhibit decoding difficulties in the absence of language comprehension difficulties. The label specific comprehension difficulty (SCD) is applied to children who exhibit language comprehension difficulties in the absence of decoding difficulties, and the label mixed difficulty is applied to children who exhibit both decoding and language comprehension difficulties. These three groups have been identified in previous SVR classification research (Aaron et al., 1999; Catts et al., 2003; Ebert & Scott, 2016; Morris et al., 2017), including a recent New Zealand study (Sleeman et al., 2021).
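Gough and Tunmer (1986) expressed the SVR as R = D × C, the product of decoding and (language) comprehension, so a weakness in either component limits reading comprehension. The three poor reader profiles can therefore be sketched as a simple rule over two standard scores (mean 100, SD 15). This is a minimal illustration under stated assumptions: the single cut-off of 85 is a hypothetical choice, not a value set by the SVR or the TSSD, and the present study assigned groups by cluster analysis rather than by fixed cut-offs.

```python
from typing import Literal

Profile = Literal["dyslexia", "specific comprehension difficulty",
                  "mixed difficulty", "no specific profile"]

# Hypothetical cut-off: a standard score of 85 is one SD below a mean of 100.
# Neither the SVR nor the TSSD fixes where "below average" begins.
CUT_OFF = 85.0


def svr_profile(decoding: float, language_comp: float,
                cut_off: float = CUT_OFF) -> Profile:
    """Assign an SVR poor-reader profile from two standard scores."""
    low_decoding = decoding < cut_off
    low_language = language_comp < cut_off
    if low_decoding and low_language:
        return "mixed difficulty"
    if low_decoding:
        return "dyslexia"                           # decoding weak, language intact
    if low_language:
        return "specific comprehension difficulty"  # language weak, decoding intact
    return "no specific profile"


print(svr_profile(78, 98))
print(svr_profile(97, 80))
print(svr_profile(75, 79))
```

Because every child is placed on both dimensions, this rule can always separate the dyslexia, SCD, and mixed difficulty profiles, provided both decoding and language comprehension are measured.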
The TSSD directs teachers to follow a three-step protocol that begins with the assessment of a child's reading comprehension ability. If the child exhibits reading comprehension difficulties, teachers proceed to the second step of assessing language comprehension ability. According to the TSSD, children who exhibit average or above-average language comprehension ability are probably dyslexic, and teachers are directed to the final step of assessing the child's decoding ability. The TSSD may not identify children who exhibit the mixed difficulty or SCD profiles, because these children are likely to exhibit below-average language comprehension ability, which means their decoding ability will not be assessed. As a result, teachers will not obtain valuable information about children who exhibit reading comprehension difficulties that are not primarily due to decoding difficulties. This is particularly troubling because recent research has identified that these groups may make up over 60% of poor readers (Sleeman et al., 2021).
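The sequential logic of the TSSD can be sketched as a short function. Assumptions: all three measures are expressed as standard scores, the same hypothetical cut-off of 85 marks "below average" at every step, and the label on the final branch is a placeholder, since that outcome is not spelled out here. The sketch makes a structural point: once a child falls below the Step 2 cut-off, the decoding score is never consulted, so a child with a mixed difficulty and a child with an SCD receive the same outcome.

```python
from typing import Optional, Tuple

# Hypothetical cut-off applied at every step (standard score scale, mean 100, SD 15).
CUT_OFF = 85.0


def tssd_screen(reading_comp: float, language_comp: float, decoding: float,
                cut_off: float = CUT_OFF) -> Tuple[str, Optional[float]]:
    """Sketch of the TSSD's sequential logic.

    Returns the screening outcome and the decoding score that was actually
    consulted, or None when Step 3 is never reached.
    """
    # Step 1: reading comprehension. Children at or above the cut-off exit here.
    if reading_comp >= cut_off:
        return "no reading comprehension difficulty", None
    # Step 2: language comprehension. Below-average children also exit here,
    # so their decoding score is never examined.
    if language_comp < cut_off:
        return "language comprehension difficulty (decoding not assessed)", None
    # Step 3: decoding is assessed only for children who passed Step 2.
    if decoding < cut_off:
        return "possible dyslexia", decoding
    # Placeholder label (hypothetical): difficulty not explained by either measure.
    return "unexplained reading difficulty", decoding


# A mixed difficulty child and an SCD child receive the same outcome.
print(tssd_screen(70, 78, 72))
print(tssd_screen(70, 78, 98))
```

Both calls return the same result even though only the first child has a decoding difficulty.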
The proportion of children identified as dyslexic using the TSSD may be influenced by the way teachers operationalise the term average. Reading comprehension, language comprehension, and decoding ability all fall along a continuum of performance, making it difficult to determine where cut-off points should be placed to discriminate between average and below-average performance. The placement of cut-off points on the reading comprehension and language comprehension variables determines the proportion of children who progress through Steps 1 and 2 of the TSSD, and therefore the proportion who are identified as dyslexic. Teachers may interpret the term average in different ways. For example, average could be interpreted as the median, so that those below the 50th percentile are 'below average'. It seems unlikely that many teachers would consider 50% of all children to have problems with reading; a lower cut-off is therefore more useful. Those with more experience of assessment measures may have read about the 'average range', which is often taken as one standard deviation above and below the mean. This would lead to approximately the bottom 16% of scores being considered 'below average'. Other cut-off values, such as the bottom 10% or the bottom 5%, have been used across a range of assessment measures, and experience with these tools may lead teachers to interpret below average in other ways. (See Sleeman, 2021, for a discussion of different cut-off values used in SVR validation studies.) Variation in the way teachers might operationalise the term average raises questions about the reliability of the TSSD for screening purposes.
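The different readings of "below average" can be made concrete by converting percentile cut-offs into standard scores. The sketch below assumes normally distributed standard scores with a mean of 100 and an SD of 15, the scale used by batteries such as the WJIV. Real tests are normed empirically, so exact values will differ slightly.

```python
from statistics import NormalDist

# Assumption: standard scores are normally distributed with mean 100 and SD 15.
scores = NormalDist(mu=100, sigma=15)

# Candidate readings of "below average", expressed as percentile cut-offs.
for pct in (50, 40, 16, 10, 5):
    threshold = scores.inv_cdf(pct / 100)  # percentile -> standard score
    print(f"below the {pct}th percentile = standard score below {threshold:.1f}")

# The reverse direction: one SD below the mean (standard score 85).
print(f"standard score 85 = {scores.cdf(85) * 100:.1f}th percentile")
```

Under these assumptions, a 40th percentile cut-off corresponds to a standard score of about 96, a 10th percentile cut-off to about 81, and a standard score of 85 to roughly the 16th percentile.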
The TSSD identifies a small number of informal and standardised tests that teachers and schools can use to assess reading comprehension, language comprehension, and decoding ability. The way these constructs are defined and operationalised within the TSSD may influence the validity of the protocol for screening purposes. For example, within the SVR, reading comprehension is defined as the ability to extract meaning from linguistic discourse represented in print (Hoover & Tunmer, 2018). The key difference between reading comprehension and language comprehension is the medium in which these skills are applied: reading comprehension focuses on text, whereas language comprehension focuses on speech. According to Hoover and Tunmer (2021), it is fundamental that parallel tests are used to assess these constructs; for example, a passage read by the assessor followed by comprehension questions could be paired with a similar passage read by the child, followed by similar comprehension questions asked by the assessor. Varying only the medium of presentation allows conclusions to be drawn about whether the child's difficulties are primarily due to text reading or to understanding. However, the TSSD does not make clear the potential usefulness of matching the reading and listening comprehension measures, which is a further limitation of the current screening procedures. Given that the procedures already call for measuring both modes of comprehension, it is unfortunate that the benefits of making the two measures as equivalent as possible were not also detailed. Doing so would not add to the assessment workload, but it may help teachers understand the logic behind such testing.
Under standardised assessment measures, the TSSD lists the PAT Reading Comprehension and PAT Listening Comprehension tests. However, these tests are not suitable for classification purposes because they report scale scores rather than standard scores. A relatively small change in scale scores on the Listening Comprehension test (around 2 scale scores) is equivalent to a year's worth of growth (New Zealand Council for Educational Research, 2014), which makes it difficult to discriminate between average and below-average performance within a year level. Raw scores on this test are also converted to stanines, which can be used to compare a child's performance with a national reference sample. Because each stanine spans a range of percentiles, stanines give a more general indication of a child's performance relative to their peers than standard scores, which map to an exact percentile. This means stanines are likely to be less suitable for classification purposes than standard scores. No other listening comprehension test is listed under standardised assessments in the TSSD; however, two receptive vocabulary tests are included (the BPVS-III and the Peabody Picture Vocabulary Test). Assessing language comprehension ability using only a receptive vocabulary test may not provide a sufficiently broad assessment of the language comprehension construct. Assessment protocols that test children's vocabulary knowledge and listening comprehension ability provide a better indication of children's language comprehension ability than protocols that assess only one of these skills (Braze et al., 2007; Silverman et al., 2013). Yet the TSSD does not direct teachers to assess both of these skills.
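To illustrate why stanines are coarser than standard scores, the sketch below converts a standard score into a percentile and a stanine. It assumes normally distributed scores (mean 100, SD 15) and the conventional stanine bands, each half an SD wide. The PAT assigns stanines from its own norm tables, so this is an approximation, not the PAT's scoring procedure.

```python
from bisect import bisect_right
from statistics import NormalDist
from typing import Tuple

STANDARD_NORMAL = NormalDist()  # mean 0, SD 1

# Conventional stanine boundaries in z-score units: each band is half an SD
# wide and stanine 5 is centred on the mean.
STANINE_BOUNDS = [-1.75, -1.25, -0.75, -0.25, 0.25, 0.75, 1.25, 1.75]


def describe(standard_score: float, mean: float = 100.0,
             sd: float = 15.0) -> Tuple[float, int]:
    """Return (percentile, stanine) for a standard score, assuming normality."""
    z = (standard_score - mean) / sd
    percentile = STANDARD_NORMAL.cdf(z) * 100
    stanine = bisect_right(STANINE_BOUNDS, z) + 1  # 1..9
    return percentile, stanine


for ss in (89, 92, 96):
    pct, stanine = describe(ss)
    print(f"standard score {ss}: percentile {pct:.1f}, stanine {stanine}")
```

Standard scores of 89, 92, and 96 all fall in stanine 4, although they correspond to roughly the 23rd, 30th, and 39th percentiles.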
Care must also be taken when operationalising the decoding construct. Decoding, within the SVR, is defined as the ability to quickly, accurately, and effortlessly access word meanings from the mental lexicon (Hoover & Tunmer, 2018). It is typically operationalised using word identification and word attack tests (Catts et al., 2006; Language & Reading Research Consortium, 2015). The former typically involves reading aloud individual real words of varying complexity (e.g., varying in frequency, familiarity, or age of acquisition) out of context (i.e., not in a sentence), whereas the latter typically involves pronouncing made-up words (non-words or pseudo-words), which assesses how the child deals with novel letter strings. The TSSD identifies both word identification tests (Burt Word Recognition Test and STAR Test) and a word attack test (Martin & Pratt Non-Word Reading Test). However, this word attack test is no longer sold by the New Zealand Council for Educational Research (NZCER), the organisation that sells educational assessments to New Zealand schools, so teachers may forgo assessing this skill. Yet assessing only word identification ability provides too narrow an assessment of the decoding construct. The ability to decode words using knowledge of phoneme-grapheme relationships and English spelling rules is a prerequisite for orthographic mapping (Frith, 1980; Tunmer & Hoover, 2019). It is therefore important for teachers to screen a child's word attack ability, because weak word attack skills are a primary feature of the difficulties in learning to read and spell experienced by those with dyslexia (Everatt & Denston, 2020; Gillon, 2018; Nicholson & Dymock, 2015; Snowling, 2000).
This review of the TSSD identified three limitations that may influence the validity and reliability of the assessment protocol: (a) research has not confirmed whether a sequential three-step assessment protocol is an appropriate screening method for dyslexia; (b) teachers are asked to differentiate between average and below-average performance, but the TSSD does not describe how the term average should be operationalised; and (c) research has not confirmed whether the tests described in the TSSD provide a valid assessment of the reading comprehension, language comprehension, and decoding constructs.
These limitations mean that research is needed to investigate whether the TSSD is a valid dyslexia screening protocol. The present research examined whether classification based on tests from the TSSD can identify children who exhibit a dyslexia profile. It also investigated what impact the use of a three-step protocol has on the identification and classification of children with reading difficulties. Finally, it examined what proportion of children are identified as dyslexic when different language comprehension cut-off points are used. Even when a liberal reading comprehension cut-off is used at Step 1 to avoid missing children with reading difficulties, separating dyslexia from language comprehension difficulties still requires a cut-off at Step 2, where children's language comprehension ability is assessed. Because a liberal Step 1 cut-off should bring most children with reading comprehension weaknesses into the sample, this study examines how the placement of the Step 2 cut-off affects classification.

Participants
The participants came from nine primary schools in an urban city in New Zealand. These children were in Years 3, 4, and 5 (aged 8–10 years). Children in these year levels were targeted because, in this age range, reading comprehension ability is influenced to a similar extent by both decoding and language comprehension ability (Catts, 2018; Georgiou et al., 2009). Principals and teachers of the target year groups were provided with information and consent forms, and all agreed to support the research by identifying children to participate and releasing them from their classrooms to complete the four individually administered assessment sessions. Schools were asked to identify children who performed below the 40th percentile on one of two school-based standardised assessments commonly used within New Zealand: the e-asTTle Reading test (Auckland UniServices Limited, 2009) or the Progressive Achievement Test for Reading Comprehension (Darr et al., 2008). A liberal cut-off point (40th percentile) was used because New Zealand has a relatively large proportion of children who exhibit reading difficulties (Ministry of Education, 2017). The 40th percentile was considered the most liberal cut-off a teacher would be likely to use (see Introduction), and it would include children who would have been selected using a lower percentile. Teachers were also able to nominate children who exhibited reading difficulties on other school assessments. All of the children identified were invited to take part in this research. Consent to participate was obtained from all participating children and their parents. This research adhered to the ethical requirements of the New Zealand university at which the authors were working, and an application to one of its ethics committees was approved.
In total, 216 English-speaking children took part in this study. Seven children performed above the 40th percentile on the researcher-administered Passage Comprehension test from the Woodcock-Johnson IV (WJIV; Schrank et al., 2014) and were excluded, leaving a final sample of 209 children with an average age of 9 years and 8 months (SD = 11 months). Table 1 provides an overview of the participants broken down by year and gender.

Procedure and Measures
All children completed seven individually administered assessments carried out by the first author. Each child's assessments were completed over four sessions of approximately 20 min each within a two-week period. For reliability purposes, a second marker reviewed 20% of the assessment record sheets; no discrepancies between markers were identified. Decoding and language comprehension were assessed using tests from the WJIV that are commonly used by assessors (New Zealand Qualifications Authority, n.d.-b) and researchers (Aaron et al., 1999; Catts et al., 2003) who wish to identify children with dyslexia. These WJIV tests were included to indicate the conclusions a professional assessment would reach. Decoding and language comprehension were also assessed using tests named in the TSSD, to indicate what a teacher using this protocol might conclude. Comparing the two sets of results for the same children shows how similar the conclusions of the two procedures are. The following sections describe these tests.

WJIV Reading Comprehension Assessment
Reading comprehension ability was assessed using the Passage Comprehension test from the WJIV (Schrank et al., 2014). This test required students to read short passages of text silently and then supply a key missing word in each passage. The initial items on this test were one sentence in length. As children progressed through the test, the items increased in length and complexity. The Examiners Manual (Mather . Both tests were administered following the procedures described in the WJIV manual and were stopped when a child made six consecutive errors.

TSSD Decoding/Word Reading Assessment
The Burt Word Recognition Test (Burt test; Gilmore et al., 1981), listed in the TSSD, was used to assess children's word identification ability. This test assessed children's ability to read a range of regular and irregular words that increased in length and complexity. Testing ceased when children were unable to correctly read 10 consecutive items. The test manual reports high internal consistency (.97) within the 8.03-10.09 age range (Gilmore et al., 1981). Word attack was not assessed with a TSSD test because the test the TSSD identifies for this purpose is no longer available to teachers.

WJIV Language Comprehension
Language comprehension was measured using the Oral Comprehension test from the WJIV Oral Language battery and the Oral Vocabulary test from the WJIV Cognitive battery. The Oral Comprehension test required children to listen to short passages and then supply a missing final word to each. The Oral Vocabulary test required children to provide synonyms and antonyms for orally presented words.
The tests demonstrated acceptable to good reliability within this sample (Oral Comprehension = .75; Oral Vocabulary = .84), broadly similar to the values reported in the WJIV manual (Schrank et al., 2014; Oral Comprehension = .82; Oral Vocabulary = .89). The tests were administered following the procedures outlined in the WJIV manual and were discontinued when the child made six consecutive errors.
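The coefficient used for the sample reliabilities above is not named in this section. One common internal-consistency estimate is Cronbach's alpha, α = k/(k − 1) × (1 − Σσ²ᵢ / σ²ₜ), where k is the number of items, σ²ᵢ is the variance of item i, and σ²ₜ is the variance of total scores. The sketch below computes it from item-level scores; the data are invented for illustration and it should not be read as the authors' procedure.

```python
from statistics import pvariance
from typing import List


def cronbach_alpha(item_scores: List[List[float]]) -> float:
    """Cronbach's alpha; one row per child, one column per item."""
    k = len(item_scores[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across children.
    item_vars = [pvariance([row[j] for row in item_scores]) for j in range(k)]
    # Variance of the children's total scores.
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical right/wrong (1/0) scores for five children on four items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(responses), 2))
```

For these hypothetical responses, alpha is 0.8. Population and sample variances give the same alpha because the divisor cancels in the ratio.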

TSSD Language Comprehension Assessment
The TSSD identifies two standardised tests that can be used to assess vocabulary ability (British Picture Vocabulary Scale and Peabody Picture Vocabulary Test). In the current research, children's vocabulary ability was assessed using the British Picture Vocabulary Scale, 3rd Edition (BPVS-III; Dunn et al., 2009). The BPVS-III test assessed children's ability to identify one picture from a selection of four that represented an orally presented word. A reliability figure of 0.91 has been reported for this assessment (Dockrell & Marshall, 2015). This test was discontinued once children made eight or more errors in a set of 12 items. The listening comprehension test (PAT Listening Comprehension) was not administered because it is difficult to differentiate accurately between average and below-average performance within a year level using the scale scores and stanines reported by this assessment.

Standard and Composite Scoring
Initially, raw scores from the WJIV subtests and the BPVS-III were converted to standard scores using the relevant administration manuals or conversion software. The Burt test reports age-equivalent bands; whilst these can be used to identify average and below-average performance, they were not suitable for classification purposes in this research. We therefore calculated standard scores for the Burt test using the mean and standard deviation from a study that administered this assessment to a similar age group of children who were representative of all ability levels (see Madelaine & Wheldall, 1998). A composite decoding score was calculated as each child's average standard score on the Word Attack and Letter-Word Identification tests, and a composite language comprehension score as each child's average standard score on the Oral Comprehension and Oral Vocabulary tests. These composite scores were used to classify children. To confirm that these four tests provided a reliable indication of children's decoding and language comprehension ability, reliability coefficients were calculated for this sample and are reported in the sections above. All analyses were performed using IBM SPSS Statistics for Windows, version 26.0. Table 2 reports the mean and standard deviation of the standard scores for each test administered.
Children were first classified according to their performance on the composite decoding and language comprehension variables; this analysis is referred to as the WJIV classification analysis. The composite variables were entered into a two-step cluster analysis that used log-likelihood as the distance measure. In the first step, the program examined every record and decided, based on a specified distance criterion, whether that record should be merged with a previously formed group of records (cluster) or should form the basis of a new cluster.
In the second step, the program took the clusters identified in the first step and grouped them into the desired number of clusters. In this analysis, the program was allowed to determine the optimal number of groupings. Recent research indicated that classification based on a cluster analysis provided a better fit for the data than alternative cut-off line approaches (Sleeman et al., 2021). As expected, three poor reader groups were identified: mixed difficulty, SCD, and dyslexia. The majority of children (83.3%) were assigned to the dyslexia (38.8%) and SCD (44.5%) groups. A smaller proportion of children were assigned to the mixed difficulty group (16.7%). Table 3 reports the proportion of children who were assigned to each group.
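The standard-score conversion and composite scoring described at the start of this section can be sketched as follows. The sketch assumes the Burt conversion is a linear z-score transformation onto the usual mean-100, SD-15 scale. The norm mean and SD in the example are hypothetical placeholders, not the values from the comparison study cited above.

```python
from statistics import mean


def to_standard_score(raw: float, norm_mean: float, norm_sd: float) -> float:
    """Linear z-score transformation onto a mean-100, SD-15 scale (assumed method)."""
    return 100 + 15 * (raw - norm_mean) / norm_sd


def composite(*standard_scores: float) -> float:
    """Unweighted mean of standard scores, as described for the WJIV composites."""
    return mean(standard_scores)


# Hypothetical norm values, for illustration only.
burt_ss = to_standard_score(raw=52, norm_mean=60, norm_sd=10)
# Hypothetical Word Attack and Letter-Word Identification standard scores.
decoding = composite(84, 90)
print(burt_ss, decoding)
```

A raw score 0.8 SD below the hypothetical norm mean becomes a standard score of 88. Averaging standard scores keeps the composite on the same scale as its components, which is what allows it to be compared across children.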

Results
The next analyses investigated whether the Burt and BPVS-III tests from the TSSD can be used for classification purposes. There were strong correlations between the Burt test and the composite decoding variable (r = .767, N = 209, p < .001) and between the BPVS-III and the composite language comprehension variable (r = .624, N = 209, p < .001). These results confirm the predicted positive relationships between the variables; however, they do not show that the tests will lead to the same classification outcomes.
To evaluate whether the same three poor reader groups can be identified using the Burt and BPVS-III tests, a second classification analysis, referred to as the TSSD classification analysis, was conducted. As before, children were classified using a two-step cluster analysis with log-likelihood as the distance measure, and the software was allowed to determine the optimal number of clusters. The TSSD classification analysis identified the same poor reader groups predicted by the SVR. However, the proportion of children assigned to each group differed between the WJIV and TSSD classification analyses. A larger proportion of children were assigned to the mixed difficulty group in the TSSD classification analysis (35.9%) than in the WJIV classification analysis (16.7%), and a smaller proportion were assigned to the dyslexia group (19.6% and 38.8%, respectively). The same proportion of children was assigned to the SCD group in both analyses (44.5%). Table 3 reports the proportion of children assigned to each group across the classification approaches.
Table 4 reports the results of an analysis that examined individual-level group assignment across the WJIV and TSSD classification analyses. If children were assigned to the same groups in both analyses, this would indicate that the Burt and BPVS-III tests provide an accurate assessment of decoding and language comprehension ability, respectively, and so may be suitable for classification purposes. The results indicate that most children in the mixed (89%) and SCD (71%) groups were assigned to the same group in both analyses. In contrast, only 40% of children in the dyslexia group were assigned to the dyslexia group in the TSSD classification analysis.
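The individual-level comparison in Table 4 amounts to a cross-tabulation: for each WJIV group, the share of its members who receive the same label in the TSSD analysis. A minimal sketch, using invented labels for five children rather than the study's data:

```python
from collections import Counter
from typing import Dict, List


def agreement_by_group(reference: List[str], comparison: List[str]) -> Dict[str, float]:
    """Share of each reference group given the same label by the comparison analysis."""
    if len(reference) != len(comparison):
        raise ValueError("both analyses must label the same children")
    group_sizes = Counter(reference)
    matches = Counter(r for r, c in zip(reference, comparison) if r == c)
    return {group: matches[group] / n for group, n in group_sizes.items()}


# Hypothetical labels for five children: WJIV analysis vs TSSD analysis.
wjiv = ["dyslexia", "dyslexia", "SCD", "mixed", "dyslexia"]
tssd = ["dyslexia", "mixed", "SCD", "mixed", "SCD"]
print(agreement_by_group(wjiv, tssd))
```

Here one of the three hypothetical WJIV dyslexia cases keeps its label, while the single SCD and mixed cases both agree. Conditioning on the reference group, rather than reporting one overall agreement figure, is what exposes group-specific mismatches like those reported for the dyslexia group.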
The TSSD recommends assessing decoding ability only if a child performs within the average or above-average range on the language comprehension variable. This analysis investigated what effect operationalising average language comprehension ability at different cut-off points (10th, 20th, 30th, 40th, and 50th percentiles) had on the proportion of children identified as dyslexic, using data from the WJIV classification analysis. Eighty-one children were identified as dyslexic in the cluster analysis using scores on the WJIV. However, one of these 81 children performed below the 10th percentile on the language comprehension measure. If the 10th percentile were used as the cut-off point in the screening procedures, this child would be considered non-dyslexic and would exit the procedures at this step. Because it seems unlikely that a teacher would interpret the 10th percentile as the boundary of average or above-average skills, analyses were performed to compare different cut-offs. These analyses used data from the WJIV classification analysis rather than the TSSD classification analysis because of the limitations associated with the latter. When higher cut-off points were used (above the 10th percentile), fewer children were identified as dyslexic because they did not perform above the language comprehension cut-off point. The results (see Table 5) indicated that operationalising average performance as performance above the 40th percentile led to 70% of children who exhibited the dyslexia profile in the WJIV cluster analysis being classified as non-dyslexic, despite these children showing difficulties with both reading comprehension and decoding in the WJIV data. Moreover, these children's decoding difficulties would not be assessed if this cut-off were used in the TSSD procedures.
Such children would be classified as not needing dyslexia-related support only because they show some weakness on the language comprehension measure. That weakness may itself stem from limited reading experience: reduced exposure to words can lead to poorer vocabulary, which can in turn affect listening comprehension performance.
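The cut-off analysis above can be sketched as follows: convert each percentile cut-off to a standard score and count how many children with a dyslexia profile score at or above it, since only they would reach Step 3 of the TSSD. The sketch assumes normally distributed standard scores (mean 100, SD 15), and the ten scores are invented, not the study's data.

```python
from statistics import NormalDist
from typing import Dict, List

# Assumption: standard scores are normally distributed with mean 100 and SD 15.
SCORES = NormalDist(mu=100, sigma=15)


def reaching_step_three(language_scores: List[float],
                        percentiles: List[int]) -> Dict[int, int]:
    """For each percentile cut-off, count children at or above it on language comprehension.

    Under the TSSD's sequence, only these children would have decoding assessed.
    """
    counts = {}
    for pct in percentiles:
        threshold = SCORES.inv_cdf(pct / 100)  # percentile -> standard score
        counts[pct] = sum(score >= threshold for score in language_scores)
    return counts


# Hypothetical language comprehension scores for ten children with a dyslexia profile.
dyslexia_lc = [78, 86, 88, 91, 93, 95, 97, 99, 102, 108]
print(reaching_step_three(dyslexia_lc, [10, 20, 30, 40, 50]))
```

With these hypothetical scores, raising the cut-off from the 10th to the 40th percentile reduces the number of children who would reach Step 3 from nine to four.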

Discussion
Although the TSSD classification analysis also identified the three poor reader groups identified by the WJIV assessment measures, the results indicated that children who exhibited a dyslexia profile were not consistently identified across the two analyses. This suggests that teacher-implemented TSSD procedures may produce dyslexia-identification results that differ from those of procedures using more widely used standardised tests, and that the recommended and available word identification and receptive vocabulary tests may provide an incomplete assessment of the decoding and language comprehension constructs. Only 40% of children identified as having dyslexia by the WJIV assessment measures were assigned to the dyslexia group in the TSSD classification analysis, and over 30% were assigned to the mixed difficulty group. This indicates that, when assessed using a receptive vocabulary test, many of these children obtained poorer scores on the language comprehension measure in the TSSD procedure. Vocabulary difficulties may be due to limited reading experience, which reduces exposure to words in the varied contexts that receptive vocabulary measures assess. This negative impact may increase with age, as those with poor reading experience encounter fewer word-context exposures than good readers (the Matthew effects described by Stanovich, 1986).
Nearly 30% of the children who were assigned to the dyslexia group by the WJIV assessment measures were assigned to the SCD group in the TSSD classification analysis. This movement is surprising because the dyslexia and SCD groups should exhibit opposing profiles. Children with dyslexia should show decoding difficulties alongside relatively unaffected language comprehension ability, whereas children who exhibit the SCD profile should show language comprehension difficulties alongside relatively unaffected decoding ability. To move from the dyslexia group in the WJIV classification analysis to the SCD group in the TSSD classification analysis, children must have performed better on the Burt test than on the composite decoding variable, and worse on the BPVS-III than on the composite language comprehension variable. The latter suggests that these children find receptive vocabulary tests more difficult than tests of listening comprehension and expressive vocabulary, as discussed in the previous paragraph. The former suggests that some children with dyslexia find word identification tests relatively easy compared with word attack tests. This may be because some children can recall words they have been taught more easily than they can use decoding processes to translate letter strings into appropriate spoken forms.

Reading Comprehension
The current research asked schools to identify children who were exhibiting reading difficulties using the e-asTTle Reading test, the Progressive Achievement Test for Reading Comprehension, or other classroom reading assessments. Nearly 97% of the children nominated by schools performed below the 40th percentile on the researcher-administered reading comprehension test. This suggests that teachers can use classroom assessment data to identify children with reading difficulties. Whilst this is a promising finding, we cannot rule out the possibility that other children in these classrooms should also have been identified as struggling readers, because children who were not nominated to participate in this research were not assessed.

Language Comprehension
The TSSD states that, after assessing reading comprehension ability, teachers should assess children's language comprehension ability (Ministry of Education, 2021). One of the recommended language comprehension tests, the BPVS-III, was used in this research. Results indicated that assessing only receptive vocabulary ability, as the BPVS-III does, did not provide a sufficiently robust assessment of language comprehension ability. The TSSD identifies two additional standardised tests that can be used to assess language comprehension ability: the Peabody Picture Vocabulary Test (PPVT) and the PAT Listening Comprehension test. Both the PPVT and the BPVS-III assess receptive vocabulary ability, so it seems unlikely that the PPVT would provide a more robust assessment of language comprehension ability than the BPVS-III. The PAT Listening Comprehension test was not used in this research because it is not sufficiently accurate for classification purposes (New Zealand Council for Educational Research, 2014). The results from the present analyses indicate that assessing listening comprehension ability in addition to vocabulary knowledge provides a more robust assessment of language comprehension than assessing vocabulary alone. To ensure that listening comprehension is assessed, an appropriate listening comprehension test should be added to the TSSD, and the TSSD should state that teachers need to assess both vocabulary and listening comprehension ability. Assessing only one of these skills may not provide a sufficiently accurate assessment of a child's language comprehension ability.

Decoding
If a teacher finds that a child exhibits reading comprehension difficulties but has average or above-average language comprehension ability, the TSSD states that decoding ability should be assessed (Ministry of Education, 2021). The Ministry of Education identifies three standardised tests that can be used to assess decoding ability. One of these, the Burt test, was used to assess word identification ability in this research. Whilst there was a strong positive correlation between the Burt test and the composite decoding variable, analyses indicated that this test did not provide a sufficiently broad assessment of decoding ability. In the present study, some children found the Word Attack test from the WJIV particularly difficult, which suggests that assessing decoding ability requires both word identification and word attack tests. The TSSD does not identify a standardised word attack test that can be purchased through NZCER for use in New Zealand schools. We recommend that such a test be added to the TSSD and that the TSSD state that teachers must assess both a child's word identification and word attack ability. Both word and non-word reading measures can be quick to administer, particularly those with a well-defined stop rule (i.e., a rule that ends the test once a child has made a specified number of errors), so including both in a screening procedure would not substantially increase testing time but should increase the accuracy of assessment and hence improve support.
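To illustrate why a stop rule keeps testing brief, the following Python sketch scores an ordered item list under one familiar form of stop rule: discontinuing after a run of consecutive errors. The function name and the `max_consecutive_errors` parameter are hypothetical; the actual stop rules for the Burt test or any word attack test are specified in their manuals and may differ.

```python
from typing import Iterable, Tuple


def administer_with_stop_rule(
    responses: Iterable[bool], max_consecutive_errors: int
) -> Tuple[int, int]:
    """Return (items correct, items administered).

    Items are presented in order of difficulty; testing stops as soon as the
    child makes `max_consecutive_errors` errors in a row (a hypothetical rule).
    """
    if max_consecutive_errors < 1:
        raise ValueError("max_consecutive_errors must be at least 1")
    correct = administered = error_run = 0
    for is_correct in responses:
        administered += 1
        if is_correct:
            correct += 1
            error_run = 0  # a correct answer resets the run of errors
        else:
            error_run += 1
            if error_run >= max_consecutive_errors:
                break  # ceiling reached: remaining items are not presented
    return correct, administered
```

For a struggling reader the rule triggers early, so adding a second (non-word) test adds only a handful of items rather than a full list.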

Assessment Protocol
The TSSD (Ministry of Education, 2021) states that decoding ability should be assessed if a child exhibits reading comprehension difficulties and average or above-average language comprehension ability. Because struggling readers can exhibit the dyslexia, mixed difficulty, or SCD profile, both decoding and language comprehension ability should be assessed whenever reading comprehension difficulties are identified. Among children with language comprehension difficulties, those who also exhibit decoding difficulties show the mixed difficulty profile, whereas those whose decoding ability is relatively unaffected show the SCD profile. Under the current sequential protocol, however, decoding is not assessed for children with below-average language comprehension, so these two profiles cannot be distinguished. Children with these profiles are likely to benefit from different instructional approaches because the underlying causes of their reading difficulties differ (Aaron et al., 2008; Carson et al., 2013; Clarke et al., 2010; Gillon et al., 2019).
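The difference between the sequential TSSD ordering and the approach of assessing both components can be made concrete with a short Python sketch. The single `threshold` percentile, the profile labels, and the function names are illustrative assumptions; the TSSD itself does not define average, and a real tool would need validated cut-offs or discrepancy rules.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    """Percentile scores (0-100) for one child."""
    reading_comprehension: float
    decoding: float
    language_comprehension: float


def classify_profile(s: Scores, threshold: float = 25.0) -> str:
    """Assess BOTH decoding and language comprehension, then assign a profile.

    A score below `threshold` (a hypothetical percentile) counts as a difficulty.
    """
    if s.reading_comprehension >= threshold:
        return "no reading comprehension difficulty"
    weak_decoding = s.decoding < threshold
    weak_language = s.language_comprehension < threshold
    if weak_decoding and weak_language:
        return "mixed difficulty"
    if weak_decoding:
        return "dyslexia"
    if weak_language:
        return "SCD"
    # Poor reading comprehension with no identified component weakness.
    return "further assessment needed"


def tssd_sequential(s: Scores, threshold: float = 25.0) -> str:
    """Sequential ordering: decoding is checked only if language comprehension
    is at or above the threshold."""
    if s.reading_comprehension >= threshold:
        return "no reading comprehension difficulty"
    if s.language_comprehension < threshold:
        # Decoding is never examined, so mixed difficulty and SCD look identical.
        return "not screened for dyslexia"
    return "dyslexia" if s.decoding < threshold else "further assessment needed"
```

With this sketch, a child scoring below the threshold on both decoding and language comprehension is labelled mixed difficulty by `classify_profile`, whereas `tssd_sequential` never examines that child's decoding and so cannot separate the mixed difficulty and SCD profiles.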
The results indicate that the TSSD should define what it means by average ability. The proportion of children identified as dyslexic varied considerably depending on how the term was operationalised. If a clear definition is not provided, teachers may define the term subjectively, potentially reducing the consistency of the tool across schools and between teachers. Rather than directing teachers to screen students using cut-off points, the TSSD could direct teachers to look for a discrepancy between a child's decoding and language comprehension ability. This approach is similar to that employed in the WJIV classification analysis, which identified a group of dyslexic children who exhibited decoding difficulties and relatively unaffected language comprehension ability. If this approach is adopted, the TSSD should describe how a discrepancy is to be operationalised. Discrepancy bands may be helpful here: for example, a large discrepancy may place a child within the high-risk range for dyslexia, whereas smaller discrepancies may place a child within a moderate- or low-risk group. Because the discrepancy between children's decoding and language comprehension ability falls along a continuum, discrepancy bands may provide a more accurate screening approach than a single cut-off point. Similar approaches based on significant differences (Sleeman, 2021) and on discrepancies defined in standard deviations (Wagner et al., 2020) have been used to identify children who exhibit the dyslexia profile.
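A discrepancy-band rule of the kind proposed here could be operationalised as in the following Python sketch. It assumes standard scores on a mean-100, SD-15 metric and uses hypothetical band edges of 1.5, 1.0, and 0.5 SD; these values are placeholders chosen for illustration and are not taken from Sleeman (2021), Wagner et al. (2020), or this study.

```python
from typing import Tuple

SD = 15.0  # assumed standard-score metric: mean 100, SD 15

# Hypothetical band edges in SD units, checked from largest to smallest.
BANDS: Tuple[Tuple[float, str], ...] = (
    (1.5, "high risk"),
    (1.0, "moderate risk"),
    (0.5, "low risk"),
)


def discrepancy_band(decoding_ss: float, language_ss: float) -> str:
    """Place a child in a risk band according to how far decoding falls BELOW
    language comprehension, expressed in SD units.

    A positive discrepancy means decoding is weaker than language
    comprehension (the direction associated with the dyslexia profile).
    """
    discrepancy_sd = (language_ss - decoding_ss) / SD
    for edge, label in BANDS:
        if discrepancy_sd >= edge:
            return label
    return "no meaningful discrepancy"
```

Because the output is a graded risk level rather than a yes/no decision, a child whose decoding falls only slightly below their language comprehension is flagged as low risk instead of being either fully included or fully excluded by a single cut-off.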

Limitations and Future Research Opportunities
The relative importance of decoding and language comprehension to reading comprehension changes over time (Catts et al., 2005; Georgiou et al., 2009; Hoover & Gough, 1990), which influences the proportion of children assigned to the dyslexia, SCD, and mixed difficulty groups (Catts et al., 2003). Future research examining whether patterns similar to those found in the current research appear at other year levels would be valuable for developing comprehensive recommendations across New Zealand school years. Such research may also find that a different set of tests is more useful for identifying the dyslexia, SCD, and mixed difficulty profiles at higher year levels. For example, some research has found that fluency measures provide a better indication of decoding ability in older children than accuracy-only measures (Kershaw & Schatschneider, 2012).

Conclusion
The results indicate that the TSSD requires further refinement before it is suitable for screening for dyslexia and other reading difficulties. Although the process has some positive features, particularly in raising awareness of issues related to dyslexia and poor reading comprehension among mainstream New Zealand teachers, it has three main limitations. First, the decoding and language comprehension tests recommended in the TSSD do not provide a sufficiently broad assessment of these constructs. As a result, some children (particularly those with dyslexia) may be misclassified. Second, the TSSD directs teachers to assess decoding only when language comprehension is average or above average; instead, once teachers identify a child who exhibits reading comprehension difficulties, they should be directed to assess both the child's decoding and language comprehension ability. This information can be used to discriminate between children who exhibit the dyslexia, SCD, or mixed difficulty profiles, which may then inform initial support strategies (e.g., a focus on making the link between print and speech sounds and/or on improving vocabulary knowledge and strategies for understanding oral discourse). Third, the TSSD relies on teachers determining average performance; encouraging teachers instead to identify at-risk children using discrepancy bands based on children's decoding and language comprehension ability may lead to fewer false positives and false negatives. With a strategically selected or developed set of tests, these bands could even be provided for teachers. Therefore, although partially negative, the findings from this study should inform the development of a more accurate dyslexia screening tool suitable for use within New Zealand schools.
Funding Open Access funding enabled and organized by CAUL and its Member Institutions.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.