Re-imagining narrative writing and assessment: a post-NAPLAN craft-based rubric for creative writing

According to creative writing pedagogies academic Susanne Gannon (English in Australia, 54(2), 43–56, 2019), and the Federal government-commissioned NAPLAN review (McGaw et al., 2020), NAPLAN has restricted how writing is taught in secondary schools. A NAPLAN-influenced structural approach to teaching writing has subsumed the development of imaginative capacity. Given the considerable negative criticism of the NAPLAN writing tests, including the negative impact they have had on the teaching of writing, there is a need, we argue, for a fit-for-purpose assessment rubric that assesses creative writing. In a 10-week project, teaching creative writing with three classes of Year 9 students in Steiner schools, we evaluated the use of a novel creative writing rubric, created by published creative writers and lecturers (the second and third authors), to assess the students’ creative writing pre- and post-program. Consecutively, the NAPLAN narrative criteria were also used to assess the same writing scripts as a point of comparison. The creative writing criteria privileged craft-based approaches to imaginative writing compared to the function and form-focused criteria of NAPLAN. Statistical analyses of the reliability and validity of the creative writing rubric showed that the construct can be scored with a significant moderate level of reliability by different raters (r = 0.5–0.7; ICC = 0.6). Internal consistency reliability of the criteria was found to be excellent (Cronbach’s alpha = 0.94). Content validity of the instrument was found to be strong (r = 0.7–0.9) and significant. Unexpectedly, analyses for concurrent validity showed that the instrument correlates strongly (r = 0.7) and significantly with the NAPLAN narrative rubric, suggesting some overlap, but not parity with the NAPLAN assessment.
We found that students’ post-project writing improved in all aspects according to the creative writing rubric, with a statistically significant improvement in students’ structural elements and presentation and a group average improvement approaching significance in two other criteria: Words, Sentences and Voice, and Characters and Context (effect sizes d = 0.3–0.4). However, there were no significant improvements in the students’ post-program writing according to the NAPLAN criteria, possibly because the NAPLAN narrative task criteria did not capture student development of a unique writing style or individual “voice” or other craft-based standards of proficiency measured by the creative writing rubric. Given the validity and reliability evidence, we conclude that the creative writing rubric is a fit-for-purpose guide to school-based learning and assessment of creative writing.


Introduction
The Australian Curriculum Assessment and Reporting Authority (ACARA) is the independent authority responsible for the development and management of the National Assessment Program (NAP) which includes the National Assessment Program – Literacy and Numeracy (NAPLAN). The writing assessment tasks within the literacy component of the test alternate from year to year between narrative writing and persuasive writing. The ACARA (2019a) National Literacy Learning Progression, Creating texts sub-element classifies a narrative as an imaginative text, as distinct from the two other main text types that are taught: informative and persuasive texts. ACARA further describes the competency of "Creating Texts" as the ability to "Plan, draft and publish imaginative, informative and persuasive texts containing key information and supporting details for a widening range of audiences, demonstrating increasing control over text structures and language features." However, the concept of control in the NAPLAN rubric conforms to conventional, rather than flexible and experimental, concepts of narrative structure and language choice.
Although imaginative texts are recognized by ACARA as a text type to be taught explicitly and produced by students, seldom are teachers and pre-service teachers offered explicit tools to embed imaginative text writing in their classrooms or given the pedagogical skills to teach creative writing (Gannon, 2009, 2019). The notion of students being empowered as writers of their own texts is all-too-often subsumed by the need to meet other curriculum goals and outcomes, which focus on criteria that do not necessarily help students become proficient creative writers. Literacy academics (Caldwell & White, 2016; Gannon, 2011, 2019) and authors of government-commissioned reports who have critically reviewed NAPLAN (Perelman, 2018; McGaw et al., 2020) have commented that the NAPLAN writing assessment criteria (Appendix A) focus on generic and derivative writing skills and knowledge that tend to wash back into the competencies taught in the classroom. Washback refers to the extent to which a test influences teaching and learning; teachers may teach directly for specific test preparation, or learners might focus on specific aspects of language learning found in assessments, instead of receiving more comprehensive development in the requisite skills and knowledge (Messick, 1996).
One of the main criticisms of the NAPLAN writing test to come out of the Perelman (2018) and McGaw et al. (2020) reviews is the large number of criteria used (ten) and the difficulty assessors experience in applying the criteria analytically in the time required. In addition, our criticism of the 10 criteria is that they focus mainly on an analytical evaluation of function and form. Seven of the 10 NAPLAN criteria involve a structural focus on function and form (Appendix A, criteria 2 and 5-10). By comparison, only one of the five criteria used in the creative writing rubric we evaluate in this study has a structural focus on function and form (Appendix B, criterion 5); the other criteria involve a focus on creative writing craft and innovation.
The NAPLAN narrative criteria can be summarized to cover (1) audience, (2) text structure, (3) ideas, (4) character and setting, (5) vocabulary, (6) cohesion, (7) paragraphing, (8) sentence structure, (9) punctuation, and (10) spelling. Regarding criterion four, the writer should be able to orient, engage, and affect the reader (ACARA, 2010). Except for criterion four, which requires greater rating subjectivity, this choice of criteria aligns with standardized testing constructs which are more statistically reliable and valid when measured through inter-rater reliability and test re-test evaluations (Muenz et al., 1999). Tests need to be evaluated for reliability and validity, but a one-size-fits-all approach to the determination of what competencies to assess can lead to a negative washback effect on writing practice, restricting and distorting what is taught about creative writing.
Since NAPLAN commenced in 2008, there has been little-to-no improvement across Australia in young people's writing skills (ACARA, 2019b; McGaw et al., 2020; Perelman, 2018; Thomas, 2019; Wyatt-Smith & Jackson, 2016). In the last decade, NAPLAN writing achievement has been static in years 3 and 5 and has declined in years 7 and 9 (McGaw et al., 2020). It may therefore be assumed that feedback from NAPLAN has done little to inform pedagogy in the classroom to improve young people's writing capacity. State and federal governments and the media like to blame teachers and schools for this state of affairs (Gannon, 2019), but it is also likely that washback from the NAPLAN test is contributing to falling standards (Perelman, 2018). Some teachers, possibly due to a lack of knowledge of how to teach creative writing, have resorted to explicitly teaching-to-the-test with assistance from publishers and tutoring services that prepare students for NAPLAN (McGaw et al., 2020).
As a result, the current approach to narrative writing in Australian schools is narrow and reductive and does not assist students to use their imaginations to become innovative creative writers (Caldwell & White, 2016; Gannon, 2011, 2019; Perelman, 2018; McGaw et al., 2020). Student narratives that demonstrate creativity and originality are penalized in the NAPLAN marking criteria (Caldwell & White, 2016). Creativity is "steadily losing ground… undervalued and increasingly outlawed" (Harris, 2014, p. 18) in Australian classrooms, in favor of a set of skills that can be prescriptively taught and assessed by standardized tests.
Another reason for falling standards may be that students and their teachers are not being afforded the opportunity to see themselves as writers. Gannon (2009) suggests that the problem with writing is that students are not writing, not writing often enough, freely enough, or having the chance to explore and expand their capacities in the craft of creative writing. Instead, students across Australian high schools are compelled to write to criteria which result in wooden and narrow narratives, monitored and judged for their correct use of sentences, their employment of long words, and their carefully formulaic five-paragraph essays: in essence, their mimicry of a narrow range of text types. Young writers are constrained by a restrictive set of criteria that do not enable them to develop their own unique writer's voice or capacity with words (Gannon, 2011). Peel et al. (2000) suggest that teachers mostly teach writing from a critical readerly point of view and are less confident teaching as writers because they see themselves as readers and are themselves less practiced at writing beyond the perennial genre of the expository essay. We posit that the first step in freeing young writers from the restrictive criteria that are not enabling them to become creative writers is for English teachers themselves to practice the craft of creative writing and at least perform creative pedagogical strategies in the classroom in order to model (and assess) student efforts at being creative (Williams, 2019).

The craft of writing and finding a voice
Learning the craft of writing, which means to innovate with style and play with words, enables writers to find their own "voice." Once writing craft is learned, which includes the understanding and use of structural elements (grammar, spelling, punctuation) for particular effect and not as ends in themselves, the ability to write artfully becomes a versatile, highly transferable skill that serves students across subjects, disciplines, and contexts.
The hallmark of creative writing is the writer's ability to create a unique and individual "voice," one that can impact readers, move them, inspire them, as well as inform them. Creative writing, from a writer's perspective, reveals the world to a reader in a new way; it enables the reader to shift perspective, even belief systems. This requires both craft-based skill and imaginative capacity.
Much of the power of writing is found "between the lines" and in the white spaces, especially in what is left unwritten. Elision and sentence fragments, which are assessed as errors in formal genres, have the power to affect readers. These elements cannot be adequately assessed by a rubric that penalizes such experimentation with form or, in the case of the NAPLAN narrative writing task, a limited version of a story with a specific conflict-resolution plot focus which does not allow for other types of storytelling (Caldwell & White, 2017).

Imaginative capacities and creative writing
We have coined the term imaginative capacities to evaluate how, and to what extent, creative writing skills are generated and applied. The Australian curriculum refers to the term creative thinking in an attempt to capture these capacities, but does not go far enough to demonstrate exactly how "students learn… to generate and apply new ideas in specific contexts… [involving] complex representations and images" (ACARA, 2019c). The concept of creative capacities draws upon Bloom's taxonomy which defines such capacities as "putting elements together to form a coherent or functional whole; reorganizing elements into a new pattern or structure through generating, planning, or producing" (Anderson & Krathwohl, 2001, p. 21). The concept is also informed by Franklin and Theal (2021) in Developing Creative Capacities in which they suggest that such capacities are qualities of subjects (creative writers and thinkers) rather than specific applied techniques. Brolin (1992) previously identified these capacities as "strong motivation, endurance, intellectual curiosity, independence in thought and action, strong desire for self-realization, and a strong sense of self…" (p.64).
Creative writing is a term used loosely in schools and universities and in general to refer to the act of writing something that is not defined by a particular form or goal-focused expectation. What sets it apart from other types of writing is that the operative adjective "creative" is often ignored (and avoided) in writing activities set in Australian schools, even in narrative and "creative" tasks. Seemingly unmeasurable, "creativity" is not taught; only the various forms of writing, such as narrative writing, essay writing, personal writing, persuasive writing, or speech writing are explicitly taught.
All of these types of writing are, in fact, creative, in that writers are creators of ideas, of a multiplicity of worlds. What specific types of writing do not make space for is the idea that learning to write well is a process that goes far beyond a skill set employed and relies fundamentally on the writer's imaginative capacity and the writer's ability to control and craft those imaginings into words. Without this unquantified capacity, writing tasks are mechanistic. It is every young writer's job to make their words work well and to reflect as best as they can an imaginative realm of ideas, a story, or landscape. To do that, young writers need, first, the space to imagine things not prescribed and, second, to be taught craft-based skills, not just for "creative" tasks but for a multitude of writing assignments. To that end, being able to identify and assess those craft-based skills is essential for the development of a young writer's capacity to imagine and create.

The project
In 2018 and 2019, we embarked on a funded research project that aimed to evaluate a creative writing project and a set of novel criteria for the assessment of creative narrative tasks. This paper reports on a component of that project: the evaluation of a novel creative writing rubric used to assess students on the competencies taught during the project. To evaluate the validity and reliability of the creative writing rubric, we conducted a series of statistical tests on the instrument and also compared it concurrently against the NAPLAN standardized national instrument for assessment of narrative writing. We did this by using two NAPLAN narrative tasks with the same three groups of students delivered before and after the project. The scripts were then assessed using both the NAPLAN rubric and the creative writing rubric pre- and post-project to determine any improvement in writing proficiency on both independent scales.

The study and participants
Over the course of one term (10 weeks), three classes of Year 9 students from three Steiner schools (N = 54) embarked on a creative writing project. Our study, entitled "Approaching Literacy Through Narrative and Creative Writing" was funded by a grant from Steiner Education Australia. The pragmatic purpose of the project was to address the national priority of Year 9 literacy, where often engagement and attitudes to writing present challenges to educators, students, and families. To this end, this creative writing project focused on increasing awareness of writing for a purpose that had meaning beyond the acquisition of a skillset required to perform well in standardized tests.
The study was informed by a synthesis of Steiner pedagogies and performative approaches to creative writing where craft-based exercises allowed young writers to "perform" the act of writing, for example, using short sentences, or long sentences, using minimalism, or engaging in "lazy writing" to experience first-hand the effects of their writing. The English teachers introducing the project were provided with 1 day of professional development on delivering the creative writing project where they did most of the exercises that they were going to be presenting to the students and discussed the pedagogy and the effect of each exercise in building a particular writing capacity.
The pedagogical approach of the Steiner Curriculum lent itself to this project, as Year 9 thematically focuses on autobiography and the "hero's journey" which, according to Steiner, meets the developmental need of early teens to make meaning of their own journeys and become the heroes in their own narratives (Steiner, 1996). Therefore, the Steiner schools we engaged in the project had the capacity to allow for 10 weeks of English lessons to be taken up by our craft-focused creative writing project using specific scaffolded creative writing exercises designed to facilitate engagement. We hoped that this novel pedagogy would have impact on this bounded case of Steiner schools, resulting in improved attitudes towards writing and improved writing ability in some of the criteria we assessed with the two rubrics. We acknowledge that this is a very specific context, with specific pedagogies, so the findings are not generalizable to the large-scale environment of standardized testing.

The creative writing project
Each week, in classes run as creative writing workshops led by the teacher of the class, each student wrote between 300 and 700 words in class, so that by the end of the 10-week project, each individual produced a notebook of approximately 7000 words of exercises. These pieces were not assessed, but verbal, formative feedback from fellow students and the teacher was given as young writers shared their attempts. The creative writing activities, which culminated in a short story, are summarized in Table 1.
Over the 10 weeks, the students spent their English lessons working through some of the chapters in the progressively designed creative writing textbook Playing with Words: An Introduction to Creative Writing Craft (Davidow & Williams, 2016). As outlined above, students focused on reading specific short stories from a writer's point of view, writing exercises without being assessed and while being allowed to play, to "fail," and to discover what words do. Then later, they self-reviewed a longer text with an editing criteria sheet, peer-reviewed, and then submitted their longer texts/stories to their teacher for final feedback. Before and after this activity, they also sat a pre- and post-project NAPLAN narrative writing test. At no time were the students "taught to the test," and none of the NAPLAN criteria was considered in the teaching of the craft-based exercises.

The creative writing rubric
In this study, we evaluate a novel creative writing rubric (Appendix B), designed to be used by teachers and students in the classroom environment. This differs from the intended use of the NAPLAN narrative rubric, which is large-scale standardized census testing. However, we thought it would be a useful exercise to compare the two rubrics, mainly because the NAPLAN rubric has found its way into classrooms as a teaching resource for feedforward on writing standards and assessment of students' writing. The creative writing rubric was designed for assessment of creative writing scripts in a creative writing course or in school assessment within a class. The rubric is scored holistically rather than analytically. "A holistic scale measures the relative success of a text but does so through a rubric that incorporates many of the traits in analytic scoring as heuristics towards a conception of a whole rather than as a sum of autonomous components" (Perelman, 2018, p. 11). The rating scale contains integrated standards of proficiency (i.e., integrated rather than discrete assessment standards). Each integrated standard is equally weighted on a scale of 1-10. This differs from the NAPLAN rubric, which assesses standards based on 10 multi-trait analytically scored discrete descriptors. Holistic scoring was developed to achieve greater reliability than multi-trait scoring, which often had large inconsistency among markers' scores. Acceptable levels of inter-rater reliability, with correlations of 0.7 to 0.8, have been achieved (Elliot, 2005; White, 1984) by training readers to rate essays holistically (Perelman, 2018).
Another form of the creative writing rubric, with question prompts added (Appendix C), is supplied to teachers and student writers to further exemplify the criteria for users of the rubric (Davidow & Williams, 2016). The premise on which this form of the rubric is based is that it provides dialogue and interaction, rather than fixed criteria, and is a learning tool. This is a two-way process by which teachers (assessment moderators) and users engage with a more fluid assessment model based on the imaginative capacities of the writer. In order to help writers to improve future work, not past work, we apply Marshall Goldsmith's concept of "feedforward" (Gonzalez, 2021) and Hirsh's The Feedback Fix (Hirsh, 2017). The feedforward is formative rather than summative and encourages improvement, viewing each piece as a draft that needs reworking rather than a final product.
The creative writing rubric was designed according to a minimalist stylistic tradition originating with the French Realists (in particular, Flaubert's insistence on le mot juste [the right word or expression]; Flaubert, 1856, cited in Hamrick, 2017, n.p.) and running through Strunk and White's The Elements of Style (2014 [1918]) and their advocacy for a simple writing style in order to give writing power (p. 24); George Orwell's rules for concreteness and clarity (Orwell, 2006 [1946], n.p.); Ernest Hemingway, who argues that simple language denotes actual things and gives us true experience of the world (Lodge, 2018); Raymond Carver's notion of aesthetic delight using "commonplace but precise language" (Carver, 1981, n.p.); and Annie Dillard's rules for avoiding fancy prose (Dillard, 1981). Minimalism came to the fore in the twentieth century and since then has retained prominence as a literary "gold standard" aspired to by writers for the demands it makes on clarity of expression and rigorous editing.

Data collection and analysis
NAPLAN narrative writing tasks from the 2009 and 2010 test batteries were used as the stimulus materials for pre- and post-project tests of writing so that we could concurrently assess student writing with a NAPLAN rubric and compare task performance against the creative writing rubric. The narrative writing tasks, which are the same task for students in Years 7 and 9, were administered to the Steiner Year 9 student participants by their Steiner teacher following the NAPLAN administration protocol: The administration of the writing tasks employs closely scripted scaffolding. The teacher reads the directions on the writing prompt aloud to all students. The prompt includes images which can support students in crafting their response. Students have 5 minutes to plan, 30 minutes to write and 5 minutes to edit (ACARA, 2010). Both the narrative writing task raters and the researchers were blinded to which writing task was used before and after the creative writing project, to prevent raters from basing their rating decisions on a perception of which scripts were pre- or post-project. In addition, all the pre- and post-project tests from the three Steiner schools were de-identified and randomly coded with re-identifiable numbers created with a random number generator. The scripts were randomized and distributed non-sequentially to the raters to prevent the rater from basing their rating decisions on a pattern that was emerging in school groups and to ameliorate list effect bias.
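The de-identification and randomized distribution described above could be sketched roughly as follows. This is an illustrative sketch only: the function name, the four-digit code range, the seed, and the script labels are assumptions made for the example and are not details reported by the study.

```python
import random

def deidentify_and_shuffle(script_ids, seed=None):
    """Give each script a unique random code and a shuffled rating order.

    Returns (key, rating_order): `key` maps code -> original script id and
    would be held by someone other than the raters; `rating_order` is the
    non-sequential order in which coded scripts are handed out.
    """
    rng = random.Random(seed)
    # Sample without replacement so every code is unique and re-identifiable.
    codes = rng.sample(range(1000, 10000), k=len(script_ids))
    key = dict(zip(codes, script_ids))
    rating_order = list(codes)
    # Shuffle so school and pre/post groupings are not visible in the sequence.
    rng.shuffle(rating_order)
    return key, rating_order

# Hypothetical labels: three schools, three students each, pre and post scripts.
scripts = [
    f"school{s}_student{n}_{phase}"
    for s in "ABC"
    for n in range(1, 4)
    for phase in ("pre", "post")
]
key, order = deidentify_and_shuffle(scripts, seed=42)
print(order[:5])
```

Raters would see only the codes in `order`; the `key` is what makes the numbers re-identifiable once rating is complete.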
Proficiency in narrative writing can be categorized as a latent variable (Borsboom, et al., 2003). A latent variable is one which cannot be directly measured because it cannot be directly observed as a single variable. However, it is made up of various underlying variables which are observable, definable, and therefore measurable. These underlying variables are the criteria that are detailed in the rubrics.
Standard procedures to establish if the novel creative writing rubric is reliable and valid were conducted: (1) inter-rater reliability, (2) content validity, and (3) concurrent validity.
Inter-rater reliability is evidenced through an inter-rater reliability analysis, whereby a group of trained raters' scores for a sample of scripts are tested for their level of correlation. Content validity is an estimate of the extent to which each creative writing criterion appears to be measuring the same concept of creative writing proficiency.
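As a rough illustration of two statistics reported in the Results, the sketch below computes an average-measures intraclass correlation (two-way random effects, absolute agreement) and Cronbach's alpha using the standard ANOVA-based and variance-based formulas. The function names and the toy scores are assumptions for illustration; they are not the study's data, and the study does not state which software was used.

```python
from statistics import mean, variance

def icc_average_absolute(ratings):
    """ICC(2,k): two-way random effects, absolute agreement, average of k raters.

    `ratings` is a list of rows, one row per script, one score per rater.
    """
    n, k = len(ratings), len(ratings[0])
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]
    # Sums of squares for scripts (rows), raters (columns), and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)

def cronbach_alpha(scores):
    """`scores` is a list of rows, one row per script, one score per criterion."""
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Toy data (hypothetical): five scripts scored by three raters on a 1-10 scale.
rater_scores = [[6, 7, 6], [4, 5, 5], [8, 8, 7], [5, 6, 4], [7, 9, 8]]
print(round(icc_average_absolute(rater_scores), 2))

# Toy data (hypothetical): four scripts scored on five criteria.
criterion_scores = [[5, 6, 5, 6, 5], [7, 7, 8, 7, 6], [3, 4, 3, 4, 4], [8, 9, 8, 8, 9]]
print(round(cronbach_alpha(criterion_scores), 2))
```

Because the absolute-agreement form includes the rater (column) variance in the denominator, a rater who is consistently one point harsher lowers the ICC even when the rank order of scripts is identical, which is one reason it is a stricter check than a Pearson correlation.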
Concurrent validity is demonstrated when a test correlates well with a measure of the same overall skill (latent variable) that has previously been validated, in this case with NAPLAN. Concurrent validity establishes that the same skill (latent variable of narrative proficiency) is being measured, not that the underlying variables (narrative writing criteria) are equivalent. Concurrent validity is evidence that a test construct is valid as it shows that the test measures a construct, in this case proficiency in narrative writing, in a way that does not differ significantly from another established test instrument (Author, 2009). Therefore, the purpose of conducting a concurrent validity analysis in this study was to ascertain whether the creative writing rubric measures the same construct of narrative writing in a way that is not significantly different from the NAPLAN test, not to establish that the test rubrics are similar. The aim was to determine the extent to which the same students' writing receives similar results for the same writing task. This level of association confirms that the latent trait of narrative writing proficiency is significantly associated between the two test instruments. To ascertain the concurrent validity of the creative writing grading rubric, all the writing scripts were graded by an accredited and experienced NAPLAN rater using the NAPLAN narrative rubric (NAPLAN, 2010; Appendix A) and also by five university tutors in creative writing using the creative writing rubric (Authors, 2016; Appendix B).
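A concurrent validity check of this kind reduces to correlating each script's total score on one rubric with its total score on the other. The sketch below shows a plain Pearson correlation under that assumption; the variable names and the score values are hypothetical and are not drawn from the study.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical totals for the same six scripts under each rubric.
naplan_totals = [28, 31, 22, 35, 26, 30]
creative_totals = [30, 33, 21, 41, 29, 27]
r_demo = pearson_r(naplan_totals, creative_totals)
print(round(r_demo, 2))
```

A high r here indicates only that the two instruments rank the same scripts similarly; it does not show that their criteria are equivalent, which is the distinction drawn in the paragraph above.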
Finally, to assess the proficiency of the students' creative writing responses pre- and post-project, paired t-tests were performed on the mean differences between data gathered pre- and post-project for both the NAPLAN rubric-rated and creative writing rubric-rated scripts. Five research questions were developed: questions 1-3 guided the initial evaluation of the creative writing rubric, and questions 4-5 addressed the main results of interest, gained from a comparison of the pre- and post-project ratings of the NAPLAN tasks with the NAPLAN rubric and the creative writing rubric.
Research question 1: Does the creative writing rubric have a construct that is scored reliably by different raters?
Research question 2: Does the creative writing rubric have content validity?
Research question 3: Does the creative writing rubric have concurrent validity based on association with the NAPLAN narrative task?
Research question 4: Is there an improvement in individual features of the writing assessed using the NAPLAN narrative writing grading rubric?
Research question 5: Is there an improvement in individual features of the writing assessed using the creative writing rubric?

Results
Research question 1: Does the creative writing rubric have a construct that is scored reliably by different raters?
As the creative writing rubric was not previously validated for inter-rater reliability, a random sample of pre- and post-project scripts (n = 13) was repeat-marked by five creative writing tutors who were experienced in teaching and assessing with the rubric. To determine the level of inter-rater reliability between the five raters, a Pearson correlation test was performed on the 13 rated scripts (Table 2). The correlation test revealed that two raters (raters 4 and 5), who were grading differently in the norming session, did not correlate with the other three. Rater 1 had a moderate correlation with raters 2 and 3 (r = .513, p = .073, and r = .637, p = .019), but the correlations did not quite reach significance between rater 1 and rater 3. Raters 2 and 3 correlated strongly (r = .637, p = .019, and r = .714, p = .006). As a result, the decision was made to use the three correlated raters 1-3 for all the grading using the creative writing rubric.
To better assess the inter-rater reliability across the three strongly correlated raters, an intraclass correlation coefficient (ICC) was calculated. The ICC assesses rating reliability by comparing the variability of different ratings of the same subject to the total variation across all ratings and all subjects. It is more sensitive to rater mean differences than the Pearson correlation and so provides a better measure of rater agreement. A two-way random effects model (absolute agreement) was used to calculate the ICC. The ICC is a value between 0 and 1, where values below 0.5 indicate poor reliability, between 0.5 and 0.75 moderate reliability, between 0.75 and 0.9 good reliability, and any value above 0.9 indicates excellent reliability (Koo & Li, 2016). Our sample size of n = 13 scripts rated by k = 3 raters met the requirements for ICC to test the hypothesis that the expected ICC = 0.9 with k = 3 raters, a 95% confidence interval, and a desired width of the confidence interval of 0.2 (Bonett, 2002). The average rater intraclass correlation between the three raters was found to have moderate reliability, ICC = 0.6, p < 0.01.
Research question 2: Does the creative writing rubric have content validity?
The content validity of the creative writing rubric was investigated by using Cronbach's alpha, which estimates the extent to which each creative writing criterion appears to be measuring the same concept of creative writing proficiency. A Cronbach's alpha coefficient of 0.94 (N = 52) was obtained, which indicates that the overall internal consistency reliability of the measure is "excellent" based on the number of items and the mean inter-item correlations (George & Mallery, 2003). All the "alpha if item deleted" coefficients were similar, which suggests consistency of the items in relation to the total score, and removal of any of the five criteria would not improve the overall alpha calculated. This finding indicates that the five criteria are measuring the same concept of creative writing proficiency.
Research question 3: Does the creative writing rubric have concurrent validity based on association with the NAPLAN narrative task?
To ascertain the concurrent validity of the creative writing rubric, all the writing scripts were first graded by a trained and experienced NAPLAN rater using the NAPLAN narrative rubric (ACARA, 2010) designed for the narrative writing tasks pre- and post-project. The scripts were randomized so that pre- and post-project scripts and scripts from different schools were listed non-sequentially. This was done to prevent the rater grading to a pattern that was emerging and to prevent bias related to the list effect (rating practices changing over time in sequence, affecting some scripts more than others). A strong positive and significant correlation (r = .649; p < .01) was found between the mean total scores of student grades on the two grading instruments. Following conventional test validation reasoning, this finding suggests that the creative writing rubric could be considered to be a valid instrument due to its concurrent validity with the NAPLAN instrument. This was an unexpected result that is interpreted in the Discussion section.
Research question 4: Is there an improvement in individual features of the writing assessed using the NAPLAN narrative writing grading rubric?
A paired samples t-test (CI = .95) was conducted on the pre- and post-project writing script scores rated with the NAPLAN grading rubric. Group means showed slight improvement for most criteria, but none of the differences was statistically significant (Table 3; see Appendix).

Research question 5: Is there an improvement in individual features of the writing assessed using the creative writing grading rubric?
The pre- versus post-project analysis of each rubric criterion showed an improvement for all creative writing criteria, but only the Structural Elements and Presentation criterion reached significance (p = .019). The Those Who Speak: Characters and Context criterion and the Words, Sentences and Voice criterion approached significance with a medium Cohen's d effect size of 0.4, meaning that we can be moderately certain that the participants' structural elements and presentation improved as a result of the project (Table 4; Appendix B).
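
As an illustration of the kind of pre- versus post-project comparison reported above, the following is a minimal Python sketch of a paired samples t statistic and a Cohen's d effect size. It is not the analysis code used in the study. The function name `paired_t_and_cohens_d` and the scores are hypothetical, and the sketch assumes the d_z variant of Cohen's d (mean difference divided by the standard deviation of the differences), since the text does not state which variant was used.

```python
import math
import statistics


def paired_t_and_cohens_d(pre, post):
    """Return (t statistic, degrees of freedom, Cohen's d_z) for paired scores.

    Assumption: effect size is d_z, the mean of the paired differences
    divided by their sample standard deviation.
    """
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("pre and post must have equal length and at least 2 pairs")
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD, n - 1 denominator
    if sd_diff == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    d_z = mean_diff / sd_diff
    return t_stat, n - 1, d_z


# Hypothetical scores on one criterion for six students (not study data).
pre_scores = [2, 3, 2, 4, 3, 2]
post_scores = [3, 3, 3, 4, 4, 2]

t_stat, df, d_z = paired_t_and_cohens_d(pre_scores, post_scores)
print(f"t({df}) = {t_stat:.2f}, d_z = {d_z:.2f}")
```

The p-value would then be read from a t distribution with n - 1 degrees of freedom; in practice a statistics package would normally perform both steps.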

Discussion
The purpose of this study was to evaluate the validity and reliability of a novel creative writing grading rubric applied to the grading of creative writing tasks performed within three Steiner secondary schools whose students participated in a novel creative writing project for 10 weeks. Two rubrics (the novel creative writing rubric and the NAPLAN narrative rubric) were applied to the grading of two NAPLAN narrative writing tasks, delivered pre- and post-project, to compare the novel creative writing rubric with the NAPLAN rubric.
The first three quantitative research questions sought to evaluate the reliability and validity of the creative writing rubric. Statistical analysis showed that the construct can be scored with a significant, moderate level of reliability by different raters (r = 0.5-0.7; ICC = 0.6). Internal consistency reliability of the criteria was found to be excellent (Cronbach's alpha = 0.94). Content validity of the instrument was demonstrated through correlations among all criteria, which were all strong (r = 0.7-0.9) and significant. Analyses for concurrent validity showed that the creative writing rubric correlates strongly (r = 0.7) and significantly with the NAPLAN narrative assessment instrument.
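
For readers unfamiliar with the internal consistency statistic summarized above, the following is a minimal Python sketch of the standard Cronbach's alpha formula, alpha = k / (k - 1) × (1 - sum of item variances / variance of total scores), where k is the number of criteria. It is an illustration under stated assumptions, not the analysis code used in the study. The function name `cronbach_alpha` and the scores are hypothetical.

```python
import statistics


def cronbach_alpha(item_scores):
    """Compute Cronbach's alpha.

    item_scores: one list of scores per rubric criterion, with each list
    ordered by the same scripts (position i is the same script throughout).
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("at least two criteria are required")
    n = len(item_scores[0])
    if n < 2 or any(len(item) != n for item in item_scores):
        raise ValueError("each criterion needs the same number (>= 2) of scores")
    item_variances = [statistics.variance(item) for item in item_scores]
    # Total score per script is the sum across criteria.
    totals = [sum(script_scores) for script_scores in zip(*item_scores)]
    total_variance = statistics.variance(totals)
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical scores for five criteria across six scripts (not study data).
hypothetical_scores = [
    [2, 3, 4, 2, 5, 3],
    [2, 3, 5, 2, 4, 3],
    [1, 3, 4, 2, 5, 4],
    [2, 4, 4, 1, 5, 3],
    [2, 3, 4, 2, 4, 3],
]
alpha = cronbach_alpha(hypothetical_scores)
print(f"Cronbach's alpha = {alpha:.2f}")
```

Alpha rises toward 1 as the criteria rank scripts more consistently with one another, which is why a high value is read as the criteria measuring a common concept.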
We were surprised to find strong concurrent validity between the NAPLAN and creative writing criteria despite the differences in focus of the two rubrics. What we assume this tells us is that both sets of criteria similarly distinguished proficiency in the rated scripts, despite the descriptors valuing, or rewarding, mostly different features of the writing.
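
The point that two rubrics can order scripts similarly while rewarding different features can be illustrated with a Pearson correlation, which is sensitive to how consistently two sets of scores rise and fall together rather than to whether the scores are equal. The following minimal Python sketch is illustrative only. The function name `pearson_r` and both sets of total scores are hypothetical, and the two rubrics are assumed to use different score scales.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length and at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return covariation / math.sqrt(ss_x * ss_y)


# Hypothetical total scores for six scripts on two rubrics (not study data).
creative_totals = [10, 14, 18, 12, 20, 16]
naplan_totals = [28, 31, 40, 33, 41, 35]

r = pearson_r(creative_totals, naplan_totals)
print(f"r = {r:.2f}")
```

A high r here would indicate only that the two hypothetical rubrics rank the scripts similarly; it says nothing about whether they reward the same features of the writing.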
We found that the students' post-project writing improved in all aspects according to the creative writing rubric, with a statistically significant improvement in structural elements and presentation and a group average improvement approaching significance in two other criteria: Words, Sentences and Voice, and Characters and Context (moderate effect sizes, d = 0.3-0.4).
However, there were no significant improvements in students' post-program writing according to the NAPLAN criteria. Our interpretation of this finding was that the NAPLAN narrative task criteria did not capture student development of a unique writing style or individual "voice" or other craft-based standards of proficiency that could be captured by the creative writing rubric. The creative writing criteria we evaluated provided a fit-for-purpose construct for the assessment of craft-based creative writing. Students showed improvement in the areas covered in the creative writing project that aligned with the creative writing rubric; it is evident that the NAPLAN rubric does not assess imaginative capacity.
The analysis of inter-rater reliability showed that the raters applying the creative writing rubric criteria did not reach the high level of inter-rater reliability required by a standardized national test. However, it was not our intention in such a small-scale study to propose the creative writing rubric as a replacement for the NAPLAN narrative rubric. Further inter-rater reliability studies with greater numbers of trained raters and sample scripts could be conducted to improve the rigor of this reliability evidence. Instead, we would prefer to study the raters' experience of using the novel rubric through a community of practice approach to rubric development (Grainger et al., 2017), in order to further refine the rubric criteria and make them easier for raters to interpret and apply to scripts, in the hope of finding better alignment between trained raters' rating decisions.

Conclusion
Assessment of imaginative texts would appear, on the face of it, to be more complicated than the assessment of other text types such as persuasive or informative texts. However, our study has provided evidence that a creatively written text can be assessed with a high level of validity and reliability when the narrative form is taught with a craft-based approach. In this approach, the elements of writing that are privileged are not those that are easier to count, such as spelling and sentence structure "errors" that are traditionally assessed in standardized tests. Given the validity and reliability measures found in this study, we conclude that the creative writing rubric is a fit-for-purpose guide to school-based learning and assessment of creative writing, particularly when teachers join students to adopt dispositions of creative writers who are engaged with developing their writing craft and individual voice.