Abbott, R. D., & Berninger, V. W. (1993). Structural equation modeling of relationships Among developmental skills and writing skills in primary- and intermediate-grade writers. Journal of Educational Psychology,
Applebee, A. N., & Langer, J. A. (2006). The state of writing instruction in America’s schools: What existing data tell us. Albany, NY: University at SUNY, Albany.
Bachman, L. (2004). Statistical analyses for language assessment. Cambridge: Cambridge University Press.
Barkaoui, K. (2007). Rating scale impact on EFL essay marking: A mixed-method study. Assessing Writing,
Beck, S. W., & Jeffery, J. V. (2007). Genres of high-stakes writing assessments and the construct of writing competence. Assessing Writing,
Bereiter, C., & Scardamalia, M. (1987). The psychology of written composition. Hillsdale, NJ: Lawrence Erlbaum.
Bouwer, R., Beguin, A., Sanders, T., & van den Bergh, H. (2015). Effect of genre on the generalizability of writing scores. Language Testing,
Brennan, R. L. (2011). Generalizability theory and classical test theory. Applied Measurement in Education,
Brennan, R. L., Goa, X., & Colton, D. A. (1995). Generalizability analyses of work keys listening and writing tests. Educational and Psychological Measurement,
Coker, D. L., & Ritchey, K. D. (2010). Curriculum based measurement of writing in kindergarten and first grade: An investigation of production and qualitative scores. Exceptional Children,
Cooper, P. L. (1984). The assessment of writing ability: A review of research. GRE Board research report no. GREB 82-15R/ETS research report no. 84-12). Princeton, NJ: Educational Testing Service.
Cronbach, L. J., Gleser, G. C., Nanda, H., & Rajaratnam, N. (1972). The dependability of behavioral measurements: Theory of generalizability for scores and profiles. New York: Wiley.
Cumming, A., Kantor, R., & Powers, D. E. (2002). Decision making while rating ESL/EFL writing tasks: A descriptive framework. The Modern Language Journal,
Deno, S. L. (1985). Curriculum-based measurement: The emerging alternative. Exceptional Children, 52, 219–232.
DeVellis, R. F. (1991). Scale development. Newbury Park, NJ: Sage.
Duke, N. K. (2014). Inside information: Developing powerful readers and writers of informational text through project-based instruction. New York: Scholastic.
Duke, N. K., & Roberts, K. M. (2010). The genre-specific nature of reading comprehension. In D. Wyse, R. Andrews, & J. Hoffman (Eds.), The Routledge international handbook of english, language and literacy teaching (pp. 74–86). London: Routledge.
East, M. (2009). Evaluating the reliability of a detailed analytic scoring rubric for foreign language writing. Assessing Writing,
Eckes, T. (2008). Rater types in writing performance assessments: A classification approach to rater variability. Language Testing,
Espin, C. A., De La Paz, S., Scierka, B. J., & Roelofs, L. (2005). The relationship between curriculum-based measures in written expression and quality and completeness of expository writing for middle school students. The Journal of Special Education, 38, 208–217.
Florida Comprehensive Assessment Test (FCAT) 2012 writing: Grade 4 narrative task anchor set. Retrieved from http://fcat.fldoe.org/pdf/G4N12WritingAnchorSet.pdf.
Gansle, K. A., Noell, G. H., VanDerHeyden, A. M., Naquin, G. M., & Slider, N. J. (2002). Moving beyond total words written: The reliability, criterion validity, and time cost of alternate measures for curriculum-based measurement in writing. School Psychology Review,
Gansle, K. A., Noell, G. H., VanDerHeyden, A. M., Slider, N. J., Hoffpauir, L. D., Whitmarsh, E. L., et al. (2004). An examination of the criterion validity and sensitivity to brief intervention of alternate curriculum-based measures of writing skill. Psychology in the Schools,
Gansle, K. A., VanDerHeyden, A. M., Noell, G. H., Resetar, J. L., & Williams, K. L. (2006). The technical adequacy of curriculum-based and rating-based measures of written expression for elementary school students. School Psychology Review,
Gebril, A. (2009). Score generalizability of academic writing tasks: Does one test method fit it all? Language Testing,
Graham, S., Berninger, V. W., Abbott, R. D., Abbott, S. P., & Whitaker, D. (1997). Role of mechanics in composing of elementary school students: A new methodological approach. Journal of Educational Psychology,
Graham, S., Harris, K., & Hebert, M. (2011). Informing writing: The benefits of formative assessment. Washington, DC: Alliance for Excellent Education.
Hale, G., Taylor, C., Bridgeman, B., Carson, J., Kroll, B., & Kantor, R. (1996). A study of the writing tasks assigned in academic degree programs. In: TOEFL Research Report 54. Princeton, NJ: Educational Testing Service.
Hammill, D. D., & Larsen, S. C. (1996). Test of Written Language-3. Austin, TX: Pro-ed.
Hammill, D. D., & Larsen, S. C. (2009). Test of Written Language-4th edition (TOWL-4). Austin, TX: Pro-Ed.
Hamp-Lyons, L. (2007). Worrying about rating. Assessing Writing,
Huot, B. (1990). The literature of direct writing assessment: Major concerns and prevailing trends. Review of Educational Research,
Jewell J., & Malecki C. K. (2005). The utility of CBM written language indices: An investigation of production-dependent, production-independent, and accurate-production scores. School Psychology Review,
Kim, Y.-S., Al Otaiba, S., Puranik, C., Sidler, J. F., Greulich, L., & Wagner, R. K. (2011). Componential skills of beginning writing: An exploratory study. Learning and Individual Differences,
Kim, Y.-S., Al Otaiba, S., Sidler, J. F., & Greulich, L. (2013). Language, literacy, attentional behaviors, and instructional quality predictors of written composition for first graders. Early Childhood Research Quarterly,
Kim, Y.-S., Al Otaiba, S., Folsom, J. S., Greulich, L., & Puranik, C. (2014). Evaluating the dimensionality of first grade written composition. Journal of Speech, Language, and Hearing Research,
Kim, Y.-S., Al Otaiba, S., Wanzek, J., & Gatlin, B. (2015). Towards an understanding of dimension, predictors, and gender gaps in written composition. Journal of Educational Psychology,
Kondo-Brown, K. (2002). A facets analysis of rater bias in measuring Japanese second language writing performance. Language Testing,
Kuiken, F., & Vedder, I. (2014). Rating written performance: What do raters do and why? Language Testing,
Lane, S., & Sabers, D. (1989). Use of generalizability theory for estimating the dependability of a scoring system for sample essays. Applied Measurement in Education,
Lembke, E., Deno, S. L., & Hall, K. (2003). Identifying an indicator of growth in early writing proficiency for elementary school students. Assessment for Effective Intervention,
McMaster, K. L., Du, X., & Pestursdottir, A. L. (2009). Technical features of curriculum-based measures for beginning writers. Journal of Learning Disabilities,
McMaster, K. L., Du, X., Yeo, S., Deno, S. L., Parker, D., & Ellis, T. (2011). Curriculum-based measures of beginning writing: Technical features of the slope. Exceptional Children,
McMaster, K., & Espin, C. (2007). Technical features of curriculum-based measurement in writing: A literature review. The Journal of Special Education,
Moore, & T., Morton, J. (1999). Authenticity in the IELTS academic module writing test: A comparative study of task 2 items and university assignments. In: IELTS Research Reports No. 2 (pp. 74–116). Canberra: IELTS Australia.
Mushquash, C., & O’Connor, B. P. (2006). SPSS and SAS programs for generalizability theory analyses. Behavioral Research Methods,
National Center for Education Statistics. (1999). The NAEP 1998 writing report card for the nation and the states, NCES 1999-462, by E. A. Greenwald, H. R. Persky, J. R. Campbell, and J. Mazzeo. Washington, DC.
National Center for Education Statistics. (2003). The nation’s report card: Writing 2002, NCES 2003-529 by H. R. Persky, M. C. Dane, & Y. Jin. Retrieved from http://nces.ed.gov/.
National Center for Education Statistics. (2012). The nation’s report card: Writing 2011 (NCES 2012-470). Washington, DC: Institute of Education Sciences, U.S. Department of Education. Retrieved from http://nces.ed.gov/nationsreportcard/pdf/main2011/2012470.pdf.
National Governors Association Center for Best Practices & Council of Chief State School Officers. (2010). Common Core State Standards for English language arts and literacy in history/social studies, science, and technical subjects. Washington, DC: Authors.
Nunnally, J. C. (1967). Psychometric theory. New York: McGraw Hill.
Olinghouse, N. G. (2008). Student- and instruction-level predictors of narrative writing in third-grade students. Reading and Writing: An Interdisciplinary Journal,
Olinghouse, N. G., & Graham, S. (2009). The relationship between discourse knowledge and the writing performance of elementary-grade students. Journal of Educational Psychology,
Olinghouse, N. G., Santangelo, T., & Wilson, J. (2012). Examining the validity of single-occasion, single-genre, holistically scored writing assessments. In E. Van Steendam (Ed.), Measuring writing: Recent insights into theory, methodology and practices (pp. 55–82). Leiden: Koninklije Brill.
Puranik, C. S., Lombardino, L. J., & Altmann, L. J. (2007). Writing through retellings: An exploratory study of language-impaired and dyslexic populations. Reading and Writing: An Interdisciplinary Journal,
Puranik, C., Lombardino, L., & Altmann, L. (2008). Assessing the microstructure of written language using a retelling paradigm. American Journal of Speech Language Pathology,
Schoonen, R. (2005). Generalizability of writing scores: An application of structural equation modeling. Language Testing,
Schoonen, R. (2012). The validity and generalizability of writing scores: The effect of rater, task and language. In E. Van Steendam (Ed.), Measuring writing: Recent insights into theory, methodology and practices (pp. 1–22). Leiden: Koninklije Brill.
Shavelson, R. J., & Webb, N. M. (1991). Generalizability theory: A primer. Newbury Park, CA: Sage.
Shavelson, R., Webb, N., & Rowley, G. (1989). Generalizability theory. American Psychologist,
Stuhlmann, J., Daniel, C., Delinger, A., Denny, R. K., & Powers, T. (1999). A generalizability study of the effects of training on teachers’ abilities to rate children’s writing using a rubric. Journal of Reading Psychology,
Swartz, C. W., Hooper, S. R., Montgomery, J. W., Wakely, M. B., de Kruif, R. E. L., Reed, M., et al. (1999). Using generalizability theory to estimate the reliability of writing scores derived from holistic and analytical scoring methods. Education and Psychological Measurement,
Tillema, M., van den Bergh, H., Rijlaarsdam, G., & Sanders, T. (2012). Quantifying the quality difference between L1 and L2 essays: A rating procedure with bilingual raters and L1 and L2 benchmark essays. Language Testing,
van den Bergh, H., De Maeyer, S., van Weijen, D., & Tillema, M. (2012). Generalizability of text quality scores. In E. Van Steendam (Ed.), Measuring writing: Recent insights into theory, methodology and practices (pp. 23–32). Leiden: Koninklije Brill.
Wagner, R. K., Puranik, C. S., Foorman, B., Foster, E., Tschinkel, E., & Kantor, P. T. (2011). Modeling the development of written language. Reading and Writing: An Interdisciplinary Journal,
Wechsler, D. (2009). Wechsler Individual Achievement Test-3rd edition (WIAT-3). San Antonio, TX: Pearson.
Weigle, S. C. (1998). Using FACETS to model rater training effects. Language Testing,