Self-assessment: accuracy versus value
The literature is inconsistent regarding the utility of self-assessment: several authors praise the practice (Blanche & Merino, 1989; Gardner, 2000; McDonald & Boud, 2003; Patri, 2002), while others report serious concerns about its reliability and validity (Blue, 1994; Huang, 2010; Matsuno, 2009; Oldfield & MacAlpine, 1995; Sullivan & Hall, 1997). One major perceived benefit of self-assessment is that it is assumed to enhance learners’ ability to self-monitor their learning, which leads to autonomy (Paris & Paris, 2001). The inherent subjectivity of self-assessment has led to questioning of its validity as a measure of student learning, particularly in relation to foreign language proficiency, and several studies have looked at ways of validating the process by correlating self-assessment scores with scores on other assessments, such as course grades, tests, and teacher or expert ratings (Butler & Lee, 2010). Ross (1998), looking at second-language assessment, performed a meta-analysis of validation studies and concluded that accuracy in self-assessment depended on the skill being evaluated, and found that writing was among the more difficult skills to self-assess reliably. In addition to the skill under assessment, researchers have found that other factors can influence the accuracy of self-assessment, such as learners’ proficiency level (Heilenman, 1990; Patri, 2002), second-language anxiety (MacIntyre et al., 1997), and motivation (Dornyei, 2001). Butler and Lee (2010) studied the effects of self-assessment on English language learners in South Korea and found that improvements in accuracy were only marginal over time, and that training in the process, together with sustained feedback to the self-assessors, was critical.
One teacher in Sato’s study reflected that if self-assessment were ever to be taken seriously by students and teachers, then it needed to be associated with formal grading, rather than remain an ungraded exercise.
Perhaps the most controversial aspect of self-assessment is that students appear under-qualified to accurately assess their own learning when compared to expert (teacher) raters.
Matsuno (2009), for instance, looking at Japanese learners on writing tests, used multifaceted Rasch measurement and found that self-assessors consistently under-rated themselves in comparison to teacher raters, while rating their peers more highly. It was speculated that this was perhaps a result of their Japanese cultural conditioning to appear individually modest while reverential to peers. Matsuno therefore concluded that self-assessment was less accurate, and thus less valuable, than other assessments, such as teacher- and peer-assessment.
While there appears to be much evidence that student and teacher assessments rarely show high correlation, there is nonetheless a substantial body of literature promoting self-assessment for other reasons. Bedore and O’Sullivan (2011), for instance, discuss the importance of “removing the instructor from the position of sole authority” (p. 13), while Blanche and Merino (1989), in reviewing the literature on self-assessment, point out a number of studies addressing the increased learner motivation associated with including students in their own assessment. Similarly, Harris (1997) and Gardner (2000) highlight the relationship between self-assessment and increased learner motivation and autonomy. Sadler (1989) suggests that learners can move beyond being consumers of education, and places them at the center of their learning. He encapsulates the pedagogical rationale for incorporating self-assessment in the formal grading process by emphasizing both the metacognitive benefits inherent to the practice and the increased sense of community it promotes:
Providing guided but direct and authentic evaluative experience for students enables them to develop their evaluative knowledge, thereby bringing them within the guild of people who are able to determine quality using multiple criteria. It also enables transfer of some of the responsibility for making decisions from teacher to learner. In this way, students are gradually exposed to the full set of criteria and the rules for using them and so build up a body of evaluative knowledge. (p. 135)
Further adding importance to self-assessment, Hattie (2008) conducted an extensive quantitative study of over 100 factors that influence student learning (from teacher quality to curriculum design) and found that the single most salient indicator of student learning was students’ ability to accurately self-assign grades. Even with such evidence of the efficacy of the practice, the reality is that while “teachers embrace the theoretical promise of self-assessment, few devote much time to its practice” (Hilgers et al., 2000, p. 9), indicating that, perhaps, the lack of accuracy in student self-assessment is seen to outweigh its other pedagogical benefits.
However, Boud (1990, 2000) reminds us that assessment in higher education is often at odds with the purported values espoused by universities, and that self-assessment is one way to encourage critical thinking and responsibility, traits that will serve students well in their lives after graduation: “Assessment therefore needs to be seen as an indispensable accompaniment to lifelong learning. This means that it has to move from the exclusive domain of assessors into the hands of learners” (Boud, 2000, p. 151). Boud (1990) laments the lack of student participation in decision-making at universities, claiming that there is an “unhealthy dominance of a situation where staff are always both an authority and in authority. The challenge is to find a place for significant student responsibility in this context” (p. 106). While all assessments need to lead to learning in some way, self-assessment of academic writing in particular can raise students’ metacognitive awareness of their own capabilities in ways that teacher-only assessment cannot.
Raider-Roth (2005), while not explicitly addressing self-assessment, makes a compelling case for including students in decision-making processes that impact their lives. Like other student-centered theorists such as John Dewey (1903), who likewise advocated for the promotion of democratic environments even within the formal institutional settings of schools, Raider-Roth (2005) implores educators to provide conditions where students feel safe enough to “challenge teachers’ authority” (p. 34). Such environments allow students to be “dangerous and to take risks, to voice that which had not been said” before (p. 34). Empowering students, it would seem, can have a lasting influence on their lives long after they leave the college classroom. However, such a philosophy is not prominent in the Confucian-based education models of East Asia, including Japan (Marginson, 2011). Therefore, introducing student-liberating pedagogies to East Asian contexts may be met with confusion or resistance, at least initially, even at a Japanese liberal arts university based on the Western model that promotes developing “adventurous minds capable of critical thinking and sensitivity to questions of meaning and value” (Citation removed for blind review).
Much of the research in self-assessment uses student self-reporting on reflective-type diagnostic questionnaires (such as a series of “can do” statements) in order to ascertain perceived student competence (e.g., Blanche & Merino, 1989; Harris, 1997; Heilenman, 1990). These questionnaire responses are then compared to teacher evaluations of the students. The fundamental problem with such a system is that it is far too broad. It attempts to address overall student competence, often across skill areas, using different rating rubrics (e.g., a formalized scoring rubric for the teachers that is used in grading, and a reflective one for the students that is not). Butler and Lee (2010) did not look at graded self-assessments, while Matsuno (2009), who did in fact look at self-assessment on graded tasks, used a “simplified” version of a teacher-rating rubric because it was believed that students were too inexperienced to effectively apply the same rubric teachers use. This two-tier system of assessment may indicate to students that there is one grading procedure for teachers that is “real,” and another one for students that is not. The current study overcomes this limitation in that both student self-raters and expert raters used the exact same points-based grading rubric to assign scores to four timed essay tests over the course of two ten-week terms. Reflective questionnaires were used here to elicit student reactions to the self-assessment process, not to have students self-assess their perceived writing proficiency.