Introduction

Policy-related dialogue is healthy, given the social and economic importance placed upon teachers and teaching. This complex set of relationships is illustrated by a recent requirement for all Australian teacher education programs to develop or adopt an approved teaching performance assessment (TPA). This study analyses how education academics from a consortium of Australian universities collaborated in TPA design to develop a summative assessment of teacher education candidates’ ‘classroom-readiness’ (DET, 2015), in line with parameters set by the education authority. The design work necessarily invoked questions about the pedagogical purposes of measurement, and about universities’ role in defending intellectual autonomy. We focus our analysis on efforts to meet the TPA design requirement to measure ‘the actual practices of teaching, including but not limited to planning, teaching, reflecting and assessing student learning’ (AITSL, n.d., p. 2).

Australia’s Professional Standards for Teachers (AITSL, 2017) define a relationship between industry, government and academia in shaping teaching. Adopted in 2011, following consultation with the profession and the public, the Standards outline the knowledge and competencies considered essential to the effectiveness of teachers and of teaching, and profoundly influence initial teacher education (ITE). The ontological consistency of these standards has been challenged (Buchanan, 2017; Buchanan & Schuck, 2016; Connell, 2009; Mulcahy, 2011; Ryan & Bourke, 2013). In 2017, following a Teacher Education Ministerial Advisory Group (TEMAG) review, an additional teacher quality measure attending the graduate-level Standards was announced: henceforth, and in addition to university course completion, graduating or pre-service teachers (PSTs) nationwide would separately be assessed on ‘classroom-readiness’ (DET, 2015; Buchanan & Schuck, 2016). That is, pre-service teachers’ performance in an internship setting would be assessed against relevant Standards via development of a ‘validated’ instrument, ostensibly rivalling university graduation as evidence of teacher and classroom-readiness. There was ambiguity, however, in at least four aspects of the measurement. Firstly, a generic and contested (Reid, 2019) ‘classroom-ready’ concept was proposed, acting as an ill-defined meta-standard. Secondly, ‘readiness’ connoted a disposition, conflating teachers and teaching as the assessment focus. Thirdly, a national commissioning process co-funded two possible designs from consortia of ITE providers, rather than adopting a validated standalone or external licensing system as in other countries. Fourthly, it followed that no single instrument would be chosen to assess pre-service teachers against the meta-standard or the existing Australian Standards. Universities have successfully demonstrated that there can be more than one approach, nevertheless prompting concern within teaching regulatory bodies about an acceptable number of TPAs. These ambiguities have provided opportunity for a diversity of disciplines and academics, from psychometricians to sociologists, to contribute to defining Australian TPAs, bringing authenticity to the process and to reflection upon it.

Approach and analytical frame

We examine the intellectual endeavour facing ITE academics from a consortium of (then) nine universities, working from various disciplinary perspectives, in steering and shaping an assessment. Focus group data and critical reflections offered by approximately 12 academics developing a teaching performance assessment in Australia are analysed accordingly. These data, along with TPA candidate feedback, were collected as part of a formal evaluation undertaken after the first year of design and implementation by the Assessment for Graduate Teaching (AfGT) Consortium, using approved protocols. Transcripts of PST focus groups were analysed for perceptions and user experiences, and thematically coded, with team-member cross-checking. Two sets of PST focus groups were used, representing TPA candidates from two institutions.

We also present an autoethnographic case study authored by an academic positioned at the intersection of designing a TPA and teaching a TPA-integrated unit in one institution. This case study raises a question about duty of care, and about the extent to which the value of the TPA is rationalised within the confines of its mandatory status. In an instance of ‘doing reflexivity’ (Dean, 2017), we provide a case study that offers a standpoint on the power relations between academics and PSTs, and between academics and the education authority’s mandate.

At the heart of our investigation into teacher and teaching performance assessment are the intellectual and the moral in education. We consider education’s broad and deep social value as an ontological basis for this focus and, correspondingly, a definitive rationale for university-based ITE. Consequently, and as we explain, teaching is imbued with the intellectual foundations and reasoning that befit a profession of its importance and complexity.

Our approach engages with three philosophical/sociological contributions to recent debates about assessment in teacher education specifically. Firstly, Biesta (2009, 2010) critiques the global prevalence of assessment and measurement for its hegemonic, anti-intellectual and anti-educational tendencies, questioning the propensity to value the measurable over measuring the valuable (Biesta, 2014). We use this as a conceptual lens to analyse how education academics have developed the AfGT (AfGT Consortium, 2018), one model of a nationally mandated test that all would-be teachers in Australia must now pass. Secondly, we draw on Mills and Goos’ (2017) response to the Australian Government’s intentions and requirements, which warns of the dangers to the independence of ITE pedagogy and curricula. For these scholars, the TEMAG Report (Craven et al., 2014), which outlined the introduction of a mandatory assessment of pre-service teachers, misreads the social impact of standardising tests for teachers, education academics, and school students. A key concern is the extent to which a national teaching assessment exercise confines teaching to easily measured normative skills, thereby constricting ‘good’ teaching and teacher education (Buchanan et al., 2020).

Graduate or initial teacher performance assessment is well-worn territory (Stacey et al., 2019), and teacher accreditation or licensing programs operate in many countries (Clinton et al., 2016). A third contribution to our thinking on testing and licensing teachers is Wahl’s (2017) analysis of a divided education academia in the USA, where pre-service testing has been conducted for over a decade. There, practice-based evidence and debates about the unintended consequences of TPAs highlight the social and political tensions that emerge when the complexity of learning to become a professional is poorly grasped by governments and education agencies. Wahl observes how scholars differ over the TPA, largely along a familiar epistemological binary: positivist versus critical social and post-structuralist traditions, concerning the certainty of knowledge and measurement in education. Wahl’s exposé of intellectual impasse is salient for Australian academics developing assessment with ‘normative validity’ (Biesta, 2009, p. 35), that is, measuring the valuable in education. The focus on intellectual work is germane to analysis of teacher education because critical, reflective and disciplined analysis (as an established form of intellectual activity) is a cornerstone of university-based ITE, not least because fostering critical thinking in learners is crucial to teachers’ ongoing professional learning and practice. Such work is politically contestable without a philosophy of education (Wahl, 2017).

Intellectual work in ITE is also manifested in critiques of the educational measurement and policy structures that have prompted an Australian TPA (Mills & Goos, 2017), and in the work of those designing such an assessment (Allard et al., 2014; Stacey et al., 2019). However, there exists a deeper fault-line between positivist and anti-positivist intellectual traditions. This concerns the contention that those universities and ‘anti-positivist’ academics resisting a culminating national standardised assessment for teachers are effectively supporting neoliberalism. The argument proceeds thus: ITE providers rejecting accountability measures hinder the establishment of ‘respect’ for the teaching profession; these providers thereby fall prey to neoliberal arguments that university-based ITE has contributed, through inadequate rigour, to poor-quality teachers (e.g., Allard et al., 2014). Although university-based ITE in Australia operates under principles of intellectual freedom, scepticism of national teacher assessment or licensure frames academia as publicly irresponsible when school underperformance manifests; intellectual freedom as universities’ raison d’être is construed as culpable accordingly. Consequently, academia bows to state regulation, and an intrinsically anti-neoliberal position is pressed into the service of a neoliberal narrative (Allard et al., 2014; Wahl, 2017). Both ‘sides’ of this political rift, positivist and anti-positivist, agree that universities are integral to quality ITE (Wahl, 2017). This scenario persists in the US, where ‘TPA wars’ have endured, where the distinction between professionalism and performance is posed as a proxy for college versus state control of ITE, and where non-college ‘alternate pathways’ into the profession are advocated (NCTQ, 2011), wedging universities into focusing on performance of competency, potentially at odds with the holistic, critical tertiary education they provide.

This background helps explain how Australian universities and education authorities have responded to a mandatory TPA.

Unit of analysis: the intellectual and moral in education

Our experiences within a consortium of universities developing an Australian TPA convinced us we were contributing to designing an assessment that reflected teachers’ intellectual work, that is, their critical reflection on their context and the classroom-informed teaching decisions based upon it. By focusing on the centrality of pre-service teachers’ intellectual work in the AfGT, we argue that the processes of developing the assessment criteria ripened the ‘classroom-ready’ concept beyond narrow definitions of competence. Critical reflection, and ultimately reflexivity (standpoint), is a social-scientific quality and epistemology that is variously emphasised in education, and realisable through encounter, that is, through the performance of educating.

In deciding whether education is a matter of competency or of intellectual activity, and drawing on an Aristotelian principle about human action, Biesta (2009) distinguishes between competency as poiesis, that is, making-work, and praxis, the dia-logic or reflexive doing (which might include making). Praxis concerns meaning-making through doing and vice versa. It is a dialectical process; action requires thought and thought requires action (even if that action is to maintain the status quo). Praxis entails an obligation to act and so is understood as a process towards a cumulative reflexivity: phronesis, in Aristotelian terms. Praxis operates within and towards the ability to think critically and theorise, questioning ‘why’ and not only ‘how’ action should occur (Loughran et al., 2019). It is reasoning, but with the caveat that it is not simply and always a cause-and-effect deliberation by one actor (e.g., a teacher). Our dialectic of thought-action is not compartmentalised within the intellectual smelter of the individual mind. As social theory holds, praxis is socially mediated, if not socially determined, including within education (de Abreu & Elbers, 2005; Freire, 1993; Saxena, 2010; Vygotsky, [1936] 1986).

These concepts of classical philosophy are not unfamiliar to education (Kemmis, 2012; Kemmis & Smith, 2008). Neither is there disagreement that a professional should have command of agreed competencies for their field of action. The field of action, however, is crucial: learning, and therefore teaching, among other professions, has an inescapable and deliberate moral dimension (Kemmis & Smith, 2008; Menter & Tatto, 2019), and engaging with that moral dimension requires intellectual engagement and effort. If a real and moral obligation of education is to enable individuals to pursue fulfilment and participate fully in society, including through developing a reflexivity to understand any limits to personal fulfilment in that society, then praxis is what educators ought to be able to do and to teach others to do. If this is value-able in education then, following Biesta, this is what is worth assessing or measuring.

The aim thus far has been to explain and foreground the intellectual aspect of education and to form a conceptual bridge to its moral aspect. Arguments and evidence from practice abound that social complexity and unpredictability define everyday lives, sustaining claims that educators and education are charged with an inherently moral duty to assist learners, as people, to tackle that complexity (Ball & Wilson, 1996; Olssen, 2017). This holds for teacher educator academics, and for the teachers they have taught. This concern has long been defended. As Kerr (1987, p. 35) contends,

It is, then, teachers’ moral responsibility not just to introduce students to the forms of knowledge as the disciplined ways with which others inquire and structure experience but also to help all students understand the importance of making their own choices as well on the basis of disciplined beliefs and values.

Kerr introduces further complexity to teaching in terms of knowledge as ‘disciplinarity’, and as socially constructed. Competency in teaching, in this view, includes being competent in engaging critically with knowledge and developing that ‘skill’ in others. Accordingly, should a national assessment of teaching, of ‘classroom-readiness’, assess aspiring teachers on their ability to select and deploy a certain learning tool (with instructions) that evidence suggests will improve reading? Or should it assess whether the aspiring teacher applies and reflects upon a given teaching strategy and its relationship to their learners? Arguably both, through various forms of assessment. However, in terms of administering a nationally applicable ‘high-stakes’ summative assessment (Mills & Goos, 2017; Wahl, 2017), a one-time measurement of a performance as competent is contestable because of its power to define the profession. Currently in Australia there are a number of other TPAs in receipt of, or seeking, accreditation, each claiming to address and assess different components of the Australian Professional Teaching Standards. This indicates that multiple interpretations are possible of what should be valued and therefore assessed, and how.

Evidence from the field

The AfGT: making-do?

In 2017, a consortium of (then) eight Australian universities collaboratively designed a TPA. Participants in the broader research project of which this study is part included initial teacher educators, PSTs, school principals, mentor teachers, school practicum co-ordinators and institutions’ placement personnel, all of whom gave informed consent, including for their anonymised data to be used in conference presentations, publications and teaching. The work was partly funded by a competitive grant from AITSL, one of only two such grants then awarded. The notion of teaching as intellectual work, arguably the raison d’être for university-based ITE, was addressed in AfGT consortium meetings, with the guide provided to AfGT candidates stating:

The AfGT has been designed with consideration of the complex intellectual work of teaching… Elements 1–3 of the AfGT [planning, analysing and assessing] are linked to a sequence of lessons that combine to assess a pre-service teacher’s capacity to gather and use evidence to reflect on their practice (AfGT Consortium, 2018, p. 7).

A telling challenge for focusing on assessing the teacher, rather than their teaching per se, was evident in discussions of video evidence as a TPA requirement. Significant discussion emerged in early consortium meetings, and in an AfGT rubric development workshop in 2017, concerning video as a means of assessing a teaching episode rather than as an aid for the pre-service teacher to explain and analyse their planning and situational judgements. While arranging video has its own challenges for pre-service teachers and schools, corroborated by our survey data from AfGT candidates (Keamy et al., 2019), the AfGT Consortium’s intention in having pre-service teachers video their practice is to ensure they create authentic material for reflection, individually and collaboratively with their mentor teacher.

In an AfGT national workshop in November 2017, and in various consortium meetings that year, the rationale for including video took some time to be appreciated. It is one aspect that sets the AfGT apart from other TPAs that use video. Socialised or default assessment practices in ITE, based around observing teaching, were the subject of critical questioning during ‘practice marking’ rounds at an AfGT development workshop (2018) and were a regular theme in meetings. AfGT assessors would not be assessing the video teaching episode. This contrasts with the assessment of teaching activity via observation as carried out by teacher mentors and professional experience supervisors as part of their remit for assessing ITE programs’ practice component. Focus group data from academics implementing the first-round trial of the AfGT reaffirmed a stubbornly socialised view of video evidence as prima facie material for the assessor rather than the candidate (AfGT Academic Focus Group 3, 8 December 2017). Over time, and through the AfGT design process, that default view shifted. The important issue in this ‘design episode’ is the consensus-making about the unit of analysis: teachers’ awareness of their actions and decisions, rather than an observation of their time in a classroom. This is how the intellectual aspect of teaching could be framed for assessment.

Academics involved in the AfGT implementation were asked in focus group evaluations about the adequacy of the AfGT in capturing the key elements of teaching. The collective response of one focus group was that the assessment had an additional ‘educative aspect’ in enabling PSTs to evaluate their own processes of teaching as they reflected upon students’ learning. This was corroborated by AfGT candidates in another focus group:

You really had to think about it in more depth, in terms of the students’ learning, or what you were doing as a teacher…You watch and you see your students actually learning, and your teaching’s actually helping them. When you’re teaching, your mind is everywhere, and you can’t. So the video actually helps. It was good to reflect on it (Participant #4).

This response related specifically to the inclusion of video in the assessment. The insight, however, can be considered alongside the Aristotelian binary between competency (poiesis) and making meaning (praxis). The PST cited above describes an almost autonomic process of teaching (‘your mind is everywhere’) in which only re-viewing their own actions provided the opportunity to reflect more cogently. This episode also supports the importance of the AfGT measuring that reflection, and the possibilities for theorising action, as opposed to (a) performance.

A significant challenge for educators, particularly in high-stakes assessment, is ‘teaching to the test’ (Dullude et al., 2017). Mills and Goos (2017) criticise the standardisation of education that attends such testing and, by implication, its anti-intellectualism. We discern a fine line between ‘backward mapping’ of ITE programming and slavishly practising a TPA. This difference in pre-teaching for the AfGT was exemplified during the 2017 trial in one consortium member institution. In two ITE programs within this institution, lectures scheduled throughout a semester explicitly taught skills including collecting and representing data and collecting evidence to address teaching standards; students in these programs completed drafts and received extensive feedback prior to submitting the final versions of their AfGT. In a third ITE program, by contrast, students received a much briefer series of lectures that introduced and explained the expectations of AfGT components, with some illustrative examples of how both qualitative and quantitative data might be represented and used for reflection.

Academic focus group respondents (AfGT Focus Group 3, 2017) referred to the need to ensure the AfGT is sufficiently anticipated by candidates (‘backward mapping’), while some AfGT candidates suggested pedagogical techniques of making the assessment explicit (teaching to the test). As one focus group pre-service teacher commented:

I feel like if you do it as a subject within a tutorial, actually having specific time where you’re actually sitting down and working on (AfGT) template 1, and then you’re provided with…feedback like a teacher to a student (Participant #2).

The trial phase inevitably led to more explicit, structured engagement between academics and PSTs. We note that academics administering the assessment were similarly learning about the AfGT through the trial phase. Calibrating the degree of scaffolding must be part of the process for those of us concerned with an ‘inquiry’ approach to the AfGT as an ‘intervention’ (after Cochran-Smith & Lytle, 2009). Reflecting on what this implies for a pedagogy of assessment, in a professional field lacking a tidy definition of ‘classroom-readiness’, raises a tension in education concerning assessment, especially of a summative nature. Other AfGT candidates undertaking the trial reported that their ITE program had prepared them well for the AfGT, despite an absence of practice testing or backward mapping of certain AfGT elements into earlier subjects. Most PST anxiety related to gaining school permission for video recording, a logistical challenge that the trial did not fully anticipate.

One micro-feature of standardisation was critiqued by a PST focus group respondent. An element of the AfGT requires candidates to gather assessment data on the lessons planned and delivered in school:

I obviously gained the data that it was asking for, but I was kind of like…it’s not that it wasn’t engaging, it’s just that I don’t think it’s very accurate. I felt like I saw a lot more learning when they were standing there performing something. When I was asking some of the questions, I was cognisant of that as well. I was like, ‘Okay, I’m just asking questions here so I can generate data, but I don’t feel like these are the best ways to assess what they did’ (Interviewee 1).

This reflection illuminates the situationally specific context of assessment data and the intellectual engagement of the PST, who ‘felt like [they] saw a lot more learning’ when the students were undertaking something in class. It should nonetheless be incumbent upon the teacher to somehow capture or record that ‘lot more learning’ in the classroom, and especially to ponder ways of understanding that learning so as to enhance it (AfGT focus group—Part A, University 3, 2017; Buchanan et al., 2020). An academic focus group drew complementary conclusions about the trial:

It’s not good to imply that the only way you can demonstrate quality teaching is to measure it quantitatively. That is a bad message and counter to the messages they’ve been given throughout their course.

This comment reflects at least two issues: firstly, it voices one ‘side’ of the positivist/anti-positivist binary that often defines education academia, explained in an earlier section of this paper; secondly, it affirms a moral dimension to teacher education and how assessment might be understood philosophically in relation to ITE and TPAs in Australia. Reconciling these points emerges as an intellectual opportunity in the Australian TPA landscape.

One PST account suggested a positive effect of the AfGT on pedagogy and on their learners’ classroom experience:

But it was really good (pause), generally speaking, it definitely did have an impact on my students’ learning, in particular when we were asked to reflect on the different types of communication and instruction-giving. That, particularly for me (pause), I felt really strengthened in my practice, because I was then reflecting on (pause), okay, usually I might just stand here and speak to the students about this, or verbalise the instructions. But I really made an effort to double up on the level of instruction. So, I would spend a lot of time mapping out written instruction; I would also map out rubrics and stuff that I’d sent out to them or given them, so they had written instruction as well as verbal instructions. So, in that sense, it had an impact on them.

This account indicates a deeper engagement by the PST with their students’ learning during the professional experience. While the process added to the PST’s workload, the sentiment captured above shows the opportunity and its outcomes were clearly valued. The following exchange between two academics also identifies value in the TPA:

Participant 2: That’s why this thing [a TPA]’s been needed; we’ve never had the evidence before, and that’s why I think it’s a good idea, we’ve never had concrete evidence that teacher education students on prac[ticum] have ever learnt anything.

Participant 3: We’ve never insisted on that evidence.

Some form of meaningful, evidence-based assessment of classroom experience is supported in these statements. Yet these reflections indicate no prior exercise of intellectual freedom on the part of academia to change what it clearly felt was inadequate. The TPA design has offered an opening for dialogue between academics, as education assessment experts, and regulatory authorities about the Standards.

Inevitably, when attempting to measure a standard, it may be the requirement, rather than the instrument or evidence, that is problematic. Through a process of socialising and deliberating the AfGT rubrics in a design workshop, ITE academics from diverse institutions agreed that the national teaching Standards were in some cases inadequate and might be reworked and reimagined. There were signs of discontent with external regulations:

If it’s a good measure of practice and it doesn’t measure all standards maybe it’s the standards that needed to be reviewed, but that’s not going to happen…we’re not going to have a chance to do that.

Arguably, this academic in the AfGT trial advocates not just a long-overdue re-vision of the Standards themselves, but a recognition that any validation of the assessment is both technical and social, unfolding over time, and that, in principle, the Standards too should be seen as socially constructed and mediated, open to change as knowledge changes and as evidence of their merits and shortcomings mounts. To deny this is to accept, as a self-fulfilling prophecy, a decline in academic independence.

Case study

Participation in the AfGT consortium: Anita

As a teacher educator in a regional university, I have found the unique opportunity to co-construct a high-stakes summative assessment task with colleagues from other universities challenging, stimulating, and rewarding. I often use narrative inquiry to reflect on and examine my practice. As the coordinator of an ITE program featuring rich, embedded school partnership initiatives, I am concerned about increasing assessment- and curriculum-related constraints as we demonstrate to accreditation authorities our alignment with the Standards. I therefore warily undertook involvement in a TPA design that could further limit academic freedoms, change or outmode the evidence-based placement assessment my university has designed with school partners, and constrict the perception of an ‘ideal teacher’ (Mills & Goos, 2017, p. 638).

In this snapshot, a critical incident (Flanagan, 1954) illuminates the implications of intimate involvement in AfGT design and early implementation; being an insider in a collaborative design process, and participating in ‘formative moderation’ with colleagues from other universities, involves difficult intellectual work that is critical, creative and contemplative (Giroux, 1985).

Anita’s failure or…?

The AfGT standard-setting meetings, where members of the consortium hunker down over multiple days to develop agreements about levels of achievement and cut scores, are much more than technical operations. These meetings were my first opportunity to read student work from other universities, and to judge my own students and teaching against work done elsewhere. In examining and assessing PSTs’ responses in such contexts, we seek logic and coherence, (apparent) fairness and objectivity (Wildy, 2004), while also weighing moral responsibilities. Accordingly, when reading my own students’ work, which I recognised easily, I had difficulty disregarding their backstories and teaching contexts. This made me wonder, as I read, about the circumstances of those PSTs unknown to me. Did their responses echo difficult contextual challenges smoothed by word limits or slick game-playing? One of my own PSTs, Anita, was judged not to have met the AfGT standard, which unsettled me. Previously, she had been assessed numerous times, by many assessors, to be a worthy teacher. What had faltered?

The PST had secured an ongoing teaching position in the school where the final placement (and the AfGT) occurred. She received High Distinctions on reports completed by two school mentors, which included an in-school Round Table Feedback Conference involving her university mentor, school mentors, a school leader, and a fellow PST. Her university course results overall included a mix of High Distinctions and Distinctions, and her first placement also gained a High Distinction; indeed, she attained the highest score of her cohort for that first placement. In the final placement, PSTs also completed an extended practitioner inquiry, with a pedagogical focus selected by the PST and a similar emphasis on the collection and analysis of documentation to improve teaching and student learning. The task aims to build teacher agency, critical reflection on practice, theory/practice connections, dispositions that foster ongoing professional learning, and a commitment to improvement-oriented praxis (Lytle, 2008). This task took place prior to the AfGT and, because it requires PSTs to collaborate within the school community and share new learning, we (and our school partners) value its capacity for impact beyond the PST. Anita attained a Distinction.

What went wrong? Did the tool need further refinement? During that two-day meeting, our joint examination of a range of work samples and the ensuing dialogue indicated some changes were needed. The incident also made me critically reflect on our teaching practices and whether we had adequately prepared PSTs for the AfGT. I also wondered about the student’s decision-making during the task. A focus group interview including the PST (prior to the standard-setting workshop) indicated two possible causes. The first was that Anita devalued the task in comparison with other teaching and learning experiences during the placement:

I feel as though I have made a lot more learning and progress from the practitioner inquiry. It was exactly the same thing: teaching, gathering data, reflecting on that data, implementing new strategies, knowing your kids and knowing what you are doing. I feel like I learnt so much more from the practitioner inquiry…I felt I had more freedom to reflect on the mistakes I have made and the impact of my teaching …

For Anita, the ‘artificialness of the task’, and her framing of it as ‘another standardised test’, meant that she undervalued the experience. She prioritised the practitioner inquiry as more authentic. She also devalued the AfGT because, while it required a pass, it carried less weighting than other tasks.

Second, Anita suggested that feeling scrutinised (‘they’re watching me, they’re watching me, they’re watching me’) negatively impacted her ‘performance’, rendering her teaching less ‘courageous’ and ‘natural’. ‘You want to get across that you are a good teacher’, she said, but the perceived pressure to demonstrate this in a slice of video impeded her capacity to reflect honestly.

My best examples and experiences happened outside of that lesson plan window or may not have been the things you can video tape…when you are trying to present, for me with the videotaping there were a lot of barriers…I know that I did a lot of reflection while I was on placement, but none of it would be through videotaping myself and watching it.

Anita found viewing herself disconcerting. She had recently undergone treatment for a brain tumour, and during the placement wore a wig. She talked at interview about being unable to look beyond her appearance when viewing the footage. While her circumstances are, mercifully, rare, her honest reflections (absent from her AfGT response) made me contemplate the complexity of asking PSTs, in such diverse situations and within required word limits, to reflect in a high-stakes assessment on the complexities of teaching and its impact.

Anita’s disengagement from the task raised issues for my colleagues and me in contemplating the stakes in this task and our role in ensuring its successful completion. In these early days of implementation, we grappled with numerous dilemmas and significant questions. Given that we grade all other assessment tasks in our program and that this task takes considerable time, what weighting should we allocate to the AfGT? As soon as we give it hefty weight, we downplay other tasks and create undue pressure on PSTs already learning in demanding circumstances. How can we ensure that PSTs understand the relationship between tasks so that they strategically learn through connection-making? How much do we ‘backward map’ AfGT elements into our courses? Do our models, scaffolds and resources unduly influence responses and lead to something formulaic and contrived rather than authentic, creative and critically reflective? Which educational contexts enable a ‘better’ AfGT outcome? Are theory/practice connections conceptualised in the AfGT manufactured and too focused on fashionable approaches and researchers? How can we ensure that the video is used as the basis for meaningful critical reflection as opposed to technical performances? How could the task operate differently? Are we creating a process that under-problematises teaching and learning interactions? We grapple with such questions in developing creative and purposeful program approaches. I cherish my insider TPA involvement in this difficult, sometimes compromising intellectual work with colleagues from diverse institutions.

The AfGT consortium has constantly pursued an authentic, meaningful assessment tool for teaching and learning, with positive impact on ITE. Through ongoing dialogue and research with consortium members, we continue to pursue this goal. Informal and formal moderation occurs across universities, providing stimulus for inquiry and the sharing of good practice; moderation here is formative, facilitating the rethinking and reshaping of knowledge and practice. Collaborative AfGT research is another key aspect of the Consortium’s work, and we aim to track over time the impact of the assessment on ITE, newly graduated teachers, and schools. In an educational context where standardisation, measurement and comparisons profoundly impact teaching practice at all levels, we commit to maintaining ongoing, rigorous discussion about ‘good’ education (Biesta, 2010). While many of us share concerns about the multiple compliance measures that we and our students endure, I am heartened by this experience, where question-posing and decision-making are informed by us as field-workers.

Discussion and conclusion

Learning directives and policies mandated for universities raise questions about the role and function of academe, and about academic and intellectual integrity. At stake here are control over curriculum, standardisation, teaching to the test and a narrowing interpretation of teaching and teacher competencies (Mills & Goos, 2017). This is not to suggest a defensive call to arms. The TPA is but a snapshot of the pre-service teacher upon graduation, and internal pre-service assessment can ascertain higher-order capacities such as critical thinking. Nevertheless, it should trigger questions about the extent to which a policy or directive constitutes governance over a discipline (its theory and praxis).

Our research suggests that academics’ key concern must continue to be supporting pre-service and graduate teachers’ intellectual development and work (capacities for critical reflection, and research- and data-literacy) over performance as independently measurable. In the case of teaching, this is often reduced to a tension between the usefulness of theory (knowledge) and action. For academics and their own profession as independent intellectuals, the field differs but the tension remains. At heart here (in two senses?) is the viability of certain socialising processes for mediating accountability and governmentality over academic affairs, in this case the interpretation of industry and government expectations in designing an assessment tool to align with national standards. As our autoethnographic case study highlights, there is a tension in what is valued or elided in decisions about measurement and assessment. It is incumbent upon academia to articulate this where it may otherwise be overlooked by mandates based upon other needs.

The AfGT incorporates rubrics to measure teachers’ higher-order skills and intellectual capacities, and their demonstrated knowledge of planning and assessment theory and research. Reflection-on-practice is central to the AfGT. This accords with the key message throughout the Australian Council of Deans of Education (ACDE) submission to TEMAG: that teachers must ‘think critically’, ‘reflect’ and make decisions ‘based on research and evidence’ (ACDE, 2014). This was also the premise of our Consortium’s successful submission to AITSL (Stacey et al., 2019). We demonstrate a further commitment to this in reminding readers of the importance of a philosophy of education and, concretely, in the case study component.

We have argued that a validated measurement of critical and reflective practice can be developed and applied without necessarily constraining curriculum or academic independence, as shown in our two data sets: the academic and PST focus groups from the trial’s evaluation, and reflection on the AfGT’s incorporation into degree structures. The challenge for education academics is to recognise and articulate these dimensions, and the multiple disciplinary contexts of their own practice, to inform the policy discourse around measurement and teaching.

The broader standardisation argument that surrounds TPAs requires monitoring: the AITSL teaching standards, ITE accreditation and school systems are all forms of educational standardisation. ITE academics, through the design process, have exercised, and must continue to exercise, their role of engaging intellectually with systematised teaching and assessment practices and their unintended consequences. Designing an assessment of teachers’ intellectual work through their reflection on practice, of their praxis over their poiesis, is one way to mitigate a regression to the ‘standard’. We have argued that the ability to become analytical practitioners is what an assessment of teaching should strive to assess, demonstrating graduates’ developmental capacity to be effective anywhere.

There remains a need for critical thinking among ITE academics collaborating cross-institutionally on TPA design: thinking that reflects on meaningful measures of PST performance while interrogating the validity of any agreed measure.