The Governing–Evaluation–Knowledge Nexus: How Evaluation Makes Knowledge Work for Governing

In this final chapter, we highlight some of our observations on evaluation and quality assurance (EQA) in Swedish higher education. Our explorations have been guided by an interest in how governing, knowledge, and evaluation are bound together, as denoted by the term “nexus”. In this work, we had a particular interest in studying how the features of evaluation as a social practice make knowledge work for governing, involving different actors, levels, and perspectives nationally and internationally. The broader context of the book is concerned with new forms of epistemic governance of higher education. Our starting point was “that the production of knowledge does not just belong to scientists: it is distributed among heterogeneous experts with a central position to give advice and to guide policymakers” (Normand 2016, p. 129). We acknowledge the role of knowledge as an instrument of power and legitimisation involving language, norms, decisions, beliefs, aspirations, negotiations, and actions that emerge from various actors and institutions (Normand 2016). Throughout the book, we tried to “push theorizing to a thicker description of how actors are socially embedded and how they employ (even if implicitly) that position in influencing others” (Alasuutari and Qadir 2014, p. 71) on the basis of particular conceptions, understandings, and forms of knowledge. Our ambition here is to pinpoint the particularity of evaluation in higher education as a social practice that makes knowledge work for governing. In the book, we provided examples of how evaluation produces “constitutive effects” through forms of knowledge (Dahler-Larsen 2015, p. 24). In the words of Dahler-Larsen (2015, p. 24), “the constitutive effects of evaluation extend to how we know, to our sources of knowledge” as “people change their interpretations and their actions as a result of… knowledge (officially regarded as knowledge or not) which are touched upon or enrolled by evaluation regimes”. What is special about evaluation in higher education is that it includes work on preparing, making, receiving, and acting on judgements of the quality of intangible phenomena, such as processes, responsibilities, routines, competences, relations, and support. Thereby, evaluation constitutes a certain form of epistemic governance that presupposes, uses, and produces certain forms of knowledge about a present condition that is extrapolated into a desired future. In this sense, the governing–knowledge–evaluation nexus is bound together by a utopian dimension in which evaluation is used to generate, promote, and mediate certain forms of knowledge to pledge (constant) change in ways that fit well with governing ambitions.

In the first chapters, we outlined the historical development of national EQA in Swedish higher education and situated this intensified political striving within an international and primarily European context. The politics of (international) comparison, the perceived political need to govern the design and expansion of EQA, and the taken-for-granted status of EQA as a part of higher education governing were features identified and emphasised in the initial chapters; together, they illustrate the complexities of the interactions between governing, knowledge, and evaluation. We also illustrated that the EQA policy’s activities, processes, and the pace at which they are produced are important when unpacking subsequent enactments and governing work.

We documented the manifestation of these evaluative activities in terms of evolving “evaluation machines” (Dahler-Larsen 2012). In this context, we pointed to the important role of the work of particular actors, such as vice chancellors, interest groups and organisations, the media, policy professionals, intermediaries, and “qualocrats” (see below), who, via evaluation, become situated as enforcers, enablers, and governors. As our account became more empirically fine-grained towards the end of the book, these overall findings were analysed in more detail and with an explicit intention to be empirically exhaustive. In these analyses, we have shown how internal and external evaluation and quality assurance policies and practices are created, merged, and enacted (i.e. “worked”), and we pointed to the numerous activities and the amount of labour that go into these processes. We also discussed how the forms and qualities of “knowledge” are promoted or advanced by this work and how knowledge transforms between phases through various kinds of action (Freeman and Sturdy 2015). Doing EQA thus entails various forms of translations and results in a continuous expansion of evaluative work and activities. We label this emerging process an evaluation machinery.

This final chapter is divided into two main sections. First, we continue the above discussion, initially by revisiting the ideal-typical notion of an increasingly institutionalised evaluation machinery and by looking into how its engineers and main operators, the qualocrats, can be understood. We also discuss the art of making judgements, a central aspect of “binding together” forms of knowledge to make evaluation work. In the second section, we discuss the expansion in terms of the increasing complexity of EQA work in higher education and recognise some possible implications and problems in terms of resources and sustainability.

One important argument in the first section is that there are different ways in which evaluation makes knowledge work for governing. The first way corresponds with “conventional” interpretations of epistemic governance, or soft governance. Here, power and knowledge operate through certain automated technologies that trigger and stimulate behaviours. The way comparisons are introduced in the evaluation machinery offers one such example. Comparisons – and the very notion of comparability, i.e. the assumption that comparisons between HEIs and/or programmes, by the state and/or by students, are possible – automatically initiate new behavioural protocols based on informed work within HEIs. The second way evaluation makes knowledge work for governing is located in the concrete and often meticulous daily work and interactions of actors to make evaluation work. As demonstrated in previous chapters, such work involves different forms or phases of knowledge, like inscribed knowledge. It also involves work to produce, or act on the basis of, texts such as policies, evaluation criteria, or self-evaluations. The work may also involve meetings where language and embodied forms of knowledge are enacted to process, transmit, or produce new knowledge. This work includes numerous translations and mediations that may cause misunderstandings, inconsistencies, or problems traditionally described in terms of validity and reliability. On a somewhat different level, these laborious activities may do little to promote organisational improvement. After a demanding external evaluation, actors within HEIs may conclude that they did not learn anything about “the quality” of their work or their organisation that they did not already know. Alternatively, evaluands may find the evaluation report inaccurate, obscure, or otherwise difficult to use for purposes of organisational change. Thus, programmatic (normative) and technological (operational) elements are not always perfectly aligned (Power 1997), implying that situations occur when evaluations do not work according to official transcripts or ideals. Our point here is that, even under such conditions, evaluative activities still draw actors’ attention to policies and consolidate actions related to policy in particular ways. Hence, regardless of whether evaluation actually “works”, the processes initiated by evaluation can still make knowledge work for governing, albeit not exactly as the “evaluation machine engineers” may have intended.

The Emerging Evaluation Machinery

The previous chapters have documented the rise of Swedish EQA in higher education. Up until rather recently in the history of higher education, anything of its kind simply did not exist. In fact, just a few decades ago, any intrusion by external agencies into HEIs, their inner workings, and their freedom was unthinkable. As a reminder of the pre-EQA zeitgeist, the Robbins Report on higher education, an important post-war policy document in the UK, offers a good example. In this report, it was emphasised that:

such freedom is a necessary condition of the highest efficiency and the proper progress of academic institutions, and that encroachments upon their liberty, in the supposed interests of greater efficiency, would in fact diminish their efficiency and stultify their development. (Committee on Higher Education 1963, p. 228)

Today, on the other hand, the national evaluation machinery is a permanent, institutionalised feature in HEIs, with implications for language, culture, ideas, practices, and artefacts produced by these institutions. The permanence of this machinery and the almost naturalised status of its basic and underlying principles have evolved over time in accordance with the rise of what has been described as the making of the evaluation society (Dahler-Larsen 2012), the audit society (Power 1997), the evaluative state (Neave 1998), and audit cultures (Strathern 2000). The general process by which such permanence is socially established has been the focus of much social theory. As noted by Bourdieu (1977, p. 164), “[e]very established order tends to produce (to very different degrees and with very different means) the naturalisation of its own arbitrariness”. In this particular case, the often taken-for-granted belief in ideas about fundamentally uncertain and fragile concepts, such as “quality” and “continuous improvement”, is especially fascinating. Travers (2007) claims that quality assurance (QA) is simultaneously pervasive and diffuse. This means, for example, that any consumer of EQA discourse will struggle to work out the actual meaning of the concept of quality. In the words of Warzecha (2017, p. 11), “the extensive explanations and notes that often accompany this term support the impression that something is defined that in fact remains completely in the dark”. At the same time, quality has established itself as “the cross-boundary norm against which all areas of HEIs ought to strive” (Schoug 2006, p. 65). One of many possible explanations of the pervasive naturalisation of “qualispeak” is that it is fused with academic values, such as transparency and accountability (Lorenz 2012, p. 625; see Shore 2008). As stated by Shore (2008, p. 291):

This may also be part of the reason why audit culture is so difficult to contest; the university environment has become so steeped in managerial principles and practices that it is difficult to find that Archimedean point outside of the system that enables us to critique it.

Technologies of and in the Machinery

In this book, we have demonstrated the role played by various technologies within the evaluation machinery. In this context, technologies are conceptualised not as material artefacts, such as communication technology or other tools, but, in a Foucauldian sense, as methods and procedures for governing human beings. We discern how technologies, such as visibility, comparability, standardisation, economic incentives, and rewards and sanctions, have become embedded in the national machinery over time. We can also discern that the techniques themselves have mutated over time.

The technology of visibility is one example of how such change is introduced for knowledge to work for governing. Visibility offers means to reduce complexity through abstraction, categorisation, and data concentration (Dahler-Larsen 2012). For instance, visibility was originally an effect of state attempts to “scan” (Gröjer 2004, p. 64) all universities, that is, to make them and their inner workings visible by producing comprehensive knowledge about the sector. This line of reasoning is congruent with Scott’s (1998) ideas on state vision and the large-scale twentieth-century state measures that sought to improve society and its institutions by simplifying local practices and making them legible. Simplification and legibility are associated with “a synoptic view of a selective reality” that imposes a certain “logic on the very reality that [is] observed” (Scott 1998, pp. 11–14).

Over time, visibility has transformed from a kind of “singular technology”, operating through the state-centric gaze upon individual institutions, into a “polycentric technology” of permanent visibility. The state agency has continued its observations, but higher education institutions and actors have made themselves more and more visible not only to the state but to themselves and other institutions, prospective students, stakeholders, and the general public. As noted by Foucault (1977) in his studies on the concept of the panopticon, visibility is an anonymous power that produces efficiency, responsibility, and discipline. The potential of visibility in this respect has been part of management thinking for a long time. As evidenced by the Hawthorne experiments beginning in 1924, the observer’s effect on behaviour was used to increase industrial productivity. Visibility as a technology is thus based on general knowledge, as perfected by Mayo (1946) and proponents of the Human Relations School (the forerunner to total quality management, TQM), namely, that people behave “better” when under observation than when left to operate on their own. Interestingly for us, visibility within the evaluation machinery has also been refined along these lines, as it has become a part of monitoring and follow-up systems that are increasingly taken for granted. For instance, recent developments illustrated in the previous chapters, in which EQA has incorporated enhancement ideas emphasising the value of anchoring, dialogue, and involvement, can indeed be discussed in relation to lessons drawn from Mayo (1946). Mayo linked governing style and morale levels to productivity levels, pointing to the need to see and listen to employees and show interest in their working conditions to increase motivation, even if conditions did not change. Such a case history complements our understanding of the rationale behind attempts to include all aspects and divisions of HEI organisations in continuous, expanding attempts to improve quality.

The Importance of Human Interaction

Indeed, technologies are examples of the “automated character” (Dahler-Larsen 2012, p. 180) of the evaluation machinery. Even so, we have also shown how embodiment is crucial for the construction and operation of the machinery. In this sense, our observations could be used to reconsider the line of reasoning provided by Dahler-Larsen (2012, p. 180), who emphasises how formalisation reduces “the need for expertise, wisdom, and educated staff”. Our data have shown that formalisation in the context of Swedish national EQA – in the form of detailed frameworks and guidelines intended to increase homogeneity and comparability – requires substantial human interpretation and translation. In this context, it is interesting to recall Wittgenstein’s reminder that following rules is a social practice and that we need tacit forms of knowledge even to follow the simplest rules: “following a rule is [in fact] not like the operations of a machine” (Wittgenstein quoted in Taylor 1995, p. 168). According to Wittgenstein, rules are followed on the basis of an understanding that “is always against a background of what is taken for granted, just relied on” (Taylor 1995, p. 167). The operation of an evaluation machine demands such forms of knowledge. Then again, the scripted nature of the machine and the demands it places on representation distinguish it from more everyday social practices where rules are primarily followed on the basis of automated social understanding or “habitus” (Bourdieu 1977). Our data show that frameworks and guidelines do not contain the principles of their own application. In our interviews, we have asked our informants how they perceive concepts that are frequently used in the context of the evaluation machinery (such as “ensure” [säkerställa] or “quality”). We have recurrently observed how our informants become uncertain when asked these questions. In this sense, policy demands for homogeneity and comparability, which are based on these ambiguous concepts, seem to feed insecurity and the need for collective labour. Face-to-face meetings and negotiations between actors are often necessary for the machines to work. When actors do not trust their own interpretations, they must meet with one another to help each other “learn” to understand rules and “make sure” that they have achieved a mutual understanding of the rules. In this sense, physical meetings are crucial arenas for such interaction and work (Freeman 2008).

The importance of human interaction is a central finding of our project. As outlined above, technologies would – at least in theory – result in evaluation machines distinguished by an automated character with a reduced need for human intervention and expertise. Such a version of evaluation machines would resolve problems related to coordinating dispersed knowledge following Hayekian ideals (Hayek 1945; see the chapter “Hayek and the Red Tape: The Politics of Evaluation and Quality Assurance Reform – From Shortcut Governing to Policy Rerouting”) and ultimately result in an anonymous, self-governing disciplinary power working in accordance with ideas presented by Foucault (1977). A possible way to understand why this has not been the case is to consider the difference between complex and complicated systems and the implications of this difference for higher education governance and evaluation. According to Glouberman and Zimmerman’s (2002) distinction, complicated problems (like sending a rocket to the moon) are intrinsically different from complex problems (like raising a child or managing a health system). Whereas complicated systems are planned, predictable, and engineered, complex systems are unique, relational, and uncertain because they consist of multiple interconnected components or actors that change adaptively (Glouberman and Zimmerman 2002). This rough ideal-typical distinction is informative because it draws attention to how actors and their forms of communication are imperative for knowledge to work for governing in the realm of the evaluation machinery.

We have noted that the evaluation machinery of Swedish higher education is simultaneously a complicated system and a complex system. As a complex system, it draws on technologies that make things happen seemingly without premeditated orchestration. However, it is also a complicated system in the sense that the scale of system design and the requirements for coordination and interpretation of components by specialised experts are significant. This means that the more concrete acts of making things visible or performing comparisons are far from spontaneous or straightforward. As argued by Mowles (2014, p. 163), the above distinction between complicated and complex systems fails to acknowledge that problems of the former kind – presumably instrumental and straightforward – always comprise “widespread mutual adaptation and improvisation, disagreements, lacunae, the unexpected and the contingent”. Internal and external forms of evaluation involve taking processes of management and teaching apart and analysing the details. It is a highly engineered and detailed system that requires evidence, planning, management, role definitions, and alignment. It also requires labour-intensive human interactions in the form of meetings whose purpose is to identify and eliminate errors caused by the differentiation and fragmentation built into the machinery. Process elements in EQA may be forgotten, described incorrectly, evaluated incorrectly, put in the wrong order, or separated or joined in the wrong places. Unnecessary process elements may be added, or mutual dependencies of process elements may be incorrectly described (Warzecha 2017). This work is inevitably done by actors, and in our project, we have identified particular forms of work in which actors – namely, the qualocrats – are situated as both engineers and operators of the evaluation machinery. Next, we turn to these actors and their background, work, and knowledge.

The Work of Qualocrats

We envision qualocrats as powerful actors in a form of government or rule (from the Greek kratia and kratos) in the name of quality and quality improvement. This group of actors embraces many traditional bureaucratic values and ethics, but its power is concentrated not in administrative bureaus but in webs of organisations, institutions, policies, practices, and work related to QA and evaluation. Along these lines, Travers (2007, p. 11) has discussed the rapid development of QA as a new “specialist occupation” in society. Travers (2007) traces the origins of the quality movement to twentieth-century ideas about TQM in Japanese and American industry. These ideas eventually made their way into the public sector as a form of “reinvention of government” (Travers 2007, p. 15). Qualocrats belong to this wider tradition, but they are not guided by any mission statement about introducing management ideas from private industry into the public sector. Overall, qualocrats do not constitute a homogeneous group, and they come from somewhat different occupational backgrounds. We thus want to emphasise the fluidity that marks this community of actors. A small fraction may work full-time with quality issues as quality managers in HEIs or as vice chancellors, whereas others work part-time. Directors of studies or teachers within HEIs may be “interpellated” as qualocrats and carry out qualocrat work (e.g. prepare and assemble information or answer questions) during specific evaluations. Some are asked – and agree – to work as external reviewers, and others function as quality experts who are consulted by HEIs. Notably, most qualocrats are trained researchers, and some manage to pursue their scholarly careers parallel to their work as qualocrats.

Some qualocrats in our study have followed a short career path within management and have specialised in issues of QA through national and international networks and organisations. Through deliberate attempts during recent decades – such as the Bologna Process – to encourage the involvement of students as stakeholders in governance structures, QA issues now offer a career path for students within student unions and the member agencies of the European Association for Quality Assurance in Higher Education (ENQA), both nationally and internationally. Other qualocrats have long backgrounds within the HEI sector and the evaluation field. This group can use their experience to bridge “old” knowledge with evolving evaluative ideas and systems in HEIs that must adhere to new societal and economic demands. Such bridging is enabled by contemporary outlooks on QA rooted in ideas that have long traditions outside the HE sector.

One way to initially locate the stature, knowledge, and work of qualocrats is to situate this group of actors within a wider “professional landscape” (Brante 2013). Although their work is strongly linked to TQM and the Swedish subject Kvalitetsteknik [Quality technology, Quality management, or QA], their educational background is diverse, and there is no common doctoral programme anchoring them within the HE sector. Qualocrats are interdisciplinary, which means that “there is no robust, systematic, generally recognised, shared paradigm that unites practices” (Brante 2013, p. 6). Hence, under the surface of the seemingly coherent public manifestations of qualocrats, there may be competition and struggles over “jurisdiction and the basic doxa” (Brante 2013, p. 6). Just like semi-professionals, such as social workers and teachers, qualocrats have expanded in sync with new layers of bureaucracy. To a great extent, they are governed by regulations and policy – but on the other hand, they have also established themselves as policy actors who influence both the making and the enactment of EQA policy. As a pre-professional group, qualocrats have managed to create a niche for themselves in the wake of transformations within HEIs as “new modes of rationality in the public sector (such as New Public Management) has generated demands for various types of leadership experts, management consultants, specialist consultants” (Brante 2013, p. 8).

The emergence of new groups of actors in this field, such as EQA experts, consultants, officers, coordinators, vice chancellors, and vice deans, has been noted elsewhere. One strand of research classifies these actors as members of the growing management stratum within HEIs, institutions that have been profoundly transformed by New Public Management and marketisation (Hall 2012; Lorenz 2012; Alvesson and Spicer 2016; Graeber 2018a, b). In pejorative terms, Graeber (2018b, not paginated) writes about so-called taskmasters who create extra work for academic staff who find themselves “spending less and less time studying, teaching, and writing about things, and more and more time measuring, assessing, discussing, and quantifying the way in which they study, teach, and write about things”.

This criticism aside, most qualocrats in our study are defenders of general Humboldtian ideals in terms of academic freedom, and they see themselves as protectors of collegial norms and autonomy. Nevertheless, the qualocrats’ position is far from easy, and they can be seen as relays between two narratives (Jarvis 2014; see also the chapter “Hayek and the Red Tape: The Politics of Evaluation and Quality Assurance Reform – From Shortcut Governing to Policy Rerouting”): on the one hand, they advocate traditional academic ideals and virtues, but on the other hand, they work to establish “regulatory regimes that seek to manage, steer and control the sector in ways that serve the interests of the state and the economy by applying specific ideational motifs about efficiency, value, performance, and thus the economic worth of the university to the economy” (Jarvis 2014, p. 156).

As we have indicated in the book (see the chapter “National Evaluation Systems”), the early history of the national evaluation machinery shows that the formation of qualocrats as a distinct group could have taken an alternative route. In the 1970s, early prototypes of the evaluation machines were engineered by scholars from the educational sciences who set up prospects for the future in terms of system evaluations that included information on “forms of teaching, teaching methods, students’ pre-conditions, study environment, study habits, teachers’ working conditions” (Gröjer 2004, p. 61). As it turned out, evaluation machines transmuted over time and made such pedagogical queries obsolete: There was no official interest in these aspects of social life in HEIs. In addition, the machine operators morphed into the current heterogeneous body of actors.

What, then, do qualocrats do? Shore and Wright (2000) identified such experts within the new political technologies in higher education almost 20 years ago and argued that they fulfil four main roles:

First, they develop a new expert knowledge and a discourse which create the classifications for a new framework or template of norms, a normative grid for the measurement and regulation of individual and organizational performance. Second, their grid and expertise are used for the design of institutional procedures for setting targets and assessing achievements. Third, certain of these experts staff and manage the new regulatory mechanisms and systems, and judge levels of compliance or deviance. Fourth, they have a therapeutic and redeeming role: they tutor individuals in the art of self-improvement and steer them towards desired norms. (Shore and Wright 2000, p. 62)

Qualocrats, as we have come to know them in our project, correspond well with this characterisation, but there are some additional comments to be made. First, we acknowledge the dual function of qualocrats as engineers and operators of the evaluation machinery. We have noted in the book how boundaries between evaluators and those subjected to evaluation have been gradually dissolved by new modes of deliberation and influence over forms and content in national EQA. As a consequence, qualocrats – who play a central role in such processes – acquire knowledge that can be “brought home” and used in local organisations (for instance, in HEIs). Such knowledge may be manifested in strategies to organise responses to specific EQA. One example in the data, mentioned in the chapter “Re-launching National Evaluation and Quality Assurance: Governing by Piloting”, was the establishment of a “liaison central” during site visits that served to coordinate the actors who took part in interviews, their responses, and their emotional reactions. Such modes of qualocrat-led professionalisation of internal strategies for dealing with external evaluation can determine the formal outcome of the entire exercise – it can be the difference between a pass and a fail grade. Second, the role of qualocrats is also that of mediators and translators, as their work has become increasingly regulated by inscription in frameworks and guidelines. We also see a development regarding the fourth role, steering individuals towards “desired norms” (see above). Particularly in the period after the much debated 2011–2014 national EQA system, qualocrats functioned as experts in facilitating “compensatory legitimation” and dealt with problems of distrust on the national and local level among actors in HEIs in attempts to “socialise people into certain attitudes and dispositions towards authority, performance, cooperation” (Weiler 1983, p. 273; see also Neave 1998, pp. 270–271). Finally, as we have argued above, the expertise of qualocrats and the discretion they exercise embody sets of expectations and beliefs that tend to be dissimulated as a matter of “technical” procedures or ideologically and theoretically innocent strivings for quality. In the next section, we explore this issue further by turning to the centre of evaluative work – judgement making.

The Burden of Judgement

The previous accounts of EQA in this book have recurrently touched upon the work of judgement making. As we pointed out in the first chapter, judgement is an inherent feature of evaluation in which certain knowledge becomes activated and thereby used for governing. Parts of this judgement work are done by qualocrats, and some external assessors could very well be labelled qualocrats. However, all external assessors are “charged with making decisions subject to standards set by a particular authority” (Dworkin 1978, pp. 31–32 in Molander 2011, p. 321). In our study, the “authority” is the state, as represented by the Swedish Higher Education Authority (SHEA), and the “standards” are materialised in the inscribed knowledge laid down in, for example, the designs of the national EQA systems and the instructions and guidelines for the training of assessors. These standards, or conditional frames, constitute an inscribed “discretionary space” (Wallander and Molander 2014, p. 2) for the external assessors.

One of the conditions that frame assessors’ work, their discretionary space within the machinery, was the invention of a cut-off score in the 2001–2007 EQA system (see the chapter “National Evaluation Systems”). Linked to rewards and sanctions, cut-off scores condition assessors’ judgement making. The implications of cut-off scores are known from debates related to high-stakes testing, and in congruence with these discussions, we have seen how cut-off scores have been accompanied by sanctions, penalties, funding reductions, and negative publicity in the higher education sector. In addition, cut-off scores have resulted in heated discussions when they are perceived to be incomplete, flawed, or unfair, because the consequences are based on judgements made by a relatively small number of assessors. Thus, judgement making involving cut-off scores tends to pull those subjected to evaluation and other audiences into this social practice in dramatic ways. Cut-off scores make it possible to judge HEIs’ internal QA systems as “failed”, meaning that the HEIs must take measures, be reassessed, and hopefully be judged in a second round as attaining the minimum standard for approval.

Another interrelated condition consists of the detailed manuals produced for the 2016 national EQA system, which were aimed at standardising judgements. In this work, the SHEA strove to direct external assessors’ judgements in particular ways by specifying a large number of perspectives, aspects, and indicators. This SHEA work was done to narrow the discretionary space for judgements across different assessment panels. In addition, in this EQA system, all perspectives, aspects, and indicators must be assessed as good enough for the HEIs to be judged with a pass grade. The external assessors are not allowed to let a strong result for one indicator compensate for a weak assessment result on another indicator. Only if all conditions are found to be satisfactory should the assessors assign the HEI or educational programme in question a pass. Finally, the 2016 EQA design prescribes that the external assessors must reach a unified judgement. They are not allowed to attach notes saying the decision was not unanimous.
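As an illustration of this non-compensatory logic, the following minimal sketch in Python expresses the rule that a pass requires every indicator to be judged satisfactory. The indicator names and the two-level grading scale are hypothetical simplifications introduced for this example; the actual SHEA instructions differentiate far more perspectives, aspects, and indicators.

```python
from enum import Enum


class Grade(Enum):
    SATISFACTORY = "satisfactory"
    NOT_SATISFACTORY = "not satisfactory"


def overall_judgement(indicator_grades):
    """Non-compensatory (conjunctive) rule: a pass requires every
    indicator to be judged satisfactory; a strong result on one
    indicator cannot offset a weak result on another."""
    if all(grade is Grade.SATISFACTORY for grade in indicator_grades.values()):
        return "pass"
    return "fail"  # the HEI must take measures and be reassessed


# Hypothetical indicators, for illustration only.
print(overall_judgement({
    "governance and organisation": Grade.SATISFACTORY,
    "student perspective": Grade.SATISFACTORY,
    "working life and collaboration": Grade.NOT_SATISFACTORY,
}))  # prints "fail", despite two satisfactory indicators
```

A compensatory rule would instead aggregate grades, for example by averaging numerical scores so that strengths could offset weaknesses – precisely the kind of reasoning the 2016 design rules out.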

This way of defining external assessors’ discretionary space expresses a means–ends relationship or an instrumental norm (Molander 2011; see also Wallander and Molander 2014). In the present case, the discretionary space becomes reduced, in line with the SHEA’s ambition to enhance comparability across panels and HEIs, a process framed as ensuring equivalent and fair judgements. In spite of these ambitions, the external assessors had to rely on “discretionary reasoning” (Wallander and Molander 2014, p. 2) to reach a final, unified judgement. They had to engage in deliberations about the circumstances and traits of an individual case to come to a decision (Molander 2011, p. 330). We find that discretionary reasoning is an appropriate expression for characterising the external assessors’ work of using the various indicators set up by the SHEA to come to a decision on how to grade individual HEIs. Leaning on Rawls (1993), Molander (2011) discusses the premises of discretion and discretionary reasoning and points to the many difficulties that such reasoning often entails: Even if professionals “reason as carefully and conscientiously as possible they may arrive at different conclusions” (Molander 2011, p. 330). In this work, the external assessors’ embodied knowledge is enacted in the form of previous experiences from similar work, something valued by the SHEA in their recruitment of assessors. Despite explicit formulas of inscribed knowledge in guidelines and similar documents, informed judgements still require discretionary reasoning, and such reasoning may end up in “sensible disagreement” (ibid.). Borrowing Rawls’ expression “the burden of judgement” (Rawls 1993, Lecture 2, §2), Molander argues that the burden of discretion implies that consensus cannot be expected in certain areas by sensible persons (Molander 2011, p. 330). Relevant facts can be complex and contradictory, there may be disagreements about how to weight different considerations, and interpretations can vary (Rawls in Molander 2011, pp. 330–331). As noted in the chapter “Re-launching National Evaluation and Quality Assurance: Governing by Piloting”, not all external assessors were entirely comfortable with the SHEA’s demands that all panels agree on a unified judgement, without the possibility of, for example, a strong outcome in one area compensating for a weaker outcome in another. Long deliberations were therefore occasionally needed to reach a judgement that all panel members could accept.

As mentioned above, a new condition of the 2016 national EQA system was the inclusion of detailed instructions to the assessors, differentiating a large number of perspectives, aspects, and indicators to be applied consistently in each assessment. Assessors had not been instructed in such detail in earlier national EQA systems. We may say that the SHEA wished to convey specific inscribed knowledge to its assessors. This knowledge became enacted as the SHEA staff organised training sessions with the assessors. The modes of inscribing knowledge in this highly complicated infrastructure of rules thus touch on the classic and recurring question in the history of science of whether a set or whole can be determined by its elements or parts. In this context, Warzecha (2017, p. 40), referring to Bertrand Russell and mathematics, has argued that “[f]or process orientation – that is the dissection of all work processes – the following holds: the more differentiated the requirements are, the more probable it is that the number of wrong interpretations, of miscalculations and misunderstandings increases”. The burden of judgement is also a burden of differentiation – or, as the popular saying goes, “the devil is in the details”.

A different aspect of the burden of judgement is related to the dual purpose of the system to develop and control quality in higher education, as laid out in the system directions from the Swedish government (Government Petition 2015/2016:76). Evaluation and assessment theorists have long argued for the potential of assessment for development and learning, specifying formative assessment as assessment aimed at enhancing the quality of the characteristic at hand and summative assessment as the practice of making a final judgement on which to base decisions (e.g. Scriven 1967; Sadler 1989; Shepard 2000; Black and Wiliam 2009). In essence, the 2016 national EQA system is intended to combine formative assessment with summative assessment in its control function. As noted above, the system includes a cut-off score and, therefore, a risk of being assessed as a failed HEI, which means that the outcome of the judgements involves high stakes. According to our informants, this particular feature of the system design makes the control function more prominent and the development function less so. Instead of “opening up” to assessors, HEIs have incentives to portray themselves in a more favourable light or even try to hide weaknesses.

In summary, the SHEA’s work of designing the framework for assessment, as well as the work of assessors in judging the quality of the HEIs’ internal quality assurance (IQA) systems, is laborious and difficult. The SHEA has invested vast energy and resources in dialogue with the HEIs to persuade them to embrace the 2016 national EQA system. This work also includes training and dialogue with the assessors with the aim of producing equivalent and fair assessments as part of the audit process. The external assessors were selected by the SHEA on the grounds of their embodied knowledge concerning academic work and EQA, but their discretionary reasoning was constrained by extensive manuals and frameworks, limiting the discretionary space. This tells us that the burden of judgement also relates to the epistemological assumptions of rationality and automation inherent in the idea of evaluation machines (Dahler-Larsen 2012). However, some evaluation theorists, such as Eisner (1999), stress the value of a variety of opinions in judgements for supporting improvement and meaningfulness: “A life driven by the pursuit of meaning is enriched when the meanings sought and secured are multiple” (Eisner 1999, p. 658). Then again, such an evaluative design is not compatible with purposes of authoritative accountability.

Expansion and Increasing Complexity

In this final section of the book, we shift focus and theoretical gears to discuss the more general issues of expansion, resources, and sustainability. Our report on the proliferation of the national evaluation machinery in higher education prompts us to ask questions about costs that are relevant today and that may become even more so in the future as, and if, the expansion continues. As Dahler-Larsen (2012) has argued:

The evolutionary imaginary inherent in the evaluation machines tends to increase the costs through steady expansion of the machines, but it also tends to keep the costs out of sight and discussion. (Dahler-Larsen 2012, p. 187)

Even if many qualocrats and HEI actors in our study are concerned about this issue – for example, the increasing workload and the personnel time absorbed by additional EQA activities – they also take part in designing, constructing, developing, integrating, engineering, operating, and feeding the machines.

Posing questions about costs, resources, and sustainability is also a way to counteract forms of ontological reductionism and myopic vision. Evolutionary theory offers a tool to inform such questions by providing a historical perspective on social change. In this final section, we therefore turn to the American anthropologist Tainter (1988), whose work belongs to the broad tradition of Marx, Tönnies, Durkheim, and Parsons. We do this to discuss investments in new layers and practices of qualocratic infrastructure that have been introduced as a state response to essential problems but that may already have reached, or will eventually reach, a point of diminishing marginal returns.

In recent years, the literature on governance and complexity has become extensive. If we consider this book’s account of the relatively short history of EQA in Swedish higher education, expansion and growing complexity stand out as decisive features. We agree with Jacobsson et al. (2015) that growing complexity involves a “dual governing problem” for the state: “its internal complexity and multifunctional nature poses a problem for the governing of the state which, in turn, significantly impacts the capacity of the state to govern society” (Jacobsson et al. 2015, p. 13). We thus identify two ways in which the concept of complexity is informative for us in understanding the governing–evaluation–knowledge nexus: first, complexity as an “object” of governance and, second, growing complexity as an “intrinsic” aspect of governance.

Often, complexity is conceptualised in line with the first interpretation, namely, as a kind of external predicament for state governance to act on, balance, and hopefully reduce. Here, complexity in the form of constant change, uncertainty, and unpredictability in the world is seen as a problem of state governance to address (Kooiman 1993; Pierre and Peters 2005; Duit and Galaz 2008; Rhodes 2011; Room 2011; Jacobsson et al. 2015). One important notion in this literature goes back to the prevalence of “wicked problems”, i.e. the idea that “in the complex world of social planning” (Rittel and Webber 1973, p. 165), important socio-political problems confronting governance are inherently difficult or even unsolvable. For instance, problems within higher education, in terms of “quality” in knowledge production and teaching, may not be wicked in the same way as issues like global climate change or social injustice and poverty, but even so, they are complex, and they appear to grow even more so over time.

Our study has dealt primarily with complexity as an inherent feature of HE governance and EQA, i.e. as something that evolves as the state seeks to address and resolve various perceived problems. According to Tainter’s definition, complexity in human systems is a problem-solving activity that refers to increasing differentiation and specialisation in structure combined with increasing integration and organisation of parts (Tainter 1988). Using this broad framework, our case could be summarised as an identification of how evaluative systems gradually evolve to encompass more professionals and roles; system designs that are (re)assembled and combined over time; and additional techniques, indicators, areas and aspects, templates, and information communicated through increasing numbers of channels. We have also highlighted the amount of work that goes into organising and enacting these activities undertaken within state agencies, such as the SHEA and its agency predecessors, and in HEIs.

Looking at one aspect – information sharing – we can discern a substantial change over time. Until the 1990s, information about evaluations was distributed to a small number of recipients: the evaluated HEI, the Ministry of Education, the advisory committee, the government, the parliament, and libraries. Today, the SHEA distributes results and other information through its website, press releases, newsletters, conferences, documents, and various social media platforms, such as Twitter, to inform a broad range of perceived stakeholders. Moreover, changes in the ways information is actually used as a problem-solving tool produce increasing complexity. Until the 1980s, evaluations primarily served the purposes of the state, and the assessed HEIs were seen as free to use information from evaluations at their own discretion (Gröjer 2004). Eventually, in the early 1990s, the idea was introduced that HEIs must actually use and adhere to what was presented in the evaluations. Follow-ups were introduced to ensure compliance, and with time, economic and other sanctions were introduced along with stronger demands for comparability in the form of the idea that HEI quality ought to be comparable on the basis of information from national evaluations. As a result, these developments called for more standardised evaluations to ensure that such comparability was valid, which, in turn, produced new forms of work as the evaluations themselves became the objects of examination and critique. Today, as we have shown in previous chapters, information processing has evolved, and the communication load has also become more of an internal phenomenon within HEIs. The development of IQA systems and external demands on HEIs in terms of, for instance, the production of self-evaluations in new process/divisional organisations lead to more interfaces and unavoidable friction loss in communication between various actors, institutions, or parts of organisations (Warzecha 2017). Another striking example is the way in which information about IQA must now percolate within HEIs in ways that reveal certain feedback loops that constitute evidence of quality to external observers. Thus, whereas information about evaluation in HEIs was initially a matter for the state bureaucracy, it has become something that ought to be spread to and used by all divisions, units, and employees in each institution. In this way, our data indicate that EQA developments appear to contribute to more general patterns of increased information load and management in higher education.

For Tainter (1988), complexity is a neutral term – that is, neither good nor bad. It has the potential to be a productive and functional problem-solving tool but only as far as available resources allow: Complexity requires sufficient energy flows for maintenance, and more complex systems require more energy than simpler ones. Tainter’s (1988) rather pessimistic conclusion from studying the collapse of civilisations and systems of agriculture and energy is that all living systems eventually reach a point of diminishing returns, i.e. a situation in which complexity grows too costly. On a smaller scale, systems of innovation and knowledge production are subject to the same evolutionary dynamics (Strumsky et al. 2010). This means that over time, growing research institutions have come to produce increasingly specialised and narrowly useful knowledge at growing costs:

The productivity of innovation is not constant. It varies not only with incentives and knowledge capital, but also with constraints. Research problems over time grow increasingly esoteric and intractable. Innovation therefore grows increasingly complex, and correspondingly more costly. It grows more costly, moreover, not merely in absolute terms, but relatively as well: In the shares of national resources that it requires. Most importantly, as innovation grows complex and costly, it reaches diminishing returns. Higher and higher expenditures produce fewer and fewer innovations per unit of investment. (Strumsky et al. 2010, p. 497)
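The logic of diminishing returns can be stated compactly. As an illustration in our own notation (not Strumsky et al.’s formalism), let $R(c)$ denote the cumulative number of innovations produced after cumulative research expenditure $c$, with $R(0) = 0$. Diminishing returns then means that the marginal yield decreases as expenditure grows:

$$\frac{d^{2}R}{dc^{2}} < 0$$

Under these assumptions, the average yield per unit of investment, $R(c)/c$, also falls as $c$ increases – precisely the pattern of “fewer and fewer innovations per unit of investment” described in the quotation above.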

Interestingly for us, and as shown in the chapter “National Evaluation Systems”, EQA came to be perceived as a means to address the problem of expansion and increasing complexity within higher education as elite institutions grew into mass universities in the 1960s. Such expansion involved new and growing HEIs, new research and teaching subjects/programmes, recruitment of new teaching staff, and new groups of students, which resulted in perceived problems of efficient management and steering. In addition, higher education has continuously been ascribed new tasks and obligations to meet the needs of a society and a working life characterised by similar dynamics in terms of increasing complexity. Over time, HEIs have continued to expand, and national EQA systems have developed correspondingly to support and audit these developments. In times of temporary economic recession, cost cutting became a dominant argument for the need for (national and external) evaluation. As “quality” increasingly became established as an umbrella term for the ideal production of goods and services, evaluation remained a key solution to perceived problems.

The irony of the situation is that important higher education reforms in 1977, 1993, and 2011 related to decentralisation, freedom, and autonomy were attempts to solve problems of expansion and increasing complexity, but the complexities they intended to combat actually increased as the reforms were enacted. EQA systems involving the audit of results or institutional reviews are based on similar ideas and are aligned to overall principles of governance through goals or objectives that have produced increased complexity rather than simplification. Thus, Tainter’s (1988) notion of the evolutionary dynamics of complexity speaks directly to developments in Sweden and elsewhere under the label of New Public Management. Over the last few decades, decentralisation and autonomy were introduced to combat escalating bureaucratic expansion and inefficiency, but this required increased complexity of, and in, the measures used to ensure and enforce state control of local actors (Walsh 1995; Power 1999). Thus, Geyer and Rihani’s (2010) general observation on public policy is valid for Swedish higher education policy as well:

Once again, attempts at creating greater flexibility and variety in the administration and outputs of public policy were undermined by an overriding desire for central control and oversight. (Geyer and Rihani 2010, pp. 23–24)

In our studies of the recent history of national EQA, we have identified processes of oscillation, layering, and sedimentation in evaluative systems. The question is whether we can expect continuing expansion of national EQA systems in which one system is implemented only to be replaced by another some years later. To critically discuss the prospects of alternative futures, resource issues are imperative: for example, can EQA work be conceptualised as an equation involving a balance between quality and return on investment? For obvious reasons, there are a number of difficulties attached to such an intellectual exercise. Starting with quality, it is notoriously difficult to assess whether HEIs are actually qualitatively “better” today than 10, 20, or 50 years ago. Moreover, the role of EQA as a driver of improvement in this respect is equally intricate. Looking at the economic part of the equation, return on investment, this component is equally intangible and tends to evade empirical scrutiny. In our project, we tried to document empirically the actual costs related to EQA. However, this turned out to be a difficult undertaking because EQA work is not limited to practices and actors whose work is displayed transparently in accounts (cf. Alkin and Ruskus 1984; Dahler-Larsen 2012; Forsell and Ivarsson Westerberg 2014). On the contrary, it increasingly disperses into networks, institutions, and people who are often not paid to do it. According to Dahler-Larsen (2012, p. 184), these activities “incur costs for which no one is held accountable”. Overall, the economic dimension of EQA remains obscure, and as long as such evaluative practices appear functional and legitimate, scholars will have a more difficult time advancing critical discussions on potential problems related to diminishing returns.
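Purely as an illustration, and in our own notation, the imagined equation could be written as

$$\text{ROI}_{\text{EQA}} = \frac{\Delta Q}{C_{\text{EQA}}}$$

where $\Delta Q$ stands for the improvement in quality attributable to EQA and $C_{\text{EQA}}$ for the total resources consumed by EQA work, visible as well as hidden. As we argue below, neither term can currently be given a valid empirical value.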

A comparison with another system facing similar challenges in terms of declining return on investment may be illustrative in this respect: Within the energy system, the energy return on investment (EROI) ratio – the ratio of the energy delivered by a process to the energy required to deliver it – can explain how much energy is required to deliver new energy (cf. Hall et al. 2014). However, when it comes to the evaluation–governance–knowledge nexus, there is no applicable model even close to the EROI ratio. Needless to say, it would be a nearly impossible task to provide valid knowledge to inform the “values” in the above imagined equation. As a result, the “quality return on EQA investment” and the “efficiency” of continuous expansion of EQA escape examination and critical scrutiny. We know from decades of organisational studies that this kind of managerial enterprise might provide legitimacy rather than improve actual performance, but we would very much welcome such a debate, as “evaluation gluttony” (see the chapter “Enacting a National Reform Interval in Times of Uncertainty: Evaluation Gluttony Among the Willing”) comes at a (yet largely unknown) price. As we have discussed in the book, such costs may extend from narrow economic values to profound human concerns in terms of how we view and choose to undertake education and science.