
Which Gates Do Peers Keep?

An internet search for ‘gatekeepers on campus’ leads on the first page to the guards of the physical gates on campus, and also to the university services to prevent students from committing suicide. The gatekeepers in this chapter are concerned with less physical gates, and although they too may cause stress and anxiety on campus, it is rarely dramatic, though I will show that the anxiety has pernicious consequences. I mean the members of external evaluation committees that form the core of practically every quality assurance and accreditation procedure, and the gates are those of academic existence, because the characteristic outcome of accreditation is a decision to accept a university as legitimate (Adelman, 1992; Schwarz & Westerheijden, 2004; Sursock, 2001; Westerheijden, 2001)—whether it is a traditional campus with gates, or an open university that teaches online. In quality assurance without accreditation, a positive judgement by external evaluators may not decide the survival of the institution, but it still guards the gates of academic repute.

‘Gatekeepers of science’ has been a term applied to the persons who perform peer review, that is, who control access to journal space for publications, and to money through competitive research grants. ‘Gatekeepers of science’ has appeared in article titles since the late 1960s (Crane, 1967), while some years earlier the term had already appeared in the text of an article on ‘The Reception System of Science’ (de Grazia, 1963). Peer review had been transposed from early-medieval legal contexts (‘the lawful judgment of his peers’) to the emerging world of science in the seventeenth and eighteenth centuries to safeguard the validity of submissions for publication (Benos et al., 2007), although ‘prior to the Second World War the process was often quite uncodified, and editors frequently made all decisions themselves with only informal advice from colleagues. Only quite recently has the paradigmatic “editor plus two referees” system become widespread’ (Rowland, 2002, p. 248). Peer review has also been used for many decades in decisions about awarding competitive research grants (Marsh et al., 2008). Peer review was transposed to a new context again when quality assurance appeared in the 1980s as a policy tool to safeguard the quality of higher education institutions and their study programmes. In this chapter, issues of this latest transposition of peer review will be addressed: first, what is ‘quality’ of universities; second, who are the peers in the process of quality assurance of universities; and third, how do these quality assurance processes affect the university?

Quality of Higher Education Institutions

Quality is unavoidably a contested and multi-faceted concept (Harvey & Knight, 1996). According to my Latin dictionary, the word is connected with qualis, asking ‘how’. Hence, quality asks about ‘how-ness’, about characteristics. Characteristics cannot exist without an object, hence the often-used quote from Pirsig’s 1970s novel Zen and the Art of Motorcycle Maintenance, that ‘when you try to say what the quality is, apart from the things that have it, it all goes poof! There’s nothing to talk about’ (Pirsig, 1984, p. 163). Accordingly, there is good sense in the definition agreed upon in the International Organization for Standardization (ISO) of quality as the ‘degree to which a set of inherent characteristics … of an object … fulfils requirements’ (ISO, 2015). In higher education, the objects whose characteristics are of interest might include universities, the persons making up those institutions (teachers, researchers, support staff, and perhaps students), or their activities (in particular, their research or their involvement in education processes). A high-quality university then is one with a positive image of, for example, delivering good graduates to society, of having good professors, or of being famous for discoveries and inventions. This loose illustration of what a high-quality university might mean points to several important issues: first, it incorporates several ‘objects’ that may have quality; second, quality is a matter of image (it is ‘in the eye of the beholder’); third, specifically for the class of objects to which higher education belongs, the quality of the objects is not immediately obvious to most beholders, which explains why methods to assess the quality of higher education are so important; fourth, the ISO definition connects quality judgements to requirements, which might lead to debates about the roles of higher education in society (i.e. requirements to do what?) and to the political question of which stakeholders have, or ought to have, the power to define those roles and requirements. In this chapter, I cannot even begin to tease out the arguments for and against potential answers to most of these questions, even though I will touch upon all four issues mentioned. A complication, which may make my contribution less legible than I would wish, is that the four issues are interconnected, which makes it difficult to treat one without making assumptions about the others.

A single higher education institution encompasses many objects that have quality: the ‘primary processes’ in a university are education and research, which ‘produce’ graduates and scientific results—that defines the first two objects. Then, there are supporting processes taking place, as in any other organisation: support for teaching staff in their education and research work, support for students in their learning and living, provision and management of the facilities, and management of the organisation as a whole. But the primary processes must be divided further: each separate study programme has its own quality, just like every research project. That makes universities multi-product firms. Multi-product firms are common—this alone does not set higher education apart in the quality debate. When one considers that the ‘production technology’ in universities can be characterised as a professional service, the complexity increases. This means that education (to focus on that primary process) relies largely on the behaviour of teachers, who generally act separately, based on their long training in a certain field of knowledge. It also means that, like every service, learning as the ‘product’ of education only appears in interaction with the students. The final complication to the education process that needs to be considered here comes from Harvey’s insight that education is not just a service for unchanging clients, but aims to transform and empower those clients, that is, the students (Harvey & Knight, 1996).

The other primary process, research, also relies largely on the knowledge and skills embodied in the professionals, even if there may be more reliance on machinery (laboratories, computers) in the research process than in education (although digitalisation makes technology more important there as well) and even if producing new knowledge is not necessarily a service for or with a client. Notably, the organisational units that ‘produce’ education may differ from those that ‘produce’ research: in delivering study programmes, teachers from several departments may be involved, who—at least in research universities—may do their research in other constellations (e.g. laboratories or research institutes).

Mintzberg elaborated the consequences of the professional production technologies of education and research for the organisation of universities, which he subsumed under his category of professional bureaucracies (Mintzberg, 1983): the power in such organisations lies with the ‘operational core’, that is, the teachers and researchers, more than with the central management; hence, universities are very decentralised, both horizontally and vertically. Kogan saw more of an exchange relation, leading to an (almost) equal balance of power between the operational core and institutional managers: ‘without active academics securing the reputation of the institution the managers would have nothing worthwhile to manage. There is, therefore, a process of exchange between those who manage and those who provide the main academic outputs of the institutions. They provide the expertise upon which the institution thrives or fails. The institution provides the resources enabling them to perform their academic tasks’ (Kogan, 1984, p. 64).

In the following, I will first consider the primary processes—especially education—as the object of quality judgements, and then turn to the organisation that houses the primary (and secondary) processes as the object.

Quality of Performance and Performance Indicators

Out of the different conceptions of quality distinguished by Harvey and Green (1993), the one traditionally held in academia is that of distinctiveness, which in Harvey and Green’s view translated into the more modern conception of quality as excellence, surpassing the highest standards. Excellence sounds alluring; who would not want to excel? However, in most ‘naïve’ debates about quality, the question of what one wants to excel at is never asked: fundamental research to gain a Nobel Prize? Being an excellent educator for first-generation undergraduates? Gaining a top-ten position in the Shanghai ranking? Leaving the object of excellence unspecified eases agreement in conversation but does not really help to evaluate or enhance quality. Moreover, ‘[e]xcellence, by definition, is a normative concept, i.e. not everyone can be excellent’ (Elton, 1998, p. 4). Excellence is a positional good, in economic terms, and its wide—often unconsciously self-evident—acceptance in academia may explain why university league tables, with their explicit ascription of positions, found so much (albeit grudging) recognition.

The other conception often quoted in higher education is that of ‘fitness for purpose’, which comes close to the ISO definition of quality: do what you are supposed (or required) to do. However, in higher education, the ‘supposed’ is more often linked to the institution’s mission than to external requirements—although in state-controlled higher education systems, missions may have been defined externally as well.

In any case, this view on quality takes as its object of valuation the performance of a university: at stake is the excellence of, or fulfilment of purpose by, research ‘products’ and education ‘products’. This view easily leads to quantitative indicators: numbers of publications and citations for research quality, and numbers of graduates or employment statistics for education quality. Accordingly, it fits the use of performance indicators in quality assurance, which in turn fits the neoliberal turn (Harvey, 2005) in managing universities and in higher education policy. In fact, the rise of the neoliberal notion of New Public Management (NPM) spurred the introduction of quality assurance to higher education in the first place, around the 1980s (Paradeise et al., 2009; Pollitt, 1990; Pollitt & Bouckaert, 2011). The contrast between quality assurance through performance indicators and through peer review will be a leitmotiv in this chapter.

At this point, I draw attention to one consequence of focusing on products or outputs, namely that it leads to an analytical view on quality in higher education: the quality of output A may differ from that of output B. And once one starts analysing, the different outputs in both education and research multiply quickly: by faculties and by disciplines, by levels of education, by study programmes and by course units. The overview of quality gets lost in a multitude of different qualities of different objects within the university, and that makes assessing or communicating quality exceedingly difficult (Barefoot et al., 2016; Branco et al., 2016; Cremonini et al., 2007). The information costs for prospective students—but also for other stakeholders interested in quality—to learn about quality become prohibitive in this way. Another solution must be found for quality assurance to remain practicable.

Another issue hinted at before is that the quality of higher education is not like the economics textbook case in which customers know in advance the quality of the good they intend to buy. That standard case is called a search good. Higher education, however, is an experience good or maybe even a credence good. The quality of an experience good can only be judged during or after consumption; services are all of this kind, such as going to the movies (Henze et al., 2015; Reinstein & Snyder, 2005; von Ungern-Sternberg & von Weizsacker, 1985). With credence goods, consumers do not know the quality of the good even afterwards (Bonroy & Constantatos, 2008; Dulleck & Kerschbamer, 2006): doctors’ consultations, computer repairs, and education are given as standard examples, for which it remains all but impossible to know whether beneficial outcomes can be ascribed to the service or are caused by other circumstances. No matter whether the balance tilts towards higher education being an experience good or a credence good, an asymmetry of information exists beforehand, and that is what quality assurance (but also university rankings or labelling) aims to address in its function of informing external stakeholders (Baksi et al., 2016; Morphew & Swanson, 2011; Westerheijden, 2009). That is why it is relevant not to drown in a multitude of performance indicators about all possible objects of quality in a university.

Referring back to the argument that education aims to be transformative, a further complication is that by the time they graduate, students’ ideas of what they wanted to achieve with their studies, that is, their quality criteria, may have changed. This too argues against education being a search good. As an aside, it also implies that current students’ satisfaction with their education is not necessarily correlated with their opinions on the quality of their studies once they have graduated.

Strengths and Weaknesses of Performance Indicators

As mentioned above, policies to assess quality in higher education arose with neoliberal NPM, around 1980 in early-adopter countries. NPM brought a new approach to the public sector, much more geared towards ‘producing output’, serving the customers (newspeak for citizens), and for that reason much more focused on efficiency instead of legality and legitimacy, through service units (newspeak for government agencies) led by powerful executive managers. In short, private sector management became the ideal for the public sector. In the wake of this movement, new methods in the public sector included quality assurance (Pollitt & Bouckaert, 2011; van Vught & Westerheijden, 1995), which had engulfed industry in the 1970s, when the rise of Japanese industry showed the leading firms in the western world that methods of production and management needed much improvement to keep up with East Asian quality levels (Deming, 1993; Dill, 1999). The industrial, managerial approach to quality assurance relied on the use of numeric performance indicators for fact-based decision-making.

The main criticism of performance indicators when they were first introduced to higher education in the 1980s remains true today: each indicator captures a small aspect of the performance of a university, and it is often geared towards efficiency, far removed from what the actors in higher education themselves understand by its quality. Already in the early days of quality assurance in higher education, Elton drew attention to this as an instance of Goodhart’s law: ‘any observation of a social system affects the system both before and after the observation, and with unintended and often deleterious consequences’ (Elton, 2004).

For instance, graduation rates and employment rates were (and are) popular performance indicators. The obvious response to maximise such rates—incentivised by politicians and managers—easily led to churning out as many graduates as possible in the shortest time possible, with readily applicable job skills. Whether these graduates gained much deep learning, whether they became critical thinkers capable of bringing an analytical attitude to bear on their first job, further career and social life, whether they gained the competences to become valuable, engaged citizens in an open, democratic society, was not measured by such indicators. Such performance indicators imply the short-term utilitarian view of higher education that was embedded in neoliberalism: higher education should train the country’s workforce here and now. If one holds the view that higher education has the role (also) to transmit long-term values, to educate the next generation of leaders and thinkers in society—the view of Humboldt as much as of Cardinal Newman (Labrie, 1986; Rothblatt, 1997; van Vught, 1994)—such short-term performance indicators invited goal displacement across the whole higher education sector.

In sum, the weaknesses of performance indicators as a tool for assessing the quality of higher education are these: they are (often distant) proxies for quality as the concept is understood by actors in higher education, and they measure and promote efficiency, which may lead to goal displacement. Moreover, they empower managers (both inside and outside universities) rather than teachers, which may be deleterious to quality enhancement in a professional production technology such as higher education, because to achieve improvement the professionals need to adjust their behaviour ‘in the classroom’, which is precisely what managers cannot control.

Nevertheless, performance indicators have strengths as well. First, they provide objective measurements. Objectivity is important in the bureaucratic and legal contexts in which (especially public, but to a large extent also private) higher education operates. It is a solid foundation for decision-making by politicians, civil servants, quality assurance agencies, institutional leaders or managers, and even—if, for example, accreditation decisions are disputed to the highest levels—by judges. Second, as a consequence of their fit with bureaucracy (in its original, objective meaning of rule-based organisations), performance indicators make higher education institutions more manageable. Better insight into the different processes in the institution makes it possible to control and improve those processes from a managerial perspective. Managers would do well, though, to remember what Herb Kells, an early quality ‘evangelist’ in higher education, said about them: an indicator puts a question mark rather than an exclamation mark (Kells, 1992); that is, performance indicators can show that something unexpected happens, but they do not define the solution for any problem they may uncover.

Quality of the Organisation and Peer Review

The closing statements in the previous paragraph turn attention again from the role of objective, analytical performance indicators back to the managers or leaders in higher education institutions who have to take action based on the indicators’ information about unexpected performances. This brings me to the alternative approach to quality assurance, that is, not the analytical view of the university’s performances, but a synthetic look at the organisation that ‘produces’ quality: research groups, faculties or—in the spotlight in this chapter—whole institutions. In that perspective, the organisation’s quality is a capacity to act, a potential, rather than a performance.

From an information or communication point of view, the organisational perspective has the major advantage that stakeholders are informed about the quality of the institution in a single bit (or at most a few bytes, depending on the number of different values in the beholder’s scale of quality). The flipside is, obviously, that a single quality judgement for the whole institution hides potentially many and very diverse qualities (van Vught & Ziegele, 2012).
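To make this information-theoretic aside concrete (a minimal back-of-the-envelope illustration of my own, not drawn from the cited literature): a quality judgement on a scale with $n$ distinct values can be encoded in

$$\lceil \log_2 n \rceil \ \text{bits},$$

so a binary accreditation decision takes 1 bit, a four-point scale 2 bits, and even a 0–100 score only 7 bits, that is, less than a single byte. This compression is exactly what produces the flipside just mentioned: everything that does not fit into those few bits remains hidden from the stakeholder.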

Another drawback of the organisational perspective is that good organisation and good processes do not guarantee good results; it is a potential or capacity, as I just stated, and the ‘production function’ of good education is not known—if there is such a thing as a single function across the diversity of courses, educational goals, teachers and, most importantly, students (Eisner, 1976; Hanushek, 2007; Scheerens, 1987). The perception of failure of the old Weberian, bureaucratic governance of higher education, with its focus on regulating and funding capacity (inputs, facilities), had instigated the NPM approach to the public sector, including quality assurance as a new policy instrument, in the first place. On the positive side, creating conditions (staff, facilities) and processes (teaching), as well as implementing those processes, is precisely what a university does. This defines the extent of the institution’s contribution to the quality of education; the outcome is (at least partly) out of its control: students’ learning, employment, citizenship and so on depend on active participation by the students themselves. Good inputs and good design of education processes can influence the occurrence of desired outputs: High-Impact Practices (HIPs) in teaching for student engagement and deep learning (Kuh, 2008), the right mix of theoretical and practical elements in study programmes to engender academic competence, but also other key competences such as entrepreneurialism, social competences, transferable skills (European Council, 2018) and so on.

In a way the turn towards the quality assurance of the organisation is a return to more traditional ways of envisioning the governance and management of the university: a focus on inputs and process. It is also more comfortable to teaching staff and administrators alike than external scrutiny of the results of their efforts, the actual performances, because it is perceived as less of an inroad on academic freedom and institutional autonomy than detailed examination of the teaching and learning itself.

What may be less comfortable to the teaching staff, especially in traditional universities, where the power of Mintzberg’s ‘operational core’ has been augmented by the ideology of academic freedom—for good reasons, but that lies outside the scope of this chapter—is that the introduction of quality assurance and other NPM policies has necessarily strengthened the grip of administrators on the university. The old jokes about the president of the university only being there to settle disputes about parking space, to make sure that the lawn got mown on campus, or that ‘almost his only duty was to preside at banquets’ like the Mayor of Michel Delving (Tolkien, 1966, p. 19), do not raise a smile any longer. The locus of autonomy in the university has shifted to managers—who no longer resent being called ‘managers’ (Westerheijden, 1997).

In the briefest summary, autonomy is about ‘who decides what?’ Ever since higher education studies emerged, this has been a question of interest. Clark (1983) invented the triangle of hierarchical state coordination, academic collegial decision-making and price-regulated market coordination to show that in every country a balance was reached among hierarchy, market, and the peculiarly academic coordination of collegial decisions among peers. In a more recent analysis, de Boer et al. (2010) distinguished five coordination mechanisms in their ‘governance equaliser’: state regulation, stakeholder guidance, managerial self-regulation, academic self-regulation, and competition. Moreover, they emphasised that in any system at any moment, each coordination mechanism could be present at a different level of intensity, thus distinguishing low-governance from high-governance balances. The introduction of quality assurance was one of the many changes in the decades since the late 1970s (since Reagan in the US and Thatcher in the UK) that brought NPM to higher education and strengthened managerial self-regulation in universities (de Boer et al., 2010; Paradeise et al., 2009; Thoenig & Paradeise, 2014). This changed the balance of autonomy (Westerheijden, 2018), even if the previous level of academic self-regulation had remained unchanged. However, for any given decision item, autonomy is a zero-sum game in the university: if the management decides about the pedagogical approach of a module (e.g. by prescribing problem-based learning), the professor can no longer decide it.

The previous paragraph is not to say that professors’ academic freedom was necessarily the best way towards high-quality education: management may act upon more advanced (and maybe even evidence-based) views on education than the academic staff, who—at least in traditional research-oriented universities—have been trained in research methods rather than in teaching methods. (This has changed to some extent in some countries in recent decades.) There may be collective benefits to the increased managerial self-regulation, and I would venture to state that the advent of quality assurance has led to a redefinition of academic freedom: teaching has become less of a private affair in the minds of many teaching staff. Team teaching and a much more student-centred view on teaching have become commonplace. Already 10–15 years after the introduction of quality assurance in the Netherlands, the discourse among academics and administrators in research universities had embraced terms and categories of thought that would have been unthinkable before. Institutional managers but also academics invariably found the teaching and research processes legitimate objects of management (Westerheijden, 1997)—whether this was due to quality assurance itself or to the broader permeation of society by neoliberal ideology does not really matter (similarly: Kolsaker, 2008; Leišytė, 2016). At the same time, this cultural turnaround did not lead to quality management becoming fully systematised: after 25 years of experience with quality assurance, external evaluations still lead to a scramble for dispersed or absent data, and a state of light panic, in most higher education institutions. Quality cultures have changed, but have not matured in most European higher education institutions; studies into how to establish a positive, pervasive quality culture are still deemed necessary to spread the quality ‘gospel’ (Bendermacher et al., 2016; Brennan et al., 2017; Harvey & Stensaker, 2008; Kottmann et al., 2016; Sursock, 2011). It still appears difficult to ensure that most universities adopt a positive quality culture, with a widely shared set of values and norms, underpinned by effective structures and processes such as centres for educational excellence, that gives teachers a sense of engagement with the quality of their educational work (Kottmann, 2017).

Besides the shift of power within the university from the professionals (teaching and research staff) to managers, external stakeholders also gained influence over the decades of neoliberal and neo-Weberian steering of higher education, at least in Europe (de Boer et al., 2010; Pollitt & Bouckaert, 2011; Westerheijden, 2018). Instead of valuing higher education, and research universities in particular, in their own right, as something perhaps quirky but nevertheless accepted in society, it was especially the strong stakeholder of the Thatcher government in the UK in the late 1970s that demanded that higher education be useful in the short term. Polytechnics were held up as better examples of ‘useful knowledge’ (a term dating back to Bentham and other Victorians, even to the eighteenth-century Enlightenment (Berg, 2007)) than British universities in those days (Goedegebuure et al., 1990). Taking higher education seriously as a sector of the national economy and of national security had started around World War II, with rocket science developing in Nazi Germany and the atom bomb, eventually, in the USA. The US ‘endless frontier’ report of 1945 inscribed pure research into the realm of public policy in times of peace as well (Bush, 1945; Pielke, 2010). In the 1960s and 1970s, critical voices in society sparked off protest movements that emphasised universities’ critical and democratising roles. In a sort of sedimentary process, all these movements from the Enlightenment to neoliberalism have left traces in a catalogue of at least partly conflicting demands put on universities, which translate into different and partly conflicting requirements defining what universities are expected to achieve. Consequently, what counts as quality depends on who defines quality; quality is not a matter of objective indicators but, considered at this level, an inherently political issue (Brennan, 1999; Morley, 2003; Ramirez, 2013; Skolnik, 2010; Westerheijden, 1990a). It then becomes evident that different actors in society may hold different views of universities’ quality, and, if one accepts a pluralist view on society, that holding different views is legitimate.

Traditionally, the providers of education and research, the academics, have predominated in this process, in collusion with the government—in Europe the main provider of funds for higher education and research. In Clark’s ‘triangle’ (Clark, 1983), the market used to be largely absent as a coordination mechanism in Europe until the 1980s. However, since the 1980s, many governments have changed the old coalition of state officials and academic oligarchy into a principal–agent quid pro quo relation in the wake of NPM ideas. The role of students as ‘consumers’ of a ‘service’ was stressed and even stimulated in the same vein. Viewing higher education as a ‘public good’ rather than as a private, marketable service returned to the agenda in reaction to the debates for and against the General Agreement on Trade in Services (GATS) of the early 2000s (EUA and ESIB, 2002; Vlk et al., 2008), but only slowly, again adding to the sedimentary layers of demands without eradicating all of the previous developments.

These considerations point us towards the political and institutional embedding of quality assurance of higher education in quality assurance agencies, ministries, professional associations (e.g. of lawyers, engineers or chartered accountants) and so on, but also towards the question of who actually makes the quality judgements, that is, the evaluators in quality assurance.

Strengths and Weaknesses of Peer Review in Quality Assurance

The term peer review was transposed to quality assurance from the quality judgements in journal publications, as stated at the beginning of this chapter. When advising about publication of a submitted article, peer review enables a holistic judgement of the article’s quality ‘in a way that defies strict logic but has won popular acceptance over the centuries’ (Robertson, 2015). Peer reviewers are expected to judge a combination of elements: the originality of the research question and whether its solution would contribute to the advancement of the field and/or to social relevance; correct, up-to-date, and inventive use of the theory in the field(s); correct and imaginative use of methodologies; and interesting, ingenious discussion of the findings and conclusions. The combination is what makes the judgement holistic—with different weights for different elements depending on the submission and with a case-specific, perhaps only semi-conscious way of combining the different elements. Peer evaluation is a matter of ‘connoisseurship’, which requires not only flexible application of knowledge of the field, but also—exemplified for educational science in the following quote—‘to have a background sufficiently rich in educational theory, educational philosophy, and educational history to be able to understand the values implied by the ongoing activities and the alternatives that might have otherwise been employed’ (Eisner, 1976, p. 145). The holistic judgement is a major strength of peer review.

Closely related is another strength: the human connoisseurship in peer review can be applied across different contexts, quite unlike performance indicators, which require data defined in exactly the same way everywhere. One only has to look at the multitude of footnotes in OECD’s annual Education at a Glance reports to see how difficult it is to compare data across contexts with different data collection processes and definitions. Within countries, similar problems may exist at a smaller scale between higher education institutions that collect their own data. Connoisseurs can handle such ambiguity—not perfectly from a data point of view perhaps, but well enough to get a sense of the quality profile of a university. They have deep insight into academic life as a consequence of their own long training in academia—it is the academic professional production model reflexively applied to itself.

The flip side of holistic and flexible judgement is that it is impossible to objectify it completely, which makes it less legitimate in a rule-bound, bureaucratic, or legal context. Peer review requires trust in the peers’ expertise and honesty—and all three of these conditions (trust, expertise, and honesty) are questionable in the context of quality assurance.

Lack of trust in the public sector was a basic assumption in neoliberalism (Harvey, 2005), which at the individual level translates into reduced trust in the expertise of peers. This sentiment has had real consequences even in the heart of scientific communication, that is, in the practices of journal submission decision-making. The debates around the functioning of peer review in scientific journals, and the experiments with double-blind or more transparent forms of it, seem instigated by the neoliberal turn. However, even if it may have strengthened in recent decades, criticism of peer reviewers’ expertise predates the widespread influence of neoliberalism: already in the 1960s–1980s, studies appeared showing that peer review involved a large degree of random error (disagreement among reviewers, republication experiments). Moreover, there could be intellectual bias against theories and methodologies outside the majority consensus in a field and against negative findings (Epstein, 1990, 2004), even though falsification of a hypothesis teaches us much more than corroboration (Popper, 1980), as well as social bias against non-majority scientists (by gender, by colour, by institution) (Cislak et al., 2018; Hopkins et al., 2013; for an earlier overview: Westerheijden, 1991).

Random error and bias may occur unconsciously, but when the interests associated with a decision increase, honesty may become jeopardised as well. This issue may already arise in the publication of discoveries and inventions (secrecy, stalling competitors’ publications, stealing their ideas, etc.), may be more visible in competitive grant reviews, but may reach its peak in institutional evaluation for quality assurance. Especially if accreditation is at stake in a review—hence the legitimate existence of an institution—the interest in a positive decision may threaten peers’ honesty, either through their own anticipation of the consequences of their decision or through subversive actions by the university under scrutiny. Universities have been known to rig data or to respond to questionnaires strategically—and sometimes to deny such reports vehemently—in the case of university rankings (Jaschik, 2018a, 2018b; Lederman, 2009a, 2009b), so one can imagine that the temptations are even larger when the existence of the institution is at stake.

A crucial issue with peer review when applied to quality assurance is that the situation of the peers differs from that of peer reviewers of journal submissions. In terms of principal–agent theory, in journal submission reviews, peers are agents on behalf of the academic field as an abstract principal, who work for the benefit of the field, whose own (career) interest runs largely parallel with the task of reviewing, and who can use the state of knowledge in the field as a temporarily stable base for their decisions: the submitted paper does or does not add to the current knowledge in the field. In institutional quality assurance, the peers are agents on behalf of very real external principals, that is, quality assurance agencies, which are usually either (quasi-)governmental agencies or agents of the profession-outside-universities (law firms, engineering bureaus, hospitals, etc.). Besides, reviewers’ judgements in quality assurance are made against criteria that are defined by the quality assurance agencies—perhaps with input from academic peers, but in any case translated into the bureaucratic discourse of standards and requirements in a legal context (Baumann & Krücken, 2018; Harman, 1998; Langfeldt et al., 2010). Admittedly, quality assurance agencies regularly encountered difficulties in making academic prima donna peer reviewers apply the bureaucratic standards and criteria rather than tacit knowledge about their field, especially in the early years of quality assurance, when ‘the majority of those who judge the teaching excellence of their colleagues have undergone little if any professional development as teachers and none as assessors of excellence’ (Elton, 1998, p. 4). Similarly, in Germany, where accreditation several years after its introduction only enjoyed limited legitimacy among academics, the ‘role separation’ between being teachers and being reviewers on behalf of an agency outside the academic field ‘was only obtained to a very limited extent’, and the reviewers insisted on their primary role as peers, criticising the accreditation system while taking part in it (Baumann & Krücken, 2018). Yet, in most countries in the twenty-first century, quality assurance agencies engage in fairly extensive training of the peer reviewers, and have developed greater capacity to control the process through guidelines, report templates, and the presence of quality assurance agency staff members as ‘secretary’, ‘coordinator’, or ‘auditor’ in the external review team, so that the agency’s externally defined standards are applied; peer-specific field knowledge is only allowed to substantiate vague norms like ‘up-to-date’ textbooks, ‘adequate’ teaching facilities, and the like.

Since the external reviewers in quality assurance do not primarily apply (tacit) criteria from within the field, as they do in journal review, they are taken out of their field in this task, and it becomes questionable whether they still act as respected equals (i.e. peers) while they are agents for quality assurance agencies as principals. It may be a matter of convenience to call external review by a committee in quality assurance processes peer review, but it stretches the meaning of the term ‘peer’. In defence of quality assurance agencies, it must be added that some (but I do not know what proportion) try to instil what might be called a ‘field identification’ in the reviewers rather than a ‘government identification’. Thus, in the US context, where institutional accreditation has a century-long tradition, institutional (‘regional’) accreditation agencies give as much room as possible to evaluation from a ‘fitness for purpose’ conception of quality, that is, mission-based evaluation, so that the institution’s mission defines part of the quality criteria, which embeds the agents more in the field than when they have to evaluate according to purely externally defined criteria. For instance, one agency admonishes: ‘Evaluators are encouraged to approach their assignments as colleagues rather than as auditors’ (NWCCU [Northwest Commission on Colleges and Universities], 2017, p. 9). Nevertheless, especially in recent years, also in the US, governmental requirements have come to define a substantial part of the institutional evaluations. Still, the British Quality Assurance Agency for Higher Education (QAA) states, even regarding England—where there is more emphasis on complying with externally imposed standards than in Scotland or Wales—that institutional audit operates on the ‘principle of peer review’ (QAA, n.d.).

In the reasoning above, I focused on the external reviewers’ task of passing a judgement, a summative evaluation on behalf of the principal, so that the principal can decide about an institution’s legitimacy, funding and so on. Most quality assurance processes have, however, a second aim, that is, to assist quality enhancement in the university—the formative part of evaluation. In that perspective, it is important that the reviewers be acknowledged by the evaluees as peers from whom they accept advice, as respected colleagues in the field, fellow professionals who by their training and status in the field have expertise of how higher education works. If this is successful, peer review is very effective, because ‘one of the strongest pressures on any group of academics is the prospect of being judged by senior peers in the discipline’ (Harman, 1998, p. 354). This is a tenable thesis when the evaluation concerns teaching or research in a specific field of knowledge—the field is what defines the professional community of teachers and researchers. The peer concept gets stretched again, however, once the object of evaluation is not a recognised field of knowledge within a university, but the institution as a whole. Who, then, are the peers? In a 1990s internationally comparative publication, Harman mentioned ‘panels of experts, usually involving at least some “external” members’ (Harman, 1998, p. 353), but I am not aware of more recent or more precise studies on this question. A quick scan for this chapter of nine US regional accreditors and European quality assurance agencies with institutional audits showed varied practices of composing evaluation teams. The commonality is, however, that all agencies choose reviewers largely from within higher education institutions, who must have several years of experience of working in a university. First, all but one of the agencies emphasise that evaluation teams must include ‘educators’, and one quality assurance agency adds that such persons ‘typically will represent one of the types of teaching disciplines at the institution being visited’. Another agency prescribes that experienced teaching staff usually constitute the majority of the visiting team—at least three out of at least five—although most agencies only mention requiring at least one teaching representative. The one agency that does not include educators defines peers exclusively as fellow leaders of institutions: ‘former rectors and vice-rectors’. Most other quality assurance agencies, too, mention senior institutional administrators as the second major category of evaluators. Implementing their mission-based approach, some US agencies specify that the administrative contingent includes financial officers and ‘specialists whose expertise is related to the known areas of concern of the institution, (e.g. assessment, student personnel, finance, planning, etc.)’. Specialists on quality assurance or institutional research are also mentioned. The third major category of reviewers is external stakeholders, who are included in a minority of quality assurance agencies’ review teams, for example, ‘members of boards of trustees of accredited institutions, legal counsel, state education or system employees, representatives of the business community, public members’. Finally, European quality assurance agencies include a student representative among the reviewers. In sum, the composition of external review teams for institutional evaluations and accreditations indeed broadens the concept of peers further.

The inclusion of experienced institutional administrators, but also of specialists in managing a university, testifies to what has been called the rise of ‘third space’ professions in between traditional teachers and traditional university leaders (Whitchurch, 2013). Quality assurance has thus contributed to the evolution of a new species in higher education, it would seem—an unintended consequence—first by coercing universities to professionalise quality assurance with teaching excellence centres and quality assurance offices, and secondly by then promoting this new species of professionals into institutional evaluation teams. Remarkably, the spread of external evaluators seems to be less pronounced than might have been expected under an NPM regime: not all quality assurance agencies in my admittedly small convenience sample include externals, and if they do, there often is a certain degree of connection between them and higher education; for example, they may be lay members of a university’s board of trustees, or student representatives—if students can be called external stakeholders rather than internal ones.

Persistent Dilemmas: Instead of a Conclusion

The general argument in this chapter holds that the traditional way of assessing quality in higher education is through peer review. This method has strengths, namely holistic and flexible judgement, but also weaknesses, especially when applied to quality assurance of higher education institutions: it is prone to random error and to intellectual and social biases, and honesty is threatened by the high stakes involved, in particular when accreditation of the university might be at risk. Moreover, trust in insiders’ human judgement has waned—NPM makes that explicit in its stress on transparency and accountability through performance indicators (Westerheijden, 1990b). The strengths and weaknesses of performance indicators mirror those of peer review: they are objective but partial and often distant proxies for the concept of quality. Moreover, reliance on performance indicators leads to goal displacement and concomitant distortions of behaviour of and in universities, even to fraud in creating data for indicators.

Actual quality assurance processes use a combination of peer review and performance indicators in the hope of balancing the weaknesses of one with the strengths of the other. The capacity to collect data on the primary processes in universities—my focus in this chapter has been on education rather than on research—and to analyse them has increased in the decades since the introduction of quality assurance in higher education systems. We know much more about the processes, and this information, often in the form of indicators, is used as input for evaluation processes based on human judgement, which is still called peer review, even if the term has grown to carry a new, broader meaning than before.

The term peer review also seems to be used in institutional quality assurance to emphasise its ‘soft’ side, the quality enhancement that needs the non-threatening yet authoritative voice of respected peers. Some of the quality assurance agencies whose criteria for reviewer status I scanned seem to downplay the administrative character of their reviewers for just this purpose; for example, the QAA states: ‘Most of our reviewers are academics with postgraduate qualifications, many with doctorates. Some hold senior roles such as Vice Chancellor, Principal or Pro-Vice-Chancellor’ (QAA, n.d.)—as if they just happened to have held such positions for a while, almost accidentally. Yet the UK is the most managerial, hard-NPM country in Europe, where more than in other countries being an institutional administrator is a full-time occupation and a career in its own right. The rhetoric cannot hide that the external reviewers come in from a position of power—power to affect the institution’s reputation, its leaders’ careers (e.g. in the UK and the US), or even the existence of the institution (in countries where accreditation is a condition for legitimate operation). Realising this, I once formulated the following dilemma (Westerheijden, 1990b):

  • Dilemma I—Without (the threat of) serious consequences, quality assurance is not taken seriously in academe and turns into an administrative burden, yet with (the threat of) serious consequences, quality assurance turns into a game to gain positive outcomes, not to assure or enhance quality.

Considering that the peer review side of institutional reviews is stressed to make them more accepted among the evaluees, with the further aim of strengthening the quality enhancement function of the evaluation, another dilemma arises from the above as well (Westerheijden, 2013):

  • Dilemma II—Quality enhancement demands evidence-based decision-making, where the evidence usually consists of performance indicators, but performance indicators threaten quality enhancement through the goal displacement that they induce.

Having studied quality assurance in higher education for more than three decades, I find that these dilemmas continue to puzzle me. Assuring and enhancing quality remains a balancing act involving all who work in a higher education institution, spurred on by external review yet simultaneously hindered by its distorting effects, whatever the mix of peer review and performance indicators it employs. Peer reviewers in quality assurance, the gatekeepers of prestige and even of the existence of the campus, make tightrope walkers out of their peers on campus, the teachers as much as the leaders.