Brief Introduction to Research Assessments

Throughout their careers, researchers will face dilemmas and need to make decisions regarding the ethics and the integrity of their work. Earlier chapters in this volume illustrate the substantial challenges and dilemmas involved and the impact that researchers' decisions can have on research, knowledge, and practices. But decisions are not limited to research practices; they also need to be made about researchers themselves. Deciding which researchers should receive grants, which are allowed to start a career in academia, which are promoted, and which obtain tenure are complex issues that shape the way in which research systems operate.

In this chapter, we provide an overview of the complexities of research assessments. More specifically, we provide a critical overview of the problems that current research assessments generate and showcase innovative actions that are being introduced with a view to improving the process. We start by briefly introducing research assessments and the debate on whether they are fit for purpose. We then discuss problems of research assessments along five dimensions: the content; the procedure; the assessors; the environments; and the coordination between these dimensions (Fig. 27.1).

Fig. 27.1 The five dimensions of researcher assessments addressed in this chapter: content, procedure, assessors, environments, and coordination. The first four dimensions correspond to the questions of what, how, who, and why.

Research assessments entail important decisions about what matters (i.e., what should be valued in academic careers and research outputs), about who decides what matters, and about how what matters can be measured. In addition to this inherent complexity, the decisions needed for research assessments depend on several stakeholders with their own distinct interests. Given the profound complexity, the high stakes, and the many actors involved in such decisions, it is no surprise that research assessments raise substantial controversies. Before introducing the problems and latest innovations in research assessments, it may help to provide a quick historical snapshot of the evolution of the discourse. This snapshot is necessarily high-level, but we detail and document each point in greater depth throughout this chapter.

Scientists have scrutinised the attribution of success in academic research for well over half a century (Hagstrom, 1975; Merton, 1957; Zuckerman & Merton, 1971), yet we can pin the beginning of the debate on research assessments to the 1980s, when the growing investments in research led to a substantial growth of the academic workforce (Alberts et al., 2015). This growth introduced a stronger need for fair distribution of research resources, for example in funding allocation, hiring, tenure, and promotion. Publication metrics – publication counts, citation counts, journal impact factors, and later the H-index – started being used in research assessments as an opportunity for broad-scale, rapid, and comparative assessment that provided a greater sense of objectivity than traditional qualitative peer review (Gingras, 2016). Quite rapidly, however, it became clear that the newly adopted metrics also influenced the publication practices of researchers in less desirable ways. Early metrics focused on quantity, for instance by using the number of scientific papers researchers published as an indicator of success. This focus on quantity invited high volumes of lower quality scholarly outputs (Butler, 2003). To address this problem, journal impact factors and citation counts started being used in assessments, asking researchers to place impact before volume. This change had the desired effect and redirected scholarly output towards prestigious high impact journals (Larivière & Sugimoto, 2018a). With occasional exceptions, assessors and researchers overall appeared to be satisfied with the new methods until the early 2000s. The beginning of the twenty-first century brought with it a vivid interest in meta-research, research integrity, and bibliometrics. Researchers started understanding that research was vulnerable to misconduct and inaccuracies (Ioannidis, 2005; Martinson et al., 2005), and that research assessments could influence research in harmful ways (Abbasi, 2004). Not only did impact metrics influence the types of research being done, but they also drew research away from important integrity and quality aspects such as reproducibility and open science (Moher et al., 2018). At the same time, researchers were growing more aware of the high-pressure and highly competitive environment they worked in and the impact this had on their work (Anderson et al., 2007; De Vries et al., 2006). Consequently, researchers and research communities joined forces to address these challenges and started demanding change in the way researchers are assessed.

The San Francisco Declaration on Research Assessment (DORA; American Society for Cell Biology, 2013), The Metric Tide (Wilsdon et al., 2015), and the Leiden Manifesto (Hicks et al., 2015) were among the first key documents to specifically address and raise awareness of the faults of current assessments. Mostly focused on metrics, these pioneering works were then followed by position statements from numerous groups and organizations that broadened the issue towards research climates, research careers, and research integrity. In Table 27.1, we showcase a selection of position statements and documents from general and broad-reaching groups. The 11 documents displayed in Table 27.1 are only a tiny selection of the booming number of position papers, initiatives, perspectives, and recommendations now available from different research institutions, research funders, learned associations, and policy groups. Consequently, it would be fair to say that the debate on research assessments has gained strong momentum, and that substantive changes are likely underway.

Table 27.1 Selection of position statements specifically addressing research assessments

Problems and Innovative Actions

Changing research assessments is a complex endeavour that requires multiple stakeholders, coordination, and fine-tuning. In the following sections, we introduce a selection of key problems with current research assessments and describe a number of promising actions currently being taken to address these problems and improve research assessments.

Problems with research assessments can arise along several interconnected dimensions, some of which are incredibly difficult to tackle. As a starting point, it is essential to address problems with the indicators and the approaches contained in the assessments themselves. But although the content of assessments is a necessary starting point, it is not the only dimension that needs to be addressed to make research assessments fully fit for purpose. The procedure followed and the assessors responsible for assessing researchers are also important in enabling changes. Even if the indicators, the procedure, and the assessors are optimal, the research culture plays an additional role in ensuring that changes to research assessments indeed improve the practices and decisions of researchers. Consequently, the environment in which researchers work, albeit complex and difficult to address directly, also needs a place in initiatives that aim to change assessments and help foster better research. Finally, good coordination of efforts is needed to ensure that the changes are profound, coherent, and sustainable.

In the following sections, we describe key problems and innovative actions concerning the content, procedure, assessors, environments, and coordination of research assessments. Table 27.2 summarizes the main points addressed.

Table 27.2 Frequent challenges in research assessments and examples of initiatives to improve research assessments

Content

Reflection on research assessments should necessarily start with the elements of researchers' professional behavior that are assessed and their impact on the quality and relevance of research. Understanding the problems with the core elements used within research assessments is an important starting point to better understand what needs to change.

The problems related to the content of research assessments are too numerous to cover in a book chapter. For simplicity, we selected five key issues that we believe play an important part in the current discourse on research assessments: i) the exaggerated focus on research outputs; ii) the valuation of quantity over quality; iii) the inadequacy of currently used metrics; iv) the narrow definitions of impact; and v) the obstacles current research assessments impose on diversity.

An Exaggerated Focus on Research Outputs

The Problem

When looking at research assessments in practice, it is clear that these depend almost exclusively on research outputs, most notably on scholarly papers published in international peer-reviewed journals. This focus on outputs is not surprising. Considering that a large proportion of research is funded by public investments, it is natural to expect that researchers generate products (in this case research reports) that will ultimately enable tangible benefits for society. Yet, the way in which research outputs are currently measured is problematic in a number of ways.

For one, the exaggerated emphasis on research outputs means that current assessments are oblivious to most of researchers' commitments. Publishing papers, as important as it is, is far from the only activity researchers spend their time and efforts on (Ziker, 2014). Teaching and providing services – the two other pillars of academic careers – and other essential tasks such as mentoring, reviewing, or team contributions almost always take second place or are even ignored in research assessments (Schimanski & Alperin, 2018). And within the pillar of 'research', many activities and processes that would provide invaluable information on how the research is conducted are largely ignored in current output-oriented assessments, creating a culture "that cares exclusively about what is achieved and not about how it is achieved" (Farrar, 2019). For example, the detailed methods, the approaches, the specific contributions, or the translation of research into practice are rarely considered in research assessments (Aubert Bonn & Pinxten, 2021b). This lack of consideration for research processes risks losing sight of important procedural concepts thought to be highly important in advancing science, such as quality, integrity, and transparency (Aubert Bonn & Pinxten, 2021a).

Innovative Action

In the past few years, there has been an increasing awareness that linking research assessments almost exclusively to research outputs may be problematic (Farrar, 2019). Principle 5 of the Hong Kong Principles and recommendations 3 and 5 of DORA directly address this issue, stating that a broader range of research activities should be considered in research assessments. One concrete initiative which may be a first step in solving this problem is the provision of greater visibility to the range of activities that are part of researchers' daily tasks. The Open Science badges – preregistration, open data, open materials – are a good example of a simple change that allows readers, or eventually assessors, to quickly capture the open science practices behind published works (Kidwell et al., 2016). The presence of reporting guidelines, such as those available on the EQUATOR (Enhancing the QUAlity and Transparency Of health Research) Network (EQUATOR network, n.d.), can also summarize details and procedures and provide information on the transparency and reproducibility of the work. The increasing availability of open and transparent peer review and initiatives that provide visibility of peer-review commitments, such as Publons (Publons, n.d.) or ORCID (Open Researcher and Contributor ID) (ORCID, n.d.), are other examples that can help enrich the indicators used to assess researchers. The Contributor Roles Taxonomy (CRediT), which provides more information on the roles and responsibilities that researchers take, is another example we will discuss further in Sect. 27.2.1.5 (Alperin et al., 2019; CASRAI).

Broader indicators are increasingly visible in more formal assessment procedures. For instance, the Academic Careers Understood through MEasurement and Norms (ACUMEN) portfolio provides a template that considers indicators from a very diverse array of activities (European Commission, 2019). While the ACUMEN remains largely quantitative, its broad coverage of research activities is a good reminder that assessments can be much more comprehensive. The European Commission's Open Science Career Assessment Matrix (OS-CAM) is a similar model of assessment that includes a broad array of research activities such as teaching, supervision and mentoring, and professional experience, and even has an explicit section on research processes (European Commission, 2017). We will discuss other ways of broadening assessments, such as narrative CVs and portfolios, in Sect. 27.2.1.4.

Quantity Over Quality

The Problem

Another important problem of researcher assessments is their tendency to value quantity over quality. Many researchers feel encouraged to publish as many papers as possible and are sometimes offered tangible incentives, such as financial rewards, to publish more (Hedding, 2019; Muthama & McKenna, 2020). Assessing researchers on the number of published papers does indeed lead to more publications, but it tends to do so to the detriment of research quality (Butler, 2003; Moed, 2008). It can also encourage questionable research practices such as 'salami slicing' – "the spreading of study results over more papers than necessary" (Embassy of Good Science, 2021) – and can tempt researchers to favour journals where acceptance rates are high rather than journals suited for their work or journals with thorough peer-review procedures. Unsurprisingly, the longing for quantity also works in favour of predatory publishers and paper mills, whose business model targets authors desperate to publish regardless of quality (Hedding, 2019; Vogel, 2017).

To address this problem, research and funding institutions are increasingly modifying their assessment procedures to focus on impact rather than on quantity. Nevertheless, the impressive numbers of peer-reviewed publications or books that are very often stated in researchers' biographies remind us that productivity is still considered an important indicator of accomplishment within the research community and the research culture. Quantity indicators also remain key to institution-level assessments, a point we will discuss further in the Coordination section.

Innovative Action

The obvious solution to reduce the focus on quantity should be to look more at quality. But even though ways to assess quality are starting to emerge, the endeavour is more complex than it may seem. For example, Eyre-Walker and colleagues showed that, when scientists assess a published paper without knowing the journal in which it was published, they are generally inconsistent and unable to judge its intrinsic merit or to estimate the impact factor of the journal in which the paper was published (Eyre-Walker & Stoletzki, 2013). However, assessing the quality of publications is not the only way assessments can deviate from quantity indicators. In the past few years, several research and funding institutions diverted assessments away from quantity by asking researchers to select only a subset of their work – generally three to five key accomplishments or contributions (e.g., publications, events, changes in practice, committee participation, etc.) – and to describe why these accomplishments matter (see for example Cancer Research UK, 2018). Focusing on a limited number of outputs enables a more in-depth assessment, which is likely to refocus the assessors' attention away from quantity towards content, meaning, and quality.

Inappropriate Use of Metrics

The Problem

As we mentioned above, most research assessments swapped volume metrics for impact metrics to incite researchers to publish in more prestigious journals. Among those, the journal impact factor, citation counts, and the H-index raise important challenges.

Of all impact-informed metrics available, the journal impact factor is probably the most widely used in current research assessments. In a review of their use in North American academic review, promotion, and tenure documents, McKiernan and colleagues found that 40% of research-intensive institutions explicitly mention journal impact factors (McKiernan et al., 2019). The journal impact factor of a given year is the ratio between the number of citations received in that year for publications in that journal that were published in the two preceding years and the total number of "citable items" published in that journal during the two preceding years (Larivière & Sugimoto, 2018b; Wikipedia, 2021). The journal impact factor was designed to help librarians select the journals they should subscribe to, but it was never intended to influence researcher evaluations. Indeed, Eugene Garfield – widely known as the father of journal impact factors – explicitly warned against using journal impact factors for assessing individual scholarly articles (Garfield, 1998). Nevertheless, the seductive power of a single metric that would allow the 'value' of journal articles to be quantified quickly won over research assessments. Unfortunately, impact factors introduced substantial problems of their own. First, the mere fact that journal impact factors became recognized as a measure of success reduced their objectivity as a measure of success; a phenomenon known as Campbell's law (Hatch & Schmidt, 2020). In fact, journal impact factors incite strategic responses from researchers, many of which are now considered to be questionable research practices. These include, among others, selective reporting, 'spin', p-hacking, HARKing (hypothesizing after results are known), and non-publication of negative results (de Rijcke et al., 2015; Gingras, 2016; Larivière & Sugimoto, 2018a; Wouters, 2014). Journal impact factors further suffer from fundamental weaknesses that allow them to be gamed relatively easily (Ioannidis & Thombs, 2019). In addition, impact factors are a journal-level metric and are therefore not a valid measure of the impact of individual papers or of the authors of those papers. Indeed, the distribution of citations in a journal tends to be so skewed that impact factors provide little information on the number of citations individual papers in that journal can expect (Brito & Rodriguez-Navarro, 2019; Larivière et al., 2016). Finally, because of the way journal impact factors are calculated, they ignore slow citation (i.e., citations two or more years after publication), thereby potentially biasing against innovative research (Schmidt, 2020). Despite these fundamental flaws, journal impact factors are still widely used in researcher assessments and are frequently described as an indicator of the quality of individual research papers (Aubert Bonn & Pinxten, 2021b).
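Expressed as a formula (our notation; the symbols and the worked numbers below are purely illustrative and not taken from the sources cited above), the definition reads:

$$\mathrm{JIF}_{y} = \frac{C_{y}(y-1) + C_{y}(y-2)}{N_{y-1} + N_{y-2}}$$

where $C_{y}(t)$ is the number of citations received in year $y$ by items the journal published in year $t$, and $N_{t}$ is the number of citable items the journal published in year $t$. For instance, a journal that published 200 citable items over 2021 and 2022, and whose 2021–2022 items were cited 500 times in 2023, would have a 2023 impact factor of 500/200 = 2.5.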

Without even entering the colossal debate on the relationship between citation metrics and research quality, it may be relevant to consider the actual number of citations, which is also frequently used in researcher assessments despite requiring more time to accumulate. Citations are problematic in different yet connected ways. To begin, numbers of citations provide no information on the reasons a paper is cited. Citations used to provide background information, to build an argument, to support a theory, to raise a problem, or to criticize a paper all count in the same way (Larivière & Sugimoto, 2018b). Citations can also be manipulated, for example through peer-reviewer or editor requests, or by forming citation cartels (Baas & Fennel, 2019; Fong & Wilhite, 2017). They are also prone to biases unrelated to the intrinsic merit of a paper (Urlings et al., 2021). And finally, direct citations are often only partially, and sometimes not at all, supported by the cited article, suggesting that researchers often cite papers without reading or even downloading them (Drake et al., 2013).

The H-index – or Hirsch index, after its inventor Jorge E. Hirsch – is another indicator that is frequently used in research assessments. The calculation is quite simple: a researcher has an h-index of x when she or he has published at least x papers which were each cited at least x times. In other words, the h-index combines impact and productivity to provide information at an individual level. Nonetheless, the H-index is also strongly criticized. First, the misleading simplicity of a single number to judge researchers is already problematic, especially when comparing researchers from different fields of expertise. Furthermore, although the H-index combines paper and citation counts, it can never be higher than the total number of papers a researcher has published, regardless of the number of citations these papers have received (e.g., a researcher with 10 papers cited 10 times each will have a higher H-index than a researcher with 9 papers cited 100 times each) (Larivière & Sugimoto, 2018b). Similarly, as an ever-growing metric, the H-index provides senior researchers with a clear advantage that makes them largely invincible when compared to junior researchers, even after they stop being active in research. Jorge E. Hirsch himself stated that the H-index could "fail spectacularly and have severe unintended negative consequences" (Hirsch, 2020, p. 4), and several metrics experts have deemed it inappropriate for measuring researchers' overall impact (Waltman & van Eck, 2012). Despite all this, the H-index continues to be used often in research assessments.
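For readers who prefer to see the calculation spelled out, the following short Python sketch (our own illustration, not part of any assessment tool) computes an h-index from a list of per-paper citation counts and reproduces the comparison given above:

```python
def h_index(citations):
    """Largest h such that at least h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Quantity caps the H-index regardless of how heavily each paper is cited:
print(h_index([10] * 10))   # 10 papers cited 10 times each -> 10
print(h_index([100] * 9))   # 9 papers cited 100 times each -> 9
```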

Although many other metrics exist, the journal impact factor, citation counts, and the H-index are the three most frequently used in researcher assessment. On top of their individual flaws, an overarching criticism of these metrics is that they fail to capture the core qualities they aim to measure. More specifically, while several institutions use these metrics as a proxy to assess the quality and impact of the work (McKiernan et al., 2019), they provide very little information that could validly be interpreted as quality or impact (Aubert Bonn & Pinxten, 2021b). Instead, these metrics provide information on the visibility, the attention, and the citation patterns within academia (Larivière & Sugimoto, 2018b; Sugimoto & Larivière, 2018). Garfield himself qualified citations as an indicator of "the utility and interest the rest of the scientific community finds in [the work]" (Garfield, 1979, p. 372), not as a measure of quality. Knowing that impact-informed metrics are even believed to "discourage rigorous procedures, strict replication/confirmation studies and publication of negative, nonstatistically significant results" (Lindner et al., 2018), it is important to rethink how we use – or at least interpret – impact metrics.

Once again, however, reinterpreting the role of impact metrics in research assessments requires changes at the core of research communities. Researchers who have spent decades building a career on inadequate indicators may find it daunting to give up their high rankings to adopt a new system in which they may rank lower or even poorly. Increased awareness, discussion, and mobilisation are still needed.

Innovative Action

The Declaration on Research Assessment (DORA, 2021) strongly advocates against using the impact factor in individual research evaluations, supports the consideration of the value and impact of all research outputs, and argues that evaluations of scientific productivity must be transparent and explicit. Along the same line, the Leiden Manifesto and The Metric Tide plead for the development and adoption of better, fairer, more transparent, and more responsible metrics (Hicks et al., 2015; Wilsdon et al., 2015). These three initiatives, recently joined by the Hong Kong Principles for assessing researchers (Moher et al., 2020), play a crucial role in raising awareness about the shortcomings of widely used research metrics. Awareness is only the first step towards actual change, but these initiatives have brought together a community that supports the change. DORA already has nearly 20,000 signatories – over 2000 of which are organizations. And changes are indeed starting to happen at the research institution, funder, and policy level. For instance, several research institutions now make sure that metrics are not used in isolation, but only as a complement to reflective qualitative peer review (examples of institutions that have concretized these changes are available in the repository 'Reimagining academic assessment: stories of innovation and change' developed by DORA in collaboration with EUA and SPARC Europe (DORA, 2021)).

As part of the Horizon 2020 program, the European Commission also created an Open Science Policy Platform within which several expert groups were set up to discuss better research assessments and indicators. These include the Working Group on Rewards, the Expert Group on Indicators, and the Mutual Learning Exercise on Open Science – Altmetrics and Rewards (Open Science Policy Platform, 2017).

New metrics are also becoming available to help balance research assessments. Simple paper downloads, for example, may capture readers who do not cite works, such as non-academic users of the work (Winker, 2017). More complex composite metrics have also been built. Altmetrics are a prime example of the diversification of the elements that can be captured on a single piece of work. Altmetrics include a wide array of inputs, such as open peer review reports, social media capture, citations on Wikipedia and in public policy documents, mentions on research blogs, mass media coverage, and many more aspects which help provide a broader overview of how the work is being used. The PlumX metrics, although governed by different calculations, work in similar ways. These innovative metrics are gaining increasing visibility on publishers' websites, but their use in formal researcher assessment is still very limited.

Narrow Views of Impact

The Problem

In addition to the overreliance on outputs and the problem of inadequate metrics we delineated above, indicators currently used in research assessments can be criticized because they provide a very narrow view of research impact. Two main dimensions deserve to be discussed here.

The first dimension concerns the impact research has on practice, policies, or society. As we previously mentioned, researchers are often expected to dedicate a portion of their time to the key pillar of 'Services', but typically their involvement in 'Services' is almost entirely absent from researcher assessments (Schimanski & Alperin, 2018). In addition, in the rare instances where 'Services' are considered in review, promotion, and tenure assessments, their consideration almost exclusively targets services provided within the institution or the research community – such as participation on university boards or editorial boards – rather than services provided to the public or to society (Alperin et al., 2019). Citation-based metrics only consider recognition and visibility within the scientific (and citing) community and provide only a restricted view of academic impact (Lebel & Mclean, 2018). Impact on practice, policy, and society is not captured and is even obscured by these narrow metrics. For example, the need to publish in high impact factor journals often translates into a need to publish in English-language international journals; a decision that can reduce the societal impact of locally relevant research projects (Gingras & Mosbah-Natanson, 2010). Academic environments themselves, through their funding objectives, missions, and expectations, value discovery but largely disregard how we can best implement discoveries in practice (El-Sadr et al., 2014).

A second dimension that is important to reconsider is the impact that research has on knowledge advancement. In fact, current assessments tend to conflate impact with ground-breaking findings (Aubert Bonn & Pinxten, 2021b). While this idea has long been embedded in the notion of scientific discovery, it also undermines the importance of non-ground-breaking work in advancing knowledge. Borrowing the words of Ottoline Leyser, chief executive officer of UK Research and Innovation:

It is worth remembering that the term "ground-breaking" comes from construction. There is often a ground-breaking ceremony, but then the building must be erected. This comes only after much preparation, from determining the ideal location to securing all the planning permissions. Likewise, for every ground-breaking discovery, a huge amount of work has paved the way, and follow-up work to solidify the evidence and demonstrate reproducibility and generality is essential. High-quality work of this sort is rarely recognized as excellent by the scientific enterprise but is excellent nonetheless, and without it, there would be no progress. (Leyser, 2020: 886)

The overemphasis on ground-breaking discovery has shaped a research system in which replication studies and negative results are largely invisible despite their crucial value in solidifying knowledge (Bouter & Riet, 2021; Ioannidis, 2018; Munafò et al., 2017).

Innovative Action

To better capture the impact that research has on practice, policies, society, or research itself, research assessors need to broaden the scope of indicators they use. We already mentioned that alternative metrics can help capture interest that would otherwise be missed. Another notable effort that may help capture societal impact in research is the Research Quality Plus (RQ+) evaluation approach used at the International Development Research Centre (IDRC) in Canada (Ofir et al., 2016). Although emphasising expected impact in a funding application is sometimes criticized for being artificial and highly theoretical (Brooks, 2013; Kirschner, 2013), the RQ+ provides a structured method through which societal impact can be estimated before the research takes place. Since the RQ+ is used for evaluating research proposals, it is not directly applicable to assessing researchers' past accomplishments. Nonetheless, it might be a good model to inspire the areas of impact that could be considered in future research assessments.

To capture the impact that research has in building knowledge, several research institutions and funders have started adopting narrative CVs in which researchers are encouraged to describe, in their own words, the impact of their work. A good example of these narrative CVs is the Résumé for Researchers provided by the Royal Society in the UK (Royal Society, n.d.). In the Résumé for Researchers, applicants are provided with unstructured space to discuss their contributions to the generation of knowledge, the development of individuals, the wider research community, and the broader society. These open descriptions enable assessors to consider a broader, more diverse, and more personal perspective of impact that may otherwise have remained invisible. While these narrative CVs are not easy to write and are more demanding to assess than quantitative metrics, they are increasingly adopted in research institutions. Several other funders, such as the Health Research Board Ireland, the Dutch Research Council, and the Swiss National Science Foundation, are also experimenting with open and narrative CVs (Hatch & Curry, 2020).

Obstacle to Diversity

The Problem

In addition to the issues presented above, current research assessments also often fail to promote diversity and inclusion in research. Gender inequalities, for example, are seen in both citation metrics and publication outputs (Beaudry & Larivière, 2016; Larivière et al., 2013), even more so in the disrupted working conditions of the COVID-19 pandemic (Minello, 2020; Viglione, 2020). Women are also more likely to be strongly involved in teaching, in the hands-on facets of research, or in other contributions that are essential to science but are less likely to result in first- or last-author publications (Astegiano et al., 2019; Macaluso et al., 2016). Similar issues also afflict ethnic groups and geographic regions, not only in funding opportunities and access (Check Hayden, 2015), but also in the fair attribution and recognition of their work (Powell, 2018; Rochmyaningsih, 2018). The same hurdles are faced by researchers with disabilities, even when policies are in place to tackle the injustice (Brock, 2021). Consequently, research assessments' excessive reliance on publication metrics may further aggravate diversity and inclusion issues in academia. But diversity and inclusion are not only about disadvantaged groups. Diversity of skills, contributions, and career profiles is also an essential aspect that is largely ignored in current assessments and inclusion policies. Indeed, research assessments tend to assess researchers individually and to expect them to fit a one-size-fits-all model of success in research (Aubert Bonn & Pinxten, 2021b). This individual and uniform model of assessment contradicts the highly collaborative, differentiated, and complementary roles that are intrinsic to research (Bothwell, 2019). Overlooking the still growing differentiation of research tasks disregards the unique contributions of non-leading members of research teams as well as the essential role of research support staff (Payne, 2021). Individual assessments and uniform expectations also increase competition between researchers; a feature which is known to be highly problematic and is often mentioned as a cause of research misconduct and questionable research practices (Anderson et al., 2007; Aubert Bonn & Pinxten, 2019).

Innovative Action

The lack of diversity in research is a priority on the agenda of several large funders and research organisations. The Athena Swan Charter, for example, plays an important role in inciting research institutions to achieve gender inclusivity (Athena Swan Charter, n.d.). Several institutions already have internal policies, quotas, and initiatives to promote greater diversity in hiring and promotion, yet some of these policies have raised heated debates in the past ("College oordeelt over voorkeursbeleid TU Eindhoven", 2020; Dance, 2019). Going one step further, Indiana University–Purdue University Indianapolis (IUPUI) decided not only to encourage activities that promote equality, diversity, and inclusion, but also to recognize their inherent value by considering them in researchers' tenure and promotion applications ("IUPUI approves new path to promotion and tenure for enhancing equity, inclusion and diversity", 2021). Despite these important initiatives, the impact that the indicators used in assessing researchers have on diversity and inclusion is rarely addressed, and there is growing realization that diversity and inclusion should be more prominent in research assessments (Labib & Evans, 2021).

The role an individual has in the research team has also received increasing attention in the past few years. Assessors realise that knowing the ways in which researchers collaborate can provide invaluable information. As a result, interesting initiatives that enable greater visibility of the team aspect of research are starting to emerge. The Contributor Roles Taxonomy (CRediT), for example, provides an added level of granularity to authorship and helps to understand the dynamics, roles, and responsibilities in team research (Alperin et al., 2019; CASRAI, n.d.). Although contributor roles have not yet fully secured their place in research assessments, more and more journals provide contributorship sections in the papers they publish. Whether the future of academia is one in which contributor roles take over from authorship, however, remains to be seen (McNutt et al., 2018; Smith, 1997). Another interesting initiative in the recognition of teamwork is the Diversity Approach to Research Evaluation (DARE; Bone et al., 2020). The DARE approach provides tools to measure and understand how collaborators connect and deal with diversity. While the approach is more informative than evaluative, knowing more about the dynamics in research teams is a starting point to gather information on the characteristics of strong research teams.

There is also a growing belief that the lack of diversity in the profiles of individuals who succeed in academia may weaken effective teamwork (Aubert Bonn & Pinxten, 2021c). Diversifying the profiles of academic employment, therefore, may help build research climates in which success comes from joint efforts rather than from competition between individuals. One early example of such an initiative is the Open University in the UK, where researchers are given more flexibility to focus on different pillars of their work (Parr, 2015). As a result, researchers could pursue a career in which knowledge exchange is valued ahead of their teaching and research achievements. The recently implemented career track at Ghent University, Belgium, and the Dutch Recognition and Reward Programme are two other well-known initiatives that address the need for diversifying researchers' profiles (Ghent University Department of Personnel & Organization, 2018; VSNU et al., 2019). The position paper 'Room for everyone's talent' from the Dutch Recognition and Reward Programme nicely illustrates how such a diversification may take shape. Specifically, researchers have the opportunity to select a unique combination of key areas they wish to specialise in and be assessed on. These key areas include research, education, impact, leadership, and patient care. While all researchers are expected to demonstrate sufficient competencies in the research and education areas, they can choose the extent to which they favour these and any other areas and can change areas of specialisation at different stages of their careers.

Finally, the initiative contains a clear acknowledgement of the need to reward team efforts. The highest Dutch research awards, the Spinoza and Stevin Prizes, are now also open to team applications, marking another step forward in the recognition of research as teamwork (Hoger Onderwijs Persbureau, 2019).

Procedure

The Problem

Changing researcher assessments is a complex endeavour that extends far beyond the elements and indicators assessed. It is also important to discuss the time and resource commitments that research assessments imply.

Researchers need to invest substantial time in building a prestigious CV and in applying for research funding. While the peer-review process through which research is funded is most likely essential for good quality research, the low success rate of current funding schemes (typically 5–10% of applications are granted) suggests that a lot of effort is ultimately wasted. Past research has shown that many researchers consider the preparation of funding proposals to be the most "unnecessarily time-consuming and ultimately most wasteful aspect of research-related workload" (Schneider et al., 2014, p. 41) and that researchers wished they could spend less of their time on it (Aubert Bonn & Pinxten, 2020a). In fact, Herbert and colleagues estimated that the time spent preparing grant proposals for the Australian National Health and Medical Research Council in 2012 reached 550 working years of researchers' time – the equivalent of 66 million Australian dollars (around 42.5 million Euros at the time of writing) (Herbert et al., 2013). Considering the low success rate of these applications, competitive funding channels come with phenomenal investments of research time. Building a tenure dossier and applying for different research positions is also no small task, and since grants and non-tenured research positions are typically short-term, the time investment involved is substantial.

In turn, the colossal demand for research money and opportunities also leads to increasing numbers of applications, which rise faster than the investments in research funding (Rockey, 2012). This growing demand creates pressure on funders, who face an excess of applications to review and who will, in turn, require peer reviewers and selection committee members – most of the time researchers themselves – to invest their already scarce research time in the review process (Aubert Bonn & Pinxten, 2020b; Gingras, 2016).

Innovative Action

With the large demand for funding and career opportunities, it is difficult to reduce the volume of research assessments. Nevertheless, there are ways in which the time and resource investment can be reduced to alleviate the burden on both researchers and assessors. One such initiative is the post-peer-review lottery of funding applications, which proposes that, after a first thorough quality check to select proposals that are sound and methodologically adequate, assessors select the winning applications randomly rather than through lengthy deliberation. This radical idea would not only increase the efficiency of research funding assessments (Gross & Bergstrom, 2019), but it would also guard against the 'natural selection of bad science' by allowing unusual and unfashionable topics with a high risk of negative findings to be funded (Smaldino et al., 2019); a minimal sketch of such a procedure is given at the end of this section. The lottery approach may even help reduce career insecurity in academia, a point we will discuss further in Sect. 27.2.5 (ISE task force on researchers' careers, 2020). Another way to reduce the burden of research assessment is to reduce the frequency at which researchers are evaluated. Longer-term funding and research contracts could help in this matter, while further alleviating worries around the lack of security of research careers. Less frequent evaluations of employed researchers would similarly lower the assessment burden. Ghent University is currently experimenting with this change in its new career track, moving from a review interval of three years to one of five years starting in 2020 (Ghent University Department of Personnel & Organization, 2018).
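As a rough illustration of the post-peer-review lottery mentioned above, the following Python sketch (our own, hypothetical illustration; the function names and example data are not taken from any funder's actual procedure) first screens applications for methodological soundness and then draws the winners at random:

```python
import random

def post_peer_review_lottery(applications, is_sound, n_grants, seed=None):
    """Fund applications by lottery after a peer-review quality screen.

    applications: list of proposals (any objects).
    is_sound: function returning True if a proposal passed the quality check.
    n_grants: number of grants available.
    """
    eligible = [app for app in applications if is_sound(app)]
    rng = random.Random(seed)
    return rng.sample(eligible, k=min(n_grants, len(eligible)))

# Example: 10 proposals, half judged methodologically sound, 2 grants to award.
proposals = [{"id": i, "sound": i % 2 == 0} for i in range(10)]
winners = post_peer_review_lottery(proposals, lambda p: p["sound"], n_grants=2, seed=1)
print([w["id"] for w in winners])
```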

Assessors

The Problem

The assessors themselves are not so frequently on the agenda for change in research assessments, despite their direct relevance to assessment processes. In particular, when reflective and qualitative peer review takes precedence, a great deal of subjectivity is introduced into the assessment process. Subjectivity is not a bad thing in itself, but it leaves substantial room for personal biases and involuntary discrimination in research assessments. For instance, assessors will naturally be tempted to cherry-pick the information that confirms their already formed opinion (confirmation bias), to base their assessment on easily accessible anecdotal information (accessibility bias), or to let contextual aspects such as the reputation of the universities listed on applicants' CVs shape their views of individual candidates (halo effect; see, e.g., Clauset et al., 2015; Kwon, 2021), to name only a few (Hatch & Schmidt, 2020). In addition, many assessment procedures ask assessors to value highly abstract concepts – for example 'excellence' or 'high impact' – so differences in interpretation, misunderstandings, and, unfortunately, biases can then easily happen (Hatch, 2019).

Innovative Action

Diversity is an important keyword if we want to reduce the influence of biases. Indeed, guidelines explicitly recommend that research and funding organisations should strive to ensure that reviewer pools and hiring committees contain diverse profiles (Science Europe, 2020). In addition, diversity should target not only gender and ethnicity, but also the profiles of assessors and their seniority. For example, there is increasing realisation that the input for researcher assessments, for example the reference letters used, should come from superiors as well as from those supervised or managed by the researcher being evaluated (i.e., 360° feedback; Vitae, n.d.). Other ways to reduce biases in research assessments have been proposed, for example avoiding photos of the candidate on the application or moving the educational history, with its potentially biasing university names, to the end of the evaluation, but the efficacy of such approaches remains largely undocumented (Hatch & Curry, 2019). Finally, training assessors to ensure that they have a clear understanding of the assessment process and providing unambiguous definitions of the key concepts that are assessed (e.g., impact, excellence, quality, etc.) can help reduce biases (Hatch, 2019; Science Europe, 2020). A few universities and organisations are starting to implement these recommendations. For example, Tampere University now informs and trains evaluators across campus about responsible evaluation practices (DORA, 2021). Similarly, the Health Research Board (HRB) Ireland has started raising awareness, training staff, and providing guidance for reviewers as a way to minimize gender inequalities and reduce unconscious biases (Health Research Board, 2019), much like the Dutch Recognition and Reward Programme, in which training and instructions are provided to assessment committees (VSNU et al., 2019). Others have also started defining the terms they use to assess researchers. For instance, Universities Norway added clear definitions of the key concepts needed in assessments (DORA, 2021), while the 'Room for everyone's talent' position paper explicitly defines the concept of impact. Such initiatives are still scarcely exploited and not yet evaluated, but there is growing awareness of the need to inform, train, and support those who assess researchers.

Research Environments

The Problem

We know that the environments in which researchers operate are problematic, since they impose high pressures on researchers to perform and publish (Metcalfe et al., 2020; Nuffield Council on Bioethics, 2014; The Wellcome Trust and Shift Learning, 2020). Changing research assessments can likely help to reduce the 'publish or perish' culture. Yet, other elements in the environment of researchers are also important to consider to avoid wasting the huge efforts invested in changing research assessments.

First, the lack of stability in research careers is an essential aspect to consider. At the moment, there is a huge discrepancy between junior (temporary) and senior (permanent) positions in academia, and only between 3% and 20% (depending on the countries' estimates and faculties) of young researchers will be able to pursue the career in academia to which they aspire (Alberts et al., 2014; Anonymous, 2010; Debacker & Vandevelde, 2016; Larson et al., 2014; "Many junior scientists", 2017; Martinson, 2011; van der Weijden et al., 2016). In turn, this lack of stability creates an unhealthy working environment in which stress, mental health issues, and burnout thrive (Levecque et al., 2017; "The mental health of PhD", 2019; Padilla & Thompson, 2016). Furthermore, the scarcity of senior positions creates a perverse hyper-competition between junior scientists who wish to survive in academia. Hyper-competition not only worsens the situation, but is also known to be an important driver of questionable research practices (Anderson et al., 2007; Aubert Bonn & Pinxten, 2019).

Beyond these interpersonal issues, the support, resources, and infrastructures that researchers receive are also essential to ensure that changes in research assessments are implemented effectively. Currently, junior researchers and PhD students often feel unsupported (Heffernan & Heffernan, 2019; Van de Velde et al., 2019), and the transition towards new expectations can generate frustration if the resources to fulfil these new expectations are lacking. For example, expecting researchers to preregister their research protocols or to make their data open and FAIR (i.e., Findable, Accessible, Interoperable, and Reusable (Wilkinson et al., 2016)) is a great step towards better research, but it comes with important needs for adequate infrastructure, training, and, most importantly, researchers' time. Similarly, open access publication is increasingly demanded by funders and institutions, but it needs to come with a budget for covering article processing charges, without which inequalities may ensue (Aubert Bonn & Pinxten, 2021a).

Innovative Action

There are several initiatives that aim to improve research environments, and in many ways, the innovative actions mentioned throughout this chapter would help create a healthier, more collaborative research climate. Yet, we would like to provide more details on three types of initiatives that target research environments directly. First, there are initiatives that play a crucial role in raising awareness and opening the discussion on the problem. Examples include the Initiative for Science in Europe (ISE) position paper on precarity in academic careers and its associated webinar series (ISE task force on researchers' careers, 2020), the French movement of 'Camille Noûs' from Cogitamus Laboratories (Cogitamus Laboratory, 2020), and the University and College Union strikes that took place at 74 universities across the UK in early 2020 to denounce – among other things – the casualization and the lack of employment security of research careers (University and College Union, 2020). Second, more forceful initiatives are also starting to appear. For instance, at the end of 2020, Sweden produced a national bill to change the way in which it funds research so that a greater share of researchers' salaries would come from governmental non-competitive funding (Regeringskansliet, 2020). This bill came in response to a thorough investigation which found that the constant search for competitive funding ultimately undermined research quality (Hwang, 2018; Regeringskansliet, 2019). In helping researchers to have a more stable salary, Sweden aims to reduce the hyper-competition and to lower the employment insecurity of researchers. The third initiative that is highly relevant when discussing research environments is the Standard Operating Procedures for Research Integrity (SOPs4RI) European Commission project, which runs until 2022 (Mejlgaard et al., 2020). The SOPs4RI project is creating a toolbox of best practices and guidelines to help research and funding institutions build research integrity promotion plans. In doing so, SOPs4RI emphasizes that research integrity is not only a responsibility of researchers, but also of research and funding institutions, whose operating procedures should foster healthy research environments. Simultaneously, the project is also empirically developing its own guidelines on topics that are overlooked in existing research guidance documents. One of the guidelines being produced directly targets ways in which institutions can build better and more collaborative research environments that foster research integrity.

Coordination

The Problem

The final point that we find important to discuss is the need for thorough, intense, and continued coordination between the different actors of the research system. In fact, to fully address the problems we described in this chapter, an open dialogue and thorough coordination between researchers, funders, research institutions, and policy makers, as well as other actors such as publishers and metrics providers, are needed.

Without coordination between stakeholders, changing research assessments is difficult and unlikely to happen on a large scale. For instance, in many countries, governments use performance-based allocation to fund research institutions, meaning that the share of funding received by research institutions largely depends on quantity indicators of outputs (Jonkers & Zacharewicz, 2016). Although using bibliometric indicators to distribute funding at an institutional level does not mean that universities should assess researchers using the same criteria (Debackere & Glänzel, 2004), the fear of underperforming often leads universities to use these indicators internally at the researcher level (Aubert Bonn & Pinxten, 2021c; Engels & Guns, 2018). Similarly, the way in which universities are recognized is profoundly influenced by university rankings. University rankings strongly depend on impact factors and other publication metrics, and there is increasing awareness that they have profound flaws and should be interpreted carefully (Gadd, 2020). Yet, rankings are still a dominant way of attracting funding, researchers, and students, and most universities take strategic, organizational, or managerial action to improve their rankings (Hazelkorn, 2007). Lack of coordination with metrics providers also plays a role in the problem. In fact, most major metrics belong to for-profit companies whose external agendas differ from those of the research communities (Larivière & Sugimoto, 2018c). Thorough communication with publishers is needed if we hope to shape metrics that align with the objectives of the research communities.

Changing researcher assessments is also difficult to implement in single institutions. In the absence of a common approach to research assessments, there is a worry that researchers who build a profile to succeed in one proactive institution may later be penalised if they want to move to another research setting in which their profile might be undervalued. In other words, this perceived 'first-mover's disadvantage' favours a stagnant status quo and builds a feeling of hopelessness that the much-needed changes will ever occur (Aubert Bonn & Pinxten, 2021c).

Innovative Action

Ensuring the coordination of all stakeholders around the same objectives – and finding the means to achieve these objectives – is an extremely challenging task. Among others, the European University Association (EUA) briefing and The Metric Tide provide insights on this crucial need for coordinating actions at the level of research assessments, without hiding the complexity of the task this implies (Saenen & Borell-Damián, 2019; Wilsdon et al., 2015). Despite the challenge, the best practice examples mentioned throughout this chapter show that coordinated changes are possible in practice.

Actors with broad influence and substantial budgets are essential here. For example, the European Commission's 'Towards 2030' vision statement addresses the issue of rankings, calling on research institutions to move beyond current ranking systems for assessing university performance because they are limited and "overly simplistic" (Gadd, 2020). Broad-reaching groups such as the European Commission Open Science Policy Platform we mentioned earlier and DORA also play a role in coordinating changes by uniting different research institutes and member states to agree on a strategic plan of action. In Latin America, the Latin American Forum for Research Assessment (FOLEC) provides a platform for discussion between stakeholders on issues of research assessments (Latin American Forum for Research Assessment (FOLEC), 2020a, 2020b, 2020c). University alliances can also help coordinate changes. For example, in 2019 the consortium Universities Norway put together a working group aiming to build a national framework for research career assessments. The group issued a report in 2021 in which it proposes a toolbox for recognition and rewards in academic careers (Universities Norway, 2021). The Academy of Finland went through a similar process to create national recommendations for responsible research evaluation (Working group for responsible evaluation of a researcher, 2020), and more and more university associations and academies are following this lead.

In a slightly more drastic approach, the major UK research funder Wellcome decided that, from 2021, it would only provide funding to researchers working in organizations that can demonstrate that their researcher assessments are fair and responsible (Gadd, 2020). This strategic decision incites efforts from both the institutions, which would be at a disadvantage if they did not work to ensure their eligibility for Wellcome funding, and the researchers, who will push their institutions to ensure they remain eligible for this important source of funding.

Finally, the program 'Room for everyone's talent' we described above is an inspiring example showing that profound coordination is possible. In 'Room for everyone's talent', five public knowledge institutions and research funders joined forces to ensure that Dutch research institutions would abide by the new assessment models. In addition, in the position paper announcing the new model, the five parties acknowledge their responsibility to take steps towards even tighter coordination. The position paper describes their commitment to connect with international organisations such as the European University Association, Science Europe, and Horizon Europe to encourage changes and harmonisation at a European level.

Way Forward

Changing researcher assessments is difficult and requires huge investments and efforts from a diverse array of stakeholders. We have argued that current research assessments have profound inadequacies, but that promising pioneering actions are starting to address these inadequacies and to align research assessments with responsible research practices.

To continue moving forward, we need to think of research assessments in their entire complexity, addressing not only their content, but also the processes, assessors, environment, and coordination needed for change. For each dimension, we must understand the problem, raise awareness, take action, and coordinate efforts to enable change.

Even though research institutions, research funders, and policy makers have a clear responsibility in enabling the change towards more responsible assessments, we, as researchers, also have an important role to play. For one, we should remember the biases and problems of research assessments when acting as peer reviewers or assessors and ensure that we avoid shortcuts and biases as much as we can. But we should also play a role in shaping the tenacious research culture, helping to raise awareness and mobilise action around us. In the end, looking at what was accomplished by DORA – which started from a small group of researchers and editors within the research community – shows that researchers can help to drive the change.

But changing research assessments is not an end in itself. To avoid falling into the same pitfalls we are fighting today, it is essential to understand whether the changes to research assessments help contribute to high quality and high integrity research (Moher et al., 2018). In this regard, research on research assessments is essential to allow us to understand, inform, and realign research assessments towards a better future. In short, we need evidence-based research assessment policies.