How should evaluation be? Is a good evaluation of research also just? Towards the implementation of good evaluation

In this paper we address the question of how evaluation should be by proposing a good evaluation of research practices. A good evaluation of research practices, understood as social practices à la MacIntyre, should take into account the stable motivations and the traits of character (i.e. the virtues) of researchers. We also show that a good evaluation is also just, beyond the sense of fairness, since working on good research practices implies taking into account a broader sense of justice. We then propose the development of a knowledge base for the assessment of “good” evaluations of research practices, in order to implement a questionnaire for the assessment of researchers’ virtues. Although the latter is a challenging task, the use of ontologies and taxonomic knowledge, together with the reasoning algorithms that can draw inferences on the basis of such knowledge, offers a way of testing the consistency of the information reported in the questionnaire and of analysing correctly and coherently the data gathered through it. Finally, we describe the potential usefulness of our proposal for the reform of current research assessment systems.


Introduction
We live in an evaluation society (Dahler-Larsen, 2011): evaluations, quantifications and assessments are everywhere, and they analyse, scrutinize and monitor all kinds of aspects of scholarly activities. In addition, in recent decades, the rapid changes taking place in the production, communication and evaluation of research have been signs of an ongoing transformation. Broadly speaking, we are facing a transition from a traditional evaluation model, based on indicators (e.g. number of publications and citations), to a modern evaluation model, characterized by a multiplicity of distinct, complementary and new dimensions, including the so-called altmetrics. This transition is driven by the development and increasing availability of data and of statistical and computerized techniques for their treatment, including, among others, the recent advances in artificial intelligence and machine learning.
Generally, moral responsibility is a typically human ability to discern, choose and operate in accordance with principles deemed to be of universal value (see Talbert, 2019, for a cogent discussion of moral responsibility). Recently, responsible research and innovation have received increasing attention. A recent review of the literature (Burget et al., 2017), building on 235 articles, identifies four conceptual dimensions of responsible research and innovation: inclusion, anticipation, responsiveness and reflexivity, together with the two emerging dimensions of sustainability and care. Wilsdon et al. (2015), drawing on discussions about responsible research and innovation, propose the notion of responsible metrics as
"a way of framing appropriate uses of quantitative indicators in the governance, management and assessment of research. […] Responsible metrics can be understood in terms of a number of dimensions: (i) Robustness: basing metrics on the best possible data in terms of accuracy and scope; (ii) Humility: recognising that quantitative evaluation should support-but not supplant-qualitative, expert assessment; (iii) Transparency: keeping data collection and analytical processes open and transparent, so that those being evaluated can test and verify the results; (iv) Diversity: accounting for variation by field, and using a range of indicators to reflect and support a plurality of research and researcher career paths across the system; (v) Reflexivity: recognising and anticipating the systemic and potential effects of indicators, and updating them in response (Wilsdon et al., 2015, 134-135)".
In this context, it becomes increasingly important to "evaluate evaluations" (Larson & Berliner, 1983) and discuss how evaluation should be. Interesting questions on the Science of the 21st Century were posed by Sweeney (2021) during his keynote address at ISSI2021: "Are we starting from the right place? Do we need to look at where we're going, not where we're coming from? Do we understand enough about what Society needs/wants, and how our People and Research Culture will deliver it?".
Unlike responsible evaluations and metrics, which abound in the literature, the search for "good" evaluations is not a recurrent topic in the current debate. Although there is a proliferation of increasingly sophisticated quantitative methods for evaluating research, there is still a lack of clarity on how to understand and operationalize the notion of "good" evaluation of research practices. In a recent work (Daraio & Vaccari, 2020) we recognized researchers' motivations and their specific traits of character (i.e. researchers' virtues) as important factors that must be considered to make a "good" evaluation.
We use "good" in a technical sense that refers to the one employed in ancient Greek ethics. In this context, the good is that which is desirable for human beings insofar as it constitutes a condition of happiness that depends on the excellent use of the faculties and aims that are characteristic of human beings (building emotional relationships, cooperating with others, professional fulfilment, etc.). The good or good life requires participation in the social practices that characterise the life of our community. The quest for the good (or the good life) does not therefore promote an ascetic lifestyle, but nevertheless presupposes the ability to discipline one's desires in various ways, for example by sacrificing the gratification of immediate goals for the gratification of those aims that give meaning to one's life. This exercise may, in turn, involve the ability to harmonise the different inclinations that animate our character and to expand them towards new objects of value.
Good evaluation takes into account the constitutive elements of research practices, understood as social practices according to MacIntyre's view (MacIntyre, 1985; Murdoch, 1998). Following this line, good evaluation must consider researchers' virtues in the realization of the "internal goods" of the social practice they are involved in. Virtues allow the goods of the practice to be achieved and are appropriate to the different roles that researchers play according to their skills and experiences. In a good research practice, in fact, there are three main kinds of researchers involved: leaders, good researchers and honest researchers (Daraio & Vaccari, 2020). The main result of this recent study is the preliminary version of a questionnaire to assess researchers' virtues, which makes it possible to complement current research evaluations with additional elements referring to a scholar's motivations and to weak elements in her character that are currently excluded by the bibliometric indicators in use.
In this paper we maintain that evaluation must be good and investigate the connection between a good evaluation and a just evaluation. We show that adopting the notion of good evaluation proposed in Daraio and Vaccari (2020) automatically leads us to a just evaluation that encompasses fairness but also goes beyond it by relying on a broader, general concept of justice. Moreover, we propose the development of ontologies and taxonomic knowledge as a way of pursuing the implementation of the good evaluation through a questionnaire on researchers' virtues.
The paper unfolds as follows. The next section illustrates the main aim and contribution of this paper. "Methods and materials" section describes the methods and materials of the paper, arguing that a good evaluation implies a just evaluation. "Towards the implementation of good evaluation" section presents the main challenges and first steps towards the implementation of good evaluation, describing an ontology-based modelling of research virtues. "Our proposal in practice" section describes how our proposal works in practice and "Concluding remarks" section summarizes and concludes the paper.

Aim and contribution
In this work, we propose to adopt a framework for the evaluation of research based on a good evaluation of research practices. We think that considering the good evaluation could be useful for concretely delineating, in real research practices, the principles and statements postulated by the existing literature on responsible evaluation which otherwise would remain vague and indefinite.
We build on the notion of good evaluation introduced in our previous research (Daraio & Vaccari, 2020) and extend the discussion of it by scrutinizing two important points: (i) the virtues of researchers, which are the cornerstone of the good evaluation, and (ii) the claim that a good evaluation is also morally just.
We further propose a step towards the implementation of good evaluation in practice: an ontology-based modelling approach that represents the domain of researchers' virtues in order to prepare a questionnaire to be administered to researchers. This approach will allow us (1) to check the consistency and coherence of the questionnaire content and structure before its application (or use) and (2) to correctly and coherently interpret the data gathered through the questionnaire.
Finally, we contribute to the recent debate on evaluation by offering the perspective of virtues as an additional assessment tool to balance traditional bibliometric indicators.

Methods and materials
The methods applied in this paper are (i) philosophical argumentation and (ii) ontology-based modelling for designing a questionnaire on researchers' virtues. By applying philosophical argumentation we deepen the basis of good evaluation, discussing the virtues of researchers (see "The cornerstone of the good evaluation: the virtues of researchers" section) and showing that a good evaluation of research practice is also morally just (see "A good evaluation is also morally just" section). The main ideas of an ontology-based modelling of researchers' virtues are described in "Designing an ontology for the modelling of researchers' virtues" section, while "Towards the implementation of good evaluation" section illustrates the first steps towards the implementation of good evaluation.

The cornerstone of the good evaluation: the virtues of researchers
Let's start with a general characterization of the nature of virtue. Virtues are dispositions to believe, to feel emotions and act in certain ways that are activated when we perceive some relevant characteristics in the world. A courageous person, for example, will face danger firmly when she believes that someone or something to which she attaches a value is at risk of being harmed. Similarly, a charitable person, when she believes that someone is suffering, will tend to sympathize with that suffering and, consequently, seek to alleviate it. Furthermore, a thorough person is one who does not accept any belief-shaped thing that enters his or her mind, but takes special care in forming his or her beliefs. Actions of this type will not be singular but stable and predictable: whenever she perceives certain relevant characteristics of a situation, the virtuous agent will tend to give the appropriate response to those characteristics. The possession of virtue moreover seems to be something that admits degrees of competence along a spectrum that goes from knowledge about what to do to a more complex one about why to do a certain thing. This element depends on two factors that concern our practice of attributing virtuous traits:
(1) We sometimes attribute a virtue even when those who perform the corresponding action are unable to give a cogent justification of what they have done and/or their actions are not cross-situationally consistent.
(2) We believe that those who give a cogent justification of their virtuous actions, and tend to manifest the virtuous trait in a coherent plurality of situations, have a greater knowledge of virtue than those who do not. Unlike the former, these agents tend to know how to distinguish circumstances that require a virtuous response from similar ones that do not.
An entirely brave person, for example, will tend to distinguish real danger from a merely apparent threat and to avoid facing danger just for the sheer pleasure of the adrenaline that follows. She will also be able to recognize situations that require a courageous response and to be motivated accordingly in a variety of situations: not only those involving physical confrontation, but also those involving, for example, the defence of an unpopular idea or the pursuit of a complex line of research never explored before. In a similar way, an entirely thorough person takes into account that research is both time- and resource-consuming and is able to put different investment policies in place depending on circumstances. These agents seem to possess greater knowledge of the virtue than those who simply act virtuously in a limited number of cases and lack the ability to formulate a justification for what they do. They are able to articulate the reasons in favour of virtuous behaviour in a plurality of contexts and use them to justify their conduct to themselves and others. For those who possess this knowledge virtue will not be a mere disposition to act, but a disposition to act infused with a reflective ability that involves the mastery of concepts. As with other character traits, finally, virtues are profound qualities that reveal what kind of person their possessor is. The attribution of character traits is in fact a common practice that we use as much to get a general idea of someone we do not know well as, if we know her better, to predict what she will say or do or to explain why she has made certain choices in the past. Moving from the spectator's point of view to that of the agent, there are facts that are typically explained (retrospectively) by referring to motives that depend on general principles of our conduct that survive that individual case. Although not all of these principles are something we necessarily approve of, some of them are aspects of our character that are valued by ourselves and others, and that we strive to maintain, creating opportunities to test them and reflect on how we express them in our behaviour. In these cases, virtue becomes a crucial element of our narrative and plays a normative role in our future choices.
In our research we are going to use two different types of virtues: the intellectual or epistemic virtues and the virtues of character. The former involve dispositions to exercise a set of capacities that relate to how we acquire our beliefs and communicate them to others. The latter, on the other hand, concern functional ways in which we enter into relationships with ourselves and interact with others by promoting our own and their good. The former are developed primarily through teaching by specialized staff (professors; research directors; etc.). The latter, on the other hand, require us first to enter into a non-emulative educational relationship with figures who have the role of educators in the community (parents, teachers, etc.).
Below we provide an open-ended list of the virtues that enable the constitution of good research practice and constitute something that should be taken into account by those making comparative judgments about the relative value of different research practices in a complementary way with respect to currently used individual bibliometric indicators. For a critical discussion of performance indicators of individual researchers, see Wouters et al. (2013). For an extensive discussion on these individual bibliometric indicators, see Schubert and Schubert (2019) and Wildgaard (2019). Schubert and Schubert (2019) offer an overview on h-index related indicators while Wildgaard (2019) presents a summary of existing author-level indicators of research production.
The virtues that enable the constitution of good research practice include:

Intellectual virtues
Accuracy: this is the disposition that consists in the care with which individual researchers collect the data that will constitute the pool of information shared in the research practice. Since collecting and evaluating information is time- and resource-consuming, the thorough researcher is one who implements several "policies of investigation" that are appropriate to the research circumstances (Williams, 2002).
Sincerity, Honesty: this is the disposition to tell others the truth and, when this does not happen, the capacity to indicate good reasons why it did not happen, where "good" refers to the fact that these reasons have a constitutive reference to the interests of other people (MacIntyre, 1985; Williams, 2002).
Creativity: this is the ability, which finds expression both in our social interactions with others and in the results of our research, to produce something that not only has value but is also characterized by novelty and by the capacity to arouse surprise in others (Swanton, 2003, pp. 162, 165).

Virtues of character useful to oneself
Humility: this is the ability to accept the authority of the standards related to the rules that define the practice. I have to recognize that other participants know rules and know how to apply them better than I do. I have to be willing to learn from these people and accept their criticism (MacIntyre, 1985, p. 193).
Pride: this manifests itself in evaluative attitudes towards ourselves (Ardal, 1966; Cohon, 2008; Taylor, 2015). Unlike other emotions, which simply motivate us to pursue or avoid objects, these traits of character fix our attention on persons, casting a positive or negative light on them. If I am proud of my child's success at school, my pride does not fix my attention on the 'merits of my child,' and still less on 'me in the role of father,' but on the whole of myself. As Cohon has rightly said, "when I feel pride, I am proud of something in particular [its cause] … But the attitude of pride is a pleasure or satisfaction not in that particular accomplishment or possession, but in myself in my entirety" (Cohon, 2008, p. 166). We believe that the pride associated with one's own achievements in research, and the consequent approval of one's peers or superiors, is a fundamental spring that drives researchers to perform at their best in their area of research (Tangney, 1999). Pride is, then, a synonym of self-respect, dignity, honour, self-esteem and self-worth.
Patience is the ability to curb one's own urge to complete a piece of research in order to obtain as soon as possible the gratification of a positive result; to be able to wait and to be guided by a cautious scepticism that prompts us to control carefully the different steps of our investigation.
Prudence is the capacity to sacrifice the satisfaction of less important pleasures that are closer in time for the satisfaction of more distant but more important pleasures, where the degree of importance is defined with respect to the long-term objectives that characterize our lives (Parfit, 1984).
Resilience: together with pride, this ability is indispensable for moving forward in research. It allows us to leave behind failures (rejected papers, unfunded projects, etc.) and to focus on future projects (Hormann, 2018).

Virtues of character useful to others
Courage is the capacity to risk damage or danger to oneself when individuals, values, goals that are crucial to the existence of the practice are at stake. Courage is therefore a way of showing that our attachment to these elements of the practice is genuine (MacIntyre, 1985, p. 192).
Empathy, Benevolence: in line with the extensive literature, by this term we mean the human ability to feel the emotions and feelings of other people through a vicarious feeling that is similar to that of the person with whom we sympathize. We do not believe, however, that empathy in itself is a virtuous capacity in research practices. Since empathy is an instrument for reading the other's mind, it can also be used to manipulate other researchers in malicious ways. Empathy must be cultivated in such a way that it is rooted in the benevolent tendencies of human beings (Batson, 2017, p. 2). In this way, empathy can allow the creation of a climate of trust between those who work within research institutions. Indeed, mutual trust is an indispensable component in these practices given the fundamental fact of the asymmetry of power that characterizes those interactions (Baier, 1991).
Integrity: this is the willingness to behave in such a way that our actions are the outcome of our deepest values and commitments, and to refuse to make them hostage to imposed obligations or duties that we do not endorse on reflection.
Justice: following Aristotle (Aristotle, 2014, Book V), we distinguish two senses of justice: general justice and justice as a particular virtue. The former indicates the ability to understand and apply the good rules that guarantee the good functioning and survival of a particular institution and of research practice. This capacity typically includes creative knowledge: applying a rule means being able to apply it correctly to new cases that were not initially foreseen when the rule was formulated, or to complex cases that seem to involve several conflicting rules. Possessing the virtue of general justice means being able to make this creative use of rules out of a reflection on the aims of research institutions or practices, i.e. on their internal and external goods.
The second sense of justice encompasses our sense of justice as fairness. This aspect of the virtue of justice concerns the individual's disposition not to demand more from the institution in which she works than what is due to her in view of the role she occupies in the research practice. Following the taxonomy we have introduced in our research (leader, good researcher and honest researcher), this means that a good researcher should not expect to receive what a leader receives, just as an honest researcher cannot expect to receive what is due to a good researcher or to a leader.
Practical wisdom: this is a kind of super-virtue essential for making each virtue effective. In line with Aristotle, we believe that this rational capacity enables the virtuous agent to acknowledge and respond properly to the items in the field of the research practice, choosing the appropriate means for their own ends (McDowell, 1979). Moreover, it also allows the different virtues within an individual's character to operate and develop harmoniously with each other.

A good evaluation is also morally just
In order to show that a good evaluation is also morally just, three different conceptions of justice will be examined: (1) the deontological conception; (2) the utilitarian conception; (3) the neo-Aristotelian conception.
The basic idea of the deontological conception of justice is that human relationships must be fair. Justice according to Rawls (1971) is a set of principles that constitute constraints on each individual's pursuit of his or her individual goals. In this sense, justice is a set of rights and opportunities that ensure that each person pursues his or her individual aims while respecting the same pursuit for other members of the community (Rawls, 1971, p. 50). According to this conception, the just society is composed of rational, autonomous, equal, and independent individuals who treat each other with respect and who are governed by principles that protect this type of mutual relationship.
The consequentialist perspective, on the other hand, considers justice from the point of view of the overall consequences of actions. The most influential version of consequentialism, i.e. utilitarianism, has two components: a theory of good and a theory of just action. The first holds that pleasure is the only property that has value. The second, a conception of right, holds that the right action is the one that maximises the pleasure or satisfaction of the preferences of all individuals affected by that action. According to utilitarianism, the achievement of equity is compatible with situations in which human beings are still in a state of great suffering. The goal of justice should be to promote the development of each individual's capacities, and thus overall happiness, and not equity in itself. According to Sen (2010), for example, a fair distribution of resources may not be sufficient to achieve this goal. A person with a walking disability might need more wealth or a higher share of state services to be able to move around as a person without a disability does. Moreover, as Singer (2003) has pointed out, increasing inequality could lead to a significant improvement in the conditions of those who are worse off. For people living in conditions that are just above the absolute minimum necessary to live, this outcome can make a considerable difference (on this point, see Donatelli, 2015). According to the utilitarian conception, therefore, a just society is not defined primarily by the existence of a certain type of relationship between individuals, but by the fact that people are more or less happy.
According to a richer and more sophisticated utilitarian version, however, originally formulated by Mill (see Mill, On Liberty, in Mill & Ryan, 1997), justice is to be understood in a broader way that includes the protection of the enduring interests of human beings as 'progressive beings'. In this sense, justice is closely related to the rights of freedom, not only the negative ones, which protect the individual sphere from state interference, but also those rights that concern the promotion of material conditions so that individuals can actually freely choose their own life projects. Mill thus links justice to respect for the so-called perfect rights of the Natural law tradition, that is, to that class of rights with respect to which each individual possesses a "valid claim on society to protect him in the possession of it, either by the force of law, or by that of education and opinion."
Finally, the third conception, which emerges within the communitarian tradition and has been famously advocated by Sandel (2010), draws directly on the Aristotelian view of justice. Like Mill's, this conception presents a broad notion of justice that is not confined to the realm of the distribution or redistribution of goods in society, but inquires more broadly about what kinds of goods should be promoted by the different institutions on which society is founded. Unlike some versions of the Kantian-style deontological conception, which tend to be neutral on ethically sensitive issues, it promotes an ongoing public debate about different conceptions of the good life that can result in the adoption and enactment of ethically oriented laws. In relation to what most directly affects this paper, this conception holds that the identification of the multiple goods that should be distributed among citizens must be based on a shared conception of the functions of the social institutions on which society is based. Identifying which goods education or public health should promote presupposes a conception of the function of hospitals, schools and universities, a conception that, in turn, is also connected to a view regarding what virtues are respected and honoured in those institutions or practices.
Sandel's neo-Aristotelian conception is a fruitful approach to responding to distributional problems that arise within research practices and their comparative evaluation. Following Aristotle, we distinguish between the virtue of general justice and justice as a particular virtue. Both are qualities of the character of researchers.
Let us dwell on the first. This indicates not only the ability to abide by the general rules that enable research practices to stand and flourish, but also the ability to identify behaviours that are appropriate to those rules in the presence of circumstances that either were not anticipated by those rules or are too complex to be addressed by simply applying those rules mechanically. The researcher who possesses a sense of general justice is able to identify appropriate actions that extend the content of the rules governing the proper functioning of a practice to new cases. This is possible because the virtuous researcher is able to go back and reflect on the constitutive ends of each research practice or institution, i.e., its internal and external goods, and based on that reflection indicate the appropriate actions to solve the concrete issue under discussion.
Justice as a special virtue has a different but equally important role. Still following Aristotle, we can identify this virtue with the ability to counteract our desire to receive more than our fair share. Respect for this virtue underlies the very structure of work in good research practice, which is organized among leaders, good researchers and honest researchers. This distinction is maintained by assuming that each individual accepts what is due to him or her in his or her role in research practice, and that is why this second virtue of justice is so important.

The inclusion of the virtue of justice, in the two senses indicated, among the virtues of researchers allows us to identify a new potential characteristic in good research evaluation. We have argued elsewhere (see Daraio & Vaccari, 2020) that good research evaluation is one that takes into account the virtues of researchers, understood as their ability to excel in the different spheres of activity that characterize the internal and external goods of research practices. As we have shown, these virtues must also include the virtue of justice. The content of this virtue depends on the ends, the attainment of which constitutes the good of the practice. This then means that a good evaluation is also a just evaluation, since an evaluation that takes into account the virtues of researchers cannot fail to take into account, i.e., measure, the degree to which the virtue of justice is present in the components of research practice.

Designing an ontology for the modelling of researchers' virtues
Formally, an ontology in Description Logics is a knowledge base. It is a pair O = ⟨TBox, ABox⟩, where TBox is the Terminological Box, which represents the intensional level of the knowledge, i.e. the conceptual model of the portion of reality of interest expressed in a formal way, and ABox is the Assertion Box, which represents the extensional level of the knowledge, i.e. the concrete model of that portion of reality expressed by means of assertions (instances).
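To make the distinction between the two levels concrete, the pair O = ⟨TBox, ABox⟩ can be sketched as two small data structures. The sketch below is only illustrative: the class names, the role `hasVirtue` and the individuals `alice` and `humility` are hypothetical, and the check it performs treats role signatures as closed-world constraints, which is a simplification with respect to the open-world semantics of Description Logics.

```python
from dataclasses import dataclass, field


@dataclass
class TBox:
    # Intensional level: concept names and role signatures (role -> (domain, range)).
    concepts: set = field(default_factory=set)
    roles: dict = field(default_factory=dict)


@dataclass
class ABox:
    # Extensional level: concept assertions (C, a) and role assertions (r, a, b).
    concept_assertions: set = field(default_factory=set)
    role_assertions: set = field(default_factory=set)


@dataclass
class Ontology:
    tbox: TBox
    abox: ABox

    def vocabulary_errors(self):
        """List assertions that use undeclared vocabulary or break a role signature."""
        errors = []
        for concept, individual in sorted(self.abox.concept_assertions):
            if concept not in self.tbox.concepts:
                errors.append(f"unknown concept {concept} for {individual}")
        for role, subj, obj in sorted(self.abox.role_assertions):
            if role not in self.tbox.roles:
                errors.append(f"unknown role {role}")
                continue
            domain, range_ = self.tbox.roles[role]
            if (domain, subj) not in self.abox.concept_assertions:
                errors.append(f"{subj} is not asserted as {domain}")
            if (range_, obj) not in self.abox.concept_assertions:
                errors.append(f"{obj} is not asserted as {range_}")
        return errors


# Hypothetical toy ontology: one role linking researchers to virtues.
tbox = TBox(concepts={"Researcher", "Virtue"},
            roles={"hasVirtue": ("Researcher", "Virtue")})
abox = ABox(concept_assertions={("Researcher", "alice"), ("Virtue", "humility")},
            role_assertions={("hasVirtue", "alice", "humility")})
ontology = Ontology(tbox, abox)
print(ontology.vocabulary_errors())
```

A full Description Logics reasoner would instead infer memberships from domain and range axioms; the closed-world check is used here only because it is easy to read.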
The use of ontology-based modelling in our context allows us to implement cognitive interviewing methodology to address the challenges outlined in the previous section.
Cognitive interviewing is a psychologically oriented method for empirically studying the ways in which individuals mentally process and respond to survey questionnaires. Cognitive interviews can be conducted for the general purpose of enhancing the understanding of how respondents carry out the task of answering survey questions. However, the technique is more commonly conducted in an applied sense, for the purpose of pre-testing questions and determining how they should be modified, prior to survey fielding, to make them more understandable or otherwise easier to answer. The notion that survey questions require thought on the part of respondents is not new and has long been a central premise of questionnaire design. However, cognitive interviewing formalizes this process and it has become an interdisciplinary field (for an overview, see Willis, 2004; Miller et al., 2014).
An ontology-based semantic modelling approach offers several advantages, including: (i) A conceptual specification of the domain of interest, in terms of knowledge structures; (ii) The mapping of such knowledge structures to concrete data (the answers of the questionnaire); (iii) Reasoning over the abstract representation of the domain prior to the data collection; (iv) A flexible conceptual system that can be easily updated; (v) An open conceptual system that can be used as a common language for the research community.
The languages for representing ontologies and taxonomic knowledge, and the reasoning algorithms that can make inferences on the basis of such knowledge, have been addressed by a large body of research in Artificial Intelligence and Knowledge Representation. Their formal characterisation is nowadays based on Description Logics (Baader et al., 2003), which provide a syntax for concept and role expressions and a formal semantics to interpret them in a set-theoretic framework. Concepts (i.e. classes) model sets of individuals and roles model binary relations. The representation of ontological knowledge is thus achieved by defining concepts and the properties (relations) that link them to other concepts in the domain of interest. The concepts are arranged in a hierarchical structure based on the subsumption relation (i.e., set containment). Along with the formal language, systems that allow us to model and use ontologies are accompanied by various forms of syntactic representation, including graphical models. Protégé (Gašević et al., 2009) is a standard tool that owes its success, among other things, to its capability to handle multiple syntactic representations, which allow the user to model the domain of interest using the most convenient notation while grounding it in a well-understood formal counterpart. Another key feature of Protégé is the decoupling of the representation from the reasoning tool that is adopted to make inferences. For these reasons, Protégé will be used for the development of the ontology for the assessment of researchers' virtues.
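To make the notions of TBox, ABox and subsumption more concrete, the following is a minimal sketch in plain Python, not the Protégé-based ontology itself. The concept names (e.g. `HumilityQuestion`), the individual `q_humility_1` and the class `KnowledgeBase` are hypothetical and chosen only for illustration. The sketch assumes a TBox limited to simple subsumption axioms between named concepts, which is far less expressive than a full Description Logic.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class KnowledgeBase:
    """Toy knowledge base O = <TBox, ABox> (illustrative assumption only)."""
    # TBox: concept -> direct super-concepts (axioms "C is subsumed by D")
    tbox: Dict[str, Set[str]] = field(default_factory=dict)
    # ABox: individual -> concepts it is asserted to belong to
    abox: Dict[str, Set[str]] = field(default_factory=dict)

    def add_subsumption(self, sub: str, sup: str) -> None:
        self.tbox.setdefault(sub, set()).add(sup)
        self.tbox.setdefault(sup, set())

    def assert_instance(self, individual: str, concept: str) -> None:
        self.abox.setdefault(individual, set()).add(concept)

    def ancestors(self, concept: str) -> Set[str]:
        """All concepts that subsume `concept` (transitive closure)."""
        seen: Set[str] = set()
        stack = [concept]
        while stack:
            current = stack.pop()
            for parent in self.tbox.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def is_instance_of(self, individual: str, concept: str) -> bool:
        """True if membership is asserted or follows from subsumption."""
        return any(
            asserted == concept or concept in self.ancestors(asserted)
            for asserted in self.abox.get(individual, ())
        )

    def undeclared_concepts(self) -> Set[str]:
        """Concepts used in the ABox that the TBox does not define."""
        used = set().union(*self.abox.values()) if self.abox else set()
        return used - set(self.tbox)


# Hypothetical fragment of a questionnaire ontology.
kb = KnowledgeBase()
kb.add_subsumption("HumilityQuestion", "CharacterVirtueQuestion")
kb.add_subsumption("CharacterVirtueQuestion", "VirtueQuestion")
kb.add_subsumption("EpistemicVirtueQuestion", "VirtueQuestion")
kb.assert_instance("q_humility_1", "HumilityQuestion")

print(kb.is_instance_of("q_humility_1", "VirtueQuestion"))
print(kb.undeclared_concepts())
```

The membership of `q_humility_1` in `VirtueQuestion` is never asserted; it is inferred from the concept hierarchy, which is the kind of reasoning that a dedicated reasoner performs at much larger scale and with richer axioms.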

Challenges and first steps
The assessment of researchers' virtues is challenged by several pitfalls and problems. The first issue is related to the question of the measurement of virtues. Virtue, it has been argued, seems to consist of a special sensitivity that escapes empirical measurement (Murdoch, 1998). Recently, however, some scholars have tried to undermine this pessimistic assumption, giving hope to those of us who seek to develop a model for evaluating research that also includes a component based on the virtues. Snow (2014) has recently launched a promising line of research based on the elements of psychology that characterize the virtues. In her perspective, virtue is composed of the following three elements: (1) Intelligence, which highlights the fact that virtue proceeds from a set of cognitive and emotional mental states that enable us to be sensitive to some morally relevant features of the situations in which, really or imaginatively, we find ourselves (Snow, 2014, pp. 4-5). See also Snow (); (2) Dispositionality, which refers to the fact that this state is a trait of the personality of the agent and not an occasional element of the agent's psychology; (3) Behaviour, i.e. virtue typically manifests itself in the actions and other behavioural responses of the virtuous person (Snow, 2010, pp. 4-5).
Snow argues that each of these characteristics of virtue can be measured and she outlines a model that consists of three measurement criteria. First, the agent's performance must be taken into account, i.e. the presence of the virtue in question must be verified from the agent's ability to repeatedly perform a given behavioural pattern in the different situations that constitute, so to speak, the field of action of a specific virtue. Secondly, Snow believes it is crucial to take into account the reports that agents make of their emotional and cognitive life during the performance of actions that they consider virtuous. To facilitate this task, Snow believes it is desirable that, on the model of some US colleges, research institutions make available to their participants special apps that can be downloaded on any electronic device, allowing them to collect the results of the self-observations of agents. Gathering the products of introspection, in addition to offering useful material to those who are called to assess the presence of virtues in others, also allows agents to monitor the health of their virtues and to measure any weakening of, or on the contrary any increase in, their readiness and effectiveness in responding to the pressures the world exerts on them. Finally, Snow argues that it is important to connect these data with those that impartial observers, in the form of external evaluators, can collect in the course of annual surveys covering both the outputs of the research and the way in which the researcher dwells in different spheres of social interaction with other participants in the practice.
A further problem to be addressed is which questions to introduce in the questionnaires. These must be sufficiently diversified to allow the evaluators to answer not only the blunt question about whether or not there is a virtue, but also to determine the quantum of it. Snow suggested four levels to be introduced in the questionnaires.
(I) The first verifies the presence in the agent of receptivity to the stimulus that typically activates virtue. (II) The second examines the agent's ability to recognize the virtue appropriate to the given circumstance. (III) The third verifies the more complex ability to generate a virtuous response. (IV) The fourth, finally, measures the ability of the agent to generate a virtuous crosscutting response to a plurality of situations.
Following the four levels of questions introduced by Snow, it is possible to measure on a scale from 0 (minimum) to 4 (maximum) the researcher's mastery of virtue. This is done over a spectrum ranging from (1) the ability to understand the importance of the problem to which virtue constitutes an answer, to (2) the ability to recognize the virtue in question, to (3) the ability to express virtue occasionally, to (4) the ability to manifest it in all situations that constitute the scope of that virtue.
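The 0 to 4 scale described above can be sketched as a small scoring function. This is only an illustration under one stated assumption, namely that the four levels are cumulative, so that a level counts only if all lower levels are also satisfied. The level labels and the function name `mastery_score` are hypothetical and are not taken from Snow's work.

```python
# Hypothetical labels for the four levels, ordered from (I) to (IV).
LEVELS = (
    "receptivity",              # (I) sensitivity to the triggering stimulus
    "recognition",              # (II) recognizing the appropriate virtue
    "occasional_expression",    # (III) generating a virtuous response
    "crosscutting_expression",  # (IV) responding virtuously across situations
)


def mastery_score(levels_met: dict) -> int:
    """Return a score from 0 to 4.

    Assumption: levels are cumulative, so counting stops at the first
    level that is not satisfied.
    """
    score = 0
    for level in LEVELS:
        if not levels_met.get(level, False):
            break
        score += 1
    return score


# A researcher who is receptive and recognizes the virtue,
# but does not yet express it, scores 2 under this assumption.
print(mastery_score({"receptivity": True, "recognition": True}))  # prints 2
```

Other aggregation rules are conceivable, for instance counting satisfied levels regardless of order. The cumulative rule is used here because the spectrum in the text is presented as an ordered progression.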
In addition to these problems, the traditional difficulties related to the development of questionnaires and the collection of the necessary information through questionnaires and interviews also arise (Hochschild, 2009; Kvale, 2008; Rabionet, 2011; Wolcott, 2008).
In this paper we suggest exploiting the advantages of ontology-based modelling, which we will illustrate after the next section, to overcome the aforementioned problems.

Semantic modelling of the virtues of researchers
The virtues of researchers are the starting point for the semantic modelling of the domain under examination. A first attempt to develop a questionnaire for the evaluation of virtues in research practices was made by Daraio and Vaccari (2020). Table 1 below, which further elaborates Table 1 of Daraio and Vaccari (2020, pp. 1067-1068), proposes some examples of questions to consider in evaluating the virtues of researchers.

Related works
While there is a rich literature and several approaches for extracting information through ontologies (for a review, see Wimalasuriya and Dou, 2010), the literature on ontological modelling in support of questionnaire development is scant. Notable exceptions include Sherimon et al. (2014), where an ontology-based model for gathering patient medical history based on a dynamic questionnaire ontology is developed. The model is implemented and explained for the domain of diabetes by using Protégé. Another interesting contribution is Borodin and Zavyalova (2016), in which the authors focused on the problem of the semantic representation of questionnaires. They constructed a generic ontological model of a questionnaire, which makes it possible to describe the structure of questions, including complex questions with sets of answers of different kinds, and the order of questions, including skipping and branching. Surveys, in fact, may be conducted to gather information through a printed questionnaire, over the telephone, by mail, in person or on the web, and describing the structure of survey questionnaires and feedback in ontological terms provides an opportunity not only to structure the survey data but also to analyse the responses.
Our contribution adds to the limited existing literature, showing the potential of ontology-based modelling for challenging and intriguing topics such as the "good" evaluation of research practices, which involves the development of a questionnaire for the assessment of the motivations and the stable character traits (or virtues) of researchers.

Table 1 Examples of questions to include in the questionnaire on researchers' virtues

Accuracy: Do you thoroughly collect all the pieces of information that constitute the body of knowledge around which the practice revolves? Do you evaluate this disposition as instrumental or intrinsic? Do you have a tendency to share data, results, methods, ideas, techniques, and tools used in your research practice?
Sincerity, Honesty: Do you think there are circumstances in which your colleagues can be manipulated? Are you inclined to admit publicly when you make mistakes?
Creativity: Are you able to explore and follow your own line of research? Do you have the tendency to question your own ordinary experience and look with suspicion at what is the result of habit?

Virtues of character useful to oneself
Humility: Are you inclined to recognize that other researchers (participants in the practice) know rules and know how to apply them better than you do? Are you willing to learn from these people and accept their criticism?
Pride: Do you have the ability to feel fulfilment for academic success through demonstrating competence according to social standards and to draw strength from your achievements? Do you think you have a stable awareness of your own value that is not shaken by the successes of others? Do you have the capacity to enjoy and congratulate other people's accomplishments?
Patience: Are you able to curb the rush to hastily complete a search to achieve the gratification that comes with a prima facie positive result? Are you willing to be guided by a cautious scepticism that prompts you to control accurately the different steps of your investigation?
Prudence: What do you think about people who tend to restrain actions, inclinations, and impulses that are likely to upset others? Do you tend to respect the fundamentals of the research practice within which you work?
Resilience: Would you describe yourself as a person who leaves behind failures (rejected papers, unfunded projects, etc.) and focuses on future projects?

Virtues of character useful to others
Courage: Are you willing to risk damage or danger to yourself when individuals, values, goals that are crucial to your research practice are at stake? Do you have a propensity to apply for highly competitive grants?
Empathy, Benevolence: How do you feel about people who are sensitive to the suffering of their colleagues caused by failures or exclusions? How do you feel about people who seek feedback to improve interactions with others? Do you think that preservation and promotion of the welfare of people with whom you are in frequent personal contact is important? How would you describe collegial engagement?
Integrity: Do you think that holding beliefs that are consistent with actions always has a positive value?

The content of this table is an elaboration of the content of Table 1 by Daraio and Vaccari (2020, pp. 1067-1068).

Our proposal in practice
The application potential of our approach is considerable. Given the recent calls by various stakeholders and policy makers at an international level to rethink research evaluation systems, our proposal is timely and could help to design and implement a new evaluation framework.
The Science Europe "Position statement and recommendations on research assessment processes" (Science Europe, 2020) revealed the complexity of research assessment processes and the variety of methods applied by research organizations. Science Europe (2020, p. 24) concluded with an invitation to "the exploration of novel approaches to guide changes to the research system". Other calls for changing the current practices of research assessment are reported in Saenen et al. (2021) on "Reimagining Academic Career Assessment: Stories of Innovation and Change" and Grant (2021) on "Academic Incentives and Research Impact: Developing Reward and Recognition Systems to Better People's Lives". The European Commission in a recent scoping paper entitled "Towards a reform of the research assessment system: scoping report" proposed a "European agreement that would be signed by individual research funding organizations, research performing organizations and national/regional assessment authorities and agencies, as well as by their associations, all willing to reform the current research assessment system". In this scoping paper, the European Commission stated: "the current research assessment system often uses inappropriate and narrow methods to assess the quality, performance and impact of research and researchers. Notably, the quantity of publications in journals with high Journal Impact Factor and citations are currently the dominant proxies for quality, performance and impact. Many research funding and performing organisations are already taking steps to reform and improve the way they assess research and researchers, but progress remains slow, uneven and fragmented across Europe. 
[…] The aim is for research and researchers to be evaluated based on their intrinsic merits and performance rather than on the number of publications and where these are published, promoting qualitative judgement with peer-review, supported by a more responsible use of quantitative indicators. The way in which the system is reformed should be appropriate for each type of assessment: research projects, researchers, research units, and research institutions. A reformed system should also be sufficiently flexible to accommodate the diversity of countries, disciplines, research cultures, research maturity levels, the specific missions of institutions, and career paths (European Commission, 2021, p. 3)." Thus, our proposal comes at a time of great transition, where efforts are being made to reform current research evaluation practices.
Our approach can be used as a complement to a standard assessment based on indicators.
There are several evaluation processes, including (i) assessment of the career advancement of individual scholars, (ii) evaluation of research proposals for funding allocation, and (iii) assessment of the performance of research teams, institutes or departments and universities or research centres.
When the evaluation concerns academic researchers, other academic and teaching activities carried out by the researchers should also be considered.
Depending on the type of evaluation being considered (career advancement, project evaluation or institution/centre/department or group evaluation) the balance between traditional bibliometric indicators and researchers' character traits may be different.
For example, in the selection of young researchers, at the beginning of their careers, the evaluation of character traits (or virtues, which are qualities or robust motivations of the individuals that lead to action) may be preponderant, since the results of research carried out so far may be unavailable or unrepresentative, but an evaluation of the research potential of the subjects must be carried out. When evaluating research projects, there can be a judicious balance between indicators and character traits. In cases where institutions, departments, centres, or research groups are assessed, indicators could be used as "minimum thresholds" (see Moed, 2020), after which the focus could be on assessing the character or virtue traits of researchers belonging to the institutions, departments, centres, or research groups.
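The use of indicators as "minimum thresholds", followed by a focus on character traits, can be sketched as a two-stage procedure. The sketch below is an illustration only: the indicator names, threshold values and virtue scores are hypothetical, and the function `evaluate_unit` deliberately returns the virtue profile without collapsing it into a single number, since the appropriate balance is meant to depend on the type of evaluation.

```python
def evaluate_unit(indicators: dict, thresholds: dict, virtue_scores: dict) -> dict:
    """Two-stage evaluation sketch (hypothetical procedure).

    Stage 1: check that every indicator reaches its minimum threshold.
    Stage 2: if so, report the virtue profile for qualitative judgement.
    """
    unmet = [
        name
        for name, minimum in thresholds.items()
        if indicators.get(name, 0) < minimum
    ]
    if unmet:
        return {"passed_thresholds": False, "unmet": unmet, "virtue_profile": None}
    return {"passed_thresholds": True, "unmet": [], "virtue_profile": dict(virtue_scores)}


# Hypothetical research group: values are invented for illustration.
result = evaluate_unit(
    indicators={"publications": 12, "citations": 150},
    thresholds={"publications": 10, "citations": 100},
    virtue_scores={"accuracy": 3, "humility": 4, "courage": 2},
)
print(result["passed_thresholds"], result["virtue_profile"])
```

For the selection of early-career researchers, where the text suggests that character traits may be preponderant, the threshold dictionary could simply be left empty so that the first stage imposes no constraint.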
In evaluating individual careers, Gläser and Laudel (2015) proposed to investigate the three careers of a scholar, including the studies carried out, the content of the research in connection with the scientific community of reference, and the institutional career.
Our proposal to consider the character traits of researchers, and to focus on research practices in which goods internal and external to the practices are realized, allows us to orient the evaluation towards the constitutive elements of good research practices rather than focusing exclusively on the output of the research activity.
The evaluation of character traits enables the identification of the salient characteristics of the subjects with respect to their role in research practice. By distinguishing between leader, good and honest researcher, and considering their respective virtues, it is possible to discriminate between good research practices and bad research practices.
When we consider character traits, we make a very general (so-called "thin") assessment that refers to the characteristics of the human being as such, regardless of the specific context in which the researchers find themselves (including freedom to decide on research topics, sufficient funding, stable contract etc.).
The proposed list of virtues to be used in assessing the character of researchers makes it possible to formulate judgements that, while possessing an unavoidable component of subjectivity, aspire to be intersubjective. On the one hand, judgements cannot but be shaped by the cognitive and cultural biases of the evaluators interpreting the virtues. On the other hand, the list is a closed one and this delimits the evaluators' discretionary power, forcing them to use a shared yardstick that is intelligible to others. It should also be noted that, at a certain level of analysis, the virtues included in the list can be interpreted in different ways that take into account cultural factors of the country in which the research practices take place. This means, for example, that the threshold of resilience required for this to become a virtue may differ from country to country. In a patriarchal society, where women are less used to applying for research grants, the threshold for attributing the virtue of resilience is lower than that required of young female researchers in a prestigious American university.
Our framework allows a more structured evaluation of the research that overcomes the limits of a purely quantitative evaluation focused on the outputs (publications and citations) of the research activity.
The good and fair evaluation that we propose is aimed at evaluating the subjects with respect to the function they perform within the practice (distinguishing between leader, good and honest researcher), allowing an evaluation of the subjects for what they are and not just for what they do.
The distinction between epistemic virtues and virtues of character proposed in our framework offers a richer "rationale" for discussing the issue of project evaluation, whether to fund projects or people. Ioannidis (2011) provocatively concluded his paper by saying: "The aim of science is to expand our knowledge base, which, eventually, yields useful applications. This is what scientists entered their profession to do, so requiring them to spend most of their time applying for grants is irrational. It's time to seriously consider another approach" (Ioannidis, 2011, p. 531). Table 2 reports a summary of different types of research funding systems proposed by Ioannidis (2011) in which we have added the last two columns to describe how our approach can be useful in modernizing research funding systems and to report some clarifying examples.

Concluding remarks
In this contribution we have elaborated a model of good research evaluation. The model argues that good evaluation must take into account the three characteristics that constitute good research practice. These are the virtues of the researchers, the internal and external goods of the practice and the division of the components of the research practice into leader, and good and honest researchers. In this paper we also argue that good evaluation of a research practice is also just evaluation. This depends on the fact that good evaluation will also have to take into account the virtues of justice of the researchers. We have argued that the virtue of justice is not reducible to fairness and includes two capacities, that of being able to creatively and extensively interpret the rules that underlie a research practice and the ability not to ask for a greater share of goods than we are entitled to by the role we play in the practice. Based on the conceptual tools used by normative ethics, in particular from the perspective known as virtue ethics, we have started to develop a questionnaire capable of revealing the presence and quantity of virtues in individual researchers.

Table 2 Different funding systems and our approach. Adapted from Ioannidis (2011, p. 530)
Does not eliminate project proposals. Is vulnerable to favouritism. Holds potential for exaggerated promises and claims.
Assess scholars considering their epistemic and character virtues.
Fund scholars with highest intellectual virtues of creativity.
*Scientific citizenship practices include data sharing, high-quality methods, careful study design and meticulous reporting of scientific work, openness to collaboration, nonselective publication of "negative" findings, balanced discussion of limitations in articles and high-quality contributions to peer-review, mentoring or database curation.
We believe that the use of ontology-based modelling in this context might enable us to further implement a cognitive interviewing methodology which may help us to address the many challenges of this research field. Our model takes up the very recent invitation of the European Commission to reform existing assessment procedures based on bibliometric indicators. On the basis of an awareness of the crucial role that individual character plays in research practices, we are proposing to balance or, in extreme cases, replace the bibliometric-based research assessment. The extent of this integration cannot be decided in the abstract, but will be established on a case-by-case basis depending on which area of research is being evaluated, i.e. whether it is the researchers' career, a research project or an entire research institution.
Acknowledgements The financial support of the Sapienza University of Rome (through the Sapienza Awards No. PH11715C8239C105 and No. RM11916B8853C925) is gratefully acknowledged. This paper is a substantially extended version of the paper by Daraio and Vaccari (2021) presented at the 18th International Conference on Scientometrics & Informetrics (ISSI2021), held virtually in Leuven on 12-15 July 2021. In this extended version, we added the discussion on how evaluation should be, the explanation that a good evaluation implies a just evaluation that goes beyond the concept of fairness and includes a broader sense of justice, and extended the part on justice in the implementation section about the questionnaire. Finally, we express our gratitude to two anonymous reviewers for providing stimulating comments and suggestions that allowed us to greatly improve the quality of our paper.
Funding Open access funding provided by Alma Mater Studiorum - Università di Bologna within the CRUI-CARE Agreement.

Conflict of interest The first author (Cinzia Daraio) is a member of the Board of Scientometrics.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.