Heuristics as conceptual lens for understanding and studying the usage of bibliometrics in research evaluation

While bibliometrics are widely used for research evaluation purposes, a common theoretical framework for conceptually understanding, empirically studying, and effectively teaching their usage is lacking. In this paper we develop such a framework: the fast-and-frugal heuristics research program, proposed originally in the context of the cognitive and decision sciences, lends itself particularly well to understanding and investigating the usage of bibliometrics in research evaluations. Such evaluations represent judgments under uncertainty in which typically neither all possible outcomes, nor their consequences and probabilities, are known. In such situations of incomplete information, heuristics are candidate descriptive and prescriptive models of human behavior. Heuristics are simple strategies that, by exploiting the structure of environments, can help people make smart decisions. Relying on heuristics does not mean trading off accuracy against effort: while reducing complexity, heuristics can yield better decisions than more information-greedy procedures in many decision environments. The prescriptive power of heuristics is well documented in a large literature cutting across medicine, crime, business, sports, and other domains. We outline the fast-and-frugal research program, provide examples of past empirical work on heuristics outside the field of bibliometrics, explain why heuristics may be especially suitable for studying the usage of bibliometrics, and propose a corresponding conceptual framework.


Introduction
A man suffering from serious chest pain is rushed to the hospital. The medical personnel have to make a decision immediately: should the man be assigned to the coronary care unit or to a regular nursing bed for monitoring? The responsible doctor sends the patient to the coronary care unit. How do physicians arrive at such decisions? There are at least three ways. First, a doctor might not make a decision in the first place, but simply follow the rule of thumb to play it safe by sending all patients directly to the coronary care unit. The downside of this strategy: many patients who do not warrant special attention will end up in that unit, eventually overcrowding it, decreasing the quality of care, and increasing costs. Second, the doctor might have relied on the Heart Disease Predictive Instrument (Pozen, D'Agostino, Selker, Sytkowski, & Hood, 1984). This complex tool relies on a logistic regression to estimate the probability that a patient should be assigned to the coronary care unit. Third, the doctor might have used a simple decision tree, for example the one shown in Figure 1. This tree consists of three questions. (1) If the electrocardiogram reveals a change in the so-called ST-segment, the patient is immediately sent to the coronary care unit, without any other information being considered.
(2) If there is no such change and chest pain is not the chief complaint, the patient is assigned to a regular nursing bed.
(3) If chest pain is the chief complaint and any one of five other factors is present, the patient is sent to the coronary care unit; else the patient ends up in a regular nursing bed. As it turns out, basing a decision on those three questions can lead to better outcomes than following the more complex, regression-based Heart Disease Predictive Instrument. The decision tree depicted in Figure 1 is a fast-and-frugal heuristic (Gigerenzer, Todd, & ABC Research Group, 1999). The word "heuristic" has Greek roots and means "serving to find out or discover" (Gigerenzer & Brighton, 2009, p. 108). A fast-and-frugal heuristic is a decision strategy that bases decisions on little information (hence dubbed frugal) and that, in doing so, allows for quick decisions (hence fast).

Figure 1.
A simple heuristic for deciding whether a patient should be assigned to the coronary care unit or to a regular nursing bed (NTG: nitroglycerin; MI: myocardial infarction; T: T-waves with peaking or inversion). Source: Marewski and Gigerenzer (2012, p. 78), who adapted it from Gigerenzer (2007). The figure itself is based on Green and Mehr (1997).
Basing decisions on little information can also help in making accurate decisions. In many real-world situations, relevant information comes mixed with irrelevant bits, including mere random noise. In addition to setting irrelevant information aside, fast-and-frugal heuristics can exploit the statistical structure of decision making environments, such as the ways in which information is distributed, or how predictor variables correlate with each other. As studies in medicine, business, crime, sports, voting, and other domains have shown, fast-and-frugal heuristics can yield accurate decisions in classification, forecasting, selection, and other tasks (see Gigerenzer & Gaissmaier, 2011; Goldstein & Gigerenzer, 2009, for overviews). In addition, heuristics can capture how humans actually make decisions; that is, they can be good descriptive models of behavior.
This paper aims to propose a conceptual framework, the fast-and-frugal heuristics research program (e.g., Gigerenzer et al., 1999), for studying the application of bibliometrics to research evaluation. Such a framework can help uncover which bibliometrics-based heuristics (BBHs) are used, when they should (and should not) be used, and how corresponding heuristics can be investigated. This paper is not intended to advocate bibliometrics as a means of research evaluation; the use of bibliometrics for evaluating research is highly debatable (e.g., Gingras, 2016; Osterloh & Frey, 2014). Rather, we outline the fast-and-frugal research program, provide examples of past research on heuristics outside the field of bibliometrics, explain why heuristics might be especially suitable for studying the usage of bibliometrics, and outline a corresponding conceptual framework.
2 How the study of heuristics can inform the study of bibliometrics

Our central thesis is as follows. While many real-world environments appear to afford making good decisions based on little information, there are still domains in which the dominant line of thought prescribes searching for complex solutions: when decision makers in science (scientists or managers) evaluate units (e.g., research groups or institutions), complex solutions are generally preferred (e.g., because scientific quality is conceived of as a multi-dimensional phenomenon). In research evaluation, complexity can come in at least two different disguises. First, complexity has arrived at the evaluation stage when many indicators are considered, be they separately presented in long lists or combined into a (weighted) composite indicator. One example is U-Multirank, an international university ranking developed recently (van Vught & Ziegele, 2012). Alongside the ranking, an indicators book was published explaining the many included indicators on more than 100 pages (U-Multirank, 2017). Another example is the Altmetric Attention Score for measuring public attention to research, which is based on 15 weighted alternative metrics (altmetrics), such as Twitter and Facebook counts. Second, complexity can come in the disguise of evaluation procedures that include numerous internal and external reviews. Those reviews serve to assess institutions and scientists on multiple evaluation criteria, all of which are then often integrated into a comprehensive joint score or assessment. For example, many universities have established evaluation procedures that start with an internal evaluation in which a research unit (e.g., a university department or faculty) evaluates itself, resulting in a self-evaluation report including comprehensive statistics (e.g., Rothenfluh & Daniel, 2009). What follows is an external evaluation in which a group of well-known experts visits the department for several days. The visit typically finishes with a report on the unit's strengths and weaknesses. The experts' assessments are based on their own impressions and on the self-evaluation report (Bornmann, Mittag, & Daniel, 2006). These procedures absorb the unit under evaluation and a group of reputable peers, leading to longer periods with little active research.
Bibliometrics is a method for assessing research activities that is frequently employed in research evaluation procedures (besides peer review). With bibliometrics, research activities are evaluated by reducing the information considered to publication and citation numbers. Despite its frequent use, the method is often criticized. According to Macilwain (2013), a fundamental defect of quantitative research-assessment tools, such as bibliometrics, is that they are "largely built on sand" (p. 255), which means that they cannot directly measure the quality of research and instead use "weak surrogates, such as the citation indices of individuals" (p. 255). An overview of other critical points on bibliometrics (especially citation analysis) can be found in the paper "The mismeasure of science: Citation analysis" by MacRoberts and MacRoberts (2017). Critics mention, for example with respect to citation counts, that publications important for the advancement of science are not always highly cited, or that researchers form citation circles whose members cite each other out of courtesy (Gigerenzer & Marewski, 2015).
Many publications that critically target bibliometrics focus either on the method as a whole or on flawed bibliometric indicators (e.g., the h index; Hirsch, 2005). We believe that many critical points are justified and should be considered in the use of and research on bibliometrics, but these points are not taken up in coordinated research activities targeting what might constitute a valid reliance on bibliometrics in research evaluation. Bibliometrics lacks a common, foundational research program that could function as an aegis for such activities.
We think that an important condition for a common program is the existence of a conceptual framework (see also Waltman & van Eck, 2016). Precisely describing the environments in which a heuristic works, in turn, is one of the key research goals of the fast-and-frugal heuristics framework. 4 Second, the fast-and-frugal heuristics framework prescribes not only studying which heuristics will work well in which environments and when they will fail; this framework also asks when and how people will use which heuristic. Hence, descriptive work on bibliometrics would investigate when and how scientists, administrators, and others rely on the field-weighted citation impact and other indicators. In this vein, models of heuristics focus on the cognitive processes and not only on the outcomes (i.e., results) of problem solving. They describe (1) algorithmic rules, (2) skills using these rules, and (3) the categories of problems which can be solved (i.e., the environments in which the heuristics can be successfully applied).
For example, like many other heuristics, fast-and-frugal trees such as the one shown in Figure 1 can be cast in terms of three sets of algorithmic rules: a search rule that specifies what information (e.g., predictor variables) is searched for, a stopping rule that prescribes when information search ends, and a decision rule that determines how the acquired information is combined to classify objects (e.g., patients). In their general form, the rules specifying fast-and-frugal trees read as follows (a sketch in code follows the three rules below):
Search rule: Look up predictor variables in the order of their importance.
Stopping rule: Stop information search once one predictor variable is found that allows classifying objects.
Decision rule: Classify objects according to this predictor variable.
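To make these rules concrete, the following is a minimal sketch, in Python, of the coronary care tree described in the introduction and shown in Figure 1. The function name, the boolean encoding of the predictors, and the example call are our own illustrative assumptions; the original instrument is, of course, richer than this sketch.

```python
def coronary_care_tree(st_segment_change, chest_pain_chief_complaint, other_factor_present):
    """Fast-and-frugal tree for the coronary care decision (sketch of Figure 1).

    st_segment_change: change in the ST-segment of the electrocardiogram (bool)
    chest_pain_chief_complaint: chest pain is the patient's chief complaint (bool)
    other_factor_present: any one of the five additional factors (e.g., NTG, MI, T) present (bool)
    """
    # Search rule: look up the predictors in the order of their importance.
    # Stopping and decision rule: the first predictor that permits a classification decides.
    if st_segment_change:
        return "coronary care unit"
    if not chest_pain_chief_complaint:
        return "regular nursing bed"
    if other_factor_present:
        return "coronary care unit"
    return "regular nursing bed"


# Hypothetical patient: no ST-segment change, chest pain as chief complaint, one other factor present.
print(coronary_care_tree(False, True, True))  # -> "coronary care unit"
```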
Skills necessary for using the fast-and-frugal tree shown in Figure 1 include expertise in medical diagnosis, such as knowing how to assess changes in the ST-segment and T-waves, 4 Note that this ecological approach to studying decisions is one of the several lines of division between the fastand-frugal heuristics and alternative approaches to studying decision making. The literature on heuristics-andbiases (e.g., Kahnemann, Slovic, & Tversky, 1982), for instance, is often taken to suggest that heuristics are error-prone, biased mental shortcuts that rational decision makers ought to avoid (see also Lopes, 1992; see e.g., Marewski, Gaissmaier, & Gigerenzer, 2010a;Marewski, Gaissmaier, & Gigerenzer, 2010b, for a discussion).
respectively. The clinic environment in which this tree will yield accurate decisions must be specified in terms of the patient population and its attributes, because the same classifier (e.g., a fast-and-frugal tree, an HIV test, or a bibliometric indicator used to categorize scientists) might yield different classifications, for instance, depending on the prevalence of a condition of interest within a population.
Third, several studies on fast-and-frugal heuristics have shown that their predictive accuracy can be similar to or higher than that of weighted-additive and other more information-greedy models (for overviews, see Goldstein & Gigerenzer, 2009; Hafenbrädl, Waeger, Marewski, & Gigerenzer, 2016). Let us mention just a few heuristics. Simply weighting information equally, which is what the so-called tallying heuristic does, can yield inferences as accurate as, and sometimes even more accurate than, those of multiple regression, which weights information optimally (e.g., Czerlinski, Gigerenzer, & Goldstein, 1999; Dawes & Corrigan, 1974; Einhorn, 1975). The take-the-best heuristic (Gigerenzer & Goldstein, 1996), which similarly to fast-and-frugal trees bases decisions on just one predictor variable, has been found to outperform multiple regression across 20 different environments, covering psychology, sociology, demography, economics, health, transportation, biology, and environmental science (e.g., Brighton, 2006b; Czerlinski et al., 1999; Gigerenzer & Brighton, 2009). The recognition heuristic (e.g., Goldstein & Gigerenzer, 2002), which relies on name recognition as the only variable to make inferences and forecasts, can predict the outcomes of Wimbledon tennis matches better than the ATP rankings and better than the seedings of Wimbledon experts (e.g., Scheibehenne & Bröder, 2007; Serwe & Frings, 2006). By exploiting people's systematic ignorance (i.e., their systematic lack of recognition), that heuristic can, moreover, even help predict the outcomes of political elections (e.g., Gaissmaier & Marewski, 2011).
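To illustrate how computationally simple such strategies are, the following sketch implements the tallying heuristic mentioned above for choosing between two options; the binary cue coding (1 = the cue speaks for the option) and the example values are our own illustrative assumptions.

```python
def tally(cues_a, cues_b):
    """Tallying: weight all cues equally and choose the option with the higher count.

    cues_a, cues_b: lists of 0/1 cue values (1 = the cue speaks for the option).
    Returns 'A', 'B', or 'tie'.
    """
    score_a, score_b = sum(cues_a), sum(cues_b)
    if score_a > score_b:
        return "A"
    if score_b > score_a:
        return "B"
    return "tie"


# Hypothetical example with three binary cues per option.
print(tally([1, 1, 0], [1, 0, 0]))  # -> "A"
```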
Importantly, as can be seen from the examples listed above, prescriptive research on fast-and-frugal heuristics examines the performance of heuristics not in isolation, but asks how well a given heuristic performs in comparison to competing approaches (Gigerenzer & Brighton, 2009; Marewski, Schooler, & Gigerenzer, 2010).

Fourth, an adaptation of the concept of heuristics to bibliometrics seems particularly reasonable, because bibliometrics are commonly used to assess a highly complex phenomenon, and one that comes with considerable uncertainty. When will potentially different aspects of the scientific merit of an article, a unit, or a scientist reveal themselves: in a year from now, in five years, or perhaps never? How will those aspects reveal themselves: will a finding lead to a revolution in technology and society (e.g., like the invention of the computer), or will it simply lead to new insights (e.g., about the workings of human memory)? By definition, the future is knowable only in hindsight. But even the present and past can be uncertain, namely when the decision maker does not know the 'true state' of the world and has to infer that state. 5 Heuristics are tools for dealing with uncertainty. Yet, surprisingly, only a few references to heuristics can be found in the bibliometrics literature, and those do not draw on fast-and-frugal heuristics as tools for managing uncertainty.
For example, Saad (2006), Prathap (2014), and Moreira, Zeng, and Amaral (2015) conceive of the h index (and its variants, see Bornmann, Mutz, Hug, & Daniel, 2011) as a heuristic tool which reduces quantity and quality information to a single value. Heinze (2012, 2013) introduces a heuristic tool (actually a classification scheme) "that singles out creative research accomplishments from other contributions in science" (Heinze, 2012, p. 583).

5 In this paper, we use the terms "infer" or "inference" to refer to all judgments where the 'true state' of the world is unknown to the decision maker at the moment of making the inference, be those inferences about the present or past, or inferences about the future (= forecasts). That is, forecasts and predictions are also inferences. Classifying research into different categories (e.g., 'high quality' versus 'medium' versus 'low quality') warrants inferences, too, namely about quality.
Beyond those isolated examples, to the best of our knowledge, people's use of heuristics in the area of evaluative bibliometrics has not been researched. It is unclear whether heuristics are applied and, if so, in what form. If the use of heuristics could be identified, their frequency of use could be measured in given task environments, and empirical investigations into the validity of judgments based on them could be undertaken. Just as in studies on heuristics in other areas, it could become the goal of descriptive research on bibliometrics to formulate and evaluate highly precise formal (i.e., computationally or mathematically specified) models that could be submitted to fine-grained mathematical analyses, powerful computer simulations, and strong experimental tests: "Formal models of heuristics allow asking at least two questions: whether they can describe decisions, and whether they can prescribe how to make better decisions than, say, a complex statistical method" (Gigerenzer & Gaissmaier, 2011, p. 459).
Fifth, humans are not omniscient and omnipotent. They cannot foresee the future, and their information-processing capacities are limited. Defying the rational economic theories of his time, Nobel Laureate Herbert Simon coined the term bounded rationality to refer to those limits. Simon also stressed that human cognitive capacities are nevertheless adapted to their environment. The fast-and-frugal heuristics research program has taken up those lines of thought (e.g., Simon, 1955, 1956, 1990). Heuristics are tools for managing and reducing complexity, for instance, by allowing decision makers to focus on a few relevant attributes or weights, thereby simplifying decision tasks. In conceiving of heuristics as a conceptual lens for studying bibliometrics in research evaluation, we suggest that those heuristics might ease the complex process of assessing scientific quality and deciding on units (e.g., on scientists, research groups, or countries; Bornmann, 2015).
Specifically, research on BBHs could ask (i) to what extent (i.e., compared to other procedures) effort reductions and time savings occur, as well as (ii) when that is the case, that is, in what environments. As we will discuss in more detail below, the rationality of relying on heuristics depends not only on the environment, but also on the goals of the actor. Hence, fast-and-frugal heuristics are not just systematically evaluated in terms of one single performance criterion (e.g., their accuracy for making inferences), but in terms of those criteria that match the task environment and goals at hand. For example, in some situations reducing effort, saving time, and making accurate inferences might be the goals; in others (e.g., in strategic interactions), being predictable and fair vis-à-vis cooperation partners might be all a decision maker cares about.
Sixth, ever since citations have been used as measures of quality or research impact, scientists have tried to formulate theories of citations (Bornmann & Daniel, 2008; Moed, 2017). Those theories do not focus on the evaluative use of citations, but on the process of citing: why do authors (researchers) cite certain papers and not others? Two prominent citation theories have been introduced hitherto (see overviews in Cronin, 1984; Davis, 2009; Moed, 2005; Nicolaisen, 2007). The first is Merton's (1973) normative citation theory: publications are cited because they have cognitively influenced the author of the citing publication. Merton's theory provides the theoretical basis for using citations in research evaluation, since citations indicate recognition by peers and the allocation of achievements to publishing authors. More citations mean more recognition and more attributed achievements. Publishing researchers are generally motivated to cite other researchers because they believe it is fair and just to give credit to successfully publishing (researching) authors.
The normative citation theory has been heavily criticized since its introduction. The most important criticism is that the theory does not explain all citation decisions (or, some argue, no real decisions at all). Other factors besides cognitive influence and peer recognition play a significant role, as many studies indicate (see the overview in Tahamtan, Safipour Afshar, & Ahamdzadeh, 2016). One of the earliest studies in this respect was conducted by Gilbert (1977), who interprets citations as tools for persuasion. Authors select for citing those publications that were published by reputable researchers in their fields. Thus, it is not the scientific content of a paper that leads to citation decisions, but the anticipated influence of the cited author on the reader. These accompanying citations are intended to confirm the claims of the citing author. The reputation of the cited author and related factors influencing citation decisions are mostly regarded as consistent with the social-constructivist theory of citing (see the overview in Cronin, 2005), which views citations as rhetorical devices. Such different theories of citation behavior can be extended in three interrelated ways when viewing citation decisions through the theoretical lens of the fast-and-frugal heuristics framework.
(i) For one, heuristics may be candidate models to explain citation decisions: what heuristics might researchers use (if any) to decide whom to cite, and in which environments do they use each heuristic? The fast-and-frugal heuristics framework assumes people to come equipped with a repertoire of heuristics.

Classic examples of models of rational decision making are the maximization of (subjective) expected utility and models of Bayesian inference (e.g., Arrow, 1966; Edwards, 1954; von Neumann & Morgenstern, 1947). The lemma of those classics is that rational (= optimal) decision making warrants full information.
For instance, classic decision analysis suggests people (e.g., investors) can and should consider all options at hand, all possible consequences that come with deciding for an option, and the consequences' probabilities of occurrence. They should then compute the expected (e.g., monetary) value of each option to identify the optimal one (i.e., the one that maximizes the monetary gain). The Heart Disease Predictive Instrument mentioned above is also a representative of optimization: like many statistical methods (e.g., regressions) used in science, this tool aims, in integrating information, to estimate optimal coefficients (e.g., beta weights that maximize fit by minimizing error).
When applying optimization methods to judgment problems, decision makers actually make implicit assumptions about the environment in which they decide. One way to think about those environments is Knight's (e.g., 1921) notion of risk: in worlds of risk, the events' probabilities of occurring are known or can be reliably estimated. In those well-defined and predictable situations, optimization is not only feasible, but a sensible approach to take.
Perhaps the most intuitive examples of such environments are games of chance (e.g., card games, roulette, and lotteries). All outcomes of such games (e.g., throwing any number from 1 to 6 with a die), their consequences (e.g., win $200 with a 6, lose $41 with any number from 1 to 5), and probabilities (e.g., 1/6) can be specified.
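For such a fully specified gamble, the optimizing calculation is straightforward: the expected value is the probability-weighted sum of the payoffs. The sketch below computes it for the dice gamble described above; the payoffs and probabilities are those of the example.

```python
# Expected value of the dice gamble: all outcomes, consequences, and
# probabilities are known, so optimization is feasible.
outcomes = [
    (1 / 6, 200.0),   # throw a 6: win $200
    (5 / 6, -41.0),   # throw 1 to 5: lose $41
]
expected_value = sum(p * payoff for p, payoff in outcomes)
print(round(expected_value, 2))  # -> -0.83, so a value maximizer would decline the gamble
```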
Small worlds of risk (Savage, 1954), situations where all the outcomes, consequences, and probabilities of an action are knowable, might be conceived of as one end of the spectrum. Those at the other end are large (or uncertain) worlds (Binmore, 2007, 2009; Knight, 1921; Savage, 1954). 6 Following Gigerenzer (e.g., Hafenbrädl et al., 2016), in large, uncertain worlds, decision makers do not know all their options. They also do not know all consequences of the options at hand. Likewise, the probabilities that any of those consequences will occur may remain unknown, and/or there may not be enough information to reliably estimate those probabilities. As Hafenbrädl et al. (2016) point out, "in such situations, surprises can occur, leaving the premises of a rational (e.g., Bayesian) decision theory unfulfilled. Not only does uncertainty lead to optimization becoming unfeasible or inappropriate, but it also invalidates optimization as a gold standard to which other decision processes are compared" (p. 217). 7

6 Savage (1954) distinguished between small and large worlds, and Knight (e.g., 1921) similarly between risk and uncertainty (see Binmore, 2007). In this article, we use those terms interchangeably.

7 Indeed, there is a large literature (and a good amount of debate) about the scope of classic rational models (see Gigerenzer & Marewski, 2015). For instance, some authors (e.g., Lindley, 1983) do believe that Bayesian models can be applied to all situations of uncertainty, including singular events. After all, one can specify subjective probabilities that aliens will land on earth or that Michael Jackson visited the moon; there are no limits to coming up with subjective priors and using them in Bayes' rule. Other authors have more reservations. For example, to Savage (1954), the main proponent of modern Bayesian decision theory, an unlimited application of that theory seemed "utterly ridiculous" (p. 16). In his view, "[i]t is even utterly beyond our power to plan a picnic", and that "even when the world of states and the set of available acts to be envisaged are artificially reduced to the narrowest reasonable limits" (p. 16): those who plan simply cannot know all possible consequences. Probably the least controversial area of application of Bayesian models are those situations where data are available to empirically and reliably estimate priors, as can be the case in medical diagnosis informed by epidemiological data. Then probabilities can be measured in terms of frequency counts, such as the prevalence of HIV or ischemic heart disease in a given population.

The conceptual lens of fast-and-frugal heuristics suggests that most real-world situations are situations of uncertainty: large worlds that may or may not entail elements of risk, but that nonetheless remain fundamentally uncertain (Gigerenzer, 2008, 2014; Hafenbrädl et al., 2016; Mousavi & Gigerenzer, 2014). For instance, it might be possible to reliably assess what budget will be available to fund social science research at your university during the next two years; it might also be known which projects are in need of funding. However, what might not be known or knowable is the likelihood that any of those projects will lead to an outcome that justifies investing in it in the first place.

To illustrate the fast-and-frugal framework's conceptual lens on decision making under uncertainty in more detail, let us consider inferences about quality (e.g., of departments, scientists, manuscripts, or grant applicants) in science as an example. Conceptually speaking, such inferences can be thought of as classifications. As with most judgments under uncertainty, these classifications are likely never perfect: reviewers may infer 'good' research to be 'good' (correct positives) and 'bad' research to be 'bad' (correct negatives); however, they may also consider 'bad' research to be 'good' (false positives) and 'good' research to be 'bad' (false negatives) (Bornmann & Daniel, 2010).

Uncertainty also arises when the criterion can, statistically, not be perfectly predicted from any set of predictor variables. In many real-world domains, the relation between predictor variables and a criterion is obscured by random noise; and neither the predictors nor the criterion might be stable (but, e.g., fluctuating). The more sources of potential error there are, the less predictable the criterion will be. Scientific research (e.g., in psychology, education, economics, medicine, or meteorology) that tries to predict behavior, performance, and other variables from different kinds of (e.g., questionnaire, laboratory, field) data offers numerous notorious examples of this kind of predictive uncertainty (e.g., predicting stock market fluctuations from economic indicators, forecasting political voting intentions from questionnaires, modeling epidemic deaths from health statistics, forecasting the weather from past records). Also in research evaluation, criterion variables can come with considerable uncertainty: leaving trivial cases aside, how frequently does one know for sure how good a piece of research really is? And if ever, when does one know this: a few months or years later, or after decades, when the revolutionary ideas expressed in a paper finally are ripe to become appreciated? Such uncertainties are hinted at by low agreement between reviewers assessing the same manuscript in journal peer review procedures.
Moreover, in research evaluation, the consequences (e.g., costs and benefits) that are associated with correct positive, correct negative, false negative, and false positive judgments at a later point in time (T2) might be difficult to assess at the time of the evaluation (T1), or even completely unknowable. One can only speculate about the costs (e.g., for a national funding agency or the individual scientist) if a single landmark piece of research is classified as 'bad', or thousands of trivial empirical findings as 'excellent'. Sometimes certain costs might be known (e.g., the amount of money invested in a project), but then the benefits coming from that investment might be hard to estimate (e.g., how to estimate the benefits of knowledge transfer?). Also in other areas, the consequences of decisions can often only be roughly understood: for example, when it comes to breast cancer screening (another type of classification), one can ask what costs and benefits there are from a health perspective (see Figure 2); however, such costs and benefits only become sufficiently predictable after large amounts of epidemiological data have been collected. Other hidden costs might never be known. And if known, different costs might be difficult to trade off against each other (e.g., money versus health) and/or come with new uncertainties.
The fast-and-frugal heuristics framework suggests not mistaking such unpredictable uncertainties for calculable risks. Instead, the framework takes uncertainties seriously and, in so doing, makes them transparent. The framework also does not pretend that optimization is the optimal tool for managing uncertainty. Instead, it acknowledges that there might be very different answers to the same questions (e.g., about scientific quality), depending on what classifiers (e.g., BBHs) one selects and on what criteria and cost-benefit structures form part of the decision environment. What is more, in assuming a repertoire of decision strategies to act as tools for dealing with uncertainty, the framework allows for flexible solutions to decision problems. To illustrate that last point, when classifying scientific output (e.g., as 'high quality'), the actual goal might not be to make 'optimal' classifications, but simply to allocate funds to different projects, with judgments about scientific quality being used to justify such funding decisions. Hence, instead of trying to infer quality, decision makers might directly resort to heuristics for candidate selection. That is, a problem of inference and classification is replaced by a selection problem.

Figure 2.
The fact box shows the benefits and harms of mammography screening compared to no mammography screening. Source: https://www.harding-center.mpg.de/en/fact-boxes/earlydetection-of-cancer/breast-cancer-early-detection

Selection heuristics can implement satisficing decision processes: the notion of satisficing, coined by Simon (1955, 1956), prescribes that a good solution does not have to be optimal; it has to be satisfactory for achieving a given task. For instance, when it comes to searching for and selecting a mate, rather than considering all possible sexual partners (e.g., all males), a satisficing strategy would first set an aspiration level (e.g., based on past encounters with males), and then lead a decision maker to pick the first mate that meets that aspiration level (e.g., Todd & Gigerenzer, 2003). In science, areas of application for satisficing might be hiring (e.g., selecting a suitable candidate for a professorship), funding (e.g., identifying grant proposals worth being funded), or literature search (e.g., finding a citable paper). To elaborate on just one example: instead of trying to rank all scientific output (e.g., of units or scientists), then trying to identify the costs of false assignments of ranks, and so on (which is what a utility-maximizing approach would prescribe), a funding body could simply define criteria (i.e., aspiration levels) for funding eligibility and then allocate money to all applications that meet those criteria. In case there are more eligible applicants than funds, a selection among those could simply be made at random (see Bishop, 2018).

4 Four points key to the study of fast-and-frugal heuristics

Indeed, when scientists and evaluators use citation counts as a cue to scientific quality, they might essentially be relying on social (e.g., imitation) heuristics (e.g., 'Infer highly cited papers to be quality ones!', 'Cite the paper on a topic with the most citations!'; see section 2). On average, that might be a good idea, just as it is a good idea for cattle in a herd to imitate one another when one animal all of a sudden starts to run (e.g., that animal might have detected a predator).
Yet, just as a herd of stampeding cattle might run off a cliff, social heuristics in research will not always yield clever decisions; for instance, wrong theses might be propagated for a long time. This example and the line of reasoning exposed in the previous section serve to illustrate four general points key to the study of fast-and-frugal heuristics.
The first one is that no heuristic will always yield clever judgments: no decision mechanism, be it complex or simple, results in good decisions in all situations. What matters is whether decisions are satisfactory enough (e.g., on average) in a given task environment.
Translated to research evaluation, this means that no evaluation tool, be it different bibliometric indicators or peer review, should be expected to always yield good judgments.
The second one concerns the notion of the adaptive toolbox: rather than assuming decision makers come equipped with just one tool (e.g., one type of strategy) to solve all problems, the fast-and-frugal heuristics framework posits that people can adaptively select from a large repertoire of different strategies (including both different heuristics and other, more complex methods), with each strategy being tuned to a given task environment. Being able to smartly choose among the different tools from this toolbox as a function of the task environment at hand makes up the expertise (and art) of clever decision making (see, e.g., Hafenbrädl et al., 2016). For instance, bibliometricians ought to know when to rely on which indicator, and when not to rely on bibliometrics at all and to switch to extensive peer-review procedures instead.
The third point concerns normative criteria for evaluating decisions: how should decisions be made? As mentioned above, classic models of decision making assume that more information is better for finding optimal, and hence rational, solutions. Coherence, as embodied by the rules of logic, is another norm for rational decision making. In contrast, in assuming an ecological view of rationality (dubbed ecological rationality), the fast-and-frugal heuristics framework stresses correspondence (Hammond, 1996): what matters is not whether a decision is in line with the prescriptions of logic or, say, utility maximization, but to what extent that decision can help an agent solve a problem in a given task environment. As stressed above, the solutions do not have to be perfect, but satisfactory for achieving a given goal. Moreover, sometimes an ecologically rational heuristic for one decision maker's task is an unsatisfactory heuristic for another. Hence, no strategy is universally rational for all people; instead, the rationality of using a given strategy always has to be examined relative to a specific person's goal in a given task environment.
To illustrate the latter point, if physicians work in an environment where they risk being sued for mistakes, then it is ecologically rational for them to perform diagnostic tests and treatments on patients even if they think that those are unwarranted. This 'conservative' heuristic might not be beneficial for patients, because it increases the false-positive rate, potentially leading to harmful over-diagnosis and over-treatment, but it protects the doctor.
We alluded to this type of defensive heuristic in the introduction, when we pointed out that a doctor might simply follow the rule of thumb to send all patients who exhibit serious chest pain to the coronary care unit (Marewski & Gigerenzer, 2012). Playing it safe is an important goal in many professional environments (Artinger, Artinger, & Gigerenzer, 2018). In research evaluation, a prototypical conservative heuristic might be to always assess other units (e.g., departments) positively when the review is non-anonymous. This defensive heuristic is a social one; using it helps to avoid making enemies. Or, in a public funding agency, an administrator may instruct the agency's evaluation panel on what reasons for rejections of proposals need to be put in writing: the goal is not only to inform reviewers, but to make sure that the listed reasons are legally bullet-proof so that applicants cannot sue the funding agency for rejections. In the humanities, another example comes in the form of evaluations based on bibliometric indicators (Ochsner, Hug, & Daniel, 2016). Those indicators might be used when one believes that numbers are socially more accepted than seemingly more subjective assessments (e.g., based on reading papers), even when knowing that the indicators might be less informative (e.g., than the contents of the papers read). 9

The fourth point key to the study of fast-and-frugal heuristics concerns the research questions to be asked. Specifically, the study of heuristics asks descriptive, ecological, applied, and methodological questions, all of which are relevant for the discussion of heuristics in the context of bibliometrics-based research evaluations (Gigerenzer, Hoffrage, & Goldstein, 2008).

9 In adopting such an ecological view on rationality, the study of fast-and-frugal heuristics can aid both (i) uncovering such defensive social heuristics and (ii) modifying the corresponding decision making processes. To illustrate the first point, in modelling London judges' bailing decisions with fast-and-frugal trees, Dhami (2003) was able to uncover that those judicial decisions were geared to enable the judges to 'pass the buck' in case something went wrong (see Gigerenzer & Gaissmaier, 2011). As to the second point, Luan, Schooler, and Gigerenzer (2011) offer a signal-detection analysis of fast-and-frugal trees that allows one, as a function of the goals of the decision maker, to engineer either more defensive (conservative) or more liberal trees (see Hafenbrädl et al., 2016, for a discussion). When building bibliometric indicators into decision trees, the same kind of signal-detection analysis can help to create more conservative or more liberal classifiers, leading either to a larger probability of committing false positive or false negative research assessments (e.g., classifying, with a larger probability, 'low quality' work as 'good' or 'high quality' work as 'bad').
(i) Descriptive: What heuristics do people use and when do they rely on which heuristic? For instance, when will decision makers rely on imitation strategies, and when will they try to uncover solutions to decision problems themselves? 10

10 Corresponding descriptive research on heuristics not only tests models of heuristics, but additionally includes complex alternative approaches in model comparisons. After all, the claim of the fast-and-frugal heuristics framework is not that simple decision mechanisms will always be relied upon; rather, people adaptively switch between different decision mechanisms as a function of the task environment, including both complex and simple ones. Descriptive research on heuristics hence also asks the radical question whether heuristics are good models of behaviour in the first place (e.g., Glöckner, Hilbig, & Jekel, 2014). For instance, while the assumption that our cognitive make-up comes with a repertoire of mechanisms is common in many areas of psychology and biology (see Marewski & Link, 2014, for an overview), the decision making literature has put forward alternatives to such multi-mechanism conceptions, too, including the proposal that one single mechanism is sufficient to explicate behaviour across tasks (e.g., Busemeyer, 1993, 2018; Glöckner & Betsch, 2008). Which type of approach is theoretically more plausible is debated in the literature (see Marewski, Bröder, & Glöckner, 2018, for a special issue on the topic). To give another example, for years, the aforementioned recognition and take-the-best heuristics have been at the centre of critique, questioning both their psychological plausibility and the evidence that people rely on those heuristics in decision making.

Reconsider the kind of classification problems from medicine and research evaluation we described above, those where the criterion is not directly (or immediately) accessible. Rather than letting decision makers resort to selection (satisficing) or social heuristics, one can try to make the criterion more accessible (e.g., through measurement and databases, such as by making tools such as SciVal and InCites available for bibliometrics-based institutional evaluations).
One can also create environments in which making mistakes (e.g., using the wrong heuristic) is likely less costly. Strategies for managing mistakes are well known in aviation as part of the safety culture. In research evaluation, such strategies might buffer the consequences of misclassifications (e.g., evaluating good work as 'bad'); in the extreme case, one would eliminate all potentially negative consequences (e.g., do not base tenure or funding decisions on bibliometric analyses!).
We hasten to add that the applied question also asks when optimization procedures should be relied upon: the toolbox of decision mechanisms consists not only of heuristics; Bayesian models, subjective expected utility maximization, and other complex models also have a place in it, and there are environments where complex approaches should be favored over heuristics. Katsikopoulos (2011) developed a simple tree for deciding, more generally, when to rely on fast-and-frugal trees as opposed to more complex procedures, notably regression.

(ii) The data can be used for empirical analyses either within single fields or for cross-field comparisons. Much social science research evaluates how well different models explain existing data and observations in hindsight. A typical case is fitting a regression model to the data at hand and reporting R² or some other measure to establish how good the model is. In contrast, the performance of fast-and-frugal heuristics for inference and classification is evaluated in foresight, that is, out of sample or out of population, reflecting actual decision making under uncertainty about the future or unknown. It is in out-of-sample prediction where heuristics can outperform more complex models (for the difference between fitting and predicting, see Marewski, Gaissmaier, et al., 2010a; Marewski & Olsson, 2009; Pitt, Myung, & Zhang, 2002b).

Yet, a completely different class of indicators might focus on methodological aspects of research quality. To illustrate this, one could count how many of the papers produced by a unit offer sufficiently complete descriptions of methods (e.g., data collection, data analysis) in order to allow for replication studies to be conducted, how many papers avoid improper statistical rituals (e.g., the null ritual in null hypothesis significance testing, see Gigerenzer & Marewski, 2015), or how many papers correctly distinguish between mere data fitting and actual predictions (e.g., with fixed parameter values; see Pitt, Myung, & Zhang, 2002a). In fields such as medicine, the number of pre-registered studies could also be taken into account. Corresponding indicators could be built into heuristics and other strategies for research evaluation, and be systematically studied through the lens of the fast-and-frugal research program. One such heuristic is the f-index, suggested by Gigerenzer and Marewski (2015). The f-index serves to gauge to what extent a paper problematically disguises hypothesis finding as hypothesis testing. Corresponding 'scientific fishing expeditions' (hence the 'f' in f-index) take place when authors first look at the data, check for significance, and then report the results as if they were the fruit of a hypothesis test (Kerr, 1998). Gigerenzer and Marewski (2015) report that in the Academy of Management Journal in 2012, the number of p values computed in an article was on average 99 (median = 89), ranging from 0 to 578. Likely most articles stated fewer hypotheses.
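The difference between fitting in hindsight and predicting in foresight mentioned above can be made concrete with a small sketch: the same model is scored once on the data it was fitted to and once on held-out data. The data, the cubic model, and the R² helper below are purely illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: a noisy linear relation between one predictor and a criterion.
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(scale=1.0, size=200)

train, test = slice(0, 100), slice(100, 200)

# Fit a flexible (here: cubic) polynomial to the first half of the data only.
coefs = np.polyfit(x[train], y[train], 3)

def r_squared(x_part, y_part):
    """Proportion of variance accounted for by the fitted polynomial."""
    pred = np.polyval(coefs, x_part)
    ss_res = np.sum((y_part - pred) ** 2)
    ss_tot = np.sum((y_part - np.mean(y_part)) ** 2)
    return 1 - ss_res / ss_tot

print(r_squared(x[train], y[train]))  # fitting: explaining the data at hand (hindsight)
print(r_squared(x[test], y[test]))    # predicting: out-of-sample performance (foresight), typically lower
```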
(iii) Bibliometrics are rooted in the research process of nearly every researcher. It is the task of researchers to make their results publicly available (this differentiates public research from industrial research) and to embed those results in previous results (research stands on the shoulders of giants, see Merton, 1965). (vi) Since bibliometric data are available to many researchers and science managers, they can be used when time constraints exclude complex methods of research evaluation, when researchers have to assess research activities outside their own area of expertise, or when they evaluate many or big units (e.g., countries). In other words, bibliometrics can be applied in situations fraught with capacity limitations. In these situations, bibliometrics (interpreted as heuristics) provide a 'ballpark figure' of research performance which might be "ecologically rational to the degree that it is adapted to the structure of the environment" (Gigerenzer et al., 1999, p. 13).
(vii) The availability and use of bibliometrics can be thought of in terms of a 'rational' trade-off. Not every research assessment is important enough to warrant spending a lot of time; thus, researchers choose bibliometric indicators, such as the field-weighted citation impact, that save effort. To obtain a rough impression of the research performance of researchers from a department, simple publication and citation numbers can be considered, albeit taking the researchers' different academic ages into account (Bornmann & Marx, 2014).
Indeed, there is empirical evidence that simple procedures, such as counting the number of publications or citations, lead to similar results as complex peer review procedures. Auspurg, Diekmann, Hinz, and Näf (2015) investigated the research rating process. According to Pride and Knoth (2018), "citation-based indicators are sufficiently aligned with peer review results at the institutional level to be used to lessen the overall burden of peer review on national evaluation exercises leading to considerable cost savings". Traag and Waltman (2018) used the same dataset to also investigate the relation between peer review and bibliometrics at the institutional level. They found a relatively minor difference between metrics and peer review in clinical medicine, physics, and public health, health services and primary care. Similar results have been published by Harzing (2017).
The studies by Auspurg et al. (2015), Diekmann et al. (2012), Pride and Knoth (2018), Traag and Waltman (2018), and Harzing (2017) exhibit, similarly to other areas of research on heuristics, a less-is-equal effect: a complex procedure uses more information than a simple tool and performs extensive assessments, but nevertheless leads to similar final judgments. Precisely specifying when, how, and why such effects emerge could be the goal of a foundational research program on bibliometrics.
6 Bibliometrics-based heuristics (BBHs) illustrated: One-reason decision making in research evaluation

We propose to conceptualize bibliometrics as fast-and-frugal heuristics, since they can be used as shortcuts that enable assessing research units in a short time with minimal effort.
Furthermore, bibliometrics are frugal, because they hinge on minimal information and ignore the rest. Research achievements can be reduced to publication and citation numbers without considering further indicators, such as the number of grants, editorships, or contributions to conferences (for an overview of existing indicators, see Montada, Krampen, & Burkard, 1999). Gigerenzer and Gaissmaier (2011) proposed describing heuristics in terms of four broad classes of models: "The first class exploits recognition memory, the second relies on one good reason only (and ignores all other reasons), the third weights all cues [i.e., predictor variables] or alternatives equally, and the fourth relies on social information" (p. 459). In the following, for illustrative purposes, we discuss one-reason decision making heuristics. Other heuristics that, in the context of evaluative bibliometrics, might (in the future) be interesting to study are profiling heuristics. Geographic profiling is, for instance, used "to predict where a serial criminal is most likely to live given the sites of the crimes" (Gigerenzer & Gaissmaier, 2011, p. 463). Spatial bibliometrics (see an overview in Frenken, Hardeman, & Hoekman, 2009) is an emerging topic in scientometrics with increasing popularity. Akin to profiling heuristics, spatial bibliometrics can be relied upon to identify hot and cold spots in international research (Bornmann & de Moya Anegón, in press).
What are one-reason heuristics? As mentioned above, social heuristics may be useful when little information and sparse (e.g., feedback) learning opportunities are available, making turning to others a sensible course of action. In other situations, decision makers have sufficient knowledge to instead rely on (non-social) one-reason heuristics. A representative of this class of decision making strategies is the aforementioned take-the-best heuristic. Take-the-best prescribes sequentially considering predictor variables (called cues) in the order of their predictive accuracy (called validity), and bases a decision on the first cue that differentiates between the options (a sketch in code follows the three rules below):
Search rule: Consider cues in the order of their validity.
Stopping rule: Stop search once one cue is found on which one but not the other option has a positive value.
Decision rule: Decide for the option with the positive value.
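A minimal sketch of these rules, assuming binary cues (1/0) that are already ordered by validity; the cue values and the function name are our own illustrative assumptions.

```python
def take_the_best(cues_a, cues_b):
    """Take-the-best for choosing between options A and B.

    cues_a, cues_b: sequences of 0/1 cue values, listed in descending order of validity.
    Returns 'A', 'B', or 'guess' if no cue discriminates.
    """
    for value_a, value_b in zip(cues_a, cues_b):      # search rule: most valid cue first
        if value_a != value_b:                        # stopping rule: first discriminating cue
            return "A" if value_a > value_b else "B"  # decision rule: that one cue decides
    return "guess"


# Hypothetical example: the second cue is the first one that discriminates.
print(take_the_best([1, 1, 0], [1, 0, 1]))  # -> "A"
```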
This form of one-reason decision making requires expertise, and hence ample information and learning opportunities: to use take-the-best, decision makers need to know which cues are the most predictive ones in order to sequentially consult them (see Garcia-Retamero & Dhami, 2009).
Another sub-class of one-reason decision making heuristics focuses on just one 'clever' cue and bases decisions exclusively on that cue, without (sequentially) considering any others. To illustrate this type of one-clever-cue heuristic: in business, Wübben and Wangenheim (2008) examined different approaches for developing and implementing customer base management strategies. Such strategies are important to managers; they help them decide on which customers to spend their limited marketing budget, be it to offer those customers discounts or to send them catalogues. Wübben and Wangenheim (2008) report that managers seem to rely on a simple recency-of-last-purchase rule, the hiatus heuristic.
Decision rule: If a customer has not purchased within a certain number of months (the hiatus), the customer is classified as inactive; otherwise, the customer is classified as active.
The hiatus is the only cue considered. Another cue allows making a similar bet, namely when it comes to forecasting who will be the future high-value customers, those customers who warrant special treatment (e.g., who should be given gifts and bonuses to prevent them from going to the competition).
Decision rule: Infer that the top x% of customers in the past will continue to be the top x% of customers in the future.
In a study on the airline, apparel, and music industries, this one-clever-cue heuristic has been found to predict as well as, or better than, the more complex (but standard) model from the marketing literature (Wübben & Wangenheim, 2008).
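The two customer heuristics just described can be sketched as follows; the nine-month hiatus, the 10% share, and the example data are assumptions made purely for illustration.

```python
def hiatus_heuristic(months_since_last_purchase, hiatus=9):
    """Classify a customer as active or inactive from a single cue.

    hiatus: cutoff in months; the value 9 is an assumption for illustration.
    """
    return "inactive" if months_since_last_purchase > hiatus else "active"


def future_top_customers(past_revenue_by_customer, share=0.10):
    """Bet that the past top x% of customers remain the future top x%.

    past_revenue_by_customer: dict mapping customer id to past revenue.
    share: x, the share of customers treated as high-value (assumed 10% here).
    """
    ranked = sorted(past_revenue_by_customer, key=past_revenue_by_customer.get, reverse=True)
    cutoff = max(1, int(len(ranked) * share))
    return set(ranked[:cutoff])


print(hiatus_heuristic(15))                                   # -> "inactive"
print(future_top_customers({"a": 900, "b": 120, "c": 45,
                            "d": 30, "e": 10}, share=0.2))    # -> {"a"}
```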
One-clever-cue heuristics are common, and not only in humans. As Gigerenzer and Gaissmaier (2011) note, "many animal species appear to rely on a single 'clever' cue for locating food, nest sites, or mates. For instance, in order to pursue a prey or a mate, bats, birds, and fish do not compute trajectories in three-dimensional space, but simply maintain a constant optical angle between their target and themselves" (p. 463). The (frequent) use of certain bibliometric indicators by researchers and science managers seems to be consistent with the definition of one-clever-cue heuristics, too. 16 For example, the field-weighted citation impact of papers is used to get a rough impression of a researcher's academic performance.
The share of a university's papers which belong to the 10% most-frequently cited papers in their subject category and publication year is used in the Leiden Ranking to identify the best universities worldwide. Moreover, the number of papers published by a research group is often used to estimate its productivity.
In their own practice, readers of this paper will have come across many other such heuristics. Other aspects of research performance are accuracy and importance (Martin & Irvine, 1983).
For some years now, what one might call an (adaptive) toolbox for using bibliometric methods in research evaluation seems to be emerging. For instance, Todeschini and Baccini provide a comprehensive overview of indicators for assessing the output and impact of scientific units (e.g., of single researchers).

16 Our use of the term 'one-clever-cue heuristic' might suggest that the application of bibliometric indicators is 'clever' in general. However, the only reason for using this label here is that the term has been proposed in the literature for a certain type of heuristic. Research on bibliometrics-based heuristics has yet to reveal whether and in which environments using different BBHs actually is 'clever'.

When comparing different researchers from the same field, an output-oriented one-clever-cue heuristic might be (see the sketch below):
Search rule: Search for all substantial publications (e.g., articles and reviews) produced by two researchers (A and B).
Stopping rule: Stop search once all such publications have been identified.
Decision rule: If scientist A has published more substantial publications than scientist B and both scientists are of the same academic age, then infer that scientist A is more active in research than scientist B.
This heuristic may work in academic environments where differences in authorship order play no role (or average each other out) (Bornmann & Marx, 2014).
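As a sketch of the three rules above, assume that the substantial publications of two researchers of the same academic age have already been retrieved; the function name and the example counts are illustrative assumptions.

```python
def more_research_active(pubs_a, pubs_b, same_academic_age=True):
    """Output-oriented one-clever-cue BBH (sketch).

    pubs_a, pubs_b: numbers of substantial publications (e.g., articles and reviews)
    found for researchers A and B. The rule applies only to researchers of the
    same academic age.
    """
    if not same_academic_age:
        return "not applicable"
    if pubs_a > pubs_b:
        return "A is more active in research"
    if pubs_b > pubs_a:
        return "B is more active in research"
    return "equally active"


print(more_research_active(42, 27))  # -> "A is more active in research"
```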
It is standard in bibliometrics to measure impact in cross-field comparisons with field- and time-normalized citation scores. The Leiden Manifesto (Hicks, Wouters, Waltman, de Rijcke, & Rafols, 2015) recommends percentile indicators as the most robust normalization method: "Each paper is weighted on the basis of the percentile to which it belongs in the citation distribution of its field (the top 1%, 10% or 20%, for example)" (p. 430). The share of top x% papers of units in research allows a clear decision on their standing: values higher than x% point to units with an above-average impact in the corresponding fields and publication years. Thus, the rules of an impact-oriented heuristic for institutional assessments might be (see the sketch below):
Search rule: Search for all substantial publications (e.g., articles and reviews) produced by the institute.
Stopping rule: Stop the search once all such publications have been identified.
Decision rule: If, over several years, at least 12% of the institute's papers belong to the top 10% papers in the corresponding fields and publication years, infer that the institute performs (significantly) better than an average institute in the world.
This decision rule resembles the hiatus heuristic mentioned above. One may speculate about the extent to which this heuristic works particularly well in environments where publication rates are highly skewed.
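Spelled out in code, this institutional heuristic reduces to a single threshold comparison. The sketch below is a minimal illustration under the assumption that each paper already carries a precomputed top 10% flag (field- and year-normalized); the field names and the example data are ours.

```python
# Minimal sketch of the impact-oriented institutional heuristic above.
# Assumption (ours): each paper carries a precomputed flag "is_top10" indicating
# whether it belongs to the 10% most frequently cited papers of its field and
# publication year (e.g., taken from a bibliometric database).

def above_average_impact(papers, threshold=0.12):
    """Decision rule: infer above-average performance if the share of top 10%
    papers among the institute's substantial publications is at least 12%."""
    substantial = [p for p in papers if p.get("doc_type") in {"article", "review"}]
    if not substantial:
        return False
    top10_share = sum(p["is_top10"] for p in substantial) / len(substantial)
    return top10_share >= threshold

# Example: an institute with 100 articles, 14 of them top 10% papers
papers = [{"doc_type": "article", "is_top10": i < 14} for i in range(100)]
print(above_average_impact(papers))  # -> True (14% >= 12%)
```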
Importantly, just as with other heuristics in the adaptive toolbox, the success of onereason BBHs depends on environmental structure and the goals of the decision maker. When a decision maker wants to make predictions, relevant statistical aspects of environmental structure might include, for instance, "1. Uncertainty: how well a criterion can be predicted. 2.
Redundancy: the correlation between cues [i.e., predictor variables]. 3. Sample size: number of observations (relative to number of cues). 4. Variability in weights: the distribution of the cue weights (e.g., skewed or uniform)" (Gigerenzer & Gaissmaier, 2011, p. 457). With respect to using bibliometric indicators to make predictions, this could mean that corresponding one-reason heuristics might be especially suitable in environments in which (1) it is difficult to foresee research quality (e.g., in the evaluation of larger units, such as institutions or countries), (2) bibliometric indicators are highly correlated with other indicators of research performance (this, too, tends to be the case with larger units), and (3) many units have to be assessed (e.g., many institutions or countries).
Moreover, many decision environments in science actually have structures that may further favour one-reason decision-making heuristics. Science is characterized by the Pareto principle of the "vital few and the trivial many" (see Bornmann, de Moya-Anegón, & Leydesdorff, 2010). Juran (1954) explains the principle as follows: "In any series of elements to be controlled, a selected small fraction, in terms of numbers of elements, always accounts for a large fraction, in terms of effect" (p. 758). Such skewed distributions often go hand in hand with non-compensatory weight structures, in which the prediction of each predictor cannot be overruled by the combined predictions of all less important predictors (Martignon & Hoffrage, 1999). "This condition is met, for example, by assigning binary predictors the weights of 1/2, 1/4, 1/8, 1/16 or the weights 100, 10, 1, 0.1. As can be seen, no trade-offs among predictor variables can be made with these non-compensatory weights (e.g., 1/2 cannot be overruled by 1/4, 1/8, 1/16; 1/4 cannot be overruled by 1/8, 1/16; and so on). In such an environment, simply relying on the best predictor will lead to the same choice as trying to weigh and add all predictors" (Hafenbrädl et al., 2016, p. 218). Such non-compensatory weights actually correspond to the order in which take-the-best's search rule considers information: mathematically, ordering is a form of weighting.
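The equivalence asserted in the quotation is easy to verify numerically. The following sketch enumerates all pairs of binary cue profiles under the non-compensatory weights 1/2, 1/4, 1/8, 1/16 and checks that take-the-best (relying on the best discriminating cue) and a weighted-additive strategy always pick the same option; the cue profiles are purely illustrative.

```python
# Numerical check of the claim quoted above: with non-compensatory weights,
# relying on the best discriminating cue (take-the-best) chooses the same option
# as weighting and adding all cues.
from itertools import product

WEIGHTS = [0.5, 0.25, 0.125, 0.0625]  # ordered from most to least important cue

def weighted_additive(cues):
    """Weight and add all binary cues."""
    return sum(w * c for w, c in zip(WEIGHTS, cues))

def take_the_best(cues_a, cues_b):
    """Consider cues in order of importance; decide on the first discriminating cue."""
    for a, b in zip(cues_a, cues_b):
        if a != b:
            return "A" if a > b else "B"
    return "tie"

# Enumerate all pairs of binary cue profiles and compare the two strategies
for cues_a, cues_b in product(product([0, 1], repeat=4), repeat=2):
    ttb = take_the_best(cues_a, cues_b)
    score_a, score_b = weighted_additive(cues_a), weighted_additive(cues_b)
    wadd = "A" if score_a > score_b else "B" if score_b > score_a else "tie"
    assert ttb == wadd  # never fails with non-compensatory weights
print("Take-the-best and weighted-additive agree on all", 16 * 16, "cue-profile pairs.")
```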
We hasten to add that, like any other method of decision making, one-reason BBHs will naturally lead to errors, even when they are relied upon in particularly fitting task environments. However, it is an empirical question to what extent that error rate is larger than that of judgment methods which follow other, more complex rules.
Leaving such ecological considerations aside, from a descriptive point of view it also does not seem unreasonable to suspect that one-reason BBHs might be frequently used in research evaluation: these days, research must often be evaluated under great time pressure. Experimental studies indicate that time pressure may induce people to base decisions on one or a few cues (e.g., Pachur & Hertwig, 2006; Payne, Bettman, & Johnson, 1988). Furthermore, to what extent bibliometrics can realistically be replaced by other evaluation strategies at all may depend on the evaluative task environment at hand. If only a few research institutes with different missions (i.e., goals; see section 4) must be evaluated, an informed peer review process including a thorough indicator report reflecting the different missions may be feasible to conduct. However, evaluators may actually be forced to resort to simpler methods when they face the challenging task of evaluating numerous units or scientists. When considering multiple options, heuristics are particularly helpful: early on, in a first step of the decision process, one-reason heuristics can be relied upon to reduce the options to a manageable number that is then considered in more detail in a second step. Such consideration-set-generating decision processes (e.g., Hauser & Wernerfelt, 1990) have been described in other areas, including election forecasting (Marewski, Schooler, et al., 2010) and consumer choice.
For instance, empirical studies (Kohli & Jedidi, 2007; Yee, Dahan, Hauser, & Orlin, 2007) suggest that consumers use sequential heuristics to eliminate products from further consideration and "evaluate the remaining options more carefully" (Gigerenzer & Gaissmaier, 2011, p. 466; Hauser, 2011). Paralleling consumer choice situations, foundations for the promotion of research often receive more applications (e.g., for post-doctoral fellowship programs) than can be processed in a thorough peer review process (Bornmann & Daniel, 2004). Thus, a pre-selection is necessary to reduce the number of candidates to be evaluated in more detail. The results of Horta and Santos (2016) suggest that "those who publish during their PhD have greater research production and productivity, and greater numbers of yearly citations and citations throughout their career compared to those who did not publish during their PhD" (p. 28). Similar results have been reported by Pinheiro, Melkers, and Youtie (2014) and Laurance, Useche, Laurance, and Bradshaw (2013). Based on the findings from such studies, a one-clever-cue heuristic for the pre-selection of applicants could prescribe selecting those who have published the most papers (articles or reviews) during their PhD. This pre-selection heuristic could be further refined, based on the results of Bornmann and Williams (2017) as well as Cole and Cole (1967), by considering citations or journal metrics in a second step: pre-selected candidates could be winnowed down further by identifying those with the most publications in reputable journals or by zooming in on those candidates with a minimum number of citations.
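As an illustration of this two-step logic, the sketch below first generates a consideration set from PhD publication counts and then winnows it down by citations. The field names, the size of the consideration set, and the citation threshold are placeholders we chose for the example; they are not recommendations.

```python
# Sketch of a two-step consideration-set heuristic for pre-selecting fellowship
# applicants, as described above. All field names, the consideration-set size,
# and the citation threshold are illustrative choices.

def preselect(applicants, set_size=20, min_citations=10):
    # Step 1: generate a consideration set from publication counts during the PhD.
    by_output = sorted(applicants, key=lambda a: a["phd_papers"], reverse=True)
    consideration_set = by_output[:set_size]
    # Step 2: evaluate the remaining options more carefully, here by requiring a
    # minimum number of citations to the PhD publications.
    return [a for a in consideration_set if a["citations"] >= min_citations]

applicants = [
    {"name": "applicant_1", "phd_papers": 6, "citations": 40},
    {"name": "applicant_2", "phd_papers": 2, "citations": 90},
    {"name": "applicant_3", "phd_papers": 5, "citations": 4},
]
print([a["name"] for a in preselect(applicants, set_size=2)])  # -> ['applicant_1']
```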
Selection heuristics could also aid in the last decision round of a peer review process.
In these last rounds, foundations face the problem of choosing among candidates, all of whom are well-suited, while the available funds are insufficient for supporting all of them. Some funders experiment with lottery systems to select among these candidates (Bishop, 2018). The problem with lottery systems is that they do not rely on scientific quality criteria. BBHs could be an interesting alternative. For example, those candidates could be selected who have published the most highly cited papers (i.e., papers belonging to the 10% most frequently cited papers within their field and publication year). This indicator is regarded as the most robust field-normalized indicator (Hicks et al., 2015).
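A minimal sketch of such a final-round selection rule might look as follows; the candidate data and the number of available grants are invented for illustration.

```python
# Sketch of the final-round selection heuristic described above: among equally
# well-suited candidates, fund those with the most papers in the top 10% of
# their field and publication year. Candidate data are illustrative.

def select_grantees(candidates, n_grants):
    # Rank candidates by their number of top 10% papers and fund the top n_grants.
    ranked = sorted(candidates, key=lambda c: c["top10_papers"], reverse=True)
    return [c["name"] for c in ranked[:n_grants]]

finalists = [
    {"name": "candidate_1", "top10_papers": 3},
    {"name": "candidate_2", "top10_papers": 7},
    {"name": "candidate_3", "top10_papers": 5},
]
print(select_grantees(finalists, n_grants=2))  # -> ['candidate_2', 'candidate_3']
```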

Discussion
Heuristics offer a theoretical framework for understanding how people can make ecologically rational decisions. Those decisions can be predictions about the future as well as inferences about the past or present (e.g., as in rankings, classifications, or estimations).
Those decisions can also concern choices (e.g., between different courses of action) or search and selection (e.g., of candidates or mates), to name just two other examples. Many heuristics use a single cue (or a few cues) among the many available cues for decision-making. Davis-Stober, Dana, and Budescu (2010) call this reduction "over-weighting" to describe the "effect of a single predictor cue receiving disproportionally more weight than any other predictor cue according to an optimal weighting strategy" (p. 217). Other heuristics use many predictor variables but weight them equally (e.g., the tallying heuristic; see section 2). Those heuristics simplify tasks by ignoring order (ordering is, mathematically, a form of weighting).
In contrast to the widespread views that complex tasks warrant complex solutions and that simple solutions reduce accuracy, the study of heuristics has shown, to speak with Gigerenzer and Brighton (2009), that "less information, computation, and time can in fact improve accuracy" (p. 107). Moreover, decision makers, as a rule, do not know all the relevant information, have limited time, and face information-processing constraints, including failing memories or incomplete databases. They frequently operate in environments which are difficult to predict (e.g., decision makers in science who try to select the next breakthrough research projects). In these uncertain environments, less information can be more when it comes to making inferences, that is, judgments about something that is unknown, be it because the criterion to be inferred lies in the future or because it is simply not accessible at present. In other words, ignorance can be useful (Gigerenzer & Gaissmaier, 2011).
In this paper, we propose to conceptualize the use of bibliometrics in research evaluation in terms of fast-and-frugal heuristics. Bibliometrics are frequently used in different evaluative contexts, whereby the information for decision-making is reduced to publication and citation numbers. More generally speaking, this reduction may be ecologically rational in environments where active researchers are publishing researchers and where every new piece of research has to be framed in terms of past research (by citing the corresponding publications).
Such environments can be found in most disciplines. In section 6, we gave examples of how the use of bibliometrics can be conceptualized in terms of one-reason heuristics. However, this conceptualization ought to be seen as a first rough attempt only.
In psychology (and beyond), a line of systematic research on heuristics has emerged, with numerous publications. A search of the WoS core collection for publications (articles and reviews) with the topic "heuristic" and the research area "psychology" returned around 4000 documents (date of search: 9/24/2018). Gigerenzer and Brighton (2009) present "ten well-studied heuristics for which there is evidence that they are in the adaptive toolbox of humans" (p. 130). Those authors formulate a vision of human nature, homo heuristicus, "based on an adaptive toolbox of heuristics rather than on traits, attitudes, preferences, and similar internal explanations" (p. 135).
Based on the results from research on heuristics, we encourage studying the use of bibliometrics in research evaluation under the umbrella of the fast-and-frugal heuristics framework. This does not mean that the use of bibliometrics should be reduced to only one heuristic model. Instead, the objective should be to develop an adaptive toolbox, that is, a collection of heuristics that makes a coordinated set of bibliometric indicators available for specific evaluation environments. This objective is in accordance with a recent call by Waltman and van Eck (2016) for contextualized scientometric analysis, "which is based on the principles of context, simplicity, and diversity" (p. 542). Every heuristic should be tuned to specific environments and designed for specific evaluation tasks (Bornmann & Marx, 2018).
Note that specifying the toolbox of BBHs does not necessarily entail coming up with an unlimited (or very large) number of heuristics: Gigerenzer and Gaissmaier's (2011) overview demonstrates how many different heuristics are built from common building blocks.
Those building blocks reduce the larger collection of heuristics to a smaller number of combinable components. Specifically, depending on how a small number of search, stopping, and decision rules is combined, a larger number of heuristics can emerge. For instance, search rules that consider predictor variables sequentially (e.g., as in Figure 1's fast-and-frugal tree) can be complemented by even simpler search rules that consider predictor variables in any order. Search could end after a certain number of predictors (e.g., 3) has been identified or after a certain amount of time has elapsed (e.g., 10 minutes). Decisions could then be made by simply counting how many predictor variables suggest one option as opposed to another.
This equal-weighting principle is built into, for instance, tallying heuristics (see, e.g., Gigerenzer & Goldstein, 1996; Marewski, Gaissmaier, et al., 2010a; see section 2), and is helpful in many domains, ranging from medicine to financial investment and avalanche forecasting (Hafenbrädl et al., 2016). Identifying such building blocks might also be helpful for studying the use of bibliometrics as heuristics.
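To illustrate the building-block idea, the sketch below assembles a simple BBH from three interchangeable components: a search rule that considers indicators in a fixed order, a stopping rule that ends search after a given number of indicators, and a tallying decision rule that counts which unit is favoured by more indicators. The indicator names, data format, and function names are our own illustrative choices.

```python
# Illustrative sketch of the building-block idea: a heuristic is assembled from a
# search rule, a stopping rule, and a decision rule, each of which can be swapped
# independently. Indicator names and data format are placeholders.

def search_in_fixed_order(indicators):
    """Search rule: look up indicators in a fixed order of assumed importance."""
    return iter(indicators)

def stop_after_n(n):
    """Stopping rule: stop the search after n indicators have been looked up."""
    def rule(considered):
        return len(considered) >= n
    return rule

def tally(considered, unit_a, unit_b):
    """Decision rule (tallying): count how many considered indicators favour each unit."""
    votes_a = sum(unit_a[i] > unit_b[i] for i in considered)
    votes_b = sum(unit_b[i] > unit_a[i] for i in considered)
    return "A" if votes_a > votes_b else "B" if votes_b > votes_a else "tie"

def run_heuristic(indicators, stopping_rule, unit_a, unit_b):
    """Assemble the three building blocks into one heuristic and apply it."""
    considered = []
    for indicator in search_in_fixed_order(indicators):
        considered.append(indicator)
        if stopping_rule(considered):
            break
    return tally(considered, unit_a, unit_b)

# Invented indicator values for two research units
unit_a = {"papers": 120, "top10_share": 0.14, "citations_per_paper": 9}
unit_b = {"papers": 90, "top10_share": 0.18, "citations_per_paper": 11}
print(run_heuristic(["papers", "top10_share", "citations_per_paper"],
                    stop_after_n(3), unit_a, unit_b))  # -> "B" (two of three indicators favour B)
```

Exchanging the tallying rule for a take-the-best-style rule, or the fixed search order for a random one, would yield a different heuristic built from the same components.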
In short, in line with research on other fast-and-frugal heuristics, work on BBHs should target the four key points outlined in section 3, specifying candidate heuristics alongside the environments for which there is evidence that those heuristics lead to accurate decisions (Bornmann & Marx, 2018). Based on this adaptive toolbox, the use of bibliometrics could be effectively taught by professional bibliometricians in workshops and courses. Moreover, from a policy point of view, research environments could be changed so that certain heuristics work better, or, alternatively, decision makers could be advised to use other (e.g., new) heuristics than those they have relied on previously.

Conclusion
In research evaluation, the oldest and most frequently used method for assessing research activities is the peer review process (Bornmann, 2011). This qualitative form of research evaluation belongs to the class of complex and 'rational' judgment strategies. Peer review processes are frequently put in place when single contributions, such as journal submissions and grant applications, must be evaluated. However, as the example in the introduction shows, the process is also used in the evaluation of units, such as universities or single researchers. Presumably, peer review processes are based on the assumption that all possible information (on manuscripts, researchers, applications, etc.) is taken into account and then differentially weighted. Involving experts in research evaluation, so the rationale goes, most likely ensures that assessments are based on full information (e.g., on all aspects of scientific quality). In general, this premise is limited only by reviewer tasks that exceed field-specific expertise: for instance, societal impact is difficult for reviewers to assess in grant peer review procedures (Derrick & Samuel, 2016).
Since the end of the 1980s, relying on mere publication and citation numbers as cues to scientific quality has become increasingly popular (Bornmann, in press). Nowadays, the corresponding bibliometric indicators are often used as a complement to expert peer review; sometimes bibliometrics even fully replace expert judgments. However, bibliometrics are not yet at the center of a fully established profession, and, perhaps due to their openness to non-specialists and peripheral actors, a broad consensus as to which indicators should be used in which settings has yet to emerge (Jappe, Pithan, & Heinze, 2018).
While all assessment methods, be they based on peer review, on different bibliometric indicators, or on other indicators, have their specific advantages and disadvantages, arguments in favor of resorting to bibliometrics are that (i) the application of bibliometrics is less costly than that of peer review (the time of experts is valuable); (ii) bibliometrics offer numbers that lend themselves to fine-grained rankings of evaluated units (the outcomes of peer review are usually not so finely granulated); (iii) large numbers of units can be evaluated and compared (it is difficult for peers to oversee and assess a large number of units); and (iv) publication and citation numbers can be used to assess large units (peers usually have problems assessing large units, such as institutions and countries, because of their complexity).
In research evaluation, bibliometrics and peer review are typically used as stand-alone methods. The Leiden Ranking, for instance, is a purely bibliometric ranking of universities worldwide. Submissions to journals are assessed by reviewers only. However, bibliometrics and peer review can also be used in combination: in informed peer review, peers assess units (e.g., institutions) based on their own impressions and on a bibliometric report.
Moreover, sometimes bibliometric indicators are considered together with other indicators.
Examples are university rankings (e.g., the Academic Ranking of World Universities) which are mostly based on bibliometric measures, but additionally draw on other measures (e.g., the number of Nobel laureates) to assess research performance. Although different methods thus exist, complex procedures are generally (i.e., in all evaluation environments) conceived of as more appropriate than simpler procedures when it comes to evaluating research activities. It is seen as imperative that a complex research process with many uncertainties be evaluated with a complex, and hence seemingly rational, procedure.
We are aware of only a few studies investigating the use of bibliometrics in different evaluative contexts empirically. For example, Hammarfelt and Haddow (2018) examined metrics use among humanities scholars in two countries, and Gunashekar, Wooding, and Guthrie (2017) studied how panels at the UK National Institute for Health Research (NIHR) rely on metrics. Moed, Burger, Frankfort, and van Raan (1985) produced bibliometric results for research groups in the Faculty of Medicine and the Faculty of Mathematics and Natural Sciences at the University of Leiden. The results were then discussed with researchers from the two faculties.
Importantly, this latter study points to several problems in the evaluative use of bibliometric data. Indeed, as the literature overview of de Rijcke, Wouters, Rushforth, Franssen, and Hammarfelt (2016) shows, much work on bibliometrics focuses on "possible effects of evaluation exercises, 'gaming' of indicators, and strategic responses by scientific communities and others to requirements in research assessments" (p. 161). Thus, the manipulative and negative aspects of the bibliometric enterprise are at the forefront of scientific interest, rather than the fruitful and advantageous usage of indicators. Moreover, no research program exists that studies the environments in which specific research evaluation methods (e.g., using bibliometric indicators X, Y, and Z in judgment strategy A, B, or C, within an informed peer review process or without peer review) are more successful than other methods (e.g., pure peer review), and, conversely, in which environments other approaches work better.
Since bibliometric indicators are, as a rule, critically assessed but nonetheless used very frequently, the gap between empirical studies and practical usage is surprising. How is it possible that certain bibliometric indicators, such as the Journal Impact Factor, remain instruments of research evaluation although prominent and popular declarations against their use, such as the Declaration on Research Assessment (DORA; see https://sfdora.org), exist? What benefits are gained by relying on certain indicators although justified critique exists? The frequent usage might indicate that bibliometrics are simply very attractive in certain evaluative environments (e.g., because they are easy to use, i.e., fast-and-frugal).
To conclude, a heuristic research program on the evaluative use of bibliometrics might reveal the following: (1) BBHs can be as accurate as complex procedures (such as peer review), even though they are simpler.
(2) The ecological rationality (e.g., accuracy) of such heuristics depends on the task environment.
(3) Users of bibliometrics can learn to select BBHs from an adaptive toolbox (e.g., provided by the ISSI society) as a function of the task environment and goals at hand. (4) Science evaluation comes with uncertainty, which, in turn, might foster the use of heuristics, both from a descriptive and from a prescriptive (i.e., policy-making) point of view.