What is an animal personality?

Individuals of many animal species are said to have a personality. It has been shown that some individuals are bolder than other individuals of the same species, or more sociable or more aggressive. In this paper, we analyse what it means to say that an animal has a personality. We clarify what an animal personality is, that is, its ontology, and how different personality concepts relate to each other, and we examine how personality traits are identified in biological practice. Our analysis shows that biologists often study specific personality traits, such as boldness, sociability or aggressiveness, rather than personalities in general. We claim that personality traits are best understood as dispositions and that they are operationally defined in terms of certain sets of behaviours, which are studied in specific experimental set-ups. Furthermore, we develop an integrative philosophical account that specifies and formalises three criteria for identifying personality traits, which are used in biological practice. For an individual animal to have a personality trait it must, first, behave differently than others (Individual Differences). Second, these behavioural differences must be stable over a certain time (Temporal Stability), and third, they must be consistent in different contexts (Contextual Consistency).


Introduction
Human personalities have been studied for a long time and are a central topic of contemporary psychological research. Personality psychology, for instance, studies how humans differ in their behavioural and psychological traits and how this reflects differences in their general personality structure (e.g., as reflected by the big-five factor structure; Goldberg 1990). Concepts of personality started to be applied to animals in the late 1930s (Gosling 2001). Over the last decades, research on animal personalities, which are often understood as between-individual differences in behaviour that are consistent across contexts and time (Réale et al. 2007; Stamps and Groothuis 2010; Wolf and Weissing 2012), has increased drastically. For vertebrates, the concept of animal personality is nowadays widely accepted, but the idea that invertebrates such as insects also have personalities has long been met with scepticism or even ridicule (Gosling 2001). In the last ten years, however, within-species differences in personalities have also been demonstrated in several case studies on invertebrates (Kralj-Fišer and Schuett 2014). The rise of studies of animal personalities is closely related to a recent trend in ecology, behavioural and evolutionary biology that draws attention to how individual organisms differ from each other and that focusses on the causes, consequences and underlying mechanisms of individual differences (e.g., Bolnick et al. 2003; Dall et al. 2012; Wolf and Weissing 2012). Understanding within-species variation is of importance for biologists because this variation is the raw material for evolution (Mather 1998) and because the selective benefits of trait variation can be investigated.
Cross-species comparisons potentially help to elucidate the origins of personality traits as results from convergent evolution or homologies and thereby contribute to explaining why within-species individual differences develop and persist (Gosling 2001) and how comparable or different they are among species (Carter and Feeney 2012).
Animal personality research is philosophically very interesting because it is far from clear how the personality of an animal should be defined and how animal personalities are to be identified. Scientists who claim that animals of certain taxa are not capable of having a personality seem to tailor personality closely to humans and thus to endorse a quite demanding personality concept. Others reject this view and attribute personalities also, for instance, to various arthropod species. However, not all animals seem to express a personality, which is why discovering personalities in certain animal taxa is so interesting. Furthermore, if an animal expresses a personality, not all of its behaviours seem to contribute to a personality trait; for example, simple feeding behaviour may be irrelevant to boldness. Nevertheless, it is far from clear which conditions must be fulfilled for an animal to be assigned a personality. A commonly accepted definition is that animal personalities are "behavioural differences between individuals that are consistent over time and across situations" (Réale et al. 2010: 3937). By contrast, other definitions claim that personality requires temporal stability and/or contextual consistency (e.g., Réale et al. 2007: 294; Dall et al. 2012: 734). In these definitions, fulfilling only one of the two conditions is sufficient. In addition, it remains unclear how the different criteria for animal personality should be spelled out in detail and be integrated into a formal definition. At this point, philosophers can contribute to clarifying the involved concepts and their relation to each other. They can reveal the ontological and epistemological assumptions involved in studying animal personalities and establish coherence between them.
The central goal of this paper is to explicate what animal personalities are. What does it mean to say that individual animals have a personality? What determines the personality of an individual animal and how can personality traits and personalities (in general) be identified in biological practice? Under which conditions is it legitimate to ascribe a personality type to individual animals; for instance, when are some individuals bolder, more sociable or more aggressive than others?
Clarifying the concept of animal personality and explicating its epistemological and ontological underpinnings falls into two major tasks. The first task is to clarify the concept of animal personality with regard to its relation to other concepts and its ontological presuppositions. In studies of animal personalities, concepts such as 'behavioural syndrome', 'temperament' and 'coping style' are frequently used as alternative concepts to 'personality'. In addition, biologists specify and explain the concept of animal personality by referring to a broad variety of other concepts, such as 'personality trait', 'personality dimension', 'behaviour', 'behavioural type', 'behavioural trait', 'behavioural pattern' and 'behavioural tendencies'. Our paper responds to this "lack of coherence in terminology" (Réale et al. 2007: 291) and the "pressing need for a strong theoretical and conceptual foundation" (Réale et al. 2010: 3938) to overcome the confusion in this field. We develop a unified conceptual framework that clarifies the concept of animal personality and its relation to other major concepts used in the field (e.g., 'personality trait' and 'behaviour'). Our conceptual framework also explicates the ontological assumptions that underlie these concepts (section "Three levels of investigation: behaviour, personality trait and personality in general") and it explains how personality traits are operationally defined in terms of behaviours (section "Relation between behaviours and personality traits"). The second task consists in specifying how animal personalities are and should be identified in biological practice (section "Criteria for identifying animal personality traits"). Which criteria for ascribing personality traits to individual organisms are (implicitly) at work in this field and how can they be formulated and turned into precise, distinct criteria for animal personality? 
We think that both tasks (clarifying the relation of 'personality' to other concepts and explicating their ontology, as well as specifying the criteria for identifying personality traits) are crucial for understanding what animal personalities are.
When analysing the concept of animal personality, we take a practice-based approach and pay special attention to the practices of how animal personalities are in fact identified in empirical research. We critically reconstruct the conceptual, epistemological and ontological assumptions about animal personalities that underlie these identification practices. Our analysis integrates information from different sources. Among the empirical sources we focus on biological review papers that are frequently cited when the concept of animal personality is introduced and defined (e.g., Carter et al. 2013; Gosling 2001; Réale et al. 2007; Sih et al. 2004b; Stamps and Groothuis 2010; Wolf and Weissing 2012). In addition, we conducted a case study analysis of 30 representative, empirical studies on boldness in different animal taxa, including vertebrates and invertebrates, published within the last 12 years. We searched the ISI Web of Science for the terms 'boldness' and 'personality' and selected studies that used the term 'personality' throughout the paper (not just as keyword), covering different study organisms, different authors and different study questions (see Supplementary Table 1). We focus on the personality trait boldness because, first, it is among the most-investigated personality traits in animals; second, boldness is studied in humans as well as in other animals (Wilson et al. 1994); third, the range and diversity of behaviours that indicate boldness and that can be investigated empirically is particularly broad; fourth, boldness is an intriguing personality trait because on the one hand we seem to have clear intuitions about what it means to be bold and on the other hand it is far from obvious which behaviours indicate boldness and how boldness can be demarcated from other personality traits, such as exploration or aggressiveness (Carter et al. 2013).
We bring together this empirical information from and about biological practice with ideas about personality, character and operationalisation that have been developed in philosophy, for instance in virtue ethics (e.g., Doris 2002; Besser-Jones and Slote 2015; Snow 2015) and in the philosophy of experimentation (Feest 2010, 2012). We think that applying philosophical methods and ideas can help to clarify biological concepts and provide a solid theoretical foundation of animal personality studies. Our paper is structured as follows. In section "Methods applied in studies of animal personalities", we briefly introduce the methodologies that biologists use to study animal personalities. Sections "Three levels of investigation: behaviour, personality trait and personality in general" and "Relation between behaviours and personality traits" identify major concepts that are applied in animal personality research: the concept of behaviour, the concept of personality trait, and the concept of personality in general. We clarify the meanings of these concepts, how they relate to each other and to other concepts in the field, and which ontological assumptions they invoke (section "Three levels of investigation: behaviour, personality trait and personality in general"). In particular, we show that personality traits are manifested in certain behaviours and experimental set-ups and that this is how personality traits are operationalised (section "Relation between behaviours and personality traits"). In section "Criteria for identifying animal personality traits", we develop an integrative, coherent account of how to identify personality traits. We highlight three criteria that should (and in most cases do) guide the ascription of personality traits. The criterion Individual Differences captures the idea that having a personality requires being different from others (section "Individual behavioural differences").
Temporal Stability states that personalities are expressed only if the corresponding behaviours are stable over a certain period of time (section "Temporal stability"). Finally, the criterion Contextual Consistency requires that personalities are expressed in behaviours that are consistent over different contexts (section "Contextual consistency"). In section "Concluding remarks", the main conclusions are summarised.

Methods applied in studies of animal personalities
Different methods have been established to characterise individual differences in personalities. To record information and to quantify behavioural differences, two main methods are used, coding and rating (Gosling 2001; Highfill et al. 2010). Codings are behavioural observations that are based on units of discrete, well-defined behaviour. The researchers assess the animals on task-related activities and record their behaviours in a quantitative way by scoring frequencies, latencies (time until something happens) or overall durations of responses in experimental laboratory set-ups or under naturally occurring settings (Mirkó et al. 2013; Vazire et al. 2007). For example, in the classical open field test, movements of individuals are recorded to determine locomotor activity levels, exploration and anxiety- or boldness-related behaviour (Prut and Belzung 2003; Tremmel and Müller 2013). This test was developed to study vertebrate behaviour but has also been adopted to study invertebrate personalities. Movements are often filmed and are analysed for traits such as covered distance, relative amount of movements in the inner area of the open field, number of turning angles and so on (Tremmel and Müller 2013). In a different experimental set-up providing another context, the dark-light test, individuals are placed in a dark container or refuge and the latency until they move to light conditions and thus out of the refuge is measured (Müller et al. 2016; Sih et al. 2003; Zipser et al. 2013). Ideally, individuals are tested in various experimental set-ups that simulate different contexts (see section "Contextual consistency"). Staying longer in the centre of an open field and emerging more quickly from a dark refuge are then often interpreted as being bolder (Briffa et al. 2008).
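To make the idea of coding concrete, the following minimal sketch shows how two of the open field measures mentioned above (covered distance and the relative amount of time in the inner area) could be derived from a tracked path. It is an illustration under stated assumptions, not a procedure taken from the cited studies: the function name `open_field_summary`, the square arena, the equally spaced position samples and the `inner_fraction` parameter are all hypothetical.

```python
import math


def open_field_summary(track, arena_size, inner_fraction=0.5):
    """Summarise one open field trial (hypothetical helper).

    track: list of (x, y) positions sampled at equal time intervals,
           inside a square arena spanning [0, arena_size] on both axes.
    inner_fraction: side length of the inner area relative to the arena
           (an assumed convention, not prescribed by the source).
    """
    # Covered distance: sum of straight-line steps between samples.
    distance = sum(math.dist(p, q) for p, q in zip(track, track[1:]))

    # Bounds of the centred inner square.
    margin = arena_size * (1 - inner_fraction) / 2
    lo, hi = margin, arena_size - margin

    # Share of samples recorded inside the inner area.
    inner = sum(1 for x, y in track if lo <= x <= hi and lo <= y <= hi)
    return {"distance": distance, "inner_share": inner / len(track)}


# Hypothetical track of four samples in a 10 x 10 arena.
summary = open_field_summary([(1, 1), (4, 5), (5, 5), (5, 6)], arena_size=10)
print(summary)  # {'distance': 7.0, 'inner_share': 0.75}
```

Whether a high inner share is read as boldness, activity or exploration is an interpretive step that the code does not settle.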
In contrast, ratings are based on judgements of observers that are familiar with the study animals, such as owners or keepers. They rate individuals on a number of behaviours on ordinal scales with various adjectives, ideally at several points in time. Ratings are usually done with questionnaires which are standardised as much as possible (Pastorino et al. 2017). For example, aggressiveness, which is often related to boldness (Sih et al. 2004b), can be rated using observations such as likelihood to bite a human, the tendency to bark or being hysterical or jealous of other conspecifics (Mirkó et al. 2013).
Both methods, coding and rating, have their advantages and disadvantages (Mirkó et al. 2013) and the applicability of these methods depends on the species under consideration. Overall, ratings have been less commonly used (Gosling 2001) and codings seem to be the more suitable approach (Vazire et al. 2007). One of the most important methodological challenges concerns the reliability of these observations. The reliability of ratings has to be confirmed with regard to inter-observer reliability and test-retest reliability. Both a sufficient number of observers and a sufficient number of items are required to provide a reliable estimate of each anticipated dimension (Gosling 2001). In a few empirical studies, mostly on captive or domesticated animals, codings and ratings have been used in combination and tested for correlations to generate more information and enhance the validity and reliability (Mirkó et al. 2013; Pastorino et al. 2017). Ideally, coding and rating measures of certain behavioural traits should converge (Vazire et al. 2007). Subjectivity of observations should be reduced as much as possible, which can be realised by different statistical measures such as correlation tests.
In experimental assessments of behaviours, individuals can be either forced into a certain situation (e.g., placed into an open field) or they are able to choose situations freely (Carter et al. 2013). These different set-ups will affect how the outcome of the test is then interpreted, which highlights the problem of the validity of empirical tests. Boldness may be testable only in a forced situation, whereas under free-choice conditions it is rather activity and exploration that are evaluated. This example highlights that it is also important to reduce the subjectivity in the interpretation of the meaning of an experimental outcome. We argue that this problem is best overcome by testing animals in various contexts, so that an interpretation is not based on the outcome of one single trait.
Indeed, detecting personalities goes far beyond just observing individual behaviours. In the strict sense, the term personality should only be applied if individuals show consistent behavioural differences across both time and contexts (Stamps and Groothuis 2010) (see section "Criteria for identifying animal personality traits"). Behavioural observations should thus be done repeatedly in a standardised manner on the same individuals. Following the behavioural observations, further methods are necessary to test for significant correlations across time and contexts. For example, Spearman rank-correlation matrices are computed and followed by clustering methods (Gyuris et al. 2011; Tremmel and Müller 2013). Alternatively, multivariate statistics such as principal component or canonical variate analyses are used to identify underlying dimensions that cause correlations between various behavioural variables (Carter and Feeney 2012; Ley et al. 2008). Repeatability (i.e., temporal stability) can be statistically tested, for example, by applying generalised linear mixed effect models and exploring the among-individual variance divided by the sum of the among-individual and the within-individual (residual) variance (Nakagawa and Schielzeth 2010).
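The two statistical steps just described can be sketched in a few lines. The following example is a simplified illustration, not the analysis pipeline of the cited studies: it computes a Spearman rank correlation between two behavioural variables and estimates repeatability with a one-way ANOVA estimator for a balanced design, which we use here as a stand-in for the generalised linear mixed effect models mentioned above. All function names and data values are hypothetical.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def repeatability(trials):
    """ANOVA-based repeatability for a balanced design.

    trials: dict mapping each individual to a list of k repeated
            measurements of the same behaviour (same k for everyone).
    Returns among-individual variance divided by the sum of
    among-individual and within-individual variance.
    """
    groups = list(trials.values())
    n, k = len(groups), len(groups[0])
    grand = mean(v for g in groups for v in g)
    ms_among = k * sum((mean(g) - grand) ** 2 for g in groups) / (n - 1)
    ms_within = sum((v - mean(g)) ** 2 for g in groups for v in g) / (n * (k - 1))
    var_among = (ms_among - ms_within) / k
    return var_among / (var_among + ms_within)


# Hypothetical latencies (in seconds) of three individuals, three trials each.
latencies = {"ind_a": [12, 14, 13], "ind_b": [30, 28, 29], "ind_c": [55, 52, 54]}
print(repeatability(latencies))

# Hypothetical scores of four individuals in two different tests.
print(spearman([1, 2, 3, 4], [10, 20, 15, 40]))
```

A repeatability close to 1 means that most of the observed variation lies between individuals rather than within them. With unbalanced data or non-Gaussian responses, a mixed-model approach would be the more appropriate choice.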

Three levels of investigation: behaviour, personality trait and personality in general
In this section, we examine the various concepts that are used to refer to and to specify the concept of animal personality. In our analysis we pay special attention to those biological review papers that are frequently cited when the concept of animal personality or related concepts such as 'behavioural syndrome' or 'temperament' are introduced and defined (e.g., Gosling 2001; Réale et al. 2007; Sih et al. 2004b; Stamps and Groothuis 2010; Wolf and Weissing 2012, and others). Our aim in this section is to provide a unified conceptual framework, which brings together the different concepts that have been used so far and which is closely connected to the empirical work that is done in this research field. The conceptual framework that we propose is normative because, first, it involves identifying those concepts that are most central to the field, and second, we critically reconstruct the meaning of these concepts, establish coherence between these concepts, and reveal their ontological presuppositions (Kaiser 2019).
We think that three major concepts should be distinguished because each of them refers to a different level of investigation: behaviour, personality trait and personality in general (Fig. 1). As we will point out in the following, for all three major concepts synonymous concepts exist that can be used.

The concept of behaviour
Behaviours are the types of actions that animals show and that are observed, recorded, quantified and coded or rated in empirical studies of animal personalities (see section "Methods applied in studies of animal personalities"). They are also referred to as 'behavioural traits' (e.g., Stamps and Groothuis 2010: 302; Réale et al. 2010: 3938). Examples of behaviours that are measured in coding experiments are latencies of "moving to light conditions", "contacting a novel object" and "feigning death after an attack". Depending on the observed latencies, animals then show distinct behavioural types (e.g., Sih et al. 2004b: 372; Wolf et al. 2007: 581) for a given behaviour (Fig. 2). Behavioural types are thus subtypes of behaviours. For example, when measuring the behaviour "latency in contacting a novel object", animals can be assigned according to their behavioural type to individuals "contacting a novel object quicker" and individuals "contacting a novel object slower" compared to the average.
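The assignment of behavioural types relative to the average can be illustrated with a short sketch. The individual labels and latency values are hypothetical, and comparing each individual to the arithmetic mean of the tested group is only one possible way of drawing the line; the function name `behavioural_types` is ours.

```python
from statistics import mean


def behavioural_types(latencies):
    """Assign each individual a behavioural type for one behaviour.

    latencies: dict mapping individuals to their latency (in seconds)
               in contacting a novel object.
    An individual is typed relative to the group average, so the same
    latency can yield a different type in a different group.
    """
    average = mean(latencies.values())
    types = {}
    for individual, latency in latencies.items():
        if latency < average:
            types[individual] = "quicker"
        elif latency > average:
            types[individual] = "slower"
        else:
            types[individual] = "average"
    return types


# Hypothetical latencies of three beetles.
print(behavioural_types({"beetle_12": 28, "beetle_7": 40, "beetle_3": 52}))
# {'beetle_12': 'quicker', 'beetle_7': 'average', 'beetle_3': 'slower'}
```

The sketch makes visible that a behavioural type is not a property an individual has in isolation: removing or adding individuals changes the average and may change the assigned type.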
We think that there are three important characteristics of the concept of a behaviour and the concept of a behavioural type. First, both are type-level concepts. That is, they refer to types of behaviours, rather than to particular instances of behaviours, such as "beetle number 12 contacted the novel object after 28 s". Although, strictly speaking, in codings and ratings biologists observe particular instances of behaviours, the concepts 'behaviour' and 'behavioural type' that they use to describe and discuss the results of their studies are type-level concepts. This is due to the fact that attributing personalities requires generalising over several individuals and finding correlations between behaviours in different contexts and at different times (see section "Criteria for identifying animal personality traits"). Second, the concept of behaviour and the concept of behavioural type are empirical concepts, that is, they refer to such kinds of behaviours and behavioural types that can be studied empirically. As a consequence, many behaviours, especially those that are studied in coding experiments (e.g., the behaviour "latency of feigning death after an attack"), are constrained by the specific experimental conditions in which they are measured. Further constraints may exist due to the limited number of tests that are feasible for a certain animal species. Third, the concept of a behavioural type is a comparative concept, while the concept of behaviour is not. Behavioural types are typically characterised comparatively because animals can have a personality only if they behave differently than others (section "Individual behavioural differences"). It is thus not surprising that instead of 'behaviours' or 'behavioural types' biologists often use terms such as 'behavioural differences' (e.g., Réale et al. 2007: 291; Wolf and Weissing 2012: 452) or 'behavioural variation' (e.g., Réale et al. 2010: 3941).

The concept of personality trait
Whereas behaviours (and behavioural types) are the objects of measurement, personality traits are inferred from observing and measuring behaviours (Fig. 2). In both codings and ratings, certain behaviours are assumed to indicate the existence of a specific personality trait and in this sense to code for or express this personality trait. For example, measures of how long animals stay in the centre of an open field, how quickly they emerge from a dark refuge and how much time passes until they contact a novel object are often interpreted as coding for how bold an animal is. In other words, behaviours such as "contacting a novel object" are taken to be representative for the personality trait boldness (Tremmel and Müller 2013; Tan and Tan 2019). A single test may indicate different personality traits, for example boldness as well as exploration. Therefore, we recommend ideally performing several tests that can be interpreted in the context of a certain personality trait and evaluating statistically whether the outcomes of these tests correlate. Personality traits are also referred to as 'personality dimensions' or 'personality axes' (Gosling 2001: 58; Sih et al. 2004a: 373). Among the most commonly studied personality traits are boldness, aggressiveness, exploration, activity and sociability (Gosling 2001: 48-57; Réale et al. 2007: 295; Wolf and Weissing 2012: 453). Just as animals with distinct behaviours show certain behavioural types, personality traits subdivide into different personality types (see Fig. 2) (Wolf and Weissing 2012: 453), which are sometimes also referred to as 'personality phenotypes' (Réale et al. 2007: 296). For instance, the personality trait "boldness" can be subdivided into the personality types "bolder", "boldest", "shyer" and "shyest". Accordingly, a continuum of personality types exists that all exemplify boldness. To sum up, personality types are subtypes of personality traits just as behavioural types are subtypes of behaviours.
Regarding the ontological nature of personality traits, we think that they are best interpreted as being dispositions. Dispositions are properties that are manifest only under specific conditions, so-called manifestation conditions (Hüttemann and Kaiser 2018). For instance, the breakability of a glass is manifest only if, say, it is struck with a hammer. Terms such as 'boldness', 'aggressiveness' and 'sociability' already indicate that personality traits are dispositions because they are not constantly manifest but only if specific conditions are fulfilled. For example, most animals are aggressive or show sociable behaviour only if conspecifics are present. Also, bolder or shyer behaviour is shown only under specific conditions, such as when a predator is present or if protection is available. The aim of empirical studies of animal personality is to establish the conditions under which a certain personality trait becomes manifest, in order to test whether individuals possess this trait and by which behavioural type it is manifested. The claim that personality traits are dispositions corresponds well to the fact that biologists study personality traits only indirectly by means of observing (a limited number of) behaviours and distributing the individuals in behavioural types and by inferring the existence of personality traits as underlying dispositions (Réale et al. 2007: 295), which are themselves "unobservables" (Réale et al. 2007: 294). In section "Relation between behaviours and personality traits", we will further specify our dispositional view of personality traits and how this gives rise to questions about operationalisation.

The concept of personality in general
Another major concept that we think is central to studies of animal personalities is the concept of personality in general. This concept refers to a larger set of personality traits that are attributed to an individual animal after measuring several individuals of a group or population and are thus parts of the personality in general (see Fig. 1). Although the concept of personality in general plays only a minor role in the actual empirical studies of animal personalities, the idea that different personality traits together form a general personality of an individual is an important background assumption of these studies (Réale et al. 2007: 295). This background assumption becomes obvious, for example, when biologists report that animals of a certain taxon have a personality (Briffa and Greenaway 2011; Tremmel and Müller 2013).
In several studies, evidence for one personality trait, e.g. repeatability over contexts and/or time in one or a few behaviours that are interpreted as indicating one personality trait, is equated with personality in general (Briffa et al. 2008; Kortet et al. 2012; Morales et al. 2013). Other studies consider two to four distinct personality traits before finally referring to a personality in general (Cote et al. 2010; Gyuris et al. 2012; Tremmel and Müller 2013). In addition, the use of concepts such as 'personality dimensions' and 'personality axes' suggests that a whole personality in general exists, which can be split into different dimensions or axes.
Fig. 3 Certain behaviours cluster together (here B1 + B2, B3–B5), forming personality traits (here P1 and P2). If contextual consistency and temporal stability are given, individuals express a personality in general (PGen)
Unlike personality traits, which can be expressed by a continuum of different personality types, personality in general is not a gradual phenomenon. It is something that an animal either possesses or not. In summary, groups of certain behaviours that cluster indicate a certain personality trait. Not every behaviour may cluster with other measured behaviours (as behaviour B6 in Fig. 3). Individuals showing consistent behavioural differences in different personality traits can then be considered to express a personality in general (Fig. 3).
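The clustering idea depicted in Fig. 3 can be illustrated with a minimal sketch. It assumes that pairwise correlations between behaviours have already been computed and groups all behaviours that are linked, directly or indirectly, by a correlation at or above a threshold. The correlation values, the threshold of 0.5 and the function name `cluster_behaviours` are hypothetical, and this simple linkage rule is only one of many clustering methods that could be used.

```python
def cluster_behaviours(behaviours, correlations, threshold=0.5):
    """Group behaviours connected by |correlation| >= threshold.

    behaviours: list of behaviour labels.
    correlations: dict mapping (label, label) pairs to a correlation
                  coefficient; missing pairs count as uncorrelated.
    Returns a list of groups, each a list of labels in input order.
    """
    parent = {b: b for b in behaviours}

    def find(b):
        # Follow parent links to the representative of b's group.
        while parent[b] != b:
            parent[b] = parent[parent[b]]
            b = parent[b]
        return b

    # Merge the groups of every sufficiently correlated pair.
    for (a, b), rho in correlations.items():
        if abs(rho) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for b in behaviours:
        groups.setdefault(find(b), []).append(b)
    return sorted(groups.values(), key=lambda g: behaviours.index(g[0]))


# Hypothetical correlations among six measured behaviours.
labels = ["B1", "B2", "B3", "B4", "B5", "B6"]
rho = {
    ("B1", "B2"): 0.8,
    ("B3", "B4"): 0.7,
    ("B4", "B5"): 0.6,
    ("B1", "B3"): 0.1,
    ("B5", "B6"): 0.2,
}
clusters = cluster_behaviours(labels, rho)
print(clusters)  # [['B1', 'B2'], ['B3', 'B4', 'B5'], ['B6']]

# Only groups of at least two behaviours are candidate personality traits.
print([c for c in clusters if len(c) > 1])  # [['B1', 'B2'], ['B3', 'B4', 'B5']]
```

In this toy example, B6 remains on its own and therefore does not contribute to any candidate trait, which mirrors the situation of behaviour B6 in Fig. 3.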

Relation between behaviours and personality traits
The goal of this section is to analyse the relation between behaviours and personality traits in more detail. We take up the dispositional view of personality traits and show how this leads to the claim that personality traits are and should be operationalised in terms of behaviours and experimental set-ups (section "Operationalising personality traits"). On the basis of our case study on boldness we analyse how a specific ontological view of what a personality trait is constrains the set of behaviours that legitimately can be said to indicate and manifest this personality trait (section "Which behaviours to measure").

Operationalising personality traits
Interpreting personality traits as dispositions that are manifest only under specific conditions explains why personality traits are studied only indirectly by observing and measuring certain behaviours that animals display (recall section "The concept of personality trait"). Dispositions cannot be observed as such, only their manifestations can (Kaiser and Krickel 2017). Accordingly, experimental studies of personality traits aim at establishing specific experimental set-ups (manifestation conditions) under which personality traits (dispositions) become manifest in specific behaviours (manifestations). From the fact that a personality trait is manifested in specific behaviours it follows that, in turn, these behaviours can be assumed to indicate or code for a specific personality trait.
This gives rise to the general question of how personality traits should be operationally defined (Réale et al. 2007: 295; Gosling 2001: 68). Operational definitions are characterised as providing "paradigmatic conditions of application for a given concept, thereby specifying a standard procedure for the scientific investigation of the phenomenon thought to be picked out by the concept" (Feest 2010: 179). In other words, operational definitions provide us with an understanding of how to empirically individuate and approach the phenomena of interest (Feest 2010: 178; Feest 2012: 177). Understanding personality traits as dispositions offers a plausible view of how personality traits can be operationally defined. Since personality traits are empirically accessible only through the behaviours that manifest them under specific conditions, they should also be operationally defined in terms of these manifestations (i.e., behaviours) and manifestation conditions (i.e., experimental set-ups). Hence, operationally defining a personality trait such as boldness involves two steps. First, one must identify one or optimally several behaviours (e.g., latency to contact a novel object) that indicate this personality trait. Second, one must specify an operation or procedure in an experimental set-up that allows for measuring the behaviours (e.g., confronting individuals with an object that they have never faced before and assessing the latency until the individuals contact this object). Our view that personality traits should be operationally defined in terms of behaviours and experimental set-ups is in line with the methodology of codings and ratings and with the current practice in coding experiments to use certain observable behaviours (or behavioural types) as proxies for a certain personality trait (or type).
Sometimes a personality trait is even more radically operationalised. From the group of behaviours that indicate a specific personality trait, one behaviour is chosen as being representative for the others. The underlying assumption is that in order to study a personality trait it is sufficient to study this one behaviour in a specific experimental set-up. For example, it is widely assumed that the novel object test, which tracks the behaviour "approaching a novel object", is a legitimate way to study boldness in different animal species (Dammhahn 2012; Tan and Tan 2019).
The strategy of operationalising personality traits is very useful because it makes phenomena that cannot be observed directly (in this case personality traits) empirically tractable. Operational definitions are thus important tools of knowledge generation (Feest 2012: 176). Furthermore, for studies in the same or closely related animal species, operationalising can be very useful to measure and interpret personality traits in a comparative way without the need to justify the methodology again in every study. However, one should be cautious about simply transferring operational definitions between species without scrutinising whether a personality trait is expressed in the same set of behaviours in both species or whether behaviours must be interpreted differently. For example, one could imagine that the contact duration of an individual of a solitary species confronted with a conspecific of the same sex is most likely to be interpreted as aggression, whereas in a social species contact duration may also be interpreted in various other ways.

Which behaviours to measure
Personality traits are operationalised very differently in different research areas. The behaviours that are measured when studying a personality trait vary depending on the study species and on the experimental set-up. For example, boldness can be measured as duration of tonic immobility after a simulated predator attack in a beetle species (Müller and Juškauskas 2018) or as number of times hiding inside a refuge in a lizard species (Rodríguez-Prieto et al. 2011). The same personality trait is thus operationalised differently with regard to different species and in different experiments (Gosling 2001: 68). Some biologists therefore call for establishing unified, coherent definitions of personality traits (Réale et al. 2007) or they propose an integrative theoretical framework to overcome the conceptual and methodological problems (Carter et al. 2013). Our paper makes use of philosophical methods and brings in philosophical ideas to contribute to this debate. Our main idea, which we develop on the basis of our case study on boldness, is that the set of behaviours that indicate and manifest a personality trait is determined by the ontological view of what a personality trait is. In other words, a more substantial idea of what a personality trait is allows for distinguishing those behaviours (and corresponding experimental set-ups) that indicate or code for this personality trait from those behaviours that do not.
Our analysis of empirical studies of boldness (see "Introduction") reveals that various behaviours (and corresponding experimental set-ups) are used to measure boldness. Examples include the latency until movement after a disturbance or simulated predator attack, the duration of death-feigning, or the time spent eating under predation risk. In a surprisingly large number of papers, however, no explanation or justification of the assumption that certain behaviours indicate boldness can be found (e.g., Bell and Sih 2007;Brodin 2009;Morales et al. 2013;Supplementary Table 1). Some authors refer to the fact that the link between boldness and certain behaviours (and experimental set-ups) has already been successfully established by other empirical studies or is widely accepted in the field (e.g., Briffa et al. 2008;Cote et al. 2010;Frost et al. 2007;Kaiser et al. 2018;Labaude et al. 2018;Rodríguez-Prieto et al. 2011;Tüzün et al. 2017). The novel object test, for example, is said to be a "standard paradigm used to assess boldness" (Frost et al. 2007: 334).
Although several empirical studies contain no explicit explanation for why only specific behaviours are indicative of boldness, we think that such an explanation can be reconstructed from these studies. In many studies, boldness is associated with risk-taking behaviour (Dammhahn 2012; Herde and Eccard 2013; Šlipogor et al. 2016; Tan and Tan 2019; Wilson and Krause 2012). The underlying assumption is that boldness becomes manifest particularly in risky situations and that being bolder implies taking risks, whereas being shyer implies avoiding risks and acting cautiously (Kortet et al. 2012). Risky situations can involve either an actual threat, for instance if the presence of a predator is simulated, or a potential threat, for instance if an animal is exposed to a novel environment, if a novel object is presented, if an animal can leave a shelter or a dark refuge, or if an animal is placed in an unprotected arena. Since the presence of an actual or potential threat often causes anxiety and stress, boldness is also linked to anxiety-related behaviour (Lewejohann et al. 2011; Rödel and Meyer 2011) and to activity under stressful conditions (Šlipogor et al. 2016). Some empirical studies assume that boldness can also be expressed in exploratory behaviour (Briffa et al. 2008; Kortet et al.; Labaude et al. 2018; Tan and Tan 2019). We think that this is only plausible if individuals are forced into a novel environment, where exploratory behaviour can overlap to a large extent with risk-taking behaviour. Cases of exploration that are risk-neutral, in the sense that they do not involve taking any kind of risk, should not be interpreted as manifesting boldness. Similarly, only very specific cases of active behaviour, namely activity in the presence of an actual or potential threat, should be seen as legitimate indicators of boldness.
In sum, our analysis shows that it is possible to introduce a more substantial ontological view of what boldness is, which accounts for many assumptions about boldness that are implicitly made in biological practice and which constrains the set of behaviours that legitimately can be said to indicate or code for boldness. The view is that boldness is a disposition, which only manifests in risky situations that involve an actual or potential threat and that manifests in risk-taking behaviour (bolder individuals) or in risk-avoiding and cautious behaviour (shyer individuals).

Criteria for identifying animal personality traits
Having clarified what behaviours, personality traits and personality in general are and how they relate to each other, we can now turn to the second task and explicate how animal personalities are (and should be) identified. The goal of this section is to develop an integrative, coherent account of identifying personality traits. Our account concerns the identification of personality traits, not of personalities in general, because this is the primary goal of many empirical studies in animal personality research. In our account, we integrate various kinds of information: explicit discussions about criteria for animal personality traits in theoretical biology papers (e.g., Stamps and Groothuis 2010; Wolf and Weissing 2012), information about how animal personality traits are actually identified in empirical studies, and assumptions about the concept of character/personality that figure prominently in virtue ethics and personality psychology of humans (e.g., Doris 2002; Krahé 1992: Chapter 2; Besser-Jones and Slote 2015; Corr and Matthews 2009; Snow 2015; Stemmler 2016). Our account focuses on three criteria for identifying personality traits, which play a central role in biological practice: the criterion of Individual Differences (section "Individual behavioural differences"), the criterion of Temporal Stability (section "Temporal stability") and the criterion of Contextual Consistency (section "Contextual consistency"). In what follows, we work out important aspects of these criteria and integrate them into a coherent formal definition.

Individual behavioural differences
During the past two decades, research on animal behaviour has undergone a major shift. While individual differences were traditionally considered as noise, nowadays researchers focus on the causes, consequences and underlying mechanisms of individual behavioural differences, which can actually be highly structured (Wolf and Weissing 2012). In the case of animal personalities different behavioural types coexist within populations (Wolf et al. 2007: 581). To explore these personalities, behaviours of individuals are measured repeatedly and in different contexts. Individuals can only be compared if several individuals of a group or population of the same species are investigated. Thus, per se the personality of just one isolated individual can never be analysed. In other words, the concept of personality can only be applied at the group level, while the measurements are taken on the individuals of the group (Stamps and Groothuis 2010: 311). Even within groups of genetically identical individuals kept under identical environmental conditions, repeatable individual differences in behaviour can be found (Lewejohann et al. 2011; Müller and Müller 2015). Moreover, studies of animal personalities focus on the behaviours of the individuals relative to one another, not on the absolute levels or scores of behaviours expressed by each individual (Stamps and Groothuis 2010: 304). Taking the measurements of all individuals of a group, their behaviours can be clustered, for example, in dendrograms (e.g., Gyuris et al. 2011; Tremmel and Müller 2013). The dendrogram is then evaluated with respect to the entire group to delineate distinct personality traits. Within these dendrograms, each individual will have its individual-specific score and in that way will be unique. Hence, while there is the necessity to investigate the behaviour of individuals relative to each other, the uniqueness of each individual can still be emphasised.
In a next step, individuals with similar scores are often considered again as subgroups. These subgroups are formed because the aim is to develop interesting generalisations and predictions that can be tested. For example, when behaviours of 30 individuals are studied, first the ranking of each individual in the continuum of behavioural scores is noted, but subsequently differences between subgroups of individuals sharing similar scores and in particular between the two extremes (e.g., the overall shyer versus overall bolder individuals) are explored. In that way, it can, for example, be studied how experience with certain environmental conditions shapes the personalities of individuals within and among populations.
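The two-step procedure just described (first rank every individual on the continuum of scores, then compare the extreme subgroups) can be illustrated with a minimal sketch. The individual labels, the scores, the convention that a higher score means a bolder individual, and the 25% cut-off for each extreme are all assumptions made for illustration; they are not taken from any of the studies cited here.

```python
def rank_and_split(scores, fraction=0.25):
    """Rank individuals by score and return the two extreme subgroups.

    scores:   dict mapping an individual's ID to its behavioural score
              (assumed convention: higher score = bolder).
    fraction: share of individuals placed in each extreme subgroup
              (hypothetical value, not prescribed by the text).
    """
    # Step 1: position of each individual on the continuum, lowest score first.
    ranking = sorted(scores, key=scores.get)
    # Step 2: take the same number of individuals from each end of the ranking.
    k = max(1, int(len(ranking) * fraction))
    return ranking, ranking[:k], ranking[-k:]


# Hypothetical boldness scores for eight individuals.
scores = {"a": 0.2, "b": 0.9, "c": 0.5, "d": 0.1,
          "e": 0.7, "f": 0.4, "g": 0.8, "h": 0.3}
ranking, shyer, bolder = rank_and_split(scores)
print("ranking:", ranking)
print("shyer subgroup:", shyer)
print("bolder subgroup:", bolder)
```

Note that the subgroups are defined relative to the group that was measured: the same individual could fall into a different subgroup if it were tested together with other conspecifics.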
To conclude, having a personality trait requires being unique in the sense of showing a set of behavioural types or traits that no other individual in the population shares. Accordingly, ascribing personality traits to individuals requires there to be differences between these individuals. In personality psychology, studies of human personalities are also understood to be studies of individual differences (e.g., Corr and Matthews 2009: 11;Stemmler 2016: 20). This observation gives rise to a first criterion that is necessary for identifying personality traits, the criterion of Individual Differences. The consequences of individual behavioural differences are manifold. In an evolutionary context both standing genetic variation and the degree of patterning of genetic and phenotypic variation need to be considered as drivers of the direction and outcome of natural selection. In an ecological context, individual differences are important drivers of competition within and among species and can thus influence ecological networks (Wolf and Weissing 2012). In the context of animal welfare, certain personalities may be used as proxies for emotional states (Richter and Hintze 2019) or psychological and physiological conditions of an individual, such as, for example, depression. Ultimately, the unique personality is a key determinant of fitness.

Temporal stability
The various definitions of animal personality reveal that mere individual differences in behaviour are not sufficient for ascribing personality traits. Behavioural differences must be "structured" (Wolf and Weissing 2012: 452), that is, they must be stable over time (as discussed in this section) and consistent over different contexts (see section "Contextual consistency"). The underlying general idea is that personality traits do not constantly change but are quite "robust traits" (Doris 2002: 18), which are reliably manifested and relatively stable (Stemmler 2016: 306). For instance, if an animal is bold, it does not show risky behaviour once and is otherwise cautious. Rather, it takes risks repeatedly and in different situations.
Temporal stability of individuals refers to the extent to which the score of an individual measured in a certain behaviour at one time changes when this behaviour is measured again at a later time (Stamps and Groothuis 2010). Such repeatability (i.e., consistent ranking over time) should ideally hold for several behaviours that indicate different personality traits (arrows in Fig. 4). Regarding repeated measures, temporal stability can be considered over two distinct intervals, shorter and longer ones (Stamps and Groothuis 2010: 308). Shorter intervals (e.g., within a few days) are used to determine whether behaviour is sufficiently stable across time to be in line with the definition of personality. By contrast, longer intervals (weeks up to months or even years, depending on the life cycle of the animal species under consideration) may be used to determine how behavioural types change over the course of a lifetime, that is, how a personality develops (Stamps and Groothuis 2010: 308). Apparently stable personality traits may change over ontogeny in response to shifting environmental conditions (Groothuis and Trillmich 2011; Müller and Müller 2015; Stamps and Groothuis 2010), whereas others may remain stable (Bell and Sih 2007; Sachser et al. 2013; Wuerz and Krüger 2015). Scores in behaviours and personality traits could change over ontogeny, because certain mechanisms induce behavioural shifts in response to the environment (Stamps and Groothuis 2010). These mechanisms can include genetic regulatory networks, epigenetic processes and neuroendocrine regulation, which interact with the environment and may enable life-long plasticity in personality (Trillmich et al. 2018). Thus, test rounds for behavioural observations may be done before and after a certain treatment, for example, to test the response to an environmental stimulus (Tüzün et al. 2017), before and after an important switch-point in life (Müller and Müller 2015) or in different life stages (Herde and Eccard 2013). Overall, the timeframe in which temporal stability is tested largely depends on the research question and context in which animal personalities are studied. In any case, it is crucial to always clearly indicate at which temporal scale within-individual stability is being measured (Réale et al. 2010: 3941). It is likely that all forms of short- and long-term stability are ecologically important (Sih et al. 2004b), while their consequences at the ecological or evolutionary level may differ substantially (Réale et al. 2010). Apart from the time interval or scale across which animals are investigated, the question arises how often the behaviour needs to be repeatedly tested within each interval. Studies range from one up to five repetitions (i.e., six test rounds in total, as in Tüzün et al. 2017). One major issue that needs consideration in repeated tests is the fact that former test experience may influence the subsequent test outcome.
Once an individual has been tested in a certain experimental set-up, it has gained experience with this test and is thus no longer naive to this set-up. For example, when testing for temporal stability of activity, larvae already tested a few days before were less active than larvae that had not been tested for activity. When testing for temporal stability of boldness in a novel object test, a given object is only novel in the first test round and therefore another object needs to be offered in a second test. Prior experience has indeed been shown to alter the degree of boldness (Frost et al. 2007). Thus, the effects of prior testing in repeated tests need to be considered when interpreting the data. Moreover, if several behaviours are tested repeatedly, not all behaviours may be stable over time. However, the majority of behaviours should be repeatable at least within a short-term interval to apply the personality concept.
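Since temporal stability is characterised above as consistent ranking over time, one simple way to illustrate it is a rank correlation between the scores of the same individuals in two test rounds. The following is a minimal sketch under stated assumptions: it uses Spearman's rank correlation as an illustrative measure (the studies cited here may use other repeatability statistics), and the latency values are invented for the example.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined when all scores are equal")
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical latencies (in seconds) of five individuals to contact a novel
# object, listed in the same individual order for both test rounds.
round_1 = [12.0, 45.0, 30.0, 8.0, 60.0]
round_2 = [15.0, 40.0, 35.0, 10.0, 55.0]
print("rank correlation between rounds:", spearman(round_1, round_2))
```

In this invented example the absolute latencies change between rounds, but the ranking of the individuals does not, so the rank correlation is at its maximum. A sketch like this does not correct for the effects of prior test experience discussed above.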
The following second criterion for individuating personality traits spells out the requirement that personality traits must be robust and reliable in the sense of being temporally stable.
[Figure caption, beginning missing] … should ideally be measured in different contexts (different fillings of circles) for contextual consistency. Moreover, the score of each individual in one behaviour must significantly correlate (as indicated by arrows) with the score in other behaviours that code together for one personality trait (P1, P2, …). For further details see legend of Fig. 3.

Contextual consistency
A third criterion for ascribing personality traits to individual animals is the contextual consistency of intraspecific differences in behaviour. The central idea that underlies this criterion is that a given personality trait is usually expressed not only in a single behaviour but in different behaviours, which are expressed in different contexts (indicated by different fillings in Fig. 5), but which all indicate or code for the same personality trait (indicated by the arrows in Fig. 5). Contextual consistency thus refers to the extent to which the score of an individual measured in one behaviour is highly correlated with its score(s) in (an)other behaviour(s) that code(s) for the same personality trait. The idea that personality traits must be expressed consistently over a range of different contexts or situations can also be found in philosophy and traces back to Aristotelian virtue ethics. In Aristotle's view, characters or personality traits (also referred to as virtues) are interpreted to be "robust dispositional traits, i.e. traits that lead us to act in similar fashions across a wide range of situations" (Besser-Jones and Slote 2015: 376; see also Snow 2015: 362). Doris (2002: 22) formulates this criterion as follows: "Character and personality traits are reliably manifested in trait-relevant behavior across a diversity of trait-relevant eliciting conditions that may vary widely in their conduciveness to the manifestation of the trait in question." For example, someone who is courageous is expected to exhibit courage in a wide variety of relevant contexts or situations, such as in a war, when other people are oppressed, on the sports field, when somebody is being robbed, and so on. Which situations and behaviours are relevant to a specific personality trait of humans is an interesting question that is difficult to answer on a general level (Doris 2002).
In studies of animal personality, contextual consistency can be empirically tested: selected behaviours that animals show in different contexts or situations are grouped together and assigned to different personality traits or to the same personality trait according to their statistical correlations (see arrows in Fig. 5). We speak about contextual consistency, rather than about contextual stability, because a personality trait is not expressed by a single behaviour that is stable over different contexts. Instead, a personality trait is expressed by different behaviours and the expression is consistent over contexts.
The characterisation of contextual consistency that we have discussed so far needs to be specified and scrutinised in two respects. First, the question arises how many different behaviours need to be consistent across contexts to assign a personality trait. In our case study analysis on boldness, the number of behaviours that indicate boldness and that were scored in different individuals within species ranged from just one behaviour (e.g., Brodin 2009; Kaiser et al. 2018; Monceau et al. 2015; Tan and Tan 2019) up to six different behaviours (Šlipogor et al. 2016; Wilson and Krause 2012). In most of the 30 papers that we analysed, other personality traits, such as activity and exploration, were also measured within the same study, but different numbers of behaviours were studied for each personality trait.
Second, the question arises whether all behaviours need to be measured in different contexts (contextual consistency) or whether a correlation between different behaviours in the same context also suffices (behavioural consistency). As an example of contextual consistency, bold mustard leaf beetles not only approach a novel object quicker but also become active quicker after feigning death and spend more time in the open field (Tremmel and Müller 2013). These three behaviours are measured in different contexts but the scores of the beetles highly correlate, clustering in the personality trait boldness. We use a broad notion of 'context,' which we take to be synonymous with 'situation' or an 'experimental set-up'. This notion is not restricted to functional behavioural categories, such as feeding, mating, antipredator, parental care, contest or dispersal contexts (Sih et al. 2004a: 372), but also includes other kinds of conditions that are external to the individuals under study and that influence the investigated behaviours. We thus agree with Stamps and Groothuis that a context includes "the entire range of stimuli that impinge on individuals when they express behaviour" (Stamps and Groothuis 2010: 304), even though in most cases we will not be able to measure the entire range.
Contextual consistency in a stricter sense (i.e., excluding behavioural consistency) seems to figure as a regulative ideal, which is aspired to but often not achieved due to practical constraints. It is not always possible to have a sufficient number of experimental set-ups to measure behaviours in distinct contexts that then cluster together in one personality trait. On the other hand, different behaviours measured in the same context can also cluster in distinct personality traits. For example, the behaviour "inner area movements" clusters with other behaviours measured in other contexts in the personality trait boldness, whereas the behaviours "covered distance" and "amount of movements" cluster in the personality trait activity in the mustard leaf beetle. All three of these behaviours were measured in a (forced) open field context (Tremmel and Müller 2013). We claim that studies of animal personalities should describe which behaviours are measured in which contexts and should justify how these contexts differ from each other. Furthermore, to fulfil contextual consistency at least two different behaviours in at least two contexts must be tested. Not all behaviours that are assumed to indicate the same personality trait must be shown in different contexts, but ideally a personality trait should be expressed by behaviours in at least two different contexts. The following third criterion for identifying animal personalities summarises these ideas.
An individual animal I1 has a personality trait P1 only if (3) the scores of behaviours Bx that indicate P1 are correlated and I1 shows at least two of these behaviours in different contexts (Contextual Consistency).
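How the two parts of this criterion might be checked on a data set can be sketched as follows. This is a group-level illustration under several assumptions, not the procedure of any cited study: the behaviour names, context labels and scores are invented; scores are assumed to be oriented so that higher values mean more of the trait; and the correlation threshold `min_r` is a hypothetical stand-in for whatever significance test a study actually uses.

```python
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def contextually_consistent(behaviours, min_r=0.5):
    """Check the two parts of the criterion for one candidate trait.

    behaviours: dict mapping a behaviour name to (context label, scores),
                with one score per individual in the same individual order.
    min_r:      hypothetical minimum pairwise correlation (an assumption).
    """
    contexts = {context for context, _ in behaviours.values()}
    # Part 1: at least two behaviours, measured in at least two contexts.
    if len(behaviours) < 2 or len(contexts) < 2:
        return False
    # Part 2: the scores of every pair of behaviours are correlated.
    return all(
        pearson(behaviours[a][1], behaviours[b][1]) >= min_r
        for a, b in combinations(behaviours, 2)
    )


# Invented scores of five individuals in three behaviours and three contexts.
boldness_candidates = {
    "approach to novel object": ("novel object test", [1, 2, 3, 4, 5]),
    "time in open area": ("open field", [2, 3, 5, 6, 9]),
    "activity after death-feigning": ("simulated attack", [1, 3, 2, 5, 4]),
}
print(contextually_consistent(boldness_candidates))
```

The check deliberately returns a negative result when all behaviours stem from the same context, because correlated behaviours within one context would only show behavioural consistency in the sense distinguished above.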

Concluding remarks
What does it mean to say that some individual animals have a personality? In this paper we answer this question by clarifying three concepts that are central to studying animal personality: behaviour, personality trait and personality in general.
Behaviours are the types of actions that animals show and that are empirically investigated in codings or ratings. Individuals differ in their behavioural types. Personality traits are inferred from observing and measuring behaviours. We show that this means that personality traits are operationalised in terms of specific sets of behaviours and experimental set-ups. Another major result of our analysis is that personality traits are best interpreted as being dispositions because they are manifested in certain behavioural types only if specific conditions are fulfilled. Individuals differ in their personality types, which are subtypes of personality traits just as behavioural types are subtypes of behaviours. The third concept, the concept of personality in general, refers to a larger set of personality traits that are attributed to an individual animal. In contrast to behaviours and personality traits, which can be expressed by a continuum of different types, personality in general is not gradual but binary. Our analysis reveals that many empirical studies of animal personality are concerned with discovering specific personality traits, not personality in general. Based on these conceptual and ontological clarifications, we specify how animal personalities are and should be identified in biological practice. Under which conditions is it legitimate to ascribe a personality trait to individual animals? To answer this question, we developed an integrative, coherent philosophical account of identifying personality traits, which specifies and formalises three criteria, Individual Differences, Temporal Stability and Contextual Consistency, that play a central role in biological practice and should all hold in parallel.