Cues to individual identity in songs of songbirds: testing general song characteristics in Chiffchaffs Phylloscopus collybita

Individual variation in vocalizations has been widely studied across animal taxa, and it is commonly reported that vocalizations could potentially be used to monitor individuals in many species. Songbirds are a challenging group for the study of individual identity signalling: they are highly vocal, but their songs are complex and can change over time. In this study, we tested whether general song characteristics, which are independent of song type, can be used to discriminate Chiffchaff males and to identify them consistently within and between days and between years. There was individual variation in the songs of recorded Chiffchaffs, and males could easily be discriminated at any one point in time. However, re-identification of males across days and years was poor. Effective identification required comparing songs of a single song type. Yet Chiffchaffs switch haphazardly among song types, sometimes singing the same song type for a long time, which makes it difficult to collect equivalent song types or to sample a bird's full repertoire. For example, 5-min recordings of males taken in different years did not contain equivalent song types, leading to low identification success. Although we did not succeed in re-identifying males based on general song characteristics, we discuss methods of acoustic identification that do not depend on song repertoire content and are potentially valuable tools for studying species such as the Chiffchaff.


Introduction
Recognition of individuals is widespread in the animal kingdom. Both receivers and signallers can benefit from individual recognition when they repeatedly communicate and interact with each other (Tibbetts and Dale 2007). Its importance is apparent throughout an animal's life, in both social and solitary species. Early in life, it is important to recognize individuals within the family: parent-offspring recognition, for example, can be challenging, especially in colonially breeding species (Jouventin et al. 1999), but is essential for offspring survival. Later in life, it can be important to distinguish between familiar and unfamiliar group members and between friend and foe (Olendorf et al. 2004; Boeckle and Bugnyar 2012). Such recognition can last for several years (Godard 1991; Boeckle and Bugnyar 2012; Draganoiu et al. 2014). The existence of individually distinct signals on the sender's side is a precondition for recognizing an individual.
Vocalizations are suitable for identity signalling because they convey up-to-date information even over long distances and in densely structured environments. Individual variability in bird vocalizations has attracted scientific attention for decades, and many studies have demonstrated individual variation in bird vocalizations (Hutchison et al. 1968; Thompson 1970; White et al. 1970; Peake et al. 1998; Lengagne 2001; Petrusková et al. 2016). Studies of how individual identity is expressed in vocalizations are crucial for understanding the process of individual recognition, but they also have practical applications: for example, such information might eventually serve as an alternative or complementary method for monitoring individuals (Terry et al. 2005; Laiolo et al. 2007; Mennill 2011).
Non-songbirds are considered to lack complex vocal learning (Jarvis 2004; see e.g. Tyack 2016) and therefore represent good study models for identity signalling. Although the vocalizations of non-songbirds may be more plastic than previously thought (e.g. Derégnaucourt et al. 2009), they are, in general, simpler and more stable than the songs of complex vocal learners. Accordingly, the calls of many non-songbirds have been used successfully for discrimination and, especially, for later re-identification of particular individuals (Peake et al. 1998; Lengagne 2001). 'Discrimination' and 're-identification', as defined by Peake et al. (1998), are two related terms with distinct meanings used in studies of individual variation in vocalizations. Discrimination requires individual variation in vocalizations and can be useful when one needs to determine how many individuals are calling. Re-identification requires, in addition, that individually variable features remain stable over time, or, in other words, are repeatable. Features that can be used for re-identification are more likely to play a role in individual recognition.
Discrimination and re-identification of individual songbirds (and a few other avian complex vocal learners), on the other hand, can be challenging because of their vocal learning abilities. Songbirds might possess more complex identity coding and decoding mechanisms than non-learners. The plasticity of their songs could hinder the recognition of individuals by other birds, as well as the discrimination and re-identification of individuals in acoustic monitoring programs. For example, song sharing and large song repertoires, both common features of songbird acoustic communication, have been proposed to hinder individual recognition (Stoddard 1996). Furthermore, individuals may change their song repertoire over time, which can make re-identification of males difficult (Kroodsma 2004; Kipper and Kiefer 2010).
Studies have focused on different aspects of songs when testing for individual variation and when proposing how identity might be expressed. For example, birds may develop individually specific repertoires of songs, song types or syllables (Thompson 1970; Cicero and Benowitz-Fredericks 2000; Petrusková et al. 2016). Alternatively, only certain subunits of the repertoire have been suggested to serve as individual signatures. Such signatures can be located only in specific parts of songs (phrases or syllables), typically at the beginning of the song (Nelson and Poesel 2007; Wegrzyn et al. 2009; Osiejuk 2014). The order of song elements (Briefer et al. 2009) or the proportions of song units (Sandoval et al. 2014) can also be individually specific.
Most studies have focused on particular patterns in spectrograms when looking for identity cues; in other words, researchers were interested in particular song types or syllable types. A minority of studies have examined identity signalling through 'voice quality'. Weary and Krebs (1992) showed that Great Tits (Parus major) can distinguish familiar from unfamiliar individuals by general voice characteristics, independently of the syllable content of the song. They proposed that individuals could be tuned to sing in a particular way, so that basic song characteristics such as song duration and minimum, maximum and peak frequencies could be common to all songs of an individual regardless of song type (Weary et al. 1990; Weary and Krebs 1992). Such song-type-independent characteristics have been little studied. Content-independent identification could nevertheless be applied universally, allowing rapid individual recognition in species with simple as well as complex, even changing, repertoires. Such identification could also benefit acoustic monitoring programs because it could easily be adapted to different species (Fox 2008).
The Chiffchaff (Phylloscopus collybita) is a very common, small songbird (Passeriformes; Sylviidae) that is widespread in the Western Palearctic (Cramp et al. 1992). Chiffchaff colouration is monomorphic, cryptic, and shows no significant individual variation. Nevertheless, males are very vocal throughout the breeding season (Rodrigues 1996), and their songs function in territorial defence (Linhart et al. 2012, 2013) and likely in individual recognition as well. Indeed, Chiffchaffs are able to distinguish between their neighbours on the basis of a single, randomly chosen song (Jaška et al. 2015). Although the song sounds simple ('chiffs' and 'chaffs'), there are usually at least three syllable types per song, and males may have up to ten different syllable types in their repertoire. Syllable types can be shared between individuals. The sequence of syllables within a song is irregular (Cramp et al. 1992). While contact calls have been shown to be individually distinct in Phylloscopus collybita canariensis (Naguib et al. 2001), a similar analysis has not yet been done on songs. Moreover, songs were assumed not to carry identity information because of their variability (Cramp et al. 1992). Because Chiffchaffs are able to recognize other individuals quickly despite this variation and syllable sharing, they seem to be a very suitable model for exploring the function of general song characteristics in individual recognition.
In this study, we first present information about repertoire size and song organization in the Chiffchaff to assess how these might relate to individual recognition. The main aim of this study, however, was to test whether Chiffchaff males can be discriminated on the basis of general song characteristics independent of repertoire (sensu Weary and Krebs 1992), and whether these general song characteristics can be used to re-identify individuals over different timescales. We explored individual variation in songs recorded (1) within a single recording session, (2) within a single day, (3) on 2 successive days, and (4) in different years.

Study area
Males were recorded in a former military training area on the outskirts of České Budějovice, Southern Bohemia, Czech Republic (48°59.5′N, 14°26.5′E). The area (ca. 1 km²) is covered by small ponds, marsh, and shrubs. The vegetation is dominated by willows (Salix sp.), birches (Betula sp.), and poplars (Populus sp.). We studied Chiffchaffs at this locality from 2008 to 2012. Chiffchaffs are migratory; they arrive at the locality from the second week of March, with most males arriving towards the end of March. The area hosts approximately 60 males every year, and the breeding density is relatively high. Most males were colour-banded over the course of the study.

Recordings
Males were recorded for the purposes of various studies from 2008 up to and including 2011. Recordings were made from April to June; the exact dates differed between the two datasets (see below). Only spontaneously singing males that had not been involved in territorial interactions for at least 5 min were recorded. Each studied individual was marked with a combination of a standard aluminium ring and colour rings (up to three colours), allowing identification of individuals at the locality. The recorded songs form two datasets: the first was used to study individual variation within and between days (DAYS dataset; n = 13 males), and the second to study individual variation between years (YEARS dataset; n = 16 males). Both datasets are described in detail below. In total, songs from 29 different males were used in this study; no male was included in both the YEARS and DAYS datasets. Recording and weather conditions were comparable for all recordings. Recordings were made between 0530 and 1100 hours. We used a Marantz PMD 660 recorder and a directional microphone (Sennheiser ME67). We recorded birds from as close as possible, usually within 5-15 m of the singing bird, and, where possible, with no obstacles between the microphone and the recorded male.
The DAYS dataset included recordings from June 2011 only; each male was recorded on 2 successive days. Only males that sang vigorously and were likely to be at the same breeding stage (i.e. the beginning of the second breeding attempt) were included. On the first day, recordings were made from ca. 0500 to ca. 0930 hours, with the aim of obtaining extensive sets of songs from each individual. On the second day, each individual was re-recorded for only about 15 min (i.e. about 50 songs per male) in the early morning (ca. 0500-0600 hours). Recordings from the first day (day 1) represent, on average, 63 min of continuous singing (minimum 28 min, maximum 115 min), and recordings from the next day (day 2) represent 14 min on average (minimum 5 min, maximum 61 min). Overall, we recorded over 17 h of singing, totalling 6216 songs across both days, from which we subsequently selected songs of excellent quality for automatic analysis (Table 1).
Recordings for the YEARS dataset were made from 2008 up to and including 2011. The YEARS dataset comprises males that were recorded and re-recorded in two different years. In both years, recordings were made in late April and early May, when male singing activity is high and the females of most pairs are incubating eggs. Recordings in the YEARS dataset were on average 4 min long (minimum 2 min, maximum 8 min). Altogether, 837 songs from 16 males were collected, but again, only songs of excellent quality were selected for automatic analyses (Table 2).

Song analyses
All songs were annotated, processed and analysed in Avisoft SASLab Pro software (Raimund Specht, Berlin). First, we checked each recording to select the best-quality songs for analysis. Second, songs of apparently low recording quality, songs containing excessive background noise, songs overlapping with vocalizations of other birds, and similar cases were rejected from all further analyses. We then applied a high-pass filter (2500 Hz) to all preselected songs and down-sampled them to a sampling frequency of 22,050 Hz (originally recorded at 44,100 Hz, 16-bit, uncompressed). These songs were used to classify song types. The number of songs was further reduced for the automatic measurement analyses (see below).
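The two preprocessing steps (high-pass filtering at 2500 Hz, then halving the sampling rate from 44,100 to 22,050 Hz) can be illustrated with a minimal pure-Python sketch. It is not the filter Avisoft SASLab Pro applies: we assume a simple first-order (RC) high-pass, the function names `high_pass` and `decimate_by_two` are hypothetical, and the pair-averaging step is only a crude anti-aliasing filter.

```python
import math

def high_pass(samples, cutoff_hz, fs_hz):
    """First-order (RC) high-pass filter.

    A simple stand-in for illustration; Avisoft's own filter design differs.
    """
    if not samples:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / fs_hz
    alpha = rc / (rc + dt)
    out = [samples[0]]
    for i in range(1, len(samples)):
        # y[i] = alpha * (y[i-1] + x[i] - x[i-1])
        out.append(alpha * (out[-1] + samples[i] - samples[i - 1]))
    return out

def decimate_by_two(samples):
    """Halve the sampling rate (e.g. 44,100 Hz -> 22,050 Hz).

    Averaging each adjacent pair is a crude 2-tap low-pass applied before
    every second sample is discarded; a real resampler uses a steeper filter.
    """
    return [(samples[i] + samples[i + 1]) / 2.0
            for i in range(0, len(samples) - 1, 2)]

# Hypothetical test signal: a 5 kHz tone riding on a DC offset, 0.1 s at 44.1 kHz.
FS = 44_100
tone = [math.sin(2 * math.pi * 5000 * n / FS) + 1.0 for n in range(4410)]
filtered = high_pass(tone, 2500, FS)   # DC offset removed, 5 kHz tone kept
resampled = decimate_by_two(filtered)  # 2205 samples at 22,050 Hz
```

After down-sampling, nothing above 11,025 Hz (the new Nyquist limit) can be represented, which is why some low-pass step must precede the decimation.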

Song type and syllable repertoire
Although we were mainly interested in the possible role of general voice characteristics in the recognition process, we also explored the repertoire composition of each male in the DAYS dataset, to see how song type might affect the identification results and to investigate a possible role of the repertoire in individual recognition. We use common terminology to describe song units, as illustrated in Fig. 1 (Catchpole and Slater 2008). An 'element' is a single continuous trace on the spectrogram. A 'syllable' consists of one or more elements that are always combined in the same way. Gaps between elements are usually very short, so the elements are heard as a single sound by the human ear. The largest gap between elements within a syllable was about 0.05 ms, in one individual that sang an unusual syllable with two clearly distinguishable elements. Elements, or syllables, thus represent the smallest building blocks of songs. A 'song' is a sequence of more than three syllables (an arbitrary threshold) separated from other songs by a period of silence (> 0.7 s) substantially exceeding the usual interval between syllables (approximately 0.35 s). Besides these song units, we observed that Chiffchaff singing typically includes certain 'song types', which a male generally repeats many times during a bout of singing before switching to another type. A song type consists of a cluster of syllable types that are almost always sung together in the song strophe, although their total number within a song and their order can vary. For example, syllable 'a' and syllable 'b' could make up song type 'ab', and both the sequence 'aaaabb' and the sequence 'baba' would belong to song type 'ab'. Syllable types are generally not shared between song types (Jaška, unpublished data). For the purpose of song type classification, we created a syllable repertoire for each male. Differences between syllables are typically obvious.
However, some groups of syllables look very similar in their time-frequency modulation. For example, one syllable may look like an incomplete version of another, or like another syllable shifted in frequency. Where we could not identify intermediate forms, we considered such syllables separate syllable types. A song type was defined as the set of syllable types that were consistently repeated together in songs (see Fig. 2). Sometimes the first syllable of a song differs from all other syllables and consistently occurs only in the first position within a song. We termed these syllables 'initials'. Initials were not used to calculate repertoire size. Initials of the same type can occur within or across song types.
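The operational definitions above (a song is more than three syllables bounded by silences longer than 0.7 s; a song type is the set of syllable types a song contains, regardless of their number and order, with initials set aside) can be expressed as a short sketch. The tuple layout `(onset_s, offset_s, label)`, measuring the gap from one syllable's offset to the next onset, excluding a leading initial from the song-type key, and the function names are all our assumptions; the authors classified songs manually from spectrograms.

```python
def segment_songs(syllables, max_gap_s=0.7, min_syllables=4):
    """Group time-ordered syllables into songs.

    syllables: (onset_s, offset_s, label) tuples sorted by onset.
    A silent gap longer than max_gap_s (offset to next onset) ends a song;
    groups with fewer than min_syllables ("more than three") are dropped.
    """
    songs, current = [], []
    for syl in syllables:
        if current and syl[0] - current[-1][1] > max_gap_s:
            songs.append(current)
            current = []
        current.append(syl)
    if current:
        songs.append(current)
    return [song for song in songs if len(song) >= min_syllables]

def song_type_key(labels, initials=frozenset()):
    """Song type = set of syllable types, ignoring their count and order.

    A leading initial is excluded (our assumption, since initials can
    occur across song types and do not define one).
    """
    body = labels[1:] if labels and labels[0] in initials else labels
    return frozenset(body)

# Hypothetical syllable track: 0.25 s gaps within songs, a 1.1 s pause between them.
demo = [(0.00, 0.10, "i"), (0.35, 0.45, "a"), (0.70, 0.80, "a"),
        (1.05, 1.15, "b"), (1.40, 1.50, "b"),
        (2.60, 2.70, "a"), (2.95, 3.05, "b")]
songs = segment_songs(demo)                   # the 2-syllable group is dropped
labels = "".join(syl[2] for syl in songs[0])  # "iaabb"
print(sorted(song_type_key(labels, initials={"i"})))     # ['a', 'b']
print(song_type_key("aaaabb") == song_type_key("baba"))  # True
```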
Because of the large size of the DAYS dataset, it was not possible to have the songs classified by several observers. However, we tested the reliability of subjective classification of syllables into syllable types on songs from the YEARS dataset. Reliability was characterized as the percentage of agreement between two observers, which is a suitable reliability index in our case (Jones et al. 2001). One of the authors (P. L.) selected the main syllable types from 200 random songs, 20 songs per male (ten songs from year 1 and ten songs from year 2). P. L. briefly screened the song spectrograms of each male and listed all syllable types in a song whenever there was a marked change in syllable content. Syllable type templates were labelled alphabetically ('A'-'L', depending on the number of syllable types). Altogether, 79 syllable types were detected (six to 12 syllable types per male). The co-author responsible for classifying song and syllable types in the DAYS dataset (P. J.) and an observer experienced in song analysis but not familiar with the Chiffchaff repertoire (P. S.) were asked to transcribe songs into syllable sequences based on spectrograms and the predefined syllable types. Syllables not matching the templates were labelled 'X'. The beginning and end of each song were marked on the spectrograms to ensure that the observers scored exactly the same sequences. The observers knew which songs belonged to a single male, but they did not communicate with each other about the year of recording or their scores. The observers generally agreed on the number of syllables within a song: the same syllable count was scored for 96% of the 200 songs, and discrepancies seemed to result not from differences in recognizing syllables but from recording quality issues and typing errors. Songs with different syllable counts were not included in the calculation of inter-observer reliability.
The observers were also asked to suggest new syllable types, or to merge syllable types when the differences between them were not large enough to justify distinct types. Six different merges of syllable types were suggested in total, and the observers agreed on three of them. In total, 2113 syllables were scored. Syllables not assigned to the predefined syllable types (i.e. X syllables) represented only 6% (P. J.) and 13% (P. S.) of the syllables, and 16 (P. J.) and 35 (P. S.) new syllable types were suggested by the two observers. The observers' syllable types coincided for 84% of the syllables. Of the remaining 348 disagreements, 207 (59%) involved cases in which one of the observers scored a syllable as X. When both observers assigned a syllable to a predefined syllable type, agreement was 94%. We therefore consider this classification of syllables and song types (based on syllable content) to be sufficiently reliable for syllables that appear frequently in the songs. However, the identification of rare variants and the total number of syllable types may depend on the person doing the classification.
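Percentage agreement and the two breakdowns reported above can be computed as in the following sketch. It assumes that the two observers' transcriptions of a song are already aligned position by position, which is reasonable here because songs with differing syllable counts were excluded. The function name and toy sequences are hypothetical; the last two lines only re-derive the percentages from the counts given in the text.

```python
def agreement_summary(codes_a, codes_b, unknown="X"):
    """Compare two observers' aligned syllable codes.

    Returns (overall % agreement,
             % of disagreements in which one observer used `unknown`,
             % agreement among syllables both observers assigned to a
             predefined type).
    """
    if len(codes_a) != len(codes_b):
        raise ValueError("transcriptions must be aligned (same syllable count)")
    pairs = list(zip(codes_a, codes_b))
    disagreements = [(a, b) for a, b in pairs if a != b]
    overall = 100.0 * (len(pairs) - len(disagreements)) / len(pairs)
    involving_x = sum(1 for a, b in disagreements if unknown in (a, b))
    pct_x = 100.0 * involving_x / len(disagreements) if disagreements else 0.0
    both_known = [(a, b) for a, b in pairs if unknown not in (a, b)]
    known_agree = (100.0 * sum(a == b for a, b in both_known) / len(both_known)
                   if both_known else float("nan"))
    return overall, pct_x, known_agree

# Toy transcriptions of six syllables by two observers (hypothetical):
# overall about 66.7%, half of the disagreements involve X, 80% among known types.
print(agreement_summary("AABBXC", "AABCAC"))

# Re-deriving the reported figures from the counts in the text:
print(round(100 * (2113 - 348) / 2113))  # 84
print(round(100 * 207 / 348))            # 59
```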

Song parameters measured
We used the automatic parameter measurement tool in Avisoft SASLab Pro to measure song parameters. We measured the following basic song parameters: song duration (s), syllable interval (s), minimum frequency (Min. F; Hz), maximum frequency (Max. F; Hz), peak frequency (Peak F; Hz), and frequency quartiles (the frequencies below which 25, 50 and 75% of the spectral energy lies: Q25, Q50 and Q75, respectively; all in Hz) (Fig. 1). Frequency measurements were made on the mean power spectrum. The minimum and maximum frequencies were taken at -20 dB relative to the peak amplitude. We visualized the measurements on spectrograms and reviewed the outcome of the automatic analysis for each song. Songs with any measurement errors, for example incorrect detection of song or syllable onsets and offsets, or frequency measurements outside the song's frequency range, were excluded.
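How the frequency parameters relate to a mean power spectrum can be shown with the sketch below. Several details are our assumptions rather than documented Avisoft behaviour: the input is a power (not amplitude) spectrum, so -20 dB corresponds to 1/100 of the peak power; Min. F and Max. F are the edges of the contiguous band around the peak that stays above that threshold; and each quartile is the first frequency bin at which cumulative energy reaches 25, 50 or 75%, without interpolation. The function name and the toy spectrum are hypothetical.

```python
def spectral_measures(freqs_hz, power, threshold_db=-20.0):
    """Peak F, Min./Max. F and quartiles (Q25, Q50, Q75) from a mean power spectrum."""
    total = sum(power)
    peak = power.index(max(power))
    # Quartiles: first bin where cumulative energy reaches 25/50/75% of the total.
    quartiles, cum = {}, 0.0
    targets = [("q25", 0.25), ("q50", 0.50), ("q75", 0.75)]
    for f, p in zip(freqs_hz, power):
        cum += p
        while targets and cum >= targets[0][1] * total:
            quartiles[targets.pop(0)[0]] = f
    # Min./Max. F: walk outwards from the peak while power stays >= peak - 20 dB.
    floor = power[peak] * 10 ** (threshold_db / 10.0)
    lo = peak
    while lo > 0 and power[lo - 1] >= floor:
        lo -= 1
    hi = peak
    while hi < len(power) - 1 and power[hi + 1] >= floor:
        hi += 1
    return {"peak_f": freqs_hz[peak], "min_f": freqs_hz[lo],
            "max_f": freqs_hz[hi], **quartiles}

# Hypothetical 500 Hz-resolution spectrum (power in arbitrary linear units).
freqs = [3000, 3500, 4000, 4500, 5000, 5500, 6000]
power = [0.001, 0.5, 2.0, 4.0, 2.0, 0.5, 0.001]
print(spectral_measures(freqs, power))
# {'peak_f': 4500, 'min_f': 3500, 'max_f': 5500, 'q25': 4000, 'q50': 4500, 'q75': 5000}
```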
Many songs were eliminated from further analyses, but a large sample still remained. Of the original 6216 songs, 2770 were selected for the DAYS dataset (see Table 1). The YEARS dataset was much smaller; we analysed 335 of the 837 original songs. Furthermore, data from seven birds (six in the YEARS dataset and one in the DAYS dataset) were removed entirely from further analyses because of the low number of songs per recording (Table 2). Overall, 12 birds in the DAYS dataset and ten birds in the YEARS dataset remained for further analyses. For the analysis of re-identification using the same song types, the DAYS dataset was further reduced to ten males because, for two males, only three or six high-quality songs from a particular day were available, which we considered insufficient.

Statistical analysis
Statistical analyses were performed in R. We started with univariate analyses, calculating the potential of individual coding (PIC) for each acoustic variable (Robisson et al. 1993). PIC was calculated both from the first 15 songs on day 1 and from all songs on day 1. We then calculated the repeatability of each parameter over different timescales using the Pearson correlation coefficient r (Nakagawa and Schielzeth 2010). For each song characteristic, we calculated the average value for each male from the first 15 songs on day 1 and correlated these averages with the average values of the last 15 songs on day 1 (within-day repeatability) and of the first 15 songs on day 2 (between-days repeatability; for three males, only 14, 11, and 7 songs were available). Further, we used songs of the same song type on day 1 and day 2 to calculate the repeatability of characteristics within song types. For repeatability of song characteristics between years, we correlated the males' average song characteristics from year 1 and year 2.
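The two univariate measures can be sketched as follows. For PIC we assume the usual formulation attributed to Robisson et al. (1993): the between-individual coefficient of variation (computed over all songs pooled) divided by the mean within-individual coefficient of variation, each CV multiplied by the small-sample correction (1 + 1/(4n)). Repeatability follows the description above: per-male means from two samples of songs are correlated with Pearson's r. The data, function names and dictionary layout are hypothetical, and this is not the authors' R code.

```python
import statistics

def corrected_cv(values):
    """Coefficient of variation (%) with the small-sample correction (1 + 1/(4n))."""
    n = len(values)
    return 100.0 * (1.0 + 1.0 / (4 * n)) * statistics.stdev(values) / statistics.mean(values)

def pic(values_by_male):
    """Potential of individual coding: CV between males / mean CV within males."""
    pooled = [v for vals in values_by_male.values() for v in vals]
    cv_within = statistics.mean([corrected_cv(vals) for vals in values_by_male.values()])
    return corrected_cv(pooled) / cv_within

def pearson_r(x, y):
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def repeatability(sample_1, sample_2):
    """Correlate per-male means of one parameter across two samples of songs."""
    males = sorted(set(sample_1) & set(sample_2))
    return pearson_r([statistics.mean(sample_1[m]) for m in males],
                     [statistics.mean(sample_2[m]) for m in males])

# Hypothetical song durations (s): first and last songs of day 1 for three males.
first_songs = {"m1": [2.0, 2.1, 1.9], "m2": [3.0, 3.1, 2.9], "m3": [4.0, 4.2, 3.8]}
last_songs = {"m1": [2.1, 2.0, 2.2], "m2": [2.9, 3.0, 3.1], "m3": [4.1, 3.9, 4.0]}
print(pic(first_songs))                        # well above 1: males differ more than songs within a male
print(repeatability(first_songs, last_songs))  # close to 1
```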
Next, we used linear discriminant analysis (LDA) to assign each song to a bird according to its measured characteristics, and determined the classification success achieved. Classification success from LDA was compared with the success expected if songs were classified by chance. We assumed an equal probability of chance classification for all males, so chance success (%) equals 100 divided by the number of males in the analysis. Chance success was compared with LDA classification success using a binomial test. Note that the assumptions of multivariate normality and homogeneity of variance were violated in our data, as is often the case for field datasets (McGarigal et al. 2000). Because LDA is moderately robust to violations of its assumptions, and we make no inferences regarding specific values of canonical parameters, we considered LDA appropriate in our case. Violation of the assumptions is associated with an increase in misclassifications, so our classification results should be considered conservative (McGarigal et al. 2000; Mundry and Sommer 2007). All acoustic variables were scaled to z-scores prior to LDA.
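The comparison with chance can be sketched with the exact binomial distribution. We assume a one-sided test (is LDA better than chance?), which the text does not specify; R's `binom.test` is two-sided by default, so this is an illustrative choice. The song counts below are hypothetical and are not the study's data.

```python
from math import comb

def chance_accuracy(n_males):
    """Expected proportion correct if songs were assigned to males at random (equal priors)."""
    return 1.0 / n_males

def binom_p_greater(successes, trials, p):
    """One-sided exact binomial test: P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(comb(trials, k) * p ** k * (1.0 - p) ** (trials - k)
               for k in range(successes, trials + 1))

# Hypothetical test set: 12 males x 15 songs, 50 songs (about 28%) classified correctly.
n_males, n_songs, n_correct = 12, 180, 50
p0 = chance_accuracy(n_males)                           # 1/12
print(round(100 * p0, 1))                               # 8.3
print(n_correct / n_songs / p0)                         # observed accuracy is about 3.3x chance
print(binom_p_greater(n_correct, n_songs, p0) < 0.001)  # True
```

With equal priors, 100/12 gives a chance level of 8.3% for the 12 DAYS males and 100/10 gives 10% for the 10 YEARS males.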
Several LDAs were conducted. The first LDA tested whether individuals can be discriminated within the timescale of a recording session; it was based on the first 15 songs from day 1. We call this a 'recording session' although, in some cases where the first recording session did not contain enough high-quality songs for measurements, additional songs from the next recording session were used as well. We used leave-one-out cross-validation to discriminate between males. Although we would have preferred jack-knife cross-validation, there were usually not enough songs within a single recording session for independent training and test datasets, and we wanted to keep the songs as close to each other in time as possible. For the LDAs on the within-day, between-days, and between-years timescales, we used jack-knife cross-validation, meaning that the data were divided into training and test subsets. For example, the LDA for the within-day timescale included the first and last 15 songs from day 1: the first 15 songs served as the training dataset from which discriminant functions were derived, and these functions were then used to identify males from the last 15 songs of day 1, which formed the test dataset. Identification on the between-days timescale used the first 15 songs from day 1 for training and the first 15 songs from day 2 (except for three males, for which only 14, 11, and 7 good-quality songs were available) for testing; the between-years timescale used all songs from one year for training and all songs from the other year for testing.
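The two validation schemes can be contrasted with a self-contained linear discriminant analysis. This is a minimal pure-Python sketch under stated assumptions, not the R implementation the authors used: equal prior probabilities for all males, a pooled within-male covariance matrix, z-scores computed from the training songs only, and hypothetical two-variable data for three males. `holdout_accuracy` mirrors the training/test split described above, and `loo_accuracy` refits the model once per held-out song.

```python
import statistics

def zscore_params(rows):
    cols = list(zip(*rows))
    return [statistics.mean(c) for c in cols], [statistics.stdev(c) for c in cols]

def zscore(rows, means, sds):
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def lda_fit(rows, labels):
    """Class means plus pooled covariance -> linear discriminant (weights, constant) per male."""
    classes = sorted(set(labels))
    p = len(rows[0])
    means = {c: [statistics.mean(col) for col in
                 zip(*[r for r, l in zip(rows, labels) if l == c])] for c in classes}
    pooled = [[0.0] * p for _ in range(p)]
    for r, l in zip(rows, labels):
        d = [a - b for a, b in zip(r, means[l])]
        for i in range(p):
            for j in range(p):
                pooled[i][j] += d[i] * d[j]
    dof = len(rows) - len(classes)
    inv = invert([[v / dof for v in row] for row in pooled])
    model = {}
    for c in classes:
        w = [sum(inv[i][j] * means[c][j] for j in range(p)) for i in range(p)]
        model[c] = (w, -0.5 * sum(wi * mi for wi, mi in zip(w, means[c])))
    return model

def lda_predict(model, row):
    # Equal priors: pick the male with the largest linear discriminant score.
    return max(model, key=lambda c: sum(w * x for w, x in zip(model[c][0], row)) + model[c][1])

def holdout_accuracy(train_rows, train_labels, test_rows, test_labels):
    means, sds = zscore_params(train_rows)
    model = lda_fit(zscore(train_rows, means, sds), train_labels)
    test_z = zscore(test_rows, means, sds)
    return sum(lda_predict(model, r) == l for r, l in zip(test_z, test_labels)) / len(test_rows)

def loo_accuracy(rows, labels):
    hits = 0.0
    for i in range(len(rows)):
        hits += holdout_accuracy(rows[:i] + rows[i + 1:], labels[:i] + labels[i + 1:],
                                 [rows[i]], [labels[i]])
    return hits / len(rows)

# Hypothetical (song duration s, Q50 kHz) values: five songs per male.
offsets = [(-0.2, 0.1), (0.1, -0.2), (0.2, 0.2), (-0.1, -0.1), (0.0, 0.15)]
centres = {"m1": (2.0, 4.0), "m2": (3.0, 5.5), "m3": (4.0, 4.5)}
rows = [[cx + dx, cy + dy] for cx, cy in centres.values() for dx, dy in offsets]
labels = [m for m in centres for _ in offsets]
test_rows = [[cx + dy, cy - dx] for cx, cy in centres.values() for dx, dy in offsets]
print(loo_accuracy(rows, labels))                         # 1.0
print(holdout_accuracy(rows, labels, test_rows, labels))  # 1.0
```

LDA class assignments do not change when the variables are rescaled, so the z-scoring step mainly makes discriminant coefficients comparable across variables.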
We conducted two additional LDAs on the between-days timescale to assess the effect of the number of songs in the training dataset and the effect of song type on classification success. First, we enlarged the training dataset, using all available songs from day 1 to derive discriminant functions, and used the first 15 songs from day 2 (except for three males, for which only 14, 11, and 7 good-quality songs were available) as the test dataset. Second, we conducted the LDA on songs of the same song type (i.e. songs composed of the same syllable types, even though the songs differed in the total number and sequence of syllables) from day 1 (training dataset) and day 2 (test dataset). We pragmatically selected song types with enough examples on both day 1 and day 2 for this analysis; therefore, the most frequent song type on day 1 was not always the one used.
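The pragmatic choice of a song type sung on both days can be made explicit with a rule such as the one below. The rule itself (keep song types with at least `min_songs` songs on each day, then prefer the type whose less-sampled day has the most songs) and the threshold of five songs are hypothetical; the text does not state the exact criterion. As in the text, the chosen type need not be the most frequent one on day 1.

```python
from collections import Counter

def shared_song_type(day1_types, day2_types, min_songs=5):
    """Pick a song type well represented on both days, or None if there is none.

    day1_types, day2_types: song-type labels of one male's analysed songs.
    min_songs is a hypothetical threshold, not a value from the study.
    """
    c1, c2 = Counter(day1_types), Counter(day2_types)
    candidates = [t for t in c1 if c1[t] >= min_songs and c2[t] >= min_songs]
    if not candidates:
        return None
    # Prefer the type whose scarcer day is best sampled; break ties by label.
    return max(candidates, key=lambda t: (min(c1[t], c2[t]), t))

# Hypothetical male: type "A" dominates day 1 but is absent on day 2.
day1 = ["A"] * 40 + ["B"] * 12
day2 = ["B"] * 9 + ["C"] * 6
print(shared_song_type(day1, day2))              # B
print(shared_song_type(["A"] * 10, ["C"] * 10))  # None
```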

Results
Song type and syllable type repertoire of Chiffchaff males
Song types were determined for a total of 5532 songs of 13 males from the DAYS dataset (103-757 songs per male, median = 355). Each male had two to eight distinct song types (median = 5) (Fig. 3). In some songs, a switch between song types was apparent within a single song. These were labelled 'mixed song types', and they seemed to be more common when a male switched between song types. Most songs were not mixed (80-100% depending on the male; median = 91%) and could be clearly assigned to one of the basic song types. Males may sing one song type many times before switching to another; at most, 277 consecutive songs belonged to the same song type. Males usually had one dominant song type comprising a large proportion of their songs (21-89%, median = 55% of songs belonged to the most frequent song type). Apparently, Chiffchaffs do not cycle through their repertoire before repeating a song type. Males had nine to 24 different syllable types (median = 15). The cumulative repertoire size curve (Fig. 4) shows characteristic steps, reflecting the fact that new syllables appear together in new song types. Despite a very large number of songs for some males, there was no apparent asymptote in the cumulative repertoire size curve. Syllable types rarely occurred in more than one song type of a male.
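The two repertoire summaries reported here, the cumulative number of syllable types as songs accumulate and the longest run of consecutive songs of one song type, can be computed as in this sketch. Songs are represented as strings of hypothetical syllable labels, and a leading initial is skipped because initials were not counted towards repertoire size; the data are illustrative only.

```python
from itertools import groupby

def cumulative_repertoire(songs, initials=frozenset()):
    """Number of distinct syllable types known after each song, in recording order."""
    seen, curve = set(), []
    for song in songs:
        body = song[1:] if song and song[0] in initials else song
        seen.update(body)
        curve.append(len(seen))
    return curve

def longest_run(song_types):
    """Length of the longest block of consecutive songs of the same song type."""
    return max((len(list(group)) for _, group in groupby(song_types)), default=0)

# Hypothetical sequence: song type "ab" three times, then "cd" twice, then "ab" again.
songs = ["aab", "abab", "iabba", "ccdd", "cdc", "ab"]
types = ["ab", "ab", "ab", "cd", "cd", "ab"]
print(cumulative_repertoire(songs, initials={"i"}))  # [2, 2, 2, 4, 4, 4]
print(longest_run(types))                            # 3
```

The jump from 2 to 4 in the curve is the kind of step described above: new syllable types arrive together when a new song type first appears.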
We compared the song types occurring in two different years visually. Song types matched between the first and second year in just a single case. Syllable type matches were more frequent and occurred in all but one male. The number of syllable types detected in both years ranged from zero to seven per male. Syllable types found in both years were typically used in different song types in each year.

Discrimination of individuals
First, we assessed the PIC and repeatability of each acoustic song characteristic (Table 3). PIC values were higher than 1 both for the first 15 songs from day 1 and for all songs from day 1, although PICs were always lower when all songs were included. Song characteristics were repeatable within a day, except for Max. F, and repeatability decreased markedly on the between-days and between-years timescales when the analysis was done independently of song type. When only songs of the same song type on days 1 and 2 were considered, between-days repeatability was high for all parameters.
Since all variables showed PIC > 1 and most of them were repeatable within a day (except Max. F), we used all variables for LDA. However, we also report overall discrimination results for LDAs including only repeatable variables (in parentheses). Within a single recording session, 59% of songs were assigned to the correct individual. Hence, individuals could be discriminated with moderate success. Re-identification was less successful. Within-day classification accuracy was 34% (33% based on all acoustic variables except Max. F), and between-days classification accuracy was 28% (28% based on song duration, syllable interval and Q25 only) (Fig. 5a). Between-years classification accuracy was similarly low, at only 28% (29% based on syllable interval and Q50 only). Even such low accuracy is still at least 2.8 times the accuracy expected by chance (8.3% for the 12 males in the DAYS dataset; 10% for the 10 males in the YEARS dataset; binomial test, all p < 0.001). Classification success for year 1 songs only (the training dataset) with leave-one-out cross-validation was high, at 77%. Classification accuracy also varied markedly between males within each timescale. For example, accuracy within the recording session was 100% for PC1107 but only 7% for PC1104.
Although identification accuracy was better than expected by chance, it was quite low and not very useful for re-identifying males. We therefore tried two approaches to improve re-identification. First, we enlarged the training dataset, using all available songs from day 1 to derive discriminant functions. This led only to a small increase in classification accuracy, to 38% (Fig. 5b). In contrast, when we used only the song types sung on both day 1 and day 2 by each bird, identification increased substantially, to 75% (Fig. 5c). This level of identification is high and only slightly lower than the discrimination achieved for a single song type on day 1 (83% for same-song-type songs from day 1 with leave-one-out cross-validation).
In the between-days identification that ignored song type, the proportion of correct classifications for a particular male correlated with the number of songs on day 2 that were of the same song type as on day 1 (Spearman's rank correlation, r = 0.60, p = 0.038). For example, all day-2 songs of male PC1109 were of the same song type as on day 1, which led to the highest score, with 60% correct identification (9/15 songs classified correctly). In contrast, males PC1101 and PC1106 sang, respectively, one and no songs on day 2 of the same song type as on day 1, and none of their songs were correctly classified.
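Spearman's rank correlation is Pearson's r computed on ranks, with tied values given their average rank. The sketch below implements that definition for hypothetical per-male values (day-2 songs of the day-1 song type, and correctly classified songs). It returns only the coefficient; the reported p-value requires a significance test that is omitted here.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical males: day-2 songs of the day-1 song type vs correctly classified songs.
same_type_songs = [0, 1, 5, 15, 8]
correct_songs = [0, 0, 3, 9, 4]
print(round(spearman_rho(same_type_songs, correct_songs), 3))  # 0.975
```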

Discussion
We demonstrate that the Chiffchaff has an interesting and complex song organization despite the simple, repetitive melody suggested by its name ('chiff-chaff'). Our results support our original impression that the combination of syllables within Chiffchaff songs is not completely random: syllables occur in clusters and form distinct songs, which we refer to as song types. Although this was not a dedicated repertoire study, our results suggest that the repertoire size reported in the literature for the Chiffchaff might be underestimated because of the Chiffchaff's low tendency to switch between song types. Hence, long sampling periods seem necessary to record the entire repertoire of a species like the Chiffchaff.
We show, using discriminant function analysis, that Chiffchaff males can be discriminated by general song characteristics within a recording session, as indicated by the relatively high discrimination success for songs recorded in a single session. This means that these parameters vary among individuals. However, assigning additional songs from different recording sessions to a particular male, i.e. re-identifying males, is problematic when song syllable content is ignored. In contrast, re-identification was nearly as successful as discrimination when only songs of the same song type were used. This suggests that the song characteristics we measured differ between the song types of an individual and cannot be used for individual recognition independently of song type.

Repertoire size and song organization in the Chiffchaff
Even though measuring repertoire size and examining song organization were not the main goals of the study, we document that certain combinations of syllable types consistently occurred together in songs. A song can, therefore, be classified as one of several song types based on its syllable type composition. Such song organization has not been reported previously for the Chiffchaff and seems to be an interesting feature to study in other species as well. Song types are usually considered to have the same syllable composition as well as order, so that there is little variance between different renditions of a particular song type (Hultsch and Todt 1981;McGregor and Krebs 1982;Slater 1983;Mennill and Vehrencamp 2005). Yet, Chiffchaffs' song types could fulfil the discreteness condition postulated by MacDougall-Shackleton (1997), that a song type is a "…specific version of a discontinuous song. Song types must be consistently reproduced between renditions, and there must be discontinuous variation in frequency or temporal characters allowing them to be divided into discrete groups." The Chiffchaff's song organization matches neither that of species with hierarchical song organization (Gil and Slater 2000) nor that of species in which the syllables in the repertoire are combined in an undetermined fashion (Catchpole 1976).
Fig. 5 a-c Re-identification of males by their songs between days depending on song type and number of songs used for the training dataset. a The first 15 songs from day 1 were used in the training set regardless of song type. b All available songs from day 1 were used in the training set (i.e. song types from the testing dataset were included in the training dataset). c The training and testing datasets involved songs of the same song type only.
Further, published estimates of the Chiffchaff's repertoire size seem to be too low (Cramp et al. 1992). We documented that males had nine to 24 syllable types in their repertoire, double the number previously reported for P. collybita collybita. However, these repertoire sizes should be considered rough estimates because they may depend on the observer's preferences and experience, and different people may have different opinions about what should be considered a separate syllable type (see reliability analysis in "Materials and methods"). A proper repertoire study would be needed to set clear definitions of what constitutes a syllable type. For example, we could not decide whether syllables clearly resembling parts of other syllables should be considered separate syllable types. They were used consistently in complete and incomplete forms in different song types, but we also occasionally observed transitional forms. Some individuals seem to sing such transitional forms more often than others (Jaška, personal observation). Consistency in syllable production might be associated with, for example, male quality (Botero et al. 2009;Ferreira et al. 2016), but this needs to be tested in further studies on the Chiffchaff.
Our analysis shows that new song types emerge together with new syllable types rather than as recombinations of syllables that have already been recorded. A new song type, with its syllables, could emerge even after more than 700 songs had been recorded. This is a significant obstacle to estimating repertoire size in the Chiffchaff, as well as in species that might have similar song organization. Cumulative repertoire curves of the closely related Willow Warbler (Phylloscopus trochilus) reach clear asymptotes after 100 songs have been recorded (Gil and Slater 2000). Far fewer songs also suffice for good repertoire estimates in other species, even those with much more complex repertoires (Kipper et al. 2006;Hesler et al. 2010;Mamede and Mota 2012;Petrusková et al. 2016). Estimating repertoire size is not trivial, and new approaches have recently been discussed (Garamszegi et al. 2004;Peshek and Blumstein 2011;Kershenbaum et al. 2015). Chiffchaffs could be characterized as singing with eventual variety in a noncyclic style, probably with heterogeneous song type selection (dominant and rare song types), similar to Rufous-and-white Wrens (Harris et al. 2016). As a result, capture-recapture (Garamszegi et al. 2004) or coupon collector (Kershenbaum et al. 2015) methods could be appropriate for estimating their repertoire size. According to another classification, Chiffchaffs could be characterized as singing standardized clusters of syllables with eventual variety (Botero et al. 2008). Repertoire sizes for this singing style were among the most challenging to estimate, and the estimates were associated with significant errors. Our study suggests that estimating repertoire size can be extremely challenging, even in a species with seemingly simple songs and small repertoires.
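The effect of heterogeneous song type selection on sampling effort can be illustrated with the classic coupon collector problem. The sketch below computes the expected number of songs needed to hear every song type at least once, using the inclusion-exclusion formula E[T] = Σ over non-empty subsets J of (-1)^(|J|+1) / Σ_{j∈J} p_j. It assumes that each song is an independent draw with fixed song type probabilities, an assumption that Chiffchaffs, singing in long bouts, clearly violate, so real sampling effort is likely larger. This is only the forward expectation, not the repertoire estimator of Kershenbaum et al. (2015), and the probabilities are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def expected_songs_to_complete(probs):
    """Expected number of independent songs needed to observe every song type.

    Inclusion-exclusion form of the coupon collector expectation; the cost is
    exponential in the number of types, so it suits only small repertoires.
    """
    total = 0
    for size in range(1, len(probs) + 1):
        sign = 1 if size % 2 == 1 else -1
        for subset in combinations(probs, size):
            total += sign / sum(subset)
    return total

# Hypothetical repertoires of three song types (exact arithmetic via Fraction).
uniform = [Fraction(1, 3)] * 3
skewed = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]  # one dominant type
print(expected_songs_to_complete(uniform))  # 11/2
print(expected_songs_to_complete(skewed))   # 19/3
```

With equal usage the result reduces to N·H_N (here 3 × 11/6 = 11/2); making one song type dominant raises the expected effort, and rarer types raise it much further.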
We do not know what function song types may have. We observed (but did not quantify) that a change in song type was sometimes associated with a change of song post. Yasukawa (1981) observed the same in Red-winged Blackbirds and considered whether simultaneously changing song post and song type is consistent with the beau geste hypothesis. However, because the association was not perfect, Yasukawa (1981) concluded that changing song type was more likely a byproduct of the longer pauses between songs caused by moving to a new song post. Chiffchaffs might also match the songs of their neighbours (Catchpole and Slater 2008) or associate song types with particular neighbours. We did not observe obviously shared song types between different birds, but most of our males were not neighbours.

Discrimination versus identification
Our study provides another example showing that a single recording session is not sufficient for drawing conclusions about possible identity cues. Random variation can produce individual differences at a single point in time and lead to relatively high discrimination success. However, individual recognition requires identity cues that remain stable over a considerable period of time (Terry and McGregor 2002;Tibbetts and Dale 2007). Therefore, before inferring an identity-signalling function for a particular song trait, future studies should demonstrate not only discrimination but also re-identification or repeatability of that trait.
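Repeatability of a single song trait is commonly estimated as an ANOVA-based intraclass correlation. The sketch below assumes that formulation, R = s²_among / (s²_among + MS_within) with s²_among = (MS_among − MS_within) / n0 and n0 the group-size coefficient for unbalanced designs; it is an illustration, not necessarily the exact procedure used in this study. The input maps each male to the values of one trait measured on his songs; the names and numbers are hypothetical.

```python
def repeatability(groups):
    """ANOVA-based repeatability (intraclass correlation) of one song trait.

    groups maps each male to the trait values of his songs. R can be
    negative when within-male variance exceeds among-male variance, and
    the formula is undefined if there is no variance at all.
    """
    sizes = [len(v) for v in groups.values()]
    k, n_total = len(sizes), sum(sizes)
    grand = sum(sum(v) for v in groups.values()) / n_total
    means = {male: sum(v) / len(v) for male, v in groups.items()}
    ss_among = sum(len(v) * (means[m] - grand) ** 2 for m, v in groups.items())
    ss_within = sum((x - means[m]) ** 2 for m, v in groups.items() for x in v)
    ms_among = ss_among / (k - 1)
    ms_within = ss_within / (n_total - k)
    n0 = (n_total - sum(s * s for s in sizes) / n_total) / (k - 1)
    s2_among = (ms_among - ms_within) / n0
    return s2_among / (s2_among + ms_within)

# Hypothetical trait values (e.g. syllable interval) for two males.
print(repeatability({"m1": [1, 2, 3], "m2": [4, 5, 6]}))  # about 0.806 (= 25/31)
```

Computing R separately within days, between days and between years, as in this study, shows whether a trait that discriminates males in one session also stays stable over time.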

General song characteristics
We show that, in the Chiffchaff, there is individual variation in general song characteristics (PICs of all variables > 1; discrimination of males is possible within a recording session). However, song characteristics are generally not stable over time. This is documented by the lack of repeatability of single song characteristics at larger timescales and by the approximately two-fold decrease in LDA classification accuracy within days, between days, and between years (34, 28 and 28%, respectively) compared with within recording sessions (59%). Although we found high within-day repeatability for all song characteristics except Max. F, and generally much lower repeatability between days and years, timescale does not seem to be a crucial factor for re-identification accuracy, as suggested by the difference of only 6 percentage points between within-day and between-year LDA classification accuracies. Syllable interval was the only song characteristic that was consistently repeatable on all timescales. However, syllable interval is substantially affected by social context (Linhart et al. 2013) and can be used for individual recognition only if this is taken into account.
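The potential of individual coding (PIC) compares variation between males with variation within males. The sketch below assumes one common formulation: the coefficient of variation of all songs pooled across males divided by the mean within-male coefficient of variation. Some studies additionally apply a small-sample correction to the CV, which is omitted here. The function names and values are hypothetical.

```python
from statistics import mean, stdev

def cv(values):
    """Coefficient of variation: sample standard deviation / mean."""
    return stdev(values) / mean(values)

def pic(groups):
    """Potential of individual coding for one song trait (assumed formulation).

    Values above 1 indicate more variation between males than within them.
    """
    pooled = [x for values in groups.values() for x in values]
    return cv(pooled) / mean(cv(values) for values in groups.values())

# Hypothetical values of one trait for two males.
distinct = {"a": [1.0, 1.1, 0.9], "b": [2.0, 2.2, 1.8]}
identical = {"a": [1, 2, 3], "b": [1, 2, 3]}
print(pic(distinct) > 1, pic(identical) > 1)  # True False
```

A PIC above 1 within one session only shows that a trait can discriminate males at that moment; as the paragraph above argues, it says nothing about stability across days or years.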
Effective re-identification was possible only when it was based on comparisons of the same song types. Differences in environmental conditions such as noise levels (Verzijden et al. 2010), recording distance or humidity could affect the general characteristics of Chiffchaff songs and reduce re-identification success for songs from different recording sessions (within days, between days, between years). However, environmental conditions appear less important than song type content, as indicated by the high classification success when the same song types from different recording sessions (which likely differed in environmental conditions) were used. We interpret our results to mean that the general song characteristics we measured in Chiffchaffs result from the specific syllable composition of the song and change substantially whenever the bird switches song type. Songs within a single recording session are more likely to share the same syllable content, which results in relatively high discrimination success compared with songs from different recording sessions. Our study, therefore, does not support the hypothesis that Chiffchaffs could use general song characteristics for individual recognition independently of song type, as has been suggested for Great Tits (Weary et al. 1990;Weary and Krebs 1992).

Other possible cues to identity in Chiffchaff songs
We focused on general song characteristics that are easy to measure and have been used previously. However, birds might have other features associated with 'voice quality'. Several studies, including our previous study on Chiffchaffs, have shown that methods used for content-independent speaker identification in humans can be adapted to identify individual birds (Fox 2008;Cheng et al. 2010;Ptáček et al. 2016). However, it remains to be shown that re-identification with these methods is possible over much longer timescales.
Besides general song characteristics, Chiffchaffs could discriminate other individuals based on shared or unique song types and syllable types. Individual recognition based on a unique repertoire (Thompson 1970;Petrusková et al. 2016) is highly unlikely in Chiffchaffs: although individuals might have unique repertoires (cf. Fig. 3), sampling them would take too long. Recognition is more likely to be based on dominant song types, which seem to be unique and individually distinct. Indeed, our analysis demonstrating between-day re-identification was carried out partly on dominant song types. Nevertheless, it would be necessary to show that males use a dominant song type consistently over time. In our study, the dominant song type was identical on days 1 and 2 for seven of 13 males, but the number of songs recorded on day 2 was likely too low to identify the dominant song type correctly.
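Checking whether a male keeps the same dominant song type across days amounts to comparing the most frequent song type in each recording. The minimal sketch below assumes "dominant" simply means most frequent, with ties resolved by first occurrence. The `min_songs` threshold is a hypothetical parameter, not a value from this study; it marks comparisons where the second recording is too short to trust, echoing the caveat above.

```python
from collections import Counter

def dominant_type(song_types):
    """Most frequent song type; ties go to the type encountered first."""
    return Counter(song_types).most_common(1)[0][0]

def dominant_type_consistent(day1, day2, min_songs=10):
    """True/False if the dominant type matches across days, None if day 2 is too short.

    min_songs is a hypothetical cut-off chosen for illustration only.
    """
    if len(day2) < min_songs:
        return None
    return dominant_type(day1) == dominant_type(day2)

# Hypothetical song type sequences for one male.
print(dominant_type_consistent(["A"] * 10 + ["B"], ["A"] * 6 + ["B"] * 5))  # True
print(dominant_type_consistent(["A"] * 10, ["A"] * 3))                     # None
```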
Further, Chiffchaffs could use differences in homologous syllable types for individual recognition. The term 'homologous syllables' implies that the syllables share the same structure (structural homology) or function (functional homology). For example, all males apparently have some form of the 'L' syllable in their repertoire, such as syllables A3 and D13 of male PC1101 (similarly, the 'H' syllable, e.g. A2 of PC1103, or the 'inflected I', e.g. D11 of PC1101). However, males typically have several variants in their repertoires, so it is unclear how birds could recognize which of these represents an individual signature syllable. Initial syllables, which appear at the beginning of the song, are very likely homologous across males. Nevertheless, initials, again, can have several variants in some males, and we did not observe them in all individuals despite extensive sampling. Chiffchaffs may also use a combination of different strategies to perceive and signal identity.
Although we were not able to document re-identification based on general song characteristics in Chiffchaffs, we believe that more attention should be given to general song characteristics that would allow birds to recognize other individuals regardless of song type. Such characteristics could help to explain why species with complex and changing repertoires, or those sharing the same song types, can easily discriminate other birds based on single songs, songs they have never heard before, or incomplete songs. Recognition of individuals based on general song characteristics would also give a species a flexible means of monitoring individuals. We believe that this is a promising field for future study.