Introduction

Bird song is a sexually selected trait characterized by extremely varying complexity and fulfilling several functions. These are functions such as: mate attraction, territorial defence, maintaining pair bonds, signalling predation risk, etc. (Catchpole and Slater 2008). Most well-studied species were found to demonstrate some kind of geographical variation in song structure, which is often observed at two levels: micro- and macrogeographical (Mundinger 1982). Microgeographic variation is related to neighbouring groups of males, which are likely to interact with each other during their lifetimes. In most cases, even a superficial inspection of sonograms enables us to find some consistent differences in song structure among males belonging to different dialect groups called microdialect or local dialect. Areas and numbers of males covered by local dialects might be very variable. Dialects are common in species with either a small or large repertoire size. As a consequence, bird song of different species is characterized by huge number of patterns of spatial variation (Podos and Warren 2007). For example, in some species, males share full songs from a small repertoire and dialects have sharp borders (e.g. white-crowned sparrow Zonotrichia leucophrys, Nelson and Soha 2004a; corn bunting Emberiza calandra, McGregor 1980). In other species, local dialects are recognized by some specific structures occurring within song. This specific structure was shown both for species with a small repertoire (e.g. final whistles in yellowhammer Emberiza citrinella, species with a typical repertoire size of 2–3 song types per male; Rutkowska-Guz and Osiejuk 2004) and for species with large repertoires (e.g. the so-called ‘group signature’ in skylark Alauda arvensis, a species with over 300 syllables in a male repertoire; Briefer et al. 2008).

Much more interesting than describing new dialects, or even finding new patterns of their distribution, is looking for possible relationships between the geographical pattern of song distribution and its biological significance. It is not easy, and as usual we have a ‘snap-shot’ of a current state, i.e. we know the distribution of songs in space for a few (or more if you are lucky) years. We try to draw conclusions about the possible function(s) affecting this long historical process of dialect formation and maintenance (Baker and Thompson 1985). It is widely accepted that songbird dialect formation depends on three crucial factors: song learning, dispersal pattern and female choice (see Catchpole and Slater 2008). Nonetheless, scientists still dispute the possible function(s) of sharing songs within dialects. Nottebohm (1969, 1972) proposed the ‘genetic adaptation’ hypothesis, where the possible function of dialects is the maintenance and development of local adaptations. Thus, a female choosing her mate based on local song cue benefits from having a partner better adapted to the local habitat. According to the ‘social adaptation’ hypothesis proposed by Payne (1981a, b), young males may benefit from learning the songs of their older, established neighbours in order to interact with them more effectively. Hence, this hypothesis stresses the importance of intra-sexual selection in the forming of dialects. Other authors emphasised the role of habitat-dependent transmission of songs, which may affect their structure during propagation and thus also affect learning processes (‘habitat matching’ hypothesis; Hansen 1979). These hypotheses are not mutually exclusive and were at least partially supported for some species (review in Catchpole and Slater 2008). However, all the research leads to the conclusion that there is probably no single explanation for sharing songs within local dialects. Moreover, it seems that a valid null hypothesis for the appearance of dialects could be the ‘by-product’ hypothesis, suggesting that song sharing occurs because of specific learning and dispersal pattern but is not necessarily functional (Podos and Warren 2007).

Regardless of dispute on dialect functions, many studies have demonstrated that both male and female birds are able to distinguish between geographically close and distant songs. In most cases, the birds will respond more strongly to local songs (e.g. Baker et al. 2001; MacDougall-Shackleton and MacDougall-Shackleton 2001; Wright and Dorin 2001; Nelson and Soha 2004b). On the other hand, our knowledge of the acoustic cues used for recognition of songs from local populations is still incomplete, which impedes our full understanding of dialect function (Nelson and Soha 2004a). One productive approach in studying dialects and the birds’ perception of the dialects is comparing responses of males to playback of songs derived from different dialect areas. It is especially helpful if we are able to manipulate songs by changing those song parts which are supposed to be dialect cues. It is useful if we test birds from a population characterized by different learning and/or dispersal pattern (e.g. Osiejuk et al. 2007a; Briefer et al. 2008).

We report here a set of experiments aimed to test how male ortolan buntings (Emberiza hortulana) perceive “locality” of songs in a population with a pattern, which is typical for the species dialect. This paper is a continuation of an earlier study, where we used a similar experimental design to find out how ortolan bunting males respond to local, foreign and hybrid songs in a population where there was no local dialect in the strict sense (Osiejuk et al. 2007a). The ortolan bunting belongs to the group of songbirds with relatively small repertoire sizes, usually regarded as varying between 2 and 3 song types per male (Cramp and Perrins 1994). Songs of the ortolan bunting have a similar general structure in all studied populations and consist of two easily recognized parts: (1) an initial phrase of higher overall frequency and wider frequency range, and (2) a final phrase of lower overall frequency and relatively narrow bandwidth. Both the initial and final phrases may be complex, i.e. consisting of more than one type of syllable, but in most cases, both parts are phrases of a repeated, single syllable type (Cramp and Perrins 1994; Osiejuk et al. 2003). Ortolan buntings form apparent dialects in central and southern Europe. Males within a dialect share a single type of final song phrase and have a relatively small repertoire of initial song phrases. A single dialect usually covers an area of several hundred square kilometres or more, and dialects seem to be stable over many years (Conrads and Conrads 1971; Conrads 1976, 1994; Helb 1997). These descriptive studies suggest the most likely dialect cue in this species is a locally shared final song phrase (the dialect cue hypothesis).

However, recent studies were done on an isolated population inhabiting Hedmark County in Norway. The results revealed that this population did not show a dialect in the same way as ortolan buntings from central Europe (Osiejuk et al. 2003, 2007a; Łosak 2007). There was no single final phrase of song shared by all or the majority of males. In addition, particular males often had song types in their repertoires with different final phrases. These differences mean that Hedmark males should be called mixed dialect singers sensu Conrads (1994). Comparatively large numbers of different initial phrases were also found at the population level. All these data suggest that microgeographic song variation in this species is diverse and may be related to ecological factors such as dispersal pattern, isolation and patchiness of the population (Osiejuk et al. 2005a; Łosak 2007). Despite having no common dialect (at least in the strict sense of sharing a single final song phrase), higher song complexity, larger song type repertoire size and lower levels of song type sharing, males in this population were found to recognize songs of conspecifics and interacted with them in the normal way (Osiejuk et al. 2007a, b; Skierczyński et al. 2007). Using playback experiments with full local songs, full foreign dialect songs and two types of hybrid songs (locaLForeign and foreign–local), we found that the ortolan bunting males we studied in Norway did not respond strongly to foreign dialect songs nor to kinds of artificial hybrid dialect songs. We also did not find any significant differences in response to foreign song and artificial hybrid songs. Furthermore, we found that adding any typical local final phrase of song did not increase response strength in comparison to the full foreign song. Therefore, we have rejected the dialect cue hypothesis. The main conclusion from this study was that in ortolan buntings from Norway only the full, local song evoked an immediate and normal response (Osiejuk et al. 2007a).

In this paper, we experimentally test if and how male ortolan buntings from Poland, where apparent local dialect pattern occurs, discriminate between local, foreign and hybrid songs. These tests were based on evaluating and comparing the response of males to playback of (1) local dialect songs—L songs, (2) conspecific foreign dialect songs—F songs, and two types of artificially mixed hybrid songs consisting of (3) a local initial phrase and a foreign final phrase—LF songs, and (4) a foreign initial phrase and a local final phrase of song—FL songs. According to the dialect cue hypothesis, we predicted that subjects would respond more strongly to local songs and to hybrid songs with local final phrase than to foreign songs and to hybrid songs with local initial phrase. If the dialect cue hypothesis is incorrect, it is reasonable to assume that only playback of local songs would evoke the strong response of males (the full song hypothesis) as was found in the Norwegian population. More generally, this paper focuses on mechanisms of perception of song locality. We also focused on potential differences in this perception between a model dialect and non-dialect populations of a passerine bird species with a small repertoire.

Materials and methods

Study area and subjects

The study was carried out in Wielkopolski National Park and its surroundings, in western Poland (52°17′N, 15°56′E). The study area is typical for this region of Poland being dominated by farmland with a mosaic of fields, meadows and wasteland. Ortolan buntings are common here, breeding preferentially along forest edges, in tree lines surrounded by farmland, and in open habitats with scattered trees. The population of ortolan buntings consists of several thousands of breeding pairs and, in the case of the Wielkopolska region, it is more or less continuous within the farmland landscape (Kuźniak and Dombrowski 2007). Within the study area, singing males have been counted and intensively recorded since 1998 and the number of territorial males usually exceeded 50 (Łosak 2007; own unpublished data). Between 1998 and 2003, sonograms of over 12,000 songs of 251 males were visually inspected. It was found that only one male had a foreign dialect song without the typical local final phrase in his repertoire. The male with foreign final phrases had in his repertoire two song types built with a foreign final song phrase; one with a commonly shared initial phrase and one with an initial phrase described only for him (Łosak 2007). Therefore, the studied population in Poland clearly belongs to the group type described in central Europe with and apparent local dialect indicated by a shared final phrase with an extremely low level of foreign or mixed dialect singers (Cramp and Perrins 1994).

Playback equipment and preparation of song stimuli

For the playback experiments, we used a Philips Magnavox ESP25 (Suzhou, China) compact disc player with a wireless SEKAKU WA-320 (Taichung, Taiwan ROC) loudspeaker with 20-W amplifier (frequency range 50–15,000 Hz and linear frequency response within species-specific frequency range, i.e. 1.8–6.6 kHz). Each experiment used different songs (altogether 61 song samples). These songs belonged to one of three kinds of stimuli: (1) songs from randomly chosen non-neighbours of the subject males, (2) songs from randomly chosen males from a foreign population, and (3) computer hybrid songs which consisted of local and foreign phrases (see treatment description for more details). Foreign songs or their parts originated from a population inhabiting Hedmark County, southeastern Norway (the centre of a ca. 500 km2 area inhabited by 100–150 males: 60°41′N, 11°47′E). For more details, see Osiejuk et al. (2007a).

All the songs (or their parts) used as stimuli were common in either Poland or Norway, respectively, i.e. were shared by at least 10% of males within a given population. We decided to use such a threshold because song type diversity in the ortolan bunting is much higher in Norway than in Poland (70 vs. 11 different song types per 100 males, respectively; Osiejuk et al. 2005a; Łosak 2007). Based on this threshold, we used for the preparation of the stimuli: 61 different song examples belonging to the most typical 18 Norwegian and 14 Polish song types. Each full song or song phrase used for preparation of stimuli was derived from the repertoire of a different male. We have been studying the song of the species in Poland since 1998, and in Norway since 2001. We have a database of over 40,000 digitized songs from more than 500 males, so the only additional criterion of choice was recording quality. More detailed information about the different song type usage is given in “Experiments”.

All the songs used were of good quality and were digitally prepared (2 kHz high-pass filter, amplified or attenuated) to match 86 dB signal pressure level (SPL) at 1 m from the loudspeaker [measured with a CHY 650 (Ningbo, China) sound level meter]. The SPL value was set on the basis of normal ortolan bunting song amplitude level which had been previously measured in the field. In all cases, the amplitude manipulation did not exceed ±5 dB, and did not affect song structure. All recordings in our database were originally recorded at 48 kHz/16 bit. We did a conversion to 44.1 kHz/16 bit sampling rate with accuracy 512 and anti-aliasing filtering ‘on’ in order to prepare an audio CD for playback. We used Avisoft SASLab 4.34 software (Specht 2002) to prepare playback cuts.

Playback protocol

We carried out the experiments between 30 April and 13 May in 2007 and 2008 (between 0530 and 1000 hours local time). On the basis of behavioural observations, the experimental period was chosen as the time when males intensively defend their territories. All subject males were recorded prior to the experiments and their territories were mapped at least 1 day before the experimental period. All subject males were unpaired and defended territories for at least 1 day before the experiments were conducted. Before each experiment, the loudspeaker was placed in a tree about 1.5–2 m above the ground. The place was always within the subject male’s territory and at an average (±SE) distance of 36 ± 2.2 m from his song post, observed during equipment setup. Differences in distance between initial position of the loudspeaker and a focal male resulted from the characteristics of the habitat within a particular territory. In contrast to the typical ortolan bunting habitat in Norway, potential song posts on the Wielkopolska farmland were less evenly distributed. Therefore, we had to accept higher variation of initial distance to ensure that the loudspeaker was placed in a way which allowed the focal male to move towards the loudspeaker and to land on a tree, bush or elevated ground at a distance less than or equal to 5 m from it (see Skierczyński and Osiejuk 2010 for discussion on habitat differences between studied populations and their influence on territorial defence). The loudspeaker was also always placed in such a position that the bird did not move towards the observer. The location of the observer was approximately 20–30 m perpendicular to the straight line between the loudspeaker and the focal male at the beginning of the trial. The distance between loudspeaker and initial bird position was measured with a Bushnell Yardage Pro Trophy laser rangefinder monocular (Overland Park, KS, USA).

Each experiment consisted of two stages: a 1-min playback (PLAY) followed by 1 min of after-playback (POST) observation of the focal male behaviour. Timing of the PLAY and POST stages was determined on the basis of earlier experiments with the species. The time of PLAY was short, as we did not want to interact with males for a longer period. We wanted only to evoke a simple and clear response to stimuli, where treatment only differed in a single dimension (structure of song). The same songs were played back six times during 1 min. This is a typical song rate and type of song delivery for the species (Osiejuk et al. 2003; Łosak 2007). The playback was started ca. 2 s after the last song of the focal male. This allowed the focal male to avoid overlapping with the following playback songs, especially because an increase in song rate during a playback phase is not a typical response (Osiejuk et al. 2007a, b; Skierczyński and Osiejuk 2010). The POST stage was only 1 min long. This was done to avoid including into the analysis any behaviour caused by non-playback factors which might occur after we stopped the playback. Observations of the behaviour of the males were dictated to a microcassette recorder and notes were transcribed later the same day. The accuracy of extracting variables from a tape was 1 s. We recorded ten measures of response to playback, which are described in Table 1.

Table 1 Measures of male ortolan bunting (Emberiza hortulana) responses to playback

As the studied population was not colour ringed, we tested each male only once with a randomly chosen song sample. When several males occurred at a particular site, we usually tested all of them during a day or two, which allowed us to avoid incidentally testing the same male two times. Additionally, we also checked repertoire contents and frequency parameters of shared song types, which are individually distinct. Thus, we were able to recognize males individually even if they share same song types (Osiejuk et al. 2005b; Łosak 2007; Skierczyński and Osiejuk 2010).

Experiments

Altogether, we tested 61 different males and conducted four series of experiments with the same design and different types of vocal stimuli used. Thus, we analysed the total response to the playback of 61 males sharing the same local dialect. We divided the 61 males into four almost equal groups:

  • Experiment no. 1—L local song: we tested 15 males with a randomly chosen non-neighbour local song (L).

  • Experiment no. 2—hybrid FL song: we tested 16 males with a randomly chosen mixed song consisting of a Norwegian initial phrase (F) and a Polish final phrase (L).

  • Experiment no. 3—hybrid LF song: we tested 15 males with a randomly chosen mixed song type consisting of a Polish initial phrase (L) and a Norwegian final phrase (F).

  • Experiment no. 4—F foreign song: we tested 15 males with a randomly chosen foreign song (F) from a Norwegian population.

All full songs (L, F) or song phrases (LF, FL) used as stimuli were derived from repertoires of different males. The choice of song stimuli for particular males and experiments was random with the following exceptions: all initial phrases within song stimuli of particular experiments 1–4 were different but belong to the group of locally common phrases (i.e. shared by at least 10% of males within the population). In L and FL experiments, the final phrases always belong to the same type, which is characteristic for the Polish population from the Wielkopolska region and seems to be a dialect cue (Fig. 1). Final phrases of songs in Norway are much more variable than in Poland. Therefore, in the LF experiment, we used six different, common types of such phrases and each type was used two or three times. Each final phrase was still derived from the repertoire of a different male. In the F experiments, final phrases of song stimuli belonged to seven different common types; six of them were used two times and one type three times. The consequence of using common song types as stimuli was that all tested males shared at least some phrases with the playback stimuli used in all the experiments of L, LF and FL treatments. This made it extremely unlikely that experimental males were unfamiliar with the local song types (or their parts) used in experiments.

Fig. 1
figure 1

Sonograms of exemplary songs used for the playback experiments. ac Typical songs of Polish (=local) ortolan buntings (Emberiza hortulana). df Typical songs of Norwegian (=foreign) ortolan buntings. gh Computer hybrid songs with initial and final parts originating from different populations. Solid and dashed line boxes separate initial and final phrases of songs

Analysis

Altogether, we measured ten response variables, describing the responses of the males to playback, which in over 77% of the cases correlated significantly with each other. Separate tests on original variables would not be statistically independent and would not reveal the multivariate character of the response (Rice 1989; McGregor 1992). For this reason, we combined all log-transformed original variables into two orthogonal principal components (PC1 to PC2). The dataset appeared to be suited for such an analysis (Kaiser–Meyer–Olkin measure of sampling adequacy = 0.75, Bartlett test of sphericity = 446.23, P < 0.001). The assumptions of homogeneity of variance and normality of the distribution of response variables were examined using Levene’s tests and Kologorov–Smirnov tests, respectively. The variances were homogeneous (all P > 0.14) and distributions did not depart significantly from normality (all P > 0.18). Therefore we used general linear models (GLM) to test for differences in response between different song categories played back to four groups of males. We used original response variables to present results graphically. In all experiments, n equals the number of subjects and different stimuli. Statistical tests were two-tailed.

Results

Most of the approach-related response measures were strongly correlated with PC1 and lower values of PC1 corresponded to a stronger response, i.e. faster approach to a closer distance and more flights during and after playback (Table 2). PC2 was negatively correlated with the number of songs given during and after playback, and positively related to the number of calls given and song latency (Table 2). This compound variable reflects the diversity of vocal response to stimuli (i.e. call- or song-biased response). Further, we interpreted the obtained principal components as compound measures of APPROACHING (PC1) and VOCAL RESPONSE (PC2).

Table 2 Eigenvalues, variance explained and weightings of the original variables in the first two principal components extracted from the ten original variables of the response to the playback

We found ortolan bunting males in Poland to respond similarly to L songs as birds tested in other populations with their local songs (Osiejuk et al. 2007a, b; Skierczyński et al. 2007; Skierczyński and Osiejuk 2010). When L songs were played back, males commenced their first flight towards the loudspeaker just after the onset of playback and with a few more flights came very close to the loudspeaker (Fig. 2a–c). Typically during this initial approach, males ceased singing, but soon after, in the vicinity of the loudspeaker, they usually started singing or calling. Call-biased response is considered stronger for the species (Osiejuk et al. 2007a, b; Skierczyński et al. 2007; Skierczyński and Osiejuk 2010) and is typically connected with more flights towards the loudspeaker (correlation between overall number of flights and calls for all experiments: r = 0.54, n = 61, P < 0.001). Birds replying to playback with more songs have less of a tendency for flight (correlation between overall number of flights and songs for all experiments: r = −0.24, n = 61, P = 0.065). The number of songs given during the playback stage was usually not very high as a consequence of initial cessation of singing and later typically singing only a single song a few seconds after each song played back. The ortolan bunting is characterized as having a typical strong response with fast approach, and many flights were connected with the giving of more calls. Consequently, the strong response scored by compound measures is characterized by low values for APPROACHING and VOCAL RESPONSE shifted toward high values related to call-biased response.

Fig. 2
figure 2

Box plots showing nine original response measures to playback of local (L), foreign (F) and artificially mixed songs (FL and LF): a flight latency, b approach latency and c closest distance, d song latency, e flights during playback, f flights after playback, g songs during playback, h songs after playback, i calls during playback and j calls after playback. The lower and upper edges of the boxes represent the first and third quartiles; the median divides each box. The vertical lines (‘whiskers’) include the range of values within 1.5 times the interquartile range. Solid circles are the means

Multivariate analysis with both compound response measures revealed significant or nearly significant differences between responses to the four categories of songs used in the experiments (GLM, Wilks’ λ = 0.73, n = 61, P = 0.006; APPROACHING: F 3,60 = 3.84, n = 61, P = 0.014; VOCAL RESPONSE: F 3,60 = 2.62, n = 61, P = 0.059). Post hoc comparison (Tukey HSD) revealed that response of males measured by APPROACHING differed significantly only between L and F experiments (P = 0.009), while all the other pair comparisons showed non-significant differences (all P > 0.123). For the VOCAL RESPONSE component, differences between groups of males tested with L and FL songs were close to significant (P = 0.054) and all the others were non-significant (all P > 0.15).

We found no significant differences between experiments in the initial distance between focal males and loudspeaker (one-way ANOVA, F 3,60 = 0.607, n = 61, P = 0.613). We have included the initial distance to the loudspeaker as a covariate in the next GLM analysis (experiment: Wilks’ λ = 0.73, n = 61, P = 0.008; covariate: Wilks’ λ = 0.986, n = 61, P = 0.674) and found that it did not affect male response significantly (covariate: APPROACHING: F 3,60 = 0.45, n = 61, P = 0.506; VOCAL RESPONSE: F 3,60 = 0.33, n = 61, P = 0.570) nor changed estimations for differences between different experiments (experiments: APPROACHING: F 3,60 = 3.82, n = 61, P = 0.015; VOCAL RESPONSE: F 3,60 = 2.45, n = 61, P = 0.073). Also, the correlations between initial distance to the loudspeaker and original response measures were close to zero and non-significant with the exception of one weak but significant correlation for the closest distance (r = 0.32, n = 61, P = 0.012). Therefore, we found that initial differences in distance between males and loudspeaker weakly affected response and could be ignored.

Clearly, APPROACHING response was the strongest for L songs, intermediate for FL and LF songs and weakest for F songs (Fig. 2a–c, e, f). VOCAL RESPONSE for tested groups varied with the more complicated patterns. Males responding to L songs biased their vocal output towards giving more calls than songs and uttered on the average (±SE) 13.2 ± 3.83 calls during playback and 20.0 ± 4.73 calls after playback (n = 15). Therefore, in L experiments, calls were produced much more frequently than in the remaining experiments (averages for playback and after playback phase for F, LF and FL experiments varied between 0.8 and 4.8, and 5.2 and 12.7 calls, respectively, n = 16 for F and 15 for LF and FL experiments; Fig. 2i, j).

Discussion

We found that most of the ortolan bunting males we studied in Poland did respond much more weakly to foreign dialect songs but they responded strongly to both local songs (L) and songs which contained any local phrase (i.e. LF and FL songs). Switching from singing to calling mode response is treated as a stronger response in the ortolan bunting (Osiejuk et al. 2007a, b; Skierczyński et al. 2007; Skierczyński and Osiejuk 2010). Based on the approaching pattern and switching from singing to the stronger calling mode response, one may ordinate response strength on the strongest–weakest axis as follows: L > LF > FL > F. We found that adding both a local final phrase or a local initial phrase of song increased response strength in comparison to a full foreign song. These findings imply the rejection of both the full song and dialect cue hypotheses. Our results suggest that in the ortolan bunting from Poland (i.e. population with typical local dialect) both parts of the local song contain information important for discriminating between local and foreign song dialect.

One of the most important findings of this and our earlier study (Osiejuk et al. 2007a) is that ortolan bunting males from populations differing in song diversity (size of song repertoires, presence of so-called dialect cue, number of different song types per male within a population) differed in their response to songs with manipulated information contents (i.e. LF and FL songs). We also found that these birds were similarly responsive or non-responsive to L and F songs, respectively. The crucial difference concerned the response to hybrid songs. Such songs did not significantly increase male responsiveness in comparison to foreign songs in the case of the without-dialect population (Osiejuk et al. 2007a). In the case of the population with a local dialect, hybrid songs evoked only slightly weakened male response in comparison to full local songs (this study). The comparison suggests that males from dialect and non-dialect populations differ not only in song variation but also in song perception by males suggesting some asymmetric recognition error (Colbeck et al. 2010). Playback experiments conducted in Norway supported the full song hypothesis, which suggests that males are responsive only to known, full songs. These known, full songs are not necessarily only the songs they sang, but also songs they had heard and remembered through their lifetime during vocal interactions with other males (Osiejuk et al. 2007a). In this case, the observed local dialect, or its lack, is supposed to be only a by-product of learning (i.e. number of tutors, tutor choice, timing of learning, etc.) and male dispersal pattern. This hypothesis has strong support thanks to the detailed study on ortolan bunting dispersal in Norway (Dale et al. 2005, 2006). Several aspects of dispersal behaviour clearly changed in this population due to isolation and fragmentation, which finally strongly affected learning pattern and song diversity. In particular, within-season changes of territory between patches of suitable habitats due to scarcity of females affects the number of tutors a male is able to hear in comparison to the typical continuous populations where males settle and spend the whole season at one place (Dale et al. 2005, 2006; Osiejuk et al. 2008; Skierczyński 2009).

Experiments presented in this paper suggest that ortolan buntings from the Polish population perceive full local song phrases and both types of hybrid songs as local songs. This recognition means that they are able to recognize both initial and final local phrases within hybrid songs regardless of whether the remnant part of the song was derived from a foreign dialect. Such a result was quite surprising, as it supports neither the full song nor the dialect cue hypothesis. These findings suggest that a final phrase of the song in ortolan bunting is not a dialect cue (or at least it is not the only dialect cue) in Norway, where males may have several types of final phrases, but also not in Poland, where virtually all males share a single final phrase.

Our study raises some interesting questions at this stage. For example, if the final phrase is not a dialect cue, why is it shared among males? What factors maintain a lower variation of final phrases in comparison to initial phrases within populations? Regardless of whether there was a single (Poland) or several (Norway) final phrases within both studied populations, they were less variable in overall frequency than initial phrases (Osiejuk et al. 2003, 2005a; Łosak 2007). It has been suggested (Harris and Lemon 1974) that the degree of responsiveness to foreign dialect is affected by the amount of similarity in the two dialect forms presented. This similarity might result from such factors as distance between localities and song meme mutation level, isolation effect, etc. However, here we compared results of symmetric experiments, which were conducted in two populations using the same song manipulation programme. For example, we have no good arguments for why Norwegian song could be “more foreign” to Polish males than Polish song to Norwegian males. This is especially true about FL and LF hybrid songs, which in both studies were made up of common (for a particular population) syllable types. Therefore, we conclude that, despite the same level of similarity in the songs presented in the studied populations, there were significant differences in the response to FL and LF hybrid songs. We also concluded that this asymmetry in response to songs containing non-local phrases resulted from song variation across the studied populations.

Conclusions

Our results show that ortolan bunting males from different populations easily discriminate between full local and foreign songs but differ in perceiving hybrid songs with non-local phrases incorporated. This asymmetry in song perception likely resulted from vast differences in song diversity among those populations and differences in the exposure of young males to different songs. The mechanisms responsible for these differences, however, and further consequences remain unclear. Detailed study on functions of different song parts (see, e.g., Nelson and Poesel 2007) is required. The number of studies showing within species differences in LF dialect discrimination (Thompson and Baker 1993; Nelson and Soha 2004a), or more generally showing the asymmetric discrimination in signal perception within species, is increasing (Colbeck et al. 2010).