Untested assumptions about within-species sample size and missing data in interspecific studies
- Cite this article as:
- Garamszegi, L.Z. & Møller, A.P. Behav Ecol Sociobiol (2012) 66: 1363. doi:10.1007/s00265-012-1370-z
Phylogenetic comparative studies rely on species-specific data that often contain missing values and/or differ in sample size among species. These phenomena may violate the statistical assumption that variance in sampling effort has no non-random component. A major reason this assumption is often not fulfilled is that the probability of being sampled (i.e., being captured or observed) may depend on species-specific characteristics. Here, we test this assumption by using information on within-species sample sizes and missing data from five independent comparative datasets of European birds. First, we show that the two estimates of data availability (missing values and within-species sample size) are positively correlated and are associated with overall research effort (the number of papers published). Second, we demonstrate biologically meaningful relationships between data availability and phenotypic traits. For example, population size, risk-taking, and habitat specialization independently predicted within-species sample size. The key determinants of missing data were population size and distribution range. However, data availability was not structured by phylogenetic relationships. These results indicate that the accuracy of sampling is repeatable and distributed non-randomly among species, as several species-specific attributes determined the probability of observation. Therefore, data availability seems to be a species-specific trait that can be shaped by ecology, life history, and behavior. Such relationships raise issues about non-random sampling, which requires attention in comparative studies.