Sperm cells are among the most diverse cell types known, showing a large variation in size and shape across the animal kingdom (Pitnick 2009; Kahrl et al. 2021). Not only are sperm traits variable among species and higher-level taxa, there is also diversity in sperm traits, especially size, at various intraspecific levels. That is, sperm sizes vary among populations of the same species, among males within a population, among ejaculates from the same male, and among individual sperm within the same ejaculate (Ward 1998; Laskemoen et al. 2007, 2013; Pitnick 2009; Gohli et al. 2013; Hogner et al. 2013). The documentation and analysis of this diversity and variability at all levels of organization is fundamental to our understanding of the biology of sperm. We need to know the structure of this diversity before we can fully understand the functional and developmental processes shaping various sperm traits, as well as how and why these traits evolve and diversify. A prerequisite for this research is access to reliable measurements of sperm traits. It is also scientifically valuable to have preserved reference samples that can be re-analyzed when necessary, like specimens in natural history collections (Shaffer et al. 1998; Joseph 2011; Holmes et al. 2016; Lifjeld 2019).

The study of sperm cells dates back to the invention of the microscope, and Antonie van Leeuwenhoek (1632–1723) described the flagellate sperm of insects, dogs and humans as early as 1677. The older literature is rich in detailed drawings of sperm morphology in many taxa. Sperm morphology traits have been extensively used as diagnostic traits in zoological taxonomy (Wirth 1984; Jamieson and Leung 1991; Jamieson et al. 1995). In more recent years, following the developments in high-resolution microscopy and digital imaging, the quantitative measurements of sperm traits (morphometrics) have become more common and have facilitated more comparative, large-scale statistical analyses of sperm traits (e.g. Kahrl et al. 2021). This is also true for birds. Recent advances in this field, in combination with better phylogenetic reconstructions, have shown how sperm size and components may evolve at different rates in different clades and lineages (Omotoriogun et al. 2016; Supriya et al. 2016), and that the risk of sperm competition can be an important driver in the evolution of sperm traits (Eberhard 1996; Briskie et al. 1997; Immler et al. 2012; van der Horst and Maree 2014; Rowe et al. 2015; Durrant et al. 2020). Phenotypic divergences in sperm traits have been indicated as playing a role in speciation processes (Cramer et al. 2016, 2021) and in the delimitation of species (Lifjeld et al. 2016). However, given the fact that birds represent one of the better-studied groups of animals, with a wealth of natural history information available (Billerman et al. 2020), it is noteworthy that there still is a large proportion of bird species whose sperm is yet undescribed or unknown. Part of the reason is presumably that natural history museums with ornithological collections have not prioritized preserving sperm as part of their regular collection practices. 
Our institution, the Natural History Museum, University of Oslo (NHMO), is one of the few museums in the world with a special collection of preserved samples of avian sperm cells (Lifjeld 2019). The collection currently holds more than 13 000 samples of sperm from more than 700 species, mostly preserved in formalin (= formaldehyde in aqueous solution).

Comparative studies of sperm traits often require comparing samples that have been prepared in different fixatives and stored for varying periods of time. Durations range from sperm measured fresh to after several years in storage. Using a reliable fixative that preserves the cell over time and prevents shrinkage, degradation or other changes is therefore important (Briskie and Birkhead 1993; Jonmarker et al. 2006; Schmoll et al. 2016). Commonly used fixatives for animal tissues are aldehydes (e.g. formaldehyde and glutaraldehyde) and alcohols (e.g. methanol and ethanol). Both groups of fixatives penetrate the tissue and harden its components by dehydration and crosslinking proteins (aldehydes) or denaturing proteins (alcohols). Museum specimens are often fixed in formalin and later stored in ethanol. However, since formalin-fixed tissues are not ideal for DNA preservation and extraction (e.g. Thavarajah et al. 2012; Hykin et al. 2015), most museums have developed special collections for genomic research based on cryopreservation of fresh or ethanol-preserved tissues. Avian sperm samples have commonly been fixed and preserved in formalin for the analysis of sperm morphology (Lüpold et al. 2009; Helfenstein et al. 2010; Schmoll and Kleven 2011; Immler et al. 2012; Albrecht et al. 2013; Cramer et al. 2013; Rowe et al. 2015; Kleven et al. 2019). However, if such samples are also intended for DNA analyses, ethanol might be a preferred fixative and storage medium.

The aim of the present study is to investigate the effects of formalin and ethanol on the integrity and morphometrics of avian sperm samples. In a previous study, Schmoll et al. (2016) found that bird sperm stored in formalin showed no detectable length changes for a period of 13–14 months. In this study, we expand on their investigations by also including ethanol-preserved samples and sperm samples measured fresh.

Using a dataset of sperm samples from six passerine bird species from four taxonomic families, we addressed the following questions: (1) Fixation: Does the fixation process alter the length of sperm cells when sperm are fixed with formalin or ethanol? (2) Storage in fixatives: Does the length of sperm cells fixed in ethanol or formalin change over storage time? (3) Storage on microscope slides: Do sperm cell lengths remain consistent on slides stored dry over time? (4) Preservation efficacy of fixatives: Do fixation and storage in ethanol or formalin preserve the morphological integrity of sperm cells equally well?



Sperm samples were collected from Greenfinch Chloris chloris (n = 3), Hawfinch Coccothraustes coccothraustes (n = 2), Blue Tit Cyanistes caeruleus (n = 2), Great Tit Parus major (n = 2), House Sparrow Passer domesticus (n = 3), and Fieldfare Turdus pilaris (n = 3), caught in the Botanical Garden of the NHMO between April and June 2020. These passerine species are common and readily captured in the eastern part of Norway, with sperm lengths similar enough to allow measurement at the same magnification level (see Table 1 for data on sperm length and the coefficient of variation [CV] of sperm length within and between males for these species). Sperm was collected using non-invasive cloacal massage (Wolfson 1952; Laskemoen et al. 2013) during the early breeding stages. For longer-term questions, we examined sperm cells from three Fieldfares and three Hawfinches from the sperm collection of the NHMO (birds caught and sampled in 2018) and from Blue Tits, House Sparrows, Fieldfares, Greenfinches, and Great Tits sampled in 2007–2008 (also three samples for each species). All samples were accessioned to the Bird Collection at the NHMO and metadata is available via (2022). The datasets are deposited and accessible at Dryad (Grønstøl et al. 2022).

Table 1 Overview of average sperm cell lengths and CV of samples based on samples in the Avian Sperm Collection at the Natural History Museum of Oslo

Sample preparation

Upon sampling, in 2020, sperm was diluted using 30 µl of phosphate-buffered saline (PBS) and mixed well before being subdivided into three samples: (1) one fresh sample with no medium other than the PBS, measured immediately after collection (while still motile), (2) one sample fixed in 5% formaldehyde solution, and (3) one sample fixed in 96% ethanol. The latter two samples were fixed in 250 µl of medium, in 2 ml tubes with screw-on lids and sealing rings, at room temperature. The 2007–2008 and 2018 samples were collected similarly and stored at room temperature. Samples remained in their original fixatives throughout the course of the study. The 5% formaldehyde solution was made by mixing a 37% formaldehyde solution (free from acid and stabilized with 10% methanol and calcium carbonate for histology [Merck Millipore 103999]) with PBS.
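The dilution of the 37% stock to a 5% working solution follows the usual relation C1 × V1 = C2 × V2. A minimal sketch of that arithmetic is given here, assuming a 10 ml batch; the batch volume and the function name `dilution_volumes` are illustrative assumptions and are not taken from the study.

```python
def dilution_volumes(stock_pct, target_pct, final_ml):
    """Return (stock_ml, diluent_ml) for a simple C1*V1 = C2*V2 dilution."""
    if not 0 < target_pct <= stock_pct:
        raise ValueError("target concentration must be > 0 and <= stock concentration")
    stock_ml = target_pct * final_ml / stock_pct
    return stock_ml, final_ml - stock_ml


# Assumed batch volume of 10 ml (hypothetical; the text does not state one).
stock_ml, pbs_ml = dilution_volumes(stock_pct=37.0, target_pct=5.0, final_ml=10.0)
print(f"{stock_ml:.2f} ml of 37% stock + {pbs_ml:.2f} ml of PBS")
```

The same function applies to any batch size, since only the ratio of the two concentrations matters.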

Preparation of slides

To analyze the effect of fixation, we compared length measurements from fresh (live) sperm to samples recently fixed in formalin or ethanol. Fresh samples were prepared first: the sperm sample collected in PBS was mixed, a drop containing live sperm was pipetted into a 4-chambered Leja slide (chamber depth: 20 µm, chamber volume: 5 µl), and photos of sperm cells were taken immediately, before the sample dried out. For fixed samples, 15 µl of sample was applied to a standard glass microscope slide in 5 stripes of 3 µl each. Slides were left to dry overnight and gently rinsed with distilled water before imaging. Cover slips were not used on the slides. For the fixed samples, photos for measurements were taken within two days of collection.

Digital imaging and sperm cell measurements

All images were taken using a Leica DM6000B microscope connected to a digital camera (Leica DFC420). Photos of sperm cells deemed complete were taken at 320× magnification, and sperm cell length was measured on the photos using the SegmentLine tool in Leica Application Suite software v4.13. Total sperm length was measured as the sum of three separate segment lengths (head, midpiece, and tail). For each measurement occasion, we measured the segments (approximately) to the nearest pixel (0.14 µm) for 10 sperm cells per male. Measuring 10 sperm cells gives an adequate estimate of the mean sperm length for the sample (Laskemoen et al. 2007). One to ten photos were taken per slide to measure 10 sperm cells. Photo series within slides were taken with identical camera and microscope settings (e.g. lighting, exposure, and saturation).
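The measurement scheme described above (total length as the sum of head, midpiece, and tail, summarized over 10 cells per male) can be sketched as follows. This is an illustration only: the segment values are made up, the function names are hypothetical, and the 95% confidence limits use the t quantile for exactly 10 cells (df = 9), which is an assumption about how such limits might be computed rather than a description of the authors' software.

```python
from math import sqrt
from statistics import mean, stdev

T_CRIT_DF9 = 2.262  # two-sided 95% t quantile for df = 9 (i.e. n = 10 cells)


def total_length(head, midpiece, tail):
    """Total sperm length (µm) as the sum of the three measured segments."""
    return head + midpiece + tail


def summarize_male(cells):
    """Summarize one male from a list of (head, midpiece, tail) tuples in µm."""
    if len(cells) != 10:
        raise ValueError("T_CRIT_DF9 is only valid for exactly 10 cells")
    totals = [total_length(*cell) for cell in cells]
    m = mean(totals)
    sd = stdev(totals)  # sample standard deviation
    half_width = T_CRIT_DF9 * sd / sqrt(len(totals))
    return {
        "n": len(totals),
        "mean": m,
        "cv_percent": 100 * sd / m,  # within-male coefficient of variation
        "ci95": (m - half_width, m + half_width),
    }


# Made-up segment lengths (µm) for ten cells from one hypothetical male.
cells = [(13.0, 70.0, 16.0 + 0.2 * i) for i in range(10)]
summary = summarize_male(cells)
print(summary)
```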

The Leica Application Suite software stores the position of the photos on the slides as coordinates in the metadata information. When remeasuring individual sperm cells for assessing the effect of dry storage, this coordinate information was used to identify previously measured sperm cells, which were then rephotographed and remeasured. This was done blindly in the sense that previous length measurements were removed from view before doing the second photo and measurement round.

In all samples except for one, for each measurement round, 10 sperm cells were found and measured on a single slide. One of the Great Tit samples (NHMO-BI-103463) had a low sperm count. To get 10 measurable cells for this sample, we had to make two slides for the fourth ethanol and formalin Storage times and three slides for the third formalin Storage time. Preparation of slides, digital imaging of sperm cells, and measurements of sperm cells were carried out by the same person (GG), with the exception of sperm cell measurements made in 2007 and 2008, which were made by two other persons (one measured the Greenfinches and the Fieldfares, and the other person the Blue Tits, Great Tits and House Sparrows).

To test intermediate and longer-term effects, repeated measurements were performed on both fixatives 45, 146, and 227 days after the initial samples were acquired (for the samples collected in 2020). A new slide was prepared for each of these measurement rounds. We measured the length of 10 sperm cells for each of 15 individual males of six species in both fixatives at these four time points. Photos for measurements of these samples were taken within three days of slide preparations.

To test how well sperm cells were maintained in formalin over a longer time span, in 2021, we prepared new slides of six samples previously measured in 2018 and 15 samples previously measured in 2007–2008. These were measured and compared with earlier measurements.

To assess the effect of dry storage on microscope slides on individual sperm, we remeasured 10 individual sperm cells on each of 10 slides that had been measured six months earlier (samples from 2021 of two Greenfinches, one Hawfinch, one Blue Tit, one Great Tit, three House Sparrows, and two Fieldfares).

Samples were not blinded with regard to Storage time and Fixative. Because live sperm cells in chamber slides look different from dry slides, it was impossible to blind live versus fixed sperm samples. Further, the sperm samples were measured shortly after the slides were made at each of the four Storage times, so the measurer knew which batch he was working on. However, care was taken to measure each sperm in a standardized way and blind to previous measurements.

Scoring of head damage

To estimate how well the sperm cells were preserved, we scored the frequency of sperm cells with head damage after fixation in samples collected in 2021. The scoring was based on photos of sperm cells from 15 samples of six species in both preservation media. Photos were taken from the first batch of sample slides approximately six weeks after the slides were prepared (i.e., head damage may have occurred during fixation of the sample and/or during storage on the slide for approximately six weeks). Prior to regular measuring of sperm length, sperm cells were (subjectively) screened to exclude cells that appeared ruptured or deformed. However, when taking photos for the head damage assessment, a random sample of sperm cells was selected to obtain a representative sample of cell damage. Digital photos of the sperm cells to be scored were taken starting from the upper left side of the slide and moving right. All sperm cell heads present within the microscope view were photographed. When the end of the slide was reached, a new line of microscope view below and parallel to the previous line was photographed. This was repeated until at least 100 sperm cell heads on the slide had been photographed (or all cells on the slide, for slides with < 100 cells).

The scoring was done starting from the left side of the photo, scoring sperm cell heads while moving to the right. The number of cells scored in each slide was standardized to the number of scorable heads found in the slide with the fewest sperm cells, which was 84.

Sperm head damage was based on visible damage and scored using three categories: (1) acrosome damage (acrosome paled, degraded, or missing), (2) nucleus damage (nucleus distended or burst, or acrosome and nucleus both missing), or (3) head undamaged. We scored two males for each of the species Blue Tit, Great Tit and Hawfinch (i.e., 168 cells for each of these species), and three males each for the Fieldfare, Greenfinch, and the House Sparrow (i.e., 252 cells for each of these species).
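The cell counts follow directly from standardizing every slide to 84 scored heads. A short sketch of that bookkeeping, using the numbers of males per species given in the text (the variable names are illustrative), shows how the per-species counts and the total per fixative arise:

```python
CELLS_PER_MALE = 84  # scorable heads on the slide with the fewest sperm cells

# Number of males scored per species, as stated in the text.
males_per_species = {
    "Blue Tit": 2,
    "Great Tit": 2,
    "Hawfinch": 2,
    "Fieldfare": 3,
    "Greenfinch": 3,
    "House Sparrow": 3,
}

cells_per_species = {
    species: n_males * CELLS_PER_MALE
    for species, n_males in males_per_species.items()
}
total_males = sum(males_per_species.values())
total_cells_per_fixative = sum(cells_per_species.values())

print(cells_per_species)
print(total_males, total_cells_per_fixative)
```

With 15 males in total, this gives 1260 scored cells in each fixative.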


For investigations related to length of live and fixed sperm cells, we used general linear mixed models assuming Gaussian errors. These analyses were carried out in IBM/SPSS Statistics version 27. To compare frequencies of sperm cells with head damage we used generalized linear mixed models with logit link functions. These analyses were run in R v 4.1.1 with the package lme4 (Bates et al. 2015). Further, we used Cohen’s d-values to estimate effect sizes, and Cohen’s benchmarks (Cohen 1988) to evaluate effect sizes. These guidelines characterize effect sizes around d = 0.2 as small, around d = 0.5 as intermediate, and around d = 0.8 as large effects. Cohen’s d-values were calculated from parameter estimates given in the outputs, following procedures given by Nakagawa and Cuthill (2007).
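As a rough illustration of how a test statistic maps onto Cohen’s benchmarks, the sketch below converts a t-value to d and labels the result. Two assumptions should be noted: the text does not state which conversion was applied to the mixed-model output, so the textbook conversion for two independent groups is used here as a stand-in, and the hard cut-offs in `benchmark` are a simplification, since the benchmarks are described as approximate ("around" 0.2, 0.5, and 0.8). Both function names are hypothetical.

```python
from math import sqrt


def d_from_t(t, n1, n2):
    """Cohen's d from an independent two-group t-value (stand-in conversion)."""
    return t * sqrt((n1 + n2) / (n1 * n2))


def benchmark(d):
    """Label |d| against Cohen's benchmarks, using hard cut-offs as an assumption."""
    size = abs(d)
    if size < 0.2:
        return "below small"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "intermediate"
    return "large"


# Hypothetical example: t = 2.0 with 10 observations in each group.
d_example = d_from_t(2.0, 10, 10)
print(round(d_example, 3), benchmark(d_example))
```

Under these cut-offs, any d-value well below 0.2 falls in the "below small" class.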

We do not specifically report measurement repeatability, as measurement repeatability sets an upper bound on the other measures of repeatability reported here. Other studies have shown that repeatability of such measurements is high and measurement error low (Laskemoen et al. 2007, 2010), and that measurement repeatability by the same observer is high (Lifjeld et al. 2016; Cramer et al. 2021).


Effect of fixation: sperm cell length measurements

To investigate whether fixation of sperm cells in ethanol or formalin causes immediate changes in sperm cell length, we compared sperm cell length in seven samples (from four species) shortly after fixation in ethanol and formalin with length measurements of live sperm cells (Fig. 1). We examined this in a general linear mixed model, entering Fixation status as a fixed factor and including Species, Male ID (nested within Species), and Subsample ID (nested within Male ID and Species) as random factors. The Subsample ID described the 10 measurements made for a male at each sample occasion (Male ID by Fixative by Time). This grouping corresponded to Slide ID in all except two sampling occasions of a thin Great Tit sample, where we needed to make more than one slide to find 10 measurable sperm cells. The effect of fixation was minimal (F2,12 = 0.70, P = 0.51), and the Cohen’s d estimates from paired comparisons of the three levels were low (live cells vs. formalin = 0.0053; live cells vs. ethanol = 0.0191; formalin vs. ethanol = 0.0138). Overall, there were no clear initial effects of fixation on sperm length when compared with measurements of live sperm cells.

Fig. 1

Sperm lengths before and after fixation in ethanol and formalin. This figure shows length of fresh (live) sperm cells and length of sperm cells shortly after fixation in ethanol or formalin. Individual males are represented with different colors, and error bars denote 95% confidence limits around the mean length of 10 measured sperm cells per treatment per male

Effects of storage time in fixative: sperm cell length measurements

To estimate effects of Fixative and Storage time on sperm cell length, we entered Fixative (Ethanol or Formalin) as a fixed categorical factor and Storage time as a fixed covariate in a general linear mixed models analysis. Time of measurement was entered with the levels one to four representing the four sampling occasions. Species, Male ID (nested within Species) and Subsample ID (nested within Male ID and Species) were included as random factors. Fixative had a very small and non-significant effect (t = 1.02, Cohen’s d = 0.003, F1,102 = 1.05, P = 0.31). The effect of Storage time appeared to differ between Fixatives (Storage time × Fixative interaction: F1,102 = 3.98, P = 0.049). Storage time did not have a substantial effect on samples in ethanol (t = 0.03, Cohen’s d < 0.001, P = 0.98, when ethanol was the reference level). The average change in sperm cell length for the 15 individuals over 227 days of Storage time was small: in ethanol it increased by 0.13 µm (0.05%; about 1 pixel), SD = 1.07. For samples in formalin, Storage time had a very small, though significant, effect (t = 2.79, Cohen’s d = 0.004, P = 0.006, when formalin was the reference level). This constituted a decrease of 0.69 µm (0.56%), SD = 1.0 (Fig. 2).

Fig. 2

Changes in sperm length with time of storage in ethanol and formalin. Sperm from the same samples were measured at four time points (1: 0 days after fixation; 2: 45 days; 3: 146 days; 4: 227 days) for (A) Blue Tit, (B) Greenfinch, (C) House Sparrow, (D) Fieldfare, (E) Hawfinch, and (F) Great Tit. Ten cells were measured in each sample at each of the time points. Error bars signify 95% confidence limits. Individual males are represented with different colors. Note that the y-axis scale differs across panels

We further analyzed if sperm cell length of Hawfinch and Fieldfare changed over a longer time span of storage in formalin. New slides were made of samples measured three years earlier, and old and new measurements were compared in a general linear mixed model with Time of measurement entered as a fixed categorical factor. Including the three random terms Species, Male ID (nested within species), and Subsample ID (nested within Male ID and Species) led to convergence problems. We therefore here report results with only Species and Subsample ID as random effects (from comparing variance attributable to random effects and comparing test statistics, this should be conservative with respect to Type I error, and based on AIC, better explained the data). There was no discernible effect of Time of measurement on sperm length (t = −0.26, Cohen’s d ≪ 0.001, F1,9 = 0.016, P = 0.90; Fig. 3a). Results were similar in the slightly less-well-fit model using Species and Male ID as random effects. Hence, the sperm cells seemed to keep well in formalin over three years.

Fig. 3

Effect of long-term storage in formalin on sperm length after (A) three years of storage in formalin (2018–2021), and (B) 13–14 years of storage in formalin (2007/2008–2021). Mean values of percentage change over storage time of 10 measured sperm cells are plotted with 95% confidence limits. Three samples were measured per species. The dotted line represents identical values for the two measurement occasions

We also examined if length of sperm cells changed across longer storage times. We compared old and new measurements of three samples of each of five species that had been stored in formalin over 13–14 years (i.e., initially measured in 2007 and 2008 and remeasured in 2021). Again Time of measurement was entered as a fixed categorical factor. Species, Male ID (nested within species), and Subsample ID (nested within Male ID and Species) were entered as random terms. Overall there was a decline in sperm cell length over Time (t = 2.835, Cohen’s d = 0.016, F1,14 = 8.04, P = 0.013). The average reduction in sperm cell length from 2007/2008 to 2021 was 0.93%. Though species-level differences were not explicitly examined, the length reduction seemed most pronounced in the Blue Tit, the Great Tit, and the House Sparrow, whereas the Fieldfare and the Greenfinch appeared to maintain their sperm cell length across the storage duration (Fig. 3b). It was, unfortunately, not possible to control for between-observer variation in this analysis, so we cannot rule out that the observed changes may have been inflated by observer differences.

Dry storage on microscope slides: sperm cell length measurements

To check whether individual sperm cell lengths on microscope slides changed across a six-month period of dry storage, we entered Time of measurement as a fixed categorical factor, and Species, Male ID (nested within species), and Sperm ID (nested within Male ID and Species) as random terms. Overall there was a significant decline in sperm cell length over Time (t = 2.75, F1,99 = 7.53, P = 0.007). This reduction was, however, small: average decrease in sperm length was 0.2 µm or 0.18%, with a Cohen’s d = 0.005, i.e. a reduction approximating the measurement precision. ESM 1 shows the first and second measurements plotted against each other for the six species.
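To put a paired change of this size in context, the sketch below computes the mean change between two measurement rounds of the same cells, expresses it as a percentage of the first-round mean, and converts it to pixels using the 0.14 µm pixel size given in the methods. The measurement values and the function name `paired_change` are made up for illustration; this is not the mixed-model analysis reported above.

```python
PIXEL_UM = 0.14  # size of one pixel in µm, from the imaging setup described


def paired_change(first, second):
    """Return (mean change in µm, change in % of first-round mean, change in pixels)."""
    if len(first) != len(second) or not first:
        raise ValueError("need two equally long, non-empty measurement lists")
    diffs = [b - a for a, b in zip(first, second)]
    mean_diff = sum(diffs) / len(diffs)
    pct = 100 * mean_diff / (sum(first) / len(first))
    return mean_diff, pct, mean_diff / PIXEL_UM


# Made-up lengths (µm) of the same three cells measured six months apart.
first_round = [100.0, 110.0, 120.0]
second_round = [99.8, 109.8, 119.8]
mean_diff, pct, pixels = paired_change(first_round, second_round)
print(f"{mean_diff:.2f} µm, {pct:.2f}%, {pixels:.1f} pixels")
```

A mean decrease of 0.2 µm corresponds to roughly one and a half pixels at this resolution, which is why a change of that size is close to the measurement precision.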

Preservation efficacy: examining sperm cell head damage after fixation in ethanol and formalin

ESM 2 shows examples of damaged and normal heads. Sperm cells stored in ethanol had a much higher proportion of damaged heads than sperm cells stored in formalin (Fig. 4). We tested this using GLMMs with logit link functions. We created one model for nucleus damage and one for acrosome damage. Species and Fixative were fixed effects. We did not test for an interaction between Species and Fixative, nor did we treat Species as a random term, because attempts to do so caused convergence problems. Similarly, a random term of Subsample ID nested within Male ID led to convergence problems; we therefore report models with only Subsample ID as a random effect (since these models were conservative with respect to Type I error, based on comparing variance attributable to random effects and comparing test statistics). Results again were similar in a model including Male ID as a random effect. When testing the total number of damaged cells by fixative, we found highly significant differences in proportions (Table 2; nucleus damage comparison: z = 3.80, P < 0.001; acrosome damage comparison: z = 6.05, P < 0.001). Frequencies of damaged-to-undamaged cells were far higher in ethanol than in formalin: overall, 888 (70.5%) of 1260 sperm cells had acrosome damage in ethanol versus 38 (3.0%) of 1260 in formalin. Further, 296 (23.5%) of 1260 sperm cells had nucleus damage in ethanol versus 3 (0.2%) of 1260 in formalin.
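The reported percentages can be reproduced from the raw counts. The sketch below also computes a crude odds ratio for acrosome damage; this pooled ratio ignores Species and the random effects and is therefore only a descriptive illustration, not the GLMM estimate reported in Table 2. The function names are hypothetical.

```python
TOTAL_CELLS = 1260  # cells scored per fixative

# Damaged-cell counts per fixative, as reported in the text.
damaged = {
    "ethanol": {"acrosome": 888, "nucleus": 296},
    "formalin": {"acrosome": 38, "nucleus": 3},
}


def percent(count, total=TOTAL_CELLS):
    """Percentage of damaged cells, rounded to one decimal place."""
    return round(100 * count / total, 1)


def crude_odds_ratio(count_a, count_b, total=TOTAL_CELLS):
    """Pooled odds of damage in group A relative to group B (no covariates)."""
    odds_a = count_a / (total - count_a)
    odds_b = count_b / (total - count_b)
    return odds_a / odds_b


for fixative, counts in damaged.items():
    for damage_type, count in counts.items():
        print(fixative, damage_type, percent(count))

acrosome_or = crude_odds_ratio(
    damaged["ethanol"]["acrosome"], damaged["formalin"]["acrosome"]
)
print(round(acrosome_or, 1))
```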

Fig. 4

Effect of fixatives on morphological integrity of sperm cells. This panel shows the percentage of damaged sperm cell heads found in ethanol and formalin samples. The head damage categories were (A) acrosome damage (acrosome paled, degraded, or missing) and (B) nucleus damage (nucleus distended or burst, or acrosome and nucleus both missing). In total, 1260 sperm cells were scored for head damage

Table 2 Effect of fixative type on sperm integrity


We found no significant effects of the fixation process when comparing length of fresh (living) sperm cells with cells fixed in formalin and ethanol. Nor did we find notable or consistent differences in sperm cell length between samples stored in formalin and ethanol over a period of 227 days. The small differences in sperm length observed between the first and fourth Storage time points (in ethanol: 0.13 µm or 0.05%, and in formalin: 0.69 µm or 0.56%) may be more related to sampling variation within ejaculates than to systematic changes in sperm morphology due to storage and fixatives, as most individuals did not show a consistent directional trend but rather variation around a mean sperm length (Fig. 2). Further, we did not find significant changes in sperm cell length in the Fieldfare and the Hawfinch after storage in formalin for three years. Sperm cells did, however, shorten significantly over a longer storage duration of 13–14 years, with an average reduction of 0.93% across five species. The reduction seemed to be more pronounced in the Blue Tit, the Great Tit, and the House Sparrow than in the Fieldfare and the Greenfinch. This reduction could have been due to shrinkage or tail-tip breakage. As the sperm cell measurements in 2007–2008 and 2021 were made by different people, we cannot exclude the possibility that this difference to some extent was due to between-measurer error. The consistent differences among males apparent in Fig. 2 suggest that between-observer error might be relatively minor compared to biological differences among individual birds.

Our findings are important for meta-analyses and reviews that combine data from sources using different procedures for sperm measurements, including measurements of fresh sperm (e.g. Fuentes-Albero et al. 2021; Kahrl et al. 2021). The results are in line with those of Schmoll et al. (2016), who found no evidence that the total length of sperm cells changed following approximately one year of storage in formalin. On the other hand, sperm head integrity was much better maintained in formalin than in ethanol, implying that the latter should be avoided e.g. in studies focusing on sperm head morphology (see below).

We also found that sperm cells initially fixed in formalin seemed to keep well in dry storage on slides for six months. Length of individual sperm cells measured twice with an interval of six months showed a significant decrease, but the magnitude of the decrease was small and close to the measurement precision (0.2 µm or 0.18%, with a Cohen’s d = 0.005).

One clear difference between the fixatives was that the proportion of sperm cells with head damage was much higher in ethanol than in formalin. Overall, 71% of 1260 sperm cells had acrosome damage in ethanol versus 3% of 1260 sperm cells in formalin. Hence, more effort had to be spent on locating intact sperm cells in ethanol than in formalin (i.e., cells that were not ruptured, missing parts of the acrosome, or showing other deformities). Sperm cells with paled heads were deemed measurable as long as the acrosome was not otherwise damaged or lost. We have assumed that the propensity of cells to degrade was not size related. If it were size related, a systematic bias might have been introduced to the length measurements.

It is known that the use of different staining methods or fixative types may cause sperm heads to swell by penetrating their membranes and influencing the osmotic balance (Czubaszek et al. 2019). Presumably, it was such an osmotic effect that caused the sperm cell heads to be more prone to swelling and bursting in ethanol than in formalin.

There are incentives to find good alternative fixation agents to formalin. Formaldehyde has allergenic, carcinogenic, and mutagenic properties (e.g. Pandey et al. 2000; Aalto‐Korte et al. 2008). Formalin fixation also damages DNA (Hykin et al. 2015), and an advantage of using ethanol as a fixative is that DNA is much more accessible for downstream analyses than when using formalin. We found that formalin clearly was the best fixative in terms of maintaining the integrity of sperm cell heads. However, the absence of consistent differences in sperm cell length between the two fixatives indicates that samples stored in either formalin or ethanol can be used for comparisons that involve sperm cell length measurements, as long as care is taken to avoid measuring sperm cells that appear abnormal.