Human Genetics, Volume 119, Issue 1, pp 212–219

Y-chromosomes and the extent of patrilineal ancestry in Irish surnames

Authors

  • Brian McEvoy
    • Smurfit Institute of Genetics, Trinity College
Original Investigation

DOI: 10.1007/s00439-005-0131-8

Cite this article as:
McEvoy, B. & Bradley, D.G. Hum Genet (2006) 119: 212. doi:10.1007/s00439-005-0131-8

Abstract

Ireland has one of the oldest systems of patrilineal hereditary surnames in the world. Using the paternal co-inheritance of Y-chromosome DNA and Irish surnames, we examined the extent to which modern surname groups share a common male-line ancestor and the general applicability of Y-chromosomes in uncovering surname origins and histories. DNA samples were collected from 1,125 men, bearing 43 different surnames, and each was genotyped for 17 Y-chromosome short tandem repeat (STR) loci. A highly significant proportion of the observed Y-chromosome diversity was found between surnames, demonstrating their demarcation of real and recent patrilineal kinship. On average, a man has a 30-fold increased chance of sharing a 17 STR Y-chromosome haplotype with another man of the same surname, but the extent of congruence between the surname and haplotype varies widely between surnames, and we attributed this to differences in the number of early founders. Some surnames such as O’Sullivan and Ryan have a single major ancestor, whereas others like Murphy and Kelly have numerous founders, probably explaining their high frequency today. Notwithstanding differences in their early origins, all surnames have been extensively affected by later male introgression. None examined showed more than about half of current bearers still descended from one original founder, indicating dynamic and continuously evolving kinship groupings. Precisely because of this otherwise cryptic complexity there is a substantial role for the Y-chromosome and a molecular genealogical approach to complement and expand existing sources.

Introduction

The paternally inherited Y-chromosome has an extensive track record in exploring various aspects of human population history, particularly large-scale patterns of migration (see Jobling and Tyler-Smith 2003). In many societies surnames are also paternally inherited and this theoretical co-inheritance of Y-chromosome and surname has been exploited in several population genetic studies. The geographic origin of Gaelic Irish surnames, for example, was used to describe significant differences between the Y-chromosome complements of eastern and western regions of the island (Hill et al. 2000). The same study also noted significant differentiation of Y-chromosomes from Irish men with English or Scottish surnames versus those with Gaelic surnames, an observation which pointed to a general congruence of surnames with patrilineal ancestry. There is a clear, and long predicted, further potential for the Y-chromosome to reconstruct the origin and history of individual surnames (Jobling 2001). However, this niche role has not been extensively exploited as, until relatively recently, there was a dearth of high-resolution markers necessary to characterise fine-scale Y-chromosome relationships. A limited number of low-resolution studies have been undertaken including an examination of Korean surname lineages using the 49a/Taq1 polymorphism (Kim et al. 1999) and a study of the English surname ‘Sykes’ using four microsatellite loci (Sykes and Irven 2000).

Although paternally inherited hereditary surnames are the norm across most of Europe, early medieval Ireland was probably the first culture to adopt their usage, with some appearing in the early 10th century AD. In addition, Irish surname nomenclature is strongly paternalistic. Virtually all indigenous (or Gaelic) Irish surnames include the prefix Mac or Ó, meaning ‘son of’ or ‘grandson/descendant of’, respectively, followed by the name of an ancestor; for example ‘O’Brien’ meaning grandson/descendant of Brian. This system inherently implies a single male ancestor and consequently a single Y-chromosome lineage within a modern surname population. However, Gaelic Irish surnames were later extensively transformed by a change in vernacular language, from original Irish to English, that accompanied England’s gradual conquest of Ireland from the late 12th century AD. Most names were haphazardly and inconsistently converted to English language forms (Woulfe 1923; MacLysaght 1985a). The exceptional age of Irish surnames together with this extensive anglicisation makes it uncertain whether modern bearers of the same surname are linked by any shared patrilineal kinship. Our study examines Y-chromosome diversity in 1,125 men, bearing over 40 surnames, to answer this question and, in so doing, assess the potential of Y-chromosomes to reconstruct surname history through molecular genealogy.

Materials and methods

Samples

From volunteers, 1,125 usable DNA samples were collected in accordance with the principle of informed consent. Most were assembled from postal requests to relevant surname bearers selected from telephone directories. Irish surnames often show distinct geographic distributions, so attempts were made, in so far as possible, to reflect these in the major surname samples (drawing on the earliest records of distribution dating to the mid-19th century). Volunteers undertook self-collection of buccal cells using a sterile nylon cytology brush, from which DNA was later extracted using a standard phenol/chloroform protocol with Proteinase-K digestion. The participation rate in response to postal requests was 31.8%. Thirty-one of these samples were first reported elsewhere (Moore et al. 2006). A general Irish population sample of 765 Y-chromosomes (Moore et al. 2006) was used as a background control group.

Laboratory procedures

Each Y-chromosome was genotyped for 19 rapidly mutating short tandem repeat (STR) loci (DYS19, DYS385A, DYS385B, DYS388, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS434, DYS435, DYS436, DYS437, DYS438, DYS439, DYS460, DYS461, DYS462) in three multiplex polymerase chain reactions (PCR) using fluorescently labelled primers and polyacrylamide gel electrophoresis, essentially as described by Bosch et al. (2002). As the repeat length of DYS389II contains DYS389I, the latter was subtracted from the former to give the derived DYS389B score (Rolf et al. 1998). The homologous products DYS385A and DYS385B cannot be differentiated using these primers and were therefore not included in subsequent analyses. As the majority of the Y-chromosome does not recombine, the remaining 17 STR loci together form highly informative and discriminating single haplotypes.
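The reduction from 19 genotyped loci to the 17-locus haplotype can be sketched as follows. This is only an illustration of the two steps described above (deriving DYS389B by subtraction and dropping the DYS385 pair); the function name, the dictionary input format and the locus ordering are assumptions made for the example, not the authors' software.

```python
# Order of the 17 analysed loci (ordering is an arbitrary choice for this sketch).
ANALYSED_LOCI = [
    "DYS19", "DYS388", "DYS389I", "DYS389B", "DYS390", "DYS391", "DYS392",
    "DYS393", "DYS434", "DYS435", "DYS436", "DYS437", "DYS438", "DYS439",
    "DYS460", "DYS461", "DYS462",
]


def build_haplotype(raw_scores):
    """Turn raw repeat scores (locus name -> repeat count) into a 17-locus tuple.

    DYS389II contains DYS389I, so the derived DYS389B score is the difference.
    DYS385A/DYS385B are ignored because they cannot be told apart.
    """
    scores = dict(raw_scores)
    scores["DYS389B"] = scores["DYS389II"] - scores["DYS389I"]
    return tuple(scores[locus] for locus in ANALYSED_LOCI)
```

Representing each haplotype as a tuple makes it hashable, so identical haplotypes can be counted directly in later steps.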

Samples were also genotyped for up to six binary, generally single nucleotide, polymorphisms (M269, M170, M26, SRY-1532, M35, YAP) in a hierarchical manner using PCR/restriction fragment length polymorphism (RFLP) assays. These are more slowly evolving than STR loci and consequently divide individual haplotypes into broader classifications of greater evolutionary time depth termed haplogroups (Y-Chromosome Consortium 2002; Jobling and Tyler-Smith 2003).

Statistical analysis

The correspondence of a surname label to real Y-chromosome genetic division and thus common patrilineal ancestry was investigated using several approaches. The elevated frequencies of distinct Y-chromosome haplotypes in different surnames are the most basic expected legacy of surname foundation by a single or limited number of men. The probability of a man sharing an identical 17 STR Y-chromosome haplotype with another individual from any defined group (such as a surname) is a simple function of that haplotype’s group frequency. Match probability scores were calculated for each individual and averaged within and across surnames using purpose-written PERL software.
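A minimal sketch of the match probability calculation, written in Python rather than the PERL used in the study. It assumes that each man is compared with the other members of his group (so he is excluded from his own comparison set); the text does not state the exact formula, and the function name is hypothetical.

```python
from collections import Counter


def match_probability(haplotypes):
    """Average chance that a man shares his haplotype with another
    randomly chosen man from the same group.

    `haplotypes` is a list of hashable haplotypes (for example tuples of
    repeat scores), one per sampled man.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two men to compute a match probability")
    counts = Counter(haplotypes)
    # For each man: copies of his haplotype carried by the other n - 1 men.
    per_individual = [(counts[h] - 1) / (n - 1) for h in haplotypes]
    return sum(per_individual) / n
```

Applied to one surname sample this gives a within-surname score; applied to a general population sample it gives the background value, and the ratio of the two is the fold increase.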

The congruence of surname boundaries with Y-chromosome STR diversity was also examined through an analysis of molecular variance (AMOVA) (Excoffier et al. 1992). The method partitions the Y-chromosome variance of the sample universe into proportions found within and between surnames. Variance components were calculated using ARLEQUIN (version 2.000) (Schneider et al. 2000). The relationship between individual haplotypes for these calculations was defined as the sum of the squared difference in the number of repeat units at each locus. Significance was gauged by randomly permuting individuals across surname populations 10,000 times to generate a distribution of values under the null hypothesis.
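The variance partition and permutation test can be illustrated with a simplified, single-level AMOVA. The sketch below follows the usual sums-of-squared-deviations formulation for a distance matrix and is not ARLEQUIN's implementation; the function names and the default number of permutations are assumptions chosen for the example.

```python
import random


def squared_distance(a, b):
    """Sum over loci of the squared difference in repeat number."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def phi_st(haplotypes, surnames):
    """Proportion of haplotype variance found between surname groups."""
    n_total = len(haplotypes)
    groups = {}
    for index, name in enumerate(surnames):
        groups.setdefault(name, []).append(index)
    n_groups = len(groups)

    def ssd(indices):
        # Sum of squared deviations from pairwise distances within a set.
        total = 0
        for pos, i in enumerate(indices):
            for j in indices[pos + 1:]:
                total += squared_distance(haplotypes[i], haplotypes[j])
        return total / len(indices)

    ssd_total = ssd(list(range(n_total)))
    ssd_within = sum(ssd(members) for members in groups.values())
    ssd_among = ssd_total - ssd_within
    var_within = ssd_within / (n_total - n_groups)
    # Effective group size, correcting for unequal sample sizes.
    n0 = (n_total - sum(len(m) ** 2 for m in groups.values()) / n_total) / (n_groups - 1)
    var_among = (ssd_among / (n_groups - 1) - var_within) / n0
    return var_among / (var_among + var_within)


def permutation_test(haplotypes, surnames, n_perm=1000, seed=0):
    """Shuffle individuals across surnames to build the null distribution."""
    rng = random.Random(seed)
    observed = phi_st(haplotypes, surnames)
    shuffled = list(surnames)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if phi_st(haplotypes, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

The sketch requires at least two surname groups and some haplotype variation; with identical haplotypes throughout, the final ratio is undefined.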

Finally, a Partial Mantel test (Smouse et al. 1986) was employed to assess the correlation of surnames and Y-chromosomes while controlling for the effect of a third variable, geography. The relationships between individual samples are set out in a separate matrix for each of the three factors: surname, Y-chromosome and geography. The matrices describing the relationship of surname and Y-chromosome between individuals were based on a simple match/no match binary description. The matrix of geographic relationships was defined according to the distance (in kilometres) between the paternal county of origin for each individual. The P value was determined by random permutation of matrix elements 10,000 times. Computations were carried out using the PASSAGE package (Rosenberg 2001).
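As a rough illustration of the idea, the sketch below computes a partial correlation between the surname and Y-chromosome matrices with geography held constant, and obtains a one-sided P value by permutation. It assumes the common scheme of jointly permuting the rows and columns of one matrix; PASSAGE's exact procedure may differ, and all function names here are hypothetical.

```python
import math
import random


def match_matrix(labels):
    """Binary match/no-match matrix: 1 where two individuals share a label."""
    return [[1 if a == b else 0 for b in labels] for a in labels]


def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def partial_r(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def partial_mantel(surname, ychrom, geography, n_perm=999, seed=0):
    """Return (partial correlation, one-sided permutation P value)."""
    rng = random.Random(seed)
    x, z = upper_triangle(surname), upper_triangle(geography)
    observed = partial_r(x, upper_triangle(ychrom), z)
    order = list(range(len(ychrom)))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)
        # Relabel individuals in the Y-chromosome matrix only.
        permuted = [[ychrom[a][b] for b in order] for a in order]
        if partial_r(x, upper_triangle(permuted), z) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

The geography matrix would hold pairwise distances in kilometres between paternal counties of origin, while the other two matrices come from `match_matrix` applied to surnames and haplotypes.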

Median-joining networks and TMRCA estimates

Median-joining (MJ) networks graphically describe the phylogeny or inter-relationship of different Y-chromosome haplotypes (Bandelt et al. 1999). MJ networks were constructed using the program NETWORK (version 4.1). An arbitrary seven-class weighting system for the 17 loci was employed to reflect the phylogenetic information content of each locus. Highly mutable loci are prone to recurrent and parallel mutations that can confound relationship reconstruction. Mutation rates are not available for all loci, therefore weighting was instead based on the variance in repeat score amongst 985 Y-chromosomes within the discrete R1b3 haplogroup. Variance can be related to mutation rate (Di Rienzo et al. 1994) allowing a proxy and relative indication of mutability. The most stable loci (least mutable) were weighted 9 (DYS462, DYS436, DYS434), then 8 (DYS388, DYS393, DYS438, DYS435, DYS437, DYS19); 7 (DYS461); 6 (DYS389I, DYS392, DYS389B) and 5 (DYS460, DYS391). The apparent high mutability of DYS390 and DYS439 was reflected in greater increments and low weightings of 3 and 1, respectively.

Although a single (or limited number of) surname founder(s) should result in the elevated frequency of their haplotype(s) in the modern surname population, the relatively high mutation rate of STR loci and long history of Irish surnames mean even a single lineage is expected to have diversified into a cluster of related types. Discrete individual lineages within surnames were initially identified as frequent and phylogenetically central (ancestral) Y-chromosomes with subsidiary diversity taken as all one-step neighbours and any further haplotypes that could be traced back to the ancestral haplotype via a continuous (filled) pathway of increasing frequency.

Using prior knowledge of STR mutation rate for calibration, the molecular clock can then be used to estimate the time to the most recent common ancestor (TMRCA) of individual lineages through the observed accumulated mutations from the ancestral Y-chromosome (ρ statistic) (Morral et al. 1994). TMRCA estimates from the ρ statistic, along with associated standard deviations (σ) (Saillard et al. 2000), were calculated for each lineage in NETWORK, applying a mutation rate of 1/2,131 years for a 17 STR marker haplotype (Zhivotovsky et al. 2004).
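The ρ-based dating can be sketched as below. The sketch assumes the lineage has already been resolved into a tree rooted at the ancestral haplotype and takes the tree's links as input; the formula used for σ is the one commonly attributed to Saillard et al. (2000), stated here as an assumption, and the output is not guaranteed to match NETWORK's. Only the mutation rate of one mutation per 2,131 years comes from the text.

```python
import math

YEARS_PER_MUTATION = 2131  # rate quoted in the text for a 17-STR haplotype


def rho_tmrca(links, n_samples, years_per_mutation=YEARS_PER_MUTATION):
    """Estimate TMRCA and its standard deviation, both in years.

    `links` is a list of (descendants, length) pairs, one per tree link:
    `descendants` is the number of sampled chromosomes whose path to the
    ancestral haplotype passes through the link, and `length` is the
    link's length in mutational steps.
    """
    # rho: average number of mutations separating a sample from the root.
    rho = sum(n_k * length for n_k, length in links) / n_samples
    # sigma: links shared by many samples contribute more to the uncertainty.
    sigma = math.sqrt(sum(n_k ** 2 * length for n_k, length in links)) / n_samples
    return rho * years_per_mutation, sigma * years_per_mutation
```

For a star-like lineage of 10 men in which 6 carry the ancestral haplotype and 4 each sit one step away on separate links, `rho_tmrca([(1, 1)] * 4, 10)` gives ρ = 0.4, i.e. about 852 years with a standard deviation of about 426 years.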

Finally, an ad hoc rate of generational decay (r) in the proportion of surname bearers still descended from an original founding male was estimated as \( r = 1 - \sqrt[n]{x} \), where n is the number of generations since the surname founding male and x the fraction of the modern surname sample estimated to be part of the corresponding Y-chromosome lineage. This is derived from a standard compound calculation formula and may also be interpreted as a historical (per generation) rate of male introgression into a surname group. An average male generation time of 30 years (Tremblay and Vezina 2000) was assumed in these calculations.
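The decay formula and its inverse are simple enough to check numerically. The sketch below uses the symbols defined above (r, n, x); the function names and the example values are illustrative assumptions, not figures from the study.

```python
def decay_rate(x, n):
    """Per-generation decay r = 1 - x**(1/n).

    x is the fraction of modern bearers still in the founding lineage and
    n the number of generations since the founder.
    """
    return 1 - x ** (1 / n)


def retained_fraction(r, n):
    """Inverse relation: fraction still in the founding lineage after n
    generations of constant decay r, i.e. x = (1 - r)**n."""
    return (1 - r) ** n


# Illustrative values only: half of bearers retained after 35 generations.
example_rate = decay_rate(0.5, 35)  # roughly 0.0196, i.e. about 2% per generation
```

Because the loss compounds each generation, a rate of only a few percent per generation is enough to halve the founding lineage's share over a millennium.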

Results and discussion

DNA samples from 1,125 males, encompassing multiple bearers of 43 Gaelic Irish surnames, were collected. The average surname sample population was 26 but this ranged from 3 to 99 (trivial spelling differences within surnames were not distinguished). Each Y-chromosome was genotyped for 17 relatively rapidly evolving STR loci and several haplogroup-defining binary polymorphisms. Full genotypes from this study can be found on the author’s website. The vast majority of the samples (about 90%) belonged to the R1b3 haplogroup, defined by the derived state (C allele) at the M269 SNP (Cruciani et al. 2002). Most of the remainder falls into the IxI1b2 classification. Consequently haplogroups were generally uninformative in respect of more recent relationships in an Irish context and most analyses relied on STR diversity.

The average probability of a man sharing an identical 17 STR Y-chromosome haplotype with another man of the same surname is 8.15% (Fig. 1). This value is over 30 times greater than the background Irish population Y-chromosome match probability of 0.2%. However, the extent of sharing differs widely between surnames. Amongst the most extensively sampled names (with a sample size ≥50) it ranged from a 47-fold higher probability over background levels in Ryan to a more modest 4.5-fold increase in Kelly. The wide range indicates high variability in the origin and history of individual surnames. AMOVA confirmed that a highly significant proportion (19.6%, P<0.00001) of the observed Y-chromosome diversity was found between the 43 surnames, demonstrating demarcation of shared patrilineal kinship by surname. The remaining variation occurs between individuals of the same name, simultaneously indicating considerable heterogeneity within surnames and no simple one to one correspondence of surnames and Y-chromosome haplotypes.
Fig. 1

Y-chromosome sharing in Irish surnames. The histogram shows the average probability that any two men randomly drawn from a particular surname (sample sizes ≥50) will also share an identical Y-chromosome (17 STR haplotype). The background value for this statistic in the general Irish population (‘Ireland’), calculated from 765 similarly genotyped Y-chromosomes, is also included

Irish surnames often show strong regional specificity while Y-chromosome diversity can also display marked geographic sub-structure (Hill et al. 2000). For this reason AMOVA analysis was also undertaken for a subset of surnames from two smaller geographical areas. Among 17 surnames (n=315) from the North East of the island (the historical county of Down and surrounding areas) a highly significant (P<0.00001) proportion (30.6%) of the Y-chromosome variance was found between surnames. A similarly significant (P<0.00001) 20.4% of variance occurred between 11 surnames (n=112) from a midland region (centred on County Cavan). Finally the correlation between surname and Y-chromosome distances across all 43 surnames examined is significant even when geographic distance is considered (Partial Mantel test, P<0.0001). The relationship between surnames and paternal ancestry thus appears robust to any geographic sub-structuring of Irish Y-chromosomes.

Individual surname histories, where sample size was ≥50, were explored using MJ networks (Fig. 2). These are a convenient way of illustrating the relationship of Y-chromosomes found within a particular surname. As expected from match probability statistics, most surname groups are clearly distinct from the background Irish phylogenetic structure (also shown in Fig. 2). Many, specifically Ryan, O’Sullivan, O’Neill, Byrne and Kennedy, show one predominant Y-chromosome, which forms the ancestral haplotype of a broader diversified lineage. Such phylogenies are consistent with one major eponymous ancestor to these surnames, a conclusion that is in general agreement with expectations from historical accounts (MacLysaght 1985a, b).
Fig. 2

Median-joining (MJ) networks of Y-chromosomes (17 STR haplotype) found in 11 Irish surnames (n≥50). These illustrate Y-chromosome phylogeny, where each circle represents a distinct haplotype, with circle area proportional to frequency, and line length between haplotypes indicates their mutational divergence. The surname, sample size (n) and estimated number of bearers in Ireland are shown below each network (trivial differences in spelling within surnames have been ignored). Haplogroup designations are also indicated (a small number of samples could not be assigned to any of the haplogroups tested and these are marked with an asterisk). An MJ network for a general Irish population sample of similar sample size is also shown for comparison. The broken lines indicate the estimated boundaries for 15 discrete patrilineal surname lineages (marked A–O) determined as described in the Materials and methods section. However, the lineage C (O’Neill) boundary was restricted to a maximum of two mutational steps due to the dense phylogeny

Other names, including McCarthy, McGuinness, Donohoe and McEvoy, show evidence of several founders or at least the prominent biological legacies of several males. Often, though not always, different Y lineages display a particular and strong geographical bias indicating multiple independent origins in different parts of the island. For example, McEvoy has a strong dual foundation signature and this is probably explained by the independent anglicisation of different, but phonetically similar, Irish surnames to the same English form [Mac Fhíodhbhuidhe in the midlands and Mac an Bheatha in the Northeast (MacLysaght 1985a)]. In other instances the presence of multiple major founders is inexplicable by different geographies and unanticipated by history. For example, both McCarthy and McGuinness have at least two major lineages in their historical geographic heartlands, notwithstanding a single putative ancestor for each proposed by historical sources (MacLysaght 1985a, b).

Murphy and Kelly are the most common Irish surnames, with approximately 66,000 and 60,000 bearers, respectively, in Ireland or about 1.2% of the entire island’s population each (surname numbers are estimated from telephone directories). There is a lack of phylogenetic partitioning of Y-chromosome haplotypes within these two names when compared to others and they are marked by high diversity and more diffuse network structure (Fig. 2). This implies numerous patrilineal lineages, none of which overwhelmingly predominates, observations which are consistent with historical suggestions that the personal names from which these surnames derive were common in the past and consequently passed into hereditary surnames (by the addition of Mac or Ó) on numerous occasions throughout Ireland (MacLysaght 1985a, b).

Multiple founding ancestors may explain, in part at least, the high number of people currently bearing the Murphy and Kelly surnames. However, there is no general correlation between match probability scores (representing the number of ancestors) and the current frequency of the surnames shown in Fig. 2. Such a relationship may well exist in a wider sample of surnames since those examined in detail here are heavily biased to high frequency names (including six of the top ten most common Irish surnames). Even where surnames have similar qualitative origins, a large disparity in their current numbers was sometimes observed. For example, there are nearly 40,000 bearers of the O’Sullivan surname in Ireland but other names, like O’Gara, also with a principally monogenic origin, are far less common (<1,000 bearers). The current frequency may have been influenced by differences in fecundity engendered by the social power and prestige associated with the name and its bearers in the past (Nicholls 2003). Whatever the explanation it illustrates the substantial variance in male reproductive legacy over relatively short timeframes, which likely plays an important role in shaping broader patterns of Y-chromosome, and indeed genome-wide, variation over the longer term.

Estimates of TMRCA for 15 major surname lineages identified in Fig. 2 ranged from 380 to 2,010 (average of 1,100) years before present (YBP). However, virtually all dates are, at least, consistent with a common male ancestor for each lineage within the major surname foundation period ca. 900–1200 AD, given the uncertainty associated with the ρ estimator (Fig. 3). Other sources of error in TMRCA estimates include imprecise mutation rates and uncertainty in lineage boundaries. However, exploration of different lineage definitions, rates and TMRCA estimators could still not reject a roughly millennial age for surname lineages collectively.
Fig. 3

Time to the most recent common ancestor (TMRCA) estimates for major surname lineages. Associated standard deviation (σ) for each ρ estimate is also shown. The relevant surname and lineage code (A–O, corresponding to those used in Fig. 2) are given below each estimate. The major period of Gaelic surname foundation (ca. 900–1200 AD) is indicated between the broken lines

Notwithstanding the clear variation in surname foundations from this time, modern surname groups are invariably a mixture of numerous paternal lineages of varying modern legacy, indicating high permeability to later male introgression. In the closest ideals to monogenic surnames encountered in this study (Ryan and O’Sullivan) only about 50–55% of the modern surname bearers still appear to descend from one original founder. The origins of the minor lineages are somewhat enigmatic. Vertical transmission of Y-chromosome/surname linkage can be disrupted by non-paternity events, adoptions or occasional maternal transmission of surname, all of which are likely to have occurred.

Disruption could also be explained by the horizontal absorption or transmutation of initially distinct surnames to other forms during the haphazard anglicisation process. The best-sampled name McGuinness (n=99), derived from the Irish Mac Aonghusa meaning ‘son of Angus’, showed the signature of two major ancestral males as well as the smaller legacies of several other paternal lineages (Fig. 4). The surnames McCreesh and Neeson are putative distinct anglicisations of the original Mac Aonghusa (MacLysaght 1985b) and in support of this origin they both display a shared patrilineal ancestry with McGuinness (Fig. 4). However, the two derive from different McGuinness lineages, neither of which corresponds to the most frequent or diverse lineage, which is the most likely legacy of the eponymous Angus. The surname Guinness is potentially associated with a minor McGuinness lineage but is not obviously connected to any of the bigger clusters. In addition, very early genealogical tracts name a common 6th century AD male ancestor for the McGuinness and McCartan surname progenitors (Byrne 2001). Remarkably, this appears to be confirmed by the close relationship of the most prominent, and probable founding, lineages in each surname (Fig. 4). This suggests a degree of reliability to early Irish historical accounts and records that are otherwise difficult to confirm.
Fig. 4

Median-joining network of Y-chromosomes in the McGuinness and four putatively related surnames. Different colours represent different surnames with proportional ‘pie-slices’ for haplotypes shared across surnames. McGuinness Y-chromosomes are shown in grey (n=99), McCartan in blue (n=13), McCreesh in red (n=7), Neeson in yellow (n=8) and Guinness in green (n=3). See Fig. 2 legend for a fuller explanation of MJ networks diagrams

Finally, we used the O’Sullivan surname sample to estimate a historical rate of male introgression. The unusual personal nickname from which the surname derives (MacLysaght 1985a) suggests it arose only once, providing the opportunity to estimate a rate less affected by the confusion of multiple independent founders or horizontal anglicisation events. Assuming the name arose 35 generations ago (ca. 950 AD) the rate of decay from the founding Y-chromosome is ~1.6% per generation. This represents a maximum non-paternity rate, only realised if all later Y-chromosomes were introduced by such means. Even then, this is at the lower end of current estimates of non-paternity, which vary greatly. However, it is close to both the 1.49% estimated from genotypic data in Iceland (Helgason et al. 2003) and the 1.3% deduced from a similar analysis of the English surname ‘Sykes’ (Sykes and Irven 2000).

In summary, our study demonstrates for the first time that surnames collectively are markers of shared recent patrilineal kinship. The extent of this varies depending on the specific name and the nature of its foundation. Some names have numerous early origins, while others have a defined and focused early genesis. In either case, it is clear that subsequent events of the 1,000-year-long history of Irish surnames have been a substantial force in shaping the genetic diversity of a modern surname population. The frequency today may be influenced by the power and prestige associated with the name in the past, while intra-surname Y-chromosome heterogeneity indicates complex and continuously evolving post-foundation histories. The Y-chromosome proved to be a powerful tool in unravelling this history, demonstrating a significant role for a molecular genealogical approach to complement and expand existing historical sources.

Electronic database information

The URLs for data in this article are as follows:

Acknowledgements

We gratefully acknowledge the financial support of Patrick Guinness and Joseph A. Donohoe, through the Trinity Trust, for this research. We thank volunteers for their participation, L.T. Moore for access to pre-published Irish Y-chromosome data as well as K.H. Wolfe, C. Tyler-Smith, M.A. Jobling and T.E. King for helpful discussion.

Copyright information

© Springer-Verlag 2006