Allele frequency distribution of 1691G>A F5 (which confers Factor V Leiden) across Europe, including Slavic populations

The allele 1691A F5, conferring Factor V Leiden, is a common risk factor in venous thromboembolism. The frequency distribution of this allele in Western Europe has been well documented; here, data from Central, Eastern and South-Eastern Europe are also included. To assess the significance of the collated data, a chi-squared test was applied, and Tukey tests and z-tests with Bonferroni correction were compared. Results: A North-Southeast band of high 1691A F5 allele frequency was found, enclosing a pocket of low frequency that includes some Southern Slavic populations. European countries/regions can be arbitrarily delimited into low (group 1, <2.8 %, mean 1.9 % 1691A F5 allele) or high (group 2, ≥2.8 %, mean 4.0 %) frequency groups, with many significant differences between groups but only one intra-group difference (the Tukey test is suggested to be superior to the z-tests). Conclusion: In Europe a North-Southeast band of high 1691A F5 frequency has been found, clarified by the inclusion of data from Central, Eastern and South-Eastern Europe; it surrounds a pocket of low frequency in the Balkans, which could possibly be explained by Slavic migration. There seem to be no indications of variation in environmental selection due to geographical location.


Introduction
The gene F5 encodes blood coagulation Factor V, and the 1691G>A F5 transition in exon 10 causes a substitution (R506Q) known as Factor V Leiden. This genetic variant is the most common heritable risk factor for thromboembolic disease (Kujovich 2011). Clinical observations with the 1691A F5 allele include an increased risk of deep vein thrombosis (Koster et al. 1993; Ridker et al. 1998; Rosendorff and Dorfman 2007), and 1691A F5 is also associated with an increased risk of pregnancy loss and possibly other obstetric complications (Ridker et al. 1998; Foka et al. 2000), these effects being semi-dominant.
The 1691A F5 allele is thought to have originated with a single mutation event which took place 21,000-34,000 years ago (Zivelin et al. 1997). The 1691A F5 allele appears to be generally confined to Indo-Europeans: the 1691A allele is very rare or non-existent in Asia (0.6 %) and some regions of Africa (0.0 %) (Hira et al. 2002; Nasiruddin et al. 2005; They-They et al. 2010); but note that Settin et al. (2008) reported a frequency of 10.2 % in Egypt. In various parts of Europe the frequency of this variant has been described as uneven (Rees et al. 1995; Herrmann et al. 1997; Paseka et al. 2000; Adler et al. 2010), with high frequencies observed in the Czech Republic (5.1 %, n = 2819) (Prochazka et al. 2003) and Turkey (5.0 %, n = 2003) (see Fig. 1 for references).
We provide a summary study of the frequency distribution of 1691A F5 across Europe, adding values from 12 countries/regions which are predominantly populated by Slavic populations (note that Serbia and Montenegro separated in 2006; data in this article precede this date).

Materials and methods
The database Medline (U.S. National Library of Medicine, Bethesda, USA) and several websites providing local language articles (e.g., www.google.ba for Bosnia/Herzegovina, www.google.hr for Croatia; Google Inc., Mountain View, California, USA) were searched until March 2012. Keywords and/or MeSH terms used were (together with country/region names): "1691A F5", "Leiden", "FV", "F5", "FVL", "Factor 5", "fv". Data for Bosnia/Herzegovina were obtained from our previous study (Valjevac et al. 2013). Data were included from: groups of blood donors, those defined as "healthy" adults in case-control studies, and population groups. Summary data in Europe for NM_000130.4:c.1691G>A F5 are presented as a map.
In order to test the significance of the resulting distribution, a chi-squared test for independence (Ulukus et al. 2006) and post-hoc pair-wise proportion comparison tests were carried out between all countries by two methods: (1) as in Zar (1999): tests analogous to, and referred to here as, "Tukey tests" (critical values from R package DKT; Lau 2011); (2) z-tests with Bonferroni correction (Microsoft Excel 2007, Redmond, WA, USA; http://www.csun.edu/~vchsc006/469/z.html). The critical significance level was set at p = 0.05.
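The pair-wise z-tests with Bonferroni correction described above can be sketched as follows. This is a minimal illustration and not the spreadsheet procedure actually used in this study: the function names, the pooled standard error, and the choice of counting alleles rather than subjects are assumptions, and the example counts are hypothetical (they are not taken from Fig. 1).

```python
import math
from itertools import combinations


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail area of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def bonferroni_pairwise(counts, alpha=0.05):
    """counts maps a population name to (variant_alleles, total_alleles).

    Each pair is declared significant only if its p-value is below
    alpha divided by the number of pair-wise comparisons.
    """
    pairs = list(combinations(sorted(counts), 2))
    threshold = alpha / len(pairs)
    results = {}
    for a, b in pairs:
        z, p = two_proportion_z(*counts[a], *counts[b])
        results[(a, b)] = (z, p, p < threshold)
    return results


# Hypothetical counts for three populations (illustration only).
counts = {"A": (20, 1000), "B": (25, 1000), "C": (100, 1000)}
results = bonferroni_pairwise(counts)
for pair, (z, p, significant) in results.items():
    print(pair, round(z, 2), significant)
```

With many countries the number of pairs grows quickly, so the Bonferroni threshold becomes strict; this is one reason the two post-hoc methods can disagree on individual pairs.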

Results
A map showing average frequencies for 1691A F5 throughout Europe is given in Fig. 1a.
Group 1 consisted of countries/regions with 1691A F5 allele frequency <2.8 %; the average frequency in the group was 1.9 %. Countries in group 1 included: Russia, Finland, Ukraine, Poland, Slovenia, Serbia/Montenegro, Croatia, The Netherlands, France, Spain, Portugal (these countries have reasonable sample sizes, and each has at least one significant difference with another country's value). If those countries with no significant differences at all (with smaller sample sizes) were included, then this group consisted of most (ten) Slavic populations together with another eight non-Slavic countries (Fig. 1).
Group 2 consisted of countries with 1691A F5 allele frequency ≥2.8 %; the average frequency in the group was 4.0 %. Countries with at least one pair-wise significant difference by Tukey tests included: Sweden, Norway, Denmark, Germany, U.K., The Czech Republic, Austria, Hungary, Italy, Bulgaria, Greece and Turkey. Including those countries with no significant differences at all (with small sample sizes), this group consisted of 13 countries including the predominantly Slavic countries Czech Republic and Bulgaria (Fig. 1).
Tukey tests and z-tests gave similar numbers of pair-wise differences between countries (66 and 59, respectively; Tukey test results are shown in Fig. 1b; z-test results not shown). However, the Tukey tests defined statistically significant differences involving Slovenia (2.5 %, n = 526), Russia (2.4 %, n = 539) and Serbia and Montenegro (2.2 %, n = 499), which were not identified using z-tests, whereas z-tests defined differences involving Romania (8.3 %, n = 42; note the very small sample size). Additionally, the other Tukey test differences were more evenly distributed, whereas with z-tests most differences were concentrated on Spain (2.0 %, n = 889) and Finland (1.1 %, n = 1285).
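For comparison, a Tukey-type test statistic for two proportions can be sketched as below. This follows one commonly described arcsine-based formulation (transformed proportions in degrees, with a standard error built from the constant 410.35 for unequal sample sizes); that it matches the exact computation used for Fig. 1b is an assumption, as are the function names and the hypothetical counts. Critical values are deliberately not hard-coded here: they would be taken from studentized range tables for the number of groups compared.

```python
import math


def arcsine_transform_deg(x, n):
    """Averaged arcsine square-root transform of x successes out of n, in degrees."""
    a = math.asin(math.sqrt(x / (n + 1)))
    b = math.asin(math.sqrt((x + 1) / (n + 1)))
    return 0.5 * math.degrees(a + b)


def tukey_type_q(x_a, n_a, x_b, n_b):
    """Tukey-type q statistic for the difference between two proportions."""
    diff = abs(arcsine_transform_deg(x_a, n_a) - arcsine_transform_deg(x_b, n_b))
    # Standard error for unequal sample sizes (angles measured in degrees).
    se = math.sqrt(410.35 / (n_a + 0.5) + 410.35 / (n_b + 0.5))
    return diff / se


def is_significant(q, q_critical):
    """Compare q with a critical value supplied by the caller (from tables)."""
    return q >= q_critical


# Hypothetical counts (illustration only): 20/1000 versus 100/1000.
q_far = tukey_type_q(20, 1000, 100, 1000)
# Hypothetical counts with similar proportions: 20/1000 versus 25/1000.
q_near = tukey_type_q(20, 1000, 25, 1000)
print(round(q_far, 2), round(q_near, 2))
```

Because the standard error depends only on the two sample sizes, a pair that includes a very small sample yields a large standard error and hence a small q, whatever the observed frequency.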

Discussion
In Europe the average 1691A F5 allele frequency was found to be 3.5 % (n = 44,627, references in Fig. 1).
In order to highlight the distribution of high frequency populations, European countries/regions were arbitrarily divided into two groups: with low (group 1) or high frequency (group 2), with an arbitrary boundary at 2.8 % (delineated by thick lines in Fig. 1). The group of high frequency countries (i.e., group 2) was found to have a North-Southeast band distribution. A pocket of low frequency with some Southern Slavic countries (Slovenia, Croatia, Bosnia and Herzegovina, and Serbia and Montenegro) and Albania (although note low sample size) was found to be enclosed by this band of high frequency. We can therefore surmise that the band of high frequency was created by founder effect seeding, and the pocket of low frequency might have been created with the purported movement of Eastern Slavic populations with low frequency to this region in the fifth to seventh centuries. However, conclusions referring to the migration of Slavic peoples have to take into account the fact that the Czech Republic and Bulgaria have high frequencies of this allele. Notably, the Czech Republic has the highest frequency value in Europe (5.1 %, with a reasonable sample size of 2819), although this value is not significantly different from the values in the neighboring countries Germany and Austria. It could therefore also be hypothesized that these high frequencies result from a high influx of peoples carrying the variant from neighboring populations.
It is not known to what extent positive or negative selection acts on this allele; it is thought that 1691A F5 has a negative impact before reproductive age, but reports in children are scanty (El-Karaksy and El-Raziky 2011). From the North-Southeast band of high frequency of the 1691A F5 allele there seem to be no indications of possible environmental effects which could vary according to location (and act as selection) on this allele.
Although the map in Fig. 1 is informative, the statistical significance of the distribution found depends directly on the sample sizes of the data (see legend to Fig. 1). Additionally, there are several ways to do post-hoc pair-wise comparison tests between countries, which take into account sample sizes. By using Tukey tests or z-tests it was found that there was only one intra-group significant difference (within group 2, between Czech Republic and Denmark; Fig. 1b), but many significant differences between groups. (Significance maps at the p = 0.01 level, for either test, were very similar; not shown.) This means that inferences from individual country values should be made with caution, and the high frequency group of countries could be treated more or less as one block (and likewise for the low frequency group). The map in Fig. 1 represents the present state of knowledge concerning the distribution of the 1691G>A F5 allele, and theoretically increasing the sample sizes would increase the number of pair-wise statistical differences shown in Fig. 1b. It should be noted, however, that several sample sizes are already reasonably large, e.g., The Netherlands (2.2 %, n = 2631) and Germany (3.8 %, n = 2199), but show no intra-group significant differences by either of the methods used.
Despite the overall similarities in the conclusions drawn from the Tukey and z-tests, it should be noted that these analyses suggest (and confirm) the appropriateness of Tukey tests, rather than z-tests with Bonferroni correction, for this type of study. The reason for this is that any inference from differences involving Romania with only 42 subjects is rather dubious (as is the rather high allele sample frequency of 8.3 %, which might not be a good representation of the frequency in the population). The z-tests, however, gave six differences involving Romania (with Finland, Poland, Ukraine, Croatia, Spain, and Portugal; data not shown), whereas the Tukey tests gave no significant differences involving Romania (Fig. 1b).
The clinical effects of the 1691A F5 variant are fairly well established, with a three- to eight-fold increased thrombotic risk for heterozygotes, and up to 80-fold increased risk for homozygotes, found in a pooled analysis of eight case-control studies with over 2300 cases from Western European countries (Emmerich et al. 2001). In a Croatian case-control study (with 160 cases) a high frequency of thromboembolism was also found to be associated with the 1691A F5 allele: 21 % of patients with venous thromboembolism carried the 1691A F5 allele, but only 4 % of healthy controls (Coen et al. 2001). Further studies could be done to confirm whether or not the presence of the 1691A F5 allele does indeed provide a similar risk for venous thromboembolism across Europe.
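The uncertainty attached to such a small sample can be illustrated with a confidence interval for the allele frequency. The sketch below uses the Wilson score interval, which is a swapped-in technique for illustration and was not part of the analysis in this study. The allele counts are an assumption: 42 subjects are taken to contribute 84 alleles, of which 7 (about 8.3 %) carry the variant.

```python
import math


def wilson_interval(x, n, z=1.96):
    """Wilson score interval for x successes out of n trials (z = 1.96 for ~95 %)."""
    p = x / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


# Assumed counts: 7 variant alleles among 84 alleles (42 subjects x 2).
low, high = wilson_interval(7, 84)
print(f"{low:.3f} to {high:.3f}")
```

Under these assumptions the interval spans roughly 4 % to 16 %, a range wide enough to be compatible with both the low and the high frequency group, which is consistent with treating differences involving Romania with caution.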

Conclusions
In summary, a North-Southeast band distribution of 1691A F5 high frequency in Europe was found by the addition of data collated and presented from Central, Eastern and South-Eastern Europe. A pocket of low frequency in some Southern Slavic countries was found to be enclosed by this band of high frequency. The reasons for this distribution are not known, but perhaps reflect founder effects following migration after the last ice age, followed by the migration of Slavic peoples from the fifth century. There seem to be no indications of variation in environmental selection due to geographical location.