Background

The genetic affinities of the Jewish populations have been studied since the early days of genetics, yet the origin of these populations is still obscure. Some of the studies, trying to establish the origins of the Jewish populations with autosomal markers, claimed that the Jewish populations have a common origin, but others concluded that the Jews are a very diverse group. This corpus of studies has already been critically reviewed [1].

The origin of Eastern European Jews (EEJ), by far the largest and most important Ashkenazi population, and their affinities to other Jewish and European populations are still not resolved. Studies that compared them with European Mediterranean populations by genetic distance analysis of autosomal markers revealed that they are closer to Europeans than to other Jewish populations [1-3].

EEJ are the largest and most investigated Jewish community, yet their history as Franco-German Jewry is known to us only since their appearance in the 9th century, and their subsequent migration a few hundred years later to Eastern Europe [4, 5]. Where did these Jews come from? It seems that they came to Germany and France from Italy [5-8]. It is also possible that some Jews migrated northward from the Italian colonies on the northern shore of the Black Sea [9]. All these Jews are likely the descendants of proselytes. Conversion to Judaism was common in Rome in the first centuries BC and AD. Judaism gained many followers among all ranks of Roman society [10-13].

The aim of this study is to establish the likely origin of this major Jewish population by using a larger dataset of autosomal markers, and compare the results to analyses based on the available data for the X and Y chromosomes and for mtDNA.

Methods

Six Jewish populations, EEJ, Moroccan Jews, Iraqi Jews, Iranian Jews, Yemenite Jews and Ethiopian Jews, which have been studied for all the autosomal markers used in this study, are included in the analysis. EEJ are defined on the basis of history as those Jews originating from the areas of the Polish-Lithuanian Kingdom and their descendants in bordering regions, encompassing the territories of Russia, Poland, the Baltic States, Belarus, Moldova, Moldavia (the north-eastern part of Romania) and the Ukraine. Data on the non-autosomal markers were also available for other Jewish populations: Bulgarian Jews (X, mtDNA), Turkish Jews (X, mtDNA), Tunisian Jews (mtDNA), Libyan Jews (Y, mtDNA) and Djerban Jews (Y).

The seventeen autosomal markers are: AK, ADA, PGM1, PGD, ACP, ESD, GPT, HP, GC, J311 MspI & MetH TaqI (both on chromosome 7 near the CF locus), FV G1691A, FII G20210A, MTHFR C677T, CBS 844ins68, ACE ID and PAH XmnI. All the markers are unique-event-polymorphisms, and apart from two insertions (CBS 844ins68, ACE ID) are all SNPs. The first nine markers are polymorphisms of red cell enzymes and serum proteins, and were typed mostly by protein electrophoresis, but the variation at the protein level is directly related in a 1:1 manner to the SNP variation at the DNA level. Indeed, some of the results for the Jewish populations were obtained by PCR methods [1, 14]. The polymorphism of the remaining eight markers can only be detected at the DNA level. J311 MspI and MetH TaqI were typed in all the populations including the Israeli populations (unpublished results) by Southern blotting and hybridization [15, 16]. The other 6 markers were typed in the Israeli populations by PCR methods. The data on FV G1691A, FII G20210A, MTHFR C677T and CBS 844ins68 have already been published [3, 17]. The data on ACE ID and PAH XmnI are still unpublished. These polymorphisms were typed according to the methods of Rigat et al. [18] and Goltsov et al. [19] respectively. Allele frequencies for all the populations are given in Additional file 1: tables S1-4. Table S2 (Additional file 1) presents four markers on both sides of the CF locus. Because of the linkage between them, I chose to use only the two most distal markers, which are separated by a few centimorgans. Haplogroup frequencies of the non-recombining Y chromosome (NRY), the X chromosome (dystrophin locus, dys44, on Xp21.3) and mtDNA are given in Additional file 1: tables S5, S6 and S7 respectively.

Gower (cited in [20]) recommends that, for microevolutionary studies in which sample sizes are quite variable and gene frequencies do not differ greatly, Sanghvi's G2 [21] is the most appropriate measure, and this is the measure I used. Distances were also calculated with Nei's [22] formula and the results were very similar (r = 0.990, genetic distance matrix not shown). The neighbor joining tree was computed by PHYLIP 3.66. Since PHYLIP does not calculate Sanghvi's G2, I used the distance of Reynolds et al. [23], which is also based on the assumption that gene frequencies change by genetic drift alone, solely for the calculation of the tree (genetic distance matrix not shown). The significance of nodes in the tree and the standard errors of the genetic distances were computed by bootstrapping 10,000 times. Multidimensional scaling plots and Mantel tests for correlation between genetic distance matrices, and between them and matrices of geographic distances, were computed by NTSYS 1.70. Geographic distances were calculated as great circle distances between the capitals of the countries of origin of the populations (Warsaw was chosen for EEJ). Mantel test significance was assessed by 10,000 permutations.
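To illustrate how a genetic distance and its bootstrap standard error can be obtained from allele frequencies, the following Python sketch may be helpful. It is not the procedure used in this study: it assumes Nei's standard distance (the text cites only "Nei's [22] formula"), it assumes that bootstrap resampling is done over loci, and it uses made-up allele frequencies. All function names are hypothetical.

```python
import math
import random


def nei_standard_distance(freqs_x, freqs_y):
    """Nei's standard genetic distance between two populations.

    freqs_x and freqs_y hold one list of allele frequencies per locus,
    with loci and alleles in the same order in both populations.
    """
    # Sums over loci are used instead of means; the ratio is unchanged
    # because both populations are scored for the same loci.
    jxy = sum(px * py
              for lx, ly in zip(freqs_x, freqs_y)
              for px, py in zip(lx, ly))
    jx = sum(p * p for locus in freqs_x for p in locus)
    jy = sum(p * p for locus in freqs_y for p in locus)
    return -math.log(jxy / math.sqrt(jx * jy))


def bootstrap_se(freqs_x, freqs_y, n_boot=1000, seed=0):
    """Standard error of the distance, resampling loci with replacement.

    Assumption: loci are the resampling unit; the source does not say.
    """
    rng = random.Random(seed)
    n_loci = len(freqs_x)
    reps = []
    for _ in range(n_boot):
        idx = [rng.randrange(n_loci) for _ in range(n_loci)]
        reps.append(nei_standard_distance([freqs_x[i] for i in idx],
                                          [freqs_y[i] for i in idx]))
    mean = sum(reps) / n_boot
    return math.sqrt(sum((r - mean) ** 2 for r in reps) / (n_boot - 1))


# Made-up frequencies for four biallelic loci (illustration only).
pop_a = [[0.60, 0.40], [0.90, 0.10], [0.30, 0.70], [0.50, 0.50]]
pop_b = [[0.50, 0.50], [0.80, 0.20], [0.45, 0.55], [0.50, 0.50]]

distance = nei_standard_distance(pop_a, pop_b)
se = bootstrap_se(pop_a, pop_b)
print(f"distance x1000 = {1000 * distance:.1f}, SE x1000 = {1000 * se:.1f}")
```

With only a handful of loci the bootstrap standard error is large relative to the distance, which is one reason the number of markers matters.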

Results

The autosomal genetic distances (table 1) do not show any particular resemblance between the Jewish populations. EEJ are closer to Italians in particular and to Europeans in general than to the other Jewish populations. All of the distances, apart from one, differ from zero by more than twice their standard error. A difference between two distances can be considered meaningful if it is more than twice the larger of their two standard errors. The differences between the distance of EEJ from Italians and their distances from the other Jewish populations are meaningful according to this criterion, and the same is also true for all the non-Jewish populations except for Greeks and Russians. In fact, the distance between EEJ and Italians is the smallest distance in the matrix. A multidimensional scaling plot of the genetic distance matrix (figure 1) captures the proximity of EEJ to Italians and other European populations. The same is also true for the neighbor joining tree (figure 2). It should be noted that multidimensional scaling plots are a way to present graphically the intricate relationships of genetic distance matrices. As such they are necessarily less accurate than the matrices on which they are based. In order to understand the genetic affinities of a particular population, one must examine its distances in the matrix itself, not in the plot. The same also applies to the neighbor joining tree. The bootstrap values indicate the robustness of the clustering, but not the significance of individual genetic distances.
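The two rules of thumb stated above (a distance differs from zero if it exceeds twice its standard error, and two distances differ meaningfully if their difference exceeds twice the larger standard error) can be written down as a short Python sketch. The function names and the numbers in the example are hypothetical and are not taken from table 1.

```python
def differs_from_zero(distance, se):
    """True if the distance exceeds twice its own standard error."""
    return distance > 2 * se


def difference_is_meaningful(d1, se1, d2, se2):
    """True if |d1 - d2| exceeds twice the larger of the two standard errors."""
    return abs(d1 - d2) > 2 * max(se1, se2)


# Hypothetical distances (x1000) and standard errors, for illustration only.
print(differs_from_zero(20.0, 6.0))                    # 20 > 12
print(difference_is_meaningful(20.0, 6.0, 45.0, 9.0))  # 25 > 18
print(difference_is_meaningful(20.0, 6.0, 30.0, 9.0))  # 10 is not > 18
```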

Table 1 Autosomal genetic distance matrix (×1000) (standard errors above the diagonal)
Figure 1

A multidimensional scaling plot of the autosomal genetic distance matrix excluding Ethiopian Jews. Stress = 0.100. Population names are: EEJ - Eastern European Jews, IqJ - Iraqi Jews, InJ - Iranian Jews, MJ - Moroccan Jews, YJ - Yemenite Jews, Pa - Palestinians, Tur - Turks, Gr - Greeks, It - Italians, Ge - Germans, Br - British, Fr - French, Ru - Russians, Po - Poles. Squares represent Jews and circles non-Jews. Colour indicates geographic region: red - Europe, green - Eastern Mediterranean, blue - Iran-Iraq, purple - Arabian peninsula, yellow - North-Africa.

Figure 2

A neighbor joining tree based on the autosomal polymorphisms. A number next to a node indicates the majority bootstrap support for that node out of 10,000 repetitions.

X-chromosomal haplogroups demonstrate the same relatedness of EEJ to Italians and other Europeans (table 2, figure 3). In contrast, according to the Y-chromosomal haplogroups, EEJ are closest to the non-Jewish populations of the Eastern Mediterranean (table 3, figure 4). MtDNA shows a mixed pattern in which EEJ are about equally close to Moroccan Jews, Palestinians, Italians and Bulgarian Jews, but overall are more distant from most populations and hold a marginal position in the MDS plot, rather than a central one as in the other plots (table 4, figure 5).

Table 2 X chromosomal genetic distance matrix (×1000)
Figure 3

A multidimensional scaling plot of the X-chromosomal genetic distance matrix. Stress = 0.125. Population names are: EEJ - Eastern European Jews, IqJ - Iraqi Jews, InJ - Iranian Jews, MJ - Moroccan Jews, YJ - Yemenite Jews, EJ - Ethiopian Jews, BJ - Bulgarian Jews, TrJ - Turkish Jews, Pa - Palestinians, It - Italians, Ge - Germans, Po - Poles, Fr - French, Bre - Bretons, Sp - Spaniards, Ba - Basques, EO - Ethiopians Oromo, EA - Ethiopians Amhara. Squares represent Jews and circles non-Jews. Colour indicates geographic region: red - Europe, green - Eastern Mediterranean, blue - Iran-Iraq, purple - Arabian peninsula, yellow - North-Africa, brown - Ethiopia.

Table 3 Y chromosomal genetic distance matrix (×1000)*
Figure 4

A multidimensional scaling plot of the Y-chromosomal genetic distance matrix. Stress = 0.133. Population names are: EEJ - Eastern European Jews, IqJ - Iraqi Jews, InJ - Iranian Jews, MJ - Moroccan Jews, LJ - Libyan Jews, DJ - Djerban Jews, YJ - Yemenite Jews, EJ - Ethiopian Jews, Pa - Palestinians, It - Italians, Fr - French, Br - British, Ge - Germans, Ru - Russians, Po - Poles, SC - Serbo-Croats, Alb - Albanians, Gr - Greeks, Ma - Macedonians, Ro - Romanians, Tur - Turks, Inn - Iranians-North, Ins - Iranians-South, Iq - Iraqis, Cy - Cypriots, Sy - Syrians, Lb - Lebanese, Jo - Jordanians, SA - Saudi-Arabians, Qa - Qataris, UA - United Arab Emirates, Om - Omanis, Ye - Yemenites, Eg - Egyptians, Mo - Moroccans, Alg - Algerians, Tun - Tunisians, EO - Ethiopians Oromo, EA - Ethiopians Amhara. Squares represent Jews and circles non-Jews. Colour indicates geographic region: red - Europe, green - Eastern Mediterranean, blue - Iran-Iraq, purple - Arabian peninsula, yellow - North-Africa, brown - Ethiopia.

Table 4 mtDNA genetic distance matrix (×1000)*
Figure 5

A multidimensional scaling plot of the mtDNA genetic distance matrix. Stress = 0.110 for the outer plot and 0.161 for the inner one. Population names are: EEJ - Eastern European Jews, IqJ - Iraqi Jews, InJ - Iranian Jews, MJ - Moroccan Jews, LJ - Libyan Jews, TnJ - Tunisian Jews, BJ - Bulgarian Jews, TrJ - Turkish Jews, YJ - Yemenite Jews, EJ - Ethiopian Jews, Pa - Palestinians, It - Italians, Fr - French, Br - British, Ge - Germans, Ru - Russians, Po - Poles, Sp - Spaniards, Gr - Greeks, Tur - Turks, In - Iranians, Cy - Cypriots, Sy - Syrians, Lb - Lebanese, Jo - Jordanians, SA - Saudi-Arabians, Ye - Yemenites, Eg - Egyptians, MoA - Moroccan Arabs, MoB - Moroccan Berbers, Et - Ethiopians. Squares represent Jews and circles non-Jews. Colour indicates geographic region: red - Europe, green - Eastern Mediterranean, blue - Iran-Iraq, purple - Arabian peninsula, yellow - North-Africa, brown - Ethiopia.

Correlations between genetic distance and geography and between genetic distance matrices based on different markers (excluding the non-Caucasoid populations Ethiopians and Ethiopian Jews) are shown in table 5. The autosomal polymorphisms have a very high correlation (0.789) with geography, in contrast to the more moderate correlations of the X-chromosomal, Y-chromosomal and mtDNA polymorphisms (0.540, 0.395 and 0.641 respectively). In order to compare two competing theories regarding the origin of EEJ, their geographic distances were computed as if they originated from Italy or Israel, i.e. the great circle distances for EEJ were calculated not between Warsaw and other capitals, but between Rome or Jerusalem and other capitals. The correlation between the autosomal genetic distance matrix and geography was slightly higher, 0.804, for Rome but dropped to 0.694 for Jerusalem. Autosomal distances are much better correlated with mtDNA distances (0.826) and with X-chromosomal distances (0.732) than with Y-chromosomal distances (0.437). The correlations between the mtDNA and X-chromosomal matrices and the Y-chromosomal matrix are rather poor (0.206 and 0.241 respectively) and not significant. When the correlations with geography were calculated only for the genetic distances of EEJ and not for the entire matrix (table 6), the same trends emerged, with the autosomal correlation from Rome reaching a high of 0.926. The correlations from Jerusalem are negative for the autosomes, the X chromosome and mtDNA. The reverse is true for the Y chromosome.
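For readers unfamiliar with matrix correlations of this kind, the following Python sketch shows the general idea of a Mantel permutation test. It is a generic, one-sided version written for illustration and may differ in detail from the NTSYS implementation used in this study; the function names and the small toy matrices are hypothetical.

```python
import random


def upper_triangle(matrix):
    """Off-diagonal entries above the diagonal, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def mantel(a, b, n_perm=10000, seed=0):
    """Correlation between two symmetric distance matrices and a
    one-sided permutation p-value (rows and columns of `a` are
    permuted together, so each population keeps its own distances)."""
    rng = random.Random(seed)
    n = len(a)
    flat_b = upper_triangle(b)
    r_obs = pearson(upper_triangle(a), flat_b)
    order = list(range(n))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)
        permuted = [[a[order[i]][order[j]] for j in range(n)]
                    for i in range(n)]
        if pearson(upper_triangle(permuted), flat_b) >= r_obs:
            hits += 1
    # The observed arrangement counts as one of the permutations.
    return r_obs, (hits + 1) / (n_perm + 1)


# Toy symmetric matrices for four populations (illustration only).
genetic = [[0, 2, 5, 9], [2, 0, 4, 8], [5, 4, 0, 3], [9, 8, 3, 0]]
geographic = [[0, 1, 4, 7], [1, 0, 3, 6], [4, 3, 0, 2], [7, 6, 2, 0]]
r, p = mantel(genetic, geographic, n_perm=999)
print(f"r = {r:.3f}, p = {p:.3f}")
```

Permuting whole populations rather than individual distances is what distinguishes the Mantel test from an ordinary correlation test, because the entries of a distance matrix are not independent of one another.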

Table 5 Correlation and significance level between genetic distance matrices and between genetic distance and geography
Table 6 Correlation between the genetic distances of EEJ and geography*

Discussion

The autosomal genetic distance analysis presented here clearly demonstrates that the investigated Jewish populations do not share a common origin. The resemblance of EEJ to Italians and other European populations portrays them as an autochthonous European population. A study conducted in a New York college in the 1920s points to the same Ashkenazi-Italian similarity on the basis of physical characteristics. Freshmen were asked, before they knew one another, to indicate the origin of their fellow students. Forty percent of the Italians were taken to be Ashkenazi Jews, and the same percentage of Ashkenazi Jews were judged to be Italians [24]. EEJ seem to be mainly Italian (Roman) in origin, which is easily understood, considering the historical evidence presented above.

The high correlation between the autosomal genetic distances and geography and the reduced correlation when EEJ are taken to originate from the Land of Israel reinforce the European origin of EEJ. In fact, the correlation of the autosomal markers with geography is higher than previously described for 49 classical markers (0.503) or ~300,000 autosomal SNPs (0.661) in Europe [25]. If, for comparison, only non-Jewish European populations are included, the correlation is lower, 0.689, but still higher than the above-mentioned correlations. It is also interesting to note how using the three geographic alternatives for EEJ changes the correlation when only European populations are included. The correlation remains almost the same, 0.679, for Rome but drops to 0.490 and 0.571 for Warsaw and Jerusalem respectively, further emphasizing the correct geographic origin of EEJ within Europe.
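The effect of switching the assumed geographic origin of EEJ can be made concrete with a small Python sketch of the great circle calculation. It uses the haversine formula with a mean Earth radius of 6371 km, which is an assumption, since the source does not state how the great circle distances were computed. The coordinates are approximate, the two comparison capitals are chosen only for illustration, and the names in the code are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed)


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # min() guards against rounding slightly above 1 for antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# Approximate coordinates (degrees north, degrees east).
EEJ_ORIGINS = {
    "Warsaw": (52.23, 21.01),
    "Rome": (41.90, 12.50),
    "Jerusalem": (31.78, 35.22),
}
OTHER_CAPITALS = {"Paris": (48.86, 2.35), "Athens": (37.98, 23.73)}


def distances_from(origin):
    """Distances from one assumed EEJ origin to the comparison capitals."""
    lat, lon = EEJ_ORIGINS[origin]
    return {name: great_circle_km(lat, lon, *coords)
            for name, coords in OTHER_CAPITALS.items()}


for origin in EEJ_ORIGINS:
    row = ", ".join(f"{name}: {km:.0f} km"
                    for name, km in distances_from(origin).items())
    print(f"{origin} -> {row}")
```

Replacing the EEJ row and column of the geographic matrix with the distances for each candidate origin, and recomputing the correlation with the genetic distances, gives the kind of comparison reported above.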

Biparental versus uniparental markers

At first sight it seems that there is more than one explanation for the differing results produced by the analysis of the NRY haplogroups. It thus seems possible that the EEJ founder population in Rome was composed of exiled Israelite males and local Roman females. In its simple form this clearly contradicts the facts, because both the autosomal and X-chromosomal polymorphisms demonstrate that EEJ do not occupy an intermediate position between European and Middle Eastern populations, but rather a strictly European one. From table 1 it is clear that Italians are as close or closer to the other Jewish populations and Palestinians as EEJ. It is possible that once the founder population was established no other males but many females joined it, thus creating a population that is almost entirely European in all genetic aspects apart from its Y chromosomes. Such a phenomenon was described for the population of Antioquia, Colombia, where the autosomes point to 79% of European ancestry and only 16% of Amerindian ancestry, whereas according to mtDNA the ancestry is 90% Amerindian and only 2% European (there is also a small African component). Historical records demonstrate that local Amerindian females joined the population only at its beginning, whereas European males joined it also in later periods [26]. The suggestion that the proselyte ancestors of EEJ were almost entirely females does not, however, accord with what we know about conversion to Judaism [10, 12, 27-29].

The inference that the NRY points to a Middle Eastern origin of EEJ is erroneous not only because the Y chromosomal analysis contradicts the analyses based on the other chromosomes, and because the NRY is a single uniparental marker that does not represent the whole history of the population, but also because its smaller effective population size makes it much more vulnerable to severe genetic drift caused by demographic bottlenecks. The demographic histories of three Jewish populations exemplify how different demographic patterns make the uniparental markers more reliable for Iraqi (Babylonian) Jews and Yemenite Jews and less reliable for EEJ. Both Yemenite Jews and Iraqi Jews resemble populations from their regions of origin according to autosomal markers [1, 3, 30-32]. Yemenite Jews, who are usually considered a small isolate, were numerous enough to have an independent kingdom in the first centuries AD [33]. They numbered a few hundred thousand in the 12th century AD, and gradually declined, reaching only about 30,000-40,000 at the beginning of the 20th century [34]. Babylonian Jews numbered more than a million in the first century AD [35], and constituted the majority of the population in the area between the Euphrates and the Tigris in the 2nd-3rd centuries AD [36]. Gilbert [37] estimates that by 600 AD there were 806,000 Jews in Mesopotamia, and according to Sassoon [38] it was inhabited by about a million Jews in the 7th century. In the 14th century the estimates for Baghdad alone range from 70,000 to hundreds of thousands [38]. By 1939, 11 years before their emigration, there were 91,000 Jews in Iraq [35]. In contrast, the Jewish population of the Polish-Lithuanian Kingdom (EEJ) went through the opposite process. Their history is one of founder effects, migrations, demographic bottlenecks and finally a rapid expansion. We know nothing about their number in the first millennium, but after their emigration from Italy to Western Europe it is estimated that they numbered 4,000 in 1000 and 20,000 a hundred years later [8]. In 1500, already in Eastern Europe, they numbered 10,000-30,000, in 1648 230,000-450,000 and in 1764 750,000 [39-41]. In the 19th century, because of the partitions of the Polish-Lithuanian Kingdom and the emigration of Jews to Central and Western Europe and America, estimating the number of EEJ becomes more difficult, but there is no doubt that the increase in numbers was impressive, as the number of EEJ under Russian rule alone was 5,200,000 in 1897 [41].

The existence of severe demographic bottlenecks in the history of EEJ has also been suggested by genetic studies of disease-causing mutations and mtDNA [42-46]. The comparison based on this second uniparental marker, mtDNA, may help to resolve from within genetics itself the problem of the reliability of the Y chromosome for inferring the origin of the male ancestors of EEJ. If the European and Middle Eastern contributions to the gene pool of EEJ were female and male respectively, then comparisons based on mtDNA must place EEJ among other European populations, distant from Middle Eastern populations. The mtDNA analysis presented in this study does not place EEJ among other European populations; rather, their position is more intermediate and marginal, as can be seen in figure 5 and in figure 6, where autosomal distances are correlated with mtDNA distances. This lends further support to the notion that, because of the unique demographic history of EEJ, their uniparental markers were subjected to stronger genetic drift than the biparental markers and thus should not be used to trace their origin.

Figure 6

Correlation of autosomal (X axis) and mtDNA (Y axis) distances. Red circles denote EEJ. Most of the mtDNA distances of EEJ are too high relative to their autosomal distances, in contrast to most other distances (r = 0.826), attesting to the greater genetic drift to which the uniparental markers of EEJ were subjected.

The data on the Y chromosome itself also support the unreliability of the uniparental markers for discovering the origin of EEJ. Nebel et al. [47] studied haplogroup R-M17, whose frequency is ~12% in Ashkenazi Jews. By comparing the structure of the STRs network among the various Ashkenazi populations and among the various European non-Jewish populations they reached the conclusion that a single male founder introduced this haplogroup into Ashkenazi Jews in the first millennium. Behar et al. [48] write "It is striking that whereas Ashkenazi populations are genetically more diverse at both the SNP and STR level compared with their European non-Jewish counterparts, they have greatly reduced within-haplogroup STR variability ... This contrasting pattern of diversity in Ashkenazi populations is evidence for a reduction in male effective population size, possibly resulting from a series of founder events and high rates of endogamy within Europe. This reduced effective population size may explain the high incidence of founder disease mutations despite overall high levels of NRY diversity". It is unlikely that EEJ are the descendants of a single population. Admixture coupled with small effective population size and bottlenecks can create the puzzling situation we encounter in the uniparental markers. Thus smaller contributions from several populations, including possibly the original Middle Eastern Jewish population, and a major contribution from Italy combined with the unique demography of EEJ can create the current genetic picture without the need to invoke a major contribution from the Middle East, which contradicts the autosomal and X-chromosomal data.

Comments on previous studies

Some previous studies based on classical autosomal markers concluded that EEJ are a Middle Eastern population with genetic affinities to other Jewish populations. The problems with these studies have been previously discussed in detail [1]. These studies used fewer markers (mostly the less reliable antigenic markers) and failed to include European Mediterranean populations, apart from the discriminant analysis of Carmelli and Cavalli-Sforza [49], which used only four markers and contradicts the results of the later more elaborate discriminant analysis [1], and the genetic distance analysis of Livshits et al. [32], which includes a single European Mediterranean population, Spain. Despite this, when a genetic distance analysis was performed, it was evident that EEJ are more similar to Russians, and to a lesser extent to Germans, than to non-European Jews [32]. In fact, Russians were more similar to EEJ than to any non-Jewish European population in that analysis.

Recently, Cochran et al. [50] used 251 autosomal loci to calculate genetic distances and concluded that "from the perspective of a large collection of largely neutral genetic variation Ashkenazim are essentially European, not Middle Eastern". More recently, thousands of SNPs were used by Need et al. [51] to infer the relationships between Ashkenazi Jews and non-Jewish Europeans and Middle Easterners. They concluded that Ashkenazi Jews lie approximately midway between Europeans and the Middle Easterners, implying that Ashkenazi Jews may contain mixed ancestry from these two regions, and that they are close to the Adygei population from the Caucasus. However, these conclusions are ill-founded for two reasons. First, they used a highly selected set of SNPs, which were selected specifically for the purpose of distinguishing between Ashkenazi Jews and other populations. Second, they inferred the origin of Ashkenazi Jews from principal components analysis (PCA), but as Tian et al. [52] show, "PCA results are highly dependent on which population groups are included in the analysis. Thus, there should be some caution in interpreting these results and other results from similar analytic methods with respect to ascribing origins of particular ethnic groups". Tian et al. [52] also published a table of paired Fst distances based on 10,500 random SNPs, which demonstrates that Ashkenazi Jews are not at all close to the Adygei population, and similarly to what is seen in table 1, their smallest distance is to Italians and then to Greeks. Contrary to the assertion of Need et al. [51] on the midway position, and again similarly to what is seen in table 1, Italians and Greeks are closer to the Middle Eastern populations than Ashkenazi Jews are.

The same phenomenon is seen in the table of Fst distances of Atzmon et al. [53]. North Italians (Bergamo and Tuscany) are a little closer to the Jewish and Middle Eastern populations than Ashkenazi Jews. The Italians from Tuscany (surprisingly the sample from Bergamo was not used) in Behar et al. [54] are also closer to the Jewish and Middle Eastern populations than Ashkenazi Jews. The Italians from Tuscany are in fact the closest population to Ashkenazi Jews in Behar et al. [54]. There is one sample that is apparently a little closer, what they call Sephardic Jews. Unfortunately this sample is composed of two populations, Turkish Jews and Bulgarian Jews, which should have been studied separately like all other Jewish populations. Bulgarian Jews have been shown in the past, based on autosomal classical markers, to be closer to EEJ than to populations with Sephardic ancestry, and considering their history it was concluded that the Ashkenazi component in their gene pool is at least as large as or even larger than the Sephardic component [1]. From both the current study and those of Atzmon et al. [53] and Behar et al. [54] it can be seen that the only Jewish populations that are as close to Ashkenazi Jews as non-Jewish Europeans are those with a significant Sephardic component (the descendants of the Jews who were expelled from the Iberian peninsula at the end of the 15th century) in their gene pool. It is not possible at this stage to say what the source of this resemblance is, since we do not know the origin of Sephardic Jews, but considering all the genetic affinities of both groups it likely stems from Sephardic Jews being the descendants of converts in the Mediterranean basin rather than from a common Jewish origin in the Land of Israel. When one compares the autosomal distances of EEJ (current study) or Ashkenazi Jews (in Atzmon et al. [53] and Behar et al. [54]) from the Jewish populations that were investigated in the current study, Iraqi, Iranian, Moroccan, Yemenite and Ethiopian Jews, one finds perfect agreement. EEJ or Ashkenazi Jews are much closer to non-Jewish Europeans than to these Jewish populations in all three studies.

The studies of Atzmon et al. [53] and Behar et al. [54] are based on 164,894 and 226,839 SNPs respectively. While this impressive number reduces the errors of the distances that stem from the number of markers, the errors that stem from sampling only a small number of individuals are much larger in these studies, where sample sizes can be as small as 2-4 individuals. The effect of these errors can be seen in table 7. Despite the small number of markers, the current matrix has the highest correlation with geography. Moreover, it has a higher correlation with each of the two other matrices than the two of them have with each other. The high correlations between the current matrix and the other two attest to the robustness of the autosomal genetic distances in this study. The lower correlation between the two matrices, which are based on more than 150,000 SNPs, is surprising, and even more so if we remember that the four non-Jewish populations are represented by exactly the same individuals taken from the Human Genome Diversity Panel (HGDP). It is likely then that sampling more individuals, who represent more of the variation of the investigated populations, is far more important than typing many markers. It is also possible that the typing error rates of genome-wide microarray studies are much higher, as demonstrated by the genotyping errors that were discovered in 7 out of 29 (24%) reexamined SNPs [55]. It seems, therefore, that good characterization of the genetic relationships between populations can be achieved by a small number of good unique-event-polymorphisms.

Table 7 Comparison of the correlations of the three autosomal genetic distance matrices*

Conclusions

EEJ are Europeans, probably of Roman descent, who converted to Judaism at a time when Judaism was the first monotheistic religion to spread in the ancient world. Any other theory about their origin is not supported by the genetic data. Future studies will have to address their genetic affinities to various Italian populations and examine the possibility of other components, both European and non-European, in their gene pool.

Reviewers' comments

Reviewer's report 1

Damian Labuda, Pediatrics Department, Montreal University Sainte-Justine Hospital Research Center, Montreal, PQ Canada (nominated by Jerzy Jurka, Genetic Information Research Institute, Mountain View, California USA).

The author compiled and reanalyzed the data on autosomal and sex chromosome polymorphisms collected by different laboratories on different Jewish and West-Eurasiatic populations. His analysis indicates a much greater European component in Eastern European Jews, EEJ (essentially Ashkenazim), than in other Jewish groups. Moreover, the analysis points to Italians as the closest population to EEJ.

The question is how to interpret this evidence. Imperial Rome was a very cosmopolitan city, culturally and genetically diverse. To what extent does a sample of contemporary Italians preserve the genetic link to its population? It can simply reflect a mixture of historical influences from different centers around the Mediterranean Sea. We should thus keep in mind that the Italian connection may simply indicate Southern European and Mediterranean links, with the latter including Middle Eastern roots.

Interestingly, this analysis, which is based on a limited number of markers, provided results that are very similar to those of a paper by Atzmon and colleagues, published five days ago in the American Journal of Human Genetics and based on microarray genotyping of markers distributed genome-wide. I would like the author to comment on this paper in the context of his findings and his thoughts and reflections on the origin of Jewish Diasporas. Should we go back to single locus analyses, as in the case of uniparentally transmitted markers, but targeting one by one different individual segments of the nuclear genome? Perhaps in this way we could partition and identify genetic ancestries of different populations, which due to their history of relative isolation are considered genetically homogeneous.

The author refers to Sanghvi's G2 as the most appropriate distance metric. Could you make it clearer when this metric was used and when that of Reynolds was used (only to produce a tree?).

Author's response

The historical sources listed above show that conversion to Judaism was common in ancient Rome among all ranks of the Roman society including the imperial families. It is thus unlikely that the original Roman population did not constitute a significant portion of the proselytes. What else can explain the resemblance of EEJ to a general sample of Italians in this study and to more local samples in the two array studies [53, 54]? In all three studies the genetic affinities of the Ashkenazim are very similar to the affinities of the Italians, with the Ashkenazim usually being a bit more distant from the other populations, as can be expected from a population that underwent a stronger genetic drift. It is thus unlikely that the Ashkenazim are a mixture of people from different places in the Mediterranean basin, unless current-day Italians themselves not only have absorbed foreign genetic contributions, but actually constitute such a mixture, and this seems unlikely as well. The very high correlation (0.926) between the genetic distances of EEJ and geographic distances, when the latter are calculated from Rome, also supports the origin of EEJ from Italy or its vicinity and not merely from the Mediterranean basin. The similarity to Italians was also evident when several Italian populations from different provinces were included in a comparison based on classical autosomal markers. Most Italian populations were closer to EEJ than all other populations (data not shown).

My comments on the papers by Atzmon et al. [53] and Behar et al. [54] are in the Discussion. Studying autosomal haplotypes will indeed contribute to revealing the ancestries of populations, but in order to gain meaningful insights one ought to study at least several loci and ensure that sample sizes are adequate. This may entail more effort than studying single SNPs, and I am not sure that the affinities between the populations are going to be depicted more accurately. I changed the phrasing in Methods to make it clearer that the formula of Reynolds et al. was only used for the calculation of the tree.

Reviewer's report 2

Kateryna Makova, Department of Biology, Penn State University, Pennsylvania USA.

This is an interesting manuscript that presents intriguing results. I have only a few comments:

  1. The introduction is very short, while the discussion is lengthy. I suggest moving parts of the Discussion to the Introduction.

  2. Some of the statements in the Discussion are too strong. I disagree with statements about "erroneous Y chromosomal genetic distances", "both uniparental markers should not be used to trace their origin", "uniparental markers being unreliable". The author should modify them.

Author's response

I moved the paragraph on the history of EEJ to the Introduction. The current revised version of the paper includes a new comparison based on mtDNA. I maintain that it adds more weight to my assertion that the uniparental markers should not be used to trace the origin of EEJ. In no way did I mean that the uniparental markers are always unreliable; to clarify it I modified the relevant sentence in the discussion. Indeed from the demographic examples that I give in the Discussion, it seems that the uniparental markers can be used to study the origins of Iraqi Jews and Yemenite Jews.

Reviewer's report 3

Qasim Ayub, The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK (nominated by Dan Graur, Department of Biology and Biochemistry, University of Houston, Houston, USA).

The paper by Zoossmann-Diskin entitled 'The origin of Eastern European Jews revealed by autosomal and sex chromosomal polymorphisms' explores autosomal and sex chromosomal polymorphisms in six Jewish populations using previously published and additional unpublished data. The author concludes that the Jewish populations examined do not share a common origin and that Eastern European Jews are closer to the Italian population.

My major concern is the choice of markers and populations used in this study. The author has analyzed 17 autosomal loci, including 9 polymorphic protein electrophoretic variants in which the genotype was assumed. Although phenotypes often do correlate with genotypes, assuming that they do can lead to erroneous results. Of the remaining 8, it is unclear whether the same samples were genotyped, as the sample numbers for each locus vary widely (Supplementary Tables 2-4).

The author also uses Y haplogroup frequencies and shows a multidimensional scaling plot of the Y chromosomal genetic distance matrix. However, the supplementary data (Supplementary Table 5) lists an outdated nomenclature for Y haplogroups, as the M78 marker is no longer considered part of haplogroup E3b1. It would be more appropriate to list which markers are used to designate the haplogroups to ensure that they are comparable. In addition, the haplogroups that are selected for these analyses do not provide phylogenetic resolution to reliably detect male genetic sub-structure within the Middle East. The omission of recent mtDNA studies (Behar et al., 2008, PLoS One 3:e2062) is surprising, as is the use of a single X chromosomal locus (DYS44) to make broad conclusions about genetic relatedness.

Current evidence, supported more recently by two major studies carried out on Jewish populations (Atzmon et al., Am J H Genetics 86:850-859; Behar et al., Nature doi:10.1038) using a much larger dataset, clearly demonstrates a common genetic thread linking the diverse Mizrahi, Sephardic and Ashkenazi Jewish populations with the populations from the Levant and Middle East. The Ashkenazi show a European component, but this is shared with many Eastern and Southern European populations. These studies contradict the author's conclusion and demonstrate the power of using unbiased markers and host populations in corresponding geographic regions to address issues such as genetic relatedness among Jewish and non-Jewish populations.

Author's response

I am not sure what Dr Ayub means by "assumed", but I suspect that he means something like the relationships between phenotype and genotype in certain blood groups, in which one (or more) allele is dominant over the other and the gene frequencies of the alleles have to be inferred from the phenotypes assuming Hardy-Weinberg equilibrium. In such cases there may indeed be errors in the gene frequencies. Protein electrophoretic markers are completely different. Nothing is inferred! As mentioned in Methods all the protein electrophoretic markers in this study represent a SNP at the DNA level. This SNP causes an amino acid change that can be detected at the protein level. Both alleles are directly viewed on the gel in the same way as both alleles of an RFLP are directly viewed on the gel. Gene frequencies are determined in both cases by simple gene counting and the error rate in protein electrophoresis is no greater than in DNA studies. There is no need to type the same samples for all the polymorphisms, because the unit of study is the population, not the individual. One can use polymorphisms typed by different researchers using different samples and combine them to create a genetic profile of each population. Typing all the polymorphisms on the same sample does not add more credibility to the study. Indeed the renowned works that employed classical autosomal markers to portray the genetic affinities of human populations were based on many different samples typed by many different researchers [56, 57].
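The distinction drawn above between direct gene counting and inference under Hardy-Weinberg equilibrium can be illustrated with a minimal sketch. The function names and all counts below are hypothetical and are not taken from this study.

```python
from math import sqrt


def allele_freq_by_counting(n_aa, n_ab, n_bb):
    """Frequency of allele A when all three genotypes are directly observed
    (codominant marker, e.g. bands on a gel). No model is assumed."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    return (2 * n_aa + n_ab) / total_alleles


def recessive_freq_assuming_hwe(n_recessive, n_total):
    """Frequency of a recessive allele inferred from the count of recessive
    phenotypes, ASSUMING Hardy-Weinberg equilibrium (q = sqrt(q^2))."""
    return sqrt(n_recessive / n_total)


# Hypothetical codominant marker: 36 AA, 48 AB and 16 BB individuals.
p = allele_freq_by_counting(36, 48, 16)

# Hypothetical dominant/recessive system: 16 recessive phenotypes out of 100.
q = recessive_freq_assuming_hwe(16, 100)

print(p, q)
```

In the first function every allele is counted directly, so the estimate does not depend on any equilibrium assumption. In the second, the estimate is only as good as the assumption that genotype proportions follow Hardy-Weinberg expectations.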

The nomenclature in the Y chromosome supplementary table has been updated. Following the publication of the study by Behar et al. [54] it was possible to add more Jewish populations to the Y chromosome analysis and increase the number of chromosomes for the Jewish populations. This increase has come, however, at the expense of resolution, because Behar et al. [54] used fewer haplogroups in their analysis. Consequently, the number of haplogroups was reduced from 15 in the original version to 14 in this revised version. I would have been happier if the available data on the Jewish populations had enabled greater resolution to reliably detect male genetic sub-structure within the Middle East, but since this work deals with the genetic affinities of EEJ, the current level is sufficient. The work of Behar et al. from 2008 was instrumental in creating the mtDNA matrix, as can be seen in table 7 in Additional file 1. There was no need to cite it previously, as it did not contain any genetic distance analysis that could further clarify the origin of EEJ. I am surprised at Dr Ayub's surprise at the use of a single X chromosomal locus. It would have been better to use many X chromosomal loci, but even the use of single loci is advantageous, as I am sure even Dr Ayub would agree regarding the two other single loci that I use, the non-recombining Y chromosome (NRY) and mtDNA.

As written in the Discussion, the genetic distance matrices of Atzmon et al. [53] and Behar et al. [54] do not contradict my results, but reinforce them. I completely reject Dr Ayub's claim that the markers or populations I used are biased in any way, and I let the reader judge where exactly the bias lies.