Introduction

Rice (Oryza sativa L.) is one of the world's most important crops, feeding approximately half of the world's population. It is believed that rice was domesticated from common wild rice (O. rufipogon Griff.) in East Asia about 10,000 years ago (Doebley et al. 2006). Common wild rice is photoperiod sensitive, flowers under short-day conditions, and occurs mainly in South and Southeast Asia (Vaughan et al. 2003). O. sativa, however, has spread all around the world, and some varieties can even flower under long-day conditions. Previous studies have suggested that allelic variation of Heading date 1 (Hd1) in O. sativa might be one reason for this phenomenon (Yano et al. 2000; Tsuji et al. 2008; Takahashi et al. 2009; Huang et al. 2012a).

Hd1 is a major photoperiod-sensitivity gene that plays a significant role in regulating rice flowering. Previous research revealed that clock genes receive signals from light and the circadian clock and regulate the expression of Hd1 (Yano et al. 2000). Hd1 in turn regulates the expression of Hd3a, which encodes the mobile flowering signal florigen, and thereby controls flowering in rice (Tsuji et al. 2008). Hd1 is regarded as the major determinant of flowering-time diversity in cultivated rice, and the high degree of polymorphism in Hd1 is believed to contribute to the regulation of rice flowering time (Takahashi et al. 2009). Although more than 60 cultivated rice accessions have been sequenced and many nucleotide variations have been identified (Takahashi et al. 2009; Fujino et al. 2010; Huang et al. 2012a), it is not yet clear which variations are significantly associated with flowering time. Thus, the association of Hd1 variation with flowering time was analyzed in the present study to identify these variations.

Traditionally, O. sativa is divided into two subspecies: O. sativa L. ssp. indica Kato and O. sativa L. ssp. japonica Kato. These two subspecies can be distinguished by both DNA markers and morphological characteristics. The origin of indica and japonica has been studied for decades, but the conclusions still vary widely (Londo et al. 2006; Gao and Innan 2008; Molina et al. 2011; Huang et al. 2012b; Wei et al. 2012a). According to rice diversity research, however, O. sativa can be divided into several groups: in addition to indica and japonica, aus forms a distinct group of O. sativa (Garris et al. 2005; Kovach et al. 2007; Zhao et al. 2010). Aus is known as pre-kharif rice or autumn rice and is grown mainly in India and Bangladesh. It is usually planted between May and August and harvested in autumn. It was selected from other varieties for earliness, which is its main characteristic, and most aus varieties are photoperiod insensitive. Few studies on the domestication of aus have been reported. A recent study that analyzed the domestication of O. sativa using genome-wide patterns and a wide range of material suggests that ancient japonica was first domesticated from O. rufipogon around the middle area of the Pearl River in Southern China. Indica was subsequently developed from crosses between ancient japonica and local wild rice as the initial cultivars spread into Southeast and South Asia, and this initial indica then diverged into indica and aus (Huang et al. 2012c). In the present study, japonica, indica, and aus varieties were collected and sequenced to analyze the phylogeny and domestication process of Hd1 in these groups.

Hd1 shows intriguing variation in function at 31°N and 23.5°N (Izawa 2007). It promotes flowering under short-day conditions in regions south of 31°N but represses flowering under long-day conditions in areas north of 31°N, whereas south of the Tropic of Cancer (TOC, 23.5°N) Hd1 tends to be nonfunctional and results in late flowering due to its insensitivity to photoperiod (Huang et al. 2012a). In the present study, all the aus varieties were from tropical areas south of the TOC, whereas the japonica varieties ranged from 10°N to 45°N and the indica varieties from 5°N to 30°N. Therefore, japonica could be divided into tropical, subtropical, and temperate japonica according to the geographical divisions at the TOC and 31°N, and indica could be divided into tropical and subtropical indica according to the division at the TOC. The diversity of Hd1 in these groups and the relationship between Hd1 and geographical region were also investigated.

In the present study, 92 accessions of O. sativa and 111 accessions of O. rufipogon were collected and sequenced. All cultivated samples are landraces (pure-line varieties developed by farmers without artificial intercrossing). The landraces and the O. rufipogon accessions were taken from their respective mini-core collections in China. These mini-core collections were selected from the China National Genebank, which included 50,526 landraces and more than 10,000 accessions of O. rufipogon, on the basis of morphological traits and simple sequence repeat (SSR) markers. Through a hierarchical sampling strategy, the mini-core collections of O. sativa and O. rufipogon retained more than 70 % of the morphological variation in the full germplasm collections (Zhang et al. 2011). Sixty O. sativa accessions originating from other countries have been sequenced and published previously (Fujino et al. 2010), and these sequences are also included in our analysis. In total, Hd1 sequences from 263 highly diverse accessions of O. sativa and O. rufipogon were analyzed. A study based on such rich and representative material can provide valuable information for understanding the domestication process of Hd1.

Materials and methods

Sampling and phenotypic data collection

The materials used in this study included 92 accessions of cultivated rice (Table 1), 111 accessions of O. rufipogon, and 1 accession of O. barthii A. Chev. (Table 2). The 46 indica and 46 japonica landraces came from 23 provinces in China. To distinguish indica from japonica, we classified all individuals by Cheng’s index method (Lu et al. 2009), which is widely used in China. O. rufipogon was obtained from the Guangzhou and Nanning Wild Rice Field Genebanks, except for the wild rice of Yunnan Province, which was collected directly from its distribution sites. The O. rufipogon accessions were observed carefully throughout their whole life cycle so that individuals showing significant gene flow from O. sativa could be excluded. Currently, O. rufipogon exists in only seven provinces in South China, and samples originating from all seven provinces were included in our research. O. barthii was provided by the International Rice Research Institute and used as the outgroup. The geographic localities of the O. rufipogon accessions and landraces sampled in China are shown in Fig. 1. Sequences of 60 O. sativa accessions originating from other countries (Table 3; Fujino et al. 2010) were added to the subsequent analyses.

Table 1 Details of the landraces from China
Table 2 List of wild samples used in the study
Fig. 1 Geographic origins of the materials from China. Red circles indicate O. sativa L. ssp. indica Kato; blue circles indicate O. sativa L. ssp. japonica Kato; green triangles indicate O. rufipogon Griff. (Color figure online)

Table 3 List of the cultivated samples used in Fujino et al. (2010)

All landraces were planted in late November at Sanya, Hainan (southernmost China) in each year from 2009 to 2011 and were grown mostly under short-day conditions. Ten plants were transplanted in a single row with 20 cm between plants and 30 cm between rows of different accessions. Field management followed normal agricultural practices. Heading date was defined as the number of days from sowing until 50 % of the panicles had emerged.

DNA extraction, PCR amplification, cloning, and sequencing

Total DNA extraction and polymerase chain reaction (PCR) amplification generally followed the methods of our previous studies (Wei et al. 2012a; Qiao et al. 2012). The primers for Hd1 are listed in Table S1. Almost the entire gene region of Hd1 was sequenced, including the promoter, exon 1, the intron, exon 2, and the 3′-untranslated region (UTR) (Fig. 2). Sequencing was performed on an ABI 3730 automated sequencer (Applied Biosystems, USA). Initially, all samples were sequenced directly; if an accession was heterozygous, the PCR product was ligated into the EASY vector (Transgen, China). Independent plasmid clones were then selected at random, and at least four clones were sequenced so that the sequences of both alleles could be obtained. Because Taq polymerase errors can occur, any accession carrying polymorphisms found in no other accession was resequenced, including the cloning step, to confirm that these polymorphisms were genuine. Because heterozygous individuals exist in O. rufipogon, two allele sequences were obtained for some wild samples.
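The choice of at least four clones per heterozygote can be illustrated with a simple probability argument. The sketch below assumes that each sequenced clone derives independently from either allele with equal probability, which is an idealisation rather than a measured property of the protocol; the function name is hypothetical.

```python
def prob_only_one_allele(n_clones: int) -> float:
    """Probability that all n_clones from a heterozygote carry the same allele.

    Assumption (not from the paper): clones are independent draws and both
    alleles are equally likely to be ligated and picked.
    """
    if n_clones < 1:
        raise ValueError("n_clones must be at least 1")
    # All clones carry allele 1, or all carry allele 2: 2 * (1/2) ** n.
    return 2 * 0.5 ** n_clones


# Chance of recovering only one allele for several clone numbers.
for n in (2, 4, 6, 8):
    print(n, prob_only_one_allele(n))
```

Under these assumptions, four clones still leave a 12.5 % chance that only one allele is recovered, which is consistent with four being stated as a minimum rather than a fixed number.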

Fig. 2 Structure of Hd1 and locations of the regions sequenced. The regions amplified and sequenced are shown as blue lines. (Color figure online)

Population structure analysis

Twenty-four simple sequence repeat (SSR) markers were used to determine the population structure of the cultivated samples: RM529, RM522, RM526, RM211, RM411, RM60, RM518, RM348, RM574, RM274, RM508, RM412, RM427, RM172, RM339, RM408, RM553, RM321, RM484, RM239, RM224, RM479, RM247, and RM463. One marker is located on each of the short and long arms of the 12 rice chromosomes. PCR was performed as described above, and the PCR products were separated on 6 % denaturing polyacrylamide gels to determine the alleles at each marker. The STRUCTURE 2.3.2 program (Falush et al. 2003) was used to infer the population structure with a burn-in of 100,000, a run length of 100,000, and a model allowing for admixture and correlated allele frequencies. The number of subpopulations (K) was tested from two to ten, and ten independent runs for each K yielded consistent likelihoods of the population structure.

DNA sequence analysis, neutrality test, and association analysis

The DNA sequences were aligned using the ClustalX program (Thompson et al. 1997) and manually adjusted in BioEdit (Hall 1999). The number of segregating sites (S), the number of haplotypes (h), the haplotype diversity (Hd), and two parameters of nucleotide diversity, namely π (Nei 1987) and Watterson’s estimator from S (θw) (Watterson 1975), were calculated by DnaSP version 5.0 (Rozas 2009).
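The two diversity measures named above can be illustrated on a toy alignment. This is a minimal sketch, not DnaSP's implementation: the function names are hypothetical, columns containing a gap are simply skipped, and the values are per sequence rather than per site (dividing by the number of ungapped columns would give per-site values).

```python
from itertools import combinations


def ungapped_columns(seqs):
    """Alignment columns in which no sequence has a gap ('-')."""
    return [col for col in zip(*seqs) if "-" not in col]


def segregating_sites(seqs):
    """S: number of ungapped columns showing more than one nucleotide."""
    return sum(1 for col in ungapped_columns(seqs) if len(set(col)) > 1)


def mean_pairwise_differences(seqs):
    """pi (per sequence): average number of differences over all sequence pairs."""
    cols = ungapped_columns(seqs)
    pairs = list(combinations(range(len(seqs)), 2))
    total = sum(1 for i, j in pairs for col in cols if col[i] != col[j])
    return total / len(pairs)


def watterson_theta(seqs):
    """theta_w (per sequence): S divided by a1 = sum_{i=1}^{n-1} 1/i."""
    n = len(seqs)
    a1 = sum(1.0 / i for i in range(1, n))
    return segregating_sites(seqs) / a1


# Hypothetical four-sequence alignment, for illustration only.
toy = ["ATGCA", "ATGCA", "ATGTA", "TTGTA"]
print(segregating_sites(toy), mean_pairwise_differences(toy), watterson_theta(toy))
```

The two estimators agree in expectation under neutrality; the difference between them is what the neutrality tests in the next paragraph evaluate.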

Two neutrality tests, namely Tajima’s D and Fu and Li’s D*/F*, were calculated for all loci to test the neutral mutation hypothesis. Tajima’s D (Tajima 1989) is based on the discrepancy between the mean number of pairwise differences (π) and Watterson’s estimator (θw), whereas Fu and Li’s D*/F* (Fu and Li 1993) relies on the difference between the numbers of mutations on external and internal branches of the genealogy. In both tests, negative values indicate an excess of low-frequency polymorphisms, whereas positive values indicate an excess of intermediate-frequency variants.
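As an illustration of how Tajima's D contrasts the two diversity estimates, the sketch below implements the standard formula from Tajima (1989) for a sample of n sequences with S segregating sites and a mean of k pairwise differences. The function name and the example inputs are hypothetical, and significance testing (as performed by DnaSP) is not included.

```python
import math


def tajimas_d(n: int, s: int, k: float) -> float:
    """Tajima's D from sample size n, segregating sites s, mean pairwise differences k."""
    if n < 3:
        raise ValueError("at least three sequences are needed for this sketch")
    if s < 1:
        raise ValueError("D is undefined without segregating sites")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Numerator: pi minus Watterson's estimator; denominator: its estimated standard deviation.
    return (k - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical inputs: few pairwise differences relative to S give a negative D.
print(tajimas_d(n=10, s=16, k=2.0))
```

A negative value arises when k is small relative to S/a1, that is, when many segregating sites are carried by only a few sequences.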

Association between the phenotypes and sequences was analyzed with TASSEL 2.1 (Bradbury et al. 2007). Single-nucleotide polymorphism (SNP)/indel–trait associations were identified by fitting a general linear model (GLM). Linkage disequilibrium (LD) was estimated with DnaSP using standardized disequilibrium coefficients and squared allele-frequency correlations (r²) for pairs of SNP loci.
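The two LD measures mentioned above can be sketched for a single pair of biallelic sites. The example assumes phased haplotypes given as one allele character per sequence at each site; the function name and data are hypothetical, and this is not DnaSP's implementation.

```python
def ld_stats(site1, site2):
    """Return (r2, |D'|) for two biallelic sites given phased haplotype alleles."""
    if len(site1) != len(site2) or not site1:
        raise ValueError("sites must be non-empty and of equal length")
    alleles1, alleles2 = sorted(set(site1)), sorted(set(site2))
    if len(alleles1) != 2 or len(alleles2) != 2:
        raise ValueError("both sites must be biallelic")
    a, b = alleles1[0], alleles2[0]  # reference allele at each site
    n = len(site1)
    p_a = sum(x == a for x in site1) / n
    p_b = sum(y == b for y in site2) / n
    p_ab = sum(x == a and y == b for x, y in zip(site1, site2)) / n
    d = p_ab - p_a * p_b  # raw disequilibrium coefficient
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    # D' rescales D by its maximum possible magnitude given the allele frequencies.
    if d > 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return r2, abs(d) / d_max


# Hypothetical haplotypes: the rarer allele at site 1 occurs only with 'T' at site 2.
print(ld_stats("AAAG", "CCTT"))
```

In this example |D'| is 1 because only three of the four possible haplotypes occur, while r² is lower because the allele frequencies at the two sites differ.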

Phylogenetic analysis

The haplotype networks of the coding region were constructed on the basis of mutational steps with NETWORK 4.6 (Bandelt et al. 1999). Only major haplotypes, i.e., those containing two or more individuals, were used. The phylogenetic relationships among the major haplotypes of the coding region and among all haplotypes of the whole region were reconstructed by neighbor-joining (NJ) analysis (Saitou and Nei 1987) using MEGA 4.0 (Tamura et al. 2007). Gaps were treated as missing values, and these sites were excluded from the data matrix. The NJ analysis used Kimura’s two-parameter model (Kimura 1980). A nonparametric bootstrap test with 1,000 replications was performed to quantify the confidence of internal nodes.
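The distance underlying the NJ analysis can be illustrated as follows. This sketch removes gapped columns from the whole alignment first, mirroring the exclusion of gap sites described above, and then applies Kimura's two-parameter formula d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions. The function names are hypothetical and the code is not MEGA's implementation.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def drop_gapped_columns(seqs):
    """Remove every alignment column in which any sequence has a gap ('-')."""
    cols = [col for col in zip(*seqs) if "-" not in col]
    return ["".join(col[i] for col in cols) for i in range(len(seqs))]


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned, gap-free sequences."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be non-empty and of equal length")
    transitions = transversions = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    p = transitions / len(seq1)
    q = transversions / len(seq1)
    # math.log raises ValueError if the sequences are too divergent for the model.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# Hypothetical alignment: one gapped column and one transition (G/A).
aligned = drop_gapped_columns(["ACGTACGTAC-G", "ACGTACATACTG"])
print(k2p_distance(*aligned))
```

A matrix of such pairwise distances is the input that a neighbor-joining algorithm would then cluster.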

Results

Nucleotide diversity and neutrality analysis

Hd1 was sequenced across its whole genomic region in the 92 cultivated and 111 wild accessions, and its nucleotide diversity was analyzed. A high degree of polymorphism, including long insertions and deletions, was detected in Hd1. Because of the long indels, gene length ranged from 2,811 to 3,476 bp. In total, 63 indels and 82 SNPs were found. The level of polymorphism varied among the regions of Hd1. As shown in Table 4, the pairwise nucleotide diversity (π) and Watterson’s estimator (θw) were higher for the promoter than for the other regions in O. rufipogon and indica. This result is in line with findings for another rice photoperiod gene, Ghd7 (Lu et al. 2012). In japonica, however, the first exon had the highest diversity. In the neutrality analysis, only the values for O. rufipogon were significantly negative, suggesting negative selection or rapid population expansion in O. rufipogon.

Table 4 Summary of nucleotide polymorphisms and neutrality test

Association analysis

The population structure of the cultivated rice was inferred using the 24 SSR markers (Fig. 3). The accessions were classified into three subpopulations because the highest log-likelihood scores were observed when the number of populations was set to 3 (K = 3; Fig. S1). Subpopulation 1 contained 51 accessions, of which 86 % were indica varieties; subpopulation 2 contained 8 japonica varieties; and subpopulation 3 contained 33 accessions, of which 94 % were japonica varieties (Table 1). When K = 2, subpopulations 2 and 3 fell into the same cluster (Fig. S2). Further analysis revealed that most japonica varieties in subpopulation 2 originated south of the TOC, whereas all japonica varieties in subpopulation 3 originated north of the TOC, implying that subpopulation 2 might be tropical japonica and subpopulation 3 might be temperate japonica. LD was examined across the whole genomic region of Hd1, and rapid LD decay was observed (Fig. S3), suggesting weak association between the SNPs in Hd1.

Fig. 3 Structure of the 92 landraces constructed by 24 SSR markers. K = 3. Subpopulations are indicated by different colors. (Color figure online)

Taking the population structure data as covariates (Table 1), we used the GLM to identify SNP/indel–trait associations separately for each of the 3 years. SNPs and indels with a frequency below 5 % were excluded. No SNP was found to be associated with flowering date. Five significantly associated indels were detected, and they were the same in all years (Table 5). All five indels were located in the coding region. S3527 and S4199 were 2- and 4-bp deletions, respectively, both of which would lead to loss of function of Hd1. These deletions had also been detected in previous research (Takahashi et al. 2009; Fujino et al. 2010). S1081, S1539, and S1668 were 3-, 33-, and 156-bp insertions, respectively. The long insertions might lead to partial loss of function of Hd1 (Yano et al. 2000). Thus, S1081, S1539, S1668, S3527, and S4199 were the sites associated with flowering time in rice; all of them were in the coding region and would weaken the function of Hd1.

Table 5 Results of GLM association analysis

Hd1 protein diversity

As revealed in the association study, flowering time in rice is strongly related to the coding region rather than to other regions of Hd1. The diversity of Hd1 in all cultivated samples was analyzed to further investigate the variations. Sixty alleles published in a previous study (Fujino et al. 2010) were also added to the analysis. Forty-two protein types were identified, including 31 protein types in this study and 18 protein types in the previous study (Fig. 4). Besides the 18 protein types reported previously, 24 new protein types were found in the present study. Compared with the sequence of the functional Hd1 allele from the japonica cultivar Ginbouzu (Takahashi et al. 2009), a total of 44 mutation events were detected in the coding region, including 6 insertions, 2 transposons, 12 deletions, 6 nonsynonymous SNPs, 15 synonymous SNPs, and 3 premature stop codons. On the basis of their predicted effects on Hd1, 12 mutation events were identified as generating loss-of-function alleles: two transposons, three premature stop codons, three 1-bp deletions, one 2-bp deletion, one 4-bp deletion, one 43-bp deletion, and one 144-bp deletion.

Fig. 4 Nucleotide changes in the coding region of Hd1 among cultivated rice. Categories R, S, N, I, D, and Tp indicate replacement, synonymous, noncoding site, insertion, deletion, and transposon, respectively. Numbers indicate the size of insertions or deletions. Numbers in the right column are the numbers of cultivars represented in each protein type. A, I, and J indicate aus, O. sativa L. ssp. indica Kato, and O. sativa L. ssp. japonica Kato, respectively. Variations that would not lead Hd1 to lose function are shown in yellow; variations that would lead Hd1 to be nonfunctional are shown in red. Sixty cultivars from a previous study (Fujino et al. 2010) were also included. (Color figure online)

All 42 types of the Hd1 coding region were compared with that of Ginbouzu (Fig. 4). The comparison clearly showed that the coding region sequences could be divided into two groups: one contained the 33- or 156-bp insertions and differed from Ginbouzu at several polymorphic sites, whereas the other did not contain the long insertions and was more similar to Ginbouzu. As mentioned in the association analysis, the alleles containing the long insertions might have lost part of the Hd1 function, so the first group could be regarded as a partial loss-of-function group and the second as a functional group. Each group contained about half of the cultivated samples, but the first group included indica, japonica, and aus varieties, whereas the second group mainly contained indica and japonica individuals. This result is in line with the fact that most aus varieties are insensitive to photoperiod.

Haplotype analysis

Haplotype networks were constructed from the whole gene region for the samples from the mini-core collections in China (Fig. 5) and from the coding region for all accessions (Fig. 6). The whole gene region network contained 101 haplotypes: 20 O. sativa haplotypes, 86 O. rufipogon haplotypes, and 1 O. barthii haplotype, with six haplotypes shared between O. sativa and O. rufipogon.

Fig. 5 Haplotype networks of the Hd1 whole region. Circle size is proportional to the quantity of samples within a given haplotype. Lines between haplotypes represent mutational steps between alleles. Colors for species: yellow, O. rufipogon Griff.; orange, O. sativa L. ssp. indica Kato; blue, O. sativa L. ssp. japonica Kato; green, aus; black, O. barthii A. Chev. FJ Fujian, GD Guangdong, GX Guangxi, HN Hainan, HuN Hunan, JX Jiangxi, YN Yunnan. (Color figure online)

Fig. 6 Haplotype networks of the Hd1 coding region. Circle size is proportional to the quantity of samples within a given haplotype, and the numbers next to the circles represent the haplotype number. Lines between haplotypes represent mutational steps between alleles. Colors for species: yellow, O. rufipogon Griff.; orange, O. sativa L. ssp. indica Kato; blue, O. sativa L. ssp. japonica Kato; green, aus; black, O. barthii A. Chev. (Color figure online)

As shown in Fig. 5, O. rufipogon samples from the same region tended to fall into neighboring haplotypes. Such geographical differentiation in O. rufipogon was also detected in previous research (Wang et al. 2008; Wei et al. 2012b). However, no obvious association between the geographic distribution of cultivated accessions and their haplotypes was observed.

Figure 6 shows the coding region network, which comprised 12 haplotypes: 7 O. sativa haplotypes, 8 O. rufipogon haplotypes, and 1 O. barthii haplotype. All haplotypes were major haplotypes containing two or more individuals, except the O. barthii haplotype. Among these haplotypes, H1, H2, H3, and H4 were shared by O. sativa and O. rufipogon. H1 and H4 belonged to the protein group carrying the long 33- and 156-bp insertions, while H2 and H3 belonged to the other protein group, which was similar to Ginbouzu. H1 and H3 contained 1 indica accession and 16 japonica accessions. H2 contained 52 indica accessions, 7 japonica accessions, and 2 aus accessions; five of the japonica accessions in H2 had been assigned to subpopulation 1, which included most indica varieties, in the structure analysis. Thus, H3 and H2 could be regarded as the japonica haplotype and the indica haplotype, respectively. Indica, japonica, and aus were all well represented in H4, which included 15 indica accessions, 27 japonica accessions, and 17 aus accessions, so it is hard to define H4 as an indica, aus, or japonica group.

We also used the NJ method to construct the phylogeny of major haplotypes for the coding region (Fig. 7). The phylogenetic result was quite similar to the haplotype network. All branches were divided into two groups: one contained H2, H3, and a major O. rufipogon haplotype, while the other one included H1, H4, and other small O. rufipogon haplotypes.

Fig. 7 Phylogenetic tree of Hd1 coding region. Each haplotype of the loci is indicated by one branch. H1, H2, H3, and H4 are given. Bootstrap values are shown on the trees. The accessions contained in the haplotypes/branches are indicated by different symbols: O. rufipogon Griff. alleles by diamonds, aus alleles by squares, O. sativa L. ssp. indica Kato alleles by triangles, and O. sativa L. ssp. japonica Kato alleles by inverted triangles. Trees were rooted with O. barthii A. Chev. alleles, indicated by solid circles.

Geographic distribution

According to the geographical divisions at 31°N and the TOC, we further divided O. sativa into six subpopulations: aus, tropical indica, tropical japonica, subtropical indica, subtropical japonica, and temperate japonica. The relationship between protein type and geographic distribution was analyzed for each subpopulation. Figure 8 shows the distribution of Hd1 protein types among the six subpopulations of O. sativa in the present study. Aus and tropical japonica mainly evolved from O. rufipogon in haplotype H4; tropical and subtropical indica mainly evolved from O. rufipogon in H2, with a small part evolved from H4; and subtropical and temperate japonica evolved from O. rufipogon in H3 and H4. These results reveal that the Hd1 protein type derived from H4, with partial loss of function, was transferred into all subpopulations of O. sativa.

Fig. 8 Hd1 protein type distribution of the six O. sativa subpopulations in Asia. Hd1 protein types evolved from H2, H3, and H4 are indicated in red, blue, and green, respectively, in solid circles. Squares in red, yellow, and blue represent total loss of function, part loss of function, and functional Hd1 protein, respectively. The size of the circles and squares is proportional to the quantity of samples. Detailed quantitative information is presented in Table S2. (Color figure online)

Among these subpopulations, aus and tropical japonica contained the highest percentage of Hd1 protein types that had lost function. Even protein types derived from H4, which might already have partly lost function, could go on to lose function completely during later artificial selection. In contrast, subtropical indica and subtropical japonica had the highest percentage of functional Hd1 protein. Most of the protein types derived from H2 and H3 were fully functional, and only a few lost function during later artificial selection.

Discussion

Diversity of Hd1 in O. sativa and O. rufipogon

The diversity of O. sativa and O. rufipogon at different genes was analyzed in our previous studies (Wei et al. 2012a, b; Qiao et al. 2012). The results indicated that the diversity of O. rufipogon is much higher than that of O. sativa and that the diversity of indica is higher than that of japonica. In the present study, we likewise found that O. sativa had fewer segregating sites and haplotypes than O. rufipogon. With haplotype number used as a proxy for diversity, common wild rice contained 86 % of the total haplotype diversity, whereas cultivated rice contained only 20 %, indicating a strong genetic bottleneck during domestication. However, the diversity of japonica was higher than that of indica across the whole gene region except in exon 2. This result differs from previous research and might be related to the wider distribution of japonica. One possible explanation is that domestication adapted japonica varieties to the varied light conditions of different areas, with more polymorphisms resulting from both natural and artificial selection.
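The two percentages can be reproduced from the haplotype counts reported for the whole-gene network (20 O. sativa haplotypes, 86 O. rufipogon haplotypes, six shared), provided that the single O. barthii outgroup haplotype is left out of the total. That exclusion is an assumption made here because it matches the stated figures; the function name is hypothetical.

```python
def haplotype_shares(n_cultivated: int, n_wild: int, n_shared: int):
    """Total distinct haplotypes and the share found in each group.

    Shared haplotypes are counted once in the total (inclusion-exclusion).
    """
    total = n_cultivated + n_wild - n_shared
    return total, n_cultivated / total, n_wild / total


total, cultivated_share, wild_share = haplotype_shares(20, 86, 6)
print(total, f"{cultivated_share:.0%}", f"{wild_share:.0%}")  # 100 20% 86%
```

The shares sum to more than 100 % because the six shared haplotypes are counted in both groups.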

Association analysis of Hd1

Association analysis revealed that loss-of-function deletions and long insertions in the coding region contributed to the diversity of rice flowering date and transformed rice from a typical short-day plant into a facultative short-day plant. This result suggests that the coding region was the main target of artificial selection and that indels affecting Hd1 protein function were the main variants selected.

Besides the previously reported 2-bp and 4-bp deletions in the coding region that lead to loss of function, three other mutations associated with flowering under short-day conditions were identified. All of these mutations would delay flowering and so help rice adapt to different light conditions. For example, in the aus subpopulation, the 2-bp and 4-bp deletions cause loss of function of Hd1, reduce the floral expression level, and suppress flowering. This variation allows the plant to flower later under short days and improves field production. On the other hand, in temperate regions, where temperate japonica varieties are widely cultivated, the light period is usually long. There, nonfunctional Hd1 would not inhibit the expression of Hd3a, thereby avoiding excessively late flowering. Additionally, in the subtropical region, both indica and japonica have a high percentage of functional Hd1. Thus, to enable varieties to complete their life cycle in the short summer of temperate regions and to help them avoid excessively early flowering in tropical areas, mutations that lead to total loss of function of Hd1 should be transferred into varieties cultivated in these regions. Specific markers could be developed to select favorable protein types and meet the demand for varieties in different ecotypes.

Domestication of Hd1

Since O. sativa and O. rufipogon accessions in the same haplotype had the same nucleotide polymorphisms, O. sativa is more likely to have evolved from the O. rufipogon groups sharing its haplotypes than from O. rufipogon groups in other haplotypes. The O. rufipogon haplotypes shared with O. sativa were therefore regarded as direct ancestors of O. sativa. Figure 6 reveals that the ancestral O. rufipogon could be divided into two groups. One group (H1 and H4) carried the long insertions and tended to be partly nonfunctional, while the other group (H2 and H3) lacked the long insertions and tended to be functional. The functional group could be further divided into an indica subgroup (H2) and a japonica subgroup (H3), which shared haplotypes with indica and japonica, respectively, indicating that functional O. rufipogon diverged into indica-like and japonica-like groups in the natural environment. We concluded that Hd1 in indica and Hd1 in japonica were domesticated from Hd1 in the indica-like and japonica-like O. rufipogon groups, respectively.

Generally, the debate about rice domestication has focused on the origin of indica and japonica. Some researchers argue for a “single origin” of the two subspecies and suggest that both were domesticated from one group of O. rufipogon in a narrow region (Gao and Innan 2008; Molina et al. 2011; Huang et al. 2012b), while others propose “multiple origins,” meaning that the two subspecies were domesticated independently in different ecological and geographical environments (Cheng et al. 2003; Zhu and Ge 2005; Londo et al. 2006). Our results for Hd1 support multiple origins of indica and japonica, a conclusion that is also supported by a number of rice domestication studies (Kovach et al. 2007; Sang and Ge 2007; He et al. 2011; Yang et al. 2011).

As another important group of O. sativa, aus differed somewhat from indica and japonica in terms of the domestication of Hd1. Most aus accessions shared haplotypes with O. rufipogon in H4, which differed clearly from H2 and H3 in containing the long insertions that would make the Hd1 protein partly lose its function. Therefore, we concluded that Hd1 in aus was domesticated from the O. rufipogon group with partial loss of function (H4), whereas Hd1 in indica and japonica evolved from functional O. rufipogon, which had diverged into indica-like (H2) and japonica-like (H3) groups. Moreover, some indica and japonica varieties also occurred in H4, indicating that this haplotype of Hd1 might have been widely transferred into aus, indica, and japonica to help them adapt to different photoperiods.

Figure 8 shows the Hd1 gene in the six subgroups of O. sativa in the tropical, subtropical, and temperate regions divided by the TOC and 31°N. Hd1 in the six subgroups was domesticated from Hd1 in different O. rufipogon groups, and Hd1 derived from O. rufipogon in H4, which contains the long insertions, could be found in every subgroup of O. sativa. This result indicates that Hd1 containing the long insertions had been transferred into all subgroups of O. sativa. However, the proportion of varieties carrying Hd1 derived from the long-insertion type differed considerably among the six subgroups. Most Hd1 in aus and tropical japonica evolved from that in O. rufipogon in H4, but only a small percentage of Hd1 in subtropical indica did so. This difference might result from the different light conditions in the tropical and subtropical regions, as explained in the association analysis.

The proportions of functional and nonfunctional Hd1 for each subgroup of O. sativa are shown in Fig. 8. Hd1 in aus and tropical japonica, which occur in the south, and most Hd1 in temperate japonica, which occurs in the temperate region, had lost function totally or partly, whereas most Hd1 in subtropical indica and subtropical japonica was functional. This result suggests that Hd1 tends to be selected to be nonfunctional when rice spreads to tropical and temperate regions. Moreover, even Hd1 in O. sativa that was presumably domesticated from partial loss-of-function (H4) Hd1 in O. rufipogon was also selected to be nonfunctional. We conclude that artificial selection of nonfunctional mutations occurred after domestication, during the breeding of varieties adapted to the changed light conditions of new environments.

Geographic origin of Hd1

It has been proposed, on the basis of different evidence, that Asian cultivated rice originated from South Asia (Londo et al. 2006), Southern China (Ting 1957), the lower reaches of the Yangtze River in China (Vaughan et al. 2008; Zong et al. 2007), or the Yun-Gui Highland (Liu 1975), but the molecular evidence was insufficient. In a previous study, the whole gene region of Hd1 was used in a geographic analysis to determine the relationship between the haplotypes and geographic origin (Wei et al. 2012a). Twenty-one accessions of common wild rice from Southern China were regarded as the ancestors of O. sativa in China, and O. sativa might have been domesticated in the Pearl River region of Southern China.

Oryza sativa mainly evolved from O. rufipogon in H2, H3, and H4. More O. rufipogon accessions were included in H2 and H3 than in H4. All O. rufipogon accessions in H4 were from Southern China. To test whether O. rufipogon in H4 originated only from China, we examined the first exon of Hd1 in 60 accessions of O. rufipogon from South and Southeast Asia. O. rufipogon accessions in H4 generally contained the 33- and 156-bp insertions, so their sequences were longer than those of other samples. However, these insertions in the first exon were not detected in the samples from South and Southeast Asia (Fig. S4). Thus, we conclude that H4 might have originated only in Southern China and that Southern China was one of the domestication centers of O. sativa. A recently published paper also suggested that rice was first domesticated in the Pearl River region of Southern China (Huang et al. 2012c).