The contribution of the International Rice Genebank to varietal improvement and crop productivity in Eastern India

Using a survey dataset collected from nearly 9000 farmers, along with pedigree and evaluation data, this study measures the contribution of the International Rice Genebank (IRG) to varietal improvement and rice productivity of farmers in Eastern India. We empirically test the relationship of ancestry to productivity changes while controlling for the effects of other farm inputs and environmental factors. Estimated coefficients indicate that a 10% increase in the genetic contribution of IRG accessions to an improved rice variety is associated with a yield increase of 27%. Through pedigree analysis, we also confirm that 45 to 77% of the genetic composition of improved rice varieties is derived from the genes of IRG accessions. Peta, Dee Geo Woo Gen, and Fortuna are the three most popular progenitors with definite IRG contribution. High genealogical diversity likely results from the crossing of germplasm received from multiple countries of origin, which also confers multiple functional trait combinations in a released variety. Further, our calculations reveal that the average coefficient of parentage of all pairwise combinations among the 10 most adopted rice varieties is 0.0973, indicating a high degree of latent genetic diversity. Findings demonstrate the valuable contribution of the genetic resources conserved and distributed by IRG to the development of improved rice varieties.


Introduction
Rice (Oryza sativa) is the most important cereal crop and the staple food of more than half the world's population. Asia is the largest producing and consuming region (FAO 2014). During the Green Revolution of the 1960s, high-yielding rice varieties were introduced in response to the specter of famine as population densities rose and productivity stagnated. This remarkably successful process created an unintended consequence of crop diversity loss. Traditional varieties and wild species of rice were rapidly replaced by new varieties. Genetic variation from traditional varieties and related wild species is needed in crop improvement to cope with the many biotic and abiotic stresses that continually challenge rice production around the world (IRRI 2018).
To protect against the loss of rice diversity, the International Rice Research Institute (IRRI) initiated the collection of rice genetic resources in 1962 and established the International Rice Genebank (IRG) in 1971. IRG has the largest and most diverse collection of rice genetic resources in the world. As of June 2018, the collection includes 130,139 accessions, comprising 123,837 Oryza sativa, 1655 of O. glaberrima, and 4647 accessions of wild relatives and interspecific hybrids. Since the 1970s, the rice genetic resources maintained by the IRG have been used to raise the productivity of rice cropping, particularly among smallholder farm families in lower income countries. Rice genetic improvement using IRG accessions is accomplished through IRRI's breeding programme. The breeding programme produces improved cultivars both in the form of varieties that are ready for immediate use by farmers and in the form of advanced lines suited for use as parent material in national plant-breeding programmes (Evenson and Gollin 1997, p. 471). However, other than the landmark studies by Evenson and Gollin (1997) and Gollin and Evenson (1998), we are not aware of analyses that have explicitly related farm productivity changes to genebank accessions through varietal improvement. Evenson and Gollin (1997) consulted the genealogies of 1709 rice varieties released by national programmes and IRRI from 1965 to 1990, correlating productivity changes with changes in IRRI programmes in an econometric model. They estimated that adding 1000 catalogued accessions was associated with the release of 5.8 additional varieties. Assuming a 10-year lag for variety development and a 10% discount rate, they calculated that these new accessions generated a present value (in 1990 dollars) of USD 325 million. High payoffs provided an economic justification for the continued operation of the International Network for the Genetic Evaluation of Rice and the IRG.
In another analysis, Gollin and Evenson (1998) examined impacts in India only. India is the world's second largest producer of rice, and more than half of India's population depends on rice as their staple food. The authors conducted a pedigree analysis of the 306 rice varieties released for planting in India over the period from 1965 to 1986. They applied a two-stage regression analysis to district-level time series data to estimate the relative contribution of varietal improvement to productivity growth in rice. Their results showed that varietal change contributed more than one-third of the rice productivity gains realized over the post-Green Revolution period from 1972 to 1984. Their findings demonstrated that the economic value of genetic resources in India exceeds the costs of maintaining them (Gollin and Evenson 1998, p. 149).
Two decades later, our analysis provides renewed, evidence-based documentation of the value of the IRG operation in raising productivity on farms in Eastern India. Unlike earlier studies, we are able to draw on data collected in a large-scale farm survey to test the genetic contribution of IRG accessions. We are also able to benefit from a digitized pedigree information system to better characterize ancestry and measure latent genetic diversity.
Our principal objective is to test the effect of IRG ancestry on the rice productivity of farmers in Eastern India in an econometric framework. We also examine the country of origin of all IRG accessions in the ancestry of improved rice varieties grown and characterize the most widely used progenitors. Finally, we evaluate the genealogical (latent) diversity of the most popular improved rice varieties grown by farmers using coefficients of parentage (COP). At the scale of a farming landscape, a negative relationship between productivity and crop diversity is often hypothesized (Widawsky and Rozelle 1998; Smale et al. 1998, 2008; Di Falco and Chavas 2009).

Data
Eastern India, comprising the states of Assam, Bihar, Jharkhand, Chhattisgarh, Orissa, eastern Uttar Pradesh, and West Bengal, is the largest rice-growing region of the country. This region accounts for approximately 60% of the total rice area of India and generates around 48% of the national rice production (Adhya et al. 2008, p. 1). In 2016, IRRI conducted a farm-level survey in four states of Eastern India (eastern Uttar Pradesh, Bihar, Orissa, and West Bengal) through a project called the 'Rice Monitoring Survey' (RMS), funded by the Bill and Melinda Gates Foundation. This survey aimed to gather information on the rice varieties cultivated by farmers during the kharif or wet season of 2015 and to examine the diffusion of submergence-tolerant and drought-tolerant rice varieties. A total of 720 villages were randomly selected from the rural villages defined in the 2011 Census of India, and 12 households were randomly selected in each village. The total number of households interviewed was 8640, with the sample size in each state proportional to its rural population. Figure 1 shows the geographical location of the villages included in the survey. Table 1 shows the distribution of sample villages and households by state as well as the average rice area per household.
We used cross-sectional, farm-level data from RMS to identify the names of improved or released rice varieties grown by the farmers during the wet season of 2015 and constructed variables to explain productivity. Plot-level data were used in the econometric analysis, since one variety is planted per plot and most of the farmers have more than one plot. However, only those plots that were planted with the improved rice varieties identified by this study and that had pedigree information were included in the analysis. Observations with outlier values based on extreme value analysis were also excluded from the sample. The final sample size used for model estimation was 8967 rice plots managed by 4298 farmers.
For the pedigree analysis of improved rice varieties, data were retrieved from existing databases, such as the International Rice Information System (IRIS), Genetic Resources Information Management System (GRIMS) and Genesys. IRIS is the rice information management system of the International Crop Information System (ICIS). ICIS provides integrated management of global information on genetic resources and crop cultivars and is used to manage germplasm information of materials developed, received, and maintained by IRRI. GRIMS is used to manage data from the different operations of the genebank, such as seed acquisition, multiplication, characterization, storage management, and seed distribution. Genesys is a gateway through which germplasm accessions from genebanks around the world can be easily found and ordered.

Methods
Descriptive analysis was conducted for all the improved rice varieties identified under the RMS project. Each improved rice variety was classified according to the breeding institution responsible for its development and the source of its direct parent(s). Breeding institutions were classified as IRRI or non-IRRI. The source of direct parent(s) was classified as IRG or non-IRG. The improved rice varieties, which covered 95% of the total area planted to improved varieties, were identified using the results of the area estimates from RMS and became the focus of the succeeding analysis. Each of the identified varieties was classified according to whether it has direct parent(s) acquired from IRG.
Pedigree analysis was employed to quantify the genetic contribution of each IRG accession to each improved rice variety. Mendelgram, the programme of IRIS used for this analysis, assumes that each parent contributed an equal amount to their progeny. Using an algorithm that is consistent with Mendelian genetics, the genetic contribution is calculated as the probability that an unselected allele¹ comes from a progenitor, with values ranging from 0 to 1.
In this study, the progenitor contribution of the IRG accessions was classified into four categories: definite contribution, possible contribution, no contribution, and unknown contribution. Definite contribution refers to identified progenitor contribution of the IRG accession in the ancestry of an improved rice variety. IRG accessions can be identified when the progenitor in the pedigree tree obtained in the BROWSE application of IRIS has an International Rice Germplasm Collection (IRGC) number opposite the variety name. This implies that the accession was obtained directly from IRG.
Possible contribution refers to a progenitor that does not have an IRGC number opposite the variety name in the pedigree tree, but it has a match by name somewhere in the IRG collection. In this case, it is possible that the progenitor came from IRG, but due to human error in encoding or a random error in the programme of the application/tool used in migrating and processing the data, the wrong name was recorded as the source. If a progenitor had no IRGC number opposite the variety name in the pedigree tree and had no match by name in the whole collection of the IRG, then it was classified as no contribution. Lastly, unknown contribution refers to the progenitor with unknown or confidential information.
Definite contribution was computed first during the aggregation of the progenitor contribution of IRG to each modern rice variety. In this process, the progenitors in the extracted pedigree tree with an IRGC number were identified and their ancestry levels were examined. If their pedigree lines were independent, then the progenitor contributions were added. However, in the case of recurrence of the IRGC accession in the same pedigree line, only the IRG accession with the lowest ancestry level was included in the aggregation to avoid double counting, which could result in overestimation. Afterwards, following the same procedure for aggregating the progenitor contribution, the remaining progenitors in the extracted pedigree tree with names matched in the IRG collection were identified and their contributions were aggregated. This type of contribution is classified as the possible contribution. Similar procedures were employed for the aggregation of no and unknown contributions. The percent area of the improved rice varieties adopted by farmers surveyed was used as a weight in computing the overall average progenitor contribution of IRG accessions by category.

¹ An allele is a variant form of a gene.

Fig. 1 Geographical location of the sample villages of the RMS in 2016. Source: image depicted by authors and data from the 2016 RMS in Eastern India
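The aggregation rule described above can be illustrated with a minimal Python sketch. It is not the Mendelgram programme itself; it only assumes, as stated in the text, that each parent contributes half of the genome, and it interprets "lowest ancestry level" as the labelled progenitor closest to the variety. The cross IR8 = Peta x DGWG follows the text, while VarietyX, VarietyY, LocalLine, SecretLine, the category labels for them, and the planted areas are hypothetical.

```python
# Toy pedigree: child -> (parent1, parent2). IR8 = Peta x DGWG follows the text;
# VarietyX, VarietyY, LocalLine and SecretLine are hypothetical names.
PEDIGREE = {
    "IR8": ("Peta", "DGWG"),
    "VarietyX": ("IR8", "LocalLine"),
    "VarietyY": ("VarietyX", "SecretLine"),
}

# Category of each identified progenitor. Peta and DGWG carry IRGC numbers
# in the text; the label for LocalLine is an assumption for illustration.
LABELS = {"Peta": "definite", "DGWG": "definite", "LocalLine": "none"}
CATEGORIES = ("definite", "possible", "none", "unknown")


def contribution(name):
    """Return the share of a variety's genome by category (shares sum to 1)."""
    shares = dict.fromkeys(CATEGORIES, 0.0)
    if name in LABELS:
        # Stop at the labelled progenitor nearest the variety, so an accession
        # recurring deeper in the same pedigree line is not counted twice.
        shares[LABELS[name]] = 1.0
        return shares
    parents = PEDIGREE.get(name)
    if parents is None:
        # No label and no recorded parents: treated as unknown.
        shares["unknown"] = 1.0
        return shares
    for parent in parents:
        for category, value in contribution(parent).items():
            shares[category] += 0.5 * value  # each parent contributes half
    return shares


def weighted_mean(values, weights):
    """Area-weighted average of a contribution across varieties."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


varieties = ["IR8", "VarietyX", "VarietyY"]
definite = [contribution(v)["definite"] for v in varieties]
areas = [2.0, 1.0, 1.0]  # hypothetical planted areas used as weights
overall_definite = weighted_mean(definite, areas)
```

Weighting by area shifts the average toward the contribution profile of the most widely planted varieties, which is why weighted and unweighted means can differ.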
A plot-level yield response function was estimated to test the effect of the genetic contribution of genebank accessions on productivity while controlling for the influence of conventional inputs (fertilizer, labor, machinery, plant protection, irrigation), management (age, education, access to inputs, credit, and extension advice), and environmental factors (submergence, salinity, drought). The values of continuous variables pertaining to production output and input were converted to units per hectare. In the initial exploratory analysis, both definite and possible contributions of IRG to the variety were included in the model. However, in the final model, we used only the index for definite contribution to measure the clear impact of IRG on productivity.
Four functional forms were tested: linear, extended linear, Cobb-Douglas, and translog. Below are the specifications of these models:

Cobb-Douglas
where y_i is the yield of rice per hectare, x_i is the quantity of conventional production inputs per hectare, z_i is the set of nonconventional input variables (management, environmental factors, and progenitor contribution), and μ_i is a random error term. Two different models were estimated per functional form. The first model did not control for location effects, while the second model controlled for state effects. Initially, we tested a third model, which controlled for village effects. However, due to small sample sizes per village, we excluded this model from the final analysis. Model diagnostics were performed to determine whether the necessary model assumptions were valid. We used Variance Inflation Factors (VIF) to test for multicollinearity among independent variables and the Breusch-Pagan/Cook-Weisberg test for heteroscedasticity. To compare the performance of the functional specifications, both Akaike's information criterion (AIC) and the Bayesian information criterion (BIC) were applied. The preferred model specification has the minimum AIC and BIC values. Stata 14.2 was used to estimate the models and to perform the other tests discussed above.
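As a point of reference, the linear, Cobb-Douglas, and translog forms named above are conventionally written as follows, using the symbols for yield, conventional inputs, nonconventional inputs, and the error term defined in the text. The subscripts j, k, and m and the coefficient names are notation assumed here for illustration; the exact set of terms in the estimated models, and the extended linear form, are not restated here.

```latex
% Linear
y_i = \alpha_0 + \sum_{j} \beta_j x_{ij} + \sum_{m} \gamma_m z_{im} + \mu_i

% Cobb-Douglas (log-linear in the conventional inputs)
\ln y_i = \alpha_0 + \sum_{j} \beta_j \ln x_{ij} + \sum_{m} \gamma_m z_{im} + \mu_i

% Translog (Cobb-Douglas plus second-order terms in the log inputs)
\ln y_i = \alpha_0 + \sum_{j} \beta_j \ln x_{ij}
        + \tfrac{1}{2} \sum_{j} \sum_{k} \beta_{jk} \ln x_{ij} \ln x_{ik}
        + \sum_{m} \gamma_m z_{im} + \mu_i
```

Written this way, the Cobb-Douglas form is the special case of the translog in which every second-order coefficient is zero, which is why the two can be compared directly with AIC and BIC.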
The key variable of interest in all models is the variable measuring genetic (progenitor) contribution. The null hypothesis is that the genetic contribution of IRG accessions does not affect rice yield. If the coefficient of this variable is significantly different from zero, then the null hypothesis is rejected and the alternative hypothesis that the contribution of IRG accession affects the yield of the improved rice varieties is accepted.
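The estimation and hypothesis test can be sketched in Python on synthetic data. This is only an illustration under stated assumptions: the data are simulated rather than taken from the RMS, the variable names and true coefficients are invented, classical rather than robust standard errors are used, and AIC and BIC are computed from the Gaussian log-likelihood counting regression coefficients only. The analysis reported here was run in Stata.

```python
import numpy as np


def ols_fit(X, y):
    """OLS fit returning coefficients, classical standard errors, AIC and BIC."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = resid @ resid
    # Classical (non-robust) covariance of the coefficient estimates
    cov = (rss / (n - k)) * np.linalg.pinv(X.T @ X)
    se = np.sqrt(np.diag(cov))
    # Gaussian log-likelihood evaluated at the ML variance estimate rss / n
    loglik = -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)
    aic = -2 * loglik + 2 * k
    bic = -2 * loglik + k * np.log(n)
    return beta, se, aic, bic


# Synthetic plot-level data (NOT the survey data; all values are hypothetical)
rng = np.random.default_rng(0)
n = 2000
fert = rng.uniform(50, 250, n)    # fertilizer, kg/ha
labor = rng.uniform(40, 150, n)   # labor, person-days/ha
irg = rng.uniform(0, 1, n)        # definite IRG contribution on a 0-1 scale
ln_y = (0.2 + 0.15 * np.log(fert) + 0.05 * np.log(labor)
        + 0.3 * irg + rng.normal(0, 0.1, n))

lf, ll = np.log(fert), np.log(labor)
ones = np.ones(n)
X_cd = np.column_stack([ones, lf, ll, irg])                  # Cobb-Douglas
X_tl = np.column_stack([ones, lf, ll, 0.5 * lf**2,
                        0.5 * ll**2, lf * ll, irg])          # translog

beta_cd, se_cd, aic_cd, bic_cd = ols_fit(X_cd, ln_y)
beta_tl, se_tl, aic_tl, bic_tl = ols_fit(X_tl, ln_y)

# t-statistic for the null hypothesis that the IRG coefficient equals zero
gamma_hat = beta_cd[-1]
t_stat = gamma_hat / se_cd[-1]
```

A large absolute t-statistic leads to rejection of the null hypothesis; the specification with the smaller AIC and BIC would be preferred.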
Similar econometric methodology has been employed in a small number of previous studies (e.g. Widawsky and Rozelle 1998; Smale et al. 1998, 2008; Di Falco and Chavas 2009). While these studies focused on the effects of variety diversity and genetic diversity on yield or yield risk, they did not link either outcome directly to genebank accessions through varietal improvement. To address this aspect, we also estimated econometric models using the preferred specification to test the effects of definite IRG contribution on the dependent variables of yield variance and yield kurtosis. Dependent variables were calculated from regression residuals of the main model, following Bozzola et al. (2018), which is based on Antle's method of moments (Antle 1987).
The most popular IRG accessions in the ancestry of the adopted improved rice varieties were then identified. Initially, the progenitors with definite and possible contribution were segregated according to the decade when the variety was released. From this list, the study selected the three most common IRG accessions across decades.
Lastly, we selected for further genealogical analysis the 10 varieties most adopted by farmers during the 2015 season according to area planted. COP were computed for all pairwise combinations among these 10 varieties to measure genetic diversity. The COP between two individuals is defined as the probability that a random allele at a random locus in one individual is identical by descent to a random allele at the same locus in the other individual (Cox et al. 1985, p. 529). The values of COP range from 0 (no known common ancestor) to 1 (same individual or variety). The lower the value of the COP, the higher the latent genetic diversity conferred by parentage among the included varieties (Souza et al. 1994, 1998).
This final step provides evidence concerning the relationship between productivity and genetic diversity in the farming landscape during the survey year.
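The COP definition can be made concrete with a short recursive sketch in Python. It rests on the usual simplifying assumptions for inbred crops: each variety is fully homozygous (so a variety's COP with itself is 1), each parent contributes half, and ancestors without recorded parents are unrelated. The cross IR8 = Peta x DGWG follows the text; LineA and LineB are hypothetical, and the routine is not the software used in this study.

```python
from functools import lru_cache
from itertools import combinations

# Toy pedigree: child -> (parent1, parent2). IR8 = Peta x DGWG follows the text;
# LineA and LineB are hypothetical.
PEDIGREE = {
    "IR8": ("Peta", "DGWG"),
    "LineA": ("IR8", "Fortuna"),
    "LineB": ("Peta", "Fortuna"),
}


@lru_cache(maxsize=None)
def depth(name):
    """Number of generations between a variety and its deepest founder."""
    parents = PEDIGREE.get(name)
    if parents is None:
        return 0
    return 1 + max(depth(p) for p in parents)


@lru_cache(maxsize=None)
def cop(a, b):
    """Coefficient of parentage between two varieties."""
    if a == b:
        return 1.0  # homozygous-line assumption
    # Expand the deeper variety: it cannot be an ancestor of the other one.
    if depth(a) < depth(b):
        a, b = b, a
    parents = PEDIGREE.get(a)
    if parents is None:
        return 0.0  # two distinct founders are assumed unrelated
    p1, p2 = parents
    return 0.5 * (cop(p1, b) + cop(p2, b))


def mean_pairwise_cop(varieties):
    """Average COP over all unordered pairs of distinct varieties."""
    pairs = list(combinations(varieties, 2))
    return sum(cop(x, y) for x, y in pairs) / len(pairs)


average = mean_pairwise_cop(["IR8", "LineA", "LineB"])
```

For 10 varieties, the same routine would average over 45 unordered pairs, which is the number of pairwise combinations analyzed in this study.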

Description of improved rice varieties grown by farmers
A total of 132 improved rice varieties (124 inbred and eight hybrid) were cultivated by rice farmers in Eastern India during the 2015 wet season. Five percent of these varieties were developed by IRRI, while the majority (80%) were developed by research institutions under the national breeding programme in India and by private companies. Non-IRRI-developed varieties covered around 78% of the total cultivated area of released varieties. Only 2% of the area was planted to IRRI-developed varieties (for additional details, see Villanueva et al. 2019).
Of the 132 improved rice varieties cultivated by farmers during the 2015 wet season, 45 varieties covered 95% of the total cultivated area of improved rice varieties and are the focus of this analysis. Table 2 shows that 20% of these popular varieties have at least one direct parent that definitely came from the IRG. This comprises 43% of the total area. If possible IRG acquisition is included, which means that the direct parent has no IRGC number but has a name match with one of the IRG accessions, then 53% of the varieties have at least one direct IRG parent (corresponding to 71% of the total area).

Progenitor contribution of IRG
A total of 45 released varieties (inbred and hybrid) were cultivated on about 10.78 million ha. Fourteen of these varieties (six hybrid and eight inbred) do not have pedigree information in the IRIS database, and thus, the progenitor contribution of IRG to these varieties is unknown. Table 3 summarizes the progenitor contribution of IRG to the rice varieties most adopted by farmers during the 2015 wet season in Eastern India. Additional details per variety can be found in Tables 10 and 11 in the Appendix. For the definite, possible, and unknown categories, the minimum and maximum contributions are 0 and 1, respectively, while the no-contribution category has a minimum of 0 and a maximum of 0.50. The unweighted mean shows that the average definite and possible progenitor contributions of IRG to a released variety are about 35 and 25%, respectively. Using the estimated percent planted area as weights, the average definite and possible progenitor contributions of IRG to a released variety are about 45 and 32%, respectively. These results mean that, on average, 45% of the genetic composition of the released rice varieties cultivated by farmers during the 2015 wet season came from IRG accessions. The progenitor contribution increases to 77% if the possible contribution is added. This value could be even higher if there are IRG accessions in the pedigrees of the hybrid and inbred varieties with no information. These results serve as evidence of the significant contribution of IRG to the development of improved rice varieties in Eastern India.
Table 4 summarizes the distribution of progenitors by country of origin and by type of progenitor contribution of IRG. Values shown in this table represent the aggregated frequency of unique progenitors of all released varieties by their country of origin. Here, progenitors that appeared in the pedigree of more than one progeny have been counted only once to avoid double counting.
Results show that there are 122 unique progenitors from 18 countries with definite and possible contribution of IRG to released varieties cultivated in Eastern India during the 2015 wet season. Combining the results for definite and possible IRG contribution, the most common country of origin is the Philippines (35 progenitors), followed by India (30 progenitors), the United States (15 progenitors), and Taiwan (8 progenitors). The majority of the progenitors from the Philippines were from IRRI, where the IRG is located. The complete list of countries of origin of all ancestors by type of IRG contribution is shown in the Appendix.

Country of origin
These results demonstrate the diversity of the progenitors in terms of their country of origin, which can also translate into multiple trait combinations in a released variety. According to Ramirez et al. (2013, p. 44), the wide use of landraces from different countries as a source of desired traits has contributed to the increase in rice production in most rice-growing countries. The combined traits from these landraces conferred the necessary characteristics that allowed the different cultivars to cope with changing pest and disease pressures, various soil and nutrient conditions, and regional climatic conditions (Sebastian et al. 1998).
Table 5 reports the definition and basic descriptive statistics for the variables included in the yield response function. The average yield is around 2.5 t per hectare (t/ha) with a standard deviation of 1.7 t/ha. Farmers applied an average of 153 kg/ha of nitrogen (N), phosphorus (P), and potassium (K) fertilizer in aggregate. Labor, power cost, and other material inputs average 93 person-days/ha, Php5,521/ha, and Php778/ha, respectively. About half of the total plots experienced drought based on the perception of the farmer. The average age and education of the farmers are about 48 and six years, respectively. Only a few of the respondents have access to inputs, credit, and extension workers. The plot-level average of the definite contribution is 0.49, which means that about 49% of the genetic composition of the variety came from the genes of IRG accessions.
Villanueva et al. (2019) provide all estimated regression models and functional forms for each variable, including VIF. The range of VIF (1.01-3.47 with a mean of 1.55) confirms that there is moderate multicollinearity among independent variables, but it is not severe enough to warrant corrective measures. We include other diagnostic tests for the four functional forms and each of the two models in Table 6.
The result of the Breusch-Pagan/Cook-Weisberg test for heteroscedasticity is significant in each model, which means that heteroscedasticity is present. To deal with this problem, robust standard errors were used in all models. Among all functional forms, results show that the translog has the lowest values of AIC and BIC. This result implies that the translog is the best fit to the data. Between the two models with translog functional form, Model 2 was selected as the final model because it controls for state effects and captures significant yield variation across observations.
The preferred yield response model is presented in Table 7. The definite IRG contribution is the main variable of interest. Its coefficient is positive and significant. This result means that a higher genetic contribution from IRG in an improved rice variety is associated with a higher yield. The magnitude of the increase in yield can be computed by taking the exponential of the coefficient of the definite IRG contribution, using the formula exp(coefficient). After applying this formula, the result is 1.027. In terms of percent change, this result can be interpreted as follows: a 1% increase in the definite IRG contribution to an improved rice variety can increase the yield by about 2.7%. Furthermore, a 10% increase in the definite IRG contribution can lead to a 27% yield increase. These findings imply that the germplasm acquired from IRG is associated with yield improvement of rice varieties on farms in Eastern India.
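The arithmetic behind these figures can be traced in a few lines of Python. The only input taken from the text is the exponentiated coefficient, 1.027; the coefficient itself is backed out from it, and the function name and the choice between linear scaling and compounding are assumptions for illustration. Linear scaling of the one-unit effect gives the 27% figure quoted above, whereas compounding the same coefficient in a log-linear model gives a somewhat larger value; which reading applies depends on how the contribution variable is scaled.

```python
import math

ratio = 1.027                      # exp(coefficient), as reported in the text
coef = math.log(ratio)             # implied coefficient on the log-yield scale
pct_one_unit = (ratio - 1) * 100   # percent yield change per one-unit change


def pct_change(units, compound=False):
    """Percent yield change for a given change in the contribution variable."""
    if compound:
        # Exact effect in a log-linear model: exp(coef * units) - 1
        return (math.exp(coef * units) - 1) * 100
    # Linear scaling of the one-unit effect
    return pct_one_unit * units


linear_10 = pct_change(10)
compound_10 = pct_change(10, compound=True)
```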

Contribution of IRG genetic resources to productivity
Other results show that, except for labor, the conventional production inputs have positive and significant effects on productivity, conforming to the economic theory of production. The negative sign of labor input may be attributed to measurement error, as labor input was not adjusted for differences in quality between skilled and non-skilled farm workers. Person-days per hectare are often overstated, with variable lengths of workday. These findings were consistent when we explored different transformations of the labor variable and included them in the translog model. The sign of labor input remained negative. The signs of the other inputs are consistent across the various models that we evaluated.
Transplanted rice has significantly higher yield compared to broadcasted or direct-seeded rice. Plots with a higher percentage of irrigated area have higher yield. Even in the wet season, some areas in Eastern India experience erratic rains and farmers need supplementary sources of irrigation for their crops. The coefficients of submergence, salinity, and drought are negative and significant. These abiotic stresses affect the growth of rice crops and decrease their productivity. In some parts of Eastern India, rice is cultivated in low-lying areas, which are prone to submergence due to heavy rains and intrusion of river or sea water. On the other hand, salinity can be caused by rising ground water or the intrusion of sea water, which brings salts to rice areas. Drought during the wet season can be caused by untimely or low amounts of rainfall, and without other sources of water for irrigation, rice productivity will be affected. Effects of management factors are weaker. The age and education of the farmers do not significantly affect the yield. Plots owned by farmers with access to credit have higher yield compared to those of farmers without access. Credit can be in the form of farm inputs or cash used to meet the needs of rice production. Finally, additional econometric models testing the effects of definite IRG contribution on yield variance and kurtosis (variability and downside risk) indicate risk-neutrality, which means that the contribution of progenitors does not significantly affect yield risk.

Most popular progenitors
The progenitors that frequently appear in the pedigree of the released varieties in India during the 2015 wet season were enumerated by decade of varietal release. Peta, Dee Geo Woo Gen (DGWG), and Fortuna are the three most popular progenitors, with definite IRG contribution across the decades.
Peta and DGWG are the direct parents of IR8: the variety known as the 'miracle rice' that revolutionized rice production in tropical Asia. The development of IR8 was the start of the Green Revolution in rice not only because of rapid increase in rice production but also because this variety is the most widely used parent in several crosses in tropical Asia. On the other hand, Fortuna is one of the ancestors of elite varieties such as IR24, IR36, IR64, Lalat, Pooja and Swarna-Sub1. Table 8 'profiles' the morphological and other special characteristics of these progenitors based on data assembled from the field and screen house evaluations. The following section provides additional details about Peta, DGWG, and Fortuna. These examples demonstrate that by conserving and distributing the seeds of key progenitors, IRG played a major role in producing widely adopted, improved rice varieties.

Peta (IRGC 35)
Peta, a tall and vigorous indica rice variety from Indonesia, was produced from a cross of Tjina and Latisail by H. Siregar. Tjina is presumed to originate from China, while Latisail came [...] in 1961, 1971, 1972, and 1976. We focus on IRGC 35 since it is the accession IRRI breeders used to produce IR8. Peta is a tall (28.6 cm) and late-maturing variety (145 days) with a semi-compact panicle at post-harvest stage. Its leaves are of intermediate length and width. Its grain length and width are 9.3 and 2.9 mm, respectively, with a 100-grain weight of 2.9 g. Based on the results of field evaluation on biotic stresses, Peta is resistant to tungro virus and moderately resistant to blast. This variety is susceptible to bacterial blight and moderately susceptible to sheath blight and ragged stunt virus. Peta is also susceptible to all destructive insects, except the zigzag leafhopper. For abiotic stresses, it is tolerant to saline conditions but susceptible to drought, floods, and cold.
We searched for the elite lines and varieties in which Peta was one of the ancestors from the first up to the fourth degree of their pedigrees. We found 5728 unique advanced lines and released varieties with Peta ancestry in IRIS, most of which were developed in the 1960s. The high frequency of Peta in their ancestries is due to the crossing of IR8 with other varieties or landraces to produce more improved rice varieties. Aside from IR8, other notable released varieties with Peta in their ancestries are IR36, IR42, IR64, IR72, Swarna, Pooja, and Lalat. Twenty out of 31 popular varieties (65%) cultivated during the 2015 wet season in Eastern India and with pedigree data have Peta as one of their ancestors. These varieties were planted on around 6.6 million ha during that season, representing 58% of the total area of improved rice varieties in Eastern India.

DGWG (IRGC 123)
DGWG, found in Taiwan, is the earliest known semi-dwarf rice and is also known as 'I-geo-woo-gen'. The prefixes Dee-geo and I-geo mean 'dwarf' (Dalrymple 1986, p. 17). The origins of DGWG are unclear. One account suggests that it may have been brought from Fujian several hundred years ago (Miu 1959, p. 67), while another suggests that it may have been a spontaneous mutant from another traditional variety named Woo-gen (Hu 1973, p. 566). DGWG soon became popular in Taiwan and was planted on 10,907 ha during the first cropping season in 1953.
Before the development of IR8 in the early 1960s, DGWG was used in the first cross involving a semi-dwarf in Taiwan, with 'Tsai-yuan-chung', a tall, disease-resistant local variety. A selection from this cross was named 'Taichung Native 1' (TN-1) in 1956. TN-1 was rapidly accepted by farmers due to its short stature and high tillering. By 1965, 79,000 ha were planted with TN-1, making it the second most popular variety that year (Dalrymple 1986, p. 18).
On 24 March 1962, IRG acquired DGWG seeds from Taiwan and assigned 123 as its IRGC number. Afterwards, this variety became the donor of the dwarfism trait in IR8. Dr. Robert Chandler, former director general of IRRI, described DGWG as 'a high-yielding, heavy-tillering, short-statured variety from Taiwan' (Hargrove and Coffman 2006, p. 36).
DGWG is a medium-maturing (111 days) variety. Its leaves are of intermediate length and width. Its grain has a length of 8.1 mm, a width of 3.1 mm, and a 100-grain weight of about 2.3 g. It has a short and drooping panicle with a length of 24 cm. DGWG was categorized as non-glutinous or non-waxy rice based on the starch in the endosperm. It is moderately resistant to blast but very susceptible to tungro and ragged stunt virus; drought and cold conditions; and pests, such as the brown planthopper, green leafhopper, whorl maggot, white-backed planthopper, and striped stemborer.
According to the database, DGWG (IRGC 123) is present in the first up to fourth degree of the pedigree of 5649 advanced lines and released varieties, which shows its immediate contribution to the development of improved rice varieties. Some of the notable released varieties are IR6, IR8, CO 36, and Giza 180. Similar to Peta, the popularity of DGWG is due to the development of IR8, which was crossed with other varieties to produce new improved rice varieties. Some of the well-known released varieties that are progenies of DGWG are IR36, IR42, IR64, IR72, Swarna, Pooja, and Lalat. DGWG was found in the ancestry of 15 out of 31 popular varieties (48%) with pedigree information. These varieties were cultivated on around 6.2 million ha during the 2015 wet season in Eastern India, covering 55% of the total area of improved rice varieties.

Fortuna (IRGC 139)
Fortuna is a landrace javanica rice that originated in Taiwan. The IRG acquired it on 7 July 1978 and assigned it an IRGC number of 139 (DOI https://doi.org/10.18730/1PME6). Fortuna is a late-maturing variety (124 days) with intermediate plant height. It has a grain length of 9.8 mm, a width of 3.0 mm, and a 100-grain weight of 3.1 g, which is similar to Peta. The variety has leaves of intermediate size and a long (32 cm), spreading (open) panicle. Based on its endosperm, Fortuna is classified as non-glutinous or non-waxy rice. This variety is susceptible to bacterial blight, sheath blight, tungro virus, and ragged stunt virus as well as all common destructive insects. In the early vegetative stage, it is resistant to drought, but it becomes susceptible in the late vegetative stage. While it exhibits intermediate tolerance to saline conditions, it is susceptible to flooding and cold conditions.
Fortuna (IRGC 139) was found in the first up to fourth degree of the pedigree of 765 advanced lines and released varieties, illustrating its immediate genetic contribution to improved rice varieties. Some of the noteworthy varieties in this list are Blue Bonnet, Sigadis, Star bonnet, and Sun bonnet. These varieties were then crossed with other improved rice varieties and elite lines to develop other improved rice varieties, including the popular varieties Milfor, IR64, Lebonnet, IR72, Lalat, Pooja, and Swarna-Sub1. Among the 31 popular improved rice varieties with pedigree information, we found that 11 varieties (35%) have Fortuna in their ancestries. During the 2015 wet season, these varieties were cultivated on around 2.5 million ha in Eastern India, amounting to 22% of the total area of improved rice varieties.

Latent genetic diversity
Table 9 shows the half matrix of the COP for the pairwise combinations among the 10 varieties with pedigree information that were most adopted by farmers during the 2015 wet season in Eastern India. All values on the diagonal of the half matrix are 1 since the progenitors of a variety and itself are identical. Among the 45 pairwise combinations of these varieties, 34 (76%) have COP values less than or equal to 0.10. Four varieties (Pooja, Lalat, Sarjoo 52, and Moti) share no progenitors with Mahsuri, as indicated by COP values of zero. On the other hand, the pairwise combination with the highest COP is Swarna and Swarna-Sub1 (0.94), followed by Mahsuri and Samba Mahsuri (0.50). Swarna-Sub1 is closely related to Swarna because it is the improved version of Swarna after the addition of the Sub1 gene, which makes the variety tolerant to submergence. Samba Mahsuri is the progeny of a cross between two popular Indian varieties, Sambha and Mahsuri, which is why 50% of their progenitors are identical.
On average, the COP for all pairwise combinations among these varieties (excluding the COP of a variety with itself) is 0.0973. This means that, on average, two varieties share 9.73% of the progenitors in their pedigrees, suggesting a high degree of latent genetic diversity. High diversity among these varieties is likely the result of crossing germplasm from different countries of origin, as shown in the previous section. This ancestral diversity may also be reflected in diverse, multiple trait combinations that provide functional diversity. High varietal diversity may reduce yield variability when pest infestations strike or bad weather occurs (Widawsky and Rozelle 1998).
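To illustrate how a pairwise COP and its average can be obtained from pedigrees, the following Python sketch applies the conventional rules for self-pollinated crops: a variety has a COP of 1 with itself, each parent contributes half, and progenitors without a recorded pedigree are treated as unrelated. These rules, the `PEDIGREE` dictionary, and the names 'Line A' and 'Founder X' are assumptions made for illustration and may differ from the exact procedure behind Table 9; only the Samba Mahsuri parentage is taken from the text.

```python
from functools import lru_cache
from itertools import combinations

# variety -> (parent 1, parent 2); names that are not keys are founders.
# "Line A" and "Founder X" are hypothetical.
PEDIGREE = {
    "Samba Mahsuri": ("Sambha", "Mahsuri"),
    "Line A": ("Samba Mahsuri", "Founder X"),
}


@lru_cache(maxsize=None)
def depth(variety):
    """Number of generations between a variety and its most distant founder."""
    parents = PEDIGREE.get(variety)
    if parents is None:
        return 0
    return 1 + max(depth(p) for p in parents)


@lru_cache(maxsize=None)
def cop(a, b):
    """Coefficient of parentage under the assumptions stated above."""
    if a == b:
        return 1.0
    # Always expand the variety with the deeper pedigree: it cannot be an
    # ancestor of the other one, so the averaging rule applies to it.
    if depth(a) > depth(b):
        a, b = b, a
    parents = PEDIGREE.get(b)
    if parents is None:
        return 0.0  # two distinct founders are assumed unrelated
    return 0.5 * (cop(a, parents[0]) + cop(a, parents[1]))


def mean_pairwise_cop(varieties):
    """Average COP over all distinct pairs, excluding each variety with itself."""
    pairs = list(combinations(varieties, 2))
    return sum(cop(a, b) for a, b in pairs) / len(pairs)


print(cop("Mahsuri", "Samba Mahsuri"))
print(mean_pairwise_cop(["Mahsuri", "Samba Mahsuri", "Line A"]))
```

Under these rules a variety and one of its two unrelated parents have a COP of 0.5, which matches the value reported for Mahsuri and Samba Mahsuri. With 10 varieties, `combinations` yields the 45 pairs over which the reported mean is taken.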

Conclusion
We have provided new evidence concerning the contribution of IRG to rice productivity among smallholder farms in Eastern India, following earlier studies by Evenson and Gollin (1997) and Gollin and Evenson (1998). The unique feature of this study is that we were able to combine data collected in personal farmer interviews with digitized pedigree data to test the role of IRG ancestry in farm productivity in Eastern India. To accomplish this, we identified the nature and origins of the IRG ancestry in varieties covering 95% of the rice area surveyed. We accessed other characterization data from field and screenhouse evaluations to 'profile' the three most popular IRG progenitors with definite IRG contribution over the decades. Lastly, we computed the COP for all pairwise combinations among the 10 improved rice varieties most adopted by farmers.
We found that, on average, 45% of the genetic composition of an improved rice variety cultivated by farmers in Eastern India was definitely contributed by IRG accessions. If possible IRG contributions are included, the total progenitor contribution could increase up to 77%. To assess the farm-level impact, the index for the definite IRG contribution was included as one of the explanatory variables of the yield response function. The results of the translog model show that the definite IRG contribution has a positive and significant impact on yield. A 1% increase in the definite IRG contribution to an improved rice variety increases rice yield by about 2.7%, other factors held constant.
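The reported elasticity can be read as follows. The Python sketch below compares a simple proportional scaling of the 2.7% figure with a compounded calculation. In a translog specification the elasticity generally varies with input levels, so treating 2.7 as constant over a large change is an assumption made only for illustration; the function names are hypothetical.

```python
# Reported in the text: a 1% increase in the definite IRG contribution
# is associated with a yield increase of about 2.7%.
ELASTICITY = 2.7


def yield_change_linear(pct_increase, elasticity=ELASTICITY):
    """Proportional (first-order) approximation of the yield change, in percent."""
    return elasticity * pct_increase


def yield_change_compounded(pct_increase, elasticity=ELASTICITY):
    """Yield change in percent if the elasticity were constant (an assumption)."""
    return ((1 + pct_increase / 100) ** elasticity - 1) * 100


for pct in (1, 10):
    print(pct,
          round(yield_change_linear(pct), 1),
          round(yield_change_compounded(pct), 1))
```

For small changes the two calculations nearly coincide; they diverge as the change grows, which is why elasticities are best interpreted for marginal changes around the point at which they were evaluated.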
Based on the country of origin of the progenitors, there are 122 unique progenitors from 19 countries with definite and possible IRG contribution. This demonstrates the diversity of the IRG progenitors in terms of their country of origin, implying that the wide use of combined traits from the landraces originating in different countries may have conferred the necessary characteristics that allowed farmers to cope with biotic and abiotic conditions and raise their rice production.
The most popular IRG progenitors with definite contribution identified in this study are Peta, DGWG, and Fortuna. Peta and DGWG were popular progenitors because they are the direct parents of IR8, whose development marked the start of the Green Revolution in rice. Twenty out of 31 popular varieties (65%) identified by this study have Peta, DGWG, or Fortuna in their ancestry. These 20 varieties were planted on around 6.6 million ha during the 2015 wet season, covering 58% of the total area of improved rice varieties in Eastern India.
At the same time, COP analysis indicates that the top 10 most adopted varieties have only 9.73% identical progenitors in their pedigrees, implying a high degree of diversity conferred by ancestry. High diversity among these varieties probably reflects the crossing of germplasm sources from different countries of origin and may also imply functionally diverse, multiple trait combinations. Consistent with this result, we find that the definite IRG contribution is neutral to yield risk in rice production. The findings of this study demonstrate the valuable contribution of IRG's conservation and distribution of genetic accessions to the development of improved rice varieties and rice production on farms in Eastern India. Future research might include applications to other farming areas, preferably with panel data to enhance the robustness of the analysis.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Melinda worked in Pakistan, Somalia, Mauritania and Niger on shorter-term assignments. She has published over 100 peer-reviewed articles, has received awards for outstanding articles and has served on the editorial committees of several journals. She enjoys working with young professionals.
Nelissa Jamora leads the monitoring, evaluation, and impact assessment of programmes and projects of the Crop Trust. She has a PhD in Agricultural Economics from the University of Göttingen in Germany and an MSc from Michigan State University in the USA. Her interest in agricultural development was motivated by her early work at IRRI on price and market analysis and impact assessment involving rice farmers in the Philippines. Nelissa received a USA Fulbright scholarship in 2004 and a grant for early-researchers in 2014 from the German Research Foundation. She was recognized as a Young Rice Scientist at the International Rice Congress in 2014 for her work on rice price analysis. She joined the Crop Trust in 2015 and currently coordinates the Community of Practice on Genebank Impacts.
Grace Lee Capilit is a Lead Specialist in data management at IRRI. She currently manages the rice germplasm database of the International Rice Genebank and collaborates with scientists and researchers on genebank operations, such as seed acquisition, multiplication, characterization, conservation, seed distribution, and information sharing. She has an MS and a BS degree in Statistics from the University of the Philippines Los Baños (UPLB). Before joining the TT Chang Genetic Resources Center in 2009, she was a database manager at the Plant Breeding, Genetics and Biotechnology Division of IRRI from 2000 to 2008 and also served as a teaching associate at UPLB handling statistical programming, experimental designs, use of statistical packages, and general statistics classes.
Ruaraidh Sackville Hamilton has over 40 years of experience in the conservation and use of crop genetic resources, including best practices and workflow management systems for genebank management; database design and data management; statistics, genetics, and genomics; crop wild relatives; pre-breeding; plant breeding; plant ecology; GM biosafety; and international policy on access and benefit-sharing. From 2002 to 2018, he served as the Head of the International Rice Genebank at IRRI in the Philippines. He is an evolutionary biologist and one of the seven award recipients of the Crop Trust Legacy Award in 2018 for his effort and outstanding contribution in the field of plant genetic resources conservation. He has a PhD in plant genetic resources from the University of Cambridge, UK.