1 Introduction

Amaranth (Amaranthus spp., mchicha in Swahili) is a popular traditional vegetable with a long cultural tradition in East Africa. It is cultivated for its leaves and young stems, and sometimes for its grain. The leaves are rich in vitamin C and pro-vitamin A as well as in iron, zinc, and calcium – micronutrients that local diets are often deficient in (Yang & Keding, 2009). Increased consumption of green leafy vegetables such as amaranth is important to achieving food and nutrition security. In Kenya and Tanzania, there has been an increasing demand for traditional African vegetables such as amaranth, especially among urban consumers (Cernansky, 2015; Chelang’a et al., 2013). This provides an opportunity for smallholder farmers, many of whom are women, to earn income and improve their livelihood through vegetable amaranth production (Mwadzingeni et al., 2021).

Until about fifteen years ago there was no formal seed system for supplying improved amaranth cultivars. Farmers saved their own seed and exchanged it locally, which is still a common practice today. The World Vegetable Center (WorldVeg) established an amaranth breeding program in Tanzania in 2004 with the aim of creating new amaranth cultivars with preferred taste and enhanced performance under local biotic and abiotic constraints. To do this, WorldVeg researchers and partners collected amaranth germplasm from across Africa, Asia and the Americas and conserved this for public access in vegetable genebanks in Taiwan and Tanzania, which currently hold 867 amaranth accessions of 18 species (Dinssa et al., 2018; see also https://genebank.worldveg.org/). Accessions with farmer- and consumer-preferred traits were chosen for purification and improved through selection and crossbreeding. Together with the Tanzania Agricultural Research Institute (TARI), promising lines were tested in multiple locations in Tanzania and the best performing lines were released for commercial seed production and distributed to farmers. Amaranth cultivars (that is, cultivated varieties) and selected lines have been promoted in Tanzania and other countries of sub-Saharan Africa including Malawi, Rwanda, Kenya, and Uganda by including them in large-scale seed distributions through several programs (Stoilova et al., 2019). In addition, by 2016, local seed companies in Kenya and Tanzania were commercializing improved cultivars and sold an estimated 2.9 tons of WorldVeg amaranth seed (Ochieng et al., 2019), which increased to 8 tons in Kenya alone in 2020 (Simlaw Seeds, personal communication).

A recent study using expert elicitation methods estimated that improved amaranth cultivars were planted on 51% of the amaranth area in Kenya and 70% in Tanzania with WorldVeg-related varieties planted on 20.0% of the amaranth area in Kenya and 47.1% in Tanzania (Ochieng et al., 2019). These data suggest that improved amaranth cultivars have made a substantial impact, but this has not been accurately quantified to date.

Previous studies have shown that estimates based on household surveys or expert opinion tend to overestimate the adoption of improved cultivars as compared to DNA fingerprinting, which is a more reliable method (Rabbi et al., 2015; Wossen et al., 2019). DNA fingerprinting has previously been applied to quantify the adoption of improved cultivars of staple food crops (e.g., Floro IV et al., 2018; Kosmowski et al., 2019; Maredia et al., 2016). For Tanzania, two previous studies applied DNA fingerprinting to quantify the adoption of improved maize and rice cultivars (Oruko et al., 2015; Wineman et al., 2020). Regarding the accuracy of alternative methods, Floro IV et al. (2018) in a study for Colombia, estimated that 9.2% of farmers cultivated improved cassava cultivars as based on DNA fingerprinting while this was 17.1% as based on household survey data. Also, Maredia et al. (2016) found that expert elicitation and household surveys overestimated the adoption of improved cassava and bean cultivars as compared to DNA fingerprinting. Their results showed 85% adoption of improved cassava varieties in Ghana as based on expert opinion, 57% as based on a household survey, but only 16% as based on DNA fingerprinting. For improved bean varieties in Zambia this was 80% adoption as based on household survey data, 68% as based on expert opinion, but only 4% as based on DNA fingerprinting.

Against this background, the objective of this study is to estimate the adoption and impact of improved amaranth cultivars in Tanzania that originated from the international breeding program of WorldVeg. We use DNA fingerprinting as the main method to quantify cultivar adoption, but also compare it against farmers’ own assessment as measured from a farm household survey and expert elicitation estimates as reported by Ochieng et al. (2019). Our study is the first application of DNA fingerprinting to cultivar adoption in vegetables. The application to amaranth is challenging because it is a multispecies crop with at least six different species cultivated in Tanzania. Furthermore, while most previous studies stopped at cultivar identification and adoption, this study goes one step further and combines household survey data with cultivar identification to quantify the immediate impact of variety adoption on crop yield, production costs and sales. Our hypothesis is that improved cultivars give a higher mean yield while production costs are similar. A higher yield would allow farmers to sell a greater quantity.

2 Materials and method

2.1 Farm household survey

Survey data were collected in five of Tanzania’s seven agro-ecological zones (AEZs) between October and November 2018. A multi-stage stratified sampling procedure was used to select households. In the first stage, five AEZs with the highest amaranth production as based on national statistics were selected, namely Eastern zone, Northern zone, Lake zone, Central Zone, and the Southern Highlands (Table 1).

Table 1 Distribution of the sample of amaranth-producing households in Tanzania, 2017–2018

In the second stage, regions with most amaranth production were selected from each AEZ: Dar-es-salaam and Pwani in Eastern zone, Arusha and Kilimanjaro in Northern zone, Mwanza in Lake zone, Dodoma in Central zone, and Iringa in the Southern Highlands (Fig. 1). In the third stage, three districts were selected from the selected regions in each AEZ. Within the selected districts, three wards known to be involved in vegetable or amaranth production were identified with the help of key informants (mostly District Agricultural, Irrigation and Cooperative Officers (DAICO) or horticultural officers). Two to four villages were then randomly selected from a complete list of villages. Amaranth farmers were listed with the help of a local lead farmer and a sample was drawn proportionate to the number of amaranth farmers. A total of 1,355 households were interviewed, about 270 from each zone.

Fig. 1 Study areas in Tanzania

The respondent was the person in the household with most knowledge of amaranth production. The questionnaire contained sections on household demographics (household composition, age, gender, and education of all household members), crop production (cultivation techniques, management, harvesting, crop yield, and sales), and income, among others. The questionnaire was coded on tablet computers using SurveyCTO and interviews were conducted by trained enumerators. Respondents were informed about the purpose and scope of the interview, how the data will be used and by whom. Before the survey, each respondent was carefully informed that their participation was voluntary and was asked to sign a consent form before the interview or provide verbal consent where the respondent was unable to read and write.

2.2 DNA fingerprinting

Amaranth leaf samples were collected from farmers’ fields at the same time as the survey data collection. Three to four pieces of young leaf approximately 3 cm long were sampled per plant and from each cultivar grown. One sample was taken from each field with the same plant species. Where farmers cultivated more than one variety of amaranth, each was given a different label. For farmers who had no amaranth in the field, a few seeds were collected and grown out at the WorldVeg research station in Arusha to collect the leaves. The leaf samples were dried before DNA extraction. DNA was extracted with Sbeadex Maxi Plant Kit (LGC, UK) according to the supplier manual. DNA of suitable quality for genotyping by sequencing could be obtained for 1,007 out of 1,450 samples collected. The loss was high because samples had to be shipped from Tanzania to Taiwan for analysis.

DNA libraries and sequencing were completed by Diversity Arrays Technology Pty Ltd. (DarT, Australia) using DarTSeq™. DarTSeq routinely uses double-digested restriction enzyme (PstI-MseI) associated DNA sequencing to genotype the samples. The SNP calling followed the procedure described in Lin et al. (2022). The raw reads were filtered with a Phred quality score threshold of ≥ 20 in SolexaQA (Cox et al., 2010) and then de novo assembled using the “denovo-map.pl” pipeline in Stacks (version 2.54; Catchen et al., 2013), after which the SNPs were recalled. The SNP markers are available in Appendix Table A1 in the online data depository. Samples with a genotype missing rate > 50% were removed, resulting in a total of 955 samples kept in the DNA fingerprint analysis. The genotypes of 29 cultivars distributed to farmers in amaranth seed kits by WorldVeg and partners or released and commercialized as varieties were obtained from Lin et al. (2022) and included six amaranth species: A. blitum, A. caudatus, A. cruentus, A. dubius, A. hypochondriacus, and A. retroflexus.

A. dubius is a highly heterozygous tetraploid species (Lin et al., 2022) that is very different from the other amaranth species. We therefore used a different criterion to prepare the SNP matrix. In total, 22 k SNPs were available for the set of 29 reference genotypes. For A. caudatus, A. cruentus, A. hypochondriacus, A. blitum and A. retroflexus (a total of 24 cultivars), homozygous SNPs with minor allele frequency > 0.05 and missing data rate < 0.5 were kept, resulting in 2,063 SNPs, while for A. dubius (5 cultivars), also heterozygous SNPs were included, but SNPs with missing data were excluded, resulting in total in 9,020 SNPs. Samples with heterozygous SNP sites > 0.02 were grouped to A. dubius, and those with < 0.02 were not considered as A. dubius and instead grouped to the other amaranth species. After merging the SNP matrix for the known WorldVeg cultivars and the field samples, the SNPs were filtered again following the same principles described above. The SNP matrix was used to calculate genetic distance in TASSEL 5.0 (Bradbury et al., 2007). A field sample is claimed to be a WorldVeg cultivar if the genetic distance between itself and a known WorldVeg cultivar equals zero. The fingerprints obtained with these SNP matrices could not distinguish all field samples and released cultivars from each other. In order to resolve genetic distances between A. hypochondriacus accessions, 85 SNPs polymorphic in this group were selected and used to genotype field samples. Similarly, a set of 18 SNPs was chosen to further distinguish A. dubius accessions and cultivars from each other.
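The filtering and matching rules described above can be illustrated with a minimal sketch. It is not the pipeline used in the study (which relied on Stacks and TASSEL); it assumes genotypes are coded as 0, 1 or 2 copies of the alternate allele with None for missing calls, it reads "homozygous SNPs" as SNPs without any heterozygous call, and it uses a simple allele-sharing distance that may differ from the distance TASSEL computes. All function names are hypothetical.

```python
from typing import List, Optional, Sequence

# Assumed coding: 0, 1, 2 = copies of the alternate allele; None = missing call.
Genotype = Optional[int]


def missing_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of missing calls in one SNP column or one sample row."""
    return sum(c is None for c in calls) / len(calls)


def minor_allele_frequency(calls: Sequence[Genotype]) -> float:
    """Minor allele frequency of one SNP, ignoring missing calls."""
    observed: List[int] = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1.0 - alt_freq)


def keep_snp(calls: Sequence[Genotype],
             min_maf: float = 0.05,
             max_missing: float = 0.5) -> bool:
    """Filter for the non-dubius species: no heterozygous calls (assumed
    reading of 'homozygous SNPs'), MAF > 0.05 and missing rate < 0.5."""
    only_homozygous = all(c != 1 for c in calls if c is not None)
    return (only_homozygous
            and minor_allele_frequency(calls) > min_maf
            and missing_rate(calls) < max_missing)


def heterozygosity(sample: Sequence[Genotype]) -> float:
    """Share of heterozygous sites among the non-missing calls of a sample."""
    observed = [c for c in sample if c is not None]
    if not observed:
        return 0.0
    return sum(c == 1 for c in observed) / len(observed)


def grouped_as_dubius(sample: Sequence[Genotype], threshold: float = 0.02) -> bool:
    """Samples with a heterozygous share above 0.02 are grouped with A. dubius."""
    return heterozygosity(sample) > threshold


def allele_sharing_distance(a: Sequence[Genotype],
                            b: Sequence[Genotype]) -> Optional[float]:
    """Mean allele difference over sites called in both samples (0 = identical)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None
    return sum(abs(x - y) for x, y in shared) / (2 * len(shared))
```

Under these assumptions, a field sample would be assigned to a reference cultivar only when `allele_sharing_distance(sample, reference)` returns exactly 0.0, which mirrors the zero-distance criterion stated in the text.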

2.3 Propensity score matching

The use of cross-sectional data for impact evaluation may risk the presence of selection bias which occurs when characteristics that influence the outcomes also influence individuals’ decision to adopt a technology. We use propensity score matching (PSM) to correct for selection bias in observable household characteristics between adopters and non-adopters (Rosenbaum & Rubin, 1983). PSM has been widely used in the impact evaluation literature (Fischer & Qaim, 2012; Gautam et al., 2017; Gitonga et al., 2013; Ochieng et al., 2018).

PSM follows three steps. In the first step, adoption status is regressed upon a set of independent farm and household characteristics that simultaneously influence adoption and outcomes. The DNA fingerprinting results are used to separate households using amaranth cultivars that very likely originate from WorldVeg (labelled as “adopters”) from households that use other cultivars (“non-adopters”). WorldVeg cultivars are mostly improved while others are mostly local, though we cannot rule out the existence of other improved cultivars. Independent variables, selected based on our own knowledge of production systems in Tanzania, include: (a) gender of the household head which has an influence on decision making related to land allocation, type of crop to be cultivated, and is a proxy for access to information given that men have better access to information on new technologies than women; (b) age and education of the household head which have an influence on experience and knowledge in vegetable production; (c) household size as a driver of land-use intensity; (d) total area of land under cultivation which has an influence on the intensity of technology adoption, and also represents the wealth status of a household; (e) use of manure; (f) access to extension which is important for access to technical know-how; (g) membership in farmer groups as a proxy for knowledge sharing and innovativeness; and (h) location dummies. A logit model is estimated and predicted values are computed. These predicted values (i.e., the propensity scores) indicate the probability of a certain household adopting improved cultivars conditional on confounding factors.
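As an illustration of this first step, the following sketch fits a logit model by plain gradient ascent and returns the predicted adoption probabilities. It is a simplified stand-in for the statistical software used in the study, which is not named in the text; the function names, learning rate and iteration count are assumptions, and each row of `X` would hold the household covariates listed above (gender, age, education, household size, and so on).

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear_index(beta: Sequence[float], row: Sequence[float]) -> float:
    """Intercept plus weighted sum of covariates (beta[0] is the intercept)."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


def fit_logit(X: Sequence[Sequence[float]], y: Sequence[int],
              learning_rate: float = 0.1, n_iter: int = 5000) -> List[float]:
    """Maximize the logit log-likelihood with simple gradient ascent.

    X holds one row of covariates per household; y is 1 for adopters and
    0 for non-adopters. Returns [intercept, coef_1, ..., coef_k].
    """
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for row, adopted in zip(X, y):
            error = adopted - sigmoid(linear_index(beta, row))
            grad[0] += error
            for j, value in enumerate(row):
                grad[j + 1] += error * value
        beta = [b + learning_rate * g / n for b, g in zip(beta, grad)]
    return beta


def propensity_scores(X: Sequence[Sequence[float]],
                      beta: Sequence[float]) -> List[float]:
    """Predicted probability of adoption for each household."""
    return [sigmoid(linear_index(beta, row)) for row in X]
```

In practice covariates should be put on comparable scales before fitting, since gradient ascent with a fixed step size converges slowly when variables differ by orders of magnitude.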

In the second step, households are ranked according to their propensity scores and each adopter is compared with a set of most similar non-adopters using alternative matching algorithms (Caliendo & Kopeinig, 2008). We use the nearest-neighbor method, which matches each adopter to its most similar non-adopter. Observations without a good match are dropped from the analysis. We also use the radius and kernel matching methods as robustness checks.
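A minimal sketch of nearest-neighbor matching on the propensity score is given below. The text does not state how a "good match" was defined, so the caliper of 0.05 and matching with replacement are assumptions made only for illustration, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def nearest_neighbor_match(adopter_scores: Sequence[float],
                           non_adopter_scores: Sequence[float],
                           caliper: float = 0.05) -> List[Tuple[int, int]]:
    """Match each adopter to the non-adopter with the closest propensity score.

    Matching is done with replacement (assumption). Adopters whose closest
    non-adopter lies further away than the caliper are dropped.
    Returns (adopter_index, non_adopter_index) pairs.
    """
    pairs: List[Tuple[int, int]] = []
    for i, score in enumerate(adopter_scores):
        j = min(range(len(non_adopter_scores)),
                key=lambda c: abs(non_adopter_scores[c] - score))
        if abs(non_adopter_scores[j] - score) <= caliper:
            pairs.append((i, j))
    return pairs
```

Radius matching would instead keep every non-adopter within the caliper, and kernel matching would weight all non-adopters by their distance to the adopter, which is why both are useful as robustness checks.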

In the final step, the difference in outcome variables is calculated for each matched pair, after which these differences are averaged over the entire sample to obtain the average treatment effect. Three outcome indicators were used to quantify impact. First, crop yield is measured as the quantity of amaranth leaves harvested expressed in tons per hectare. In Tanzania, amaranth leaves are the main output with only few households harvesting the grain and we therefore focus on leaf yield, calculated as the sum of harvested leaves in the short- and long-rain seasons. Second, the quantity sold is expressed in tons. Third, the cost of production is expressed in US dollars per hectare and calculated as the sum of costs incurred on land preparation, inputs (seed, fertilizer, pesticides etc.), and labor, specific to fields planted to amaranth.
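The averaging in this final step can be sketched as follows. The function name is hypothetical, and because the differences are averaged over matched adopters only, the quantity computed here corresponds to the effect for adopters under the assumptions of the matching procedure.

```python
from typing import Sequence, Tuple


def mean_matched_difference(pairs: Sequence[Tuple[int, int]],
                            adopter_outcomes: Sequence[float],
                            non_adopter_outcomes: Sequence[float]) -> float:
    """Average outcome difference (adopter minus matched non-adopter).

    pairs holds (adopter_index, non_adopter_index) tuples from the matching
    step; outcomes could be leaf yield in tons per hectare, quantity sold in
    tons, or production cost in US dollars per hectare.
    """
    if not pairs:
        raise ValueError("no matched pairs available")
    differences = [adopter_outcomes[i] - non_adopter_outcomes[j] for i, j in pairs]
    return sum(differences) / len(differences)
```

The same function would be applied separately to each of the three outcome indicators, so that yield, sales and cost effects are estimated on the same matched sample.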

Two key assumptions are tested while implementing the PSM technique. First, application of propensity scores requires that the distribution of covariates between adopters and non-adopters is similar or balanced (Caliendo & Kopeinig, 2008; Rosenbaum & Rubin, 1983). We check this using four tests. First, a two-sample t-test comparing covariate means before and after matching shows no significant difference after matching (Appendix Table A2). Second, the pseudo-R2 declines from 0.057 before matching to 0.002 after matching, indicating no systematic differences in the distribution of covariates. Third, the percentage bias after matching is less than 6% for each covariate and 1.6% across all covariates, which is below the common thresholds of 20% and 10%, respectively. Fourth, after matching, Rubin’s B should be less than 25 while Rubin’s R (the variance ratio) should lie between 0.5 and 2 (Rubin, 2001). The results show that Rubin’s B is 10.2 and Rubin’s R is 1.17, as required.
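The balance diagnostics mentioned above can be sketched under commonly used definitions, which are assumed here because the text does not spell out the formulas: the standardized percentage bias is the difference in means divided by the square root of the average of the two group variances, Rubin's B is the absolute standardized difference in the means of the linear propensity score index, and Rubin's R is the ratio of the adopter to non-adopter variances of that index. Function names are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def standardized_bias(adopters: Sequence[float],
                      non_adopters: Sequence[float]) -> float:
    """Standardized difference in means, in percent (assumed definition)."""
    pooled_sd = math.sqrt(
        (statistics.variance(adopters) + statistics.variance(non_adopters)) / 2
    )
    return 100.0 * (statistics.mean(adopters) - statistics.mean(non_adopters)) / pooled_sd


def rubins_b_and_r(index_adopters: Sequence[float],
                   index_non_adopters: Sequence[float]) -> Tuple[float, float]:
    """Rubin's B and R computed on the linear index of the propensity score."""
    b = abs(standardized_bias(index_adopters, index_non_adopters))
    r = statistics.variance(index_adopters) / statistics.variance(index_non_adopters)
    return b, r


def sufficiently_balanced(b: float, r: float) -> bool:
    """Thresholds cited in the text: B below 25 and R between 0.5 and 2."""
    return b < 25.0 and 0.5 <= r <= 2.0
```

With the values reported in the text, `sufficiently_balanced(10.2, 1.17)` returns True.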

Table 2 Summary of household characteristics, sample means, 2017–2018

A second assumption is the presence of common support (overlap condition) in the propensity scores of adopters and non-adopters, which is confirmed through visual inspection (Appendix Figure A1).

3 Results

3.1 Sample characteristics

The means for adopters and non-adopters are not statistically different for any variable except household size, monthly per capita expenditure (a proxy for income), and cultivated area (Table 2). Non-adopters have a significantly higher monthly per capita expenditure (+ USD 3,060). Adopters have a significantly higher area under cultivation (+ 0.22 ha), but the average area under amaranth was not significantly different between adopters and non-adopters. The location variables show that adoption is higher in the Lake zone and Central zone of the country, and lower in the Eastern zone.

Seed use and cultivar identification.

Of the interviewed amaranth producers, 55% used own saved seed or seed obtained from other farmers—also known as the informal seed sector; while 42% bought seed from stockists and local shops—that is, the formal sector (Table 3). Less than 1% of the amaranth seed used was obtained from public sector organizations (e.g., ministry of agriculture or research institutions). Seed from the informal sector may also be of improved cultivars as all cultivars are open-pollinated and farmers can save the seed.

Table 3 Sources of amaranth seed planted in Tanzania, 2017–2018

Farmers were asked to name the amaranth cultivars they grew during the 2017/2018 cropping period. Nine cultivars were listed, with adoption varying by AEZ (Table 4). The names can be confusing as some of the same cultivars are known under different names (e.g., Madiira 1 and Ex-Zim-Sel) and many cultivars are known by their species name. A. hypochondriacus appears to be the most popular species, cultivated by over 56% of farmers interviewed, followed by A. dubius (22%), and A. cruentus (15%). Popular cultivars identified by farmers include Poli, Nguruma, and Madiira 2, and a cultivar farmers call A. hypochondriacus. Nguruma is popular in the Northern zone of Arusha and Kilimanjaro, while A. cruentus is popular in the Eastern zone (Dar-es-salaam and Pwani) and Southern Highlands (Iringa).

Table 4 Amaranth cultivars based on farmers’ knowledge, in percentage of all respondents per agroecological zone, 2017–2018

Cultivar choice was driven by farmers’ perceptions of product marketability (30%), high yield (17%) and early maturity for grain or seed growers (17%) (Appendix Table A3). These perceptions varied slightly across cultivars. For instance, farmers who cultivated A. cruentus and A. dubius species were more driven by the low cost of seed, and less by early maturity. A. cruentus was also preferred because it is relatively pest resistant. Those who cultivated Madiira 1 were also driven by its taste and good pest resistance. Farmers who cultivated Nguruma (A. dubius) indicated that it is drought tolerant. This means that beyond marketability and yield, amaranth farmers prefer cultivars resistant to biotic and abiotic stresses. Nearly half of the farmers (47%), for instance, mentioned that pests and diseases are a challenge in amaranth production (Appendix Table A4). The strong preference placed on marketability shows that many households are interested in selling amaranth. The survey data show that 86% of farmers produce amaranth for both home consumption and selling (Table 5).

Table 5 Farmers’ purpose of producing amaranth in Tanzania, 2017–2018

3.2 DNA fingerprinting

Table 6 shows amaranth cultivars or species grown by farmers as identified by DNA fingerprinting. Of the total samples assayed, 67.4% were found to contain genetic material originating from WorldVeg. Most of these cultivars were of A. hypochondriacus (46%), followed by A. dubius (14%), A. cruentus (3.9%), and A. blitum (3.4%). Around 80% of the samples from the Lake and Central zones and 55% from the Southern Highlands were A. hypochondriacus, while the Northern zone had mainly A. dubius. About half of the samples from the Eastern zone did not originate from WorldVeg.

Table 6 Amaranth cultivars as identified by DNA fingerprinting disaggregated by region, in percentage of all observations

The large group of A. hypochondriacus showed identical DNA fingerprints to three WorldVeg lines and one WorldVeg genebank accession. Similarly, the fingerprints of the A. dubius samples matched the fingerprints of one WorldVeg line and two WorldVeg genebank accessions. To assess whether the field samples were related to distributed WorldVeg cultivars or to genebank accessions, specific polymorphic marker sets for the A. hypochondriacus and A. dubius groups were used to determine the genetic distances between the WorldVeg cultivars, genebank accessions and field samples. This showed that 93% of the field samples attributed to the WorldVeg A. hypochondriacus group had the greatest genetic similarity to line TZSMN (VI062428), 3% were related to two other WorldVeg cultivars and 3% could not be distinguished from the genebank accession. From the A. dubius group, only 25% of the samples showed greatest genetic similarity to WorldVeg cultivar Ex-Zan. The other samples showed either greater or equal genetic similarity to two WorldVeg genebank accessions. All A. blitum field samples showed greatest similarity to a WorldVeg cultivar, and all A. cruentus field samples were related to genebank accessions. Overall, fingerprinting confirmed that 51.8% of the field samples show greatest genetic similarity to WorldVeg cultivars, and a further 15.6% of the samples were related to WorldVeg genebank material. We compared the DNA fingerprinting results against the household survey data to determine the extent to which farmers were able to accurately identify their amaranth cultivars. The comparison is based on a subset of 708 households from whom only a single plant sample was collected. Figure 2 shows that farmers were correct 51% of the time.

Fig. 2 Percentage of correctly identified amaranth cultivars by farmers

3.3 Impact of improved cultivars

Table 7 summarizes the outcome variables used in the PSM model. Households that had adopted WorldVeg amaranth cultivars achieved a significantly higher mean yield (+ 4.65 tons), and lower cost of production (− USD 64.52) than non-adopters. There was no significant difference in the quantity of amaranth sold between adopters and non-adopters.

Table 7 Sample means of the outcome variables used

Controlling for other factors, the PSM model shows that the decision to use WorldVeg amaranth cultivars is largely influenced by locational variables. Households in the Northern zone, Southern Highlands, and Lake zone are more likely to use WorldVeg cultivars than those in the Eastern zone (Appendix Table A5). This may be because of differences in the suitability of cultivars or because of differences in efforts to introduce the cultivars to farmers. Evidently, only 6% of the observed variation in crop yield, cost of production and quantity sold is explained by the covariates as indicated by the R-squared.

Impact estimates show that households using WorldVeg cultivars achieved a mean amaranth leaf yield that was 6.1 tons/ha higher (p = 0.002 using nearest neighbor matching) than those who used other cultivars (Table 8). This suggests a crop yield premium of 48%. The impact on the cost of production and quantity sold are not significant. Radius matching and kernel-based matching give the same results.
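The relation between the absolute yield gain and the reported premium can be checked with a small calculation. The sketch assumes that the 48% premium is expressed relative to the mean yield of the matched comparison group, which the text does not state explicitly; the implied comparison yield is therefore an inference, not a figure reported in the study, and the function names are hypothetical.

```python
def implied_comparison_yield(gain_t_ha: float, premium_share: float) -> float:
    """Comparison-group yield implied by an absolute gain and a relative premium.

    Assumes premium_share = gain / comparison yield (e.g. 0.48 for 48%).
    """
    return gain_t_ha / premium_share


def premium_percent(gain_t_ha: float, comparison_yield_t_ha: float) -> float:
    """Relative yield premium in percent."""
    return 100.0 * gain_t_ha / comparison_yield_t_ha
```

Under this assumption, a gain of 6.1 tons/ha and a premium of 48% imply a comparison yield of a little under 13 tons/ha.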

Table 8 Impact of WorldVeg amaranth cultivars on crop yield, sales and cost of production in Tanzania using three alternative propensity score matching algorithms, 2017–2018

4 Discussion

4.1 Strengths and limitations of the study

This is the first application of DNA fingerprinting to quantify the adoption of vegetable cultivars and the first to combine DNA fingerprinting with propensity score matching to quantify the impact of improved cultivars. DNA fingerprinting provides a unique way of quantifying adoption of improved crop cultivars and we used nationally representative data of amaranth-producing households in Tanzania.

The fingerprinting study applied highly stringent criteria to determine the relationship between field samples and WorldVeg material. An overestimation of the adoption of WorldVeg amaranth lines is therefore unlikely. However, the study may have missed WorldVeg materials that have not been documented as having been released in the region, which may result in underestimating the adoption of WorldVeg cultivars. While DNA fingerprinting is thought to give more accurate estimates of cultivar adoption compared with household surveys or expert opinions, it needs to be kept in mind that the method assumes farmers’ fields to be genetically uniform, while in reality, farmers’ amaranth fields may have different genotypes. Amaranth is mainly self-pollinating, but seed saving and the use of cultivar mixtures may introduce genetic variation. Consequently, a single sample per field may not represent the genotype of the whole field.

Vegetable amaranth has been collected by WorldVeg and other organizations in Tanzania and may have been used for developing improved cultivars. This opens up the possibility that germplasm collected from the region is still in use by local farmers and could have been erroneously classified as an improved WorldVeg cultivar by the fingerprinting study. The WorldVeg genebank holds 58 Amaranthus accessions collected from Tanzania and one accession, VI059412, collected from Tanzania in 2002, has an identical DNA fingerprint to three improved WorldVeg cultivars, namely TZSMN 102, AH-NL and AH-TL. Investigation of the genetic relationship among these materials with 85 markers specifically designed for this germplasm corroborated that 93% of the field samples fingerprinted as A. hypochondriacus showed greatest similarity to cultivar TZSMN 102 and a further 3% of the samples had greatest similarity with two other WorldVeg cultivars, while only 3% of the field samples were more or equally similar to genebank accession VI059412. This strongly indicates that most of these A. hypochondriacus field samples were indeed derived from improved WorldVeg lines rather than the original genebank accession. Similarly, the A. dubius accessions VI050451 and VI50448 originating from Tanzania could not be distinguished from WorldVeg cultivar Ex-Zan. Here, a smaller polymorphic marker set was available, allowing the attribution of only 25% of the samples of the A. dubius group to the improved WorldVeg cultivar Ex-Zan, while the remaining samples were more or equally similar to A. dubius genebank materials collected from the region. In conclusion, based on genetic similarity, 51.8% of the field samples are most likely derived from improved WorldVeg material.

As a possible weakness of the study, we note that the use of propensity score matching on cross-section data of a non-randomized intervention may not fully eliminate selection bias. Panel data would be preferred to better understand trends and drivers of cultivar adoption. The analysis also focused on a few relatively easy to measure outcome variables (mainly crop yield), while further analyses could investigate income and nutrition effects.

4.2 Comparison of methods

The DNA fingerprinting method showed that 67% of the amaranth samples originated from genetic material supplied by WorldVeg. The adoption rate for WorldVeg amaranth cultivars is estimated to be 66% of the total area under amaranth in Tanzania. This is 20 percentage points higher than what was reported earlier using expert elicitation techniques (Ochieng et al., 2019). While previous studies showed that expert elicitation tends to overestimate adoption rates (e.g., Maredia et al., 2016), our study shows that it can also underestimate adoption rates. The farm survey data likewise showed the adoption of improved cultivars to be 51%, which is 16 percentage points lower than the DNA fingerprinting results.
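Because the comparisons above are differences between two percentages, they can be read either as percentage-point gaps or as relative differences, and the two readings give quite different numbers. The sketch below makes the distinction explicit. It assumes that the expert elicitation figure being compared is the 47.1% of amaranth area reported for WorldVeg-related varieties in Tanzania, and the function names are hypothetical.

```python
def percentage_point_gap(estimate: float, reference: float) -> float:
    """Absolute gap between two adoption rates, in percentage points."""
    return estimate - reference


def relative_gap_percent(estimate: float, reference: float) -> float:
    """Gap between two adoption rates relative to the reference, in percent."""
    return 100.0 * (estimate - reference) / reference


# Adoption rates taken from the text (percent); the pairing with the expert
# elicitation figure of 47.1% is an assumption.
dna_area_share = 66.0
expert_area_share = 47.1
dna_sample_share = 67.0
survey_share = 51.0

gap_vs_experts = percentage_point_gap(dna_area_share, expert_area_share)
gap_vs_survey = percentage_point_gap(survey_share, dna_sample_share)
```

Here `gap_vs_experts` is about 19 percentage points and `gap_vs_survey` is exactly -16 percentage points, whereas the corresponding relative differences are considerably larger in absolute value.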

5 Conclusion

In Tanzania, 67% of amaranth-producing households use amaranth cultivars that are likely derived from breeding lines and genebank accessions coming from the international amaranth breeding program of the World Vegetable Center in Tanzania. The WorldVeg cultivars had a mean yield that is 6.1 tons/ha (48%) higher than other cultivars. This shows that international vegetable breeding research on traditional African vegetables can create tremendous impact as the lack of previous research in these crops can lead to relatively quick and sizeable improvements in crop performance.