Background

The population of Sub-Saharan Africa is growing fast, and 70% of the population is in rural areas that depend on the agricultural sector as a source of livelihood. The sector is not growing fast enough to meet food adequacy. Much of the agricultural growth achieved to date is by the expansion of agricultural land area. In the face of an increasing population, agricultural land expansion has reached its geographical limits and has become a leading cause of soil fertility decline and environmental degradation (Wiggins 2000; Breisinger et al. 2011). The agricultural sector is still an important economic sector, and it employs over 50% of working adults and over 65% of the labor force (Gollin, Parente, and Rogerson 2002). Improving agricultural production and productivity through adoption of improved agricultural technologies is an important pathway that will improve livelihoods of the majority and enhance food security. Adoption of new and improved practices, expansion of rural financial markets, increased capital and equipment ownership, and development of research and extension linkages could all contribute to increases in productivity, which is a prerequisite for poverty alleviation and enhanced food security (Von Braun, Ruel, and Gillespie 2010; Wesley and Faminow 2014). While many countries in Asia, the Caribbean, and Latin America have registered production and productivity gains from adopting agricultural technologies such as hybrid seeds, inorganic fertilizer, and irrigation, in Sub-Saharan Africa, the adoption of promising agricultural technologies has been far from ubiquitous and has remained particularly low. For example, Gollin, Morris, and Byerlee (2005) show that improved maize varieties accounted for 17% of the total area harvested in Sub-Saharan Africa compared to 90% in East and South East Asia and the Pacific and 57% in Latin America and the Caribbean. 
Primarily cultivated by smallholder farmers for domestic consumption, sorghum thrives in harsh climates, is drought resistant, and can improve food security and mitigate the influence of climate change especially among vulnerable populations (Ahmed, Sanders, and Nell 2000). The sorghum crop is an important source of protein and nutrients for millions of people. In West Africa, sorghum accounts for 70% of total cereal production (Atokple 2003). The adoption rates of improved sorghum varieties (ISVs) vary significantly within Sub-Saharan Africa, with Southern Africa having higher adoption rates than other parts of the region. The sorghum crop consistently accounts for more than 30% of the total cultivated land, and 23% of the total sorghum crop area is planted with improved varieties. In most parts of West Africa, the area with ISVs is less than 2% of the total cultivated land (Cline 2007; Burke, Lobell, and Guarino 2009). As discussed in Gollin, Lagakos, and Waugh (2014), there is also a large gap between what the sub-Saharan farmer produces per unit area and production potential with the available technology.

Worldwide, recent research and extension efforts have resulted in better agricultural practices, new and improved crop varieties, and improvements in soil and water management practices. However, Meinzen-Dick et al. (2004) argue that the only way for sub-Saharan farmers to gain from these new agricultural technologies is through adoption, after perceiving them to be beneficial and profitable. To enhance adoption, several studies focus on mapping agricultural technology adoption patterns and on finding variables associated with adopters of these technologies. This study extends the latter category by using a two-step cluster analysis to group farmers into subgroups with similar adoption patterns. The generated knowledge is important for formulating specific policies and/or targeting specific groups of farmers to promote the adoption of ISVs in Tanzania and for giving feedback to institutions involved in agricultural research and extension in similar regions in Sub-Saharan Africa.

One of the goals of this study was to quantify the factors influencing the adoption of ISVs developed by the International Crop Research Institute for Semi-Arid Tropics (ICRISAT) and tested by the Department of Research and Development (DRD) of Tanzania’s Ministry of Agriculture, Livestock, and Fisheries. The results from this study will allow ICRISAT and DRD to test the validity of their new research strategies and to suggest an efficient mechanism and adoption pathways for other crops. In addition, the present study adds to the literature on the role of information gaps and capital constraints in the adoption of ISVs. The analysis illustrates how access to information and the availability of capital jointly affect the adoption behavior of sorghum producers. We go beyond the traditional approach of assessing factors affecting adoption by using a two-step cluster analysis and t-distributed stochastic neighbor embedding (t-SNE), which allow visualization of the underlying relationships among farmers with similar adoption patterns (Burke, Lobell, and Guarino 2009). The results are key for sound decision-making in designing cost-effective agricultural research programs and extension advisory services.

In the following section, we present an overview of sorghum research and development in Tanzania followed by a description of the source of the data analyzed in this study. Then, we present a conceptual framework for technology adoption in the presence of multiple binding constraints, the empirical specifications of a multiple-hurdle Tobit model, and a brief review of two-step cluster analysis. In the last two sections, we present key findings and the policy implications for scaling up the adoption of ISVs in Tanzania.

Sorghum research in Tanzania

Sorghum (Sorghum bicolor (L.) Moench, or Mtama in Swahili) is one of the five most important cereal crops in the world, and because of its broad adaptation, it is one of the climate-ready crops (Association for Strengthening Agricultural Research in East and Central Africa 2013). In Tanzania, sorghum is the second most important staple food after maize, supporting more than 80% of the population (Rohrbach et al. 2002). Most farming systems in Tanzania are increasingly cultivating sorghum as the main crop to address recurring food shortages resulting from other crop failures (Kombe 2012). Sorghum research and development activities in Tanzania trace back to the early 1980s. During that period, ICRISAT began collaborating with DRD as well as some non-governmental organizations (NGOs) to test improved sorghum varieties using both on-station and on-farm trials. Early efforts led to the release of three sorghum varieties: Tegemeo, Pato, and Macia in 1978, 1997, and 1998, respectively (Mgonja et al. 2005). In 2002, they released the Wahi and Hakika varieties, and in 2008, they released NARCO Mtama 1. Seed Co Tanzania Limited also released the Sila variety in 2008 (Monyo et al. 2004). Kilimo (2008), Kanyeka, Kamala, and Kasuga (2007), and Association for Strengthening Agricultural Research in East and Central Africa (2013) summarize the agronomic and physical characteristics of these varieties. The varieties are drought-tolerant and are for human consumption. Agro-pastoralists use crop residues as animal fodder (Rohrbach and Kiriwaggulu 2007; Kombe 2012). Over the past decade, sorghum has been slowly entering the nonfood and value-added markets, with use in the baking, brewery, and animal feed industries. The focus of current research and extension efforts is on linking farmers to this nonfood market to stimulate production and scale up ISV adoption in Tanzania (Monyo et al. 2004).

Source of data

The data for this analysis are from a survey conducted by Selian Agricultural Research Institute (SARI), Arusha, Tanzania, in collaboration with ICRISAT, Nairobi, Kenya. The first author of the present study developed the structured questionnaire. A 2-day enumerator-training workshop, organized by the main author, was conducted in May 2013 to review the questionnaire. Twenty-five extension agents working in major sorghum farming systems and three scientists from ICRISAT participated in the workshop. After the workshop, the questionnaire was pre-tested in the Singida Rural and Rombo Districts. Issues found during the questionnaire pre-test provided guidance for refinement of the final survey instrument used in the study.

We considered the intensity of sorghum production and the importance of sorghum in the farming system to select participating regions and districts. The sample area included the Iramba, Singida, and Manyoni Districts (Singida Region, 435 sample households), Kondoa District (Dodoma Region, 102 sample households), Babati District (Manyara Region, 110 sample households), Rombo District (Kilimanjaro Region, 57 sample households), and Kishapu District (Shinyanga Region, 118 sample households). In each district, we randomly selected two sample wards and, within each ward, one village. Administrative subdivisions in Tanzania include regions, districts, wards, and villages. Therefore, the village is the lowest administrative unit (Map 1).

Map 1 Location of sample households in Tanzania

To create a counterfactual (for impact assessment in another study), the sample was designed so that about 60% of the responding households were adopters, that is, households that planted at least one improved sorghum variety during the 2013/2014 farming season. For statistical analysis, the sample size per village was at least 50 households. The survey covered 822 households, of which 505 were adopters (61.44%) and 317 were non-adopters (38.56%). At the village level, we first grouped farmers into adopters and non-adopters using the village register and then randomly selected sample households from each group. Previously trained enumerators collected the data from the respondents, who were knowledgeable farmers at the household level.
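The sample figures reported above can be cross-checked with a few lines of arithmetic. The sketch below uses only the household counts stated in the text; the variable names are illustrative and not part of the original study.

```python
# Household counts per region, as reported in the sampling description.
region_households = {
    "Singida": 435,
    "Dodoma": 102,
    "Manyara": 110,
    "Kilimanjaro": 57,
    "Shinyanga": 118,
}
total = sum(region_households.values())

# Adopter and non-adopter counts, as reported for the full survey.
adopters, non_adopters = 505, 317
adopter_share = 100 * adopters / total
non_adopter_share = 100 * non_adopters / total

print(total)
print(round(adopter_share, 2), round(non_adopter_share, 2))
```

The regional counts sum to the reported 822 households, and the shares agree with the reported 61.44% and 38.56%.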

Modeling adoption under information and capital constraints

Theoretically, the adoption of agricultural technology occurs when the expected utility from the technology exceeds that of non-adoption (Huffman 1974; Rahm and Huffman 1984). Since utility is not observable, single or multivariate limited dependent variable models have been the workhorse for estimating factors affecting adoption (Huffman and Mercier 1991; Grabowski and Kerr 2013). Cragg’s double-hurdle model (Cragg 1971) extends these models to the case where a farmer faces two hurdles while deciding to adopt. Croppenstedt, Demeke, and Meschi (2003) modified Cragg’s model to directly model imperfections that create multiple hurdles during the adoption process.

In this study, there are three groups of farmers. The first group passed all hurdles and adopted the improved seeds. The second group had a desired demand but lacked either information or capital. In this group, there were farmers with limited information on ISVs who were not constrained by capital and farmers with enough information on ISVs but not enough capital to buy improved seeds and/or complementary inputs. The third group consisted of non-adopters with access to both information and capital who did not adopt ISVs due to other unknown constraints. Given the standard utility maximization condition for the adoption process, let \( D_i^T \) stand for a binary variable for the adoption decision (where adoption = 1 and 0 otherwise), \( D_i^{c1} \) for a binary variable representing the information constraint, and \( D_i^{c2} \) for a binary variable representing the capital constraint. The multiple-hurdle Tobit model is:

$$ D_i^{\ast} = D_i^T D_i^{c1} D_i^{c2} = \left\{ \begin{array}{ll} > 0, & \text{if ISVs are adopted} \\ 0, & \text{if ISVs are not adopted} \end{array} \right. $$
(1)

In this equation, \( D_i^{\ast} \) is a latent variable standing for the unobservable intensity of adoption, measured as the proportion of cropland allotted to ISVs. The variable is positive for adopters and zero for non-adopters. Adoption occurs when three conditions hold simultaneously: the discounted expected utility of profit from ISV adoption is positive, the farmer is sufficiently aware of ISVs, and the farmer has access to capital to invest in the new sorghum enterprise (Grabowski and Kerr 2013). If the constraints are independent, the probability of allotting land to ISVs is the product of the probabilities of passing each hurdle. We could estimate Eq. (1) using joint maximum likelihood as in Jones (1992), Smith (2003), Moffatt (2005), Teklewold et al. (2006), Shiferaw et al. (2015), and Burke, Myers, and Jayne (2015). The underlying assumption is that a binomial probability model governs the binary outcome of whether the outcome variable has a zero or a positive realization. The likelihood function is therefore separable with respect to the different parameters, and the log likelihood is the sum of the log likelihoods from two separate models: a binomial probability model and a zero-truncated model. The maximization of the different components of the log-likelihood function generates consistent, efficient, and unbiased estimates. Expressions defining farmer groups with desired demand but constrained by a lack of information and capital are as follows:

$$ D_i^{\ast} = \beta^T X_i + \mu_i; \quad I_i^{\ast} = G^{c1} = \alpha^T z_i + \omega_i; \quad \text{and} \quad S_i^{\ast} = G^{c2} = \delta^T h_i + \varepsilon_i. $$
(2)

In Eq. (2), \( D_i^{\ast} \) is the observed demand that is truncated at zero, excluding non-adopters (Tobin 1958); \( I_i^{\ast} \) and \( S_i^{\ast} \) are the unobservable demand constrained by a lack of information and capital, respectively; \( X_i \) is the vector of covariates in the adoption equation; \( z_i \) and \( h_i \) are the vectors of covariates that affect access to agricultural information and capital, respectively; and \( \beta \), \( \alpha \), and \( \delta \) are the parameter vectors of the model. The random variable \( \mu_i \) is \( N(0, \sigma^2) \), and the random variables \( \omega_i \) and \( \varepsilon_i \) are \( N(0, 1) \).
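To make the structure of Eqs. (1) and (2) concrete, the following is a minimal sketch of the per-household log-likelihood contribution for an independent triple-hurdle model with normal errors. It is not the estimation code used in the study. It assumes that the three disturbances are uncorrelated and that the indices \( \alpha^T z_i \) and \( \delta^T h_i \) are oriented so that larger values make it more likely that the hurdle is cleared; if the hurdle equations are instead written in terms of being constrained, the signs of those indices flip. The function names and all numerical values are hypothetical.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via math.erf (no external packages needed)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def dot(coefs, values):
    """Linear index such as beta'X_i."""
    return sum(c * v for c, v in zip(coefs, values))

def prob_adoption(beta, alpha, delta, sigma, x, z, h):
    """P(y > 0) under independent hurdles: the product of three probabilities."""
    return (norm_cdf(dot(beta, x) / sigma)
            * norm_cdf(dot(alpha, z))
            * norm_cdf(dot(delta, h)))

def loglik_obs(y, beta, alpha, delta, sigma, x, z, h):
    """Log-likelihood contribution of one household (assumed independent errors)."""
    if y > 0:
        # Adopter: both constraint hurdles cleared and a positive land share observed.
        resid = (y - dot(beta, x)) / sigma
        return (math.log(norm_cdf(dot(alpha, z)))
                + math.log(norm_cdf(dot(delta, h)))
                + math.log(norm_pdf(resid)) - math.log(sigma))
    # Non-adopter: at least one of the three hurdles was not cleared.
    return math.log(1.0 - prob_adoption(beta, alpha, delta, sigma, x, z, h))

# Hypothetical parameter values and one hypothetical household.
beta, alpha, delta, sigma = [0.10, 0.30], [0.50, 0.20], [-0.20, 0.40], 0.25
x, z, h = [1.0, 0.5], [1.0, 2.0], [1.0, 1.5]

p = prob_adoption(beta, alpha, delta, sigma, x, z, h)
ll_adopter = loglik_obs(0.26, beta, alpha, delta, sigma, x, z, h)
ll_non_adopter = loglik_obs(0.0, beta, alpha, delta, sigma, x, z, h)
print(p, ll_adopter, ll_non_adopter)
```

Summing `loglik_obs` over all households gives the sample log likelihood that a numerical optimizer would maximize. Allowing correlated disturbances, as in the dependent model, requires multivariate normal probabilities and is beyond this sketch.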

Estimating Eq. (2) using a multiple-hurdle Tobit (Tobin 1958) framework, as explained in Feder, Just, and Zilberman (1985), Roodman (2011), and Croissant, Carlevaro, and Hoareau (2016), allows the prediction of both the intensity and the probability of adoption. The first hurdle defining adoption and non-adoption is modeled as a probability choice where adoption occurs with probability \( P(D_i = 1) = P(y_i^{\ast} > 0) \) and non-adoption with probability \( P(D_i = 0) = P(y_i^{\ast} \le 0) = 1 - P(y_i^{\ast} > 0) \), where P(.) is the probability function and \( y_i^{\ast} \) is the latent variable representing the intensity of adoption. In the second and third hurdles, singular probability choice models replace the second and the third expression such that \( P(I_i^{\ast} = 1) = 1 \) and \( P(S_i^{\ast} = 1) = 1 \). To estimate Eq. (2), Smith (2003) suggests setting zero correlations between random disturbances. The Vuong test (Vuong 1989) tests the hypothesis of no correlation between incidence and intensity of adoption.
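As an illustration of how a Vuong-type comparison works, the sketch below computes the basic Vuong (1989) statistic from the per-observation log likelihoods of two competing models. It assumes the uncorrected form of the statistic (no adjustment for the number of parameters), and the log-likelihood values are hypothetical rather than taken from the study.

```python
import math
import statistics

def vuong_statistic(loglik_a, loglik_b):
    """Return (z, two-sided p-value) comparing model A against model B.

    z = sqrt(n) * mean(m) / sd(m), where m_i = loglik_a[i] - loglik_b[i].
    A large positive z favors model A, a large negative z favors model B.
    """
    if len(loglik_a) != len(loglik_b):
        raise ValueError("both models must be evaluated on the same observations")
    diffs = [a - b for a, b in zip(loglik_a, loglik_b)]
    n = len(diffs)
    omega = statistics.pstdev(diffs)  # population standard deviation of m_i
    if omega == 0:
        raise ValueError("log-likelihood differences have zero variance")
    z = math.sqrt(n) * statistics.fmean(diffs) / omega
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Hypothetical per-household log likelihoods from two fitted models.
ll_model_a = [-1.0, -1.2, -0.8, -1.1]
ll_model_b = [-1.5, -1.4, -1.6, -1.3]

z_ab, p_ab = vuong_statistic(ll_model_a, ll_model_b)
z_ba, p_ba = vuong_statistic(ll_model_b, ll_model_a)
print(z_ab, p_ab)
```

Swapping the two models only changes the sign of the statistic, so the test is symmetric in the sense that it can favor either specification.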

The four subgroups of farmers discussed above included adopters (505 sample households), non-adopters with desired demand and without a capital constraint but lacking enough information (150 sample households), non-adopters with capital constraints (85 sample households), and non-adopters with no desire to adopt improved sorghum varieties and no capital or information constraints (82 sample households). The average time between learning about ISVs and field testing was 3.76 years, and the third quartile was 4 years. Farmers in the desired demand group who lacked information were either not aware of any improved sorghum varieties or had been aware of them for less than the 4-year threshold. Farmers in the desired demand group who were aware of ISVs were asked follow-up questions to identify reasons for non-adoption, and they identified lack of either capital or credit as a major constraint to adoption.

There are three types of covariates to include in Eq. (2): farm and farmer associated attributes, attributes associated with the technology, and farming goals. Examples of these variables include human capital represented by the level of education of the farmer, risk and risk management strategies, and access to the institutional support systems such as marketing facilities, research and extension services, availability of credit, and transportation. Other variables include production factors, such as farm size, number of livestock, and off-farm income and income sources. Farmers may have different farming goals such as subsistence or market-oriented farming. Feder and Slade (1984); de Janvry, Fafchamps, and Elisabeth (1991); Holden, Shiferaw, and Pender (2001); and Adegbola and Gardebroekb (2007) describe these variables in detail.

Apart from finding factors affecting adoption, understanding the diversity of farmers is of critical importance for the successful development of interventions. We extended this study by grouping farmers into homogeneous subgroups with similar adoption patterns through a two-step cluster analysis. There are three main procedures for cluster analysis: hierarchical cluster analysis, k-means cluster analysis, and two-step cluster analysis (Rousseeuw 1987). Hierarchical clustering is useful for small datasets or when examining changes (merging and emerging clusters). With k-means clustering, the number of clusters, k, is specified in advance. It is efficient when using normally distributed continuous variables and when there is enough data to allow variability among the created clusters (Gower 1971).

Two-step clustering is suitable for large datasets, especially when there is a mixture of continuous and categorical variables (Gorgulu 2010). The goal is to automatically form several clusters based on the mix of categorical and continuous variables. Most algorithms for two-step clustering use the first step to pre-cluster the data into many small sub-clusters. The second step uses the pre-clusters to form the desired number of clusters, or if the desired number of clusters is unknown, these algorithms automatically find the best number of clusters. In this study, we used two-step clustering tools to group sample households into homogeneous groups. The variables used for grouping were both categorical and continuous and included the estimated probability of censoring (P(y > 0)), the estimated expected value of the uncensored dependent variable (E(y ∣ y > 0)), and all statistically significant variables in Eq. (2).
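The two predicted quantities used as clustering inputs, P(y > 0) and E(y ∣ y > 0), can be sketched as follows for the independent version of the model. This is a minimal illustration under stated assumptions, not the study's code: the disturbances are taken to be uncorrelated, the hurdle indices are oriented so that larger values mean the hurdle is more likely cleared, and the index values and `sigma` passed in are hypothetical.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def prob_positive(xb, za, hd, sigma):
    """P(y > 0): product of the three hurdle probabilities (independence assumed).

    xb, za and hd are the fitted indices beta'X_i, alpha'z_i and delta'h_i.
    """
    return norm_cdf(xb / sigma) * norm_cdf(za) * norm_cdf(hd)

def expected_if_positive(xb, sigma):
    """E(y | y > 0) = xb + sigma * phi(xb / sigma) / Phi(xb / sigma).

    Under independence this truncated-normal mean depends only on the
    intensity equation, not on the two constraint equations.
    """
    ratio = xb / sigma
    return xb + sigma * norm_pdf(ratio) / norm_cdf(ratio)

# Hypothetical fitted indices for one household.
xb, za, hd, sigma = 0.20, 0.90, 0.40, 0.25

p_pos = prob_positive(xb, za, hd, sigma)
e_pos = expected_if_positive(xb, sigma)
# Unconditional expected land share: P(y > 0) * E(y | y > 0).
e_uncond = p_pos * e_pos
print(p_pos, e_pos, e_uncond)
```

Because the conditional mean adds a positive truncation term to the linear index, E(y ∣ y > 0) always exceeds the index itself.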

The first step involved calculating Gower’s distance matrix to separate households into (dis)similar groups. We could not use the Euclidean distance since it is valid only for continuous variables. For the limitations of using Euclidean distance in cluster analysis, see Gower (1971) and Struyf, Hubert, and Rousseeuw (1997). After calculating Gower’s distance matrix, the second step involved partitioning the households into clusters using partitioning around medoids (PAM) and using the silhouette width to determine the optimal number of clusters, as suggested in Rousseeuw (1987), Kaufman and Rousseeuw (1990), and Pollard and van der Laan (2002). The silhouette approach depends on the actual partition of the objects and not on the type of clustering algorithm. A well-suited method for visualizing the formed clusters is t-distributed stochastic neighbor embedding (t-SNE). Developed by van der Maaten and Hinton (2008), t-SNE is a non-linear dimensionality reduction technique that tries to preserve the local structure of the data and make clusters visible in a 2D or 3D visualization, which makes it extremely useful for visualizing high-dimensional datasets with many variables. It overcomes a limitation of many linear dimensionality reduction algorithms, which concentrate on placing dissimilar data points far apart in the lower-dimensional representation, by instead keeping similar data points close together. t-SNE is based on probability distributions with a random walk on neighborhood graphs to find the structure within the data. Bunte et al. (2012) and Donaldson (2016) show that t-SNE presents high-dimensional data in low dimensions while preserving global geometry at all measurement scales in the dataset. We conducted all analyses in the R environment (R Core Team 2017).
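The clustering steps described above (Gower distance, medoid-based partitioning, and silhouette-based choice of the number of clusters) can be illustrated on a toy example. The sketch below is written in Python for illustration only, whereas the study itself was carried out in R. The six households and their three variables are hypothetical, and an exhaustive search over medoid sets stands in for PAM's build-and-swap heuristic, which is feasible only for very small samples.

```python
from itertools import combinations

def gower_matrix(rows, is_numeric):
    """Pairwise Gower distances: range-scaled |a - b| for numeric variables,
    0/1 mismatch for categorical variables, averaged over all variables."""
    ranges = []
    for j, numeric in enumerate(is_numeric):
        col = [r[j] for r in rows]
        ranges.append(max(col) - min(col) if numeric else None)
    n = len(rows)
    dist = [[0.0] * n for _ in range(n)]
    for i, k in combinations(range(n), 2):
        parts = []
        for a, b, numeric, rng in zip(rows[i], rows[k], is_numeric, ranges):
            if numeric:
                parts.append(abs(a - b) / rng if rng > 0 else 0.0)
            else:
                parts.append(0.0 if a == b else 1.0)
        dist[i][k] = dist[k][i] = sum(parts) / len(parts)
    return dist

def best_medoids(dist, k):
    """Medoid set minimizing the PAM objective (brute force, tiny samples only)."""
    n = len(dist)
    def cost(medoids):
        return sum(min(dist[i][m] for m in medoids) for i in range(n))
    return min(combinations(range(n), k), key=cost)

def assign(dist, medoids):
    """Label each household with the index of its nearest medoid."""
    return [min(medoids, key=lambda m: dist[i][m]) for i in range(len(dist))]

def mean_silhouette(dist, labels):
    """Average silhouette width; singleton clusters get width 0 by convention."""
    n = len(dist)
    widths = []
    for i in range(n):
        own = [dist[i][j] for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            widths.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(sum(dist[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
                for c in set(labels) if c != labels[i])
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(widths) / n

# Hypothetical households: (land share under ISVs, years aware, market participant).
rows = [(0.30, 5, "yes"), (0.25, 4, "yes"), (0.28, 6, "yes"),
        (0.00, 1, "no"), (0.05, 0, "no"), (0.00, 2, "no")]
dist = gower_matrix(rows, is_numeric=[True, True, False])

# Pick the number of clusters with the largest average silhouette width.
scores = {k: mean_silhouette(dist, assign(dist, best_medoids(dist, k))) for k in (2, 3)}
best_k = max(scores, key=scores.get)
labels = assign(dist, best_medoids(dist, best_k))
print(best_k, labels)
```

For real survey data, a proper PAM implementation would replace the exhaustive search, and the resulting distance matrix could also be passed to a t-SNE implementation for visualization.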

Results and discussion

Factors affecting adoption

Table 1 presents summary statistics on the incidence and intensity of adoption. In the table, farm size is the total land area cultivated in the 2013/2014 farming season. Most farmers cultivated a single variety of sorghum rather than a combination of different varieties. The most widely adopted improved sorghum variety was Macia. About 22% of the households adopted the Macia variety, and 18% adopted the Tegemeo variety. Hakika and Macia adopters had smaller land holdings in terms of cultivated land. Macia adopters cultivated about 1.91 ha of land and allotted about 0.94 ha to the Macia variety. Adopters of the Hakika variety cultivated 1.62 ha and allotted 0.83 ha to that variety. Other households cultivated more than 2.28 ha and allotted less than 0.82 ha to ISVs. The proportion of land allotted to ISVs ranged from 13% for both Tegemeo and Pato to 26% for Macia. However, the proportion of land allotted to the Macia variety was more variable than for the other varieties, with a standard deviation of 21%. For Hakika, the proportion of land allotted to that variety was 25%, and the standard deviation was 15%. The land apportioned to other varieties was less than 17%, and the standard deviation was less than 13%. In the sample, 91% of non-adopters cultivated local varieties other than Langalanga, a variety of choice for non-adopters.

Table 1 Land allocation to improved and local varieties

Table 2 shows the results from the multiple-hurdle Tobit model, and we present summary statistics of all covariates in the Appendix. The first part of Table 2 shows the results estimated under the hypothesis that there is no correlation between the main adoption equation and the two hurdle equations (Estimate 1). The results in the second part (Estimate 2) are obtained after imposing correlation among the three equations. For each estimate, the log-likelihood ratios compare the specified model with a naive model, defined as a model without covariates. In both cases, the models with covariates performed better than the naive model, at the 10% and 1% levels of significance for the independent and dependent models, respectively. The Vuong test (1989) in the first estimate (Estimate 1) compares the presented results with a simple choice model as suggested by Heckman (1976, 1979). The test minimizes the Kullback-Leibler information criterion, and the test results are used to find the best parametric model specification. In this case, the independent multiple-hurdle Tobit model without correlation performed better than a simple-selection model. The estimated Vuong test statistic is 29.1980 and is significant at less than the 1% probability level, rejecting the null hypothesis that the two models are equivalent.

Table 2 Regression results on factors affecting adoption

The Vuong test also compares the specification between the independent model (Estimate 1) and the dependent model (Estimate 2), which imposes correlation between the main adoption equation and the hurdle equations. The dependent model was the model of choice compared to the independent model, as shown by the statistical significance of the Vuong test (p < 0.01). In addition, the estimated correlation parameters were statistically significant (p < 0.01). In particular, there was a high and negative correlation between adoption and lack of both information and capital. Although correlation is not causation, because we are modeling the intensity of adoption, we can conclude that lack of information and limited capital decrease both the incidence and the intensity of adoption.

The results in Table 2 also show a high positive correlation between lack of information and capital constraint that is associated with decreased incidence and intensity of adoption. The positive relationship implies that most farmers who lack information on ISVs are also likely to be poor. In the study area, the main source of agricultural information is from both the public agricultural research and extension systems. Their effectiveness in influencing adoption of new agricultural technologies depends on the strength of linkages between farmers, extension agents, and research scientists. These linkages are still weak, and there is no incentive or mechanism for either extension agents or research scientists to network with poor households. Poor households are also likely to be outside of the information networks such as farmer-to-farmer linkages, participation in farmer field schools, or contract farming. These variables are important during the adoption process and have a high impact on incidence and adoption intensity.

Because the dependent multiple-hurdle Tobit model results are superior, this discussion also focuses on the second part of Table 2. For the adoption equation, even though the gender of household head is not statistically significant, this parameter is positive, showing that households headed by male farmers are more likely to adopt ISVs compared to female-headed households. Most adoption studies show that gender-linked differences in the adoption of agricultural technologies are not directly attributable to a farmer being male or female but to differences in access to key requisites such as improved seeds. Female-headed households are likely to lack the resources and networks that allow male-headed households to access the primary and secondary inputs that are necessary for the adoption of agricultural technologies. However, some studies, including De Groote et al. (2002), show that the gender of farmers did not influence adoption, which is contrary to other studies such as Thomson, Gelson, and Elias (2014) that report that gender was important in explaining the adoption of improved seeds.

Other non-statistically significant variables that represented household characteristics included labor availability, education level of household members, and income level of the household. These results are inconsistent with other studies that show these variables influence the adoption of improved seeds. Labor availability is usually associated with adoption (Hoop et al. 2014). However, the labor market tends to dictate technology adoption depending on whether the area targeted with the new agricultural technology has a net labor surplus or the proposed technology is labor saving or labor intensive. The labor market also depends on the opportunity cost of off-farm labor. Due to the subsistence nature of the farming system and the lack of alternative use of surplus labor, labor may not be a major constraint in the study area (Diagne 2006). Studies including Feleke and Zegeye (2005), and Thomson, Gelson, and Elias (2014) reported a statistical influence of education on adoption of improved seeds. Education level is associated with human capital and the ability of farmers to adjust faster to new production and market conditions. Similarly, Kaliba (2004); Langyintuo and Mungoma (2008); Marra, Pannell, and Ghadimb (2003); and Awotide et al. (2012) argue that wealth is often associated with the adoption of new agricultural technologies because wealthier farmers are more likely to try new agricultural technology.

The marital status dummy variable (married = 1 and 0 otherwise) was statistically significant (p < 0.1). The results imply that married couples are more likely to adopt ISVs. The results are consistent with other studies including Peterman et al. (2010) and Kondylies and Mueller (2013) that showed that married farmers have distinct agricultural contacts that include extension agents and agro-dealers compared to divorced, widowed, or single farmers who are more dependent on other farmers as their reliable source of agricultural information. The average age variable for all household members was also statistically significant (p < 0.1) but negatively associated with incidence and intensity of adoption of ISVs. The results imply that in the study area, the adopters of ISVs were young households. Although there are studies indicating that the age of the farmer does not influence adoption (Paudel and Matsuoka 2008), the results of other studies, such as Kaliba, Verkuijl, and Mwangi (2000); Wakeyo and Gardebroek (2013); Gebrezgabher et al. (2015); and Lambert, Paudel, and Larson (2015), support these results that suggest that older farmers are likely to be more risk-averse than younger farmers.

Knowledge measured in years since the farmer was aware of ISVs has a positive and highly significant impact on adoption. As shown by Leggesse, Burton, and Ozanne (2004); Diagne (2006); Diagne and Demont (2007); and Oster and Thornton (2009) in most cases, exposure to a technology is not random, and technology awareness is an important precondition for adoption to occur. However, individual farmers need enough time to transition from old to new agricultural technology. After adopting the technology, the farmer may decide to continue using it or stop using it. This action depends on the experienced benefits and associated risks after adoption (Asuming-Brempong et al. 2011; Kabunga, Dubois, and Qaim 2012).

Other variables with positive and significant impacts on adoption included the quality of extension services, the intensity of research activities, and market participation. These three variables are related to the availability of institutional support systems. Development experts have emphasized agricultural extension and rural education as crucial in achieving agricultural development, poverty reduction, and food security (Evenson 2001; Feder, Murgai, and Quizon 2003; Ginéa and Yang 2009). Agricultural extension services are useful for incentivizing the adoption of ISVs and associated agronomic practices that increase yield, such as line planting and weeding. Similarly, increases in research activities imply that there are research-managed, farmer-managed, or on-farm trials that create awareness, which encourages others to test and eventually adopt new technologies and practices (Lambrecht et al. 2014). The promotion of the improved agricultural technologies in this study hinges on the premise that adoption of improved seeds will result in higher production and increased productivity. Increased production and productivity will allow smallholder farmers to enter the market to sell their surplus crop. However, there are limited studies that focus on the interdependencies between market participation and adoption of new agricultural technologies.

Dummy variables standing for the three regions (i.e., Kilimanjaro, Shinyanga, and Singida) control for the possibility that farming systems with favorable soil and climatic conditions might be more likely to have farmers who are willing to adopt ISVs. We dropped both the Dodoma and Manyara Regions from the model due to issues related to the singularity of the Hessian matrix, so these two regions serve as the reference category. Singida farmers were more likely to adopt ISVs than farmers in the Dodoma and Manyara Regions, and Kilimanjaro farmers were less likely to adopt ISVs than farmers in the Dodoma and Manyara Regions. There was no statistically significant difference in the incidence of adoption among farmers in the Shinyanga, Dodoma, and Manyara Regions. Comparatively, farmers in the Singida Region were dependent on sorghum production as a source of food and income; therefore, they were more likely to try new varieties that would increase the production and productivity of their available resources. In the Kilimanjaro Region, sorghum production is at an early stage of development, and farmers still depend on landraces with known yields.

The signs on the coefficients of all information constraint variables were negative, as expected. The results suggest that these variables tended to reduce information constraints for non-adopters with a desired demand. The statistically significant variables were income and knowledge of ISVs. This result may indicate that information delivered by extension agents in this study was not otherwise available to certain types of farmers, especially the poor. As discussed before, if a farmer is not aware that a technology exists, then adoption is not possible. In the study area, extension agents as an exogenous source of information may be neglecting the poorest farmers as discussed in Alwang and Siegel (1994) or women and female-headed households who tend to be relatively poor. Moreover, the diffusion of information related to new agricultural technologies such as ISVs is a dynamic process within social networks. Farmers learn about the profitability of the technology and about how to correctly use it from their own experience and from their peers’ experiences. While learning from others is important, several factors can make social learning inefficient. Conley and Udry (2010) show poor farmers rely primarily on family, kinship, and neighbor networks for social learning. However, due to limited social networks, poor farmers are unlikely to see the decision process of peers, making it more difficult to accurately assess the available information about new agricultural technologies.

Except for the intensity of research activities variable, the signs of the coefficients of all capital constraint variables were negative, as expected. As with the information constraint, the results suggest that these variables tended to reduce capital constraints for non-adopters with a desired demand. The variables for income, knowledge about ISVs, and market participation were statistically significant in reducing capital constraints among non-adopters with a desired demand. In the adoption literature, one of the most highlighted constraints to agricultural technology adoption is the availability of capital, which reduces both liquidity constraints faced by farmers and risk aversion. Availability of capital facilitates experimentation with new agricultural technologies and enhances their diffusion, as rich farmers or farmers with access to credit are more likely to be the early adopters. A common finding is that adoption requires a set of minimum incentives and capacities from the farmer’s perspective or an investment threshold that is not necessary for traditional production practices. If farmers are assured that investment in new agricultural technologies will have positive returns, then they may be encouraged to access credit from all markets. Furthermore, market participation by farmers increases the net returns from agricultural production and available resources including capital.

Groups of farmers

Figure 1 shows the relationship between the estimated average silhouette distance and the proposed number of optimal clusters. When there are three clusters, the mean silhouette distance is 0.33 for the entire dataset. However, the shape of the graph does not taper off after three clusters, which implies that many farmers are outside or on the boundary of the three selected clusters. Tapering occurs when the number of clusters equals 11. We expected these results given the heterogeneous nature of small-scale farms. For example, using cluster analysis to study family farms in Switzerland, Hoop et al. (2014) estimated a mean silhouette distance of 0.24 for 12 optimal clusters. Gorgulu (2010) used similar techniques to classify dairy animal performance and calculated average silhouette distances that were between 0.35 and 0.52. In this study, the average silhouette distance was 0.203 for 12 clusters.
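To make the silhouette criterion concrete, the following is a minimal sketch of how a mean silhouette value can be computed for a given assignment of observations to clusters. It is not the procedure used in the study: the Euclidean distance, the function name `mean_silhouette`, and the four toy points are all assumptions made for illustration. For each observation, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to the members of any other cluster, and the silhouette is (b - a) / max(a, b).

```python
from math import dist  # Euclidean distance; the metric is an assumption


def mean_silhouette(points, labels):
    """Average of s(i) = (b - a) / max(a, b) over all observations."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: mean distance to the nearest other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: two tight, well-separated groups.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = mean_silhouette(toy_points, [0, 0, 1, 1])  # groups match the labels
bad = mean_silhouette(toy_points, [0, 1, 0, 1])   # labels cut across groups
```

The value ranges from -1 to 1. Well-separated groups score close to 1, while labels that cut across the natural groups score below zero, so the modest values reported above are consistent with clusters that overlap at their boundaries.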

Fig. 1 Mean silhouette distance by number of clusters

We plotted the clusters using t-SNE, which reduces the number of dimensions to two by giving each data point a location in a map while avoiding the crowding of points in the center of the map, and then used visual inspection to find the number of clusters with the best results. We used the Barnes-Hut algorithm to approximate the distance between the points because it reduces the number of pairwise distances that must be computed. Nine clusters provided the best visualization results with few outliers and few overlaps. In Fig. 2, the whole numbers are the cluster names, and the fractions are the estimated (mean) probability of adoption within the cluster. Farmers in cluster 4 have the highest probability of adoption (0.82), and farmers in cluster 6 have the lowest probability (0.20).
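For readers unfamiliar with t-SNE, the standard formulation can be summarized as follows. The notation is an assumption introduced here and does not appear in the study: x_i are the original observations, y_i their two-dimensional map locations, n the number of observations, and sigma_i a bandwidth set for each point through the user-chosen perplexity.

```latex
% Similarity of x_j to x_i in the original space (Gaussian kernel)
p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j\rVert^2 / 2\sigma_i^2\right)}
               {\sum_{k \neq i} \exp\left(-\lVert x_i - x_k\rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}

% Similarity of y_i and y_j in the map (Student t kernel, one degree of freedom)
q_{ij} = \frac{\left(1 + \lVert y_i - y_j\rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l\rVert^2\right)^{-1}}

% Map locations are chosen to minimize the Kullback-Leibler divergence
C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
```

The heavy tails of the Student t kernel allow moderately dissimilar points to sit far apart in the map, which is what relieves crowding in the center. The Barnes-Hut variant approximates the contribution of distant points to the gradient of C by grouping them in a tree, reducing the cost of each iteration from order n squared to order n log n.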

Fig. 2 Relative position of the nine clusters

The polar plots in Fig. 3 illustrate the most prominent variables within each identified cluster. The variables on the polar axis represent the following in clockwise order: geometric mean age of adults in the household (age), awareness of ISVs in years (aware), credit availability (credit), weighted education level in years (edu), expected adoption intensity in hectares (expv), labor availability in labor equivalent (labor), market participation (market), estimated probability of adopting ISVs (pado), quality of government extension services (qext), intensity of research activities (rese), marital status of household head (status), gender of household head (typehh), and total wealth in Tshs (wealth). In Fig. 3, the general variables that distinguish the clusters are the wealth indicator that removes capital constraints and awareness that removes information constraints. Notice that the prominence of other variables depends on the individual clusters.

Fig. 3 Polar plots of identified clusters

The nine identified clusters illustrate the typical characteristics of diverse groups of farmers within the sample. Ninety-nine of the sample households (12.07%) belonged to cluster 1. This cluster included mostly sample households with young farmers and with intermediate awareness of ISVs. While the probability of adoption was intermediate (0.47), the expected intensity of adoption was high due to awareness of ISVs. Since capital, rather than information, was highly limiting, we refer to this cluster as “adopters with adoption potential.” Increases in available capital and awareness through increased quality of extension services and intensity of research activities could increase the adoption of ISVs among the members of this cluster. The second cluster had 170 sample households (20.73%), and it included sample households with mature family members and a labor supply that was not limiting. The probability of adoption was intermediate (0.32), with a low expected intensity of adoption. We refer to this cluster as “adverse adopters.” The members of this cluster face neither capital nor information constraints and have the basic resources to adopt ISVs. They need more training and more evidence-based extension services, such as field days and demonstration plots, that demonstrate the superiority of ISVs.

The third cluster included 54 sample households (6.59%). In this cluster, all variables included in the regression model were above the third quartile. The probability of adoption was 0.77, the members were potentially married, and the household head was male. We refer to this cluster as “continuous adopters,” and its members need continuous support from both research and extension services. The fourth cluster had 93 sample households (11.34%) and is similar to cluster 3, but its members included young farmers with the highest probability of adoption at 0.82. We refer to this cluster as “continuous innovators” since its members have all the characteristics of innovators. Research and extension agents could use this cluster to test new agricultural technologies related to ISVs in the study area through farmer-managed trials. The fifth cluster included 30 sample households (3.66%). Household members in this cluster were also similar to cluster 3 but included young and wealthy farmers. Cluster members were more aware of ISVs and had access to credit. The low probability of adoption (0.36) in this cluster could be attributable to low access to research and extension services. We refer to cluster 5 as “adopters in waiting” since adoption in this cluster could be scaled up through an increase in the intensity of research and extension activities. Referred to as “typical non-adopters,” cluster 6 included 105 sample households (12.80%). The members of this cluster had all the attributes that positively influence adoption, and neither capital nor information was limiting. The probability of adoption in cluster 6 was the lowest at 0.20, meaning that the characteristics of ISVs and the existing institutional support systems do not influence adoption, and non-adoption is a choice made by individual farmers.

All variables for the household members in cluster 7, which included 65 sample households (7.93%), were between the second and third quartile of the overall sample. The probability of adoption, however, was low at 0.33. The attributes of the members of this cluster were quite mixed; however, approximately 35% of the households in this cluster had unmarried household heads, and 45% of the households headed by females in this study belonged to cluster 7. We called this cluster the “virtual adopters” since adoption is mainly constrained by the unavailability of basic resources such as land and labor, which is magnified by a lack of capital and information about the technology. Cluster 8 had 35 sample households (4.27%), but 60% of the households were in the first quartile (based on wealth distribution), and the majority were unmarried couples and included households headed by females. Although awareness of the technology was high, the probability of adoption was 0.46, which was high given the attributes of this cluster and the factors that positively influence adoption. We called this cluster the “enthusiast adopters.” In defiance of resource constraints, the members of clusters 7 and 8 had the potential to use all available resources to adopt ISVs. Directing research and extension activities that focus on easing resource constraints would be beneficial for these two clusters. Cluster 9 had 169 sample households (20.61%) and contained members who were old, wealthy, and with above average resources including labor and credit; therefore, this cluster was referred to as the “veteran adopters.” The probability of adoption for this cluster was 0.49, and awareness campaigns and/or increased research and extension activities could scale up adoption among members of this cluster. These results show that farmers are not homogeneous and need tailored research and extension messages and/or public policies to scale up the adoption of ISVs.
While awareness campaigns among households in clusters 4, 6, and 7 could increase adoption, the households in clusters 7 and 8 need basic resource support systems to scale up the adoption process. Other clusters need more classroom training, field days, and demonstration trials to create confidence and assurance of the performance of ISVs.
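The cluster sizes and adoption probabilities reported above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are hypothetical, cluster 7 is entered as 65 households (the count implied by its reported 7.93% share of the sample), and the size-weighted mean probability is a summary computed here, not a figure reported in the study.

```python
# (number of households, mean adoption probability) per cluster, from the text.
# Cluster 7 uses 65 households, the count implied by its reported 7.93% share.
CLUSTERS = {
    1: (99, 0.47),
    2: (170, 0.32),
    3: (54, 0.77),
    4: (93, 0.82),
    5: (30, 0.36),
    6: (105, 0.20),
    7: (65, 0.33),
    8: (35, 0.46),
    9: (169, 0.49),
}

# Total number of sample households across the nine clusters.
total_households = sum(n for n, _ in CLUSTERS.values())

# Share of the sample in each cluster, in percent, rounded as in the text.
shares = {k: round(100 * n / total_households, 2) for k, (n, _) in CLUSTERS.items()}

# Size-weighted mean adoption probability over all clusters (illustrative).
weighted_prob = sum(n * p for n, p in CLUSTERS.values()) / total_households

for k in sorted(CLUSTERS):
    n, p = CLUSTERS[k]
    print(f"cluster {k}: {n:3d} households, {shares[k]:5.2f}% of sample, P(adopt) = {p:.2f}")
print(f"total households: {total_households}, weighted P(adopt): {weighted_prob:.3f}")
```

With these inputs the shares reproduce the percentages quoted for each cluster, which is a quick way to confirm that the nine clusters partition the full sample.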

Conclusion

Adoption studies are evaluation tools aimed at generating knowledge to intensify the impact of agricultural programs. Using data from northern and central Tanzania, this study focused on finding strategies to alleviate existing constraints and scale up the adoption process. We mapped the factors influencing adoption using a multiple-hurdle Tobit model and used t-distributed stochastic neighbor embedding (t-SNE) to cluster and visualize homogeneous groups of farmers. The results showed that there is a threshold for both knowledge and capital before a farmer begins experimenting with improved sorghum varieties. Assurances that improved sorghum varieties are superior to landraces will sensitize the farmers to access credit from both informal and formal markets. Market participation will increase returns from available resources and profitability of the sorghum enterprise and will therefore increase adoption.

Demonstrating the superiority of improved sorghum varieties will have a more effective outcome when applied to households with limited networks. Learning by doing or learning from peers, together with public policies such as targeted input subsidies, will have a high impact. Classroom training and demonstration plots can end information asymmetry and increase the knowledge threshold, which will jump-start and scale up the adoption process. Evidence from this study also suggests that young farmers with resources and knowledge about improved sorghum varieties are increasingly adopting them. Mass media could play a key role in increasing awareness of the potential of improved sorghum varieties to increase productivity and create wealth. Establishing a central delivery scheme and training extension professionals to use mass media sources are highly recommended. This scheme could facilitate the delivery of well-designed, effective, and efficient agricultural extension content to sorghum farming communities. Regional television and radio stations and hand-held electronic devices could provide a continuous and sustained means of information and education for farmers in remote villages. Due to a comparatively short crop cycle (about 6 months), mass media messages must be highly informative, intensive, and coordinated to avoid mixed messages and information overload. Studies addressing complementary factors such as soil quality as related to organic and inorganic fertilizer use and marketing studies to analyze the localized small-scale value-added potential of sorghum would increase both market participation and profits from sorghum enterprises.

There is also an urgent need to strengthen the ability of local government and the private sector to play a more prominent role in delivering tailored services to underserved groups including female farmers and the poor who face different production and market constraints. A strong pedagogical linkage between research, extension, and policy professionals is essential in promoting appropriate, easily accessible, and current agricultural technology. Training to incentivize scientists and extension agents and engagement of policymakers during farmer training and field days are valuable to supporting these important linkages.