1 Introduction

Among the many states that contribute to India’s agricultural output, Uttar Pradesh stands out as a significant player. Often described as the “Agricultural Powerhouse,” the state plays a pivotal role in the nation’s food production. Its diverse agro-climatic conditions and its reliance on agriculture for livelihoods make it a crucial focal point for understanding and mitigating regional disparities. Although Uttar Pradesh made significant strides in agriculture following the Green Revolution, imbalances persist within the state’s agricultural landscape, and certain districts continue to lag behind due to topographical constraints and other contributing factors [1]. Addressing these disparities effectively requires comprehensive methodologies that can accurately quantify development in the agricultural sector and identify key areas for intervention. This need prompted us to study the spatio-temporal disparities in the agricultural development of the state.

The selection of an appropriate method for quantifying regional disparities in agricultural development has been a subject of ongoing research for several decades. Initial studies, which considered data for a single time point, assessed spatial disparities in development and ranked regions in different sectors by computing Composite Indices of Development (see Iyengar and Sudarshan (1982) and Narain et al. (1991), among several others) [2, 3]. Later, spatio-temporal developmental efforts in various sectors were analyzed by considering the data for two or more time points separately [4, 5 and references cited therein]. However, no work has been reported that measures the developmental efforts in the agricultural sector of any region using Multi-Criteria Decision-Making (MCDM) methods. It would therefore be novel and interesting to measure the developmental efforts in agriculture using an MCDM method. MCDM methods make the decision-making process easier and faster in the presence of several alternatives and criteria. Numerous MCDM methods exist, such as TOPSIS (Technique for Order of Preference by Similarity to Ideal Solution), VIKOR (Vlse Kriterijumska Optimizacija Kompromisno Resenje, i.e., Multicriteria Optimization and Compromise Solution), and PROMETHEE (Preference Ranking Organization Methods for Enrichment Evaluation), among several others. A detailed and excellent critique of these methods has been given in Opricovic and Tzeng [6]. Over the years, several other articles comparing the performance of these methods have also appeared in the literature [7,8,9,10,11 and references cited therein]. Recent contributions in MCDM research, such as the work by Ecer et al. [12], highlight the importance of integrating sustainability considerations into decision-making processes, particularly within the context of healthcare management and supply chain operations. Additionally, innovative approaches like the Base Criterion Method (BCM), introduced by Haseli et al. [13] and Haseli and Sheikh [14], offer promising solutions for obtaining criteria weights with higher accuracy and efficiency compared to traditional methods. These advancements contribute to the ongoing evolution of MCDM methodologies, providing decision-makers with more reliable tools to navigate complex decision landscapes effectively [15].

TOPSIS, a prominent MCDM method, has garnered substantial attention due to its simplicity and efficacy [16]. The method has been widely applied in diverse domains, including the ranking of countries by developmental level in contexts such as the HDI and the SDGs. Against this backdrop, following [17], this paper assesses and ranks the districts of Uttar Pradesh, India, in terms of agricultural development using a TOPSIS-based Factor Analytic Model. This approach combines the strengths of TOPSIS and factor analysis, aiming to provide a more comprehensive understanding of regional disparities within the state’s agricultural sector. The analysis uses district-level data for the years 2019 and 2020 to build a factor analysis model that follows the general principles of factor analysis and is improved by TOPSIS. Using Hierarchical Cluster Analysis, the paper also categorizes the districts of the state into different levels of development. Finally, the paper determines model districts for poorly developed districts, which will be helpful in accelerating overall agricultural growth in the state.

The layout of the paper is as follows: Sect. 2 of the paper describes various agricultural indicators and methodology used in the paper. Section 3 of the paper delves into an in-depth analysis of agricultural development within the state. Following this, Sect. 4 focuses on the classification of districts based on the derived composite index. Section 5 presents the identification of model districts aimed at facilitating development in less-developed regions. Finally, Sect. 6 offers conclusions drawn from the study’s findings and provides suggestions for future research and policy implications.

2 Materials and methods

2.1 Agricultural indicators

The study is based on twenty-six agricultural indicators drawn from ‘District Wise Development Indicators Uttar Pradesh 2019 and 2020’ and ‘Statistical Abstract’ issued once a year by the Economics and Statistics Division, Government of Uttar Pradesh (Table 1).

Table 1 List of Agricultural Indicators

2.2 Methodology

2.2.1 TOPSIS

The process in TOPSIS consists of normalizing the original data matrix, determining optimal and worst solutions, and assessing the proximity of each evaluation object to the optimal solution. The method is based on the Euclidean distance between the evaluation object and the optimal and the worst plans, providing a basis for evaluating advantages and disadvantages. The specific steps of TOPSIS are as follows:

  1. Data representation: There are \(n\) evaluation objects and \(m\) evaluation indices, represented by the original data matrix of order \(n \times m\) given by \(X = \left( {x_{ij} } \right)\).

  2. Normalization: Indices for which larger values are better (maximum-type) and indices for which smaller values are better (minimum-type) are normalized as follows:

    $$ \text{Maximum: } Z_{ij} = \frac{x_{ij}}{\sqrt{\sum\nolimits_{i = 1}^{n} x_{ij}^{2}}} \;\text{and}\; \text{Minimum: } Z_{ij} = \frac{1/x_{ij}}{\sqrt{\sum\nolimits_{i = 1}^{n} (1/x_{ij})^{2}}} $$

    In the normalized matrix \(Z = \left( {Z_{ij} } \right)_{n \times m}\), the maximum and minimum values of each column constitute the best and worst vectors, expressed as \(Z^{ + } = \left( {Z_{max,1} , Z_{max,2} , \ldots ,Z_{max,m} } \right)\) and \(Z^{ - } = \left( {Z_{min,1} , Z_{min,2} , \ldots ,Z_{min,m} } \right)\), respectively.

  3. Distance calculation: The distances between the \(i^{th}\) evaluation object and the optimal scheme, \(D_{i}^{ + }\), and the worst scheme, \(D_{i}^{ - }\), are calculated using the Euclidean distance formulas:

    $$ D_{i}^{ + } = \sqrt{\sum\limits_{j = 1}^{m} \left( Z_{max,j} - Z_{ij} \right)^{2}} \;\text{and}\; D_{i}^{ - } = \sqrt{\sum\limits_{j = 1}^{m} \left( Z_{min,j} - Z_{ij} \right)^{2}} $$

  4. Proximity calculation: The proximity \(\left( {C_{i} } \right)\) between the \(i^{th}\) evaluation object and the optimal scheme is determined as:

    $$ C_{i} = D_{i}^{ - } \theta $$

    where \(\theta = (D_{i}^{ - } + D_{i}^{ + } )^{ - 1}\) and \(0 \le C_{i} \le 1\).

If the value of this index is closer to one, it indicates a higher level of agricultural development, whereas a value closer to zero indicates a lower level of agricultural development in the time interval.
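The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the implementation used in the study: the function name `topsis`, the `benefit` flags and the toy data are hypothetical, all indices are unweighted, and all values are assumed to be strictly positive so that the minimum-type transformation is defined.

```python
import math

def topsis(X, benefit):
    """X: n rows (evaluation objects), each with m positive index values.
    benefit[j] is True when larger values of index j are better."""
    n, m = len(X), len(X[0])
    # Step 2: vector normalisation; minimum-type indices are inverted first.
    Z = [[0.0] * m for _ in range(n)]
    for j in range(m):
        col = [X[i][j] if benefit[j] else 1.0 / X[i][j] for i in range(n)]
        norm = math.sqrt(sum(v * v for v in col))
        for i in range(n):
            Z[i][j] = col[i] / norm
    # Best and worst vectors: column-wise maxima and minima of Z.
    z_best = [max(Z[i][j] for i in range(n)) for j in range(m)]
    z_worst = [min(Z[i][j] for i in range(n)) for j in range(m)]
    # Steps 3 and 4: Euclidean distances and proximity C_i = D- / (D- + D+).
    # Assumes the objects are not all identical (otherwise D- + D+ = 0).
    scores = []
    for i in range(n):
        d_plus = math.sqrt(sum((z_best[j] - Z[i][j]) ** 2 for j in range(m)))
        d_minus = math.sqrt(sum((z_worst[j] - Z[i][j]) ** 2 for j in range(m)))
        scores.append(d_minus / (d_minus + d_plus))
    return scores

# Toy data: index 0 is maximum-type, index 1 is minimum-type.
scores = topsis([[4, 2], [2, 4], [1, 8]], benefit=[True, False])
```

In this toy example the first object is best on both indices and the third is worst on both, so their proximities sit at the two ends of the \([0, 1]\) range, with the second object in between.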

2.2.2 Factor analysis model

Factor analysis aims to express each of the original observed variables as a linear function of a reduced number of common factors plus a specific factor. The objective is to obtain a coherent explanation of the correlation among the original variables while reducing their dimensionality. Here the original data matrix is denoted by \(V = \left( {V_{1} ,V_{2} , \ldots ,V_{m} } \right)\), consisting of \(n\) observations on each of \(m\) indicators. The specific steps of factor analysis are:

  1. Standardization: Standardize the original data matrix \(V = (V_{ij} )_{n \times m}\).

  2. Correlation matrix calculation: Calculate the correlation matrix \(R\) using the standardized matrix \(V\):

    $$ R = \frac{1}{n - 1}V^{T} V $$

  3. Principal component analysis (PCA): Apply PCA to the correlation matrix \(R\).

  4. Eigenvalue determination: Determine the eigenvalues \(\lambda\) and the corresponding eigenvectors \(a\) of the correlation matrix \(R\).

  5. Common factor extraction: Extract common factors based on the eigenvalues. Choose \(p\) factors such that \(\lambda_{1} \ge \lambda_{2} \ge \ldots \ge \lambda_{p} \ge 0\). Using these, the loading matrix \(A = (a_{ij} )_{p \times m}\) is obtained.

  6. Factor rotation: Rotate the loading matrix to maximize interpretability. This involves transforming the loading matrix \(A\) to \(A^{*}\), typically using methods like varimax rotation.

  7. Factor score calculation: Calculate the factor scores \(f_{i}\) for each common factor and each sample. Various estimation methods can be employed, such as the regression method or the Bartlett method.

  8. Comprehensive evaluation index: Calculate the comprehensive evaluation index \(y\) as a linear combination of the factor scores, weighted by their respective eigenvalues:

    $$ y = \sum\limits_{i = 1}^{p} \frac{\lambda_{i}}{\sum\nolimits_{j = 1}^{p} \lambda_{j}} f_{i} $$

In these formulae, \(n\) represents the number of samples, \(m\) denotes the number of indicators, and \(p\) signifies the number of common factors.
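The final step, the eigenvalue-weighted combination of factor scores, can be illustrated with a short Python sketch. It assumes that the retained eigenvalues and the factor scores have already been obtained from a statistical package; the function name `comprehensive_index` and the numbers in the example are hypothetical and are not taken from the study.

```python
def comprehensive_index(eigenvalues, factor_scores):
    """eigenvalues: the p retained eigenvalues (lambda_1, ..., lambda_p).
    factor_scores: one row per sample holding its p factor scores.
    Returns y = sum_i (lambda_i / sum_j lambda_j) * f_i for every sample."""
    total = sum(eigenvalues)
    weights = [lam / total for lam in eigenvalues]
    return [sum(w * f for w, f in zip(weights, row)) for row in factor_scores]

# Two retained factors with eigenvalues 3 and 1 give weights 0.75 and 0.25.
y = comprehensive_index([3.0, 1.0], [[1.0, -1.0], [0.0, 2.0]])
```

Because the weights are the eigenvalue shares, a factor that explains more of the total variance contributes proportionally more to \(y\).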

2.2.3 TOPSIS based factor analytic model

Suppose the data for \(m\) indicators are available for \(n\) districts at multiple time points \(t = 1,2, \ldots ,l\). The data matrix for each time point can then be represented as \(V^{\left( t \right)} = (v_{ij}^{t} )_{n \times m}\), where \(v_{ij}^{t}\) is the value of the \(j^{th}\) indicator for the \(i^{th}\) district at time point \(t\), with \(i = 1,2, \ldots ,n\) and \(j = 1,2, \ldots ,m\). As the indicators have different units of measurement, the data for all the indicators are first standardized. The eigenvalues of the variance-covariance matrix of the standardized variables are then computed and arranged in decreasing order of magnitude, say \(\lambda_{1} \ge \lambda_{2} \ge \ldots \ge \lambda_{m}\). If \(p\) eigenvalues are larger than unity, the regression factor scores \(f_{k}\), \(k = 1,2, \ldots ,p\), for the corresponding \(p\) factors are extracted. Using these extracted factors, a weighted factor score is computed for each time point. The current data cover two time points, namely the years 2019 and 2020.

The weighted factor score for each time point is given by

$$ y_{t} = \frac{\lambda_{1t}}{\sum\nolimits_{k = 1}^{p} \lambda_{kt}} f_{1t} + \frac{\lambda_{2t}}{\sum\nolimits_{k = 1}^{p} \lambda_{kt}} f_{2t} + \cdots + \frac{\lambda_{pt}}{\sum\nolimits_{k = 1}^{p} \lambda_{kt}} f_{pt} ; \quad t = 1,2 \tag{1} $$

Equation (1) allows us to develop a composite index system using the TOPSIS method. The steps for evaluating the composite indices are as follows:

  1. Standardization of weighted factor scores: Calculate the standardized weighted factor scores for each time point using the formula:

    $$ Z_{ti} = \frac{y_{t,i}}{\sqrt{\sum\nolimits_{i = 1}^{n} y_{t,i}^{2}}}; \quad i = 1,2, \ldots ,n; \; t = 1,2 $$

    Using the above equation, we compute the \(n \times 1\) vectors of standardized weighted factor scores \(Z_{1i}\) and \(Z_{2i}\) for time points 1 and 2.

  2. Identification of optimal and worst vectors: Let \(Z_{\max \left( {1i} \right)} , Z_{\max \left( {2i} \right)}\) and \(Z_{\min \left( {1i} \right)} , Z_{\min \left( {2i} \right)}\) be the maximum and minimum values of \(Z_{1i}\) and \(Z_{2i}\), taken over the districts \(i = 1,2, \ldots ,n\). The corresponding optimal and worst vectors are

    $$ Z^{ + } = \left( Z_{\max \left( {1i} \right)} , Z_{\max \left( {2i} \right)} \right) \;\text{and}\; Z^{ - } = \left( Z_{\min \left( {1i} \right)} , Z_{\min \left( {2i} \right)} \right) $$

  3. Calculation of distances to optimal and worst solutions: Let \(D_{i}^{ + }\) and \(D_{i}^{ - }\) be the distances between the \(i^{th}\) object and the optimal and worst solutions, respectively, given as:

    $$ D_{i}^{ + } = \sqrt{\sum\limits_{t = 1}^{l} \left( Z_{ti} - Z_{\max \left( {ti} \right)} \right)^{2}} \;\text{and}\; D_{i}^{ - } = \sqrt{\sum\limits_{t = 1}^{l} \left( Z_{ti} - Z_{\min \left( {ti} \right)} \right)^{2}} $$

  4. Derivation of composite index of development: For each district, the composite index of development is then obtained as:

    $$ C_{i} = D_{i}^{ - } \theta $$

    where \(\theta = (D_{i}^{ - } + D_{i}^{ + } )^{ - 1}\) and \(0 \le C_{i} \le 1\).

The resulting \(C_{i}\) values range from 0 to 1, indicating the level of agricultural development in the years 2019 and 2020. A value closer to 1 signifies a higher level of agricultural development, while a value closer to 0 indicates a lower level.
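As a rough illustration of how the weighted factor scores of several time points are combined into a single index, the following Python sketch applies the four steps to a small made-up example. The function name `composite_index` and the input values are hypothetical; the weighted factor scores \(y_{t,i}\) are assumed to have been computed beforehand from Equation (1).

```python
import math

def composite_index(y):
    """y[t][i]: weighted factor score of district i at time point t."""
    l, n = len(y), len(y[0])
    # Step 1: vector-normalise the scores of each time point.
    Z = []
    for t in range(l):
        norm = math.sqrt(sum(v * v for v in y[t]))
        Z.append([v / norm for v in y[t]])
    # Step 2: best and worst standardised score at each time point.
    z_best = [max(Z[t]) for t in range(l)]
    z_worst = [min(Z[t]) for t in range(l)]
    # Steps 3 and 4: distances over time points, then C_i = D- / (D- + D+).
    C = []
    for i in range(n):
        d_plus = math.sqrt(sum((Z[t][i] - z_best[t]) ** 2 for t in range(l)))
        d_minus = math.sqrt(sum((Z[t][i] - z_worst[t]) ** 2 for t in range(l)))
        C.append(d_minus / (d_minus + d_plus))
    return C

# Three hypothetical districts observed at two time points.
C = composite_index([[1.2, 0.1, -0.9], [1.0, 0.0, -1.1]])
```

A district that has the highest score at every time point coincides with the optimal vector and receives an index of one, while a district that is lowest at every time point receives zero.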

3 Agricultural development in Uttar Pradesh: a case study

For the time interval 2019–20, the agricultural development of the districts of Uttar Pradesh has been evaluated using the methodology discussed in the previous section. Using the criterion that the eigenvalues are greater than one, eight factors are retained for each of the two time periods. The eigenvalues for the two time points are given in Table 2.

Table 2 Eigen values for time periods 2019 and 2020

Using these eigen values, the composite indices for the development of the agricultural sector for all the districts of Uttar Pradesh, which have been categorized according to various regions, are obtained and given in Table 3:

Table 3 TOPSIS based factor analysis: agricultural development in UP

According to Table 3, the average composite index value for the Western region of Uttar Pradesh is 0.628, the highest among the regions, indicating that agricultural growth in this area is strong. The average composite index value for the Central region is 0.505, considerably higher than the averages for the Bundelkhand and Eastern regions, which are 0.256 and 0.435, respectively. The Bundelkhand region has the lowest level of agricultural development, which is of major concern to the government and decision-makers. Table 3 also shows that Basti, Baghpat, Kasganj, Etah, Sant Kabir Nagar, Shahjahanpur, Meerut, Mainpuri, Bulandsahar, and Kheri are the top ten districts in terms of agricultural development. Of these, Basti, ranked first, and Sant Kabir Nagar, ranked fifth, are from the Eastern region of the state, while Baghpat, Kasganj, Etah, Shahjahanpur, Meerut, Mainpuri, and Bulandsahar are from the Western region. Kheri is the only district from the Central region in the top ten, and no district from the Bundelkhand region appears among the top ten.

Sonbhadra, Chitrakoot, Varanasi, Prayagraj, Raebareli, Mirzapur, Banda, Jhansi, Chandauli, and Hamirpur are the ten lowest-ranked districts of Uttar Pradesh in terms of agricultural development. Of these, Sonbhadra, Varanasi, Prayagraj, Mirzapur, and Chandauli are from the Eastern region of the state, while Chitrakoot, Banda, Jhansi, Chandauli, and Hamirpur are from the Bundelkhand region. It is pertinent to mention that Bundelkhand is the smallest region of Uttar Pradesh, with only seven districts, yet five of these seven districts are bottom-ranked in the agricultural development of the state. It is also observed that no district from the Western region is among the bottom ten districts of the state in terms of agricultural development (Fig. 1).

Fig. 1 Heat map for sustainable agricultural development index

4 Classification of districts

In the previous section, it was found that several districts of the Western region are highly ranked in terms of agricultural development, along with a few from the Eastern and Central regions, while most districts of the Bundelkhand region are among the lowest-ranked in the state. Simply ranking the districts by the composite indices obtained in Sect. 3 therefore does not provide a meaningful characterization of the level of agricultural development. To overcome this problem, in this section we use Hierarchical and K-mean cluster analysis to construct homogeneous groups of districts in terms of agricultural development.

In the social sciences, cluster analysis is widely regarded as an important statistical tool for categorizing regions into groups with similar characteristics. In the present study, cluster analysis was performed to identify groups of districts with similar levels of agricultural development in Uttar Pradesh. The cluster methods were applied to the composite indices obtained in Sect. 3. Hierarchical clustering using Ward’s method was used to compute the group centroids. A non-hierarchical cluster analysis (K-mean) was then used to “fine-tune” the results, taking the centroids of the best hierarchical solution as the initial seed points.

The K-mean method assigns districts to clusters according to their distance from the centroids and updates the location of each centroid using the mean values of the cases in its cluster. These steps are repeated until no reassignment of cases makes the clusters more internally cohesive (homogeneous) or more clearly differentiated from one another. In this way, the K-mean method has been used to improve the outcome of Ward’s method.
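The two-stage procedure described above can be sketched for a one-dimensional variable such as the composite index. The code below is a simplified illustration and not the statistical software routine used in the study: the function names `ward_seeds` and `k_means_1d` and the sample values are hypothetical, and the Ward step uses a naive search over all cluster pairs, which is adequate only for small data sets.

```python
def ward_seeds(values, k):
    """Merge 1-D values with Ward's criterion until k clusters remain and
    return the cluster means (ascending) as seed centroids."""
    clusters = [[v] for v in values]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                ma = sum(clusters[a]) / na
                mb = sum(clusters[b]) / nb
                # Increase in within-cluster sum of squares if a and b merge.
                cost = na * nb / (na + nb) * (ma - mb) ** 2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(sum(c) / len(c) for c in clusters)

def k_means_1d(values, centroids, max_iter=100):
    """K-mean iterations starting from the given seed centroids."""
    centroids = list(centroids)
    labels = []
    for _ in range(max_iter):
        # Assign every value to its nearest centroid.
        labels = [min(range(len(centroids)), key=lambda c: abs(v - centroids[c]))
                  for v in values]
        # Move every centroid to the mean of its members.
        new = []
        for c in range(len(centroids)):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new.append(sum(members) / len(members) if members else centroids[c])
        if new == centroids:  # no centroid moved, so assignments are stable
            break
        centroids = new
    return labels, centroids

# Eight hypothetical composite index values grouped into four clusters.
values = [0.80, 0.82, 0.62, 0.65, 0.40, 0.43, 0.20, 0.25]
labels, centroids = k_means_1d(values, ward_seeds(values, 4))
```

In this sketch the labels are numbered in ascending order of the centroid, so label 0 is the least developed group; the paper numbers its clusters in the opposite direction, with Cluster I as the most developed.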

4.1 Hierarchical cluster analysis and determination of the value of ‘K’

The number of clusters for the K-mean cluster analysis is determined by inspecting the dendrogram presented in Fig. 2. The dendrogram suggests that a four-cluster solution is appropriate for grouping the districts based on their developmental characteristics.

Fig. 2 Cluster solution of agricultural development in districts of Uttar Pradesh

4.2 K-mean cluster analysis

In this stage, the K-mean cluster analysis is carried out using the district-wise composite index values as variable and each of the seventy-five districts as cases with K = 4 number of clusters. The classification of the districts into four clusters is provided in Table 4 for an in-depth analysis of the findings.

Table 4 Districts in Different Clusters based on Agricultural Development

Cluster I, named “Highly Developed Districts”, consists of eight districts, six of which are from the Western region. The mean value of the composite indices in this cluster is 0.808, which is close to 1, implying that this cluster is more advanced than the other clusters in terms of agricultural development. The coefficient of variation in this cluster is 7.364, the lowest among the four clusters, implying that regional disparities are smaller here than in clusters 2, 3 and 4.

Cluster II, named “Developed Districts”, is the largest cluster, with 29 districts, most of which are from the Western region. The mean value of the composite indices in this cluster is 0.636, implying that its performance is below that of cluster 1 in terms of agricultural development but considerably better than that of clusters 3 and 4. The coefficient of variation in this cluster is 8.698, lower than in clusters 3 and 4, implying that regional disparities in this cluster are also small. The Western region can therefore be regarded as a highly developed region of the state in terms of agricultural development, since 25 of its 30 districts were classified into clusters 1 and 2.

Cluster III, named “Developing Districts”, consists of 23 districts with a mean composite index value of 0.416, which is lower than in the previous clusters, showing that this cluster has made less progress than clusters 1 and 2. The coefficient of variation, 16.169, is also considerably larger than in the previous two clusters, indicating that regional disparities in the agricultural sector are markedly greater in this cluster.

Cluster IV, named “Less Developed Districts”, consists of 15 districts, most of which are from the Bundelkhand region. It is pertinent to mention that six of the seven districts of the Bundelkhand region have been classified into this cluster. The mean value of the composite indices is the smallest, 0.227, and the coefficient of variation is the largest, 36.147, indicating that this cluster is poorly developed in terms of agriculture and has the greatest regional disparity.
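The cluster means and coefficients of variation reported above can be reproduced from the composite indices and the cluster labels along the following lines. This is a minimal sketch: the function name `cluster_summary` and the sample values are hypothetical, and it assumes that the coefficient of variation is expressed in percent and based on the sample standard deviation, which the text does not state explicitly.

```python
import statistics

def cluster_summary(indices, labels):
    """Return {label: (mean, coefficient of variation in percent)}.
    Every cluster needs at least two members for the sample standard deviation."""
    groups = {}
    for value, label in zip(indices, labels):
        groups.setdefault(label, []).append(value)
    summary = {}
    for label, vals in groups.items():
        mean = statistics.mean(vals)
        cv = 100.0 * statistics.stdev(vals) / mean
        summary[label] = (mean, cv)
    return summary

# Hypothetical composite indices for two small clusters.
summary = cluster_summary([0.2, 0.3, 0.4, 0.8, 0.9], ["IV", "IV", "IV", "I", "I"])
```

A larger coefficient of variation means that the composite indices are more spread out relative to the cluster mean, which is how the text interprets greater regional disparity within a cluster.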

The most important observation from Table 4 is that most districts of the Western region have been classified into the Highly Developed and Developed clusters. Thus, the Western region of Uttar Pradesh is fairly developed in terms of agriculture, whereas the Bundelkhand region is in the weakest position. It is also observed that most districts of Eastern Uttar Pradesh have been classified into cluster II, the largest cluster, which ranks second in agricultural development after cluster I. Cluster IV is poorly developed in the agricultural sector, has the greatest regional disparities, and requires significant government effort to achieve balanced regional agricultural growth in Uttar Pradesh.

The distances between the pairs of final cluster centers, or centroids (in standardized scale), are presented in Table 5. The largest distance, 0.581, is observed between clusters I and IV, and the distance between clusters I and III is also high, indicating that the districts of clusters III and IV lag far behind those of cluster I in terms of agricultural development. The minimum distance, 0.172, is observed between clusters I and II, which shows that the disparity between these two clusters is the smallest among all pairs of clusters (Fig. 3).

Table 5 Distances between final cluster centers

Fig. 3 Clustering of districts according to K-means clustering technique

5 Identification of model districts for less developed districts

One of the most essential aspects of this study is determining the model districts for the agriculturally less developed districts of the state, so that planners and policymakers may devote appropriate attention to these districts while keeping the model districts in mind. Using the methodology given by Narain et al. [18], the root mean square distances between the districts were first computed over all the agricultural indicators, giving the distance matrix. The model districts for each less developed district were then identified as the districts whose distance from it lies in the interval (5.07, 7.49). The model districts identified for the various less developed districts are presented in Table 6.
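The idea of selecting model districts from a distance matrix can be sketched as follows. The details are assumptions made for illustration and may differ from the procedure of Narain et al. [18]: the distance is taken as the Euclidean distance over standardized indicators, a model district is required to have a higher composite index than the target district, and the interval bounds are passed in as parameters. The function names `pairwise_distances` and `model_districts` and the data are hypothetical.

```python
import math

def pairwise_distances(Z):
    """Z: one row of standardised indicator values per district.
    d[i][p] = sqrt(sum_k (Z[i][k] - Z[p][k])**2)."""
    n = len(Z)
    return [[math.sqrt(sum((a - b) ** 2 for a, b in zip(Z[i], Z[p])))
             for p in range(n)] for i in range(n)]

def model_districts(target, d, index, lower, upper):
    """Districts with a higher composite index than `target` whose distance
    from it lies inside the open interval (lower, upper)."""
    return [p for p in range(len(d))
            if p != target
            and index[p] > index[target]
            and lower < d[target][p] < upper]

# Three hypothetical districts described by two standardised indicators.
d = pairwise_distances([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
models = model_districts(0, d, index=[0.2, 0.5, 0.9], lower=4.0, upper=6.0)
```

Here district 1 lies at distance 5 from district 0 and is better developed, so it qualifies as a model district, whereas district 2 is better developed but lies at distance 10, outside the interval.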

6 Conclusion and suggestions

In conclusion, our study proposes a composite index system for evaluating the agricultural development of Uttar Pradesh, encompassing 26 indicators over the two time points 2019 and 2020. The “TOPSIS-based Factor Analytic Model” is applied to all 75 districts, which are then categorized into four clusters using “Hierarchical and K-Mean Cluster Analysis”. The key findings affirm the prosperity of the Western region and the challenges faced by Bundelkhand. Western Uttar Pradesh is more prosperous in agricultural development due to its fertile plains, well-developed irrigation infrastructure, diverse crop cultivation, and proximity to markets, with the region’s sugar industry and government support contributing further to its agricultural success. These findings align with previous research conducted by Gulati et al. [19]. Addressing the regional disparities in agricultural development within Uttar Pradesh requires precise and effective interventions. Statistical methodologies, such as multivariate clustering techniques applied at the tehsil or block level, can inform data-driven decision-making. Encouraging districts with lower levels of development to improve, fostering collaborative initiatives among neighboring districts, and conducting comprehensive growth assessments at the micro-level emerge as crucial strategies for a more equitable and balanced regional agricultural landscape (Table 6).

Table 6 Model districts for less developed districts

However, limitations exist. The study relies on data for the years 2019 and 2020, and changes over time may affect the generalizability of the findings. Although TOPSIS is widely accepted, the use of other MCDM methods might lead to different results. Additionally, external factors such as policy changes and unforeseen events may influence the dynamic nature of agricultural development. Despite these limitations, our study provides valuable insights and a comprehensive framework for policymakers to foster uniform agricultural development in Uttar Pradesh.