1 Introduction

Identifying a single crop benefits governments, which can frame policies for the population and regulate import and export strategy accordingly. Crop type maps often support national and regional agricultural planning. They provide information to facilitate water resource planning for irrigation [1, 2], crop yield assessment and forecasting [3, 4] as well as mapping of soil productivity [5]. The grains market has fluctuated considerably, especially in the last decade. The production of wheat and rice dropped from 63 to 16 MT between 2002 and 2007 [6]. This situation was of major concern for national food security. Hence, for efficient analysis of such conditions, monitoring a single crop for its yield, acreage and agricultural pattern is essential [7].

Conventional methods of composing crop type maps are based on ground surveys, censuses and record keeping [8]. These methods lack standardization. For standardization, the continuous collection of information by remote sensing satellites has proven efficient [9,10,11]. Satellite images obtained from different sensors and at different times can be combined to obtain datasets with relatively low spectral dimensions. Many studies have used optical images for crop-based analysis [12,13,14,15].

A major concern with optical data remains how accurately the atmospheric corrections are performed. Microwave remote sensing handles this issue well [16,17,18]. Crop identification can be carried out successfully by classifying temporal satellite images, because reflectance follows crop phenology, whereas single-date image analysis is a challenge [19]. Pixel-based hard classification, which assumes that no spectral mixing exists at class boundaries, has traditionally been used in crop studies [13, 20]. In such studies, nonlinearity of classes goes unnoticed. The present study tries to overcome this limitation of hard classification by using a soft classification technique that handles mixed pixels. It is applied to temporal normalized difference vegetation index (NDVI) images obtained from Formosat-2 and temporal synthetic aperture radar (SAR) images obtained from Radar Imaging Satellite-1 (RISAT-1) for the classification of paddy (Oryza sativa).

The complete growth cycle of paddy is divided into five stages and is observed from the end of June to the start of November. These stages are the transplanting period, the seedling development stage, the ear differentiation period, the heading period and the maturation period, in which the rice plant is mature and ready for harvesting [21]. A long growing season is favorable for high paddy yields, and hence, the transplanting season starts earlier than in previous growth cycles. The life cycle of paddy is around 120–150 days, but it varies with the paddy type. The spectral signatures at crop growth stages in the time domain can help in discriminating various crops and vegetation patterns [22]. This phenological aspect was used to map paddy fields through multi-temporal datasets.

The objective was to classify paddy fields over a region and estimate the regions with the maximum possibility of paddy cultivation. Supervised classification was chosen over unsupervised classification because it allowed the user to work with the pixels’ spectral values directly rather than rely on the clustering patterns generated by algorithms. The spatial resolution of a satellite image determines how much of the ground each pixel value represents. The coarser the spatial resolution, the higher the chance that a pixel covers more than one land cover type; such pixels are termed mixed pixels. In traditional hard classification techniques, algorithms assign each pixel to a single land cover type, which results in loss of information. To handle the mixed pixel issue in classification, the possibilistic c-means (PCM) classifier was selected over the fuzzy c-means (FCM) classifier, as the membership values of PCM are a measure of ‘degree of belonging’ [23], while those of FCM are a measure of ‘degree of sharing’ [24]. In PCM, the clustering problem is formulated in the possibility domain, where the resultant partitions are interpreted as possibilistic partitions.

The major objective of the research was to soft classify the bi-sensor temporal images and compare the norms for the best separation of linear classes. Other objectives include: (1) to evaluate combined bi-sensor datasets for better extraction of paddy fields, (2) to identify the best date combination of bi-sensor datasets for identification of paddy fields and (3) to compare similarity and dissimilarity norms via the PCM classifier.

2 Indices and measures

2.1 Normalized difference vegetation index (NDVI)

In order to reduce the spectral dimensionality of the dataset, vegetation indices were used. The NDVI band ratio was proposed by Kriegler et al. [25]. It was observed that vegetation pigments have high absorptivity in the red spectral wavelength and high reflectance in the near infrared wavelength. This observation was formulated as a band ratio in which the red and NIR bands are used to reduce the dimensionality of the dataset. The NDVI is calculated using Eq. (1):

$${\text{NDVI}} = \frac{{\rho_{{{\text{NIR}}}} - \rho_{{{\text{RED}}}} }}{{\rho_{{{\text{NIR}}}} + \rho_{{{\text{RED}}}} }}$$

where \(\rho_{{{\text{NIR}}}}\) represents reflectance at near infrared band and \(\rho_{{{\text{RED}}}}\) represents reflectance at red band.
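As a minimal sketch of Eq. (1), the snippet below computes NDVI for a single pixel and for a small image held as nested lists. The function names and the choice to return 0.0 when both reflectances are zero are assumptions made for illustration; they are not taken from the study.

```python
def ndvi(nir, red):
    """NDVI for one pixel from NIR and red reflectance, following Eq. (1)."""
    total = nir + red
    if total == 0:
        # Assumption: pixels with no signal in either band are mapped to 0.0.
        return 0.0
    return (nir - red) / total


def ndvi_image(nir_band, red_band):
    """Apply ndvi() pixel by pixel to two equally sized 2-D lists."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]
```

Dense vegetation, with high NIR and low red reflectance, gives values close to 1, while equal reflectance in both bands gives 0.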

2.2 Possibilistic c-means (PCM) classification method

PCM classifier generates membership values which are interpreted as degree of belongingness or typicality [26]. The actual feature classes should have high membership values as compared to the values associated with unrepresentative features. The objective function is expressed in Eq. (2):

$$J_{m} \left( {U,V} \right) = \mathop \sum \limits_{i = 1}^{C} \mathop \sum \limits_{j = 1}^{N} \left( {u_{ij} } \right)^{m} d_{ij}^{2} + \mathop \sum \limits_{i = 1}^{C} \eta_{i} \mathop \sum \limits_{j = 1}^{N} \left( {1 - u_{ij} } \right)^{m}$$

where C is the number of classes, N is the number of pixels, m is the weighted constant, \(d_{ij}^{2}\) is the distance \(D\left( {x_{j} ,v_{i} } \right)\) between the feature vector \(x_{j}\) and the class prototype \(v_{i}\), and \(\eta_{i}\) (scale or resolution parameter) are suitable positive numbers. The first term demands that the distance from the feature vector to the prototypes be as low as possible, whereas the second term forces \(u_{ij}\) (fuzzy membership value) to be as large as possible to avoid the trivial solution. The resolution parameter is calculated as in Eq. (3):

$$\eta_{i} = \frac{{\mathop \sum \nolimits_{j = 1}^{N} u_{ij}^{m} D\left( {x_{j} ,v_{i} } \right)}}{{\mathop \sum \nolimits_{j = 1}^{N} u_{ij}^{m} }}.$$

The fuzzy membership value \(u_{ij}\) is calculated from Eq. (4):

$$u_{ij} = \left[ {1 + \left( {\frac{{D\left( {x_{j} ,v_{i} } \right)}}{{\eta_{i} }}} \right)^{{\frac{1}{m - 1}}} } \right]^{ - 1}.$$
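The two PCM quantities can be sketched in a few lines. The membership function assumes the usual possibilistic update from the PCM literature, u = 1 / (1 + (D / η)^(1/(m − 1))), and the resolution parameter follows Eq. (3). The function names are hypothetical, and this is an illustration, not the SMIC implementation.

```python
def pcm_membership(dist, eta, m):
    """Possibilistic membership of one pixel in one class.

    dist: distance D(x_j, v_i) between pixel and class prototype
    eta:  resolution parameter of the class (must be positive)
    m:    weighted constant (must be greater than 1)
    """
    return 1.0 / (1.0 + (dist / eta) ** (1.0 / (m - 1.0)))


def resolution_parameter(memberships, dists, m):
    """Membership-weighted mean distance of all pixels to one prototype, Eq. (3)."""
    numerator = sum((u ** m) * d for u, d in zip(memberships, dists))
    denominator = sum(u ** m for u in memberships)
    return numerator / denominator
```

Under this form, a pixel whose distance equals η receives a membership of 0.5. Memberships depend only on the pixel's distance to that one class, which is why they are read as typicality and not as a share among classes.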

2.3 Similarity and dissimilarity measures

2.3.1 Similarity measures

A similarity measure between two sequences quantifies the dependency between them. The cosine and correlation similarity measures are applied with the PCM classifier.

Cosine This similarity norm measures the cosine of the angle between two vectors of an inner product space. It is given in Eq. (5):

$$D\left( {X_{j} ,V_{i} } \right) = 1 - \frac{{\sum X_{j} V_{i} }}{{\sqrt {\sum {\text{Abs}}\left[ {X_{j} } \right]^{2} } \sqrt {\sum {\text{Abs}}\left[ {V_{i} } \right]^{2} } }}.$$

Correlation Pearson’s correlation coefficient (r) is used to measure the similarity between the two items. It is formulated in Eq. (6), where b is the number of bands and each sum runs over the bands:

$$D\left( {X_{j} ,V_{i} } \right) = 1 - \frac{{\sum \left[ {\left\{ {X_{j} + \frac{1}{b}\left( {\sum - X_{j} } \right)} \right\}*\left\{ {V_{i} + \frac{1}{b}\left( {\sum - V_{i} } \right)} \right\}} \right]}}{{\sqrt {\sum {\text{Abs}}\left[ {X_{j} + \frac{1}{b}\left( {\sum - X_{j} } \right)} \right]^{2} } *\sqrt {\sum {\text{Abs}}\left[ {V_{i} + \frac{1}{b}\left( {\sum - V_{i} } \right)} \right]^{2} } }}.$$
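The two similarity-based norms can be illustrated as follows, treating a pixel and a class prototype as plain lists with one value per band. The sketch assumes the usual definitions: one minus the cosine of the angle between the vectors, and one minus Pearson's r, which is the cosine distance of the mean-centred vectors. Function names are hypothetical.

```python
import math


def cosine_distance(x, v):
    """1 - cos(angle) between two band vectors; both must be non-zero."""
    dot = sum(a * b for a, b in zip(x, v))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_x * norm_v)


def correlation_distance(x, v):
    """1 - Pearson r; neither vector may be constant across bands."""
    b = len(x)
    mean_x = sum(x) / b
    mean_v = sum(v) / b
    x_centred = [a - mean_x for a in x]
    v_centred = [c - mean_v for c in v]
    return cosine_distance(x_centred, v_centred)
```

Both distances are 0 for vectors that point in the same direction, so they respond to the shape of the temporal profile and not to its overall magnitude.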

2.3.2 Dissimilarity measures

A dissimilarity measure between two sequences quantifies the independence between them. The dissimilarity measure D is considered a metric if it produces a higher value as corresponding values in the sequences become less dependent. A total of ten dissimilarity norms, namely Bray Curtis, chessboard, Manhattan, Canberra, Euclidean, Mahalanobis, diagonal Mahalanobis, mean absolute difference, median-absolute difference and normalized squared Euclidean, are evaluated in the PCM classifier [27].

Bray Curtis This norm is a statistic used to quantify the compositional dissimilarity between two sites based on counts on each site [28]. Equation (7) describes this norm.

$$D\left( {X_{j} ,V_{i} } \right) = \frac{{\sum {\text{Abs}}\left[ {X_{j} - V_{i} } \right]}}{{\sum {\text{Abs}}\left[ {X_{j} + V_{i} } \right]}}.$$

Chessboard Also known as the Chebyshev distance, this metric distance is defined on a vector space where the distance between the two vectors is the greatest of their distances along any coordinate dimension. Equation (8) gives the chessboard norm.

$$D\left( {X_{j} ,V_{i} } \right) = {\text{Max}}\left[ {{\text{Abs}} \left( {X_{j} - V_{i} } \right)} \right].$$

Manhattan This norm is the sum of absolute intensity differences and is one of the oldest norms. The generalized equation used is shown in Eq. (9).

$$D\left( {X_{j} ,V_{i} } \right) = \left[ {\sum {\text{Abs}}\left[ {X_{j} - V_{i} } \right]} \right].$$

Canberra This norm is the weighted version of the Manhattan distance. It is a numerical measure of the distance between pairs of points in a vector space, in which each absolute difference is divided by the sum of the absolute values of the two terms. The formula is stated in Eq. (10).

$$D\left( {X_{j} ,V_{i} } \right) = \sum \frac{{{\text{Abs}}\left[ {X_{j} - V_{i} } \right]}}{{{\text{Abs}}\left[ {X_{j} } \right] + {\text{Abs}}\left[ {V_{i} } \right]}}.$$
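A short sketch of the Bray Curtis, chessboard, Manhattan and Canberra norms, using their usual definitions over band vectors, may help in comparing them. The function names are hypothetical, and the Canberra version skips bands where both values are zero, which is an assumption made here to avoid division by zero.

```python
def bray_curtis(x, v):
    """Sum of absolute differences over sum of absolute sums."""
    return sum(abs(a - b) for a, b in zip(x, v)) / sum(abs(a + b) for a, b in zip(x, v))


def chessboard(x, v):
    """Largest absolute difference along any band (Chebyshev distance)."""
    return max(abs(a - b) for a, b in zip(x, v))


def manhattan(x, v):
    """Sum of absolute differences over all bands."""
    return sum(abs(a - b) for a, b in zip(x, v))


def canberra(x, v):
    """Manhattan distance with each term weighted by |a| + |b|."""
    return sum(
        abs(a - b) / (abs(a) + abs(b))
        for a, b in zip(x, v)
        if abs(a) + abs(b) > 0  # assumption: skip bands that are zero in both vectors
    )
```

For the vectors [1, 2, 3] and [2, 2, 5], the absolute differences are 1, 0 and 2, so Manhattan gives 3, chessboard gives 2 and Bray Curtis gives 3/15 = 0.2.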

Euclidean The Euclidean distance between two points is calculated as the square root of the sum of the squares of the differences between the corresponding coordinates. Equation (11) gives the Euclidean distance. The Mahalanobis and diagonal Mahalanobis norms are calculated as per Eq. (12).

$$D\left( {X_{j} ,V_{i} } \right) = \sqrt {\sum \left( {X_{j} - V_{i} } \right)^{2} }$$

$$D\left( {X_{j} ,V_{i} } \right) = \left( {X_{j} - V_{i} } \right)^{T} A^{ - 1} \left( {X_{j} - V_{i} } \right)$$

where \(A\) is the variance–covariance matrix for the Mahalanobis norm and the diagonal variance–covariance matrix for the diagonal Mahalanobis norm.
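The sketch below illustrates the Euclidean norm and a Mahalanobis-type quadratic form, assuming the usual definition in which the difference vector X − V is weighted by the inverse of a variance–covariance matrix. The inverse matrix is passed in by the caller, `diagonal_inverse` is a hypothetical helper for the diagonal case, and the variances in the usage note are made-up numbers.

```python
import math


def euclidean(x, v):
    """Square root of the summed squared band differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, v)))


def mahalanobis(x, v, a_inv):
    """Quadratic form (x - v)^T A^-1 (x - v); a_inv is the inverse of A."""
    d = [a - b for a, b in zip(x, v)]
    n = len(d)
    return sum(d[r] * a_inv[r][c] * d[c] for r in range(n) for c in range(n))


def diagonal_inverse(variances):
    """Inverse of a diagonal variance-covariance matrix (diagonal Mahalanobis)."""
    n = len(variances)
    return [[1.0 / variances[r] if r == c else 0.0 for c in range(n)] for r in range(n)]
```

With band variances of 9 and 16, the difference vector (3, 4) gives a diagonal Mahalanobis value of 9/9 + 16/16 = 2, whereas its Euclidean length is 5. Bands with large variance therefore contribute less to the distance.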

Mean Absolute Difference This norm measures the mean of the absolute deviation from the central point. It is a summary statistic of variability. Equation (13) gives this norm.

$$D\left( {X_{j} ,V_{i} } \right) = \frac{1}{b}\left[ {\sum {\text{Abs}}\left( {X_{j} - V_{i} } \right)} \right].$$

Median-Absolute Difference This dissimilarity norm reduces the effect of impulse noise on the calculated images. The formula for this norm is represented in Eq. (14).

$$D\left( {X_{j} ,V_{i} } \right) = {\text{Median}}\left[ {{\text{Abs}}\left( {X_{j} - V_{i} } \right)} \right].$$

Normalized Squared Euclidean This norm normalizes the measure with respect to the image contrast. In the calculation of the correlation coefficient, scale normalization is performed once after calculating the inner product of the normalized intensities. Equation (15) describes this norm.

$$D\left( {X_{j} ,V_{i} } \right) = \frac{{\sum {\text{Abs}}\left[ {X_{j} + \frac{1}{b}\left( {\sum - X_{j} } \right) - V_{i} + \frac{1}{b}\left( {\sum V_{i} } \right)} \right]^{2} }}{{2\left[ {\sum {\text{Abs}}\left\{ {X_{j} + \frac{1}{b}\left( {\sum - X_{j} } \right)} \right\}^{2} + \sum {\text{Abs}}\left\{ {V_{i} + \frac{1}{b}\left( {\sum - V_{i} } \right)} \right\}^{2} } \right]}}.$$
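The last three norms can be sketched in the same style. The normalized squared Euclidean function assumes the common definition: the squared distance between the mean-centred vectors, divided by twice the sum of their squared lengths. Function names are hypothetical.

```python
import statistics


def mean_absolute_difference(x, v):
    """Average absolute band difference (Manhattan distance divided by b)."""
    return sum(abs(a - b) for a, b in zip(x, v)) / len(x)


def median_absolute_difference(x, v):
    """Median absolute band difference; a single outlying band has little effect."""
    return statistics.median(abs(a - b) for a, b in zip(x, v))


def normalized_squared_euclidean(x, v):
    """Squared distance of mean-centred vectors over twice their summed squared norms."""
    b = len(x)
    mean_x = sum(x) / b
    mean_v = sum(v) / b
    xc = [a - mean_x for a in x]
    vc = [c - mean_v for c in v]
    numerator = sum((a - c) ** 2 for a, c in zip(xc, vc))
    denominator = 2.0 * (sum(a * a for a in xc) + sum(c * c for c in vc))
    return numerator / denominator  # undefined when both vectors are constant
```

Under this definition the value lies between 0 and 1: it is 0 when the two profiles differ only by a constant offset and 1 when one centred profile is the exact negative of the other.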

2.4 Backscattering coefficient

The backscatter coefficient (\({\sigma }^{0}\)) is defined as the differential scattering cross section per unit volume for a scattering angle of 180°. Measurements of this quantity involve the projection of a pulsed ultrasound beam into a volume containing the medium of interest and monitoring echo signals due to scattering. The formula used for calculating backscattering coefficient for RISAT-1 data is given in Eq. (16) [29].

$$\sigma^{0} \left( {{\text{dB}}} \right) = 20\log_{10} \left( {{\text{DN}}_{{\text{p}}} } \right) - K_{{{\text{dB}}}} + 10\log_{10} \left( {\sin \left( {i_{{\text{p}}} } \right)/\sin \left( {i_{{{\text{center}}}} } \right)} \right)$$

where \({\text{DN}}_{{\text{p}}}\) is the digital number for the pixel p, \(K_{{{\text{dB}}}}\) is the calibration constant in dB, and \(i_{{\text{p}}}\) and \(i_{{{\text{center}}}}\) are the incidence angles for pixel p and the center of the scene, respectively. The value of \({\text{DN}}_{{\text{p}}}\) is calculated using Eq. (17).

$${\text{DN}}_{{\text{p}}} = \sqrt {\left( {{\text{DNI}}_{{\text{p}}}^{2} + {\text{DNQ}}_{{\text{p}}}^{2} } \right)}$$

where \({\text{DNI}}_{{\text{p}}}\) is DN value of in-phase (real channel) component and \({\text{DNQ}}_{{\text{p}}}\) is DN value of quadrature (imaginary channel) component.
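Equations (16) and (17) can be combined into one small function, sketched below. The function name and argument names are hypothetical, incidence angles are assumed to be given in degrees, and the calibration constant used in the usage note is a made-up value, not one taken from RISAT-1 product metadata.

```python
import math


def backscatter_db(dn_i, dn_q, k_db, inc_pixel_deg, inc_center_deg):
    """Backscattering coefficient in dB for one pixel, following Eqs. (16) and (17).

    dn_i, dn_q:      in-phase and quadrature digital numbers of the pixel
    k_db:            calibration constant in dB
    inc_pixel_deg:   incidence angle at the pixel, in degrees
    inc_center_deg:  incidence angle at the scene center, in degrees
    """
    dn = math.sqrt(dn_i ** 2 + dn_q ** 2)  # Eq. (17); must be non-zero
    angle_ratio = math.sin(math.radians(inc_pixel_deg)) / math.sin(math.radians(inc_center_deg))
    return 20.0 * math.log10(dn) - k_db + 10.0 * math.log10(angle_ratio)  # Eq. (16)
```

For example, with in-phase and quadrature values of 6 and 8, the amplitude is 10, so 20 log10(10) = 20 dB. With a hypothetical calibration constant of 70 dB and a pixel at the scene center, the result is −50 dB. A pixel with zero amplitude raises a `ValueError` from `math.log10` and would need to be masked beforehand.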

3 Study area and data used

The study area is situated on the east side of Haridwar, Uttarakhand, India, toward national highway 74 as seen in Fig. 1. The central latitude and longitude of the area are 29°52′20.3124″N and 78°10′25.0998″E. River Ganges flows through the district, and hence, the land here is fertile and conducive for agriculture. The major crops cultivated in this region include wheat, rice, sugarcane, mustard, groundnuts and fruits like mangoes and litchis. The temperature in summer ranges from 25 to 44 °C, while that in winter ranges from −2 to 24 °C. The area of the city is 12.3 km² (Table 1).

Fig. 1 Study area as seen in Formosat-2

Table 1 Sensor details of temporal images

Most previous studies have relied extensively on optical data for crop mapping. In India, the monsoon season coincides with the transplanting season for the paddy crop. As a result, the remotely sensed data contain cloud cover, which makes the pixel values difficult to interpret and exploit. Atmospheric disturbances such as cloud cover create gaps in temporal data and decrease the accuracy of results [30]. The data used for this research include four satellite images, two from RISAT-1 and two from Formosat-2. The RISAT-1 images are microwave images in the C band (5.35 GHz) with dual polarization (HH and HV), acquired in Medium Resolution ScanSAR (MRS) mode. Electromagnetic radiation is a combination of electric and magnetic fields, and the orientation of the electric field defines the polarization of the wave. The return is received best when the receiving antenna has the same polarization as the incoming wave. Hence, if the wave is transmitted with horizontal polarization and the antenna receives the horizontally polarized return, the polarization is termed HH.

The study uses the HH polarization as it was found suitable for paddy monitoring [21, 31]. Based on studies at the Zhaoqing test site, the authors of those works established an experimental backscattering model in which backscattering was expressed as a function of time using a cubic polynomial. This was taken into account while shortlisting the polarization to be used. The spatial resolution of RISAT-1 in MRS mode is 18 m, while that of Formosat-2 is 8 m. As crops cannot be mapped accurately using a single-date image [19], temporal data were used. The RISAT-1 data were acquired on 27 June 2014 and 09 July 2014, and the Formosat-2 data on 10 August 2014 and 25 September 2014, as seen in Fig. 2.

Ground truth was collected with the help of global positioning system (GPS) points. The survey dates were 19 and 20 October 2014. These data were used both for training the supervised classifier and for testing. Of the total points collected, 80% were used for training and 20% for testing. Supervised soft classification was carried out with the JAVA-based sub-pixel multi-spectral image classifier (SMIC) tool using the PCM classifier and the similarity and dissimilarity distance norms.

Three temporal datasets were assembled from the four date images by combining the microwave and optical bands. Dataset one contained two microwave bands dated 27 June and 09 July and one optical band dated 10 August. Dataset two had one microwave band dated 09 July and two optical bands dated 10 August and 25 September, while dataset three had all four bands (refer Table 2). It was important to select the best date combination to achieve high accuracy. A three-date combination gave better results than a single-date image or a two-date combination [32]. These datasets were constructed after the backscattering coefficients of the microwave data were calculated. NDVI was calculated using the optical images of Formosat-2.

Fig. 2 Temporal images used for study

Table 2 Temporal datasets used for classification and analysis

4 Methodology and approach

A bi-sensor approach was used to carry out the research. This was done to understand whether data fusion for classification would yield the desired results. The methodology followed is presented in Fig. 3.

Fig. 3 Methodology adopted for paddy field mapping

RISAT-1 images dated 27 June and 9 July 2014 were geometrically corrected to align with the Formosat-2 temporal images dated 10 August and 25 September 2014. The RISAT-1 images were processed to obtain backscattering images using Eq. (16) and resampled to 8 m to match the spatial resolution of Formosat-2. NDVI images were obtained from the Formosat-2 temporal images. The backscatter images and NDVI images were linearly stretched to obtain 8-bit images with pixel values ranging from 0 to 255. This was done because SMIC (sub-pixel land cover mapping image classifier), a JAVA-based image processing package [33] that supports 8-bit imagery, was used for classification. The linear stretching removes the difference in scale between the decibel values of the backscattering images and the NDVI values, bringing uniformity to the analysis.
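The linear stretch to 8 bits can be sketched as a min–max rescaling. The text does not state which stretch limits were used, so taking the minimum and maximum of each layer as the limits is an assumption, as are the function name and the sample values.

```python
def linear_stretch_8bit(values):
    """Map a list of pixel values linearly onto the integer range 0..255.

    Assumption: the layer minimum maps to 0 and the layer maximum to 255.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant layer carries no contrast; map every pixel to 0.
        return [0 for _ in values]
    scale = 255.0 / (hi - lo)
    return [int(round((v - lo) * scale)) for v in values]
```

Applied separately to a backscatter layer in dB (for example −25 to 0) and to an NDVI layer (for example 0 to 1), both end up on the same 0 to 255 scale, so neither dominates the distance norms purely because of its units.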

4.1 Dataset generation

Three datasets were generated using temporal images. These datasets were obtained by stacking backscatter images and NDVI images in order of the date of acquisition of images. The stacking of these bands and the dates is as seen in Table 2.
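The stacking step can be illustrated with a small sketch that orders layers by acquisition date and builds one feature vector per pixel. The date combinations follow the description of the three datasets given in the study area section; the names `DATASET_DATES` and `stack_layers`, and the flat-list layer layout, are hypothetical.

```python
from datetime import date

# Acquisition dates per dataset (RISAT-1: 27 June and 9 July; Formosat-2: 10 August and 25 September).
DATASET_DATES = {
    1: [date(2014, 6, 27), date(2014, 7, 9), date(2014, 8, 10)],
    2: [date(2014, 7, 9), date(2014, 8, 10), date(2014, 9, 25)],
    3: [date(2014, 6, 27), date(2014, 7, 9), date(2014, 8, 10), date(2014, 9, 25)],
}


def stack_layers(layers, dates):
    """Return one vector per pixel, with bands in order of acquisition date.

    layers: dict mapping an acquisition date to a flat list of 8-bit pixel values
    dates:  the acquisition dates to include in the stack
    """
    ordered = sorted(dates)
    n_pixels = len(layers[ordered[0]])
    return [[layers[d][p] for d in ordered] for p in range(n_pixels)]
```

Each resulting vector is the temporal profile of one pixel, which is the input to the distance norms used by the classifier.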

The ground data which were utilized for the training and testing of sites were collected on 19 and 20 October 2014 through field survey. These data were used to accurately locate the pure pixels, i.e., to identify the paddy fields in the region of study. Pure pixels are needed for training as fields in India are closely situated and spectral mixing of signature may result in nonlinearity in classes. The growth cycle under study belongs to the kharif season which is monsoon dependent in India. Typically, it takes anywhere between 120 and 150 days for paddy to reach maturity. The collection of ground data was close to the late transplanted paddy, and hence, identification of this single class was carried out. The target class of late transplanted paddy was to be isolated from nearby paddy fields which were basically early transplanted paddy and nearing maturity.

Nearly 15 pure pixels from six different paddy field sites were used to carry out supervised kernel-based PCM using SMIC package. Different types of similarity and dissimilarity based distance norm kernels were used with various values of weighted constant ‘m’ ranging from 1.5 to 3 for each kernel.

5 Results and discussion

Agricultural fields change over time; hence, temporal images are used to classify these changes. As quantitative evaluation of the fields was not possible, the evaluation was based on the PCM membership values for varying values of ‘m.’ The principle of soft classification that inter-class membership variance should be high and intra-class membership variance should be low [34] was applied, considering factors such as the optimized weighted constant ‘m’, the temporal combination of images and the similarity- and dissimilarity-based distance norms.

Figures 4, 5 and 6 show the classified outcomes of dataset 1, 2 and 3 for late transplanted paddy for optimized weighted constants as seen in Tables 3, 4 and 5, respectively.

Fig. 4 Late transplanted paddy extracted for dataset 1 using a Bray Curtis, b Canberra, c chessboard, d correlation, e cosine, f diagonal variance–covariance, g Euclidean, h mean absolute difference, i Manhattan, j median absolute difference, k normalized square Euclidean, l variance–covariance for optimized weighted constants

Fig. 5 Late transplanted paddy extracted for dataset 2 using a Bray Curtis, b Canberra, c chessboard, d correlation, e cosine, f diagonal variance–covariance, g Euclidean, h mean absolute difference, i Manhattan, j median absolute difference, k normalized square Euclidean, l variance–covariance for optimized weighted constants

Fig. 6 Late transplanted paddy extracted for dataset 3 using a Bray Curtis, b Canberra, c chessboard, d correlation, e cosine, f diagonal variance–covariance, g Euclidean, h mean absolute difference, i Manhattan, j median absolute difference, k normalized square Euclidean, l variance–covariance for optimized weighted constants

Table 3 Results obtained for dataset 1, where A corresponds to the difference between the membership values of late transplant and early transplant, while B is that for late transplant and shallow water
Table 4 Results obtained for dataset 2, where A corresponds to the difference between the membership values of late transplant and early transplant, while B is that for late transplant and shallow water
Table 5 Results obtained for dataset 3, where A corresponds to the difference between the membership values of late transplant and early transplant, while B is that for late transplant and shallow water

The membership values for different testing sites were noted for the target class (late transplanted paddy) and the non-target classes (early transplanted paddy and shallow water) for all three datasets. The target and non-target classes were selected based on the temporal images used. Paddy fields are filled with water during the transplantation stage. This often leads to misclassification when a shallow water class is present in the study area, because the backscattering values are similar. The weighted constant that gives the maximum membership variance between the target and non-target classes is taken as the optimized weighted constant. Across datasets, the maximum membership variance at the optimized weighted constants identifies the best temporal combination, as nonlinearity was handled best for that combination of distance norm, temporal combination and weighted constant.
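The selection of the optimized weighted constant can be sketched as a search over candidate values of m for the largest gap between target and non-target memberships. The sketch assumes the gap is measured as the difference of mean 8-bit membership values over the testing sites. The function names are hypothetical, and the numbers in the usage note are made up for illustration; they are not results from the study.

```python
from statistics import mean


def membership_difference(target_values, non_target_values):
    """Mean target membership minus mean non-target membership."""
    return mean(target_values) - mean(non_target_values)


def optimized_m(site_memberships):
    """Pick the m with the largest target/non-target membership difference.

    site_memberships: dict mapping m to a pair
                      (target site memberships, non-target site memberships)
    Returns (best m, difference at that m).
    """
    differences = {
        m: membership_difference(target, non_target)
        for m, (target, non_target) in site_memberships.items()
    }
    best = max(differences, key=differences.get)
    return best, differences[best]
```

With hypothetical memberships of ([200, 210], [150, 160]) at m = 1.5, ([240, 250], [60, 80]) at m = 2.0 and ([230, 240], [120, 130]) at m = 2.5, the differences are 50, 175 and 110, so m = 2.0 would be selected.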

Tables 3, 4 and 5 list the membership values for unbiased testing sites for datasets 1, 2 and 3, respectively. The tables also give the optimized weighted constant, for which the difference of membership values between the target and non-target sites is maximum. For dataset 1, with temporal images of June, July and August, the mean absolute difference and Manhattan distance norms at optimized ‘m’ = 1.3 yielded the highest mean membership value of 253 for the late transplanted paddy class over all unbiased testing sites (refer Table 3). For dataset 2 (refer Table 4), with temporal images of July, August and September, the maximum mean membership value of the target class was observed for the diagonal variance–covariance norm at 254 with optimized ‘m’ = 2.3.

For dataset 3 (refer Table 5), which consists of all temporal images of June, July, August and September, the maximum mean membership value for the target class calculated from unbiased sites was observed for the variance–covariance norm at 253, where the optimized ‘m’ was 2.1.

Graphs shown in Fig. 7 are plotted for the difference between the membership values of late transplanted paddy and early transplanted paddy for datasets 1, 2 and 3, respectively. Figure 8 represents the graphs for difference of membership values between late transplanted paddy and shallow water for 12 norms at various values of weighted constant. These differences are based on the 8 bit classified images obtained using SMIC classifier. The average difference of unbiased sites is plotted against weighted constants ‘m.’

Fig. 7 a, b, c Plot of difference between late transplanted and early transplanted paddy for all norms at different values of 'm' for datasets 1, 2 and 3

Fig. 8 a, b, c Plot of difference between late transplanted paddy and shallow river bed for all norms at different values of 'm' for datasets 1, 2 and 3

Table 6 gives an overview of the best combination of dataset, distance norm and corresponding optimized weighted constant for which the classification of late transplanted paddy yielded best statistical results.

Table 6 Selection of best dataset via optimized 'm' and best norm/s

6 Conclusions

A total of 12 similarity and dissimilarity norms were tested over three datasets comprising three and four temporal dates for better extraction of a single class, namely late transplanted paddy fields. It was observed that the weighted constant ‘m’ played a significant role in suppressing non-target classes when single class classification was carried out. The suppression of non-target classes was not uniform: the difference between the membership values of the target and non-target classes varied as the characteristics of the microwave and optical images came into play. In some cases, the difference between the membership values of late transplanted paddy and early transplanted paddy was maximum, while in a few cases, the difference between late transplanted paddy and shallow water was maximum.

It was observed that the PCM classifier could identify and extract a single class, suppress other classes and also solve the problem of mixed pixels to a large extent when the datasets were compared. Statistically, it was found that the dataset containing two dates of microwave data and one date of optical data produced the best results for the mean absolute difference and Manhattan norms at weighted constant value ‘m = 1.3.’ However, visual interpretation suggested that the dataset containing two dates of microwave data and two dates of optical data had better results for the variance–covariance norm at ‘m = 2.1’ as seen in Figs. 4, 5 and 6.

It was thus concluded that the PCM classifier produces satisfactory results for single class extraction and also manages mixed pixels to suppress non-target classes for the dataset containing more microwave temporal images than optical images. The noise observed in the classified images can be handled by applying a speckle filter before the classification process. From the results obtained, it can be concluded that multi-date microwave data integrated with optical data produce better results than single-date microwave data with optical data in temporal scenarios for crop identification and mapping.