Introduction

Invasive alien plants (IAPs) pose a substantial threat worldwide to farming, biodiversity, and the delivery of ecosystem services (Lyimo et al., 2019). IAPs have caused great economic and ecological damage around the world (Das and Duarah, 2013). They have been reported to reduce agricultural productivity through mechanisms such as competition (for light, nutrients, and water), allelopathy, and parasitism (Bajwa et al., 2019; Fried et al., 2017). Tall IAPs are known to shade out juvenile crops and hamper their growth (Burgos and Ortuoste, 2018), thereby also reducing native species recruitment.

In the tropical and sub-tropical parts of South Africa, noxious IAPs include Solanum mauritianum (bugweed), Lantana camara (lantana) and Chromolaena odorata (triffid weed). S. mauritianum is a woody species native to South America that has naturalised in Africa, Australasia, India, and islands in the Atlantic, Indian and Pacific oceans (Roe, 1972). It is a shrub or small tree that can grow up to 4 m tall (Olckers, 1999). L. camara is a woody perennial shrub native to Central and South America that is aggressive and troublesome in natural and agricultural environments worldwide; its wide climatic tolerance has enabled it to naturalise in Australia, India and Africa, including South Africa (Vardien et al., 2012). C. odorata, by contrast, is a perennial shrub that forms dense tangled bushes about 1.5–2.0 m in height; it is native to subtropical and tropical America and has colonised Asia, Africa and Oceania (Koutika and Rainey, 2010). Early detection of weed infestations will enable relevant stakeholders such as farmers to control the spread of invasive alien plant species, in support of efforts geared towards food security.

The advent of high spatial resolution imagery has been shown in many studies to enable the delineation and mapping of IAPs, because their spatial distribution can be detected visually through direct remote sensing methods (Huang and Asner, 2009). This approach is preferable to traditional field monitoring of IAPs, which is difficult and expensive (Lourenco et al., 2021). Several researchers have used high spatial resolution multispectral sensors to detect and map IAPs. For example, Pinus spp. were successfully mapped in mountainous regions of the Western Cape Province (South Africa) using SPOT-6 imagery (Forsyth et al., 2014). Rajah et al. (2019) used Sentinel-2 vegetation indices (VIs) to detect and map the American bramble (Rubus cuneifolius) with an accuracy of 80% in the uKhahlamba Drakensberg Park (UDP). An integrated model has also correctly predicted 87% of canopy gaps containing C. odorata at a probability threshold of 0.3. PlanetScope (3 m) and Sentinel-2 datasets were used to map the distribution of Lantana camara L. and Leucaena leucocephala L.; L. camara was the more common species, with 23% coverage, and PlanetScope combined with random forest (RF), Gaussian mixture model (GMM) and support vector machine (SVM) classifiers detected both invasive species more accurately than Sentinel-2 (Iqbal et al., 2023). Moreover, Seriphium plumosum, an encroacher shrub that has been declared problematic in some parts of South Africa through legislation, has been mapped in the Telperion and Ezemvelo nature reserves in South Africa using Sentinel-2 imagery with RF and SVM classifiers, with overall accuracies of 97.42% and 95.48%, respectively (Nkhwanana et al., 2022). SPOT 6 bands and vegetation indices were used with a random forest algorithm to map L. camara in a community grazing land in KwaZulu-Natal, with an accuracy of 75% on an independent test dataset (Oumar, 2016a). Kganyago et al. (2018) also used Landsat 8 OLI and SPOT 6 with SVM classification to map the distribution and patch sizes of Parthenium hysterophorus in the savannah landscapes of KwaZulu-Natal, achieving a higher overall accuracy with SPOT 6 (86%) than with OLI (83%). All of these studies applied medium to high spatial resolution imagery in environments other than tea plantations.

This study thus aimed to test the capability of medium to high spatial resolution satellite imagery to map the occurrence of S. mauritianum, L. camara, and C. odorata and their co-occurrence probability in the Tshivhase Tea Estate using SPOT 7 and Sentinel-2 images. Knowledge of the occurrence and extent of S. mauritianum, L. camara and C. odorata will assist Tshivhase Tea Estate managers in controlling IAP infestation within the tea plantation by targeting and prioritising invaded areas. IAPs affect the growth of tea through competition for sunlight, water, nutrients, and space, thereby affecting tea quality and production. It is therefore essential to detect and monitor them to support eradication planning.

Material

Study area

The study was undertaken at the Tshivhase Tea Estate in Thulamela Municipality, Vhembe District Municipality, Limpopo Province, South Africa. The estate lies between 30.314° and 30.367° E and between 22.968° and 22.994° S (Figure 1) and covers 577 hectares (ha) (Nesamvuni et al., 2022). The Tshivhase Tea Estate is run by Venteco, which took over the business from the British firm Sapeco after it withdrew from the country in 2004 because it was operating at a loss. Tshivhase and Mukumbani are the two sub-estates of the Tshivhase Tea Estate. The main activities on the estate are the cultivation and production of tea (Camellia sinensis) and the packaging of bulk black tea and rooibos tea (Aspalathus linearis). The study focused only on the Tshivhase sub-estate. The area is situated in the eastern subtropical region and is generally characterised by hot humid, sub-humid and semi-arid climates (Mulugisi, 2015). It receives much of its rainfall during summer (November to February) (Kabanda, 2004; Durowoju et al., 2019).

Fig. 1 Tshivhase Tea Estate, Thulamela Municipality, Limpopo Province

Image data acquisition and pre-processing

Level 1 SPOT 7 and Sentinel-2 data acquired in the spring of 2021 were used to map the presence/absence of S. mauritianum, L. camara and C. odorata, as well as the observed and conditional co-occurrence probability amongst the three species, at the Tshivhase Tea Estate. SPOT 7 has four multispectral bands at 6 m spatial resolution: blue (0.455–0.525 μm), green (0.530–0.590 μm), red (0.625–0.695 μm) and near-infrared (0.760–0.890 μm), with panchromatic and multispectral products available at up to 1.5 m spatial resolution (Hubert-Moy et al., 2020). Sentinel-2 has 13 spectral bands spanning the visible/near-infrared and shortwave infrared spectral ranges at 60 m, 20 m and 10 m spatial resolutions (Immitzer et al., 2016). The 10 m Sentinel-2 spectral bands are B2 (0.49 μm), B3 (0.56 μm), B4 (0.665 μm) and B8 (0.842 μm). The SPOT 7 data were corrected for geometric and radiometric distortions by the South African National Space Agency (SANSA), whereas the Sentinel-2 data were radiometrically and geometrically corrected by the European Space Agency (ESA) (Pandzic et al., 2016). The Sentinel-2 data were further atmospherically corrected in the open source software QGIS using the Semi-Automatic Classification Plugin (SCP) (Congedo, 2016).

Field data collection

A total of 124 sample sites (n = 124) were randomly surveyed within the Tshivhase Tea Estate boundary. The locations of the surveyed sites were recorded using a Garmin eTrex Global Positioning System (GPS) receiver with a maximum spatial accuracy of 3 m. The presence and absence of invasive alien plant species were observed visually at the randomly sampled points over 6 days between 1 and 8 October 2021. These dates were chosen based on the phenological condition of the tea at that time (green and ready for harvest). At each visited site, a species was recorded as present if at least one specimen of S. mauritianum, L. camara or C. odorata was observed, and as absent if no specimen was observed. Two different species occurring together at one site were recorded as a co-occurrence. By observation, an infested part of the tea plantation would have five or more specimens of the three IAPs, each reaching a height of 2 m or more.
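The recording rules above can be sketched in a few lines of Python. This is a hedged illustration only, not the authors' procedure: the `SiteRecord` type, the function names and the four example sites are hypothetical, and coding the conditional case as 1 only when two species are present and the third is absent is an assumption about how the "given that the third species does not occur" condition might be encoded.

```python
from dataclasses import dataclass


@dataclass
class SiteRecord:
    """Presence (True) or absence (False) of each species at one surveyed site."""
    p1: bool  # S. mauritianum
    p2: bool  # L. camara
    p3: bool  # C. odorata


def observed_cooccurrence(a: bool, b: bool) -> int:
    """Return 1 when both species were recorded at the site, else 0."""
    return int(a and b)


def conditional_cooccurrence(a: bool, b: bool, other: bool) -> int:
    """Return 1 when both species are present and the third is absent (assumed encoding)."""
    return int(a and b and not other)


# Hypothetical example sites, not field data from the study.
sites = [
    SiteRecord(p1=True, p2=True, p3=False),
    SiteRecord(p1=True, p2=False, p3=False),
    SiteRecord(p1=True, p2=True, p3=True),
    SiteRecord(p1=False, p2=False, p3=False),
]

observed_p1p2 = [observed_cooccurrence(s.p1, s.p2) for s in sites]
conditional_p1p2 = [conditional_cooccurrence(s.p1, s.p2, s.p3) for s in sites]
```

Binary vectors of this kind are the form of response variable that a binary logistic regression expects.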

Spectral indices

The selected spectral indices were calculated from both the SPOT 7 and Sentinel-2 datasets. Three indices were used to determine the health, brightness, and moisture status of the tea plantation in the Tshivhase Tea Estate (Table 1): (i) the Normalized Difference Vegetation Index (NDVI), (ii) the Brightness Index (BI) and (iii) the Modified Normalized Difference Water Index (MNDWI). The NDVI was chosen because it is widely and successfully used to monitor vegetation condition (Cropin, 2021). The MNDWI uses the green and near-infrared bands to extract water bodies (McFeeters, 1996) and to delineate the amount of water present in vegetation (Gao, 1996). The NDWI is useful in determining plant water stress, which can have a major impact on general plant development and can cause crop failure, resulting in lower production in agricultural areas (Factsheet, 2011). The MNDWI is not widely used in crop health monitoring; in this study it was used to determine tea plant water stress. The BI is related to the brightness of soils, which is influenced by soil moisture, the presence of salts, and the organic matter content of the soil surface (Escadafal, 1989). The BI was used to determine the soil moisture in the study area. In addition to the spectral indices, the standard deviation (STDv) and mean variables were computed from all the spectral bands of each sensor.

Table 1 Spectral indices selected for the study
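As a rough illustration of how such indices are computed per pixel, the following Python sketch uses commonly cited formulations. The formulas here are assumptions rather than the contents of Table 1: the NDVI is the standard normalised difference of near-infrared and red; the water index follows the green and near-infrared description given in the text; and the BI uses one common red and green formulation. The function names and the example reflectances are hypothetical.

```python
import math
import statistics


def normalised_difference(a: float, b: float) -> float:
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    denominator = a + b
    return (a - b) / denominator if denominator != 0 else 0.0


def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index."""
    return normalised_difference(nir, red)


def water_index(green: float, nir: float) -> float:
    """Green/NIR water index, as described in the text (assumed form)."""
    return normalised_difference(green, nir)


def brightness_index(red: float, green: float) -> float:
    """One common Brightness Index formulation (assumed, not taken from Table 1)."""
    return math.sqrt((red ** 2 + green ** 2) / 2.0)


def band_statistics(bands: list[float]) -> tuple[float, float]:
    """Mean and population standard deviation across one pixel's bands."""
    return statistics.mean(bands), statistics.pstdev(bands)


# Hypothetical surface reflectances for a single vegetated pixel.
blue, green, red, nir = 0.04, 0.08, 0.05, 0.45

pixel_ndvi = ndvi(nir, red)
pixel_water = water_index(green, nir)
pixel_bi = brightness_index(red, green)
pixel_mean, pixel_std = band_statistics([blue, green, red, nir])
```

For this example pixel the NDVI is about 0.8, which is typical of dense green vegetation, while the water index is negative because near-infrared reflectance exceeds green reflectance.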

Method

The relationships between the presence/absence of each IAP species and the remotely sensed environmental data, as well as the observed and conditional co-occurrence probabilities, were modelled using stepwise logistic regression modelling (SLRM) in the R statistical software. SLRM is a regression method that can be used when the response variable is binary (0/1) and the predictor variables are continuous or categorical (Sahragard and Ajorlo, 2015). Logistic regression uses the logit function to describe the relationship between the response variable and the predictor variables (Miller and Franklin, 2004). It is a form of the Generalised Linear Model (GLM), a generalisation of ordinary linear regression that allows for response variables whose error distribution is not normal (DataCamp, 2020). The model coefficients were then used in QGIS to generate the species distribution maps and the co-occurrence probability maps.

Model calibration

Approximately 60% of the 124 samples (n = 74) were used for calibrating the stepwise logistic model. The calibration dataset was used to train the model for invasive species occurrence in the R statistical software and to produce the spatial distribution maps. The SLRM was built by including all input variables in the model and then eliminating the statistically insignificant variables through the Akaike Information Criterion (AIC) selection procedure. Logistic regression is widely used for modelling plant species distribution within the generalised linear model framework (Rushton et al., 2004), and it is given by Equation (1), adapted from Higgins et al. (1999):

$$y=\frac{e^{\beta_0+{\beta}_1{x}_1+{\beta}_2{x}_2+\dots +{\beta}_n{x}_n}}{1+{e}^{\beta_0+{\beta}_1{x}_1+{\beta}_2{x}_2+\dots +{\beta}_n{x}_n}}$$
(1)

where y is the probability of species occurrence, x_n is the nth explanatory variable, β_n is the coefficient of x_n, β_0 is the intercept and e is the base of the natural logarithm.
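To make Equation (1) and the AIC criterion concrete, the sketch below evaluates the logistic function for one pixel and computes the AIC of a fitted model from its predicted probabilities. It is an illustration under stated assumptions, not the R workflow used in the study: the coefficient values, the pixel values and the function names are hypothetical, and the AIC is computed with the usual definition 2k − 2 ln L for a binary (Bernoulli) likelihood.

```python
import math


def occurrence_probability(intercept: float, coefficients: list[float],
                           values: list[float]) -> float:
    """Equation (1): probability of occurrence from the linear predictor."""
    eta = intercept + sum(b * x for b, x in zip(coefficients, values))
    # Two algebraically equivalent forms are used to avoid overflow in exp().
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def log_likelihood(observed: list[int], predicted: list[float]) -> float:
    """Bernoulli log-likelihood of 0/1 observations given predicted probabilities."""
    eps = 1e-12  # keeps log() finite when a probability is exactly 0 or 1
    total = 0.0
    for y, p in zip(observed, predicted):
        p = min(max(p, eps), 1.0 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def aic(observed: list[int], predicted: list[float], n_parameters: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_parameters - 2 * log_likelihood(observed, predicted)


# Hypothetical coefficients and pixel values, for illustration only.
probability = occurrence_probability(-1.0, [2.0, -0.5], [1.0, 2.0])

# Hypothetical two-observation example with an intercept and two predictors (k = 3).
example_aic = aic([1, 0], [0.5, 0.5], n_parameters=3)
```

In a stepwise procedure, a candidate variable is dropped when the model without it has a lower AIC; applying `occurrence_probability` to every pixel of the predictor rasters yields a probability map.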

Model validation

The remaining 40% of the dataset (n = 50) was used for validating and testing the accuracy of the predictive models. The area under the receiver operating characteristic curve (AUC ROC) was used to assess model accuracy. The AUC statistic shows the power of the model to distinguish between presence and absence. A value closer to 1.0 indicates better agreement between the model and the real environment (Sahragard and Chahouki, 2015). Receiver operating characteristic (ROC) curves show how a predictor compares to the true outcome (Muschelli, 2019). In ROC analyses, the predictive capability of a variable is commonly summarised by the area under the curve (AUC), which can be found by integrating the areas under the line segments (Muschelli, 2019). The AUC takes values from 0 to 1 and is interpreted as follows: 0 (perfectly inaccurate test), 0.5 (no discrimination), 0.7–0.8 (acceptable), 0.8–0.9 (excellent), more than 0.9 (outstanding) and 1 (perfectly accurate test) (Mandrekar, 2010).
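The calibration/validation split and the AUC computation can be sketched as follows. This is a minimal illustration, not the authors' R code: the function names, the random seed and the example scores are hypothetical, and the AUC is obtained by summing trapezoids under the ROC line segments, in line with the description above.

```python
import random


def split_calibration_validation(records, calibration_fraction=0.6, seed=0):
    """Randomly split records into calibration and validation subsets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_calibration = int(len(shuffled) * calibration_fraction)
    return shuffled[:n_calibration], shuffled[n_calibration:]


def roc_points(observed, scores):
    """Return (false positive rate, true positive rate) points of the ROC curve."""
    positives = sum(observed)
    negatives = len(observed) - positives
    if positives == 0 or negatives == 0:
        # With a single class in the validation data the ROC curve is undefined.
        raise ValueError("validation data must contain both presences and absences")
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(observed, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(observed, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc_trapezoid(points):
    """Integrate the area under consecutive ROC line segments."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# 124 site identifiers split 60/40, as in the study design.
calibration, validation = split_calibration_validation(range(124))

# Hypothetical validation labels and predicted probabilities.
labels = [1, 1, 0, 1, 0, 0]
predicted = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = auc_trapezoid(roc_points(labels, predicted))
```

The `ValueError` branch reflects a general property of ROC analysis: when the validation subset contains no presences (or no absences) of the response, the curve and its AUC cannot be computed.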

Results

S. mauritianum occurrence

The stepwise logistic regression model derived from SPOT 7 data indicated that the blue (band 1), green (band 2), red (band 3), and near-infrared (band 4) bands were statistically significant at p<0.05. The standard deviation (STDv) was statistically insignificant at p>0.05, while the NDVI, BI and MNDWI were significant at p<0.05, with negative relationships to the presence/absence of S. mauritianum in the Tshivhase Tea Estate (Table 2). On the other hand, the stepwise logistic regression model derived from Sentinel-2 data indicated the band 2 (blue), band 4 (red), band 8 (NIR) and standard deviation (STDv) variables as significant at p<0.05. Bands 2 and 4 and STDv had a positive relationship with the occurrence of S. mauritianum, while band 8 showed a negative relationship with the presence/absence of S. mauritianum in the Tshivhase Tea Estate (Table 3). The SPOT 7 stepwise logistic regression model yielded a higher AUC ROC (0.93) than the Sentinel-2 model (0.8) (Figure 2).

Table 2 The results of S. mauritianum occurrence stepwise logistic regression model derived from SPOT 7
Table 3 The results of S. mauritianum occurrence stepwise logistic regression model derived from Sentinel-2
Fig. 2 SPOT 7 (A) and Sentinel-2 (B) S. mauritianum occurrence Logistic Regression model ROC AUC curve

L. camara occurrence

The stepwise logistic regression model derived from SPOT 7 variables indicated red (band 3), standard deviation (STDv), variance, BI and MNDWI as significant at p<0.05. Red (band 3) and variance had a positive relationship with the occurrence of L. camara, while BI and MNDWI yielded a negative correlation with its presence/absence (Table 4). On the other hand, the regression model derived from Sentinel-2 variables indicated the blue band as statistically significant at p<0.05, with a negative correlation with the presence/absence of L. camara. The red band, near-infrared band, STDv and BI were not significant (Table 5). The stepwise logistic regression model derived from SPOT 7 data yielded a prediction performance of 0.87 AUC ROC, whereas the Sentinel-2 model produced a prediction performance of 0.63 (Figure 3).

Table 4 The results of L. camara occurrence stepwise logistic regression model derived from SPOT 7
Table 5 The results of L. camara occurrence stepwise logistic regression model derived from Sentinel-2
Fig. 3 SPOT 7 (A) and Sentinel-2 (B) L. camara occurrence Logistic Regression model ROC AUC curve

Observed co-occurrence probability

The model derived from SPOT 7 for the observed co-occurrence probability of S. mauritianum (P1) and L. camara (P2) indicated band 1 as statistically significant at p<0.05, with a positive relationship to the co-occurrence of P1 and P2, whereas red (band 3), near-infrared (band 4), variance and BI were insignificant at p>0.05 (Table 6). Other variables such as soil condition and elevation derived from a Digital Elevation Model (DEM) could correlate with the presence/absence of L. camara in the Tshivhase Tea Estate. The stepwise logistic regression derived from Sentinel-2 data indicated blue (band 2), red (band 4), near-infrared (band 8), standard deviation (STDv), and BI as significant at p<0.05, with band 2, band 4, and STDv showing a negative correlation with the co-occurrence of P1 (S. mauritianum) and P2 (L. camara) (Table 7). The variance was insignificant at p>0.05 (Table 7). The SPOT 7 model produced a higher AUC ROC of 0.91 in predicting the co-occurrence of S. mauritianum and L. camara in the Tshivhase Tea Estate, whereas the Sentinel-2 model produced an AUC ROC of 0.82 (Figure 4).

Table 6 The results of observed co-occurrence probability stepwise logistic regression model derived from SPOT 7
Table 7 The results of observed co-occurrence probability stepwise logistic regression model derived from Sentinel-2
Fig. 4 SPOT 7 (A) and Sentinel-2 (B) observed co-occurrence of S. mauritianum (P1) and L. camara (P2) Logistic Regression model ROC AUC curve

The observed co-occurrence probability model of S. mauritianum (P1) and C. odorata (P3) for Sentinel-2 could not be validated. The actual presence/absence records in the validation data contained no co-occurrence of P1 and P3, so the response had only one class and a binary logistic model could not be assessed.

Conditional co-occurrence probability

The stepwise logistic regression derived from SPOT 7 data indicated only blue (band 1) as significant, and red (band 3), near-infrared (band 4), variance, BI, and MNDWI as non-significant, in predicting the co-occurrence of S. mauritianum and L. camara given that C. odorata does not occur (Table 8). The stepwise logistic regression derived from Sentinel-2 data, on the other hand, indicated all the spectral bands, STDv, mean, variance and BI as non-significant at p>0.05 (Table 9). This suggests that these variables do not explain the co-occurrence of S. mauritianum and L. camara in the Tshivhase Tea Estate. Other variables such as soil condition, climate, and elevation (DEM) could influence the co-occurrence of S. mauritianum and L. camara where C. odorata does not occur. The conditional co-occurrence probability model derived from SPOT 7 data yielded an AUC ROC of 0.96, whereas the Sentinel-2 model yielded 0.83 (Figure 5).

Table 8 The results of conditional co-occurrence probability (P1P2) stepwise logistic regression model derived from SPOT 7
Table 9 The results of conditional co-occurrence probability (P1P2) stepwise logistic regression model derived from Sentinel-2
Fig. 5 SPOT 7 (A) and Sentinel-2 (B) conditional co-occurrence Logistic Regression model of P1 and P2 ROC AUC curve

The conditional co-occurrence probability models derived from both SPOT 7 and Sentinel-2 were insignificant in predicting the co-occurrence of S. mauritianum and C. odorata given that L. camara does not occur. The models indicated no probability that S. mauritianum and C. odorata co-occur in the same area of the Tshivhase Tea Estate.

Discussion

S. mauritianum occurrence

The stepwise logistic regression model derived from SPOT 7 indicated NDVI, BI, and MNDWI as significant, with a negative relationship to the occurrence of S. mauritianum. This suggests that S. mauritianum was less likely to occur where these indices, which describe tea health, soil brightness and moisture condition, were high. On the other hand, the stepwise logistic regression derived from Sentinel-2 showed blue (band 2), red (band 4) and standard deviation (STDv) influencing the occurrence of S. mauritianum.

The SPOT 7 model in this study exhibited a higher AUC (0.93) than the Sentinel-2 model (0.8). The AUC is important for understanding the accuracy and reliability of a predictive model. The SPOT 7 model likely performed better than the Sentinel-2 model because of its higher spatial resolution. Many studies have successfully used SPOT data to detect and map IAPs (Odindi et al., 2014; Schmidt et al., 2010). For instance, Odindi et al. (2014) mapped bracken fern using WorldView-2 and SPOT-5 images with overall accuracies of 84.72% and 72.22%, respectively, using the random forest algorithm. In this study, the stepwise logistic regression model derived from the higher spatial resolution image was superior in mapping the occurrence of S. mauritianum. This supports previous studies that have successfully used SPOT data to monitor and map IAPs. Most of these studies were conducted in environments other than tea plantations, such as forests, community grazing land, and mountainous areas. The high spatial and temporal resolution of SPOT data may have improved the identification and monitoring of IAPs in these different environments.

Figure 6 shows the predicted spatial distribution of S. mauritianum in the Tshivhase Tea Estate, where blue represents areas where S. mauritianum occurs and red represents areas where the species does not occur. SPOT 7 mapped the distribution of S. mauritianum more successfully than Sentinel-2, and the SPOT 7 map conforms to what was observed on the ground (Figure 6). The prediction of S. mauritianum occurrence could have been improved by increasing the number of sampled points within the tea estate. The distribution pattern of S. mauritianum will assist the Tshivhase Tea Estate managers in planning for eradication. During the field survey, monkeys were observed feeding on S. mauritianum fruits and distributing seeds all over the tea plantation.

Fig. 6 S. mauritianum occurrence stepwise logistic regression model map for SPOT 7 and Sentinel-2

Fig. 7 L. camara occurrence stepwise logistic regression model map for SPOT 7 and Sentinel-2

L. camara occurrence

The stepwise logistic regression model derived from SPOT 7 indicated B3 and variance as significant, with a positive relationship to the occurrence of L. camara in the study area, whereas in the Sentinel-2 stepwise logistic regression model blue (B2) was significant with a negative relationship to the occurrence of L. camara. The stepwise logistic regression model derived from SPOT 7 yielded a prediction performance of 0.87, whereas Sentinel-2 produced a prediction performance of 0.61 (Figure 3). This means that SPOT 7 predicted the occurrence of L. camara in the Tshivhase Tea Estate better than Sentinel-2, likely because of its higher spatial resolution. The potential of satellite remote sensing for weed detection and mapping in South Africa using readily available multispectral data has been demonstrated in an assessment of the SPOT 6 sensor for detecting and mapping L. camara for a community clearing project in KwaZulu-Natal (Oumar, 2016). This study is congruent with Oumar's (2016) study, as it has also successfully mapped the occurrence of L. camara, in this case using SPOT 7. Sentinel-2 has also been used successfully to map the distribution of IAPs in a water-limited catchment, with an overall accuracy of 71% (Mtengwana, 2020). The map in Figure 7 shows the occurrence of L. camara in the Tshivhase Tea Estate derived from SPOT 7, with blue representing where the species occurs. The Sentinel-2 map is entirely red, indicating no occurrence of L. camara, and therefore SPOT 7 remains the better option for mapping the occurrence of L. camara.

Observed co-occurrence probability

The stepwise logistic regression model derived from SPOT 7 data indicated band 1 as significant, with a positive relationship to the co-occurrence of S. mauritianum (P1) and L. camara (P2) in one area. The Sentinel-2 stepwise logistic regression model indicated band 8 and BI as significant, with a positive relationship to the co-occurrence of P1 and P2, whereas blue (band 2), red (band 4) and standard deviation (STDv) were significant with a negative relationship to the co-occurrence of these two species. The SPOT 7 stepwise logistic regression model produced a higher AUC ROC (0.91) in predicting the co-occurrence of P1 and P2 than the Sentinel-2 model (0.82). For future studies, high spatial resolution images should be used with machine learning algorithms to delineate the various IAP species occurring in one area.

Figure 8 shows the observed co-occurrence probability of S. mauritianum and L. camara. The SPOT 7 map indicates S. mauritianum and L. camara occurring together along the open areas, whereas the Sentinel-2 map shows no co-occurrence of these two species.

Fig. 8 Observed co-occurrence stepwise logistic regression model map for SPOT 7 and Sentinel-2

Conditional co-occurrence probability

The stepwise logistic regression derived from SPOT 7 data indicated blue (band 1) as significant, with a positive relationship to the co-occurrence of S. mauritianum (P1) and L. camara (P2) in one area given that C. odorata (P3) does not occur, whereas the other variables were insignificant. The conditional co-occurrence probability model of S. mauritianum (P1) and L. camara (P2) for SPOT 7 yielded an AUC ROC of 0.96, whereas the Sentinel-2 model yielded 0.83. This means that SPOT 7 performed well in predicting the occurrence of S. mauritianum and L. camara together in one area given that C. odorata does not occur. For future studies, conditional co-occurrence probability should be applied to a larger area using hyperspectral satellite data capable of delineating different IAPs under the canopy.

Conclusion

Tea plantations are infested with IAP species that affect their growth and health, and therefore innovative, effective, dependable, and easily accessible measures are needed to monitor the spread of IAP species within tea plantations. This study aimed to determine the potential of medium to high spatial resolution satellite data for mapping the existing occurrence and co-occurrence of IAP species within the Tshivhase Tea Estate. The combination of logistic regression and remote sensing variables was effective in mapping the predicted occurrence of S. mauritianum, L. camara and C. odorata, especially when using a higher spatial resolution satellite image such as SPOT 7. The observational matrix was also successful in mapping the co-occurrence of the IAP species, and can therefore provide tea plantation managers with information about the IAP species that co-occur within the tea plantation so that they can plan for their clearance. Using logistic regression and remote sensing variables, the locations of these IAP species can be identified easily, making it possible to target infested areas directly and saving the time otherwise spent walking all over the tea plantation to locate invaded areas. This study introduced the use of an observational matrix in combination with satellite data to map IAP species that occur together in one area and IAP species that do not occur when the others occur together. This will give tea farmers an opportunity to understand the IAP species community within the tea plantation and to plan for its clearance.