1 Introduction

Coral reefs, built by tiny polyps, are essential underwater ecosystems that host an astounding diversity of marine life within their vibrant, expansive structures. Despite covering less than 0.1% of the ocean floor, they support more than 25% of marine species (Fisher et al. 2015), making them critical biodiversity hubs. Coral reef organisms also hold promise in the search for medical breakthroughs, offering candidate compounds for treatments targeting ailments such as cancer, arthritis, human bacterial infections, and viral diseases. Beyond their medicinal potential, reefs serve as crucial coastal barriers, offer historical climate insights, and contribute nearly 30 billion USD annually via tourism, fisheries, and coastal protection (Cesar et al. 2003). However, these ecosystems face mounting threats from climate change, unsustainable practices, overfishing of a few species, and growing crown-of-thorns starfish (Acanthaster planci, COTS) populations, resulting in extensive coral degradation (Van den Hoek and Bayoumi 2017).

The population explosion of COTS is a leading cause of scleractinian coral mortality across the vast Indo-Pacific coral reefs. These outbreaks, shown in Fig. 1, are pivotal in the degradation of coral reefs and pose considerable ecological and conservation challenges (Deaker and Byrne 2022). Potential outbreaks are indicated by densities of 15 to 100 COTS per hectare, while confirmed outbreaks exceed 100 COTS per hectare (Dumas et al. 2020). A single adult starfish can consume approximately 10 m\(^2\) of coral per year, and large outbreak populations can substantially reduce local coral cover, with outbreaks persisting for more than a decade (Keesing and Lucas 1992). Owing to their high reproduction rate, COTS populations can grow rapidly in certain regions, increasing approximately 5–10 times within 2 to 3 years (Chandler et al. 2023), with notable surges during outbreaks.

Fig. 1 Observations of crown-of-thorns starfish (COTS) predation on coral reefs

Timely identification or prevention of increasing starfish population densities is the optimal approach to averting or minimizing their harmful effect on coral communities. However, our capacity to detect the emergence of these population surges has been notably limited because of variations in methods used for surveying COTS. Traditionally, two primary techniques are used. One method involves recording reef sections for subsequent COTS counting (Jan et al. 2007; Dirnwoeber et al. 2012), while the other, the Manta Tow Survey, involves counting COTS while being towed behind a boat (Liu et al. 2021). These approaches, despite their widespread use, are labor-intensive and error-prone, frequently resulting in underestimation of COTS populations in specific regions (Dayoub et al. 2015). However, with the recent advancements in edge devices and image analysis, there has been a growing interest in the deployment of automated systems for surveying COTS (Arima et al. 2014; Bonin-Font et al. 2017; Abbasi et al. 2022). This has resulted in the development of sophisticated survey machines that incorporate these cutting-edge technologies.

Major contributions outlined in this study can be summarized as follows:

(1) An efficient, real-time machine learning framework for COTS classification in reef environments is developed, leveraging a discrete wavelet transform (DWT) for streamlined surveys, early outbreak identification, and targeted COTS control team deployment. The framework reduces computational complexity by employing principal component analysis (PCA) to select optimal features, expediting the classification process and minimizing the model’s size for enhanced performance on resource-constrained devices.

(2) Color information and spatial texture analysis using gray-level co-occurrence probability (GLCP) are used to enhance the understanding of texture and spatial relationships within coral habitats that host COTS and other marine species. Integration of crucial Haralick features refines pattern recognition to identify key traits of COTS, including the detection of sharp spines covering their upper bodies.

(3) The proposed solution harnesses the robustness of eXtreme Gradient Boosting (XGBoost) in adeptly managing numerous variables, particularly in reef environments where correlations are prevalent. This proficiency considerably increases both the accuracy and speed of COTS classification in reef environments with similar structures, thereby providing researchers with a reliable and effective approach for such ecosystems.

This paper is structured as follows: The introduction presents the research scope. Section 2 explores the related works, highlighting the existing approaches and their limitations. Section 3 presents the proposed system for recognizing COTS, detailing the system’s architecture, components, and design reasoning. Section 4 elaborates on the methodology employed to recognize COTS from the dataset, discussing the techniques and algorithms employed. Section 5 presents the experimental results and findings derived from the implemented methodology. Finally, Section 6 summarizes key points, draws conclusions, and proposes potential avenues for future research.

2 Related work

Existing research on COTS detection faces considerable challenges with color variability and shape complexity. To address these issues, Clement et al. (2005) pioneered the use of local binary patterns (LBPs) for texture-based classification. Recognizing the limitations of color segmentation and shape matching, they identified the distinct thorns as reliable features, prompting detailed texture analysis. However, LBP does not effectively encode color information, which is an important cue for pattern recognition tasks such as object and scene image classification (Banerji et al. 2013). Moreover, LBP’s focus on local texture patterns may limit its ability to consider larger spatial context or structural information, potentially affecting the differentiation of COTS from other marine elements. Similarly, Dayoub et al. (2015) observed challenges in prior methods that relied on shapes and colors for COTS identification. Hence, they combined an extended version of LBP, called uniform patterns, for texture with a histogram of oriented gradients (HOG) for edge information, leveraging the sharpness of the long thorns. Training a random forest classifier with numerous features from images, their approach successfully recognized and even tracked these starfish while achieving high accuracy.

Efficient and precise object detection methods are crucial in the evolution of computer vision (Zhao et al. 2019), shaping the diverse array of algorithms integrated into COTS detection methodologies. Pooloo et al. (2021) contributed to this evolution by harnessing the power of EfficientDet-D0 on their custom dataset for COTS detection; this advanced model is characterized by its weighted bidirectional feature pyramid network (Tan et al. 2020) and compound scaling strategy. Sheth and Prajapati (2022) comprehensively compared various object detection models, including YOLOv5, YOLO-X, Detectron2, the faster region-based convolutional neural network (Faster R-CNN), and RetinaNet, further enriching the exploration of detection techniques in this domain. Li et al. (2022) proposed implementing YOLOv4 on edge devices, particularly the Nvidia Jetson Xavier, to facilitate real-time COTS monitoring. Similarly, Nguyen (2022) aimed to optimize the YOLOv5 model for edge deployment using TensorFlow Lite, targeting versatility in device application. However, both faced strikingly similar obstacles: adapting sophisticated YOLO models to edge devices, where computational demands and hardware compatibility posed considerable challenges. Moreover, Heenaye-Mamode Khan et al. (2023) introduced a novel approach to convolutional neural networks (CNNs) with an enhanced attention module. They applied various pretrained CNN models, including VGG19 and MobileNetV2, to their dataset to detect and classify COTS using transfer learning, providing insights into the causal features associated with COTS.

Feature extraction is a crucial step in pattern analysis, as it captures essential shape information to facilitate classification via a formalized approach. The extraction process aims to acquire, from raw data, the information best suited for classification. Diverse techniques for image feature extraction include statistical feature extraction (comprising first- and second-order statistics), global transformation and series expansion features (such as Fourier transforms, wavelet transforms, the gray-level co-occurrence transform, and LBP), and geometrical and topological feature extraction methods (Kumar and Bhatia 2014). However, certain methods have drawbacks. Gabor filters lack complete shift invariance, making them sensitive to small changes in object position, while Fourier transforms, although widely used, struggle to capture localized information because of their global nature. Notably, wavelet transformation emerges as a promising approach for overcoming the challenges faced by Gabor filters and Fourier transforms in image feature extraction.

DWT is a promising approach for enhancing image classification in intricate underwater settings. Current techniques, reliant on shape and color, have difficulty distinguishing objects whose textures resemble their surroundings (Dayoub et al. 2015). To address this, prioritizing the analysis of prominent edges, particularly in intricate patterns, becomes crucial. With the two-dimensional discrete wavelet transform (2-D DWT), an image is decomposed into one approximation image and three detail images that reveal fine edge and texture information (Huang and Aviyente 2008), enhancing detection and segmentation capabilities across various scenarios. The adaptability and nuanced insights of this method may offer valuable advancements, potentially extending to the detection of challenging marine organisms.

3 Proposed working system

Figure 2 presents a systematic approach for identifying COTS in reef environments. Initially, image dimensions are standardized via preprocessing to a height (h) of 720 pixels, a width (w) of 720 pixels, and 3 color channels to ensure dataset uniformity amid varying image sizes. Following the partition into training and test images, a DWT with Daubechies wavelets is employed, facilitating effective multiresolution analysis and preservation of edge detail. This step is crucial for capturing intricate underwater features, increasing classification accuracy, and addressing challenges such as diverse scales and lighting variations. In addition, the approach extends beyond traditional gray-scale imagery by applying the DWT separately to the red, green, and blue (RGB) channels, deepening the analysis and the understanding of COTS traits in reef environments. Notably, the decomposition is carried out up to level 4, further refining feature extraction with approximation coefficients (cA) and detail coefficients (cD) at each level.
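As a concrete illustration of this preprocessing step, the following minimal sketch (assuming OpenCV and NumPy; the function name `preprocess` is ours) standardizes an image to 720 × 720 × 3 before wavelet analysis.

```python
# Minimal preprocessing sketch (assumption: OpenCV and NumPy are available).
# Standardizes every image to 720 x 720 x 3 before wavelet analysis.
import cv2
import numpy as np

TARGET_H, TARGET_W = 720, 720  # height (h) and width (w) used in this study

def preprocess(path: str) -> np.ndarray:
    """Load an image, force three channels, and resize to 720 x 720."""
    img = cv2.imread(path, cv2.IMREAD_COLOR)          # BGR, 3 channels
    if img is None:
        raise FileNotFoundError(path)
    img = cv2.resize(img, (TARGET_W, TARGET_H), interpolation=cv2.INTER_AREA)
    return cv2.cvtColor(img, cv2.COLOR_BGR2RGB)       # work in RGB order
```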

Fig. 2 Colored wavelet-GLCP based COTS recognition framework in reef environment

Following an extensive wavelet-based analysis, the methodology seamlessly integrates gray-level co-occurrence matrix (GLCM) analysis across individual RGB channels, marking the initial step for GLCP and extracting crucial textural and spatial features. This analysis identifies key features such as ’homogeneity’, ’contrast’, ’dissimilarity’, and ’energy’. Feature normalization and PCA are employed to refine feature extraction in the classification task. PCA effectively captures prominent features while simplifying computation, strengthening the precision and robustness of the classification model. This comprehensive feature extraction process enables a deep understanding of COTS traits within intricate reef ecosystems. Moreover, the machine learning model is trained using the XGB algorithm, rooted in wavelet analysis and complemented by spatial texture analysis. Remarkably, this framework occupies less than a megabyte of storage space and requires minimal computational resources, thereby making it highly adaptable and seamlessly integrated with edge devices.

4 Methodology for proposed framework

4.1 Multiresolution analysis with wavelets

This study draws inspiration from Beijbom et al. (2012), who explored coral image classification using multiscale color and texture filter banks at various scales. The methodology, rooted in the DWT, aligns with their emphasis on multiscale analysis while addressing a different task. DWT is a potent tool for image signal analysis, breaking down an image via iterative low- and high-pass filtering, as shown in Fig. 3. The low-pass filter condenses redundant signal detail, while the high-pass filter captures intricate signal specifics. The 2-D DWT is realized as two separate one-dimensional filtering passes, applied along the rows and then the columns of the image matrix, subdividing the image into four distinct sub-bands (low-low (LL), low-high (LH), high-low (HL), and high-high (HH)) (Kumar and Bhatia 2014) that represent different frequency components. In this specific case, working with \(w \times h \times 3\) images from an underwater dataset, the DWT is applied across the three image channels, generating four \(\frac{w}{2} \times \frac{h}{2} \times 3\) sub-band images.

Fig. 3 Red, green, and blue (RGB) channel detail extraction via multiscale wavelet analysis for level 1 decomposition

This represents the breakdown of image details into various frequency sub-bands, a process iterated four times, with each level halving the image dimensions. The application of the DWT is grounded in its inherent multiresolution properties, which preserve both high- and low-frequency features and facilitate the extraction of discriminative multiscale characteristics. By segregating fine-scale and large-scale information into the wavelet detail and approximation coefficients, respectively, the DWT yields decomposition coefficients that retain all original signal information, enabling direct extraction of multiscale features. While this methodology is inspired by the multiscale analysis in Beijbom et al. (2012), the focus herein diverges by leveraging the DWT to break down image details into distinct frequency sub-bands. This comprehensive approach incorporates color information through the decomposition of the RGB channels and subsequent wavelet and GLCM texture feature extraction, demonstrating the efficacy of the method in preserving discriminative features across scales in underwater image analysis.
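A minimal sketch of this per-channel, level-1 decomposition follows, assuming the PyWavelets (pywt) package; the exact mapping of the horizontal/vertical detail arrays to the LH/HL labels depends on the sub-band convention adopted.

```python
# Level-1 2-D DWT of each RGB channel (sketch; assumes PyWavelets).
import numpy as np
import pywt

def dwt_subbands(rgb: np.ndarray, wavelet: str = "db2"):
    """Return the LL, LH, HL, HH sub-bands for each color channel."""
    subbands = {}
    for c, name in enumerate(("red", "green", "blue")):
        cA, (cH, cV, cD) = pywt.dwt2(rgb[:, :, c].astype(float), wavelet)
        # cA ~ LL (approximation); cH, cV, cD ~ the LH/HL/HH detail bands
        subbands[name] = {"LL": cA, "LH": cH, "HL": cV, "HH": cD}
    return subbands
```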

Regarding wavelet transforms, there are two main types of wavelet functions: scaling (or approximation) functions denoted by \(\varphi\) and wavelet functions denoted by \(\psi\). DWT can be realized using either the scaling function \(\varphi\) or wavelet function \(\psi\). The choice between \(\varphi\) and \(\psi\) depends on the specific wavelet transform algorithm and the properties desired for analysis.

The DWT is described as follows:

$$\begin{aligned} W_{\varphi }(a, b) = \frac{1}{\sqrt{M}} \sum \limits _x f(x) \varphi _{a, b}(x), \end{aligned}$$
(1)
$$\begin{aligned} W_{\psi }(a, b) = \frac{1}{\sqrt{M}} \sum \limits _x f(x) \psi _{a, b}(x), \end{aligned}$$
(2)

where M is a normalization factor that ensures the wavelet transform is energy-preserving (unitary), and a and b control the scaling and translation of the basis functions, respectively.
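As a tiny numerical illustration of Eqs. (1) and (2), the single-level DWT below (a sketch assuming PyWavelets, with the Haar wavelet for readability) returns approximation coefficients playing the role of \(W_{\varphi}\) and detail coefficients playing the role of \(W_{\psi}\).

```python
# Tiny numerical illustration of Eqs. (1)-(2); assumes PyWavelets.
import numpy as np
import pywt

f = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])   # toy signal f(x)
cA, cD = pywt.dwt(f, "haar")     # single-level DWT with the Haar wavelet
print("approximation (W_phi):", cA)   # smoothed, low-frequency content
print("detail        (W_psi):", cD)   # high-frequency content (edges)
```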

In the exploration of DWTs, various wavelets characterized by their scaling function \(\varphi\) were carefully considered, each offering unique properties for signal decomposition. This study explores six common wavelet families, namely the Haar wavelet (haar), biorthogonal wavelets (’biorX.Y’, where X and Y refer to the orders), Symlet wavelets (symN), the discrete Meyer wavelet (dmey), Coiflet wavelets (coifN), and Daubechies wavelets (’dbN’), whose filter length increases with order N. While the Haar and biorthogonal wavelets exhibit symmetry, the Daubechies wavelet is approximately symmetrical. In particular, symmetric Daubechies wavelets have attracted attention for efficiently representing real signals in multiresolution analysis and for effective image enhancement and restoration (Lina 1997; Stanković and Falkowski 2003). Although biorthogonal wavelets offer advantages for image compression (Krishna et al. 2022), they were not used because the focus here is image classification. Furthermore, Daubechies wavelets were selected over Haar wavelets for several reasons: they offer superior frequency localization (Sharif and Khare 2014), represent image features more efficiently because of their additional vanishing moments, capture finer details and variations, and generally yield smoother signal reconstructions (Park et al. 2001).

Daubechies wavelets

This study emphasizes the use of Daubechies (db) wavelets, denoted dbN, where N here denotes the number of points in the basic filter, characterizing this family within the orthogonal wavelet spectrum (Daubechies 1990). These wavelets are distinguished by their vanishing moments: for a given filter length, they exhibit the most vanishing moments among the families in this category. A vanishing moment refers to a wavelet’s ability to represent polynomial behavior in a signal. Under this naming, the number of vanishing moments equals half the number of filter points (\(\frac{N}{2}\)). For instance, db2, with two filter points, has one vanishing moment and encodes polynomials with one coefficient, representing constant signal components. Similarly, db4 encodes polynomials with two coefficients, covering constant and linear signal components, while db6 encodes three coefficients, representing constant, linear, and quadratic signal components. This property aligns with the pursuit of precision and computational efficiency in signal and image processing, highlighting the ability of this wavelet family to encapsulate varying signal components across applications.
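The filter properties discussed above can be inspected directly, as in the sketch below (assuming PyWavelets; note that the library names dbN by the number of vanishing moments, so its reported filter length is 2N, which differs from the filter-point naming used in the text).

```python
# Inspect Daubechies filter properties (assumes PyWavelets). The library names
# dbN by vanishing-moment count, so the reported filter length equals 2 * N.
import pywt

for name in ("db1", "db2", "db3"):
    w = pywt.Wavelet(name)
    print(name,
          "filter length:", w.dec_len,
          "vanishing moments:", w.vanishing_moments_psi,
          "orthogonal:", w.orthogonal)
```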

To harness both spectral and spatial information, we decomposed the three color channels (red, green, and blue) of each image using a single-level two-dimensional wavelet transform with a Daubechies (db2) filter. This decomposition segmented each channel into four sub-bands: LL, LH, HL, and HH (see Fig. 3). The LL sub-bands primarily contain smoothed approximations, while the LH, HL, and HH coefficients carry the more detailed information. Consequently, this step yielded 12 sub-band images per input image (3 color channels \(\times \;4\) sub-bands), effectively capturing detailed information at various orientations. We iterated this procedure for a second, third, and fourth level of decomposition to enhance the analysis, producing an additional 36 sub-bands. In total, we extracted 48 wavelet sub-bands per image, which form the basis for texture feature generation.
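A minimal sketch of this iterated scheme, assuming PyWavelets; the helper name `multilevel_subbands` is ours, and all four sub-bands of every level and channel are retained, giving 3 × 4 × 4 = 48 arrays per image.

```python
# Iterated decomposition sketch (assumes PyWavelets): at each of the four
# levels the approximation is decomposed again, and all four sub-bands of
# every level and channel are kept -> 3 channels x 4 levels x 4 sub-bands = 48.
import numpy as np
import pywt

def multilevel_subbands(rgb: np.ndarray, wavelet: str = "db2", levels: int = 4):
    all_bands = []                                  # 48 arrays per image
    for c in range(3):                              # red, green, blue
        approx = rgb[:, :, c].astype(float)
        for _ in range(levels):
            cA, (cH, cV, cD) = pywt.dwt2(approx, wavelet)
            all_bands.extend([cA, cH, cV, cD])      # keep LL, LH, HL, HH
            approx = cA                             # decompose LL further
    return all_bands                                # len == 3 * levels * 4
```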

4.2 Texture feature generation

This study focuses on using GLCP for precise texture feature extraction in COTS images. GLCP is a robust statistical method that considers spatial relationships between pixels, capturing the co-occurrence frequency of pixel pairs with specific gray-level values and predefined spatial offsets. This is achieved using GLCM, which encodes the spatial distribution of similar gray-level values within the image. This approach leverages wavelet decomposition to capture detailed texture information and extracts edge features from the long thorns of the starfish, thereby enhancing the robustness of the classification approach. In addition, the focus extends to the probabilistic interpretation provided by GLCP, refining the understanding of spatial texture and enabling the derivation of quantitative measures such as contrast, energy, homogeneity, and dissimilarity. These measures, supported by the literature (Mokji and Bakar 2007; Kiaee et al. 2019), allow for comprehensive capture of intricate texture and edge information.

Going beyond the basic approach of counting co-occurrences within a set window, this method explores spatial texture by computing \(C_{ij}\), the probability that two gray levels (i, j) co-occur given a specific displacement. The analysis relies on G, the number of quantized gray levels spanning the tonal range of the image from 0 (absolute black) to \(G-1\) (pure white). In this context, \(P_{i,\,j}\) counts how often gray levels i and j occur as neighboring pixels separated by a specific displacement in the x- and y-directions. The collection of GLCPs is defined as follows:

$$\begin{aligned} C_{i j}=\frac{P_{i j}}{\sum \nolimits _{i, j=0}^{G-1} P_{i j}}. \end{aligned}$$
(3)
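A direct NumPy sketch of Eq. (3), assuming the input image has already been quantized to G gray levels; the displacement is fixed to one pixel horizontally for illustration.

```python
# Direct NumPy sketch of Eq. (3): count co-occurrences P_ij for a given
# displacement, then normalize to probabilities C_ij. The input must contain
# integer gray levels in the range 0 .. G-1.
import numpy as np

def cooccurrence_probabilities(img: np.ndarray, G: int, dx: int = 1, dy: int = 0):
    P = np.zeros((G, G), dtype=np.float64)
    h, w = img.shape
    for y in range(h - dy):
        for x in range(w - dx):
            i, j = img[y, x], img[y + dy, x + dx]
            P[i, j] += 1.0                      # raw counts P_ij
    return P / P.sum()                          # C_ij = P_ij / sum(P_ij)
```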

To reveal the intricacies of COTS texture beyond simple intensity counts, four key Haralick features were studied: contrast, energy, homogeneity, and dissimilarity (Haralick et al. 1973). Under the GLCP formulation, these measures capture the essential texture information from each sub-band across the four decomposition levels of the wavelet transform.

(1) Contrast (CON): COTS often exhibit more distinctive features and color variations compared with the surrounding coral reef (Dayoub et al. 2015). Contrast is effective in capturing these local variations, making it easier to differentiate between the starfish and the background.

    $$\begin{aligned} \text {Contrast}(\text {CON}) = \sum \limits _{i, j=0}^{G-1} C_{i j}(i-j)^2. \end{aligned}$$
    (4)
(2) Energy (EN): The orderliness and structured appearance of COTS can be highlighted by high energy values. This feature is useful for identifying regions where the distinct patterns of the starfish stand out against the more uniform coral reef background.

    $$\begin{aligned} \text {Energy}(\text {EN}) = \sum \limits _{i, j=0}^{G-1} C_{i j}^2. \end{aligned}$$
    (5)
(3) Homogeneity (HOM): COTS may present regions with uniform color or texture. By emphasizing areas with similar gray levels, homogeneity assists in identifying these uniform patches on the starfish, contributing to its recognition.

    $$\begin{aligned} \text {Homogeneity}(\text {HOM}) = \sum \limits _{i, j=0}^{G-1} \frac{C_{i j}}{1 + (i - j)^2}. \end{aligned}$$
    (6)
(4) Dissimilarity (DIS): Dissimilarity is crucial for capturing the differences between adjacent pixel values. In the context of COTS identification, it assists in highlighting areas where the texture of starfish considerably deviates from that of surrounding coral, aiding in anomaly detection.

    $$\begin{aligned} \text {Dissimilarity}(\text {DIS}) = \sum \limits _{i, j=0}^{G-1} C_{i j} |i - j|. \end{aligned}$$
    (7)
Algorithm 1 Feature extraction from images
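A compact sketch of the kind of per-image feature extraction Algorithm 1 describes, assuming scikit-image (graycomatrix/graycoprops, available from version 0.19) and the hypothetical `multilevel_subbands` helper sketched in Sect. 4.1; the quantization level G = 32 is illustrative.

```python
# Sketch of per-image feature extraction: each wavelet sub-band is quantized
# to G gray levels and summarized by four Haralick-type measures, giving
# 48 sub-bands x 4 measures = 192 features per image (assumes scikit-image).
import numpy as np
from skimage.feature import graycomatrix, graycoprops

PROPS = ("contrast", "energy", "homogeneity", "dissimilarity")

def glcp_features(subbands, G: int = 32) -> np.ndarray:
    feats = []
    for band in subbands:
        # Rescale the (possibly negative) wavelet coefficients to 0 .. G-1.
        lo, hi = band.min(), band.max()
        if hi > lo:
            q = ((band - lo) / (hi - lo) * (G - 1)).astype(np.uint8)
        else:
            q = np.zeros_like(band, dtype=np.uint8)
        glcm = graycomatrix(q, distances=[1], angles=[0],
                            levels=G, symmetric=True, normed=True)
        feats.extend(graycoprops(glcm, p)[0, 0] for p in PROPS)
    return np.asarray(feats)                 # length = 4 * len(subbands)
```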

4.3 Classification

Classification, a crucial task in machine learning, involves categorizing data into predefined classes based on their features. This study employs four robust supervised classifiers: (1) XGB, (2) the random forest classifier (RFC), (3) logistic regression (LR), and (4) support vector machines (SVMs) with a radial basis function kernel. XGB is highly effective because of its ensemble learning approach: the iterative boosting process combines predictions from multiple decision trees, correcting errors and increasing overall model accuracy (Chen and Guestrin 2016). Its strength lies in its adept handling of intricate feature interactions and complex relationships, making it particularly advantageous for diverse classification tasks. By incorporating the 192 features generated through the combination of multiresolution and texture analysis across all four levels of wavelet-decomposed coefficients, XGB uses this comprehensive information to augment its predictive capabilities. Its algorithmic structure optimizes a cost function via the sequential addition of decision trees; each tree learns from the errors of its predecessors, and, coupled with a regularization term and gradient-based optimization, the ensemble adeptly manages complex relationships within the feature space. This capability results in superior classification performance for the reef ecosystem study. The proposed solution leverages XGB’s robustness in managing numerous, often correlated variables, which are prevalent in reef environments, thereby increasing both the accuracy and speed of COTS classification and providing researchers with a dependable and efficient approach for such ecosystems.
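A hedged sketch of the classification stage, assuming scikit-learn and the xgboost package; the hyperparameter values are illustrative placeholders rather than the tuned settings used in this study.

```python
# Classification sketch (assumes scikit-learn and the xgboost package);
# hyperparameter values are illustrative, not the tuned settings of the study.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from xgboost import XGBClassifier

model = make_pipeline(
    StandardScaler(),                     # feature normalization
    PCA(n_components=50),                 # retain the leading components
    XGBClassifier(                        # gradient-boosted decision trees
        n_estimators=200, max_depth=4, learning_rate=0.1,
        objective="binary:logistic"),
)
# model.fit(X_train, y_train); y_pred = model.predict(X_test)
```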

5 Results and discussion

5.1 Datasets and evaluation metrics

Three distinct datasets were integrated to construct a comprehensive, consolidated dataset of 700 images evenly distributed, with 350 images per class: COTS and non-COTS. The primary dataset was derived from the CSIRO COTS Detection Dataset by Liu et al. (2021), meticulously designed to address the COTS outbreak-induced coral loss on the Great Barrier Reef. Encompassing 23500 underwater images with a resolution of width (\(\widehat{w}\)) 1280 pixels and height (\(\widehat{h}\)) 720 pixels, this dataset features 4919 images depicting COTS, including instances with multiple starfish in a single frame. A subset of 140 images (70 COTS, 70 coral environments/non-COTS) was taken from this extensive collection. This involved excluding consecutive frames, mitigating the risk of the model learning continuous angles during training that could result in overfitting during testing. The second data source was the Heron Island Coral Reef Dataset (HICRD), comprising 6003 low-quality, 3673 good-quality, and 2000 restored images. This dataset exclusively contained underwater coral environments and did not contain COTS in any frames. Strategically, 100 coral reef images were selected from HICRD, emphasizing low-light conditions to align with the literature findings, indicating a 27% higher likelihood of finding COTS in low-brightness areas compared with visible areas (Kayal et al. 2017; Ling et al. 2020). The third data source, obtained from an open Kaggle page, also used by Heenaye-Mamode Khan et al. (2023), contributed 280 COTS and 179 non-COTS images. This meticulously curated dataset, which incorporated a deliberate strategy to prevent overfitting, enabled comprehensive training and evaluation of the model across diverse scenarios, including those with various tonalities and textures dependent on environment and depth.

In terms of evaluation metrics, the classifier models are assessed using Accuracy, Precision (P), Recall (R), and F1 score. These metrics provide valuable insights into the effectiveness and real-time performance of the classification process. The dataset is split into 80% for training and 20% for testing, comprising 559 images for training and 140 for testing. During the initial experimentation phase, the focus was primarily on accuracy as the key metric across various algorithms, including XGB, RFC, LR, and SVM. This exploration covered performance across the different wavelet families, allowing assessment of their effectiveness and suitability for the classification task. In this experimental setup, XGB, RFC, and LR used the binary cross-entropy loss function, while SVM employed the hinge loss function.
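A minimal evaluation sketch, assuming scikit-learn; `X` and `y` are placeholders for the 192-dimensional feature matrix and the binary COTS/non-COTS labels, and `model` is the pipeline sketched in Sect. 4.3.

```python
# Evaluation sketch (assumes scikit-learn): 80/20 split and the four metrics
# reported in this section. X, y, and model are placeholders from the earlier
# sketches (X: (n_samples, 192) features, y: binary labels).
from sklearn.model_selection import train_test_split
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1 score :", f1_score(y_test, y_pred))
```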

5.2 Wavelet selection and evaluation

Experiments were conducted using various wavelets from different families to evaluate the effectiveness of wavelet-based and GLCP statistical measures for COTS detection, as presented in Table 1. After assessing the performance at the first level of decomposition, several promising wavelets were selected for further analysis at higher decomposition levels. These included Haar, Symlets (Sym2, Sym4, Sym6, Sym8), Coiflets (Coif2), Biorthogonal (bior2.2), and Daubechies (db2, db4, db6). Wavelets with poor performance and high computational costs were excluded from further consideration.

Table 1 Performance comparison of different wavelet families and variants

The decision to exclude wavelets with higher vanishing moments, such as Symlet variants with more than 5 vanishing moments (10 filter points) and Daubechies wavelets with more than 4 vanishing moments (8 filter points), was influenced by several technical factors. While higher-order wavelets may capture more detailed information, not all of it is relevant or discriminative for the classification task; some of the finer details are noise or irrelevant variations that introduce confusion into the classification process. In addition, because the starfish appear relatively small in the dataset, higher vanishing moments may not effectively capture the subtle texture details required for accurate classification. Similarly, Meyer wavelets were excluded from further analysis because of their poor performance in capturing relevant features for COTS detection. This evaluation process enabled us to identify and focus on the most effective wavelet variants for the classification task, considering both performance and computational cost.

5.3 Higher-order decomposition analysis

Wavelet decomposition reveals layers of increasingly fine details (textures, edges) within progressively smoother representations. This layer-by-layer analysis unlocks richer information for diverse image-processing tasks. Initially, we opted for wavelets from level 1 decomposition, capturing the overall structure and dominant textures of the starfish. However, with progression to higher levels (2, 3, 4), hidden details such as subtle variations in spine texture, slight differences in arm shapes, and even internal structures became more visible in high-resolution images, as presented in Table 2. Each level of decomposition represents the image at a different resolution, hence enabling the analysis of features at various scales. Integrating these features extracted from higher levels of decomposition can furnish the machine-learning model with richer information, potentially enhancing discrimination and classification accuracy.

Table 2 Extended evaluation–Performance metrics of wavelet families in subsequent decomposition levels

After evaluating the performance across decomposition levels, we observed that certain wavelet variants, including the ’sym2’, ’sym4’, ’sym8’, and ’db2’ wavelets decomposed up to level 4, yielded notable results. Of these, the Daubechies wavelet with 2 filter points (one vanishing moment) was the most effective. However, beyond level-4 decomposition, the sub-band images had shrunk to a small fraction (roughly 1/64 of the original size or less), rendering further decomposition ineffective. Accuracy was sustained up to level 5 and declined thereafter, owing to the addition of redundant features, which prompted the discontinuation of further decomposition levels.

The performance of the models was assessed using the refined feature set. Notably, the analysis highlighted the considerable efficacy of the XGB and RFC models, both of which showed promise for further improvement with larger training datasets. The performance of the SVM model leveled off after a certain point, indicating its limited ability to benefit from additional data points, as shown in Fig. 4; consequently, we discontinued its use in subsequent analyses. Remarkably, the area under the curve (AUC) of 96% achieved by XGB in the receiver operating characteristic (ROC) analysis highlights its superiority in distinguishing COTS from other entities in the reef ecosystem (see Fig. 5). This performance not only confirms the efficacy of XGB but also suggests its potential as a cornerstone of future coral reef monitoring and management efforts.
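A brief sketch of how an ROC curve and AUC of this kind can be computed, assuming scikit-learn and matplotlib and reusing the fitted `model` and held-out split from the evaluation sketch above.

```python
# ROC sketch (assumes scikit-learn and matplotlib): predicted probabilities
# from the fitted pipeline trace the curve and yield the AUC.
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

scores = model.predict_proba(X_test)[:, 1]        # probability of the COTS class
fpr, tpr, _ = roc_curve(y_test, scores)
roc_auc = auc(fpr, tpr)
print("AUC:", roc_auc)

plt.plot(fpr, tpr, label=f"XGB (AUC = {roc_auc:.2f})")
plt.plot([0, 1], [0, 1], linestyle="--")          # chance line
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```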

Fig. 4 Validation accuracy plots of various models

Fig. 5 Receiver operating characteristic (ROC) curve for the two classes (COTS and non-COTS)

5.4 Quality feature analysis

To address the issue of redundancy within the 192 generated features, PCA was employed to streamline the feature vector. A comprehensive series of experiments was conducted using various feature subsets obtained via information gain feature selection. These subsets included Top-1, 3, 5, 10, 25, 50, 100, and 150, each representing different numbers of features with the highest information gain, as presented in Table 3. For instance, the Top-1 feature subset comprised only one feature with the highest information gain, while the Top-50 feature subset consisted of 50 features with maximum information gain.

Table 3 Model results of the ’db2’ wavelet after PCA

By systematically reducing the dimensionality of the feature space, the models, particularly XGB and RFC, demonstrated resilience in maintaining high classification accuracy even with a considerably reduced number of features. Notably, the models consistently achieved high accuracy with only 50 features, confirming the effectiveness of the approach in streamlining the classification process without sacrificing predictive performance. This highlights the practical applicability of the approach, particularly in scenarios where computational resources and model interpretability are crucial factors.
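A minimal sketch of the Top-k information-gain selection described above, assuming scikit-learn's mutual-information estimator as the information-gain criterion; k = 50 matches the subset highlighted in Table 3, and `X_train`/`y_train` carry over from the evaluation sketch.

```python
# Top-k feature selection sketch (assumes scikit-learn): mutual information
# serves as the information-gain criterion to pick, e.g., the Top-50 features.
from sklearn.feature_selection import SelectKBest, mutual_info_classif

selector = SelectKBest(score_func=mutual_info_classif, k=50)
X_train_top = selector.fit_transform(X_train, y_train)
X_test_top = selector.transform(X_test)
print("Selected feature indices:", selector.get_support(indices=True))
```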

5.5 Model comparison

Thus far, COTS detection has received only limited exploration, with deep learning models showing promise for achieving high testing accuracy. However, existing models do not comprehensively explain the complexities involved in COTS image classification. Table 4 assesses diverse techniques for COTS detection against key metrics, namely accuracy, precision, recall, and F1 score. The proposed method, denoted [Proposed] in Table 4, stands out with accuracy rates of 95.00% and 93.57%, leveraging colored wavelet-guided GLCP in conjunction with XGB. This success is underpinned by the strategy of integrating features that accentuate edge responses in COTS images, drawn from edge-oriented texture descriptors, effectively supplementing texture-based analysis. Notably, the approach also exhibits high precision, further enhancing its effectiveness in accurately identifying COTS. The substantial performance margin over prior methods, exemplified by the RFC and particle filter (Dayoub et al. 2015) and the enhanced VGG19 with CBAM (Heenaye-Mamode Khan et al. 2023), highlights the efficacy of this approach. Moreover, the framework outperforms models such as EfficientDet-D0 (Pooloo et al. 2021), reaffirming its superiority and representing a considerable advance in COTS detection.

Table 4 Comparative performance analysis of benchmark COTS detection methods

6 Conclusions

This study introduced a robust framework for enhancing the effectiveness of COTS control programs, which are crucial for maintaining reef resilience in the Indo-Pacific region. By combining texture-based and edge-based features within color images, the methodology achieved an accuracy of 95.00% in identifying and categorizing COTS across diverse reef structures. This performance, validated against established techniques, is invaluable for directing COTS control strategies and optimizing computational resource allocation. A series of tests on multiresolution analysis with different wavelet families demonstrated that db wavelets outperformed the others by effectively suppressing smooth background variation while preserving sharp edges and other image features. The compact support of db wavelets keeps their influence localized within the image, reducing computational complexity. These characteristics make db wavelets particularly advantageous for this application.

Future improvements, particularly through integration with neural network approaches and expansion of the image dataset, are expected to further increase the adaptability and effectiveness of this method in marine conservation. In addition, employing wavelet descriptors for object recognition would broaden the application of this technology in real-time video processing. Wavelet descriptors can capture fine details and textures, making them well-suited for the detection of various marine species and objects within complex underwater environments. Exploring this approach further vis-à-vis video processing would yield notable advancements.