1 Introduction

Agriculture serves as a fundamental cornerstone in supplying sustenance and essential resources crucial for human survival and economic progress. Enhancing crop yield involves cultivating nutrient-rich plants that necessitate a balanced intake of 16 essential nutrients, categorized as follows:

  • Primary Elements: Carbon, Hydrogen, Oxygen

  • Macronutrients: Nitrogen (N), Phosphorus (P), Potassium (K)

  • Secondary Nutrients: Calcium (Ca), Magnesium (Mg), Sulfur (S)

  • Micronutrients: Boron (B), Chlorine (Cl), Copper (Cu), Iron (Fe), Manganese (Mn), Molybdenum (Mo), Zinc (Zn)

Soil degradation caused by factors like salinization [1] and improper irrigation practices leads to nutrient depletion, ultimately resulting in stunted plant growth and decreased crop yield [2, 3]. Deficiency of micronutrients is especially noticeable in crops such as bananas (Musa sp.), which have significant cultural, economic, and nutritional importance in regions like India. Deficiencies of micronutrients such as Boron and Iron produce visible symptoms on banana leaves, which makes them suitable for identification with deep learning methods.

The current techniques primarily focus on identifying macro-nutrient deficiencies like Nitrogen, Phosphorus, and Potassium, mainly in crops such as maize, rice, wheat, and cotton. However, only a few researchers have explored the use of image analysis methods to diagnose micronutrient deficiencies in banana leaves.

This study bridges the research gap by addressing the lack of comprehensive methodologies for detecting micronutrient deficiencies in banana leaves through the utilization of deep learning techniques.

Deep learning demands an extensive image dataset to achieve meaningful accuracy. Consequently, an exclusive dataset is curated by physically capturing images during field visits to banana plantations. Because of the restricted image count in the dataset, Image Augmentor and Autoencoder techniques are utilized for augmentation, resulting in a diverse and expanded dataset. The augmented dataset consists of \(\sim 11000\) images evenly spread across all categories of leaf images that are deficient in nutrients [4]. A novel CNN model incorporating skip connections is designed to detect micronutrient deficiencies in banana leaves, namely Boron and Iron. The model achieves a high accuracy of 95% with reduced execution time. In the pursuit of creating a comprehensive multinutrient CNN model, separate CNN models are developed for each specific micronutrient deficiency, namely Boron and Iron. Further, the individual models are integrated to form a unified model capable of effectively identifying multiple micronutrient deficiencies in banana leaves.

The contributions of this study encompass:

  • Development of deep learning models tailored for detecting Boron and Iron deficiencies in banana leaves

  • Comparative analysis with established pre-trained models like VGG16, DenseNet, and Inception V3

  • Provision of recommendations for nutritional supplementation to rectify identified micronutrient deficiencies

2 Motivation

Micronutrients play crucial roles in various physiological processes, including growth, cellular function, metabolism, and tissue repair. Detecting micronutrient deficiency can help to identify the specific nutrient requirements for different age groups to ensure healthy growth and development in individuals.

Bananas are highly nutritious fruits that offer several health benefits. They are packed with essential nutrients, including vitamins (such as vitamin C, vitamin B6, and folate), minerals (such as potassium and magnesium), and dietary fiber. Nutrient deficiencies in banana plants manifest as various symptoms on the leaves, and a deep learning model can be employed to automate their detection. Automating plant health assessment with deep learning provides several advantages:

  • Reducing the time and effort required for manual inspection

  • Enabling faster detection of plant diseases, nutrient deficiency, or stress

  • Offering more objective and consistent assessments

  • Contributing to improved crop management practices and increased yield

  • Supporting more sustainable agriculture

Hence, the motivations for this work, which aims to tackle specific agricultural challenges and leverage cutting-edge technologies to improve crop productivity, are:

  • Effects of micronutrient deficiency on human health

  • Deficiency of micronutrients is exhibited on the leaves of banana plants

2.1 Effects of micronutrient deficiency on human health

Micronutrients, which encompass essential vitamins and minerals, play crucial roles in various physiological processes, supporting overall health and well-being. These vital micronutrients perform diverse functions required for the proper growth and development of the body, namely:

  • Facilitating enzyme

  • Hormone production

  • Synthesis of various essential substances

Table 1 Impact of deficiency of micronutrients on human health

The impact of micronutrient deficiencies on human health is of utmost importance, as their absence can lead to severe and potentially life-threatening conditions. For instance, inadequate Iron intake, especially in children and pregnant women, can result in anemia, characterized by a decrease in red blood cells or hemoglobin concentration, leading to symptoms like fatigue, weakness, shortness of breath, and dizziness. As evidenced by Ram [7], these deficiencies can give rise to visible and hazardous health conditions, emphasizing the critical need for addressing and preventing them. Table 1 details the impact of micronutrient deficiencies in humans.

Micronutrient deficiencies, also known as "hidden hunger", affect billions of people. As cited by Hannah [6], approximately two billion people, accounting for 30% of the world’s population, suffer from deficiencies in one or more essential micronutrients, underscoring the significant global challenge of addressing malnutrition and nutrient deficiencies. These deficiencies often lead to severe and long-term health complications, adversely affecting the quality of life of many individuals worldwide. For instance, the prevalence of anemia is alarmingly high, affecting around 42% of children under 5 years of age and 40% of pregnant women globally.

Moreover, UNICEF reports that only 66% of households worldwide have access to iodized salt, indicating a significant risk of micronutrient deficiencies. Insufficient levels of essential micronutrients have escalated the likelihood of health issues such as stunting, birth defects, and blindness, highlighting the critical importance of addressing this challenge. For improved overall health, it is possible to prevent these deficiencies through

  • Effective nutrition education

  • Promoting consumption of a diverse and balanced diet

  • Incorporating a variety of foods

  • Implementing measures like food fortification

  • Providing supplements to ensure adequate nutrient intake

As the world’s population continues to grow, the imperative to boost food production and elevate its nutritional content becomes ever more pressing. Consequently, cultivating nutrient-rich crops becomes paramount to ensure a steady supply of nutritious food for the growing population.

Bananas are incredibly healthy, convenient, delicious, and one of the most inexpensive fresh fruits liked by most people in the world. They are rich in Boron, Iron, Potassium, and Magnesium. This work aims to detect micronutrient Boron and Iron deficiency in bananas, helping farmers to grow banana fruits rich in nutrients and in turn improve the health of human beings.

2.2 Deficiency of micronutrients is exhibited on the leaves of banana plants

Bananas are a wholesome and nutrient-packed fruit, serving as an excellent source of essential nutrients, including fiber, potassium, vitamin B6, vitamin C, antioxidants, and phytonutrients, all of which contribute to their positive impact on overall health. Nutritional deficiency symptoms in bananas can be predominantly observed on their leaves. Table 2 summarises the essential nutrients and their deficiency symptoms visible on the leaves.

Table 2 Micronutrients and deficiency symptoms on leaves of banana plants

Conventional laboratory methods of soil and plant analysis for identifying deficiencies in nutrients are both useful and accurate, but they are time-consuming and incur high costs. Nutrient deficiencies in banana plants primarily affect the color, size, and edges of the leaves, depending on the type of nutrient. The application of deep learning strategies to detect deficiencies of micronutrients such as Boron and Iron in banana leaves can be a valuable approach. Timely results achieved from deep learning models will help farmers to take appropriate measures and grow nutrient-rich banana plants.

3 Literature survey

In recent times, researchers have been pivotal in enhancing agricultural output through the utilization of AI technology, resulting in groundbreaking advancements across different facets of farming methodologies [10, 11]. Pioneering efforts by researchers have been focused on leveraging deep learning CNN models to detect nutrient deficiencies and diseases in crops through image processing and analysis, as evidenced by works such as those by Hassan [12]. Exploration in this domain has the potential to cultivate effective strategies for nutrient management, thereby fostering profitability and sustainability within the banana industry. This research is currently advancing in the following key areas:

  • Identification and prediction of diseases in banana plants

  • Finding Banana fruit ripening stages

  • Detection of nutrient deficiencies in banana plants

3.1 Identification and prediction of diseases in banana plants

Prerana and colleagues [13] have developed an integrated system that leverages a Convolutional Neural Network (CNN) to extract meaningful features from banana images, which are then fed into a K-Nearest Neighbors (KNN) algorithm for accurate disease prediction in banana plants. The diseases predicted are namely Mosaic, Black Sigatoka, Yellow Sigatoka, Panama wilt, Streak, etc. The system also provides preventive measures and precautions to the farmer for disease detection.

Gokula Krishnan et al. [14] have experimented with a hybrid segmentation technique called Total Generalized Variation Fuzzy C Means (TGVFCMS) on the CIAT image dataset. TGVFCMS was able to segment the disease-affected area with 93% accuracy and was successful in detecting five different diseases in banana plants, namely Fusarium Wilt of Banana (FWB), Black Sigatoka (BS), Xanthomonas wilt of banana or Banana Bacterial Wilt (BBW), Yellow Sigatoka (YS), and Banana Bunchy Top (BBT). The CIAT image dataset consists of 18,000 images, of which only 9000 images belong to the five disease classes.

Niraj et al. [15] have proposed a deep learning method to classify images of banana leaves affected by two types of diseases, namely Black Sigatoka and Black Speckle. The dataset employed consisted of 653 images belonging to three categories: healthy (360 images), Black Sigatoka (220 images), and Black Speckle (43 images). The model was able to achieve an accuracy of 90%.

Sophia et al. [16] have developed a mobile application to detect diseases in banana leaves. The application uses ResNet152 and InceptionV2 deep learning models which were trained with 3000 images and have achieved 99% and 95% of accuracy respectively. The augmented image dataset size was 18,000 banana leaf images of three classes namely Black Sigatoka, Fusarium Wilt, and healthy leaves. The models were trained with an 80% training set, a 15% testing set, and a 5% validation set.

Michael [17] identified the five major diseases of banana plants, namely Xanthomonas wilt of banana (BXW), Fusarium Wilt of Banana (FWB), Black Sigatoka (BS), Yellow Sigatoka (YS), and Banana Bunchy Top (BBT), along with the Banana Corm Weevil (BCW) pest class, by capturing aerial images and processing them using ML methods. Three models (Inception, MobileNet, and ResNet50) were trained with the CGIAR dataset, achieving accuracies between 70 and 99%. The dataset contained more than 18,000 images, of which only 12,600 banana leaf images were utilized for experimental analysis.

Nandini et al. [18] proposed a Gated Recurrent CNN to classify the diseased banana plants. The Recurrent layers helped in recognizing the sequential characteristics of patterns of data in the given sequence. The proposed model has achieved an accuracy of 94%. Cristian et al. [19] have identified the progress of disease infection in banana leaf images using the LesNet deep learning model and the intensity of the diseases is measured using a Decision Tree (DT).

Disease detection was also done by Anasta [20] by capturing images using a thermal FLIR camera and processing them with image processing techniques. By applying multi-threshold methods they have achieved an accuracy of 92%. Surya et al. [21] have found that the geodesic method used for segmenting the diseased banana leaves gave the lowest MSE among various methods like Canny, Roberts, Prewitt, Color Segmentation, Sobel, etc.

Chaitanya et al. [22] designed a 3-layer CNN to detect four types of diseases in banana leaves. The model was trained with a 1200 image dataset and has achieved an accuracy of 80%. They were able to detect Freckle and Sigatoka disease successfully.

A low-cost embedded system was designed and trained using DenseNet CNN model by Fredy et al. [23] to detect diseases in banana leaves. The embedded system was able to categorize Bacterial Wilt and Black Sigatoka diseases with an accuracy of 92%. The system was trained using a dataset built with 200 images in each category, for a total of 600 images, and was labeled by an expert. Authors have considered 80% of the dataset for training and 20% for testing.

Bolanle et al. [24] have built a novel Capsule Network model (CapsNet) to detect two major diseases of banana, specifically Black wilt and Sigatoka. The proposed model classified the diseases with 95% accuracy, outperforming the LeNet and ResNet CNN models. The model was trained with a custom dataset of 1000 images collected in cultivated fields across the three classes, and the images were divided in the ratio of 80% for training and 20% for testing. The testing dataset was further divided into 90% for testing and 10% for validation.

An ANN with Multilayer Perceptron was designed by SweetWilliam et al. [25] to classify banana leaves infected by Sigatoka disease. Discriminative color features are extracted using a Scalable Color Descriptor, and texture features using the Histogram of Oriented Gradients (HOG). The HOG descriptor has achieved the best accuracy of 98%.

An AI-based banana disease detection system was developed using a Deep Learning CNN, which was successful in detecting diseases in a timely manner and suggesting control measures with 90% accuracy. Similarly, Gokulnath et al. [26] have classified diseased banana leaves using an Adaptive Neuro Fuzzy Inference System (ANFIS).

3.2 Ripening stages of banana fruits

Kusuma et al. [27] proposed a 3-layer CNN model to identify the ripening stages of banana fruit. An accuracy of 80% was achieved. Overfitting was reduced by augmenting the positive image with a negative one. Similarly, a 4-layer CNN model is trained by Ahmed et al. [28] to classify the ripened banana fruits with the highest accuracy. Mayuri et al. [29] have designed an automatic system to identify the ripening stages of banana fruit from images. The designed system extracts the features using the Inception V3 model and classification is done by using SVM. The system is trained with an image dataset consisting of 84 images for training and 36 images for testing. Three stages of ripening namely green, ripe, and overripe were detected successfully with 89% accuracy.

Raymond and Andi [30] applied CNN to classify banana fruits into four classes, specifically unripe/green, yellowish-green, mid-ripen, and overripe. The authors used two pre-trained models, MobileNet and NASNetMobile, on 436 images split into 70% training and 30% testing. They observed that MobileNet gave the highest accuracy of 96% with 100 epochs. Anand et al. [31] have used CNN to segregate ripe and raw bananas, achieving an accuracy of 90%. The model was trained on 200 banana images consisting of 138 ripe and 60 raw banana images.

3.3 Nutrient deficiency

Amritha et al. [32] aimed to design an automatic robot that detects the deficiency of Manganese, Potassium, Sulphur, and Zinc in various crops using the CNN model to reduce the burden of the farmers. Renato et al. [33] developed a CNN model and trained it with fine-tuned transfer learning. The model recognizes deficiency of nutrients namely Nitrogen, Potassium, and Sulphur in images of banana leaves. The model is trained with a 995 image dataset. A pre-trained CNN model VGG 16 is used by applying transfer learning and fine-tuning. It has been recorded that Histogram Equalization with YUV color space has yielded the best accuracy of nearly 98%.

A web-based mobile application is developed by Jonilyn et al. [34] using Random Forest (RF), a Machine Learning algorithm, to detect deficiencies in Nitrogen, Potassium, and Phosphorus on banana leaves. The application is evaluated with 10-fold cross-validation, giving an accuracy of 92%. The dataset for training the system consists of 705 images, with 50 healthy leaves, 255 leaf images deficient in Nitrogen, 155 deficient in Phosphorus, and 90 in Potassium.

In brief, the prevailing techniques employed primarily focus on identifying macro-nutrient deficiencies such as Nitrogen, Phosphorus, and Potassium, primarily in selected crops such as maize, rice, wheat, and cotton [32,33,34]. However, there has been limited exploration into the use of image analysis methods by a small number of researchers to diagnose micronutrient deficiencies in banana leaves. Further, the models were trained on a dataset containing a restricted quantity of leaf images.

In summary, the exploration of deep learning methodologies in banana cultivation spans several domains, including disease detection, yield forecasting, ripeness evaluation, and disease advancement tracking.

3.4 Research gap

Despite the extensive research in agricultural sciences, there remains a notable void in the specific identification of micronutrient deficiencies in plants, particularly in the case of banana cultivation. Banana plants have been relatively understudied compared to other crops, leading to a scarcity of comprehensive datasets containing banana leaf images depicting various nutrient deficiencies.

The current research work aims to address this gap by utilizing a Deep Learning methodology customized for the precise detection of Boron and Iron micronutrient deficiencies in banana leaves. However, to accomplish this objective effectively, it is essential to curate a robust dataset comprising accurately labeled images representing a wide array of nutrient deficiencies.

3.5 Objectives

  • Curate a specialized image dataset sourced from various banana plantations, facilitating the accurate detection of micronutrient deficiencies in banana leaves

  • Develop a Convolutional Neural Network (CNN) model specifically engineered to discern deficiencies in Boron and Iron micronutrients within banana leaves

4 Dataset creation

The dataset forms the fundamental basis for training, validating, and evaluating Deep Learning models. A well-curated, diverse, and accurately labeled dataset plays a critical role in developing robust models that generalize well, exhibit high performance, and address real-world challenges effectively.

A custom image dataset is created, comprising a comprehensive collection of banana leaf images obtained from diverse banana plantations. It encompasses distinct categories of plantains such as Musa acuminata (Dwarf Cavendish), Robusta, Rasthali, Poovan, Monthan, and Elakkibale, gathered from various locations in and around Hassan, Karnataka, India. The deficiency of nutrients is predominantly visible as a change in color on the leaves of the plantains. Hence, the images are labeled by an expert in agriculture into nine categories based on the deficiency in nutrients [4].

A versatile banana leaf dataset is compiled by collecting images from different mobile devices with varying camera resolutions, ensuring robustness and adaptability in training deep learning models for various real-world scenarios. Images are captured under different environmental conditions including different lighting conditions. Manual visits to banana plantations were conducted to capture images of the plants exhibiting signs of nutrient deficiency. The original images had varying dimensions, with resolutions of \(3072\times 4096\), \(3096\times 4128\), and \(3000\times 4000\) pixels, presenting the need for resizing to achieve a consistent format for further analysis and processing. For the purpose of standardization and maintaining consistent input, every image is resized to dimensions of \(256 \times 256\) ensuring uniform and steady input format for both the training and testing phases of the deep learning model. The natural background of the images is also standardized to black, ensuring consistency and reducing potential variations that might otherwise affect the model’s performance [33]. Figure 1 gives the sample visual of leaf images that are deficient in nine categories of nutrients along with healthy leaf images.

Fig. 1 Samples of nutrient deficient leaf images

Figure 2 gives the distribution of raw images with nine categories of nutrients along with healthy leaf images included. It shows the distribution is imbalanced across the categories of nutrients.

Fig. 2 The distribution of raw images

A balanced distribution helps to prevent biases and ensures that the model receives sufficient examples of each nutrient deficiency category for effective training. Hence, the dataset size is increased to 11,000 images through autoencoder-based generation and image augmentation, to boost the performance of the Deep Learning model and its ability to detect and classify nutrient deficiencies accurately in banana leaves.

  • Autoencoders: Neural networks that learn to reconstruct images, text, and other data from compressed representations of the input. The autoencoder generates images closely resembling the originals by altering image dimensions to \(225\times 225\), \(275\times 275\), \(300\times 300\), and \(350\times 350\) pixels, thereby augmenting the dataset size.

  • Image Augmentation: Various transformations are applied to the original images resulting in multiple transformed copies of the same image. The specific parameters and the values set for image transformations are:

    • Flip Left Right - 0.5

    • Rotate - 0.6

    • Skew - 0.4,0.5

    • Flip Random - 0.5

    • Zoom - 0.3
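The probability-gated transformations listed above can be illustrated with a minimal sketch. It assumes that each listed value is the probability of applying the operation (the text does not say so explicitly), represents an image as a nested list of pixel values, and uses a quarter-turn as a stand-in for rotation. Skew and zoom are omitted for brevity, and all function names are hypothetical rather than the API of the tool used in the study.

```python
import random

def flip_left_right(img, rng):
    # Mirror every row.
    return [row[::-1] for row in img]

def flip_top_bottom(img, rng):
    # Reverse the order of the rows.
    return img[::-1]

def flip_random(img, rng):
    # Pick one of the two flips at random.
    op = rng.choice([flip_left_right, flip_top_bottom])
    return op(img, rng)

def rotate_90(img, rng):
    # Clockwise quarter-turn, a simple stand-in for a small-angle rotation.
    return [list(col) for col in zip(*img[::-1])]

# (operation, probability) pairs; probabilities taken from the list above.
PIPELINE = [(flip_left_right, 0.5), (rotate_90, 0.6), (flip_random, 0.5)]

def augment(img, rng):
    # Apply each operation independently with its own probability.
    out = [list(row) for row in img]
    for op, prob in PIPELINE:
        if rng.random() < prob:
            out = op(out, rng)
    return out

def make_copies(img, n, seed=0):
    # Generate n transformed copies of one image, reproducibly.
    rng = random.Random(seed)
    return [augment(img, rng) for _ in range(n)]
```

Because every operation is gated by its own probability, repeated calls on the same image yield different combinations of transformations, which is how multiple distinct copies arise from one original.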

During the augmentation process, it is ensured that each nutrient category is represented by an approximately equal number of images, ensuring a balanced and unbiased dataset for training the model effectively.
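One way to plan such a balanced augmentation is sketched below. The function name and the per-class counts in the usage note are hypothetical and are not the raw counts of the dataset described here; the sketch only shows how the number of extra images per category could be derived from a target total.

```python
def augmentation_plan(raw_counts, target_total):
    """Return how many augmented images each class needs.

    raw_counts   -- dict mapping class label to number of raw images
    target_total -- desired size of the balanced dataset
    """
    per_class = target_total // len(raw_counts)  # equal share per class
    return {
        label: max(0, per_class - count)  # never remove raw images
        for label, count in raw_counts.items()
    }
```

For example, with hypothetical counts `{"boron": 120, "iron": 80, "healthy": 300}` and a target of 900 images, the plan asks for 180, 220, and 0 additional images respectively, so under-represented classes receive proportionally more augmented copies.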

5 Methodology

The structural framework of the developed deep learning model encompasses the configuration and interconnection of its elements, dictating its efficacy, computational efficiency, and capacity for generalization. Illustrated in Fig. 3 is an outline of the architectural design employed in the developed deep learning model.

Fig. 3 General Architecture of the designed Deep Learning model

The following section provides a detailed description of each component of the designed model that constitutes the architecture, highlighting their functionalities and contributions to the overall system.

5.1 Dataset preparation

A diverse array of sources is tapped to acquire images by capturing leaf samples with digital cameras and cutting-edge mobile devices, showcasing leaves affected by Boron deficiency and iron deficiency, as well as healthy ones. The current dataset consists of versatile images with various resolutions, light conditions, and different environmental locations.

5.2 Data pre-processing

Images are preprocessed by resizing them to a consistent resolution and normalizing the pixel values to a suitable range (e.g., 0–1). The Online tool RemoveBackground is used to set the image background to black enhancing the color features of the images, enabling improved focus on the subject, and facilitating more accurate analysis and processing.
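The resizing and normalization steps can be sketched as follows. This is an illustration under stated assumptions: a single-channel image stored as a nested list of 8-bit values, and nearest-neighbour interpolation (the interpolation method is not specified in the text). The function names are hypothetical; a colour image would be handled the same way per channel.

```python
def resize_nearest(img, out_h, out_w):
    # Nearest-neighbour resize: each output pixel copies the source
    # pixel at the proportionally scaled row and column.
    in_h, in_w = len(img), len(img[0])
    return [
        [img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]

def normalize(img, max_value=255):
    # Scale pixel values from [0, max_value] to [0, 1].
    return [[px / max_value for px in row] for row in img]

def preprocess(img, size=256):
    # Resize to size x size, then normalize.
    return normalize(resize_nearest(img, size, size))
```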

5.3 Data splitting

The dataset is split into 70% for training and 30% for testing. Data imbalance issue is avoided by ensuring that each category has a balanced representation of Boron and Iron deficient leaf images and healthy leaves.
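A class-balanced 70/30 split can be sketched as below. The stratified strategy and the function name are assumptions made for illustration; the text states only the split ratio and that each category is kept balanced.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_percent=70, seed=0):
    """Split sample indices so each class keeps the same train/test ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, test = [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        cut = len(idxs) * train_percent // 100  # integer arithmetic
        train.extend(idxs[:cut])
        test.extend(idxs[cut:])
    return sorted(train), sorted(test)
```

Splitting within each class, rather than over the pooled dataset, keeps the proportion of Boron-deficient, Iron-deficient, and healthy leaves identical in the training and testing sets.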

5.4 Data augmentation

The quality, quantity, and context of the training data significantly affect the accuracy of deep learning models making data selection and preparation critical aspects of achieving reliable and high-performing models. However, data scarcity is one of the most common challenges in building deep learning models. Gathering such data can impose significant financial and temporal burdens. Thus, the image data generator and autoencoder techniques are employed to augment the training dataset by generating additional samples.

5.5 CNN model

CNNs excel in tasks like image classification, using attributes such as color, pattern, and texture to attain precise outcomes. The model under consideration deliberately relies on color features for classification, since they provide discriminative information from which the model’s layers can learn patterns. In particular, deviations in color can serve as indicators of micronutrient deficiencies in banana leaves; for instance, leaf yellowing may signify iron deficiency (Table 2). Figures 4 and 5 illustrate the novel thirteen-layer CNN model to detect Boron and Iron deficiency in banana leaves.

Fig. 4 CNNSC to detect boron deficiency

Fig. 5 CNNSC to detect iron deficiency

The designed CNN model takes into account multiple parameters to optimize its performance and accuracy namely:

  • Activation Function: The rectified linear unit (ReLU) activation function is used in each hidden layer. ReLU, Eq. (1), mitigates the vanishing gradient problem, allowing the model to learn faster and perform better.

    $$\begin{aligned} f(x)=max(0,x) \end{aligned}$$
    (1)

    The last layer uses softmax activation, which is well suited to multiclass classification. The softmax activation value is calculated using Eq. (2), which normalizes the output scores of the neural network into a probability distribution.

    $$\begin{aligned} S(y)_i=\frac{\exp (y_i)}{\sum _{j=1}^n \exp (y_j)} \end{aligned}$$
    (2)
    • y is the vector of raw outputs from the neural network, and n is the number of classes

    • The i-th entry \(S(y)_i\) of the softmax output vector is the predicted probability of the test input belonging to class i.

  • L2 Regularizer: L2 regularization is applied to mitigate overfitting, leading to better generalization on new data.

  • Cost Function: Categorical cross-entropy is used as the loss function to be minimized during training. The loss over the dataset is the mean of the per-sample values, i.e., (sum of cross-entropy for N samples)/N. The cross-entropy for a single sample is given by Eq. (3):

    $$\begin{aligned} Loss_{CE}= - \sum _{i=1}^n t_i \log (p_i) \end{aligned}$$
    (3)
    • \(t_i\): Truth label for the i-th class

    • \(p_i\): Softmax probability for the i-th class

  • Optimizer: For model compilation, the Adam optimizer is utilized to efficiently optimize the parameters of the neural network, aiding in the convergence and improved performance of the model. Adam combines the benefits of two other optimization methods: Adaptive Gradient Algorithm (AdaGrad) and Root Mean Square Propagation (RMSProp). Both AdaGrad and RMSProp are adaptive methods that dynamically adjust the learning rates based on the history of gradients observed during training, in contrast to traditional optimization algorithms like vanilla stochastic gradient descent (SGD). Consequently, Adam keeps memory and computational overhead low when optimizing neural networks, while sustaining high efficacy [35]. The moment estimates used by the optimizer are updated using Eq. (4):

    $$\begin{aligned} m_t&=\beta _1m_{t-1} + (1-\beta _1) \left[ \frac{\partial L}{\partial w_t}\right] \\ v_t&=\beta _2v_{t-1} + (1-\beta _2) \left[ \frac{\partial L}{\partial w_t}\right] ^2 \end{aligned}$$
    (4)
    • \(\beta _1\) & \(\beta _2\) = decay rates of the running averages of the gradients and of the squared gradients, respectively (\(\beta _1\) = 0.9 & \(\beta _2\) = 0.999)

    • \(w_t\) = weights at time t

    • \(\partial L/\partial w_t\) = gradient of the loss function L with respect to the weights at time t

    • \(m_t\) = running average of gradients at current time t

    • \(v_t\) = running average of squared gradients at current time t

  • Epochs: The model was trained with varying numbers of epochs, and the best results were obtained at 100 epochs. This choice balances model convergence, computational resources, and performance metrics. Regarding model convergence, training for fewer epochs (e.g., 10–80) resulted in underfitting, so the model did not adequately capture the essential color features of the image dataset. Conversely, training for more epochs (e.g., 150–1000) led to overfitting, wherein the model memorized the training data but struggled to generalize to unseen data. Regarding computational resources, execution time rises with the number of epochs, and 100 epochs gave the best trade-off between accuracy and execution time.

    The model’s performance, as evaluated by the various metrics, reached satisfactory levels when trained for 100 epochs.

  • Metrics: The performance of the designed model is assessed by carefully observing various parameters including Accuracy, Loss, Precision, Recall, F1-score, and the Confusion matrix. The assessment provides comprehensive insights into the model’s effectiveness and reliability.

  • Skip connections: The degradation problem, also known as performance degradation, occurs when the performance of a deep neural network model deteriorates as the complexity or load of the architecture increases, leading to a decline in overall model performance.

    Skip connections offer an effective solution to overcome the degradation problem by providing an alternate pathway for the gradient to propagate during backpropagation. It enables the model to mitigate performance degradation and maintain the flow of information across different layers. Figure 6 depicts the fundamental skip connection model, illustrating the inclusion of direct connections between specific layers to facilitate the flow of information and address the degradation problem.

    Because the output from the preceding layer is added to the layer ahead, this operation requires no extra parameters. Mathematically, a skip connection in a CNN can be represented as follows:

    $$\begin{aligned} Output=Activation(Convolution(Input)+Input) \end{aligned}$$

    Gradient information can be lost as it is propagated back through many layers; this is known as the vanishing gradient problem. Skip connections pass feature information directly from earlier layers to later ones, which helps the model classify minute details. They also keep the gradient path between the first and last layers uninterrupted, which mitigates the vanishing gradient problem.

    Skip connections are employed in the designed CNN model as shown in Figs. 4 and 5, skipping some of the layers in the neural network and feeding the output of one layer as input to later layers.
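The relation Output = Activation(Convolution(Input) + Input) can be illustrated with a minimal sketch. It is not the CNNSC implementation: it assumes a 1-D signal, a single odd-length kernel, zero padding, and ReLU as the activation, and the function names are hypothetical.

```python
def relu(values):
    """Element-wise ReLU activation."""
    return [max(0.0, v) for v in values]


def conv1d_same(signal, kernel):
    """1-D convolution with zero padding so the output length equals the input length.

    Assumes an odd-length kernel (illustrative simplification).
    """
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def skip_block(signal, kernel):
    """Output = Activation(Convolution(Input) + Input)."""
    conv_out = conv1d_same(signal, kernel)
    # The addition itself introduces no trainable parameters.
    summed = [c + x for c, x in zip(conv_out, signal)]
    return relu(summed)


if __name__ == "__main__":
    print(skip_block([1.0, -2.0, 3.0], [0.0, 1.0, 0.0]))
```

With an all-zero kernel the block reduces to the activation applied to the input alone, which shows how the input still reaches later layers even when the convolution contributes nothing.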

Fig. 6 Skip connection

5.6 Software and hardware requirements

Table 3 presents a comprehensive breakdown of the system configuration and packages utilized during model training, offering an in-depth overview of the system employed for the experimental analysis.

Table 3 Experimental setup

5.7 Performance evaluation

The farmland dataset is divided into 70% training and 30% testing sets. The hyperparameters of the CNN model are tuned using Bayesian Optimization (BO), which determines an optimized set of hyperparameters for the designed model, namely the number of neurons, activation function, optimizer, learning rate, and batch size. BO takes past evaluations into account when choosing the next set of hyperparameters to evaluate. The essential ingredients of a BO algorithm are the Surrogate Model (SM) and the Acquisition (or selection) Function (AF). The SM is often a Gaussian Process that fits the observed data points and quantifies the uncertainty in unobserved regions. It can be interpreted as an approximation of the objective function and is used to propose hyperparameter sets that are expected to improve the accuracy score.

$$\begin{aligned} AF=\frac{accuracy}{hyperparameters} \end{aligned}$$

The hyperparameters to be evaluated next by the objective function are selected by applying a criterion to the surrogate function; the acquisition (selection) function defines this criterion. A prevalent choice of acquisition function is Expected Improvement, computed using Eq. (5).

$$\begin{aligned} EI_{y^*}(x)= \int _{-\infty }^{y^*}{(y^*-y)\, p(y|x)\,dy} \end{aligned}$$
(5)
  • y* - the minimum observed true objective function score so far

  • x - proposed set of hyperparameters

  • y - the actual value of the objective function using hyperparameters x

  • p(y|x) - surrogate probability model expressing the probability of y given x

The aim is to find the hyperparameters x that maximize the Expected Improvement under the surrogate model p(y|x). If p(y|x) is zero for every \(y<y^*\), the hyperparameters x are not expected to yield any improvement. If the integral is positive, the hyperparameters x are expected to yield a better result than the threshold value \(y^*\).
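Expected Improvement, which integrates \((y^*-y)\,p(y|x)\) up to the threshold \(y^*\), can be evaluated as in the sketch below. It assumes that the surrogate p(y|x) is a Gaussian with mean `mu` and standard deviation `sigma`, and that the objective y is a quantity to be minimized (for example, 1 − accuracy). Both are illustrative assumptions, and the function names are hypothetical. Under these assumptions the integral has the closed form \((y^*-\mu )\Phi (z)+\sigma \phi (z)\) with \(z=(y^*-\mu )/\sigma \); the sketch checks this against a direct numerical integration.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(y_star, mu, sigma):
    """Closed-form EI for minimization when p(y|x) = Normal(mu, sigma^2)."""
    if sigma <= 0.0:
        # Degenerate surrogate: all probability mass sits at mu.
        return max(0.0, y_star - mu)
    z = (y_star - mu) / sigma
    return (y_star - mu) * normal_cdf(z) + sigma * normal_pdf(z)


def expected_improvement_numeric(y_star, mu, sigma, steps=20000):
    """Midpoint-rule integration of (y* - y) p(y|x) from far below up to y*."""
    lower = min(mu, y_star) - 10.0 * sigma  # tail beyond this is negligible
    width = (y_star - lower) / steps
    total = 0.0
    for i in range(steps):
        y = lower + (i + 0.5) * width
        density = normal_pdf((y - mu) / sigma) / sigma
        total += (y_star - y) * density * width
    return total


if __name__ == "__main__":
    # Hypothetical values: best loss so far 0.20, surrogate predicts 0.25 +/- 0.10.
    print(expected_improvement(0.20, 0.25, 0.10))
    print(expected_improvement_numeric(0.20, 0.25, 0.10))
```

A candidate whose predicted mean is worse than the threshold can still have positive Expected Improvement when the surrogate is uncertain, which is what drives BO to explore unobserved regions.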

Optimizing the hyperparameters involves evaluating the models’ performance across all classes by analyzing the metrics derived from the confusion matrix, such as Accuracy, Precision, Recall, and F1-score.

The confusion matrix, shown in Fig. 7, summarizes the performance of the model in terms of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN).

Fig. 7 Confusion matrix

  • The accuracy metric measures the algorithm’s performance in an interpretable way. Informally, accuracy is the fraction of predictions the model got right.

    $$\begin{aligned} Accuracy=\frac{\text {Number of Correct Predictions}}{\text {Number of Total Predictions}} \end{aligned}$$

    Accuracy is calculated using Eq. (6).

    $$\begin{aligned} Accuracy=\frac{TP+TN}{TP+TN+FP+FN} \end{aligned}$$
    (6)
  • Precision indicates how reliable the deep learning model is when it classifies a sample as positive.

    $$\begin{aligned} Precision=\frac{\text {Number of correctly classified positive samples}}{ \text {Total number of classified positive samples}} \end{aligned}$$

    Precision is calculated using Eq. (7).

    $$\begin{aligned} Precision=\frac{TP}{TP+FP} \end{aligned}$$
    (7)
  • Recall measures the model’s ability to detect positive samples. The higher the recall, the more positive samples are detected.

    $$\begin{aligned} Recall=\frac{\text {Number of correctly classified positive samples}}{\text {Total number of actual positive samples}} \end{aligned}$$

    Recall is calculated using Eq. (8).

    $$\begin{aligned} Recall=\frac{TP}{TP+FN} \end{aligned}$$
    (8)
  • The F1 score combines precision and recall into a single value, their harmonic mean, providing a balanced evaluation of a classification model’s performance. For a given arithmetic mean of precision and recall, the score is highest when the two are equal. It is estimated using Eq. (9).

    $$\begin{aligned} F1\text {-}score=2\times \frac{Precision\times Recall}{Precision+Recall} \end{aligned}$$
    (9)
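Equations (6) to (9) can be computed directly from the four confusion-matrix counts, as in the following sketch. The counts in the usage example are hypothetical and are not taken from the reported experiments. The function names, and the convention of returning 0.0 for an undefined ratio, are also assumptions.

```python
def safe_divide(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1 score from confusion-matrix counts."""
    accuracy = safe_divide(tp + tn, tp + tn + fp + fn)            # Eq. (6)
    precision = safe_divide(tp, tp + fp)                          # Eq. (7)
    recall = safe_divide(tp, tp + fn)                             # Eq. (8)
    f1 = safe_divide(2 * precision * recall, precision + recall)  # Eq. (9)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical counts for a binary deficient-vs-healthy classifier.
    for name, value in classification_metrics(tp=45, tn=40, fp=5, fn=10).items():
        print(f"{name}: {value:.3f}")
```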

6 Results and discussion

The CNNSC image classification model merges two separate models developed to detect deficiencies of two micronutrients, namely Boron and Iron. Each CNNSC model consists of Conv2D layers followed by MaxPooling layers [36, 37], totaling 13 layers. Skip connections are created between layers that share similar filters, allowing information to flow directly between these layers. Each model is trained with a distinct dataset. For instance, the Boron CNNSC model is trained on a dataset of leaf images comprising both boron-deficient and healthy leaves. Likewise, the Iron CNNSC model is trained on a dataset that includes images of iron-deficient leaves. Each dataset is partitioned into 70% training and 30% testing for evaluating the models’ performance. Finally, the outputs of the two models are integrated as illustrated in Fig. 8 to derive the final results.
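One possible way to integrate the outputs of two binary models is sketched below. The integration rule is not specified in the text, so this is an assumption for illustration only: each model is taken to emit a deficiency probability, and a fixed threshold of 0.5 decides which labels are reported. The function name, label strings, and threshold are all hypothetical.

```python
def merge_predictions(boron_prob, iron_prob, threshold=0.5):
    """Combine two binary-model probabilities into a list of reported labels.

    Assumed rule: a deficiency is reported when its model's probability
    reaches the threshold; if neither does, the leaf is reported as healthy.
    """
    detected = []
    if boron_prob >= threshold:
        detected.append("Boron deficient")
    if iron_prob >= threshold:
        detected.append("Iron deficient")
    return detected if detected else ["Healthy"]


if __name__ == "__main__":
    print(merge_predictions(0.91, 0.12))
    print(merge_predictions(0.20, 0.30))
```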

Fig. 8 Merged CNNSC model

Each model is trained for 100 epochs with batch sizes of 8, 16, 64, and 128. The Adam optimizer is used with an initial learning rate of 0.001, while the momentum and decay coefficients (\(\beta _1\) and \(\beta _2\) in Adam) are set to 0.9 and 0.999, respectively.
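A single Adam update for one scalar parameter is sketched below to show how these settings enter the optimizer. It assumes that the reported values 0.9 and 0.999 correspond to Adam's exponential decay rates \(\beta _1\) and \(\beta _2\). The stabilizing constant `eps` is not reported in the text and is set here to an assumed 1e-8; the function name is hypothetical.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Perform one Adam update and return the new (param, m, v).

    t is the 1-based step counter used for bias correction.
    """
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment (momentum) estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** t)                # bias-corrected second moment
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


if __name__ == "__main__":
    # Hypothetical parameter and gradient values.
    param, m, v = 1.0, 0.0, 0.0
    for step in range(1, 4):
        param, m, v = adam_step(param, grad=2.0, m=m, v=v, t=step)
        print(step, param)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly the learning rate regardless of the gradient's magnitude.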

During training, each model is exposed to input images of different sizes, namely \(16\times 16\), \(32\times 32\), \(64\times 64\), \(128\times 128\), and \(150\times 150\) pixels, enabling it to learn and adapt to various image resolutions. Reducing the image size discards high-frequency information, which can degrade fine details and the overall clarity of the visual content. The smaller the image, the less specific the representation becomes. Further, decreasing the image size decreases false negatives and increases false positives. The designed models achieved the highest accuracy with an image size of \(150\times 150\) pixels, indicating that this resolution is best suited to the classification task.
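The loss of high-frequency information under downsizing can be illustrated with a small sketch. The resizing method used in the experiments is not stated, so block averaging is used here purely as an assumed stand-in. The checkerboard image and all function names are hypothetical.

```python
def average_pool(image, factor):
    """Downsample a 2-D list by averaging non-overlapping factor x factor blocks."""
    rows, cols = len(image), len(image[0])
    pooled = []
    for r in range(0, rows - rows % factor, factor):
        pooled_row = []
        for c in range(0, cols - cols % factor, factor):
            block = [
                image[r + dr][c + dc]
                for dr in range(factor)
                for dc in range(factor)
            ]
            pooled_row.append(sum(block) / len(block))
        pooled.append(pooled_row)
    return pooled


def checkerboard(size):
    """Finest possible pattern: pixel values alternate between 0.0 and 1.0."""
    return [[float((r + c) % 2) for c in range(size)] for r in range(size)]


def contrast(image):
    """Difference between the brightest and darkest pixel."""
    flat = [value for row in image for value in row]
    return max(flat) - min(flat)


if __name__ == "__main__":
    original = checkerboard(4)
    reduced = average_pool(original, 2)
    print("contrast before:", contrast(original))
    print("contrast after: ", contrast(reduced))
```

Every 2 × 2 block of the checkerboard contains equal numbers of dark and bright pixels, so the downsized image is uniform and the fine pattern is no longer recoverable.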

The performance of the designed models is evaluated by comparing them to state-of-the-art Deep Learning models such as DenseNet, VGG16, and InceptionV3. Table 4 presents the accuracy and loss of the designed CNNSC-Boron model and the other models.

Table 4 Comparative study of accuracy and loss patterns in CNNSC-boron and pre-trained model implementations

The accuracy graphs in Fig. 9a–d and the loss graphs in Fig. 10a–d are analyzed to measure the effectiveness of the models, providing insights into their performance and training dynamics. Table 4 shows that InceptionV3 attained the highest accuracy of 99% in detecting Boron deficiency, while the designed CNNSC-Boron model achieved a lower accuracy of 94%.

Fig. 9 Boron: comparison of training and testing accuracy

However, a closer examination of the loss graphs in Fig. 10a–d highlights a notable reduction in the loss of the designed CNNSC-Boron model compared to the pre-trained models. In addition, the validation loss can be further reduced by adding more images to the dataset.

Fig. 10 Boron: comparison of training and testing loss

The precision comparison in Fig. 11a shows that the designed CNNSC-Boron model achieved the highest precision of 90%, outperforming the pre-trained models, which achieved approximately 60% precision. Furthermore, Figs. 11b and 11c show that the designed CNNSC-Boron model achieved the highest Recall and F1 score values of 90%, indicating its superior performance compared to the other models.

Fig. 11 Boron: comparison of precision, recall, F1 score and execution time of all the models

Figure 12a–d shows the confusion matrices, which support and validate the Precision, Recall, and F1 score values in the classification report.

Fig. 12 Boron: comparison of confusion matrices of all the models

In the same vein, Table 5 reports the performance of the designed CNNSC-Iron model and the other models, while the accuracy plots in Fig. 13a–d and loss plots in Fig. 14a–d present a comparative analysis of the designed CNNSC-Iron and the pre-trained models. Although the accuracy graphs show that InceptionV3 outperforms all other models, the loss graphs suggest that the designed CNNSC-Iron achieves a significantly lower loss than the other models.

Fig. 13 Iron: comparison of training and testing accuracy of all the models

Fig. 14 Iron: comparison of training and testing loss of all the models

Fig. 15 Iron: comparison of evaluation metrics of all the models

Fig. 16 Iron: comparison of confusion matrices of all the models

Table 5 Iron—comparison of accuracy and loss of all models

Likewise, the plots of precision, recall, and F1 score in Fig. 15a–d highlight the significant improvements attained by the designed CNNSC-Iron model in comparison to the pre-trained models.

Furthermore, the analysis of confusion matrices in Fig. 16a–d reinforces the superior performance of the designed CNNSC-Iron in comparison to other models.

The performance of a system can be determined by its execution time, as performance and execution time have an inverse relationship.

$$\begin{aligned} \frac{\text {Performance of A} }{\text {Performance of B}} =\frac{\text {Execution Time of B}}{\text {Execution Time of A}} \end{aligned}$$

Applying the aforementioned equation shows that the designed CNNSC model is 1.6 times faster than InceptionV3, 5 times faster than VGG16, and 3 times faster than DenseNet, indicating its superior efficiency. The analysis shows that the designed Skip Connection CNN models perform well when classifying leaf images into the two micronutrient-deficiency classes, namely Boron deficient and Iron deficient.
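The performance ratio above is simply a ratio of execution times, which the following sketch makes explicit. The timings in the usage example are hypothetical placeholders, not measured values from the experiments, and the function names are assumptions.

```python
def speedup(time_baseline, time_candidate):
    """Performance of candidate / performance of baseline = time_baseline / time_candidate."""
    if time_baseline <= 0 or time_candidate <= 0:
        raise ValueError("execution times must be positive")
    return time_baseline / time_candidate


def time_saved_fraction(time_baseline, time_candidate):
    """Fraction of the baseline execution time saved by the candidate."""
    if time_baseline <= 0 or time_candidate <= 0:
        raise ValueError("execution times must be positive")
    return 1.0 - time_candidate / time_baseline


if __name__ == "__main__":
    # Hypothetical timings in seconds.
    baseline, candidate = 50.0, 10.0
    print("speedup:", speedup(baseline, candidate))
    print("time saved:", time_saved_fraction(baseline, candidate))
```

A speedup of k corresponds to a reduction of 1 − 1/k in execution time, so "times faster" and "percent faster" figures are not directly interchangeable.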

The advantages of considering the CNNSC models for detecting micronutrient-deficient banana leaves are as follows:

  • The designed model is 40% more precise

  • CNNSC is 30% faster in providing timely decisions

The aforementioned features serve as evidence that the designed models surpass the existing pre-trained Deep Learning models in terms of efficiency and performance. Furthermore, Table 6 presents a detailed breakdown of the number of layers in each model. The minimized count of Conv2D layers in the designed CNNSC models enables them to arrive at decisions much faster than their counterparts.

Table 6 Layers in the models

6.1 Prediction

A Graphical User Interface (GUI) is developed using Gradio to accept real-time captured images of banana leaves as input and predict the nutrient deficiency present in the leaf. The GUI demonstrates that the model consistently predicts nutrient deficiencies in leaf images with a precision of 90% for Boron deficiency and 60% for Iron deficiency. Upon identifying Boron deficiency, it is recommended to address the issue by:

  • Soil application of Borax @ 25 g per plant or

  • Foliar application of 0.1 % Boron (Solubore) or

  • Foliar application of Banana special @ 5 g per litre

To effectively tackle Iron deficiency after its detection, experts recommend countering the issue by

  • Foliar spray of 0.5 % Iron sulphate with 1 % urea or

  • Spray of Banana special @ 5 g per litre
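The mapping from a predicted deficiency to the recommendations listed above can be sketched as a simple lookup. The dictionary keys and the function name are hypothetical, and the sketch leaves out the Gradio interface, showing only the post-prediction step.

```python
# Recommendations as listed in the text; the keys are assumed label names.
RECOMMENDATIONS = {
    "boron": [
        "Soil application of Borax @ 25 g per plant",
        "Foliar application of 0.1 % Boron (Solubore)",
        "Foliar application of Banana special @ 5 g per litre",
    ],
    "iron": [
        "Foliar spray of 0.5 % Iron sulphate with 1 % urea",
        "Spray of Banana special @ 5 g per litre",
    ],
}


def recommend(deficiency):
    """Return the alternative remedies for a predicted deficiency label.

    Unknown labels (for example, a healthy leaf) yield an empty list.
    """
    key = deficiency.strip().lower()
    return list(RECOMMENDATIONS.get(key, []))


if __name__ == "__main__":
    for option in recommend("Boron"):
        print("-", option)
```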

7 Conclusion and future work

The intricately engineered thirteen-layer CNN architecture with skip connections emerges as a resilient solution for precisely identifying deficiencies in essential nutrients within banana plants. Trained on a bespoke dataset comprising 11,000 images of banana leaves, the model achieves an impressive accuracy rate of 95%.

Detailed scrutiny of the confusion matrices reveals that the CNNSC model outperforms its counterparts by a notable 40% in precision. Examination of the loss graph further demonstrates that the CNNSC model consistently maintains a substantially lower loss rate compared to its peers.

Moreover, the CNNSC model exhibits its efficacy by delivering timely insights to farmers, operating 30% faster than other models. Thus, the model assumes a pivotal role in advancing sustainable agricultural practices and fortifying global food security initiatives.

Identifying the deficiency of multiple nutrients in a plant, which manifests in similar symptoms on banana leaves, presents a formidable challenge in agricultural diagnostics. Visual symptom-based detection of multi-nutrient deficiencies in crops is complicated by the potential overlap of symptoms caused by different nutrient deficiencies. For instance,

  • Nitrogen deficiency symptoms may be similar to the symptoms of Sulphur deficiency.

  • Boron deficiency is accompanied by a red coloration of the leaves near the growing point when the plant is well supplied with Potassium (K).

  • In the field, distinguishing among deficiency symptoms can be challenging, as certain micronutrient deficiencies may resemble symptoms caused by diseases or insect damage. For example, leaf hopper damage can be mistaken for a deficiency in K.

  • Yellowing of the tissue between leaf veins is a common symptom observed in deficiencies of Zinc, Magnesium, Iron, and Manganese.

Recognizing the complex interplay of multi-nutrient deficiencies and diseases remains a formidable challenge. Future work will aim to extend this approach to identifying deficiencies of additional nutrients in banana plants.