1 Introduction

Waste recycling and management is now a serious concern, causing several problems for human health as well as for the environment. The world is increasingly interested in waste management in terms of developing technologies to reduce waste volume. Recycling is one of the major ways to limit waste generation, but the key step before recycling is segregating the waste according to its nature using new technologies. Researchers prefer the automatic segregation approach over manual sorting because of the volumes involved, its accuracy, and its cost-effectiveness [1]. Waste classification and recycling are typically carried out at fixed locations outside the city. All this waste must be dumped and sorted at those sites, which involves high labour intensity, low sorting performance, and a poor working environment with limited accuracy. The problem of waste classification can instead be tackled effectively at the initial phase of use, in the household, or at the source, based on an item's structure, fabric, and several other features, including its relationship to other items. This avoids wastage of resources, and the various kinds of trash must be sorted by segment to meet environmental-protection requirements [2]. However, because people's knowledge of waste classification is minimal and there are many different forms of waste, classification is difficult; people living in villages often cannot classify these waste materials. For example, multiple studies in developed countries have reported that the mean garbage collection in Europe is 517 kg annually, with just a small portion of it being recycled [3]. According to the Environmental Protection Agency, 75% of the garbage generated by US individuals is reusable, but only 35% is actually recycled, and according to the latest international assessment, worldwide garbage volume will grow by 70% by 2050 [4].

Food waste leads to environmental problems such as greenhouse gas emissions [5]. Food waste dumped in landfills accounts for approximately 58% of the methane they create [6, 7]; it also takes up space and poses an "upstream" risk of groundwater contamination. Landfill gas-collecting devices are unable to capture approximately 61% of the methane produced by food waste, which escapes into the atmosphere [8]. The authors of [9] reported that the value of agricultural product waste in the food sector is USD 750 billion based on producer pricing (not including fisheries and seafood), a sum equal to the gross domestic product (GDP) of Switzerland. The land needed to produce the food that is ultimately wasted or lost is equivalent to an area the size of China, and this food accounts for one-fourth of the total agricultural water used annually [10]. World leaders, governments, and environmental organisations have pledged to address this problem by reducing food waste, diverting it from landfills, and recycling waste. The problem can also be handled, and made more tractable, by automating the waste classification and management system with Artificial Intelligence (AI).

1.1 Organic and recyclable waste classification

Most waste ends up in landfills. The major issues related to landfills are eutrophication, an increase in toxins, animals consuming toxic material, and land and water pollution. The problem addressed in the current work centres on the toxins and waste materials of households. Hence, the data are segregated into two classes, organic and recyclable material. Smart automated segregation leads to fewer toxins filling landfills. The challenge of separating recyclable waste from organic items, and from other types of waste, calls for an automated approach as an easy yet accurate solution.

Multiple authors have published approaches based on machine learning and deep learning. Advances in machine learning (ML) and deep learning (DL) models play a significant role in tackling the above-mentioned issues using computer vision [11, 12]. Thorough research on waste classification and management has been carried out with popular ML algorithms, such as the artificial neural network (ANN) [13], MAPMobileNet-18, MobileNetV3 [14], and MobileNetV2 [13], and the Convolutional Neural Network (CNN) in particular is broadly used in image classification, recognition, and segmentation [15].

Working towards enhancing the existing results of waste classification, the proposed paper uses a CNN model and adds layer-level improvements through a) Leaky-ReLU as the activation function [16] and b) regularization using dropout [17]. The model is experimented with network depth and L1/L2 regularization based hyper-parameter fine-tuning. The results shown correspond to the best combinations.

1.2 Deep architectures

Basically, a CNN is made of a feature extraction stage (convolutional layers) and a classifier (fully connected layers). The classifier layers consist of a large number of neurons, and dropout applied there can effectively improve classification results. A future alternative can be the use of convolutional and de-convolutional layers for a more efficient outcome. In short, multiple improvements have been reported through data augmentation and classifier-layer changes, but only a few studies address DCNN improvements at the activation layer for better feature extraction, combined with hyper-parameter tuning and dropout at the classifier level, for organic and recyclable waste classification.

Additionally, the classification is observed using transfer-learning based models along with the improved DCNN. The performance is compared for VGG16, VGG19, MobileNetV2, DenseNet121, and EfficientNetB0; among these, the lightweight MobileNetV2 shows the best accuracy.

The DCNN is used to distinguish recyclable from organic waste, defining the problem as binary classification. The dataset contains high-resolution images with varied backgrounds. The proposed work classifies the two-class waste products with an accuracy [18] of 88.54%. In this article, an improved CNN-based framework is presented to tackle the problem of waste classification effectively for the recycling process. The rest of the study is divided into five sections: introduction, related work, methodology, experimental results & discussion, and recommendation and conclusion.

Fig. 1 Sample images from dataset [19]

1.3 Contribution

  • Binary classification of organic and recyclable household waste using a customised CNN is performed. Improvements in the CNN architecture using LeakyReLU as the activation function and dropout for regularization are implemented.

  • The performance is evaluated in terms of accuracy, missed detection rate (MDR), and false detection rate (FDR).

  • Transfer-learning based models are applied to compare the results.

  • Hyper-parameter tuning is applied and performance is compared to assess the need for the improvement(s) suggested in the DCNN of the present work.

A random collection of 8 pictures from our dataset is displayed in Fig. 1.

2 Related work

The separation of waste can be a manual as well as an automated process. In the manual process, workers or labourers separate waste into different types for disposal or recycling. The manual process is prone to multiple issues, such as:

  • Human error: Workers may make mistakes in sorting waste material, which impacts the efficiency of the work.

  • Inconsistency in interpretation of material: Different people may interpret waste classes differently, which may lead to inconsistent sorting or classification and, in turn, adds to MDR or FDR.

  • Limited scalability: With an increase in waste material and landfills, manual labour becomes difficult to scale. Large volumes combined with limited scalability lead to an inefficient process.

  • Increased labour cost: In addition to the previous point, an increase in waste leads to an increase in labour and, thereby, operational cost.

  • Almost no feedback on the process: Manual work does not provide any feedback on waste streamlining, consequently limiting the ability to optimize recycling processes and improve waste management strategies.

Other challenges may include a) health and safety concerns, b) lack of training, c) emotional and psychological impact, d) dependency on visual inspection, e) limited data collection, and f) environmental concerns.

To address these challenges, many waste management systems are incorporating automated technologies, such as robotics and artificial intelligence, to enhance the accuracy and efficiency of waste classification processes. Automated systems can help reduce errors, improve scalability, and minimize the health and safety risks associated with manual sorting.

Multiple published studies have worked closely with deep learning architectures and reported good performance. A total of 12 recent works are compiled here to showcase the state of the art published so far. In article [20], the authors presented an automatic recyclable garbage categorization technique to replace the old manual methods of handling three categories of garbage: plastic bottles, papers, and soda cans. They reported accuracies for various classifiers of 85.70% for the fine tree, 88.10% for the bagged tree, 73.80% for linear discriminant, 95.20% for quadratic discriminant, 78.60% for fine K-nearest neighbor (KNN), 81.00% for weighted KNN, 92.90% for linear support vector machine (SVM), 90.50% for quadratic SVM, and 83.3% for fine Gaussian SVM. Finally, they achieved 83.3% accuracy for bottles, 100% accuracy for soda cans, and 100% precision for papers. The dataset size was not clear, and categories with ever-increasing data need more rigorous models such as deep learning-based ones. Hence, later on, [21] proposed a smart waste classifier built on a ResNet-50 feature extractor and an SVM that classified garbage into several groups such as glass, metal, paper, and plastics. The suggested method was evaluated on Gary Thung and Mindy Yang's garbage image dataset and achieved a precision of 87%. Also, the authors of [22] trained and compared multiple deep learning architectures for automated waste management systems using the TrashNet dataset. Numerous CNN designs, including VGG, Inception, and Residual Neural Networks (ResNet), were investigated. A combined Inception-ResNet framework gave the best classification output, with an accuracy of 88.6%.

In article [23], the authors presented an approach based on deep learning and computer vision principles for classifying waste images into six major categories: glass, metal, paper, plastic, cardboard, and others. A training dataset was prepared from internet images and applied to a multi-layered CNN framework, specifically the Inception-v3 model, achieving a waste classification accuracy of 92.5%. In addition, [24] investigated a variety of methodologies to give a comprehensive summary; the models tested were an SVM using HOG features, a basic CNN, and a CNN with residual blocks. Based on the evaluation results, they showed that basic CNN networks, with or without residual blocks, perform well. Later, [25] presented a ResNet18 framework focused on implementation techniques for categorizing biodegradable waste materials. Glass, metallic elements, plastic, and paper were among the waste types the program could automatically extract for categorization. According to the experimental data, the algorithm had a 92% prediction accuracy in categorizing recyclable garbage, indicating that it could efficiently detect recyclable trash. One-of-a-kind research by [26] presented a system that included a well-tuned VGG-19 framework to categorize 95 different kinds of large-sized waste materials, three hybrid variants to handle the imbalanced-class challenge effectively (class-weight VGG19, XGB VGG19, and Light Gradient Boosting (LGB) VGG19), and a huge dataset with 95 classes. Analysis revealed that the system surpassed existing methods in detecting large garbage, with an effectiveness of 86.19%.

Later, with the idea of improving the CNN, [27] discussed the application of automated learning to solve a real-world problem. In their model, they employed the rectified linear unit (ReLU) activation function and the Adam optimizer to improve the suggested CNN architecture, which reached a final accuracy of 79%. Also, [28] presented a unique system termed double fusion, which used feature- and score-level fusion approaches to efficiently merge two deep learning models. The double fusion technique ensured that the deep models contribute optimally by first integrating their competencies in early and late fusion structures, then integrating the classifier performance acquired with early and late fusion at the score level. They assessed the accuracy, average precision, recall, and F1 score of each deep model separately and observed that ResNet-101 produced superior outcomes. The authors in [29] presented a hybrid technology based on multilayer perceptrons and a CNN that acts as a real-time intelligent waste classifier, classifying wastes into their respective categories. The CNN determined the category of non-metal trash, whereas the multilayer perceptron performed the binary classification of garbage as having metallic or non-metallic properties. The trained model had an accuracy of 99% on the test dataset, which is extremely dependable and consistent. The authors in [30] presented a factor analysis of a site survey and data processing on residents' perceptions of disposal and management, and used a CNN to categorize and recognize garbage images to support garbage classification decisions, with an accuracy of 85.32%. Later, [31] used TrashNet, a well-known standard dataset with a total of 2527 photos divided into six waste types, to assess CNN efficiency and proposed an optimized DenseNet121, using a genetic algorithm (GA) on the fully connected layer (FC-layer) of DenseNet121 to enhance its precision rate. Compared with the CNNs used in earlier experiments, the optimized DenseNet121 attained a high accuracy of 99.6%. The potential utility of machine learning models spans application areas such as predicting wastes, diseases, market prices, and many others. As a consequence, DCNNs deserve further investigation and study.

2.1 Research gaps

The research published so far provides little information on the trade-off between the volume of data used and the performance obtained. The amount of data used in the literature is small, which is why the discussion often turns to few-shot learning and transfer learning. When a sufficiently large labelled dataset is available, as in the proposed work, experimentation can instead be done with improved and/or fine-tuned CNNs. The use of improved CNNs can also lead to better interpretability and explainability.

Complex architectures must perform reliably under perturbations or the addition of more data as the dataset grows. The literature lacks treatment of the dynamic and evolving nature of the algorithms needed to handle continual learning and changes in data distribution over time. In addition, the class-imbalance factor must be addressed in multi-class classification within the present reference frame. Additionally, the robustness of models in terms of missed classification rate and false classification rate under small perturbations is rarely discussed. Such benchmark evaluation metrics must be included to foster fair and meaningful comparison.

2.2 Overall research goals

Based on the research gaps identified, the present article has the following overall research goals:

  • An improved organic-to-recyclable waste classification model based on an improved CNN is proposed.

  • Due to the complex nature of waste images, the intricate patterns and features in the image data are captured better by the LeakyReLU activation function, resulting in better model learning.

  • Dropout is used to prevent overfitting on the large dataset used here and encourages the learning of a more diverse feature set, thereby improving model generalization.

  • Efficient hyper-parameter tuning is applied to achieve better computational efficiency and robustness. The experimentation covers network layers, epochs, regularization, and learning rate, along with dropout rate and activation function.

3 Methodology

In this section, a detailed description of the data and the framework of the architectural design is presented. The entire proposed work is based on the principle of binary classification of organic and recyclable waste, to reduce the toxins from organic waste going to landfills. For this purpose, the experimental design focuses on a) the choice of improvements in the CNN for image-based data, b) hyper-parameter tuning in the model for better performance, and c) comparison with benchmark CNNs. The overall proposed flow is shown in Fig. 2. For better understanding, the road-map of the proposed work in this section is given below.

3.1 Road-map

  • Data Description: The large dataset is divided into two classes, organic and recyclable household waste, with a total of 25,077 images. The images are resized to a fixed resolution of 224 × 224.

  • Proposed model: A DCNN with LeakyRelu as an activation function and dropout for regularization is proposed.

  • Hyper-parameter tuning: The proposed model applies appropriate tuning of hyper-parameters such as learning rate, epochs, test size, and network layers.

  • Performance Metrics: Improvement due to the proposed model can be represented in terms of performance metrics such as accuracy, precision, recall, MDR, and FDR.

  • Comparison of Pre-Trained Models: A comparison of inbuilt deep learning models along with their architectures is performed.

  • Discussion and Future recommendation: Results are discussed, and some future recommendations are presented.

Table 1 Data Description

3.2 Dataset description

In this paper, image recognition based waste classification is proposed. Let there be a total of N images of varying resolution and Z categories/classes. The training portion of the waste image dataset is \(\{(X_{i}, Y_{i})\}_{i=1}^{N_{tr}}\), where each of the \(N_{tr}\) training images \(X_{i} \in R^{H \times W \times C}\) belongs to one of the Z categories and \(Y_{i}\) is the corresponding label, with organic and recyclable waste encoded as \(Z = \{0, 1\}\) in this research. As shown in Table 1, the dataset is divided into training and testing sets in the proportion of 70% and 30% of N respectively (decided upon experimentation), which means 22,564 images are used for training and 2,513 images are used for testing the models with \(Z = 2\): organic waste and recyclable waste. The images are of random resolution and are resized to 224 × 224. As Fig. 2 shows, after training the model is tested: each input \(X_{i}\), \(i = N_{tr}+1, \ldots , N\), is expected to produce the waste label \(Y_{i}\) corresponding to the waste image \(X_{i}\).

3.3 Proposed DCNN architecture

Convolutional Neural Networks use three concepts: pooling, weight sharing, and local receptive fields. A CNN can have multiple different layers: convolutional, pooling, and FC layers. The convolutional layer is the key part of the architecture, performing shift, multiply, and sum operations. Its parameters are a series of learnable filters or weight matrices, commonly referred to as kernels. The main objective is to extract patterns from small areas of the input pictures that are present throughout the dataset. In order to create non-linearity, the weighted sum is then combined with a bias and put through an activation function. Each convolutional layer's weights determine the convolution filters, and each convolutional layer may have more than one filter. Each filter captures a feature from a small section of a picture, and during the forward pass each filter is moved across the input image's dimensions to produce its feature map. To simplify the information output by the convolutional layer, each convolution layer is followed by an optional pooling layer. A pooling layer summarises a cluster of neurons in the convolution layer by down-sampling each feature map produced by the convolutional layer. Convolutional layers are followed by fully connected layers in a standard CNN design. Each neuron in the fully connected layer is linked to every neuron in the subsequent layer, and each value receives a vote to indicate how closely it resembles a certain class.

Fig. 2 Implementation flow of proposed work

3.3.1 Improvements in DCNN

The issue of heavy computation in deep learning is partially addressed by using the ReLU in (1) [32].

$$y = \begin{cases} 0 & \text{if } x < 0,\\ x & \text{if } x \ge 0. \end{cases}$$
(1)

One disadvantage of the ReLU is that it does not respond to non-positive inputs, resulting in the inactivation of numerous neurons during training, which may again be seen as a vanishing gradient issue for negative values. The inclusion of the Leaky rectified linear unit (Leaky ReLU), which activates marginally for negative values, solves this non-activation for non-positive inputs, as expressed in (2).

$$y = \begin{cases} \alpha \, x & \text{if } x < 0, \ \alpha = 0.25,\\ x & \text{if } x \ge 0. \end{cases}$$
(2)

In our proposed model, the LeakyReLU activation function has been used; it is an upgraded form of the ReLU that has a small negative slope for non-positive values rather than the flat zero of ReLU. This form of activation function is effective in countering irregular gradient situations, as shown in Fig. 3.

Fig. 3 Activation function: ReLU vs LeakyReLU
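To make the behaviour of (1) and (2) concrete, the following minimal NumPy sketch (illustrative only, not the paper's code) evaluates the Leaky ReLU with the slope \(\alpha = 0.25\) used in this work.

```python
import numpy as np

def leaky_relu(x, alpha=0.25):
    # Eq. (2): alpha * x for negative inputs, x otherwise.
    return np.where(x < 0, alpha * x, x)

x = np.array([-4.0, -1.0, 0.0, 2.0])
print(leaky_relu(x))  # [-1.   -0.25  0.    2.  ] -- negative inputs keep a small gradient
```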

The use of dropout reduces overfitting and permits the rapid exploration of numerous distinct neural network topologies. The term "dropout" refers to units in a standard neural network (hidden and visible) that are temporarily deactivated. As demonstrated in Fig. 4, dropping out a node means temporarily eliminating it from the network, together with all its input and output links. The concept of dropout has been applied in our model to enhance the performance of the proposed architecture, since it a) prevents overfitting and b) improves generalization by encouraging the network to learn diverse features. The experimentation is performed on the amount of dropout \(\lambda \).

Fig. 4 (a) Standard neural network and (b) neural network with dropout architecture
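As a concrete illustration of the mechanism in Fig. 4, the sketch below applies inverted dropout to a batch of activations; the rate 0.25 matches the \(\lambda \) value experimented with here, while the function itself is an illustrative stand-in rather than the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def dropout(activations, rate=0.25, training=True):
    # Inverted dropout: zero out `rate` of the units during training and
    # rescale the survivors so the expected activation is unchanged at test time.
    if not training:
        return activations
    keep = rng.random(activations.shape) >= rate
    return activations * keep / (1.0 - rate)

batch = np.ones((2, 8))
print(dropout(batch))                   # roughly a quarter of the entries are zeroed
print(dropout(batch, training=False))   # inference: activations pass through unchanged
```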

3.3.2 Hyper-parameter tuning

Hyper-parameters are external configuration settings in deep learning that cannot be learned from the data used for experimentation. Set prior to training the model, these parameters substantially influence the learning process and the performance of the model.

The value of the learning rate (\(\eta \)) is tuned to help the model converge during training while ensuring stability and generalization. Unlike in shallow architectures, smaller learning rates are experimented with in the proposed deep CNN. The choice of value is made to find a trade-off between overfitting and underfitting while ensuring that the characteristics of the waste images are captured.

Fig. 5 Systematic block representation of proposed work

The customised CNN has multiple Conv2D layers C and max-pooling layers P. The proposed work experiments with the architectural layers in the forms \({4C-P-2C-P-2C}\) and \({2C-P-2C}\) \({-P-2C-P-2C}\); the choice can be experimented with further in future. Here xC and xP represent x consecutive C and P layers respectively. The experimentation is performed with these different network depths with epochs = 50, 100, 200 and test size = 30, 15, as sketched below. Although experimentation was also performed on batch size, the outcome did not reflect substantial change, hence the value is fixed at 64.
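The search over network depth, epochs and test size can be organised as a simple grid, sketched below. `build_model` and `load_waste_data` are hypothetical placeholders for the model construction (Section 3.4) and the data split; they are not functions from the paper.

```python
# Hedged sketch of the hyper-parameter grid; helper functions are placeholders.
from itertools import product

LAYOUTS = {
    "4C-P-2C-P-2C":      [4, "P", 2, "P", 2],
    "2C-P-2C-P-2C-P-2C": [2, "P", 2, "P", 2, "P", 2],
}
EPOCHS = [50, 100, 200]
TEST_SIZES = [0.30, 0.15]

results = {}
for (name, layout), epochs, test_size in product(LAYOUTS.items(), EPOCHS, TEST_SIZES):
    (x_tr, y_tr), (x_te, y_te) = load_waste_data(test_size=test_size)   # placeholder
    model = build_model(layout, alpha=0.25, dropout=0.25, lr=0.001)     # placeholder
    model.fit(x_tr, y_tr, batch_size=64, epochs=epochs, verbose=0)
    _, acc = model.evaluate(x_te, y_te, verbose=0)
    results[(name, epochs, test_size)] = acc

best = max(results, key=results.get)
print("best setting:", best, "accuracy:", results[best])
```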

3.4 Systematic architecture of proposed solution

A CNN model is proposed, made of multiple layers of components with different respective values: 2D convolution layers (filter and kernel), LeakyReLU [16] as the activation function, batch normalization, and MaxPooling (filter, kernel, and upsampling), as shown in Fig. 5. The output of the encoder is fed into the classifier, where we use components such as dense units, dropout, and a SoftMax classifier that can easily separate organic items from recyclable items. The values of the 2D convolution layer components (filter and kernel), along with the max-pooling and FC settings, are shown in Table 2; the LeakyReLU value is 0.25 throughout the structure, and the dropout value \(\lambda \) used is 0.25, chosen after experimentation.
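A minimal Keras sketch of the 4C-P-2C-P-2C configuration is given below. The filter counts, dense-layer width, and the use of the Adam optimizer are illustrative assumptions; the exact layer values used in this work are those listed in Table 2.

```python
from tensorflow import keras
from tensorflow.keras import layers

def conv_stack(x, filters, n_convs):
    # n_convs Conv2D layers, each followed by LeakyReLU (0.25) and batch normalization.
    for _ in range(n_convs):
        x = layers.Conv2D(filters, kernel_size=3, padding="same")(x)
        x = layers.LeakyReLU(0.25)(x)
        x = layers.BatchNormalization()(x)
    return x

inputs = keras.Input(shape=(224, 224, 3))
x = conv_stack(inputs, 32, 4)        # 4C
x = layers.MaxPooling2D(2)(x)        # P
x = conv_stack(x, 64, 2)             # 2C
x = layers.MaxPooling2D(2)(x)        # P
x = conv_stack(x, 128, 2)            # 2C
x = layers.Flatten()(x)
x = layers.Dense(128)(x)             # FC (dense unit); width is an assumption
x = layers.LeakyReLU(0.25)(x)
x = layers.Dropout(0.25)(x)          # lambda = 0.25
outputs = layers.Dense(2, activation="softmax")(x)  # 0 = organic, 1 = recyclable

model = keras.Model(inputs, outputs)
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),  # eta = 0.001; Adam assumed
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```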

Table 2 Model: 4C-P-2C-P-2C

Defining the dimensionality of the input image is the first stage in developing the framework of a CNN architecture. High resolution leads to a rise in the size and duration of computations, which might overburden the processing units and their memory. The overall implementation flow is shown in Fig. 2, where the whole proposed process is divided into three stages. Input is applied at the first stage, where it passes through a layer of processes comprising Conv2D layers and MaxPooling. The result obtained from the first stage is fed as input to the second stage, where it passes through the FC layer, dropout, and SoftMax. Finally, the model classifies the input from the labelled data as recyclable waste (1) or organic waste (0).

To evaluate the effectiveness of the proposed approach, we introduce several indicators of performance on the problem scenario. We use the confusion matrix to determine accuracy, precision, recall, F1-score, MDR, and FDR [33].

Fig. 6 Flow of structure of VGG16 [34]

3.5 Architectures of models used

3.5.1 VGG16

The working of VGG16 can easily be understood; it broadly utilizes the CNN framework developed for ImageNet, a massive visual database project used in visual object recognition. The key distinguishing characteristic of VGG16 is that, rather than having a large number of hyper-parameters, the authors focused on 3x3 convolution filters with stride 1 and 'same' padding, together with max-pool stages using a 2x2 filter with stride 2. This convolution and max-pool layered arrangement is continued throughout the architecture. It also features two FC layers, followed by a SoftMax, as shown in Fig. 6. The number 16 in VGG16 reflects that it contains 16 layers that have weights [35,36,37,38].

3.5.2 VGG19

The VGG19 architecture is a modified variant of the VGG architecture with a total of 19 weight layers (16 convolutional elements and 3 FC elements), plus 5 max-pool elements and 1 SoftMax element. The VGG-19 architecture is employed like the pre-processing stage of a CNN, and its depth is greater than that of the conventional CNN framework. It has a dynamic structure, meaning variation is found between the multiple convolutional and non-linear activation tiers. The layered framework provides better picture characteristics, down-sampling with max-pooling, and ReLU as the activation function; pooling takes the peak value in the source region as the pooled value of that zone. The down-sampling phase is utilized to improve the anti-distortion capability of the network while keeping the sample's key properties and limiting the parameters [39,40,41,42,43,44]. The structure is shown in Fig. 7.

Fig. 7 Architectural structure of VGG19 [45]

3.5.3 MobileNetV2

MobileNetV2 is a type of CNN designed to run efficiently on mobile devices, having an inverted residual structure with bottleneck stages connected by residual links. The intermediate expansion layer uses lightweight depth-wise convolutions to filter features as a source of non-linearity. The overall MobileNetV2 architecture contains an initial convolution layer with 32 filters followed by 19 residual bottleneck layers, as shown in Fig. 8 [46,47,48,49].

Fig. 8 Flow of structure of MobileNetV2 [50]

3.5.4 DenseNet121

DenseNet is a CNN framework that focuses on making deep learning networks deeper while making them more efficient to train by employing shorter connections among layers. The DenseNet design modifies the standard CNN architecture by connecting each layer to every other layer, hence the name Densely Connected Convolutional Network. With L layers there are \(L(L+1)/2\) direct connections. The feature maps of all preceding layers are used as inputs for each layer, and its own feature maps are used as inputs for the following layers, as shown in Fig. 9 [51,52,53,54,55,56].

Fig. 9 Architectural structure of DenseNet121 [57]

3.5.5 EfficientNet

EfficientNet is a CNN framework and scaling approach that uses a compound coefficient to scale its various dimensions, such as depth, width, and resolution, uniformly and effectively. There are 237 layers in EfficientNet-B0 if all are counted, as shown in Fig. 10 [58,59,60].

Fig. 10 Architectural structure of EfficientNetB0 [61]

In Table 3, the characteristics of the pre-trained models are summarised through attributes such as size in megabytes, number of parameters in millions, depth, time per inference step (CPU) in milliseconds, and time per inference step (GPU) in milliseconds. VGG16, VGG19, MobileNetV2, DenseNet121, and EfficientNetB0 are used as pre-trained models.
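For the pre-trained comparisons, the backbones are available in `keras.applications`. The sketch below shows one plausible way to attach a two-class head to MobileNetV2; the frozen backbone, pooling head, and optimizer are illustrative assumptions, not the exact configuration used in the experiments.

```python
from tensorflow import keras
from tensorflow.keras import layers

backbone = keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
backbone.trainable = False                       # reuse ImageNet features

inputs = keras.Input(shape=(224, 224, 3))
x = keras.applications.mobilenet_v2.preprocess_input(inputs)
x = backbone(x, training=False)
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dropout(0.25)(x)
outputs = layers.Dense(2, activation="softmax")(x)   # organic vs recyclable

model = keras.Model(inputs, outputs)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```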

Table 3 Comparing Pre-Trained Models

3.6 Performance metric equations

To measure the effectiveness of our suggested architecture, accuracy, MDR and FDR have been considered. A confusion matrix assists in calculating the values listed below [62,63,64,65].

  • True Positive (TP): the architecture correctly predicts that the sample belongs to the positive class.

  • True Negative (TN): the architecture correctly predicts that the sample belongs to the negative class.

  • False Positive (FP): the architecture incorrectly predicts that the sample belongs to the positive class.

  • False Negative (FN): the architecture incorrectly predicts that the sample belongs to the negative class.

Using the parameters of confusion matrix, Accuracy, MDR and FDR are calculated as given in (3), (4) and (5):

$$\begin{aligned} \text {Accuracy} = \frac{TP +TN}{TP +TN + FP + FN} \end{aligned}$$
(3)
$$\begin{aligned} \text {MDR} = \frac{FN}{TP + FN} \end{aligned}$$
(4)
$$\begin{aligned} \text {FDR} = \frac{FP}{TP + FP} \end{aligned}$$
(5)
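The three metrics in (3)-(5) follow directly from the confusion-matrix counts; the short sketch below computes them, with the numeric values being illustrative only and not taken from the paper's results.

```python
def metrics_from_confusion(tp, tn, fp, fn):
    """Accuracy, MDR and FDR as defined in Eqs. (3)-(5)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    mdr = fn / (tp + fn)   # missed detection rate
    fdr = fp / (tp + fp)   # false detection rate
    return accuracy, mdr, fdr

# Illustrative counts only.
acc, mdr, fdr = metrics_from_confusion(tp=1100, tn=1200, fp=60, fn=50)
print(f"accuracy={acc:.4f}, MDR={mdr:.4f}, FDR={fdr:.4f}")
```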

4 Experimental analysis and results

The experimental design, as shown in Fig. 2, provided the guidelines for planning and carrying out the experiment, as well as interpreting the results. Following the data analysis, the findings are presented as appropriately and understandably as possible using tables and graphs.

4.1 Data preparation

A total of 25,077 images are collected from Kaggle [19]. The images belong to two classes: organic waste, with a total of 13,966 images, and recyclable waste, with a total of 11,111 images. The signal-to-noise ratio of the images is above 2.5, hence all images are used for experimentation. The dataset is divided into training and testing sets in the proportion of 70% and 30% of the whole respectively, which means 22,564 images of both classes are used for training and 2,513 images are used for testing the models. Each image is resized to 224 × 224. The image size can be experimented with in future work.
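A minimal sketch of this data preparation step (resizing to 224 × 224 with integer class labels) is given below. The directory names are assumptions about how the Kaggle download [19] might be organised on disk, not the paper's actual paths.

```python
import tensorflow as tf

IMG_SIZE = (224, 224)
BATCH_SIZE = 64

# Assumed layout: waste_data/TRAIN/{organic,recyclable}/... and likewise for TEST.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "waste_data/TRAIN",
    image_size=IMG_SIZE,        # every image is resized to 224 x 224
    batch_size=BATCH_SIZE,
    label_mode="int")           # 0 = organic, 1 = recyclable (alphabetical order)

test_ds = tf.keras.utils.image_dataset_from_directory(
    "waste_data/TEST",
    image_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    label_mode="int",
    shuffle=False)
```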

The network is trained for 9 minutes on MIRC (2 Intel Xeon processors with 256 GB RAM and a 32 GB Tesla V100), using Python 2.7.

4.2 Experimentation road map

The outcome is measured for the proposed CNN-based classification with LeakyReLU \(\alpha =0.25\) and batch size 64 as follows:

  1. Impact of network depth, learning rate \(\eta \) and different test sizes \( t= 30, 15\) along with epochs \(e= 50,100,200\) is evaluated on the proposed CNN.

  2. Accuracy of the proposed CNN with \(\lambda =0.25\) and \(\lambda = 0.1\), along with MDR and FDR, is evaluated.

  3. Accuracy of the proposed CNN without \(\lambda \), along with MDR and FDR, is evaluated.

  4. Comparative performance evaluation of VGG16, VGG19, MobileNetV2, DenseNet121, and EfficientNet-B0 against the proposed work is explored.

  5. Comparative performance evaluation of the existing literature against the proposed work is explored.

The outcome of the work is measured to check a) the accuracy of the results and b) the reduction in MDR and FDR. The system producing the best trade-off between MDR and FDR together with better accuracy is considered successful for the dataset.

4.3 Impact of network

The two architectural combinations used in the proposed CNNs are 4C-P-2C-P-2C and 2C-P-2C-P-2C-P-2C. Upon experimentation with different epochs = 50, 100, 200 and test sizes = 30, 15, it is evident that from epochs 50 to 100 the accuracy increases irrespective of layer depth and the presence or absence of \(\lambda \). Whereas, with a decrease in test size from 30 to 15, the accuracy is compromised with increasing epochs, irrespective of the layer depth of the networks or \(\lambda \). Hence, after observing the accuracy of the outcomes, a test size of 30 is finalised.

Now, observing the behaviour of epochs with test size 30 shows that as epochs increase from 50 to 100 the accuracy increases, but it then remains stable with further increases in epochs. With this, the maximum and stable accuracy, observed at epochs 100 and test size 30, is obtained with network layer \({4C-P-2C-P-2C}\) as 93.28% in the presence of \(\lambda =0.25\). The corresponding MDR and FDR are the lowest among all, with values of 2.6% and 4.5% respectively.

Table 4 Comparative accuracy (in %) of proposed CNN with epochs =50, 100, 200, test size = 30, 15 and \(\lambda =[none, 0.1, 0.25]\) with fixed values \(\eta =0.001\), \(\alpha = 0.25\), batch size= 64

4.4 Impact of dropout

Our proposed CNN model achieved an accuracy of 88.54% with \(\lambda = 0.25\) and 85.4% without \(\lambda \) at epochs = 50 and test size = 30, showing that regularization is required for a better outcome. Later, with hyper-parameter tuning at epochs = 100 and test size = 30, the values \(\lambda = [none, 0.1, 0.25]\) are used, and the outcome confirms that regularization at the classifier level is essential; the suitable value for the given data characteristics has to be found experimentally. The value \(\lambda =0.25\) produces the best results with network layer \({4C-P-2C-P-2C}\), epochs = 100, test size = 30, \(\eta =0.001\), batch size = 64, and \(\alpha =0.25\). Table 4 shows the comparative accuracy (in %) of the proposed CNN with different values of epochs, test size, and \(\lambda \), along with fixed values of batch size, \(\eta \) and \(\alpha \).

4.5 Comparison with pre-trained models

As shown in the confusion matrices of all models used in the current work in Fig. 11, the accuracies of the remaining models, VGG16, VGG19, MobileNetV2, DenseNet121, and EfficientNet-B0, on this dataset are 90.60%, 89.27%, 92.86%, 90.73%, and 49.43% respectively. These are compared with the proposed work in Table 5, with the visual comparison shown in Fig. 12. The comparison of the impact of the presence vs absence of \(\lambda \) is shown in Table 6. The performance is certainly better with \(\lambda =0.25\), given that the network layer architecture is the same \({4C-P-2C-P-2C}\). The graphical comparison is shown in Fig. 13.

Fig. 11 Confusion matrix of all the models

Table 5 Performance metrics comparison of pre-trained models and proposed CNN

A comparative study of MDR and FDR for the different models is performed. Our proposed model has an MDR of 12.37% and an FDR of 13.37% in the absence of \(\lambda \), and 2.6% and 4.5% respectively with \(\lambda =0.25\). Our proposed CNN with hyper-parameter tuning performs considerably better than the other considered models, as shown in Tables 5 and 6.

4.6 Challenges in the experimentation and discussion on the performance

Five transfer learning models are deployed on the dataset. Accuracy for the models (VGG16, VGG19, MobileNetV2, DenseNet121, and the proposed CNN) is 91%, 89%, 93%, 91%, and 89% respectively. MobileNetV2 achieved the best result of all the pre-trained models, with the optimal trade-off between MDR and FDR. The Top-1 accuracy is found to be poor for EfficientNetB0. The comparison can also be made on the basis of model size and inference time: EfficientNetB0 is larger than MobileNetV2, and a solution within the EfficientNet family could be EfficientNet-Lite. Where MobileNet is better for real-time on-device classification, EfficientNet is better suited to large datasets; if the dataset is increased in the future, this model may work better.

Another major observation is the imbalanced MDR-FDR outcome. Although DenseNet121 is competitive in accuracy with MobileNetV2, it performs poorly in terms of FDR. Fine-tuning is required in the proposed CNN to achieve a better MDR-FDR outcome.

Fig. 12 Performance of the proposed work

Table 6 Impact of \(\lambda \) in proposed CNN
Fig. 13 Impact of dropout on proposed CNN

Table 7 Comparative analysis of the proposed work with existing techniques using different datasets

Additionally, the quality of the dataset is good; the SNR value is above 2.5, so the generality of the algorithm does not extend to future poor-quality images with structured or unstructured noise, background distortion, etc. Also, dataset augmentation is unexplored in the current work and can be explored in the future. In addition to these challenges, one major challenge is limited resource availability: the large dataset requires powerful GPUs, which in our case are available but may be a barrier for other authors. The quality and quantity of images thus present a barrier to further research.

Lastly, the dataset in this context needs to be more diverse, with complex backgrounds and complex features, to train the models effectively.

The potential of deep learning models is hard to predict because of their broad application area. As a result, deep learning models require additional interpretation and analysis, and we expect to see more novel models for these applications and further research findings soon.

4.7 Comparison with existing literature

Table 7 provides an extensive comparison of existing techniques with the proposed work. The earliest works show that the datasets were small in size, so accuracy was compromised [20, 66]. Later, with the use of TrashNet and other government-acquired datasets, the dataset size N increased along with Z. The use of benchmark pre-trained models increased accuracy but offered less flexibility in terms of hyper-parameter tuning and the generality of the algorithm for future changes in image quality [1, 31, 67,68,69,70]. The present work provides flexibility in terms of hyper-parameter tuning, model generality, and inter-class variability. Despite the multiple limitations discussed in Section 4.6, the model has performed better than existing techniques. Since the dataset used here is shared only with [19], the direct comparison is made only with that work: [19] used image augmentation to increase the feature set, a CNN to fuse features, and a ridge-regressor method for feature selection, which makes it a compound and constrained method. Additionally, the image quality required for future additions is not explored there, and MDR/FDR are missing from its performance outcomes. Yet, the current work provides a competitive outcome to [19].

5 Conclusion

An organic and recyclable waste image classification method is proposed in the current paper. The classification is based on an improved CNN which uses deep learning to separate household waste images destined for landfills. The model was tested using a dataset of 25,077 images and produced an accuracy of 93.28%, along with an MDR of 2.6% and an FDR of 4.5%. The proposed CNN outcome is compared with built-in pre-trained models; MobileNetV2, alongside the fine-tuned CNN, performs best among the pre-trained models, with an accuracy of 92%. In reality, images come in different forms, with varied backgrounds, types of waste, and lighting conditions. Due to inadequate inter-class variability, future work may improve the outcome of the proposed work with better generalisation across diverse scenarios.