1 Introduction

The eye is a vital organ for vision, one of the most basic senses with which living beings perceive their environment. Deterioration in the health of this organ can lead to negative outcomes, from a decrease in the ability to see to its complete loss. One of the riskiest diseases for the eye is glaucoma. Glaucoma represents a group of eye diseases, not a single disease. It can cause eye damage ranging from partial to complete loss of vision. The main cause of glaucoma is damage, over time, to the optic nerves located behind the eye. Because its symptoms are not very obvious and the disease progresses very slowly, people may not be aware of the disorder. In the absence of treatment, vision can be completely lost. Although glaucoma-related disorders are stated to occur more often in old age, it is also possible to encounter them at an earlier age due to heredity. Early diagnosis of this disease requires a comprehensive and detailed examination of the eye.

Computer-aided systems have been developed more widely in recent years for the detection of eye diseases and the analysis of eye health. The most important advantage of these systems is that they assist health-care professionals in the final diagnosis and accelerate the decision process. Glaucoma is diagnosed by analyzing fundus images. In the analysis of these images, machine learning approaches are used alongside statistical analysis. The aim of machine learning here is to model the decision-making process of an expert over existing images. Thus, the presence of glaucoma can be assessed computationally based on the available image data. In recent years, machine learning-supported systems have become more and more widespread in studies on glaucoma.

Mahdavi and Weiss [1] investigated the measurement of retinal ganglion cells, the most critical components in the human eye, with optical coherence tomography (OCT), reviewing related studies in the literature. They concluded that AI-assisted glaucoma diagnosis is still a very new field and that early research has yielded remarkable results. Lee et al. [2] analyzed the thickness of the retinal nerve fiber layer (RNFL) with the ResNet34 deep learning network using a dataset of fundus images. They calculated the risk of developing glaucoma in eyes suspected of having the disease based on RNFL thickness. Wang et al. [3] aimed to present an advanced method that automatically extracts clinical concepts when examining intraocular pressure, visual acuity, and medication results in cataract and glaucoma surgeries using electronic health records. They used a rule-based classifier in their method. Wang et al. [4] aimed to predict glaucoma progression requiring surgery by analyzing electronic health records with deep learning methods, and reported that they successfully predicted glaucoma requiring surgery. Hashimoto et al. [5] used deep learning methods to estimate the visual field from spectral-domain optical coherence tomography images. They compared CNN results with SVM and MLR (multiple linear regression) results and reported that the prediction accuracy of the deep learning method was quite satisfactory. Iqbal et al. [6] discussed the application of deep convolutional neural networks (DCNN) in medical image analysis, specifically for interpreting endoscopic images of the gastrointestinal tract. Their experimental results demonstrated the superiority of the proposed algorithm over recent techniques, showcasing its potential to assist gastroenterologists in classifying gastrointestinal abnormalities and saving time and effort in the diagnostic process. Thakur et al. 
[7] evaluated the accuracy of a deep learning approach to predict glaucoma from fundus images obtained in the years before the disease was diagnosed. Based on their findings, they suggested that deep learning approaches achieve a reasonable level of accuracy. Chayan et al. [8] proposed a transfer learning approach to classify glaucoma and compared it with pre-trained convolutional neural network models. They noted that the model’s assessment of glaucoma is like a black box, so it is not possible to identify the key features behind the glaucoma prediction. Li et al. [9] generated color fundus images and visual field data using longitudinal cohorts. In their experiments, they used artificial intelligence models to predict the presence of glaucoma, the future incidence of glaucoma, and glaucoma progression. Iqbal et al. [10] discussed the use of deep learning in medicine, particularly for diagnosing skin lesions. Their proposed approach shows promise for automating and improving the efficiency of skin lesion classification, potentially assisting dermatologists and saving time and effort.

Mirzania et al. [11] conducted a systematic review of deep learning applications for glaucoma detection. They assessed the risk of bias in the analyzed research around four criteria: dataset, variable identification, algorithm, and output. The ability of deep learning algorithms to detect additional features provides important inputs for glaucoma diagnosis and progression. The study stated that data quality heavily affects the performance of deep learning systems. Hemelings et al. [12] proposed an active and transfer learning-based convolutional neural network model for the accurate prediction of glaucoma from color fundus images. They used the ResNet-50 CNN model for transfer learning with reference to optic disk centers, trained the model parameters with the Adam optimizer at a constant learning rate, and used binary cross-entropy as the cost function. Oh et al. [13] developed a machine learning-based system that both recognizes glaucoma and explains a given prediction. In their study, they deployed support vector machine, C5.0, Random Forest, and XGBoost classifiers on five features and claimed that their work was the first to address the concept of “explainable artificial intelligence” in eye diseases. Kamal et al. [14] focused on explainable artificial intelligence in glaucoma predictive analysis to understand risk factors in treatment planning. In the study, fuzzy membership functions, attributes, and relations between instances were modeled with the ANFIS architecture. They also discussed multi-pattern learning using fundus images and medical records. Huang et al. [15] used probabilistic deep learning to detect glaucoma from multimodal data. They modeled the glaucoma detection process using a probabilistic density layer in three different probabilistic CNN models. 
According to the results of the study, the hybrid probabilistic AI model was stated to provide higher accuracy compared to the individual probabilistic AI models. Some research and studies on glaucoma are summarized in Table 1.

Table 1 Comparison of some studies in the literature

Thompson et al. [16] stated that glaucoma can lead to progressive and irreversible visual impairment. Deep learning algorithms were trained with OCT (optical coherence tomography) images to improve the detection of glaucoma damage on fundus images. The study concluded that deep learning models showed satisfactory results in detecting glaucoma-related microstructural damage and disease progression. Girard and Schmetterer [17] conducted a study on the current status and future prospects of artificial intelligence and deep learning in glaucoma. In their work, they analyzed in depth a number of topics about the role of artificial intelligence in optic nerve analysis, glaucoma diagnosis, glaucoma prediction, cost-effective imaging, and genetics in glaucoma. It should be stated that clinical professionals expect a large number of AI tools to support them in glaucoma analysis in the near future. Tarcoveanu et al. [18] reviewed classification algorithms used to predict glaucoma progression. In this context, Random Tree, C4.5, NNGE (Non-Nested Generalized Exemplars), Random Forest, MLP (Multi-Layer Perceptron), SVM, and KNN classifiers were used. The best results were reported to be obtained with Random Forest and MLP. Madadi et al. [19] introduced domain adaptation-based deep learning models for the prediction and diagnosis of glaucoma. Their method learns domain-invariant and domain-specific representations to extract general and specific attributes, and they suggested that it is generalizable and provides high diagnostic accuracy. Nunez et al. [20] conducted a detailed review of state-of-the-art studies and new trends in artificial intelligence systems for glaucoma recognition and monitoring. It was stated that five main features of such systems will accelerate adoption in clinical settings. 
These features are testability, compatibility, relative advantage, observability, and complexity. Singh et al. [21] stated that glaucoma can cause irreversible vision loss by damaging the optic nerves. They used pre-trained CNN models in their work, then refined the hyperparameter settings with a transfer learning approach and fine-tuning. Akter et al. [22] conducted a study on the diagnosis of glaucoma by multi-feature analysis and deep learning methods. They performed multi-feature analysis and introduced a new cross-sectional optic nerve head (ONH) feature to facilitate existing glaucoma recognition processes, using the VGG16 and ResNet18 CNN models for deep learning. They noted the potential for improved results by combining the newly defined 3D cross-sectional ONH feature with features obtained from the 2D plane. Pham et al. [23] stated that glaucoma is a disease that causes vision loss worldwide, especially in elderly people. In this context, they proposed a multimodal deep learning approach for future visual field prediction in glaucoma patients, also using features from horizontal and vertical tomogram images.

2 Motivation

Glaucoma is a critical eye disease that usually occurs in old age, can cause blindness, and can be detected at a young age only through detailed analyses. Early diagnosis of the disease is very important for successful treatment. Considering the increasing average life expectancy, this disorder, which usually occurs in old age, will lead to the loss of vision, one of the five basic senses, and therefore to a decrease in living standards. The biggest risk factor in this disease is that it progresses slowly over time without showing any external symptoms. Glaucoma is detected by a specialist doctor’s analysis of retinal images. With computer-aided systems, experts can diagnose glaucoma faster. Computer-assisted machine learning systems can also make important contributions to early diagnosis in the detection of glaucoma and other eye diseases. This study focuses on the methods and techniques used in machine learning-based systems for the early diagnosis of glaucoma. For this purpose, an activation function that can be used in such systems has been developed.

3 Material and methods

3.1 Retinal database (ACRIMA and HRF)

This section describes the datasets used in the experimental evaluations to examine the effect of activation and optimizer functions on deep learning models.

The ACRIMA dataset contains 705 fundus images (396 glaucomatous and 309 normal). These were obtained from glaucomatous and healthy patients at the FISABIO Oftalmologica Médica in Valencia, Spain, with their prior consent and in compliance with the ethical guidelines outlined in the 1964 Declaration of Helsinki. Glaucoma experts with years of expertise annotated every image in the ACRIMA database; the images were cropped around the optic disk and renamed [24]. The HRF dataset includes 101 glaucoma retinal images and 300 non-glaucoma fundus images with pixel resolutions of 3504 × 2336 [25]. Figure 2a shows a fundus scan of a normal eye, while Fig. 2b shows a fundus scan of a glaucoma-affected eye; both images were taken from the ACRIMA dataset. Figure 3a shows a normal fundus scan, while Fig. 3b displays a fundus scan with glaucoma, taken from the HRF dataset. The optic disk of the eye was the focus of almost all fundus imaging. These photos were gathered with the help of IMAGEnet, Topcon (TRC), and the annotations of medical professionals. The CIFAR-10 dataset consists of 60,000 32 × 32 color images in 10 classes, with 6000 images per class; there are 50,000 training images and 10,000 test images.
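The class distributions above matter when interpreting accuracy figures, since both datasets are imbalanced. A small sketch (counts taken from the dataset descriptions above; the helper name is hypothetical) of the class balance:

```python
# Class counts as reported above for the two fundus datasets.
acrima = {"glaucoma": 396, "normal": 309}   # 705 images in total
hrf = {"glaucoma": 101, "normal": 300}      # as cited from [25]

def balance(counts):
    """Return each class's share of the dataset, rounded to 3 decimals."""
    total = sum(counts.values())
    return {label: round(n / total, 3) for label, n in counts.items()}

print(balance(acrima))  # glaucoma ≈ 0.562, normal ≈ 0.438
print(balance(hrf))     # HRF is noticeably more imbalanced
```

The roughly 56/44 split of ACRIMA is close to balanced, whereas HRF is about 25/75, which makes the f-score a more informative metric than raw accuracy there.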

The normal and glaucoma-affected eye structures are shown in Fig. 1.

Fig. 1
figure 1

Human eye structure for a normal and b glaucoma [26]

When Figs. 2 and 3 are examined, the differences between fundus scans of glaucoma-affected and normal eyes are evident in the effects of glaucoma on the optic disk and retinal structures. In glaucomatous eyes, optic disk fading and an increase in the cup-to-disk ratio are observed, indicative of nerve damage. In addition, thinning of the retinal nerve fiber layer (RNFL), localized defects of this thinning, and microbleeds around the optic disk are characteristic signs of glaucoma. As glaucoma progresses, additional findings such as notching at the edge of the optic disk and the appearance of abnormal blood vessels or bypass channels may also occur. These changes are recognizable on fundus scans and are critical in the diagnosis and monitoring of glaucoma. Fading of the optic disk, changes in the cup-to-disk ratio, and thinning and defects in the RNFL are the key findings to look for in fundus images, and they form the basis for early diagnosis and management of the disease.

Fig. 2
figure 2

Fundus images a normal eye and b glaucoma-affected eye for ACRIMA database [24]

Fig. 3
figure 3

Fundus images a normal eye and b glaucoma-affected eye for high-resolution fundus (HRF) database [25]

3.2 Convolutional neural network

Deep learning, a branch of machine learning with a multi-layered structure, has been widely studied [27]. Convolutional neural networks (CNNs) are among the most often used deep learning models. By combining feature extraction and classification, CNNs, feedforward neural networks inspired by biological processes, are able to discern patterns directly from image pixels or other data [27,28,29]. Iqbal et al. [30] compared eight algorithms and fine-tuned their settings before running final tests, concluding that AdaGrad outperformed the others in terms of accuracy, training time, and memory usage.

In the experimental study, two distinct CNN architectures with three and six layers were tested. In the three-layer CNN architecture, the feature map is constructed with 64 filters of size 3 × 3. The size of the feature map is then decreased by the max-pooling layer, which uses a 2 × 2 filter. In the six-layer CNN architecture, the first convolutional layer has 64 filters of size 3 × 3, the second has 32 filters of size 3 × 3, the third has 16 filters of size 3 × 3, and the fourth has eight filters of size 3 × 3. As in the three-layer architecture, a 2 × 2 filter size was chosen for the max-pooling layers. The collected data are prepared for the fully connected layer by the flatten layer. A fully connected layer of 512 neurons then connects to every neuron in the preceding layer. Finally, the SoftMax layer is applied to produce the classification output.
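The layer progression described above can be sketched as a shape calculation. Note the 128 × 128 input size, 'valid' convolution padding, and a pooling layer after every convolution are assumptions made here for illustration; the text does not state them:

```python
# Sketch: spatial dimensions through the six-layer architecture described above.
def conv_out(size, kernel=3):
    """Output size of a 'valid' convolution with stride 1 (assumption)."""
    return size - kernel + 1

def pool_out(size, kernel=2):
    """Output size of non-overlapping 2x2 max pooling."""
    return size // kernel

size, channels = 128, 3   # hypothetical RGB input resolution
for filters in (64, 32, 16, 8):
    size = pool_out(conv_out(size))
    channels = filters
    print(f"conv({filters}, 3x3) + maxpool(2x2) -> {size}x{size}x{channels}")

flat = size * size * channels
print(f"flatten -> {flat} -> dense(512) -> softmax(2)")
```

Under these assumptions the four conv/pool stages shrink a 128 × 128 input to 6 × 6 × 8 = 288 features before the 512-neuron dense layer, which illustrates why such small models are cheap to train.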

3.3 Activation functions

The performance of CNN architectures is enhanced by newly developed state-of-the-art activation functions. Since nonlinear activation functions are easier to diversify, they have been developed and proposed frequently in recent years [28]. A function is nonlinear if it has a variable slope: if the derivative of the function varies, the function is nonlinear, while if the derivative is constant or zero, it is linear. Nonlinear activation functions are further divided into two types, monotonic and non-monotonic. If a function continuously increases or decreases over its input interval, it is monotonic; if the increase or decrease is not continuous, it is non-monotonic. This section presents the basic types of nonlinear monotonic and nonlinear non-monotonic activation functions.
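The monotonicity distinction above can be checked numerically: a function is monotonic on a sampled interval if its finite differences never change sign. A minimal sketch (the interval and step count are arbitrary choices):

```python
import math

def is_monotonic(f, lo=-6.0, hi=6.0, steps=2000):
    """Numerically test monotonicity of f on [lo, hi] via finite differences."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    ys = [f(x) for x in xs]
    diffs = [b - a for a, b in zip(ys, ys[1:])]
    return all(d >= 0 for d in diffs) or all(d <= 0 for d in diffs)

sigmoid = lambda x: 1.0 / (1.0 + math.exp(-x))
swish = lambda x: x * sigmoid(x)

print(is_monotonic(math.tanh))  # True: tanh only increases
print(is_monotonic(swish))      # False: Swish dips below zero for x < 0
```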

Since the logistic function takes a value between 0 and 1, it is a basic nonlinear function suitable for two-class classification problems. The monotonic hyperbolic tangent function is zero-centered and is therefore used for binary classification where the logistic function is insufficient. However, as its input moves beyond roughly (− 2, 2), the hyperbolic tangent function saturates and encounters the vanishing gradient problem, reducing the performance of the architecture through a significant slowdown in learning.

Presented in recent studies as a piecewise and non-monotonic nonlinear function, SupEx [31] both eliminates the vanishing gradient problem and is a zero-centered function. However, since non-piecewise functions can be understood more clearly in terms of mathematical expression, they are often recommended in experimental studies. For example, Swish [32], Mish [33], Logish [34], and Smish [35] are non-piecewise nonlinear non-monotonic activation functions. These functions can be used in basic form as well as with trainable alpha and beta parameters that change their nonlinearity and slope.

ReLU ReLU is a nonlinear activation function used in multi-layer or deep neural networks. The formula for this function is \(f\left( x \right) = \max \left( {0,x} \right)\), where \(x\) is the input value; ReLU outputs the maximum of zero and the input value. When the input value is positive, the output equals the input value, and when the input value is negative, the output equals zero.

LeakyReLU Similar to ReLU, LeakyReLU has a small slope for negative values rather than a zero slope. This slope coefficient, also called alpha, is a small constant. The network learns the ideal values for weights and other parameters during training, but α stays fixed. LeakyReLU can be expressed as \(f\left( x \right) = \max \left( {\alpha x,x} \right)\), where α represents the slope coefficient and x is the activation function’s input. If \(x\) is positive, the output of the function is \(x\); if \(x\) is negative, the output is \(\alpha x\) instead of \(0\).

Swish The Swish activation function is a straightforward mathematical formula applied in machine learning to enhance the accuracy of different models, including neural networks. The Swish function, defined as \(f\left( x \right) = x \cdot {\text{sigmoid}}\left( {\beta x} \right)\), is quite simple, but its remarkable qualities and ability to increase learning model accuracy belie that simplicity. The primary goal of the Swish function is to bridge the gap between more complex activation functions and high-performance ones such as ReLU.

Logish This activation function, defined as \(f\left( x \right) = x\cdot\ln \left( {1 + {\text{sigmoid}}\left( x \right)} \right)\), is nonlinear and non-monotonic. In Logish, the variable \(x\) is used to guarantee that the negative output has a strong regularization effect after the logarithmic operation is used to reduce the numerical range of \({\text{sigmoid}}\left( x \right) + 1\).

Mish The definition of this activation function, which was first presented in a research paper in 2019, is \(f\left( x \right) = x \cdot \tanh \left( {{\text{Softplus}}\left( x \right)} \right) = x \cdot \tanh \left( {\ln \left( {1 + e^{x} } \right)} \right)\). The input signal from the preceding neuron is represented by the first part of the formula, x. Two activation functions (hyperbolic tangent and Softplus) are implemented in the second part of the formula, \(\tanh \left( {{\text{Softplus}}\left( x \right)} \right)\). The Softplus activation function is defined as \({\text{Softplus}}\left( x \right) = \ln \left( {1 + e^{x} } \right)\). Mish is not monotonic, which sets it apart from other activation functions. Additionally, it has a self-regulation feature that enhances generalization and helps avoid overfitting. In essence, this means that it facilitates better learning by the neural network, preventing it from memorizing patterns in place of learning how to classify data.

Smish The Smish is a mathematical expression employed in deep learning as an activation function. The expression is defined as \(f\left( x \right) = x \cdot \tanh \left( {\ln \left( {1 + \sigma \left( x \right)} \right)} \right)\), where the sigmoid function is indicated by σ(x). The Smish activation function uses the sigmoid and hyperbolic tangent functions to calculate its output for an input x. Moreover, \(f\left( x \right) = \alpha x \cdot \tanh \left( {\ln \left( {1 + \sigma \left( {\beta x} \right)} \right)} \right)\) is the parameterized form of the Smish function proposed by researchers. Adding the two parameters (α and β) in this parameterized version enhances the function’s performance and tunes its behavior.
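The activation functions above can be sketched in scalar form as follows. The definitions follow the formulas given in the text; a deep learning framework would use tensor operations instead of scalar math:

```python
import math

def sigmoid(x): return 1.0 / (1.0 + math.exp(-x))
def softplus(x): return math.log1p(math.exp(x))  # ln(1 + e^x)

def relu(x): return max(0.0, x)
def leaky_relu(x, alpha=0.01): return x if x > 0 else alpha * x
def swish(x, beta=1.0): return x * sigmoid(beta * x)
def mish(x): return x * math.tanh(softplus(x))
def logish(x): return x * math.log1p(sigmoid(x))
def smish(x): return x * math.tanh(math.log1p(sigmoid(x)))

# All of these map zero to zero; the smooth ones approach the identity
# (or a scaled identity) for large positive inputs.
for f in (relu, swish, mish, logish, smish):
    print(f.__name__, f(0.0), round(f(5.0), 3))
```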

3.4 Proposed activation function: Trish

One of the most important criteria in activation functions is their ability to introduce nonlinearity to the neural network. This nonlinearity is crucial for the network to learn and represent complex patterns and relationships within the data. In addition, differentiability, output range, computational efficiency, saturation properties, smoothness and monotonicity, and bias/variance trade-off are important criteria for activation function selection. Inspired by these criteria, which emphasize the importance of the activation function, an approach that can provide a balance between performance and cost is adopted.

The activation function proposed in the study was developed based on the Smish function given in Eq. (1).

$$f\left( x \right) = x\tanh \left( {\ln \left( {1 + {\text{sigmoid}} \left( x \right)} \right)} \right)$$
(1)

The proposed function is obtained by swapping the tanh and sigmoid components of the Smish function, hence the name Trish. The basic form of the function is given in Eq. (2), and the trainable parameterized form is given in Eq. (3).

$$f\left( x \right) = x{\text{sigmoid}} \left( {\ln \left( {1 + \tanh \left( x \right)} \right)} \right)$$
(2)
$$f\left( x \right) = \alpha x{\text{sigmoid}} \left( {\ln \left( {1 + \tanh \left( {\beta x} \right)} \right)} \right)$$
(3)

In the equations, the parameters \(\alpha\) and \(\beta\) refer to slope coefficients: the positive region is controlled by \(\alpha\), and the negative region is controlled by \(\beta\). The \(x\) in the equations denotes the input data. The implementation is available at https://github.com/serhatklc/TrishAF

In activation functions with trainable parameters, the values of \(\alpha\) and \(\beta\) allow the model to be adaptively tuned to the characteristics of the data during training, which contributes to better generalization performance and improved gradient flow in deep networks. Moreover, thanks to \(\alpha\) and \(\beta\), the activation function can produce nonzero values for negative inputs, which increases the learning capacity of the model and prevents information loss from negative inputs. These properties make \(\alpha\) and \(\beta\) critical factors that directly affect the performance of the model, especially in complex and deep learning models.
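A minimal scalar sketch of Trish as defined in Eqs. (2) and (3). The guard for large negative inputs is a numerical detail added here: in floating point, tanh(βx) rounds to exactly −1, where ln(1 + tanh(βx)) is undefined, while the function's limit there is 0:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def trish(x, alpha=1.0, beta=1.0):
    """Trish, Eq. (3); alpha = beta = 1 recovers the basic form, Eq. (2)."""
    t = math.tanh(beta * x)
    if t <= -1.0:          # tanh underflowed to -1: return the limit value
        return 0.0
    return alpha * x * sigmoid(math.log1p(t))

print(trish(0.0))              # 0.0: zero passes through unchanged
print(round(trish(2.0), 4))    # positive inputs stay positive
print(round(trish(-2.0), 4))   # small negative outputs for negative inputs
```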

Equation (4) is obtained when the basic function is simplified.

$$f\left( x \right) = x\left( {\left( {\tanh \left( x \right) + 1} \right)^{ - 1} + 1} \right)^{ - 1}$$
(4)

In this form, it is clear that the function is differentiable. Differentiating it yields Eq. (5).

$$f^{\prime} \left( x \right) = \frac{\tanh \left( x \right) + 1}{{\tanh \left( x \right) + 2}} + \frac{{x{\text{sech}}^{2} \left( x \right)}}{{\left( {\tanh \left( x \right) + 2} \right)^{2} }}$$
(5)
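The algebra above can be verified numerically: Eq. (4) should agree with Eq. (2) pointwise, and Eq. (5) should match a central finite-difference estimate of the derivative. A quick sketch:

```python
import math

def f_basic(x):
    """Eq. (2): x * sigmoid(ln(1 + tanh x))."""
    return x / (1.0 + math.exp(-math.log1p(math.tanh(x))))

def f_simplified(x):
    """Eq. (4): x * ((tanh x + 1)^-1 + 1)^-1."""
    return x * (1.0 / (math.tanh(x) + 1.0) + 1.0) ** -1

def f_prime(x):
    """Eq. (5), using sech(x) = 1 / cosh(x)."""
    t = math.tanh(x)
    return (t + 1.0) / (t + 2.0) + x * (1.0 / math.cosh(x)) ** 2 / (t + 2.0) ** 2

for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
    assert abs(f_basic(x) - f_simplified(x)) < 1e-12
    h = 1e-6
    fd = (f_simplified(x + h) - f_simplified(x - h)) / (2 * h)
    assert abs(f_prime(x) - fd) < 1e-5
print("Eq. (4) matches Eq. (2); Eq. (5) matches a finite-difference check.")
```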

Graphs of Trish’s basic equation and its derivative are given in Fig. 4.

Fig. 4
figure 4

Graph of proposed Trish activation function and its derivative

Graphs are given in Figs. 5 and 6 to examine the effects of Trish’s trainable parameters.

Fig. 5
figure 5

Graph of proposed Trish activation function with changing \(\alpha\)

Fig. 6
figure 6

Graph of proposed Trish activation function with changing \(\beta\)

The slope values of α and β in Figs. 5 and 6 are kept between 0.1 and 1 because this range balances the learning process and the generalization capability of the model. These values help the model adapt to different data distributions, achieving both fast learning and high generalization performance. The 0.1–1 range reflects practical experience and experimental validation; it prevents the model from overreacting and gives good results in a wide range of problems. Tuning these parameters is therefore key to improving the model’s performance in a balanced way and adapting it to the characteristics of the dataset.

Trish has increasing and decreasing monotonic intervals, so it is non-monotonic. Trish also has a variable gradient, which indicates nonlinearity. In addition, since the proposed function satisfies neither the odd nor the even condition, it is non-symmetric.

4 Experiments and observations

In the study, a novel non-monotonic activation function named Trish is proposed to improve the accuracy of deep learning architectures. In addition, experimental analyses were carried out to examine the effect of optimizers on deep learning architectures. Experiments were performed on three datasets: high-resolution fundus (HRF), ACRIMA, and CIFAR-10. Three-layer and six-layer deep learning architectures were built using SGD, Adam, RmsProp, AdaDelta, AdaGrad, Adamax, and Nadam as optimizers, and ReLU, LReLU, Mish, Swish, Smish, Logish, and the proposed Trish as activation functions. The CNN models used in the study were inspired by the simple models used in the literature for testing purposes, which usually consist of 2–8 layers. The main motivation is to use system resources more efficiently given the high number of experiments. This approach is also important for seeing to what extent acceptable success levels can be achieved with models of minimal complexity.

Glaucoma detection was performed with CNN models trained on the HRF and ACRIMA datasets. In addition, experimental evaluations were made on the CIFAR-10 dataset to further support the proposed activation function. Each experiment was repeated five times. Tables 2 and 3 present the experimental evaluations on the ACRIMA and HRF datasets, made to examine the effect of the proposed activation function’s parameters on deep learning. In the experimental studies, each model was trained for 50 epochs with a learning rate of 0.001 and a batch size of 64. The proposed model is presented in Fig. 7.
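Assuming the full cross-product of optimizers, activation functions, architectures, datasets, and repeats described above was run (the text does not state this explicitly), the experiment count is substantial, which motivates the use of low-complexity models:

```python
# Sketch of the experimental grid implied by the setup above.
optimizers = ["SGD", "Adam", "RmsProp", "AdaDelta", "AdaGrad", "Adamax", "Nadam"]
activations = ["ReLU", "LReLU", "Mish", "Swish", "Smish", "Logish", "Trish"]
architectures = ["3-layer CNN", "6-layer CNN"]
datasets = ["ACRIMA", "HRF", "CIFAR-10"]
repeats = 5

runs = (len(optimizers) * len(activations) * len(architectures)
        * len(datasets) * repeats)
print(runs)  # 7 * 7 * 2 * 3 * 5 = 1470 training runs
```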

Table 2 The effect of single and double parameters on the accuracy results of the activation function on the ACRIMA dataset
Table 3 The effect of single and double parameters on the accuracy results of the activation function on the HRF dataset
Fig. 7
figure 7

Flowchart of the proposed model

Tables 2 and 3 present single- and double-parameter experimental evaluations performed on the ACRIMA and HRF datasets using the Trish activation function. When Table 2 is examined, the highest performance result, 0.9750, was obtained with the Adamax optimizer and the single-parameter Trish activation function with slope parameter α = 1. In addition, the highest performance result of the two-parameter activation function, 0.9642, was obtained with the Adam optimizer and the parameters α = 1 and β = 0.1. Table 3 compares the effect of single and double parameters on the accuracy of the activation function on the HRF dataset using various optimization algorithms (SGD, RmsProp, Adam, AdaDelta, AdaGrad, Adamax, and Nadam). In the single-parameter case, the accuracy rate generally increases with increasing parameter value; the highest accuracy rates were generally obtained using Nadam and Adamax. In the double-parameter case, decreasing parameter values from 1.00 to 0.10 generally increased the accuracy rates, with the highest accuracy rates obtained at values between 1.00 and 0.10. This shows that the use of double parameters can give better results than a single parameter in certain situations. Among the optimization algorithms, Adam, Nadam, and Adamax generally provided higher accuracy rates, while the performance of SGD and RmsProp remained relatively low.

It was observed that the single- and double-parameter configurations gave lower performance results than the proposed Trish activation function in its non-parametric form. Therefore, all experimental results obtained using the non-parametric Trish activation function are presented in Table 4. In the experimental evaluations, only the LeakyReLU activation function was used with a parameter value of 0.9; the other activation functions were used non-parametrically. Since the best accuracy of the proposed activation function occurs in the non-parametric structure, α and β were set equal to 1 in the experiments.

Table 4 Three-layer CNN experiments on ACRIMA dataset

The results of the experiments performed for the detection of glaucoma on the ACRIMA dataset using the three-layer CNN architecture are given in Table 4. The experiments examined how the optimizer and activation functions affect the accuracy. The best val-loss, val-acc, and val-fscore values were obtained with the proposed Trish activation function in all optimizers. The highest performance was achieved with the proposed Trish activation function and the Adamax optimizer, with 0.1495 loss, 0.9716 accuracy, and 0.9750 f-score. As a result, the proposed Trish activation function gives better results than the other activation functions across different optimizers in the three-layer CNN architecture. In the experimental environment, the lowest accuracy, 0.7589, was obtained with the Adam optimizer and the non-parametric ReLU activation function.

The 95% confidence intervals for the optimizers were calculated and presented in Table 5. The Adamax optimizer showed a better confidence interval with a higher mean and a smaller interval width compared to other commonly used optimizers.
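A 95% confidence interval over the five repeated runs can be computed with the t-distribution (df = 4, critical value ≈ 2.776). A sketch with hypothetical accuracy values, not taken from Table 5:

```python
import math

def confidence_interval(values, t_crit=2.776):
    """Mean ± t * s/sqrt(n); t_crit = 2.776 is the 95% value for df = 4."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)  # sample variance
    half = t_crit * math.sqrt(var / n)
    return mean - half, mean + half

accs = [0.9716, 0.9645, 0.9680, 0.9752, 0.9609]  # hypothetical repeats
lo, hi = confidence_interval(accs)
print(f"95% CI: [{lo:.4f}, {hi:.4f}]")
```

A narrower interval at a comparable mean, as reported for Adamax in Table 5, indicates a more stable optimizer across repeats.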

Table 5 Confidence interval profile of Trish activation function on three CNN layers for ACRIMA

In Fig. 8, the performance comparison of the activation functions with the Adamax optimizer, which gave the best result for the detection of glaucoma on the ACRIMA dataset, is given. The proposed Trish activation function obtained a better validation accuracy value than the other non-monotonic functions. It is also observed in Fig. 8 that the proposed Trish and ReLU activation functions show similar behavior. Except for ReLU, Trish has a smaller box size than the other activation functions, and its whiskers lie closer to the box. The small whisker lengths in the box plots and the small difference between the lowest and highest performance values indicate the stability of an activation function. In light of these results, the proposed Trish activation function works more stably than the other functions and gives more stable results in the detection of glaucoma with the three-layer CNN model.

Fig. 8
figure 8

Accuracy box plot of activation functions in the three-layer CNN model of the ACRIMA dataset

Table 6 gives the results of the experiments performed for the detection of glaucoma on the ACRIMA dataset using the six-layer CNN architecture. The best val_loss, val_acc, and val_fscore values were again obtained with the proposed Trish activation function under all optimizers. The highest performance was achieved with the proposed Trish activation function and the Adamax optimizer, with a loss of 0.2406, an accuracy of 0.9722, and an f-score of 0.9842. Thus, in the six-layer CNN architecture, the proposed activation function outperforms all other activation functions across the different optimizers. In the same experimental environment, the lowest accuracy, 0.5390, was obtained with the AdaDelta optimizer and the non-parametric Swish activation function, whereas the proposed Trish activation function reached an accuracy of 0.8440 with the same optimizer.

Table 6 Six-layer CNN experiments on ACRIMA dataset

The 95% confidence intervals calculated for the optimizers are given in Table 7. Under the Adamax optimizer, the proposed Trish activation function showed a better confidence interval, with a higher mean and a smaller interval width, than under the other common optimizers used in the study.

Table 7 Confidence interval profile of Trish activation function on six CNN layers for ACRIMA

Fig. 9 compares the activation functions under the Adamax optimizer, which yielded the best result for the detection of glaucoma on the ACRIMA dataset. The proposed Trish activation function has a higher validation accuracy than the other non-monotonic activation functions, and Fig. 9 shows that Trish and LeakyReLU behave similarly. Except for LeakyReLU, the Trish box is smaller than those of the other activation functions, its whiskers lie closer to the box, and the distance between its extreme values is very small. The small difference between the accuracy rates likewise shows that the proposed activation function is stable. In light of these results, the proposed Trish activation function works more stably than the other functions and gives more stable results for glaucoma detection with the six-layer CNN model; for several of the other activation functions, the lower whiskers in particular lie quite far from the box.

Fig. 9
figure 9

Accuracy box plot of activation functions in six-layer CNN model of ACRIMA dataset with Adamax optimizer

Table 8 gives the results of the experiments performed to examine the effect of the optimizer and activation functions for the detection of glaucoma on the HRF dataset using the three-layer CNN architecture. The highest performance was achieved with the proposed Trish activation function and the Adamax optimizer, with a loss of 0.6429, an accuracy of 0.7538, and an f-score of 0.7874. The experimental results show that the proposed Trish activation function performs well under all optimizers, working both consistently and stably. Thus, in the three-layer CNN architecture, the proposed activation function gives good results across the different optimizers and all activation functions. In addition, in the evaluations on the HRF dataset, some activation functions failed to produce meaningful accuracy under certain optimizers due to the characteristics of the dataset.

Table 8 Three-layer CNN experiments on high-resolution fundus (HRF) dataset

The 95% confidence intervals calculated for the different optimizers are given in Table 9. The proposed Trish function showed a better confidence interval, with a higher mean and a smaller interval width, with the Adamax optimizer than with the other common optimizers used in the study.

Table 9 Confidence interval profile of Trish activation function on three CNN layers for HRF

Fig. 10 compares the activation functions under the Adamax optimizer, which yielded the best result for the detection of glaucoma on the HRF dataset. The proposed Trish activation function achieves a higher validation accuracy than the other non-monotonic activation functions. Its box is smaller than those of the other activation functions and its whiskers lie closer to the box, although its box size is close to those of the Smish and Swish activation functions, which exhibit similar behavior. The small difference between the accuracy values likewise shows that the proposed activation function is stable. In light of these results, the proposed Trish activation function works more stably than the other functions and gives more stable results for glaucoma detection with the three-layer CNN model (Table 10).

Fig. 10
figure 10

Accuracy box plot of activation functions in three-layer CNN model of HRF dataset with Adamax optimizer

Table 10 Three-layer CNN experiments on CIFAR-10 dataset

The 95% confidence intervals calculated for different optimizers are given in Table 11. The proposed Trish activation function gave a better confidence interval with a higher mean and a smaller interval width with the Adamax optimizer compared to other common optimizers used in the study.

Table 11 Confidence interval profile of Trish activation function on three CNN layers for CIFAR-10

Fig. 11 compares the activation functions under the Adamax optimizer, which yielded the best result when measuring the performance of the proposed activation function on the CIFAR-10 dataset. The validation accuracy of the proposed Trish activation function was higher than that of the other non-monotonic activation functions. Its box plot is smaller than those of the other activation functions, its whiskers lie closer to the box, and there is only a small difference between the accuracy values. In addition, the minimum value of the Trish activation function lies above those of the LeakyReLU, Mish, and Smish activation functions. Although the CIFAR-10 dataset produces different results under different activation functions and exhibits unstable behavior, the proposed Trish activation function shows a more stable structure on CIFAR-10 than the other activation functions. The model was optimized using the Adamax optimizer with beta_1 = 0.9, beta_2 = 0.99, epsilon = 1e−07, and an initial learning rate of 0.001.
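With the hyperparameters stated above, the Adamax update rule (the infinity-norm variant of Adam from Kingma and Ba) can be sketched in plain Python; here it minimizes a toy 1-D quadratic purely for illustration, not the paper's CNN:

```python
# Sketch of the Adamax update rule with the hyperparameters stated above
# (beta_1 = 0.9, beta_2 = 0.99, epsilon = 1e-07, learning rate 0.001),
# applied to a toy quadratic f(x) = (x - 3)^2 for illustration.
def adamax_minimize(grad, x0, lr=0.001, beta1=0.9, beta2=0.99,
                    eps=1e-07, steps=20000):
    x, m, u = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g               # first-moment estimate
        u = max(beta2 * u, abs(g))                    # infinity-norm moment
        x -= (lr / (1 - beta1 ** t)) * m / (u + eps)  # bias-corrected step
    return x

x_min = adamax_minimize(lambda x: 2 * (x - 3), x0=0.0)  # approaches 3
```

Unlike Adam's squared-gradient average, the max-based `u` never needs its own bias correction, which is one reason Adamax is often stable at small learning rates like the 0.001 used here.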

Fig. 11
figure 11

Accuracy box plot of activation functions in three-layer CNN model of CIFAR-10 dataset with Adamax optimizer

5 Discussion

Glaucoma is a chronic eye disorder characterized by damage to the optic nerve, which leads to progressive loss of vision, typically in the peripheral field. It is caused by an increase in intraocular pressure (IOP), which can damage the nerve fibers that carry visual information to the brain. The most common form is primary open-angle glaucoma, characterized by a slow and painless loss of peripheral vision. Preventing vision loss caused by glaucoma, a leading cause of blindness, can be achieved through early diagnosis and treatment [36].

Computer-aided systems are widely used in glaucoma classification, analyzing and classifying images with techniques such as machine learning, deep learning, and image processing. Ref. [37] proposed a method for classifying glaucoma using color fundus images and a CNN with transfer learning. The study used a dataset of color fundus images of eyes with glaucoma and healthy eyes; the CNN model was pre-trained on a large dataset and then fine-tuned on the fundus images. The proposed method achieved an accuracy of 96.6% in classifying glaucoma and healthy eyes, demonstrating the potential of color fundus images and CNNs with transfer learning for automatic glaucoma classification.

The study in [38] used a dataset of retinal images of eyes with glaucoma and healthy eyes and a CNN as the classification network, together with data augmentation and transfer learning to increase performance. The proposed network achieved an accuracy of 96.5% in classifying glaucoma and healthy eyes, demonstrating the potential of deep learning-based approaches for automatic glaucoma classification in retinal images. Another study [39] proposed an improved swarm optimization method for glaucoma classification using deep features, combining a swarm optimization algorithm called SEGSO with a deep learning model, VGGNet, to optimize the features extracted from eye fundus images. The authors trained the model on a dataset of eye fundus images, evaluated it on a separate test dataset, and reported a high accuracy of 99.67%, showing that the method is effective for glaucoma classification using deep features and a good option for glaucoma diagnosis. Ref. [40] suggested a hybrid feature space composed of texture information and transfer learning for glaucoma classification. The authors trained a deep learning model on texture features extracted from eye fundus images together with a transfer learning technique, evaluated it on a separate test dataset, and reported an accuracy of 93.61%, demonstrating a powerful system for glaucoma classification using texture information and transfer learning.

Optimizers and activation functions, as components of complex deep learning architectures, significantly affect network accuracy. This study therefore analyzes how optimizers and activation functions improve the performance of three-layer and six-layer CNN architectures designed for glaucoma classification. Architectural variations with several existing optimizers and activation functions were created and tested. In addition, a new activation function, Trish, is proposed and compared with architectures using other activation functions. The architecture designed specifically for glaucoma classification was found to be highly successful. By intervening in the internal structure of deep learning architectures, particularly the activation function, this study makes a notable scientific contribution. A performance comparison for glaucoma classification is given in Table 12.

Table 12 Performance results for glaucoma classification

The proposed activation function was observed to deliver satisfactory performance in deep learning architectures when paired with an appropriate optimizer. With the Adamax optimizer on the ACRIMA dataset, the six-layer CNN using the proposed activation function obtained the best values: a loss of 0.2406, a validation accuracy of 0.9722, and an f-score of 0.9842. The experimental studies yielded satisfactory results. In this study, how the activation function, together with the optimizer, improves the performance of the CNN architecture was analyzed.

The proposed Trish activation function can outperform ReLU and other traditional activation functions in certain tasks, especially in the classification of glaucoma. This is because it offers an adaptive approach and can better manage the complexity of deep learning models. Its ability to generate nonzero values for negative inputs helps avoid the dead neuron problem and increases the learning capacity of the model by resisting saturation. It also mitigates the vanishing gradient problem in deep networks by allowing gradients to flow better during training. However, the computational cost of Trish can be higher than that of ReLU, and an optimal choice of parameters is needed to maximize its performance; determining these parameters may require additional experiments and tuning. Furthermore, although Trish performs well on certain tasks and datasets, there is no guarantee that it is the best choice in all cases; different problems and data structures may require different activation functions.
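Trish's formula is defined earlier in the paper, so it is not reproduced here; the dead-neuron property discussed above can instead be illustrated with Swish (x · sigmoid(x)), another non-monotonic function from the comparison set, which likewise keeps nonzero outputs and gradients for negative inputs where ReLU returns exactly zero:

```python
import math

def relu(x):
    return max(0.0, x)

def swish(x):
    # Swish(x) = x * sigmoid(x): non-monotonic, nonzero for negative inputs.
    return x / (1.0 + math.exp(-x))

def swish_grad(x):
    # d/dx [x * sigmoid(x)] = sigmoid(x) * (1 + x * (1 - sigmoid(x)))
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 + x * (1.0 - s))

x = -2.0
# ReLU kills both the activation and its gradient for negative inputs,
# while a non-monotonic function keeps a small nonzero signal flowing.
relu_out = relu(x)        # exactly 0.0: the neuron passes no gradient
swish_out = swish(x)      # small negative value, still informative
swish_g = swish_grad(x)   # nonzero gradient, so the weight can still learn
```

A neuron stuck with negative pre-activations under ReLU receives zero gradient forever (the "dead neuron" problem); the nonzero gradient shown here is what lets non-monotonic functions such as Swish, and by the authors' argument Trish, keep learning.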

6 Conclusion

Network accuracy is greatly influenced by optimizers and activation functions, which are components of complex deep learning architectures. This study examined how optimizers and activation functions can improve the effectiveness of CNN architectures with two different numbers of layers for classifying glaucoma. A new activation function called Trish was proposed, and various existing optimizers and activation functions were used to create different architectural variations for comparison. The architecture created specifically for the classification of glaucoma using Trish proved quite effective. This contribution is scientifically significant, as it diversifies the activation function, an important component of deep learning systems.

This study introduces a new non-monotonic activation function to improve the training success of deep learning classifiers. Three datasets were used: high-resolution fundus (HRF), ACRIMA, and CIFAR-10. Seven optimization methods (SGD, Adam, RMSprop, AdaDelta, AdaGrad, Adamax, and Nadam), the ReLU, LReLU, Mish, Swish, Smish, and Logish activation functions together with the proposed one, and three-layer and six-layer CNN architectures were evaluated. With these configurations, glaucoma detection was carried out. Moreover, an experimental evaluation on the CIFAR-10 dataset was performed to further support the proposed activation function.

Finally, the effectiveness of the proposed method was compared with the performance of other activation functions under different optimizers using a variety of measures, including training loss, training accuracy, training f-score, validation loss, validation accuracy, and validation f-score. The highest performance for the detection of glaucoma on the HRF dataset, obtained using the proposed activation function and the Adamax optimizer with a three-layer CNN architecture, was a loss of 0.6429, a validation accuracy of 0.7538, and an f-score of 0.7874. On the ACRIMA dataset, the highest performance of the proposed activation function, obtained with the six-layer CNN and the Adamax optimizer, was a loss of 0.2406, a validation accuracy of 0.9722, and an f-score of 0.9842. The advantage of the proposed activation function with an appropriate optimizer is its high efficiency.