1 Introduction

Using a mobile phone to make bank transactions has become a regular and popular practice in the digital and internet age [1]. With the growth of mobile banking and the rise in user numbers, the security and authentication of users have become increasingly crucial [2]. To enhance and facilitate this, intelligent facial authentication has been introduced as a new and powerful technology [3]. The security level of biometric devices should be increased to provide an efficient system, especially for online banking [4]. Mobile authentication can be a suitable solution that enables online banking, mobile banking, and mobile payments while easily ensuring security [5, 6]. Authentication alone is susceptible to attacks; in cases of theft or compromised trusted third parties, security can easily be breached [7]. Hackers can often break such security because many user-chosen passwords are weak. Secure banking gives customers the confidence that their information is safe and that they can make transactions securely [8]. Various methods have been presented so far to secure online banking systems, of which mobile banking is one [9]. Each of these methods has tried to detect attacks with a specific logic and strategy and to prevent penetration into the system [10]. Despite these many efforts, such methods still face security challenges and have not been able to maintain adequate security coverage in these systems [11]. Therefore, in this article, we present a model that uses optimized hybrid capabilities to verify the identity of samples and authenticate people with the help of online mobile phone imaging [12]. The proposed model is implemented based on artificial neural networks (ANN), adaptive neuro-fuzzy inference systems (ANFIS), and decision trees (DT) for person authentication. However, these methods alone do not achieve good accuracy.
Therefore, we have used the wild horse optimizer (WHO) algorithm to improve the performance of these machine learning systems, and a fuzzy combination of their results to make the final decision. In this work, a facial authentication algorithm for mobile banking is modeled in MATLAB, and the plans, techniques, and other proposed components of the system are then implemented and evaluated. Finally, the simulation results are compared with other authentication techniques. In this context, we aim to achieve smart technology for safe and convenient banking based on identifying people from face images captured with their mobile phone cameras. So far, the authentication stage used in mobile banking and the existing algorithms in this field have remained vulnerable and have security weaknesses. After reviewing and comparing the existing authentication methods, a new integrated method based on a hybrid model optimized with WHO is presented to establish more security and accuracy in mobile banking. Face recognition technology (FRT) is known as a means to support identity verification and authentication. Great strides have been made in developing accurate and tamper-resistant FRT solutions with the help of machine learning (ML) and artificial intelligence (AI) technologies, both on-chip and in the cloud. These developments have increased banks' confidence in using this technology for a wide range of applications and use cases. The use of evolutionary algorithms as a new approach in this paper can help create a hybrid model for identification. On this basis, we have been able to increase the security of FRT by formulating a feature-matching problem over a set of people's images and solving it with a genetic algorithm. Banks are also directly leveraging the power of ML, AI technology, and evolutionary algorithms to improve biometric performance and identity recognition.
This is essential and gives banks confidence that biometric technology is secure and reliable. Conventional techniques for facial feature detection and analysis mostly lack robustness and suffer from high computation times. This paper aims to discover ways for machines to learn to interpret information in faces automatically, without the need for manual feature clustering, using a deep learning approach. The important contributions of the current work are summarized as follows: (1) It presents a face authentication system with a machine learning-based implementation, using a hybrid model for dynamic authentication. (2) The proposed hybrid technique was designed and tested with the help of the WHO meta-heuristic algorithm in soft modeling of people's authentication to increase recognition accuracy. (3) The features extracted from different types of images are segmented with a K-means-based clustering model across three machine learning methods: ANFIS, ANN, and decision tree (DT). (4) A fuzzy logic system is used to make decisions that identify people with the highest possible accuracy. (5) A genetic algorithm has been used to match, select, and reduce features by removing features incompatible with the real person in each system. In this case, the mobile implementation can run on the processors of all types of phones.

The remainder of the paper is organized as follows: earlier research is reviewed in Sect. 2, methods are defined in Sect. 3, the proposed technique is clarified in Sect. 4, the simulation tests and their results are highlighted in Sect. 5, and the conclusion and future research directions are provided in Sect. 6.

2 Literature Review

Currently, mobile phone user authentication systems based on PIN codes, fingerprints, and face recognition methods have several limitations. The article [13] compares single-modal and multimodal behavioral biometric features, where the studied techniques consider different activities, such as typing, scrolling, drawing numbers, and tapping on the screen. A separate recurrent neural network (RNN) with triplet loss is implemented for each modality. Then, the different modalities are combined at the score level with a weighted combination.

Reference [14] implements a comprehensive approach to smart home security that enhances privacy and security using two distinct and developing technologies, facial authentication and speech recognition, via the user's mobile phone, tablet, or PC. Neural networks are used to carry out the entire process. Data privacy and the resource constraints of mobile devices were two major authentication concerns that Article [15] proposed a hybrid solution to address. For the first, partial semantic encryption based on the Paillier algorithm is used, while for the latter, a deep convolutional neural network combined with a local ternary pattern is deployed to achieve facial recognition.

Since deep neural networks (DNNs) are not robust against input perturbations, face recognition models (FRMs) built on DNNs inherit this vulnerability. In the method presented in [16], adversarial attacks are designed following identity-preserving changes in faces, and in this situation, defects in FRMs are observed in recognizing images belonging to the same identity. These identity-preserving semantic changes are modeled through perturbations of limited direction and magnitude in the latent space of StyleGAN. The important point is that the semantic robustness of an FRM is identified by the statistical description of the perturbations that lead to malfunctions in the FRM.

To improve the performance of video-based face recognition, a novel semantic-based subspace model is suggested [17, 18]. The main goal is to build an appropriate low-dimensional subspace for each person, upon which a semantic model is constructed to categorize the person's key frames into definite classes. After the semantic classification, the key frames belonging to the same classes are used to train linear classifiers for recognition. Interestingly, extensive experiments on a large face video database (XM2VTS) reveal that this methodology attains a noteworthy performance enhancement over traditional methods.

Typically, to corroborate the user's identity, smartphone user authentication is accomplished using mechanisms such as a password or security pattern. The advantages of these mechanisms include simplicity, cheapness, and speed for frequent entry. However, they are vulnerable to attacks such as shoulder surfing or smudge attacks. This problem could be resolved by authenticating users through their behavior (i.e., touch behavior) while using smartphones. Such behaviors include finger pressure, touch size, and press time while tapping keys. Selecting features from these behaviors can play an important role in the authentication process's performance. Hence, the objective of articles [19, 20] is to suggest a well-organized authentication technique providing implicit authentication for smartphone users, without imposing the additional cost of special hardware, while addressing limited smartphone capabilities. First, feature selection techniques are evaluated according to filter and wrapper approaches, and then the best method is used to propose the implicit authentication method. It should be noted that these techniques are evaluated using a random forest classifier.

In many practical applications, only a limited amount of labeled facial data is accessible in the real world, which limits the performance of most existing deep learning-based facial attribute recognition (FAR) approaches. Spatial-semantic patch learning (SSPL), a method that requires two steps for training, is suggested [21]. To learn the spatial-semantic relationship from large-scale unlabeled facial data, three auxiliary tasks are first constructed: a patch rotation task (PRT), a patch segmentation task (PST), and a patch classification task (PCT). In particular, PRT uses self-supervised learning to exploit the spatial information contained in facial photos. Based on a facial parsing model, PST and PCT respectively capture the pixel-level and image-level semantic information of facial images. The second step transfers the spatial-semantic knowledge gained from the auxiliary tasks to the FAR task, enabling the pre-trained model to be fine-tuned with a relatively small amount of labeled data.

The technologies for building smart cameras for semantic image processing based on the ELcore cores are described in [22]. The steps of semantic image analysis to recognize faces are considered, and the resource-intensive algorithms are identified and implemented on the ELcore DSP cores. A method is also suggested for the automatic comparative labeling of face soft biometrics [23]. Further research is done on unrestricted human face recognition utilizing these comparative soft biometrics in a gallery with human labels (and vice versa).

Article [63] introduced a simple and effective deep learning Fourier-based type-2 fuzzy neural network for high-dimensional problems. The rules are constructed directly by fast Fourier transformation: the input matrix/vector is segmented, and each segment represents a fuzzy rule. The upper/lower bounds of the rule firings are obtained by the Fourier transformation approach, and the output is computed by a simple type-reduction method. All antecedent and consequent parameters are optimized by simple gradient descent and a fuzzy correntropy-based extended Kalman filter, where the kernel size of the conventional correntropy-based filter is determined by a fuzzy system. The convergence of the learning method is proved by the Lyapunov method. The effectiveness of the suggested approach is verified on a face recognition problem (1024 input variables), English handwritten digit recognition (1024 input variables), and a modeling problem with a real-world data set (32 input variables). The simulations and comparisons demonstrate the superiority of the introduced scheme.

According to the studies reviewed in this section, each work has presented a new method to address the problem of identity verification. The investigated techniques have advantages and disadvantages, which are summarized in Table 1. An important issue not investigated in any of these methods is the reliability of their authentication techniques for use in mobile banking. By providing a variety of techniques based on the hybrid model, this work attempts to strengthen the dependability of the proposed authentication system. The reliability criterion for authentication in this research is increased with the help of the fuzzy logic technique and the fuzzy rules governing it.

Table 1 Comparison of review articles

3 Methodology

Since we are interested in developing a face authentication system, this study proposes a hybrid enhanced method for machine learning systems that addresses security detection and face verification problems. This work considers face authentication methods based on single images comparable to video-based methods, noting that both are used for specific purposes and, in many cases, these techniques can be successfully combined: they complement each other and improve security measures against attacks and image forgery. In this case, three machine learning methods are used for face authentication. The information segmentation pipeline comprises feature extraction, fuzzy K-means clustering, feature selection with the help of a genetic algorithm, and strengthening and optimization of the machine learning systems with the WHO algorithm, which are briefly described below.

3.1 Machine Learning

Machine learning is a way of controlling a machine using reasoning skills based on learning outcomes. In other words, when a computer is given a detailed data set, the machine automatically learns the corresponding rules and produces the outcome of applying those rules to further data.

Deep learning is one of these machine learning techniques that is gaining attention since it imitates human neurons and arranges numerous learning layers between inputs and outputs to produce more advanced outcomes [28]. The Deep Learning Model diagram is shown in Fig. 1 [29].

Fig. 1
figure 1

a Typical deep learning neural network architecture: one input layer, K hidden layers, and one output layer; b artificial neuron: a basic computational building block for neural networks [18]

To extract intricate and high-level abstractions of data representations, deep learning (DL) can be used. It is accomplished by deploying a hierarchical, layered learning architecture in which more abstract (i.e., higher-level) features are defined, expressed, and implemented in terms of less abstract (i.e., lower-level) features [18] (see Fig. 1a). In big data analytics (BDA), where the majority of the raw data is unlabeled and uncategorized, DL approaches can analyze and learn from a tremendous amount of unsupervised data [18, 30].

Among the machine learning models based on deep learning, an artificial neural network (ANN) [31], Adaptive-network-based fuzzy inference (ANFIS) [32], and decision tree (DT) [33] are important methods used for object recognition. In this work, we have used these tools to authenticate people with a combination of extracted features.

3.1.1 Artificial Neural Network (ANN)

Artificial neural networks (ANNs) are biologically inspired computational networks. Among the various types of ANNs, in this work we focus on multilayer perceptrons (MLPs) with backpropagation learning algorithms. MLPs, the ANNs most commonly used for a wide variety of problems, are based on a supervised procedure and comprise three layers: input, hidden, and output. We discuss various aspects of MLPs, including structure, algorithm, data preprocessing, overfitting, and sensitivity analysis. ANN models are found to perform extremely well in prediction problems. However, there is further scope to improve the performance of ANN modeling; for example, the efficiency of any ANN-based forecasting model may be substantially improved using multiple input parameters chosen from sensitivity analysis and hybrid models. Besides, many modifications of existing ANN models and new algorithms have been developed in recent years. However, despite improved output, ANN models cannot provide a clear-cut relationship among the interconnected parameters of different processes. From a modeling point of view, a major drawback of neural networks is that the underlying physical processes or mechanisms are not easily understood, whereas statistical or stochastic models can reveal useful information about the series under study.
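To make the MLP structure above concrete, the following minimal sketch computes a forward pass through a network with one hidden layer and sigmoid activations. The weights and inputs are hypothetical toy values, not parameters from the proposed system:

```python
import math

def mlp_forward(x, W1, b1, W2, b2):
    """Forward pass of a one-hidden-layer MLP with sigmoid activations."""
    sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))
    # hidden layer: weighted sum of inputs plus bias, then sigmoid
    h = [sigmoid(sum(wi * xi for wi, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    # output layer: weighted sum of hidden activations plus bias, then sigmoid
    return [sigmoid(sum(wi * hi for wi, hi in zip(row, h)) + b)
            for row, b in zip(W2, b2)]

# toy network: 2 inputs, 2 hidden units, 1 output
W1 = [[1.0, -1.0], [-1.0, 1.0]]
b1 = [0.0, 0.0]
W2 = [[1.0, 1.0]]
b2 = [-1.0]
score = mlp_forward([0.5, 0.2], W1, b1, W2, b2)[0]
```

In the actual system, such a network would be trained with backpropagation on the extracted face features; the sketch shows only the inference step.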

3.1.2 Adaptive-Network-Based Fuzzy Inference (ANFIS)

An adaptive neuro-fuzzy inference system or adaptive network-based fuzzy inference system (ANFIS) is a kind of artificial neural network that is based on the Takagi–Sugeno fuzzy inference system. The technique was developed in the early 1990s [53]. Since it integrates both neural networks and fuzzy logic principles, it has the potential to capture the benefits of both in a single framework. Its inference system corresponds to a set of fuzzy IF–THEN rules that have the learning capability to approximate nonlinear functions [54]. Hence, ANFIS is considered to be a universal estimator [55]. To use the ANFIS more efficiently and optimally, one can use the best parameters obtained by the genetic algorithm [56]. It has uses in intelligent situational-aware energy management systems.

Two parts can be identified in the network structure, namely the premise and consequence parts. In more detail, the architecture is composed of five layers. The first layer takes the input values and determines the membership functions belonging to them; it is commonly called the fuzzification layer. The membership degrees of each function are computed using the premise parameter set, namely {a, b, c}. The second layer is responsible for generating the firing strengths of the rules; due to this task, it is denoted the "rule layer". The role of the third layer is to normalize the computed firing strengths by dividing each value by the total firing strength. The fourth layer takes as input the normalized values and the consequence parameter set {p, q, r}. The values returned by this layer are the defuzzified ones, and they are passed to the last layer to return the final output.
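The five layers above can be sketched as a single forward pass. This is an illustrative first-order Sugeno system with two inputs and two hypothetical rules, assuming generalized bell membership functions with premise parameters {a, b, c} and linear consequents with parameters {p, q, r}; it is not the rule base of the proposed system:

```python
def bell_mf(x, a, b, c):
    """Generalized bell membership function with premise parameters {a, b, c}."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))

def anfis_forward(x1, x2, rules):
    """Five-layer ANFIS forward pass for a 2-input first-order Sugeno system.

    Each rule is ((a1, b1, c1), (a2, b2, c2), (p, q, r))."""
    # Layers 1-2: fuzzify each input, AND via product -> firing strengths
    w = [bell_mf(x1, *m1) * bell_mf(x2, *m2) for m1, m2, _ in rules]
    # Layer 3: normalize the firing strengths
    total = sum(w)
    wn = [wi / total for wi in w]
    # Layer 4: linear consequents f = p*x1 + q*x2 + r
    f = [p * x1 + q * x2 + r for _, _, (p, q, r) in rules]
    # Layer 5: weighted sum gives the final output
    return sum(wi * fi for wi, fi in zip(wn, f))

rules = [
    ((2.0, 2.0, 0.0), (2.0, 2.0, 0.0), (1.0, 1.0, 0.0)),  # rule near origin
    ((2.0, 2.0, 5.0), (2.0, 2.0, 5.0), (0.5, 0.5, 0.0)),  # rule near (5, 5)
]
y = anfis_forward(1.0, 1.0, rules)
```

For an input near the first rule's membership centers, the output lands close to that rule's consequent value (here, close to 2).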

3.1.3 Decision Tree (DT)

Decision tree learning is a supervised learning approach used in statistics, data mining, and machine learning. In this formalism, a classification or regression decision tree is used as a predictive model to draw conclusions from a set of observations.

Tree models where the target variable can take a discrete set of values are called classification trees; in these tree structures, leaves represent class labels and branches represent conjunctions of features that lead to those class labels. Decision trees where the target variable can take continuous values (typically real numbers) are called regression trees. More generally, the concept of regression trees can be extended to any kind of object equipped with pairwise dissimilarities such as categorical sequences [57]. Decision trees are among the most popular machine learning algorithms given their intelligibility and simplicity.

A decision tree is a simple representation for classifying examples. For this section, assume that all of the input features have finite discrete domains, and there is a single target feature called the “classification”. Each element of the domain of the classification is called a class. A decision tree or a classification tree is a tree in which each internal (non-leaf) node is labeled with an input feature. The arcs coming from a node labeled with an input feature are labeled with each of the possible values of the target feature or the arc leads to a subordinate decision node on a different input feature. Each leaf of the tree is labeled with a class or a probability distribution over the classes, signifying that the data set has been classified by the tree into either a specific class or into a particular probability distribution.
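The structure described above, with internal nodes labeled by input features, arcs labeled by feature values, and leaves labeled by classes, can be sketched as a nested data structure walked from root to leaf. The features and labels below are hypothetical, not those of the proposed system:

```python
def classify(tree, sample):
    """Walk a classification tree over finite discrete feature domains.

    Internal nodes are (feature_name, {value: subtree}); leaves are class labels."""
    while isinstance(tree, tuple):
        feature, branches = tree
        tree = branches[sample[feature]]  # follow the arc for this feature value
    return tree

# toy tree: a liveness check followed by a discretized match score
tree = ("liveness", {
    "pass": ("match_score", {"high": "accept", "low": "reject"}),
    "fail": "reject",
})
label = classify(tree, {"liveness": "pass", "match_score": "high"})
```

Algorithms such as C4.5 (used later in this paper) build such trees automatically by choosing, at each node, the feature that best separates the training classes.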

3.2 Fuzzy Logic System

Lotfi Zadeh introduced fuzzy logic in 1965. He realized that human reasoning is not precise and cannot be represented by binary values of 0 or 1. Fuzzy logic, as defined by Zadeh, is a multivalued logic that allows the human way of reasoning to be represented in a form that can be processed by a computer for further interpretation [58].

The fuzzy system consists of three parts: fuzzification, fuzzy rules, and defuzzification. The fuzzifier scales and maps input variables to fuzzy sets; this establishes the fact base of the fuzzy system. It identifies the inputs and outputs of the system, defines appropriate IF–THEN rules, and uses raw data to derive membership functions. The engineer determines membership functions that map the crisp values of interest to fuzzy values. The inference engine is used for approximate reasoning and deduces the control action: it evaluates all rules and determines their truth values. If an input does not precisely correspond to an IF–THEN rule, partial matching of the input data is used to interpolate an answer. The defuzzification process converts fuzzy output values to control signals; this involves converting the fuzzy value obtained from composition into a "crisp" value. It is necessary since controllers of physical systems require discrete signals [59].
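The three stages above can be sketched on a toy example that maps a hypothetical match score to a crisp confidence. The membership functions and the two IF–THEN rules are illustrative assumptions, not the rule base used in this paper:

```python
def trimf(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzy_confidence(score):
    """Fuzzification -> rule evaluation -> weighted-average defuzzification.

    Maps a raw match score in [0, 100] to a crisp confidence value."""
    # fuzzification: degrees of membership in 'low' and 'high' match
    low = trimf(score, -1.0, 0.0, 60.0)
    high = trimf(score, 40.0, 100.0, 101.0)
    # rules: IF match is low THEN confidence = 10; IF match is high THEN confidence = 90
    strengths = [(low, 10.0), (high, 90.0)]
    # defuzzification: weighted average of the rule consequents
    num = sum(w * c for w, c in strengths)
    den = sum(w for w, c in strengths)
    return num / den if den else 0.0
```

A score that fires both rules partially (e.g. 50) is interpolated between the two consequents, which is the "partial matching" behavior described above.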

3.3 Fuzzy K-Mean Clustering

K-Means is a classic clustering procedure that distributes a set of data points to clusters based on how similar they are to the other data points in that cluster; points are distinct if they belong to other clusters. The K-Means algorithm is widely utilized because of its effectiveness, and over the years numerous adjustments and generalizations have been proposed [24]. The fuzzy K-Means (FKM) algorithm is the most often used of the various K-Means variations; notably, it was first put forth and advanced in [45, 46]. Along these lines, article [24] added a penalty term to the objective function of FKM to address some clustering-related issues, such as choosing the number of clusters and the major cluster centers. The goal of this algorithm's clustering of X into C clusters is to minimize the following objective function [24]:

$$f\left[\mathbf{U}, \mathbf{V}\right]=\sum_{i=1}^{n}\sum_{k=1}^{c}{u}_{ik}{d}_{ik}+\gamma \sum_{i=1}^{n}\sum_{k=1}^{c}{u}_{ik}{\text{log}}{u}_{ik}$$
(1)

Subject to:

$$\sum_{k=1}^{c}{u}_{ik}=1, \quad {u}_{ik}\in \left(0,1\right], \quad 1\le i\le n, \; 1\le k\le c$$

where $d_{ik}$ is a dissimilarity measure between the k-th cluster center and the i-th object, V is a $c\times m$ matrix comprising the cluster centers, and U is an $n\times c$ partition matrix. Applying (1), the alternate minimization between the cluster center matrix V and the membership matrix U is carried out as follows:

$${u}_{ik}=\frac{{\text{exp}}(\frac{{-d}_{ik}}{\gamma })}{{\sum }_{s=1}^{c}{\text{exp}}(\frac{{-d}_{is}}{\gamma })}$$
(2)
$${\mathbf{v}}_{k}=\frac{{\sum }_{i=1}^{n}{u}_{ik}\mathbf{X}i}{{\sum }_{i=1}^{n}{u}_{ik}}$$
(3)

The first term in (1) is the cost function of the conventional K-Means algorithm. The second term is included in the clustering process to increase the negative entropy of the objects-to-clusters memberships, which can simultaneously limit intra-cluster scattering and maximize the negative weight entropy to identify the clusters that best support object association.
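One alternating update of the entropy-regularized FKM above, following Eqs. (2) and (3), might be sketched as follows, assuming a squared Euclidean dissimilarity for $d_{ik}$ and toy 2-D data:

```python
import math

def fkm_step(X, V, gamma):
    """One alternating update of entropy-regularized fuzzy K-means.

    Memberships follow Eq. (2) (softmax of -d_ik / gamma), centers follow
    Eq. (3) (membership-weighted means). d_ik is squared Euclidean distance."""
    # Eq. (2): membership of object i in cluster k
    U = []
    for x in X:
        d = [sum((xj - vj) ** 2 for xj, vj in zip(x, v)) for v in V]
        e = [math.exp(-dk / gamma) for dk in d]
        s = sum(e)
        U.append([ek / s for ek in e])
    # Eq. (3): update each cluster center as the membership-weighted mean
    V_new = []
    for k in range(len(V)):
        w = sum(U[i][k] for i in range(len(X)))
        V_new.append([sum(U[i][k] * X[i][j] for i in range(len(X))) / w
                      for j in range(len(X[0]))])
    return U, V_new

# two well-separated toy groups; initial centers near each group
X = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
U, V = fkm_step(X, [[0.0, 0.0], [5.0, 5.0]], gamma=1.0)
```

In practice the two updates are repeated until U and V stop changing; a smaller gamma makes the memberships closer to hard K-Means assignments.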

3.4 Wild Horse Optimizer Algorithm (WHO)

The natural behavior of an agent, which can be a person, an animal, a plant, or a physical or chemical agent, serves as the main inspiration for optimization algorithms. Animal behavior has inspired many algorithms developed in the previous ten years. In this study, we used the wild horse optimizer (WHO), a novel optimization method motivated by the social behavior of wild horses. Horses typically live in herds made up of a stallion, multiple mares, and their offspring, and they engage in a variety of behaviors, including mating, dominance, chasing, and grazing [47]. Horses have a charming demeanor that sets them apart from other animals: due to the way horses breed, their offspring leave to join other groups before they reach adolescence. This departure behavior prevents a father from mating with a daughter or sibling. The proposed method draws heavily on this decorous behavior of horses [25]. Figure 2 shows the flowchart of the wild horse optimizer algorithm employed in this study. The wild horse optimizer consists of five main steps as follows:

  • Creating an initial population and forming horse groups and selecting leaders;

  • Grazing and mating of horses;

  • Leadership and leading the group by the leader (stallion);

  • Exchange and selection of leaders;

  • Save the best solution.

Fig. 2
figure 2

Flowchart of the proposed WHO algorithm [25]
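The five steps above can be sketched as a heavily simplified optimization loop. This illustrative reduction keeps only group formation, grazing around each stallion, stallion leadership toward the global best, leader exchange, and best-solution tracking; it omits the mating and waterhole mechanics of the published WHO, and all parameter values are assumptions:

```python
import math
import random

def who_minimize(fobj, dim, bounds, n_groups=4, group_size=5, iters=100, seed=1):
    """Simplified sketch of the wild horse optimizer's main loop."""
    rng = random.Random(seed)
    lo, hi = bounds
    rand_pos = lambda: [rng.uniform(lo, hi) for _ in range(dim)]
    # Step 1: initial population split into groups; the best member leads each group
    groups = [[rand_pos() for _ in range(group_size)] for _ in range(n_groups)]
    stallions = [list(min(g, key=fobj)) for g in groups]
    best = list(min(stallions, key=fobj))
    for t in range(iters):
        tdr = 1.0 - t / iters  # step size shrinks over the iterations
        for gi, g in enumerate(groups):
            st = stallions[gi]
            # Step 2: grazing -- members circle around their stallion
            for m in g:
                for j in range(dim):
                    r = rng.uniform(-1.0, 1.0)
                    m[j] = st[j] + tdr * 2.0 * r * math.cos(2.0 * math.pi * r) * (st[j] - m[j])
                    m[j] = min(max(m[j], lo), hi)
            # Step 3: the stallion leads its group toward the global best
            cand = [min(max(b + tdr * rng.uniform(-1.0, 1.0) * (b - s), lo), hi)
                    for b, s in zip(best, st)]
            # Step 4: leader exchange -- keep the best of members, old leader, candidate
            stallions[gi] = list(min(g + [st, cand], key=fobj))
        # Step 5: save the best solution found so far
        cur = min(stallions, key=fobj)
        if fobj(cur) < fobj(best):
            best = list(cur)
    return best, fobj(best)

# usage: minimize the 3-D sphere function over [-5, 5]^3
sol, val = who_minimize(lambda x: sum(xi * xi for xi in x), dim=3, bounds=(-5.0, 5.0))
```

Later in this paper, `fobj` plays the role of the classification-error fitness of Eq. (4), with each position interpreted as a vector of feature weights.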

4 Proposed Work

Research on the use of biometric technology in banks has recently become important for better recognition of new customers, secure authentication of existing customers, protection of high-value transactions, and combating fraud. Interestingly, most traditional physical bank branches use biometrics, and the latest digital platforms use them as well. This technology is considered the only reliable tool for guaranteeing identity and banking security across all channels.

The trends leading to the adoption of biometrics among banks are numerous and include the following:

  • Emergence of mobile phones and mobile phone-based multi-faceted biometric authentication.

  • The emergence of biometric bank cards means “goodbye to PIN codes”.

  • Cross-channel adoption: biometrics are being adopted across all banking channels, supported by open banking APIs, regulations like PSD2 that encourage the use of biometrics in multi-factor authentication scenarios, and IoT devices that support voice and video and increasingly incorporate biometrics.

In this article, online and mobile banking and their authentication techniques are discussed, and security problems in mobile banking are investigated. In this context, a new method is presented to solve the main challenge of security and authentication in mobile banking. The proposed method combines data mining techniques, including the deep learning methods artificial neural network (ANN) and adaptive neuro-fuzzy inference system (ANFIS) and the C4.5 decision tree algorithm, all enhanced with the help of the WHO wild horse optimization algorithm. In the following, the steps of implementing the face-based identity authentication scheme are described based on the proposed hybrid model. The basis of the hybrid modeling in this work is the compatibility of the features extracted from people's face images for authentication.

This process includes:

Phase 1: the data set used in this article can include any type of data used in the field of mobile banking. However, since this work focuses on the authentication of real persons, we use a set of images of different people captured from different angles. References [48, 49] are among the data sets usable in this research.

Phase 2: data preparation is an important issue in the data mining literature and is often overlooked as a step in the data mining process. In real-world machine learning applications, neglecting it prevents reaching the desired accuracy. In this situation, compared with the existing machine learning methods, we try to use modified or strengthened versions of these methods. For data preparation, two main tasks are considered:

  (a) In conducting data mining projects, organize the data in a standardized form so that they are ready for processing with data mining and other computer-based tools.

  (b) Prepare the data set in such a way that it leads to the best performance of the data mining methods.

Phase 3: regarding the classifiers, the categories selected for integration should complement each other, and each is briefly explained. The separation of training and test samples for the complementary categories is done based on the features extracted in the data processing stage. In this case, we applied a supervised fuzzy K-means clustering technique to a set of image samples to train the machine learning systems. This data set contains 77 features extracted from the subjects' face images.

Phase 4: in this stage, a genetic algorithm-based feature selection technique is applied to each category of features to select the subset of features most compatible with the correct identification of people. Based on the proposed algorithms, classification and separation of samples are done with the help of combined classification; at this stage, data matching is done based on feature reduction using the proposed decision-making algorithm applied to the people's training data. For this step, a compatibility objective function is defined, which is introduced in the next section.

Phase 5: in this phase, we use the WHO algorithm to enhance each machine learning system based on the features assigned to it. The purpose of this phase is to increase the accuracy of the machine learning systems by weighting the specific features of each system.

Phase 6: the results obtained from the classifiers are combined in the form of a majority vote. In this step, the final output values are selected based on a fuzzy collective decision over the classifiers' responses. The fuzzy rules governing this decision are presented in the next section. The process of implementing the proposed face authentication scheme based on the hybrid model is shown in Fig. 3.

Fig. 3
figure 3

Flowchart of steps to implement the proposed face authentication plan based on the hybrid model
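The fuzzy collective decision of Phase 6 might be sketched as a confidence-weighted vote. The three (label, confidence) pairs below are hypothetical outputs of the ANN, ANFIS, and DT branches for one probe image, not results from the actual system:

```python
def fuzzy_vote(predictions):
    """Combine per-system (label, confidence in [0, 1]) pairs into one decision.

    Confidences are summed per label (a simple fuzzy majority vote), and the
    winning label is returned with its normalized support."""
    scores = {}
    for label, conf in predictions:
        scores[label] = scores.get(label, 0.0) + conf
    winner = max(scores, key=scores.get)
    return winner, scores[winner] / sum(scores.values())

# hypothetical branch outputs: two systems agree, one dissents
decision, support = fuzzy_vote([("person_7", 0.92),   # ANN
                                ("person_7", 0.74),   # ANFIS
                                ("person_3", 0.81)])  # DT
```

Unlike a hard majority vote, a single highly confident dissenter can outvote two weakly confident agreeing systems, which is the graded behavior the fuzzy rules aim for.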

4.1 Hybrid Model Techniques with the Help of Feature Adaptability and People’s Labeled Outputs

This project is carried out according to the steps mentioned above to implement an enhanced hybrid machine learning system, with training and testing of ANN, ANFIS, and DT based on the proposed hybrid model. In this approach, we have used two techniques to create a hybrid model. In the first technique, we used fuzzy K-means clustering to classify the features extracted from people's faces with the help of the image training data. Thus, according to the output values of the training data, we find the feature clustering for the images of the people who have registered to create a mobile banking account. The proposed clustering divides the extracted image features into three categories, one for each machine learning system.

The second technique is based on the contribution of a polynomial objective function for each feature of the training images. In this technique, after normalizing the features, we use the genetic algorithm, which determines the polynomial coefficients after 300 iterations. Here, we define the objective function as the polynomial's deviation from the target output values over the images of different people. The polynomial coefficients represent the independent responses of the genetic algorithm that minimize the final deviation from the image output values. In this step, after determining the desired polynomial coefficients for all the features, those with a lower output error are identified as features matching the person's image. These features are selected to train the machine learning systems in clusters. The objective function program code is shown in Fig. 4.

Fig. 4
figure 4

The objective function program code (for the GA)
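A hypothetical end-to-end sketch of this second technique is given below: a simple genetic algorithm searches for polynomial coefficients that minimize the squared deviation from target outputs, using the 300-iteration budget stated in the text. The population size, crossover, and mutation operators are assumptions for illustration; the paper's actual objective-function code appears in Fig. 4.

```python
import numpy as np

def ga_fit_polynomial(x, y, degree=3, pop=40, iters=300, seed=0):
    """Return polynomial coefficients minimizing mean squared deviation."""
    rng = np.random.default_rng(seed)
    P = rng.normal(0, 1, (pop, degree + 1))            # initial population

    def cost(c):                                       # objective: deviation
        return np.mean((np.polyval(c, x) - y) ** 2)

    for _ in range(iters):
        costs = np.array([cost(c) for c in P])
        elite = P[np.argsort(costs)[: pop // 2]]       # selection
        parents = elite[rng.integers(0, len(elite), (pop, 2))]
        alpha = rng.random((pop, 1))
        P = alpha * parents[:, 0] + (1 - alpha) * parents[:, 1]  # crossover
        P += rng.normal(0, 0.05, P.shape)              # mutation
        P[0] = elite[0]                                # elitism keeps the best
    costs = np.array([cost(c) for c in P])
    return P[np.argmin(costs)], float(costs.min())

# Toy target curve standing in for a feature's labeled outputs.
x = np.linspace(-1, 1, 50)
y = 0.5 * x**3 - x + 0.2
coef, err = ga_fit_polynomial(x, y)
```

Features whose fitted polynomials leave a low residual `err` would be kept as "matching" features; the rest are dropped.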

The next technique improves the machine learning systems to increase classification accuracy, using the following objective function for each system and the features assigned to it. The wild horse optimization algorithm determines a weight for each feature so as to improve the accuracy of the system.

$${\text{Fitness function}} = 100 - {\text{accuracy}};\quad {\text{for}}\;\left( {W_{1} ,\,W_{2} ,\,W_{3} ,\, \ldots ,\,W_{i} ,\, \ldots ,\,W_{n} } \right)$$
(4)

where Wi is the weight of the ith feature assigned to each machine learning system.
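A minimal sketch of this fitness function is shown below: the optimizer searches over per-feature weights Wi, and the fitness is 100 minus the classification accuracy obtained with those weights (Eq. 4). The tiny nearest-centroid "classifier" stands in for ANN/ANFIS/DT and is purely an assumption for illustration.

```python
import numpy as np

def accuracy_with_weights(weights, X, y):
    """Accuracy (%) of a nearest-centroid rule on weighted features."""
    Xw = X * weights                                   # apply feature weights
    centroids = np.vstack([Xw[y == c].mean(axis=0) for c in np.unique(y)])
    pred = np.argmin(((Xw[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
    return 100.0 * float((pred == y).mean())

def fitness(weights, X, y):
    return 100.0 - accuracy_with_weights(weights, X, y)   # Eq. 4

# Toy two-class data with five features; uniform weights as a starting point.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 5)), rng.normal(2, 1, (20, 5))])
y = np.repeat([0, 1], 20)
f = fitness(np.ones(5), X, y)
```

The optimizer's job is then simply to drive `fitness` toward zero, i.e., accuracy toward 100%.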

Figure 5 gives the pseudo-code of the objective function for each machine learning system, showing how the different systems are weighted.

Fig. 5
figure 5

The objective function program code (for the WHO algorithm)

4.2 Combining the Machine Learning Systems' Responses

In the previous step, the objective function was defined so that a genetic algorithm could remove features unimportant for classification. The remaining features were then assigned to the machine learning systems by fuzzy k-means clustering. An innovation in this work is the use of the wild horse optimization (WHO) algorithm to improve the performance of the machine learning systems: applying a specific weight to each input increases the accuracy of classification and authentication. The basis of this innovation is to assign higher weights to features that contribute most to correctly classifying images to the intended people, while features that hinder correct identification receive lower weights and are effectively removed from the decision cycle.

After the extracted features are selected and divided as described, a set of features is assigned to each machine learning system. Given the training data for each real person, each machine learning system produces a decision with a certain accuracy. The wild horse optimization algorithm then defines the weights of the input features assigned to each system (ANN, ANFIS, and DT). Figure 6 shows the convergence of the WHO algorithm for each machine learning system; the algorithm raises each system's accuracy by weighting its assigned input features. Table 2 reports the accuracy of each machine learning system after optimization, showing clear improvements from the input-feature weighting.

Fig. 6
figure 6

Performance results of the WHO algorithm over 50 iterations with a population of 10 horses for A ANN, B ANFIS, and C DT

Table 2 The results of improving the accuracy of the machine learning systems with the wild horse optimization algorithm
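The weight-optimization loop described above can be sketched as follows. This stand-in evolves a population of candidate weight vectors (the "horses") to minimize the Eq. 4 fitness, but it deliberately omits the wild horse algorithm's grazing and mating dynamics; it is a simplified random-search illustration, not the real WHO. The 50 iterations and 10 horses match the settings reported in Fig. 6, while the toy fitness function is an assumption.

```python
import numpy as np

def optimise_feature_weights(fitness, n_features, horses=10, iters=50, seed=0):
    """Return the best feature-weight vector found and its fitness value."""
    rng = np.random.default_rng(seed)
    pop = rng.random((horses, n_features))             # weights in [0, 1]
    best_w, best_f = pop[0].copy(), fitness(pop[0])
    for _ in range(iters):
        for i in range(horses):
            cand = np.clip(pop[i] + rng.normal(0, 0.1, n_features), 0, 1)
            if fitness(cand) < fitness(pop[i]):        # move only if it improves
                pop[i] = cand
            if fitness(pop[i]) < best_f:
                best_f, best_w = fitness(pop[i]), pop[i].copy()
    return best_w, best_f

# Toy fitness: lowest (best) when the weights match a hidden ideal vector.
ideal = np.array([1.0, 0.2, 0.8, 0.0])
toy_fitness = lambda w: 100.0 * float(np.mean((w - ideal) ** 2))
w, f = optimise_feature_weights(toy_fitness, 4)
```

In the paper, `fitness` would be the 100-minus-accuracy objective of each classifier rather than this toy function.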

In the last step, after the classification results of each learning machine have been determined for the studied images, a Takagi–Sugeno fuzzy system (Fig. 7) decides whether to confirm the identity of the person in question. The rules governing this fuzzy decision are given in Table 3. The inputs of the fuzzy system are the accuracy levels of each machine learning system for each person, in the range 0 to 1, determined from Eqs. 5 and 6. Equation 5 gives the estimation accuracy of each machine learning system for each person; for new online images, the identification decision is then made with Eq. 6 through this fuzzy system. The system output lies between 0 and 1. An output of 1 for the target person means the person's image has been identified by the decision system with 100% accuracy, while 0 means the received image does not belong to the identified person. Outputs in the intermediate range of 0.1 to 0.9 are treated as unreliable: the system considers such results fake and rejects them. The accuracy of identifying a person by a given learning algorithm is defined as follows:

$${\text{Person identification accuracy}} = \frac{{{\text{NCIDPimg}}}}{{{\text{TNIDPimg}}}}$$
(5)

where NCIDPimg is the number of correctly identified images of the person and TNIDPimg is the total number of the person's images in the machine learning system.
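Equation 5 is straightforward to compute; the short sketch below does so for one person, with illustrative names (the labels and counts are hypothetical).

```python
def person_identification_accuracy(predicted_ids, true_id):
    """Eq. 5: NCIDPimg / TNIDPimg for one person across all of their images."""
    total = len(predicted_ids)                            # TNIDPimg
    correct = sum(p == true_id for p in predicted_ids)    # NCIDPimg
    return correct / total

# 15 images per person, as in the experiments; 13 recognized correctly.
preds = ["p07"] * 13 + ["p02", "p11"]
acc = person_identification_accuracy(preds, "p07")
```

Here `acc` is 13/15, the value a machine learning system would contribute as its accuracy weight for this person.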

Fig. 7
figure 7

A Takagi–Sugeno fuzzy decision system representation. B Input membership functions. C Output characteristics of the proposed system for ANFIS input values of 0.15, 0.5, and 0.85, respectively

Table 3 Basic rules of fuzzy decision-making

In the dataset, each person is represented by a number of images with different views and qualities. Based on the number of a person's face images that are correctly recognized, the person can be authenticated with a high-confidence recognition model. Equation 5 determines, for each person and each machine learning system, the accuracy of identifying that person; Eq. 6 then provides a reliable decision for authenticating the image received from the mobile device.

5 Results and Discussions

To test the proposed approaches, a set of images from the AR face database [31, 32] and frames from mobile cameras were used for different people. Fifteen photos with different facial expressions and lighting changes were considered for each person. The MUCT database consists of 3755 faces for 76 registered members. From this collection, 10 photos per person were selected as training data, 3 as test data, and 2 as validation data. In total, 50 people were selected from the database, along with 5 people whose images were taken with a mobile camera. To prevent forgery and fraud, we used series of face frames captured by different mobile cameras. Using mobile cameras in real-time processing modes increases the practical value of the proposed method for mobile banking and greatly reduces the chance of hackers substituting fake images; online processing of different frames of moving images increases security against the placement of fake still images.

In our method, a video of a user performing each task counts as a separate login attempt. Since data collection is not controlled, videos of different lengths are collected. We present an automated pipeline that converts the recorded videos into structured data from which face-authentication features can be created. To get the best performance from the MATLAB packages used for face detection and landmark detection, we first recover the orientation information of the video and then sample frames from each collected video. All reviewed videos were sampled at 10 frames per second (fps). Face recognition was then modeled and implemented with the aforementioned techniques and the MATLAB toolbox. The sampling interval of the imaging frames was kept longer than the processing time so that the pipeline keeps up with the incoming frames.
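The frame-sampling step of this pipeline can be sketched as follows. The paper performs it with MATLAB toolboxes; here we only compute which source frames to keep when resampling an arbitrary camera frame rate down to the 10 fps used in the experiments (the function name is an assumption).

```python
def sampled_frame_indices(n_frames, source_fps, target_fps=10.0):
    """Indices of source frames approximating a uniform target_fps stream."""
    step = source_fps / target_fps             # source frames per kept frame
    indices, t = [], 0.0
    while round(t) < n_frames:
        indices.append(int(round(t)))
        t += step
    return indices

# A 3-second login clip at 30 fps (90 frames) keeps every 3rd frame.
idx = sampled_frame_indices(90, 30.0)
```

The kept frames are then passed to face detection and landmark detection.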

Images are pre-processed to remove noise and improve quality using standard models. To handle lighting variation, the average contrast of all samples and frames is normalized to 30% of the maximum: in the pre-processing stage, the contrast intensity of all images is balanced to this level, while images with contrast intensity below 10% or above 90% are excluded from identity recognition, either removed from the processing cycle or treated as fake. Figure 8 shows sample images of several people used for authentication. After enhancement, as shown in Fig. 8b, the face image is divided into the parts most important for identifying people: the eyes, nose, mouth, and facial skin.

Fig. 8
figure 8

Steps of facial image segmentation to extract features for several image samples. A The original image. B Determining the position of the eyes, nose, and mouth. C Segmentation results
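The contrast rule described above can be sketched as below, with our reading of the text stated as an assumption: mean image intensity serves as the contrast proxy, frames are rescaled toward 30% of the maximum, and frames below 10% or above 90% are rejected rather than processed.

```python
import numpy as np

def preprocess_contrast(img, target=0.30, low=0.10, high=0.90):
    """Return a normalized image, or None when the frame is rejected."""
    level = img.mean() / 255.0                 # mean intensity as a fraction
    if level < low or level > high:
        return None                            # dropped or treated as fake
    scaled = img.astype(float) * (target * 255.0 / max(img.mean(), 1e-9))
    return np.clip(scaled, 0, 255).astype(np.uint8)

dark = np.full((8, 8), 10, dtype=np.uint8)     # ~4% intensity -> rejected
ok = np.full((8, 8), 128, dtype=np.uint8)      # ~50% -> rescaled toward 30%
out = preprocess_contrast(ok)
```

Accepted frames continue to segmentation and feature extraction; rejected frames never reach the classifiers.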

To extract features, different functions capture texture, color, and geometric properties of the various face parts. A total of 77 features are extracted for each image; because some features are ineffective for classifying people's identity, their number is reduced in the subsequent processing with the help of a genetic algorithm. Accordingly, this work presents a hybrid model that clusters image features based on their dependence on the target people and their compatibility with people's authentication codes. This hybrid model reduces the amount of information processing required of mobile processors and removes features that contribute little to identification. Through these processes, the 77 features are reduced and divided into three groups of 24, 4, and 2 features, for a total of 30 features per face image.

At this stage, the selected and categorized features are fed into the three machine learning systems, ANN, ANFIS, and DT, in groups of 24, 4, and 2 features, respectively, and training is performed on this data. Figure 9 shows the training results of the machine learning systems. Table 4 reports the accuracy for the complete set of images of the different people, computed separately for the training, test, and validation data.

Fig. 9
figure 9

Performance results of the machine learning systems on test images for 12 people. A ANFIS: adaptive neuro-fuzzy system. B ANN: light artificial neural network. C DT: decision tree

Table 4 Comparing the accuracy of images for training and testing datasets and validation by different learning systems

We selected 15 varied images for each person and, using Eq. 5, computed each person's accuracy from the total number of correctly recognized images. The resulting per-person authentication accuracy of each machine learning system on the test images is shown in Fig. 10. As the figure shows, each machine learning technique guarantees a different recognition accuracy, and the accuracy also differs between individuals; among the techniques, ANFIS is the most accurate. The figure presents the recognition accuracy of the five tested individuals for all three machine learning systems as a bar graph. Increasing the number of a person's images recorded in the personal information bank raises the identification accuracy and the confidence in the proposed authentication system's performance.

Fig. 10
figure 10

Calculated person identification accuracy by DT, ANFIS, and ANN for 5 people

These people form the group {a, b, c, d, e}. According to the bar chart, person d achieves the highest authentication accuracy in the study group, while person e is at the lowest level. The authentication of person e is therefore treated with more doubt and requires more images to increase accuracy.

The decision to identify people based on pre-trained samples is checked for all individuals using the fuzzy decision system. For each candidate identity, the information of that person across all recorded images is computed and compared using the studied machine learning systems, and the proposed fuzzy system then makes the final determination. A person with a low output value is considered invalid and remains unrecognized.

Finally, Fig. 11 displays the decision results for a test image using the fuzzy logic system. As shown in the figure, the tested image sample achieves up to 98.5% correct face recognition with the fuzzy decision logic system. To evaluate the decision-making performance of this method, the features selected from the person's image are first sent to each machine learning technique, and each system reports its opinion on the person's authentication as a selection percentage based on Eq. 5. An initial decision is then made from the labels produced by the systems: the label with the highest percentage is taken as the initial choice, and results carrying a label that contradicts it are assigned a percentage of zero. Fuzzy logic finally combines the percentages from the systems into the final decision with its associated confidence.

Fig. 11
figure 11

Results of the authentication system deciding on two image samples: (a) acceptable, (b) unacceptable

To describe the operation of this system: identity recognition is performed on the person's image by each machine learning system, and the inputs of the fuzzy system are then defined from the results via the following relationship:

$${\text{Output}} = {\text{Decisionsys}}\left( {W_{{{\text{DT}}}} \times X_{{{\text{DT}}}} ,\,W_{{{\text{ANN}}}} \times X_{{{\text{ANN}}}} ,\,W_{{{\text{ANFIS}}}} \times X_{{{\text{ANFIS}}}} } \right)$$
(6)

Here X is an M × 1 matrix, where M is the number of people registered with the bank. X holds each machine learning system's estimate of the person's identity, with values in [0, 1]: a value near zero means that system has not selected the person in question, while a value near one means it has. The M × 1 matrix W holds the accuracy with which each machine learning system recognizes each person. The output of this relationship, computed by the proposed fuzzy system, is evaluated for all people, and a final value above 0.9 identifies and authenticates the person.
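An illustrative sketch of Eq. 6 follows: each system's estimate X is scaled by its accuracy weight W, the scores are combined, and identities scoring above 0.9 are authenticated. For simplicity a weighted average stands in for the Takagi–Sugeno fuzzy system, and the per-system scalar weights and score vectors are assumptions.

```python
import numpy as np

def decision_sys(w_dt, x_dt, w_ann, x_ann, w_anfis, x_anfis, threshold=0.9):
    """Combined score per registered person and the authenticated identity."""
    scores = (w_dt * x_dt + w_ann * x_ann + w_anfis * x_anfis) / (
        w_dt + w_ann + w_anfis)                  # weighted consensus in [0, 1]
    best = int(np.argmax(scores))
    return (best if scores[best] > threshold else None), scores

# Three registered people; all three systems favour person 1.
x_dt    = np.array([0.05, 0.96, 0.10])
x_ann   = np.array([0.02, 0.99, 0.00])
x_anfis = np.array([0.10, 0.93, 0.05])
person, scores = decision_sys(0.90, x_dt, 0.95, x_ann, 0.97, x_anfis)
```

If no person clears the 0.9 threshold, the function returns `None`, i.e., the image is rejected as unrecognized.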

5.1 Performance Metrics

The basic criteria for evaluating an authentication system depend on its authentication error rates. Related works use these criteria to determine the quality of authentication; some of them are discussed below [50,51,52]:

  • To characterize a classifier's performance, a confusion matrix is used. Here there are two predicted classes, “Genuine” and “Impostor”, and the entries of the matrix are as follows:

  • TA (True Acceptance) is the number of patterns that belong to the real user and are correctly rated as “Genuine.”

  • TR (True Rejection) is the number of patterns that do not belong to the real user and are correctly rated as “Impostor.”

  • FA (False Acceptance) is the number of patterns that do not belong to the real user and are wrongly rated as “Genuine.”

  • FR (False Rejection) is the number of patterns that belong to the real user and are wrongly rated as “Impostor.”

Bearing these considerations in mind, the true acceptance rate (TAR), false acceptance rate (FAR), false rejection rate (FRR), accuracy, and equal error rate (EER) are evaluated as follows [34, 51]:

  • The true acceptance rate (TAR) is the conditional probability of a pattern being categorized into the “genuine” class, given that it belongs to it. TAR is given by the formulation:

    $${\text{TAR}} = \frac{{{\text{TA}}}}{{{\text{TA}} + {\text{FR}}}}$$
    (7)
  • False Acceptance Rate (FAR) is the conditional probability of a pattern to be categorized in the class “Genuine” given that it does not belong to it. FAR is given by the formulation:

    $${\text{FAR}} = \frac{{{\text{FA}}}}{{{\text{FA}} + {\text{TR}}}}$$
    (8)
  • False Reject Rate (FRR) is the conditional probability of a pattern not being categorized in the class “Genuine” given that it belongs to it. FRR is given by the formulation:

    $${\text{FRR}} = \frac{{{\text{FR}}}}{{{\text{FR}} + {\text{TA}}}}$$
    (9)
  • Accuracy is defined as the probability of a correct classification of a pattern. Accuracy is given by the formulation:

    $${\text{Accuracy}} = \frac{{{\text{TA}} + {\text{TR}}}}{{{\text{TA}} + {\text{TR}} + {\text{FA}} + {\text{FR}}}}$$
    (10)
  • Equal Error Rate (EER) is the error rate reached by tuning the recognition threshold of the system so that FAR and FRR are equal [35, 52]:

    $${\text{EER}} = {\text{FAR}} = {\text{FRR}}$$
    (11)
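The metrics above follow directly from the four confusion counts; the sketch below computes them, and an EER helper sweeps the recognition threshold over genuine/impostor scores until FAR and FRR meet, matching Eq. 11's definition. The counts and score lists are illustrative, not results from the paper.

```python
def tar(ta, fr): return ta / (ta + fr)                       # Eq. 7
def far(fa, tr): return fa / (fa + tr)                       # Eq. 8
def frr(fr, ta): return fr / (fr + ta)                       # Eq. 9
def accuracy(ta, tr, fa, fr):                                # Eq. 10
    return (ta + tr) / (ta + tr + fa + fr)

def equal_error_rate(genuine, impostor, steps=1000):
    """Threshold sweep: the rate at which FAR ~= FRR (Eq. 11)."""
    best = (1.0, 0.0)                          # (FAR, FRR) at threshold 0
    for i in range(steps + 1):
        t = i / steps
        fa_rate = sum(s >= t for s in impostor) / len(impostor)
        fr_rate = sum(s < t for s in genuine) / len(genuine)
        if abs(fa_rate - fr_rate) < abs(best[0] - best[1]):
            best = (fa_rate, fr_rate)
    return (best[0] + best[1]) / 2.0

# Illustrative counts: 100 genuine and 100 impostor attempts.
acc = accuracy(93, 95, 5, 7)
```

Lower EER means the system balances false acceptances and false rejections at a lower overall error.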

Table 5 compares the simulation results on the studied data with other techniques from the literature. The calculated parameters demonstrate the accurate and appropriate performance of the proposed technique.

Table 5 Comparing results with other articles

6 Conclusion

In this work, a dynamic face authentication technique is built on the hybrid model, using fuzzy k-means feature clustering, machine learning systems optimized with the WHO algorithm, and feature selection adapted with the GA. We selected features that effectively represent the face while reducing the feature set to lower processing complexity. In addition, a set of improved machine learning architectures was programmed and configured to achieve more reliable face verification. Most importantly, our study provides an effective approach to facial authentication that can be used in mobile banking to enhance account security and customer trust. A fuzzy decision system ensures the final choice when authenticating people, and by combining the results of the machine learning techniques (ANN, ANFIS, DT), the target person is authenticated with high accuracy. The main limitation of the proposed method is that it makes no decision for unlabeled individuals who have not been introduced before. In future work, we propose paradigms that can detect unlabeled individuals while also detecting various online and media-based banking attacks (print attacks, screen attacks, 2D masks, stolen media videos, deep fakes).