1 Introduction

The term biometrics derives from the Greek words for “life” (bios) and “measurement” (metron). It denotes identification methods that distinguish individuals by using biological data specific to each person. Biometrics encompasses tools that differentiate a person's physical or behavioral characteristics from those of others [12]. As illustrated in Fig. 1, biometric systems can be broadly classified into two groups: physical (passive) and behavioral (active) systems. Physical biometric systems rely on relatively stable physical features that set individuals apart, such as fingerprints, hand geometry, face, voice, iris, and retina. Behavioral biometric systems rely on actions performed for specific purposes at particular times, which each person carries out in a characteristic way. These behaviors include signature, writing dynamics, lip movements during speech, and gait [42].

Fig. 1 Biometrics factor authentication [45]

Information systems face significant challenges in terms of information security, and establishing user authorization is among the crucial elements of computer system security [22]. Among user authentication mechanisms, behavioral biometrics provide an extra layer of security [4]. A major advantage is that they are difficult to replicate. They are also secure and user-friendly, acting as a key that the user cannot forget or misplace. Biometric systems are well suited to integration with mobile systems, where they mitigate problems such as password theft and forgotten passwords [48].

Authentication methods in mobile systems can be categorized into two main approaches: static mode (one-time authentication) and dynamic mode (continuous authentication). In static mode, a subject's identity is verified based on the input the subject provides when first accessing the system. This initial authentication step serves as the primary line of defense and is the most commonly employed security measure on mobile devices; typical inputs are character-based and number-based passwords. In dynamic mode, by contrast, a subject's identity is verified continuously throughout the active session on a mobile device. Authentication in dynamic mode can detect unfamiliar touch dynamics patterns when someone other than the authorized user attempts to use the device, and such a detection may restrict access to sensitive applications or trigger a re-authentication request [44]. Continuous authentication is gaining prominence as an alternative or complement to one-time authentication. It monitors a user's interaction patterns with a device continuously without interrupting service, relying on behavioral biometrics, that is, the distinctive behavioral patterns exhibited by users. Because authentication takes place in real time throughout the user's interaction, the need for explicit authentication is reduced, giving users a more convenient and seamless experience [19].

Machine learning approaches play a pivotal role in authentication, and selecting a classifier suited to the problem and data at hand is of paramount importance. Standard numerical programming approaches often struggle to yield optimal solutions [11], and the performance of these methods can be enhanced by developing hybrid systems [35, 54]. Classical machine learning methods rely on previously extracted features, so identifying the most efficient of these features is another critical aspect of this field [7]. Computational tools are employed to support informed decision-making [15]. Metaheuristic algorithms are approximation techniques for tackling optimization problems [17], and wrapper-based feature selection methods use various metaheuristic algorithms to search for the best feature subsets [16]. Filter-based methods are an alternative way of identifying valuable features and are generally less costly than wrapper-based methods [33].

In this study, soft keyboard typing behavior, a subset of behavioral biometrics, was analyzed for user recognition, and a continuous authentication system for smartphones was then developed by combining this behavior with machine learning techniques. The main contributions of this study can be summarized as follows:

  1. User identification was achieved by analyzing soft keyboard typing behavior, as part of behavioral biometrics, using smartphone sensors. In addition to accelerometer and gyroscope data, features related to screen touches were incorporated, which allowed differences in how users hold the smartphone and in their typing patterns to be examined.

  2. Smartphone data acquired from 59 users were used, and 125 features were extracted from the raw data.

  3. To identify the most efficient and effective features, the features were ranked using a correlation-based feature selection (CFS) method. The data were then classified using the random forest (RF), k-nearest neighbors (kNN), and simple logistic regression (SLR) methods. The experimental results of the resulting hybrid structure demonstrate that users can be identified from their soft keyboard typing behavior with a classification accuracy of up to 93% and a classification time as low as 0.03 ms per pattern.

  4. To the best of the authors' knowledge, this is the first study to examine soft keyboard typing behavior on smartphones together with motion sensors and to propose a continuous authentication architecture based on it. In addition, simple logistic regression has not previously been tried in this field; the findings show that it achieves higher accuracy and lower test time than well-known methods.

  5. Furthermore, a real-time mobile application structure was developed for authentication, with which continuous authentication can be achieved with high accuracy and energy efficiency.

Related work on authentication is reviewed in Sect. 2. The created dataset, correlation-based feature selection, and the simple logistic regression method used for classification are briefly described in Sect. 3. The experimental findings are presented and discussed in Sect. 4. Section 5 describes the continuous mobile authentication structure, and Sect. 6 concludes the study.

2 Related work

A review of related work shows that various approaches have been followed for smartphone authentication. Some are continuous authentication systems, while others are one-time authentication systems, and they draw on a range of data sources and machine learning methods.

Srikar et al. [41] designed a system that uses audio signals to control and recognize the devices allocated to it. Acien et al. [3] evaluated a biometric authentication system based on touch gestures using data obtained from a smartphone. To authenticate users, Acien et al. [2] exploited touch dynamics such as touch motions and keystrokes, together with accelerometer, gyroscope, WiFi, location, and application usage information. Lu et al. [25] proposed a method that authenticates users from their keystrokes while they type free text. Smith-Creasey and Rajarajan [40] developed a novel continuous authentication technique for gesture typing on mobile devices. Ma et al. [26] introduced a machine learning-based method for the automatic analysis of authentication and key agreement procedures. Lu et al. [24] proposed a lip-reading-based user authentication system for smartphones. Yuksel et al. [53] examined users' phone-holding and typing behavior with smartphone accelerometer and gyroscope sensors. Motivated by the shortcomings of previous approaches, Zhu et al. [57] proposed a hybrid deep learning system for challenging real-world mobile authentication. da Silva Cruz and Goldschmidt [10] proposed a deep neural network-based structure for user recognition based on keystroke dynamics. Wang et al. [47] used face recognition for user authentication. Qin et al. [29] proposed an authentication system using biometric walking (gait) information. Yang et al. [52] introduced BehaveSense, a continuous authentication technique for mobile applications based on touch-based behavioral biometrics. Incel et al. [19] studied whether users of a mobile banking application can be continuously verified with acceptable performance using behavioral biometrics. Buriro et al. [9] presented a behavioral biometric-based smartphone user authentication mechanism. Abuhamad et al. [1] proposed a deep learning-based active authentication method based on smartphone sensors. Tse and Hung [45] presented an authentication scheme for touchscreen mobile devices that combines password, keystroke dynamics, and swipe dynamics. Nguyen and Memon [27] presented touch-based authentication for smartwatches. Zhao et al. [55] presented a novel graphical touch gesture feature for extracting identity traits from touch traces. Feng et al. [13] presented a new touchscreen-based authentication approach for mobile devices. Lu and Liu [23] proposed a smartphone user authentication system based on finger movements on the screen. Shen et al. [36] examined the feasibility of using motion sensor data for smartphone user authentication. Xu et al. [51] examined how to model multiple types of touch data and perform continuous authentication accordingly. Ramadan et al. [30] showed that different users exhibit different touch patterns. Zheng et al. [56] proposed a user authentication mechanism that detects whether an authenticating user is the real owner of the smartphone or someone else who knows the password.

This study proposes a continuous authentication system that examines the soft keyboard usage behavior of a smartphone user without being restricted to a particular application, using motion sensor data and touch screen information. Studies in the literature show that the authentication problem can be addressed with data obtained from the camera, microphone, motion sensors, and touch screen of smartphones, through image processing, audio signal processing, and keystroke dynamics. The data sources used in these studies, and the ways the data are handled, are feasible alternatives to the current study. However, not every approach supports continuous authentication. Typing is performed in many applications during smartphone use, which makes continuous authentication possible, and the high-precision information obtained from motion sensors enables highly accurate recognition. In addition, a CFS-based hybrid classification structure was adopted. Other filter-based feature selection approaches, such as Relief, information gain, and symmetric uncertainty, could be considered as alternatives within this hybrid architecture; however, CFS is easy to implement and produced effective results in preliminary experiments. Wrapper-based approaches, which generally use metaheuristic algorithms, are another alternative to filter-based feature selection. Although they can produce better results than filter-based approaches, they incur higher computational cost and depend directly on the classification method.

3 Materials and methods

3.1 Dataset and feature extraction

This study aims to determine a user's identity by examining how the person types on the smartphone soft keyboard. To acquire smartphone data, a mobile application running on the Android operating system was developed; its screenshots are presented in Fig. 2. Participants were asked to type sentences of different lengths. The dataset contains information about the user's phone-holding and typing patterns: the user's ID (identifier), the number of screen touches, and the number of erases, as well as signals obtained from the accelerometer, gyroscope, ambient light, magnetometer, and proximity sensors. With current technology, these sensors can perform highly precise measurements [37]. Each participant provided data using their own smartphone, and the data of 59 participants were analyzed. For user identification, the collected data were divided into 5-s windows.
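The windowing step can be illustrated with a minimal sketch. It assumes time-stamped samples, non-overlapping 5-s windows, and that a trailing partial window is kept; the text does not state whether windows overlap or how partial windows are handled, and `split_into_windows` is a hypothetical helper, not part of the authors' application.

```python
from typing import List, Tuple

# Hypothetical sample format: (timestamp in seconds, sensor value).
Sample = Tuple[float, float]


def split_into_windows(samples: List[Sample], window_s: float = 5.0) -> List[List[Sample]]:
    """Group samples into consecutive, non-overlapping windows of window_s seconds.

    Assumption: empty windows are skipped and a trailing partial window is kept.
    """
    if not samples:
        return []
    samples = sorted(samples, key=lambda s: s[0])
    start = samples[0][0]
    windows: List[List[Sample]] = []
    current: List[Sample] = []
    for t, v in samples:
        # Close the current window(s) until the sample falls inside [start, start + window_s).
        while t >= start + window_s:
            if current:
                windows.append(current)
            current = []
            start += window_s
        current.append((t, v))
    if current:
        windows.append(current)
    return windows
```

For example, samples at 0, 1, 4.9, 5, 7, and 12 s fall into three windows of sizes 3, 2, and 1; the empty 10-15 s boundary is handled by the inner loop.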

Fig. 2 Data acquisition application screenshots

Accelerometer and gyroscope data were obtained along three axes (x, y, z). In addition to these axes, a magnitude signal, the Euclidean norm of the three axis values given in Eq. 1, was calculated. Fifteen statistical metrics were then applied to the sensor signals.

$${\text{magnitude}} = \sqrt {x^{2} + y^{2} + z^{2} }$$
(1)
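Eq. 1 can be applied sample by sample to produce the magnitude signal. The sketch below assumes the three axis signals are equally long lists sampled at the same instants; `magnitude` is an illustrative name.

```python
import math
from typing import List, Sequence


def magnitude(xs: Sequence[float], ys: Sequence[float], zs: Sequence[float]) -> List[float]:
    """Per-sample Euclidean norm sqrt(x^2 + y^2 + z^2) of a tri-axial signal (Eq. 1)."""
    if not (len(xs) == len(ys) == len(zs)):
        raise ValueError("axis signals must have equal length")
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in zip(xs, ys, zs)]
```

Because the norm does not depend on the phone's orientation, the magnitude signal complements the orientation-dependent x, y, and z axes.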

In studies using sensor data from mobile devices, standard deviation and mean values are commonly used to build feature sets because of their low complexity and reasonable results [39]. Other studies also use zero crossing, spectral energy [38], minimum, maximum [20], variance, and median [34] values. In this study, features were extracted by applying various statistical formulas to the raw smartphone data. As in the studies by Şen et al. [43] and Sağbaş et al. [31], the min, max, mean, variance, standard deviation, skewness, kurtosis, zero crossing, mean energy, mean Teager energy, mean curve length, median, q1, q3, and sum operations were applied to the signals divided into 5-s windows. The aim was to identify the most effective feature subsets from a larger pool of features. The flowchart for obtaining the dataset is presented in Fig. 3.
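The fifteen window statistics can be sketched with the Python standard library. Several of them have more than one common definition, so the following are assumptions rather than the authors' exact formulas: population variance, moment-based skewness and (non-excess) kurtosis, zero crossing as the number of sign changes, mean energy as the mean of x², mean Teager energy as the mean of x[i]² − x[i−1]·x[i+1], mean curve length as the mean absolute difference of consecutive samples, and `statistics.quantiles` defaults for q1 and q3.

```python
import math
import statistics
from typing import Dict, List


def window_features(x: List[float]) -> Dict[str, float]:
    """Fifteen statistics for one windowed signal (definitions are assumptions)."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 samples")
    mean = statistics.fmean(x)
    var = statistics.pvariance(x, mu=mean)
    # Third and fourth central moments for skewness and kurtosis.
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    q1, _median_q, q3 = statistics.quantiles(x, n=4)
    return {
        "min": min(x),
        "max": max(x),
        "mean": mean,
        "variance": var,
        "std": math.sqrt(var),
        "skewness": m3 / var ** 1.5 if var > 0 else 0.0,
        "kurtosis": m4 / var ** 2 if var > 0 else 0.0,
        # Count of sign changes between consecutive samples.
        "zero_crossing": float(sum(1 for a, b in zip(x, x[1:]) if a * b < 0)),
        "mean_energy": sum(v * v for v in x) / n,
        # Teager-Kaiser energy operator averaged over interior samples.
        "mean_teager_energy": sum(x[i] ** 2 - x[i - 1] * x[i + 1] for i in range(1, n - 1)) / (n - 2),
        # Mean absolute difference between consecutive samples.
        "mean_curve_length": sum(abs(b - a) for a, b in zip(x, x[1:])) / (n - 1),
        "median": statistics.median(x),
        "q1": q1,
        "q3": q3,
        "sum": sum(x),
    }
```

If these statistics are applied to the x, y, z, and magnitude signals of both the accelerometer and the gyroscope, they give 8 × 15 = 120 values, which together with the five keystroke features listed in this section appears consistent with the 125 features reported; the text does not spell out this breakdown explicitly.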

Fig. 3 Dataset flowchart

In addition to the features obtained from the sensor signals, the number of key presses, the number of delete-key presses, the variance of the time between key presses, the variance of the time between delete-key presses, and the ratio of the number of delete-key presses to the number of key presses were added to the dataset as features. The list of obtained features is presented in Table 1.
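A minimal sketch of the five keystroke features for one window follows. It assumes that "variance between key press time" means the population variance of the intervals between consecutive presses, and that the list of all key presses also contains the delete-key presses; both are interpretations, and `keystroke_features` is a hypothetical helper.

```python
import statistics
from typing import Dict, List


def keystroke_features(press_times: List[float], delete_times: List[float]) -> Dict[str, float]:
    """Five keystroke features for one window (interpretation is an assumption).

    press_times: timestamps (s) of all key presses in the window.
    delete_times: timestamps (s) of delete-key presses in the window.
    """
    def interval_variance(times: List[float]) -> float:
        ts = sorted(times)
        gaps = [b - a for a, b in zip(ts, ts[1:])]
        # Population variance of inter-press intervals; 0 when fewer than two intervals.
        return statistics.pvariance(gaps) if len(gaps) >= 2 else 0.0

    n_press = len(press_times)
    n_delete = len(delete_times)
    return {
        "key_press_count": float(n_press),
        "delete_press_count": float(n_delete),
        "key_press_interval_var": interval_variance(press_times),
        "delete_press_interval_var": interval_variance(delete_times),
        # Guard against windows without any key press.
        "delete_to_press_ratio": n_delete / n_press if n_press else 0.0,
    }
```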

Table 1 List of obtained features

3.2 Correlation-based feature selection (CFS)

Feature selection is the process of selecting a subset of relevant features for use in model construction. Its goal is to produce an efficient classification model with a high success rate while reducing the dimensionality of high-dimensional classification problems [46]. Correlation-based feature selection ranks features using a heuristic evaluation function based on correlations [50]. The method combines a search algorithm with a function that evaluates the merit of feature subsets. When evaluating a subset, CFS considers both the intercorrelations among its features and how well each feature predicts the class label. It is based on the idea that good feature subsets contain features that are highly correlated with the class but uncorrelated with each other [8, 18]. The criterion used to evaluate a subset of features can be expressed as follows.

$$M_{s} = \frac{k\,\overline{r}_{ci}}{\sqrt{k + k\left(k - 1\right)\overline{r}_{ii'}}}$$
(2)

In Eq. 2, \(k\) is the number of features in subset \(S\), \(\overline{r}_{ci}\) is the average correlation between the features and the class, and \(\overline{r}_{ii'}\) is the average intercorrelation among the features.
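Eq. 2 can be evaluated directly once the two average correlations are known. In the sketch below, the correlation measure used to obtain those averages is left abstract, and `cfs_merit` is an illustrative name, not a WEKA API.

```python
import math


def cfs_merit(k: int, mean_feature_class_corr: float, mean_feature_feature_corr: float) -> float:
    """Merit M_s of a k-feature subset S according to Eq. 2."""
    if k < 1:
        raise ValueError("k must be >= 1")
    return (k * mean_feature_class_corr) / math.sqrt(k + k * (k - 1) * mean_feature_feature_corr)
```

With an average feature-class correlation of 0.5, four mutually uncorrelated features reach a merit of 1.0, whereas four perfectly intercorrelated ones only reach 0.5, the same as a single feature; this is how the denominator penalizes redundancy.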

3.3 Simple logistic regression (SLR)

Logistic regression is a more effective way to use regression for classification tasks: the model describes the posterior class probabilities Pr(G = j | X = x) for the J classes. These probabilities are modeled using linear functions of x while being constrained to sum to one and to remain in the range [0, 1] [5, 21].
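For reference, one standard way of writing such a linear logistic model, as used in LogitBoost-based formulations, is shown below. The symbols \(F_j\), \(\beta\), and \(d\) (the number of attributes) are introduced here for illustration and do not appear elsewhere in this study.

$$\Pr\left(G = j \mid X = x\right) = \frac{e^{F_{j}(x)}}{\sum_{k=1}^{J} e^{F_{k}(x)}},\qquad \sum_{k=1}^{J} F_{k}(x) = 0,\qquad F_{j}(x) = \beta_{0}^{j} + \sum_{v=1}^{d} \beta_{v}^{j} x_{v}$$

The softmax form keeps every probability in [0, 1] and makes the probabilities sum to one. The zero-sum constraint removes the redundancy of using J functions for J probabilities, since adding the same constant to every \(F_k\) would leave the probabilities unchanged.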

Friedman et al. [14] studied additive logistic regression and used algorithms such as discrete AdaBoost, real AdaBoost, LogitBoost, gentle AdaBoost, and AdaBoost.MH to build new logistic regression models. Simple logistic is a classifier that builds linear logistic regression models, using LogitBoost, whose algorithm is given in Fig. 4, to fit them. Cross-validation is used to determine the optimal number of LogitBoost iterations, which results in automatic feature selection [49].

Fig. 4 LogitBoost algorithm [14]

At each iteration, LogitBoost computes the working responses \(z_{ij}\), which capture the error of the currently fitted model on the training data (in terms of probability estimates), and then improves the model by adding a function: \(f_{mj}\) is fitted to these responses by weighted least squares and added to the committee \(F_{j}\) [14].
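The iteration described above can be sketched for the two-class case. This is a simplified illustration, not WEKA's SimpleLogistic implementation: it uses the two-class LogitBoost update (F ← F + ½·f_m, p = 1/(1 + e^(−2F))), fits a simple linear regression on a single attribute as the base learner (choosing the attribute with the lowest weighted squared error), clips the working responses to ±4, and runs a fixed number of iterations instead of choosing it by cross-validation. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Term = Tuple[int, float, float]  # (attribute index, intercept, slope)


def _prob(f: float) -> float:
    """p = e^f / (e^f + e^-f) = 1 / (1 + e^(-2f)), computed without overflow."""
    if f >= 0:
        return 1.0 / (1.0 + math.exp(-2.0 * f))
    e = math.exp(2.0 * f)
    return e / (1.0 + e)


def _simple_regression(xs: Sequence[float], z: Sequence[float], w: Sequence[float]) -> Tuple[float, float, float]:
    """Weighted least-squares fit z ~ a + b*x; returns (a, b, weighted SSE)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, xs)) / sw
    mz = sum(wi * zi for wi, zi in zip(w, z)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, xs))
    sxz = sum(wi * (xi - mx) * (zi - mz) for wi, xi, zi in zip(w, xs, z))
    b = sxz / sxx if sxx > 0 else 0.0
    a = mz - b * mx
    sse = sum(wi * (zi - a - b * xi) ** 2 for wi, xi, zi in zip(w, xs, z))
    return a, b, sse


def logitboost_simple(X: List[List[float]], y: List[int], iterations: int = 10) -> List[Term]:
    """Two-class LogitBoost whose base learner regresses on one attribute (y in {0, 1})."""
    n_attrs = len(X[0])
    F = [0.0] * len(X)
    terms: List[Term] = []
    for _ in range(iterations):
        p = [_prob(f) for f in F]
        # Working weights w = p(1 - p) and responses z = (y - p) / w, clipped to [-4, 4].
        w = [max(pi * (1.0 - pi), 1e-10) for pi in p]
        z = [max(-4.0, min(4.0, (yi - pi) / wi)) for yi, pi, wi in zip(y, p, w)]
        # Choose the attribute whose simple regression has the lowest weighted SSE.
        best = None
        for j in range(n_attrs):
            a, b, sse = _simple_regression([row[j] for row in X], z, w)
            if best is None or sse < best[3]:
                best = (j, a, b, sse)
        j, a, b, _sse = best
        terms.append((j, a, b))
        # Additive update of the committee: F <- F + 0.5 * f_m.
        F = [f + 0.5 * (a + b * row[j]) for f, row in zip(F, X)]
    return terms


def predict_proba(terms: List[Term], x: Sequence[float]) -> float:
    """Probability of class 1 under the fitted additive logistic model."""
    return _prob(sum(0.5 * (a + b * x[j]) for j, a, b in terms))
```

Because every term uses only one attribute, the fitted model is still linear in x, and attributes that are never selected drop out; this is the sense in which the number of iterations also acts as feature selection.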

3.4 Performance metrics

Authentication studies use various metrics to evaluate models, including true acceptance rate [9], classification accuracy [2, 30, 45, 52, 53, 57], false acceptance rate [13], false rejection rate [23, 36], equal error rate [3, 19, 56], F1-score [1], and average error rate [51]. In this study, five performance metrics were used to compare the machine learning methods in detecting user identity: classification accuracy (CA), precision, false-positive rate (FPR), true positive rate (TPR), and F-score, whose formulas are given in Eqs. 3–7. Classification accuracy is the proportion of all samples that are classified correctly. The true positive rate is the probability that an actual positive is classified as positive. The false-positive rate is the proportion of actual negatives that are incorrectly classified as positive. Precision is the ability of the classifier not to label a negative sample as positive. The F-score is the harmonic mean of TPR and precision [6].

$${\text{CA}} = \left( {{\text{TN}} + {\text{TP}}} \right)/\left( {{\text{TN}} + {\text{TP}} + {\text{FN}} + {\text{FP}}} \right)$$
(3)
$${\text{TPR}} = {\text{TP}}/\left( {{\text{TP}} + {\text{FN}}} \right)$$
(4)
$${\text{FPR}} = {\text{FP}}/\left( {{\text{FP}} + {\text{TN}}} \right)$$
(5)
$${\text{Precision}} = {\text{TP}}/\left( {{\text{TP}} + {\text{FP}}} \right)$$
(6)
$${\text{F - score}} = {\text{TP}}/\left( {{\text{TP}} + 0.5 \times \left( {{\text{FP}} + {\text{FN}}} \right)} \right)$$
(7)
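Eqs. 3–7 can be computed from confusion counts as sketched below. For multi-class user identification, the sketch assumes that each participant is treated in turn as the positive class (one-vs-rest); how the averages over participants were computed is not stated here. Zero denominators are not handled, and the function names are illustrative.

```python
from typing import Dict, List


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Eqs. 3-7 for one class treated as positive (zero denominators not handled)."""
    return {
        "CA": (tn + tp) / (tn + tp + fn + fp),
        "TPR": tp / (tp + fn),
        "FPR": fp / (fp + tn),
        "Precision": tp / (tp + fp),
        "F-score": tp / (tp + 0.5 * (fp + fn)),
    }


def one_vs_rest(cm: List[List[int]], c: int) -> Dict[str, float]:
    """Metrics for class c from a confusion matrix (rows = actual, columns = predicted)."""
    total = sum(sum(row) for row in cm)
    tp = cm[c][c]
    fn = sum(cm[c]) - tp                  # actual c, predicted as another class
    fp = sum(row[c] for row in cm) - tp   # predicted c, actually another class
    tn = total - tp - fn - fp
    return binary_metrics(tp, fp, tn, fn)
```

The F-score expression is algebraically equal to 2·Precision·TPR/(Precision + TPR), which is why Eq. 7 matches the harmonic-mean definition given above.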

4 Experimental results and discussion

In this study, the smartphone data were tested with three different classification approaches. Tests were performed with tenfold cross-validation on a computer with an Intel Core i5-7400 3.0 GHz processor running Windows 10, using the Java programming language in Apache NetBeans 11.2. WEKA toolkit version 3.8.5 was used for feature selection and classification.

Various heuristic search algorithms, such as best-first search, genetic algorithms, greedy search, and particle swarm optimization, are used in filter-based feature selection [28]. However, optimization algorithms (e.g., genetic algorithms and particle swarm optimization) have disadvantages such as parameter selection and convergence problems, unbalanced distribution, problem dependency, and high computational cost. In this study, CFS was applied to the dataset and the features were ranked. Experiments were then carried out with the best N-element subset approach, which provided successful results in Sağbaş et al. [32] and Şen et al. [43]. The flowchart of this approach is given in Fig. 5.

Fig. 5 Flowchart of best N-element subset method

As shown in Fig. 5, the features are sorted by the scores obtained from CFS. The best-ranked features are then added one at a time, and the experiments are repeated for each resulting subset. Related authentication studies have used random forest [9], kNN [53], support vector machines [2, 19, 36], Bayesian networks [13], artificial neural networks [30], and various deep learning approaches [1, 3, 57]; these are well-known and frequently used classification models. In this study, each feature subset was tested with the random forest (RF), k-nearest neighbors (kNN), and simple logistic regression (SLR) methods, and their performances were compared. Based on preliminary experiments, k was set to 1 in the kNN method, and LinearNNSearch was used as the nearest neighbor search algorithm. In the RF method, the number of leaves was set to 200 and the number of trees to 100. In SLR, the default values were used: heuristicStop = 50 and maxBoostingIterations = 500. The change in the accuracy rates obtained in the experiments is presented in Fig. 6.
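The best N-element subset loop can be sketched independently of any particular classifier. Here `evaluate` is a hypothetical callback that would, for example, return the tenfold cross-validation accuracy of RF, kNN, or SLR on the given feature subset; ties are resolved in favor of the smallest N, which is an assumption.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def best_n_subsets(
    ranked_features: Sequence[str],
    evaluate: Callable[[List[str]], float],
) -> Tuple[int, float, Dict[int, float]]:
    """Score the top-N ranked features for N = 1..len(ranked_features).

    Returns (best N, best score, score for every N).
    """
    scores: Dict[int, float] = {}
    for n in range(1, len(ranked_features) + 1):
        # Subset of the N highest-ranked features (e.g., ranked by CFS score).
        scores[n] = evaluate(list(ranked_features[:n]))
    # max() returns the first maximal key, i.e. the smallest N on ties.
    best_n = max(scores, key=lambda n: scores[n])
    return best_n, scores[best_n], scores
```

Keeping the score for every N is what allows curves such as the accuracy and test-time plots to be drawn from the same run.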

Fig. 6 Change in accuracy rates according to the number of elements in the feature subset

Classification accuracy improves markedly after the first 5 features, and with a 35-element feature subset it is already remarkably close to the best results. The highest classification accuracy, 92.9551%, was obtained with the SLR classifier using a 92-element feature subset. The kNN classifier achieved 89.604% accuracy with a 73-element subset, and the RF classifier reached 90.3656% with a 105-element subset. The test times for the classifications presented in Fig. 6 are shown in Fig. 7.

Fig. 7 Change in test times according to feature subset element numbers

As Fig. 7 shows, the test time of the kNN method, which does not build a model but classifies directly against the stored samples, increases linearly with the number of features. In contrast, the test times of the SLR and RF methods remain nearly constant across all subset sizes. Note that the times in the chart are the durations required to test the entire dataset of 2626 patterns. The performance metrics of the best results obtained in the tests are provided in Table 2.

Table 2 Performance measures of the best results according to the methods

The performance measurements show that SLR is the most successful method. It achieved a classification accuracy of approximately 93% using 92 features, a 26% reduction of the feature set. TPR, FPR, precision, and F-score were calculated for each participant individually. The lowest TPR was 0.621 and the average TPR was 0.930; the highest FPR was 0.006 and the average FPR was 0.001. The mean precision and F-score were 0.954 and 0.929, respectively. In terms of test time, kNN was the slowest method, taking 30 ms to classify a pattern, followed by RF with 0.13 ms, whereas SLR classified a pattern in 0.03 ms. Precision values for each participant are presented in Fig. 8.
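The reported figures can be cross-checked with simple arithmetic. The sketch assumes that the per-pattern times were obtained by dividing the whole-dataset test time by the 2626 patterns, so multiplying back gives the approximate time to test the full dataset; the helper names are illustrative.

```python
def feature_reduction(total: int, selected: int) -> float:
    """Fraction of features removed by feature selection."""
    return (total - selected) / total


def dataset_test_time_s(per_pattern_ms: float, n_patterns: int) -> float:
    """Approximate time to test all patterns, in seconds."""
    return per_pattern_ms * n_patterns / 1000.0


# Values reported in the text: 92 of 125 features, 2626 patterns.
print(f"{feature_reduction(125, 92):.1%}")  # 26.4%
for name, ms in {"SLR": 0.03, "RF": 0.13, "kNN": 30.0}.items():
    print(name, round(dataset_test_time_s(ms, 2626), 2), "s")
```

Under this assumption, kNN would need roughly 79 s for the full set of 2626 patterns, whereas SLR would need less than 0.1 s.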

Fig. 8 Precision values on the basis of participants

Analyzing the results per participant shows that only a limited number of participants have a precision below 0.85. The performance measurements of participants 4, 24, 25, 26, 36, 39, 48, and 53 were lower than those of the other participants. In contrast, participants 7, 9, 13, 14, 15, 18, 20, 22, 23, 27, 28, 29, 32, 34, 35, 40, 46, 49, 51, 55, 56, 57, and 60 were identified with 100% precision.

A detailed comparison of related authentication studies is presented in Table 3, which lists the data types, evaluation metrics, reported evaluation values, and machine learning methods of each study. However, this study cannot be compared directly with the others, because the types of data and the approaches used to identify people differ. For authentication, Srikar et al. [41] used sound signals, Lu et al. [24] lip reading, Wang et al. [47] facial recognition, and Qin et al. [29] biometric gait information. Other studies used screen touch and keystroke dynamics: Feng et al. [13], Zhao et al. [55], Tse and Hung [45], Yang et al. [52], Ramadan et al. [30], Lu and Liu [23], Acien et al. [3], and Xu et al. [51] examined tactile approaches, while Shen et al. [36], Abuhamad et al. [1], Incel et al. [19], Acien et al. [2], and Yuksel et al. [53] also made use of motion sensors. The evaluation metrics used include false acceptance rate, false rejection rate, classification accuracy, average error rate, true acceptance rate, and F-score. Restricting the comparison to studies that report accuracy leaves the following: Ramadan et al. [30], Tse and Hung [45], Yuksel et al. [53], Yang et al. [52], Acien et al. [2], Lu et al. [24], Wang et al. [47], and Zhu et al. [57]. The average accuracy of these eight studies is 92.47%. However, it is worth repeating that the types of data used in these studies differ from one another.

Table 3 Comparison of authentication studies

5 Structure of mobile continuous authentication system

The experiments demonstrated that the user of a smartphone can be identified accurately by analyzing their soft keyboard typing behavior. Consequently, a mobile user authentication system was developed. The system starts by prompting the user to enter a specific text. When the first letter is entered into the text box, a 5-s data collection process begins; if no text is entered within a 2-s interval, the process is terminated. After data collection is complete, the 92 features selected by the CFS method for the SLR method are extracted. The resulting pattern is then classified with the SLR method, using a pre-trained model loaded on the smartphone, to determine the user. The result is stored in a circular queue of length n, and access is permitted or denied based on the results in the queue: if none of the results in the circular queue belong to the phone's owner, access is denied and the phone is locked, whereas access is granted if at least one pattern in the queue corresponds to the phone's owner. The system's workflow is illustrated in Fig. 9.
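The 5-s capture with its 2-s inactivity limit can be simulated offline over key-press timestamps. This is a hypothetical sketch, not the authors' Android code: a real app would use timers and input callbacks. It also assumes that a gap of more than 2 s (including the gap from the last press to the end of the window) aborts the capture; whether exactly 2 s counts as inactivity is not specified.

```python
from typing import List, Optional

WINDOW_S = 5.0      # capture length stated in the text
IDLE_LIMIT_S = 2.0  # inactivity limit stated in the text


def capture_window(key_times: List[float]) -> Optional[List[float]]:
    """Return key timestamps of a completed 5-s capture, or None if it was aborted.

    The window starts at the first key press and is aborted if more than
    IDLE_LIMIT_S seconds pass without input before the window ends.
    """
    if not key_times:
        return None
    ts = sorted(key_times)
    end = ts[0] + WINDOW_S
    inside = [t for t in ts if t <= end]
    # Check gaps between consecutive presses and from the last press to the window end.
    checkpoints = inside + [end]
    for a, b in zip(checkpoints, checkpoints[1:]):
        if b - a > IDLE_LIMIT_S:
            return None
    return inside
```

For example, presses at 0, 1, and 3.5 s abort the capture because of the 2.5-s pause, while presses every 1.5 s complete it; a press after the 5-s mark belongs to the next capture.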

Fig. 9 Continuous authentication structure based on typing behaviors

The proposed architecture offers several advantages. Sensor data are collected only while the user is typing, which improves energy efficiency because the sensors remain passive during other activities. Furthermore, the system is not limited to a specific application and can work in any application that involves text input. The circular queue is integral to the system: it operates on a first-in, first-out principle, and its last position connects to the first, forming a ring. Storing the results in a circular queue of fixed length prevents the phone from being locked because of a single incorrect prediction. At the same time, access is denied when none of the consecutive results in the queue match the phone's owner, which provides additional security.
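The queue-based access decision can be sketched with `collections.deque`, whose `maxlen` argument drops the oldest entry when a new one is appended, giving the first-in, first-out ring behavior described above. The class name `AccessGate` is hypothetical, and initializing the queue with the owner's ID (for example, after a successful one-time unlock) is an assumption; the text does not say how the queue is initialized.

```python
from collections import deque


class AccessGate:
    """Keep the last n predicted user IDs; deny access only if none is the owner."""

    def __init__(self, owner_id: str, n: int):
        if n < 1:
            raise ValueError("n must be >= 1")
        self.owner_id = owner_id
        # Assumption: start full of owner results so one wrong prediction cannot lock the phone.
        self.results = deque([owner_id] * n, maxlen=n)

    def add_result(self, predicted_id: str) -> bool:
        """Store the newest prediction (evicting the oldest) and return the access decision."""
        self.results.append(predicted_id)
        return self.access_granted()

    def access_granted(self) -> bool:
        return any(r == self.owner_id for r in self.results)
```

With n = 3, the phone stays unlocked after one or two consecutive foreign predictions and is locked only on the third, because only then does the queue no longer contain any of the owner's results.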

6 Conclusion

In this study, a continuous authentication system based on smartphone soft keyboard typing behavior was proposed. The accelerometer and gyroscope sensors of smartphones were used to investigate how users hold the phone and the variations that occur during typing. In addition, the numbers of key presses and delete-key presses were recorded and used in classification. Features were ranked with the correlation-based feature selection method, and their performance was assessed with the best N-element subset method. In the experiments, the simple logistic regression (SLR) method achieved an accuracy of up to 93%. Feature selection reduced the feature set by 26%, saving memory and processing time. Moreover, SLR had a shorter test time than the other methods tested, classifying a single pattern in just 0.03 ms. However, several limitations could be addressed to further improve this work: noisy data, the small number of participants, imbalanced data distribution among participants, and classification model parameters that were not optimized, all of which can negatively affect performance. Potential improvements include: (i) selecting one or more specific texts for identity verification and evaluating exclusively on those texts, which could increase classification accuracy; (ii) dynamically adjusting the pattern-generation window to make the authentication application more efficient; (iii) cleaning contradictory data to eliminate device- and human-induced noise, thereby improving classification accuracy; and (iv) increasing the number of participants and using different sampling methods to develop more efficient classification models. These enhancements could lead to more robust and accurate continuous authentication systems in the future.