Machine learning-based novel continuous authentication system using soft keyboard typing behavior and motion sensor data

Sağbaş, Ensar Arif; Ballı, Serkan

doi:10.1007/s00521-023-09360-9


Original Article
Open access
Published: 07 January 2024

Volume 36, pages 5433–5445, (2024)
Neural Computing and Applications


Abstract

Smartphones utilize various authentication methods, including passwords, fingerprints, and face recognition. While this information is quite practical and easy to remember, it introduces several security issues. The primary concerns involve theft, password forgetfulness, or unauthorized password copying. Implementing behavioral biometrics for user authentication adds an extra layer of security. The main contribution of this study is the utilization of soft keyboard typing behavior, a behavioral biometric, for continuous user recognition. To achieve this, the phone's grip style and typing characteristics of users are scrutinized using data collected from motion sensors and the touchscreen panel. Another challenge in mobile device authentication pertains to recognition accuracy and processing time. To expedite and optimize data classification, a hybrid classification structure is suggested. This structure incorporates correlation-based feature selection and a straightforward logistic regression method, offering rapid and highly accurate classification outcomes—a further contribution of this study. Experimental results demonstrate that user identification can be accomplished in as little as 0.03 ms, with a classification accuracy of up to 93%. Continuous authentication systems offer greater security compared to one-time authentication systems. Nevertheless, these systems might not always yield the most precise results. Overcoming this challenge necessitates the development of an efficient software architecture. In line with this, an additional contribution of this study is an explanation of how to construct a continuous authentication system using the developed architecture.


1 Introduction

The concept of biometrics originates from the words “life” and “measurement” and is defined as an identification method that distinguishes individuals from others by using biological data specific to the person. Biometrics encompasses tools that allow people to differentiate their physical or behavioral characteristics from those of others [12]. Biometric systems can be broadly classified into two groups, as illustrated in Fig. 1: physical (passive) and behavioral (active) systems. Physical biometric systems are based on unchanging physical features, such as fingerprints, hand geometry, face, voice, iris, and retina, which set individuals apart from one another. Behavioral biometric systems encompass behaviors performed for specific purposes at particular times, with each person displaying a unique approach. These behaviors include signature, writing dynamics, lip movements during speech, and gait [42].

The field of information systems has encountered significant challenges in terms of information security. Establishing user authorization is among the crucial elements of computer system security [22]. In the realm of user authentication mechanisms, behavioral biometrics provide an extra layer of security [4]. One of their major advantages is that they cannot be replicated. Furthermore, these systems are highly secure and user-friendly, effectively serving as an irreplaceable key. Biometric systems are also well-suited for integration with mobile systems, effectively mitigating issues such as password theft or forgetfulness [48].

Authentication methods in mobile systems can be categorized into two main approaches: static mode (one-time authentication) and dynamic mode (continuous authentication). In the static mode, a subject's identity is verified based on the input provided by the subject during the initial access to a system. This initial authentication step serves as the primary line of defense and is the most commonly employed security measure on mobile devices. Common input types in the static model include character-based and number-based passwords. Conversely, in the dynamic mode, a subject's identity is continuously verified throughout the active session of a mobile device. Authentication implemented in dynamic mode can detect unfamiliar touch dynamics patterns when someone other than the authorized user attempts to use the mobile device. The detection of an unrecognized touch dynamics pattern may result in restricting access to sensitive applications or triggering an additional re-authentication request [44]. Continuous authentication is gaining prominence as a potential alternative or complementary solution. This approach involves the continuous monitoring of a user's interaction patterns with a device, ensuring uninterrupted service. Continuous authentication relies on behavioral biometrics, encompassing unique behavioral patterns exhibited by users. Within a continuous authentication framework, the authentication process occurs in real time throughout the user's interaction, reducing the need for explicit authentication and providing users with a more convenient and seamless experience [19].

In the realm of authentication, machine-learning approaches play a pivotal role. Selecting the appropriate classifier for a study, aligned with the problem and data at hand, is of paramount importance. Standard numerical programming approaches often struggle to yield optimal solutions [11]. Consequently, enhancing the performance of these methods can be achieved through the development of hybrid systems [35, 54]. Classical machine learning methods rely on previously extracted features, and identifying the most efficient ones among these features is another critical aspect of this field [7]. Computational tools are employed to facilitate informed decision-making [15]. Meta-heuristic algorithms stand out as approximation techniques employed to tackle optimization problems [17]. Wrapper-based feature selection methods involve searching for the best feature subsets using various metaheuristic algorithms [16]. An alternative approach to identifying valuable features is filter-based methods, which are generally less costly than wrapper-based methods [33].

In this study, an analysis of soft keyboard typing behaviors, a subset of behavioral biometrics, was conducted for user recognition. Subsequently, a continuous authentication system was developed by incorporating machine learning techniques and soft keyboard typing behavior on smartphones. The main contributions of this study can be summarized as follows:

1. User identification was achieved by analyzing soft keyboard typing behaviors, a form of behavioral biometrics, using smartphone sensors. In addition to the accelerometer and gyroscope sensors, features related to screen touch were also incorporated. This approach allowed for the examination of distinctions in how users hold the smartphone and in their typing patterns.
2. For this purpose, smartphone data acquired from 59 users were used, and 125 unique features were extracted from the raw data.
3. To identify the most efficient and effective features, they were ranked using a correlation-based feature selection (CFS) method. Subsequently, the data were classified using the random forest (RF), k-nearest neighbors (kNN), and simple logistic regression (SLR) methods. The experimental results of the hybrid structure established in this study have demonstrated the feasibility of detecting users based on their soft keyboard typing behaviors in a remarkably short time, as fast as 0.03 ms, with a classification accuracy of up to 93%.
4. To the best of the authors’ knowledge, this is the first study to examine soft keyboard typing behavior in smartphones with motion sensors and propose a continuous authentication architecture. In addition, simple logistic regression is a method that has not been tried before in this field. The findings show that it provides higher accuracy and lower test time than well-known methods.
5. Furthermore, a real-time mobile application structure has been developed for authentication. With this designed system, continuous authentication with high accuracy and energy efficiency can be effectively achieved.

Studies in the literature on authentication are explained in Sect. 2. The created dataset, correlation-based feature selection, and simple logistic regression method used in classification are briefly explained in Sect. 3. Obtained experimental findings are handled and discussed in Sect. 4. In Sect. 5, the continuous mobile authentication structure is explained. Finally, the study is concluded in Sect. 6.

2 Related work

When the related works are examined, it is seen that various approaches have been followed for smartphone authentication. Some of these studies are continuous authentication systems, while others are one-time authentication systems. In this context, various data sources and various machine learning methods were used.

Srikar et al. [41] designed a system that controls and recognizes the devices allocated to it using audio signals. Acien et al. [3] evaluated a biometric authentication system based on touch gestures with data obtained from a smartphone. To authenticate users, Acien et al. [2] exploited touch dynamics such as touch motions and keystrokes, as well as accelerometer, gyroscope, WiFi, location, and application usage information. Lu et al. [25] proposed a method that authenticates users with keystrokes while typing free text. For gesture-typing on mobile devices, Smith-Creasey and Rajarajan [40] developed a novel continuous authentication technique. Ma et al. [26] introduced a unique machine learning-based method for the automatic analysis of authentication and key agreement procedures. Lu et al. [24] proposed a lip-reading-based user authentication system on smartphones. Yuksel et al. [53] examined the phone holding and typing behavior of users with smartphone accelerometer and gyroscope sensors. Based on the inadequacies of previous approaches, Zhu et al. [57] suggested a hybrid deep learning system for challenging real-world mobile authentication. da Silva Cruz and Goldschmidt [10] proposed a deep neural network-based structure to perform user recognition based on keystroke dynamics. Wang et al. [47] used face recognition for user authentication. Qin et al. [29] proposed an authentication system using biometric walking information. Yang et al. [52] introduced BehaveSense, a continuous authentication technique for mobile applications based on touch-based behavioral biometrics. Incel et al. [19] studied whether it is possible to continually validate users with a certain performance in a mobile banking application using behavioral biometrics. Buriro et al. [9] presented a behavioral biometric-based smartphone user authentication mechanism. Abuhamad et al. [1] suggested a deep learning-based active authentication method based on smartphone sensors. Tse and Hung [45] presented an authentication scheme for touchscreen mobile devices that uses a combination of password, keystroke dynamics, and swipe dynamics. Nguyen and Memon [27] presented a touch-based authentication for smartwatches. To extract identity traits from touch traces, Zhao et al. [55] presented a novel graphical touch gesture feature. Feng et al. [13] presented a new touchscreen-based authentication approach in mobile devices. Lu and Liu [23] proposed a smartphone user authentication system based on finger movements on the screen. Shen et al. [36] looked at the viability of employing motion sensor data for smartphone user authentication. Xu et al. [51] examined how to model multiple touch data types and perform continuous authentication accordingly. Ramadan et al. [30] showed that different users exhibit different touch patterns. Zheng et al. [56] proposed a user authentication mechanism to detect whether an authenticating user is the real owner of the smartphone or someone else who knows the password.

This study proposes a continuous authentication system in which the soft keyboard usage behaviors of the smartphone user can be examined without being restricted to a certain application. The system uses motion sensor data and touch screen information. Studies in the literature show that a solution to the authentication problem can be found with the data obtained from the camera, microphone, motion sensor, and touch screen of smartphones. Various studies have been carried out with these data sources by examining image processing, audio signal processing, and keystroke dynamics. The data sources used in these studies and the handling of the data can be presented as other feasible alternatives to the current study. However, not every approach supports continuous authentication. Typing is an action performed in many applications of smartphone use. This makes continuous authentication possible. In addition, highly successful recognition is achieved with high-precision information obtained from motion sensors. Additionally, a CFS-based hybrid classification structure was adopted. Other filter-based feature selection approaches such as relief, information gain, and symmetric uncertainty can be considered as an alternative to the adopted hybrid architecture. However, the proposed CFS-based approach is easy to implement and appears to produce effective results as a result of preliminary experiments. As an alternative to filter-based feature selection approaches, wrapper-based approaches stand out. Although these approaches, which generally use meta-heuristic algorithms, have the potential to produce better results than filter-based approaches, they have more computational costs and are directly dependent on the classification method.

3 Materials and methods

3.1 Dataset and feature extraction

This study aims to determine user identity by examining how people type on the smartphone soft keyboard. To acquire smartphone data, a mobile application that runs on the Android operating system was developed; its screenshots are presented in Fig. 2. Participants were asked to type sentences of different lengths. The dataset includes data that provide information about the user's phone-holding and typing patterns: the user's ID, the number of touches to the screen, and the number of erases, as well as signals obtained from the accelerometer, gyroscope, ambient light, magnetometer, and proximity sensors. These sensors can perform ultra-high precision measurements with current technology [37]. Each participant provided data using their own smartphone. Accordingly, the data of 59 participants were analyzed. In line with the system set up for user identification, the collected data are divided into 5-s windows.

Accelerometer and gyroscope sensor data were obtained in three axes. In addition to these axes, the magnitude given in Eq. 1, which combines the three axes as their Euclidean norm, was calculated. Fifteen statistical metrics were applied to the sensor signals.

$${\text{magnitude}} = \sqrt {x^{2} + y^{2} + z^{2} }$$

(1)
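The windowing and magnitude steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the sample layout `(timestamp, x, y, z)`, the function names, and the choice of non-overlapping windows aligned to the first timestamp are all assumptions made for the example.

```python
import math

WINDOW_SECONDS = 5.0  # window length stated in the text


def magnitude(x, y, z):
    """Eq. 1: Euclidean norm of one three-axis sample."""
    return math.sqrt(x * x + y * y + z * z)


def split_into_windows(samples, window_seconds=WINDOW_SECONDS):
    """Group (timestamp_s, x, y, z) samples into consecutive fixed-length windows.

    Assumption: windows do not overlap and are aligned to the first timestamp.
    Each stored sample is extended with its magnitude as a fourth axis.
    """
    if not samples:
        return []
    start = samples[0][0]
    windows = {}
    for t, x, y, z in samples:
        index = int((t - start) // window_seconds)
        windows.setdefault(index, []).append((x, y, z, magnitude(x, y, z)))
    return [windows[i] for i in sorted(windows)]


# Hypothetical accelerometer samples: (time in seconds, x, y, z)
samples = [(0.0, 3.0, 4.0, 0.0), (2.5, 0.0, 0.0, 9.8), (5.1, 1.0, 2.0, 2.0)]
windows = split_into_windows(samples)
print(len(windows))       # number of windows found
print(windows[0][0][3])   # magnitude of the first sample
```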

In studies carried out with sensor data obtained from mobile devices, standard deviation and average values are generally used when creating feature sets due to their low complexity and reasonable results [39]. In addition to these attributes, there are also studies using zero crossing, spectral energy [38], min, max [20], variance, and median [34] values. In this study, the features were extracted by applying various statistical formulas to the obtained raw smartphone data. As in the studies by Şen et al. [43] and Sağbaş et al. [31], the min, max, mean, variance, standard deviation, skewness, kurtosis, zero crossing, mean energy, mean teager energy, mean curve length, median, q1, q3, and sum operations are applied to the signals divided into 5-s windows. The aim is thus to determine the most effective feature sets among a larger number of features. The flowchart for obtaining the dataset is presented in Fig. 3.
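A sketch of the per-window statistics may help make the feature set concrete. The exact formulas are not given in the text, so the definitions below are assumptions: population variance, moment-based skewness and kurtosis, mean energy as the mean of squared samples, the Teager energy operator as x[n]^2 - x[n-1]x[n+1], and curve length as the mean absolute difference between consecutive samples. Applying 15 statistics to 4 axes (x, y, z, magnitude) of 2 sensors gives 120 values, which together with the 5 touch features would be consistent with the 125 features reported.

```python
import statistics


def window_features(signal):
    """Compute 15 statistics for one axis of one window (assumed definitions)."""
    n = len(signal)
    mean = statistics.fmean(signal)
    variance = statistics.pvariance(signal)
    std = variance ** 0.5
    # Standardized third and fourth moments; set to 0 for a constant signal.
    if std > 0:
        skewness = sum((v - mean) ** 3 for v in signal) / (n * std ** 3)
        kurtosis = sum((v - mean) ** 4 for v in signal) / (n * std ** 4)
    else:
        skewness = kurtosis = 0.0
    pairs = list(zip(signal, signal[1:]))
    # Teager energy operator needs both neighbours, so it skips the end points.
    teager = [signal[i] ** 2 - signal[i - 1] * signal[i + 1] for i in range(1, n - 1)]
    q1, _, q3 = statistics.quantiles(signal, n=4, method="inclusive")
    return {
        "min": min(signal),
        "max": max(signal),
        "mean": mean,
        "variance": variance,
        "std": std,
        "skewness": skewness,
        "kurtosis": kurtosis,
        "zero_crossing": sum(1 for a, b in pairs if a * b < 0),
        "mean_energy": statistics.fmean(v * v for v in signal),
        "mean_teager_energy": statistics.fmean(teager) if teager else 0.0,
        "mean_curve_length": statistics.fmean(abs(b - a) for a, b in pairs) if pairs else 0.0,
        "median": statistics.median(signal),
        "q1": q1,
        "q3": q3,
        "sum": sum(signal),
    }


# Hypothetical one-axis window
features = window_features([1.0, -2.0, 3.0, -4.0, 5.0])
print(len(features), features["zero_crossing"], features["median"])
```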

In addition to the features obtained from the sensor signals, the number of key presses, the number of delete key presses, the variance of the time between key presses, the variance of the time between delete key presses, and the ratio of the number of delete key presses to the number of key presses were added to the dataset as features. The list of obtained features is presented in Table 1.
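The five touch-related features could be computed along the following lines. This is a sketch under stated assumptions: key press and delete press timestamps are given in seconds for one window, "variance between key press times" is read as the population variance of the gaps between consecutive presses, and the function and field names are hypothetical.

```python
import statistics


def interval_variance(times):
    """Population variance of the gaps between consecutive timestamps."""
    gaps = [b - a for a, b in zip(times, times[1:])]
    return statistics.pvariance(gaps) if gaps else 0.0


def touch_features(key_times, delete_times):
    """Five touch features for one window (assumed interpretation of the text)."""
    key_count = len(key_times)
    delete_count = len(delete_times)
    return {
        "key_count": key_count,
        "delete_count": delete_count,
        "key_interval_variance": interval_variance(key_times),
        "delete_interval_variance": interval_variance(delete_times),
        # Guard against an empty window to avoid division by zero.
        "delete_ratio": delete_count / key_count if key_count else 0.0,
    }


# Hypothetical timestamps within one 5-s window
touch = touch_features([0.0, 0.5, 1.5, 2.0], [1.0, 3.0])
print(touch)
```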

Table 1 List of obtained features


3.2 Correlation-based feature selection (CFS)

Feature selection is the process of selecting a subset of relevant features for use in model construction. The goal of feature selection is to produce an efficient classification model with a high success rate while reducing the size of high-dimensional classification problems [46]. Correlation-based feature selection ranks features using a heuristic evaluation function based on correlations [50]. This method employs both a search algorithm and a function for calculating the merit of feature subsets. When scoring a subset, CFS considers both the inter-correlation among its features and how well each feature predicts the class label. This method is based on the idea that good feature subsets are made up of features that are strongly correlated with the class but not with each other [8, 18]. The criterion used to evaluate a subset of features can be expressed as follows.

$$M_{s} = \frac{{k\overline{r}_{ci} }}{{\sqrt {k + k\left( {k - 1} \right)\overline{r}_{ii'} } }}$$

(2)

In the formula, $k$ is the number of features in the subset, $\overline{r}_{ci}$ is the average correlation between the class $Y$ and the features, and $\overline{r}_{ii'}$ is the average inter-correlation among the features.
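To illustrate the criterion in Eq. 2, the following sketch evaluates the merit of a subset from its size, its average feature-class correlation, and its average feature-feature correlation. The function name and the numeric inputs are hypothetical and chosen only for illustration.

```python
import math


def cfs_merit(k, mean_feature_class_corr, mean_feature_feature_corr):
    """Eq. 2: merit of a k-feature subset from its two average correlations."""
    return (k * mean_feature_class_corr) / math.sqrt(
        k + k * (k - 1) * mean_feature_feature_corr
    )


# Same relevance, different redundancy (hypothetical values)
low_redundancy = cfs_merit(3, 0.6, 0.2)
high_redundancy = cfs_merit(3, 0.6, 0.8)
print(round(low_redundancy, 4), round(high_redundancy, 4))
```

With equal relevance to the class, the subset whose features are less correlated with each other receives the higher merit, which is the behavior the criterion is designed to reward.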

3.3 Simple logistic regression (SLR)

A logistic regression model, which models the posterior class probabilities Pr(G = j | X = x) for the J classes, is a more effective technique for employing regression in classification tasks. These probabilities are modeled using linear functions in x, while ensuring that they sum to one and remain in the range [0, 1] [5, 21].
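One common way to obtain class probabilities that sum to one from per-class linear functions is the softmax form, sketched below. The weights, biases, and function name are hypothetical, and the text does not state that this exact parameterization is the one used in the study.

```python
import math


def class_probabilities(x, weights, biases):
    """Map one linear score per class to probabilities via the softmax function."""
    scores = [
        b + sum(w_i * x_i for w_i, x_i in zip(w, x))
        for w, b in zip(weights, biases)
    ]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Three hypothetical classes over a two-feature input
probs = class_probabilities(
    [1.0, 2.0],
    weights=[[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]],
    biases=[0.0, 0.0, 0.0],
)
print(probs, sum(probs))
```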

Friedman et al. [14] presented a study on additive logistic regression. In that study, various algorithms such as discrete AdaBoost, real AdaBoost, LogitBoost, gentle AdaBoost, and AdaBoost.MH were used to create new logistic regression models. Simple logistic is a classifier used to create linear logistic regression models. LogitBoost, whose algorithm is given in Fig. 4, is used to fit the logistic models. Cross-validation is used to determine the optimal number of LogitBoost iterations to conduct, resulting in automatic feature selection [49].

LogitBoost generates the response variables z_ij, which encode the error of the currently fit model on the training data (in terms of probability estimates), and then seeks to improve the model at each iteration by adding a function f_mj, fit to the responses by least-squares error, to the committee f_j [14].
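The iteration described above can be sketched for the simplified two-class case, using a simple linear regression on a single feature as the function added at each step. This is an illustrative reduction, not the multi-class procedure applied in the study: the fixed iteration count replaces the cross-validated stopping rule, the weight floor and response clipping are common numerical safeguards assumed here, and all names and data are hypothetical.

```python
import math


def fit_weighted_line(xs, zs, ws):
    """Weighted least-squares fit of z ~ a + b * x; returns (a, b, weighted_error)."""
    total = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total
    mean_z = sum(w * z for w, z in zip(ws, zs)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    sxz = sum(w * (x - mean_x) * (z - mean_z) for w, x, z in zip(ws, xs, zs))
    b = sxz / sxx if sxx > 0 else 0.0
    a = mean_z - b * mean_x
    error = sum(w * (z - (a + b * x)) ** 2 for w, x, z in zip(ws, xs, zs))
    return a, b, error


def logitboost_fit(rows, labels, iterations=10):
    """Two-class LogitBoost; labels are 0 or 1. Returns a list of (feature, a, b)."""
    scores = [0.0] * len(rows)
    model = []
    for _ in range(iterations):
        probs = [1.0 / (1.0 + math.exp(-2.0 * s)) for s in scores]
        weights = [max(p * (1.0 - p), 1e-6) for p in probs]
        # Working responses encode the current error; clipped for stability.
        responses = [
            max(-4.0, min(4.0, (y - p) / w))
            for y, p, w in zip(labels, probs, weights)
        ]
        candidates = []
        for j in range(len(rows[0])):
            a, b, error = fit_weighted_line([r[j] for r in rows], responses, weights)
            candidates.append((error, j, a, b))
        _, j, a, b = min(candidates)  # keep the single feature with least error
        model.append((j, a, b))
        scores = [s + 0.5 * (a + b * r[j]) for s, r in zip(scores, rows)]
    return model


def logitboost_probability(model, row):
    """Probability of class 1 for one input row under the fitted committee."""
    score = sum(0.5 * (a + b * row[j]) for j, a, b in model)
    return 1.0 / (1.0 + math.exp(-2.0 * score))


# Hypothetical data: feature 0 separates the classes, feature 1 is uninformative
rows = [[0.1, 5.0], [0.2, 3.0], [0.3, 4.0], [0.8, 5.0], [0.9, 3.0], [1.0, 4.0]]
labels = [0, 0, 0, 1, 1, 1]
model = logitboost_fit(rows, labels)
print(logitboost_probability(model, [0.15, 4.0]))
print(logitboost_probability(model, [0.95, 4.0]))
```

Because each iteration adds a regression on only one feature, features that are never chosen drop out of the final model, which is the sense in which the procedure performs feature selection.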

3.4 Performance metrics

Various metrics are used to evaluate models in authentication studies. These include true acceptance rate [9], classification accuracy [2, 30, 45, 52, 53, 57], false acceptance rate [13], false rejection rate [23, 36], equal error rate [3, 19, 56], f1-score [1], and average error rate [51]. In this study, five performance metrics were used to compare the performance of machine learning methods in detecting user identity: classification accuracy (CA), true positive rate (TPR), false-positive rate (FPR), precision, and f-score, whose formulas are presented in Eqs. 3–7. Classification accuracy is the proportion of all samples that are classified correctly. The true positive rate is the likelihood that a genuine positive is classified as positive. The false-positive rate is the proportion of actual negatives that are incorrectly classified as positive. Precision is the ability of the classifier not to label a negative sample as positive. F-score is the harmonic mean of TPR and precision [6].

$${\text{CA}} = \left( {{\text{TN}} + {\text{TP}}} \right)/\left( {{\text{TN}} + {\text{TP}} + {\text{FN}} + {\text{FP}}} \right)$$

(3)

$${\text{TPR}} = {\text{TP}}/\left( {{\text{TP}} + {\text{FN}}} \right)$$

(4)

$${\text{FPR}} = {\text{FP}}/\left( {{\text{FP}} + {\text{TN}}} \right)$$

(5)

$${\text{Precision}} = {\text{TP}}/\left( {{\text{TP}} + {\text{FP}}} \right)$$

(6)

$${\text{F - score}} = {\text{TP}}/\left( {{\text{TP}} + 0.5 \times \left( {{\text{FP}} + {\text{FN}}} \right)} \right)$$

(7)
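Eqs. 3–7 translate directly into code. The sketch below computes them from the four confusion counts for a single class treated one-vs-rest; the counts are hypothetical, and the function assumes every denominator is non-zero.

```python
def classification_metrics(tp, tn, fp, fn):
    """Eqs. 3-7 from confusion counts (assumes all denominators are non-zero)."""
    return {
        "CA": (tn + tp) / (tn + tp + fn + fp),
        "TPR": tp / (tp + fn),
        "FPR": fp / (fp + tn),
        "precision": tp / (tp + fp),
        "f_score": tp / (tp + 0.5 * (fp + fn)),
    }


# Hypothetical counts for one user versus all others
metrics = classification_metrics(tp=8, tn=89, fp=1, fn=2)
print(metrics)
```

The f-score form in Eq. 7 is algebraically the same as the harmonic mean of precision and TPR, which can be checked numerically on the values above.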

4 Experimental results and discussion

In this study, the data obtained from the smartphone were tested with three different classification approaches. Tests were performed on a computer with an Intel Core i5-7400 3.0 GHz processor on a Windows 10 operating system using the Java programming language in Apache NetBeans version 11.2 by applying tenfold cross-validation. WEKA toolkit version 3.8.5 was used for feature selection and classification.

Various heuristic algorithms such as best first search, genetic algorithm, greedy search, and particle swarm optimization are used in filter-based feature selection approaches [28]. However, optimization algorithms (e.g., genetic algorithm, particle swarm optimization) have disadvantages such as parameter selection problems, convergence problems, unbalanced distribution, problem dependency, and high computational cost. In this study, after applying CFS to the dataset, the features are ranked. Experiments were carried out with the best N-element subset approach, which provided successful results in Sağbaş et al. [32] and Şen et al. [43]. The flow chart of this approach is given in Fig. 5.

As shown in Fig. 5, the features are sorted according to the score values obtained from the CFS. Afterward, the best features were added one after another and the experiments were repeated. When related studies on authentication are examined, random forest [9], kNN [53], support vector machine [2, 19, 36], Bayesian networks [13], artificial neural networks [30] and various deep learning approaches [1, 3, 57] appear to have been used. These approaches are well-known and frequently used classification models. In this study, each feature subset is tested with random forest (RF), k nearest neighbor (kNN), and simple logistic regression (SLR) methods, and their performances are compared. As a result of the preliminary experiments, the k value was set to 1 in the kNN method. LinearNNSearch was used as the nearest neighbor search algorithm. In the RF method, the number of leaves was set to 200 and the number of trees to 100. In SLR, the default values of the method were used, with heuristicStop set to 50 and maxBoostingIteration set to 500. The change in the accuracy rates obtained as a result of the experiments is presented in Fig. 6.
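The best N-element subset procedure can be sketched as a loop over prefixes of the ranked feature list. The `evaluate` callback stands in for training and cross-validating a classifier on the given subset; it, the feature names, and the toy scores below are hypothetical placeholders, not results from the study.

```python
def best_n_subset(ranked_features, evaluate):
    """Evaluate the top-1, top-2, ... feature prefixes and keep the best one.

    ranked_features: features ordered from best to worst score.
    evaluate: callable returning a score (higher is better) for a feature list.
    """
    best_n, best_score = 0, float("-inf")
    history = []
    for n in range(1, len(ranked_features) + 1):
        score = evaluate(ranked_features[:n])
        history.append((n, score))
        if score > best_score:
            best_n, best_score = n, score
    return ranked_features[:best_n], best_score, history


# Toy stand-in for a cross-validated accuracy, keyed by subset size
toy_scores = {1: 0.50, 2: 0.80, 3: 0.90, 4: 0.85}
ranked = ["f_a", "f_b", "f_c", "f_d"]  # hypothetical feature names
subset, score, history = best_n_subset(ranked, lambda feats: toy_scores[len(feats)])
print(subset, score)
```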

When the change in classification accuracy is examined, a significant improvement is noticeable after the first 5 features. After reaching a feature subset of 35 elements, the accuracy is remarkably close to the best results achieved. The highest classification accuracy, at 92.9551%, was obtained with the SLR classifier, using a 92-element feature subset. The kNN classifier achieved an accuracy of 89.604% with a 73-element subset, while the RF classifier reached an accuracy of 90.3656% with a 105-element subset. Test times for the classifications presented in Fig. 6 can be found in Fig. 7.

Upon examination of Fig. 7, it becomes evident that the test time of the kNN method, which does not create a model but performs classification directly on samples, displays a linear increase. Conversely, the test times for the SLR and RF methods exhibit a consistent trajectory from the beginning to the end. It is important to note that the times presented in the chart represent the duration required to test the entire dataset, encompassing 2626 patterns. The performance metrics for the best results achieved as a result of the conducted tests are provided in Table 2.

Table 2 Performance measures of the best results according to the methods


Upon examination of the performance measurements, it is evident that the most successful method is the SLR. This method achieved an impressive classification accuracy of approximately 93%, and it required 92 features for this classification, reducing the feature set by 26%. Individual values for TPR, FPR, precision, and f-score were calculated. The lowest TPR observed was 0.621, while the average TPR was computed as 0.930. The highest FPR value was 0.006, while the average FPR was 0.001. The mean precision and f-score values were calculated as 0.954 and 0.929, respectively. Considering test times, the method with the longest test time was kNN, taking 30 ms to classify a pattern. RF followed with a test time of 0.13 ms. In contrast, the SLR method had a pattern classification time of 0.03 ms. Precision values based on participants are presented in Fig. 8.
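As a rough check on the figures quoted above, using the rounded values reported in the text (so the results are approximate), the feature reduction and the SLR test time for the full set of 2626 patterns work out as:

$$\frac{125 - 92}{125} = 0.264 \approx 26\%$$

$$2626 \times 0.03\,{\text{ms}} \approx 79\,{\text{ms}}$$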

When the results are analyzed on the basis of participants, it is seen that there are a limited number of participants whose precision is below 0.85. It was observed that the performance measurements of the participants numbered 4, 24, 25, 26, 36, 39, 48, and 53 were lower than those of the other participants. However, 100% success was achieved in estimating participants 7, 9, 13, 14, 15, 18, 20, 22, 23, 27, 28, 29, 32, 34, 35, 40, 46, 49, 51, 55, 56, 57 and 60.

A detailed comparison table of related authentication studies is presented in Table 3. This table presents the data types of the studies, evaluation metrics, obtained evaluation values, and the machine learning methods used. However, it is not possible to compare this study directly with other studies because the types of data used and the approaches to identify people differ. For authentication, Srikar et al. [41] used sound signals, Lu et al. [24] lip-reading, Wang et al. [47] facial recognition, and Qin et al. [29] biometric gait information. In addition to these, various studies were carried out by using the screen touch and keystroke dynamics. Feng et al. [13], Zhao et al. [55], Tse and Hung [45], Yang et al. [52], Ramadan et al. [30], Lu and Liu [23], Acien et al. [3], and Xu et al. [51] examined tactile approaches. Shen et al. [36], Abuhamad et al. [1], Incel et al. [19], Acien et al. [2], and Yuksel et al. [53] also benefited from motion sensors. When evaluation metrics are considered, it can be seen that metrics such as false acceptance rate, false rejection rate, classification accuracy, average error rate, true acceptance rate, and f-score were used. If the studies that use the accuracy rate as an evaluation metric are filtered, the studies suitable for comparison are as follows: Ramadan et al. [30], Tse and Hung [45], Yuksel et al. [53], Yang et al. [52], Acien et al. [2], Lu et al. [24], Wang et al. [47], and Zhu et al. [57]. The average accuracy rate for these eight studies is 92.47%. However, it is worth remembering again that the types of data used in the studies are different from each other.

Table 3 Comparison of authentication studies


5 Structure of mobile continuous authentication system

The experiments conducted have demonstrated the possibility of accurately identifying the user of a smartphone by analyzing their soft keyboard typing behaviors. Consequently, a mobile user authentication system was developed. This system initiates by prompting the user to enter a specific text. Once the first letter is entered into the text box, a 5-s process begins. If no text is input within a 2-s interval, the process is terminated. Following the completion of the data collection process, 92 features selected by the CFS method for the SLR method are extracted. Subsequently, this pattern is classified using the SLR method, utilizing a pre-trained model loaded on the smartphone to determine the user. The obtained result is stored in an n-length circular queue structure. Access is permitted or denied based on the results stored in the queue. If none of the results in the circular queue belong to the phone's user, access is denied, resulting in the phone being locked. Conversely, access is granted if there is a pattern in the circular queue that corresponds to the phone's user. The system's workflow is illustrated in Fig. 9.
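The data collection rule described above (a 5-s process that starts at the first key press and is terminated by a 2-s pause) could be checked as in the sketch below. The interpretation is an assumption: a pause is taken to mean more than 2 s between consecutive key presses, or between the last key press and the end of the window. The function and parameter names are hypothetical.

```python
def collection_completes(key_times, window_seconds=5.0, max_idle_seconds=2.0):
    """Return True if typing continues through the whole window without a long pause.

    key_times: ascending key press timestamps in seconds; the first one starts
    the window.
    """
    if not key_times:
        return False
    end = key_times[0] + window_seconds
    previous = key_times[0]
    for t in key_times[1:]:
        if t > end:
            break  # presses after the window belong to the next pattern
        if t - previous > max_idle_seconds:
            return False  # idle gap inside the window: process terminated
        previous = t
    # Assumed: the stretch after the last press also counts as idle time.
    return end - previous <= max_idle_seconds


print(collection_completes([0.0, 1.0, 2.5, 4.0, 4.8]))
print(collection_completes([0.0, 1.0, 3.5]))
```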

The proposed architecture offers several advantages. Sensor data collection occurs exclusively during user typing, enhancing energy efficiency since the sensors remain passive during other activities. Furthermore, the system is not limited to a specific application but can function across all applications that involve text input. The use of a circular queue is integral to the system. This data structure operates on a first-in, first-out principle, and the last position connects to the first, creating a circular arrangement. The obtained results are stored in the circular queue structure with a specified length, preventing the phone from being locked based on a single incorrect guess. If none of the consecutive texts in the queue match the user of the phone, access is denied, ensuring additional security.
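The circular queue decision can be sketched with a bounded double-ended queue, which discards the oldest result when a new one arrives. The queue length of 3, the class and method names, and the user identifiers are hypothetical. One further assumption is made explicit in the code: the phone is not locked until the queue has been filled, since the text does not specify the behavior for a partially filled queue.

```python
from collections import deque


class ContinuousAuthenticator:
    """Keeps the last n classification results and decides whether to lock."""

    def __init__(self, owner_id, queue_length=3):
        self.owner_id = owner_id
        # A deque with maxlen drops the oldest entry automatically (first in, first out).
        self.results = deque(maxlen=queue_length)

    def observe(self, predicted_user_id):
        """Store one classifier output and return the current access decision."""
        self.results.append(predicted_user_id)
        return self.access_granted()

    def access_granted(self):
        # Assumption: never lock before the queue holds n results.
        if len(self.results) < self.results.maxlen:
            return True
        # Access is denied only if no stored result matches the owner.
        return any(result == self.owner_id for result in self.results)


auth = ContinuousAuthenticator(owner_id="owner", queue_length=3)
decisions = [auth.observe(user) for user in ["other", "other", "other", "owner"]]
print(decisions)
```

A longer queue tolerates more consecutive misclassifications before locking, at the cost of giving an unauthorized user more typing windows before access is denied.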

6 Conclusion

In this study, a continuous authentication system based on smartphone soft keyboard typing behavior was proposed. Accelerometer and gyroscope data were used to examine how users hold the phone and how its position varies during typing. The number of key presses and delete key presses was also recorded and included in the classification process. Features were ranked using the correlation-based feature selection method, and their performance was assessed using the best N subset method. In the experiments, the simple logistic regression (SLR) method achieved an accuracy of up to 93%. Feature selection reduced the feature set by 26%, conserving memory and processing time. Moreover, SLR had a shorter testing time than the other tested methods, classifying a single pattern in just 0.03 ms.

The work has several limitations that could be addressed to improve it further: noisy data, the small number of participants, imbalances in data distribution among participants, and classification model parameters that were not optimized, all of which can negatively affect the results. Potential improvements include:
(i) Selecting one or more specific texts for identity verification and conducting evaluations exclusively on those texts, potentially increasing classification accuracy.
(ii) Dynamically adjusting the pattern generation window range to enhance the efficiency of the authentication application.
(iii) Cleaning contradictory data to eliminate device- and human-induced noise, thereby improving classification accuracy.
(iv) Expanding the study by increasing the number of participants and using different sampling methods to develop more efficient classification models.
These enhancements could lead to more robust and accurate continuous authentication systems in the future.

Data availability

Not applicable.

Code availability

Not applicable.

Funding

Open access funding provided by the Scientific and Technological Research Council of Türkiye (TÜBİTAK).

Author information

Authors and Affiliations

Department of Information Systems Engineering, Faculty of Technology, Muğla Sıtkı Koçman University, 48000, Muğla, Turkey
Ensar Arif Sağbaş
Department of Software Engineering, Faculty of Computer and Informatics, Mehmet Akif Ersoy University, 15300, Burdur, Turkey
Serkan Ballı

Corresponding author

Correspondence to Serkan Ballı.

Ethics declarations

Competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Ethics approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. The study was reviewed and approved by the Scientific Research and Publication Ethics Boards, Ege University (Ethics approval protocol number: 11/01-362, date: 26.11.2019).

Informed consent

Informed consent was obtained from all individual participants included in the study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Sağbaş, E.A., Ballı, S. Machine learning-based novel continuous authentication system using soft keyboard typing behavior and motion sensor data. Neural Comput & Applic 36, 5433–5445 (2024). https://doi.org/10.1007/s00521-023-09360-9

Received: 15 December 2022
Accepted: 07 December 2023
Published: 07 January 2024
Issue Date: April 2024
DOI: https://doi.org/10.1007/s00521-023-09360-9
