1 Introduction

Recent technological advancements have caused a huge increase in the use of mobile devices. Smartphones, notebooks, and iPads come with many capabilities, including email, text messaging, gaming, web browsing, navigation, and capturing pictures/videos. These devices store a great deal of personal information and, if a device is stolen, the loss of control over that data may matter more than the loss of the smart mobile device itself.

Some prior work on mobile device security has focused on physical aspects and/or access control methods (e.g., strong passwords, voice recognition [26], or fingerprints [21]). However, such approaches do not protect the private data on stolen devices in the post-authentication state. Today’s smart devices are already equipped with tools that allow us to obtain vast amounts of data about user behavior, such as application usage logs. In addition, many mobile devices are equipped with location identification tools such as Global Positioning System (GPS) receivers, which can be used to track locations in case of theft. However, existing works that use GPS features to protect mobile devices (e.g., GadgetTrak [12] and RecoveryCop [25]) depend on the owner to report the theft, and it may take hours before the owner realizes it, at which point private data may already have been exploited. Even Laptop Cop [23] requires user intervention to remotely/manually delete the data on a stolen device.

Our main goal is to develop efficient techniques for protecting data saved on mobile devices by detecting anomalous spatio–temporal behavior as compared to the regular motion patterns of the owners. A study by González et al. [14] on 100,000 trajectories of anonymized mobile phone users, whose positions were tracked for a 6-month period, demonstrated that many individuals tend to have small sets of locations that they visit frequently (e.g., home, work, school) and tend to take the same path when moving between locations. The observations of González et al. [14] imply that a user’s presence at a certain time in a certain location is predictable; hence, we can utilize this to build a user profile which, in turn, can be used to perform anomaly detection.

In a previous study [34], we used network access patterns and file system activities on laptops to build a behavioral model based on K-means clustering that permitted attack detection with a latency of 5 min and an accuracy of 90 %. In a recent work [35], we used location information and trajectory data to build profiles of smart phone users, and we were able to detect attacks within 15 min with 81 % accuracy. This paper extends those results [35] as follows:

  1. We present an enhanced user model based on the previously discussed spatio–temporal and trajectory data approach, in which we assume the user profile follows a normal distribution histogram. By eliminating the low end of the distribution (values below 10 %) during the detection analysis, we achieve 96 % detection accuracy.

  2. We propose, implement, and compare two data reduction techniques that reduce the memory requirements by ≈90 % and consequently reduce the processing time: the Row-Merge algorithm, which combines adjacent rows in our data structures, and the MDLP algorithm, which adapts an existing statistical technique [3] to our setting.

  3. We evaluate our techniques on an additional spatio–temporal data set, Geolife [36–38].

In summary, this article makes the following main contributions.

  • We develop two statistical profiling approaches and corresponding representations, one based on an empirical cumulative probability measure and the other based on the Markov property, in order to model the normal behavior of a user within a fixed time window. An anomaly is detected when the probability that a user window reflects normal behavior falls below a threshold determined by controlling the recall rate on the user’s normal behavior.

  • We present two techniques that reduce user profile memory requirements while still allowing accurate attack detection.

  • We present a detailed experimental evaluation of the proposed methodologies over two data sets, quantifying the benefits of our approaches.

In the rest of this paper, Sect. 2 places the work in the context of our system architecture and discusses the data and feature extraction methods. Section 3 presents the details of the user profile representations and our anomaly-based detection schemes. Section 4 presents the methods used to reduce the size of the user profile data. Section 5 presents a comprehensive experimental evaluation of our methods. Section 6 describes related work, and Sect. 7 concludes the paper and indicates directions for future work.

2 Preliminaries

We now give an overview of our system architecture, followed by discussion of the properties of the data and their use in feature extraction.

Our system for automatic generation of mobility models and detection of spatio–temporal behavioral anomalies is based on a client–server architecture utilizing cloud computing. Its main modules are (1) data collection, (2) feature extraction, (3) user profile/model building, (4) data reduction, and (5) anomaly detection. The detection accuracy is determined by how well anomalous behavior can be distinguished using such models, also taking other users’ models into account during anomaly detection. Figure 1 illustrates the integration of these modules into our system architecture, which consists of the following sub-systems:

Fig. 1
figure 1

System architecture

(ICS)—the information capturing system, which resides on the mobile device, contains an application to track the device location, register it periodically, and save it in a new log file every T minutes. It also contains the feature extraction module.

(IMS)—the information management system, which collects the log-files from the ICS and resides on a computer with higher performance and much looser power consumption constraints than the mobile device. It is responsible for building mobility models and performing anomaly detection. Upon building the user model, the IMS, possibly after the data reduction process, sends the user model to the mobile device, allowing local detection of attacks in the absence of wireless connection.

(RMS)—the response management system, which resides on both the mobile device and the remote server hosting the IMS. Upon receiving an alert, the RMS identifies the appropriate action to protect data on the mobile device, for example, notifying the device owner, locking the device, or automatically deleting private data.

Our work focuses on the algorithms and implementations for the ICS and the IMS modules, since the RMS consists of user-dependent actions to be executed upon actual detection of an attack. Again, the rationale is to maximize the extent to which the mobile devices themselves can detect the anomalous spatio–temporal behavior. While the data structures representing the user motion are built at the server, in the case of transient network failure, classification can still be performed on the client using the most recent transmitted matrix. Clearly, this may affect the classification accuracy if the network connection is not available for a prolonged period of time.

2.1 Mobility profiles

We now present our setup for the data collection and the feature extraction modules.

2.1.1 Data collection

Motion traces are essential for model construction and anomaly detection. To obtain them, motion monitoring software needs to be developed to collect information about each user’s motion patterns—that is, his spatio–temporal data (along with the other user activities such as file system access and network activities). These are saved as trace files, to be sent to the IMS system periodically at pre-determined intervals.

In our initial system implementation, we relied on the fact that a number of researchers have gathered vast amounts of motion traces that are publicly available [11, 14, 27, 28, 38]. We note, however, that some of these traces were collected for purposes different from ours, with different experimental settings and requirements.

Our desiderata can be summarized by the following properties, abbreviated as (LCF):

  • longevity (L): collected for a long period of time, continuously;

  • consistency (C): collected at regular times (e.g., same times daily); and

  • high frequency (F): to support fast anomaly detection.

After analyzing the different available traces, two data sets—the Reality Mining data set [11] and Geolife [38]—turned out to provide the closest match to the (LCF) properties.

The Reality Mining data set contains traces collected over a 9-month period for over 100 users, consisting of phone call logs, locations identified by tower IDs and area IDs, event logs, and device-specific data such as the device specs. The collection interval ranged from a few seconds to 15 min, with an average of 2.5 min (except when the mobile device was off), at regular time-instants daily.

The Geolife data set is a collection of GPS trajectories for 178 users over a period of more than 4 years. The data were recorded with high frequency: 91 % of the trajectories are logged every 1–5 seconds or every 5–10 meters per point. On closer examination, we noticed that about 50 % of this data set is also compliant with the LCF properties.

2.1.2 Feature extraction

Reality Mining data set: The traces have over 55 data features capturing information about the users’ mobility, activity, communication events, reporting time, and device-specific information such as the MAC address and the device maker. Since we focus on the spatio–temporal and trajectory features, properties like user activity, device-specific information, user communication style, and user affiliation information were not considered.

Reality Mining data provides three values to represent a location: cell tower ID, area ID, and area name. The cell tower ID gives information associated with the user’s location and is therefore a source of information about the user’s movement over time. However, the tower ID information in the Reality Mining data set has no geographical coordinates, and since each physical location can be associated with multiple tower IDs, we consider the tower ID an unreliable feature. Thus, we selected the area ID to represent the location information in our study. The area ID represents the physical location (library, office, etc.) identified either by the information capturing system (ICS) itself or by the user when reporting new locations such as home, office, or restaurants.

Our spatio–temporal analysis techniques depend on extracting the following features from the Reality Mining log:

  1. (u_i)—User ID;

  2. (l_j)—Location information, represented by the area ID in the traces; and

  3. (t_k)—Timestamps of the data records in the trace.

Thus, our input data records are tuples of the form (u_i, l_j, t_k).

Geolife data set: These traces have a smaller number of features—only seven, including longitude, latitude, and altitude, in addition to date and time, and the transportation mode (car, bus, walking, etc.).

We note that this data set was collected in 30 different cities in China, the United States, Korea, and Europe, and we focused on the trajectories that were collected in the same cities.

Examining the raw data led us to conclude that most study users started at the location with coordinates (39.0°–41.0°, 115.5°–117.5°) and then moved to different areas by bus, train, plane, or boat. This location corresponds to Beijing, China [31]. We focused on an area of (138 × 110) square miles [24].

To utilize this data set, the GPS location information needed to be mapped into an area ID, so that the structure is similar to the Reality Mining data set representation. The longitude and latitude are provided in degrees, with precision up to 0.000001°. At 39° latitude, a change of ±0.0001° in longitude represents ≈8 m, while at 116° longitude the same change in latitude represents ≈7 m. Therefore, in this data set, we rounded the coordinates to the closest fourth decimal digit and let each coordinate pair represent an area ID, again yielding records/tuples of the form (u_i, l_j, t_k).
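To make this mapping concrete, here is a minimal Python sketch of the rounding step; the function names and record layout are illustrative assumptions, not the original implementation.

```python
# A minimal sketch of the Geolife area-ID mapping described above.
# Names (make_area_id, to_record) are illustrative, not from the paper.
from typing import Tuple

def make_area_id(lat: float, lon: float) -> Tuple[float, float]:
    """Round a GPS fix to the fourth decimal degree (~7-8 m near Beijing),
    so each rounded (lat, lon) pair serves as one area ID."""
    return (round(lat, 4), round(lon, 4))

def to_record(user_id: str, lat: float, lon: float, minute_of_day: int):
    """Produce a (u_i, l_j, t_k) tuple mirroring the Reality Mining layout."""
    return (user_id, make_area_id(lat, lon), minute_of_day)

# Two fixes a few meters apart fall into the same area ID.
print(to_record("u_1", 39.984702, 116.318417, 173))
print(to_record("u_1", 39.984683, 116.318432, 173))
```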

3 Data models and anomaly detection

We developed two statistical profiling approaches in order to model the normal behavior of a user in a fixed window. Model #1 is based on the empirical cumulative probability measure of location and time, while Model #2 is based on the Markov transition property. In this section, we describe each of them in detail.

3.1 Model #1: Location-in-time probability measure

In Model #1, for each user u_i, we extract the location l_j and timestamp t_k. For conciseness, we sometimes omit the user ID notation when it is clear from the context.

3.1.1 Building the user profile

Since our goal is to detect attacks by detecting deviation from the user’s normal behavior, the first step is to develop a model of a user’s normal behavior based on the set of locations that the user has visited during the data collection period. To build the respective user profiles for each user in the data set, we divided the data logs evenly into two consecutive data sets: model_data (used for model construction) and test_data (used for evaluation).

Utilizing the model_data, a user profile was constructed as follows (a code sketch follows the list):

  1. For each user u_i, we extracted the distinct locations and kept track of them in a list (L_i).

  2. We built a table of |L_i| columns and NT rows to save the location probability values, where NT is the number of minutes in a day.

  3. We calculated the probability Prob_i(t_k, l_j), which represents the fraction of time in the model_data during which user u_i was at location l_j at time t_k, where 1 ≤ j ≤ |L_i| and 1 ≤ k ≤ NT. Recall that at any given time t_k, the user u_i must be at some unique location l_j from the location list L_i:

    $$ \forall t_k \in NT, \,\, \exists \,l_j \in L_i \quad\hbox{where}\,\hbox{Prob}_i(t_k,l_j) > 0 $$
    (1)
  4. We extracted from the user’s distinct location list L_i the user’s common locations list (UCL_i), which consists of the locations the user visited more than 1 % of the time during the data collection period. All locations visited less than 1 % of the time are saved in the infrequent list (IF_i), so that all related records can be deleted from the model_data, along with the respective columns of the user profile.

    $$ \begin{aligned} &\forall l_j \in L_i:\\ &\hbox{if} \sum\limits_{k=1}^{NT} \hbox{Prob}_i(t_k,l_j) \geq 0.01 \, \hbox{then} \, l_j \in UCL_i\\ &\hbox{if} \sum\limits_{k=1}^{NT} \hbox{Prob}_i(t_k,l_j) < 0.01 \,\hbox{then} \, l_j \in IF_i \end{aligned} $$

    We selected the 1 % value based on the study by Bayir et al. [4], which reported that individuals spend 79–85 % of their time in a small number of locations (2–8) and less than 15 % of their time in a large number of locations, each visited less than 1 % of the time. We observed that the fraction |IF_i|/|L_i| can be significant; hence, keeping track only of UCL_i is a first step toward reducing the storage costs.

  5. We eliminated the least visited locations from the profile table, obtaining a final user profile of |UCL_i| ≤ |L_i| columns and NT rows.

  6. We created a discrete probability distribution by counting visits to the common locations and then normalizing so that the entries sum to one:

    $$ \sum_{k=1}^{NT} \sum_{j=1}^{|UCL_i|} LOC{{\text{-}}}IN{{\text{-}}}TIME_{i}(t_{k},l_{j})= 1 $$
    (2)
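To make the construction concrete, the following Python sketch implements steps 1–6 under simplifying assumptions: the records have already been reduced to (user, location, minute-of-day) tuples, and the profile is stored as a per-location dictionary rather than a dense NT × |UCL_i| matrix. It reflects our reading of the procedure, not the original code.

```python
# Sketch of Model #1 profile construction (steps 1-6 above); the data
# layout is an assumption made for illustration.
from collections import defaultdict

NT = 1440  # number of minutes in a day

def build_profile(model_data, threshold=0.01):
    """model_data: iterable of (user, location, minute_of_day) tuples for
    one user. Returns {location: [NT weighted probabilities]} over UCL_i."""
    counts = defaultdict(lambda: [0] * NT)    # steps 1-3: visit counts
    total = 0
    for _user, loc, minute in model_data:
        counts[loc][minute] += 1
        total += 1
    # Steps 4-5: keep common locations (>= 1 % of the records) only.
    ucl = {loc: col for loc, col in counts.items()
           if sum(col) / total >= threshold}
    # Step 6: renormalize over the kept records so Eq. (2) holds.
    kept = sum(sum(col) for col in ucl.values())
    return {loc: [c / kept for c in col] for loc, col in ucl.items()}
```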

Figure 2 shows an example user profile represented as a two-dimensional matrix with (NT × |UCL_i|) elements. Rows (t_k) correspond to the minutes of the day (12:00 AM, 12:01 AM, …, 11:59 PM) and columns (l_j) correspond to locations (l_1, l_2, l_3, l_4, l_5). Each cell of the user profile holds the weighted probability LOC-IN-TIME_i(t_k, l_j).

Fig. 2
figure 2

User profile for Model #1

For example, this user profile shows that the user has never been at locations l_2, l_4, or l_5 at 12:00 AM during the data collection period, while 4 % of the data collection time he was at location l_1. Note that when reading the timestamp in the mobility trace, we consider only the hour and minute values; therefore, each row in the user profile represents a full minute of the day (the named minute plus the following 59 seconds).

3.1.2 Anomaly detection

The anomaly detection process is responsible for receiving streams of user mobility data, comparing them with the user profile, and identifying an anomaly (potential theft of the mobile device).

Our anomaly detection scheme falls into the class of statistical methods [7], which are based on the assumption that normal data instances occur in the high-probability region of the stochastic model while anomalies occur in its low-probability region. Our scheme is a nonparametric collective anomaly detection model, where the probability values are extracted from the traces and an anomaly corresponds to an unusual sequence of data.

The first step of our collective anomaly detection scheme is to randomly select 100 samples \((S_1, S_2, S_3,\,\ldots,\,S_m,\,\ldots,\,S_{100})\) from the test_data set, each spanning T minutes, as shown in Fig. 3.

Fig. 3
figure 3

Example of dividing the test_data set into 100 test samples

A random sample S m of time span T corresponds to a contiguous sequence of records: \((u_{i},l_{j},t_{k}),\,(u_{i}, l_{j_1}, t_{k_1}),\,\ldots,\,(u_{i},l_{j_x}, t_{k_x}),\,\ldots,\,(u_{i}, l_{j_n}, t_{k_n})\) satisfying these three conditions:

  • \(t_{k} \leq t_{k_1} \leq \cdots \leq t_{k_x} \leq \cdots \leq t_{k_n}\)

  • \((t_{k_n} - {t_k})\leq {T}\)

  • \((t_{k_{n+1}} - {t_k}) > {T}\)

The number of records per sample varies among samples due to the variation in data collection interval.
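A short sketch of this windowing step follows, assuming time-sorted records whose timestamps are expressed in minutes (the function name is illustrative):

```python
# Cut one sample of time span T (in minutes) from a time-sorted record
# stream, following the three conditions above.
def cut_sample(records, start_index, T):
    """records: list of (user, location, timestamp_minutes) tuples.
    Returns the longest contiguous slice from start_index whose span
    stays within T minutes."""
    t0 = records[start_index][2]
    end = start_index
    while end + 1 < len(records) and records[end + 1][2] - t0 <= T:
        end += 1
    return records[start_index:end + 1]
```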

For each sample S_m, we define the empirical cumulative probability \(P_{S_{m}}\) of the records in the sequence, using the probability distribution table established during the profile building phase from the model_data of user u_i, as follows:

$$ P_{S_m} = \sum _{(k,j)\in S_{m}} LOC{{\text{-}}}IN{{\text{-}}}TIME_{i}(t_k, l_j) $$
(3)

As an example, consider the sample S_1 shown in Fig. 4. To calculate the \(P_{S_{1}}\) value, we look up each record of the sample in the user profile illustrated in Fig. 2 and extract the corresponding values. Therefore,

$$ \begin{aligned} P_{S_1} &= LOC{{\text{-}}}IN{{\text{-}}}TIME_{i}(12:00 AM,\,l_1)\\ &\quad+LOC{{\text{-}}}IN{{\text{-}}}TIME_{i}(12:01 AM,\,l_1)\\ &\quad+LOC{{\text{-}}}IN{{\text{-}}}TIME_{i}(12:02 AM,\, l_1)\\ &\quad+LOC{{\text{-}}}IN{{\text{-}}}TIME_{i}(12:03 AM, \,l_1)\\ &\quad+LOC{{\text{-}}}IN{{\text{-}}}TIME_{i}(12:04 AM, \, l_1)\\ &\\ P_{S_1} &= 0.04 + 0.04 + 0.03 + 0.032 + 0.038 = 0.18 \end{aligned} $$
Fig. 4
figure 4

Example of calculating \(P_{S_{m}}\) value

We similarly calculate the \(P_{S_{m}}\) values, essential for defining the user trust value, for all 100 samples (cf. Fig. 4).

We are now ready to relate the time span of a sample derived from the model data to the detection delay:

Definition 1

Detection delay (T) is the shortest length (measured in time) of the trace generated by the mobile device that would allow the system to distinguish among users with an acceptable accuracy rate.

The detection delay T equals the time span of the user samples discussed above, and the incoming data stream windows must cover the same time span T.

Definition 2

Trust value (P trust ) for Model #1 is the empirical cumulative probability of samples of span T that represents a confidence interval of 90 % based on the user profile. All data stream windows with cumulative probability less than P trust are considered attacks.

Attacks are detected via mismatches between the data stream windows and the samples conforming to normal user behavior, yielding an attack detection delay of T. When the empirical cumulative probability of a specific data stream window drops below the trust value (P_trust), our system concludes that the mobile device is being used by someone other than its owner, or, as we say in this paper, that the device is under attack.

Definition 3

False acceptance rate (FAR) is the percentage of the attack data stream windows that are accepted by the system as normal user behavior.

Definition 4

False rejection rate (FRR) is the percentage of the user’s normal data stream windows that are identified by the system as an attack.

We focus on the FAR and FRR values—an ideal system should have FAR = FRR = 0. Yet errors are possible since human mobility traces can deviate from the calculated profile from time to time. Therefore, our goal is to associate with every user a P trust value that strikes a good balance between the FAR and FRR values.

In the example illustrated in Fig. 4, the smallest \(P_{S_{m}}\) value equals zero; therefore, setting P_trust equal to the smallest \(P_{S_{m}}\) value implies that the system will accept every incoming stream window and treat it as acceptable user behavior. In this case, we obtain FAR = 100 % and FRR = 0 %.

Fig. 5
figure 5

The relationship between precision and recall

Figure 5 shows the relationship between precision and recall as examined in our data set, where recall = 1 − FRR (the x-axis) and precision = 1 − FAR (the y-axis). We observe that a high recall, say 0.9, implies a small FRR (0.1) and a large FAR (0.8). As we decrease the recall, the FRR values increase; for example, for recall = 0.7, we get FRR = 0.3 and, correspondingly, FAR = 0.6.

Fig. 6
figure 6

The histogram of cumulative probability for user u 92

Our heuristic starts with the observation that the histogram of the cumulative probabilities for the 100 traces of each user is close to the histogram for user u_92 shown in Fig. 6. Next, we choose a P_trust value for each user that guarantees FRR ≤ 10 %, which corresponds to accepting 90 % of the user’s normal behavior based on the trace samples. If we consider \(P_{S_{m}}\) to be a random window cumulative probability, then the range from the determined P_trust value to one forms a 90 % confidence interval for \(P_{S_{m}}\). Intuitively, for FRR ≤ 10 %, the P_trust score can maximally discover behavioral anomalies (corresponding to true attacks) with a very small false alarm rate.

After calculating \(P_{trust_{i}}\) for each user, the anomaly detection process can start; it is formally described by Algorithm 1, and a code sketch follows it.

figure a
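Since Algorithm 1 is given as a figure, the following Python sketch captures our reading of it: P_trust is set to the 10th percentile of the 100 sample probabilities (the FRR ≤ 10 % rule above), and an incoming window is flagged as an attack when its cumulative probability falls below that value. Data layouts match the earlier sketches and are assumptions.

```python
# A hedged sketch of the Model #1 detection loop (Algorithm 1).
import numpy as np

def window_probability(profile, window):
    """Empirical cumulative probability P_S of a window of
    (location, minute) pairs under the LOC-IN-TIME profile (Eq. 3)."""
    return sum(profile.get(loc, [0.0] * 1440)[minute]
               for loc, minute in window)

def calibrate_p_trust(profile, samples):
    """Choose P_trust so ~90 % of the user's own samples pass (FRR <= 10 %)."""
    probs = [window_probability(profile, s) for s in samples]
    return float(np.percentile(probs, 10))

def is_attack(profile, window, p_trust):
    """Flag an incoming data-stream window whose cumulative probability
    drops below the trust value."""
    return window_probability(profile, window) < p_trust
```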

3.2 Model #2: Markov-based transition probability matrix

Model #2 is a collective anomaly detection scheme that makes use of the Markov chain stationary property. In our model, states correspond to tuples of the form (u_i, t_k, l_j). The Markov property in our context means that the probability of a user u_i moving to location \(l_{j^{\prime}}\) at time t_k depends only on the previous location l_j visited by u_i in the model_data.

Conceptually, the user’s location–duration trace is divided into sequences, that is, trajectories. Each trajectory consists of a start point (SSP), a number of intermediate points, and an end point (SEP). Trajectories may differ semantically due to the notion of stopping time T_STP, defined as the time interval during which the user is stationary. Based on observations from other researchers [32], we use T_STP = 30 min.

3.2.1 Building user profile

For Model #2, the user profile is a three-dimensional table LOC-TIME-MOVE i . Each entry in this table, \( LOC{{\text{-}}}TIME{{\text{-}}}MOVE_{i}(t_k,l_{j^{\prime}},l_j)\), represents the weighted probability of the user u i moving from location l j to location \(l_{j^{\prime}}\) at time t k .

Figure 7 represents the state diagram corresponding to a trace of user sequences starting at location l_1 at time t_1. The nodes stand for the identified (location, time) tuples, while the edges represent moves from one location to another, weighted by the probability of the transition. Note that for conciseness, we did not store the time information in the nodes of the state graph but show it in the rightmost column.

Fig. 7
figure 7

State graph representing the user sequences when the user starts at location l 1 at time t 1

This state graph illustrates that if the user u_i was at location l_1 at time t_1, there is a 50 % probability that the user will stay at the same location l_1 at time t_2, a 20 % probability of going to location l_2, a 25 % probability of going to location l_3, and a 5 % probability of going to location l_4, while the user never travelled to locations l_5 or l_6 from location l_1 at time t_2 during the data collection period. At time t_3, the user who ended up at location l_1 at time t_2 has a 70 % probability of staying at location l_1 and a 30 % probability of going to location l_3, and so on.

To build the user profile utilizing the three-dimensional data structure, we perform the following tasks:

  1. Read the model_data set.

  2. Build a list of the user’s distinct locations (L_i).

  3. Build and initialize a three-dimensional matrix of size (NT × |L_i| × |L_i|), where each 2-D plane represents a different starting location for a trajectory.

  4. Identify the first record in the data set and keep track of its timestamp t_1 and location l_1. This location becomes the starting point, SSP, for this trajectory. Increase the frequency value and calculate the probability value Prob_i(t_1, l_1, l_1).

  5. Read the timestamp and the location of the next record, t_2:

    • If t_2 − t_1 ≥ T_STP, this record is considered a new SSP for a new trajectory, and the previous point becomes the SEP of the previous trajectory. Continue with Step 5 at the following record.

    • If t_2 − t_1 < T_STP, we increase the frequency value by one and calculate the probability value Prob_i(t_2, l_2, l_1).

  6. We repeat Step 5 until we reach the end of the data set.

  7. Create the UCL_i list by eliminating all locations that the user visited less than 1 % of the time (for the same reason as indicated in Sect. 3.1).

  8. Create the new user profile with NT rows, |UCL_i| columns, and |UCL_i| depth.

  9. Replace the probability value \(\hbox{Prob}_i(t_k,l_{j^{\prime}},l_j)\) in each cell of the new user profile with the weighted probability value \(LOC{{\text{-}}}TIME{{\text{-}}}MOVE_i(t_k,l_{j^{\prime}},l_j)\).

  10. In the final user profile, the sum of each row in each starting-location matrix equals one:

    $$ \begin{aligned} &\forall l_{j} \in UCL_i \,\hbox{and}\, \forall k \in NT\\ &\sum_{j^{\prime}=1}^{|UCL_i|}\, LOC{{\text{-}}}TIME{{\text{-}}}MOVE_i(t_k,l_{j^{\prime}},l_j)\,= \,1 \end{aligned} $$
    (4)

Upon completing this process for all records in the model_data set, the user profile is produced as a three-dimensional matrix as shown in Fig. 8. The rows represent the minutes of the day (one row per minute), the columns represent the locations from the UCL_i list, and the depth represents the starting locations from UCL_i.

Fig. 8
figure 8

Mobility model for user u i (Model #2)

For example, if we consider the user trajectory represented in Fig. 7, the associated user profile would be the one shown in Fig. 8. In this example, the user starts at location l_1 at time t_1. The cell (t_2, l_1, l_1) holds the weighted probability that the user is at location l_1 at time t_2 given that he was at location l_1 in the previous record, LOC-TIME-MOVE_i(t_2, l_1, l_1) = 50 %, and the cell (t_2, l_2, l_1) holds the probability that the user goes to location l_2 at time t_2 given that he was at location l_1 in the previous record, LOC-TIME-MOVE_i(t_2, l_2, l_1) = 20 %, and so on.
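A minimal sketch of this construction follows; it stores the three-dimensional profile as a sparse nested mapping keyed by (minute, previous location) instead of a dense NT × |UCL_i| × |UCL_i| array, and omits the UCL_i filtering step for brevity. All names are illustrative.

```python
# Sketch of the Model #2 transition profile (Sect. 3.2.1).
from collections import defaultdict

T_STP = 30  # stopping time in minutes, per Sect. 3.2

def build_transition_profile(records):
    """records: time-sorted (minute, location) pairs for one user.
    Returns {(minute, from_loc): {to_loc: probability}}."""
    counts = defaultdict(lambda: defaultdict(int))
    prev_t, prev_loc = records[0]
    for t, loc in records[1:]:
        if t - prev_t >= T_STP:          # long stop: this record is a new SSP
            prev_t, prev_loc = t, loc
            continue
        counts[(t, prev_loc)][loc] += 1  # count the move prev_loc -> loc at t
        prev_t, prev_loc = t, loc
    profile = {}
    for key, dests in counts.items():    # normalize each row to one (Eq. 4)
        total = sum(dests.values())
        profile[key] = {d: c / total for d, c in dests.items()}
    return profile
```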

3.2.2 Anomaly detection

The anomaly detection process for Model #2 follows the same principle as that of Model #1, where we focus on the evaluation of \(P_{trust}^{\prime}\), \(\hbox{FAR}^{\prime}\), and \(\hbox{FRR}^{\prime}\) to quantify the system’s ability to detect attacks. The calculation of \(\hbox{FAR}^{\prime}\) and \(\hbox{FRR}^{\prime}\) is the same as explained in Sect. 3.1.2, but the computation of the trust value \(P_{trust}^{\prime}\) for each user is different.

Definition 5

The trust value \((P_{trust}^{\prime})\) for Model #2 is the trace joint probability value that represents a confidence interval of 90 % based on the user profile. All traces with probability value less than \(P_{trust}^{\prime}\) are considered attacks.

Since Model #2 captures the user’s movement, where the probability of the user being at any location at any time is highly dependent on the previous location at the previous point in time, we calculate the Markov sequence probability value for each trace sample (S_m) rather than the cumulative probability value, as follows:

$$ P_{S_m}^{\prime} = \prod_{(k,j,j^{\prime}) \in S_{m}} LOC{{\text{-}}}TIME{{\text{-}}}MOVE_i(t_k,l_{j^{\prime}},l_j). $$
(5)

The joint probability value is the product of the probabilities of all records in the trace sample S_m, as indicated in the LOC-TIME-MOVE table. Equation 5 shows that if any record in the sequence has a probability of zero, meaning that the user has never moved between these two locations at this time before, the whole trace is considered an attack because \(P_{S_m}^{\prime} = 0\). To reduce the penalty for deviation from the normal path, we introduce the concept of trace threat level (TL), which represents the percentage of records in the trace that have no representation in the user profile. Thus, if \(LOC{{\text{-}}}TIME{{\text{-}}}MOVE_{i}(t_k,l_{j^{\prime}},l_j) = 0\), we eliminate this value from the calculation of the trace joint probability and increase the threat level value by one. We use a threat level threshold of TL_trust = 10 % of the total records in the trace, based on empirical analysis.

As an example, Fig. 9 shows two paths; the solid curve represents the normal path in the user’s profile and the dashed curve represents the currently detected trajectory. In this example, the user profile indicates that when the starting point at time t 1 is location l 2, the normal path of duration T is \(l_2\rightarrow l_3\rightarrow l_4\rightarrow l_5 \rightarrow l_6 \rightarrow l_7\). In contrast, the captured user trajectory that starts at location l 2 at time t 1 consists of the sequence \(l_2\rightarrow l_1\rightarrow l_2 \rightarrow l_3\rightarrow l_4 \rightarrow l_5\rightarrow l_6\). To determine whether this is an expected or anomalous user behavior, we compare the joint probability of this path with the profile of the particular user. The calculated value should be equal to or greater than the trust value for that user.

Fig. 9
figure 9

User path analysis

To calculate the captured trajectory joint probability \(P_{SW_m}^{\prime}\) as indicated in Eq. 5, we first identify the starting point SSP = l_j and the time t_k. (Note that we use \(P_{SW_m}^{\prime}\) for the trajectory probability used in the anomaly detection process and \(P_{S_m}^{\prime}\) for the sample probability used in the calculation of the \(P_{trust}^{\prime}\) value.) Then, we check whether \(l_{j} \in UCL_{i}\). If not, we increase the threat level TL by one. Otherwise, we take l_j as the starting location and use the value LOC-TIME-MOVE_i(t_k, l_j, l_j) to start the trace joint probability \(P_{SW_m}^{\prime}\). The next step is to read the next record and its data \(l_{j^{\prime}}\) and \(t_{k^{\prime}}\). If \(l_{j^{\prime}} \in UCL_i\), we multiply in the weighted probability value \(LOC{{\text{-}}}TIME{{\text{-}}}MOVE_{i}(t_k, l_{j^{\prime}}, l_{j})\); if not, we increase the TL value again. This process is repeated for the entire user trace and, upon completion, we check whether \(\hbox{TL} \geq \hbox{TL}_{trust}\). If it is, this trajectory is judged to have been generated by someone other than the user, that is, an attacker. If not, we subsequently check the \(P_{SW_m}^{\prime}\) value: if \(P_{SW_m}^{\prime} \geq P_{trust_i}^{\prime}\), the trajectory is judged to belong to the user; otherwise, it is treated as a trajectory generated by an attacker. A code sketch of this check follows.
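The sketch below shows our reading of this check; treating the first record as a self-transition and the data layout (matching the Model #2 sketch above) are assumptions.

```python
# Hedged sketch of the Model #2 trajectory check with threat level TL.
def check_trajectory(profile, ucl, trajectory, p_trust, tl_trust=0.10):
    """trajectory: time-sorted (minute, location) pairs.
    Returns True when the trajectory is judged to be an attack."""
    threat_level, joint_prob = 0, 1.0
    prev_loc = None
    for minute, loc in trajectory:
        if loc not in ucl:
            threat_level += 1            # unknown location: raise TL
            prev_loc = loc
            continue
        src = loc if prev_loc is None else prev_loc  # first record: self-move
        prob = profile.get((minute, src), {}).get(loc, 0.0)
        if prob == 0.0:
            threat_level += 1            # unseen move: raise TL, drop factor
        else:
            joint_prob *= prob           # Eq. 5 with zero factors excluded
        prev_loc = loc
    if threat_level >= tl_trust * len(trajectory):
        return True                      # too many unprofiled records
    return joint_prob < p_trust          # compare with the user's P'_trust
```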

4 Data reduction

We now aim to further improve the efficiency by reducing the size of the user model, with low impact on the accuracy of the attack detection. The reduction benefits are twofold:

  1. reduce the amount of memory occupied by the user model, and

  2. reduce the CPU time required to perform the detection process, thereby enhancing system performance.

The Reality Mining data set consists of 93 users, with the number of distinct locations per user in the range 3–100 and an average of 28 locations, while the Geolife data set has 65 users with 134–1,200 distinct locations per user.

For these data sets, the user profile requires up to (1,440 × 1,200) ≈ 10^6 memory locations for Model #1 and up to (1,440 × 1,200 × 1,200) ≈ 10^9 memory locations for Model #2. We propose and analyze two different solutions to reduce the size of the models’ representations:

  1. The first is the Row-Merge algorithm, in which a row t_k of the user model table LOC-IN-TIME_i is merged with the preceding row when the number of its nonzero entries is below the threshold:

    $$ \left|\left\{\,l_j \in UCL_i : LOC{{\text{-}}}IN{{\text{-}}}TIME_{i}(t_k,l_j) \neq 0 \,\right\}\right| < ThV, $$
    (6)

    where ThV is a threshold value chosen so that no detection degradation occurs due to the Row-Merge process.

  2. The second algorithm is based on the MDLP (Minimal Description Length Principle) [3, 18].

We now present the algorithms in detail, along with their complexity analysis.

4.1 Row-Merge algorithm

As discussed in Sect. 3.1, three properties characterize a profile:

  • At any minute t k , user u i can exist in any location l j identified in the L i list.

  • At any minute t_k, each user u_i has to be located at one of the locations l_j identified in the distinct locations list L_i. Yet, by considering the UCL_i list rather than the L_i list, the user profile leaves certain times of the day unaccounted for; therefore, the new rule is as follows:

    $$ \exists\,t_k\quad\hbox{where}\,\forall\,l_j \in UCL_i:\;\hbox{Prob}_{i}(t_k,l_j)=0. $$
  • Based on Model #1, the sum of all the probability values in every user profile is equal to 1:

    $$ \forall{u_i}, \,\sum_k^{NT} \sum_j^{|UCL_i|} LOC{{\text{-}}}IN{{\text{-}}}TIME_{i}(t_k,l_j)= 1. $$

Observation:

Although no rule prevents a user from being at any place at any time, the observed patterns indicate otherwise: at certain times of the day (late night to early morning, for example), some users are always at one place, and the probability value for all other locations equals zero.

Figure 10 shows the distribution of the time a user spends at each location every day; it indicates that each user has only a few locations where he spends most of his time. For instance, although user u_12 has 13 distinct locations in his profile, there are only three locations u_12 visits every day (l_2, l_8, l_9). Similarly, user u_37 has 51 distinct locations in UCL_37, while he spends most of his time in seven locations (l_1, l_6, l_9, l_19, l_25, l_28, l_50).

Based on the above observation, we conclude that if we consolidate the time periods during which a given user has been in only a few locations (fewer than ThV) into one row, we can significantly reduce the size of the matrix representing that user’s profile. The main idea is to merge a row into its predecessor whenever the number of nonzero entries LOC-IN-TIME_i(t_k, l_j) ≠ 0 in that row is less than ThV. The Row-Merge approach is formalized by Algorithm 2.

figure b

The input is the user profile consisting of NT rows and |UCL_i| columns, while the output is a matrix with a reduced number of rows RN and |UCL_i| columns. The first step is to read the user profile. For each row, we count the cells that are greater than zero, as illustrated in Line 11. If the count is less than the threshold value ThV, the row is merged with the previous one in the LOC-IN-TIME_i table, as shown in Line 13. Otherwise, we keep the row and update the Time_Index_i, as shown in Lines 15–21. Upon completion, we initialize the Reduced_Model_Data (RMD_i) with the new number of rows, as shown in Line 24, and save the new user model in this matrix, as illustrated in Lines 26–30. The complexity of this algorithm is O(NT × |UCL_i|).
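Since Algorithm 2 is given as a figure, here is a compact Python sketch of the Row-Merge pass described above; the Time_Index bookkeeping is simplified to a list of the last minute covered by each surviving row.

```python
# Sketch of the Row-Merge reduction (Algorithm 2), O(NT x |UCL_i|).
def row_merge(profile_rows, thv):
    """profile_rows: NT rows of |UCL_i| probabilities each. A row with
    fewer than thv nonzero entries is folded into the previous kept row.
    Returns (reduced_rows, time_index)."""
    reduced, time_index = [], []
    for minute, row in enumerate(profile_rows):
        nonzero = sum(1 for p in row if p > 0)
        if reduced and nonzero < thv:
            # merge into the previous row and extend its time span
            reduced[-1] = [a + b for a, b in zip(reduced[-1], row)]
            time_index[-1] = minute
        else:
            reduced.append(list(row))
            time_index.append(minute)
    return reduced, time_index
```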

As an example, assume that Fig. 11 represents a user profile for user u_i with eight distinct locations l_1, l_2, …, l_8. The size of this user profile is 1,440 × 8 = 11,520 numeric values. Here we illustrate the Row-Merge process for a threshold value ThV = 1, which means that every row of the user profile with at most one probability value greater than zero is merged with the previous row. Figure 12 details this process.

Fig. 10
figure 10

The average number of minutes per day the user spent in each location

Fig. 11
figure 11

Example user profile

Fig. 12
figure 12

Row-Merge process

The total reduction in matrix size equals the number of eliminated rows, (1,440 − Q), multiplied by 8, the total number of distinct locations.

4.2 MDLP algorithm

In this section, we aim to compress the user model by applying the Minimum Description Length Principle (MDLP), based on the following insight: any regularity in the data can be used to compress the data, that is, to describe it using fewer symbols than needed to describe the data literally. The more regularities there are, the more the data can be compressed [15].

Observation: The challenge in this approach is that our user model is not completely regular; there is no single subset \(LOC{{\text{-}}}IN{{\text{-}}}TIME_i^{\prime\prime}\) of the user model LOC-IN-TIME_i that we could identify as a hypothesis from which to regenerate the model. In fact, it is reasonable to assume that our user model consists of several subsets \(LOC{{\text{-}}}IN{{\text{-}}}TIME_{i_1}^{\prime\prime}, LOC{{\text{-}}}IN{{\text{-}}}TIME_{i_2}^{\prime\prime},\ldots, LOC{{\text{-}}}IN{{\text{-}}}TIME_{i_r}^{\prime\prime}\) that, combined, could be utilized to regenerate the user model LOC-IN-TIME_i and are therefore used for model compression. On the other hand, we understand that there will always be some irregular data in our user model that we will not be able to compress.

The Algorithm: The model for a single user u_i is LOC-IN-TIME_i, where each row corresponds to a minute of the day t_k, each column corresponds to a distinct user location l_j, and each entry LOC-IN-TIME_i(t_k, l_j) records the probability of user u_i visiting location l_j at time t_k. Given this, we focus on the temporal regularity in this data set to perform the user model compression. We would like to find a function H_i that partitions time into consecutive intervals, based on the user model matrix, and minimizes the loss when combining/merging two rows.

To facilitate our discussion, we assume that we maintain the row sum R_i(k) and the time frequency T_i(k) for each row k of the user profile LOC-IN-TIME_i(t_k, l_j), where

$$ R_i(k) =\sum_{j=1}^{|UCL_i|} LOC{{\text{-}}}IN{\text{-}}TIME_i(t_k,l_j) $$
(7)

and, counting over the records in the model_data,

$$ T_i(k) = \left|\left\{\,\hbox{records} \in model\_data : t_{k-1} < t \leq t_{k}\,\right\}\right| $$
(8)

The time frequency thus represents the number of records in the data log that fall within each one-minute time interval.

Then, the function H i (k) for each row in the user u i profile is calculated as

$$ \begin{aligned} Red_i(t_k,l_j)&=\frac{LOC{\text{-}}IN{\text{-}}TIME_i(t_k,l_j)}{R_i(k)} \quad\hbox{and} \\ H_i(k) &= - \sum_{j=1}^{|UCL_i|} Red_i(t_k,l_j) \times \hbox{log}\left(Red_i(t_k,l_j)\right). \end{aligned} $$
(9)

Clearly, this resembles the well-known entropy, a common measure of information loss [15]. When combining two consecutive rows k and k + 1 into one interval (k, k + 1), the value of each cell of the merged row is

$$ \begin{aligned} LOC{\text{-}}IN{\text{-}}TIME_{i}^{\prime}(t_k,l_j)= LOC{\text{-}}IN{\text{-}}TIME_{i}(t_{k},l_{j}) + LOC{\text{-}}IN{\text{-}}TIME_{i}(t_{k+1},l_{j}) \end{aligned} $$

and \(R_i^{\prime}(k)=R_i(k) + R_i(k+1)\); therefore, with \(Red_i^{\prime}(t_k,l_j) = LOC{\text{-}}IN{\text{-}}TIME_{i}^{\prime}(t_k,l_j)/R_i^{\prime}(k)\), the function H_i(k, k + 1) is calculated as

$$ H_i(k,k+1) = - \sum_{j=1}^{|UCL_i|} Red_i^{\prime}(t_k,l_j)\, \hbox{log}\left(Red_i^{\prime}(t_k,l_j)\right). $$
(10)

The information loss is measured as the weighted entropy sum of the two separate rows, minus the entropy of the combined row, plus the model-complexity term saved by removing one row:

$$ \begin{aligned} T_i(k)\times H_i(k) + T_i(k+1)\times H_i(k+1)-(T_i(k) + T_i(k+1))\times H_i(k,k+1)+ (|UCL_i|-1)\times \log DS, \end{aligned} $$

where DS = |model_data| − RIF_i as calculated in Sect. 3.1.1, RIF_i being the number of records that belong to the infrequent locations IF_i.

Assuming that the matrix has NT rows and |UCL i | columns, the complexity of the matrix model can be written as (|UCL i | − 1) × log DS.

Given this, we can apply the MDLP (Minimal Description Length Principle) which is the sum of the model complexity and the overall entropy of the matrix

$$ \sum_{k=1}^{NT} [H_i(k) \times T_i(k)] + [(|UCL_i|-1)\times log {DS}].$$
(11)

A simple greedy algorithm repeatedly chooses the two consecutive rows whose merge produces the smallest information loss and groups them into one row. This procedure is performed until the overall MDLP measure can no longer decrease.

The MDLP algorithm has O(NT × |UCL_i|) complexity. Algorithm 3 formalizes this approach, and a code sketch follows it.

figure c
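Since Algorithm 3 is likewise given as a figure, the following sketch shows our reading of the greedy MDLP merge: adjacent rows are combined while the information-loss expression above stays positive. The single left-to-right pass is a simplification of repeating the procedure until the MDLP measure no longer decreases, and all names are illustrative.

```python
# Hedged sketch of the greedy MDLP row merge (Algorithm 3).
import math

def entropy(row):
    """H_i(k) of Eq. 9: entropy of a row normalized by its row sum R_i(k)."""
    r = sum(row)
    return -sum((p / r) * math.log(p / r) for p in row if p > 0) if r else 0.0

def mdlp_merge(rows, freqs, ds):
    """rows: profile rows; freqs: time frequencies T_i(k); ds: DS of Sect. 4.2.
    Merges row k+1 into row k whenever the information-loss expression
    is positive, in one left-to-right pass."""
    saved_cost = (len(rows[0]) - 1) * math.log(ds)   # model-complexity term
    out_rows, out_freqs = [list(rows[0])], [freqs[0]]
    for row, f in zip(rows[1:], freqs[1:]):
        merged = [a + b for a, b in zip(out_rows[-1], row)]
        loss = (out_freqs[-1] * entropy(out_rows[-1]) + f * entropy(row)
                - (out_freqs[-1] + f) * entropy(merged) + saved_cost)
        if loss > 0:                                 # acceptable: merge rows
            out_rows[-1], out_freqs[-1] = merged, out_freqs[-1] + f
        else:
            out_rows.append(list(row))
            out_freqs.append(f)
    return out_rows, out_freqs
```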

As an example, assume again that Fig. 11 represents a user profile for user u_i with eight distinct locations l_1, l_2, …, l_8; the size of this user profile is 1,440 × 8 = 11,520 numeric values. We illustrate the MDLP process on it; Fig. 13 details this process as follows.

  • First, we initialize a Time_Index i matrix.

  • We calculate the H_i(k) value for each row k and save it in the H matrix.

  • We run the MDLP algorithm against this user profile, starting with the first two rows, t_1 and t_2. In this example, H_i(1) = 0, H_i(2) = 1, and H_i(1,2) = 0.93.

  • We calculate the information loss value:

    $$ \begin{aligned} T_i(1)\times H_i(1)+T_i(2)\times H_i(2)-(T_i(1)+T_i(2))\times H_i(1,2)+[(|UCL_i|-1)\times log {DS}]. \end{aligned} $$

    In this case, the information loss value is > 0, which indicates that we can combine these two rows; subsequently, the first row of the reduced matrix becomes

    $$ \begin{aligned} LOC{\text{-}}IN{\text{-}}TIME_i(t_1,l_j)=LOC{\text{-}}IN{\text{-}}TIME_i(t_1,l_j)+LOC{\text{-}}IN{\text{-}}TIME_i(t_2,l_j). \end{aligned} $$
  • We update the H matrix so that H_i(1) = H_i(1,2), the time frequency so that T_i(1) = T_i(1) + T_i(2), and the Time_Index_i matrix so that its first cell holds the value t_2, indicating that the first row of the reduced user profile represents minutes t_1 and t_2 of the day.

  • We advance to the third row and examine the H matrix values for rows t_1 and t_3. In this case, H_i(1) = 0.93 and H_i(3) = 0, while H_i(1,3) = 0.81; therefore,

    $$ \begin{aligned} T_i(1)\times H_i(1)+T_i(3)\times H_i(3)-(T_i(1)+T_i(3))\times H_i(1,3)+[(|UCL_i|-1)\times log {DS}]. \end{aligned} $$

    In this case, the value is less than zero, and we cannot combine the rows.

  • We repeat this process, scanning the user profile row by row, until we obtain a matrix with \(Q^{\prime}\times 8\) values and a Time_Index_i matrix with \(Q^{\prime}\) values.

Fig. 13
figure 13

MDLP reduction process

The total reduction in matrix size equals the number of eliminated rows multiplied by the total number of distinct locations: \((1,440 - Q^{\prime})\times 8\).

5 Experimental results

This section has two main parts. (1) We provide a detailed evaluation of our attack detection algorithms: we test our ability to build user profiles from spatio–temporal traces and to detect anomalous behavior based on these profiles, and we examine and compare the test results for the two models presented in Sect. 3. (2) We evaluate the effects of each data reduction algorithm in terms of reducing the user profile size and illustrate that this reduction has only a small impact on detection accuracy.

5.1 Anomaly detection results

As discussed in Sect. 2.1.1, we used the Reality Mining [11] and Geolife [38] mobility traces for this evaluation.

Each user log was divided into two equal contiguous data sets: a training data set (model_data) and a testing data set (test_data), as described in Sect. 3.1. For each user, we randomly selected 100 samples of duration T from the test_data log and repeated each test for four different T values (5, 15, 30, and 60 min); the T value is the detection delay.

5.1.1 Results for Model #1

For each user, we constructed models and calculated trust values P_trust following the steps described in Sect. 3.1. Actual attacker behavior traces are not presently available; however, traces of different users are, and they serve as attack traces in our evaluation.

In our previous study [35], we presented detailed results based on the Reality Mining data set, where we demonstrated that Model #1 of our system is capable of detecting an attack within 15 min with a 94.4 % accuracy rate, as shown in Fig. 14. This figure indicates that, in case of theft, our system has a 94.4 % chance of notifying the device owner within 15 min and a 92.0 % chance within 5 min.

Fig. 14
figure 14

Accuracy in anomaly detecting and standard deviation results in relation to trajectory size based on Model #1 and Reality Mining data set

In this study, we evaluate the system’s ability to perform attack detection based on the Geolife data set. As indicated in Sect. 2.1.2, we limited the user locations to an area of (138 × 110) miles and mapped each (7 × 8)-meter cell to an area ID based on a ±0.0001° change in each coordinate. After eliminating all records that do not fall within the Beijing area, we had a total of 65 users with 100–1,200 distinct locations each and an average of 780 distinct locations. We mapped these locations into area IDs (1–1,200).

Looking closer at the available location data, we notice big differences among users and their trajectories. Figure 15 shows four randomly selected users with their respective location histograms. This figure indicates that individuals tend not to travel very far, and when they do, they do not stay long. Although a few users share some area IDs, the fraction of time they spend in these areas differs, as does the time of day of their visits. We reported similar observations for the Reality Mining data set [35].

Fig. 15
figure 15

Distinct location histogram for the Geolife data set. a user 20, b user 62, c user 123, d user 170

Our test results with regard to accuracy also exhibit similar patterns. We achieved a 95.6 % detection accuracy rate for T = 5 min and a 93.8 % accuracy rate for T = 15 min, as shown in Fig. 16. We notice that for Geolife the detection accuracy for the 5-min delay is better than for the 15-min delay. We attribute this to the fact that this data set did not have many long time sequences to use as test sequences, which lowers the detection accuracy associated with the longer delay. On the other hand, the granularity of the location information is finer for Geolife than for the Reality Mining data set, which also contributes to a higher detection rate for smaller values of T.

Fig. 16
figure 16

Accuracy in anomaly detecting and standard deviation results in relation to trajectory size based on Model #1 and Geolife data set

5.1.2 Results of Model #2

We followed the same steps described in [35] to calculate the \(\hbox{FAR}^{\prime}\) values based on the Model #2 user profile and probability analysis. In that study, based on the Reality Mining data set, we demonstrated that Model #2 is capable of detecting an attack within 15 min with a 96.13 % accuracy rate, as shown in Fig. 17.

Fig. 17
figure 17

Accuracy in anomaly detecting and standard deviation results in relation to trajectory size based on Model #2 and Reality Mining data set

The Geolife data set allowed high detection accuracy too, as shown in Fig. 18; the detection accuracy ranged from 96.0 to 90.5 %. Lower \(P_{trust}^{\prime}\) values are associated with longer traces, which indicates that it is uncommon for most users to make large day-to-day changes in the motion patterns within short intervals of a trace, whereas longer intervals are more likely to change from day to day.

Fig. 18
figure 18

Accuracy in anomaly detecting and standard deviation results in relation to trajectory size based on Model #2 and Geolife data set

5.1.3 Model comparison

As illustrated in Figs. 14, 16, 17, and 18, the average accuracy is slightly better for Model #2 than for Model #1 for small sample intervals (less than 30 min), and the standard deviation is significantly better on the Reality Mining data set, while the improvement is only slight on the Geolife data set.

However, the cost of obtaining this small improvement in accuracy (≤2 % for T = 15 min) is high, considering the memory required to store the user profile: (NT × |UCL_i|) for Model #1 versus (NT × |UCL_i| × |UCL_i|) for Model #2, along with the respective anomaly detection time complexities of O(TS × NT × |UCL_i|) for Model #1 and O(TS × NT × |UCL_i| × |UCL_i|) for Model #2. Therefore, our recommendation for this data set is to use the Model #1 approach, which achieves high detection accuracy with lower memory requirements.

The simplicity of the resulting user models yields an efficient anomaly detection process with an average detection time of 0.02 seconds, as shown in Fig. 19. A comparison between our results and those of existing systems is given in Table 1.

Fig. 19
figure 19

Anomaly detection elapsed time according to sample interval

Table 1 Comparison with existing theft detection systems

5.2 Reduction results

In this section, we evaluate the efficiency of the matrix reduction methods by running both the Row-Merge algorithm and the MDLP algorithm on the user profiles built in the previous sections. The efficiency evaluation covers:

  1. the reduction rate,

  2. the detection accuracy with the reduced profile,

  3. the algorithm complexity, and

  4. the elapsed time required to perform the reduction process.

5.2.1 Row-Merge algorithm

Running the Row-Merge matrix reduction algorithm on all users’ profiles in both data sets with ThV = 1 showed an inconsistent reduction rate among users. Figure 20 shows these results for the Reality Mining data set. The reduction rate is high for profiles with few distinct locations (<5), while it is low for profiles with more than five distinct locations. Yet, after performing the attack detection as described in Sect. 3.1, we noticed that the detection accuracy was not impacted negatively, which is not surprising given the analysis in the example of Sect. 4.1. Figure 21 illustrates the detection accuracy for all users based on the reduced user profile data when ThV = 1.

Fig. 20
figure 20

The reduction percentage based on the number of distinct location in the profile

Fig. 21
figure 21

The detection accuracy for the 93 users after applying the Row-Merge reduction algorithm

To further improve the reduction rate, we explored consolidating rows for ThV > 1. For each user u_i, we examined several values \(ThV = 1, 2, 3, 4, 5, \ldots, \frac{|UCL_{i}|}{3}\). Increasing the ThV value improved the reduction rate and had only a nominal negative impact on the detection accuracy. \(ThV=\frac{|UCL_{i}|}{3}\) produced the best results, with a consistent reduction rate and high detection accuracy.

5.2.2 MDLP algorithm

Running the MDLP matrix reduction algorithm on the same users’ profiles showed a consistent reduction rate among all users, as shown in Fig. 22 for the Reality Mining data set. The total number of distinct locations had no effect on the reduction rate. In addition, the detection accuracy rate was not impacted negatively (less than a 1 % decrease in detection accuracy), as shown in Fig. 23.

Fig. 22
figure 22

The reduction percentage based on the number of distinct location in the profile based on the MDLP algorithm

Fig. 23
figure 23

The detection accuracy based on the MDLP reduction algorithm

We notice that the reduction rate for the MDLP algorithm is in the range 67.6–99.5 %, with an average of 87.6 %. This reduction came at a cost of only a 1 % reduction in detection accuracy.

As indicated previously, these results are expected, since users spend most of their time in a few locations, which they visit repeatedly.

5.2.3 Row-Merge algorithm versus MDLP algorithm

In the previous subsections, we illustrated that the two algorithms provide comparable data reduction percentages and comparable detection accuracy rates, as shown in Fig. 24. This figure shows the reduction rate and the detection accuracy of both reduction algorithms for the Reality Mining and Geolife data sets. We notice that the Row-Merge reduction rate depends on the ThV value: for ThV = 1, the average reduction rate was as low as 34.5 % for the Reality Mining data set and 0.069 % for the Geolife data set. The rate improved as ThV increased, reaching 91 % for the Reality Mining data set and 93 % for the Geolife data set when \(ThV=\frac{|UCL_{i}|}{3}\). The main reason for this difference in reduction rate between the two data sets is the number of distinct locations associated with each user; thus, we achieved comparable results when we used \(ThV=\frac{|UCL_{i}|}{3}\). Similar results were achieved by MDLP, which performed very well on both data sets.

Fig. 24
figure 24

A comparison of the reduction rate and detection accuracy results based on different ThV values for the Row-Merge algorithm and the MDLP algorithm

Regardless of the reduction rate, the detection accuracy was not affected significantly after data reduction for either algorithm or data set. Therefore, it is hard to compare the two algorithms based on reduction rate and detection accuracy alone.

Examining the time complexity, we notice that both algorithms are O(NT × |UCL_i|). However, the MDLP algorithm has higher overhead because it accesses the user model several times per reduction process: the first time to calculate the H function values, the second to perform a first round of reduction, the third to perform a second round, and, if needed, a fourth and fifth time. The Row-Merge algorithm, in contrast, goes over the user model only once and performs the reduction. In addition, at every row the Row-Merge algorithm performs a simple comparison of the total number of nonzero values in the row against the pre-defined value ThV and, based on the result, either merges the row or skips to the next one. This simple logic makes the Row-Merge algorithm more efficient than the MDLP algorithm, which required double the time to obtain the same reduction rate and similar accuracy, as shown in Fig. 25.

Fig. 25
figure 25

The elapsed time to perform the matrix reduction process

6 Related work

Spatio–temporal data management and efficient query processing techniques have been the topics of intensive research in the field of Moving Objects Databases [16]. In particular, trajectory analysis and similarity detection have yielded numerous research results in the recent years [9, 13, 22]. Several results from this arena have goals similar to ours. For example, Mouza and Rigaux [10] propose regular expression-based algorithms for detecting mobility patterns. However, those patterns do not explicitly model the temporal dimension of the motion, that is, the focus is more on routes than trajectories.

In order to improve application awareness during trajectory data analysis, Alvares et al. [2] proposed adding semantic information during trajectory preprocessing. Hung et al. [20] proposed the complementary approach of using a probabilistic suffix tree to measure the separation among users’ trajectories. Xie et al. [32] addressed the problem of predicting social activities based on users’ trajectories. In addition, Trestian et al. [30] used association rule mining to investigate the relationships between geographic locations and user habits for mobile devices.

Detecting malware in mobile device usage is a topic that has been tackled via various formalisms. Using temporal logic of causal knowledge as the specification language, malicious behavior signatures were proposed by Bose et al. [5] for mobile devices running Symbian OS. A complementary approach based on diffusion over bipartite graphs was presented by Alpcan et al. [1], and another approach that studies Bayesian networks, RBF, KNN, and random forests is presented by Damopoulos et al. [8]. Fraud detection based on usage behavior has also been addressed [6], with a classifier based on artificial neural networks. While in our earlier work [34] we attempted to use file-access patterns to detect malicious use, in this work our focus is on detecting deviations from individual spatio–temporal patterns.

A cloud-based framework to detect intrusions and to provide fast response for the mobile device is introduced by Houmansadr et al. [19]. Their goal is complementary to our approach of enabling the mobile devices themselves to detect a potential theft by comparing user’s trajectories.

Sun et al. [29] proposed mobile intrusion detection based on the Lempel–Ziv compression algorithm and Markov chains. The proposed technique used three-level Markov chains and did not consider the association between time of day and location. Its ability to detect attacks is limited to times at which the user is making phone calls and moving faster than 60 miles per hour. Yan et al. [33] improved on this work, yet the delay in detecting an attack was 24 h, since the traces were obtained once a day with a sampling period of 30 min. Our technique has an attack detection latency of 15 min. Hall et al. [17] proposed an intrusion detection method based on mobility traces; their focus was on public transportation traces in which the paths are pre-defined.

7 Concluding remarks

We presented an approach for detecting anomalous use of mobile devices. Our system uses spatio–temporal mobility data to build models with high anomaly detection accuracy. Combining the spatio–temporal model (for users with few locations) and the trajectory-based model (for users with many locations) allowed an average attack detection rate of 94.48 % for Model #1 and 96.13 % for Model #2, with a detection latency of 15 min. To further improve the efficiency of the system, we applied two data reduction algorithms (Row-Merge and MDLP), which achieve a high reduction rate while still detecting attacks with 94 % accuracy.

One possible extension is to enrich the model by allowing nonzero probabilities for locations not yet visited, capturing the cases of the owner visiting new places. In addition, we would like to investigate the accuracy trade-offs that arise when our approach must deal with prolonged disconnections from the server. We also plan to expand this study to incorporate different context dimensions (e.g., call patterns and application logs) in order to improve the detection accuracy.