1 Introduction

The aging of the population is one of the most important challenges for the public healthcare sector in the twenty-first century. Mild cognitive impairment (MCI) is one of the most prevalent conditions among seniors. People suffering from MCI (i.e. patients) may forget their destination while moving from one area to another. As a result, strange trajectory patterns arise and so-called wandering episodes occur. Fortunately, technology allows their movements to be continuously tracked, and hence, appropriate real-time assistance services can be provided to address their difficulties in navigational tasks. Otherwise, they may get lost, which can cause serious injuries or even death.

GPS and navigation applications are key enablers of many services offered by mobile devices, from state-of-the-art products to low-cost smartphones. As a result, tracking and monitoring systems have emerged as a good solution to assist elderly people with their outdoor mobility. In such systems, the patient carries a wearable or a smartphone that is capable of obtaining its location (using GPS technology). In addition, the system is able to warn patients or to send alarms to caregivers. Monitoring systems contribute to self-care and reduce stress on patients’ relatives and caregivers as well. Moreover, cognitive environments and smart cities pave the way for the deployment of advanced assistance services for seniors within the smart health and cognitive health paradigms [37].

Some of the existing tracking and monitoring systems require the patients to interact with a smartphone application in a variety of ways (e.g. checking in to their destination [13], pushing a panic button [34] or selecting the destination area [42]). However, monitoring systems should take into account the inability of some patients to interact with their devices. Hence, it is essential that monitoring systems require nothing from the patient except to carry the smartphone.

It is commonly assumed that elderly people follow regular mobility routines, i.e. they visit the same locations and walk along the same routes from one area to another. This fact makes it possible to detect abnormal movements. Nevertheless, most proposals are unable to work well unless enough mobility data are available. For example, to detect the pacing pattern in a wanderer’s movement, he/she must first move back and forth between two or more points. However, the abnormal situation should be discovered at its first appearance in order to prevent the person from getting lost.

Next location prediction can estimate when a person is going to be present at particular locations [18]. In other words, it can be used to detect anomalous behaviour, such as when a disabled patient or child is expected to be at a certain location but is not. Further action can then be initiated, such as an emergency call. In addition, presence prediction could be useful in other scenarios such as an intelligent postal service.

The convolution technique has been shown to be quite effective in exploiting the correlation of various types of data, which is considered the key to the success of CNNs for a variety of tasks [22]. CNNs are especially suited to various computer vision tasks because of their ability to abstract representations with local operations. The use of CNNs has achieved significant success in various applications such as computer vision [32, 33, 43, 44], speech [7, 19, 39, 48] and natural language processing [6, 45]. A CNN assumes that there is a very specific structure in the data, where inputs that are close to each other are related, whereas inputs that are further away are less related. In images, this makes sense since we normally have patches of similar texture, lighting and colour. In text, words that occur close to each other in a sentence or a paragraph are more likely to share some semantic meaning. In sound, there are similar patterns where the sound spectra at time steps close to each other tend to be similar, and particular transitions from one phoneme to another are more common than others in a given language. In people’s mobility modeling, it is necessary to extract the movement flow pattern of individuals from one location to another. One promising approach is to adopt the fast, scalable and end-to-end trainable CNN. Although CNNs were originally designed to cope with image data, they can be used for sequence modeling tasks such as location prediction.

At the initial stages of dementia and other age-related cognitive deficits, the system can also provide information useful for early disease diagnosis by assessing the person’s movement behaviour whenever abnormal movements occur.

1.1 Contribution and plan of the article

We propose a monitoring system that can assist elderly people during their outdoor movements. The system contains two models. The first one is called SpaceTime-Convolutional Neural Network (ST-CNN); it runs convolutional neural networks (CNNs) on a server using the senior’s historical movement data. This model is responsible for predicting the locations that the elder will visit. Based on that, the route and the expected time needed to reach those locations are obtained. In our system, all information related to the movement plays a key role in identifying erroneous behaviours. During a movement, the caregivers are warned if the patient spends too long reaching one of the predicted locations, moves to an unpredicted location by changing routes, or keeps moving within the same area. Furthermore, we demonstrate how abnormal movements can be detected and how the system keeps elderly people safe in real-time movement scenarios. This is based on the second model, called the abnormal behaviour detection (ABD) model, which takes advantage of recurrent neural networks (RNNs) to analyse time- and space-related variables. In order to evaluate the system, three different datasets are used, each with its own descriptors. First, outdoor trajectories from Catalan patients diagnosed with MCI are used. Second, two additional online datasets, which contain trajectories from individuals (not necessarily suffering from MCI), are also used: the GeoLife and OpenStreetMap datasets.

The system is autonomous, and hence, no explicit input from the user is required. The system is able to learn about a user’s movement behaviour. With the aim of minimising patient interventions or interactions with the monitoring system, our proposal generates the predicted destinations and detects the abnormal behaviour based on deep learning models. In our system, the monitoring is performed in real-time and the abnormal behaviour is immediately detected.

The rest of the paper is organized as follows. Section 2 presents the related work. Our proposal is explained in Sect. 3. Next, we present and discuss the experimental results in Sect. 4. Finally, the article concludes in Sect. 5.

2 Related work

There are a number of proposals in the literature; some of them are related to methods and heuristics focusing on locations and trajectories, assessed using trajectory datasets, whereas others address real implementations of prototypes and products.

The earliest works in this field exploited the combination of mobile phones and GPS receivers to track elderly people. [31] proposed Opportunity Knocks (OK) to guide and assist people with MCI when they are hesitant about their destinations. Smartphones together with physically separated sensor beacon devices (a Bluetooth sensor beacon and General Packet Radio Service (GPRS)) were carried by patients. The elderly people were asked to specify where they wanted to go, which is not applicable to those with MCI. Our proposed system is autonomous, and hence, no explicit input from the user is required. A Hierarchical Dynamic Bayesian Network model was used to predict the on-going route using their previous routes. When it comes to connectivity and latency, combining various platforms in a robust and effective manner is challenging.

In a recent study, [5] proposed an app called SOD, which includes frontend software for portable devices and a cloud-based backend system. The frontend collects and sends location data to the backend system for processing. The backend is designed to be flexible enough to allow for the deployment of artificial intelligence techniques to detect wandering and getting-lost situations, while being able to process a high volume of data efficiently and consistently. The authors left testing the app with real users, comparing it with other methods and discussing automatic detection of abnormal activity for future work.

A simpler solution addressing dementia is OutCare [42], which raised alarms when significant deviations from daily routines were found. The system was tested with dozens of participants, but all were aged under 50 (not from elderly groups), and its handling of cognitive deterioration was not mentioned. The lack of elderly participants leaves open questions regarding the system validation. Our system is evaluated on a dataset containing the daily trajectories of elderly individuals suffering from MCI, gathered during the SIMPATIC project.

In [47], the authors proposed a social network of caregivers to locate and secure wandering Alzheimer’s disease patients. The proposed system is based on the safe-zone idea. The person with dementia wears a GPS and GSM SIM-equipped wrist-watch tracker that sends location data to a remote server. Caregivers carry a smartphone with a communication management app installed. To check the safety status of a person with dementia, an automatic intervention mechanism is used. Through the tracker, a no-voice GSM call is initiated to inquire about the state of the person with dementia. By making a similar call through the tracker, the patient can confirm a safe status. When a person with dementia fails to make a safety response call, the system alerts the caregivers. However, wandering behaviour detection systems should take into account the inability of some patients to interact with their devices.

The iRoute system [13] was proposed to track people with dementia during their outdoor movements and assist them in case of disorientation. The system was capable of learning new routes and guiding the patients along learned routes if they got lost. The system followed a Belief-Desire-Intention agent model using the preferences and historical records of the wanderers.

Vuong et al. [41] proposed a framework using both classical machine learning classifiers and a long short-term memory network (LSTM) to recognize dementia-related wandering patterns. The study uses orientation data gathered from inertial sensors in mobile devices. Experimental results show that the LSTM achieves the best results compared to classical machine learning classifiers. The time series data are used to classify the movement characteristics into direct, pacing, lapping and random, where each time series is associated with exactly one of the four classes.

The problem of wandering recognition in indoor environments is addressed in [14]. Random Forest, Neural Network and Long Short-Term Memory (LSTM) models are used for wandering behaviour prediction. The results show that the LSTM network achieved the best prediction results. The proposed models are evaluated using datasets that include simulated wandering movements but no real-world data from dementia patients.

Lin et al. [24] presented a method to determine whether people with MCI were wandering by searching for sharp changes of direction along their GPS traces. This work was based on the assumption that inefficient patterns (e.g. random, pacing and lapping) have a loop-like locomotion nature, and that direction changes are highly frequent in these patterns.

The SIMPATIC project [26] focused on the development of an autonomous system that monitored real-time trajectories of people with dementia, together with an application for the caregivers that received alarms under certain circumstances. A server processed the received locations, extracted features from the on-going trajectory, and raised alarms to the caregivers’ application when needed. The system was tested with 16 patients diagnosed with early or middle stages of dementia from the area of Tarragona (Catalonia, Spain).

Khodabandehloo and Riboni [16] proposed a collaborative learning approach for recognizing symptoms of cognitive impairment in smart homes. The abnormal movements are identified using a personalized model that takes into account the layout of the home and the personalities of the people who live there. The proposed approach is evaluated using a dataset collected and labeled by researchers for people with dementia, persons with MCI and cognitively healthy people. The recognition approach assumes that a single person lives in the home; however, multi-inhabitant scenarios introduce additional challenges.

In Sposaro et al. [38], Bayesian theory was used to calculate a wandering probability. The authors implemented the iWander application, which asked the wanderer whether he/she was disoriented when a possible wandering behaviour was detected. In case of disorientation, the application guided the patient to a safe area and then notified caregivers. In contrast to the SIMPATIC solution, iWander needed the wanderer to interact with button prompts, which may pose some trouble for the elderly.

LaCasa [12] used a Markov decision process and contextual information to provide wandering assistance, while learning from the trajectories of the wanderers using Bayesian methods. The authors assumed that individuals were at a known location as long as their smartphones were connected to a known WiFi network. This assumption does not always hold, since individuals may wander even in well-known areas.

Lin et al. [25] proposed a method called Isolation-Based Disorientation Detection to detect abnormal trajectories. The patient’s previously collected trajectories were modeled as a graph in which the vertices are the frequently visited locations and the edges are the routes among those locations. The presence of looping inside the graph or a deviation from a defined route was considered a potential instance of disorientation.

While elderly people may consider tracking a violation of privacy rights and a loss of independence, relatives and caregivers consider such tracking a solution to keep elderly people safe. In this context, researchers have proposed balanced solutions that support and consider the views of all parties [8, 20].

3 The proposed system: SafeMove

In this section, we first give an overview of our proposed system and then describe all system components in detail.

Fig. 1

SafeMove system architecture. The server side consists of six parts to track and monitor patient movements to ensure that the patient is located in a safe area. The client side consists of the patient’s smartphone application which sends patient’s locations to the server and displays the system output

The overall architecture of the system is shown in Fig. 1. The system contains seven parts. The primary parts are as follows:

  1. Monitoring Unit: the main part of SafeMove, responsible for receiving and distributing the data from and to the different system parts. Moreover, each patient movement is analyzed in this unit to determine whether or not he/she is located in a safe area.

  2. Prediction Unit: the part responsible for predicting people’s mobility.

  3. Places Identification Unit: detects the significant places in the patient movement area and provides the place ID that represents each of those places.

  4. Abnormal Detection Unit: the part responsible for detecting abnormal movements of the patient.

The secondary parts are as follows:

  1. Patient’s smartphone application: responsible for sending the data related to the patient’s positions to a remote server and displaying the outputs of the system.

  2. Alert Unit: responsible for sending notifications to the relatives or caregivers.

  3. Assistance Unit: responsible for helping elderly people until they reach the desired and safe destination.

The overall functioning of the system is as follows:

  1. Mobility data previously collected are converted from GPS coordinates into discrete values associated with specific places, in the Places Identification Unit.

  2. The output is then sent to the Prediction Unit, where each patient has a different prediction model.

  3. The Patient’s Smartphone Application provides the Monitoring Unit with the current GPS coordinates and the timestamp through the available Internet connection.

  4. After receiving the patient’s current location (GPS coordinates and timestamp), the Monitoring Unit sends this information to the Places Identification Unit to obtain the place ID, and then to the Prediction Unit to predict the next places.

  5. Next, ‘N’ locations are sent to the Monitoring Unit, which in turn finds the routes and computes the expected time needed to reach those locations.

  6. The Monitoring Unit forwards all the system output information back to the Patient’s Smartphone Application for display through the user interface (available as a reference for the patient).

  7. At every time threshold (every 5 seconds, for instance), the Monitoring Unit receives the patient’s current location from the Patient’s Smartphone Application. Then, each patient movement is analyzed to detect abnormal behaviour using the Abnormal Detection Unit.

  8. In case of an abnormal movement behaviour, warning notifications are sent to the Alerts Unit.

  9. The Alerts Unit notifies the relatives or caregivers by sending alerts.

  10. Finally, the Assistance Unit is activated to help the elderly patient reach a safe place.
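
The server-side flow above can be sketched as a small dispatch loop. This is an illustrative sketch only: the class and method names (`MonitoringUnit`, `on_location`, the `*Stub` units) are our own stand-ins, not the actual SafeMove implementation.

```python
# Minimal sketch of the server-side monitoring flow (steps 3-10 above).
# All unit classes and method names are hypothetical stand-ins.

class MonitoringUnit:
    def __init__(self, places_unit, prediction_unit, abnormal_unit, alert_unit):
        self.places_unit = places_unit
        self.prediction_unit = prediction_unit
        self.abnormal_unit = abnormal_unit
        self.alert_unit = alert_unit

    def on_location(self, lat, lon, timestamp):
        # Steps 4-5: map the GPS fix to a place ID and predict the next places.
        place_id = self.places_unit.place_of(lat, lon)
        predicted = self.prediction_unit.predict(place_id, timestamp)
        # Step 7: analyse the movement for abnormal behaviour.
        if self.abnormal_unit.is_abnormal(lat, lon, timestamp, predicted):
            # Steps 8-9: raise an alarm towards the caregivers.
            self.alert_unit.notify("abnormal behaviour detected")
            return "alert"
        # Step 6: the predictions are sent back to the smartphone app.
        return predicted


# Toy stand-ins so the sketch is runnable end to end.
class PlacesStub:
    def place_of(self, lat, lon): return 7

class PredictionStub:
    def predict(self, place_id, t): return ["home", "market", "park"]

class AbnormalStub:
    def __init__(self, flag): self.flag = flag
    def is_abnormal(self, lat, lon, t, predicted): return self.flag

class AlertStub:
    def __init__(self): self.messages = []
    def notify(self, msg): self.messages.append(msg)
```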

The next sections cover the individual components in detail. We focus especially on the Prediction, Monitoring and Abnormal Behaviour Detection units.

3.1 Patient’s smartphone application

Due to the widespread usage of smartphones, we use them as our client hardware. The system is designed to run on Android-enabled devices equipped with GPS. Collecting data from the device’s sensors, such as GPS location and timestamps, can run as a background service. The system uses the built-in location technologies of the smartphone without any mobile user interaction.

3.2 Places identification unit

This is the part of the system that is responsible for mapping each GPS point into the matching place, using place IDs.

Definition 1

A GPS point is a tuple \(G = (x_i, y_i, t_i)\) where \((x_i, y_i)\) denotes a GPS coordinate, \(x_i\) is latitude, \(y_i\) is longitude, and \(t_i\) is the timestamp when GPS coordinates were recorded.

We convert the sequence of GPS points in each user’s mobility data into a sequence of places by detecting the significant places [3]. In this way, the mobility routines can be obtained.

Definition 2

We denote the set of places by \(\mathcal {P} = \{p_1, p_2, \dots \}\) and the set of movement times by \(\mathcal {T} = \{t_1, t_2, \dots \}\). The user trajectories are represented as a sequence of movements \(\mathcal {M} = \{\mathcal {M}_1, \mathcal {M}_2\), ..., \(\mathcal {M}_n\}\), where n is the length of the user’s trajectories. Each movement \(\mathcal {M}_i\) is represented by a tuple \((p_i, t_i)\) where \(p_i\) \(\in\) \(\mathcal {P}\) is the place identifier and \(t_i\) \(\in\) \(\mathcal {T}\) is the corresponding movement time.

To detect the significant places, we first use the algorithm proposed by [23]. The significant places are those spatial regions where the user has stayed for more than a pre-determined time threshold, provided that the distance between the start and end points of the stay is under a specific threshold. Secondly, the obtained places are clustered into several geospatial regions using Density-Based Spatial Clustering of Applications with Noise (DBSCAN) [9], and each place is then given a unique ID. Finally, to build the place history of each user, the GPS points located in the same region as the clustered places are replaced by their IDs. Furthermore, during real-time movement, each GPS point sent to the system is converted into the place ID it belongs to.
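
The stay-point detection step can be sketched as follows. This is a minimal sketch: the thresholds, the equirectangular distance approximation and the function names are illustrative choices, not the exact parameters of [23].

```python
import math

def _dist_m(p, q):
    # Equirectangular approximation in metres; adequate for the short
    # spans involved in a single stay.
    lat1, lon1 = math.radians(p[0]), math.radians(p[1])
    lat2, lon2 = math.radians(q[0]), math.radians(q[1])
    x = (lon2 - lon1) * math.cos((lat1 + lat2) / 2)
    y = lat2 - lat1
    return 6371000 * math.hypot(x, y)

def stay_points(points, dist_th=100.0, time_th=600.0):
    """points: list of (lat, lon, t) sorted by time t in seconds.
    Returns (lat, lon) centroids of regions where the user stayed longer
    than time_th within roughly dist_th metres of the anchor point."""
    stays, i, n = [], 0, len(points)
    while i < n:
        j = i + 1
        while j < n and _dist_m(points[i], points[j]) <= dist_th:
            j += 1
        if points[j - 1][2] - points[i][2] >= time_th:
            seg = points[i:j]
            stays.append((sum(p[0] for p in seg) / len(seg),
                          sum(p[1] for p in seg) / len(seg)))
            i = j
        else:
            i += 1
    return stays
```

The resulting centroids would then be clustered with DBSCAN and assigned place IDs, as described above.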

Fig. 2

ST-CNN model architecture. The place and time vectors are the input to the model, while the output is the next place the person will visit

3.3 Prediction unit

The overall structure of the ST-CNN model is shown in Fig. 2. The ST-CNN model (Algorithm 1) is composed of an embedding layer for both places and times (lines 3–6), followed by convolutional and max pooling layers (lines 7–16), a fully connected network (lines 17–19) and a softmax classification layer (lines 20–23).

Algorithm 1

Given a user u in a place \(p_i\) at time \(t_i\), the ST-CNN predicts user’s future place \({\hat{p}}_{i+1}\) on the basis of his/her historical movement records, which is processed in a sliding window from \(\mathcal {M}^u_{i-w}\) to \(\mathcal {M}^u_i\), by modeling:

$$\begin{aligned} P({\hat{p}}_{i+1} = p_j | \mathcal {M}_i, \mathcal {M}_{i-1}, \ldots , \mathcal {M}_{i-w}) \end{aligned}$$
(1)

where \(p_j\) is a place \(\in\) \(\mathcal {P}^u\) and w is the number of visited places taken as inputs to the model.

The input layer consists of two vectors. The first vector \(p_i \in \mathbb {R}^N\) is the place ID where N is the number of places. The second vector \(t_i \in \mathbb {R}^M\) represents the leaving time (in hours) from the place [4] and M is the number of hours per day (i.e. 24 hours). These two vectors are encoded using one-hot encoding [11] (i.e. for a given input data, only one out of the vector values will be 1, and all the others are 0).

The two input vectors are passed through an embedding layer in order to learn a meaningful representation of the places and the leaving-time input features. This representation enables the model to capture the embedded semantic information about user behaviour and consequently improves the prediction performance. The place embedding matrix \(P\!e \in \mathbb {R}^{N\times d_p}\) represents the set of places, where \(d_p\) is the dimensionality of the embedded vector. Similarly, \(T\!e \in \mathbb {R}^ {M\times d_t}\) is the time embedding matrix that represents the set of times, and \(d_t\) is the dimensionality of the embedded vector. If w is the number of movements \(\mathcal {M}\) taken as inputs to the model, then \(pe \in \mathbb {R}^{w\times d_p}\) and \(te \in \mathbb {R}^{w\times d_t}\) are the place and time inputs, respectively, as follows:

$$\begin{aligned}&pe_i = p_i \cdot Pe \end{aligned}$$
(2)
$$\begin{aligned}&te_i = t_i \cdot Te \end{aligned}$$
(3)
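
Equations 2 and 3 are one-hot lookups: multiplying a one-hot vector by the embedding matrix simply selects one row of it. A small numpy illustration (all dimensions and the random initialization are arbitrary choices for the example):

```python
import numpy as np

N, M, d_p, d_t = 10, 24, 4, 3   # number of places, hours, embedding sizes (illustrative)
rng = np.random.default_rng(0)
Pe = rng.normal(size=(N, d_p))  # place embedding matrix
Te = rng.normal(size=(M, d_t))  # time embedding matrix

place_id, hour = 7, 18
p_i = np.eye(N)[place_id]       # one-hot place vector
t_i = np.eye(M)[hour]           # one-hot leaving-time vector

pe_i = p_i @ Pe                 # Eq. 2: identical to the row Pe[place_id]
te_i = t_i @ Te                 # Eq. 3: identical to the row Te[hour]
```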

To learn to capture and compose features of movement sequences, the neural network applies a series of transformations to the place and time input matrices using convolution, nonlinearity, pooling and concatenation operations.

A convolution operation involves a filter \(F_i \in \mathbb {R}^ {k\times d_p}\), which is applied to each window of k consecutive rows of the \(P\!e\) matrix, \(\{pe_{1:k}, pe_{2:k+1}, \dots , pe_{w-k+1:w}\}\), where k is the filter height. This operation results in a vector \(P_{conv_i} \in \mathbb {R}^ {w-k+1}\), which is computed as follows:

$$\begin{aligned} P_{conv_i} = f(F_i \otimes pe_{i:i+k-1}+b) \end{aligned}$$
(4)

where \(\otimes\) denotes element-wise multiplication followed by summation over the window, f is a nonlinear function such as the Rectified Linear Unit (ReLU), and b is a bias term.

The previous operation is applied using a single filter. For a richer feature representation of the input data, we apply a set of filters working in parallel, generating multiple feature maps. If the number of filters is |F|, then a feature map matrix \(P_{conv} \in \mathbb {R}^ {(w-k+1)\times |F|}\) is obtained.

$$\begin{aligned} P_{conv} = [P_{conv_1}, P_{conv_2}, \dots P_{conv_{|F|}}] \end{aligned}$$
(5)

The same procedure is applied to the time input matrix, and finally a feature map matrix \(T_{conv} \in \mathbb {R}^ {(w-k+1)\times |F|}\) is obtained.

After passing the convolutional layer outputs (\(P_{conv}\) and \(T_{conv}\)) through the activation function, they are passed to the pooling layer in order to aggregate the information and reduce the representation. We apply a max pooling operation, which simply returns the maximum value, to capture the most important features \(P_{pool}\) and \(T_{pool}\).

The pooling outputs are concatenated into \(PT_{concat}\), which is passed to a fully connected softmax layer to obtain the probability distribution over the places.

$$\begin{aligned} P({\hat{p}}_{i+1} = p_j | \mathcal {M}_i, \ldots , \mathcal {M}_{i-w}) = {\hat{y}}_i&= softmax(PT_{concat} \cdot V + b_o) \\&= \frac{e^{PT_{concat} \cdot V_j + b_{o_j}}}{\sum _{j'=1}^{N} e^{PT_{concat} \cdot V_{j'} + b_{o_{j'}}}} \end{aligned}$$
(6)

where V and \(b_o\) are the model parameters to be trained.

The Adam optimization algorithm [17], an extension of Stochastic Gradient Descent (SGD), is used to train the network, while the Back-Propagation Through Time algorithm [35] is used to compute the gradients. The model parameters are \(\theta = [P\!e, T\!e, F, V, b, b_o]\), where \(P\!e\) and \(T\!e\) are the embedding matrices of the places and times, respectively, F is the set of filters, b is the convolutional bias, and V and \(b_o\) are the weight and bias of the softmax layer, respectively. The cost function used is the cross-entropy, which is defined as:

$$\begin{aligned} J = -\sum _{i=1}^{n} y_i\cdot log({\hat{y}}_i) \end{aligned}$$
(7)

where n is the number of training samples, y is the real user next location, and \({\hat{y}}\) is the predicted next location probability.
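
The forward pass of Eqs. 1–6 can be sketched compactly in numpy. This is an illustrative reconstruction under stated assumptions: all sizes, the random weights and the example input sequence are ours, and training (Adam, cross-entropy) is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, d_p, d_t = 8, 24, 4, 4   # number of places / hours, embedding sizes
w, k, nF = 5, 2, 6             # window length, filter height, number of filters

Pe, Te = rng.normal(size=(N, d_p)), rng.normal(size=(M, d_t))
Fp = rng.normal(size=(nF, k, d_p))   # place filters
Ft = rng.normal(size=(nF, k, d_t))   # time filters
b = rng.normal(size=nF)              # convolutional bias
V = rng.normal(size=(2 * nF, N))     # softmax weights over PT_concat
b_o = rng.normal(size=N)             # softmax bias

relu = lambda x: np.maximum(x, 0.0)

def conv_pool(E, F):
    # Eqs. 4-5 followed by max pooling: one scalar per filter.
    maps = np.array([[relu(np.sum(F[f] * E[i:i + k]) + b[f])
                      for i in range(w - k + 1)] for f in range(nF)])
    return maps.max(axis=1)          # max pooling over window positions

places = [3, 1, 4, 1, 5]             # last w visited place IDs (toy data)
hours = [8, 9, 12, 17, 18]           # corresponding leaving times
pe = Pe[places]                      # Eq. 2 as a row lookup, shape (w, d_p)
te = Te[hours]                       # Eq. 3, shape (w, d_t)

PT_concat = np.concatenate([conv_pool(pe, Fp), conv_pool(te, Ft)])
logits = PT_concat @ V + b_o
y_hat = np.exp(logits - logits.max())
y_hat /= y_hat.sum()                 # Eq. 6: distribution over the N places
```

The predicted next place is then `y_hat.argmax()`, the most probable entry of the softmax output.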

3.4 Monitoring unit

The Monitoring Unit is the core component used to track and monitor movements, in addition to distributing the data from and to the different system parts. Due to the necessity of taking action as soon as possible, the system must work in real time. Therefore, every time the system receives a new GPS location, the Monitoring Unit is executed to ensure that the patient behaves normally.

There are four system statuses: no moving, predicting, moving and stop. The status is set to no moving to indicate that the patient is indoors, for instance at night or inside a building. Patient movement can be detected using the GPS and accelerometer sensors [46]. Therefore, when the patient is moving, the patient’s smartphone application sends the current position to the system. Then, the Monitoring Unit operates and forwards the current position to the prediction model, and the status is set to predicting. Once the user progresses towards one of the predicted locations, the status is set to moving to indicate that there is no need to predict and that every patient movement will be monitored. After reaching one of the predicted locations, the status is set back to no moving. The status is set to stop when the system should stop working, for instance when the patient is on holiday with his/her relatives.
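
The four statuses can be modelled as a small state machine. The transition table below paraphrases the description above; the event names ("starts_moving", "suspend", etc.) are our own illustrative labels, not part of SafeMove.

```python
# Allowed transitions between the four system statuses described above.
# Event names are hypothetical labels for the triggers in the text.
TRANSITIONS = {
    ("no moving", "starts_moving"): "predicting",
    ("predicting", "heads_to_predicted_place"): "moving",
    ("moving", "reaches_predicted_place"): "no moving",
}

def next_status(status, event):
    if event == "suspend":   # e.g. the patient is on holiday with relatives
        return "stop"
    # Unknown (status, event) pairs leave the status unchanged.
    return TRANSITIONS.get((status, event), status)
```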

Algorithm 2

During the movement from one region to another, the state of a user generally changes from indoor to outdoor and vice versa. Abnormal behaviour can happen in the outdoor environment. In this paper, we use the term “sub-trajectory” to refer to the movement between two regions. Each sub-trajectory starts at the first GPS point in the first region and ends at the first GPS point belonging to the next region. Given a trajectory, we divide it into sub-trajectories, and each one is labeled as normal or abnormal.

[28] investigated travel patterns of nursing home residents with dementia. Four different travel patterns were identified: direct, random, pacing and lapping. In addition to that, we add two different patterns related to the predicted location: indirect and stopping patterns (Fig. 3).

The movement behaviour of a person is considered abnormal if one of the following conditions is satisfied (Fig. 3):

  1. The time spent to reach the predicted place is more than the expected time.

  2. The patient takes the opposite direction with respect to all predicted places.

  3. The trajectory is random, pacing or lapping.

Note that the patient tends to forget the location; as a result, the patient may spend a long time reaching one of the predicted locations. The patient may also change direction towards an unpredicted location or may wander in the same area.

The main steps of the SafeMove system are described in Algorithm 2, which takes a sequence of GPS points either from the patient’s previously collected mobility data or in real time. The place ID of the current location is obtained by calling the Places Identification Unit (line 3). Based on the obtained place ID at a certain time, the ST-CNN model predicts ‘N’ places (line 4); three places are specified in the algorithm. Based on the predicted places, the routes, the distances and the expected times to reach those places are obtained (lines 5-7).

The distance is computed using the Haversine formula, Eq. 8:

$$\begin{aligned} distance = 2r \arcsin \left( \sqrt{ \sin ^2(\frac{\Delta \phi }{2})+ \cos \phi _1 \cdot \cos \phi _2 \cdot \sin ^2(\frac{\Delta \lambda }{2}) }\right) \end{aligned}$$
(8)

where r is the radius of the sphere, \(\phi\) is latitude, \(\lambda\) is longitude and \(\Delta \phi\), \(\Delta \lambda\) are the differences in latitude and longitude, respectively.
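
Equation 8 translates directly into a short function; the earth radius value and the metre-based unit are conventional choices for this sketch.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2, r=6371000.0):
    """Great-circle distance in metres between two GPS points (Eq. 8).
    Uses the mean earth radius r = 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)    # difference in latitude
    dlam = math.radians(lon2 - lon1)    # difference in longitude
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))
```

For example, one degree of longitude along the equator is roughly 111 km.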

As the user progresses along a route, each user movement is analysed. Three different values are evaluated during the movement: direction, distance and time. The direction is computed between the current GPS point and the previous one (line 9). The distance travelled is computed between the current and the first GPS points (line 10), while the updated distances are computed between the current GPS point and the predicted places (line 11). Finally, the time spent is computed between the current time and the time of the first GPS point (line 12).

After that, the system updates the route by matching the current route against all other predicted routes (lines 19-21), taking into account that the patient can take different routes to the predicted places. The travelled distance is compared with the distances to the predicted places (lines 22-24), which can be used to detect a change in direction: if the travelled distance increases, this indicates a change in direction. The updated distances are compared with the distances to the predicted places (lines 25-27) to determine exactly which place the patient is closest to. Moreover, if the updated distances remain almost unchanged, this is evidence of a stopping pattern. The time spent is also compared with the expected time to reach the predicted places (lines 24-26). Lines 14-18 are related to the Abnormal Detection Unit, which is described in the next subsection.

Meanwhile, the new position is compared against the predicted places in order to determine exactly which place the patient is closest to. During the movement, the patient is represented on a geographical map; thus, relatives or caregivers can continuously keep track of the patient’s progress. Moreover, using the various calculated values, several conditions can be checked to ensure that every movement of the patient is fully monitored.

Fig. 3

Movement behaviour types: direct (trajectory to P2), random (trajectory to P3), lapping (trajectory to P4), pacing (trajectory to P5), stopping (trajectory to P1) and indirect (trajectory to an unknown place) patterns

3.5 Abnormal detection unit

Fig. 4

Bearing and directions

This part is responsible for detecting abnormal behaviour in the elderly person's movements, which can occur due to the random, pacing and lapping patterns. For the sake of brevity, these travel patterns will be collectively referred to as random.

In order to detect the random pattern, we use the bearing angle formula, which gives the direction between two consecutive GPS locations. Bearing can be defined as the angle between the north-south line of the Earth (initial bearing) and the line connecting two GPS points, see Fig. 4a. Given two GPS locations \(G_{1} (\phi _1, \lambda _1)\) and \(G_{2} (\phi _2, \lambda _2)\), the bearing is computed using Eq. 9:

$$\begin{aligned} \beta = atan2 \left( \sin \Delta \lambda \cdot \cos \phi _2, \cos \phi _1 \cdot \sin \phi _2-\sin \phi _1 \cdot \cos \phi _2 \cdot \cos \Delta \lambda \right) \end{aligned}$$
(9)

where \(\phi\) is latitude, \(\lambda\) is longitude, and \(\Delta \lambda\) is the difference in longitude. The output of this formula, converted to degrees and normalized, is a value between 0 and 360.
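A direct implementation of Eq. 9, with the `atan2` result converted from radians to degrees and normalized to the [0, 360) range, could look as follows (the function name is ours, chosen for illustration):

```python
import math

def bearing_deg(g1, g2):
    """Bearing from GPS point g1 to g2 (Eq. 9), in degrees in [0, 360)."""
    phi1, lam1 = map(math.radians, g1)
    phi2, lam2 = map(math.radians, g2)
    dlam = lam2 - lam1
    beta = math.atan2(
        math.sin(dlam) * math.cos(phi2),
        math.cos(phi1) * math.sin(phi2)
        - math.sin(phi1) * math.cos(phi2) * math.cos(dlam))
    # atan2 returns radians in (-pi, pi]; map to compass degrees.
    return (math.degrees(beta) + 360.0) % 360.0
```

For example, a point due east of the current location yields a bearing of 90 degrees, and a point due south yields 180 degrees.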

Fig. 5

Changes in directions

As shown in Fig. 4b, we have four intervals that describe the four directions a user could move towards: (45-135), (135-225), (225-315) and (315-360 together with 0-45). These intervals are normalized so that similar bearings map to the same direction. We use four values: 1 (45-135), 2 (135-225), 3 (225-315) and 0 (315-360 and 0-45). If the normalized bearing values belong to the same interval, the points are located on the same line (Fig. 5a); otherwise, the points are on different lines (Fig. 5b).

From the above analysis, we can see that a change in the bearing interval can serve as an indicator of random or abnormal behaviour. However, human mobility naturally involves changes of direction in daily life. In order to distinguish abnormal from normal behaviour, we add two threshold variables. The first is the number of GPS readings the system should wait for before making the final decision (i.e. normal or abnormal behaviour). The second is the length of the trajectory being evaluated (i.e. the distance between the first and last GPS points of that trajectory should be greater than a distance threshold). In this paper, we consider a sequence of 20 GPS points within 100 meters as abnormal behaviour. These values can be adjusted to meet individual needs.
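The interval normalization and the two thresholds can be sketched as below. This is a simplified illustration: the 20-point and 100-meter defaults follow the values stated above, and the function names are ours.

```python
def direction_label(bearing):
    """Map a bearing in [0, 360) to one of the four direction values:
    1: (45-135), 2: (135-225), 3: (225-315), 0: (315-360 and 0-45)."""
    if 45 <= bearing < 135:
        return 1
    if 135 <= bearing < 225:
        return 2
    if 225 <= bearing < 315:
        return 3
    return 0

def candidate_abnormal(labels, trajectory_length_m,
                       min_points=20, max_length_m=100):
    """A window of direction labels is evaluated for abnormality only
    when enough GPS fixes were observed and the trajectory stayed
    within a short distance (the two thresholds described above)."""
    return len(labels) >= min_points and trajectory_length_m < max_length_m
```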

Table 1 Examples of pattern evaluation

If the monitoring system reads the patient's position every 5 seconds, 20 GPS points are read within about 1.7 minutes. This means that, with four different direction values, the number of possible sequences is \(4^{20}=1,099,511,627,776\). Therefore, in order to train the abnormal detection model, we build a dataset containing that number of records. We manually label a portion of those records as normal or abnormal behaviour based on the principles proposed in [28]; see examples in Table 1. We then use the labeled portion to train an RNN model that labels the remaining records. Finally, we use all dataset records to train the ABD model using RNNs as follows.

Fig. 6

Abnormal behaviour detection model. The direction sequences are the input to the model, while the output is normal or abnormal behaviour

The user directions are represented as a sequence of values \(\mathcal {S} = \{s_1\), ..., \(s_n\}\), where \(s_i \in \{0,1,2,3\}\) and n is the length of the sequence. Given a user u with a sequence of directions, the model classifies the user's behaviour as normal or abnormal.

Figure 6 shows the graphical illustration of the model. The model contains an input, embedding, recurrent and classification layers as well as inner weight matrices.

The input layer consists of one vector \(s_i \in \mathbb {R}^N\) which represents a direction value, where N is the number of different direction values. This vector is one-hot encoded and then passed through an embedding layer to produce a vector of dimension d. If the number of direction values is N and the dimensionality of the embedded vector is d, then the dimensionality of the embedding matrix \(S\!e\) is \(N\times d\), where \(S\!e\) represents the set of direction values. The embedded vector \(se_i \in \mathbb {R} ^ d\) is given by multiplying the input vector \(s_i\) by the embedding matrix \(S\!e\):

$$\begin{aligned} {se}_i = s_i\cdot S\!e \end{aligned}$$
(10)

The values of the recurrent layer \(h_i \in \mathbb {R}^ {d_h}\) are computed as below where \(d_h\) is the dimensionality of the recurrent layer vector:

$$\begin{aligned} h_i = f \Bigg ({se}_i\cdot W\!S + h_{i-1}\cdot W\!h_{i-1} + bh \Bigg )\ \end{aligned}$$
(11)

where \(W\!S \in \mathbb {R} ^ {d \times d_h}\) and \(W\!h_{i-1} \in \mathbb {R} ^ {d_h \times d_h}\) are the weight matrices and \(bh \in \mathbb {R}^ {d_h}\) is the hidden layer bias. Hyperbolic tangent is used as the nonlinear activation function for the recurrent layer.

$$\begin{aligned} f(x) = \frac{1-e^{-2x}}{1+e^{-2x}} \end{aligned}$$
(12)

The classification layer \(\hat{y}_i \in \mathbb {R}\) produces a scalar value ranging from 0 to 1. A value greater than 0.5 denotes the normal class, while a value less than 0.5 denotes the abnormal class. Its value is computed as:

$$\begin{aligned} {\hat{y}} = g(h_i\cdot W\!h + bo) \end{aligned}$$
(13)

where \(W\!h \in \mathbb {R}^ {d_h \times 1}\) represents the weight matrix between the hidden and output layers and \(bo \in \mathbb {R}\) is the output layer bias. The classification layer uses the sigmoid function, which is suitable for binary classification.

$$\begin{aligned} g(x) = \frac{1}{1+e^{-x}} \end{aligned}$$
(14)
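Under the dimensions introduced above, the forward pass of Eqs. 10-14 can be sketched in NumPy as follows. This is a sketch only: the randomly initialized parameters stand in for trained weights, and the variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, d_h = 4, 50, 20   # direction vocabulary, embedding and hidden sizes

# Randomly initialized parameters (placeholders for trained weights).
Se = rng.normal(scale=0.1, size=(N, d))          # embedding matrix
WS = rng.normal(scale=0.1, size=(d, d_h))        # input-to-hidden weights
Wh_rec = rng.normal(scale=0.1, size=(d_h, d_h))  # hidden-to-hidden weights
bh = np.zeros(d_h)                               # hidden bias
Wh_out = rng.normal(scale=0.1, size=(d_h, 1))    # hidden-to-output weights
bo = np.zeros(1)                                 # output bias

def forward(sequence):
    """Score a direction sequence (values in {0,1,2,3}): outputs in (0, 1),
    with > 0.5 read as normal and < 0.5 as abnormal (Eqs. 10-14)."""
    h = np.zeros(d_h)
    for s in sequence:
        se = Se[s]                               # Eq. 10: embedding lookup
        h = np.tanh(se @ WS + h @ Wh_rec + bh)   # Eq. 11 with tanh (Eq. 12)
    y_hat = 1.0 / (1.0 + np.exp(-(h @ Wh_out + bo)))  # Eqs. 13-14: sigmoid
    return float(y_hat[0])
```

A trained model would use the same forward computation with parameters learned as described next.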

There are three major steps to training a neural network. First, it performs a forward pass and makes a prediction. Second, it uses a loss function to compare the prediction against the ground truth; the error value produced by the loss function is an estimate of how poorly the network is performing. Finally, it uses that error value to calculate the gradients for each node in the network using back-propagation. The cost function used is the cross entropy, which is defined as:

$$\begin{aligned} J = -\sum _{i=1}^{n} y_i\cdot log({\hat{y}}_i) \end{aligned}$$
(15)

where n is the number of training samples, y is the real user behaviour, and \({\hat{y}}\) is the classified one. Optimization is performed using the Adam update rule [17], a simple and computationally efficient approach for gradient-based optimization. Back-Propagation Through Time, the training algorithm for sequence data, is used to update the model parameters in order to minimize the error of the network outputs. In order to minimize the loss, we need to compute:

$$\begin{aligned} \frac{\partial J}{\partial \theta } = \frac{\partial }{\partial \theta } \Bigg (-\sum _{i=1}^{n} y_i\cdot log(g((f ({se}_i\cdot W\!S + h_{i-1}\cdot W\!h_{i-1} + bh ))\cdot W\!h + bo))\Bigg ) \end{aligned}$$
(16)

where \(\theta\) is the model parameters, \(\theta = [S\!e, W\!S, W\!h_{i-1}, W\!h, bh, bo, h_0]\) and \(h_0\) is the initial vector for the recurrent layer.
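A direct implementation of the cost of Eq. 15 might look like the following (a sketch: predictions are clipped to avoid log(0), and in practice a framework's built-in loss and automatic differentiation would be used for the gradient of Eq. 16):

```python
import numpy as np

def cross_entropy(y_true, y_pred):
    """Cross-entropy cost of Eq. 15 summed over n training samples."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), 1e-12, 1.0)
    return float(-np.sum(y_true * np.log(y_pred)))
```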

3.6 Alert unit

When the system detects abnormal movement behaviour, it notifies the relatives and caregivers by sending an alert accompanied by the patient's current location (GPS coordinates).

3.7 Assistance unit

This unit is responsible for providing guidance that keeps a disoriented patient safe. It can help by displaying a map on the patient's phone and creating a route towards the nearest predicted location or any location stored in the patient's previous history, including the starting location of the movement. The patient then follows a series of spoken navigation instructions. The monitoring system keeps tracking the patient until the desired destination is reached.

While different assistance means can be used, the most important issue is the ability of the patient to understand that type of assistance; otherwise, it will be useless. Thus, the assistance type must be added to the system based on what the patient wants. This is one aspect that supports the idea of a customizable system.

4 Experimental evaluation

In this section, we aim at demonstrating the performance of the deep learning core of our SafeMove system, by conducting tests using three real-world datasets.

4.1 Datasets

There are many online trajectory datasets, but finding datasets containing trajectories of elderly people with/without abnormal movement behaviour is not a straightforward task [27]. We use three real-world datasets in our experiments: SIMPATIC, GeoLife and OpenStreetMap.

  • SIMPATIC project dataset [36]: It contains the daily trajectories (from late 2013 to mid 2016) of 18 Catalonian elderly individuals suffering from MCI, gathered during the SIMPATIC project. The dataset contains around 2000 trajectories with a low sampling rate (one point every 3 minutes), for design reasons.

  • GeoLife dataset [49]: It is an open GPS trajectory dataset released by Microsoft Research Asia that contains more than 17000 trajectories of 182 individuals from 2007 to 2012. Most of the trajectories were created in Beijing (China), but there are a few in the USA and Europe. The trajectories were recorded at a high sampling rate (every 1-5 seconds or every 5-10 meters). This dataset is widely used in many research fields: mobility pattern mining, user activity recognition and location privacy, among others.

  • OpenStreetMap dataset [29]: It is a collaborative project aiming at creating free editable maps. In addition, individuals can upload personal traces to the OpenStreetMap public repository. The repository is updated continuously and already contains more than one million trajectories gathered from thousands of individuals around the world since 2005.

We use the three datasets to evaluate the monitoring system. For the SIMPATIC dataset, we select normal and abnormal trajectories and label them manually. We evaluate 496 trajectories, 417 of them presenting some kind of abnormal behaviour. Additionally, we can observe that the SIMPATIC dataset contains shorter trajectories (elderly people do not usually walk long distances), while GeoLife trajectories are longer and contain more GPS locations due to the higher sampling rate. For the OpenStreetMap dataset, we chose the GPS traces of 16 individuals as our test data, while all individuals' GPS traces of the GeoLife dataset are used. Since the latter datasets do not come from elderly people, they do not contain abnormal movement patterns. Thus, in order to test the performance of the monitoring system in detecting abnormal patterns in trajectories, we manually added several trajectories with abnormal patterns.

4.2 Experimental setup

The evaluation process is divided into two phases. The first phase is the evaluation of the prediction model. The second phase is the evaluation of the abnormal detection part.

4.2.1 Experimental setup: prediction model

We compare the prediction model performance with five outstanding proposals found in the literature:

  • n-MMC [10] is a classical sequential model which exploits the transition probabilities.

  • NN [21, 30, 40] has been successfully applied in computer vision, speech recognition, etc.

  • RNN [15] is widely used for time series prediction.

  • STF-RNN [1] uses embedding representations of space and time as inputs to an RNN.

  • GTR [2] uses embedding representations and a neural pooling function of the input data.

We only use the GeoLife dataset to evaluate the prediction model. The parameters of our model are as follows: the window size w is set to 2. The dimensionalities of the embedded vectors of the places (\(d_p\)) and time (\(d_t\)) are 160 and 6, respectively. The height of the convolution filters k is set to 2, and the number of convolutional feature maps is 175. We use the ReLU activation function and a simple max-pooling function.

Recall and Precision are employed as our evaluation metrics in all experiments in order to assess the effectiveness of the prediction models. Recall@N is defined as the ratio between the number of correct predictions (i.e. locations) and the total number of real visited locations. Precision@N is defined as the ratio between the number of correct predictions and the total number of predictions. In our study, we only report N = 1, 2, 3.

Supposing that \(L_u\) denotes the set of corresponding real visited locations by a user u in the test data, \(P\!L_{N,u}\) denotes the set of top N predicted locations and U is the set of users, the definitions of Precision@N and Recall@N are formulated as below:

$$\begin{aligned}&R@N = \frac{1}{|U|}\sum _{u \in U} \frac{|L_u \cap P\!L_{N,u}|}{|L_u|} \end{aligned}$$
(17)
$$\begin{aligned}&P@N = \frac{1}{|U|}\sum _{u \in U} \frac{|L_u \cap P\!L_{N,u}|}{|P\!L_{N,u}|} \end{aligned}$$
(18)
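Eqs. 17-18 can be computed as below, assuming each user has a set of real visited locations and a list of top-N predicted locations (the function name and the dictionary layout are illustrative, not taken from the paper):

```python
def recall_precision_at_n(visited, predicted_top_n):
    """Recall@N and Precision@N (Eqs. 17-18), averaged over users.

    visited: {user: set of real visited locations L_u}
    predicted_top_n: {user: list of top-N predicted locations PL_{N,u}}
    """
    users = list(visited)
    recall = sum(
        len(visited[u] & set(predicted_top_n[u])) / len(visited[u])
        for u in users) / len(users)
    precision = sum(
        len(visited[u] & set(predicted_top_n[u])) / len(predicted_top_n[u])
        for u in users) / len(users)
    return recall, precision
```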

The model of each user is trained using a threefold cross-validation technique where the mobility data are partitioned into three subsets of equal size. The Precision and Recall scores for each fold of each user are then calculated, and the final results are averaged over all users.

4.2.2 Experimental setup: ABD model

We compare the detection model performance with an outstanding proposal in the literature: \(\theta\)_WD [24], a method to determine wandering patterns by searching for sharp changes of direction along GPS traces. In addition, the comparison is conducted when using the detection model ABD together with the prediction model ST-CNN.

The most important step is to determine the most suitable parameters of the model for each dataset. Since detecting abnormal behaviour depends on the dataset, the thresholds depend on it too. The parameters of our model are as follows: the window size w is set to 20. The dimensionalities of the embedded vector of the direction d and the hidden layer \(d_h\) are 50 and 20, respectively. Different distance values are used, \(\delta \in \{10, 50, 100, 150, 200\}\).

When applied to a given user movement, the monitoring system returns a prediction with two possible outcomes: (i) the movement contains normal behaviour, which is the default situation, or (ii) the movement contains abnormal behaviour. Since we know in advance whether the trajectory actually contains abnormal movement, we validate whether the classification is correct by comparing the prediction of the monitoring system with the trajectory's label. We therefore treat this as a binary classification problem. Since the goal of the monitoring system is to detect abnormal movement, we consider the "positive" class to be "abnormal" and the "negative" class to be "normal". In practice, a classified trajectory falls into one of four categories: True Positive (TP), an abnormal trajectory successfully classified as abnormal; False Negative (FN), an abnormal trajectory classified as normal; False Positive (FP), a normal trajectory classified as abnormal; and True Negative (TN), a normal trajectory successfully classified as normal.

From the evaluation perspective, we consider several statistical measures that can be derived from the confusion matrix after classifying each trajectory: Recall, Precision, Specificity, F1-score and Accuracy.
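These derived measures follow directly from the four confusion-matrix counts; a standard formulation (with zero-denominator guards added for robustness, and "abnormal" as the positive class) is:

```python
def classification_metrics(tp, fn, fp, tn):
    """Recall, Precision, Specificity, F1-score and Accuracy from the
    binary confusion matrix, with "abnormal" as the positive class."""
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return {"recall": recall, "precision": precision,
            "specificity": specificity, "f1": f1, "accuracy": accuracy}
```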

4.3 Experimental results

4.3.1 Prediction model results

Results of our model against five other models are shown in Table 2 in terms of Precision@N and Recall@N with N = 1, 2, 3. The worst results are obtained by n-MMC and NN, which consider only the movement sequences without the time. Furthermore, the results demonstrate the ability of the models that use an RNN structure to successfully analyse movement sequences by taking historical dependencies into account, which enables them to obtain more accurate results. A great improvement is achieved by STF-RNN through learning internal representations of the space and time features. In addition, it can be observed from the table that GTR slightly improves the results compared with STF-RNN due to the time encoding scheme used in its operations, which helps the model effectively capture temporal effects. We further notice that our model outperforms STF-RNN, GTR and the other baseline models. For instance, ST-CNN outperforms STF-RNN and GTR by 1.1% and 0.64%, respectively, in terms of P@1. This is mainly because the CNN explicitly employs the interactive information, while most other methods rely only on the global information.

Table 2 Performance comparison. Best scores are in bold

4.3.2 Abnormal detection model results

Figure 7 shows the confusion matrices obtained from our model's classification procedure on the three datasets, where main-diagonal values represent correct classifications and off-diagonal values are incorrect classifications. It is clear from the figure that our model detects abnormal trajectories better than normal ones. For instance, the confusion matrix in Fig. 7a shows that more than 90% of the abnormal trajectories were classified correctly, while the model mostly misclassifies normal trajectories as abnormal. This could be because of the shorter trajectories with a low sampling rate.

For the GeoLife dataset (Fig. 7b), the biggest confusion happens when trajectories are normal but classified as abnormal by the model. Since GeoLife trajectories contain a huge number of GPS points, they are classified as abnormal when a change in direction is detected within a small distance. In contrast, for the trajectories of the OpenStreetMap dataset, our detection model needs more direction changes within a certain distance to confirm abnormal behaviour.

The confusion matrix in Fig. 7c shows that our model does not work correctly for the OpenStreetMap dataset. This is because the sampling rate for location acquisition is not fixed and the distances between consecutive GPS points are generally large. When a low location acquisition rate is used together with long distances between consecutive GPS points, a huge number of abnormal trajectories are missed. In addition, since this dataset is imbalanced, most of the correct classifications are normal behaviour trajectories.

These results could be improved by modifying the values of the model parameters (i.e., the number of GPS readings and the distance between GPS points). Additionally, training the model with more data would allow it to generalize better and thus improve the performance. In spite of this, the proposed detection model achieved good performance.

Fig. 7

Confusion matrices of the abnormal detection model evaluated on the three datasets. Each entry in column c and row r represents the percentage of class r that was classified as class c

Table 3 Performance comparison. Best scores are in bold

Table 3 illustrates the classification results in terms of Recall, Precision, Specificity, F1-score and Accuracy. It shows that ABD and ABD+ST-CNN, which use RNN and CNN structures, perform better than \(\theta\)_WD on the SIMPATIC and OpenStreetMap datasets. ABD+ST-CNN performs better than ABD. This indicates that using the prediction model is effective in achieving more control over the patient's outdoor movement, which enables the system to obtain more accurate results by considering where the patient is going next. Focusing on recall, we see that ABD+ST-CNN detects up to 90% of the abnormal trajectories on the SIMPATIC and GeoLife datasets. However, when the model predicts abnormal behaviour, it is correct in 67% and 73% of the cases on the SIMPATIC and GeoLife datasets, respectively. Regarding OpenStreetMap, we can clearly see that our model is not appropriate for this dataset, where recall is 19% and 28.6% with ABD and ABD+ST-CNN, respectively, and only 38% and 41% of all abnormal predictions are correct. As mentioned before, any individual can upload a personal trace to the OpenStreetMap public repository. This means that the traces can contain trajectories from transportation means and sport activities, which cannot be used for abnormal behaviour detection.

Regarding the \(\theta\)_WD method, it rarely detects abnormal trajectories, but when it does, the trajectory is indeed abnormal in 48%, 34% and 86.7% of the cases on the SIMPATIC, GeoLife and OpenStreetMap datasets, respectively. We observe that the poorest results are on the OpenStreetMap dataset, since it has low recall and precision values (14% and 34%, respectively). Moreover, we conclude that this method performs well when GeoLife is used. Consequently, we could infer that this method works better when analysing trajectories with a high sampling rate.

Fig. 8

Distance effect on the datasets

To investigate the impact of the distance parameter, we conduct several experiments to check the detection model performance with various distance values, as shown in Fig. 8. The best performance is obtained when the distance parameter is 10 for SIMPATIC, whereas performance peaks at distance values of 100 and 150 for GeoLife and OpenStreetMap, respectively. This is due to the short trajectories of SIMPATIC compared to the trajectories of the other datasets. Since GeoLife and OpenStreetMap trajectories are denser (higher sampling rate) than SIMPATIC, they are more likely to contain cycles.

4.4 Running time

In order to assess whether our models can be used in a real-time system, we measure the exact running time in seconds taken by the ST-CNN and ABD models. We simply run the models on trajectories and measure the time taken by the on-line operations (i.e. predicting and detecting abnormal behaviour). The experiment is repeated 10 times for each model, and the average and standard deviation are then calculated. All experiments were conducted on a PC with a 2.7 GHz Intel Core i5 CPU and 8 GB of memory.

The average ST-CNN prediction running time is 0.00037 seconds with a ±0.00006 standard deviation, while the average ABD detection running time is 0.00003 seconds with a ±0.0000004 standard deviation. It can be noted that the on-line operations are very quick.
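The timing methodology (10 repetitions, then mean and standard deviation) can be reproduced with a small helper such as the following; `time_operation` is our illustrative name and not part of SafeMove:

```python
import statistics
import time

def time_operation(fn, repeats=10):
    """Run fn `repeats` times and return the mean and standard
    deviation of its wall-clock running time in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), statistics.stdev(samples)
```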

5 Conclusions and future work

We have presented a system called SafeMove which utilizes deep learning techniques to provide cognitive assistance to elderly people. It relies on historical mobility data as a basis for predicting likely locations and then detecting abnormal behaviour. The server side of the system runs a Convolutional Neural Network (CNN) on the elderly person's historical movement information in order to learn his/her movements. It is responsible for predicting the locations that he/she will visit, the route and the expected time to reach those locations. Moreover, three different variables are evaluated during the movement: distance, direction and time. We then developed a model called abnormal behaviour detection (ABD) that takes advantage of recurrent neural networks (RNNs) to detect the different abnormal behaviour scenarios in real time. The findings of this study indicate the potential utility of this system for monitoring and detecting wandering behaviour in a real-world scenario with patients suffering from MCI.

The success of any monitoring system depends mainly on the accuracy and availability of mobile user’s location information. In addition to that, the quality of the Internet communication between the user’s smartphone and the server is essential for the continued operation of the monitoring system.

A major drawback of this system is related to the dataset utilized to build the prediction model. The system assumes that the dataset is available and that it was collected in a period when the patient had no movement problems. If the historical movement dataset is unavailable or contains abnormal behaviour, the system will not be able to obtain the user's regular mobility routines. This issue could be addressed by having the patient's relatives manually determine the significant places. Each place can be given a probability of being the next location the patient will visit, depending on the importance of the place and the time of movement. Furthermore, similarities between trajectories belonging to different people can be computed in order to identify persons with similar mobility routines.

On the other hand, a question that might be asked is: how long should a patient's movement histories be stored and used to improve prediction accuracy? A large amount of data is essential so that the model is able to learn the regular mobility routines. In general, the length of time the data should be stored is determined by the ultimate objective. For example, if the data is collected to obtain the user's regular mobility routines and predict the next location the user will visit, it should be stored for a longer period of time. However, if the purpose is to detect immediate wandering behaviour, then the data can be discarded after a shorter amount of time.

Regarding the amount of data transferred between the patient's smartphone and the server, a considerable number of connections can be expected, which means more power consumption. In order to reduce the demand on the network and the energy consumption, different measures were taken into consideration while building the monitoring system. All processing runs on the server side. The GPS coordinates and the timestamp are sent from the client side to the server side at a fixed interval. Moreover, the caregivers and relatives can modify how often the patient's smartphone sends the GPS coordinates and the timestamp. Last but not least, the system is autonomous, and hence, no explicit input from the patient is required, which means less interaction between the server and client sides.

In this article, the theoretical framework corresponding to the deep learning components of SafeMove has been tested. Our future work will focus on the complete development of an Internet-enabled system for SafeMove and its corresponding mobile application, so as to build a proof of concept and conduct a test with potential users. In addition, we plan to develop the system using a graph-based model that can be applied to obtain the optimal path between road points during patient movements.