1 Introduction

In the Asia-Pacific region, both developing and developed countries face traffic management issues in urban areas because a growing population increases the number of vehicles on the roads. Over the years, a wide variety of traffic management systems have been developed for urban traffic control [16]. Mobility path information of cell phone users plays an important role in a wide range of cell phone applications, such as context-based search, advertising, early warning systems [1, 17], traffic planning and monitoring [8, 15], route prediction [10, 11], and air pollution exposure estimation [3]. Learning user profiles from mobility and demographic data is important for these applications [6]. User mobility detection is the approach by which the location of a user is found, for example using a GPS sensor embedded in a smart device. There are many ways to obtain this geolocation information; sensors are one example. With the advent of the Internet of Things, a large number of devices are connected and communicate with the devices carried by humans. Thus, there is a need for a robust system that detects a user’s movement and shows how it can serve different applications. The study of user mobility covers areas such as the user’s points of interest, routes, traffic, and individual mobility patterns, as well as the analysis of track patterns and the evolution of those patterns over time [23].

The analysis is carried out at particular time intervals. A time-dependent social graph captures both random and social interactions. Human movement can be detected and tracked systematically from the mobility data generated by different geolocation sensors. The characteristics of group mobility are group evolution, periodicity, and meeting duration [5]. Nguyen et al. [19] present a novel method for mobility detection that represents social events and their relationships as a knowledge graph. Their graph-based analysis uses data collected from social media.

Group evolution is characterized over time by considering, first, the structural change rates of growth, contraction, birth, and death; second, the periodicity of group meetings; and third, the duration of group meetings and its correlation with the strength of the group’s bonds and the group’s stability [23]. The mobility of users in networks causes dynamic changes in how users access base stations. The rapid movement of a dense group of users in a network degrades the quality of service. Dynamic base station switching schemes monitor and model the movements of the group by tracking network utilization. Big data analytics can be used to develop and validate detection methods that monitor group mobility, and to validate the connected/idle durations of models and simulations based on dynamic switching of users between base stations [20].

In [4], mobile phone movement data were collected to analyze user mobility. The datasets cover July to August 2007 and November 2007 to June 2008 for 10 mobile operators in Portugal, and May to October 2007 for 5 mobile operators in France. The datasets contain more than a billion calls from 2 million users in Portugal (\(\sim 20\%\) of the total population) and 17 million users in France (\(\sim 30\%\) of the total population) [4]. Another study based on mobile phone movement analyzed user mobility in Haiti to understand population displacement before and after an earthquake. Mobile phone location data were collected from the largest mobile phone operator in Haiti (Digicel), and the movements of 1.9 million mobile phone users were analyzed for the period from 42 days before to 341 days after the devastating earthquake [14]. Our previous paper took a similar approach for a small location within an organization’s premises, where the mobility of individuals was collected and analyzed using graph-based analysis [25]. Extending that work, we propose detecting the trajectory of taxi movement in a city based on collected GPS location data.

Fig. 1 Data source and expected outcomes for proposed work

Figure 1 shows the potential fields where the outcomes of the proposed work can be exploited. One outcome helps smart city users make better use of the available transport facilities by sharing location information. Smart mobility here means recommendations for sharing transport services, derived from smart city data, that improve cost-effectiveness and save time. The generated recommendations about resources and events help the people of the smart city improve their living standards, and using the available resources optimally can lead to a smart living environment. Knowledge and patterns regarding the traffic plan help people use resources effectively and save time on transportation. Smart traffic will also help governmental agencies plan different events in the city. The distribution of resources such as land use, environment, socio-economic services, energy, and transportation across a city should be transparent and equal, and this knowledge supports the distribution and establishment of such resources. Environmental alerts before and after a disaster help people make better decisions about their movement and displacement, and they also increase the accountability of governmental agencies for such events.

The rest of the paper is organized as follows. Section 2 presents related research in the area of mobility detection and prediction. Section 3 presents the proposed work, which is based on a smart city project for user mobility detection and prediction. Section 4 describes the methodology, tools, and concepts applied to achieve the proposed outcome. Section 5 presents the experimental analysis carried out to justify the proposed work. Section 6 discusses the results and analysis, and Section 7 concludes the paper and outlines future work.

2 Related work

Much research has been conducted in the field of user mobility detection and prediction across many application areas. Table 1 shows a comparative analysis of different approaches applied in the area of user mobility detection and prediction.

Yavas et al. (2005) [28] proposed a new algorithm for mobility prediction for mobile users. Mobile phone holders continually reveal information about their location to their internet service provider. The algorithm predicts the next inter-cell movement of a mobile user in a Personal Communication Systems network. Kim et al. [9] proposed a mobility model with an emphasis on user movement around specific popular places, referred to as hotspot regions. They considered mobility characteristics including pause time, speed, and direction of movement. Cho et al. (2011) [2] proposed a social network based user mobility prediction model. Data are collected from the user’s mobile phone movement and from a social network. The model predicts a daily routine together with the geographic location of the user, and the user’s day-to-day movement pattern is identified from social network data combined with the user’s mobile movement.

Liu et al. (2013) [12] proposed a mathematical model for analyzing a user’s movement pattern based on mobile movement data. The location data are captured from voice calls made from mobile phones in particular locations over one year. The machine learning based model predicts user location with good accuracy. Faye et al. (2017) [7] proposed a data analysis model for predicting user mobility using data captured from smart devices often carried by users, such as smartphones and smartwatches. The sensors present in most commercial smart devices can be used to deliver mobility information and patterns. The work also provides a mobility assistant mechanism based on the combination of mobile Wi-Fi activity data.

Table 1 Comparative analysis of different approaches applied in the area of user mobility detection and prediction

Watanabe et al. (2017) [27] developed a novel proof-of-concept framework called RouteDetector to identify the route of a train based on readings from attached smart sensor devices. A machine learning based analysis predicted a potential path for a train in a schedule for specific locations.

Choosing a suitable data source for user mobility analysis is an important concern. A brief analysis of various data sources for user mobility in the multi-user context modeling environment is presented in [18]. The authors describe the factors for choosing an ideal dataset for such tasks, along with the desired characteristics of such a dataset.

Senaratne et al. (2018) [22] introduced a visual analytics based approach for comparing mobile usage patterns and detecting anomalies in daily routines across regions and user groups. A GSM internet usage database of 358 users, collected over a period of seven months in Santiago de Chile, is used to explore the spatio-temporal patterns derived from the user movement traces. They further demonstrate their contribution in terms of similarity of user movements, classification of users’ home and work areas, region partitioning based on origin and destination, and temporal change detection. The outcomes of their work can be helpful for smart city urban planning and transportation management.

A recent study by Liu et al. [13] proposed a big data approach to examine geographic patterns of time-space aggregate human activity and its impact on land use characteristics. A practical approach for city policymakers and planners to understand the patterns of land use and human activity with new and emerging location-based big data is presented in their study. The work in [21] proposed a framework for big mobile data, based on real data traffic collected from second-, third- and fourth-generation networks from almost 7 million users and in densely populated areas. Their findings are helpful in the context of urban planning, traffic control, and mobile network resource optimization, etc.

3 Proposed work

The proposed research work presented in this paper is based on a research project focused on the design and development of an integrated data structure for large-scale data captured for a user mobility project (see Fig. 2). Because of the large size, generation speed, and diverse nature of smart city data, this problem is treated as a Big Data problem. A prototype has been designed and developed to demonstrate the effectiveness of the analytics service for Big Data Analytics. The prototype has been implemented using open-source solutions for Big Data Analytics, and its results are evaluated with respect to parameters such as efficiency and effectiveness. The experiment analyzes and visualizes data containing the GPS trajectories of 10,357 taxis from Feb. 2 to Feb. 8, 2008, within Beijing [29, 30]. Information and Communication Technologies (ICT) and the Internet of Things (IoT) play key roles in Smart City projects. Handling the large amount of data generated by the different processes and connected devices of projects related to land use, environment, social networks and the economy, energy consumption, and transportation is a very challenging task.

Fig. 2 Traffic detection and prediction system for smart city projects

The proposed system architecture is shown in Fig. 3 and is divided into three segments, each of which contributes to meeting the objective of the project. The lowest segment consists of different sensors and heterogeneous repositories. It is responsible for data acquisition, data cleansing, and data classification with state-of-the-art approaches. The middle segment supports new scenarios by developing links that were not possible in the lowest segment. Once the data are collected from the heterogeneous sources, a mapping between resources is established. The data are then made semantically relevant and browsable, which helps end-users select parameters for the analysis. Traditional metadata formats such as DBLP and Open Library are used to describe and store the data. The data are then mapped using resource description semantics, which contain all the links between the various resources. The analytic engine is the topmost segment and processes application-specific data. It uses the data available in the data segment and also helps the user with query submission, algorithm processing, and workflows to retrieve information from the repositories. To handle the aforementioned issues, big data mining has emerged as a technique for data sets that are large in complexity, cardinality, and continuity. Such techniques are used in applications such as network traffic analysis and business, and they are useful for discovering non-obvious relations and associations in the huge data sets of smart cities. Since the main focus of this research work is a User Mobility Detection and Prediction System, we concentrate on the mobility of the user and explain it in detail. To achieve this, various statistical modeling, machine learning, and data mining techniques can be applied.

Fig. 3 Proposed data analytics approach

4 Methodology

This section describes the concepts and methodology for Big Data analysis, user mobility and graph-based data analytics with their process flow and execution. Different types of sensors for user mobility detection are discussed.

4.1 Big data analysis

A huge amount of data is generated every day from different sources. Handling this overflowing data is called big data analytics. It is the study of data: categorizing it in terms of its uses, its applications, the method by which it is obtained, its size, and more. Data can be represented or categorized in whatever way suits the requirements and gives the best outcome. For example, it can be stored in tabular, binary tree, or graph form and analyzed using tools such as Excel, SQL, or Hadoop. In general, big data is data that cannot be processed or analyzed with conventional means. It requires specialized tools such as efficient software, better storage, and faster processors. For example, suppose 5 GB of data must be analyzed: if one processor takes 1 hour to complete the task, then N processors working in parallel can ideally complete it in 1/N hour. This is the approach used to analyze the overflowing data. User mobility data is generated in huge volumes and at very high speed (velocity). It is also generated by different types of devices in different formats and variations. The user mobility detection and prediction problem is treated as a Big Data Analytics problem because its data satisfy all three dimensions of Big Data (Volume, Velocity, and Variety). For this type of data, we propose a big data environment for distributed storage and computation with Hadoop and Spark.
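The parallel processing example above can be written as a short worked equation. It assumes ideal scaling, that is, the work divides evenly and coordination overhead is ignored; the value N = 4 is chosen only for illustration.

\[
T_N = \frac{T_1}{N}, \qquad T_1 = 1\ \text{h},\ N = 4 \;\Rightarrow\; T_4 = \frac{1}{4}\ \text{h} = 15\ \text{min}.
\]

In practice the observed time is usually larger than \(T_1 / N\), because data shuffling and any sequential part of the job do not shrink as processors are added.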

4.2 User mobility detection sensors

Several sensors can be used for mobility detection. The accelerometer, magnetometer, and gyroscope are suitable because they capture fine-grained detail about motion. The light sensor is suitable because changes in the detected light level may distinguish different locations. Table 2 describes the sensors customarily used by various mobile manufacturing companies. In the early days of ATMS and ITS systems, data were captured using different types of sensors installed at fixed positions. These sensors detected nearby vehicles passing them. Inductive loop detectors used to be the most popular way to detect vehicles, but nowadays a variety of fixed-position sensors are available, as listed in Table 2.

Table 2 List of sensors for mobility detection and prediction systems

4.3 Graph based data analytics

A social information graph can be built in which the GeoHash tag of an individual taxi represents a vertex. Edges between these vertices are identified from the GeoHash tags collected at different timestamps: the difference between these timestamps for an individual taxi defines the relationship between the vertices. A graph G(V, E) is built with a set of vertices V and a set of edges E. The vertex set V consists of GeoHash tags, which are generated from a combination of latitude and longitude, and the edge set E represents the links between these vertices. The T-Drive Taxi Trajectories dataset provides the attributes taxi-id, timestamp, latitude, and longitude. An edge links GeoHash tags whose timestamp difference exceeds a given threshold value. In Algorithm 1, steps 1 to 3 generate the vertices from the latitude and longitude in the given dataset D. Steps 4 to 6 generate the edges between these vertices. Step 7 builds the graph from the vertex set V and the edge set E. Steps 8 to 13 compute and print the in-degree of each vertex. Steps 14 to 18 execute the PageRank algorithm. Table 3 lists the notation used for the PageRank algorithm.

Table 3 Page rank notations
figure i
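The steps of Algorithm 1 described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the damping factor of 0.85, the fixed iteration count, and the handling of vertices without outgoing edges are assumptions. The edge rule follows one possible reading of the text, namely that consecutive GeoHash tags of the same taxi are linked when their timestamp difference exceeds the threshold.

```python
from collections import defaultdict


def build_edges(visits, threshold_s):
    """Link consecutive GeoHash tags of each taxi (hypothetical helper).

    visits: iterable of (taxi_id, unix_seconds, geohash_tag).
    An edge (src_tag, dst_tag) is kept when the timestamp difference
    exceeds threshold_s; self-loops are skipped (an assumption).
    """
    by_taxi = defaultdict(list)
    for taxi_id, ts, tag in visits:
        by_taxi[taxi_id].append((ts, tag))
    edges = []
    for points in by_taxi.values():
        points.sort()  # order each taxi's visits by time
        for (t0, src), (t1, dst) in zip(points, points[1:]):
            if src != dst and (t1 - t0) > threshold_s:
                edges.append((src, dst))
    return edges


def in_degrees(vertices, edges):
    """Count incoming edges for every vertex."""
    degree = {v: 0 for v in vertices}
    for _, dst in edges:
        degree[dst] += 1
    return degree


def pagerank(vertices, edges, damping=0.85, iterations=20):
    """Power-iteration PageRank; rank of dangling vertices is spread evenly."""
    n = len(vertices)
    if n == 0:
        return {}
    out_links = {v: [] for v in vertices}
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in vertices if not out_links[v])
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in vertices}
        for v in vertices:
            for dst in out_links[v]:
                new_rank[dst] += damping * rank[v] / len(out_links[v])
        rank = new_rank
    return rank
```

Given a list of visits, the vertex set can be taken as the sorted set of tags that occur in it; `in_degrees` and `pagerank` then return one score per tag, which can be sorted to obtain a ranking.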

5 Experimental analysis

This section describes the execution plan for achieving the proposed objectives. For the experimental analysis, a Big Data environment is set up with Hadoop for distributed storage [24] and Spark for distributed computing [26]. Spark’s GraphX component provides an API for graph analytics such as in-degree and PageRank. We use PySpark, the Python interface to Apache Spark and its Resilient Distributed Datasets. Because GraphX is implemented in the Scala programming language, GraphFrames is used to access the graph API from Python.

5.1 Dataset selection

For the experimental analysis, the T-Drive Taxi Trajectories from Nokia MDC datasets are selected. A total of 10,357 taxis covered around 9 million kilometers to generate this dataset. It contains around 15 million GPS trajectory points and was generated between 2 and 8 February 2008. Figure 4 shows the data distribution over time and distance intervals [29, 30]. Each record contains the taxi id, timestamp, longitude, and latitude.

Fig. 4 Histograms of time interval and distance between two consecutive points [29, 30]
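The record layout of taxi id, timestamp, longitude, and latitude can be read with a few lines of Python. This is a minimal sketch under stated assumptions: records are comma-separated text lines, the timestamp format is `YYYY-MM-DD HH:MM:SS`, and the names `GpsPoint` and `parse_records` are hypothetical. Malformed and duplicate rows are skipped.

```python
from datetime import datetime
from typing import Iterable, List, NamedTuple


class GpsPoint(NamedTuple):
    taxi_id: str
    timestamp: datetime
    longitude: float
    latitude: float


def parse_records(lines: Iterable[str]) -> List[GpsPoint]:
    """Parse 'taxi id, timestamp, longitude, latitude' lines into sorted points."""
    seen = set()
    points = []
    for line in lines:
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 4 or not all(parts):
            continue  # skip rows with missing fields
        try:
            point = GpsPoint(
                parts[0],
                # Assumed timestamp layout; adjust if the files differ.
                datetime.strptime(parts[1], "%Y-%m-%d %H:%M:%S"),
                float(parts[2]),
                float(parts[3]),
            )
        except ValueError:
            continue  # skip rows that do not parse
        if point in seen:
            continue  # skip exact duplicates
        seen.add(point)
        points.append(point)
    # Order by taxi, then by time, so consecutive points form a trajectory.
    points.sort(key=lambda p: (p.taxi_id, p.timestamp))
    return points
```

Sorting by taxi and time places consecutive points of one trajectory next to each other, which is the ordering needed to compute the time interval and distance between two consecutive points.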

5.2 Data preprocessing

Data transformation is the process of converting data from one form or structure into another. It is critical to operations such as data integration and data management. Data transformation can include a range of activities, such as converting data types, cleaning data by removing null values or duplicates, enriching the data, or performing aggregations, depending on the requirements of the project. Data discovery gains knowledge of user mobility from the correlation between locations. The data structure contains records of taxi bookings with the timestamp and the latitude and longitude of the pickup location. Data extraction combines the multiple data files into a single file containing all the records of taxi-id, latitude, longitude, and timestamp. In the transformation step, latitude and longitude are combined to form a tag for a unique area, called the GeoHash tag. Geohash is a method for encoding regions of latitude and longitude with a specific precision. Geohashes offer properties such as arbitrary precision and the possibility of gradually removing characters from the end of the code to reduce its size at the cost of precision. After applying the Geohash method, the location data are available as a short hash. From the GeoHash file and the simulator file, we obtain the groups of ids that share the same location or area (see Fig. 5). For example:

2–4, URTFD, −18375.0

2–4, URTFD, −12837.0

\(\cdots\)

  
Fig. 5 Data file after applying GeoHash and edge generation
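The GeoHash transformation described in this section can be sketched as follows. The sketch assumes the standard public Geohash scheme (alternating longitude and latitude bits, base-32 alphabet); the paper does not state which encoder or precision was used, so the function name `geohash_encode` and the default precision of 6 characters are assumptions.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard Geohash alphabet


def geohash_encode(latitude: float, longitude: float, precision: int = 6) -> str:
    """Encode a latitude/longitude pair as a Geohash string of given length."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_longitude = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_longitude:
            mid = (lon_lo + lon_hi) / 2
            if longitude >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if latitude >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_longitude = not use_longitude
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)
```

Because each additional character only refines the cell of the previous ones, a shorter tag is always a prefix of a longer tag for the same point. Truncating tags therefore merges nearby points into the same vertex, which is the precision and computation trade-off that the choice of precision controls.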

5.3 Graph based user mobility detection and prediction

Figures 6 and 7 show the Python code used to build a graph based on the set of vertices V and the set of edges E generated as shown in Fig. 5c. A GeoHash tag represents the GPS location of an individual taxi, and the edges between GeoHash tags are the links between vertices. Figure 6 shows the steps for building the graph G(V, E), which captures the connections between the GeoHash tags. A Big Data environment is set up using Hadoop for distributed storage and Spark for distributed computing. The GraphX component of the Spark framework provides an API for executing graph algorithms, including Google’s PageRank algorithm. GraphX is implemented in the Scala programming language and cannot be called directly from Python. To overcome this, GraphFrames, which offers a Python API for graph processing on Spark, is used. The in-degree and PageRank algorithms are executed as shown in Fig. 7.

Fig. 6 Python code for Graph Generation using GeoHash Tags

Fig. 7 Python code for execution of In-Degree and Pagerank algorithm
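The code of Figs. 6 and 7 is not reproduced in the text. The following is a minimal sketch of how such a pipeline might look with PySpark and GraphFrames; it is not the authors' code. The helper names, the application name, `top_n`, and the PageRank settings (`resetProbability=0.15`, `maxIter=10`) are assumptions, and running `rank_tags` requires a Spark installation with the GraphFrames package available.

```python
def to_graph_rows(edges):
    """Turn (src_tag, dst_tag) pairs into vertex and edge rows (hypothetical helper).

    GraphFrames expects a vertex table with an 'id' column and an edge
    table with 'src' and 'dst' columns.
    """
    vertex_rows = [(tag,) for tag in sorted({t for edge in edges for t in edge})]
    edge_rows = [(src, dst) for src, dst in edges]
    return vertex_rows, edge_rows


def rank_tags(edges, top_n=10):
    """Rank GeoHash tags by in-degree and PageRank using GraphFrames."""
    # Imported lazily so that to_graph_rows works without Spark installed.
    from pyspark.sql import SparkSession
    from graphframes import GraphFrame

    spark = SparkSession.builder.appName("geohash-graph").getOrCreate()
    vertex_rows, edge_rows = to_graph_rows(edges)
    vertices = spark.createDataFrame(vertex_rows, ["id"])
    edge_df = spark.createDataFrame(edge_rows, ["src", "dst"])
    graph = GraphFrame(vertices, edge_df)

    # In-degree: number of incoming edges per GeoHash tag.
    top_in_degree = graph.inDegrees.orderBy("inDegree", ascending=False).limit(top_n)

    # PageRank: adds a 'pagerank' column to the vertex table.
    ranked = graph.pageRank(resetProbability=0.15, maxIter=10)
    top_pagerank = ranked.vertices.orderBy("pagerank", ascending=False).limit(top_n)

    return top_in_degree.collect(), top_pagerank.collect()
```

Keeping the row construction in a plain function separates the data layout from the Spark session, so the layout can be checked on a small edge list before the job is submitted to a cluster.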

5.4 Results

Tables 4 and 5 show the importance of GeoHash tags based on the graph’s in-degree and Google’s PageRank algorithm. The importance of geolocations, based on the combination of latitude and longitude, is observed at different levels. Traffic is also observed to have different effects at different levels of the generated GeoHash tags. Table 4 shows top 20 GeoHash tags using the In-degree algorithm and Table 5 shows top 10 GeoHash tags using the PageRank algorithm. These rankings can also be used to detect mobility patterns in a city, which helps in planning traffic arrangements in urban areas. The GeoHash technique used for data transformation distinguishes locations with greater precision. This precision, however, leads to a higher computation cost, which is the major drawback; to reduce the computation cost, the precision needs to be decreased.

Table 4 Top 10 important GeoHash tags based on in-degree of vertices
Table 5 Top 20 GeoHash tags based on pagerank algorithm

6 Discussion

The approach proposed in this work, and the analysis that follows from it, may help in traffic planning at the city level as well as in infrastructure setup. It has a broader motivation than our previous work on user mobility detection and prediction in small premises for a smart city project [25]. As the results in Tables 4 and 5 suggest, the historical trajectory can be used to predict the future trajectory of moving vehicles. The graph shows the connections between different vertices, where a vertex represents the taxi id. The graph-based analysis will help in planning city traffic as well as infrastructure. Here a vertex represents the trajectory of a taxi at an instant of time, and the edges represent the taxi’s movement. The PageRank algorithm identifies the most influential vertices in the graph, which can be used for traffic management and planning.

However, the method becomes inefficient as the amount of data increases. An approach that trains a model separately for every taxi is not scalable, especially when the taxis span a large area of the road network. We therefore suggest that the trajectory of a specific taxi is also correlated with the trajectories of taxis in its nearby region. Such an approach scales to large amounts of data that require big data engines, while also being able to predict long-term trajectories in large cities.

7 Conclusion and future work

In this paper, we have contributed the design and implementation of a prototype with the objective of demonstrating the effectiveness of the analytics service for Big Data Analytics. User mobility detection based on the location data generated by different geolocation sensors is the main objective of this paper. This can help track user movement and predict path patterns, so that population movement and distribution during a specific time period can also be identified and predicted. We have used the T-Drive Taxi Trajectories from Nokia MDC datasets for predicting taxi movement. GeoHash tags are generated, and a Hadoop and Spark based Big Data environment is set up for data analysis. Patterns can also be identified from past mobility patterns, where machine learning and deep learning algorithms can play a vital role.