1 Introduction

Navigation maps are thematic maps that depict digitized information from the city environment and are created to enable the fastest possible spatial orientation and navigation in urban space, both for humans and for autonomous vehicles. A map serves this purpose best when it is accurate and the features depicted on it are semantically correct and up to date. A question that naturally arises is how maps can be updated at optimal time and cost. One idea is to utilize information from volunteers who agree to share their geo-footprints, created while using the infrastructure. These can then be collectively processed to extract information that can either update current map features or enrich the map with new ones. With the increasing usage of mobile smart devices, the joint collection of information by volunteers (either opportunistic or participatory), known as crowd-sourcing, has increasing relevance and potential (Heipke 2010). Examples of crowd-sourced information are the measurement of atmospheric data (Muller et al. 2015), the measurement of rainfall (Fitzner and Sester 2016), the availability of parking spaces in an area (Urra and Ilarri 2019; Rybarsch et al. 2017) and the detection of road traffic congestion (Dimri et al. 2019).

The regulation type of an intersection is important information for navigation, since it helps to determine the path with the shortest travel time. Autonomous vehicles also need such intersection-context information for the complex decision-making process at highly interactive locations, from a traffic participant's point of view. Nevertheless, this information is still largely missing in open-source maps such as OpenStreetMap (OSM). The work presented in this paper is motivated by exactly this observation. In particular, the research question explored here is how accurately the traffic rules of intersections can be detected using "lightweight" crowd-sourced information such as GPS trajectories. We are especially interested in detecting regulator types that so far remain underexplored, namely traffic lights (TL), yield (YS), priority (PS) and stop signs (SP), as well as uncontrolled intersections (UC). Priority signs indicate that a traffic participant is on a road of higher priority and all crossing roads are of minor priority. This regulation is always coupled with yield or stop signs at the crossing roads of the same intersection. Our approach investigates those regulator types for the different roadways of intersections.

We explore the research question stated above in the following way: in Sect. 2, we present related work, in Sect. 3, we describe the data and the method we used for the classification task, and in Sect. 4 we present the results. Discussion of the results and future directions are given in Sects. 5 and 6, respectively.

2 Related Work

Recently, Zourlidou and Sester (2019) conducted a systematic literature review on methods that detect and identify traffic regulators from crowd-sourced data. By analyzing relevant articles, they underline the predictive ability of the detection methods (over 80% accuracy), the low diversity of the predicted classes within each study (i.e. mainly regulator types traffic lights and yield/priority) and the low percentage of studies that examine the cross-city applicability of their proposed methods (i.e. train in city X and test in city Y, to show the transferability of the learned models).

Meneroux et al. (2019) addressed the problem of detecting traffic lights by proposing a speed-profile-based method framed as a classification task. By testing three different ways of deriving features, they demonstrate that a functional description of speed profiles with a wavelet transformation outperforms the other approaches (raw speed measurements and an image recognition technique). Additionally, they tested six different classification methods (Naive Bayes, K-Nearest Neighbors, Decision Tree, SVM, Random Ferns and Random Forest) and found that Random Forest yielded the best accuracy score (95%).

Zourlidou et al. (2019) proposed a supervised method (C4.5 Decision Tree) for classifying intersections according to traffic regulators, based on the associated speed profiles at regulated locations. Each speed profile consisted of a sequence of speed logs at constant intervals in either time (every second) or space (every meter). As speed logs, they use accurate measurements of the vehicle's real speed acquired from the vehicle's CAN bus. The classifier is trained at intersections with different regulation controls, uniformly assigning the same label to all trajectories that cross an intersection of a certain regulation type. Then, to predict the label of an intersection, each trajectory's speed profile is classified with the trained classifier, and the predicted labels of all crossing trajectories per intersection are finally aggregated (majority voting) into a single predicted label. The results show high recall (100%) for the traffic light category; however, the low precision (31%) and F-measure (45.1%) mainly resulted from low performance on priority/yield category instances.

Qiu et al. (2018) detect stop signs based on a prevalent characteristic of stopping at a stop sign: a deceleration followed by an acceleration. To distinguish between four-way and two-way stop signs, and between stop signs and traffic lights, they use crowd-sourcing and some heuristics: if there is a stop segment S at intersection I, but there are also k other traces with the same heading as S (where k is a small integer) that do not contain a stop segment at I, then no stop sign is located at I in that direction. To evaluate the regulator detection, they examined two different scenarios: using on-board car sensors (yaw rate, steering wheel angle, brake/throttle position and inertial sensors) and using mobile phone inertial sensors (gyroscope, magnetometer and accelerometer). Interestingly, they found that although the car-sensing approach uses more specialized sensors such as the brake and throttle (precision 93.24%, recall 83.78%), phone sensing has comparable precision (90.32%) and recall (85.71%). The lower precision of phone sensing is observed when a vehicle passes through a green light at a speed below a certain threshold, yielding false positives. Car sensing does not suffer from this problem, as it uses the additional information from the brake and throttle sensors. Overall, however, crowd-sensing regulators with phone data gave qualitatively similar results to car-sensing data.
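
The heading-based heuristic described above can be illustrated with a short Python sketch. It is only our reading of the rule as summarized here, not the implementation of Qiu et al.; the `Trace` record, the heading tolerance `tol_deg` and the default `k=3` are hypothetical choices made for illustration.

```python
from typing import List, NamedTuple


class Trace(NamedTuple):
    heading_deg: float  # approach heading at the intersection (hypothetical field)
    has_stop: bool      # True if the trace contains a stop segment at the intersection


def heading_diff(a: float, b: float) -> float:
    """Smallest absolute angle between two headings, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def stop_sign_plausible(traces: List[Trace], heading_deg: float,
                        k: int = 3, tol_deg: float = 20.0) -> bool:
    """Keep a stop-sign candidate for one approach direction unless at least
    k traces with the same heading pass the intersection without stopping."""
    same_dir = [t for t in traces
                if heading_diff(t.heading_deg, heading_deg) <= tol_deg]
    stops = sum(t.has_stop for t in same_dir)
    non_stops = len(same_dir) - stops
    # A candidate needs at least one stop segment and fewer than k non-stops.
    return stops > 0 and non_stops < k
```

With one stopping trace and three non-stopping traces on the same heading, the candidate is rejected for `k=3`; with only two non-stopping traces it is kept.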

Carisi et al. (2011) propose a simple heuristic method to enrich digital maps with the location of stop signs and traffic lights, as well as the timing of the latter, using a small number of traces per road segment (five traversals per road segment for locating stop signs and seven traversals for locating traffic lights and estimating their associated timing). They report an accuracy of 90%. Moreover, Saremi and Abdelzaher (2015) exploit map-based features derived from OSM, such as the speed rating of road segments, the distance from one intersection to the next closest one, the end-to-end distance, and the category of street segments such as motorway, trunk, primary, secondary, tertiary, motorway link, primary link, unclassified, road, residential, or service. They use a Random Forest classifier to predict traffic regulators, examining two different feature-vector settings: using only map-based features, and combining map-based features with trace-derived attributes (number of stops, traverse speed and stop duration). Their findings show that with the richer feature vector the classification accuracy increased to 97%.

The earliest relevant study was proposed by Pribe and Rogers (1999) and concerns a method that uses a Neural Network (NN) to learn to associate driver behavior with two types of traffic rules: traffic lights and stop signs. As input to the NN, they use the average and standard deviation of stop-event-related features: the number of vehicle stops, the total duration of all stops, and the last three stops (closest to the intersection). They also compute the percentage of traversals that include at least one stop for each road segment. A similar methodology is suggested by Hu et al. (2015). We consider this work the most complete in the research domain we examine here and methodologically closest to the method we propose in this paper. In particular, they describe a supervised approach (Random Forest) as well as an unsupervised method (spectral clustering) for a three-class classification problem (stop signs, traffic lights and uncontrolled intersections). Both methods use a physical feature vector (final stop duration, minimum crossing speed, number of decelerations, number of stops, and distance from the intersection) and a statistical one (minimum, maximum, mean and variance of the physical feature values) to describe the crossing behavior of vehicles at intersection locations. The reported accuracy is greater than 90% across different feature, training and testing settings (various feature subsets and proportions of available training/testing data).

The majority of studies focus mainly on traffic lights, stop signs and uncontrolled intersections; turning restrictions are a separate category. There is only one study (Zourlidou et al. 2019), which investigates additional regulation types. The focus of the study presented here is the so-far underexplored subset of traffic rules, composed of traffic lights (TL), yield (YS), priority (PS) and stop signs (SP), as well as uncontrolled intersections (UC) where the right-of-way rule regulates the traffic. In Sect. 3, we introduce the methodological framework for detecting the aforementioned regulators, as well as the dataset used for testing the proposed method.

3 Data and Methodology

3.1 Dataset

The trajectories used in our study were self-collected with a mobile Android application (Geo Tracker). In total, 700 trajectories were recorded during everyday car journeys in the period from December 2017 to March 2019. Our research area is located in Hannover (Germany), mainly in the northern part of the inner city and the surrounding area (see Fig. 1). Each trajectory has a length between 5 and 14 kilometers, and the total length of all trajectories is 3,748 kilometers.

Fig. 1 Research area in the northern part of Hannover showing the recorded trajectories (blue). Basemap: OpenStreetMap (2019)

Table 1 Distribution of trajectories according to the regulator type they sample

Fig. 2 Workflow of the methodology consisting of the input data, pre-processing and the classification process

In total, 1064 traffic road intersections are included in the used dataset. 717 of them are three-way intersections and 335 are four-way intersections. It is obvious that there is an imbalance in the number of examples for the different types, which influences the later processing steps. Priority-signs are always observed together with either yield-signs or stop signs at the same intersection. However, due to the fact that not all approaching roadways are sampled equally with trajectories, the dataset has very few examples from yield- and stop sign regulated roadways. The distribution of samples according to the regulation type they sample and their membership to the complete/no-turnings dataset can be seen in Table 1.

We derived two training datasets from the data. The first set contains all trajectories (complete dataset), while the second one discards the turning trajectories (no-turnings dataset) to eliminate a possible impact of the turnings. Furthermore, we investigated the influence of the minimum number of traversals required per intersection roadway. This parameter mainly affects the number of samples that remain in the dataset, as well as the overall performance of the classification process. To include as many samples in the analysis as possible, we experimentally tuned this number to be as small as possible while keeping the corresponding classification accuracy high. Nevertheless, as we show later, this procedure further reduced the number of intersection roadways in the dataset that meet this minimum requirement.

3.2 Learning Traffic Regulators

The flow diagram depicted in Fig. 2 shows the processing steps of the traffic regulation detection, from data selection to regulation prediction. In the first step, the GPS trajectories within a 100-m buffer around each intersection's center are selected. For each intersection and for each trip (trajectory) that crosses that intersection, a set of physical features is computed. These physical features describe the behavior of vehicles when approaching the intersection. In addition, we divided the intersections into their roadways in order to separate the regulations of each intersection entrance. The statistical features of each intersection roadway are extracted by analysing all the trajectories that cross that particular intersection approach. The physical and statistical features and their relationship are defined in Table 2. We adopt the distinction of features into these two categories from Hu et al. (2015).
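
To make the two feature levels more concrete, the following Python sketch clips one traversal to the 100-m buffer, derives a few physical features from it, and aggregates them into statistical features per roadway. It is a minimal illustration under stated assumptions and not our actual implementation: the point layout `(t, lat, lon, speed)`, the stop threshold `STOP_SPEED_MPS`, and the reduced feature set are hypothetical, and the exact definitions of Table 2 are not reproduced.

```python
import math
from statistics import mean, pvariance

EARTH_RADIUS_M = 6_371_000.0
BUFFER_M = 100.0        # buffer radius stated in the text
STOP_SPEED_MPS = 0.5    # assumed speed below which a GPS fix counts as stopped


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two WGS84 coordinates in meters."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def clip_to_buffer(points, center):
    """Keep the fixes (t, lat, lon, speed) within BUFFER_M of center (lat, lon)."""
    return [p for p in points
            if haversine_m(p[1], p[2], center[0], center[1]) <= BUFFER_M]


def physical_features(points):
    """Physical features of one traversal (time-ordered fixes inside the buffer)."""
    speeds = [p[3] for p in points]
    num_stops, stop_duration = 0, 0.0
    stopped, prev_t = False, None
    for t, _, _, v in points:
        is_stop = v < STOP_SPEED_MPS
        if is_stop and not stopped:
            num_stops += 1              # a new stop event begins
        if is_stop and stopped:
            stop_duration += t - prev_t  # time between consecutive stopped fixes
        stopped, prev_t = is_stop, t
    return {"numSP": num_stops, "durSP": stop_duration,
            "minS": min(speeds), "maxS": max(speeds), "meanS": mean(speeds)}


def statistical_features(traversals):
    """Aggregate the physical features of all traversals of one roadway."""
    n = len(traversals)
    stats = {"pctSP": 100.0 * sum(f["numSP"] > 0 for f in traversals) / n}
    for key in traversals[0]:
        values = [f[key] for f in traversals]
        name = key[0].upper() + key[1:]
        stats["mean" + name] = mean(values)
        stats["min" + name] = min(values)
        stats["max" + name] = max(values)
        stats["var" + name] = pvariance(values)
    return stats
```

The key names loosely follow the feature abbreviations used later in the text (e.g. meanNumSP, meanMaxS), but the mapping is only indicative.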

Table 2 Overview of physical and statistical features as well as their definition

Regarding the classification process, we tested different classifiers (Decision Tree, Random Forest, Support Vector Machine and Neural Network). For this purpose, we utilized the Python implementations in Scikit-learn (Pedregosa et al. 2011). Without elaborate parameter adjustments (default settings), the Random Forest outperformed the other classification techniques (compare Table 3) and was, therefore, selected for further experimentation under different conditions and settings. Random Forest is an ensemble of Decision Trees where, once all predictors (Decision Trees) are trained, the ensemble makes a prediction for a new instance by simply aggregating the predictions of all predictors (see Fig. 3). The aggregation function is typically the most frequent prediction among the classes predicted by the predictors. The key concept behind the Random Forest classifier is that, instead of using a single predictor (a single Decision Tree on the training set), it trains several Trees on different random subsets of the training set via the bagging method (sampling from the training set with replacement) or sometimes pasting (sampling without replacement). That way, each individual predictor has a higher bias than if it were trained on the original training set, but aggregating the results of all predictors reduces both bias and variance relative to the individual predictors, making Random Forest a very effective classifier. To fine-tune the Random Forest's hyperparameters, grid search was used.
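
The two ingredients described above, bootstrap sampling and majority-vote aggregation, can be sketched in a few lines of plain Python. This is a conceptual illustration only and not the Scikit-learn implementation we used; the predictors are assumed to be arbitrary callables that map a feature vector to a class label.

```python
import random
from collections import Counter


def bootstrap_sample(samples, rng):
    """Bagging: draw len(samples) items from samples with replacement."""
    n = len(samples)
    return [samples[rng.randrange(n)] for _ in range(n)]


def majority_vote(predictions):
    """Return the most frequent label among the individual predictions."""
    return Counter(predictions).most_common(1)[0][0]


def ensemble_predict(predictors, x):
    """Aggregate the predictions of all predictors for one instance x."""
    return majority_vote([predict(x) for predict in predictors])


# Three toy "trees" (hypothetical threshold rules on a single feature value),
# mirroring the three-predictor setup of the schematic figure.
trees = [
    lambda x: "TL" if x > 0.5 else "PS",
    lambda x: "TL" if x > 0.4 else "UC",
    lambda x: "PS" if x < 0.7 else "TL",
]
label = ensemble_predict(trees, 0.6)  # two of three toy trees vote "TL"
```

In a real forest each tree would be fitted on its own bootstrap sample and would additionally consider only a random subset of features at each split.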

Table 3 Overview of the performance of different classification methods using their default hyperparameter settings

Fig. 3 A schematic depiction of the classification flow of Random Forest with three Decision Trees as predictors

Due to the large imbalance in our dataset regarding the regulator types, we applied different oversampling methods to counter this effect. The three oversampling methods are random oversampling (Freund and Schapire 1996), SMOTE (synthetic minority oversampling technique; Chawla et al. 2002) and ADASYN (adaptive synthetic sampling approach; He et al. 2008). The first one, random oversampling, performed better than the others and was selected for the experiments in this paper. Additionally, we used the Bagging and AdaBoost boosters to further improve the accuracy of our results.
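
Random oversampling simply duplicates randomly chosen minority-class samples until every class is as large as the majority class. The following plain-Python sketch illustrates the idea; it is not the library routine used in our experiments, and the function name and the fixed seed are illustrative assumptions.

```python
import random
from collections import defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly drawn samples of each minority class until all
    classes have as many samples as the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw the missing number of samples with replacement.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y
```

Oversampling should be applied to the training portion only, so that duplicated samples do not leak into the data used for evaluation.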

Additionally, we investigated the impact of the minimum number of crossing traversals per intersection roadway on different accuracy measures. We experimentally determined the optimal minimum number of traversals for both datasets, complete (14) and no-turnings (16). These values were found by iterating over the minimum number of traversals and choosing the value that yields the highest accuracy and k-fold scores with the smallest gap between the two.
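
One possible way to formalize this selection is sketched below. The ranking rule (mean of the two scores minus their absolute gap, with ties resolved in favor of the smaller threshold) is an assumption made for illustration, since the exact trade-off is not specified above, and the scores in the example are invented placeholders rather than measured results.

```python
def pick_min_traversals(scores):
    """scores maps a candidate minimum number of traversals to a pair
    (accuracy, mean k-fold accuracy). Returns the preferred candidate."""
    def rank(item):
        n, (acc, kfold) = item
        # Assumed rule: reward high scores, penalize the gap between them;
        # prefer the smaller threshold on ties, since it keeps more samples.
        return ((acc + kfold) / 2 - abs(acc - kfold), -n)
    return max(scores.items(), key=rank)[0]


# Placeholder values for illustration only (not results from our experiments).
example_scores = {10: (0.80, 0.70), 14: (0.82, 0.83), 18: (0.85, 0.75)}
best = pick_min_traversals(example_scores)
```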

4 Results

4.1 Overview of the Results

Only TL, PS and UC samples were analyzed, using the achieved accuracy score and the k-folds accuracy. The accuracy score is the ratio of correctly classified samples to the total number of samples (Sokolova and Lapalme 2009). The k-fold cross-validation is a less biased measure and considers all slices of the available data during the training and testing phases (Witten et al. 2017).
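
Both measures can be written down compactly. The sketch below is a plain-Python illustration rather than the Scikit-learn routines used in the experiments; `fit_predict` is a hypothetical callable that trains on the training fold and returns predictions for the test fold, and the folds are neither shuffled nor stratified here.

```python
def accuracy(y_true, y_pred):
    """Share of correctly classified samples."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds; yields (train, test) pairs."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


def mean_k_fold_accuracy(X, y, k, fit_predict):
    """Average test accuracy over k folds, so every sample is tested exactly once."""
    scores = []
    for train, test in k_fold_indices(len(y), k):
        y_pred = fit_predict([X[i] for i in train],
                             [y[i] for i in train],
                             [X[i] for i in test])
        scores.append(accuracy([y[i] for i in test], y_pred))
    return sum(scores) / len(scores)
```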

An overview of the experimental setups and the resulting performance can be seen in Table 4. Overall, the experiments show that the accuracy improves when turning trajectories are omitted from the classification process. Moreover, the application of the random oversampling method, as well as the application of booster methods such as Bagging and AdaBoost, further increases the accuracy by 3–4%.

Table 4 Experiment setups and resulting performances

When comparing the initial results of both datasets (see experiments A and B in Table 4), the difference is approximately 2% for the overall accuracy score (82.3% and 84.6%) and the mean k-folds accuracy (83.2% and 85.3%). The application of random oversampling (see experiments C and D in Table 4) leads to an increase in accuracy, whereas the mean k-folds accuracy remains the same for both datasets. Moreover, while the overall accuracy drops by 0.6% for the complete dataset, it increases by 1.9% when using the no-turnings dataset. Thus, analyzing only straight intersection crossings instead of all trajectories leads to an increase in accuracy of 5.5%. The applied booster methods each increase only one of the respective accuracy measures: while the Bagging booster increases the overall accuracy to a maximum of 90.9%, the AdaBoost booster maximizes the mean k-folds accuracy at 88.0%.

In most cases, the TL-regulated intersections are successfully recognized. In contrast, PS is the most often misclassified regulator type, since at least 12% of its samples are misclassified as TL or UC (compare Tables 5 and 6). Additionally, using the Bagging booster increases the number of correctly classified samples.

Table 5 Confusion matrix of the no-turnings dataset using a minimum required number of 16 traversals for the classifier training (Experiment B, left) and additionally using the random oversampling (Experiment D, right)
Table 6 Confusion matrix of the no-turnings dataset using a minimum required number of 16 traversals for the classifier training, as well as using random oversampling and Bagging booster (Experiment E, left) or AdaBoost booster (Experiment F, right)

4.2 Feature Importance

The feature importance is derived from the impurity measure of each feature during the training process of the Random Forest. It measures how well the splits at the individual nodes of the Decision Trees of the Random Forest separate the classes. The feature importance is, therefore, a score indicating how valuable a feature is for the decision-making process.
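
As a small worked example of the underlying quantity, the sketch below computes the impurity decrease of a single split, assuming the Gini criterion (the default of Scikit-learn's Random Forest classifier). The function names are illustrative; in the forest, such decreases are weighted by the share of samples reaching each node, summed per feature over all trees, and normalized.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a set of class labels (0.0 for a pure node)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Impurity of the parent node minus the size-weighted impurity of its children."""
    n = len(parent)
    return (gini(parent)
            - len(left) / n * gini(left)
            - len(right) / n * gini(right))


parent = ["TL", "TL", "PS", "PS"]
# A split that separates the classes perfectly removes all impurity (0.5).
perfect = impurity_decrease(parent, ["TL", "TL"], ["PS", "PS"])
# A split that leaves both children as mixed as the parent gains nothing (0.0).
useless = impurity_decrease(parent, ["TL", "PS"], ["TL", "PS"])
```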

The feature importance histograms for both datasets without random oversampling, depicted in Figs. 4 and 5, show distributions with similar peak features. These features are the percentage of trajectories with at least one stop (PcTSP), the mean of the total number of stops (meanNumSP) and the mean of the maximum speed (meanMaxS), and, for the no-turnings dataset, additionally the mean of the mean speed (meanMeanS). Throughout all experiments, the minimum number of stops (minNumSP), the minimum duration of stops (minDurSP) and the minimum duration of the last stop (minDurLSP) are the least important features.

After the application of the random oversampling method, these histograms change for the complete dataset, while the no-turnings dataset keeps the previous distribution, but with an importance gain for the predominant features and a loss for the inconspicuous features (see Figs. 6 and 7). As a result, the complete dataset has the dominant features PctSP, minDistSP, meanDistSP, minDistLSP, varMeanS and meanMaxS. This distribution differs when using the no-turnings dataset: here, the distance-related features have a lower impact, while the speed-related features (meanMeanS and maxMaxS) as well as the mean number of stops (meanNumSP) show a higher importance.

Fig. 4 Feature importance histograms for complete dataset

Fig. 5 Feature importance histograms for no-turnings dataset

Fig. 6 Feature importance histograms for complete dataset with oversampling enabled

Fig. 7 Feature importance histograms for no-turnings dataset with oversampling enabled

5 Discussion of the Results

A surprising finding of this study is that the speed-related features have a higher importance than the stop-duration or stop-distance-related features. This might originate from the influence of the traffic flow during the data collection process. We assume that stops are not only enforced by intersection regulations but are also caused by the interaction with other traffic participants (parking or turning maneuvers, etc.). Therefore, the characteristics of stop events can partly lose their significance in the classification. This finding also distinguishes this study from methodologically similar related studies, e.g. Hu et al. (2015), which do not use such speed-related features (mean and maximum crossing speed).

The increased accuracy when using only no-turning samples can have various explanations as well. Intersections often have separate lanes for turning vehicles. Such intersections may have a different regulator type for the right-turn lane (YS for the turning lane and TL for the rest). We assume that this special setup influences the behavior at the intersections. Moreover, when a vehicle is turning, it may need to stop to give priority to bicycles or pedestrians (according to the German Traffic Code). This could change the movement behavior of an entering vehicle even when it has priority over other participants. As a result, the movement of the vehicle could reflect a non-priority movement behavior.

When investigating the correlation between the TL- and PS-regulated samples, we need to recall that a TL-regulated intersection has two phases. The first is the red light phase, in which the behavior resembles that at an SP-regulated intersection. The second is the green light phase, in which the same vehicle behavior can be expected as at a PS-regulated intersection. This explains the higher correlation of TL and PS. One could argue for treating TL as two different regulator types, one for the green phase (TL-G) and one for the red phase (TL-R). However, such a treatment would require ground truth data, which are difficult to obtain.

In conclusion, comparing our result with the only existing study that detects the same regulator types (Zourlidou et al. 2019), our proposed method performs better (accuracy score 90.4% vs 83%).

6 Conclusions and Outlook

In this paper, we presented a method to classify more than two intersection regulator types based on features which can be derived from GPS trajectory data. For the regulator types PS, TL and UC, the achieved accuracy is over 90%.

To further improve our approach, additional samples especially for the highly undersampled regulator types (namely, YS and SP) need to be collected. Moreover, it should be further explored whether a more dynamic approach for the data point selection can improve the performance (our method selects to process GPS trajectories within a fixed buffer around the intersection locations). By considering only GPS points from the previous to the current intersection, we can ensure that only relevant data points for a current intersection will be used in the classification process.

Additionally, by exploring features’ importance, features of low importance can be dropped from the analysis. Moreover, we can use specific feature combinations implemented by other authors (Hu et al. 2015) to compare the results.

The current approach classifies each incoming intersection roadway independently from the other roadways of the same intersection. The regulation of a whole intersection, however, follows certain rules, which can be used to validate (and possibly correct) an individual classification. In the case of an intersection where three of four roadways are predicted as TL, the last one should be corrected to TL. In our experiments this could not be applied due to data limitations.
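
Such a consistency rule could be sketched as follows. The function and the share threshold `min_tl_share` are hypothetical; the default of 0.75 merely encodes the "three of four roadways" example above, and we did not evaluate this rule in our experiments.

```python
from collections import Counter


def correct_traffic_lights(roadway_labels, min_tl_share=0.75):
    """If at least min_tl_share of the roadways of one intersection are
    predicted as TL, relabel all of its roadways as TL."""
    n = len(roadway_labels)
    if n == 0:
        return []
    if Counter(roadway_labels)["TL"] / n >= min_tl_share:
        return ["TL"] * n
    return list(roadway_labels)
```

For the predictions `["TL", "TL", "PS", "TL"]` the rule returns four TL labels, whereas `["TL", "PS", "PS", "TL"]` is left unchanged.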

Concerning the challenges of the approach, we identify the sparsity of the trajectory data as the most prominent one, as the data were collected in an opportunistic way (no instructions were given to the driver regarding where and how to drive). The collected data thus represent natural driving behaviour and the route preferences chosen by the driver. A characteristic consequence of the latter is the uneven sampling of the road segments. At the same time, our findings show that a minimum number of traversals per intersection approach must be met to achieve high accuracy.

Due to the sparsity of the trajectory data, we have not explored possible time dependencies that may exist between the sampled data and the classification result. A larger set of samples might uncover stop-duration (or number of stops) vs. time dependencies, but we do not expect such patterns to influence the classification.

As an interesting direction for future exploration, we suggest including map-based information as additional features to improve the classification results for the intersection regulators. One example could be the distance between the current and the previously crossed intersection. This distance may add contextual information about the surrounding area, which can be a distinctive indicator for regulator types (e.g. a short distance between intersections could be indicative of non-traffic-light regulations).

Moreover, vehicle cameras could be included to obtain visual information about the surrounding area. Traffic sign recognition is a widely explored research topic, so visual information could be used as supporting or validating evidence for the findings of our approach. That way, the accuracy might be raised close to 100%, making the approach suitable for applications relevant to autonomous driving.

Lastly, as an additional aspect for consideration, we mention the idea of examining (crowd-sourced) pedestrian trajectories as a complement to vehicle trajectories, so that the problem of resolving the traffic regulation of a location can be tackled using evidence from different types of traffic participants.