Introduction

Common data sources for obtaining traffic information, such as loop detector data or trajectories from GNSS devices, are widely used in traffic engineering and management. However, these sources are either limited in their information content, e.g., only the vehicular flow and occupancy for single loop detectors, or they are restricted to certain modes of transport, such as bus or taxi GNSS data (Yu et al. 2023). Video analysis from static cameras as well as from camera-equipped unmanned aerial vehicles has become a viable means for traffic monitoring (Barmpounakis et al. 2016; Kim et al. 2019). With video monitoring, detailed trajectory data can be extracted for all road users in the observed camera field of view. Such data is becoming increasingly important for understanding the behavior and the strategic, tactical, and operational maneuvers of road users, and for using this knowledge to increase traffic safety and efficiency (Outay et al. 2020).

Trajectory data sets have been used for different types of analysis, demonstrating the wide range of possibilities this kind of data provides. For example, speed behavior in inner-city traffic networks was investigated taking multimodality into account (Paipuri et al. 2021), microscopic driving behavior parameters were analyzed (Pham et al. 2020), and lane choice and lane change behavior were considered (Espalader-Clapés et al. 2023a). Traffic engineering applications have also been investigated, for example, by estimating queue length profiles (Zhou et al. 2021). A comprehensive revisit of traffic flow studies based on trajectory data can be found in Li et al. (2020).

Detailed trajectory data can also provide valuable insights into safety analysis (Wang et al. 2019) and the environmental impact of traffic. Based on a large-scale data set with vehicle trajectories of motorized road users, Barmpounakis et al. (2021) considered the generated emissions and their impact on air pollution. They developed an emissions macroscopic fundamental diagram by applying a microscopic emission model to the trajectories. In addition, noise emissions in a congested environment were investigated and the correlation between noise emissions and traffic congestion was analyzed (Espalader-Clapés et al. 2023b).

These examples show that the need for data is becoming more relevant, both in the field of general traffic analysis and behavior and for individual traffic maneuvers such as turning or overtaking. This becomes especially important when looking at the traffic behavior of vulnerable road users (VRUs) such as cyclists and pedestrians, as proper data sets on these road users are rare and, therefore, knowledge about their behavior in the traffic system is limited.

For motorized traffic, most large-scale data sets stem from observations on freeways, where the sensor density and the need for traffic detection are higher and, therefore, data generation is easier. For urban traffic systems, with a much higher variety of road users and a more complex road network structure, it is rather difficult to achieve full data coverage of all road users and their interactions using (existing) infrastructure-based sensors as on freeways (Margreiter 2016; Motamedidehkordi et al. 2017). The present work goes one step further and elaborates on the use of drone video footage in combination with innovative recognition algorithms to provide a full temporal and spatial overview of the complete set of road users. This enables further research to study the interactions between different road users, in particular between motorized road users and VRUs with a focus on traffic safety, and to better understand multimodal traffic patterns. Such data contributes significantly to improving traffic safety and optimizing road network operations, as well as to training further video recognition algorithms and neural networks.

Existing Datasets

Arguably the first attempt at extracting trajectories from video data on a large scale is the NGSIM dataset (U.S. 2006). This dataset covers several hundred meters of traffic, mainly on major freeways, collected by several stationary cameras, and only includes motorized traffic. Even though this data set was undoubtedly a pioneer and the basis for many findings, several studies have shown that it contains inaccuracies and anomalies (Punzo et al. 2011; Coifman and Li 2017). Further highway trajectories are publicly available for research purposes. The highD (Krajewski et al. 2018) and exiD (Moers et al. 2022) datasets, published by the ika Aachen, provide information on free stretches as well as on exits and entries of German highways, respectively. Also collected by Unmanned Aerial Vehicles (UAVs) on German highways, the AUTOMATUM DATASET (Spannaus et al. 2021) provides trajectory information initially intended to validate automated driving functions. Shi et al. (2021) collected the High-Granularity Highway Simulation (HIGH-SIM) dataset using a helicopter, allowing higher altitudes. This dataset also contains highly accurate highway trajectories.

Regarding urban traffic scenarios, mainly single intersections have been taken into consideration until now. Also published by the ika Aachen, the inD dataset (Bock et al. 2020) provides trajectories from four different urban intersections and the rounD dataset (Krajewski et al. 2020) covers three roundabouts. For both data sets, the videos were collected using drones and they include pedestrians and bicyclists besides motorized traffic. The recordings are limited by the battery capacity and, therefore, several recordings exist for each location, but they are matched neither over time nor over space. The CitySim dataset (Zheng 2023) covers twelve different locations and over 1000 min of recording, also filmed by drones. Including a wide range of road geometries, both at intersections and on freeway segments, it was initially intended to contribute to safety research as well as to digital twins.

When it comes to covering a wider range of urban roads, the pNEUMA dataset (Barmpounakis and Geroliminis 2020) is a seminal dataset including all modes of motorized traffic in the city center of Athens. Using a swarm of ten drones in parallel during the morning peak hours on five working days, an area of around 1.4 km² was covered. Again, the recordings were limited by battery constraints, resulting in recording lengths of around 15–20 min for each period. Therefore, the data are continuous in space but lack information on VRUs and on the periods in between the flights. Recently, Yu et al. (2023) published a stationary camera-based dataset. For two cities in China, the authors used cameras at intersections distributed across the city to extract motorized traffic. They developed an algorithm to recognize the vehicles so that the trajectories could be tracked over the entire area and recognized across several cameras. Naturally, they could not directly track the unmonitored intermediate routes. A comparison of the mentioned data sets is shown in Table 1. A further overview of earlier data sets can be found in Bock et al. (2020).

Table 1 Comparison of different available trajectory data sets

The limitations of the data sets listed above give rise to the need for a record that (i) coherently covers several intersections, (ii) is continuous over time, (iii) depicts VRUs in addition to motorized traffic and (iv) is publicly available open source for research purposes. We close these gaps with the TUMDOT-MUC dataset (Trajectories from Urban Multimodal Drone Observations of Traffic—Munich), a trajectory dataset recorded using twelve drones during the afternoon peak hours on two weekdays. The experiment and the dataset are described in more detail in the following sections. The data set was gathered within the Munich TEMPUS project (Kutsch et al. 2022), a project to prepare the Munich urban and suburban road network for connected and automated driving.

Experimental Design

The drone surveys were conducted along Rheinstrasse in Munich, Germany, between Bonner Platz and Leopoldstrasse. Six locations were covered by two drones each, as shown in Fig. 1. The drones took the videos at a flying altitude of 110 m, resulting in a recording area of 143 m by 75 m for each drone location. Since an overlap between the individual locations was needed to reconstruct the complete trajectories throughout the whole stretch, a total length of 700 m was recorded. The road is highly frequented and can be classified as a main road. The perimeter includes two signalized intersections, one right-of-way-controlled intersection, two non-signalized T-intersections and two signalized pedestrian crossings. Bicycle traffic in this area is mostly guided on a protective (advisory) bicycle lane. For general traffic, and pedestrians in particular, there are several points of interest within the recording area and its immediate surroundings. In addition to a school and a kindergarten, there are several medical practices and pharmacies, numerous shopping facilities, including grocery stores and drugstores, and recreational activities. Several restaurants, a sports club and three hotels are located in the area. There are also several access points to public transport, namely a tram stop and several bus stops on Leopoldstrasse (location 6), as well as a subway access point at Bonner Platz (location 1).

Fig. 1 The six observation locations in the North of Munich (Own elaborations based on Google Earth Pro)

To date, drones have, assuming optimal conditions, a maximum flight time of around 30 min due to their limited battery capacity. Various systems, such as balloons and tethered drones with a power supply, have been assessed but do not yet provide the necessary image stability, or are not suitable due to the high costs and the more complicated approval process, especially in the city center. For this study, in order to create a continuous data set, two drones were used for each location and a 'shake-hands' was performed in the air, so that the drone to be replaced only descended when the following drone was already in the air and had taken over and fixed the image section. Accordingly, a total of twelve drones were in use, recording almost 42 h of analyzed video from their respective single locations. The recordings were performed on Thursday, October 6, 2022, and Wednesday, October 12, 2022. These mid-week days were selected based on surrounding detector data in order to capture the most frequented times of the week. On both days, flights were performed in the afternoon in order to depict leisure traffic as well as commuter traffic and, therefore, the peak hour. On the first day, the last drone was in the air at 15:35, so that a complete image of the entire stretch could be generated. The recordings were carried out until 18:45. On the second day, the recordings were started even earlier to counteract the earlier sunset. Thus, from 15:00 onwards, all drones were in the air until the end of the recording at 18:25.

Due to their altitude and a mostly unobstructed viewing angle toward the observed road and sidewalk segments, the drones were able to record all road users in the designated area. The classified modes are passenger cars, buses, trucks, trams, motorcycles, (e-)bicycles, pedestrians and (e-)kick-scooters. Moreover, truck trailers and bus trailers (in the city of Munich, several high-capacity bus routes operate buses with trailers) were also recognized and categorized. Please note that no distinction could be made between ordinary bicycles, electric bicycles and cargo bicycles, as these could not be differentiated from the corresponding flight altitude. In addition to those road users, the dataset also contains the trajectories of an electrified bicycle rickshaw (Fehn et al. 2023; Margreiter et al. 2023) as well as the trajectories of two semi-automated vehicles operated as part of the TEMPUS project.

Due to the high complexity of coordinating twelve drones, isolated technical problems (spurious activation of a drone's return-home function even though the battery was almost fully charged, inexplicable distortion of the videos) and strong rotations of individual drones caused by wind gusts led to some isolated monitoring gaps during the recordings, which are shown in Fig. 2. These mainly affected location 3 and, for two short periods, location 6. At the other locations, the monitoring and the changeover in the air proceeded smoothly.

Fig. 2 The observation timelines for both days, including recording gaps

Continuous temporal coverage requires additional effort in terms of planning and resources: to cover the same area, twice as many drones, pilots and batteries are required. Alternatively, all drones, in this case twelve, could have been used in parallel, which would have doubled the covered area. Finally, the resources could also have covered a more extended period than just the planned peak hours. In all of these cases, however, there would be a time gap whenever a battery is changed, and valuable information would be lost. The following aspects supported continuous recording for this experiment:

  1. The survey area was selected so that it carries a high volume of traffic and covers many pedestrians and cyclists. A spatial extension would still be valuable, but less relevant to the chosen focus, as the surrounding roads do not have a similar relevance for VRUs.

  2. The effective recording time per drone was between 15 and 18 min. Assuming 10 min of idle time for technical tasks and coordination, the number of trajectories would amount to 55–65% compared to full observation (see the illustrative calculation after this list). If the observation time were extended, i.e., by adding further recordings before and after the chosen recording hours, it would be possible to achieve the same number of trajectories. However, the loop count values of the adjacent detectors show that traffic decreases significantly outside the peak hours. For the detector at the eastbound approach to Leopoldstrasse, the traffic volume during the peak hours between 15:00 and 18:00 on the second day of recording was around 385 vehicles/hour. In contrast, the traffic counts between 12:00 and 15:00 are roughly 270 vehicles/hour. Likewise, the number drops from 290 vehicles/hour at 18:00 to 170 vehicles/hour at 20:00. Therefore, there would be 25–55% fewer trajectories in these periods, and almost double the time would be needed to cover the same number of trajectories.

  3. Traffic engineering-related aspects, such as the traffic state and the formation of waiting queues, are very time-dependent and related to the states of the recent past. Accordingly, a continuous data set is valuable for estimation approaches that include the short-term past.
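
As a rough plausibility check for the share stated in point 2, and under the simplifying assumption that the captured share of trajectories scales with the fraction of time a drone is airborne, the duty cycle would be

$$\frac{15\ \text{min}}{15\ \text{min}+10\ \text{min}}=0.60 \quad \text{to} \quad \frac{18\ \text{min}}{18\ \text{min}+10\ \text{min}}\approx 0.64,$$

which falls within the stated range of 55–65%.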

Processing Steps

Besides the temporal gaps mentioned above, further implications arose from occlusions caused by trees and construction sites due to the bird's eye view of the drones. These visual obstructions are shown in blue in Fig. 3A. The obstructions only affect the pathways for pedestrians and bicyclists and mainly occur at location 1 and on the northern sidewalk of Rheinstrasse. In addition, in some places it is difficult even for the human eye to distinguish whether an object is a dynamic object, a static object or a shadow, despite the comparatively low flight altitude. Figure 3B shows the signalized crossing at location 4. Here, for example, the traffic signal post is often misclassified as a person and the distinction between individuals in the waiting area is regularly blurred. Figure 3C shows the signalized pedestrian crossing on the eastern side of Potsdamer Strasse. Here, too, the large number of people and the low sun make it difficult to distinguish between the individual road users and their shadows. This problem also occurs when two people walk very close to each other. Several filters and processing steps were performed to remove misclassifications, connect trajectories separated by visual obstructions and sort out static objects.

Fig. 3 Visual obstructions in the recording area due to trees and construction sites shown in blue (A) (Own elaborations based on Google Earth Pro) and visualization of problematic scenarios due to static objects, such as traffic signals (B) and shadows (C) in crowded areas (zoomed view) (Colour figure online)

The raw data, as provided by DeepScenario GmbH, consist of one data file per evaluated frame in each individual video, i.e., for each location and drone flight, and a matching file of the trajectories for each timeline at one location. To obtain these timelines for each location, the consecutive videos at a location were connected using manual frame matching, i.e., the last frame of the ending video was compared with the corresponding frame of the following video, provided there were no gaps in the recording. The evaluation frequency was set to 12.5 frames per second, i.e., providing updated information every 0.08 s. The extracted trajectories contain information on the track (or object) ID, classification, coordinates, dimensions, speed and acceleration of each tracked object in each frame. Based on this information, the following steps were implemented to obtain a clean and spatially and temporally continuous dataset:

(i) Static objects, which do not move throughout the whole observed period, were filtered out. These trajectories belong to misclassified objects, such as traffic light poles, postal mailboxes or parked vehicles. This filter is based on a defined displacement threshold, which was chosen to be two meters in our case. The center point of the object over the tracked period was first determined in 2D space, i.e., in the longitude (x) and latitude (y) coordinates. The Euclidean distance to this center point was then calculated for each point of the trajectory over time. If the share of timestamps for which this distance is below the threshold exceeds 80%, the trajectory was removed.
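
A minimal sketch of this filter in Python could look as follows; the column names (track_id, x, y) are illustrative placeholders rather than the released attribute names, and the thresholds are those stated above.

```python
import numpy as np
import pandas as pd

# Illustrative thresholds from the text: 2 m displacement, 80% share of timestamps
DISPLACEMENT_THRESHOLD_M = 2.0
STATIC_SHARE_THRESHOLD = 0.8

def is_static(track: pd.DataFrame) -> bool:
    """True if a trajectory stays within the displacement threshold around its center."""
    center = track[["x", "y"]].mean().to_numpy()                 # 2D center of the track
    dist = np.linalg.norm(track[["x", "y"]].to_numpy() - center, axis=1)
    return (dist < DISPLACEMENT_THRESHOLD_M).mean() > STATIC_SHARE_THRESHOLD

def remove_static_objects(frames: pd.DataFrame) -> pd.DataFrame:
    """Drop all trajectories classified as static (e.g., signal poles, parked vehicles)."""
    static_ids = [tid for tid, trk in frames.groupby("track_id") if is_static(trk)]
    return frames[~frames["track_id"].isin(static_ids)]
```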

(ii) For non-motorized road users, a second filter was used to connect different trajectories that belong to the same object but were not assigned the same ID due to obstruction or other reasons. This can be summarized in the following steps (a minimal code sketch follows the list):

  1. Category, position and speed at the first and last point in time of the trajectory were determined for each object. As the speeds of the objects are specified directionally as x- and y-vectors along the axes of the global coordinate system (the third dimension, i.e., the z-component, was not used for filtering), the average of the values over the last second was used as the final speed in order to account for changes in direction.

  2. For each object, it was checked whether there is an object with a higher ID (i.e., an object that was detected later than the one currently considered) where (a) the time gap between the last detection of the current object and the starting point of the following object is below a defined threshold and (b) the category matches. From testing different thresholds, a maximum time gap of 250 frames turned out to perform best for all VRUs.

  3. A projected position of the ending trajectory was then computed for the determined time gap using the end speed vectors described before and the actual time difference.

  4. If the distance between the projected position from step 3 and the start position of the following object is less than a defined threshold distance, both trajectories are assigned the same ID. The threshold distance was set to 4.0 m.

  5. The values in the gap that could not be recorded directly were then filled by linear interpolation. Interpolated values are marked in the data set with an additional attribute to prevent the illusion of unmonitored ground truth, for example, for behavioral models.
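
A minimal sketch of this stitching logic in Python could look as follows; the column names (track_id, frame, category, x, y, vx, vy) are illustrative placeholders rather than the released schema, while the frame rate and thresholds follow the text.

```python
import numpy as np
import pandas as pd

FPS = 12.5                 # evaluation frequency of the extracted trajectories
MAX_GAP_FRAMES = 250       # step 2(a): maximum allowed time gap between tracks
MAX_MATCH_DIST_M = 4.0     # step 4: maximum distance to the projected position

def stitch_vru_tracks(frames: pd.DataFrame) -> pd.DataFrame:
    # assumes integer track IDs that increase with the time of first detection
    tracks = {tid: trk.sort_values("frame") for tid, trk in frames.groupby("track_id")}
    merged_into = {}
    for tid, trk in tracks.items():
        last = trk.iloc[-1]
        # step 1: average speed over the last second to capture direction changes
        v_end = trk[trk["frame"] > last["frame"] - FPS][["vx", "vy"]].mean().to_numpy()
        for cand_id, cand in tracks.items():
            first = cand.iloc[0]
            # step 2: later ID, matching category, time gap below the threshold
            if cand_id <= tid or first["category"] != last["category"]:
                continue
            gap = first["frame"] - last["frame"]
            if not 0 < gap <= MAX_GAP_FRAMES:
                continue
            # step 3: project the end of the current track forward over the gap
            projected = last[["x", "y"]].to_numpy(dtype=float) + v_end * (gap / FPS)
            # step 4: assign the same ID if the projection is close enough
            start = first[["x", "y"]].to_numpy(dtype=float)
            if np.linalg.norm(projected - start) < MAX_MATCH_DIST_M:
                merged_into[cand_id] = tid
                break
    frames = frames.copy()
    frames["track_id"] = frames["track_id"].replace(merged_into)
    return frames  # step 5 (linear interpolation within the gap) would follow here
```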

(iii) The data for the individual locations were then filtered once again, keeping only trajectories existing for at least 0.5 s.

(iv) As a last step concerning the individual videos, a manual check was carried out in order to eliminate further errors as much as possible.

(v) The resulting data, which at this point are available in isolation for the individual locations and time slots, then had to be linked in terms of time and space. Due to small localization and dimension errors of up to 15–20 cm, isolated misclassifications, errors in the time synchronization of the drones, as well as small deviations in the detected speeds, the linking of the trajectories is not trivial, especially for pedestrians and cyclists.

A synchronization file connecting the ending frame of one video to the corresponding start frame of the following video was created manually for continuously matching the consecutive videos at one location. For each video transition in time, a matching score S was calculated for all pairs of objects appearing in the ending frame and in the starting frame of the following video whose spatial distance was less than five meters. For this purpose, each object's coordinates x and y, its speed vector components vx and vy, referenced in the global coordinate system, as well as its dimensions in the object coordinate system, i.e., the width dx and the length dy, were utilized. This score, shown in Eq. 1, was then used to assign the corresponding object IDs. A value close to 0 indicates the same object, as the current position, the speed and the area covered by the objects are nearly equal. A value higher than 3 turned out not to be a match in any case, assuming the correct transition frame.

$$S=\sqrt{\left(x_{2}-x_{1}\right)^{2}+\left(y_{2}-y_{1}\right)^{2}}+\sqrt{\left(v_{x_{2}}-v_{x_{1}}\right)^{2}+\left(v_{y_{2}}-v_{y_{1}}\right)^{2}}+\left|d_{x_{1}}\,d_{y_{1}}-d_{x_{2}}\,d_{y_{2}}\right|$$
(1)

Due to the previous filters, objects tracked in one video might not appear in the consecutive video. For example, a car that performs a parking maneuver in one video is present in the corresponding data. However, if this car remains parked for the duration of the subsequent video, it is filtered out there. Consequently, objects without a match in the respective other data file may occur; these cases retain their existing ID. In cases where more than one pair of objects had a score below 3, the pair with the lowest score was matched. The IDs of the merged objects were adjusted so that the ID from the first video was retained. The remaining objects received a new, globally unique ID, and the timestamps were converted from the individual videos to a global timeline.
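
A minimal sketch of the score from Eq. 1 and the greedy assignment between two frames could look as follows; the dictionary keys (x, y, vx, vy, dx, dy, id) are illustrative placeholders for the object attributes described above, and the five-meter candidate radius and the score limit of 3 are those stated in the text.

```python
import math

MAX_CANDIDATE_DIST_M = 5.0   # only consider pairs closer than five meters
MAX_SCORE = 3.0              # scores above 3 never corresponded to a true match

def score(a: dict, b: dict) -> float:
    """Matching score S (Eq. 1): position distance + speed distance + area difference."""
    pos = math.hypot(b["x"] - a["x"], b["y"] - a["y"])
    vel = math.hypot(b["vx"] - a["vx"], b["vy"] - a["vy"])
    area = abs(a["dx"] * a["dy"] - b["dx"] * b["dy"])
    return pos + vel + area

def match_objects(ending_frame: list, starting_frame: list) -> dict:
    """Map object IDs of the starting frame to IDs of the ending frame."""
    pairs = []
    for a in ending_frame:
        for b in starting_frame:
            if math.hypot(b["x"] - a["x"], b["y"] - a["y"]) < MAX_CANDIDATE_DIST_M:
                s = score(a, b)
                if s < MAX_SCORE:
                    pairs.append((s, a["id"], b["id"]))
    matches, used_a, used_b = {}, set(), set()
    for s, ida, idb in sorted(pairs):          # lowest score wins for duplicate matches
        if ida not in used_a and idb not in used_b:
            matches[idb] = ida
            used_a.add(ida)
            used_b.add(idb)
    return matches
```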

(vi) The continuous timelines were then merged across the six locations. As described above, the data set at this point contained a unique ID for each object in one timeline per location and day. To track individual trajectories throughout the whole observation area, the respective track IDs needed to be matched between the locations as well. The linking logic was implemented according to the following description to guarantee a scalable approach.

Before starting to concatenate, a reference location needed to be determined. To ensure the correct time for each location, and since the start times vary slightly between the drones, location 2 was chosen as the reference timeline, as it provides a continuous recording throughout both days. This means that the first frame of the first video of location 2 is set to be the global frame 0. Starting from this location, all other locations were added one by one and the global time stamps were adjusted. The process for merging locations 2 and 3 is explained below as an example (a code sketch follows the list).

  1. The overlapping areas of the first videos of both locations were manually compared to find the rough time difference (to the second) between the beginnings of the videos, as shown in Fig. 4. For location 3 on the first day, second 26 (i.e., roughly 26 s at 25 frames per second = frame 650) corresponds to the start of location 2.

  2. Based on this difference, the previously used score S, described in Eq. 1, was determined for all objects over a period of 100 frames before and after this roughly matched frame in the filtered dataset. Again, only scores below three were kept, and for duplicate matches the pair with the lowest score was used.

  3. The frame in the search area with the highest number of matches and the lowest sum of all scores was retained as the matching frame. Accordingly, frame zero from location 2 was compared with every frame between 550 and 750 from location 3, and the frame with the best score was stored.

  4. This process was carried out for the first 100 s (i.e., the first 1250 frames) of each timeline and, again, for each comparison the frame with the best score and the maximum number of matches was stored. The time difference that occurred most frequently with the described method was defined as the actual time difference.

  5. This frame difference was then used to assign the objects in the overlapping area for each frame in the timelines of the neighboring locations. The matched IDs were again assigned the same object ID, whereas the non-matched objects received a new, globally unique ID.

  6. In the overlapping area, only the entries of location 2 were kept in order to eliminate duplicate data points. Finally, the timestamps of location 3 were adjusted to match those of location 2.
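
A minimal sketch of this offset search (steps 2–4) is given below. It assumes two hypothetical helpers that are not part of the released data or tooling: objects_at(location, frame), which returns the objects visible in the overlapping area at a given frame, and match_with_scores(a, b), a variant of the matcher sketched above that also returns the score of each matched pair.

```python
from collections import Counter

FPS_EVAL = 12.5
SEARCH_RADIUS = 100                         # frames searched around the rough offset
CALIBRATION_FRAMES = int(100 * FPS_EVAL)    # first 100 s of the reference timeline

def estimate_frame_offset(ref_loc, new_loc, rough_offset, objects_at, match_with_scores):
    votes = Counter()
    for ref_frame in range(CALIBRATION_FRAMES):
        best = None                         # (negative match count, score sum, offset)
        for offset in range(rough_offset - SEARCH_RADIUS, rough_offset + SEARCH_RADIUS + 1):
            pairs = match_with_scores(objects_at(ref_loc, ref_frame),
                                      objects_at(new_loc, ref_frame + offset))
            if not pairs:
                continue
            # prefer the offset with the most matches, then the lowest score sum
            candidate = (-len(pairs), sum(s for s, _, _ in pairs), offset)
            if best is None or candidate < best:
                best = candidate
        if best is not None:
            votes[best[2]] += 1
    # the offset that wins most often over the calibration window is used
    return votes.most_common(1)[0][0]
```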

Fig. 4 Overlapping areas in the respective first videos of location 2 and location 3 on the first recording day. An offset of roughly 26 s was found and used as input for merging the trajectories of these locations

This process was repeated for each timeline after an observation gap, as shown in Fig. 2, resulting in a complete merge of locations 2 and 3 for one day. As soon as these two locations were completely processed, the next location was added to the data in the same way. As a result, the final data sets with connected trajectories and global timestamps were obtained. The whole processing pipeline is visualized in Fig. 5. The resulting merged data set, which is available open source, is explained in more detail in the following section.

Fig. 5 Pipeline for recording and processing the data in order to obtain a time- and space-continuous image of the traffic

Format Description

The processed data, split into 10-min sections for easier handling, are available as csv-formatted files to ensure easy access. Each file contains the trajectories of all road users in the entire survey area for the corresponding period. One line describes one object at one point in time, with the attributes shown in Table 2.

Table 2 Extracted data parameters from the drone video footage

The timestamp describes the global time in seconds, starting from the very first reference frame. For both days, this is the first frame at which the recording had started at every location, i.e., the moment from which all drones were recording in parallel. The object category is stored as a numeric value for each category mentioned before (truck, bicycle, pedestrian, etc.). Category ID 1, as shown in the example, corresponds to the classification as a passenger car. The track ID is the object identifier for each road user. This value is unique within the whole data set, making it possible to follow objects across subsequent frames. The translation describes the location of the respective object, where the values describe the offset with respect to a reference point, which is given in the UTM 32N (NIMA 1989) coordinate system and is uniform for the entire data set. The reference point and the assignment table for the object classifications, as well as other meta information, are available together with the data set. The translation value refers to the center of the bounding box and the bottom of the object, i.e., on the ground. As in the global coordinate system, one unit equals one meter. In addition to the position, the three-dimensional bounding box, i.e., the length, width and height of the object, is also given in meters. The current velocity and the acceleration are given as vectors in the respective global coordinate system, and the rotation describes the current orientation in that system. The rotation vector refers to the respective rotational axis, i.e., the z value refers to the orientation parallel to the surface of the earth. It can easily be transformed into any other representation, for example with the scipy.spatial.transform.Rotation module. As the processing steps include interpolation for the obstructed areas, each data point is marked as 1 if it is interpolated and 0 if it comes directly from the computer vision algorithm supplied by DeepScenario GmbH.
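
As an illustration of how the format can be used, the following sketch reads one of the 10-min csv files, reconstructs absolute UTM 32N coordinates from the stored offsets and derives a heading angle from the rotation, assuming the rotation is stored as a rotation vector. The file name, column names and reference-point values are hypothetical placeholders, not the released schema.

```python
import pandas as pd
from scipy.spatial.transform import Rotation

# Hypothetical UTM 32N reference point and file/column names for illustration only
REF_EASTING, REF_NORTHING = 691000.0, 5336000.0

df = pd.read_csv("tumdot_muc_section.csv")

# absolute UTM 32N coordinates from the stored offsets (one unit = one meter)
df["easting"] = REF_EASTING + df["translation_x"]
df["northing"] = REF_NORTHING + df["translation_y"]

# heading in the ground plane, assuming the rotation is stored as a rotation vector
rotvec = df[["rotation_x", "rotation_y", "rotation_z"]].to_numpy()
df["heading_rad"] = Rotation.from_rotvec(rotvec).as_euler("xyz")[:, 2]
```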

Additional Data

In addition to the trajectory data itself, supporting material is also available. The loop detectors shown in red in Fig. 6 provide flow counts in vehicles per interval, aggregated over 15-min intervals, as well as occupancy values between 0 and 1. The loop detectors are single inductive loops; therefore, no velocities or classifications are detected. One detector per lane is located in front of each intersection approach. The values are provided for the survey periods on both recording days and can be downloaded together with the trajectories.

Fig. 6 Single inductive loop detectors (red) and traffic signals (yellow) for which the data can be downloaded additionally (Own elaborations based on Google Earth Pro) (Colour figure online)

Furthermore, the switching data of the traffic signal systems shown in yellow in Fig. 6, which contain the actual signals on a second-by-second basis, are also supplied. Each signal group is assigned a unique ID. The data table contains the attributes signal ID, timestamp and displayed signal. One line corresponds to the display status of one signal group. A signal change is saved as a new row with the same signal ID, the timestamp to the second and the updated signal. Consequently, the difference between two consecutive timestamps corresponds to the display duration of the first signal.
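
Following this description, display durations can be derived by differencing consecutive timestamps per signal group; a minimal sketch with hypothetical file and column names (assuming numeric timestamps in seconds):

```python
import pandas as pd

# Hypothetical file and column names; timestamps assumed to be numeric seconds
signals = pd.read_csv("signal_switching.csv")
signals = signals.sort_values(["signal_id", "timestamp"])

# the display duration of each signal is the time until the next change
# of the same signal group
signals["duration_s"] = (
    signals.groupby("signal_id")["timestamp"].shift(-1) - signals["timestamp"]
)
```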

In order to be able to match the trajectories to a map, an OpenDRIVE map as specified by the Association for Standardization of Automation and Measuring Systems (ASAM) is supplied, containing the full ground truth of the street layout and the infrastructure of the recording area.

Resulting Dataset and Limitations

The resulting data set contains over 24,000 trajectories per day. A more detailed look at the trajectories on the second day of the survey (October 12) shows that out of a total of 24,138 trajectories, 10,531 originate from motorized road users, i.e., cars, motorcycles, buses, heavy goods vehicles or trams, see Fig. 7A. Consequently, around 56% represent VRUs, i.e., pedestrians, cyclists and scooter riders. The average trajectory length is 186 m, where motorized road users appear to have longer trips, as can be seen in Fig. 7B.

Fig. 7 The mode share of the collected trajectories (A) and the average trip length per category (B)

At this point, it should be mentioned again that there were problems with obstructions from trees and considerable difficulties with people walking in groups and very close together. These errors could not be completely corrected in post-processing, which is why trajectory breaks and abrupt tracking changes between two people occur. The actual number of VRUs is, therefore, smaller than the number of trajectories, and the average trip length is somewhat longer. These errors do not exist for motorized traffic, which, on the one hand, uses the unobstructed road and, on the other hand, is sufficiently large to always be differentiated clearly.

The data can be used to quickly analyze traffic patterns. With all trajectories available, capacity bottlenecks can easily be identified. Figure 8 shows the average speed of all motorized road users in the period from 16:00 to 16:15. The area was divided into cells of equal size, each with an edge length of 0.25 m, and each data point was assigned to the cell corresponding to its position. An average speed was then determined for each cell from all data points and displayed. Blue cells represent low average speeds; orange and yellow areas represent high speeds up to free-flow speed. In particular, it can be seen that the eastbound Rheinstrasse in the approach to Leopoldstrasse has significantly lower average speeds than the opposite direction. The illustration realistically depicts the problem of long queues and major time losses.
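
A minimal sketch of this cell-based aggregation, with illustrative column names and the 0.25 m cell size mentioned above:

```python
import numpy as np
import pandas as pd

CELL_SIZE_M = 0.25   # edge length of the grid cells, as stated in the text

def mean_speed_grid(points: pd.DataFrame) -> pd.DataFrame:
    """Average the scalar speed of all data points falling into each grid cell."""
    pts = points.copy()
    pts["speed"] = np.hypot(pts["vx"], pts["vy"])            # scalar speed from the vector
    pts["cell_x"] = (pts["x"] // CELL_SIZE_M).astype(int)    # cell index along x
    pts["cell_y"] = (pts["y"] // CELL_SIZE_M).astype(int)    # cell index along y
    return pts.groupby(["cell_x", "cell_y"])["speed"].mean().reset_index()
```

The resulting table can then be rasterized and color-coded, for example with matplotlib, to obtain a map similar to Fig. 8.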

Fig. 8 Mean speed in the observation area between 16:00 and 16:15 on the second recording day

Conclusions and Outlook

The highly detailed dataset allows applications in traffic flow theory to be developed, validated and calibrated. For example, the extensive ground truth data set can be used to develop and verify car-following models, turning behavior models, link-level traffic flow models, or queue length estimators. In addition, the multimodal trajectories, which include non-motorized road users with a high level of detail, also depict interactions between VRUs and motorized vehicles, both on the open, undisturbed link and at signalized and non-signalized intersections. Consequently, especially in the presence of different levels of vehicle connectivity and automation, important conclusions can be derived regarding the impact on traffic flow in urban environments.

Regarding traffic safety aspects, many different situations have been captured, with each road user available together with the information on the underlying road network. With several different types of intersections, as well as different types of bicycle paths and other infrastructure elements, the dataset is ideally suited for a more detailed safety analysis.

To enable wide and global access for scientists and practitioners, the TUMDOT-MUC data set is available open source. The open data set includes all extracted trajectories for all road users, continuous in time and space throughout the whole area covered by the twelve drones at the six locations, as well as all the supporting data sources mentioned before. It is available under the Creative Commons CC BY-NC 4.0 license to guarantee easy access and usability for further research.

In early summer 2023, a similar dataset was gathered in the city of Ingolstadt, Germany, within the KIVI project (Ilic et al. 2022a, 2022b) funded by the German Federal Ministry for Digital and Transport. The number of drones was increased to a total of 18, which flew at a reduced altitude of 100 m. This enabled an even greater accuracy to be achieved and a larger area, in total nine locations, to be mapped. Also for this study in Ingolstadt, the afternoon peak hours were chosen as the observation time, with recordings being made on a total of three days. Likewise, this data set, TUMDOT-IN, will be made available open source.