1 Introduction

Due to safety concerns, testing and analyzing safety-critical urban interaction scenarios involving vehicles, pedestrians, and bicycles on real roads or proving grounds is challenging. On the one hand, current approaches that use moving dummy pedestrian or bicycle targets on proving grounds, or avatars in numerical simulations, represent human interaction behavior only to a very limited extent. On the other hand, analyses in motion laboratories, an approach increasingly used in recent years to evaluate pedestrian and bicycle behavior in traffic in a safe test environment, consider the automated vehicle only in simulation. With both approaches, the behavior of the interaction partner is depicted realistically only to a limited extent; capturing real interaction behavior is only approximately or not at all possible.

In this paper, we introduce a new cyber-physical test environment for the realistic and safe analysis of the interaction of advanced driver assistance systems or automated driving systems with vulnerable road users (VRUs), and we analyze the strengths and weaknesses of such an approach. A comparable approach has been theoretically proposed in [1], but, to the best of the authors' knowledge, no realization of the idea or experimental results have been published so far.

2 Method

For safe and realistic studies of vehicle-VRU interaction scenarios, we combine a Vehicle-in-the-Loop (ViL) test bench with a Pedestrian-in-the-Loop (PiL) test bench and a Cyclist-in-the-Loop (CiL) test bench via a shared virtual test field, as shown in Fig. 1.

Fig. 1. MotionLab - Connected test benches for human-machine interaction of automated vehicles with VRUs

The real automated vehicle on the ViL test bench perceives information about other traffic participants in the virtual test field via its camera, which films a projected image of a virtual reality (VR) environment model. The vehicle can respond by steering, accelerating, or braking against electric motors that provide realistic force and torque feedback, and its motion updates the pose of a vehicle avatar in the same VR environment model. More detailed information about the ViL test bench is available in our previous works [2,3,4].
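
To illustrate this closed loop, the following minimal sketch propagates the vehicle avatar's pose from the driver's steering and acceleration inputs using a kinematic single-track model; the model and all names are simplifying assumptions for illustration, not the actual test bench implementation.

```python
import math
from dataclasses import dataclass

@dataclass
class Pose:
    x: float = 0.0      # global position in the VR world [m]
    y: float = 0.0
    yaw: float = 0.0    # heading [rad]
    speed: float = 0.0  # longitudinal speed [m/s]

def propagate_avatar(pose: Pose, steer_rad: float, accel_mps2: float,
                     dt: float, wheelbase_m: float = 2.9) -> Pose:
    """One update step of the vehicle avatar in the VR environment model.

    Kinematic single-track approximation (illustrative only): the driver's
    inputs, applied against the feedback motors, move the avatar that the
    connected test benches then perceive.
    """
    pose.speed = max(0.0, pose.speed + accel_mps2 * dt)
    pose.yaw += pose.speed / wheelbase_m * math.tan(steer_rad) * dt
    pose.x += pose.speed * math.cos(pose.yaw) * dt
    pose.y += pose.speed * math.sin(pose.yaw) * dt
    return pose
```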

On the PiL test bench, a real human test subject wears a head-mounted display (HMD) and perceives information from the VR environment model. The subject moves on a low-μ walking platform with adaptive incline and is secured at the hips for this purpose. The position and orientation of the subject's head (from the HMD), hands, and feet (from trackers), together with the position and orientation of the subject's hips as well as the walking speed and direction (from the walking platform), are provided to an Unreal Engine 5 implementation. There, this tracking information is used to calculate the local pose of a human skeleton model [5]. Degrees of freedom of this model that are not directly tracked (e.g., the positions of elbows and knees) are derived using inverse kinematics. Using the walking speed and direction, the pedestrian avatar changes its global position in the environment and is animated accordingly.
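
As an illustration of this inverse-kinematics step, the following 2D sketch places a middle joint such as an elbow from a tracked shoulder and wrist position via the law of cosines; it is a simplified example, not the solver used in the Unreal Engine 5 implementation.

```python
import math

def two_bone_ik_2d(root, target, l1, l2):
    """Place the middle joint of a two-bone chain (e.g., the elbow between
    a tracked shoulder and wrist) in 2D via the law of cosines.

    Simplified illustration: real solvers work in 3D and disambiguate the
    two mirror solutions with a pole (hint) vector.
    """
    dx, dy = target[0] - root[0], target[1] - root[1]
    d = max(min(math.hypot(dx, dy), l1 + l2 - 1e-9), 1e-9)  # reachable range
    # angle at the root between the root->target line and the first bone
    cos_a = (l1 * l1 + d * d - l2 * l2) / (2.0 * l1 * d)
    a = math.acos(max(-1.0, min(1.0, cos_a)))
    base = math.atan2(dy, dx)
    return (root[0] + l1 * math.cos(base + a),
            root[1] + l1 * math.sin(base + a))

# Example: upper arm 0.33 m, forearm 0.27 m
print(two_bone_ik_2d((0.0, 0.0), (0.5, -0.3), 0.33, 0.27))
```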

On the CiL test bench, a real human test subject wears an HMD and perceives information from the VR environment model while riding a bike on a bicycle trainer. A camera-based 3D human pose estimation (HPE) system determines the precise position of the human rider relative to the bike and feeds it as an additional input into a multi-body system (MBS) model of the bicycle with rider. The camera-based 3D HPE was integrated as an efficient and flexible alternative to the previous, purely laboratory-based, marker-based motion capture as part of a collaboration with Subsequent GmbH (https://www.subsequent.ai). The AI method enables simple video data, such as from smartphones or vehicle cameras, to be used for evaluating detailed 3D movement data of the human skeleton in real time; it has so far been applied in professional and elite sports, neurological rehabilitation, home fitness, and security [6]. Additionally, the measured steering angle and wheel speed are used as inputs. The MBS model controls the resistance torque of the eddy-current dynamometer on the bike's rear axle. The pose of the bike and rider is sent from the MBS model in real time to an Unreal Engine 5 implementation and animates a cyclist and bike avatar in the virtual test field in the same way as on the PiL test bench.
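
The following sketch indicates how such a resistance torque could be derived from wheel speed and virtual road grade using elementary riding-resistance terms; the constants are assumed values, and the actual MBS model is considerably more detailed.

```python
def resistance_torque(wheel_speed_mps: float, grade: float,
                      total_mass_kg: float = 85.0,
                      wheel_radius_m: float = 0.34) -> float:
    """Resistance torque [Nm] to command on the rear-axle eddy-current dyno.

    Simplified sum of rolling resistance, aerodynamic drag, and grade
    resistance; illustrative constants, not the test bench's MBS model.
    """
    g, rho = 9.81, 1.20           # gravity [m/s^2], air density [kg/m^3]
    c_rr, cd_a = 0.004, 0.50      # rolling-resistance coeff., drag area [m^2]
    force_n = (c_rr * total_mass_kg * g
               + 0.5 * rho * cd_a * wheel_speed_mps ** 2
               + total_mass_kg * g * grade)
    return force_n * wheel_radius_m

# Example: riding at 5 m/s up a 2 % grade
print(resistance_torque(5.0, 0.02))
```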

Information from the virtual test field in turn feeds back into the vehicle's sensor input, allowing the vehicle to react to the VRUs' actions without endangering the test subjects in any way.

3 Experimental Validation

For the analysis of the strengths and weaknesses of the proposed method, we conduct tests both in real-world conditions and in our test environment. The general functionalities of the ViL test bench, such as the integration of the real vehicle and the transfer of the real vehicle’s motion to the motion of the vehicle in the virtual environment, have been previously demonstrated by us [2, 4]. Therefore, our tests focus on the integration of the PiL test bench and its influence on the vehicle’s perception. If the perception works similarly, we expect only minor differences in all subsequent vehicle functions.

For our tests, we use a real automated shuttle bus that employs a monocular camera (1280 × 960 px, 20 fps) and feeds the images to an instance of the 3D HPE algorithm that we also use on the CiL test bench, in order to estimate a skeleton representation of the pedestrian's pose. We treat the anchor point of the skeleton as the position of the pedestrian.
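
In code, this convention amounts to reducing the estimated skeleton to a single joint; the snippet below assumes the anchor is the root/pelvis joint at index 0, although the actual keypoint layout depends on the HPE model.

```python
import numpy as np

def pedestrian_position(skeleton_xyz, anchor_idx: int = 0):
    """Reduce an estimated 3D skeleton to a single pedestrian position.

    Assumes the anchor joint (here: index 0, e.g., the pelvis) defines the
    pedestrian's position, projected onto the ground plane.
    """
    anchor = np.asarray(skeleton_xyz, dtype=float)[anchor_idx]
    return anchor[0], anchor[1]   # longitudinal and lateral position [m]

# Example: a 17-keypoint skeleton with the pelvis as the first entry
print(pedestrian_position([[8.2, 0.4, 0.95]] + [[0.0, 0.0, 0.0]] * 16))
```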

For different pedestrian poses, the skeleton estimation accurately represents the position of the various limbs (see Fig. 2, left and middle). As expected, the estimation is accurate for visible limbs but shows higher deviations for occluded limbs. However, the skeleton estimation may not always represent the pose of the human test subject correctly when untracked joints are strongly bent. In such cases, there is a deviation between the test subject's pose and the avatar's pose (see Fig. 2, middle and right; note the deviation in the positions of the left and right elbows), and the HPE algorithm is fed a divergent visual representation of the test subject. This issue mainly concerns elbow positions and, to a lesser extent, knee positions.

Fig. 2. Skeleton estimation in reality (left), skeleton estimation on the test bench (middle), and the respective pose of the test subject (right)

To compare the performance for static objects, we position short (166 cm), medium (178 cm), and tall (189 cm) test subjects, or similarly sized human avatars, at defined positions and compare the output of the HPE algorithm. The pedestrians are detected at similar positions (see Fig. 3). For test subjects placed close to the vehicle (x < 12 m), the standard deviations of the estimated positions lie between 0.7 cm and 7.9 cm; for test subjects further away, they lie between 4.2 cm and 53.7 cm. The measurements under real-world conditions and in the test bench setup are similar in mean value and standard deviation, especially for smaller distances to the vehicle. We are also able to reproduce the systematic measurement error for the different test subject heights.
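
The evaluation boils down to simple statistics over repeated detections; the following illustrative fragment shows the computation for one static placement.

```python
import numpy as np

def position_stats(estimates_xy):
    """Mean and per-axis sample standard deviation of repeated position
    estimates for one static pedestrian placement (illustrative only)."""
    pts = np.asarray(estimates_xy, dtype=float)   # shape (n, 2): x, y [m]
    return pts.mean(axis=0), pts.std(axis=0, ddof=1)

# Example: three noisy detections of the same static pedestrian
mean_xy, std_xy = position_stats([[10.02, 1.01], [9.95, 0.98], [10.08, 1.04]])
print(mean_xy, std_xy)
```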

Fig. 3. Static pedestrian position estimations

To compare the performance for dynamic objects, the test subject was tasked with walking past the vehicle at predefined distances (laterally back and forth, both closer to and further away from the vehicle). In the real-world scenario, these distances were marked with cones; in the test bench setup, ground truth data of the test subject's position is available from the VR environment model. The results (see Fig. 4) show that the pedestrian is detected with a similar deviation from the real path in both environments. However, we note higher uncertainty in the position estimates at larger distances in the test bench scenario. We assume this is due to the unrealistically reflective surface of the pedestrian avatar (see Fig. 2, middle).
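
For a straight walk-past path, the deviation can be computed as the signed perpendicular distance of each position estimate from the reference line; the following sketch illustrates this under the assumption of a line defined by the cone markings or the VR ground truth.

```python
import numpy as np

def lateral_deviation(estimates_xy, path_point, path_dir):
    """Signed perpendicular deviation [m] of position estimates from a
    straight reference path (illustrative; assumes a line-shaped path)."""
    p0 = np.asarray(path_point, dtype=float)
    d = np.asarray(path_dir, dtype=float)
    d /= np.linalg.norm(d)                     # unit direction of the path
    v = np.asarray(estimates_xy, dtype=float) - p0
    return v[:, 0] * d[1] - v[:, 1] * d[0]     # 2D cross product with d

# Example: path along x at y = 2 m; the second estimate drifts to y = 2.15 m
print(lateral_deviation([[5.0, 2.0], [6.0, 2.15]], [0.0, 2.0], [1.0, 0.0]))
```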

Fig. 4. Dynamic pedestrian position estimations with ground truth

To estimate the latency of our test bench, we recorded both the test subject in the motion laboratory and the image projected for the camera stimulation of the vehicle using a camera (see Fig. 5) that records at 240 frames per second, corresponding to a temporal resolution of approximately 4.2 ms. A test subject raises or lowers their arm in a semi-circular motion, which in turn animates the pedestrian avatar accordingly. We counted the number of camera frames between the moment the test subject's arm is at a 90° angle to the torso and the moment the avatar's arm reaches the same position. Considering the recording frequency, we observed an average latency of 134 ms with a standard deviation of 15 ms across ten individual measurements. All measured latencies fall within the interval between 104 ms and 154 ms.
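
The conversion from frame counts to latency is straightforward, as the short sketch below shows; at 240 fps, the reported mean of 134 ms corresponds to roughly 32 frames.

```python
FRAME_RATE_HZ = 240.0   # recording rate of the measurement camera

def latency_ms(frame_count: int) -> float:
    """Convert the number of frames counted between the subject's arm and
    the avatar's arm reaching the 90-degree position into milliseconds."""
    return frame_count * 1000.0 / FRAME_RATE_HZ

print(latency_ms(32))   # -> 133.3 ms, about the reported average of 134 ms
```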

Fig. 5. Setup for latency measurement

4 Summary and Outlook

This paper presents a test environment in which a Vehicle-in-the-Loop test bench is combined with a motion laboratory to include a human pedestrian and a human cyclist. We demonstrate that vehicle perception can be stimulated in the test environment in a manner comparable to real-world conditions. Our approach goes beyond traditional test bench and motion laboratory concepts while, unlike proving ground tests, maintaining the safety of a laboratory environment.

In the future, we plan to add tests for vehicle-cyclist interaction scenarios. To improve pedestrian depiction in the test bench, we want to use more realistic pedestrian avatars. Additionally, we aim to integrate the human pose estimation algorithm within the Pedestrian-in-the-Loop test bench to increase the accuracy of the pedestrian avatar’s animation, particularly for otherwise untracked limbs.