1 Introduction

Cyber-Physical Systems (CPSs) leverage physical capabilities from hardware components as well as computational and artificial intelligence from software components to operate in complex and dynamic environments, potentially involving humans (Baheti and Gill 2011). Specifically, CPSs continuously collect sensor data from the surrounding environment and analyze them to control physical actuators at run-time (Baheti and Gill 2011; Academies of Sciences 2017).

CPSs find application in many domains ranging from Robotics and Transportation to Healthcare and are expected to drastically improve the quality of life of citizens and the economy (Chen 2017). For instance, self-driving cars (SDCs), an emerging application of CPS in transportation, are expected to impact our society profoundly by drastically reducing human errors that currently cause more than 90% of driving accidents, improving passenger comfort, and limiting pollution (Kalra and Paddock 2016). Currently, one of the main factors limiting the widespread usage of SDCs is the lack of adequate testing. Releasing SDCs equipped with defective software poses the risk that they might become erratic, which has already led to some fatal crashes (Baheti and Gill 2011; Guardian 2018). Testing automation is crucial for ensuring the safety and reliability of software, including the one controlling SDCs (Kalra and Paddock 2016; Kim et al. 2019). However, most developers rely on human-written test cases to assess SDCs’ behavior. This practice has several limitations and drawbacks: (i) difficulty in testing SDCs in representative and safety-critical scenarios (Guardian 2018; The-Washington-Post 2019; Ingrand 2019); (ii) difficulty in assessing SDC’s behavior in different environments and execution conditions (Kalra and Paddock 2016). As a consequence, SDC practitioners in the field are facing a fundamental development challenge: observability, testability, and predictability of the behavior of SDCs are highly limited (Guardian 2018; The-Washington-Post 2019; Ingrand 2019). Thus, new testing practices and tools are needed to find SDC faults earlier during development and, eventually, support the widespread usage of autonomous driving.

Simulation environments can potentially address several of the challenges mentioned above (BeamNG GmbH 2022; Bondi et al. 2018; Dosovitskiy et al. 2017; Nvidia 2020) since simulation-based testing is more efficient than and can be as effective as traditional field operational testing (Afzal et al. 2020; Dosovitskiy et al. 2017). Additionally, simulation-based testing results are easier to replicate and can support established model-in-the-loop (MiL), software-in-the-loop (SiL), and hardware-in-the-loop (HiL) development strategies. Consequently, an increasingly large number of commercial and open-source simulation environments have been delivered to the market to conduct testing in the autonomous driving domain (Dosovitskiy et al. 2017; BeamNG GmbH 2022) as well as other CPS domains (Shin et al. 2018). For such reasons, our work focuses on simulation-based testing in the context of SDCs.

1.1 Problem Statement and Research Questions

Simulation environments enable automated test generation and execution (Gambi et al. 2019). However, the potential size of the testing space of simulation environments is, in principle, infinite, which poses several challenges and questions for exercising SDC behaviors adequately (e.g., Which SDC test cases should be selected to identify faults efficiently? Is it possible to characterize safety-critical SDC tests?) (Birchler et al. 2023, 2022, 2022c; Abdessalem et al. 2018b; Gambi et al. 2019). The time budget devoted to testing activities is usually limited, making the identification of faults particularly challenging in the SDC domain since the execution of simulation-based tests is considerably slower compared to other forms of tests (e.g., unit and system tests of traditional software systems).

For instance, testing how an ego-car handles a driving scenario can easily take several minutes (Panichella et al. 2021; Birchler et al. 2022, 2022c); in contrast, running a unit or system test of a traditional software system takes only (milli)seconds. It is important to point out that simulation-based testing exercises the subject at the system level, which involves all components and not just a unit, and simulates the environment from which the test subject takes its inputs. Therefore, it is paramount that developers test SDCs cost-effectively, for example, by using test suites optimized to reduce testing effort or by improving existing automated test generators’ efficiency without affecting their ability to identify faults (Yoo and Harman 2010; Nucci et al. 2020; Abdessalem et al. 2018b).

In this paper, we investigate techniques to improve the cost-effectiveness of simulation-based testing in the context of SDCs. Specifically, we focus on techniques that employ Machine Learning (ML) models for supporting test case selection (TCS), addressing the following main challenges: (i) to leverage test case characteristics as well as ad-hoc SDC test case metrics to best characterize unsafe (fault-revealing) and safe (non-fault-revealing) SDC test cases; (ii) to identify suitable ML models that can reliably predict the SDCs’ behavior before executing those test cases; (iii) to experiment with the usage of such ML strategies to effectively distinguish unsafe test cases from safe ones; (iv) to integrate the proposed ML-based approach into the context of an industrial organization in the automotive domain, thus demonstrating its applicability in industrial settings.

We are interested in testing the safety of SDCs; therefore, we deem as relevant those scenarios that expose a fault (e.g., an SDC drives off the road). We call those scenarios unsafe. Consequently, our TCS techniques exploit ML models to classify SDC test cases as unsafe (i.e., likely to expose a fault) or safe.

To address the aforementioned challenges, in this paper, we seek to answer the following research questions:

  • RQ1: To what extent is it possible to identify safe and unsafe SDC test cases before executing them? Answering RQ1 is important to understand whether, and to what extent, it is possible to classify test cases for SDCs before executing them and by only considering static input features (i.e., referred to as Road Characteristics). We investigate the use of ML models for classifying test cases and study their application in the context of Lane Keeping, the fundamental requirement in autonomous driving. Specifically, in testing lane-keeping systems, unsafe scenarios cause self-driving cars to depart their lane (Gambi et al. 2019; Birchler et al. 2022, 2022c), and input features describe the geometry of a road as a whole (i.e., Road Features).

  • RQ2: Does SDC-Scissor improve the cost-effectiveness of simulation-based testing of SDCs? RQ2 investigates whether SDC-Scissor improves the cost-effectiveness of simulation-based testing of SDCs, compared to baseline approaches. Hence, in the context of RQ2, we investigated whether SDC-Scissor reduces the time dedicated to executing irrelevant (safe) tests without affecting testing effectiveness.

  • RQ3: What is the actual upper bound on the precision and recall of ML techniques in identifying SDC safe and unsafe test cases when using static SDC features? In RQ1 and RQ2, we focus on the feasibility and cost-effectiveness of using SDC Road Characteristics as features for classifying SDC test cases before executing them. In RQ3, we explore a complementary aspect, namely whether there is an actual upper bound on the precision and recall of ML techniques in identifying SDC safe and unsafe test cases when using static SDC features (available before executing the tests). Hence, after identifying the best ML models for classifying safe and unsafe test cases compared to baseline approaches (in RQ1 and RQ2), we answer RQ3 by (i) designing additional SDC test case features, called Diversity Metrics, which are more complex than the simple road characteristics used for training the ML models in RQ1 and RQ2; and (ii) leveraging hyperparameter tuning strategies to find the optimal configurations of the most promising ML models (as observed in RQ1 and RQ2).

We conducted our investigation using the freely available SDC simulator BeamNG.tech (BeamNG GmbH 2022) (elaborated in Section 2). We selected BeamNG.tech because it can execute procedurally generated driving scenarios, and it was recently adopted as the reference simulator in the ninth and tenth editions of the Search-Based Software Testing tool competition (Panichella et al. 2021; Devroey et al. 2022).

Complementary to the investigation of the aforementioned research questions, we investigate the extent to which SDC-Scissor can be integrated into the context of industrial organizations in the automotive domain. Specifically, to perform such an investigation, we generate SDC test cases and assess the ability of SDC-Scissor to generate signals compatible with the CAN Bus protocol (CIA 2017; Boumiza and Braham 2019; Gundu and Maleki 2022) used in the AICAS organization (details about the AICAS company, their protocol, as well as the design and results of our integration study, are provided in Section 6).

1.2 Summary of Results & Paper Contributions

SDC-Scissor avoided the execution of 50% of unnecessary tests and identified more failure-triggering test cases compared to two baseline strategies.

SDC-Scissor outperformed the baseline across all test pools; with the Logistic model, we achieved an accuracy of 70%, a precision of 65%, and a recall of 80% (Table 12) in selecting unsafe tests.

Our assessment of SDC-Scissor shows that it successfully selects test cases independently of the AI engine used or the driving style, with the Logistic model providing the most stable results. Our results also show that the knowledge is not transferable from one AI engine to another, i.e., SDC-Scissor performed worse when training ML models on data from a specific AI engine and testing on data from a different AI engine. However, from the discussion of our results (in RQ3), we also observed that there is an upper bound on the extent to which static SDC features can be used to predict SDC testing outcomes. Finally, the integration of SDC-Scissor into the AICAS use case allowed us to demonstrate that the proposed approach can automate the testing process of such a large automotive company, coping with the need to complement their hardware-based simulation (based on the CAN Bus protocol) with simulation-based testing automation. The contributions of this paper can be summarized as follows:

  • Selection of SDC test cases (RQ1): We investigated new methods for test case selection in the SDC domain. We first computed SDC features that can be used to characterize safe and unsafe test cases before executing them. Hence, we introduced SDC-Scissor, which leverages ML models to support test case selection for SDCs and enhance testing cost-effectiveness.

  • SDC-Scissor’s Cost-effectiveness (RQ2): We compared the proposed approach against two distinct baseline approaches to demonstrate the testing cost-effectiveness of SDC-Scissor. The first one is a random baseline approach that selects tests randomly. The second baseline selects tests based on their road length, which means that test cases with long roads are preferred based on the intuitive assumption that long roads have a higher probability of being unsafe.

  • Offline vs. Real-time Training (RQ2): We investigated two opposite setups for SDC test case selection that leverage ML models trained on offline data (i.e., trained on a large static dataset) and real-time data (i.e., dynamically generated tests).

  • Upper-bound of SDC static features (RQ3): We empirically investigated whether there is an actual upper-bound on the precision and recall of ML techniques in identifying SDC safe and unsafe test cases when using static SDC features (available before executing the tests).

  • Integration of SDC-Scissor in an Industrial Use Case (analysis detailed in Section 6): We integrated SDC-Scissor into the development context of the AICAS use case, demonstrating that the proposed tool can automate the testing process of such a large automotive company.

To foster the replicability of our study, we built a large dataset of labeled test cases (Khatiri et al. 2021) that can be used for replicating our results and promoting further research. Furthermore, SDC-Scissor is publicly available on GitHub and can be used together with the dataset to replicate our results.

Paper Structure

The paper proceeds as follows: Section 2 provides background about CPS simulation technologies, regression testing, the simulation-based testing of Lane Keeping systems used in the context of our study, automated test generation in the context of SDCs, and a summary of the main terminology used in our study. Section 3 presents the approach proposed in this paper. Section 4 describes the empirical study design, while Section 5 presents its main results. Section 6 provides a brief background on AICAS, the industrial organization involved in our study, details their signal-based CAN Bus protocol, and elaborates on the design and results of SDC-Scissor’s integration within the AICAS organization. Section 7 reflects on the results reported in Sections 5 and 6, providing complementary insights and discussing future work for researchers and SDC developers. Section 8 discusses related work, while Section 9 discusses the threats that could affect the validity of our results. Finally, Section 10 concludes the paper and outlines future research directions.

2 Background

This section introduces background elements to make this paper self-contained. It presents the main approaches to SDC simulation (Section 2.1) and discusses automated testing of Lane Keeping systems (Section 2.2). Finally, it concludes with a recap of the terminology used in the rest of this paper (Section 2.3).

2.1 CPS Simulation Technologies

Several simulation technologies have been developed to support developers in various stages of the design and validation of CPSs. Those technologies provide various levels of accuracy and realism at different execution costs, i.e., more accurate simulations generally require larger computational power. In the domain of self-driving cars, developers resort to abstract simulation models (González et al. 2018; Sontges and Althoff 2018; Althoff et al. 2017), rigid-body simulations (Loquercio et al. 2020; Zapridou et al. 2020), and soft-body simulations (Gambi et al. 2019; Riccio and Tonella 2020) among others.

Basic simulation models, like MATLAB and Simulink models as well as abstract driving scenarios (Althoff et al. 2017), have been mainly utilized for model-in-the-loop simulations, benchmarking of trajectory planners, and Hardware/Software co-design. They implement fundamental abstractions (e.g., signals, motion primitives) but target mostly non-real-time executions and lack photo-realism, which limits their applicability for testing SDC systems.

Rigid-body simulations approximate the physics of bodies by modeling entities as undeformable bodies (Abdessalem et al. 2018b). Rigid-body simulations implement a very coarse approximation of reality and can simulate only basic object motions and rotations. Consequently, rigid-body simulations cannot simulate realistic and critical scenarios (e.g., car crashes, inertia) accurately, even when they are combined with rendering engines to achieve photo-realistic simulations (Dosovitskiy et al. 2017; Bondi et al. 2018; Xu et al. 2019).

Soft-body simulations improve over rigid-body simulations and can simulate a wide range of simulation cases in addition to primitive body motions and rotations. As stated by Dalboni and Soldati (Dalboni and Soldati 2019), soft-body simulations can simulate body deformations, anisotropic mass distributions, and inertia, which are essential in many CPS domains. For SDCs, soft-body simulations are a better fit for simulating safety-critical driving scenarios (Gambi et al. 2019) and, like rigid-body simulations, they can be coupled with powerful rendering engines to achieve photo-realism (e.g., BeamNG GmbH (2022)). Consequently, in our work, we leverage soft-body simulations for simulation-based testing of SDCs.

2.2 Simulation-Based Testing of Lane Keeping Systems

In this paper, we study how SDC-Scissor can optimize the testing of the software that controls self-driving cars using physically accurate driving simulations. Specifically, we focus on testing Lane Keeping systems (LKS) that implement one of the fundamental features of autonomous driving.

Simulation-based testing requires creating relevant testing scenarios and reifying them into concrete executions (Li et al. 2016). In accordance with current research on automated testing of LKS (Panichella et al. 2021; Gambi et al. 2022), we consider scenarios that take place on a sunny day on single, flat roads surrounded by plain green grass. Consequently, tests take the form of the following driving task: driving without going off the lane from a given starting position, i.e., the beginning of a road, to a target position, i.e., the end of that road.

The roads defining these driving tasks are obtained by interpolating road points using cubic-splines to obtain a smooth road spine, i.e., the road’s center line (see Fig. 1). Driving simulators use the road spines to implement the actual driving tasks to execute.
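The following is a minimal sketch of how sparse road points could be interpolated into a dense road spine with a parametric cubic spline. The library choice (SciPy), the example road points, and the sampling density are illustrative assumptions, not the exact procedure implemented by the testing infrastructure.

```python
# Minimal sketch: interpolate sparse road points into a smooth road spine
# (the road's center line) using a parametric cubic spline.
import numpy as np
from scipy.interpolate import splprep, splev

road_points = [(10, 10), (40, 30), (70, 25), (100, 60), (120, 110)]  # hypothetical (x, y) points
x, y = zip(*road_points)

# Fit a parametric cubic spline (k=3) exactly through the road points (s=0).
tck, _ = splprep([x, y], k=3, s=0)

# Sample the spline densely to obtain the road spine used by the simulator.
u = np.linspace(0.0, 1.0, 200)
spine_x, spine_y = splev(u, tck)
spine = list(zip(spine_x, spine_y))
print(f"Road spine with {len(spine)} interpolated points")
```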

Fig. 1: Virtual roads for testing Lane Keeping systems. The white dots represent the road points, the (central) yellow lines represent the interpolated road spine, the triangles represent the starting locations, and the squares represent the target locations

In this context, unsafe tests correspond to virtual roads that expose problems in the ego-vehicle while driving autonomously on them, for instance, causing it to drive off-road or invade the opposite lane. As discussed in the next Section, SDC-Scissor extracts a set of features from the road spine and road points that enable it to predict whether the corresponding virtual road will expose a problem in the ego-vehicle before the test execution.

SDC-Scissor relies on the open-source testing infrastructure developed for the CPS testing competition of the SBST (Search-Based Software Testing) workshop (Panichella et al. 2021). This infrastructure can automatically implement executable simulations from the road spines, execute them, and collect their results (e.g., pass/fail). We opted for this infrastructure for two main reasons: (1) It utilizes the BeamNG.tech simulator (BeamNG GmbH 2022); hence, it can execute physically accurate and photo-realistic driving simulations. (2) It has already been used to benchmark several automatic test generators (see Panichella et al. (2021) and Gambi et al. (2022)); hence, it enables us to study the generality of SDC-Scissor. SDC-Scissor uses Frenetic (Castellano et al. 2021) as the main test generator, which uses a genetic algorithm for defining road points on a Cartesian plane.

The open-source testing infrastructure developed for the CPS testing competition (Panichella et al. 2021) enables driving agents to drive simulated vehicles and get programmatic control over running simulations (e.g., pause/resume simulations, move objects around). We consider two different driving agents as test subjects for our evaluation: the first is the driving agent shipped with BeamNG.tech, which we refer to as BeamNG.AI, and the second is an open-source trajectory planner, which we refer to as Driver.AI (Gambi et al. 2019). As explained by the BeamNG.tech developers, a parameter called the “risk factor” (RF) controls the driving style of BeamNG.AI: low RF values (e.g., 0.7) result in smooth driving, whereas high RF values (e.g., 1.2 and above) result in edgy driving that may lead the ego-car to “cut corners”. Driver.AI instead analyzes the road geometry and plans the car trajectory by computing, for each turn, the maximum safe driving speed (v) using the standard formula for centripetal force on flat roads with static friction (μ) (CNX 2021):

$$ v = \sqrt{\mu \times r \times g} $$
(1)

where r is the turn radius and g is the free-fall acceleration.
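As a worked example of Eq. (1), the following snippet computes the maximum safe cornering speed for a hypothetical turn; the friction coefficient and turn radius are illustrative values, not parameters taken from our experiments.

```python
# Worked example of Eq. (1): maximum safe cornering speed on a flat road.
import math

def max_safe_speed(mu: float, radius: float, g: float = 9.81) -> float:
    """v = sqrt(mu * r * g), in m/s."""
    return math.sqrt(mu * radius * g)

v = max_safe_speed(mu=0.8, radius=50.0)  # e.g., dry asphalt, 50 m turn radius
print(f"max safe speed: {v:.1f} m/s ({v * 3.6:.0f} km/h)")  # ~19.8 m/s (~71 km/h)
```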

Driver.AI relies on the user to provide the value of the friction coefficient, as well as information about the maximum acceleration and deceleration of the ego-car. In our evaluation, we estimated those values empirically following a trial-and-error approach. It is important to mention that, at the moment, neither BeamNG.tech nor Driver.AI has previous versions of their driving agents. This means that their behavior can only be altered or investigated by experimenting with the parameters already discussed in the context of our study. As a consequence, our regression testing strategy primarily focuses on enabling SDC test selection, with the main goal of reducing the effort required to detect faults. For future work, assuming new versions of both BeamNG.tech and Driver.AI are delivered, we plan to experiment with consecutive versions of these AI agents so that it is possible to investigate the potential fault-detection capability of both of them.

2.3 Article Terminology

To avoid any confusion in terminology, it is important to note that, in the rest of the paper, we refer to the simulation-based test cases generated by SDC-Scissor as test cases. Test cases are virtual roads composed of a sequence of road segments, as exemplified in Fig. 1. Formally, road segments refer to (parametric) portions of the roads of test cases; hence, they can be straight segments (no curvature), left turns (positive curvature), or right turns (negative curvature).

We refer to test cases that have been executed and evaluated in simulation as executed test cases. Then, if a test is passed successfully, we refer to it as a passing test, and if it failed, potentially revealing some issues with the system under test, we refer to it as a failing test.

On the other hand, as we elaborate more in the next sections, SDC-Scissor automatically assigns labels to the test cases regarding them being likely to fail or pass without executing them. In this context, we refer to the test cases which are considered by SDC-Scissor to be likely to pass as safe test cases and the ones that are considered likely to fail as unsafe test cases.

Regarding the features used in SDC-Scissor, static (road) features refer to any test case features that can be calculated without running any simulations, i.e., they are suitable for predicting test results (simulation outcomes) before running simulations. As discussed in detail in the next section, we propose to use two different sets of road features: road characteristics and diversity metrics.

Regarding the experiments to answer RQ2, we discuss offline experiments, which involve test selection from a previously generated (offline) pool of test cases, in Section 4.2.2. We conducted the offline experiment in two experimental setups that mimic the issues of having a limited testing budget in the context of SDCs: 1) FIX, in which the total number of test cases that can be executed in the simulation environment is fixed to a certain number; 2) REACH, in which we continue executing test cases until we reach a certain number of failing tests.

As discussed later in Section 5.3, we complement RQ2 evaluations with real-time experiments, in which we study the application of SDC-Scissor to automated test generation, i.e., the test pool is being generated in real-time, and only the unsafe tests are being kept and executed. There, we have two experimental setups: 1) with a pre-trained ML model. 2) with an adaptive ML model that could be retrained with the correct labels of the generated test cases.

3 The SDC-Scissor Approach

In this section, we first overview SDC-Scissor’s software architecture and its main usage scenarios (Section 3.1); next, we describe the selected features used as inputs to SDC-Scissor (Section 3.2); finally, we explain how SDC-Scissor uses these features to classify test cases before executing them (Section 3.3).

3.1 SDC-Scissor Architecture Overview

SDC-Scissor supports two main usage scenarios: Benchmarking and Prediction. In the Benchmarking scenario, SDC developers (or testers) leverage SDC-Scissor to determine the best ML model(s) to classify SDC simulation-based tests as safe or unsafe. In the Prediction scenario, instead, SDC-Scissor uses the most promising ML model(s) to classify newly generated test cases.

The SDC-Scissor software architecture (Fig. 2) implements these scenarios by means of five main software components, which have the following main responsibilities and relations:

  (i) SDC-Test Generator generates SDC simulation-based test cases.

  (ii) SDC-Test Executor executes the tests and stores the test results, i.e., safe or unsafe labels, to allow training of the ML models.

  (iii) SDC-Features Extractor extracts the input features from the SDC simulation-based test cases.

  (iv) SDC-Benchmarker uses these features and the collected labels to train the selected ML models and determines which ML model best predicts the tests that are more likely to detect faults.

  (v) SDC-Predictor uses the trained ML models to classify newly generated test cases, thus achieving cost-effective SDC simulation-based testing via test selection.

Fig. 2: Overview of SDC-Scissor’s software architecture
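To make the component responsibilities concrete, the following is a hypothetical Python skeleton of how the five components could be wired together; all function names, signatures, and data structures are illustrative assumptions and do not reflect SDC-Scissor’s actual API.

```python
# Hypothetical skeleton of SDC-Scissor's five components (Benchmarking scenario);
# names and signatures are illustrative only.
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class SDCTestCase:
    road_points: List[Tuple[float, float]]
    label: Optional[str] = None  # "safe" / "unsafe", assigned after execution

def generate_tests(n: int) -> List[SDCTestCase]:
    """(i) SDC-Test Generator: produce n procedurally generated virtual roads."""
    raise NotImplementedError

def execute_tests(tests: List[SDCTestCase]) -> None:
    """(ii) SDC-Test Executor: run each test in the simulator and set its label."""
    raise NotImplementedError

def extract_features(tests: List[SDCTestCase]) -> List[List[float]]:
    """(iii) SDC-Features Extractor: compute static road features per test."""
    raise NotImplementedError

def benchmark(features, labels):
    """(iv) SDC-Benchmarker: cross-validate candidate ML models, return the best one."""
    raise NotImplementedError

def predict_unsafe(model, new_tests: List[SDCTestCase]) -> List[bool]:
    """(v) SDC-Predictor: classify unexecuted tests; True means 'unsafe'."""
    raise NotImplementedError
```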

3.2 SDC Test Case Features

SDC Test Case Road Characteristics - Features Set 1

(Used in RQ1, RQ2, and RQ3). To predict whether test cases are likely to be safe or unsafe before their execution, we use a set of simple static features extracted from the global characteristics (referred to as Road Characteristics) of the virtual roads used as test cases. We extract two types of Road Characteristics, describing the main road attributes (see Table 1) and descriptive statistics about the road composition (see Table 2). Exemplary road attributes we consider are the total length of the virtual road, its starting and target positions on the map, and the count of left and right turns. To calculate road statistics, instead, we adopt the following procedure: (1) We extract the driving path that the ego-car must follow during the test execution; this path defines the test case and contains the road segments that the ego-car must traverse to reach the target position from the starting position. (2) We extract metrics such as segment length, road angle, and pivot radius from the road segments. (3) We compute descriptive statistics by applying standard aggregation functions (e.g., minimum, maximum, average) to the collected road segment metrics.

Table 1 Road attributes extracted by the SDC-Features Extractor
Table 2 Road statistics extracted by the SDC-Features Extractor
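The following is a minimal sketch of steps (2) and (3) above, aggregating per-segment metrics into descriptive road statistics; the segment values and the attribute names used here are illustrative, the actual feature set being the one listed in Tables 1 and 2.

```python
# Minimal sketch: aggregate per-segment metrics (e.g., segment length and turn
# angle) into descriptive road statistics for one test case.
import statistics

segments = [
    {"length": 42.0, "angle": 0.0,   "pivot_radius": float("inf")},  # straight
    {"length": 35.5, "angle": 48.0,  "pivot_radius": 47.0},          # left turn
    {"length": 28.3, "angle": -63.0, "pivot_radius": 25.5},          # right turn
]

def road_statistics(segments):
    # Only a subset of the metrics (length, angle) is aggregated in this sketch.
    lengths = [s["length"] for s in segments]
    angles = [abs(s["angle"]) for s in segments]
    return {
        "num_segments": len(segments),
        "total_length": sum(lengths),
        "min_seg_length": min(lengths),
        "max_seg_length": max(lengths),
        "avg_seg_length": statistics.mean(lengths),
        "num_left_turns": sum(1 for s in segments if s["angle"] > 0),
        "num_right_turns": sum(1 for s in segments if s["angle"] < 0),
        "max_abs_angle": max(angles),
    }

print(road_statistics(segments))
```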

SDC Test Case: Diversity Metrics - Features Set 2 (Used in RQ3)

To predict whether test cases are likely to be safe or unsafe before their execution, we also designed a new set of road features called Diversity Metrics. Specifically, for each road segment, we calculate the area spanned between the segment’s direct line (connecting its start and end points) and the actual road. The concept is illustrated in Fig. 3, where the green area represents the diversity of a single road segment. The curly braces indicate the segments of the road; a segment consists of road points marked as red diamonds, and the yellow lines represent the direct paths between the start and end points of each segment. Concretely, we used Shapely (Sean 2022), an open-source Python library for geometric calculations, to compute these areas: for each identified segment, we define a Shapely Polygon object that includes the segment’s road points and the direct segment line, and read its area property. With this approach, we retrieve the area (referred to as diversity in our context) of each segment. On this basis, we calculate two additional features: (i) Full Road Diversity and (ii) Mean Road Diversity. As described in Table 3, the Full Road Diversity is computed by summing up the areas spanned by all segments of a road, whereas the Mean Road Diversity is the mean of these areas. The main assumption behind these features is that a road is more diverse, and therefore more likely to be unsafe, if the spanned area is larger.

Fig. 3: Road diversity as the area (green) between the road (black) and the direct segment line (yellow)

Table 3 Diversity features extracted by the SDC-Features Extractor
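The following is a minimal sketch of the per-segment diversity computation with Shapely: closing a polygon over a segment’s road points implicitly adds the straight start-end chord, so the polygon’s area is the area between the curved segment and that chord. The segment coordinates below are illustrative only.

```python
# Minimal sketch of the diversity metrics of Table 3 using Shapely.
from shapely.geometry import Polygon

def segment_diversity(segment_points):
    """Area between the curved segment and its straight start-end chord."""
    return Polygon(segment_points).area

def road_diversity(segments):
    areas = [segment_diversity(seg) for seg in segments]
    full_road_diversity = sum(areas)                          # Full Road Diversity
    mean_road_diversity = full_road_diversity / len(areas)    # Mean Road Diversity
    return full_road_diversity, mean_road_diversity

# Example: two curved segments, each given as a list of (x, y) road points.
segments = [
    [(0, 0), (5, 2), (10, 3), (15, 2), (20, 0)],
    [(20, 0), (25, -3), (30, -4), (35, -3), (40, 0)],
]
print(road_diversity(segments))
```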

3.3 The SDC-Scissor’s Workflow

As described in Section 2, SDC-Scissor leverages an existing, open-source, and extensible SDC testing infrastructure to execute the test cases (SDC-Test Executor). Likewise, it relies on existing test generation algorithms integrated with that infrastructure to automatically generate the test cases to optimize (SDC-Test Generator). Hence, SDC-Scissor can already be used to improve the cost-effectiveness of several test generators.

During Benchmarking, SDC-Scissor utilizes the SDC-Test Generator and SDC-Test Executor to collect the necessary data for training the ML models, i.e., labeled test cases; next, it relies on the SDC-Benchmarker to determine the ML models that best classify the SDC test cases as safe or unsafe, as described below. Given a set of labeled test cases and the corresponding input features extracted by the SDC-Features Extractor, the SDC-Benchmarker trains and evaluates an ensemble of standard ML models using the well-established sklearn library. Next, it assesses each ML model’s quality using K-fold cross-validation on the whole dataset. Finally, it identifies the best-performing ML models according to Precision, Recall, and F-score metrics (Birchler et al. 2022) and outputs the best (trained) models as well as the features needed to operate them.

SDC-Scissor can work with various ML models. In this study, we consider ML models that have been successfully used for defect prediction or other classification problems in Software Engineering (Bezerra et al. 2007; Kaur and Malhotra 2008; Panichella et al. 2015; Sorbo et al. 2016; Rani et al. 2021; Panichella and Ruiz 2020). Specifically, we consider Naive Bayes (that applies Bayes’ theorem to train a probabilistic classifier) (Caruana and Niculescu-mizil 2006), Logistic Regression (that uses a logistic function to model the probability of observing a certain class) (Sammut and Webb 2011), J48 (that creates a decision tree following the well-known C4.5 algorithm) (Frank et al. 2005; Sorbo et al. 2022), and Random Forests (that uses an ensemble of decision trees) (Ho 1998).
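A minimal sketch of this benchmarking step with scikit-learn is shown below: the classifiers mentioned above are evaluated with 10-fold cross-validation and ranked by precision, recall, and F-score. The feature matrix X and label vector y are placeholders to be replaced with the extracted features and safe/unsafe labels, and DecisionTreeClassifier stands in for the C4.5-style J48 model.

```python
# Minimal sketch of the benchmarking step: 10-fold cross-validation of the
# candidate classifiers on static road features, ranked by F-score.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier        # C4.5-like decision tree
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

X = np.random.rand(500, 10)               # placeholder feature matrix
y = np.random.randint(0, 2, size=500)     # placeholder labels (1 = unsafe)

models = {
    "NaiveBayes": GaussianNB(),
    "Logistic": LogisticRegression(max_iter=1000),
    "DecisionTree": DecisionTreeClassifier(),
    "RandomForest": RandomForestClassifier(),
}

for name, model in models.items():
    scores = cross_validate(model, X, y, cv=10,
                            scoring=("precision", "recall", "f1"))
    print(f"{name}: precision={scores['test_precision'].mean():.2f} "
          f"recall={scores['test_recall'].mean():.2f} "
          f"f1={scores['test_f1'].mean():.2f}")
```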

During Prediction, SDC-Scissor takes as input the (trained) ML Models and the definition of the features needed to use them. Next, it generates new test cases using SDC-Test Generator and utilizes SDC-Features Extractor to extract the necessary features. Finally, it invokes SDC-Predictor for classifying safe or unsafe test cases before executing them.

In the next section, we describe the studies we conducted to evaluate the benefits of using SDC-Scissor for test selection in the context of SDCs. After that, we present and discuss the achieved results.

4 Study Design

In this paper, we investigate Machine Learning-based test selection techniques for improving the cost-effectiveness of simulation-based testing of SDCs.

The first challenge (RQ1) we focus on is to investigate whether, and to what extent, it is possible to classify test cases for SDCs as safe or unsafe before executing them, i.e., only considering input features such as the ones discussed in Section 3, by conducting offline and real-time experiments. Specifically, we investigate the use of ML models for classifying test cases in the context of Lane Keeping systems (see Section 2).

The second challenge we focus on is devising techniques that effectively leverage features extracted from SDC test cases to reduce testing costs while keeping testing effectiveness high. Hence, we investigate whether SDC-Scissor improves the cost-effectiveness of simulation-based testing of SDCs, compared to baseline approaches (RQ2).

A further aspect we investigate is whether there is an upper bound on the precision and recall achieved by ML techniques in identifying SDC safe and unsafe test cases when using static SDC features (available before executing the tests). Hence, we focus on investigating whether fine-tuning the ML algorithms (e.g., calculating derived features and performing hyper-parameter tuning) improves SDC-Scissor’s ability to discern safe test cases from unsafe ones (RQ3).

Finally, to investigate the practical usefulness of SDC-Scissor, we integrated our tool into the context of an industrial organization in the automotive domain (details of such an investigation are reported in Section 6).

In the following sections, we describe the dataset used in our study and the steps we followed to address these challenges.

4.1 SDC Test Cases Dataset Preparation

To enable the prediction of safe and unsafe SDC test cases, we used SDC-Scissor for executing the generated test cases and collected labels (safe/unsafe) from the test results (pass/fail). As reported in Table 4, we generated a dataset with 14,175 data rows with full road features, obtained from simulations of 8,500 tests using two driving agents and four configurations. As the table shows, SDC-Scissor takes the AI engines’ inputs to generate the test cases, which leads to test cases having different road configurations and, as a consequence, different sets of road segments composing them. The test cases, their labels, and the SDC features characterizing them are the main data used for conducting our experiments. An overview of the data is reported in Table 4.

Table 4 Dataset summary of SDC test cases on segment level and full road level (composed by segments)

4.2 Research Method

We designed a set of experiments to answer our research questions:

  • Machine Learning-based Experiments (RQ1): The first set of experiments investigates whether ML models trained with the selected SDC test case features can identify safe and unsafe test cases before their execution.

  • Offline Experiments (RQ2): The second set of experiments investigates if and how much SDC-Scissor improves the cost-effectiveness of SDC simulation-based testing compared to baseline approaches.

  • Real-Time Experiments (RQ2): In these experiments, we train an adaptive model based on data observed while executing the tests and compare it with a pre-trained model.

  • Optimization Experiments (RQ3): The third set of experiments investigates how SDC-Scissor performance improves by adding new SDC features and tuning ML Models hyperparameters. Specifically, in RQ3, we focus on investigating whether there is an actual upper bound on the precision and recall achieved by the ML techniques in identifying SDC safe and unsafe test cases when using static SDC features (available before executing the tests).

4.2.1 Machine Learning-based Experiments (RQ1)

In the context of RQ1, we study whether ML models can be used to predict safe or unsafe test cases and which combinations of features allow us to achieve more accurate predictions. As discussed in Section 3, we integrated into SDC-Scissor several ML models, and in the context of our work, we experimented with Logistic Regression (Tolles and Meurer 2016), the J48 (Frank et al. 2005), the Random Forest (Ho 1998), and the Naive Bayes (Caruana and Niculescu-mizil 2006) as ML models. We trained the ML models mentioned above using a training and test sets split strategy for each of the configurations listed in Table 4 separately. We evaluated the performance of each ML model by computing the standard metrics of precision, recall, and F-score (Baeza-Yates and Ribeiro-Neto 2011; Bezerra et al. 2007; Ceylan et al. 2006; Kaur and Malhotra 2008; Canfora et al. 2013; Panichella et al. 2015).

Rebalancing of Training Data

Since unsafe scenarios are an exception –not the norm– when generating random tests, the raw data we collected with SDC-Scissor is unbalanced toward safe cases. Therefore, we re-balanced the training data (in the case of the training and test sets split strategy) to avoid skewed distributions that would otherwise bias the ML models towards one specific class. Specifically, we adopted random oversampling, a re-balancing technique proven to be robust (Ling and Li 1998), to supplement the training data with multiple copies of some of the minority classes.
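A minimal sketch of random oversampling is shown below; it duplicates minority-class training samples until both classes are equally represented. The use of scikit-learn’s resample utility and the NumPy-array inputs are assumptions for illustration; the paper does not prescribe a specific implementation.

```python
# Minimal sketch: random oversampling of the minority class in the training data.
# X_train is a 2D NumPy feature matrix, y_train a 1D NumPy label array.
import numpy as np
from sklearn.utils import resample

def oversample_minority(X_train, y_train, random_state=42):
    classes, counts = np.unique(y_train, return_counts=True)
    minority = classes[np.argmin(counts)]
    majority = classes[np.argmax(counts)]
    X_min, y_min = X_train[y_train == minority], y_train[y_train == minority]

    # Duplicate minority-class samples (with replacement) up to the majority count.
    X_min_up, y_min_up = resample(X_min, y_min,
                                  replace=True,
                                  n_samples=int(counts.max()),
                                  random_state=random_state)
    X_bal = np.vstack([X_train[y_train == majority], X_min_up])
    y_bal = np.concatenate([y_train[y_train == majority], y_min_up])
    return X_bal, y_bal
```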

Size of the Training Dataset

To study how the training set size affects the ML models’ performance, we created balanced training datasets of increasing size (Table 5). In contrast, we generated the test datasets used to evaluate the ML models by randomly sampling the data points not included in the training datasets. Notably, we did not re-balance the test datasets, to preserve the underlying class distribution in the data.

Table 5 Model training dimensions

We also study the effects of different training strategies on each ML model’s performance. To do so, we evaluated the ML models using standard K-fold cross-validation (Refaeilzadeh et al. 2009). In particular, we set K = 10 (i.e., 10-fold cross-validation) and utilize all the available data in each configuration.

4.2.2 Offline Experiments (RQ2)

To answer RQ2, we investigate whether SDC-Scissor improves the cost-effectiveness of simulation-based testing of SDCs, compared to baseline approaches. The quality focus is to understand whether SDC-Scissor reduces the time dedicated to executing safe (irrelevant) tests without affecting testing effectiveness (i.e., its ability to identify unsafe tests) compared to such baselines.

SDC-Scissor can use pre-trained models to classify safe and unsafe test cases. Therefore, we designed experiments to analyze how using pre-trained ML models for selecting (existing) test cases improves regression testing. For those experiments, we consider the combinations of ML models and features that achieve the best results in the context of RQ1 (see Section 5.1). In addition, we contextualize the results achieved by SDC-Scissor using a baseline approach that performs a random selection of test cases. Notably, random selection is considered one of the standard baselines for evaluating test selection strategies (Shin et al. 2018; Yoo and Harman 2010). Finally, we also compare SDC-Scissor against a slightly more intelligent baseline approach that selects test cases by ordering the test to be executed considering their road length (in decreasing order). The conjecture of this second baseline is that the longer the road, the higher the probability of observing a fault.

Studying the effectiveness of SDC-Scissor offline requires test cases and executions; therefore, we used a dataset with known test execution times. Due to the lack of backward compatibility of BeamNG.tech, we generated a new dataset to complement our evaluation (see Table 10) using the most recent version of BeamNG.tech. For all other evaluations, we used the data reported in Table 6. In summary, the separate new dataset consists of 3,559 tests, with 2,225 safe and 1,334 failing tests, labeled with BeamNG.AI (RF 1.5). As reported in Table 6, we created a Training Set accounting for 80% of the whole dataset and used the remaining 20% of the data for testing. We created a balanced Training Set, but we purposely created four unbalanced Test Pools with different distributions of unsafe cases, ranging from few (5% of the testing data) to many (70% of the testing data). In creating our test pools, we under-sampled safe test cases (e.g., Test Pool (30/70)) since the number of unsafe test cases was smaller than the total number of test cases in our complete dataset. Our conjecture is that using different Test Pool compositions allows us to assess SDC-Scissor’s performance in various settings.

Table 6 Offline experiment dataset: test pools with different distributions of unsafe cases, ranging from few (5% of the testing data) to many (70% of the testing data)

Experimental Setups of Offline Experiments

We conducted the offline experiment in two experimental setups, referred to as FIX and REACH. Since they mimic the issues of having a limited testing budget in the context of SDCs, we believe they are representative. We repeated the experiments in both setups 30 times to increase the confidence in the achieved results.

The FIX setup investigates the benefits of using SDC-Scissor when the resources allocated for testing are limited, i.e., the amount of test cases that can be executed in the simulation environment is fixed to a value S (e.g., S = 5,6,etc.). The process we followed to experiment with the FIX setup is illustrated in Fig. 4 alongside the baseline processes. The baseline approach draws tests from the test pool (randomly or by considering their road length) and adds them to the test suite until the test suite reaches the target size S. SDC-Scissor, instead, samples the tests from the test pool but adds them to the test suite only if the ML model predicts that they are unsafe; as before, the process ends when the test suite reaches the target size S. In this setup, more effective techniques select larger portions of unsafe tests; therefore, we evaluate the performance of SDC-Scissor using the ratio of unsafe to safe test cases in the final test suites compared to the baseline approaches.

Fig. 4: FIX experiment overview
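A minimal sketch of the FIX selection loop described above is shown below; `model` and `extract_features` are placeholders for a trained classifier and the feature-extraction step, and the "unsafe" label string is an assumption made for illustration.

```python
# Minimal sketch of the FIX setup: draw candidate tests from the pool and keep
# only those the ML model predicts as unsafe, until the suite reaches size S.
import random

def build_fix_suite(test_pool, model, extract_features, suite_size):
    pool = list(test_pool)
    random.shuffle(pool)
    suite = []
    for test in pool:
        if len(suite) >= suite_size:
            break
        if model.predict([extract_features(test)])[0] == "unsafe":
            suite.append(test)
    return suite
```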

The REACH experiment, instead, investigates the ability of SDC-Scissor to reduce the time to identify at least N unsafe test scenarios. In our experiment, we set N = 10 since identifying that many unsafe test cases potentially requires the execution of many more (safe) test cases. The process we followed to experiment with the REACH setup is illustrated in Fig. 5 alongside the random baseline approach. As before, the baseline randomly samples tests from the test pool and executes them until N unsafe tests have been identified. REACH, instead, executes only those tests that are predicted to be unsafe by the ML models. In this setup, more effective techniques identify N unsafe tests sooner; therefore, we consider the number of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) predicted by the ML models. Having information about TP, TN, FP, and FN enables us to count how many tests were needed to reach the goal, how long it took to do so, and how much time was wasted in evaluating safe test cases.

Fig. 5: REACH experiment overview
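The REACH loop can be sketched in the same spirit; `execute` is a placeholder returning the test outcome and its simulation duration, and all names are illustrative assumptions.

```python
# Minimal sketch of the REACH setup: execute only tests predicted as unsafe
# until N actually failing tests have been observed, tracking time spent.
def reach_unsafe_tests(test_pool, model, extract_features, execute, n_unsafe=10):
    found, executed, total_time = 0, 0, 0.0
    for test in test_pool:
        if found >= n_unsafe:
            break
        if model.predict([extract_features(test)])[0] != "unsafe":
            continue  # predicted safe: skipped, not executed
        outcome, duration = execute(test)
        executed += 1
        total_time += duration
        if outcome == "fail":
            found += 1
    return found, executed, total_time
```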

4.2.3 Real-Time Experiments (RQ2)

We complement the previous Offline Experiments to answer RQ2, which focuses on applying SDC-Scissor to regression test case selection, with Real-Time Experiments in which we study the application of SDC-Scissor to automated test generation.

We conducted the Real-Time Experiments according to the following procedure: (i) SDC-Scissor generates random test cases; (ii) for each newly generated test case, SDC-Scissor classifies it as safe/unsafe; and (iii) test cases classified as safe are filtered out before generating the next test case, whereas test cases classified as unsafe are executed. As the test subject, we used BeamNG.AI in the moderate configuration (RF equal to 1.5), as this configuration is a compromise between overly conservative and overly aggressive driving styles.

A cost-effective test generator devotes more time to executing (likely) unsafe tests that can expose defects rather than executing safe test cases, which might not contribute any additional insight into the behavior of the SDC under test. Correctly identifying unsafe test cases, therefore, is paramount and depends on the quality of the ML model used as a classifier which, in turn, depends on the technique employed by the ML models and the data used to train them. Particularly relevant in this context is whether the ML model is predefined and fixed or allowed to be updated online as new data become available. The trade-off between these two configurations is that ML models have little operational costs once trained but may miss relevant behaviors; on the contrary, dynamically retrained ML models can cope with missing training data but at the cost of additional time spent in retraining them. Therefore, we compare the following two approaches:

  • Pre-trained Model in which we used the best performing model identified during the Machine Learning-based Experiments (Section 5.1). We trained this model using the re-balanced dataset for the case of BeamNG.AI RF 1.5, as this is the configuration of the test subject used for this experiment.

  • Adaptive Model in which we also used the best-performing model identified during the Machine Learning-based Experiments (Section 5.1), but trained with only 60 randomly generated test cases. After this initial training, we retrain the ML model after executing the predicted unsafe test cases, using the newly collected ground-truth labels for those test cases. Figure 6 illustrates this process; see also the sketch after the figure. Notably, since the ML model may be inaccurate, this process collects both positive and negative labels.

Fig. 6: Overview of the adaptive model configuration for the real-time experiments
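The following is a minimal sketch of the adaptive-model loop, assuming retraining after every executed test and treating all helper functions (test generation, execution, feature extraction) as placeholders; the actual retraining frequency and interfaces in SDC-Scissor may differ.

```python
# Minimal sketch of the adaptive configuration: start from a model trained on a
# small bootstrap set, filter out predicted-safe tests, execute the rest, and
# retrain with the ground-truth label collected from each executed test.
def adaptive_loop(generate_test, execute, extract_features, model, X0, y0, budget):
    X, y = list(X0), list(y0)          # bootstrap data (e.g., 60 labeled tests)
    model.fit(X, y)
    spent = 0.0
    while spent < budget:
        test = generate_test()
        feats = extract_features(test)
        if model.predict([feats])[0] != "unsafe":
            continue                   # predicted safe: filtered out, not executed
        outcome, duration = execute(test)
        spent += duration
        X.append(feats)
        y.append("unsafe" if outcome == "fail" else "safe")
        model.fit(X, y)                # retrain with the newly collected label
    return model
```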

As before, we contextualize the results achieved by SDC-Scissor using a baseline approach that implements plain vanilla random generation, i.e., it does not filter the test cases.

We ran each configuration on a dedicated machine equipped with an Intel Core i5-6600K (3.5 GHz), 16 GB RAM, and an NVIDIA GeForce GTX 1070 GPU and set the test generation time budget to six hours.

During each execution of the experiment, we stored all the tests generated by SDC-Scissor so we could execute the test cases filtered out by SDC-Scissor post-mortem to calculate metrics such as accuracy, precision, and recall.

Table 7 provides an overview of the metrics used for the evaluation of SDC-Scissor across the various configurations. Those metrics include the count of unsafe tests found during each experiment (true positives), true negatives, false positives, and false negatives. Additionally, we consider how SDC-Scissor allocated the time budget to run safe and unsafe test cases, generate test cases, and rebuild the ML models.

Table 7 Evaluation metrics for the real-time experiments

In the second study, SDC-Scissor leverages real-time data (i.e., dynamically generated tests) and continuously (re-)trained ML models; this setup lets us evaluate the application of the proposed technique for automated test generation. As described before, in both setups, we compared the time-saving ability of SDC-Scissor with respect to the random selection strategy as well as its ability to detect more faults while allocating lower test execution costs.

4.2.4 Optimization Experiments (RQ3)

RQ3 investigates whether there is an upper bound on the precision and recall of ML techniques in identifying SDC safe and unsafe test cases when using SDC test case features available before executing the tests. A range of different optimization algorithms can be used to achieve potentially better results with respect to the default configuration of parameters of the ML models. Two of the most common hyperparameter tuning methods are Random Search and Grid Search (Bergstra et al. 2011; Bergstra and Bengio 2012; Adnan et al. 2022). Grid search performs better for spot-checking combinations that are known to perform well. Therefore, we experiment with Grid search as a hyperparameter optimization approach and investigate how SDC-Scissor’s performance improves when it employs fine-tuned ML models. Specifically, with Grid Search, we experimented with several parameter combinations for the best ML models using a 10-fold validation setting, as summarized below.

For the Decision Tree (J48) we covered all possible combinations of the following parameters:

  • C (confidenceFactor): Is the confidence factor, and we experimented with values [0.001,0.01,0.05,0.1,0.5]

  • M (minNumObj): Is the minimum number of instances in a leaf, and we experimented with values [1,10,20,50,100]

  • R (reducedErrorPruning): Reduced error pruning is an alternative algorithm for pruning that focuses on minimizing the statistical error of the tree. We experimented with the following values [yes,no]

  • S (subtreeRaising): This is a specific method of pruning whereby a whole set of branches further down the tree are moved up to replace branches that were grown above it. We experimented with the following values of it [yes,no]

For the Random Forest, we covered all possible combinations of the following parameters:

  • I (numIterations): Is the number of trees in the forest, and we experimented with values [5,10,100,1000,2000]

  • K (numFeatures): Is the max number of features considered for splitting a node, and we experimented with values [0,10,100,500,1000]

  • depth: Is the maximum depth of the tree (0 unlimited), and we experimented with values [0,5,10,20]

  • M (minNumObj): Is the minimum number of instances in a leaf, and we experimented with values [1,10,20,50,100]

For the Gradient Boosting, we covered all possible combinations of the following parameters:

  • ’loss’ = [’log_loss’, ’deviance’, ’exponential’]

  • ’learning_rate’ = [0.01, 0.1, 0.2, 0.4]

  • ’n_estimators’ = [10, 100, 1000]

  • ’criterion’ = [’friedman_mse’, ’squared_error’, ’mse’]

For the Logistic Regression, we covered all possible combinations of the following parameters:

  • ’penalty’ = [’l1’, ’l2’, ’elasticnet’, ’none’]

  • ’dual’ = [True, False]

  • ’max_iter’ = [10, 100, 1000]

  • ’solver’ = [’newton-cg’, ’lbfgs’, ’liblinear’, ’sag’, ’saga’]

For the Support Vector Machine, we covered all possible combinations of the following parameters:

  • ’penalty’ = [’l1’, ’l2’]

  • ’loss’ = [’hinge’, ’squared_hinge’]

  • ’dual’ = [True, False]

It is important to note that we perform Grid Search (with a 10-fold cross-validation strategy) over all experiments (for a total of over 700 experimented combinations of parameters) and use the best combination of features and ML model from Section 4.2.1.
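As an illustration, the following sketch runs a grid search with 10-fold cross-validation for the Random Forest using scikit-learn’s GridSearchCV; the grid loosely mirrors the parameter ranges listed above, but the scikit-learn parameter names and the reduced value sets are assumptions made for brevity (the lists above use Weka-style option names for the tree-based models).

```python
# Minimal sketch of hyperparameter tuning with grid search and 10-fold CV for
# the Random Forest. X and y are placeholders for the static features and the
# safe/unsafe labels.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X = np.random.rand(500, 10)               # placeholder feature matrix
y = np.random.randint(0, 2, size=500)     # placeholder labels (1 = unsafe)

param_grid = {
    "n_estimators": [10, 100, 1000],       # I (numIterations), reduced
    "max_features": ["sqrt", None],        # K (numFeatures), adapted
    "max_depth": [None, 5, 10, 20],        # depth (None = unlimited)
    "min_samples_leaf": [1, 10, 20, 50],   # M (minNumObj), reduced
}

search = GridSearchCV(RandomForestClassifier(), param_grid,
                      cv=10, scoring="f1", n_jobs=-1)
search.fit(X, y)
print("best parameters:", search.best_params_)
print("best F-score:   ", round(search.best_score_, 3))
```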

Section 5 elaborates on the achieved experimental results for all research questions, while Section 7 reflects on the results reported in such section, providing complementary insights, findings, and implications.

5 Results

This section presents the achieved results organized by research questions, while Section 7 discusses them in depth.

5.1 Machine Learning-Based Experiments (RQ1)

In this section, we discuss the results of RQ1. Specifically, we describe the results achieved using the Road Characteristics listed in Section 3.2 as input features to build the ML models.

5.1.1 Machine Learning-Based Experiments with Road Characteristics

We evaluated the ML models trained using Road Characteristics as the main SDC features with four splits of training and test data, as summarized in Table 5. However, for the sake of readability, we report here only the results achieved by the best-performing configuration, i.e., 80% training and 20% for testing. The full results can be found in our replication package (Khatiri et al. 2021). Table 8 reports Precision, Recall, and F-score for both unsafe and safe labels separately to study how the ML models can classify each case (i.e., the experiments summarized in Table 5). It is important to note that in all experiments reported in Table 5, we rebalanced the training data (as discussed in Section 4.2.1).

Table 8 Performance of the ML models trained using road features

Regarding the BeamNG.AI dataset, with Risk Factor 1.5, the ML model performing the best in terms of F-score is Logistic (with 71% for both labels), followed by Random Forest (between 68%–69% for both labels). The other models, instead, achieved lower F-score values.

Regarding the Driver.AI dataset, we observe that the ML models achieved lower accuracy (49.1%) than on the BeamNG.AI dataset. This result can be explained by how unbalanced the Driver.AI dataset is: since Driver.AI drives carefully, its dataset comprises mainly safe scenarios, and the predictions of the ML models tested on it are biased toward safe predictions.

Comparing the F-scores achieved by the ML models against the Driver.AI and BeamNG.AI datasets shows this problem more evidently: the ML models performed comparably well for safe and unsafe classes against the BeamNG.AI dataset, whereas they performed well only for the safe test class in the case of Driver.AI. However, we can observe some similarities between all ML models in terms of F-score values when trained on the Driver.AI dataset and the BeamNG.AI dataset. For instance, for both datasets, Logistic and Random Forest tend to achieve better results. In both cases, and especially in the case of Driver.AI, most ML models struggle to classify unsafe test cases compared to safe ones.


5.1.2 Analysis of Relevant Features

Although the ML models trained using the road features can effectively classify the test cases as safe or unsafe, it is crucial to know the level of contribution of each of these features. We analyzed the road features for the BeamNG dataset discussed in Table 8 using two popular feature evaluation methods: information gain and correlation. While the detailed analysis results are reported in Appendix A, we summarise the main findings here.


5.1.3 Impact of Risk Factor (RF)

To better understand how SDC-Scissor’s performance is affected by varying RF values, we compared its performance on the BeamNG datasets with RF 1, 1.5, and 2 separately. While we report the details in Appendix B, here we summarise the main findings.


5.1.4 Knowledge Transfer Between Different Driving Agents

We also studied the ability of the ML models to transfer knowledge from one driving agent to another by training the ML models with one AI’s dataset (BeamNG RF 1.5) and testing them with another AI’s dataset (Driver.AI), and vice versa. While we report the details in Appendix C, here we summarise the main findings.


5.2 Offline Experiments (RQ2)

In this section, we discuss the results of RQ2. Specifically, we focus on devising techniques that effectively leverage features extracted from SDC test cases to minimize testing costs while keeping testing effectiveness high. For this reason, we investigate whether SDC-Scissor improves the cost-effectiveness of simulation-based testing of SDCs, compared to baseline approaches (RQ2). Hence, we report the results of the FIX and REACH experiments (detailed in Section 4.2.2). Additionally, we report the results of the comparison between various ML models against the baseline approaches (described in Section 4.2.2) by considering different test pool compositions.

5.2.1 FIX Experiment results

The goal of this experiment is to optimize the usage of the available resources in terms of test execution time and effectiveness. Figure 7 compares the ratio of unsafe tests selected for execution using different ML models against the first baseline approach (random selection) across different test pool compositions. As can be observed from the figure, the Logistic model outperformed the baseline in all test pool compositions (described in Section 4). Figure 8 illustrates that, with fewer unsafe test cases in the pool, we observe larger improvements in the number of selected unsafe tests using ML models over the baseline. In the pool with the fewest unsafe tests, the Logistic model finds 133% more unsafe tests than the baseline approach. In the more balanced test pool, Logistic finds 50% more unsafe tests, while in the pool with more unsafe than safe tests, it identifies 30% more unsafe tests. The Logistic model performs slightly better than the other models in all compositions except one (0.3/0.7), where Random Forest performed best.

Fig. 7: Comparison of the Logistic model and the baseline across different test pool compositions

Fig. 8: Number of executed unsafe scenarios during the experiments on a) Test Pool (0.05/0.95), b) Test Pool (0.3/0.7), c) Test Pool (0.7/0.3)

The confusion matrices in Table 9 further illustrate the concrete results in terms of effectiveness for the various pool compositions. In the pool with only 0.05 unsafe tests (Table 9-a), the Logistic model achieved 10 false negatives and 260 true negatives; this means that the model avoided the execution of 549 safe tests (considering that safe test cases take around 24 seconds on average to execute), thus potentially reducing the cost spent on the less critical scenarios by more than 200 minutes in total. However, the number of false positives is still high, with 263 false positives identified cumulatively. As can be observed in Table 9-b, for the test pool 0.7/0.3, the Logistic model achieved over 260 true positives and only 37 false positives. We observe that the precision correlates with the dataset composition: for datasets with more unsafe tests, the precision for unsafe tests is higher, while for datasets with fewer unsafe tests we obtain the opposite effect. Figure 7 shows that the performance of both the ML models and the baseline depends on the test pool composition; both perform better in test pools with more unsafe tests. Thus, according to our results, designing an appropriate test pool composition is of critical importance to achieve accurate prediction results.
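As a back-of-the-envelope check of the reported saving, combining the 549 avoided safe executions with the average execution time of roughly 24 seconds per safe test gives

\[ 549 \times 24\,\text{s} \approx 13{,}176\,\text{s} \approx 220\,\text{min}, \]

which is consistent with the saving of more than 200 minutes reported above.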

Table 9 Confusion matrix for logistic model, cumulative over 30 rounds for a) Test pool (0.05/0.95), b) Test pool (0.7/0.3)

We also assessed the cost-effectiveness of SDC-Scissor against a second baseline whose selection strategy is based on the road length, under the assumption that the longer the road, the more likely the test is to be unsafe. In contrast to the random baseline, which selects the tests randomly from the test set, the second baseline orders the tests according to the road length and selects the longest ones. In Table 10, the cost-effectiveness of SDC-Scissor is compared to both baselines. The Random Forest and Logistic models have the best cost-effectiveness compared to both baselines, with a selection of 80% unsafe tests. On the other hand, the SVM and Naive Bayes models have a worse selection than both baselines, selecting only 40% unsafe tests each, whereas the random and road-length baselines select on average 42.6% and 60% unsafe tests, respectively.

Table 10 Cost-effectiveness \(\left(\frac{\#failing}{\#passing}\right)\) of SDC-Scissor against a random baseline and a road-length-dependent baseline
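For clarity, a minimal sketch of the cost-effectiveness measure of Table 10 and of the road-length baseline follows, assuming each candidate test is represented as a dictionary carrying a pre-computed road_length entry and that the safe/unsafe outcomes become available only after execution; the field names are illustrative.

    def cost_effectiveness(outcomes):
        # Cost-effectiveness as in Table 10: #failing / #passing over the executed
        # tests, where outcomes contains the observed results ('unsafe' or 'safe').
        failing = sum(1 for o in outcomes if o == "unsafe")
        passing = sum(1 for o in outcomes if o == "safe")
        return failing / passing if passing else float("inf")

    def road_length_baseline(tests, budget):
        # Second baseline: select the `budget` tests with the longest roads.
        return sorted(tests, key=lambda t: t["road_length"], reverse=True)[:budget]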

5.2.2 REACH Experiment

The goal of this experiment is to investigate whether the usage of ML models allows for reducing the total test execution time. By reducing the total test execution costs, a testing pipeline can spend more testing time on more safety-critical test cases. The task in this experiment was to identify, as early as possible, ten unsafe tests while minimizing the number of executed test cases. To perform the various comparisons, for each experimented strategy, we collected the number of test cases required to reach ten unsafe cases as well as the cumulative cost (i.e., the execution time) to run all the executed test cases (i.e., until the tenth unsafe scenario was identified). Furthermore, we collected the execution time for both safe and unsafe test cases. The conjecture behind this analysis is that the testing cost spent on safe cases should be as limited as possible, whereas the test cost dedicated to unsafe cases is beneficial for identifying flaws of SDCs in virtual environments.
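A minimal sketch of this protocol is shown below, assuming a trained model exposing scikit-learn's predict_proba (class 1 denoting unsafe), a list of candidate tests carrying their pre-computed feature vectors, and a hypothetical execute(test) helper that runs the scenario in the simulator and returns the outcome together with its execution time.

    def reach(model, tests, execute, target_unsafe=10):
        # Execute the tests most likely to be unsafe first and stop as soon as
        # `target_unsafe` unsafe outcomes have been observed.
        ranked = sorted(tests,
                        key=lambda t: model.predict_proba([t["features"]])[0][1],
                        reverse=True)
        executed, unsafe_found, total_cost = 0, 0, 0.0
        for test in ranked:
            outcome, duration = execute(test)
            executed += 1
            total_cost += duration
            if outcome == "unsafe":
                unsafe_found += 1
            if unsafe_found >= target_unsafe:
                break
        return executed, total_cost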

Figures 9 and 10 provide an overview of the performance of the baseline compared to the Logistic model (the best-performing model in the previous experiments) across different test pool compositions. Table 11 summarizes the results of the REACH experiment. We observed that the Logistic model performed better across all test pool compositions. The test costs strictly depend on the number of tests that must be executed before identifying the minimum set of 10 unsafe tests. Although the difference in the number of required tests tends to be larger in the pool with fewer unsafe tests (between 171 and 98.5 tests in the 0.05/0.95 pool, and between 14 and 11 tests in the 0.7/0.3 pool), SDC-Scissor allows for reducing the test execution time dedicated to less critical tests when the test pool contains more unsafe tests. Figure 11 shows that in the pool with fewer unsafe tests, more execution time is dedicated to less critical tests: the execution time for these less critical tests is 85% higher for the baseline than for the Logistic model. In the larger pool, the Logistic model selects 80% unsafe tests, whereas the baselines select only 42.6% and 60%, respectively.

Fig. 9 Comparison of the logistic model with the baseline across the different test pools

Fig. 10 Time spent on the execution of safe tests, logistic model vs. baseline across different test pools

Table 11 Results of the REACH experiments comparing the logistic model and the baseline in various test pool compositions (safe/unsafe test ratio)
Fig. 11 Time spent on executing each safe and unsafe test case for different models in a) test pool (0.7/0.3) and b) test pool (0.05/0.95)

In Section 7, we discuss further results of RQ2, providing additional insights on this research question.

5.3 Real-Time Experiments (RQ2)

In this section, we present the results of the real-time experiments, where we compare the results of a pre-trained model and a real-time model with the baseline approach.

Baseline vs. Pre-trained and Adaptive Models

Figure 12 gives an overview of the results achieved by the compared approaches. We observe that the baseline executed the highest number of test cases (472), while the pre-trained model executed more test cases (405) than the real-time approach (378). Figure 12 also summarizes our main observations, which we elaborate on in the next paragraphs.

Fig. 12 Comparison of the metrics for the different real-time approaches in a 6-hour run: a) distribution of the generated test cases, b) distribution of the time spent across the different tasks

The pre-trained and real-time models apply machine learning-based test selection, which leads to numerous rejected (i.e., non-executed) test cases: the real-time and pre-trained approaches rejected 588 and 309 tests, respectively. The baseline uses 98% of the time to execute test cases, and only 2% to generate them. The pre-trained and real-time approaches use more time for test generation (6% and 11%, respectively). In addition to the longer test generation process, these two approaches allocate time for the prediction and evaluation of tests (4% pre-trained, 5% real-time), which the baseline does not need to perform. In contrast to the pre-trained approach, the real-time approach continuously retrains the machine learning model with the newly executed tests.
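To make the difference between the approaches concrete, the following is a minimal sketch of the real-time (adaptive) loop, assuming hypothetical generate_test, extract_features, and execute helpers standing in for SDC-Scissor's test generator, feature extraction, and simulator interface; the sketch retrains after every executed test, as described above, and omits the generation and prediction time that the real pipeline accounts for.

    from sklearn.linear_model import LogisticRegression

    def real_time_loop(time_budget_s, seed_features, seed_labels,
                       generate_test, extract_features, execute):
        # Adaptive strategy: generate tests, reject those predicted safe,
        # execute the rest, and retrain the model after every executed test.
        X, y = list(seed_features), list(seed_labels)   # initial dataset (e.g., 60 tests)
        model = LogisticRegression(max_iter=1000).fit(X, y)
        spent = 0.0
        while spent < time_budget_s:
            test = generate_test()
            features = extract_features(test)
            if model.predict([features])[0] == 0:       # predicted safe: reject, do not execute
                continue
            outcome, duration = execute(test)           # run the scenario in the simulator
            spent += duration
            X.append(features)
            y.append(1 if outcome == "unsafe" else 0)
            model = LogisticRegression(max_iter=1000).fit(X, y)  # continuous retraining
        return model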

Interestingly, although the baseline executes more test cases, both the pre-trained and real-time approaches found more unsafe test cases (baseline 195, pre-trained 265, real-time 256). The pre-trained model was able to find 35% more unsafe test cases while executing only 49% of the safe tests. In Fig. 12, we can observe that the baseline spends only 34% of the time running unsafe tests, while 64% of the test time is spent executing safe tests. In contrast, our proposed approaches dedicate more than 50% of the time to unsafe tests, which is positive since, in a testing environment, the goal is to find more errors in less time (in our case, exposing more weaknesses of the SDC).


Adaptive vs. Pre-trained Model

Figure 12 shows that the testing time allocation of the pre-trained and real-time models is similar, but the real-time model spends more time on test generation (11%) than the pre-trained one (6%). The pre-trained model is based on the previously generated dataset of 5,643 test cases (comprising 3,559 valid test descriptions, as described in Section 4), whereas the real-time model started by generating an initial dataset of 60 test cases, as described in Section 4. Table 12 shows that the pre-trained model achieved a higher accuracy (72.1%) than the real-time model (69%). The lower accuracy explains the higher number of test cases generated by the real-time model (962 generated tests vs. 714 for the pre-trained model). Although the pre-trained model has higher accuracy in general and a higher recall for unsafe tests, it found only 3.13% more unsafe tests than the real-time model.

Table 12 Comparison between pre-trained and real-time models

Training Costs: Pre-trained and Adaptive Models vs. Random Baseline

From a qualitative point of view, the training cost is essentially zero for the random baseline, while it is greater than zero for the pre-trained and adaptive models. It is important to mention that, for all results discussed in Section 5.3, and for both the adaptive and pre-trained models, we did not include the cost required for training the ML models on the training data. This choice was made because the cost of training the best ML model can be considered negligible compared to the cumulative cost of generating and executing all tests. Indeed, the average cost of training the Logistic Regression model (i.e., the best ML model) on 60 test cases is about 0.139 seconds, whereas the cost of training the same ML model on 5,643 tests (for the offline model) is about 0.685 seconds. However, since other ML models, or particular settings of the same ML model (e.g., different from its standard configuration), could incur considerably higher training costs, we discuss this topic in the threats to validity (Section 9).
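For completeness, such training costs can be measured with a sketch like the following, assuming the 60-test seed dataset (X_small, y_small) and the 5,643-test dataset (X_full, y_full) are already loaded; the measured values obviously depend on the hardware used.

    import time
    from sklearn.linear_model import LogisticRegression

    def training_time(X, y, repetitions=10):
        # Average wall-clock time needed to fit the Logistic Regression model.
        start = time.perf_counter()
        for _ in range(repetitions):
            LogisticRegression(max_iter=1000).fit(X, y)
        return (time.perf_counter() - start) / repetitions

    print(f"real-time seed dataset: {training_time(X_small, y_small):.3f} s")
    print(f"offline dataset:        {training_time(X_full, y_full):.3f} s")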

Training Dataset Preparation: Pre-trained and Adaptive Models vs. Random Baseline

It is important to report that the comparison of SDC-Scissor and the random baseline does not take into account the time (i.e., the cost) required for the training dataset preparation in the real-time experiments. From a qualitative point of view, the cost of preparing the training data is essentially zero for the random baseline (since no training is needed), while it is non-negligible for the pre-trained and adaptive models. The preparation of the training data includes (i) the time required for the design, implementation, and testing of the road-characteristic extraction in SDC-Scissor (i.e., one week of full-time work), and (ii) the cost of automatically extracting such features from all test cases (158 seconds). In total, this required us (i.e., the first author of this work) around one week of work. Hence, while both the pre-trained and adaptive models are more cost-effective than the random baseline when selecting test cases, the training data preparation represents a substantial cost to be sustained upfront, which becomes beneficial only over a long period of test execution time. In the context of regression testing, when a new update of a large component of the SDC software is developed, a well-prepared training dataset lowers the testing cost of that component.

5.4 Optimization Experiments (RQ3)

In RQ3, we investigate whether there is an actual upper bound on the ability of ML techniques to identify safe and unsafe SDC test cases when using static SDC features (i.e., features available before executing the tests). We performed a grid search for the Random Forest, J48, Gradient Boosting, Logistic, Naive Bayes, and Support Vector classifiers to identify the best hyper-parameters for each model. Table 13 summarizes the results of the grid search by reporting the F-score (F1) for safe and unsafe test cases as well as the averaged F-score.
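A minimal sketch of this optimization step is given below, assuming the road-feature matrix X and the safe/unsafe labels y are already loaded; the parameter grids are illustrative and not necessarily the ones explored for Table 13.

    from sklearn.model_selection import GridSearchCV
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC

    candidates = [
        (RandomForestClassifier(random_state=0),
         {"n_estimators": [50, 100, 200], "max_depth": [None, 5, 10]}),
        (LogisticRegression(max_iter=1000),
         {"C": [0.01, 0.1, 1, 10]}),
        (SVC(),
         {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}),
    ]

    for estimator, grid in candidates:
        # Exhaustively evaluate each parameter combination with 10-fold cross-validation.
        search = GridSearchCV(estimator, grid, scoring="f1_macro", cv=10)
        search.fit(X, y)
        print(type(estimator).__name__, search.best_params_, round(search.best_score_, 3))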

Table 13 Best ML model configurations after a Grid search

The two best models in terms of averaged F-score are the Gaussian Naive Bayes (F1 = 60.0%) and the J48 Decision Tree (F1 = 59.5%) classifiers. Although these two models have similar averaged F-scores, their per-class performance differs. For the unsafe tests, the J48 Decision Tree achieved an F-score of 70.3%, but for the safe tests it achieved only 42.6%. The Naive Bayes model, in contrast, achieved an F-score of 41.0% for the unsafe tests and 71.0% for the safe tests.

For the two best models according to their averaged F-score, we show the corresponding confusion matrices in Figs. 13 and 14. Furthermore, a detailed overview of their precision, recall, and F-scores for both classes is reported in Table 14.

Fig. 13 Confusion matrix for the Gaussian Naive Bayes model

Fig. 14 Confusion matrix for the J48 decision tree model

Table 14 Best ML models with recall, precision, and F-score

Both confusion matrices show a similar distribution. The models identify most of the truly unsafe test scenarios (1,677 and 1,650 cases, respectively), but for the safe tests they have a low true positive rate, with only 409 and 516 correctly predicted safe tests.


6 Integration of SDC-Scissor in the Industrial Use Case

6.1 Experiments Involving an Industrial Use Case (AICAS)

We investigate the extent to which SDC-Scissor can be integrated into the context of industrial organizations in the automotive domain, addressing one of the open questions in simulation-based testing of SDCs (Birchler et al. 2022, 2022c; Gambi et al. 2019; Abdessalem et al. 2018b). We identified the AICAS company as an ideal use case for this investigation. AICAS develops JamaicaCAR, an OSGi-based technology for the automotive sector, currently running in more than five million cars worldwide. A pressing challenge for AICAS concerns the need to combine simulation and HiL testing protocols to optimize testing costs. Specifically, AICAS aims to reduce testing costs by automatically generating inputs, i.e., signals, compatible with the Controller Area Network (CAN) Bus protocol (CIA 2017) in simulated environments.

Based on the trajectories computed by the planning module of the SDC, the control module typically takes charge of the longitudinal and lateral control of the vehicle and generates appropriate control commands (e.g., steering, acceleration, braking) that it sends to the related hardware components of the SDC via the CAN Bus (see Fig. 15).

Fig. 15 CAN bus in the context of an SDC

To allow validation of the described scenarios, AICAS provided us with devices under test (DuT) equipped to communicate via the CAN Bus. We connected the devices to the CAN bus and the CAN bus to a driving simulator that allowed us to generate the appropriate signals (see Fig. 16). The devices act as a validation context for the described automotive scenarios.

Fig. 16 AICAS's Jamaica EDP validation setup

There are several main advantages of integrating test cases generated by SDC-Scissor in the testing workflow of AICAS:

  • Increased level of test automation: Currently, AICAS inputs are manually generated or designed by testers and developers within the organization. Using an integrated framework such as SDC-Scissor enables the automatic generation of test cases, increasing the automation and the diversity of the generated SDC scenarios.

  • Increased level of realism: Most of the signals manually inserted into the CAN Bus protocol by AICAS testers and developers do not reflect a realistic set of driving signals (e.g., the provided acceleration and steering angle of the vehicle do not reflect a real driving scenario, which makes the inputs in most cases too random or unrealistic).

Integration Steps

To investigate the extent to which SDC-Scissor can be integrated into the context of AICAS, we extended SDC-Scissor with a CAN Bus code pipeline (see the full pipeline in Fig. 17), which automates the following steps:

  • SDC Test Case Generation and Storage (Steps 1-2): As visualized in Fig. 18, we first use SDC-Scissor to generate 3,559 SDC test cases (with BeamNG.AI and RF 1.5, i.e., moderate driving), execute them, and store the corresponding execution log in a JSON file (i.e., simulation.full.json, containing all information concerning the tests generated and executed by SDC-Scissor; see Fig. 18), which constitutes the dataset of our experiments.

  • SDC Test Data Conversion & Generation of CAN Playback Data (Steps 3-5): In this stage (visualized in Fig. 19), we convert the execution log in the JSON file (i.e., simulation.full.json, generated by SDC-Scissor) to CAN Playback Data (i.e., the file simulation.canplayback.*).

  • Transmission of CAN-based Signals (Step 6): The messages (i.e., the CAN Playback Data) generated in the previous step are then transmitted to the CAN Device according to defined timestamps, consistent with the ones generated by SDC-Scissor while executing the SDC test cases. Specifically, referring to the specified CAN database (i.e., <.dbc>), we converted SDC-Scissor test case data (i.e., <simulation.full.json>) to CAN messages (i.e., <simulation.canplayback.csv>). Using a specified CAN interface device, the logged CAN frames are played back to external CAN bus devices. This final step allows us to send realistic SDC signals concerning the driving scenarios (i.e., the SDC test cases generated by SDC-Scissor) to the CAN Device in an automated fashion.

Fig. 17 CAN Bus code pipeline integrated into SDC-Scissor

Fig. 18 SDC-Scissor's CAN bus code pipeline: SDC test case generation and storage

Fig. 19 SDC-Scissor's CAN bus code pipeline: SDC test data conversion and generation of CAN playback data

From a technological point of view, the definition and implementation of the pipeline in Fig. 17 required us to leverage the following libraries: (i) python-can, which allows controlling various CAN interface devices from the Python environment; and (ii) cantools, which supports encoding and decoding CAN messages against a CAN database (from the device to the simulator, and vice versa).
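To give a concrete, heavily simplified impression of Steps 3-6, the following sketch uses python-can and cantools to encode simulated driving signals from a simulation.full.json log into CAN frames and transmit them on a CAN interface; the CAN database, the message and signal names, and the log structure are illustrative assumptions, and timestamp-faithful playback is omitted.

    import json

    import can
    import cantools

    # Hypothetical CAN database and message/signal layout; the real .dbc used
    # with the AICAS devices defines its own message and signal names.
    db = cantools.database.load_file("vehicle.dbc")
    message = db.get_message_by_name("DrivingState")

    # Hypothetical CAN interface device (here, a SocketCAN channel).
    bus = can.Bus(interface="socketcan", channel="can0")

    with open("simulation.full.json") as f:
        log = json.load(f)

    for record in log["records"]:   # illustrative structure of the execution log
        # Encode the simulated driving signals into a CAN frame via the database.
        data = db.encode_message(message.name, {
            "Steering": record["steering"],
            "Throttle": record["throttle"],
            "Brake": record["brake"],
        })
        bus.send(can.Message(arbitration_id=message.frame_id,
                             data=data, is_extended_id=False))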

6.2 Industrial Use Case (AICAS): Integration Results

To investigate the extent to which SDC-Scissor can be integrated into the context of AICAS, we extended SDC-Scissor with the CAN Bus code pipeline described in Section 6.1 and shown in Fig. 17. The development and integration of this pipeline in the AICAS context required around five months of work, from the design of the pipeline to its implementation and integration, including the time needed to run all the experiments reported in this article (i.e., the generation of the test cases by SDC-Scissor, their execution, the analysis of the data, etc.).

Table 15 reports the details of the test cases generated by SDC-Scissor. Specifically, we generated around 3,600 test cases, which required a total execution time of 12 h, 17 m, and 11 s, with an average simulation time of 12.428 seconds per test case and a maximum observed simulation time of 21.4 seconds.

Table 15 Dataset summary

The most challenging steps of the integration of SDC-Scissor into the AICAS context were the SDC Test Data Conversion & Generation of CAN Playback Data (Steps 3-5, shown in Fig. 17) and the Transmission of CAN-based Signals (Step 6, shown in Fig. 17). The main aspect that made this task challenging was the need for signal conversion and mapping between SDC-Scissor's signals and the CAN Playback Data. As shown in Fig. 20, for each signal generated by SDC-Scissor, we had to generate a corresponding value mapped to the CAN Playback module.

Fig. 20 Mapping between SDC-Scissor's signals and CAN Playback Data

Based on the simulation-based signals generated by the implemented SDC-Scissor pipeline, we were able to generate appropriate control commands (e.g., steering, acceleration, brake) and send them to the related hardware components of the SDC via the CAN Bus. Table 16 reports the details of SDC-Scissor's integration process. Specifically, for all 3,600 generated test cases, which required a total execution time of 12 h, 17 m, and 11 s, SDC-Scissor required a total of 52.391 seconds to perform the automated signal conversion, mapping, and transmission of the CAN messages.

Table 16 Results of the Integration Process

As visualized in Fig. 21, SDC-Scissor requires 14.721 ms on average to translate simulation-based signals into CAN-compatible signals. In comparison, with the current manual signal generation process, AICAS developers and testers require on average 1-2 days to design and generate a sequence of CAN signals corresponding to 10-15 test cases generated by SDC-Scissor (according to the qualitative assessment of our main contacts within AICAS). In addition to the test automation that SDC-Scissor enables in the AICAS context, the generation of a more realistic sequence of SDC signals (corresponding to the signals of a realistic SDC driving in a virtual test case) is vital for identifying safety-critical scenarios to be executed and tested via the CAN Bus protocol.

Fig. 21 Performance of conversion and transmission time

7 Discussion

This section discusses additional factors that can influence the results of the various research questions, providing further insights and findings about them. It also provides a concrete discussion of directions for future research in the field.

7.1 Discussion of Experiments Using Road Characteristics as Input Features to the ML Models

As observed in the experiments conducted for RQ1, SDC-Scissor is able to classify safe and unsafe test cases in both the Driver.AI and the BeamNG.AI datasets, with the Logistic and Random Forest models achieving the most reliable results in terms of F-score values for both labels. Moreover, we also observed that the road characteristics extracted by SDC-Scissor contribute differently to identifying safe and unsafe test cases. The characteristics concerning the pivot radius (min, mean, std, median), the sum of the turn angles, the number of left and right turns, and the total length of the road are among the most important features, all of which belong to the set of road features.

In the context of RQ1, other factors can impact the results of SDC-Scissor, such as (i) the risk factor (RF) of the SDCs and (ii) the ability of the ML models to transfer knowledge from one driving agent to another (i.e., between the BeamNG.AI RF 1.5 dataset and the Driver.AI dataset). Finally, we complemented the previous Offline Experiments, which focus on applying SDC-Scissor to regression test case selection, with Real-Time Experiments in which we studied the application of SDC-Scissor to automated test generation.

7.2 Further Remarks and Future Directions

This work can have relevant implications for developers and researchers. Hence, this final discussion offers further remarks on the results of all research questions, with a specific focus on future directions stemming from RQ3 and RQ4 for developers and researchers.

Concerning developers, the designed tool allows identifying specific problems that need to be carefully monitored in simulation environments at testing time. These include, for instance, the need to cope with testing multiple hardware versions and with diversified test inputs to verify correctness under realistic conditions. It is also of paramount importance to be able to generate inputs that lead to different safety-critical situations in a safe manner (i.e., without harming humans). SDC-Scissor allows generating and identifying test cases that can cause the SDC to fail according to different safety criteria (in the context of this work, we focus on the lane-keeping feature as the main safety criterion, but further criteria can be easily integrated and tested).

The integration of SDC-Scissor into the AICAS use case demonstrates that the proposed approach can automate the testing process of a large automotive company, coping with the need to complement its hardware-based testing (based on the CAN Bus protocol) with simulation-based testing automation. Specifically, SDC-Scissor addresses two pressing challenges of AICAS: (i) the need for an increased level of test automation (currently, AICAS inputs are manually generated or designed by its testers and developers), with test cases automatically generated to increase the diversity of the SDC scenarios; and (ii) the need for an increased level of realism, since most of the signals manually inserted into the CAN Bus protocol by AICAS testers and developers do not reflect a realistic set of driving signals (e.g., the provided acceleration and steering angle of the vehicle do not reflect a real driving scenario, which makes the inputs in most cases too random or unrealistic).

To enable the detection and fixing of SDC bugs during the evolution of SDCs, developers can focus on configuring SDC-Scissor to test different combinations of simulators and AI agents in diversified test cases, in order to identify faults in the AI engine and in the connected hardware of the system. Of course, we expect that test cases for assessing and detecting SDC bugs can vary between different organizations. To perform such new experiments, SDC-Scissor can be used to generate new test cases with an increased level of realism, for instance by including obstacles in the generated tests, so as to observe both the behavior of the SDCs and the ability of SDC-Scissor to identify safe and unsafe test cases in more articulated scenarios.

From the discussion of the results of RQ3, we identified an upper bound on the extent to which static SDC features (i.e., features available before executing the tests) can be used to predict SDC testing outcomes. This represents a relevant topic for both developers and researchers to investigate in the future. On the one hand, we may argue that novel static SDC features need to be designed to achieve better results (in terms of precision, recall, and F-score). On the other hand, we also observed in RQ3 that using different SDC features and hyper-parameter optimization strategies does not lead to drastically better results. Given the complexity of the simulation environment and its simulated physics, we argue that, to cope with the upper bound of static SDC features, better results can be achieved by combining static metrics with runtime SDC metrics (i.e., metrics available during the execution of SDC tests). The rationale behind this implication is that only limited information is available before execution to determine whether SDC test cases will fail, and achieving better results requires metrics that become available during test execution. For instance, one could consider using the average distance, speed, and steering angle in the proximity of an SDC failure (namely, a crash or a violation of the safety criterion, such as the lane-keeping feature).

Concerning researchers, this work triggers activities towards better testing and analysis of SDCs. First and foremost, given the identified safe and unsafe test cases, it can be used to derive higher-order, SDC-specific mutation operators (Jia and Harman 2009). For example, integrating obstacles and different fault detection strategies related to other safety criteria (different from the lane-keeping feature) during the execution of test cases could lead to mutants that change the test case outcome towards more faulty SDC behaviors. More complicated would be dealing with runtime adjustments of SDC test cases, which may need to be instantiated by perturbing the SDC behavior during the testing process.

The work could also foster the development of SDC-specific static analysis tools that look for recurring problems observed in failing test cases. Complementary empirical research could investigate the difficulty (e.g., duration) of fixing SDC-specific bugs and develop tools that guide developers in allocating the appropriate development effort to various types of SDC bugs. In the context of SDCs, the usage of SDC-Scissor can help researchers (and developers) gain a deeper knowledge of SDC bugs and their root causes, which is facilitated by their high reproducibility. Specifically, being able to reproduce a bug is crucial during bug triaging and debugging tasks but not always possible in field testing (Bettenburg et al. 2007; Huang et al. 2013; Zimmermann et al. 2010; Panichella 2015).

Fixing or addressing SDC-specific bugs and automatically assessing the correctness of the SDC behavior represent a critical challenge for developers and researchers. Hence, future studies should look at further safety-related bugs due to the uncertainty of SDC behavior, concerning, for instance, the effect of different SDC initializations on the SDC test case outcomes. During our experiments, we also noticed non-deterministic behavior of the test outcomes, also known as flaky tests. Concretely, depending on the definition of a failing test for SDC-Scissor, we observed 1% to 5% flaky test cases, which we discarded when creating our dataset. Future research should address the concern of flaky tests in virtual environments, since they lower the reliability of simulation-based tests of safety-critical systems such as SDCs.

Finally, SDC developers heavily rely on different experts (who need to have both software and hardware knowledge) to assess the correctness of SDC test outcomes. As the judgment of the experts highly depends on their experience and domain knowledge, such human oracles may be unreliable or subjective. This human-based assessment can be supported by reproducible SDC regression testing frameworks, such as SDC-Scissor, to mitigate the effect of subjective assessments of the correctness of SDC test outcomes.

8 Related Work

SDC-Scissor improves CPS testing cost-effectiveness by identifying and discarding likely irrelevant (i.e., safe) tests. Therefore, SDC-Scissor's main application areas are (automated) test generation and regression test selection. Specifically, SDC-Scissor employs Machine Learning models to classify tests as safe or unsafe before their execution. Research has yielded many approaches to reduce testing efforts (Elberzhager et al. 2012; Zhang et al. 2020). These approaches can be classified into the following categories: test case selection (Chen and Lau 1996), test suite reduction, test case minimization (Rothermel et al. 1998), and test case prioritization (Rothermel et al. 1999). Test case selection identifies subsets of the available tests relevant (or necessary) for testing a given change in the code; test suite reduction removes redundant test cases from existing test suites, thus leading to smaller test suites that execute faster; test case minimization removes irrelevant statements from the tests, reducing their size; finally, test case prioritization approaches rank test cases by their likelihood of detecting faults, so that their execution can lead to finding faults sooner.

Most of the available approaches focus on regression testing and do not employ Machine Learning (Yoo and Harman 2012). Only recently (Pan et al. 2022) have we observed a positive increase in the number of proposed approaches that rely on ML to select and prioritize test cases; however, those approaches focus mostly on traditional software systems (e.g., Roper (2019)), and the problem of reducing the testing effort for Cyber-Physical Systems remains open (Sadri-Moshkenani et al. 2022). In particular, compared to traditional software systems, CPSs face additional challenges due to their continuous interaction with the environment and the tight coupling between the hardware and software components comprising them. Hence, standard testing approaches are ineffective, inefficient, or inapplicable (Briand et al. 2016).

Testing of CPSs typically follows the X-in-the-loop paradigm (Matinnejad et al. 2013), which involves a great deal of simulation and takes the form of model-in-the-loop (MiL), software-in-the-loop (SiL), and hardware-in-the-loop (HiL) testing, depending on the level of abstraction adopted to represent the CPS's software and hardware components and the relevant environmental elements. Considering the specific requirements of X-in-the-loop testing, researchers have proposed various optimization techniques tailored to CPSs. We discuss the most relevant examples in the following and point interested readers to the survey by Sadri-Moshkenani et al. (2022) for a more detailed discussion.

Effective CPS testing requires the generation of test cases that effectively stress the system under test to systematically find critical and challenging scenarios (Gambi et al. 2019). However, many of the proposed approaches (e.g., Panichella et al. (2021), Gambi et al. (2022), Gambi et al. (2019), and Li et al. (2020)) rely on randomization to generate tests and require the execution of all the generated tests. As we showed in our evaluation, without proper support (e.g., SDC-Scissor), those approaches struggle to efficiently identify relevant scenarios. Abdessalem and co-authors, instead, augmented the traditional evolutionary search algorithms commonly used for automated test generation with Machine Learning models to improve the cost-effectiveness of CPS testing, evaluating their approaches on SDC collision avoidance. Specifically, Abdessalem et al. (2016) used Artificial Neural Networks to predict test cases' fitness without executing them. By doing so, they could avoid the lengthy execution of test cases that might not contribute much towards achieving the testing goals (i.e., finding problems in the system under test). More recently, Abdessalem et al. (2018a) employed a Decision Tree to guide test generation: during the test generation, they train a Decision Tree that identifies regions of the test input space that likely lead to critical test cases. Compared to Abdessalem et al.'s work, we adopt a similar approach but investigate the use of different Machine Learning models to classify tests as safe or unsafe. Additionally, we apply SDC-Scissor to a different problem, i.e., testing the SDC lane-keeping system.

In traditional settings, test selection and prioritization are performed by computing test similarity or test adequacy (i.e., code coverage). However, given the complexity of test inputs for CPSs (e.g., simulated environments), computing those metrics is technically challenging. Consequently, new similarity metrics and procedures to compute them have been proposed. For instance, Arrieta et al. (2016, 2018a) proposed to measure the similarity between test cases based on the so-called signal values of all the states of the simulation-based test cases. Moreover, traditional test adequacy metrics may not be adequate for CPSs based on Artificial Intelligence and Deep Learning. Because of this, current research efforts focus on identifying domain-specific heuristics to select test cases. For instance, Arrieta et al. (2018b) and Shin et al. (2018) proposed to select test cases based on high-level objectives such as requirement coverage, the risk of damaging CPS hardware components, and test execution times.

Compared to those studies, we investigate a different CPS domain and different test selection objectives.

Regarding test selection objectives, we focus on improving the cost-effectiveness of simulation-based tests to assess safety requirements. In contrast, previous studies prioritized the execution of tests based on their fault-detection capability (Arrieta et al. 2019) or selected tests based on signal diversity (Arrieta et al. 2016, 2018a, 2018b), which requires test execution. Since executing simulation-based tests in the SDC domain is prohibitively expensive, we face the challenge of selecting test cases before their execution. Consequently, our techniques consider only the initial state of the car and the road features (e.g., geometry, lane markings), as those features are available without executing the tests in the simulator.

9 Threats to Validity

Threats to internal validity may concern, as for previous work (Gambi et al. 2019; Birchler et al. 2022, 2022c), the cause-effect relationships between the technologies used to generate the scenarios and their elements and the corresponding results, which strictly depend on the realism of our scenarios. Indeed, we did not recreate all the elements that can be found on real roads (e.g., weather conditions). However, to increase our internal validity, we used both BeamNG.AI and Driver.AI as test subjects. They both leverage a good knowledge of the roads, which means that they do not suffer from the limitations of vision-based lane-keeping systems. For future work, we plan to leverage the new BeamNG features, which allow experimenting with test cases involving traffic lights as well as other cars and static objects. Moreover, we plan to experiment with consecutive versions of BeamNG.AI and Driver.AI (when they become available), so that it is possible to investigate the potential fault-detection capability of both of them. Currently, this is not possible since neither BeamNG.AI nor Driver.AI has previous versions of its driving agent. Furthermore, since testing involves the underlying assumption that there will be no malicious attack on the system, future work should be conducted on more cautious driving AIs, with the goal of detecting unsafe scenarios with a lower risk factor. A reckless driving style can be considered malicious behavior, which is, to a certain extent, provoked by the RF 2 configuration.

The current implementation of the diversity feature does not take into account the actual length of the road. Theoretically, a short road can have a higher diversity than a longer one, which contradicts the assumption that a long road is generally less safe since there is more space in which the vehicle can reach an unsafe state.

The performance of the ML techniques used in our experiments may depend on the setting of their hyper-parameters. We initially leveraged their default settings, knowing that the obtained results could represent a lower bound for the classification performance. Then, we experimented with grid search as a hyper-parameter optimization approach (RQ3) to investigate potential optimal combinations of parameters for the selected ML models. Finally, threats to external validity concern the generalization of our findings. Although (i) the number of test cases used in our study is relatively large compared to previous studies (Gambi et al. 2019), and (ii) we experimented with different AI engines (i.e., BeamNG.AI and Driver.AI) and integrated SDC-Scissor into the development context of the AICAS use case (demonstrating that the proposed tool can automate the testing process of a large automotive company), we cannot claim that our results generalize to the universe of open-source CPS simulation environments in other domains. Therefore, further replications are desirable, and so are further studies considering more data as well as other CPS domains.

As discussed in Section 7, for all results in Section 5.3 and for both the adaptive and pre-trained models, we did not include the cost required for training the ML models on the training data. This choice was made since the cost of training the best ML model can be considered negligible compared to the cumulative cost of generating and executing all tests. However, this could be a threat to the external validity of our results, since other ML models, or particular settings of the same ML model (e.g., different from its standard configuration), could incur considerably higher training costs. Another threat could be related to the evaluation metrics used in our study (precision, recall, and F-score), which could provide biased performance measures. Hence, for future work, we plan to leverage additional metrics such as the Matthews Correlation Coefficient (MCC), which is reported to be a well-known measure for unbiased performance assessment. To minimize potential threats to external validity, in conducting our experimental evaluation we followed established experimental guidelines. In addition, we considered an additional baseline approach that selects test cases by ordering the tests to be executed according to their road length (in decreasing order).

10 Conclusions and Future Work

Regression testing for SDCs is particularly challenging due to the cost of running many driving scenarios in simulation. To improve the cost-effectiveness of regression testing, we introduced a test case selection approach, called SDC-Scissor, that relies on a set of SDC road features extracted from driving scenarios prior to running the tests in the context of the BeamNG SDC simulation environment. Then, SDC-Scissor uses ML approaches to select the test cases having a higher likelihood of experiencing unsafe situations.

We empirically investigated the performance of SDC-Scissor and compared it with baseline approaches (RQ1). Our assessment shows that SDC-Scissor successfully selects test cases independently of the AI engine used and of the risk level (i.e., the driving style), with the Logistic model providing the most stable results. Interestingly, our results also show that knowledge is not transferable from one AI engine to another: SDC-Scissor performed worse when ML models were trained on data from one AI engine and tested on data from a different one.

Our findings also suggest that SDC-Scissor can reduce the number of executed tests required to find at least 10 unsafe tests (RQ2). Specifically, SDC-Scissor outperformed the baseline across all test pools; it selected unsafe cases using the Logistic model with an accuracy of 70%, a precision of 65%, and a recall of 80%. In terms of running time, we observed that SDC-Scissor is able to select test scenarios in a cost-effective manner compared to two baseline approaches (RQ2). We experimented with grid search as a hyper-parameter optimization approach to investigate potential optimal combinations of parameters for the selected ML models (RQ3). Our results show that there is an upper bound corresponding to an average F-score of 60%, achieved with the J48 and Naive Bayes classifiers. Complementarily, compared to previous studies, we integrated SDC-Scissor into the development context of the AICAS use case, demonstrating that the proposed tool can automate the testing process of a large automotive company.

As future work, we plan to replicate our study on further SDC datasets, AI engines, and SDC features. Moreover, we plan to perform new empirical studies on further CPS domains to investigate how SDC-Scissor performs when safety criteria concern new types of safety-critical faults different from those investigated in this study. Finally, we want to investigate different meta-heuristics and multi-objective approaches (Canfora et al. 2013, 2015) to enable test case generation based on the designed feature sets.