1 Introduction and literature review

Transport safety and security are sensitive issues that affect all transport users and providers. Self-driving cars could increase safety, prevent deaths, and improve mobility for lower-income groups. However, any discussion of self-driving cars must address security issues, particularly cybersecurity. Cyberattacks targeting in-vehicle networks can pose a serious threat to passenger safety.

Any investigation of automotive cybersecurity has to consider vehicular ad hoc networks (VANETs). In-vehicle and inter-vehicle communication networks are clearly essential for implementing highly automated and connected transportation systems. Accordingly, Sharma and Kaul (2018) summarize the different intrusion detection methods, focusing especially on VANETs. In the last part of their paper, they benchmark the different models and present the pros and cons of each.

One of the best-known recent cybersecurity issues of automotive communication technologies is the BlueBorne vulnerability. However, other important aspects of Bluetooth technology also need to be analyzed. Cheah et al. (2017) performed a comprehensive evaluation of the automotive Bluetooth interface. They proposed a model environment for analyzing the Bluetooth interfaces of road vehicles in detail and developed a specific application to demonstrate the applicability of the model.

In general, modeling cyberattacks and defense-related processes has always been an important issue in the field of security. Accordingly, Ten et al. (2010) performed an overall investigation of a high-level control security environment, including real-time analysis, intrusion detection, effect analysis, and defense concepts.

Beyond the general cybersecurity considerations of risk evaluation methods and defense strategy identification, the numerous specific features of the automotive sector make the vulnerability of vehicular systems difficult to represent with models from other disciplines. Therefore, Ward et al. (2013) described how a functional-safety-based cybersecurity method can be developed from the approach described in ISO 26262 and the related MISRA Safety Analysis Guidelines. Schoitsch et al. (2015) and others have also emphasized the need for a combined approach that integrates safety and cybersecurity in the automotive industry. Moreover, other studies have drawn attention to the relationship between product life cycle and security, focusing especially on over-the-air update possibilities in the automotive industry (Boehner 2019). In addition, the latest solutions need to be adapted to the automotive security sector. Accordingly, Fraga-Lamas and Fernández-Caramés (2019) introduced the application possibilities of blockchain technologies in a road transport environment. Other studies have emphasized the importance of the most up-to-date results in automotive cybersecurity regarding network architectures, common protocols, and novel design concepts (Lo Bello et al. 2019).

In recent years, the number of attacks on in-vehicle networks has started to grow rapidly. The in-vehicle network is a system for transferring data among the different electronic control units (ECUs) of the vehicle via a serial data bus. In other words, in-vehicle networks facilitate information sharing among distributed applications (Leen and Heffernan 2002). Applying a serial data bus reduces the amount of wiring, which lowers the weight of the vehicle and enhances its performance. As today's vehicles contain more than 70 ECUs (Henniger et al. 2009), effective communication among them is required to reach their full functionality. As vehicles adopt more and more technological applications and implement connectivity functions to the external world, threats to electronic functionalities are rising sharply.

Telematics systems of modern cars provide value-added features such as remote diagnostics, crash response, and stolen car recovery over a long-range wireless link. These telematics systems connect in-vehicle networks with external communication centers via remote endpoint connectors. Furthermore, the introduction of vehicle-to-vehicle and vehicle-to-infrastructure communication in future autonomous cars (Maile and Delgrossi 2009; Ahmed-Zaid et al. 2011) will broaden the possible attack surface.

Beyond performing security tests and improving the efficiency of the defense system, security can already be considered during the design process. Tsigkanos et al. (2015) introduced a novel approach that allows security software engineers to represent security requirements and the topology of the operational environment at the same time. A similar approach was followed by Mirkovic (2010) and others, who developed adequate facilities, tools, and processes to provide resources for cybersecurity research while also emphasizing the importance of network topology. The importance of in-vehicle communication network topologies has already been investigated (Miller and Valasek 2014; Lin and Sangiovanni-Vincentelli 2012); however, a detailed and comprehensive analysis of the different topologies has not been performed in the automotive industry so far.

Generally, attack possibilities are strongly influenced by the different remote endpoints, the topology of the in-vehicle networks, and the applied safety programs. Lee and Kim (2011) showed that the topology structure has a direct impact on the robustness of a network under attack. Hegde et al. (2013) used simulation software (CANoe) to study and compare the effect of different in-vehicle network topologies on network performance. Furthermore, researchers have demonstrated how vulnerabilities in remote endpoints can be exploited to access in-vehicle networks, penetrate ECUs, and then control or sabotage vehicle maneuvers (Checkoway and McCoy 2011; Miller and Valasek 2013).

The growing demand for new functionality in modern vehicles, together with the cyberattacks that may accompany it, defines a complex evaluation environment that cannot be handled with traditional techniques. Rather than focusing on individual attacks, this paper provides a comprehensive analysis of a relatively large sample of in-vehicle networks. The analysis is based on the development of a complex statistical methodology for identifying, classifying, modeling, and evaluating in-vehicle network topologies with regard to potential cyber incidents.

2 Data description

In this study, we aimed to test the degree of exposure of vehicular network topologies to cyberattacks based on data from 114 vehicles. The topology database was compiled from multiple sources. The most helpful source was the VAG group’s Self Study Program manuals (V. Group 2019), which give a detailed description of the vehicles' networks. Service manuals and wiring diagrams from Chevrolet (Chevrolet 2009), Mercedes (Mercedes 2019), Mitsubishi (Minato 2019), and Toyota (Toyota 2019) were also used. The rest of the material was collected from various training, research, and presentation materials (Suzuki 2012; Köhl et al. 2003; Burns 1996) available online or downloaded from official databases.

The data included information about vehicle characteristics (i.e., manufacturer, model, year of production, manufacturer’s suggested retail price (MSRP), engine type and capacity), some relevant details of the internal network (i.e., the number of segmented network domains, ECUs, and remote endpoints), and processed information on network topology security (i.e., accesses protected by gateways). The MSRP, which represents the original value of the vehicle on release, was collected from online sources (i.e., Autotrader.com and Kbb.com). A significant pillar of the developed evaluation framework is the derived vulnerability value listed in Appendix 1, which gives a comprehensive characterization of vulnerability based on some of the relevant variables used in this study. It should be emphasized that collecting such data was a considerable challenge that required substantial effort, resources, and time.

3 Methodology

Communication among the electronic components of the vehicle is essential for providing its main functions. Different possible combinations of network types, ECUs, sensors, and other electronic devices within a vehicle can result in different network topologies. The topology of the internal vehicle networks usually varies by manufacturer (Miller and Valasek 2014) and is designed to balance efficiency, convenience, and cost using modern technological solutions. Due to the continuous development of vehicle technologies and functionalities, automotive network architecture is becoming more complex and more susceptible to new ways of penetration by attackers (András et al. 2015).

This study aims to provide a new approach for describing and measuring the vulnerability of in-vehicle networks to malicious interventions. The methodology targets a relatively large sample (i.e., one for which analyzing every object individually is impractical) by proposing a new framework of statistical techniques for measuring, classifying, and modeling in-vehicle networks according to their vulnerability.

The developed methodology resulted from a research process that considered the existing cybersecurity standards applied in the automotive industry and in the IT sector. We took a closer look at the SAE recommendations (SAE J 2016; Szalay et al. 2017) for Cybersecurity Best Practices for Modern Vehicles, which point out that the methods of the Critical Security Controls for Effective Cyber Defense (CIS CSC) can be directly applied in numerous industrial segments. However, in the automotive sector, the proposed approaches should be adapted to the specific characteristics of the sector.

As a first step, we investigated the layered approach recommended by the Cybersecurity Framework of the National Institute of Standards and Technology. The layered approach describes the key points of designing and building a comprehensive and systematic cybersecurity protection system for vehicles. Our classification model has numerous connection points with this layered approach. The technique of layered cybersecurity protection applies a risk-based identification and protection process focusing on the safety-critical modules of the vehicle. Since the external availability and protection level of safety-critical elements is a central part of our model, the output of the risk-based identification process is critical for our evaluation method.

A closer study of the CIS CSC methodology shows that the cybersecurity gap assessment process should also be an important part of a comprehensive security analysis. Beyond the operative assessment, the applied methods should also be documented in detail. This documentation consists of (SAE J 2016; Szalay et al. 2017):

  • A risk assessment,

  • The penetration test results,

  • And the organizational decisions.

Our methodology can contribute to the three main requirements for performing and documenting a risk assessment process. The SAE CBPMBV describes these requirements as follows.

A risk assessment study has to include at least an evaluation of the internal vehicle networks, the external wireless networks, and all possible interfaces that can be used to reach an ECU externally. Our method considers this characteristic of the topology by identifying the critical ECUs and the unprotected ECUs (SAE J 2016; Szalay et al. 2017).

The external and internal availability of vehicle ECUs should be restricted to the most important functions of the given vehicle module, and the connection points used have to be secured to prevent access by malicious perpetrators. In accordance with this recommendation, we considered these possible attack surfaces as the remote endpoints of the network. The numbers of remote endpoints and of unprotected remote endpoints were included in the rating and documentation process.

Logical and physical segmentation methods should be applied to isolate controllers, safety-critical modules, and network domains from external connection points to prevent unauthorized access. Following this recommendation, we also considered network segmentation a vital security feature. The number of segmented network domains and the number of unprotected network domains formed the third factor of our evaluation process.

Accordingly, in light of the investigated standard automotive cybersecurity methods, the proposed cybersecurity evaluation model meets the expectations of the current recommendations of the Society of Automotive Engineers.

3.1 In-vehicle network vulnerability

Modern vehicles can contain more than 70 ECUs (Hegde et al. 2013). Each ECU is responsible for one or more specific vehicle functions (SAE 2016; Szalay et al. 2017). For full functionality, the ECUs need to communicate with each other and with the outside world (Zöldy and Zsombók 2019). Unprotected ECUs might tempt attackers and consequently pose a threat to vehicles and their passengers. The threat level is generally influenced by the characteristics of the vehicle network topology and its external endpoints.

Safety-critical attacks against today's vehicles require two general stages. The first stage is the injection of malicious data from outside through either remote or onboard access endpoints. In most cases, remote attack possibilities pose an even greater risk to vehicle safety (Valasek and Miller 2015), since sending malicious code via a wireless, internet, or Bluetooth connection can be more convenient for attackers. The second stage requires gaining control over some ECUs or sabotaging their functionality. The degree of vulnerability is strongly influenced by the function of the hacked ECUs. For example, some ECUs are responsible for passenger convenience, such as switching the radio or fastening the seatbelt. These ECUs are installed within a convenience or infotainment internal network. In contrast, other ECUs are directly responsible for various physical aspects of the vehicle, such as monitoring the steering wheel angle or controlling the ABS, and can be found within a powertrain network domain.

Cyberattacks can pose a serious threat to passenger safety. Vehicle manufacturers aim to monitor and protect all possible internal and external accesses in the designed network to reduce the possible ways of malicious intervention. Access points are frequently protected by gateway technology. However, a gateway cannot be installed at every access point because of its cost and the complexity of the network architectures. This leaves significant security gaps that perpetrators can use to attack in-vehicle networks.

We expect the developed analytical framework to reveal the relevance of three critical aspects that might directly influence vehicle security and safety: the security effect of remote endpoints, the impact of segmented network domains, and the location of critical ECUs. Therefore, the methodology for measuring the vulnerability of in-vehicle networks is based on the ratio of unprotected components for each of those three aspects (i.e., remote endpoints, network domains, and ECUs), computed separately for every investigated network topology. Critical ECUs were selected based on their Automotive Safety Integrity Level (ASIL). The ASIL framework is a key component of the safety standard ISO 26262, since it can be used to measure the risk of a specific system component. The more complex the system, the higher the risk of systematic failures and random hardware failures. There are four ASIL values, named A to D. ASIL A represents the lowest level of risk and ASIL D the highest, so ASIL D has stricter compliance requirements than ASIL A. Accordingly, in the case of critical ECUs, we focused on components characterized by ASIL D.
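As a compact restatement of the ratio-based measure described above, the vulnerability of a single topology can be written as a sum of three ratios. The symbols are notation introduced here for illustration only and do not appear in the original study; the unweighted sum reflects the equal weighting that the study assumes when it reports its results.

$$V = \frac{U_{\text{RE}}}{N_{\text{RE}}} + \frac{U_{\text{ND}}}{N_{\text{ND}}} + \frac{U_{\text{ECU}}}{N_{\text{ECU}}}$$

Here \(U\) denotes the number of unprotected components and \(N\) the total number of components for remote endpoints (RE), segmented network domains (ND), and critical ECUs (ECU), respectively.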

3.2 Hierarchical clustering analysis

Regarding the security aspects of in-vehicle network topology, the previous literature has concentrated on small samples and has analyzed every vehicle individually. With a larger sample, dealing with every single vehicle individually is impractical, so the data should be treated as a group. When dealing with a large dataset, data mining techniques are typically used to extract useful information and patterns from the data and convert them into an understandable structure.

Cluster analysis is a well-known data mining technique that aims to classify data into homogeneous groups so that they can be easily understood and used in further operations. The selection of the best clustering method depends on the objectives and the field of study, and there is no such thing as a universally “best” clustering method (Hennig and Liao 2013; Luxburg et al. 2012).

Clustering is a qualitative method that aims to classify a set of data, regardless of size, into small homogeneous groups. However, hierarchical clustering is claimed to perform best for relatively small samples. According to Abbas (2008) and Verma et al. (2012), partitioning algorithms (such as k-means and expectation–maximization) are recommended for huge datasets, while hierarchical algorithms show good results when used with small datasets. Verma et al. (2012) also found that density-based clustering methods (e.g., DBSCAN and OPTICS) do not perform well on small datasets.

Based on the discussion above, hierarchical clustering seems to fit the small dataset of this study best. Hierarchical clustering is a popular clustering method that is generally based on constructing a hierarchy of clusters by recursively partitioning the data according to some given similarity measure (Everitt et al. 2011). Since clustering is the grouping of similar objects, a similarity measure has to be applied to determine whether two objects are similar or dissimilar. Following the objective of this paper, it seems reasonable to classify in-vehicle network data based on the similarity of their vulnerability levels. This makes it possible to identify and compare the most susceptible in-vehicle network topologies from a security point of view.

The hierarchical process can follow either a bottom-up or a top-down approach. In the bottom-up method, each object initially represents a cluster on its own. Then, fewer clusters are created by merging similar objects. This method is called agglomerative hierarchical clustering. The inverse process is top-down, or divisive, hierarchical clustering. The result of hierarchical clustering is a dendrogram, which presents the nested groups of clusters at different similarity levels. The desired cluster structure of the data is obtained by cutting the dendrogram at a suitable level.

The process and result of classifying vehicles by their vulnerability values with agglomerative hierarchical clustering can be summarized with the dendrogram in Fig. 1 as follows. Every vehicle has a single vulnerability value (e.g., v1, v2, v3, v4, v5). The network vulnerability value of each vehicle (object) initially represents a cluster by itself (Fig. 1a). Then the closest clusters are successively merged into fewer groups with more members until the desired cluster structure is obtained (Fig. 1b–d, respectively). The height in the dendrogram (Fig. 1) represents the distance at which each fusion is made. The desired cluster structure of the data is obtained by cutting the dendrogram at a suitable level. For example, cutting the hierarchy at the second level, as shown in Fig. 1, results in three groups: (v1, v2), (v3), and (v4, v5).

Fig. 1 Hierarchical dendrogram. The height represents the smallest distance of point–point, point–group or group–group
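The agglomerative procedure illustrated in Fig. 1 can be sketched in a few lines of Python. This is a minimal illustration, not the SPSS procedure used in the study: it assumes single linkage (the smallest point-to-point distance, following the figure caption), one-dimensional vulnerability values, and hypothetical sample values; the function names are introduced here for illustration.

```python
from itertools import combinations


def single_linkage(a, b):
    """Smallest point-to-point distance between two clusters of 1-D values."""
    return min(abs(x - y) for x in a for y in b)


def agglomerate(values, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Returns the final clusters and the list of (height, merged_cluster)
    fusions, i.e. the information a dendrogram would display.
    """
    if not 1 <= n_clusters <= len(values):
        raise ValueError("n_clusters must be between 1 and len(values)")
    clusters = [[v] for v in values]  # every object starts as its own cluster
    merges = []
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: single_linkage(clusters[p[0]], clusters[p[1]]),
        )
        height = single_linkage(clusters[i], clusters[j])
        merged = sorted(clusters[i] + clusters[j])
        merges.append((height, merged))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(clusters), merges


# Hypothetical vulnerability values standing in for v1..v5.
values = [0.0, 0.2, 1.5, 3.0, 3.1]
clusters, merges = agglomerate(values, n_clusters=3)
print(clusters)
```

With these hypothetical values, stopping at three clusters reproduces the grouping pattern of the example above: the first two values together, the third alone, and the last two together.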

However, choosing the right clustering level, or the appropriate number of clusters, is a crucial aspect of this technique. The optimal number of clusters is often selected based on previous experience, depending on the type of study, and the selection is usually supported by statistical techniques. To determine the optimal number of clusters statistically, the Elbow method can be applied. According to this approach, clusters are merged until the marginal increase in the variance of the cluster elements would exceed an acceptable level (Ketchen and Shook 2002).

The hierarchical method is very attractive for this study since it can be more efficient for a relatively small sample (Baker and Hubert 1975) than other clustering methods (e.g., partitioning methods).

3.3 Ordinal logistic regression

We developed a generalized ordered logistic regression model to understand the variability of in-vehicle network vulnerability patterns and to investigate their main contributing variables. Like all regression analyses, ordered logistic regression is a predictive analysis used to describe data and to explain the relationship between one dependent variable and one or more explanatory variables (Quinn et al. 2001). What distinguishes ordered logistic regression from linear regression is the response variable: ordered logistic regression allows an ordered set of categories to define the dependent variable. Such a model fits the objective of the current analysis well, since it allows us to reveal in-vehicle network security characteristics for each vulnerability level separately.

The ordinal model provides a cumulative probability representation; the probabilities of all possible outcomes sum to 1. The number of categories is C, and c denotes the index of the investigated category. The probability of being in the c-th category is πc (for every c = 1, 2, …, C). Accordingly, using the cumulative logit link function (ηc), the log odds compare the probability of belonging to category c (πc) with its complementary probability (1 − πc). The log odds are defined as follows.

$$\eta_{c} = \log \left( {\frac{{\pi_{c} }}{{1 - \pi_{c} }}} \right)$$
(1)

The odds ratio is obtained by exponentiating the log odds [exp(ηc)]. When the odds ratio is less than one, the probability of belonging to the given category is less than its complementary probability; when the probability of belonging to category c exceeds the probability of the complementary set, the odds ratio is greater than one.
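The relationship between a category probability, its log odds in Eq. (1), and the exponentiated value can be checked numerically. The sketch below is illustrative only: the function names and the probabilities are hypothetical, and it follows the verbal definition above, which compares πc with 1 − πc.

```python
import math


def log_odds(p):
    """Log odds eta = log(p / (1 - p)) for a probability 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def odds(eta):
    """Undo the logarithm: exp(eta) = p / (1 - p)."""
    return math.exp(eta)


# Hypothetical category probabilities below, at, and above one half.
for p in (0.2, 0.5, 0.8):
    eta = log_odds(p)
    print(p, round(eta, 3), round(odds(eta), 3))
```

A probability below one half gives a negative ηc and an exponentiated value below one, and a probability above one half gives the opposite, which is the reading rule stated in the paragraph above.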

4 Results and discussion

4.1 Vehicle network vulnerability results

In-vehicle network security can be strongly affected by the degree of access protection in the three investigated aspects: remote endpoints, segmented network domains, and the location of critical ECUs. Any unprotected access point without a properly designed and installed gateway may leave a serious security gap that perpetrators can use to attack the in-vehicle network and thus threaten passenger safety.

The methodology for measuring the vulnerability level of the in-vehicle network is based on the share of unprotected accesses for each of the three considered aspects (i.e., remote endpoints, segmented network domains, and critical ECUs) of every in-vehicle network. All three components represent the most sensitive parts of the network with respect to cyberattacks and pose a high level of threat to vehicle and passenger safety. In other words, they all have the same high level of importance; consequently, they are assumed to have the same weight and influence on the vulnerability of the in-vehicle network to malicious attacks and can be added into a single value. For example, referring to Appendix 1, the total network vulnerability of the Audi A3 (i.e., ID = 29 in Appendix 1) is equal to 1.83. This results from summing the ratios of unprotected accesses for the segmented network domains (i.e., \(3 \div 6 = 0.5\)), remote endpoints (i.e., \(1 \div 1 = 1\)), and critical ECUs (i.e., \(3 \div 9 \approx 0.33\)).
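The Audi A3 calculation above can be reproduced with a short sketch. The function name and the input layout are illustrative assumptions; treating an aspect with no components as contributing zero is also an assumption of this sketch, since the text does not discuss that case.

```python
def vulnerability_score(aspects):
    """Sum the unprotected/total ratio over the given aspects.

    aspects: iterable of (unprotected, total) pairs, one pair per aspect.
    An aspect with total == 0 contributes 0 (an assumption of this sketch).
    """
    score = 0.0
    for unprotected, total in aspects:
        if unprotected < 0 or total < 0:
            raise ValueError("counts must be non-negative")
        if total > 0:
            score += unprotected / total
    return score


# Audi A3 example from the text: segmented network domains 3/6,
# remote endpoints 1/1, critical ECUs 3/9.
audi_a3 = [(3, 6), (1, 1), (3, 9)]
print(round(vulnerability_score(audi_a3), 2))  # 1.83
```

Passing the aspects as a list of pairs keeps the equal weighting explicit: every aspect enters the sum in the same way, so a different weighting scheme would require a deliberate change to the function.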

Figure 2 presents the security vulnerability of the 50 most vulnerable in-vehicle network topologies out of the 114 in the sample, ordered by their ID (see Appendix 1) and vulnerability level.

Fig. 2 Vulnerability values for the 50 most vulnerable in-vehicle network topologies

As seen from Appendix 1, each manufacturer and each car model uses a different network architecture. Consequently, each car has a different security vulnerability level, as shown in Fig. 2.

The vulnerability presented in Fig. 2 does not necessarily mean that vehicles with low vulnerability values are more secure than vehicles with high values, since security is also influenced by the technical characteristics of the installed gateways and other network elements. Nevertheless, it provides a reliable way to describe and compare structural vulnerability.

According to the obtained results (Fig. 2 and Appendix 1), the five most vulnerable vehicles are presented below (in descending order):

  1. 2018 Toyota IQ (vulnerability = 3.67)

  2. 2019 Toyota Aygo (vulnerability = 3.00)

  3. 2010 Infiniti G37 (vulnerability = 3.00)

  4. 2014 Jeep Cherokee (vulnerability = 3.00)

  5. 2007 Volkswagen Crafter (vulnerability = 3.00)

Based on their topology characteristics, the least vulnerable vehicles (vulnerability = 0.00 for all) are as follows (see also Appendix 1):

  1. 2005 Volkswagen Transporter

  2. 2002 Volkswagen Touareg

  3. 2005 Audi A6

  4. 2018 Toyota Corolla

  5. 2017 Toyota CHR

  6. 2018 Toyota Avensis

The most noticeable result in Fig. 2 is that different Toyota models appear among both the most and the least vulnerable models. All of these Toyota models were recently introduced to the market (2017–2019). However, cost seems to play an important role in the designed topologies: higher-priced models show lower network vulnerability. The more expensive Toyota models also have a higher number of segmented network domains (i.e., 5) and ECUs, as well as an extra type of remote endpoint (i.e., navigation or head unit), with fully protected accesses, as described in Appendix 1.

Generally, several factors might affect the weakness of a car network topology, most notably the number of critical ECUs, segmented networks, and remote endpoints, as well as the cost and age of the vehicle model. However, it is difficult to identify any specific pattern in these factors by eye for the given sample size. Thus, auxiliary statistical techniques are required to help classify and understand the most relevant factors influencing in-vehicle network vulnerability and their patterns.

4.2 Hierarchical clustering results

As a first step, the heterogeneity of the data has to be revealed in order to understand the vulnerability patterns of the investigated in-vehicle networks. Hierarchical clustering was applied to classify and organize the vehicles (Appendix 1) by their internal network vulnerability values.

The process begins by applying agglomerative hierarchical clustering with a suitable distance metric to measure the affinity between vulnerability values. Applying the IBM SPSS software to the given data resulted in a dendrogram (i.e., the hierarchy of the partitioned data). Figure 3 shows the process of defining the suitable partition level (i.e., the number of clusters) with the Elbow method.

Fig. 3 Identifying the optimal number of clusters

Based on Fig. 3, the selected partition value, located on the x-axis, is 111. Following the Elbow method, this value represents the optimal case, beyond which a further reduction of the number of clusters would result in an unfavorable level of variance and cluster heterogeneity. Therefore, the optimal number of clusters is 3 (i.e., the difference between the total sample size, 114, and the optimal value, 111). In other words, the in-vehicle networks were divided into three groups (or clusters) by vulnerability level (see Appendix 1). Cluster number 1 (CL1) includes 9 vehicles, which represent the least secure networks (\(2.50 \le {\text{vulnerability}} \le 3.67\)). Cluster number 2 (CL2) has a moderate vulnerability (\(1.25 \le {\text{vulnerability}} \le 2.25\)) relative to the given dataset and includes 31 vehicles. The remaining 74 vehicles are the least susceptible to cyberattacks (\(0.00 \le {\text{vulnerability}} \le 1.00\)) and are classified within cluster number 3 (CL3).
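The arithmetic behind "114 objects, elbow at 111, hence 3 clusters" can be illustrated with a small sketch. It assumes that the x-axis value is the merge stage of an agglomeration schedule (after k merges, n − k clusters remain) and it locates the elbow with a simple largest-jump heuristic, which is only one of several ways to read such a plot and is not necessarily how the study identified it. The function name and the coefficient values are hypothetical and are not taken from the study.

```python
def clusters_from_elbow(coefficients, n_objects):
    """Return (stage, n_clusters) for the largest jump in merge coefficients.

    coefficients[i] is the fusion coefficient of merge stage i + 1 in an
    agglomeration schedule with n_objects - 1 stages.
    """
    if len(coefficients) != n_objects - 1:
        raise ValueError("expected one coefficient per merge stage")
    if len(coefficients) < 2:
        raise ValueError("need at least two stages to look for an elbow")
    # jumps[i] is the increase between stage i + 1 and stage i + 2.
    jumps = [b - a for a, b in zip(coefficients, coefficients[1:])]
    # The elbow is the last stage before the largest increase (1-based).
    stage = max(range(len(jumps)), key=jumps.__getitem__) + 1
    return stage, n_objects - stage


# Synthetic schedule for 114 objects: slow growth up to stage 111,
# then a sharp increase for the last two merges.
schedule = [0.01 * s for s in range(1, 112)] + [6.0, 8.0]
print(clusters_from_elbow(schedule, n_objects=114))
```

For this synthetic schedule the largest increase follows stage 111, so the sketch reports stage 111 and 114 − 111 = 3 clusters, mirroring the subtraction described above.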

4.3 Modelling results

We applied the IBM SPSS software to complete the model development process, and we used the ordinal logistic regression method to estimate the model parameters.

Several models were tested, including numerous combinations of different explanatory variables, to find the most efficient set of predictor variables. Eventually, the vehicle average cost, the number of critical ECUs, the number of network domains, and the vehicle age (i.e., measured by subtracting the year of production of each vehicle from that of the newest one in the dataset) were selected as the best predictors of the change in in-vehicle network vulnerability. The model calibration results are shown in Table 1.

Table 1 Model parameter estimates

Although most of the selected variables (Table 1) do not reach the required significance level, they are the variables most relevant to the investigated response variable (i.e., vulnerability). The low significance can be explained by the relatively small sample (i.e., 114 data objects) used in developing the model. A small sample size can strongly contribute to the difficulty of statistical model development (Wisz 2008). Nevertheless, the model as a whole is statistically significant, as shown by the goodness-of-fit tests in Table 2.

Table 2 Model goodness-of-fit tests results

The overall model validation was completed with three tests: (1) model fitting, (2) goodness-of-fit, and (3) pseudo-R-squared, as presented in Table 2. The model-fitting test compares the fit of the resulting model with a full complement of predictors against a null model with no predictors (the baseline or ‘Intercept only’ model). Statistical significance in this test (p < 0.05) indicates that the final model gives a significant improvement over the baseline intercept-only model. For the developed model, the values of the fitting test (Table 2) indicate that the model gives better predictions than one based only on the marginal probabilities of the outcome categories. The -2 log-likelihood can be used to compare nested models. Table 2 also shows the goodness-of-fit results, which provide additional information on the overall fit of the model. Here, a non-significant chi-square statistic (p > 0.05), as obtained in Table 2, indicates that the model fits the data adequately.

The third test measures the pseudo-R-squared. The “goodness” or “acceptability” of a pseudo-R-squared value depends upon the nature of the investigated factors and the explanatory variables. Here, the pseudo-R-squared values (e.g., Nagelkerke = 0.29) indicate that the given explanatory variables explain about 29% of the variation between the three vulnerability levels. This is in accordance with our expectations since, surely, there are other variables that can have an impact on in-vehicle network vulnerability. However, “pseudo” R-squared values do not have the same interpretation as standard R-squared values from OLS regression (Long and Freese 2005). Still, this does not dispute the fact that the explanatory variables have a statistically significant and relatively large influence on the investigated vulnerability level. Overall, the results of the three tests indicate a good fit of the developed model to the given data.
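For reference, the Nagelkerke pseudo-R-squared mentioned above can be computed from the log-likelihoods of the null and final models. The sketch below uses the standard definition (the Cox and Snell value rescaled by its maximum). The sample size of 114 comes from the text, whereas the log-likelihood values are hypothetical and do not reproduce the reported 0.29.

```python
import math

def nagelkerke_r2(ll_null, ll_final, n):
    """Nagelkerke pseudo-R-squared from null and final model log-likelihoods."""
    # Cox and Snell: 1 - (L_null / L_final) ** (2 / n)
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_final) / n)
    # Upper bound of the Cox and Snell value: 1 - L_null ** (2 / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell

# Hypothetical log-likelihoods; n = 114 is the sample size stated in the text
r2 = nagelkerke_r2(ll_null=-120.0, ll_final=-105.0, n=114)
print(round(r2, 2))
```

The rescaling keeps the measure between 0 and 1, but the value remains a relative indicator of improvement over the null model and not a share of explained variance in the OLS sense.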

The estimated coefficients (Table 1) are interpreted with respect to CL3 (i.e., the least vulnerable vehicle network group) as the reference category. The vehicle MSRP is the only significant variable at the 95% confidence level (p value = 0.043). Its negative coefficient (−0.471) indicates that expensive vehicles are more likely to have a considerably vulnerable internal network from a security point of view. In other words, as the vehicle cost increases, the security of the vehicle network against malicious intervention decreases (by 0.47 on the log-odds scale of the model). This is an interesting result. The explanation is that an increase in cost is generally linked to the multiple functionalities of the vehicle. According to Henniger et al. (2009), as vehicles adopt more and more technological applications and implement connectivity functions to the external world, threats to electronic functionalities are rising sharply. The positive coefficients related to the number of critical ECUs and network domains (their significance is discussed above) indicate a positive relationship between their numbers and the in-vehicle network safety. At the same time, it can also be concluded that newer (i.e., lower age) and more expensive vehicles are, in general, becoming more susceptible to external attacks.
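Coefficients of a logistic-type model are expressed on the log-odds scale, so exponentiating them gives an odds ratio that is often easier to read. The short sketch below applies this to the reported coefficient of −0.471. The direction of the effect depends on the parameterization of the software used and on the coding of the outcome categories, which are assumptions here.

```python
import math

def odds_ratio(coefficient):
    """Convert a log-odds coefficient into an odds ratio."""
    return math.exp(coefficient)

# Coefficient reported in the text; interpretation depends on the model coding
print(round(odds_ratio(-0.471), 3))  # 0.624
```

An odds ratio of about 0.62 means that a one-unit increase in the predictor multiplies the modeled odds by roughly 0.62, i.e., reduces them by about 38%.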

5 Conclusion

This study has investigated the vulnerability of the in-vehicle network topologies to cyberattacks. The methodology proposes a new statistical approach for measuring, classifying, and modeling in-vehicle networks with respect to their vulnerability to cyberattacks.

The dataset has been analyzed to facilitate the understanding of vulnerability patterns of in-vehicle networks through three stages: vulnerability identification, classification, and modeling. Vehicle vulnerability to cyberattacks may be strongly influenced by the characteristics of in-vehicle networks. According to the literature, the remote endpoints, network domains, and critical ECUs represent the most sensitive parts of the network, especially with regard to cyberattacks. The remote endpoints can provide easier access for intruders, while network segmentation and the protection of critical ECUs can directly improve the security of the vehicle. Insufficient attention paid to these three aspects could lead to an increased security threat. Therefore, the vulnerability level of in-vehicle networks has been defined based on the total percentages of unprotected accesses for those three components (i.e., remote endpoints, segmented network domains, and critical ECUs) for every investigated vehicle network.
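The vulnerability definition summarized above can be illustrated with a minimal sketch. It assumes that the score is the plain sum of the unprotected percentages of the three components. The function names, the dictionary layout, and the counts are hypothetical and are not taken from the dataset.

```python
def unprotected_share(unprotected, total):
    """Percentage of unprotected items within one network component."""
    if total == 0:
        return 0.0
    return 100.0 * unprotected / total

def vulnerability_score(components):
    """Sum the unprotected percentages over all components (assumed aggregation)."""
    return sum(unprotected_share(u, t) for u, t in components.values())

# Hypothetical network: (unprotected count, total count) per component
network = {
    "remote_endpoints": (3, 4),
    "network_domains": (1, 2),
    "critical_ecus": (2, 8),
}
print(vulnerability_score(network))  # 150.0
```

With three components the score ranges from 0 (everything protected) to 300 (nothing protected) under this assumed aggregation.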

In the second stage, hierarchical clustering has been applied to classify and organize the vehicles by their internal network vulnerability values. This stage reveals the heterogeneity of the given in-vehicle networks and facilitates the understanding of their vulnerability patterns. With hierarchical clustering, the data have been classified into three groups (CL1, CL2, and CL3, from high to low vulnerability) that are primarily differentiated by the level of vulnerability. Finally, ordinal logistic regression has been applied to describe the impact of the factors affecting vulnerability.
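The clustering stage can be sketched as a simple agglomerative procedure on one-dimensional vulnerability scores. Average linkage is assumed here purely for illustration, since the linkage criterion is not restated in this section. The scores are hypothetical.

```python
from statistics import mean

def average_linkage(a, b):
    """Mean absolute difference over all cross-cluster pairs."""
    return mean(abs(x - y) for x in a for y in b)

def agglomerative_clusters(values, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[v] for v in values]
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance
        _, i, j = min(
            (average_linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i].extend(clusters.pop(j))
    # Order groups from high to low mean score, mirroring CL1 > CL2 > CL3
    clusters.sort(key=mean, reverse=True)
    return [sorted(c, reverse=True) for c in clusters]

# Hypothetical vulnerability scores, for illustration only
scores = [150, 140, 145, 90, 85, 30, 20, 25]
cl1, cl2, cl3 = agglomerative_clusters(scores, n_clusters=3)
print(cl1, cl2, cl3)  # [150, 145, 140] [90, 85] [30, 25, 20]
```

Cutting the hierarchy at three clusters yields the ordered groups that an ordinal model can then use as its response categories.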

The results reveal that the most susceptible vehicle is the Toyota IQ (2018), followed by the Toyota Aygo (2019), Infiniti G37 (2010), Jeep Cherokee (2014), and Volkswagen Crafter (2007), while the least sensitive vehicles are the Volkswagen Transporter (2005), Volkswagen Touareg (2002), Audi A6 (2005), Toyota Corolla (2018), Toyota CHR (2017), and Toyota Avensis (2018). This finding reveals the presence of influencing factors that are mainly affected by the price category and the target consumer group of the vehicles. For instance, Toyota models appear among both the most and the least secure internal networks. Accordingly, the topology of in-vehicle networks in the more expensive Toyota models shows higher network security compared to cheaper Toyota models. Applying the introduced modeling technique has helped to better understand the vulnerability-related factors. The results have shown that in-vehicle networks are more vulnerable in newer vehicles. On the other hand, more expensive vehicles, due to the increased number of unprotected access points, have generally proved to be more vulnerable to cyberattacks. Besides this, an increasing number of segmented network domains affects network security in a positive way.

Generally, the application of the developed combined classification and estimation model has helped in analyzing the processed complex dataset.