CyberSecurity for Autonomous Vehicles Against Malware Attacks in Smart-Cities

—Smart autonomous vehicles (AVs) are networks of cyber physical systems (CPS) in which they wirelessly communicate with other CPS sub-systems (e.g., smart -vehicles and smart-devices) to efﬁciently and securely plan safe travel. Due to unreliable wireless communication among them, such vehicles are an easy target of malware attacks that may compromise vehicles’ autonomy, increase inter-vehicle communication latency, and drain vehicles’ power. Such compromises may result in trafﬁc congestion, threaten the safety of passengers, and can result in ﬁnancial loss. Therefore, real-time detection of such attacks is key to the safe smart transportation and Intelligent Transport Systems (ITS). Current approaches either employ static analysis or dynamic analysis techniques to detect such attacks. However, these approaches may not detect malware in real-time because of zero-day attacks and huge computational resources. Therefore, we introduce a hybrid approach that combines the strength of both analyses to efﬁciently detect malware for the privacy of smart-cities.


I. INTRODUCTION
R ECENTLY autonomous vehicular systems (AVs) have seen a gigantic growth in a wide variety of aspects with the development of smart cities to build the Intelligent Transport Systems (ITS).For instance, the dramatic use of embedded systems and wireless communication (e.g., 4G LTE and 5G) in modern internet of vehicles which ultimately improve users safety and comfort.However, growing interest in the development of connected autonomous vehicles (CAVs) and ITS has introduced new security challenges and vulnerabilities in AVs that has a great impact on the smart environments for smart-cities.However, classical computer security solutions are not applicable in automotive industry standards for invehicle, vehicle-to-vehicle (V2V) communication and vehicle to everything (V2X) communications mainly because of realtime performance requirements, constrained computational resources, and differences among heterogeneous networks and their configurations [1].
Various recent reports have sketched attempts where cybercriminals have successfully demonstrated practical but remote attacks to key functions of automotive vehicles (as depicted in Figure 1) either through V2V or V2X that include disconnecting the engine and the brakes [2]- [5].
CryptoLocker, WannaCry, and Petya attacks are prominent one of the most widely used attacks against sensitive IT systems [6].Previously, ransomware attack infected either personal computers, public or private organizations, health sectors, Internet of Things, smartphones and smart industrial systems.Now, ransomware attack is targeting smart-vehicles and smart cities that could result in the loss of human lives and financial instability.Moreover, there have been attempts where researchers have shown that malware is one of the keys and emerging security threats that can be launched by exploiting the wireless communication system of AVs [7], [8].For instance, by exploiting known vulnerabilities in the design and implementation of onboard communication systems, embedded software, and application software [9]- [11] as sketched in Table I.Moreover, a report in [2], [12], [13] has shown that an AVs is not just a simple machine by hijacking the steering and brakes of a Ford Escape and a Toyota Prius.However, on the other hand, it is of utmost critical to understand that AVs are now a network of computers that can be hacked by practicing classical cyber threat mechanisms.For instance, during the year 2015, approx.1.5 million vehicles were subject to a recall by Daimler Chrysler mainly because cybercriminals could remotely take the control of a jeep's digital system over the Internet [3].In another report [4], a team of cybercriminals remotely hijacked a Tesla Model S from a distance of approx.a dozen miles.In a more recent attempt [5], authors have identified 14 vulnerabilities in the infotainment system in several of BMW's series.Moreover, another Tesla S and Tesla X was targeted by cybercriminals in November 2019 via the Wi-Fi attack vector [6].All of the above-mentioned incidents show that the security of AVs is integral to their core functions in order to make smart transportation secure, therefore, it must be handled to protect the vehicles enabling them to operate safely.
The key to the afore-mentioned success of remote attacks on AVs is information sharing by the vehicles over a wireless medium which increases the susceptibility of the vehicles to different security and malware attacks.Consequently, data exchange including input and output data as well as protecting Electronic Control Unit (ECUs) inside the AVs are among the most significant security issues for the intelligent vehicles [8], [17].Specifically, the most damaging cyber threats, are emerging as the vehicles connect to the Internet, provide onboard Wi-Fi hotspot services, communicate with other vehicles and ITS infrastructures, and support advanced applications such as over-the-air (OTA) ECU firmware update [7].As discussed above, many modern attacks do not require physical access to a vehicle instead can now be carried out remotely over wireless by exploiting communication vulnerabilities among vehicles and other connected network services.This allows attackers to compromise more vehicles with relative ease whereas later a compromised vehicle can also be used to attack other vehicles.
Considering the performance requirements of AVs, it is important to detect a malware in real-time to timely protect any physical and financial damage and loss of human lives.Current approaches to detect such malware either employ static analysis or dynamic analysis techniques.The former techniques are good at detecting active malware, i.e., the malware that is directly targeting some unauthorized resource or feature of the vehicle, however, such techniques fail to detect any passive malware that exploits some system vulnerability through monitoring run-time data of the vehicle.The latter techniques are more robust and rigorous as they can detect any variant of malware through observing run-time behavior of systems [18] but such approaches typically require more computational resources which is not the case in autonomous vehicles.Alternatively, some approaches attempted to install vehicle gateways that allow only authorised communication to the vehicles and introduced vehicle Intrusion Detection Systems (IDSs) to detect abnormal behaviors in the Controller Area Network (CAN) [19].However, it is difficult for a gateway or IDS to block these actions in advance, as most malware and adware are behavior-based.Therefore, to detect unknown malware threats, it is vital to introduce a methodology that can detect suspicious behaviors and analyze anomalous indicators rigorously (i.e., negligible false alarms) and efficiently (i.e., in real-time).
The rest of the paper is structured as follows: Section II provides background of autonomous vehicles, while Section III sketches state of the art about rigorous malware detection techniques.Section IV explains our malware detection methodology, while Section V presents experimental setup, experiment results and critical discussion.Finally, we conclude in Section VI.

II. BACKGROUND
Modern smart Autonomous vehicles (AVs) will strikingly change the worldwide transport industry and smart environments.AVs where improving the standard of smart living and road safety also require to wirelessly communicate with other vehicles and devices to efficiently and securely plan safe travel.The number of traffic accidents are reducing day by day.In Addition, people with disabilities can significantly taking advantage from smart cities and ITS technology preventing injuries and deaths in combat [20].However, due to unreliable wireless communication among them, such vehicles are an easy target of malware attacks that may compromise vehicles' autonomy, increase inter-vehicle communication latency, and drain vehicles' power.Such compromises may result in traffic congestion, threaten the safety of passengers, and can result in financial loss.Therefore, real-time detection of such attacks is key to the safe smart transportation and ITS.With the increasing trend of Internet of Things (IoT), ITS aims to improve the efficiency and safety of AV networks [21].ITS in societies that are converting into smart cities becomes more vulnerable to cyber-threat and cyber-terrorism [22].Different types of ITS are vulnerable to attacks.The success of remote attacks on autonomous vehicles is information sharing by the vehicles over a wireless medium which increases the susceptibility of the vehicles to different security and malicious attacks.Consequently, data exchange including input and output data as well as protecting ECUs inside the AVs are among the most significant security issues for the intelligent vehicles.ECUs are the embedded system that monitors electrical systems or subsystems in a conventional vehicle for instance the energy conversion, the air conditioner, vehicle speed and the warnings on the instrument panel [23].
An AV is not just a massive car with four wheel but is made up of networked embedded computers that are responsible for performing different tasks in a smart and timely manner.Therefore, an AV is a diverse and complex environment that comprises of several types of Operating System (OS) installed among different vehicles as shown in Figure 2.Although ECU act as a brain for AVs and is considered as minicomputers yet they vary in size, purpose and the OS they run.Thus, we can divide ECUs into two categories: managed by realtime operating systems (RTOS) and general purpose operating system (GPOS).Other than that, Robotic operating system (ROS) is also used.ROS is not an operating system but is an open-source robotics framework having collection of software for robot software development.Tesla, a leading automotive car vehicle is a new energy innovation owns a self-developed OS [24] is now testing Windows OS [25] and Tesla patent seems to be working on windows operating system [26]  In [27] authors have proposed the analysis of malware on X86-based IoT devices in an autonomous driving approach features based on static analysis and using machine learning to solve problems of resource overhead for dynamic analysis.Paper [28], authors have used Bayesian Network (BN) model to analyse cyber risk in AVs by introducing the variables and causal relationships derived from the Common Vulnerability Scoring Scheme (CVSS).The model is then applied on the GPS system of the connected AVs without cyptographic authentication.
Beside other malware attacks, ransomware attacks are emerging and their analysis are used widely by the scholarly community now-a-days.In [29], the authors presented a case study of CryptoLuck Ransomware to highlight the importance of behavioral-based Ransomware detection.In [30], authors statically analyzed process monitoring on file events, processor usage, and I/O rates.In [31], authors suggested that static detection technique as used by [32], can help in evading anti-virus (AV).In [33], authors performed ransomware behavioral analysis on windows platform of 14 strains of ransomware.They observed the individual behavioral pattern of ransomware.In [34], authors presented an automated detection and analysis of ransomware to monitor dynamic behavior by generating API calls and control flow graph (CFG).Authors in [35], developed a dynamic analysis system (UNVEIL), designed specifically for the detection of ransomware by automatically generating an artificial user environment.
There are several other research efforts which follow Machine Learning (ML) based approaches to detect malware exploiting the dynamic or runtime features of executing applications.Another proposed study of dynamic analysis using machine learning through monitoring file system activity of windows platform was conducted by [31].They used classification technique by considering a wide range of features such as Windows API calls, Registry Key Operations, File System Operations, file operations performed per File Extension, Directory Operations, Dropped Files, and Strings to classify malware.
Other than static, dynamic and ML approaches, Hardware performance counters (represent the true execution behaviors of the application) are typically being employed nowadays to measure the performance of the under investigation software [36].However, none of the existing dynamic and ML malware detection techniques use hardware performance counter for malware classification specifically in autonomous vehicles.Although, however, [37] employs a dynamic approach to classify malware based on their hardware performance counters and [38] have used hardware performance counter for ransomware classification on Windows platform.
It has been observed from the literature work that most of the techniques [31] can either only observe System/API calls [33], [34], [39], file operations [35], processor usage [30], or registry activities [40].Some of the studies are based on static analysis [29] whereas other proposed techniques mainly focus on dynamic analysis for classification.A lot of solutions have been developed against malware and ransomware as well as ransomware classification among families that significantly improve the user's security.A few researches [41]- [44] have shown that there is a lack of behavioral analysis that use hybrid technique to classify malware in AVs using API Calls, File operations, Registry keys, and Hardware performance counter based features (i.e., processor usage, cache-misses, memory usage, page faults, instructions, branches, etc.).So far, hardware-based features have been analyzed on malware and non-malware apps, but have not been considered for AVs.There exists no such work that considers all these important aspects in a single methodology.We believe that collective consideration of all of the above-stated aspects can significantly improve malware detection rates in AVs.Therefore, this study encompasses efficient malware detection mechanisms in terms of a hybrid approach that utilizes static as well as dynamic analysis focuses on the use of hardware performance counters to analyze the runtime behavior to detect malware.Moreover, this work shows how accurately hardware performance counters are able to classify malware in AVs.

IV. METHODOLOGY
Autonomous vehicles (AVs) have become a core constituent of the smart transportation system [45].The computation power of AVs is gradually increasing and a large amount of information exchange is required with smart components of the transportation system.Information exchange with malicious counterparts in the smart systems could produce catastrophic results such as a change of drive-plan, sudden halt, and ignore obstacles on the roads.Generally, malware exploits different vulnerabilities of the computer system (i.e., hardware platform, operating system, and application software).However, considering the drastic implications of the malicious activity in AVs, we should formulate a holistic approach considering handling precision, vehicle efficiency, and digital security.
With the static-analysis, malware detection can take place efficiently by merely matching the known application features such as signatures (before application execution) requiring few computational resources.Therefore, static analysis provides early detection to mitigate malicious activities during autonomous vehicle operation.However, the static analysis does not encompass the zero-day attacks and obfuscated (hidden or purposefully crafted features such as like packed or compressed programs or indirect addressing [46]) malicious applications.To address these issues, a dynamic analysis based mechanism can be employed that exploits the runtime behavior (including system hardware, operating systems interactions, etc.) of the executing applications to classify and detect malicious behavior.However, the proficient detection capabilities of the dynamic analysis come along with the high-resource consumption (CPU, memory, energy-cost, etc.).Additionally, in the AV context, it would be too risky to rely directly on the dynamic analysis because of potentially high false-positive detection as compared to static analysis.
Therefore, this study encompasses efficient malware detection mechanisms in terms of a hybrid approach that utilizes static as well as dynamic analysis.Traditionally, the proposed models can be built using basic hybrid mechanisms, i.e., (i) a single hybrid approach where distinctive aspects related to both pre-/in-execution of the applications are obtained for analysis and detection.For the obligatory requirements such as efficient and thorough detection of malware with reduced false-positive rate, the hybrid-approach is appropriate and recommended.
The proposed security modules for AVs i.e., the hybrid mechanisms Combined Hybrid Analyzer (CHA) is shown in Figure 3. CHA adheres to a factual technical concept of using a hybridization concept for bringing together heterogeneous parameters (in terms of the execution requirements i.e., pre-/inexecution based parameter extraction).As discussed above, the utilization of this model has certain operational consequences that hinder its practical use.
Let's discuss the architecture of these models in detail.The proposed CHA model considers input applications and data to employ both pre-/in-execution feature extraction simultaneously.The specific features extracted can be divided into two categories, i.e., static-analysis based features (which can be extracted without application execution), and dynamic features are extracted during the execution of the application within an operating system.The static features include embedded command-strings and the usage of operating system manipulating libraries.The dynamic features (extracted during the execution) are the activity logs related to system-wide low-level configuration manipulations, invoking system call For the initial investigation and proof of the concept, we have used a dataset of executable applications MS windows platform.We have chosen Windows based dataset for several reasons, for instance, most of the major initiatives in automotive vehicle industry use Windows based services (see Table II) for their live communication, which is certainly a key source of threat to such services and eventually to the vehicles [47], [48].Furthermore, as reported in [49], Microsoft services and platforms are helping automakers to create smart connected car solutions that seamlessly address their customers' unique needs, competitively differentiate their products and generate new and sustainable revenue streams.The Microsoft services do not only offer the right tools, but also allows them to keep their data, has a secure and compliant cloud platform, and operates at a truly global scale (given that most automotive brands operate in many countries).Importantly, 85% of Fortune 500 companies already rely on Microsoft's cloud for the afore-mentioned reasons.In principle, using such platforms, automakers and suppliers can benefit from the billions of dollars that Microsoft has already invested in the cloud services.For instance, Azure already offers more than 200 services in 38 worldwide regions, with robust measures for security and the global compliance and privacy regulations that are required to support connected cars, letting automakers focus on innovation rather than building out their own cloud-based

A. Dataset
As discussed in the previous section, we have used a dataset of 1000 malware applications of different families (e.g., crypto, petya, locker) downloaded from Virusshare.com repository [51].Similarly, 1000 non-malware applications (freely available apps) are included resulting in a dataset of 2000 applications.We use a three-step ML-based mechanism: (i) feature extraction, (ii) feature selection, and (ii) application classification

B. Feature Extraction
The choice of a good feature set is the initial phase of any data mining approach.A few of the extracted features are inspired by previous work [31], [38], however, more features have also been added in this research i.e., hardware performance counters [37], [38], DLLs [52], and strings [16], [31], [53].We have extracted a total of 1713 features and 10985 features during static and dynamic analysis, respectively.Cuckoo Sandbox is selected in a Linux platform for automated dynamic analysis of Windows executable malware.It automatically runs and analyzes files and collect comprehensive analysis results that outline what the malware does while running inside an isolated operating system.All processes and file changes are tracked and logged.Generated logs and behavioral analysis reports are recorded by Cuckoo.For validation, we have used the K-fold (k=10) cross-validation mechanism and compare the malware detection accuracy of different classifiers to make sure that the dataset is used uniformly without any biasness.This results in unbiased training and testing cycles producing the results on which we could conclude with confidence.For each cycle of the training/testing, a 20% testing and 80% training partition was employed.A list of features extracted are shown in Tables V and VI as sketched in Appendix A.

C. Feature Selection
The reduced number of features increases ML model performance with minor or negligible effects on classification decisions.Moreover, feature selection minimizes the overfitting factors and the time required for training/testing increases the accuracy to generate simple interpreted models.For this, we employ the information gain criterion [54].A specific method called infogainAttributeEval from Weka is applied for attribute selection.The value of information gain determines how important a given attribute of the feature vectors is by assigning weights to emphasise the effectiveness of the features.Therefore, the top 25 features out of 1713 selected after applying the feature selection infogain algorithm for static analysis, and top 47 features out of 10985 were selected for dynamic analysis.Figure 4 depicts the top 10 static features formulated using the Info-gain method where X-Axis shows the rank of the feature.

D. Model Selection and Training
Considering the nature of the employed dataset (i.e., categorical and mixed data), this study has been conducted using the three well-known ML classifiers: Naive Bayes (NB) [55], Random Forest (RF) [56], [57], and Decision Tree (J48) [58], [59] which are more suitable for categorical and mixed data.The area under the ROC Curve [60] is a common mechanism to calculate the performance of a certain ML classifier.A higher value (i.e., near to 1) reflects the better classification capability of the ML classifier.Figure 5a shows the ROC Curve for the CHA.As depicted in Figure 5a, the RF stands prominent as compared to other ML classifiers that have attained area under the ROC curve up to 0.9816 for both classes (i.e., malware and non-malware).This indicates that the RF is the best performing classification model as compared to the other two models.

V. EXPERIMENTAL SETUP, RESULTS AND DISCUSSION
We have performed experiments on a stand-alone machine having specifications shown in Table IV.For performance evaluation of selected classifiers, we employed the following metrics.
Accuracy: We have used accuracy to evaluate the results.The accuracy is the fraction of the total number of correctly classified applications as malware or non-malware.Where TP, TN, FP, and FN stands for True Positive, True Negative, False Positive, and False Negative respectively.

Accuracy =
T P + T N T P + T N + F P + F N Precision: Precision denotes the proportion of the predicted correctly classified applications to the total of all applications that are correctly real positives.

P recision = T P T P + F P (2)
Recall: is the fraction of the actual apps that are correctly classifies to the total number of the apps that are classified correctly or incorrectly.
F-Measure: F-measure is the harmonic mean of precision and recall.F measure represents the value that tells how much the model is capable of making fine distinctions.
F M easure = 2 × P recision * Recall P recision + Recall (4) For evaluation, accuracy-related results are reported which can be defined as the fraction of the total number of correctly classified applications as malware or non-malware [61].Figure 6 shows the accuracy results for the proposed model CHA for all three ML classifiers.It is evident from the results that the CHA have shown excellent accuracy indicating that a goodpercentage of known malware can be identified using time-/cost-efficient and safer mechanism as compared to risking autonomous vehicle operations with dynamic analysis for all the potential applications.
Based on the values of the True Positive and True Negative, we have calculated precision, recall, and F-measure for CHA approach.The results of the precision and recall of classification using all the three classifiers of the CHA approach are explained in Figure 6.Results depict that RF generated 32.7% and 5.5% improvement in precision as compared to NB and J48.The values of precision for RF, NB, and J48 are 0.96, 0.723, and 0.91, respectively.RF attained the highest values of precision and recall.

VI. CONCLUSION
With the advancement in technology and use of smart connected vehicles, we can find examples where cybercriminals have already proven their intent by exploiting several vulnerabilities in the smart transportation systems of automotive ecosystem.we expect to see dramatic increase of cyber attacks against them.The vulnerabilities in the software of AVs may prove far more dangerous than malware that may appear in personal computers and mobile devices.Malicious applications harm the lives of drivers, passengers as people who are not using AVs.In this paper, we performed a comprehensive  For the future work, we plan to develop future of intelligent transportation system in smart cities that can efficiently detect high priority attacks based on IDS and evaluate their effectiveness using simulations.

Fig. 2 .
Fig. 2. Types of Operating Systems (OS) used in Smart Autonomous Vehicles

TABLE II MICROSOFT
[50]OWS BASED SERVICES FOR AUTOMOTIVE VEHICLES Microsoft aspires to empower automakers in their goals for fully autonomous driving, with elegant machine learning and artificial intelligence capabilities, as well as advanced mapping services.For instance, more recently Microsoft has partnered with TomTom, HERE and Esri, to create more intelligent location-based services across Microsoft[50].Furthermore, pseudo-code for the proposed security modules for AVs i.e., the hybrid mechanisms CHA is shown in Algorithm 1. TableIIIrepresents the notations used in pseudocode for CHA.
Extract i in Cuckoo Sandbox 3 Analyze i for static analysis 4 Get list of static (f ) 1 5 Analyze i further for dynamic analysis 6 Get list of dynamic (f ) 2 7 Set (f ) 1 and (f ) 2 for analyzer m 8 m predicted result of the i 9 Get output from predictive modeling 10 if Yes then Algorithm 1: Pseudo-code for CHA input : Applications as i output: Classification as malware or non − malware 1 Process: 2