
1 Introduction

According to the Internet Security Report [1], 1.4 billion smartphones were sold in 2015, and 83.3% of them ran Android. Users may store personal identity information, online payment system access data, and credentials on these devices. Malware authors and cyber criminals aim to steal this information by distributing and installing malicious Android applications. Overall, 3.3 million applications were classified as malware in 2015. Malware authors deliver this variety and volume of malicious software by using advanced obfuscation techniques. Therefore, behavior-based malware analysis, together with the classification of a malware sample into its original family, plays a crucial and timely role in taking security and protection countermeasures.

Android is a complete operating system that uses the Android application package (APK) format for the distribution and installation of mobile apps. An APK file contains components that share a set of resources such as databases, preferences, files, and classes compiled in the dex file format. App components are divided into four categories: activities, which handle user interaction; services, which carry out background tasks; content providers, which manage the app’s data; and broadcast receivers, which handle communication between components, between apps, and with the Android OS itself. The manifest declares the app’s components and how they interact. The permissions required by the app are also declared in the manifest file. Android is a privilege-separated operating system, in which each application runs with a distinct system identity (Linux user ID and group ID). Parts of the system are also separated into distinct identities. Linux thereby isolates applications from each other and from the system.
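The manifest structure described above can be illustrated with a minimal sketch. It assumes the manifest has already been decoded to plain-text XML (inside an APK it is stored in a binary format), and the sample manifest, the package name, and the helper `summarize_manifest` are hypothetical and introduced only for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace that prefixes android:* attributes in a decoded AndroidManifest.xml.
ANDROID_NS = "http://schemas.android.com/apk/res/android"

# Hypothetical decoded manifest, for illustration only.
SAMPLE_MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android"
          package="com.example.demo">
    <uses-permission android:name="android.permission.INTERNET"/>
    <uses-permission android:name="android.permission.READ_LOGS"/>
    <application>
        <activity android:name=".MainActivity"/>
        <service android:name=".SyncService"/>
        <receiver android:name=".BootReceiver"/>
        <provider android:name=".DataProvider"
                  android:authorities="com.example.demo"/>
    </application>
</manifest>
"""


def summarize_manifest(xml_text):
    """Return (requested permissions, components grouped by category)."""
    root = ET.fromstring(xml_text)
    name_attr = "{%s}name" % ANDROID_NS
    permissions = [e.get(name_attr) for e in root.iter("uses-permission")]
    components = {}
    # The four component categories: activities, services,
    # content providers, and broadcast receivers.
    for tag in ("activity", "service", "provider", "receiver"):
        components[tag] = [e.get(name_attr) for e in root.iter(tag)]
    return permissions, components


permissions, components = summarize_manifest(SAMPLE_MANIFEST)
```

Here `permissions` holds the requested permission names in document order, and `components` maps each of the four component categories to the names declared for it.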

Several system commands can be used to infect Android devices. For example, the cat command (/system/bin/cat) displays files on the system and can be executed for malicious purposes. The command-line tool LogCat can be used to view the internal logs, and log messages may include privacy-related information. An app can gain access to the log file by granting every app the READ_LOGS permission with the aid of the chmod command. The commands are listed in Table 1.

Table 1. List of system commands and their execution frequency in our malware test set

In line with the emerging market for Android smartphones, the detection and classification of Android malware has attracted a lot of attention. Static analysis of executables using commands, and modelling of malware features using permissions and API calls, are presented for malware detection in [2, 3]. A k-means algorithm for clustering and a decision tree learning algorithm for classification of malware are presented in [4], based on monitoring various permission-based features and events extracted from applications. In [5], a learning model database is built from the extracted features and N-gram signatures are created. Text mining and information retrieval are applied to the static analysis of malware in [6]. In [7], a heuristic approach using 39 different behaviour flags, such as Java API calls, the presence of embedded executables, and code size, is developed to determine whether an application is malicious. Deep learning for the automatic generation of malware signatures is studied in [8] to detect the majority of new malware variants. In addition, a detection model is trained with information gathered from the communication among components. A European project called NEMESYS has deployed a security framework for gathering and analyzing information about the nature of cyber-attacks targeting mobile devices, and has presented a model-based approach for anomaly detection [9–11].

The paper is organized as follows. In Sect. 2, we present the selected features. In Sect. 3, we apply an online machine learning algorithm to the classification of malware samples and evaluate the results. Finally, we conclude the paper.

Table 2. Features and their types

2 Feature Set

Cuckoo Sandbox is an open source analysis system that relies on virtualization technology to run a given file [12]. It can analyze both executable and non-executable files and monitor their run-time activities. In this study, we extracted the most significant and distinguishing behavioral features from Cuckoo’s analysis report. The Android malware features are listed in Table 2. The permissions requested by the applications are ranked according to their frequency in Table 3.

Table 3. Top 20 requested permissions

3 Implementation

The test malware dataset was obtained from the “VirusShare Malware Sharing Platform” [13], which provides a large number of malware samples of different types, including PE, HTML, Flash, Java, PDF, and APK. All experiments were conducted under the Ubuntu 14.04 Desktop operating system on an Intel(R) Core(TM) i5-2410M @ 2.30 GHz processor with 2 GB of RAM. Analyzing approximately 2000 samples with 5 guest machines took 5 days. To label the malware samples, we used VirusTotal, an online multi-engine anti-virus scanner [14]. The malware classes, along with their class-specific measures, are given in Table 4.

Table 4. Malware families and their class-specific measures

3.1 Online Classification Algorithms

In general, an online learning algorithm works in a sequence of consecutive rounds. At round t, the algorithm takes an instance \(\mathbf {x}_t \in \mathbb {R} ^{d}\), a d-dimensional vector, as input and makes the prediction \( \hat{y}_t \in \left\{ +1, -1\right\} \) (for binary classification) according to its current prediction model. After predicting, it receives the true label \( y_t \in \left\{ +1, -1\right\} \) and updates its model (a.k.a. hypothesis) based on the prediction loss \( \ell (y_t, \hat{y}_t)\), which measures the discrepancy between the predicted and the actual class. The goal of online learning is to minimize the total number of incorrect predictions, \( \left| \left\{ t : y_t \ne \hat{y}_t \right\} \right| \). Pseudo-code for generic online learning is given in Algorithm 1.
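The predict, receive, update cycle can be sketched as follows. The sketch is not the algorithm evaluated in this paper: it uses the classic perceptron update as a stand-in for the model update, and the function names and the toy stream are assumptions made for illustration.

```python
def sign(value):
    """Map a real-valued score to a label in {+1, -1} (ties go to +1)."""
    return 1 if value >= 0 else -1


def online_perceptron(stream, dim):
    """Run one pass of online learning over (x_t, y_t) pairs.

    Returns the final weight vector and the number of incorrect
    predictions, i.e. the quantity online learning tries to minimize.
    """
    w = [0.0] * dim
    mistakes = 0
    for x, y in stream:
        # 1. Predict with the current model.
        y_hat = sign(sum(wi * xi for wi, xi in zip(w, x)))
        # 2. Receive the true label and measure the loss (0/1 here).
        if y_hat != y:
            mistakes += 1
            # 3. Update the model (perceptron rule, an assumption).
            w = [wi + y * xi for wi, xi in zip(w, x)]
    return w, mistakes


# Toy stream of 2-dimensional instances, for illustration only.
toy_stream = [([1.0, 0.0], 1), ([0.0, 1.0], -1),
              ([1.0, 0.0], 1), ([0.0, 1.0], -1)]
weights, mistakes = online_perceptron(toy_stream, dim=2)
```

Because each instance is processed once and then discarded, the memory cost is independent of the number of rounds, which is what makes this family of methods attractive for large sample volumes.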

3.2 Classification Metrics

To evaluate the proposed method, the following class-specific metrics are used: precision, recall (a.k.a. sensitivity), specificity, balanced accuracy, and overall accuracy (the overall correctness of the model). Recall is the probability that a sample in class c is classified correctly. Conversely, specificity is the probability that a sample not in class c is correctly recognized as not belonging to c. The metrics are given as follows:

$$\textit{precision} = \frac{\textit{tp}}{\textit{tp} + \textit{fp}} \tag{1}$$
$$\textit{recall} = \frac{\textit{tp}}{\textit{tp} + \textit{fn}} \tag{2}$$
$$\textit{specificity} = \frac{\textit{tn}}{\textit{tn} + \textit{fp}} \tag{3}$$
$$\textit{balanced accuracy} = \frac{\textit{recall} + \textit{specificity}}{2} = \frac{1}{2} \left( \frac{\textit{tp}}{\textit{tp} + \textit{fn}} + \frac{\textit{tn}}{\textit{tn} + \textit{fp}}\right) \tag{4}$$
$$\textit{accuracy} = \frac{\textit{correctly classified instances}}{\textit{total number of instances}} \tag{5}$$
Algorithm 1. Generic online learning (pseudo-code)

For instance, consider a given class c. True positives (tp) are the samples in class c that are correctly assigned to c, while true negatives (tn) are the samples not in class c that are correctly assigned to another class. False positives (fp) are the samples not in class c that are incorrectly assigned to c. Similarly, false negatives (fn) are the samples in class c that are incorrectly assigned to another class. The terms positive and negative refer to the classifier’s prediction, while true and false denote whether or not that prediction matches the ground truth label.
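As a minimal sketch of Eqs. (1) to (5), the class-specific counts and metrics can be computed in a one-vs-rest fashion from lists of true and predicted labels. The function name `class_metrics`, the convention of returning 0 for an empty denominator, and the toy labels are assumptions, not part of the original evaluation code.

```python
def class_metrics(y_true, y_pred, c):
    """Compute one-vs-rest metrics for class c."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == c and p == c)
    fn = sum(1 for t, p in pairs if t == c and p != c)
    fp = sum(1 for t, p in pairs if t != c and p == c)
    tn = sum(1 for t, p in pairs if t != c and p != c)

    # Assumption: a metric with an empty denominator is reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    balanced_accuracy = (recall + specificity) / 2
    # Overall accuracy does not depend on c.
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)

    return {
        "tp": tp, "fn": fn, "fp": fp, "tn": tn,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "balanced_accuracy": balanced_accuracy,
        "accuracy": accuracy,
    }


# Toy labels for three hypothetical families "a", "b", and "c".
toy_true = ["a", "a", "a", "b", "b", "c"]
toy_pred = ["a", "a", "b", "b", "a", "c"]
metrics_a = class_metrics(toy_true, toy_pred, "a")
```

For class "a" in this toy example, the counts are tp = 2, fn = 1, fp = 1, and tn = 2.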

3.3 Testing Accuracy Results

The testing accuracy is computed for different values of the regularization weight parameter. This parameter, denoted by C, determines the size of the weight change at each iteration. A larger value allows a larger change in the updated weight vector, so the model is built faster. As a consequence, however, the model becomes more dependent on the training set and more susceptible to noisy data. A 10-fold cross-validation approach is used. The class-wise results for the most successful algorithm (i.e., confidence-weighted linear classification [15]) for different values of C are given in Table 5.
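The effect of C on the update size can be illustrated with a small sketch. The paper does not state the update rule, so the sketch swaps in the well-known passive-aggressive (PA-I) update, where C caps the step size, purely to show the role of such a parameter. It is not the confidence-weighted algorithm of [15], and the function name `pa1_update` is hypothetical.

```python
def pa1_update(w, x, y, C):
    """One PA-I step: returns the new weight vector and the step size tau.

    Assumption: hinge loss with margin 1 and step size
    tau = min(C, loss / ||x||^2), as in the standard PA-I formulation.
    """
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    loss = max(0.0, 1.0 - margin)
    sq_norm = sum(xi * xi for xi in x)
    if loss == 0.0 or sq_norm == 0.0:
        # Passive case: the current model already satisfies the margin.
        return list(w), 0.0
    tau = min(C, loss / sq_norm)  # C bounds how far the weights may move
    return [wi + tau * y * xi for wi, xi in zip(w, x)], tau


# The same instance with a small and a large C.
w_small, tau_small = pa1_update([0.0, 0.0], [1.0, 1.0], 1, C=0.1)
w_large, tau_large = pa1_update([0.0, 0.0], [1.0, 1.0], 1, C=10.0)
```

With the small C the step is clipped, so a single noisy sample changes the model only slightly; with the large C the full corrective step is taken, which adapts faster but follows the training data more closely.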

Table 5. Classification accuracy for different values of the regularization weight parameter
Fig. 1. Normalized confusion matrix

To analyze how well the classifier can recognize instances of different classes, we created the confusion matrix shown in Fig. 1. The confusion matrix displays the number of correct and incorrect predictions made by the classifier with respect to the ground truth (actual classes). The diagonal elements of the matrix represent the number of correctly classified instances for each class, while the off-diagonal elements represent the number of instances misclassified by the classifier. The higher the diagonal values of the confusion matrix, the better the model fits the dataset (higher accuracy in individual family prediction). Since the android.trojan.bankun family combines many functionalities that are also executed by other families in our dataset, samples of android.trojan.agent, android.trojan.smskey, and android.exploit.gingerbreak are misclassified as android.trojan.bankun.
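A minimal sketch of how such a matrix can be built and normalized is given below. It assumes that rows correspond to actual classes, columns to predicted classes, and that normalization divides each row by its total; the paper does not state which normalization was used for Fig. 1, and the helper names and the toy labels are hypothetical.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count predictions: rows are actual classes, columns are predicted."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def normalize_rows(matrix):
    """Divide each row by its sum, so that row i gives the fraction of
    class-i samples assigned to each class (empty rows stay all zero)."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([v / total if total else 0.0 for v in row])
    return normalized


# Toy labels with shortened family names, for illustration only.
toy_labels = ["bankun", "agent"]
toy_true = ["bankun", "bankun", "agent", "agent", "agent", "agent"]
toy_pred = ["bankun", "bankun", "agent", "bankun", "agent", "agent"]

counts = confusion_matrix(toy_true, toy_pred, toy_labels)
normalized = normalize_rows(counts)
```

Under row normalization, each diagonal entry equals the recall of the corresponding class, which makes families of very different sizes directly comparable.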

4 Conclusions

This paper addresses the challenge of classifying Android malware samples by using runtime artifacts while remaining robust to obfuscation. Owing to its online machine learning methodology, the presented classification system is usable on a large scale in the real world. The proposed method uses the run-time behaviors of an executable to build the feature vector. We evaluated an online machine learning algorithm with 2000 samples belonging to 18 families. The results of this study indicate that runtime behavior modeling is a useful approach for classifying Android malware.