1 Introduction

In the fourth quarter of 2019, 35 million malware samples targeting mobile devices appeared [1]; on average, about 15 malicious applications appeared per minute. In response to this threat, many commercial antivirus products, such as Bitdefender, Norton, McAfee, BullGuard, Panda, Kaspersky, ESET, Avira, and Avast, have been launched. However, their critical limitation is that they cannot detect unknown malware because they rely on signatures of known malicious applications [66]. Therefore, the research community has been focusing on developing malware detection approaches that use machine learning or deep learning algorithms with various features to protect users from emerging malware [2, 5, 7, 10, 12, 14,15,16, 18,19,20,21,22,23, 26, 27, 30,31,34, 36,37,42, 45,46,51, 54, 58, 61,62,64, 66, 67, 70, 72,73,74, 76,77,78, 81,82,90, 92]. In particular, many malware detection approaches using deep learning algorithms have recently been introduced [23, 30, 31, 34, 38, 45, 47, 49, 51, 54, 58, 77, 78, 85, 88, 92].

However, previous deep learning-based malware detection approaches commonly incur a very high cost (in terms of computing resources) because they use a combination of multiple features to achieve high accuracy [71]. For example, a classifier model generated by a convolutional neural network (CNN) requires an enormous amount of memory to classify data [44]. Consequently, although previously proposed deep learning-based malware detection systems can achieve very high accuracy, it is difficult to deploy them on mobile devices, whose computing resources are limited, or even on personal computers. Therefore, it is of great importance to develop a malware detection approach that can protect users from newly emerging malware while remaining practical to use.

In this work, we propose a practical malware detection system, MAPAS, that achieves high accuracy against both known and unknown malware while using computing resources adaptively. MAPAS learns the behaviors of malicious applications from their API call graphs by using a deep learning algorithm (CNN). It then detects malware based on common patterns in the API call graphs of malware. For efficient detection, MAPAS does not use the classifier model created by the CNN; instead, it uses a lightweight classifier that computes, with the Jaccard similarity algorithm [3], a similarity score between API call graphs used for malicious activities and the API call graphs of the applications to be classified.

To show the effectiveness and efficiency of MAPAS, we thoroughly evaluate our prototype and compare it with a state-of-the-art Android malware detection approach, MaMaDroid [61], which also utilizes API call graphs to detect malware based on their behaviors. Our evaluation results demonstrate that MAPAS classifies applications faster and uses much less memory than the previous approach. Specifically, MAPAS classifies applications up to 145.8% faster and uses around ten times less memory than MaMaDroid with the random forest algorithm. In addition, MAPAS achieves higher accuracy (91.27%) than MaMaDroid (84.99%) in detecting unknown malware (i.e., malware released later than the samples in our training dataset).

In summary, this paper makes the following contributions:

  • We propose a practical Android malware detection system, MAPAS, that finds malware based on malicious behavioral features. To this end, MAPAS learns the API call graphs of malware and detects malware based on the analyzed patterns of API call graphs used for malicious behaviors. MAPAS employs a deep learning algorithm not to obtain a classifier model but only to discover the common features of malware; it then performs malware detection with a lightweight classifier for efficiency.

  • We implement a prototype of MAPAS and thoroughly evaluate it. We also compare MAPAS against MaMaDroid to demonstrate its effectiveness and efficiency. Our evaluation results show that MAPAS outperforms MaMaDroid in terms of both the usage of computing resources and the accuracy of detecting new malware. Moreover, MAPAS can detect malware of virtually any category with high accuracy.

This paper is organized as follows. We first provide technical background in Sect. 2. Section 3 explains the goals of MAPAS, and Sect. 4 presents its design. We evaluate MAPAS to demonstrate its effectiveness and efficiency in Sect. 5. Previous studies are discussed in Sect. 6. Finally, Sect. 7 concludes this paper.

We release the source code of our proof-of-concept implementation at https://github.com/okokabv/MAPAS.

2 Background

In this section, we introduce malware detection methods and discuss a common limitation of machine/deep learning-based Android malware detection approaches, the mainstream of current malware detection research, that hinders their practical use.

2.1 Detecting Android malware

Android malware detection approaches can be categorized into two groups based on the analysis method (i.e., dynamic or static analysis) used to collect malware features: (1) dynamic analysis-based malware detection approaches and (2) static analysis-based ones.

Dynamic analysis-based malware detection approaches have an advantage over static analysis-based approaches in analyzing the concrete behaviors of malware [5, 12, 14, 22, 26, 27, 32, 36, 67, 70, 73, 74, 81, 83, 87, 90]. They also have the advantage of being able to analyze malware equipped with anti-analysis mechanisms such as obfuscation. However, dynamic analysis typically consumes considerable resources and time because the applications must actually be executed.

On the other hand, static analysis-based malware detection approaches identify features of malware without executing it; thus, the cost of analyzing each application is generally much lower than that of dynamic analysis-based approaches [2, 7, 10, 15, 16, 18,19,20,21, 23, 30, 31, 34, 38, 39, 42, 45,46,47, 49,50,51, 54, 58, 63, 64, 66, 72, 76,77,78, 82, 84,85,86, 88, 89, 92]. Because static analysis-based approaches consume fewer computing resources while still achieving high accuracy, most malware detection approaches employ static analysis to extract malware features.

2.2 Typical features used for static analysis-based malware detection approaches

The first step in developing a malware detection system is to decide which features distinguish malware from benign applications. Typically, developer-written descriptions, user reviews, permissions, opcodes, and APIs are used as such features.

Developer-written descriptions A few studies employed developer-written descriptions of applications as a key feature for detecting malware [53, 62]. However, detecting malware based on developer-written descriptions is not reliable because accurately inferring the execution behavior of an application from its description is rarely possible.

User reviews Some Android malware detection approaches have employed user reviews as an important feature [33, 41]. However, similar to approaches that use developer-written descriptions, their accuracy is not high enough for practical use because user reviews usually do not contain concrete descriptions of an application's behavior that could be used to detect malware.

Opcode Several previous studies showed that there are common opcode patterns that can be used to classify malicious applications [16, 54, 66, 85]. They relied on opcode patterns, such as move and invoke bytecode instructions, that occur frequently in malicious applications.

Permissions Much research has focused on detecting malware based on the permissions that applications request (e.g., access to a user's location, phone information, or the device's network status) [10, 19, 23, 42, 46, 63, 64, 76]. These approaches detect malware based on permission combinations, such as the network permission together with location access, that are commonly requested by malicious applications. However, Avdiienko et al. [11] showed that, similar to malware, most benign Android applications access sensitive user information and request many of the permissions that are also typically requested by malware. Consequently, permission-based malware detection approaches can incur a high false-positive rate.

APIs Many approaches attempted to classify malicious applications based on the APIs they use [2, 18, 30, 34, 37, 40, 58, 61]. By analyzing the APIs used in an application, we can understand the functionality the application provides to users. For example, if an application uses APIs such as android.telephony and android.telecom, we know that it can monitor the mobile phone's network status and manage phone calls. As such, Android APIs provide functional information about what an application does, so we can infer an application's behavior from the APIs it uses. However, using APIs alone as the key feature for identifying malware can yield many false positives, because API usage does not reveal an application's concrete behavior and many APIs are used by both benign and malicious applications [11].

2.3 Impractical machine/deep learning-based Android malware detection approaches

In recent years, a surge of studies proposed to detect Android malware by employing machine or deep learning, classifying malicious applications based on the features discussed in the previous section (Sect. 2.2) [2, 5, 7, 10, 12, 14,15,16, 18,19,20,21,22,23, 26, 27, 30,31,34, 36,37,42, 45,46,51, 54, 58, 61,62,64, 66, 67, 70, 72,73,74, 76,77,78, 81,82,90, 92]. Among them, recently proposed approaches usually employ deep learning algorithms based on artificial neural networks [23, 30, 31, 34, 38, 45, 47, 49, 51, 54, 58, 77, 78, 85, 88, 92]. The notable advantage of deep learning algorithms is that they can eliminate the need for domain expertise and manual feature engineering because they learn features of the data algorithmically [68]. However, previous approaches commonly incur a very high cost (in terms of computing resources and time) because they combine multiple features to achieve high accuracy [71]. Consequently, even though they achieve high accuracy, their high cost makes them difficult to employ in practice.

3 Goal

In this work, our goal is to detect malicious applications efficiently while maintaining high accuracy, (1) to reduce the cost of detecting them and (2) to cope with the growing volume of Android malware. To this end, we streamline the Android malware detection process by using a deep learning algorithm together with a deep learning interpretation approach to extract the dominant, common features used in malware.

Deep learning-based malware detection approaches achieve high accuracy but have the disadvantage of consuming a lot of computing resources and time (as discussed in Sect. 2.3). In general, the cost of running a deep learning algorithm to construct a classifier model, and even of using that model to classify malware, is very high because complex features are used to increase accuracy. In this paper, we use a deep learning algorithm with a deep learning interpretation approach not to classify malicious applications, but only to identify the high-weight features of malware. We then build a low-cost classifier that finds malicious applications based only on the high-weight features identified by the deep learning algorithm. In this way, we can avoid heuristic feature selection while reducing the computing resources and time required to detect malware (Fig. 1).

Fig. 1 Overview of MAPAS

4 Design

In this section, we first give an overview of the proposed system, code-named MAPAS (Sect. 4.1), and then describe the details of each step for detecting malware in Sects. 4.2, 4.3, and 4.4.

4.1 Design overview

Malware features used In this work, we attempt to detect malicious applications based on common patterns in their API call graphs. From API call graphs, we can identify the concrete malicious behaviors of malicious applications [20, 50, 72]. Specifically, MAPAS uses a deep learning algorithm to analyze frequently used patterns of API call graphs that can lead to leaks of sensitive information (social security numbers, credit card numbers, passwords, etc.). MAPAS then detects malware based on the identified patterns of malicious API call graphs.

The design of MAPAS consists of the following three steps:

  1. Data Preprocessing As the first step, MAPAS generates a training dataset by extracting API call graphs from malicious and benign applications. Specifically, MAPAS obtains the API call graphs by conducting taint analysis with Flowdroid [9].

  2. Identifying High-weight API Call Graphs In this step, MAPAS first vectorizes the training dataset and learns it with a convolutional neural network (CNN). After the learning phase finishes, MAPAS uses the deep learning interpretation approach Grad-CAM to discover the high-weight API call graphs used in malicious applications.

  3. Malware Detection In the last step, MAPAS classifies malware by using the Jaccard algorithm, which calculates the similarity between the API call graphs of an application and the high-weight API call graphs of malicious applications.

4.2 Data preprocessing for generating training dataset

MAPAS extracts the API call graphs of applications by conducting taint analysis. Taint analysis is a static analysis method used to track data flows in an application. Specifically, we use taint analysis to trace data flows from sources that read sensitive data (e.g., a function reading a password) to sinks that can transfer data (e.g., a function writing to a socket), thereby identifying whether sensitive information can be leaked. Hence, we can find potential leaks of sensitive information in an application.

For MAPAS, we chose a static analysis tool based on the evaluation results of Arzt [8] and Qiu et al. [65]. There are many taint analysis tools, such as Flowdroid [9], AppScan [28], Epicc [60], JoDroid [56], DroidSafe [25], and Amandroid [80]. Among them, Arzt [8] and Qiu et al. [65] showed that Flowdroid achieves the best overall results in terms of accuracy and runtime performance. Therefore, in this work, we generate API call graphs based on the taint analysis results produced by Flowdroid [9]. The detailed process of generating API call graphs with Flowdroid is shown in Fig. 2.
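As a rough illustration of this preprocessing step, the Python sketch below drives Flowdroid's command-line tool and turns each reported source-to-sink flow into a "source -> sink" string. The jar name, command-line flags, XML structure, and paths are assumptions recalled from common Flowdroid releases and may differ between versions; this is a sketch, not the exact MAPAS implementation.

```python
import subprocess
import xml.etree.ElementTree as ET

# Placeholder paths; the CLI flags and XML layout below are assumptions
# based on common Flowdroid releases and may vary between versions.
FLOWDROID_JAR = "soot-infoflow-cmd-jar-with-dependencies.jar"
ANDROID_PLATFORMS = "/opt/android-sdk/platforms"
SOURCES_SINKS = "SourcesAndSinks.txt"

def extract_call_graphs(apk_path, out_xml="flows.xml"):
    """Run Flowdroid taint analysis and return a set of 'source -> sink' strings."""
    subprocess.run(
        ["java", "-jar", FLOWDROID_JAR,
         "-a", apk_path,                 # APK under analysis
         "-p", ANDROID_PLATFORMS,        # android.jar platform files
         "-s", SOURCES_SINKS,            # source/sink definitions
         "-o", out_xml],                 # XML result file
        check=True,
    )

    graphs = set()
    root = ET.parse(out_xml).getroot()
    # Each <Result> element holds one sink and the sources tainting it.
    for result in root.iter("Result"):
        sink = result.find("Sink")
        for source in result.iter("Source"):
            graphs.add(f"{source.get('Method')} -> {sink.get('Method')}")
    return graphs
```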

It is worth noting that we exclude applications whose API calls are obfuscated from the taint analysis. Among common obfuscation techniques, such as renaming, control-flow obfuscation, string encryption, API hiding, and class encryption [52], Flowdroid cannot extract API call graphs from applications protected by API hiding or class encryption. Therefore, MAPAS has to exclude obfuscated applications from which Flowdroid cannot extract API call graphs. We leave addressing this limitation to future work (Fig. 3).

Fig. 2 Process of extracting API call graphs

Fig. 3 Example of API call graphs extracted from an APK

4.3 Deep learning and identifying high-weight API call graphs from malware

MAPAS applies a deep learning algorithm (CNN) [44] to the training dataset. While learning the dataset, the algorithm finds important features in the collected API call graphs of malware and constructs a classification model. MAPAS then identifies these important features by using a deep learning interpretation approach, Grad-CAM [69]. These features are directly used to detect malicious applications with the Jaccard algorithm; MAPAS does not use the classifier model generated by the CNN.

Vectorizing API Call Graphs To apply deep learning to API call graphs, which are text data, they must be converted into vectors. Text data can be vectorized by mapping each word to an integer and building a vector from the mapped integers, or by analyzing the correlation between words (word2vec [55]) or between documents (doc2vec [43]). MAPAS does not use vectorization methods such as word2vec or doc2vec; instead, it vectorizes API call graphs by simply mapping each API call graph to an integer. As discussed in Sect. 4.2, the API call graphs MAPAS needs to find are specific sequences of function calls from sources to sinks, and each malicious API call graph represents a possible case of sensitive information leakage. Therefore, to detect malware, MAPAS should focus on the presence of such API call graphs rather than on the relationships between them.
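The following is a minimal sketch of this vectorization, assuming each application is represented by the set of "source -> sink" strings produced during preprocessing; the fixed input length (max_len) and the use of 0 as a padding value are illustrative assumptions rather than MAPAS's exact parameters.

```python
def build_vocabulary(all_graphs):
    """Map every unique API call graph string to a positive integer id.
    Index 0 is reserved for padding."""
    return {graph: idx + 1 for idx, graph in enumerate(sorted(all_graphs))}

def vectorize_app(app_graphs, vocab, max_len=200):
    """Turn one application's call-graph set into a fixed-length integer vector."""
    ids = [vocab[g] for g in app_graphs if g in vocab]
    ids = ids[:max_len]                       # truncate long inputs
    return ids + [0] * (max_len - len(ids))   # pad short inputs with 0

# Toy example: two applications sharing one call graph.
app_a = {"android.content -> java.net", "android.telephony -> java.io"}
app_b = {"android.content -> java.net"}
vocab = build_vocabulary(app_a | app_b)
x_a = vectorize_app(app_a, vocab)
x_b = vectorize_app(app_b, vocab)
```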

Learning the dataset MAPAS analyzes the API call graphs commonly used in malware that can leak sensitive information. To this end, MAPAS uses a CNN [44] to learn the vectorized dataset. A CNN is an effective deep learning algorithm for text data because it exploits regional information in the data [35]. Please refer to “Appendix A” for details on CNNs. By learning the vectorized dataset with a CNN, MAPAS can find the common patterns of API call graphs that are frequently used in actual malicious applications. The overall learning process of MAPAS is illustrated in Fig. 4.

Fig. 4 Learning process using vectorized API call graphs with CNN

Finding high-weight features with a deep learning interpretation approach Deep learning models are black-box models: due to their multilayer, nonlinear structures, their predictions are not transparent [57]. Because a CNN also operates as a black box, we cannot directly determine from the classifier model which API call graphs have high weights (i.e., which API call graphs are important for detecting malware). Hence, several deep learning interpretation approaches have been proposed to reveal the specific data that contributed most to the classifier model generated by a deep learning algorithm [4, 29].

To observe the high-weight API call graphs identified by the CNN, MAPAS employs Grad-CAM [69], which produces visual explanations for CNN-based models. Please refer to “Appendix B” for more details on this approach.

As a result of using Grad-CAM, MAPAS found, for example, a high-weight API call graph whose source is android.content and whose sink is java.net; this call graph can leak the user's sensitive information over the network.
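A condensed sketch of this step for a 1D text CNN is shown below, written for a Keras model with a single Conv1D layer and a sigmoid output (as in Sect. 5.1). The layer name "conv1d" and the way positions are mapped back to call graphs are assumptions; the authors' actual Grad-CAM implementation may differ.

```python
import tensorflow as tf

def grad_cam_1d(model, x, conv_layer_name="conv1d"):
    """Per-position importance scores for one vectorized sample x (a 1-D
    sequence of call-graph ids). The layer name 'conv1d' is an assumption
    and must match the Conv1D layer of the trained model."""
    grad_model = tf.keras.models.Model(
        model.inputs,
        [model.get_layer(conv_layer_name).output, model.output])
    inputs = tf.constant([list(x)])                 # add a batch dimension
    with tf.GradientTape() as tape:
        conv_out, prediction = grad_model(inputs)
        score = prediction[:, 0]                    # predicted malware probability
    grads = tape.gradient(score, conv_out)          # d(score) / d(conv activations)
    weights = tf.reduce_mean(grads, axis=1)         # average gradient per filter
    cam = tf.reduce_sum(conv_out * weights[:, tf.newaxis, :], axis=-1)
    return tf.nn.relu(cam)[0].numpy()               # keep positive contributions only

# Positions with positive scores point to the high-weight API call graphs:
# the call-graph ids at those positions in x are the ones Grad-CAM highlights.
```

Because the convolution uses kernel_size = 1 (Sect. 5.1), each position's score corresponds directly to one API call graph id in the input vector.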

After discovering the high-weight features with Grad-CAM, MAPAS can classify malicious applications based on these features. Note that, to reduce the usage of computing resources, MAPAS does not detect malware with the classifier model generated by the CNN. In Sect. 5, we demonstrate the effectiveness and efficiency of MAPAS by comparing it with the classifier model generated by the CNN (Fig. 5).

Fig. 5 A process for finding high-weight features using Grad-CAM

Fig. 6 Malware classification process of MAPAS

4.4 Malware detection

To detect malicious applications, MAPAS measures the similarity between two sets (the high-weight API call graphs and the call graphs extracted from an unclassified application) by using the Jaccard similarity algorithm [3], as shown in Fig. 6.

The Jaccard similarity takes a value between 0 and 1: if two sets are identical, the similarity score is 1, and if two sets are disjoint, the similarity score is 0. The Jaccard similarity is defined as follows.

$$\begin{aligned} J(A,B)=\frac{\left| A\cap B \right| }{\left| A\cup B \right| }=\frac{\left| A\cap B \right| }{\left| A \right| +\left| B \right| -\left| A\cap B \right| } \end{aligned}$$
(1)
Table 1 Overview of the datasets used in our experiments

MAPAS considers an application to be malware if the similarity score is higher than a threshold (0.4303) that we set based on the test results in Sect. 5.2.
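The classifier itself reduces to a few lines of set arithmetic. The sketch below follows Eq. (1) and uses the threshold reported in Sect. 5.2; the call-graph strings are toy placeholders, not MAPAS's actual high-weight graphs.

```python
def jaccard_similarity(a, b):
    """Jaccard similarity J(A, B) = |A ∩ B| / |A ∪ B| between two sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def is_malware(app_graphs, high_weight_graphs, threshold=0.4303):
    """Flag an application whose call graphs are similar enough to the
    high-weight malicious call graphs (threshold from Sect. 5.2)."""
    return jaccard_similarity(app_graphs, high_weight_graphs) > threshold

# Toy usage with placeholder call-graph strings:
high_weight = {"android.content -> java.net", "android.telephony -> java.io"}
suspicious = {"android.content -> java.net", "android.telephony -> java.io",
              "android.location -> java.net"}
print(is_malware(suspicious, high_weight))   # similarity 2/3 ≈ 0.67 -> True
```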

5 Evaluation

In this section, we evaluate MAPAS to demonstrate its effectiveness and efficiency. Our evaluation addresses the following research questions:

  RQ 1. How much computing resources does MAPAS use to detect malware?

    In Sect. 5.3, we evaluate the efficiency of the MAPAS malware detection process by comparing it with that of a classifier model generated by a deep learning algorithm. In Sect. 5.4, we also compare the efficiency of MAPAS with that of MaMaDroid [61].

  RQ 2. How accurately can MAPAS detect malware?

    In Sect. 5.4, we evaluate the effectiveness of MAPAS by measuring the accuracy of its malware detection results.

  RQ 3. Can MAPAS detect newly emerging malicious applications?

    In Sect. 5.4, we evaluate the effectiveness of MAPAS against malicious applications released later than those in the training dataset.

5.1 Experimental configuration

Setup We performed our evaluations on a workstation running Ubuntu 18.04 with a 20-core Intel Xeon Gold 6230 CPU at 2.10 GHz, 128 GB of RAM, and an NVIDIA GeForce RTX 2080 GPU.

Datasets We first collected the top 10,000 applications from Google Play Store [24]. We then randomly downloaded 10,653 malicious applications released in 2018 and 2019 from VirusShare [75]. In addition, we used 23,039 malicious applications from the Android Malware Dataset (AMD) [79], which Wei et al. classified into 70 categories [79].

Table 1 shows the number of applications used in our evaluation. The training dataset is used to generate a classifier model with the CNN, and the test dataset is used to evaluate the effectiveness of MAPAS.

Table 2 High-weight API call graphs discovered by Grad-CAM
Table 3 Performance evaluation results of MAPAS and CNN

Hyper-parameters To minimize the usage of computing resources in the learning phase, MAPAS uses one convolution layer and one pooling layer. Specifically, MAPAS uses the following hyper-parameters: embedding layer: 64 dimensions; convolution layer: filters = 32, kernel_size = 1, with the remaining parameters at their default values; pooling layer: max pooling; compile: optimizer = 'rmsprop', loss = 'binary_crossentropy', batch_size = 500, epochs = 100. The total number of nodes used in the CNN model is 1,128,089.
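For concreteness, the following Keras sketch builds a model with these hyper-parameters. The vocabulary size, the choice of GlobalMaxPooling1D for the max-pooling layer, and the final sigmoid output are our assumptions; the authors' exact architecture may differ.

```python
from tensorflow.keras import layers, models

def build_model(vocab_size):
    """CNN with the hyper-parameters listed above; vocab_size depends on the
    extracted call graphs, and the pooling/output layers are assumptions."""
    model = models.Sequential([
        layers.Embedding(input_dim=vocab_size + 1, output_dim=64),  # 64-dim embedding
        layers.Conv1D(filters=32, kernel_size=1),   # remaining Conv1D options left at defaults
        layers.GlobalMaxPooling1D(),                # "max pooling" layer
        layers.Dense(1, activation="sigmoid"),      # benign vs. malicious
    ])
    model.compile(optimizer="rmsprop", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# Training as described (x_train/y_train are the vectorized call-graph inputs
# and labels from Sect. 4.3; the vocabulary size is taken from Sect. 5.2):
# model = build_model(vocab_size=21690)
# model.fit(x_train, y_train, batch_size=500, epochs=100)
```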

5.2 Finding high-weight features

Training dataset 9,000 malicious applications from VirusShare [75] and 9,000 benign applications downloaded from Google Play Store [6] were used to train a classifier model with the CNN. To this end, we extracted API call graphs from these 18,000 applications using Flowdroid [9]. In total, we obtained 21,690 unique API call graphs and used them as the training dataset.

Model learning and verification We trained a classifier model using the CNN with the training dataset. We then verified the classifier model with k-fold cross-validation; the average accuracy measured by this validation method is 0.9695.
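A brief sketch of this verification step is shown below, reusing the hypothetical build_model helper from the sketch in Sect. 5.1 and assuming k = 5 (the paper does not state the number of folds).

```python
import numpy as np
from sklearn.model_selection import KFold

def cross_validate(x, y, vocab_size, k=5):
    """k-fold cross-validation of the CNN classifier.
    build_model is the hypothetical helper from the earlier sketch;
    k=5 is an assumption (the fold count is not stated in the paper)."""
    scores = []
    for train_idx, val_idx in KFold(n_splits=k, shuffle=True, random_state=0).split(x):
        model = build_model(vocab_size)
        model.fit(x[train_idx], y[train_idx],
                  batch_size=500, epochs=100, verbose=0)
        _, acc = model.evaluate(x[val_idx], y[val_idx], verbose=0)
        scores.append(acc)
    return float(np.mean(scores))   # the paper reports 0.9695 on average
```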

Finding high-weight features with Grad-CAM After generating the classifier model, we used Grad-CAM [69] to identify the high-weight API call graphs. As shown in Table 2, 4,312 API call graphs have a positive weight score; MAPAS finds malicious applications based on these 4,312 API call graphs.

To pick a threshold, we measured the Jaccard similarity between the high-weight API call graphs and the API call graphs extracted from malicious and benign applications, obtaining similarity scores of 0.561 and 0.2996, respectively. We used the average of the two scores (0.4303) as the threshold for detecting malware. By using the average value, MAPAS can avoid biased results; that is, it balances false negatives (classifying an application as benign when it is actually malware) against false positives (classifying an application as malware when it is actually benign).
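In other words, the threshold is simply the midpoint of the two measured similarity scores:

$$\begin{aligned} {\textit{threshold}}=\frac{0.561 + 0.2996}{2}=0.4303 \end{aligned}$$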

5.3 Performance evaluation of MAPAS with the CNN classifier model

MAPAS uses the Jaccard similarity algorithm as its classifier to detect malware. We evaluated the runtime performance and computing resource usage of MAPAS's malware detection process, and we measured the same metrics for the classifier model generated by the CNN. For this evaluation, we used the 1,000 malicious applications and 1,000 benign applications of the test dataset shown in Table 1.

Table 3 shows the experimental results. To classify the 2,000 applications, MAPAS took 21.18 s (10.59 ms per application on average) on a single CPU core, whereas the classifier model processed them in 15.92 s (7.96 ms per application on average) using one GPU. It is worth noting that, when we ran the classifier model without a GPU, it could not finish processing the 2,000 applications within 24 h. In addition, as shown in Table 3, the classifier model used 10,590 MiB of GPU memory and about 2,070 MB of RAM (1214.16% more than MAPAS). We also measured the detection accuracy: the CNN classifier model showed an 11% lower detection rate than MAPAS.

5.4 Performance evaluation of MAPAS with MaMaDroid

We compare the performance of MAPAS with that of previous work, MaMaDroid [61]. Similar to MAPAS, MaMaDroid uses the API call graphs of malicious applications to detect them. For this comparison, both MAPAS and MaMaDroid [61] created classifiers from the 9,000 benign and 9,000 malicious applications in the training dataset. MaMaDroid converts API call graphs into Markov chains [59] and creates a classifier by learning 198,916 features, whereas MAPAS used 21,659 unique API call graphs to create its classifier. By default, MaMaDroid uses random forests (RF) [13] and k-nearest neighbors (k-NN) [17]. In this evaluation, we used only a CPU (no GPU) for both MaMaDroid and MAPAS.

Performance of the learning process Figure 7 shows the evaluation results for the learning phase of each system. MaMaDroid+CNN used about 1214% more RAM than MAPAS during the learning phase (MAPAS used 2.26 GB of RAM, while MaMaDroid+CNN used 34 GB). MaMaDroid+CNN also spent 5.45 times as much time as MAPAS to finish learning the dataset. However, MaMaDroid+RF and MaMaDroid+k-NN finished the learning phase faster than MAPAS, even though they used much more memory than MAPAS.

Performance of the classification process To evaluate the classification processes of MAPAS and MaMaDroid, we used each system to classify the 2,000 applications in the test dataset. The evaluation results are shown in Figs. 8 and 9. Overall, MaMaDroid with the random forest algorithm (MaMaDroid+RF) showed the best accuracy, as shown in Fig. 8; MAPAS achieves about 3% lower accuracy than MaMaDroid+RF. However, MAPAS showed the best performance in terms of execution time and the lowest RAM usage, as illustrated in Fig. 9. Specifically, MAPAS classifies applications 76.4% and 145.8% faster than MaMaDroid+RF and MaMaDroid+k-NN, respectively, while using much less memory (around ten times less than MaMaDroid+RF).

Fig. 7 Performance evaluation results of the learning process of MAPAS and MaMaDroid

Fig. 8 Accuracy of classification results of MAPAS and MaMaDroid

Fig. 9 Performance evaluation results of the classification process of MAPAS and MaMaDroid

Table 4 Accuracy of MAPAS and MaMaDroid for detecting malware in 70 categories

Detecting malware of various categories We evaluated the effectiveness of MAPAS and MaMaDroid+RF in detecting Android malware across the 70 categories defined by Wei et al. [79]. The measurement results are shown in Table 4. MAPAS showed about 99% accuracy on average over the 70 malware categories, which demonstrates that MAPAS can detect virtually any type of malware with high accuracy. In contrast, MaMaDroid detected malware with 69% accuracy on average. Specifically, MaMaDroid showed high accuracy for categories such as BankBot, Univert, Utchi, and FakeDoc, but it could not accurately detect malware in categories such as Bankun, FakePlayer, FakeUpdates, Leech, Nandrobox, SlemBunk, Smskey, and SmsZombie. Bankun and FakePlayer are Trojan-type malware that hides in the normal control flow and executes abruptly [79], which means that the Trojan behavior barely affects the transitions from the current state to the next state. Thus, MaMaDroid struggles to detect Trojan-type malware because it relies on Markov chains, which model the transition probabilities between states.

Detecting unknown malware We evaluated MAPAS and MaMaDroid+RF on unknown malicious applications. To this end, we collected malware released later than the applications in the training dataset from VirusShare [75]. As shown in Table 4, MAPAS achieved 91% accuracy in detecting unknown malware, which is 6% higher than MaMaDroid.

6 Related work

DroidRisk [76] and Dini et al. [19] used permissions as malware features and detected malware using the Analytic Hierarchy Process (AHP). The following work also employed permissions as features but used different learning algorithms: Peng et al. [64] used naive Bayes; Zarni et al. [10], Li et al. [46], FAMOUS [42], and Pehlivan et al. [63] used tree-based machine learning algorithms; and Ganesh et al. [23] used a CNN.

Santos et al. [66], TinyDroid [16], McLaughlin et al. [54], and Deeprefiner [85] classified malware based on opcodes (bytecode instructions) using various machine learning algorithms such as SVM, k-NN, decision trees, naive Bayes, Bayesian networks, multilayer perceptrons (MLP), and long short-term memory (LSTM) models.

Droidapiminer [2] detected malware with machine learning algorithms such as k-NN, Iterative Dichotomiser 3 (ID3), SVM, and C4.5, using the APIs frequently called by malware. Nix et al. [58] and MalDozer [34] also attempted to detect malware based on the APIs used in malware, employing CNN, LSTM, SVM, and naive Bayes. Droiddelver [30] detected malware by analyzing API call blocks with deep belief network (DBN) and restricted Boltzmann machine (RBM) algorithms.

On the other hand, Yerima et al. [86], Droidmat [82], Drebin [7], DroidDolphin [84], and Chan et al. [15] used complex features (i.e., more than two different types of features, such as permissions, APIs, and opcodes) with various machine learning algorithms. DroidDeepLearner [77], Hou et al. [31], Li et al. [49], Li et al. [47], Zhang et al. [88], and Kim et al. [38] proposed deep learning-based malware detection systems based on complex features. These studies achieved high detection rates by using features containing diverse information, but they require a lot of computing resources.

The most closely related work to this paper is MaMaDroid [61], which used Markov chains [59] to compute the transition probabilities from current states (sources) to other states (sinks) in the API call graphs of malicious applications. MaMaDroid then trained k-NN and random forest algorithms on the Markov chains to generate a classifier model. In addition, DeepFlow [92] and EveDroid [45] also used API call graphs to detect malware; they especially focused on detecting newly emerging malicious applications by using deep learning algorithms.

7 Conclusion

In this paper, we proposed MAPAS, an effective and efficient malware detection approach. MAPAS analyzes the common features of API call graphs extracted from malicious applications by using a deep learning algorithm and then detects malware based on these features with a lightweight classifier for efficiency. Our evaluation results showed that MAPAS outperforms a state-of-the-art approach, MaMaDroid [61], in terms of computing resource usage and accuracy in detecting unknown malware. Moreover, MAPAS can detect virtually any type of malware with high accuracy.