Keywords

1 Introduction and Motivation

Ransomware is a recent kind of malware that spread out mainly in last couple of years, and it is particularly aggressive for two reasons: on one hand it uses very effective mechanisms of infection based mainly on techniques of social engineering like sophisticated phishing (by mail or chat), and on the other hand it makes completely inaccessible the data on the infected machine, as it cyphers all the files with a strong asymmetric key cryptographic algorithm.

The ransomware is still increasing its capability to harm the victim’s device, prevent the restore of the data or device, and evade detection. As a matter of fact, the more recent releases of this malware are able to recognize when they are executed in a virtual environment, which is often used for creating a sandbox where safely executing a program for studying its behavior and understanding if it launches a malicious payload or not. Additionally, recent ransomware is equipped with anti-debugging techniques, which is another way to evade detection as hindering the scanning of anti-malware.

Statistics from US governative agencies show that Cryptolocker infected in 2014: 336,856 machines in USA, 4,593 in UK, 25,841 in Canada, 15,427 in Australia, 1,832 in India, 100,448 in other countries. At its peak, CryptoLocker, a kind of ransomware, was infecting around 50,000 computers per month. According to SCMagazineFootnote 1 the CryptoWall, another ransomware, in a roughly five-month period infected 625,000 victims worldwide, encrypting 5.25 billion files, collecting more than $1.1 million in ransoms. The malware tries to delete shadow copies of the system through vssadmin.exe, so that the victim cannot return to previous system restore points too.

Moreover ransomware is invading the smartphone world: Kaspersky labs found 1,113 new ransomware samples targeting Android devices in the first quarter of 2015, which is a 65 % increase in the number of mobile ransomware samples with respect to those collected in 2014Footnote 2. This is a dangerous trend since ransomware is designed to extort money, damage personal data, and block infected devices. Once the device is infected, the attacker asks the victim to pay a ransom in order to obtain the key for decrypting the files or restoring the control of the smartphone.

As the evidence of the high infections rate demonstrates, commercial antimalware solutions are mainly ineffective to detect ransomware.

For this reason we propose a technique for specifically detecting ransomware on smartphone devices that is completely based on formal methods. The technique has been proved to be very effective as the evaluation produced an F-measure of detection equal to 0.99 on a dataset of 2, 477 samples. Additionally, the technique is able to localize in the code the peculiar instructions that implement the stages of infection, and the activation of the payload, which provides fundamental pieces of information to build both effective detectors and removal systems for ransomware. Moreover, at the best knowledge of the authors, literature counts only two works that propose a method to detect mobile ransomware [2, 20] and that are compared with ours in the section of related work.

The paper proceeds as follows: Sect. 2 describes and motivates our detection method; Sect. 3 illustrates the results of experiments; Sect. 4 discusses the related work; finally, conclusions are drawn in Sect. 5.

2 The Methodology

In this section we present our methodology for the detection of Android ransomware malware using model checking. While model checking was originally developed to verify the correctness of systems against specifications, recently it has been highlighted in connection with a variety of disciplines see [1, 7]. Moreover, great advancements have been made to tackle the limitation of this technique due to its high time and memory requirements, see [4, 9–11]. In this paper we present the use of model checking in the security field.

2.1 Formal Methods for Ransomware Detection

The approach is structured in three main sub-processes.

Formal Model Construction. This first sub-process aims at deriving formal models starting from the Java Bytecode. The Bytecode of the app under study is parsed and suitable formal models of the system are produced. More precisely, the bytecode of the analysed app that resides in a class folder or in JAR files is fed to a custom parser, based on the Apache Commons Bytecode Engineering Library (BCEL)Footnote 3. The parsed Java Bytecode of the .class files are successively translated into formal models. In our approach Calculus of Communicating Systems (CCS) [15] has been exploited. CCS [15] is one of the most well known process algebras. A Java Bytecode-to-CCS transforming function has been defined for each instruction of the Java Bytecode. We associate a new CCS process to each Java Bytecode instruction. This translation has to be performed only one time for each app to be analysed and it has been completely automated. Each Java Bytecode instruction that is not a (conditional or unconditional) jump is represented by a process that, using the operator (“.”), invokes the process corresponding to its successive instruction. Conditional jumps are instead specified as non-deterministic choices. An unconditional jump is represented by a CCS process that invokes the corresponding process of the jump target.

Temporal Logic Properties Construction. This second sub-process aims to define the characteristic behaviour of a ransomware by means of the construction of the temporal logic properties. This step tries to recognize specific and distinctive features of the ransomware behaviour with respect to all the other malware families and to goodware too. Thus, this specific behaviour is written as a set of properties. To specify the properties, we manually inspected a few samples in order to find the ransomware malicious behavior implementation at Bytecode level. In our approach, the mu-calculus logic [19] is used, which is a branching temporal logic to express behavioural properties.

Ransomware Family Detection. Finally, a formal verification environment, including a model checker, is invoked to recognise the ransomware family. This sub-process checks the sets of logic properties obtained from the ransomware malware family characterization against the CCS model of the app. In our approach, we invoke the Concurrency Workbench of New Century (CWB-NC) [8] as formal verification environment. When the result of the CWB-NC model checker is true, it means that the app under analysis belongs to the ransomware family, false otherwise. Thanks to very detailed CCS model and the logic formulae we are able to reach a good accuracy of the overall results, as explained in the following section.

To the Authors’ knowledge, model checking has never used before for the ransomware detection. The main distinctive features of the approach proposed in this paper are the use of formal methods, the identification of the ransomware through the Java Bytecode and the definition of a fully static approach. More precisely, our methodology exploits the Bytecode representation of the analysed apps. Detecting Android ransomware through the Bytecode and not directly on the source code has several benefits: (i) independence of the source programming language; (ii) recognition of malware families without decompilation even when source code is lacking; (iii) easiness of parsing a lower-level code; (iv) independence from obfuscation.

Another important feature of our approach is that we try to reuse existing model checkers avoiding the design of custom-made model checker. In fact our goal is to recognise ransomware with the criteria of reusing existing checking technologies. Model checkers, especially the most widely used ones, are extremely sophisticated programs that have been crafted over many years by experts in the specific techniques employed by the tool. A re-implementation of the algorithms in these tools could likely yield worst performance.

3 Results and Discussion

3.1 Empirical Evaluation Procedure

To estimate the detection performance of our methodology we compute the metrics of precision and recall, F-measure (Fm) and Accuracy (Acc), defined as follows:

$$ PR = \frac{TP}{TP+FP};\;\; RC = \frac{TP}{TP+FN};$$
$$ Fm = \frac{2 PR\;RC}{PR+RC};\;\; Acc = \frac{TP+TN}{TP+FN+FP+TN} $$

where TP is the number of malware that was correctly identified in the right family (True Positives), TN is the number of malware correctly identified as not belonging to the family (True Negatives), FP is the number of malware that was incorrectly identified in the target family (False Positives), and FN is the number of malware that was not identified as belonging to the right family (False Negatives).

3.2 Experimental Dataset

The real world samples examined in the experiment were gathered from three different datasets. The first one is a collection of freely available 672 samplesFootnote 4 and 11Footnote 5 Android ransomware samples. The samples are labelled as ransomware, koler, locker, fbilocker and scarepackage [2] and appeared from December 2014 to June 2015. The second one is the Drebin project’s dataset [3, 18], a very well known collection of malware used in many scientific works, which includes the most diffused Android families.

Each malware sample in these datasets is labelled according to the malware family it belongs to: each family comprehends samples which have in common the same payload. This collection does not contain ransomware samples: we use this dataset to check the true positives.

The last one is a dataset of trusted applications crawled from Google PlayFootnote 6, by using a script which queries a python APIFootnote 7 to search and download apps. The downloaded applications belong to all the 26 different available categories (i.e., Books & Reference, Lifestyle, Business, Live Wallpaper, Comics, Media & Video, Communication, Medical, Education, Music & Audio, Finance & News, Magazines, Games, Personalization, Health & Fitness, Photography, Libraries & Demo, Productivity, Shopping, Social, Sport, Tools, Travel, Local & Transportation, Weather, Widgets). The applications retrieved were among the most downloaded in their category and were free.

The trusted applications were collected between April 2015 and January 2016 and were later analysed with the VirusTotal serviceFootnote 8, a service able to run 57 different antimalware software (i.e., Symantec, Avast, Kasperky, McAfee, Panda, and others) on the app: the analysis confirmed that the crawled applications did not contain malicious payload. We use this dataset to check the true positives.

Furthermore, we developed a frameworkFootnote 9 able to inject several obfuscation levels in Android applications: (i) changing package name; (ii) identifier renaming; (iii) data encoding; (iv) call indirections; (v) code reordering; (vi) junk code insertion.

Table 1. Dataset used in the experiment

These injections were aimed at generating morphed versions of the applications belonging to the ransomware dataset. Previous works [6] demonstrated that antimalware solutions fail to recognize the malware after these transformations. We applied our method to the morphed dataset in order to verify if it loses its effectiveness, too, or it keeps on recognizing the malware also after they have been altered. Table 1 provides the details of the full collection of 2,477 samples used to test the effectiveness of our method. Regarding the ransomware samples, Table 1 shows the number of original and morphed samples we tested; in some cases our framework was not able to disassemble some of the selected samples, this is the reason why we had to discard them and we consider a lower number of morphed samples if compared with original ones. In order to test the capacity of our rules to identify exclusively ransomware samples, we include in the dataset both trusted and malware samples from other families (respectively Trusted and Other Malware).

Table 2 provides a brief description of the payload brought by the malware families labelled as Other Malware, i.e., malware that is not ransomware.

Table 2. Families in Drebin dataset with details of the installation method (standalone, repackaging, update), the kind of attack (trojan, botnet), the events that trigger the malicious payload and a brief family description.

We test 60 samples for each family. The malware was retrieved from the Drebin project [3, 18] (we take into account the top 10 most populous families).

3.3 Evaluation

As a baseline for evaluating the performances of our solution, we compare the results obtained with our method with those produced by the top 10 ranked mobile antimalware solutions from AVTESTFootnote 10, an independent Security Institute which each year provides an analysis of the performances of the antimalware software. We built this baseline, by submitting our original and morphed samples to the VirusTotal APIFootnote 11, which allows to run the above mentioned antimalware.

Table 3 shows the evaluation between the top 10 antimalware solutions and our method with the original ransomware samples and with the morphed ones.

We consider only the samples and the percentage identified in the right family (column “ident” and the percentage in column “%ident”) in Table 3. We also report the samples detected as malicious but not identified in the right family and the samples not recognized as malware (column “unident”).

Table 3. Top 10 signature-based antimalware evaluation against our method.
Table 4. Performance evaluation

With regards to Table 3 we notice that BitDefender shows better performance in family identification for Ransomware original samples. Instead, with regards to morphed samples, antimalware performance decreases dramatically, indeed BitDefender is able to identify only 86 samples. The worst antimalware in Ransomware identification is Alibaba, able to correctly classify just 3 original samples and 0 morphed samples.

Due to the novelty of the problem, antimalware solutions are not still specialized in family identification; this is the reason why most of antimalware are unskilled to detect families. Another problem is that current antimalware are not able to detect malware when the signature mutates: their performance decreases dramatically with morphed samples. On the contrary, the detection done by our method is barely affected by the code transformations, so it is independent from the signature.

Table 4 shows the results obtained using our method. We consider the sum of original and morphed samples: the detail about the number of original and morphed samples is shown in Table 1.

Results in Table 4 seems to be very promising: we obtain an Accuracy and a F-measure equal to 0.99. Concerning the ransomware results, we are not able to identify the malicious payloads of just 6 samples (i.e., 3 originals and 3 morphed) on 1, 277.

4 Related Work

In this section we review the current literature related to ransomware detection and formal methods applied to the detection of mobile malware.

As we stated previously, literature counts only two works about the detection of Android ransomare.

The first one proposes HelDroid [2]: the approach includes a text classifier based on NLP features, a lightweight Smali emulation technique to detect locking strategies, and the application of taint tracking for detecting file-encrypting flows. The main weakness of HelDroid is represented by the text classifier: the authors train it on generic threatening phrases, similar to those that typically appear in ransomware or scareware samples. In addiction, like whatever machine learning approach, HelDroid needs to train the classifier in order to label a samples as a ransomware: the detection capability of the model is related to the training dataset.

The other work in literature exploring the ransomare detection in mobile world is [20]. The authors illustrate a possible design of a static and dynamic analysis based solution, without implementing it. Their goal is to build a better performance tool in order to help to understand what should do to approach for a successful detection of Android ransomware.

Formal methods have been applied for studying malware in some recent papers, see [12, 13, 16, 17]. Recently, the possibility to identify the malicious payload in Android malware using a model checking based approach has been explored in [5, 14]. Starting from payload behavior definition they formulate logic rules and then test them by using a real world dataset composed by DroidKungFu, Opfake families and update attack samples. However, as it emerges from the literature in the last years, formal methods have been applied to detect mobile malware, but at the best knowledge of the authors they have never been applied for identifying specifically the ransomware attack on Android malware.

5 Conclusions

Ransomware is a new type of malware that restricts access to the infected smartphone and it demands the user to pay a ransom to the attacker in order to remove the restriction. Ransomware samples are able to encrypt files on the infected device, which become difficult or impossible to decrypt without paying the ransom for the encryption key.

In this paper we propose a technique based on formal methods able to detect ransomware behaviour in Android platform. We obtain encouraging results on a dataset of 2,477 samples: 1 precision and 0.99 recall, overcoming in terms of effectiveness the top 10 popolous commercial antimalware.

As future work we are going to extend our solution using a ransomware dataset for different environment, like Windows Mobile and iOS in order to experiment the portability of our method.