1 Introduction

Since Android-based devices are used by thousands of end-users every year, cyber-criminals continuously develop more and more malicious applications in order to steal sensitive information and conduct hostile activities. According to the McAfee Mobile Threat Report, in 2019 cyber-criminals increased the effectiveness of their mobile attacks through a wide variety of methods and new approaches, such as backdoors and cryptocurrencies, making them hard to identify and remove [32]. In addition, as shown in Fig. 1, G DATA and McAfee experts counted more than 4.18 million new malicious applications in 2019 [17], while Kaspersky and TechCrunch estimated that there would be over 6 billion smartphone users worldwide by 2020 [22, 41].

Fig. 1 Total mobile malware detections by quarter in 2018 and 2019 [32]

Therefore, to face this security trend and support researchers in addressing malware detection tasks, several approaches based on machine learning (ML) and deep learning (DL) have proved effective against many aspects of Android threats, especially when combined with static and dynamic features directly extracted from mobile apps [16, 21, 31]. However, due to the continuous release of new Android malware, the related classification tasks remain challenging. As a consequence, many state-of-the-art approaches suffer from problems related to their dynamic re-training, as well as to the updating of their training datasets.

To address these issues, in this paper we propose new features, called permission maps (Perm-Maps), which combine information related to the Android permissions and their corresponding severity levels. Such features are employed to classify different malware families by means of a convolutional neural network (CNN). In addition, the advantages introduced by the Perm-Maps are enhanced by a training process based on federated logic, where end-user devices extract static features locally and send them to a centralized server devoted to training the employed neural network.

Next, we explore the effectiveness of the proposed Perm-Maps by comparing them with the most popular state-of-the-art ML- and DL-based approaches. Finally, to reduce the computational effort required by the Perm-Maps generation and CNN training processes, we investigate a feature selection technique based on the most frequent Android permissions.

The main contributions of this paper can be summarized as follows:

  1. Novel features, called Perm-Maps, are proposed to combine the Android permissions and their corresponding severity levels into an image.

  2. A federated architecture is presented to support the training phase of the Perm-Maps.

  3. A CNN is employed to classify several Android malware families and then compared with the most popular state-of-the-art approaches.

  4. A feature selection technique based on the most frequent Android permissions is investigated to reduce the computational effort required by the Perm-Maps generation and CNN training processes.

The rest of the paper is organized as follows. Section 2 will present the related works about malware classification methods for Android devices. Section 3 will report a background overview on Android permissions. Section 4 will show the definition of Perm-Map, which is based on the Android permissions and their corresponding severity levels. Section 5 will present the employed federated architecture. Section 6 will discuss the obtained results related to the proposed CNN and the investigated feature selection technique, respectively. Finally, Sect. 7 will show the conclusions and future works.

2 Related works

Since Android malware applications are continuously released every year by cyber-criminals, many detection frameworks based on static and dynamic methodologies have been proposed [16, 21, 31]. Static techniques can acquire the behaviour of the analyzed applications by performing several reverse engineering steps, and consequently, by extracting useful signatures without executing the application. For instance, Onwuzurike et al. [34] presented MaMaDROID, a new Android malware detection solution that can check the sequences of API calls associated with the activity of a mobile application.

However, static approaches are often adversely affected by obfuscation techniques, and they become ineffective against polymorphic malware, which is able to modify itself. For this reason, signature-based detection techniques are often replaced by dynamic approaches, which analyze the behaviour of an application at run time. In 2018, Sruthi et al. [40] proposed a malware detection technique based on API calls for the Windows OS environment. Furthermore, several works have adopted ML and DL techniques based on both static and dynamic features [14, 33, 48].

In 2016, Kolosnjaji et al. [24] compared different deep neural network (DNN) typologies. In particular, they proposed a convolutional long short-term memory (Conv-LSTM) network able to achieve an average accuracy of 89.0% over 10 different Android malware categories. Kumar et al. [25] compared three well-known ML-based methods for detecting Android malware by analyzing the visual representation of APK files formatted as Grayscale, RGB, CMYK, and HSL images, without any code extraction or decompiling operations. More precisely, they evaluated the proposed technique using decision trees (DT), Random Forest (RF), and k-nearest neighbor (k-NN). The obtained results show that RF achieves a 91% accuracy on APK files formatted as Grayscale images.

In 2017, Vinayakumar et al. [42] investigated different LSTM neural networks to classify APK files as either benign or malicious. In particular, they proposed an LSTM network able to achieve an 89.7% accuracy by taking into account Android permissions translated into numerical information.

In 2018, Li et al. [27] compared different DNN configurations based on static information, such as permissions and Java code. More precisely, they compared ten distinct neural network configurations, achieving an average accuracy between 95 and 97% in the Android malware classification task. Xie et al. [47] proposed a tool called RepassDroid, which classifies Android applications as benign or malicious based on permissions and Java methods. Additionally, they compared different ML-based approaches such as DT, RF, k-NN, Naive Bayes (NB), and support vector machines. The achieved results show that RF is able to achieve a 99.7% accuracy on 24,288 Android applications.

In 2019, Li et al. [26] proposed a novel and highly reliable DNN classifier for Android malware detection based on the extraction of several features from manifest files and source code. In particular, they considered seven different static features like app components, hardware features, permissions, intent filters, restricted and suspicious Java methods, and used permissions. These features were used to train a DNN able to obtain a 99.25% average accuracy. D’Angelo et al. [13] proposed deep sparse autoencoders (AEs) to classify Android-based malware and goodware (GW) applications downloaded from several app stores. More precisely, they proposed a new API method representation technique named API-images, and achieved an average accuracy of 95% by employing deep sparse AEs.

In 2020, Aonzo et al. [7] presented BAdDroIds, a mobile application that leverages DL for detecting malware on resource-constrained devices. In particular, the proposed application has been compared with the most notable Android malware detection frameworks by achieving a 98% average accuracy.

Finally, in 2021, D’Angelo et al. [12] proposed a CNN and a recurrent neural network (RNN), based on API-images, in order to classify different malware families. More precisely, they used both neural networks on five malware families on the Unisa malware dataset (UMD) by achieving 99% in average accuracy.

3 Background

In this section, some key concepts related to Android permissions and federated environments are discussed in order to understand and appreciate the novelties of the proposed approach.

3.1 Permissions overview

Android permissions can be categorized into three main typologies: Install-time, Runtime, and Special [4]. Install-time permissions grant an application limited access to restricted data, and thus, they allow an application to perform restricted actions that minimally affect the system or other apps. When a developer declares install-time permissions, the system automatically grants the required permissions without notifying the end-user. There are two types of Install-time permissions, called normal permissions and signature permissions:

  • Normal permissions allow access to data and actions that present minimal risk for the system or the end-user's privacy. They can be identified through a protection level value set to normal.

  • Signature permissions are defined in another Android application, and are therefore granted only if the requesting and declaring applications are signed with the same certificate. They can be identified through a protection level value set to signature.

Runtime permissions, also known as dangerous permissions, grant an application additional access to restricted data by allowing it to perform actions that substantially affect the system and other apps. When an Android application requests a runtime permission, the system presents a prompt and waits for the end-user to grant or deny it. Runtime permissions can be identified through a protection level value set to dangerous.

Finally, special permissions can only be defined by the original equipment manufacturers (OEMs) to provide access control concerning several energy-intensive actions, such as access to other applications. More precisely, they are closely associated with an app operation (app op) related to access control, and they can be identified through a protection level value set to appop.
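The correspondence between protection levels and the three permission typologies described above can be sketched as a small lookup. This is only an illustration under stated assumptions: the function name `permission_typology` is hypothetical, and the input is assumed to be a protection-level string whose flags are separated by `|`.

```python
def permission_typology(protection_level: str) -> str:
    """Map a protection-level string to the typology used in this section.

    Assumption: flags are separated by '|' (e.g. "signature|appop").
    """
    flags = {flag.strip().lower() for flag in protection_level.split("|")}
    if "appop" in flags:
        return "special"
    if "dangerous" in flags:
        return "runtime"
    if "normal" in flags or "signature" in flags:
        return "install-time"
    return "unknown"


print(permission_typology("normal"))           # install-time
print(permission_typology("dangerous"))        # runtime
print(permission_typology("signature|appop"))  # special
```

The order of the checks is a design choice of this sketch: a combined flag that contains appop is treated as special before the other levels are considered.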

4 Permission maps

Although the techniques used in the literature include both static and dynamic approaches, the static one is often preferred because it can analyze applications without running them. Accordingly, we propose new features, called Perm-Maps, derived from static malware analysis. More precisely, a Perm-Map is a sparse matrix where Android permissions and their corresponding severity levels are represented as fixed points in an x–y plane. As discussed in the following, the proposed Perm-Maps are able to address three main issues: (i) malicious Android developers could define custom permissions to perform several hostile activities, such as theft of sensitive data or launch of cyber-attacks [1]; (ii) since default and custom permissions are associated with different severity levels, also called protection levels or flags, such as normal, signature, dangerous, or their combinations, an application could be characterized by many permissions and severity levels [3, 5]. Therefore, a malicious developer could define some low-severity permissions to perform several actions without notifying the end-user; (iii) since Perm-Maps represent static features extracted only from the manifest file, they cannot be influenced by the most popular obfuscation tools, such as DexGuard [18], ProGuard [19], and Obfuscapk [6].

4.1 Perm-Map creation workflow

The creation of a Perm-Map consists mainly of the following four steps:

  1. Extraction of the Android permissions and their corresponding protection level.

  2. Assignment of an identifier (\(ID^p\)) to any Android permission.

  3. Assignment of an identifier (\(ID^s\)) to any severity level.

  4. Creation of the Perm-Maps by using pairs of IDs (\(ID^p\); \(ID^s\)) as coordinates of fixed points in an x–y plane.

The first step is accomplished by using tools or libraries devoted to static malware analysis. A typical approach is to build dictionaries of the well-known Android permissions and their protection levels from the official documentation [2]. Alternatively, the \(\langle \mathbf{permission} \rangle\) tag can be used to obtain the protection level of custom permissions. This approach is adopted by several popular reverse engineering tools, such as Androguard [15]. More precisely, for each permission declared in the AndroidManifest file, it obtains the corresponding protection level if the considered permission is known, and assigns a dangerous protection level otherwise.

Next, the second and third steps are accomplished by creating two dictionaries that translate each Android permission and each corresponding severity level into a unique ID number, respectively. Finally, for each analyzed application, the fourth step is conducted by considering each pair of ID numbers (\(ID^p\); \(ID^s\)) as the coordinates of a fixed point and storing the translated information in a sparse matrix. For instance, let p1 and p2 be two Android permissions, and let s3 and s2 be their severity levels, respectively. We can consider two pairs of coordinates \(C1 = (p1,s3)\) and \(C2 = (p2,s2)\) and draw two points in an x–y plane, where the x and y axes report permissions and severity levels, respectively. Moreover, since severity levels differ from one another, it is possible to use different colour scales (like RGB or Gray-scale) to highlight these differences. Figure 2 shows the complete workflow to obtain a Perm-Map.

Fig. 2 Perm-Maps workflow
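The four steps of the Perm-Map creation workflow can be sketched as follows. This is a minimal illustration rather than the script used in this work: the dictionaries `PERMISSION_IDS` and `SEVERITY_IDS`, their ID values, and the function `build_perm_map` are hypothetical, each point is marked with the value 1 instead of a colour scale, and entries missing from the dictionaries are simply skipped.

```python
# Hypothetical dictionaries: the IDs below are illustrative only.
PERMISSION_IDS = {
    "android.permission.INTERNET": 0,
    "android.permission.READ_SMS": 1,
    "android.permission.CAMERA": 2,
}
SEVERITY_IDS = {"normal": 0, "signature": 1, "dangerous": 2, "appop": 3}


def build_perm_map(app_permissions, n_severities, n_permissions):
    """Return a dense list-of-lists view of the Perm-Map of one application.

    app_permissions: iterable of (permission_name, severity_name) pairs
    extracted from one manifest. Rows index severity levels (y axis),
    columns index permissions (x axis).
    """
    perm_map = [[0] * n_permissions for _ in range(n_severities)]
    for permission, severity in app_permissions:
        id_p = PERMISSION_IDS.get(permission)
        id_s = SEVERITY_IDS.get(severity)
        if id_p is None or id_s is None:
            continue  # assumption of this sketch: unknown entries are skipped
        perm_map[id_s][id_p] = 1  # fixed point at coordinates (ID^p, ID^s)
    return perm_map


app = [
    ("android.permission.INTERNET", "normal"),
    ("android.permission.READ_SMS", "dangerous"),
]
perm_map = build_perm_map(app, n_severities=4, n_permissions=3)
for row in perm_map:
    print(row)
```

With the hypothetical IDs above, the two permissions of the example application produce two non-zero entries, one in the row of the normal level and one in the row of the dangerous level.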

5 A federated architecture

Since millions of Android-based applications are released every year, managing the related data for model training purposes requires significant effort, mainly associated with accessing, searching, and updating them. To overcome these issues, we present a federated architecture to support Android classification tasks through the proposed Perm-Maps. Federated architectures are based on a federated data production logic, in which the participating devices send their own pre-processed permission data to a centralized infrastructure devoted to providing collection services, building the classification model, and sharing the related information [23]. Due to its great success, the federated logic has been investigated in the last decade to face the main issues related to the convergence between edge and cloud infrastructures, such as data aggregation, data mobility, and service migration [10, 30, 38]. It has also been applied in many other well-known application domains, such as cryptography solutions to preserve data security [36], optimization frameworks for the medical of things devices [37], and vehicular network optimization [43].

In detail, the proposed architecture aims to provide a data aggregation workflow where federated devices are used as decentralized permission data sources and preliminary processing units. Additionally, a central server is employed to collect data and then construct, share, and update a classification model, which is transferred as an update to each federated device, thus providing a management strategy for the involved permission data. The discussed architecture works through two steps, named model creation process and model update process, while its main contributions can be summarized as follows:

  1. A data aggregation workflow is presented to collect data from federated devices.

  2. A centralized dataset is employed to create a shared DNN model based on Perm-Maps.

  3. A data update workflow is discussed to manage centralized data and re-adapt the shared model.

5.1 Model creation process

At the beginning of the model creation process, each device decompresses the APK file and sends the AndroidManifest file to the central server. Once the data are completely stored, the server performs the Perm-Maps creation process by following the workflow shown in Fig. 2. Then, the server runs the CNN’s training and testing phase and sends the classification model to each device. Finally, each end-user receives a notification concerning the classification result of the analyzed application. Figure 3 shows the discussed process, whose main steps can be summarized as follows:

  1. End devices decompress the APK file.

  2. They also send the manifest file to the central server.

  3. The server runs the Perm-Maps creation process, when data are completely available.

  4. It then runs the CNN’s training and testing phase.

  5. The server sends the classification model to each device.

  6. The end devices notify the end-users about the classification result.

Fig. 3 Model creation process

Note that, when an end device receives the first classification model information, it becomes able to autonomously create its Perm-Maps, and hence perform classification, without affecting the central server.
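The device-side part of the model creation process (decompressing the APK and isolating the manifest) can be sketched with the Python standard library, since an APK is a ZIP archive. The helper names `extract_manifest` and `make_fake_apk` are hypothetical, the archive built here is only a stand-in for a real APK, and the transmission to the server is omitted. Note also that in real APKs the AndroidManifest.xml entry is stored in a binary XML format, so a decoding step (for instance through a tool such as Androguard) would still be required before reading the permissions.

```python
import io
import zipfile


def extract_manifest(apk_bytes: bytes) -> bytes:
    """Return the raw AndroidManifest.xml entry of an APK (a ZIP archive)."""
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as apk:
        return apk.read("AndroidManifest.xml")


def make_fake_apk(manifest: bytes) -> bytes:
    """Build an in-memory ZIP that stands in for an APK (demo only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as apk:
        apk.writestr("AndroidManifest.xml", manifest)
        apk.writestr("classes.dex", b"")
    return buffer.getvalue()


fake_apk = make_fake_apk(b"<manifest/>")
payload = extract_manifest(fake_apk)  # what a device would send to the server
print(payload)
```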

5.2 Model update process

This phase is responsible for collecting new data when the end-user tries to install a new application. At a high level, it differs from the previous process in three main aspects:

  1. If an application is unknown, it automatically stores the related manifest file on the central server.

  2. If an application is unknown, it considers the end-user's feedback to generate a classification label.

  3. If a threshold value is reached, it trains and shares an updated model by considering new data.

Therefore, when an end-user installs an application, the device decompresses the APK, extracts the Perm-Map by reading the AndroidManifest file, and uses the classification model to make a prediction. If the application is known, the classification module notifies the end-user by showing the achieved prediction. Otherwise, it asks whether the installed application is known or trusted and subsequently sends the manifest file and the user’s answer to the central server. The server then stores the new data and, when the dataset size reaches a threshold value, re-performs the Perm-Maps creation process. Finally, the server re-runs the training and testing phase and sends the updated model to each device. Figure 4 shows the discussed process, whose main steps can be summarized as follows:

  1. End devices decompress the APK file.

  2. They also extract the Perm-Map from the manifest file.

  3. End devices also try to obtain a prediction and ask if the analyzed application is known or trusted.

  4. They send the manifest file and user’s answer to the server.

  5. The server stores new data.

  6. It then re-runs the Perm-Maps creation process, when the dataset size reaches a threshold value.

  7. It also re-runs the CNN’s training and testing phase.

  8. Finally, it sends the updated model to each device.

Fig. 4 Model update process
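The server-side logic of the model update process can be illustrated with a toy class that stores incoming samples and triggers a new training round when a threshold is reached. This is a sketch under stated assumptions: the class `UpdateServer` and its attributes are hypothetical, the threshold is expressed as a number of newly collected samples (the text only states that a threshold on the dataset size is used), and the Perm-Maps creation and CNN training steps are replaced by a placeholder that increments a version counter.

```python
class UpdateServer:
    """Toy stand-in for the central server in the model update process."""

    def __init__(self, threshold: int):
        self.threshold = threshold  # hypothetical number of new samples
        self.pending = []           # (manifest, user_label) pairs not yet used
        self.dataset = []           # samples already used for training
        self.model_version = 0

    def submit(self, manifest: bytes, user_label: str) -> bool:
        """Store a new sample; return True if this triggered a model update."""
        self.pending.append((manifest, user_label))
        if len(self.pending) < self.threshold:
            return False
        self._retrain()
        return True

    def _retrain(self):
        # Placeholder for Perm-Maps creation plus CNN training and testing.
        self.dataset.extend(self.pending)
        self.pending.clear()
        self.model_version += 1


server = UpdateServer(threshold=3)
results = [server.submit(b"<manifest/>", "trusted") for _ in range(4)]
print(results, server.model_version)
```

In this sketch the third submission triggers the update, after which the updated model would be sent to each device; the fourth sample starts a new batch.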

6 Experimental results

The first goal of the experiments reported in this section is to demonstrate the contribution of the proposed approach to the classification of several Android applications. The second goal is to explore the effectiveness of a feature selection technique, based on the most frequent permissions, in reducing the computational effort required by the Perm-Map generation and CNN training processes.

6.1 UMD cleaning

In 2021, we developed a new Android malware dataset called Unisa malware dataset (UMD) [12], which contains 25,275 mobile applications collected by analyzing two well-known datasets: AMD [28, 44] and Drebin [8, 39]. This first version of UMD consists of two main directories called amd-cuckoo-family and drebin-cuckoo-family that contain 66 and 143 Android malware families, respectively. Additionally, it provides, for each analyzed application, the report files obtained through CuckooDroid Sandbox [11, 20]. Table 1 shows an overview of the first release of UMD.

Table 1 Overview on the first version of UMD

In this work, we use a cleaned version of UMD (UMD-v2) obtained by applying the following modifications:

  1. Consider the two main folders as a single one.

  2. Merge the common families.

  3. For each common family, remove the duplicates.

  4. Remove each application that has one or more malformed files.

  5. Remove each application that has one or more missing files.

Applying points (1) and (2) reduced the number of considered families from 209 to 185, while applying points (3), (4) and (5) reduced the number of analyzed applications from 25,275 to 24,285. Additionally, applying the entire protocol reduced the dataset dimensions (Dim.) from 117.63 to 112.45 GB. Table 2 reports a comparison between the two versions of our dataset.

Table 2 Comparison between the versions of UMD
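The cleaning protocol used to obtain UMD-v2 can be sketched as a single pass over the samples. The code below is an illustration and not the procedure actually applied to UMD: the function `clean_dataset` and the tuple layout are hypothetical, duplicates are detected through a SHA-256 digest of the file content, families are matched case-insensitively, and a boolean flag stands in for the checks on malformed or missing files.

```python
import hashlib
from collections import defaultdict


def clean_dataset(samples):
    """samples: iterable of (source_folder, family, apk_bytes, files_ok).

    Returns {family: [apk_bytes, ...]} after merging folders and families,
    dropping duplicates within a family, and dropping samples whose files
    are malformed or missing (files_ok is False).
    """
    merged = defaultdict(dict)  # family -> {sha256 digest: apk_bytes}
    for _folder, family, apk_bytes, files_ok in samples:
        if not files_ok:
            continue
        digest = hashlib.sha256(apk_bytes).hexdigest()
        merged[family.lower()].setdefault(digest, apk_bytes)
    return {family: list(apps.values()) for family, apps in merged.items()}


samples = [
    ("amd-cuckoo-family", "FamilyA", b"app-1", True),
    ("drebin-cuckoo-family", "familya", b"app-1", True),   # duplicate
    ("drebin-cuckoo-family", "FamilyA", b"app-2", True),
    ("drebin-cuckoo-family", "FamilyB", b"app-3", False),  # malformed/missing
]
cleaned = clean_dataset(samples)
print({family: len(apps) for family, apps in cleaned.items()})
```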

6.2 Proof of concept experimental setting

We built our proof of concept testing framework within a virtualization scenario based on VirtualBox. For this work, we considered 10 categories of Android applications. In particular, the dataset used for training was composed by choosing nine malware families from UMD-v2 and selecting GW applications from the following online stores: ApkPure, GooglePlay, and PlayDrone. To simulate the discussed model creation process, each application was analyzed through the Android device cross-platform mode of CuckooDroid [11, 20]. More precisely, in our proof of concept framework we used two Android guest virtual machines, simulating end devices, to decompress each APK file and send the AndroidManifest file to the server virtual machine. We then extracted Perm-Maps by using a dedicated Python script executed on the server machine. We stored each Perm-Map as a \(4 \times 298\) matrix, in accordance with the maximum number of distinct severity levels and Android permissions observed, respectively. Figure 5 shows the distribution of applications obtained through an exploratory data analysis (EDA) [35, 46], and it highlights the unbalanced nature of the employed dataset.

Fig. 5 Data distribution for each category

Subsequently, we split this dataset in order to run the experiments. To this purpose, the whole dataset was subdivided into two mutually exclusive subsets called learning and testing datasets, respectively. We used 70% of the entire dataset for learning and the remaining 30% for testing. Then, the K-fold cross-validation algorithm with k = 10 (as recommended in [9]) was used to tune the hyper-parameters and provide an unbiased evaluation of each employed CNN. Finally, each CNN was trained on each training set and evaluated on the corresponding testing set. Table 3 reports the main information about the involved dataset.

Table 3 Summary of the involved dataset
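The two evaluation protocols, a 70/30 hold-out split and a K-fold cross-validation with k = 10, can be sketched with index lists only. The functions `holdout_split` and `k_fold` are hypothetical helpers written for this illustration; the splits are random rather than stratified, and applying the K-fold procedure to the learning subset is an assumption of this sketch, since the text does not specify on which portion of the data the folds were built.

```python
import random


def holdout_split(n_samples, test_fraction=0.3, seed=0):
    """Return (learning, testing) index lists for a random hold-out split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold(indices, k=10):
    """Yield (train, validation) index lists for each of the k folds."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


learning, testing = holdout_split(1000)
folds = list(k_fold(learning, k=10))
print(len(learning), len(testing), len(folds))
```

Given the unbalanced distribution of the categories, a stratified split would probably be preferable in practice; it is left out here to keep the sketch short.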

6.3 Proposed network and evaluation metrics

The employed CNN architecture has been developed as a sequence of two Conv2D layers with kernel_size = (2, 2), activation = relu, and no pooling. For the first one, we used 8 filters and strides = (2, 2), while for the second one we used 2 filters and strides = (1, 1). Subsequently, we added a flatten layer to convert the output of the second Conv2D layer into a flattened sequence that feeds a fully-connected neural network. Then, 2 dense layers with 128 nodes, activation = relu, and dropout = 0.5 were connected. Finally, a dense layer with 10 nodes and activation = softmax was used as the output layer. Figure 6 shows the architecture of the proposed network. This architecture was derived by varying the following hyper-parameters:

  • numConvLayers: the number of Conv2D layers considered (1, 2, 3);

  • numDenseLayers: the number of dense layers considered (1, 2, 3, 4);

  • filters: the number of filters considered for each Conv2D layer (2, 4, 8, 16);

  • neurons: the number of neurons considered for each dense layer (10, 32, 64, 128, 256);

  • activation: activation functions employed (relu, softmax);

  • strides: the stride length for each Conv2D layer (1, 2, 4);

  • batch_size: considered batch_size values (16, 32, 64, 128);

  • loss: loss functions used (Categorical_Crossentropy, SparseCategoricalFocalLoss).

Fig. 6 Architecture of the employed neural network
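To make the layer sizes of the described architecture concrete, the following sketch traces the tensor shapes and counts the trainable parameters with plain arithmetic. It relies on two assumptions that are not stated in the text: the input is a single-channel \(4 \times 298\) Perm-Map, and the convolutions use no padding. The helper functions are hypothetical and introduced only for this illustration.

```python
def conv2d_output(height, width, kernel, stride):
    """Spatial output size of a convolution without padding."""
    out_h = (height - kernel[0]) // stride[0] + 1
    out_w = (width - kernel[1]) // stride[1] + 1
    return out_h, out_w


def conv2d_params(in_channels, filters, kernel):
    """Weights plus biases of a Conv2D layer."""
    return kernel[0] * kernel[1] * in_channels * filters + filters


def dense_params(inputs, units):
    """Weights plus biases of a dense layer."""
    return inputs * units + units


# First Conv2D: 8 filters, kernel (2, 2), strides (2, 2)
h1, w1 = conv2d_output(4, 298, kernel=(2, 2), stride=(2, 2))
# Second Conv2D: 2 filters, kernel (2, 2), strides (1, 1)
h2, w2 = conv2d_output(h1, w1, kernel=(2, 2), stride=(1, 1))
flat = h2 * w2 * 2  # flatten layer: height * width * filters

total = (
    conv2d_params(1, 8, (2, 2))
    + conv2d_params(8, 2, (2, 2))
    + dense_params(flat, 128)
    + dense_params(128, 128)
    + dense_params(128, 10)
)
print((h1, w1), (h2, w2), flat, total)
```

Under these assumptions most of the parameters sit in the first dense layer, so the width of the Perm-Map directly drives the size of the model.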

To evaluate the classification quality of the employed neural network, the following metrics have been computed: accuracy (Acc.), sensitivity (Sens.), specificity (Spec.), precision (Prec.), area under the ROC curve (AUC), and F-measure (F-Meas or F-score). More precisely, they have been derived from a multi-class confusion matrix where, for each category, TPs (true positives) are the applications of that category correctly assigned to it, TNs (true negatives) are the applications of other categories correctly not assigned to it, FPs (false positives) are the applications of other categories incorrectly assigned to it, while FNs (false negatives) are the applications of that category incorrectly assigned to another one. Subsequently, in order to obtain a global evaluation, the average values (Avg.) of all metrics have been computed.
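The derivation of the per-category metrics from a multi-class confusion matrix can be sketched as follows. The function names and the toy matrix are hypothetical; the matrix is assumed to have true categories on the rows and predicted categories on the columns, the averages are computed as unweighted means over the categories, and the AUC is omitted because it requires the predicted scores rather than the confusion matrix.

```python
def per_class_metrics(confusion, cls):
    """confusion[i][j] = samples of true class i predicted as class j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    tp = confusion[cls][cls]
    fn = sum(confusion[cls]) - tp
    fp = sum(confusion[i][cls] for i in range(n)) - tp
    tn = total - tp - fn - fp
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    prec = tp / (tp + fp) if tp + fp else 0.0
    f_meas = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": sens,
        "specificity": spec,
        "precision": prec,
        "f_measure": f_meas,
    }


def average_metrics(confusion):
    """Unweighted mean of each metric over all categories."""
    per_class = [per_class_metrics(confusion, c) for c in range(len(confusion))]
    return {
        name: sum(m[name] for m in per_class) / len(per_class)
        for name in per_class[0]
    }


confusion = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 10],
]
first = per_class_metrics(confusion, 0)
print(first)
print(average_metrics(confusion))
```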

6.4 Achieved results

The proposed CNN has been trained and tested on an iMac equipped with an Intel 6-Core i7 CPU @ 3.20 GHz, and 16 GB RAM. The employed neural network has been compiled with the Adam optimizer and the SparseCategoricalFocalLoss function [29], which is a useful function to fit neural networks in the presence of unbalanced datasets. Then, it was trained with batch_size = 64 for 150 epochs by using the 70/30 criterion and the K-fold cross-validation algorithm with k = 10. We chose these hyper-parameters according to the results achieved in the testing process. Tables 4 and 5 show the results obtained from the testing phase by using the 70/30 criterion and the K-fold cross-validation algorithm with k = 10, respectively, while Table 6 shows the multi-class confusion matrix related to the 70/30 criterion.

Table 4 Performance metrics related to 70/30 criteria
Table 5 Performance metrics related to K-fold k = 10
Table 6 Multi-class confusion matrix related to 70/30 criteria
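The reason why a focal loss helps with unbalanced datasets can be seen from its standard per-sample form, \(FL(p_t) = -(1 - p_t)^{\gamma } \log (p_t)\), where \(p_t\) is the probability predicted for the correct category. The sketch below implements this formula directly; it is not the library function used in the experiments, the function name is chosen only for readability, class weighting is left out, and the value gamma = 2.0 is an assumption, since the focusing parameter is not reported in the text.

```python
import math


def sparse_categorical_focal_loss(probabilities, true_class, gamma=2.0):
    """Focal loss for one sample.

    probabilities: predicted class probabilities (e.g. a softmax output).
    true_class: integer index of the correct class.
    gamma: focusing parameter; gamma = 0 gives plain cross-entropy.
    """
    p_t = probabilities[true_class]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


easy = sparse_categorical_focal_loss([0.9, 0.05, 0.05], true_class=0)
hard = sparse_categorical_focal_loss([0.2, 0.7, 0.1], true_class=0)
print(easy < hard)
```

Well-classified samples, which typically belong to the most represented categories, are strongly down-weighted by the factor \((1 - p_t)^{\gamma }\), so the training signal is dominated by the samples that are still misclassified.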

Furthermore, to face the yearly growth of malicious applications and analyze the update process of the presented architecture, we estimated the data growth range within which the proposed CNN should be readjusted. More precisely, we reduced the whole dataset in steps of 5% through an iterative process. At each step, 5% of the data were randomly removed, and the resulting sub-dataset was used to train and test the proposed CNN by following the 70/30 criterion. Table 7 summarizes the classification metrics derived from the testing phase for each considered sub-dataset.

Table 7 Performance metrics related to dataset updating process
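The iterative reduction of the dataset can be sketched as a generator of progressively smaller random subsets. The function `shrinking_subsets` is hypothetical; it assumes that each step removes 5% of the original dataset size (so that the sizes are 100%, 95%, 90%, and so on) and that the process stops at 80%, which is the smallest size explicitly mentioned in the discussion of the results. The training and testing of the CNN on each subset is omitted.

```python
import random


def shrinking_subsets(samples, step=0.05, min_fraction=0.80, seed=0):
    """Yield (fraction, subset) pairs, removing `step` of the ORIGINAL
    dataset size at random in each iteration."""
    rng = random.Random(seed)
    current = list(samples)
    n_remove = round(len(current) * step)
    fraction = 1.0
    yield fraction, list(current)
    while round(fraction - step, 10) >= min_fraction:
        rng.shuffle(current)
        current = current[n_remove:]
        fraction = round(fraction - step, 10)
        yield fraction, list(current)


sizes = [(fraction, len(subset)) for fraction, subset in shrinking_subsets(range(1000))]
print(sizes)
```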

The achieved results show that the proposed CNN should be readjusted when the data dimensions grow by between 15 and 20%. In particular, the comparison between the whole dataset (size 100%) and the dataset reduced by 20% (size 80%) shows a worsening of all classification metrics. For instance, the proposed CNN exhibited a worsening of 3% in average precision, 7% in average sensitivity, and 6% in average F-score.

In order to show the effectiveness of the use of the proposed representation method, the achieved results have been compared with the most notable ML-based approaches implemented in the WEKA [45] framework. More precisely, we used multi-layer perceptron (MLP), J48 trees (J48), and NB, to derive the classification metrics by considering a flattened version of the employed dataset that has been used to train and test the proposed CNN. Table 8 summarizes the comparison between the proposed CNN (Pr-CNN) and the employed ML-based methods.

Table 8 Comparison between the proposed CNN and ML-based methods

This comparison shows that the MLP classifier is not able to distinguish different application categories by considering Android permissions and their severity levels, while J48 trees and the NB classifier achieved good results. More precisely, the proposed CNN obtained up to a 3% improvement in average accuracy over J48 trees and the NB classifier, and up to a 16% improvement over the MLP classifier. Consequently, the proposed CNN can reduce the number of FPs and FNs, and thus better minimize the classification error with respect to the most popular ML-based approaches.

Finally, we compared the proposed CNN with the ML- and DL-based state-of-the-art solutions. We considered the RF results achieved by A. Kumar et al. (Kum-RF) [25] and N. Xie et al. (Xie-RF) [47], the LSTM neural network results achieved by R. Vinayakumar et al. (Vi-LSTM) [42], and the DNN results obtained by C. Li et al. (Li-DNN) [26]. Table 9 summarizes the comparison between the Pr-CNN and the state-of-the-art solutions.

Table 9 Comparison between the proposed CNN and state-of-art solutions

First of all, this comparison shows that the Vi-LSTM and Kum-RF solutions achieved fair results, and that the proposed CNN obtained improvements of up to 10% and 8% in average accuracy over them, respectively. As reported in Sect. 2, the Vi-LSTM evaluation metrics were obtained by only considering Android permissions translated into numerical information, while the Kum-RF evaluation metrics were achieved by considering Grayscale images directly generated from the APK files, without performing any code extraction or decompiling operations. Consequently, the selected static features are not sufficient to achieve results equivalent to those obtained by the proposed CNN. Second, Xie-RF and Li-DNN achieved excellent results: the proposed CNN obtained an improvement of up to 2% in average accuracy over Xie-RF, while its evaluation metrics are similar to those achieved by Li-DNN. However, the proposed Perm-Map representation technique is only based on Android permissions and their severity levels, while Xie-RF and Li-DNN are based on Android permissions and Java methods. Consequently, Xie-RF and Li-DNN become ineffective against obfuscation techniques. Finally, Table 10 reports a final overview of the proposed CNN, the ML-based methods of WEKA, and the state-of-the-art solutions.

Table 10 Overview among proposed CNN, ML-based methods of WEKA, and state-of-art solutions

6.5 Feature selection process

Since the number of employed permissions is 298, the final goal is to explore a feature selection technique, based on the most frequent Android permissions, in order to reduce the computational effort required by the Perm-Map generation and CNN training processes. To this purpose, we analyzed the frequency distribution of the permissions in order to find the minimum frequency that reduces the number of employed permissions while preserving the number of applications analyzed previously. We performed this analysis by using a dedicated Python script. More precisely, we first created an ordered dictionary to store each permission and its frequency. Then, we considered all Android permissions required at least 50 times; consequently, 57 Android permissions were used for the generation process of each Perm-Map. Figure 7 shows the five most required Android permissions.

Fig. 7 Most required Android permissions
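As a minimal sketch of the frequency-based selection described above, and not the script actually used in the experiments, the following Python fragment counts how many applications request each permission, keeps those reaching a minimum frequency, and checks that every application still retains at least one selected permission. The function names, the toy data, and the choice of counting each permission once per application are assumptions made for illustration only.

```python
from collections import Counter, OrderedDict


def select_frequent_permissions(apps, min_count=50):
    """Return an OrderedDict {permission: frequency}, most frequent first,
    keeping only permissions requested by at least `min_count` applications."""
    counts = Counter()
    for permissions in apps:
        # Assumption: a permission is counted at most once per application.
        counts.update(set(permissions))
    kept = [(perm, freq) for perm, freq in counts.most_common() if freq >= min_count]
    return OrderedDict(kept)


def all_apps_covered(apps, selected):
    """True if every application requests at least one selected permission,
    i.e. the selection does not discard any application."""
    return all(any(perm in selected for perm in permissions) for permissions in apps)


# Hypothetical toy dataset: one list of requested permissions per application.
apps = (
    [["INTERNET", "READ_SMS"]] * 60
    + [["INTERNET", "CAMERA"]] * 30
    + [["INTERNET", "SEND_SMS", "READ_SMS"]] * 10
)

selected = select_frequent_permissions(apps, min_count=50)
print(list(selected.items()))          # [('INTERNET', 100), ('READ_SMS', 70)]
print(all_apps_covered(apps, selected))  # True
```

With the threshold of 50 reported in the text, the same procedure applied to the real dataset would be expected to yield the 57 retained permissions; lowering the threshold keeps more permissions at the cost of larger Perm-Maps.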

Subsequently, according to the workflow shown in Fig. 2, we employed the 57 Android permissions to generate and store each Perm-Map as a \(4 \times 64\) matrix, where 4 is the maximum number of distinct severity levels and 64 is an upper bound on the number of Android permissions. We chose this upper bound to simplify the operations performed by the convolutional layers. We then split the resulting dataset into two mutually exclusive subsets for learning and testing, using 70% of the entire dataset for learning and the remaining 30% for testing. The employed neural network was compiled with the Adam optimizer and the SparseCategoricalFocalLoss function, and trained with batch_size = 64 for 150 epochs. Furthermore, it has the same architecture as the neural network described in Fig. 6, except for input_shape = (4, 64, 1) and dense layers with dropout = 0.45. Finally, the computational effort of the text substitution, Perm-Map generation, and training processes was measured with and without the employed feature selection method. Table 11 reports the computational effort required for each analyzed phase, Table 12 shows the results obtained in the testing phase using the 70/30 criterion, and Table 13 summarizes the comparison between the two proposed CNNs, called CNN-NoExtraction (CNN-NE) and CNN-WithExtraction (CNN-WE), respectively.
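The matrix shape and the 70/30 split described above can be illustrated with a short Python sketch. It is only an approximation under stated assumptions: the cell layout (row = severity level, column = index of the selected permission, value 1 when the application requests that permission) is a guess at one plausible encoding, and the helper names, the severity values, and the toy application are hypothetical. The actual Perm-Map construction follows the workflow of Fig. 2.

```python
import random

N_LEVELS, N_COLS = 4, 64  # matrix shape stated in the text


def build_perm_map(app_permissions, column_of, severity_of):
    """Encode one application as a 4 x 64 binary matrix.

    Assumed layout: cell [s][c] is 1 when the application requests the
    permission mapped to column c and that permission has severity level s.
    Columns beyond the selected permissions stay zero (padding up to 64).
    """
    perm_map = [[0] * N_COLS for _ in range(N_LEVELS)]
    for perm in app_permissions:
        if perm in column_of:  # permissions dropped by the selection are ignored
            perm_map[severity_of[perm]][column_of[perm]] = 1
    return perm_map


def split_70_30(samples, seed=0):
    """Shuffle a copy of `samples` and return (learning, testing) subsets
    that are mutually exclusive and cover 70% / 30% of the data."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = (7 * len(shuffled)) // 10
    return shuffled[:cut], shuffled[cut:]


# Hypothetical selected permissions and severity levels (0..3).
selected = ["INTERNET", "READ_SMS", "SEND_SMS"]
column_of = {perm: idx for idx, perm in enumerate(selected)}
severity_of = {"INTERNET": 0, "READ_SMS": 2, "SEND_SMS": 3}

# CAMERA is not among the selected permissions, so it leaves no trace.
perm_map = build_perm_map(["INTERNET", "SEND_SMS", "CAMERA"], column_of, severity_of)
train_set, test_set = split_70_30(range(10))
print(len(perm_map), len(perm_map[0]))  # 4 64
print(len(train_set), len(test_set))    # 7 3
```

Padding the 57 selected permissions to 64 columns leaves seven all-zero columns; a trailing channel axis would then give the input_shape = (4, 64, 1) mentioned in the text.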

Table 11 Required computational effort
Table 12 Performance metrics derived by the 70/30 criterion and feature selection method
Table 13 Comparison between the proposed CNNs

The obtained results show that the employed feature selection approach reduces the computational effort required by each analyzed process. More precisely, Table 11 shows that the text substitution and Perm-Map generation processes are slightly faster. Furthermore, it shows that the training time is reduced by 3.5 s, while the total effort is reduced by 3.6 s. Finally, the comparison reported in Table 13 demonstrates that the two proposed CNNs obtain equivalent evaluation metrics in the testing phase, and thus that the employed feature selection criterion can also optimize the proposed representation approach.

7 Conclusions and future works

In this paper, novel features called Perm-Maps, based on Android permissions and their corresponding severity levels, have been presented. A CNN has then been used to show the potential of the proposed approach. More precisely, it has been enhanced by a training process based on a federated logic, where end-user devices extract static features locally and send them to a central server devoted to training a neural network that performs malware classification. The effectiveness of the presented methodology has been validated by using statistical metrics and by comparing it with the most popular state-of-the-art ML-based approaches, such as NB, MLP and J48 DTs. The obtained results show that the proposed CNN achieves up to a 3% improvement in average accuracy over the J48 tree-based and NB classifiers, and up to 16% over the MLP classifier. Finally, a feature selection technique, based on the most frequent Android permissions, has been explored to reduce the computational effort required by the Perm-Map generation and CNN training processes. The achieved results show that feature selection reduces the total computational effort by 3.6 s (3.5 s of which in the training process, as reported in Sect. 6.5), while the evaluation metrics remain comparable with those obtained without any feature selection.

Given the high number of existing Android-based applications, we propose two directions for future work. First, we will investigate the proposed features on a much larger quantity of decentralized data by applying a fully federated learning approach that involves end devices in model construction. Second, since the most popular ML- and DL-based methods consider only features obtained at the end of malware analysis, we will propose new solutions capable of reducing the damage caused at run-time by processing streams of dynamic features. For instance, several combinations of LSTM layers, CNNs, and stacked AEs (SAEs) could be explored and combined with the proposed approach.