The architecture of the CIantiMF requires the creation of two parallel extensions which act in addition and complementarily to the ART JVM. This injects artificial intelligence at the Android compiler level, significantly enhancing its active security. Specifically, the SAME [45] analyzes the Java classes before they are loaded and before a Java application runs (class loader). The introduction of files into the ART JVM necessarily passes through this extension, in which the classes are checked to determine whether they are benign or malicious. If they are found malicious, a decision is made for the rejection and non-installation of the application, either automatically, if the classification accuracy exceeds a desired threshold, or after an intervention of the system's operator. If the checked classes are found benign, the installation process continues normally, while the user is informed that it is a safe application.
Then, when the application is executed, the network traffic that it generates is checked to determine whether it is related to malicious sources. A thorough analysis is also carried out to identify potential encrypted traffic; if it follows the HTTPS protocol it is allowed, whereas if it follows the Tor protocol it is rejected by default as malicious.
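The gatekeeping logic described above can be summarized in pseudocode. The following is a minimal Python sketch under stated assumptions: the classifier objects, the confidence threshold and the operator prompt are illustrative placeholders, not components of the published framework.

```python
# Illustrative sketch of the CIantiMF gatekeeping flow. The classifiers,
# the 0.95 threshold and the operator prompt are assumptions, not the
# framework's actual interfaces.

def ask_operator(cls):
    """Placeholder for the operator intervention step (assumption)."""
    answer = input(f"Class {cls!r} looks malicious; install anyway? [y/N] ")
    return 'install' if answer.lower() == 'y' else 'reject'

def notify_user(msg):
    print(msg)  # placeholder user notification (assumption)

def on_install(app_classes, class_clf, threshold=0.95):
    """Install-time check performed by the SAME extension."""
    for cls in app_classes:
        label, confidence = class_clf.predict(cls)  # 'malicious' or 'benign'
        if label == 'malicious':
            if confidence >= threshold:
                return 'reject'          # automatic rejection
            return ask_operator(cls)     # defer to the system operator
    notify_user('safe application')
    return 'install'

def on_traffic(flow, traffic_clf):
    """Runtime check on the traffic an application generates."""
    if traffic_clf.predict(flow) == 'tor':
        return 'block'    # Tor traffic is rejected by default as malicious
    return 'allow'        # e.g. plain HTTPS is allowed
```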
The proposed architecture of the CIantiMF is presented in Fig. 1.
It should be emphasized that the above architecture operates based on the dynamic analysis of the Android system's parameters, adapting to the requirements of the running applications on the basis of stringent criteria and robust security policies.
This adaptation is the result of an automatic process derived from computational intelligence technologies, thus overcoming the potential inability of users to take timely measures to protect themselves. Finally, it is important that these malware identification procedures require fewer processor steps to analyze an application, resulting in better resource management and lower energy consumption.
Smart anti-malware extension (SAME)
In our previous work [45], we proposed the SAME, which introduces intelligence to the compiler and classifies malicious Java classes in time to spot Android malware. This is done by applying the Java class file analysis (JCFA) approach, based on the effective biogeography-based optimization (BBO) algorithm, which is used to train an MLP.
Generally, the source code files (.java) of a Java application are compiled into byte code files (.class), which are platform independent and can be executed by a JVM such as ART, which uses an ahead-of-time (AOT) compiler. The classes are organized in the .java files, with each file containing at least one public class; the name of the file is identical to the name of the contained public class. The ART loads the classes required to execute the Java program (class loader) and then verifies the validity of the byte code files before execution (byte code verifier) [3]. The JCFA process also includes the analysis of the classes, methods and specific characteristics included in an application. The SAME introduces advanced artificial intelligence (AI) methods, applied to specific parameters and data (obtained after the JCFA process), to perform binary classification of the classes comprising an application as benign or malicious. More specifically, the SAME employs the biogeography-based optimizer to train an MLP which successfully classifies the Java classes of an application as benign or malicious.
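As a concrete illustration of the starting point of a JCFA-style analysis, the sketch below reads the header of a compiled .class file. The field layout (magic number, version, constant pool count) follows the JVM class file specification; the choice of returned fields is an assumption made here for illustration.

```python
# Reads the fixed header of a JVM .class file: u4 magic, u2 minor_version,
# u2 major_version, u2 constant_pool_count (big-endian), per the JVM spec.
import struct

def class_file_header(path):
    with open(path, 'rb') as f:
        magic, minor, major, cp_count = struct.unpack('>IHHH', f.read(10))
    if magic != 0xCAFEBABE:
        raise ValueError('not a valid .class file')
    # constant_pool_count - 1 entries follow; their types and the method
    # table are what a full JCFA pass would traverse to build features.
    return {'major_version': major, 'minor_version': minor,
            'constant_pool_entries': cp_count - 1}
```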
The architectural design of the SAME introduces an additional functional level inside the ART JVM, which analyzes the Java classes before their loading and before the execution of the Java program (class loader). The introduction of files into the ART JVM always passes through this level, where the check for malicious classes is done. If malicious classes are detected, decisions are made depending on the accuracy of the classification. If the accuracy is high, the decisions are made automatically; otherwise the actions regarding the acceptance or rejection of the application installation are imposed by the user. In the case that the classes are benign, the installation is performed normally and the user is notified that this is a secure application [45].
A basic innovation of the SAME is the inclusion of a machine learning approach as an extension of the ART JVM used by the Android OS. This, combined with the JCFA and the fact that the ART JVM resolves all dependencies ahead of time during the loading of classes, introduces intelligence at the compiler level. This fact enhances the defensive capabilities of the system significantly. It is important that the dependencies and the structural elements of an application are checked before its installation, enabling the timely detection of malware cases.
Another important innovation of this research is related to the choice of the independent parameters, which was made after several exhaustive tests, in order to ensure the maximum performance and generalization of the algorithm with the minimum consumption of resources.
Finally, it is worth mentioning that the BBO optimization algorithm (popular in engineering problems) is used for the first time to train an artificial neural network (ANN) for a real information security problem.
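To make the training scheme concrete, the following is a minimal, illustrative sketch of BBO training a single-hidden-layer MLP for binary classification; the population size, migration and mutation rates, and weight packing are assumptions, not the settings used in [45].

```python
# Minimal sketch of BBO training an MLP: habitats are candidate weight
# vectors, fitness is classification error, migration copies features (SIVs)
# from fit habitats into less fit ones. Hyperparameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def mlp_error(w, X, y, n_hidden):
    """Mean 0/1 error of a 1-hidden-layer MLP whose weights are packed in w."""
    d = X.shape[1]
    W1 = w[:d * n_hidden].reshape(d, n_hidden)
    b1 = w[d * n_hidden:(d + 1) * n_hidden]
    W2 = w[(d + 1) * n_hidden:(d + 2) * n_hidden]
    b2 = w[-1]
    h = np.tanh(X @ W1 + b1)
    out = 1 / (1 + np.exp(-(h @ W2 + b2)))          # sigmoid output
    return np.mean((out > 0.5).astype(int) != y)

def bbo_train(X, y, n_hidden=10, pop=30, iters=20, p_mut=0.05):
    d = X.shape[1]
    n_w = (d + 2) * n_hidden + 1                    # packed weight count
    habitats = rng.normal(size=(pop, n_w))
    for _ in range(iters):
        fit = np.array([mlp_error(h, X, y, n_hidden) for h in habitats])
        habitats = habitats[np.argsort(fit)]        # best habitat first
        mu = np.linspace(1, 0, pop)                 # emigration rates
        lam = 1 - mu                                # immigration rates
        new = habitats.copy()
        for i in range(pop):
            for j in range(n_w):
                if rng.random() < lam[i]:           # immigrate a feature
                    src = rng.choice(pop, p=mu / mu.sum())
                    new[i, j] = habitats[src, j]
                if rng.random() < p_mut:            # mutation
                    new[i, j] = rng.normal()
        new[0] = habitats[0]                        # elitism: keep the best
        habitats = new
    fit = np.array([mlp_error(h, X, y, n_hidden) for h in habitats])
    return habitats[np.argmin(fit)]

# Illustrative usage on synthetic data; in the SAME the rows of X would be
# JCFA feature vectors and y the benign/malicious labels.
X = rng.normal(size=(80, 6))
y = (X[:, 0] > 0).astype(int)
w_best = bbo_train(X, y)
```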
Online Tor traffic identification extension (OTTIE)
The OTTIE is essentially a tool for analyzing web traffic streams at fixed intervals in order to extract timely conclusions. In this setting, some or all of the incoming data are not available from any permanent or temporary storage medium but arrive as consecutive flows. There is no control over the order in which these data arrive, their size may vary, and many of them carry no real information. Moreover, the examination of individual IP packets or TCP segments can yield only limited conclusions; because of the interdependence of the individual packets, their analysis cannot be performed with simple static methods but requires further modeling of the traffic and the use of advanced analytical methods for extracting knowledge from complex data sets. In the OTTIE, this modeling is achieved with the online sequential extreme learning machine (OSELM), a computational intelligence algorithm.
The extreme learning machine (ELM) is an emerging biologically inspired learning technique that provides efficient unified solutions to "generalized" single-hidden-layer feedforward networks (SLFNs), in which the hidden layer (also called the feature mapping) need not be tuned [46]. Such SLFNs include, but are not limited to, support vector machines, polynomial networks, RBF networks, and conventional feedforward neural networks. All the hidden node parameters are independent of the target functions and the training datasets, and the output weights of ELMs may be determined in different ways (with or without iterations, with or without incremental implementations). ELM has several advantages: ease of use, fast learning speed, high generalization performance, and suitability for many nonlinear activation functions and kernel functions.
According to the ELM theory [46], the ELM with the Gaussian radial basis function kernel (GRBFK) \(K(u,v)=\mathrm{exp}(-\gamma {\Vert }u-v{\Vert }^{2})\) is used in this approach. The number of hidden neurons, \(k=20\), was chosen by trial and error. Furthermore, \(w_{i}\) are the randomly assigned input weights and \(b_{i},\,i=1,\ldots ,N\) are the biases. The hidden layer output matrix H is calculated with Eq. (1):
$$\begin{aligned} H=\left[ {{\begin{array}{c} {h\left( {x_1 } \right) } \\ \vdots \\ {h\left( {x_N } \right) } \\ \end{array} }} \right] =\left[ {{\begin{array}{ccc} {h_1 \left( {x_1 } \right) }&{} \cdots &{} {h_L \left( {x_1 } \right) } \\ \vdots &{} &{} \vdots \\ {h_1 \left( {x_N } \right) }&{} \cdots &{} {h_L \left( {x_N } \right) } \\ \end{array} }} \right] \end{aligned}$$
(1)
\(h(x) = [h_{1}(x), \ldots , h_{L}(x)]\) is the output (row) vector of the hidden layer with respect to the input x. In fact, h(x) maps the data from the d-dimensional input space to the L-dimensional hidden-layer feature space (ELM feature space) H, and thus h(x) is indeed a feature mapping. The aim of ELM is to minimize the training error as well as the norm of the output weights:
$$\begin{aligned} \hbox {Minimize} : \Vert H\beta - T\Vert ^{2}\hbox { and }\Vert \beta \Vert \end{aligned}$$
(2)
where H is the hidden-layer output matrix of Eq. (1), and minimizing the norm of the output weights \(\Vert \beta \Vert \) in effect maximizes the distance between the separating margins of the two different classes in the ELM feature space, which equals \(2/\Vert \beta \Vert \).
To calculate the output weights \(\beta \), Eq. (3) is used:
$$\begin{aligned} \beta =\left( {\frac{I}{C}+H^\mathrm{T}H} \right) ^{-1}H^\mathrm{T}T \end{aligned}$$
(3)
where C is a positive regularization constant and T is the target matrix resulting from the function approximation of SLFNs with additive neurons [46]:
$$\begin{aligned} T=\left[ {{\begin{array}{c} {t_1^\mathrm{T} } \\ \vdots \\ {t_N^\mathrm{T} } \\ \end{array} }} \right] \end{aligned}$$
where each row corresponds to an arbitrary distinct sample, with \(t_{i}=[t_{i1}, t_{i2},{\ldots },t_{im}]^\mathrm{T}\in R^{m}\) [47].
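As a worked illustration of Eqs. (1)-(3), the NumPy sketch below builds the Gaussian-RBF hidden layer output matrix H and solves for the regularized output weights \(\beta \); the values of \(\gamma \), C and the data sizes are illustrative, not the paper's settings.

```python
# Worked sketch of Eqs. (1)-(3): Gaussian-RBF hidden layer, then the
# regularized least-squares output weights. gamma, C and sizes are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
N, d, L, gamma, C = 100, 5, 20, 0.5, 100.0   # L = k = 20 hidden neurons

X = rng.normal(size=(N, d))                  # training inputs x_1..x_N
T = rng.integers(0, 2, size=(N, 1)) * 2 - 1  # targets t_i in {-1, +1}
centers = rng.normal(size=(L, d))            # random hidden node centers

# Eq. (1): H[i, j] = K(x_i, c_j) = exp(-gamma * ||x_i - c_j||^2)
sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
H = np.exp(-gamma * sq_dist)                 # N x L hidden layer output matrix

# Eq. (3): beta = (I/C + H^T H)^{-1} H^T T, which minimizes Eq. (2)
beta = np.linalg.solve(np.eye(L) / C + H.T @ H, H.T @ T)

scores = H @ beta                            # decision values for the N samples
```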
The OSELM is an alternative technique for large-scale computing and machine learning that is used when data become available in sequential order, in order to determine a mapping from the data set to the corresponding labels. The main difference between online learning and batch learning is that in online learning the mapping is updated after the arrival of every new data point in a scalable fashion, whereas batch techniques are used when the entire training data set is accessible at once. OSELM is a versatile sequential learning algorithm because the training observations are presented to it sequentially (one-by-one or chunk-by-chunk, with varying or fixed chunk length). At any time, only the newly arrived single observation or chunk of observations (instead of the entire past data) is seen and learned. A single or a chunk of training observations is discarded as soon as the learning procedure for those particular observations is completed. The learning algorithm has no prior knowledge of how many training observations will be presented. Unlike other sequential learning algorithms, which have many control parameters to be tuned, OSELM with an RBF kernel only requires the number of hidden nodes to be specified [47, 48].
The proposed method uses an OSELM that can learn data chunk-by-chunk with a fixed chunk size of \(20\times 20\) and an RBF kernel classification approach, in order to perform malware localization, Tor traffic identification and botnet prohibition in an active security mode that needs minimal computational resources and time [7]. The OSELM consists of two main phases, namely the boosting phase (BPh) and the sequential learning phase (SLPh). The BPh is used to train the SLFNs with the primitive ELM method on a small batch of training data in the initialization stage; these boosting training data are discarded as soon as the boosting phase is completed. The required batch of training data is very small and can be equal to the number of hidden neurons [46,47,48].
The general classification process with OSELM classifier is described below:
Phase 1 (BPh) [47, 48]
The process of BPh for a small initial training set \(N=\{(x_{i}, t_{i}){\vert }x_{i}\in R^{n}, t_{i} \in R^{m}, i=1, \cdots , \tilde{N} \}\) is described as follows:
(a) Assign random input weights \(w_i\) and biases \(b_i\), or centers \(\mu _i\) and impact widths \(\sigma _i\), \(i=1, \ldots , \tilde{N}\), where \(\tilde{N}\) is the number of hidden neurons or RBF kernels for the specific application.

(b) Calculate the initial hidden layer output matrix \(H_0 =[h_1 ,\ldots ,h_{\tilde{N} } ]^\mathrm{T}\), where \(h_i =[g(w_1 \cdot x_i +b_1 ), \ldots , g(w_{\tilde{N} } \cdot x_i +b_{\tilde{N} } )]^\mathrm{T}\), \(i=1, \ldots , \tilde{N}\), and g is the activation function or RBF kernel.

(c) Estimate the initial output weight \(\beta ^{\left( 0 \right) }=M_0 H_0^\mathrm{T} T_0\), where \(M_0 =( {H_0^\mathrm{T} H_0 } )^{-1}\) and \(T_0 =[ {t_1 ,\ldots ,t_{\tilde{N} } } ]^\mathrm{T}\).

(d) Set \(k=0\).
Phase 2 (SLPh) [47, 48]
In the SLPh, the OSELM learns the training data chunk-by-chunk with a fixed chunk size of \(20\times 20\), and all the training data are discarded once the learning procedure on them is completed. The essential steps of this phase for each newly arriving observation \((x_i ,t_i )\), where \(x_{i} \in R^{n}\), \(t_{i} \in R^{m}\) and \(i=\tilde{N} +1, \tilde{N} +2, \tilde{N} +3, \ldots \), are described as follows (a minimal code sketch covering both phases is given after the list):
(a) Calculate the hidden layer output vector \(h_{\left( {k+1} \right) } =[g(w_1 \cdot x_i +b_1 ), \ldots , g(w_{\tilde{N} } \cdot x_i +b_{\tilde{N} } )]^\mathrm{T}\).

(b) Calculate the latest output weight \(\beta ^{\left( {k+1} \right) }\) via the recursive least-squares (RLS) algorithm, which updates the least-squares solution \(\hat{\beta } =( {H^\mathrm{T}H} )^{-1}H^\mathrm{T}T\) recursively as new observations arrive.

(c) Set \(k=k+1\).
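The two phases above can be condensed into a short sketch. The following Python code initializes the network on a small boosting batch and then applies the standard OSELM recursive least-squares update chunk-by-chunk [47, 48]; the network sizes, the \(\gamma \) value and the random data are illustrative assumptions.

```python
# Sketch of OSELM as outlined in Phases 1-2: a boosting phase on an initial
# batch, then recursive least-squares (RLS) updates chunk-by-chunk.
import numpy as np

rng = np.random.default_rng(2)

class OSELM:
    def __init__(self, n_in, n_hidden, gamma=0.5):
        self.centers = rng.normal(size=(n_hidden, n_in))  # random hidden nodes
        self.gamma = gamma

    def _h(self, X):
        """RBF hidden layer output for a batch X (rows = samples)."""
        sq = ((X[:, None, :] - self.centers[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-self.gamma * sq)

    def boost(self, X0, T0):
        """Phase 1 (BPh): initialize on a small batch, |batch| >= n_hidden."""
        H0 = self._h(X0)
        self.M = np.linalg.inv(H0.T @ H0)        # M_0 = (H0^T H0)^{-1}
        self.beta = self.M @ H0.T @ T0           # beta^(0) = M_0 H0^T T0

    def learn(self, Xk, Tk):
        """Phase 2 (SLPh): RLS update for the (k+1)-th chunk."""
        H = self._h(Xk)
        # M_{k+1} = M_k - M_k H^T (I + H M_k H^T)^{-1} H M_k
        K = np.linalg.inv(np.eye(len(Xk)) + H @ self.M @ H.T)
        self.M = self.M - self.M @ H.T @ K @ H @ self.M
        # beta^{(k+1)} = beta^(k) + M_{k+1} H^T (T_k - H beta^(k))
        self.beta = self.beta + self.M @ H.T @ (Tk - H @ self.beta)

    def predict(self, X):
        return self._h(X) @ self.beta

# Usage: boost on an initial batch, then feed 20-sample chunks as they arrive.
net = OSELM(n_in=5, n_hidden=20)
net.boost(rng.normal(size=(40, 5)), rng.normal(size=(40, 1)))
for _ in range(3):
    net.learn(rng.normal(size=(20, 5)), rng.normal(size=(20, 1)))
```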
The proposed OTTIE algorithm includes the following ruleset, which is the core of its reasoning and is described below (a code sketch follows Step 3).
Step 1 Perform malware localization by OSELM with the malware localization dataset (MLD). If the malware analysis gives a positive result (Malware), the network traffic is blocked and the process is terminated. If the malware analysis gives a negative result (Benign), no action is required and the process proceeds to step 2.
Step 2 Perform network traffic analysis by OSELM with the network traffic classification dataset (NTCD). If the network traffic is not classified as HTTPS, no action is required. If the network traffic is classified as HTTPS, go to step 3.
Step 3 Perform Tor traffic identification by OSELM with the Tor traffic identification dataset (TTID). If the botnet classification gives a positive result (Botnet), the network traffic is blocked and the process is terminated. If the botnet classification gives a negative result (HTTPS), no action is required.
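A hedged sketch of the three-step ruleset is given below; the three OSELM models (trained on the MLD, NTCD and TTID respectively) and their label vocabularies are assumptions made for illustration.

```python
# Sketch of the three-step OTTIE ruleset. The classifier objects and labels
# ('Malware', 'HTTPS', 'Botnet') are illustrative assumptions.

def ottie_decide(flow, malware_clf, traffic_clf, tor_clf):
    # Step 1: malware localization (MLD-trained OSELM)
    if malware_clf.predict(flow) == 'Malware':
        return 'block'                    # block traffic, terminate
    # Step 2: network traffic classification (NTCD-trained OSELM)
    if traffic_clf.predict(flow) != 'HTTPS':
        return 'allow'                    # no action required
    # Step 3: Tor traffic / botnet identification (TTID-trained OSELM)
    if tor_clf.predict(flow) == 'Botnet':
        return 'block'
    return 'allow'                        # legitimate HTTPS

class _Const:
    """Trivial stand-in classifier used only to demonstrate the cascade."""
    def __init__(self, label): self.label = label
    def predict(self, flow): return self.label

print(ottie_decide({'bytes': 1200},
                   _Const('Benign'), _Const('HTTPS'), _Const('HTTPS')))  # allow
```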
The overall algorithmic approach of the OTTIE proposed herein is presented in detail in Fig. 2.