1 Introduction

Information systems security requires the deployment of a rigorous security policy supported by several security mechanisms and tools. We generally start with prevention systems such as authentication, whose goal is to prove the identity of users; access control, whose goal is to define the rights of users on data; and firewalls, whose role is to control access between the information system and the outside world.

However, these mechanisms are not sufficient to fully protect systems against malicious attacks. Indeed, computer systems often exhibit vulnerabilities, which allow attackers to bypass preventive mechanisms. In addition, many security tools focus on protection against external attacks, while attacks can also be internal. For example, client-side attacks are very common nowadays. Therefore, intrusion detection is necessary as a second layer of security after deploying prevention systems. Unfortunately, intrusion detection is still imperfect for two reasons. First, intrusion detection systems (IDSs) generate a very large number of low-level alerts, most of which are false positives, i.e., alerts generated in the absence of attacks. Second, IDSs suffer from false negatives, i.e., the absence of alerts in the presence of attacks.

In order to overcome these problems, a promising approach is the so-called cooperative intrusion detection [4, 20], which allows various intrusion detection tools to cooperate. In addition to IDSs, other analyzers such as network and vulnerability scanners can be considered in order to correlate alerts with contextual information, for example topology and cartography. In fact, nowadays all security tools have to cooperate through a central security information and event management system (SIEM). A SIEM provides many functions to benefit from the collected data, such as normalization, aggregation, alerting, archiving, forensic analysis and dashboards. The most relevant function is correlation, which gives a precise and quick picture of threats and attacks in real time. However, most proprietary SIEMs use their own data representation and their own correlation techniques, which does not always favor knowledge sharing and custom reasoning.

In such a situation, the use of a common and extensible formalism to describe information in intrusion detection is a major concern. This information is generally structured and encoded in XML. For example, this is the case of alerts in IDMEF (Intrusion Detection Message Exchange Format) and TAXII (Trusted Automated eXchange of Indicator Information), as well as vulnerabilities in OVAL (Open Vulnerability and Assessment Language) and STIX (Structured Threat Information eXpression). However, information encoded in XML is limited to a syntactic representation based on different taxonomies. Consequently, in the absence of a semantic approach, correlating this information is a tedious task. Indeed, it is more interesting to move from taxonomies to ontology specification languages [9, 12], which are able to simultaneously serve as recognition, reporting and correlation languages.

Several existing knowledge representation models can be used in a SIEM, such as [1, 2, 7, 8]. In this paper, our contribution can be seen as an enhancement of existing representations by grouping a large amount of information into a single ontology. This offers a comprehensive and extensible knowledge representation which can be used in many event correlation systems.

On the other hand, given that the tools used in a SIEM are not totally reliable, conflicts usually appear between them [15, 19]. For example, IDSs are not fully reliable since they generate many false positives and false negatives. Therefore, it is very important to resolve these conflicts in order to exploit the cooperation. Hence, our second contribution is an ontological reasoning approach to correlate alerts in order to reduce the number of alerts, especially false positives.

The rest of this paper is organized as follows. Section 2 briefly discusses related works on the use of ontologies in computer security. Section 3 recalls knowledge representation in intrusion detection and presents the proposed ontology. Section 4 presents the architecture of an alert correlation system based on DL reasoning. In Sect. 5, experiments are conducted and results are discussed. Section 6 concludes this paper.

2 Related Works

The automatic correlation of information from different security systems has been an active topic of research for over a decade [4, 20]. Numerous approaches have been developed for correlating alerts and other log entries to strengthen intrusion detection systems. Here, we briefly discuss only related works regarding the use of ontologies in computer security. Ontologies can be used in many areas of SIEM, for example to analyze user behavior and system activities, to identify known attack patterns, or to analyze abnormal behavior and activity of both systems and users. Notice that semantic approaches have two main advantages over existing approaches: a formal and extensible knowledge representation capability, and decidable reasoning.

Using ontologies in computer security is relatively new. The first research work was done by Undercoffer et al. [16]. They produced an ontology that specifies a model of computer attacks. Their ontology is based on attack strategies, which are categorized according to targeted system components, tools of attack, consequences of attacks, and location of attackers. They present their model as a target-centric ontology. Since the work of Undercoffer et al., many other ontologies have been proposed. In [17], Wang et al. propose an Ontology for Vulnerability Management (OVM), which contains several concepts about vulnerabilities, affected products, consequences, countermeasures, etc. The authors used their own implementation of the ontology without referring to any ontology language. In [2], Azevedo et al. propose CoreSec, a domain ontology with more generic and abstract concepts in the field of computer security, which serves as the basis for the construction of other specific security-domain ontologies. In [5], Gao et al. provide an ontology-based attack model which is used to assess information system security from the attack angle. The proposed ontology consists of five dimensions: attack impact, attack vector, attack target, vulnerability and defense.

More recently, many semantic description methods for security policies have been proposed. In [14], an ontology-based method is presented to solve the problem of the semantic description and verification of a security policy. Onto-ACM (ontology-based access control model) is a semantic analysis model proposed by Choi et al. [3] to address the difference in the permitted access control between service providers and users. Moreover, in [18] ontologies are used to perform threat analysis and develop defensive strategies for mobile security. The authors propose an ontology-based approach that can identify an attack profile in accordance with the structural signature of mobile viruses, and that also overcomes the uncertainty regarding the probability of an attack being successful, thanks to semantic reasoning.

3 Ontological Based Specification and Reasoning for Alert Correlation

3.1 Knowledge Representation in Intrusion Detection

Faced with an intrusion detection environment characterized by a very low detection rate, a high rate of false alerts, and a poor granularity of the information provided by alerts, the intrusion detection community has made a huge effort to standardize the representation of threats and attacks. The resulting data formalisms (e.g. IDMEF, TAXII, STIX, etc.) have provided a workspace for open communication between security tools and have been widely used in many alert correlation systems [4, 6].

Despite their different approaches, alert correlation systems have to share knowledge about attacks and the context in which they occur. However, many security tools pay little attention to how they represent their knowledge and how they use it. We think that having a coherent and formal model to represent knowledge is important for any correlation system. M2D2 is among the most important works in this area; it is a relational model that groups essential information used in correlation, such as alerts, events, nodes and software. In 2009, this model was revised by adding new concepts and by grouping concepts into classes; the new model is called M4D4 [8]. In a recent work [10], Sadighian et al. designed a set of comprehensive and extensible ontologies and implemented fusion and detection algorithms based on OWL-DL and SQWRL in order to reduce false positives.

3.2 The Proposed Ontology

Strassner defines the ontology as follows: “An ontology is a formal, explicit specification of a shared, machine-readable vocabulary and meanings, in the form of various entities and relationships between them, to describe knowledge about the contents of one or more related subject domains throughout the life cycle of its existence” [13]. This meaning of ontology is used mostly in the context of knowledge sharing.

IDMEF and M4D4 are among the most important works in terms of knowledge representation in the domain of intrusion detection. However, IDMEF does not contain enough information because it describes only alerts, while M4D4 is proposed in the context of network intrusion detection and includes contextual information (cartography and topology) and the description of vulnerabilities.

Fig. 1. Main concepts and relations of ONTO-SIEM

In this section, we propose an ontological conceptualization that combines the representation of IDMEF, M4D4, TAXII and other information sources such as OVAL, STIX and NVD. Generally, we can divide knowledge in intrusion detection into five groups [8]: Analyzers, Events and Alerts, Attacks and Vulnerabilities, Contextual Information, and Users and Attackers. Figure 1 shows the main concepts and relations of the proposed ontology, named “ONTO-SIEM”.

4 Ontology Based Event Correlation System

ONTO-SIEM is well suited to event correlation within a SIEM, where many tools have to cooperate and exchange information. We developed a prototype alert correlation system to show the importance and usefulness of this ontology. The architecture of our system consists of two essential modules: the conversion module, which puts reported alerts and contextual information (topology and cartography) into the ontology, and the correlation module, which reasons about the constructed ontology. Figure 2 summarizes the architecture of the correlation system.

In order to use an ontology within an application, it must be specified in a formal representation. A variety of languages exist for representing conceptual models, with varying expressiveness, ease of use and computational complexity. We used OWL, which is a recommendation of the World Wide Web Consortium (W3C) and is widely used in the Semantic Web. OWL is based on Description Logics (DLs), which are known for their expressiveness and their clearly defined semantics that allow decidable reasoning.

Fig. 2. ONTO-SIEM based alert correlation system architecture.

In this work, we build our ontology using the Jena API (http://jena.apache.org/), and the reasoning is provided by Pellet (http://clarkparsia.com/pellet/), which is a full OWL-DL reasoner.

4.1 Populating the Ontology

To populate our ontology we need several tools. Information about hosts and the network topology is obtained using Nmap (http://nmap.org/). This tool can provide information such as running hosts and their operating systems, the servers listening on these hosts with their corresponding versions, and more. Information about the vulnerabilities of systems and applications is obtained using Nessus (http://www.tenable.com/products/nessus/). Information about attacks is given in real time by an IDS/IPS; in our system we used Snort (http://www.snort.org/) with a set of VRT and community rules. Notice that security operators can also insert information directly into ONTO-SIEM, namely information about equipment, systems and applications.

4.2 Reasoning with the Ontology

Reasoning is important because it helps to ensure the quality of an ontology. Indeed, through the use of a reasoner, it is possible to test whether concepts are non-contradictory, and also to derive implicit relations.

Filtering Events: In Tables 1 and 2, we present some rules that can be used to filter pertinent and not pertinent events. Most of these rules are reused from the Pasagrada framework [10]. Notice that if an event is not classified as pertinent, this does not mean that it is not pertinent: to be classified as not pertinent, the event must satisfy at least one rule from Table 2.

Table 1. Filtering pertinent events
Table 2. Filtering not pertinent events

Rule 1 selects events generated by analyzers that can actually monitor the target. For example, a network IDS (NIDS) can only detect events that occur in the network to which it is connected. This can be explicitly provided by the relation monitors or inferred, for example for a NIDS, as follows.

$$\mathit{monitors} \equiv \mathit{hosted\text{-}in} \wedge \mathit{connected} \wedge \mathit{netNodes} \qquad (1)$$
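As an illustration only, the inferred case of Eq. (1) can be read as a chain: an analyzer monitors a node when it is hosted in a node that is connected to a network containing that node. The Python sketch below assumes this reading; the dictionary layout of the three relations and the sample individuals are hypothetical and are not taken from ONTO-SIEM.

```python
def infer_monitors(hosted_in, connected, net_nodes):
    """Derive (analyzer, node) pairs by chaining the three relations.

    hosted_in: analyzer -> host node running the analyzer
    connected: host node -> set of networks the host is attached to
    net_nodes: network -> set of nodes belonging to that network
    """
    monitors = set()
    for analyzer, host in hosted_in.items():
        for network in connected.get(host, set()):
            for node in net_nodes.get(network, set()):
                monitors.add((analyzer, node))
    return monitors


# Hypothetical individuals, used only to exercise the function.
hosted_in = {"snort-1": "sensor-host"}
connected = {"sensor-host": {"lan-a"}}
net_nodes = {"lan-a": {"web-server", "mail-server"}, "lan-b": {"db-server"}}

monitors = infer_monitors(hosted_in, connected, net_nodes)
```

In this toy topology the analyzer is related only to the nodes of the network its host is attached to, which is the behavior the text describes for a NIDS.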

Rules 2 and 3 select events based on the vulnerability of the OS and the application, respectively. Some tools, such as vulnerability scanners, can confirm whether an OS or an application is affected by a given vulnerability. Obviously, this concerns only known vulnerabilities, not zero-day vulnerabilities. Rules 4 and 5 are similar to rules 2 and 3; they additionally consider the equivalence between vulnerabilities reported by several organizations under different names. These two rules deal with the case where different analyzers (IDSs and scanners) refer to the same vulnerability with different names or references.

Rule 6 is the inverse of rule 1: it selects events reported by tools that do not actually monitor the target of the attack. Rule 7 selects events reported for a target that is not actually vulnerable to the referenced vulnerability. This concerns both OS and software vulnerabilities. The question now is how to get such information, because scanners traditionally report only affected hosts, not protected ones. For now, we assume that such information is explicitly given by the relation IsNotVulnerable.
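The three possible outcomes of filtering (pertinent, not pertinent, or undecided) can be sketched as follows. This is a minimal Python sketch, not the rules of Tables 1 and 2 themselves: the rule conditions are paraphrased from the text, the event fields and the `facts` dictionary are hypothetical, and checking the "not pertinent" evidence first is an assumption of this sketch.

```python
def classify_event(event, facts):
    """Return 'pertinent', 'not pertinent' or 'undecided' for one event.

    event: dict with keys 'analyzer', 'target', 'vulnerability' (hypothetical layout)
    facts: dict of sets of pairs holding explicitly known relations
    """
    pair = (event["analyzer"], event["target"])
    exposure = (event["target"], event["vulnerability"])

    # Evidence against pertinence: the analyzer is known not to monitor the
    # target, or the target is known not to be vulnerable (IsNotVulnerable).
    # Assumption of this sketch: such evidence takes priority.
    if pair in facts["does_not_monitor"] or exposure in facts["is_not_vulnerable"]:
        return "not pertinent"
    # Evidence for pertinence: the analyzer monitors the target, or the
    # target is known to be affected by the reported vulnerability.
    if pair in facts["monitors"] or exposure in facts["is_vulnerable"]:
        return "pertinent"
    # No rule fired: the event is neither confirmed nor discarded.
    return "undecided"


# Hypothetical facts and events, for illustration only.
facts = {
    "monitors": {("nids-1", "web-server")},
    "does_not_monitor": {("nids-1", "db-server")},
    "is_vulnerable": {("web-server", "vuln-A")},
    "is_not_vulnerable": {("mail-server", "vuln-B")},
}
events = [
    {"analyzer": "nids-1", "target": "web-server", "vulnerability": "vuln-A"},
    {"analyzer": "nids-1", "target": "db-server", "vulnerability": "vuln-A"},
    {"analyzer": "nids-2", "target": "mail-server", "vulnerability": "vuln-B"},
    {"analyzer": "nids-2", "target": "file-server", "vulnerability": "vuln-C"},
]
labels = [classify_event(e, facts) for e in events]
```

Keeping "undecided" as a separate outcome reflects the remark that an event which is not classified as pertinent is not automatically not pertinent.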

Aggregating Events: Here we consider only pertinent events, which we try to group together in order to generate meta-events. A meta-event summarizes a single malicious activity that causes multiple elementary events. We distinguish two types: host-based meta-events and network-based meta-events. For the latter, we can distinguish three sub-classes [10], within a certain time interval (Table 3).

  1. One-to-One (Rule-8). This can be an attack attempted by a single attacker against a single target, for example an SQL injection.

  2. One-to-Many (Rule-9). This can be an attack attempted several times by a single attacker against many targets, for example a network or vulnerability scan.

  3. Many-to-One (Rule-10). This can be an attack attempted by several attackers against a single target, for example a DDoS.

Table 3. Network based events aggregating
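The three network-based sub-classes can be illustrated with a small grouping routine. The Python sketch below is only one possible reading: it assumes events are `(time, source, target)` tuples and that "a certain time interval" means fixed, non-overlapping windows; both choices, and the function name, are hypothetical.

```python
from collections import Counter, defaultdict


def aggregate_network_events(events, window):
    """Group (time, source, target) events into meta-events per time window."""
    buckets = defaultdict(list)
    for time, source, target in events:
        buckets[time // window].append((source, target))

    meta_events = []
    for bucket in sorted(buckets):
        pairs = buckets[bucket]
        pair_counts = Counter(pairs)
        targets_of = defaultdict(set)
        sources_of = defaultdict(set)
        for source, target in pairs:
            targets_of[source].add(target)
            sources_of[target].add(source)

        # One source against several targets in the same window.
        for source in sorted(targets_of):
            if len(targets_of[source]) > 1:
                meta_events.append(
                    ("one-to-many", bucket, source, sorted(targets_of[source])))
        # Several sources against one target in the same window.
        for target in sorted(sources_of):
            if len(sources_of[target]) > 1:
                meta_events.append(
                    ("many-to-one", bucket, sorted(sources_of[target]), target))
        # Repeated events between exactly one source and one target.
        for (source, target), count in sorted(pair_counts.items()):
            if (count > 1 and len(targets_of[source]) == 1
                    and len(sources_of[target]) == 1):
                meta_events.append(("one-to-one", bucket, source, target))
    return meta_events


# Hypothetical events (times in seconds), for illustration only.
events = [
    (0, "a1", "t1"), (5, "a1", "t1"),
    (60, "a2", "t1"), (61, "a2", "t2"), (62, "a2", "t3"),
    (120, "a3", "t9"), (121, "a4", "t9"),
]
meta_events = aggregate_network_events(events, window=60)
```

A sliding window, or a window anchored at the first event of each group, would be equally plausible; fixed buckets are used here only to keep the sketch short.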

For host-based meta-events, we consider several event features to decide whether or not to group events. These features are Node (N), User (U), Process (P), Service (S), and File (F) [10]. Based on these features, and when the data are completely available, we can distinguish two main subclasses (Table 4).

  1. NUP (Rule-11), when many events have the same node, the same user and the same process.

  2. NUF (Rule-12), when many events have the same node, the same user and the same file.

Table 4. Host based events aggregating
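The two host-based subclasses amount to grouping events on a composite key. The following Python sketch assumes each event is a dictionary with `id`, `node`, `user`, `process` and `file` entries; this layout and the sample values are hypothetical.

```python
from collections import defaultdict


def aggregate_host_events(events):
    """Return (NUP groups, NUF groups), keeping only groups of several events."""
    nup = defaultdict(list)  # key: (node, user, process)
    nuf = defaultdict(list)  # key: (node, user, file)
    for event in events:
        base = (event["node"], event["user"])
        if event.get("process") is not None:
            nup[base + (event["process"],)].append(event["id"])
        if event.get("file") is not None:
            nuf[base + (event["file"],)].append(event["id"])
    # A meta-event summarizes multiple elementary events, so singletons are dropped.
    nup_groups = {key: ids for key, ids in nup.items() if len(ids) > 1}
    nuf_groups = {key: ids for key, ids in nuf.items() if len(ids) > 1}
    return nup_groups, nuf_groups


# Hypothetical events, for illustration only.
events = [
    {"id": 1, "node": "h1", "user": "bob", "process": "sshd", "file": "/etc/passwd"},
    {"id": 2, "node": "h1", "user": "bob", "process": "sshd", "file": "/etc/shadow"},
    {"id": 3, "node": "h1", "user": "bob", "process": "cron", "file": "/etc/shadow"},
]
nup_groups, nuf_groups = aggregate_host_events(events)
```

Events with a missing process or file are simply skipped for the corresponding key, which mirrors the condition that these subclasses apply when the data are completely available.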

5 Experimental Results

ONTO-SIEM is implemented using Protégé, which is a powerful editor supporting OWL-DL, SWRL and many reasoners such as HermiT and Pellet. Protégé owes much of its power to the many plugins that can be added to it.

To evaluate the proposed rules we used UNB-ISCX-2012 [11], which is an open intrusion detection evaluation dataset. UNB-ISCX-2012 is an interesting benchmark because it provides real labeled traffic which contains both attacks and normal activities. Moreover, this benchmark provides a complete capture of the traffic with a set of diverse, multi-step attack scenarios. Table 5 gives a summary of the benchmark and its attack scenarios.

Table 5. UNB-ISCX benchmark description

We tested our approach using two scenarios, namely “Infiltrating from inside” and “HTTP DDoS”. The first scenario consists of obtaining access to a host inside the local network; the compromised host is then used as a pivot to attack computers that are not accessible via the Internet. The second scenario consists of performing a stealthy, low-bandwidth denial of service attack without the need to flood the network (for more details about the testbed architecture and the attack scenarios, see [11]).

5.1 Populating ONTO-SIEM

The UNB-ISCX-2012 benchmark provides only the raw traffic collected during 7 days and an XML file containing labeled attacks with their execution periods, but nothing about the topology and cartography of the testbed, or about the vulnerabilities. We therefore had to extract that information manually from the benchmark, which was hard work. For instance, Table 6 shows some of the OSs and software used. Vulnerabilities are also manually inserted into ONTO-SIEM.

Table 6. OSs and software of the UNB-ISCX benchmark

The raw traffic is analyzed using Prelude-Snort, and the reported alerts are then translated from XML to ONTO-SIEM. Prelude is an open SIEM which can easily be connected with many analyzers. Prelude's output is in XML, so it is simple to translate it to ONTO-SIEM.
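To give an idea of what such a translation involves, the Python sketch below turns one XML alert into subject-predicate-object triples. It is an assumption-laden illustration: the XML is a simplified, namespace-free alert loosely inspired by IDMEF rather than actual Prelude output, the addresses are placeholder values, and the `onto:` property names are hypothetical and not the ONTO-SIEM vocabulary.

```python
import xml.etree.ElementTree as ET

# Simplified, made-up alert; real IDMEF messages are namespaced and richer.
SAMPLE_ALERT = """
<Alert messageid="a-001">
  <Analyzer analyzerid="snort-1"/>
  <CreateTime>2012-06-13T10:15:00Z</CreateTime>
  <Source><Address>192.0.2.10</Address></Source>
  <Target><Address>198.51.100.7</Address></Target>
  <Classification text="HTTP flood attempt"/>
</Alert>
"""


def alert_to_triples(xml_text):
    """Convert one simplified alert into (subject, predicate, object) triples."""
    alert = ET.fromstring(xml_text.strip())
    subject = "alert:" + alert.get("messageid")
    return [
        (subject, "rdf:type", "onto:Alert"),
        (subject, "onto:reportedBy", alert.find("Analyzer").get("analyzerid")),
        (subject, "onto:createdAt", alert.findtext("CreateTime")),
        (subject, "onto:hasSource", alert.findtext("Source/Address")),
        (subject, "onto:hasTarget", alert.findtext("Target/Address")),
        (subject, "onto:hasClassification", alert.find("Classification").get("text")),
    ]


triples = alert_to_triples(SAMPLE_ALERT)
```

In a full pipeline these triples would be written to the ontology as individuals and property assertions through an OWL library; plain tuples are used here to keep the example self-contained.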

5.2 Discussing Results

Concerning the first scenario, the alerts reported by Snort are translated into ONTO-SIEM using Prelude (https://www.prelude-siem.org/). The results of the correlation process are shown in Table 7. The first filtering level reduced the number of alerts by 30% (3307 alerts removed); these are alerts that refer to non-existent hosts. The second filtering level removed about 17% of the remaining alerts (1340 alerts removed). Therefore, after this preliminary filtering stage, about 42% of the alerts are removed (4647 in total), and roughly 58% of the initial alerts remain for further correlation processing using the ontology.

Table 7. Filtering alerts of scenario 1

The same discussion applies to the second scenario. Table 8 shows the results of the correlation process. The first filtering level reduced the number of alerts by 27.95% (1561 alerts removed); these are alerts that refer to non-existent hosts. The second filtering level removed about 10% of the remaining alerts (410 alerts removed). Therefore, after this preliminary filtering stage, about 35% of the alerts are removed (1971 in total), and roughly 65% of the initial alerts remain for further correlation processing using the ontology.
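Because the second level is expressed relative to the alerts left by the first, the two percentages cannot simply be added. The Python sketch below shows how the per-level and overall reductions relate, using the counts reported for the second scenario. The initial total of 5585 alerts is not stated in the text; it is inferred here from the reported 1561 alerts and 27.95%, and should be treated as an assumption.

```python
def combined_reduction(total, removed_per_level):
    """Return (overall fraction removed, fraction removed at each level).

    Each level's fraction is computed relative to the alerts that reach it,
    i.e. what is left after the previous levels.
    """
    remaining = total
    per_level = []
    for removed in removed_per_level:
        per_level.append(removed / remaining)
        remaining -= removed
    overall = 1 - remaining / total
    return overall, per_level


# Counts from the text; the total is an inferred value, not a reported one.
estimated_total = round(1561 / 0.2795)
overall, per_level = combined_reduction(estimated_total, [1561, 410])
```

The same function can be applied to the first scenario once its initial number of alerts is known.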

Table 8. Filtering alerts of scenario 2

6 Conclusion and Future Work

In this paper, we proposed a domain ontology for cooperative intrusion detection based on several data sources such as IDMEF, TAXII, STIX, M4D4, OVAL and NVD. This ontology is implemented with OWL, which has been a W3C recommendation since 2004 for the representation of ontologies in the Semantic Web. OWL is based on Description Logics, which are a decidable fragment of first-order logic and are well suited to representing structured information.

We have illustrated the usefulness of this ontology through an application in the context of alert correlation. This application allows automatic translation into OWL of alerts generated by IDSs, as well as of contextual information generated by network and vulnerability scanners. Furthermore, we proposed a set of rules to be applied over the constructed ontology; these rules aim mainly to remove not pertinent alerts. This is very important for reducing the number of alerts, so that pertinent alerts can be analyzed in priority. Other actions can be performed to complete this work. Indeed, the proposed ontology needs to be extended with more concepts and relations to allow more comprehensive correlation rules, and other reasoning mechanisms provided by OWL-DL, such as consistency checking and concept satisfiability, remain to be exploited.