How can sliding HyperLogLog and EWMA detect port scan attacks in IP traffic?
IP networks are constantly targeted by new denial of service attack techniques (SYN flooding, port scan, UDP flooding, etc.), causing service disruption and considerable financial damage. On-line detection of DoS attacks in current high bit-rate IP traffic is a big challenge. We propose in this paper an on-line algorithm for port scan detection. It is composed of two complementary parts: first, a probabilistic counting part, where the number of distinct destination ports is estimated by adapting a method called ‘sliding HyperLogLog’ to the context of port scans in IP traffic; second, a decisional mechanism applied to the estimated number of destination ports in order to detect in real time any behavior that could correspond to malicious traffic. This latter part is mainly based on the exponentially weighted moving average (EWMA) algorithm, which we adapted to the context of on-line analysis by adding a learning step (assumed attack-free) and improving its update mechanism. The resulting port scan detection method is tested against real IP traffic containing attacks. It detects all the port scan attacks within a very short response time (about 30 s) and without any false positive. The algorithm uses a very small total memory of less than 22 kB and estimates the number of destination ports very accurately (a relative error of about 3.25%), in agreement with the theoretical bounds provided by the sliding HyperLogLog algorithm.
Keywords: Control chart; Exponentially weighted moving average; Bloom filter; Attack detection; Destination port
Denial of service (DoS) attacks are one of the most important issues in network security. They aim to make a server resource unavailable by either damaging data or software or flooding the network with a huge amount of traffic. Thus, the server becomes unreachable by legitimate users, causing a significant financial loss in some cases. Port scan is a particular DoS attack that aims to discover available services on the targeted system. It essentially consists of sending an IP packet to each port and analyzing the response to the connection attempts. Definitions found in the literature are unable to provide an absolute quantitative definition of port scan; the attack is rather defined by comparison to standard behavior. The attacker can discover not only available ports (or services) but also more relevant information about the victim, such as its operating system, service owners, and the authentication method. Once the system vulnerabilities are identified, a future attack can be launched, causing important damage. Various port scanning tools have been developed and are very simple to install, making serious port scan attacks easy to launch. Nmap is the best-known port scanning tool; its official guide was published by Fyodor in 2009. Zmap is a faster scanning method developed by Durumeric et al. in 2013; it can scan the IPv4 address space in less than 45 min using a single machine.
Network operators are always looking for scalable solutions to detect DoS attacks on-line. Their objective is to stop the attack very quickly in order to avoid wasting network resources. Ideally, the attack detection solution would be deployed very close to the source of the attack, but this is unrealistic as it would have to be implemented for each user. Moreover, DoS attacks can be launched from several sources against a single victim at the same time; they are called distributed attacks (DDoS) in this case. To detect such attacks, one has to consider the aggregated traffic issued from the several sources, because the contribution of each source taken alone can look like normal traffic. Therefore, the attack detection solution has to be implemented in a core network router so as to analyze the traffic issued from several users. It also has to perform an on-line analysis and raise alarms in case of suspicious traffic. In the context of the core network, real-time processing is a big challenge. In fact, the analysis time of an IP packet has to be shorter than the packet inter-arrival time, which is only a few nanoseconds in the current IP traffic carried by core networks (8 ns on an OC-768 link). Moreover, as attack detection is not the main role of the router, which provides many other functions such as prioritization and quality of service, the amount of memory used for attack detection has to be very small.
The problem of DoS attack detection in IP traffic has been largely addressed by the network security community. Most of the proposed methods analyze the exhaustive traffic and maintain accurate statistics about the various flows (number of packets per communication (source-destination pair), number of SYN packets sent by each source address, etc.), e.g., the threshold random walk method proposed in . The memory size required by this kind of approach is proportional to the number of flows, which is clearly unscalable and not adapted to the current very high bit-rate traffic carried by very high speed links. To overcome this problem, it is necessary to dispense with exact statistics and to generate estimates, which require less memory and allow faster processing. In this context, some recent probabilistic methods based on Bloom filters have been proposed (see [5, 6], and ). A Bloom filter is an efficient data structure of limited size that guarantees fast processing, thanks to the use of hash functions.
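To make the data structure concrete, here is a minimal Bloom filter sketch. It is not taken from the cited papers; the array size, the number of hash functions, and the way the positions are derived are all illustrative choices:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash positions over a fixed bit array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, item):
        # Derive k positions by salting a single digest (a common trick).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # May return a false positive, but never a false negative.
        return all(self.bits[pos] for pos in self._positions(item))
```

Membership tests cost only k hash computations and k bit reads, regardless of how many items were inserted, which is what makes this family of structures attractive at line rate.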
Attack detection algorithms must first generate on-line aggregate information and statistics on the observed traffic, and then identify, based on the obtained statistics, the suspicious traffic that could correspond to attacks. Probabilistic algorithms can be used to extract statistics and estimates quickly, but they must be complemented by a decision phase that identifies attacks.
Various methods can be used for the decision phase. In , Monowar et al. provide a detailed study of port scan detection approaches. The different methods are divided into five main classes (soft computing, algorithmic, rule-based, threshold-based, and visual approaches), and their performance is compared (accuracy, response time, etc.). The results show that methods combining data mining with threshold-based analysis are the most efficient in terms of false positive rate, robustness, and scalability. A common weakness of the threshold-based methods is that their accuracy is closely related to traffic characteristics. As an example, the detection mechanism used in  and  is based on the well-known problem of finding the top k elements in a data stream. This means that at most k simultaneous attacks can be detected; therefore, the parameter k must be well chosen in order to minimize false alarms and missed attacks. The aim of this paper is to use Bloom filters for the extraction of relevant information about the traffic and to automatically adapt a threshold-based algorithm to the on-line analysis context and to varying traffic conditions.
Organization of the paper
In this paper, we design a new algorithm that detects port scan attacks on-line. The proposed method is mainly based on the sliding HyperLogLog algorithm , which we adapted to the context of port scan detection in IP traffic. Sliding HyperLogLog is an efficient algorithm that estimates the number of distinct elements over a sliding window. It is able to deal with a massive data stream and provides an accurate estimate using a very small memory. We use sliding HyperLogLog to analyze traffic and perform on-line counting, which we complement with a decisional mechanism that identifies port scan attacks.
The organization of this paper is as follows: the sliding HyperLogLog algorithm is presented in Section ‘The sliding HyperLogLog algorithm’. A detailed description of the proposed method for port scan detection is given in Section ‘The proposed method for port scan detection’; in this latter section, the counting method and the decisional mechanism are explained separately. The new detection method is tested against experimental data collected from an IP backbone network in Section ‘Experimental results’, where it is also compared to other existing methods. Concluding remarks are presented in Section ‘Conclusion’.
The sliding HyperLogLog algorithm
The full specification of the HyperLogLog algorithm is given by pseudo-code in the original article (the pseudo-code figure is not reproduced here).
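The following Python sketch illustrates the standard HyperLogLog estimator: a 32-bit hash, the first b bits selecting one of m = 2^b registers, the rank ρ of the remaining bits, and the usual bias and small-range corrections. The details are those of the standard algorithm, not taken from the missing figure:

```python
import hashlib
import math

def hll_estimate(items, b=10):
    """HyperLogLog cardinality estimate with m = 2**b registers."""
    m = 1 << b
    registers = [0] * m
    for item in items:
        # 32-bit hash of the item
        h = int(hashlib.sha256(str(item).encode()).hexdigest(), 16) & 0xFFFFFFFF
        j = h >> (32 - b)                 # first b bits select the register
        w = h & ((1 << (32 - b)) - 1)     # remaining 32-b bits
        # rho(w): position of the leftmost 1-bit (32-b+1 if w == 0)
        rho = (32 - b) - w.bit_length() + 1
        registers[j] = max(registers[j], rho)
    alpha = 0.7213 / (1 + 1.079 / m)      # bias-correction constant (m >= 128)
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:          # small-range (linear counting) correction
        return m * math.log(m / zeros)
    return raw
```

With b = 10 (m = 1,024 registers), the theoretical standard error 1.04/√m is about 3.25%, the value used later in the paper.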
The main advantage of the HyperLogLog algorithm is that it provides an excellent cardinality estimation, with a relative accuracy of about 1.04/√m, using a very small memory of m log2 log2(n/m) bits, where m is the number of buckets (subsets) and n is the real cardinality of the multiset. In practice, using only 1.5 kB, a cardinality of one billion can easily be estimated with a typical standard error of about 2%.
The sliding window model is widely used in many applications requiring data stream management, such as network monitoring, security, and financial applications. It consists of maintaining and updating some relevant statistics about the recent items of the data stream. The sliding window is said to be logical or physical if it is defined, respectively, as the last N received items or the last time window T. Datar et al.  proposed a standard framework to adapt several computations (sums, averages, min, max, etc.) to the data stream context by adding a sliding window. They showed that their sliding window mechanism requires a memory overhead and adds a loss in the accuracy of the estimation; the additional error depends, of course, on the total memory used.
Two major results about the accuracy and memory consumption of the sliding HyperLogLog algorithm are detailed in . First, unlike in , adding the sliding window does not modify the accuracy of the algorithm, so the accuracy of sliding HyperLogLog is exactly the same as that of HyperLogLog (a standard error of 1.04/√m). Second, an upper bound on the total memory used was established: the total size of the m lists LFPM is bounded by Idsize·m·ln(n/m) bytes, where Idsize is the size of the item identifier <t_v, ρ(v)>. In practice, the timestamp t_v is encoded on 4 bytes, and only 1 byte is sufficient for ρ(v).
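The per-bucket lists can be maintained with the ‘future possible maxima’ rule: a pair <t_v, ρ(v)> is kept only while it is still inside the window and while no more recent pair has an equal or larger ρ(v), since such a dominated pair can never again be the maximum of any future window. A sketch of this bookkeeping, with assumed function names:

```python
from collections import deque

def lfpm_update(lfpm, t, rho, window):
    """Update one bucket's List of Future Possible Maxima (LFPM).

    lfpm is a deque of (timestamp, rho) pairs ordered by increasing
    timestamp and strictly decreasing rho."""
    # Drop pairs that have left the sliding window.
    while lfpm and lfpm[0][0] <= t - window:
        lfpm.popleft()
    # Drop older pairs dominated by the new one: they can never again
    # be the maximum of any future window.
    while lfpm and lfpm[-1][1] <= rho:
        lfpm.pop()
    lfpm.append((t, rho))

def lfpm_query(lfpm, t, window):
    """Largest rho observed in the last `window` time units (0 if none)."""
    vals = [r for (ts, r) in lfpm if ts > t - window]
    return max(vals, default=0)
```

Because each list stores only the non-dominated pairs, its expected length stays logarithmic in the number of distinct items per bucket, which is what yields the Idsize·m·ln(n/m) memory bound quoted above.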
The proposed method for port scan detection
Probabilistic counting method
We focus in this paper on a particular kind of port scan attack called vertical port scan . It consists of scanning many ports of a given destination. The number of destination ports can theoretically reach 65,536 as the port number is encoded on 2 bytes, but the commonly used destination ports are not very numerous: they are mainly the so-called well-known ports (0-1023) and some registered ports (1024-49151), following the terminology of . Therefore, the total number of distinct destination ports is a key observable to detect port scan attacks. In this context, the sliding HyperLogLog algorithm can be applied to count indefinitely, over a sliding window, the number of distinct destination ports. For each received packet (identified by the classical 5-tuple composed of the source and destination addresses, the source and destination port numbers, and the protocol type), only the destination port is considered and hashed into a random value. The sliding HyperLogLog algorithm does not perform an exact counting but only provides an estimation; therefore, the parameters of the algorithm have to be chosen well in order to ensure an acceptable estimation error. The number of buckets, m, is the crucial parameter of the counting method. With a high value of m, a smaller standard error can be achieved (1.04/√m), but a larger memory will be used (Idsize·m·ln(n/m) bytes). Moreover, m depends on the cardinality of the multiset, i.e., the number of distinct destination ports, which can theoretically reach 65,536; the number of distinct elements per bucket has to be high enough to yield significant statistics. With a total number of buckets of 1,024 (m=1,024=2^10), a standard error of only 3.25% can easily be achieved. Notice that this choice does not depend on the traffic trace and can be used for any port scan attack detection.
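These parameter choices can be checked numerically. The values below are taken from the text (m = 1,024 buckets, n = 65,536 possible ports, Idsize = 5 bytes: 4 for the timestamp plus 1 for ρ(v)):

```python
import math

# Parameter check for the counting stage (values from the text).
m = 1024        # number of buckets (2**10)
n = 65536       # worst-case number of distinct destination ports
id_size = 5     # bytes per <t_v, rho(v)> pair: 4 (timestamp) + 1 (rho)

std_error = 1.04 / math.sqrt(m)               # HyperLogLog relative accuracy
memory_bound = id_size * m * math.log(n / m)  # bytes, sliding HyperLogLog bound

print(f"standard error ~ {std_error:.2%}")             # 3.25%
print(f"memory bound ~ {memory_bound / 1024:.1f} kB")  # 20.8 kB, below 22 kB
```

Both figures match the ones announced in the paper: a 3.25% relative error and a total memory below 22 kB.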
There is clearly a tradeoff in the choice of the size of the sliding time window W. With a larger time window, one can answer requests concerning larger durations, for example, the number of destination ports in the last 30 min, but more information has to be stored: the upper bound on the memory used (5m·ln(n/m) bytes) depends on n, the number of distinct destination ports, which is closely related to the size of the time window. Moreover, the typical duration of an attack must be considered in the choice of the size of the time window: W has to be large enough for the impact of the attack on the traffic to be noticeable. Port scan attacks last about 20 min and should be detected within the first minute; thus, W=60 s is a good choice for the size of the time window. With this choice, the total memory used will be less than 22 kB, which is very reasonable for a router.
Some slow attacks, also called progressive attacks, are more difficult to detect because the intensity of the attack increases slowly; the attack then lasts longer than the standard duration. To detect this kind of attack, one has to aggregate the traffic over a longer time scale. For the sliding HyperLogLog algorithm, a larger time window W′=5 min can be added; the algorithm is then run independently and in parallel on the two time scales W=60 s and W′=5 min.
Once relevant statistics related to port scan attacks are provided by the counting process, one has to filter and classify this information into two groups: ‘standard behavior’ and ‘suspicious traffic’. This problem, also known as ‘change-point detection’, has been widely studied in the literature. Many methods have been developed by the statistics and data mining communities for several application fields.
In their book entitled Detection of Abrupt Changes: Theory and Application , Basseville and Nikiforov provided the description and the performance analysis of a wide range of algorithms dealing with this problem of change detection. They classified these algorithms into three main categories: the elementary algorithms, the cumulative sum algorithm, and the Bayes-type algorithms.
The elementary algorithms use simple and intuitive concepts. They have many industrial applications, in particular in the quality control field. One can cite the Shewhart control charts algorithm, first introduced by Walter Shewhart  in 1924. This technique has found many applications in improving the quality of manufacturing processes . A more efficient method, called the ‘geometric moving average control charts’ algorithm, was proposed later by Roberts . This algorithm is also known as the ‘exponentially weighted moving average’ (EWMA) algorithm. Its key idea is to give different weights to the values of the observed process in order to detect the change point: the recent values must be given more importance. Another solution, the ‘finite moving average’ (FMA), is to ignore very old observations by using a finite set of weights. The filtered derivative algorithm is another elementary algorithm, introduced by Basseville and Gasnier  in 1981 and based on gradient techniques; it is widely used in the context of image edge detection.
The ‘cumulative sum’ (CUSUM) algorithm was designed by Page in 1954 . It is based on the sum of the process past observations. The CUSUM algorithm is well adapted to the detection of systematic small variations of the process.
The Bayesian algorithms were first introduced by Girshick and Rubin in 1952 . The main advantage of these methods is that they guarantee a robust performance with a formal proof of optimality, but they need an a priori knowledge about the observed process, more precisely, the distribution of the change time must be given in advance.
Roberts presents in  a comparative experimental study of all the different algorithms described above. The input parameters of all these algorithms are set to their optimal values, and the mean detection delays are compared. The main result is that the CUSUM algorithm outperforms the other algorithms when the observed process has small shifts.
Recall that in the context of on-line port scan detection, we focus on change-point detection in a high-speed data stream. Sequential data is provided on the fly from the counting process, and our purpose is to identify as quickly as possible the change point using a very small memory. According to Basseville and Nikiforov , all the approaches described above can be used in the context of on-line analysis. But, in our particular context of port-scan detection, no assumption about the distribution of the attack time can be made. So, the Bayesian algorithms are not adapted for such applications. Moreover, the total duration of an attack is about 20 min, so the detection delay must be small enough (less than 1 min) to stop the attack quickly. Therefore, the lack of reactivity of the CUSUM algorithm against abrupt change points may be a big weakness.
Sebastiao and Gama performed in  an experimental comparison of some algorithms well adapted to on-line analysis. More precisely, they compared the efficiency of the four following algorithms: the statistical process control (SPC) , the adaptive windowing (ADWIN) , the fixed cumulative windows model (FCWM) , and the Page-Hinkley test (PHT) . These methods are closely related to the three kinds of algorithms presented above. The main result of this paper is that PHT and SPC are less time- and memory-consuming, but in some cases they engender a high rate of false alarms.
To achieve our objectives in terms of detection delay, on-line analysis (only one pass over the whole data), and absence of a priori knowledge about IP traffic characteristics, we choose to focus in this paper on an elementary algorithm: EWMA , proposed by Roberts. The main idea of the EWMA algorithm is to define a threshold delimiting a ‘standard behavior’ and to maintain and periodically update an average of the observed data stream. A change point is then declared as soon as the average exceeds the fixed threshold. This algorithm is very simple to implement: unlike ADWIN, the past values of the observed data are not stored, so it does not require any data structure. It also has a lower complexity because, for each observed data point, one has only to update a weighted average. Compared to PHT and CUSUM, EWMA has the advantage of closely relating the importance of an observation to its age, which is more meaningful in the context of a data stream; in the PHT and CUSUM algorithms, all of the data history is weighted equally.
EWMA(t) = λ·Y(t) + (1-λ)·EWMA(t-1), where Y(t) is the observed value at time t and λ is a multiplicative factor (0<λ≤1). It can be interpreted as a kind of correlation between Y(t) and EWMA(t). In practice, we want the moving average EWMA(t) to follow the variations of the observed process Y(t) carefully. To give more importance to past observations than to the current one, λ is very often taken smaller than 0.5. However, with a very small value of λ, the algorithm becomes insensitive to attacks having a small duration or a moderate intensity; that is why λ is usually between 0.2 and 0.5 in practice. EWMA(0), also called the target, is the average of the whole data set.
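The recursion itself is a one-liner per observation; a sketch with an assumed helper name:

```python
def ewma_series(values, lam=0.3, start=None):
    """Run the recursion EWMA(t) = lam*Y(t) + (1-lam)*EWMA(t-1).

    `start` plays the role of EWMA(0), the target; if omitted, the
    first observation is used."""
    ewma = values[0] if start is None else start
    out = []
    for y in values:
        ewma = lam * y + (1 - lam) * ewma
        out.append(ewma)
    return out
```

The weight of an observation decays geometrically with its age (a value seen j steps ago contributes with weight λ(1-λ)^j), which is exactly the age-sensitivity argued for above.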
UCL = EWMA(0) + k·s0, where the factor k is either set equal to 3 or chosen using the tables given by Lucas et al.  in the enhanced version of EWMA proposed in 1990, and s0 is the standard deviation calculated on the whole data set.
It is clear that the EWMA algorithm as described above cannot be directly applied to on-line data stream analysis because it requires two passes over the whole data set: it first needs to compute EWMA(0) and s0 on the whole data in order to fix the threshold UCL, and then all the data must be considered again to detect the change points. To overcome this problem, we propose to add a learning step of a few minutes at the beginning of the algorithm in order to initialize its parameters, namely EWMA(0) and s0. No change-point detection is performed during this learning step, so we implicitly assume that this period corresponds to the ‘standard behavior’ and does not contain any anomaly.
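A sketch of the resulting one-pass scheme, with assumed parameter names: the first observations (supposed attack-free) fix EWMA(0), s0, and UCL, and detection starts afterwards. Not updating EWMA(t) while the candidate value exceeds UCL is one reading of the paper's modified update, not its exact specification:

```python
import statistics

def detect_port_scans(stream, learn_len=10, lam=0.3, k=3):
    """EWMA change-point detection with a learning prefix.

    The first `learn_len` observations initialize EWMA(0) (their mean)
    and s0 (their standard deviation); detection then uses the threshold
    UCL = EWMA(0) + k*s0. Returns the indices of alarmed observations."""
    learn = stream[:learn_len]
    ewma = statistics.mean(learn)      # EWMA(0), the target
    s0 = statistics.pstdev(learn)
    ucl = ewma + k * s0
    alarms = []
    for t, y in enumerate(stream[learn_len:], start=learn_len):
        candidate = lam * y + (1 - lam) * ewma
        if candidate > ucl:
            alarms.append(t)           # raise an alarm; do not absorb the attack
        else:
            ewma = candidate
    return alarms
```

Freezing the average during an alarm keeps the attack from inflating EWMA(t), so the statistic returns to normal values as soon as the attack ends.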
Table: Characteristics of the traffic trace used for attack detection (number of IP packets, number of flows).
The flow is defined as the set of those packets with the same source and destination addresses, the same source and destination port numbers, and the same protocol type.
In this part, we focus on testing the counting process based on the sliding HyperLogLog algorithm. The number of distinct destination ports in the last time window W is estimated every 30 s. The time window is taken equal to 60 s, and the number of buckets m equals 1,024.
Decisional mechanism using EWMA algorithm
The input of this decisional part is the estimated number of distinct destination ports provided by the counting process. This information is received every 30 s and concerns the last 60-s time window. The input data is presented in Figure 6. One can easily see several peaks that are very likely to correspond to port scan attacks. Our objective here is to automatically identify these peaks using the EWMA algorithm. The multiplicative update factor λ is taken equal to 0.3, and the detection parameter k equals 3. All the implementations are performed in R.
If, at any time t, EWMA(t)<LCL, then we restart the algorithm. The idea is that if we consider the worst case where the learning phase contains many port scan attacks, EWMA(t) will take smaller values once the attacks end, which can be detected by comparing EWMA(t) to the lower bound LCL defined in the learning phase. In other words, if the initial parameters calculated on the training set no longer correlate with the current values, the algorithm has to be restarted. The learning phase is very useful because there is no absolute quantitative description of a port scan attack: the attack is simply defined as a significant deviation from a standard behavior that has to be learned on-line.
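The restart rule can be sketched as follows, with a hypothetical helper; where exactly relearning resumes, and the learning length, are design choices assumed here rather than specified by the paper:

```python
import statistics

def run_with_restart(stream, learn_len, lam=0.3, k=3):
    """EWMA monitoring that relearns its parameters whenever EWMA(t)
    falls below LCL = EWMA(0) - k*s0.

    Returns (alarm indices, number of restarts)."""
    i, alarms, restarts = 0, [], 0
    while i + learn_len <= len(stream):
        learn = stream[i:i + learn_len]
        ewma = statistics.mean(learn)           # EWMA(0) for this epoch
        s0 = statistics.pstdev(learn)
        ucl, lcl = ewma + k * s0, ewma - k * s0
        i += learn_len
        restarted = False
        while i < len(stream):
            ewma = lam * stream[i] + (1 - lam) * ewma
            if ewma > ucl:
                alarms.append(i)
            if ewma < lcl:
                restarts += 1
                restarted = True    # relearn from the current position
                break
            i += 1
        if not restarted:
            break
    return alarms, restarts
```

Relearning from the current position means the value that triggered the restart is included in the new training set, which is acceptable here since a drop below LCL signals the end of an anomalous period rather than an attack.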
EWMA versus CUSUM algorithm
CUSUM(t) = max(0, CUSUM(t-1) + X(t)), where X(i) is the normalized observation, with a mean of 0 and a standard deviation equal to 1. The upper bound k is taken equal to 5 to have good ARL properties, as mentioned in . Figure 11 shows that only the first attack is detected. The second attack, which happens around second 3,450, has a shorter duration and a smaller magnitude; that is why it has a limited impact on CUSUM(t) and cannot be detected. Moreover, the duration of the first attack is largely overestimated. In fact, CUSUM(t) is updated even in case of attacks and, unlike EWMA(t), takes a long time to return to normal values at the end of the attack. This can be explained by the fact that the CUSUM algorithm accumulates the effect of the attack over several sliding windows, as it is based on a sum and gives the same weight to all the observed values.
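For comparison, a one-sided CUSUM sketch on the normalized observations makes the slow decay after an attack visible. This is the standard recursion; the paper's exact variant may differ:

```python
def cusum(values, k=5.0):
    """One-sided CUSUM on normalized observations X(i) (mean 0, std 1).

    An alarm is raised whenever the cumulative statistic exceeds the
    upper bound k. Returns the indices of alarmed observations."""
    s, alarms = 0.0, []
    for i, x in enumerate(values):
        s = max(0.0, s + x)   # accumulate; never go below zero
        if s > k:
            alarms.append(i)
    return alarms
```

Because the statistic is a sum, a burst of large values pushes it well above k, and it only drains back at the rate of the (small) negative deviations that follow, which is the overestimated-duration effect discussed above.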
In this paper, a new method for identifying port scan attacks on-line in IP traffic is proposed. First, some relevant statistics are extracted from the data stream using the sliding HyperLogLog algorithm; the parameters of this algorithm have to be chosen well in order to ensure an acceptable accuracy of the statistics. Second, a change-point detection method based on the EWMA algorithm is used to identify suspicious traffic. It is mainly an adaptation of EWMA to the data stream context: a learning phase is added at the beginning of the algorithm in order to initialize its parameters, and a new constraint is added to the update of the moving average EWMA(t) when the latter exceeds the UCL detection threshold, to overcome the false-positive problem. Finally, we ran experiments on a real traffic trace captured on the IP backbone network of Orange Labs in December 2007 in the context of the ANR-RNRT OSCAR project. The obtained results confirm the efficiency of the adapted combination of the HyperLogLog and EWMA algorithms in terms of accuracy, memory usage, and response time.
- 1. de Vivo M, Carrasco E, Isern G, de Vivo GO: A review of port scanning techniques. SIGCOMM Comput. Commun. Review 1999, 8: 411-430.
- 2. Staniford S, Hoagland JA, McAlerney JM: Nmap Network Scanning: The Official Nmap Project Guide to Network Discovery and Security Scanning. Insecure, Sunnyvale, California, USA.
- 3. Durumeric Z, Wustrow E, Halderman JA: ZMap: fast Internet-wide scanning and its security applications. Paper presented at the 22nd USENIX security symposium, Washington, D.C., USA, 14-16 Aug 2013.
- 4. Jung J, Paxson V, Berger A, Balakrishnan H: Fast portscan detection using sequential hypothesis testing. Paper presented at the IEEE symposium on security and privacy, Claremont Resort, Oakland, California, USA, 9-12 May 2004.
- 7. Sridharan A, Ye T, Bhattacharyya S: Connectionless port scan detection on the backbone. Paper presented at the 25th IEEE performance, computing, and communications conference (PCCC), Phoenix, AZ, USA, April 2006.
- 9. Levy-Leduc C: Detection of network anomalies using rank tests. Paper presented at EUSIPCO, Lausanne, Switzerland, 25-29 Aug 2008.
- 10. Chabchoub Y, Hebrail G: Sliding HyperLogLog: estimating cardinality in a data stream over a sliding window. Paper presented at the ICDM workshop on large-scale analytics for complex instrumented systems (LACIS), Sydney, 13 Dec 2010.
- 11. Flajolet P, Fusy E, Gandouet O, Meunier F: HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm. Paper presented at the 13th conference on analysis of algorithms (AofA), Juan-les-Pins, 17-22 June 2007, 127-146.
- 12. Heule S, Nunkesser M, Hall A: HyperLogLog in practice: algorithmic engineering of a state of the art cardinality estimation algorithm. Paper presented at the EDBT 2013 conference, Genoa, Italy, 18-22 March 2013.
- 13. GitHub: PostgreSQL extension adding HyperLogLog data structures as a native data type. http://github.com/aggregateknowledge/postgresql-hll. Accessed 20 Dec 2013.
- 15. Basseville M, Nikiforov I: Detection of Abrupt Changes: Theory and Application. Upper Saddle River: Prentice-Hall; 1993.
- 16. Shewhart W: Economic Control of Quality of Manufactured Product. Bell Telephone Laboratories series. Princeton: D. Van Nostrand Company; 1931.
- 17. Box GEP, Jenkins G: Time Series Analysis, Forecasting and Control. San Francisco: Holden-Day; 1990.
- 23. Sebastiao R, Gama J: A study on change detection methods. Paper presented at the 14th Portuguese conference on artificial intelligence (EPIA), Aveiro, Portugal, 12-15 Oct 2009.
- 24. Gama J, Medas P, Castillo G, Rodrigues P: Learning with drift detection. In Advances in Artificial Intelligence. Edited by Bazzan ALC, Labidi S. New York: Springer; 2004:286-295.
- 25. Bifet A, Gavalda R: Learning from time-changing data with adaptive windowing. Paper presented at the 7th SIAM international conference on data mining, Minneapolis, MN, USA, 2007.
- 26. Sebastiao R, Gama J: Monitoring incremental histogram distribution for change detection in data streams. In Lecture Notes in Computer Science: Knowledge Discovery from Sensor Data. Edited by Gaber MM, Vatsavai RR, Omitaomu OA, Gama J, Chawla NV, Ganguly AR. New York: Springer; 2010:25-42.
- 27. Montgomery DC: Introduction to Statistical Quality Control. New York: Wiley; 2004.
- 28. Lucas JM, Saccucci MS: Exponentially weighted moving average control schemes: properties and enhancements. Technometrics 1990, 32: 1-29. doi:10.1080/00401706.1990.10484583.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.