Introduction

Security experts have adapted their strategies in response to the significant increase in cyberattacks, and in particular to the increase in their complexity and resolution, which has led to the use of both active and passive defence systems as part of their defensive strategies [8]. As an active defence system, a honeypot functions as a decoy to entice cyberattackers to reveal information which can be utilised by security experts in updating their security procedures [28]. As it is a concealed system, disguising its identity is essential for its successful operation. Nonetheless, cyberattackers always attempt to uncover these honeypots, and one of the most effective techniques for revealing their identity is a fingerprinting attack. Generally, fingerprinting is not of great concern for an unconcealed system, but for a honeypot it may mean the end of its useful life and lead to significant consequences; for example, it can be exploited as a zombie by an attacker to attack others [35].

A honeypot can be protected from a fingerprinting attack; however, this is not consistent with the principle of a honeypot, which is established with the purpose of gaining information about attackers. It would be beneficial if an attempted fingerprinting attack could be predicted in a timely manner. Unfortunately, no specific method is available to detect and predict an attempted fingerprinting attack in real time, as it is challenging to distinguish it from other attacks. Therefore, this paper presents a Computational Intelligence (CI) enabled honeypot that is capable of discovering and predicting an attempted fingerprinting attack by using principal components analysis (PCA) and a fuzzy inference system (FIS). The proposed CI-enabled design is focused on the most common type, the Operating System (OS) fingerprinting attack, which is performed on the target system to obtain specific information regarding the OS, services, device type and type of architecture [3]. In this attack, the attacker sends a stream of fabricated TCP/IP packets to prompt a response in the form of TCP/IP packets containing fingerprint information of the target system [33]. Conversely, the proposed CI-enabled honeypot analyses this stream of TCP/IP packets sent by an attacker to obtain signs of an attempted fingerprinting attack on the honeypot.

Initially, this paper performs a simulation of fingerprinting attacks on the honeypot to collect attack data (TCP/IP packets). The simulation is accomplished by employing a KFSensor honeypot and the Nmap and Xprobe2 fingerprinting tools. The attack simulation data is captured in two different logs, by the KFSensor honeypot and by the Wireshark analyser, for forensic analysis. Subsequently, based on preliminary observations and empirical evidence, a number of important fields of the collected TCP/IP packets are analysed to ascertain abnormalities or patterns that indicate an attempted fingerprinting attack. Next, the paper applies PCA to determine the most influential fields, which can be further utilised to develop an effective approach to predict the fingerprinting attack. It then proposes an FIS to correlate the influential fields identified by the PCA and to predict an attempted fingerprinting attack and its severity level on the honeypot. Finally, the proposed system is successfully tested against five popular fingerprinting tools. This testing includes the two previous tools, Nmap and Xprobe2, and three new tools, NetScanTools Pro, SinFP3 and Nessus, which were not involved in the development of the CI-enabled honeypot.

The rest of the paper is organised into the following sections: “Background information” introduces honeypots, fingerprinting attacks and OS fingerprinting attacks. “Simulation of various types of OS fingerprinting attacks” describes the fingerprinting attack simulation on the honeypot for the collection of attack data and its detailed analysis. “Identifying abnormalities/patterns in the various fields of TCP/IP protocols as signs of OS fingerprinting attacks” presents a comprehensive examination of the chosen TCP/IP fields and their related abnormalities/patterns as signs of a fingerprinting attack. “Design and development of computational intelligence enabled honeypot for predicting fingerprinting attacks on honeypots” presents the design and development of a CI-enabled honeypot for predicting fingerprinting attacks on honeypots. For this design, it performs PCA on the selected TCP/IP fields to establish the most significant fields to predict fingerprinting attacks on honeypots. Subsequently, it proposes a fuzzy inference system to predict fingerprinting attacks and their severity levels on honeypots. Finally, “Conclusion” concludes the paper and discusses possible future enhancements of the CI-enabled honeypot.

Background information

Honeypots

A honeypot is a concealed security system that functions as a decoy to entice cyberattackers to reveal their information [31]. It deceives, detects and diverts cyberattackers, whilst contemporaneously gathering their information [11]. Honeypots are designed to represent themselves as a potential target for cyberattackers; subsequently, deployed in an isolated manner and closely monitored to uncover vulnerabilities and new attacks to the network and to develop an enhanced defensive strategy [37]. Most honeypots are used to imitate the functionalities of a real network to entice cyberattackers to attack them assuming them to be a real network and revealing their information.

Honeypots are categorized into different categories on the basis of their design and level of interaction with cyberattackers: low-interaction honeypots, medium-interaction honeypots and high-interaction honeypots [1, 29, 36]. Low-interaction honeypots normally imitate real systems and have restricted communication with cyberattackers; whereas, high-interaction honeypots are normally real systems that have unrestricted communication with cyberattackers [38]. Medium-interaction honeypots have greater ability to communicate with cyberattackers than low-interaction honeypots, however, they have less functionality than high-interaction honeypots [7]. Honeypots can be a crucial active defence tool for organisations or researchers in an attempt to discover advanced attacks and related techniques which are not possible through other security tools; however, for effective operation, additional cost and skills are required in their design and management. Honeypots are effective security systems but should not be used as the only defensive system or an alternative to replace other security systems to protect the network.

Fingerprinting attack

In a fingerprinting attack, an attacker usually sends a sequence of fabricated packets to the target system to provoke a response in the form of packets containing fingerprint information with the intention of its identification. Fingerprinting attacks are categorized into two categories on the basis of the activity of cyberattackers: active and passive fingerprinting attacks. In an active fingerprinting attack, cyberattackers send carefully constructed packets to the target system, analysing their response packets to extract fingerprinting information [31]. In a passive fingerprinting attack, cyberattackers do not send any packets to the target system rather they sniff, capture and analyse traffic from the target system to extract fingerprinting information [31]. Active fingerprinting attacks are more accurate than passive fingerprinting attacks as the result is based on the direct response from the target system. Therefore, in this design of CI-enabled honeypot, only an active fingerprinting attack is considered.

OS fingerprinting attack

OS fingerprinting attacks are the most prevalent fingerprinting attack, which is performed on the target system to obtain specific information regarding the OS, services, device type and type of architecture [3]. The mechanism is to send a stream of fabricated TCP/IP packets from the attacker to elicit response TCP/IP packets containing fingerprint information of the target system [33]. After analysing a number of the fields of certain TCP/IP protocols of the response packets, a fingerprint is constructed and compared against the fingerprint database to find the exact or closest matched fingerprint of the target system. Cyberattackers are highly successful in performing OS fingerprinting attacks as the same TCP/IP protocol suite is implemented by every OS distinctively, as a result, distinct responses are produced for the same TCP/IP query. Consequently, different responses generated by different operating systems divulge substantial information about that system to cyberattackers. The complete process of an OS fingerprinting attack is dependent on TCP/IP protocol suite; therefore, it is sometimes referred to as TCP/IP stack fingerprinting. After obtaining precise information about the OS and target system, cyberattackers can launch more complex attacks with greater severity against the target system.
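The matching step described above (construct a fingerprint from response fields, then look for the exact or closest database entry) can be illustrated with a minimal Python sketch. It is not the matching algorithm of any particular tool: the database entries, the field names (`ttl`, `window`, `df`, `options`) and the simple "fraction of matching fields" score are all hypothetical choices made only for illustration.

```python
from typing import Dict, Tuple

# Hypothetical fingerprint database; entries and field values are illustrative only.
FINGERPRINT_DB: Dict[str, Dict[str, object]] = {
    "os_a": {"ttl": 128, "window": 8192, "df": True, "options": "MNWST"},
    "os_b": {"ttl": 64, "window": 5840, "df": True, "options": "MSTNW"},
    "os_c": {"ttl": 255, "window": 4128, "df": False, "options": "M"},
}


def match_score(observed: Dict[str, object], reference: Dict[str, object]) -> float:
    """Fraction of reference fields whose value equals the observed value."""
    if not reference:
        return 0.0
    hits = sum(1 for field, value in reference.items() if observed.get(field) == value)
    return hits / len(reference)


def closest_os(observed: Dict[str, object]) -> Tuple[str, float]:
    """Return the best-matching entry and its score (1.0 means an exact match)."""
    return max(
        ((name, match_score(observed, ref)) for name, ref in FINGERPRINT_DB.items()),
        key=lambda pair: pair[1],
    )
```

A score of 1.0 corresponds to an exact match; any lower score corresponds to the "closest matched fingerprint" case, where a tool reports its best guess.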

Simulation of various types of OS fingerprinting attacks

Each OS implements the TCP/IP protocol suite distinctively; therefore, acquiring a fingerprint of any OS requires an analysis of the TCP/IP packets sent by that OS, which can offer significant information to construct an accurate fingerprint of that OS. This process of finding a fingerprint of a particular OS is primarily based on the analysis of the TCP, ICMP and UDP protocols, as every fingerprinting tool/technique exchanges packets based on these three protocols with the target machine. However, certain fingerprinting tools/techniques primarily employ TCP packets to perform the fingerprinting attack and others primarily employ ICMP packets; thus, the development of any successful method to detect this attack should involve examination of both categories (TCP and ICMP) of tools/techniques. Accordingly, this simulation covers both TCP-based and ICMP-based OS fingerprinting attacks to acquire the OS fingerprint (and information about associated services) of a honeypot system for resolving its identity. This information about the OS and associated services of a honeypot system would assist an attacker to identify and exploit that honeypot system and possibly transform it into a zombie to attack others. The simulation of all OS fingerprinting attacks is accomplished by employing a KFSensor honeypot and the Nmap and Xprobe2 fingerprinting tools. KFSensor is a commercially available graphical user-interface honeypot for the Windows platform [14]; Nmap is a TCP-based and Xprobe2 an ICMP-based OS fingerprinting tool. This experimental simulation generated fingerprinting data for analysis and for the detection of abnormalities and patterns as symptoms of an OS fingerprinting attack.

TCP-based OS fingerprinting attack using Nmap

Nmap is the most powerful and reliable scanning tool which is very effective in performing an OS fingerprinting attack (mainly TCP-based). Many Nmap scripts use heuristics and fuzzy signature matching to reach conclusions about the target host OS or services [19]. During an OS fingerprinting attack, Nmap sends a stream of TCP/IP packets (approximately 16 or more), to identified open and closed ports on the target machine [18]. This stream of TCP/IP packets contains TCP, UDP, and ICMP packets, however, this counting does not include all the retransmitted packets. These packets/probes are aimed at several existing ambiguities and their exploitation in the standard protocol Request for Comments (RFCs). When the target machine sends a reply back to the Nmap machine for these packets/probes, Nmap analyses values of various parameters of TCP, ICMP and UDP packets and constructs an OS fingerprint to match against its database of OS signatures [10]. Depending on the OS signature matching result, it predicts the possible OS of the target machine, when there is no exact match it can use its fuzzy technique to predict the result [15].

Table 1 shows the five different Nmap scripts for an OS fingerprinting attack. The first Nmap script is the basic OS fingerprinting command that reveals the OS fingerprint and several other details such as OS version numbers, device type and architectural information [20]. The second Nmap script offers more descriptive fingerprinting information such as OS type, device type, host script and traceroute. The third Nmap script utilises the fuzzy approach to predict the closest matched OS (in percentages) in the event that it does not find an exact match [15]. The fourth Nmap script is used to perform OS fingerprinting repeatedly for the given number of attempts to improve the accuracy of prediction. The fifth and last Nmap script is completely different from the other four scripts and utilises a different signature database for matching any fingerprint. It discovers information relating to various services running on different ports such as HTTP, FTP, SMTP, SSH, Telnet and DNS. This script can be executed with different intensities from 0 to 9, where 9 is the highest intensity, which improves the accuracy of prediction [16, 17]. The first four Nmap scripts use the Nmap database called nmap-os-db [22], and the fifth Nmap script uses the Nmap database called nmap-services [21]. For accuracy and to discount any outlier data, each Nmap script (with various sub-options) is executed 100 times to record the results in various network conditions and observe the retransmitted packet pattern.

Table 1 Nmap OS fingerprinting attack scripts

ICMP-based OS fingerprinting attack using Xprobe2

Nmap is a powerful and reliable fingerprinting tool; however, its results largely rely on TCP packets. Consequently, an ICMP-based fingerprinting simulation and analysis is essential to propose a generic solution. Xprobe2 is one of the first ICMP-based fingerprinting tools, which utilises ICMP packets and is based on a signature engine and fuzzy signature matching [5]. During an OS fingerprinting attack, Xprobe2 sends a stream of TCP/IP packets (approximately 10 or more) to identified open and closed ports on the target machine [40]. This stream of TCP/IP packets contains ICMP, TCP and UDP packets, not including the retransmitted packets. Xprobe2 consists of 13 modules, and Xprobe2++ or Xprobe2-ng consists of an additional 3 modules (fingerprint: icmp_info, app: ftp, app: http), which it utilises to find an OS fingerprint [40]. This tool is both more effective and quicker than Nmap because it uses a smaller number of TCP/IP packets. Nevertheless, it is obsolete and no longer updated; as a result, it is not able to identify recent OSs, including Windows 7, on the honeypot system [2]. Nonetheless, this paper is focused on the counter strategy of identifying and predicting an OS fingerprinting attack, and for this purpose an Xprobe2-based simulation is imperative to analyse the ICMP-based OS fingerprinting attack, as it is one of the very first ICMP-based OS fingerprinting tools and the basis for all ICMP-based tools.

Table 2 shows the five different Xprobe2 scripts for an OS fingerprinting attack. The first Xprobe2 script is a basic OS fingerprinting command that determines a fingerprint of an OS running on an intended system as per its basic operation [5]. The second Xprobe2 script determines a fingerprint of an OS depending on the utilisation of specific modules, which can provide different results based on the selected modules. The third Xprobe2 script determines a fingerprint of an OS by sending more traffic to an intended system, because the parameter -B sends consecutive TCP handshake requests to any open TCP port such as 80, 443, 23, 21, 25, 22, 139, 445 and 6000 on an intended system and expects a SYN ACK reply [6]. The fourth Xprobe2 script determines a fingerprint of an OS by utilising an internal port scanning module that performs port scanning of the indicated TCP and/or UDP port(s) [6]. The fifth Xprobe2 script determines a fingerprint of an OS by utilising additional details regarding a protocol, port and the current status via the parameter -p. The protocol can be chosen from TCP or UDP, the port number from 1 to 65,535, and the current status of a port from Open or Closed. In the case of a closed port, an intended system may reply with an RST packet for a TCP port, and may reply with an ICMP Port Unreachable packet for a UDP port. In the case of an open port, an intended system may reply with a SYN ACK packet for a TCP port, and may not reply (send a packet) for a UDP port [6]. Similar to the Nmap experiment, to obtain accurate results and to remove any outliers in the data, each Xprobe2 script (with various sub-options) is executed 100 times to record the results in various network conditions and observe the pattern of retransmitted packets.

Table 2 Xprobe2 OS fingerprinting attack scripts

Identifying abnormalities/patterns in the various fields of TCP/IP protocols as signs of OS fingerprinting attacks

The experimental simulation data for both TCP-based and ICMP-based OS fingerprinting attacks collected in the previous section is analysed in this section. Each stream of TCP/IP packets received from the attacker is analysed to reveal any observed abnormalities/patterns in the various fields of TCP/IP protocols (i.e. TCP, ICMP, UDP and IP) [34]. This analysis identifies ten indicator fields of TCP/IP protocols based on the discrepancies detected in the attack simulation data. These ten indicator fields mostly include TCP and IP fields, as shown in Figs. 1 and 5. Additionally, these ten TCP/IP fields are analysed to emphasise their weight based on the literature and the core attack principles of popular OS fingerprinting tools/techniques.

Fig. 1

Investigated fields of TCP Header for the attempted OS fingerprinting attack

Discovering abnormalities/patterns in TCP Flags

TCP comprises six standard flags (SYN, ACK, URG, PSH, RST, FIN) that control the nature and flow of the transmission. Several flags or combinations of flags are considered illegal/abnormal based on the RFCs of TCP; however, these RFCs do not explain how such illegal/abnormal flags should be handled. As a result, the handling is left to the OS and thus different OSs generate different responses to an illegal/abnormal flag or combination of flags. This is a significant concern for the security community, as attackers exploit these responses to determine the OS of the target machine. A number of these illegal/abnormal TCP flags can be utilised as a good indicator of an OS fingerprinting attack, and they are relatively straightforward to find as they are well known. Some OS fingerprinting tools utilise the additional ECN-related control flags (CWR, ECE) and the three Reserved Bits in their attack techniques. This analysis includes all the possible illegal/abnormal flags or combinations of flags which can be an indication of an OS fingerprinting attack. These additional flags are included to ensure that the proposed approach is generic; the following subsections, however, report the findings of the experiment regarding illegal/abnormal flags.

URG/PSH/FIN probing

This is one of the well-known abnormal flag combinations (called the Xmas probe), which exploits flaws in the TCP RFC 793 to determine the open and closed ports. This URG/PSH/FIN probing is only effective with those operating systems that conform to the TCP RFC 793. Nevertheless, an attacker can send this URG/PSH/FIN packet to understand the status of any port. Upon receiving this packet on the port of the target machine, a specific OS-based response is generated by that machine, which can give an indication of the OS of a target machine. This is an important probe in an OS fingerprinting attack; however, it is an abnormal flag combination that can be combined with other indicators as a sign of an OS fingerprinting attack. The captured TCP packet with URG/PSH/FIN probing during an OS fingerprinting attack is shown in Fig. 2.

Fig. 2

Captured TCP packet with URG/PSH/FIN probing during an OS fingerprinting attack

NULL packet

The NULL probe is another abnormal packet wherein no flag is set but contains a packet sequence number. Nevertheless, an attacker can send this NULL packet to understand the status of any port. Upon receiving this packet on the port of the target machine, a specific OS-based response is generated by that machine, which can give an indication of the OS of a target machine. This is an important probe in an OS fingerprinting attack; however, it is an abnormal flag combination that can be combined with other indicators as a sign of an OS fingerprinting attack. The captured TCP packet with NULL probing during an OS fingerprinting attack is shown in Fig. 3.

Fig. 3

Captured TCP packet with NULL probing during an OS fingerprinting attack

Reserved Bit Probing

There are 3 reserved bits in the TCP header for future use. These reserved bits should not be used and should always be set to zero; a packet in which any of them is set is therefore abnormal. The captured TCP packet utilising reserved bits during an OS fingerprinting attack is shown in Fig. 4. This symptom can be combined with other indicators as a sign of an OS fingerprinting attack.

ECN-echo probing

The Explicit Congestion Notification (ECN) flag offers added functionality for notifying hosts about network congestion without dropping packets. It is an additional feature that may be used between two hosts if both are ECN-enabled. The captured TCP packet with ECN-Echo probing during an OS fingerprinting attack is shown in Fig. 4. This symptom can be combined with other indicators as a sign of an OS fingerprinting attack (Fig. 5).

Fig. 4

Captured TCP packet with Reserved Bit and ECN-Echo probing during an OS fingerprinting attack

Fig. 5

Investigated fields of IP Header for the attempted OS fingerprinting attack

FIN Probing

The FIN flag is used to close a connection, and it should only be sent when a connection was previously established. Therefore, a FIN packet that does not belong to any connection violates the rules of TCP and would never occur in legitimate traffic. Nevertheless, an attacker can send this FIN packet as an unconnected packet to learn the status of any port. Upon receiving this packet on the port of the target machine, a specific OS-based response is generated by that machine, which can give an indication of the OS of a target machine. This is a very important probe in an OS fingerprinting attack; however, it is an abnormal flag that can be combined with other indicators as a sign of an OS fingerprinting attack.

SYN/FIN probing

These two flags are mutually exclusive and are not normally used in the same packet. Therefore, a SYN/FIN packet violates the rules of TCP and would never occur in legitimate traffic. Nevertheless, an attacker can send this SYN/FIN packet to understand the status of any port. Upon receiving this packet on the port of the target machine, a specific OS-based response is generated by that machine, which can give an indication of the OS of a target machine. This is an important probe in an OS fingerprinting attack; however, it is also an abnormal flag combination that can be combined with other indicators as a sign of an OS fingerprinting attack.
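The flag probes discussed in this section can be recognised from the flag bits of a single packet. The following is a minimal Python sketch of such a check, not the detection logic used in the experiment. It assumes the caller supplies the lower 12 bits of the TCP data-offset/flags word; the function name `classify_tcp_flags` and the label strings are hypothetical, and the reserved-bit mask assumes the three-reserved-bit layout described above. An ECN-Echo bit is reported only as a weak indicator, since ECN is also used legitimately.

```python
# Flag bit positions in the TCP header: FIN is the least significant bit.
FIN, SYN, RST, PSH, ACK, URG, ECE, CWR = (1 << i for i in range(8))
RESERVED_MASK = 0x0E00  # assumption: three reserved bits directly below the data offset


def classify_tcp_flags(flag_bits: int) -> list:
    """Return labels for abnormal flag patterns found in one TCP packet."""
    findings = []
    control = flag_bits & 0x3F  # the six standard flags only
    if flag_bits & RESERVED_MASK:
        findings.append("reserved-bits")
    if control == 0:
        findings.append("null")
    if control == (URG | PSH | FIN):
        findings.append("xmas")
    if control == FIN:
        findings.append("fin-only")
    if control & SYN and control & FIN:
        findings.append("syn-fin")
    if flag_bits & ECE:
        findings.append("ecn-echo")  # weak indicator on its own
    return findings
```

An empty list means that none of these patterns is present; a non-empty list is only one input among several, in line with the statement that each symptom should be combined with other indicators.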

Discovering abnormalities/patterns in TCP Options

The majority of fingerprinting tools utilise the TCP Options field of the TCP header because it is a variable-length field that can be of any size from 0 to 40 bytes. The TCP Options field may contain some or all of the following attributes: Maximum Segment Size (MSS), Window Scaling, Selective Acknowledgements (SACK), Timestamps and No-Operation (NOP). Every OS customises this TCP Options field based on its implementation, which can be identified as a pattern of that OS. Conversely, the TCP Options field can be used to identify an OS fingerprinting attack by finding abnormalities/patterns in the packets received from an attacker. This can be combined with other indicators as a sign of an OS fingerprinting attack.
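To look for patterns in the TCP Options field, the raw option bytes first have to be split into individual options. The sketch below, written in Python as an illustration only, walks the standard kind/length encoding and returns the option names in on-the-wire order; the function name `parse_tcp_options` and the name table are hypothetical, and only the option kinds listed above (plus End of Option List) are named.

```python
# Option kind numbers from the TCP specifications; unknown kinds are reported by number.
OPTION_NAMES = {
    0: "EOL", 1: "NOP", 2: "MSS", 3: "WScale",
    4: "SACK-Permitted", 5: "SACK", 8: "Timestamp",
}


def parse_tcp_options(raw: bytes) -> list:
    """Return the option names in the order they appear in the header."""
    if len(raw) > 40:
        raise ValueError("TCP options cannot exceed 40 bytes")
    names = []
    i = 0
    while i < len(raw):
        kind = raw[i]
        if kind == 0:  # End of Option List: stop parsing
            names.append("EOL")
            break
        if kind == 1:  # NOP is a single byte with no length field
            names.append("NOP")
            i += 1
            continue
        if i + 1 >= len(raw):
            raise ValueError("truncated option")
        length = raw[i + 1]  # length covers the kind and length bytes too
        if length < 2 or i + length > len(raw):
            raise ValueError("bad option length")
        names.append(OPTION_NAMES.get(kind, f"kind-{kind}"))
        i += length
    return names
```

The resulting ordered list (for example, whether MSS precedes or follows the Timestamp option) is the kind of pattern that can be compared across packets received from the same source.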

Discovering abnormal/frequent uses of TCP urgent pointer

TCP provides the facility to mark a certain amount of data as urgent, which is indicated by setting the URG flag. The Urgent Pointer field indicates how much of the data in the segment is urgent. This field and the URG flag jointly allow an application to forward urgent data immediately by creating a secondary out-of-band channel, without waiting in the sequential send queue. Nonetheless, most users are uncertain about how to use this field correctly. This ambiguity offers attackers an opportunity to exploit this field for a fingerprinting attack. At the same time, the improper use of the Urgent Pointer may reveal a potential OS fingerprinting attack.

Discovering abnormalities/variations in TCP Window Size

TCP Window Size is an important field that determines the total number of bytes that can be sent without waiting for an acknowledgement. TCP Window Size is maintained by both sender and receiver due to the bidirectional nature of TCP; however, the fixed limit is determined by the receiver [27, 32]. This field is mainly used for network troubleshooting, application baselining or preventing network congestion at the receiver end. It is an important field for flow control and could be exploited for a fingerprinting attack. Equally, the TCP Window Size can be examined for substantial discrepancies and repeated cases of zero windows that could reveal a potential OS fingerprinting attack.

Discovering abnormal/frequent uses of IP service type/Type of service (TOS) Field

This is an IP datagram field that is used to describe its various quality of services. It is an 8-bit field consisting of several quality parameters, namely, Precedence, Speed, Throughput, Reliability and Cost as shown in Table 3 [4]. Some of the QoS parameters may not be frequently used in regular communications; therefore, their frequent or anomalous use may reveal irregular actions and perhaps the probability of an OS fingerprinting attack.

Table 3 IP service type/Type of service (TOS) specifications

Discovering abnormalities/commonalities in IP identification (IPID) field

In a TCP/IP network, the maximum size of a datagram is limited by the processing capacity of that network, which is called the Maximum Transmission Unit (MTU). Therefore, successful data transmission requires fragmentation of all datagrams that are larger than the MTU. The IPID field facilitates fragmentation (and later reassembly) of IP datagrams with a unique ID, which is incremented whenever an IP datagram is sent from the source to the destination. This IPID is used to reassemble all fragmented IP datagrams (which will have the same IPID) at the receiver end. The exact order of the fragmented datagrams during the reassembly is determined by the fragment offset. The More Fragments (MF) flag indicates whether more fragments are pending. Similarly, the Don’t Fragment (DF) flag is used to deny fragmentation, resulting in the drop of packets greater than the MTU size.

The updated specification of the IPID field (RFC 6864) states that it must not be utilised for any purpose other than fragmentation (and reassembly) [39]. However, it is not uncommon to set its value to zero while using it for numerous pings, and for numerous SYN-ACKs from the same source. Irrespective of the IPID standard guidelines, its implementation is still ambiguous, which leads to its exploitation by attackers for various types of attacks, possibly including a fingerprinting attack. Similarly, this field can be analysed for various sequences of IPID values, or for the commonality of fragmented packets with the same IPID number, to find a sign of an OS fingerprinting attack.

Discovering abnormalities in IP time-to-live (TTL) value

The IP TTL field is used to determine the lifetime of an IP datagram in the network. It can be defined as a counter or timestamp and once it is elapsed, the corresponding IP datagram is discarded or revalidated. This field was added to the IP header to restrict the time an IP datagram can spend on any network due to the connectionless nature of IP. This field can be exploited to perform various kinds of attacks including an OS fingerprinting attack, where an abnormal TTL value or a TTL value of less than or equal to one can be used. Conversely, these TTL abnormalities may provide a sign of an OS fingerprinting attack.
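The IP header fields discussed above (TOS, IPID, the DF/MF flags and TTL) occupy fixed positions in the first 20 bytes of an IPv4 header, so they can be extracted with a single unpack call. The following Python sketch is illustrative only: the function name `ip_header_indicators` and the returned keys are hypothetical, and the two boolean checks (non-zero TOS, TTL of one or less) are simplified stand-ins for the abnormality checks described in the text.

```python
import struct


def ip_header_indicators(header: bytes) -> dict:
    """Extract the IPv4 fields of interest from the fixed 20-byte header."""
    if len(header) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes long")
    (_ver_ihl, tos, _total_len, ipid, flags_frag,
     ttl, _proto, _checksum, _src, _dst) = struct.unpack("!BBHHHBBH4s4s", header[:20])
    return {
        "tos": tos,
        "tos_nonzero": tos != 0,            # simplified check, an assumption
        "ipid": ipid,
        "df": bool(flags_frag & 0x4000),    # Don't Fragment bit
        "mf": bool(flags_frag & 0x2000),    # More Fragments bit
        "ttl": ttl,
        "ttl_low": ttl <= 1,                # TTL of less than or equal to one
    }
```

Collecting the `ipid` values returned for successive packets from one source would allow the IPID sequences and repeated IPID values mentioned above to be examined.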

Discovering abnormalities/patterns in UDP Requests

UDP is a very useful protocol in many probing techniques due to its connectionless nature. All OS fingerprinting tools use UDP packets in conjunction with TCP and/or ICMP packets to collect fingerprinting information from the target machine. An attacker sends UDP packets to a port of the target machine and may or may not receive a response, depending on whether the port is open or closed. The target machine replies with an ICMP Destination Unreachable error message (ICMP Type 3) if the port is closed; otherwise, the attacker receives no reply for an open or filtered port. Generally, the UDP packet used in OS fingerprinting is either empty or set to a fixed payload. An attacker can also set the IP DF flag in the UDP packet, which can prompt the target machine to reply with an ICMP error message. These symptoms can be found in the UDP packets received from an attacker to identify an OS fingerprinting attack. This can be combined with other indicators as a sign of an OS fingerprinting attack.
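The UDP symptoms listed above can be checked per packet. The short Python sketch below is an illustration under stated assumptions rather than the paper's method: the function name `udp_probe_indicators` is hypothetical, and treating a payload made of one repeated byte as a "fixed payload" is a heuristic chosen for this example.

```python
def udp_probe_indicators(payload: bytes, df_set: bool) -> list:
    """Return labels for UDP probe symptoms found in one packet."""
    findings = []
    if len(payload) == 0:
        findings.append("empty-payload")
    elif len(set(payload)) == 1:
        # Assumption: a single repeated byte is treated as a fixed filler payload.
        findings.append("fixed-fill-payload")
    if df_set:
        findings.append("df-set")
    return findings
```

As with the other indicators, a DF bit or an empty payload alone is common in ordinary traffic, so these labels are only meaningful in combination with other signs.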

Discovering abnormalities/patterns in ICMP requests

ICMP is an error-reporting protocol that is used for troubleshooting, control and error message services. It is used by network devices (e.g. routers, gateways, hosts) to announce error messages when there is an issue in delivering packets. As a result, an attacker can use legitimate ICMP request packets, namely ICMP Echo Request (Type 8), ICMP Router Solicitation Request (Type 10), ICMP Timestamp Request (Type 13), ICMP Information Request (Type 15, deprecated) and ICMP Address Mask Request (Type 17, deprecated), to collect significant information about the OS of the target machine [25, 26, 30]. However, most OS fingerprinting tools/techniques utilise abnormal ICMP requests by changing some of the parameters of these ICMP requests. For example, an abnormal ICMP Echo Request (Type 8) can be easily identified by examining its Code value, which should always be 0; however, some OS fingerprinting tools use an invalid Code value in their attacks. These abnormalities can be found in the ICMP request packets received from an attacker to identify an OS fingerprinting attack. This can be combined with other indicators as a sign of an OS fingerprinting attack.
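The Code-value check described above can be sketched in a few lines of Python. The function name `inspect_icmp_request` and the returned keys are hypothetical. The text states the Code 0 rule for the Echo Request; applying the same rule to the other four request types is an assumption of this sketch, based on the fact that their specifications also define only Code 0.

```python
# ICMP request types named in the text.
ICMP_REQUEST_TYPES = {
    8: "Echo Request",
    10: "Router Solicitation",
    13: "Timestamp Request",
    15: "Information Request (deprecated)",
    17: "Address Mask Request (deprecated)",
}


def inspect_icmp_request(icmp_type: int, code: int) -> dict:
    """Name the request type and flag a non-zero Code value as abnormal."""
    name = ICMP_REQUEST_TYPES.get(icmp_type)
    return {
        "request": name,  # None when the type is not one of the listed requests
        "abnormal_code": name is not None and code != 0,
    }
```

For instance, an Echo Request carrying a non-zero Code would be reported with `abnormal_code` set to `True`, while an ordinary ping would not.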

Discovering abnormalities/patterns in ICMP packet size

ICMP packets are normally used to report errors in a standard format; therefore, their size is relatively stable for a particular OS and falls within a predictable range [25, 26, 30]. When the common size of an ICMP packet is determined as a network baseline, it is relatively straightforward to compare normal and abnormal ICMP packets without investigating their contents in detail. For example, in an Nmap-based experimental simulation, the baseline size was 74 bytes (i.e. the most common ICMP packet size in Windows), and the sizes of the two ICMP packets collected by the KFSensor honeypot were 149 and 179 bytes, respectively. The recorded size of these two ICMP packets was the same for all the Nmap experimental iterations. This is one clear indication of a pattern/abnormality found in the ICMP request packets received from an attacker to identify an OS fingerprinting attack. This can be combined with other indicators as a sign of an OS fingerprinting attack.
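The baseline comparison described above amounts to a simple filter. In the Python sketch below, the 74-byte baseline and the sizes 149 and 179 come from the text; the function name `abnormal_icmp_sizes` and the `tolerance` parameter are hypothetical additions for illustration.

```python
BASELINE_ICMP_SIZE = 74  # bytes; the Windows baseline reported in the text


def abnormal_icmp_sizes(sizes, baseline=BASELINE_ICMP_SIZE, tolerance=0):
    """Return the packet sizes that deviate from the baseline by more than tolerance."""
    return [size for size in sizes if abs(size - baseline) > tolerance]
```

With the sizes reported in the experiment, `abnormal_icmp_sizes([74, 149, 74, 179])` keeps only the two probe packets.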

Design and development of computational intelligence enabled honeypot for predicting fingerprinting attacks on honeypots

The design of the Computational Intelligence (CI) enabled honeypot utilises two approaches, namely Principal Components Analysis (PCA) and a Fuzzy Inference System (FIS). PCA is utilised to determine only the most influential TCP/IP fields from the several TCP/IP fields observed previously, which can be further utilised to develop an effective approach to predict an OS fingerprinting attack. The FIS is then designed to correlate the influential fields identified by PCA and to predict an attempted OS fingerprinting attack and its severity level on the honeypot. The complete working procedure of this CI-enabled honeypot is shown in Fig. 6.

Fig. 6 Computational Intelligence enabled honeypot for predicting OS fingerprinting attacks on honeypots

Table 4 Principal components analysis of targeted TCP/IP fields of collected fingerprinting data

Principal components analysis to determine the most influential TCP/IP fields for predicting fingerprinting attacks on honeypots

Principal components analysis is one of the most effective computational techniques for dimensionality reduction by feature extraction while retaining most of the information. The primary reasons for choosing PCA over other techniques are:

  • It is very efficient for a small number of dimensions, which is the case here

  • It has low capacity and memory requirements, which makes the proposed design a lightweight system

  • It has low noise sensitivity, which is a great advantage for volatile network traffic

  • It uses simple statistical calculations that are available in most ordinary tools, which avoids the need for complex programming or machine learning tasks

  • Its orthogonal components avoid redundancy in the data

  • It provides a synchronized low-dimensional representation of the variables

Comprehensive research on the exploitation of various TCP/IP fields in several attacks, together with the subsequent experimental simulation of an OS fingerprinting attack in this work, identified ten evidential TCP/IP fields that may reveal an OS fingerprinting attack. To aid prediction of an OS fingerprinting attack, it is worthwhile to select only the most significant of these ten fields and to establish their relationships with each other. This can be accomplished using PCA, where the principal components with higher variances are considered the best components because they carry more information about the data. Based on this analysis, only the best components are selected for the subsequent analysis, as they practically represent the complete data; the remaining components can be ignored based on pre-decided threshold values for the Cumulative Proportion of Variance, Eigenvalue and/or Loading (the contribution of each variable to the principal component). The traditionally accepted threshold values considered for this experiment are: \(\text{Cumulative Proportion of Variance} > 85\%\), \(\text{Eigenvalue} > 1\) (from Kaiser’s rule [13]), and a Loading for a variable that is relatively higher than that of the other variables, or at least \(\text{Loading}^{2} > 1/\text{Total Number of Variables}\) [9, 41, 12]. Prior to the PCA, data profiling of the collected data is required, converting the mostly categorical fields into numerical fields for analysis.
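The two component-level thresholds can be illustrated with a short Python sketch. The function names are hypothetical, the eigenvalues are assumed to be sorted in descending order, and the eigenvalues in the example are invented for illustration only; they are not the values behind Table 4 or Fig. 7.

```python
def cumulative_proportions(eigenvalues):
    """Cumulative proportion of variance for eigenvalues sorted in descending order."""
    total = sum(eigenvalues)
    running, proportions = 0.0, []
    for ev in eigenvalues:
        running += ev
        proportions.append(running / total)
    return proportions


def components_for_variance(eigenvalues, threshold=0.85):
    """Smallest number of leading components whose cumulative proportion exceeds the threshold."""
    for k, cum in enumerate(cumulative_proportions(eigenvalues), start=1):
        if cum > threshold:
            return k
    return len(eigenvalues)


def components_by_kaiser(eigenvalues):
    """Number of components with an eigenvalue greater than 1 (Kaiser's rule)."""
    return sum(1 for ev in eigenvalues if ev > 1.0)


# Hypothetical eigenvalues for ten standardised variables (NOT the paper's data).
example = [3.1, 2.2, 1.6, 1.1, 0.9, 0.5, 0.3, 0.2, 0.08, 0.02]
n_variance = components_for_variance(example)
n_kaiser = components_by_kaiser(example)
```

In this invented example the two rules disagree by one component, which is the kind of situation in which the cumulative proportion of variance can be given priority over Kaiser's rule.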

Fig. 7 PCA graph demonstrating Eigenvalues for the principal components

Table 5 Loading/rotation matrix of the selected most significant principal components

Table 4 lists the standard deviations, variances and cumulative proportions of variance for the ten principal components. The cumulative proportion of variance for the first five components is 0.8821793 (\(\approx \) 88%), which is higher than the pre-decided threshold value of 85%; this PCA therefore suggests that the first five components are the best components for the collected attack simulation data. The contribution of the remaining five components is very low, as together they account for only around 12% of the variance. Nonetheless, it is also important to evaluate the first five components further, for example by checking whether their eigenvalues exceed 1 (see Fig. 7). This holds for the first four components; the eigenvalue of the fifth component is marginally smaller than 1 (\(\approx \) 1), but its inclusion is necessary to reach the 85% cumulative proportion of variance mentioned earlier [12].

The final evaluation and selection of influential variables are additionally based on the Loading, which shows the correlation between an original variable and a principal component. In this analysis, the Loadings of the first five most significant principal components are computed as shown in Table 5. Five variables (TCP Flags, TCP Options, ICMP Requests, ICMP Packet Size and UDP Requests) have greater Loadings than the other five variables (TCP Window Size, IP Time-To-Live, IPID Value, IP Type Of Services and TCP Urgent Pointer), which indicates that the first five variables have a greater correlation with the five most significant principal components. Moreover, an in-depth analysis of the Loadings of these five variables across the five components reveals some new attributes emerging from these principal components.

The Loadings of the first principal component highlight the higher weighting and importance of the first five variables in the data. PC2 and PC5 are mainly represented by the two variables TCP Flags and TCP Options due to their greater Loadings; thus, these two variables can be grouped as a new TCP attribute (TCP Flags + TCP Options). Similarly, PC3 and PC4 are mainly represented by the two variables ICMP Requests and ICMP Packet Size due to their greater Loadings; thus, these two variables can be grouped as a new ICMP attribute (ICMP Requests + ICMP Packet Size). The fifth variable, UDP Requests, has consistently greater Loadings in all five principal components, which highlights its higher weighting and importance in the data. As it belongs to a separate networking protocol, it can be considered a separate UDP attribute to be combined with the TCP and ICMP attributes for representing any principal component from PC1 to PC5. Together, these three attributes derived from PCA represent the data and can be used for the subsequent operation.
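The Loading threshold stated earlier (\(\text{Loading}^{2} > 1/\text{Total Number of Variables}\)) can be checked for one principal component with a sketch such as the following. The function name, the variable names and the loadings are hypothetical and are not taken from Table 5.

```python
def significant_variables(names, loadings):
    """Return the variables whose squared loading exceeds 1 / number of variables."""
    cutoff = 1.0 / len(names)
    return [name for name, value in zip(names, loadings) if value * value > cutoff]


# Toy component with four variables, so the cutoff on the squared loading is 0.25.
toy_names = ["v1", "v2", "v3", "v4"]
toy_loadings = [0.7, -0.6, 0.3, 0.2]
selected = significant_variables(toy_names, toy_loadings)
```

Squaring the loading makes the test independent of its sign, so strongly negative loadings are retained as well.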

Fig. 8 Fuzzy input variable MTCPF and its fuzzy sets

Fig. 9 Fuzzy input variable MICMPF and its fuzzy sets

Fig. 10 Fuzzy input variable MUDPF and its fuzzy sets

Fig. 11 Fuzzy output variable PAFA and its fuzzy sets

Fuzzy inference system for predicting OS fingerprinting attacks and their severity levels on honeypots

In the previous analysis, three new attributes (related to TCP, ICMP and UDP) were derived from the five most significant principal components as signs of an OS fingerprinting attack, which can be used to predict a potential attack. However, it is not feasible to use precise values of these attributes to develop a generic prediction approach, because the approach must cover several OS fingerprinting tools/techniques. Additionally, it is equally important to correlate these attributes so that the proposed prediction approach can predict most OS fingerprinting attacks accurately irrespective of the tools/techniques used. Fuzzy logic can address both problems effectively: it offers a value range for each attribute, so that most OS fingerprinting tools/techniques fall within the prediction range, and it correlates the attributes through fuzzy rules that can cover the majority of OS fingerprinting tools/techniques accurately.

Fuzzy input and output variables

In designing the fuzzy inference system, the three influential attributes are employed and their corresponding fuzzy input variables are derived. Here, TCP Flags and TCP Options are merged into a single attribute called Malicious TCP Field (MTCPF); ICMP Requests and ICMP Packet Size are merged into a single attribute called Malicious ICMP Field (MICMPF); and the last variable, UDP Requests, is kept as it is and renamed Malicious UDP Field (MUDPF). Therefore, the three derived fuzzy input variables are MTCPF, MICMPF and MUDPF. Reducing the inputs to only three PCA-derived attributes could offer a highly optimised and effective rule base for better prediction accuracy of the proposed system.

The value ranges for these three fuzzy input variables are determined based on the analysis of thousands of TCP/IP packets collected from Nmap and Xprobe2 experimental simulations and on the main principles of fingerprinting tools. The common value range is set to 1–15 packets for all three input variables based on the observation of various streams of TCP/IP packets. Subsequently, this value range is split into three fuzzy sets, Low, Medium and High, to represent three severity levels of an OS fingerprinting attack in the prediction. The corresponding value ranges are 0–6 packets for Low, 4–10 packets for Medium and 8–15 packets for High. Matlab is used to simplify this design. Figures 8, 9 and 10 illustrate the three fuzzy input variables MTCPF, MICMPF and MUDPF in Matlab. As a preliminary design, a triangular membership function is selected; however, any other function can be selected, and it is very straightforward to adapt and analyse in Matlab.
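The fuzzification of an input variable can be sketched in Python as follows. The value ranges are those stated above, but the peak of each triangle is an assumption (the midpoint of its range), since the exact membership function parameters are given only in the figures; the function and constant names are hypothetical, and the sketch does not reproduce the Matlab design.

```python
def triangular(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# (left foot, peak, right foot) in packets; the peaks are assumed midpoints.
INPUT_SETS = {
    "Low": (0.0, 3.0, 6.0),
    "Medium": (4.0, 7.0, 10.0),
    "High": (8.0, 11.5, 15.0),
}


def fuzzify(packet_count, sets=INPUT_SETS):
    """Degree of membership of a packet count in each fuzzy set."""
    return {label: triangular(packet_count, *abc) for label, abc in sets.items()}
```

For instance, a count of 5 packets lies in the overlap between Low and Medium and therefore receives a partial membership in both sets and none in High.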

Fig. 12 Fuzzy inference system consisting of input and output variables for the proposed system

Fig. 13 Fuzzy rules of the proposed system to predict the fingerprinting attack on honeypots

Fig. 14 Fuzzy rule base of the proposed system to predict the fingerprinting attack on honeypots

Finally, the fuzzy output variable Probability of an Attempted Fingerprinting Attack (PAFA) is derived to represent the outcome of correlating the three fuzzy input variables. This variable is expressed as a percentage (0–100%) and split into three fuzzy sets, Low, Medium and High, to represent three severity levels of an OS fingerprinting attack in the prediction. The corresponding value ranges are 0–40% for Low, 30–70% for Medium and 60–100% for High. Its Matlab design, based on a similar triangular membership function, is shown in Fig. 11 (see also Fig. 12).

Fig. 15 Testing results of the CI-enabled honeypot based on the level of accuracy for each prediction for the five selected fingerprinting tools

Fig. 16 Testing results of prediction accuracy and prediction sensitivity of the CI-enabled honeypot for the five selected fingerprinting tools

Fuzzy rules and fuzzy rule base system

The fuzzy rules are created based on the correlation of the three fuzzy input variables and their corresponding results in the form of the fuzzy output variable. The relation among the three input variables is established so that the fuzzy rule base serves as a generic rule base for several OS fingerprinting tools/techniques. The sample rules created are shown in Fig. 13 and the corresponding fuzzy rule base system is shown in Fig. 14. The designed fuzzy inference system (see Fig. 12) is based on Mamdani’s inference method [23].
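A Mamdani-style inference step over the three inputs can be sketched as below. This is a minimal illustration and not the rule base of Figs. 13 and 14: the four rules are hypothetical, the triangle peaks are assumed midpoints of the stated ranges, AND is taken as the minimum, aggregation as the maximum, and defuzzification as a sampled centroid; all of these are common choices rather than choices confirmed by the text.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


# Ranges from the text; peaks are assumed midpoints.
IN_SETS = {"Low": (0, 3, 6), "Medium": (4, 7, 10), "High": (8, 11.5, 15)}
OUT_SETS = {"Low": (0, 20, 40), "Medium": (30, 50, 70), "High": (60, 80, 100)}

# Hypothetical rules: (MTCPF, MICMPF, MUDPF) levels -> PAFA level.
RULES = [
    (("High", "High", "High"), "High"),
    (("Medium", "Medium", "Medium"), "Medium"),
    (("Low", "Low", "Low"), "Low"),
    (("High", "Medium", "Low"), "Medium"),
]


def infer_pafa(mtcpf, micmpf, mudpf, steps=1000):
    """Return the defuzzified PAFA in percent, or None if no rule fires."""
    inputs = (mtcpf, micmpf, mudpf)
    strength = {}
    for antecedent, consequent in RULES:
        # Firing strength of a rule: AND of its antecedents, taken as the minimum.
        w = min(tri(x, *IN_SETS[label]) for x, label in zip(inputs, antecedent))
        strength[consequent] = max(strength.get(consequent, 0.0), w)
    num = den = 0.0
    for i in range(steps + 1):
        y = 100.0 * i / steps
        # Clip each output set at its firing strength and aggregate with the maximum.
        mu = max(min(w, tri(y, *OUT_SETS[label])) for label, w in strength.items())
        num += y * mu
        den += mu
    return num / den if den > 0 else None
```

With these assumed parameters, three inputs at the peak of the High set yield an output at the centre of the High output set, while inputs that match no rule yield no prediction.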

Test results of the computational intelligence enabled honeypot

The proposed CI-enabled honeypot system was tested on its prediction of attempted OS fingerprinting attacks from five different OS fingerprinting tools: Nmap, Xprobe2, NetScanTools Pro, SinFP3 and Nessus. A total of 250 OS fingerprinting attacks using different attack scripts were carried out, i.e. 50 attacks from each tool. The testing results for all the tools and their level of prediction accuracy are shown in Fig. 15. In calculating the prediction accuracy of the proposed system, the predicted attack levels High, Medium and Low are translated into accuracies of 100%, 66.7% and 33.3% respectively for the purpose of evaluation, and a failure to detect an attempted attack is counted as 0%. The prediction accuracy of the proposed system was 82.67% for Nmap, 82% for Xprobe2, 92% for NetScanTools Pro, 86% for SinFP3 and 80% for Nessus, as shown in Fig. 16. The overall prediction accuracy of the proposed system was 84.53%.

Alongside prediction accuracy, the prediction sensitivity of the proposed system was calculated from the total True Positives (TP) and False Negatives (FN) to determine whether the system can detect an attempted attack. Out of 50 attacks from each tool, the proposed system predicted an attempted attack 46 times for Nmap, 41 times for Xprobe2, 46 times for NetScanTools Pro, 43 times for SinFP3 and 45 times for Nessus. The prediction sensitivity of the proposed system was therefore 92% for Nmap, 82% for Xprobe2, 92% for NetScanTools Pro, 86% for SinFP3 and 90% for Nessus, as shown in Fig. 16. The overall prediction sensitivity of the proposed system was 88.4%. The prediction accuracy and sensitivity of the proposed system demonstrate its success against the five different types of fingerprinting tools (TCP-based, ICMP-based and a combination of both).
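The two measures can be reproduced with a short Python sketch. The detection counts and per-tool accuracies are those reported above; the function names are hypothetical, sensitivity is taken as TP / (TP + FN), and treating the overall accuracy as the mean of the per-tool accuracies is an assumption that happens to match the reported 84.53%.

```python
# Score assigned to each predicted level, as described in the text.
LEVEL_SCORE = {"High": 100.0, "Medium": 66.7, "Low": 33.3, "Missed": 0.0}


def prediction_accuracy(predictions):
    """Mean score in percent over a list of predicted levels."""
    return sum(LEVEL_SCORE[p] for p in predictions) / len(predictions)


def sensitivity(true_positives, false_negatives):
    """Sensitivity in percent: TP / (TP + FN)."""
    return 100.0 * true_positives / (true_positives + false_negatives)


# Detections out of 50 attacks per tool, as reported in the text.
detected = {"Nmap": 46, "Xprobe2": 41, "NetScanTools Pro": 46, "SinFP3": 43, "Nessus": 45}
per_tool = {tool: sensitivity(tp, 50 - tp) for tool, tp in detected.items()}
total_tp = sum(detected.values())
overall_sensitivity = sensitivity(total_tp, 250 - total_tp)

# Assumption: overall accuracy is the mean of the reported per-tool accuracies.
reported_accuracy = [82.67, 82.0, 92.0, 86.0, 80.0]
overall_accuracy = sum(reported_accuracy) / len(reported_accuracy)
```

The per-attack predictions behind Fig. 15 are not reproduced here, so `prediction_accuracy` is shown only as the scoring rule and is not applied to the experimental data.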

Table 6 Fuzzy inference system (FIS) based prediction for attempted OS fingerprinting attacks and their severity levels for various tools

Finally, Table 6 summarises the prediction results for the five OS fingerprinting tools. The proposed system predicts the severity level of an attempted OS fingerprinting attack as HIGH for all five tools, with some exceptions as noted in Table 6. These exceptions relate to Nmap and Nessus, which can perform a wide range of OS fingerprinting attacks, some of which rely on HTTP and other application layer protocols, whereas this investigation concentrated on the core protocols of the network and transport layers (TCP, ICMP, UDP and IP). When HTTP and other application layer protocols are used, there is less reliance on the core TCP/IP protocols to obtain OS fingerprinting information, which generates less TCP/IP traffic and fewer abnormalities/patterns for the system to base its prediction on. HTTP and other application layer protocols could be included; however, each such protocol targets a very specific attack, so including them would significantly increase the complexity and overheads of the method. In contrast, the core TCP/IP protocols are involved in all tools/attacks based on the TCP/IP stack fingerprinting technique, which offers a lightweight generic approach that can predict all TCP/IP based fingerprinting attacks.

Conclusion

This paper presented a computational intelligence enabled honeypot for discovering and predicting an attempted fingerprinting attack by using a Principal Components Analysis and a Fuzzy Inference System. The proposed CI-enabled design focused on the most common OS fingerprinting attack. The simulation of fingerprinting attacks and the collection of data (TCP/IP packets) were accomplished by employing the KFSensor honeypot tool and the Nmap and Xprobe2 fingerprinting tools. Subsequently, based on preliminary observations and empirical evidence, some of the important fields of the collected TCP/IP packets were analysed to establish abnormalities or patterns as a sign of an attempted fingerprinting attack. A PCA was then applied to determine the most influential fields, which were further utilised by the proposed FIS to predict the fingerprinting attack and its severity level. Finally, the proposed system was successfully tested against five popular fingerprinting tools. These included the two previously used tools, Nmap and Xprobe2, and three new tools, NetScanTools Pro, SinFP3 and Nessus, which were not involved in the development of the CI-enabled honeypot. Although the CI-enabled honeypot is promising and encompasses several types of TCP/IP based fingerprinting attacks, it may miss some fingerprinting attacks that exploit application layer protocols such as HTTP, SMB, NTP, SNMP and SSH. Thus, in the future, it is essential to enhance this CI-enabled honeypot to incorporate these fingerprinting attacks. Additionally, the developed fuzzy inference system can be improved to cover new attacks by utilising an adaptive rule base through a dynamic fuzzy rule interpolation approach [24].