Empowering Network Security through Advanced Analysis of Malware Samples: Leveraging System Metrics and Network Log Data for Informed Decision-Making

Alharbi, Fares; Kashyap, Gautam Siddharth

doi:10.1007/s44227-024-00032-1

Research Article
Open access
Published: 11 June 2024

International Journal of Networked and Distributed Computing

Abstract

In the never-ending battle against rising malware threats, cybersecurity professionals are constantly challenged by malware researchers. Businesses and institutions that have fallen prey to these threats have suffered significant financial losses, and countless lives have been disrupted. As a result, security approaches have evolved to include preemptive measures such as the widespread use of HoneyPots. However, data-driven decision-making is required to improve the effectiveness of such approaches. Therefore, this paper describes a quantitative analysis that assesses various malware samples using system metrics and network log data. The goal is to visualise this information properly and to analyse whether it can aid decision-making processes, ultimately leading to the construction of more robust and secure networks. To support this research, a dashboard application was created that allows the installation of virtual machines, the configuration of virtual networks, and the collection of system metric data from outside sources. The findings of this paper can help improve network security and help defenders stay ahead of threats in the cat-and-mouse game.

1 Introduction

Malware has always been a serious and ever-changing danger, constantly targeting new weaknesses and organisations. This ever-changing environment has turned cybersecurity into a perpetual cat-and-mouse game, with security analysts and malware researchers competing to outmanoeuvre one another. Unfortunately, malware researchers frequently have the upper hand, forcing security professionals to react to emerging threats. To combat this, there is an urgent need for additional people to conduct proactive malware research and work on preventative approaches. Understanding how malware spreads is essential for countering its impacts. While research on external propagation is well established [1,2,3,4], research on malware transmission within local area networks relies heavily on sophisticated mathematical models, making it less accessible to cybersecurity enthusiasts from varied backgrounds. The goal of this paper is to visualise malware propagation using variables and characteristics from devices on local area networks, giving a more user-friendly way to understand the nature of propagation and to aid security decision-making.

This paper has two key goals that complement one another: contributing to a thorough understanding of malware propagation and improving response approaches during active attack events. The primary goal is to create clear and understandable visualisations of malware propagation. These visualisations serve several functions. For starters, they help other researchers develop advanced anti-malware capabilities by providing clear insights into how malware spreads. Researchers can create more effective defences and develop targeted solutions to combat malware threats by analysing the dissemination patterns. Second, visualisations are critical in informing decision-makers about the adoption of network security features. Decision-makers can implement security solutions more effectively if they understand how malware flows through networks. This helps them to effectively and proactively protect their systems, lowering the risk of malware outbreaks and potential damage. Furthermore, the visualisations help researchers grasp virus behaviour and its impact on networks. These visualisations support research goals by simplifying complex propagation patterns, allowing researchers to better understand the complexities of malware threats. The second goal is to discover significant characteristics of malware transmission to improve response times during active attack events. This analysis dives into several critical areas of malware distribution. First, it investigates whether malware spread is deterministic in some cases. Responders can anticipate malware behaviour and build preventative measures by researching whether it follows predictable patterns in certain circumstances. Second, the paper investigates the effect of Internet Protocol (IP) address location on propagation. Understanding how IP address features influence malware distribution allows responders to concentrate their efforts on vulnerable locations, increasing the efficacy of their response approaches. 
Furthermore, the paper looks into how network structure affects propagation. By understanding the role of network architecture in malware propagation, responders can design networks that are better equipped to limit rapid spread and contain malware outbreaks. Finally, the paper investigates how the study's findings might be used to respond proactively to active threats, for example by dynamically disconnecting machine clusters or deploying HoneyPots and HoneyNets for greater network protection and better observation of malware events. In summary, by using enhanced visualisations, this paper aims to give a better understanding of malware and its transmission patterns and, at the same time, to uncover critical traits that will allow responders to mount proactive defences against active threats. The findings of this paper have the potential to inform decision-making processes and response approaches, thereby improving network security and reducing the effect of malware attacks.

The paper is structured into six main sections to comprehensively explore the study’s objectives and findings. Section 1 sets the stage by providing an overview of the research goals and the significance of clear and interpretable visualizations in understanding malware propagation and improving response strategies. In Sect. 2, the existing literature on malware propagation and analysis methodologies is reviewed, laying the foundation for the study and highlighting the contribution of this research in addressing specific gaps. Section 3 is further divided into several sub-sections: Sect. 3.1 explains the step-by-step approach taken to achieve the goals, encompassing tool development, data collection, and analysis phases. Section 3.2 addresses the ethical implications of working with actual malware samples and ensuring a secure sandbox environment. Section 3.3 details the setup used for collecting data, while Sect. 3.4 outlines the strategy for observing and visualizing malware activities. Section 4 delves into the results and observations obtained from testing different malware samples. Sub-sections (Sect. 4.1–4.7) present detailed analyses of specific malware variants, such as WannaCry, CryptoLocker, Locky, Petya, NotPetya, and NotPetya Extended, highlighting their propagation behaviours and effects on the network. In Sect. 5, the study’s contributions are discussed in two sub-sections: Sect. 5.1 emphasizes the success of the visualizations in conveying relevant metrics clearly, while Sect. 5.2 reflects on the research questions, conclusions, and potential areas for improvement. Section 5.3 explores potential avenues for further research and development, building on the research’s insights and limitations. Finally, in Sect. 6, the paper offers a concise summary of the research’s key findings, the effectiveness of the visualizations, and the implications of the study’s outcomes for understanding malware propagation and enhancing response strategies. 
The conclusion also serves to reiterate the significance of the research in advancing the field of cybersecurity and malware analysis.

2 Related Works

The majority of malware propagation research has a strong mathematical focus, which is still important to this topic. Understanding existing knowledge on malware spread aids in the formulation of hypotheses and the validation of our findings and those of other researchers. Yu et al. [5] used epidemic theory and a two-layer epidemic model to investigate malware propagation. In contrast to control system theory, which tries to identify and contain malware transmission, they concentrated on epidemic modelling because it is more concerned with the number of compromised hosts and their distribution. In their study, Yu et al. [5] differentiated between epidemiological studies that examine malware transmission and control system studies that try to prevent proliferation. Their findings indicated three stages in large-scale network propagation: an early exponential distribution, a late power law distribution with a short tail, and a final power law distribution. Guillen and Rey [6] expanded on the work of Yu et al. [5] by recognising the significance of "compartment devices" that can spread malware without being infected directly. This observation emphasises the importance of network trace analysis when researching malware behaviour in networks containing compartment devices. Hosseini and Azgomi [7] developed a model that identifies different system states during malware propagation using a rumour-spreading approach. This approach helps classify the roles of machines in an active attack on a local area network and may be useful in visualisations. Zhuo and Nadjin [8] created the "MalwareVis" tool to visualise malware network traces, with an emphasis on specific attributes such as protocol and IP address. While their research provides insights into complicated malware analyses, it may be difficult for newcomers to comprehend without a thorough understanding of the underlying formulas. 
Gove and Deason [9] used a unique approach based on discrete Fourier transforms to detect malware-related network behaviour. Their approach is sophisticated but promising, as it permits malware detection in difficult settings. Afianian et al. [10] summarised common tactics used by malware to avoid detection and investigation. Chakkaravarthy et al. [11] investigated malware detection approaches within the network, focusing on payload fragmentation and session splicing. Miramirkhani et al. [12] investigated artefacts used by malware researchers to detect a virtual environment on sandbox machines. Sharafaldin et al. [13] developed a labelled dataset for intrusion detection systems that contains network traffic attributes that can be used to distinguish benign from intrusive flows. Creese et al. [14] visualised enterprise network attacks and their potential effects, providing useful insight into the approach of our paper. Nataraj et al. [15] used byte plot visualisation as grayscale images for automatic malware detection. They achieved 97.18% accuracy using GIST features and the K-nearest neighbour (KNN) algorithm. Naeem et al. [16] proposed an image-based malware detection system that uses D-SIFT and GIST features. They achieved 97.4% accuracy but limited their analysis to Windows malware only. Su et al. [17] developed a lightweight neural network for distinguishing DDoS malware and goodware in IoT applications. They achieved 94% accuracy but had a limited dataset. Makandar and Patrot [18] proposed an SVM classification using wavelet transform and GIST features, achieving 98.84% and 98.88% accuracy for KNN and SVM, respectively. Han et al. [19] analysed malware using entropy graphs with bitmap images for classification. Their approach is limited to Windows portable executable files and cannot detect packed malware samples. Tuncer et al. [20] proposed a model for malware classification using hybrid features such as singular value decomposition and local binary patterns, achieving an accuracy of 88.08%. 
Various deep learning-based approaches were presented by different researchers. For instance: Robert et al. [21] used CNN to classify malware traffic with 91.32% accuracy. Bendiab et al. [22] employed a deep CNN to identify malicious network traffic in IoT environments with 94.5% accuracy. Kalash et al. [23] designed a deep CNN model for malware categorization achieving high accuracy rates on different datasets. Cui et al. [24] used CNNs to classify malicious code binaries achieving 97.6% accuracy. Wang et al. [25] applied deep learning to improve malware analysis for detecting zero-day threats. The study of these various methodologies and characteristics in malware propagation, evasion strategies, and malware analysis approaches will be critical for the methodology and visualisation tool of our study. Table 1 contains a summary of the aforementioned studies.

Table 1 Summary of Studies on Malware Propagation and Detection Approaches

3 Methodology

Since there is a clear sequence of dependency among the goals, this paper uses a linear approach to achieving them. The paper's progression was determined by this dependency rather than personal taste. Each goal was addressed sequentially, ensuring that the essential groundwork was in place before proceeding to the next phase. This approach allowed the goals to be met seamlessly and systematically, maximising the effectiveness of the research and the creation of the visualisation tool.

3.1 Methodology and Phases of the Study

The paper proceeds in a linear order, with different phases leading to the achievement of the goals. The first phase entails developing a tool to extract system details, and possibly the moment of infection, from an affected machine for later malware propagation research. Before developing the solution, an extensive study on malware analysis methodologies for sandbox environments was undertaken to verify that the virtual environment used for testing prevents malware from spreading outside the virtual environment. Approaches for misleading malware with lower context awareness were also investigated. Vulnerabilities exploited by malware to exit virtual environments were also investigated, both to avoid them and, if time allowed, to study them. Once built, the tool was tested on non-infected systems first, followed by testing on systems under active attack, in accordance with the first goal. Following that, the tool was deployed to extract data from devices that were actively being attacked, and continual development may be required to adapt the tool to varied malware behaviours, in accordance with goal three. The third phase is analysing, visualising, and drawing conclusions from the results while adhering to the criteria given in goal two. Phases two and three may be repeated numerous times, with each run analysing a different piece of malware, depending on the amount of spare time.

3.2 Ethical Considerations

This paper raised two ethical issues, one of which was a requirement and the other a declaration. Because we were working with actual malware samples, the sandbox environment had to be highly secure to prevent malware dissemination outside of the controlled environment. A less secure sandbox could represent a risk to home, business, or school networks, putting devices connected to those networks at risk. As a result, it was critical to harden the sandbox environment and extensively investigate the malware samples for any context-based attacks that might compromise the external network. The declaration was that no malware would be created as a result of this paper. This research only looked at existing malware variants that had already been investigated and mitigated. The same ethical issues applied to the testing setup, and it was critical to provide a secure and regulated environment to avoid any unexpected repercussions of malware transmission.

3.3 Data Collection and Network Configuration

A three-tiered approach was used to collect the data required for visualising malware activities and dissemination. The first two approaches entailed using a specially designed program to collect metrics from all running virtual machines every five seconds and to record screenshots of the current view every thirty seconds. This approach allowed us to monitor resource utilisation in the background and identify any substantial changes in the foreground during runtime. When we ran a WannaCry^{Footnote 1} sample, for example, we could see resource spikes throughout the encryption process and the display of the ransom note thereafter. The third approach involved using WireShark^{Footnote 2} to record communications between devices on the testing network. WireShark was installed on a separate node and ran in promiscuous mode while monitoring the primary network device. This allowed us to keep track of all activity between the nodes, as shown in Fig. 1. The network configuration for this paper was simple, consisting of a local area network with four vulnerable Windows 7 virtual machines. These machines were purposefully set up to be as vulnerable to malware transmission as possible. Each virtual machine had an open, password-less fileshare with limited encryption, all ports open, and user access control deactivated. A fifth node running Ubuntu 20.04 LTS was set up on the network, running WireShark in promiscuous mode. To avoid malware spreading to the host machine, the complete network was constructed within Oracle VirtualBox's internal network setup. The Ubuntu system was the only one with access to the external host, and it used a VirtualBox fileshare to extract the network activity. During active testing, this fileshare was disabled and deleted, and it was only mounted and reopened after testing was completed and the infected devices were reverted.
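The two polling intervals described above can be illustrated with a small scheduling sketch. This is not the dashboard's actual implementation, which is not shown in this section; the function names, the node labels, and the `collect` hook are hypothetical, and the clock is simulated so that the example runs without any virtual machines.

```python
from typing import Callable, Dict, List

METRIC_INTERVAL_S = 5       # metrics polled every five seconds (from the text)
SCREENSHOT_INTERVAL_S = 30  # screenshots recorded every thirty seconds (from the text)


def due_tasks(elapsed_s: int) -> List[str]:
    """Return the collection tasks that fall due at a given elapsed second."""
    tasks = []
    if elapsed_s % METRIC_INTERVAL_S == 0:
        tasks.append("metrics")
    if elapsed_s % SCREENSHOT_INTERVAL_S == 0:
        tasks.append("screenshot")
    return tasks


def run_schedule(duration_s: int, vms: List[str],
                 collect: Callable[[str, str, int], None]) -> Dict[str, int]:
    """Walk a simulated clock and call `collect` for every due task on every VM.

    `collect(vm, task, elapsed_s)` is a hypothetical hook; a real collector
    would query the hypervisor or a guest agent at that point.
    """
    counts = {"metrics": 0, "screenshot": 0}
    for elapsed_s in range(duration_s):
        for task in due_tasks(elapsed_s):
            for vm in vms:
                collect(vm, task, elapsed_s)
                counts[task] += 1
    return counts


if __name__ == "__main__":
    log = []
    nodes = ["node1", "node2", "node3", "node4"]  # hypothetical labels
    totals = run_schedule(60, nodes, lambda vm, task, t: log.append((t, vm, task)))
    print(totals)
```

Keeping the schedule separate from the collection hook means the same loop could drive either a real collector or a test stub.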

3.4 Quantitative Testing Approach

The testing was mostly quantitative, with the aim of observing and visualising the activity and dissemination of multiple malware samples. The goal was to obtain a thorough understanding of the malware landscape and adequately demonstrate the capabilities of our setup. While the emphasis was on breadth, the findings were not lacking in depth. Each sample was analysed using the three collection approaches, and the network log provided a variety of factors to investigate, including alternative protocols. All of the samples used in the testing were obtained from TheZoo^{Footnote 3}, an online malware collection. Each sample was tested for a total of twenty minutes, including ten minutes of general activity and ten minutes with the sample actively running on node 1. Additional time, roughly ten minutes each, was allotted for setup and teardown, resulting in a total testing time of around forty minutes, assuming no issues. This time frame was designed to make quantitative approaches easier to apply and to standardise the analysis time. It was understood that certain samples might not activate during the testing time; this was investigated and duly noted to understand any usual or abnormal behaviour.
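The time budget above can be written down as a short worked example. The phase names and their ordering (setup, baseline, active, teardown) are assumptions made for illustration; the text only gives the durations.

```python
# (name, minutes); durations are from the text, names and order are assumed.
PHASES = [
    ("setup", 10),
    ("baseline", 10),   # general activity, no sample running
    ("active", 10),     # sample running on node 1
    ("teardown", 10),
]


def total_minutes(phases=PHASES):
    """Total wall-clock time for one sample, in minutes."""
    return sum(minutes for _, minutes in phases)


def recorded_minutes(phases=PHASES):
    """Minutes during which data is actually analysed (baseline + active)."""
    return sum(minutes for name, minutes in phases if name in ("baseline", "active"))


def phase_at(minute, phases=PHASES):
    """Return the phase that a given elapsed minute falls into."""
    start = 0
    for name, length in phases:
        if start <= minute < start + length:
            return name
        start += length
    raise ValueError("minute is outside the test window")


if __name__ == "__main__":
    print(total_minutes(), recorded_minutes(), phase_at(25))
```

Labelling each timestamp with its phase in this way makes it straightforward to compare the baseline half of a run with the active half.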

4 Experimental Analysis

In this section, we ran tests with several malware samples and collected important metrics using the previously specified configuration. Our malware samples came from a variety of sources. The major source was TheZoo malware repository on GitHub. TheZoo is a community project dedicated to collecting and cataloguing various malware samples and variants. Furthermore, we used the MalwareDatabase^{Footnote 4} collection, which was maintained by a user named NTFS123 and contained unique samples not available in TheZoo, such as different versions of WannaCry. We used a variety of technologies in conjunction with our dashboard programme to validate certain attributes or versions of the malware samples. VirusTotal^{Footnote 5}, developed by Hispasec Sistemas, is one such tool. We were able to use file or Uniform Resource Locator (URL) submissions to analyse suspicious files or links and compare them to other entries with similar characteristics. It was especially beneficial in distinguishing between malware samples that appeared similar but had fundamental differences. To read the full details of the samples used in the following sections, enter their MD5 hashes into VirusTotal. The Cuckoo^{Footnote 6} sandbox environment, an open-source automated malware analysis environment, was also used in our research. Cuckoo functioned as a stand-in for our setup. If a sample appeared to be inactive in our environment but VirusTotal reports indicated otherwise, we would run it in the Cuckoo sandbox to evaluate any potential contextual awareness capabilities displayed by the malware.
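Since samples are identified by their MD5 hashes, the following minimal sketch shows one way to compute such a hash with Python's standard `hashlib` module. The function name and chunk size are illustrative choices, and the file is read in chunks so that large samples do not need to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The resulting 32-character string can then be used as the search term for a sample. MD5 is used here only as an identifier, not as a security guarantee.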

4.1 WannaCry (Wormless)

A wormless variation of WannaCry was used to fine-tune data collection throughout the development phase. Because this variation lacks the payload's worm component, it cannot spread over a network like its infamous sibling that exploited the EternalBlue vulnerability [26]. The wormless variation was chosen to save test tear-down time during development while still ensuring that data was recorded by the dashboard. WannaCry was ideal for this purpose since it exhibits rapid and easily identifiable activity, making it simple to pinpoint certain points in time. Several tests were run throughout the development process, and adjustments and enhancements were made along the way. At first, the tests simply featured the four nodes and the dashboard program. The final test, however, included the Ubuntu node to capture network events. A test with only one infected machine was visualised for clarity and to understand the distinct activity of an infected system compared to uninfected ones. This gives us an idea of how our network visualisations would look, especially when compared to subsequent tests involving numerous infected workstations, which can cause network activity to become highly chaotic. Figure 2 depicts node 1's infection and subsequent data encryption activity. The heightened activity lasts for the duration of the test. Notably, nodes 3 and 4 show an increase in activity for unknown reasons. Because the sample lacks the worm component, it cannot propagate to these nodes, leading us to believe that the increased activity is due to concurrent network activity of some kind, although the exact source is unknown. Surprisingly, despite the absence of the worm component and the EternalBlue attack, the infected node 1 attempts to connect to node 2, as seen in Fig. 3. This implies that the sample can probe for the Server Message Block (SMB) vulnerability and make connections, but not exploit it. 
This behaviour raises questions about the nature of this WannaCry sample, leading us to hypothesise that it could be either a neutered sample with the worm component removed or an older version of WannaCry from before the worm component was implemented. While it may appear improbable, we prefer the former explanation because it appears more plausible for a malware sample. Figure 4 depicts Simple Service Discovery Protocol (SSDP) activity that looks quite typical, with no evident anomalies. This is to be expected, as the underlying operating system (Windows in this scenario) would still be functioning and regular network protocols would continue to operate during the ransomware encryption process. It stands to reason that ransomware that affects the entire operating system, such as Petya/NotPetya^{Footnote 7}, will cause more severe aberrations in network activity. Figure 5 depicts Transmission Control Protocol (TCP) activity across the network. Node 4 received and created no TCP activity for some unknown reason, which could simply be a coincidence. Notably, there are several TCP connections established between IP addresses 192.168.0.10 and 192.168.0.11. We also see that the first round of TCP connections captured between 192.168.0.10 and 192.168.0.11 resembles the first round of TCP connections captured between 192.168.0.12 and 192.168.0.11. This implies that the initial round of communication is typical. The third mass of TCP connections, on the other hand, was not like its predecessors. As SMB operates on top of TCP, we hypothesise that these are the connections captured during the SMB activity. This notion was backed further by the subsequent erratic and continuous TCP activity, which mirrors the SMB graph. The User Datagram Protocol (UDP) and browser activity, illustrated in Figs. 6 and 7, appeared to be fully normal, lending credence to the idea that normal network activity coexists with the expected infected behaviour.
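Because SMB runs on top of TCP, one way to test the hypothesis that a burst of TCP connections is SMB traffic is to split exported TCP records by port. The sketch below assumes each record is a dictionary with `src`, `dst`, `src_port`, and `dst_port` keys, which is a hypothetical export format rather than the one used in this study, and it treats TCP port 445 as the SMB indicator. Port matching is only a heuristic; a protocol dissector gives a more reliable label.

```python
from collections import Counter

SMB_PORT = 445  # SMB over TCP


def is_smb(record):
    """Heuristic: a TCP record is SMB if either endpoint uses port 445."""
    return SMB_PORT in (record["src_port"], record["dst_port"])


def pair_counts(records):
    """Count records per (src, dst) pair, split into SMB and other TCP."""
    smb, other = Counter(), Counter()
    for record in records:
        key = (record["src"], record["dst"])
        if is_smb(record):
            smb[key] += 1
        else:
            other[key] += 1
    return smb, other


if __name__ == "__main__":
    # Made-up records using the address range mentioned in the text.
    sample = [
        {"src": "192.168.0.10", "dst": "192.168.0.11", "src_port": 49200, "dst_port": 445},
        {"src": "192.168.0.10", "dst": "192.168.0.11", "src_port": 49201, "dst_port": 445},
        {"src": "192.168.0.12", "dst": "192.168.0.11", "src_port": 49300, "dst_port": 80},
    ]
    smb, other = pair_counts(sample)
    print(smb, other)
```

Plotting the two counters separately would show whether the irregular part of the TCP graph is explained by the SMB subset.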

4.2 WannaCry

WannaCry, also known as WannaCryptor, was selected as the first propagating sample for observation and visualisation. Before Marcus Hutchins discovered its killswitch in 2017, it caused substantial damage. The National Health Service (NHS) in the United Kingdom reported that it infected 603 primary care and other NHS organisations, and that 1220 diagnostic devices were deactivated to prevent further contamination. Because of its capacity to spread rapidly using the SMB vulnerability, WannaCry was a natural choice for the first test. After infecting the first machine, WannaCry actively scans the network for open SMB ports (445) and uses the EternalBlue vulnerability to propagate to additional machines. It can even reinfect machines that are already infected. The first metric examined was CPU use across all network nodes. The hypothesis was that when WannaCry began encrypting, CPU consumption on a single node would noticeably increase, followed by successive jumps in CPU usage on other nodes as the malware spread. Furthermore, a general rise in CPU usage across all devices was expected as a result of the ransom note running and continual network scanning following infection. Figure 8 mostly confirms these hypotheses. The first ten minutes show irregular activity on node 1, which was most likely due to startup programmes updating or the node checking its connection status. There was a big surge in activity on node 1 (blue) around the midpoint, indicating the deployment of the WannaCry sample. Following the spike on node 1, there was an increase in activity on node 3 (green), followed by smaller spikes on nodes 2 (red) and 4 (orange). These initial increases signified infection and encryption points that occurred within two minutes. Following that, there was a general increase in CPU utilisation across all nodes, with repeated spikes, which might be due to WannaCry's re-infection behaviour. The visualisation in Fig. 9 depicts WannaCry's SMB activity across the four nodes. Orange denotes the source node of an SMB exchange and blue denotes the destination, while the purple lines show the direction of the SMB activity. The infection began on node 1, and WannaCry spread to node 3. It extended from there to node 4 and finally to node 2. The subsequent SMB activity can be attributed to infected workstations seeking to reinfect the network's previously infected machines. One intriguing observation was the behaviour of node 1, which first communicated with node 3 and later showed lower activity, save for interactions with nodes 3 and 2. The cause of this pattern was unknown, although WannaCry is notorious for being loud and active due to its ransomware nature. TCP activity (Fig. 10) appeared to be similar to the latter half of the SMB graph distribution. This resemblance was understandable given that the SMB protocol runs on top of TCP on port 445, which was abused by this variant of WannaCry. As a result, the first half of the graph depicted general TCP activity, whereas the second half depicted both general and SMB-related activity. However, without prior knowledge of the SMB activity, Fig. 10 was insufficient to identify whether propagation had happened. In a WannaCry-infected network, the SSDP visualisation indicated no notable deviations or irregularities, as shown in Fig. 11. It appeared to function regularly and consistently, making it difficult to determine the specific point of dissemination. As with SSDP, there was no discernible activity within UDP, BROWSER, or HTTP that indicated WannaCry dissemination, as shown in Figs. 12, 13 and 14. After infection, all of these protocols behaved consistently. In terms of screenshot collection, as shown in Fig. 15, we estimated that the duration between a clean network and full infection is about two minutes and thirty seconds. As shown by the final figure acquired in this test, Fig. 16, the visualisation demonstrated WannaCry's inclination to reinfect previously afflicted machines. Further examination using Cuckoo, as shown in Fig. 17, unexpectedly revealed that WannaCry demonstrated contextual awareness. According to Cuckoo, WannaCry specifically searched for the Cuckoo agent.py in the startup directory, indicating an ability to detect its testing environment.
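The CPU-based reading of infection order can be sketched as a simple threshold test: a node is flagged at the first sample after the baseline window whose CPU value exceeds the baseline mean plus a multiple of the baseline standard deviation. This is an illustrative heuristic rather than the analysis performed in this study; the function names, the multiplier `k`, and the sample values are all assumptions.

```python
from statistics import mean, pstdev


def first_spike(series, baseline_len, k=3.0):
    """Index of the first post-baseline sample above mean + k * std, else None."""
    baseline = series[:baseline_len]
    threshold = mean(baseline) + k * pstdev(baseline)
    for index in range(baseline_len, len(series)):
        if series[index] > threshold:
            return index
    return None


def infection_order(cpu_by_node, baseline_len, k=3.0):
    """Order nodes by the time of their first CPU spike; unspiked nodes are omitted."""
    spikes = []
    for node, series in cpu_by_node.items():
        index = first_spike(series, baseline_len, k)
        if index is not None:
            spikes.append((index, node))
    return [node for _, node in sorted(spikes)]


if __name__ == "__main__":
    # Synthetic CPU percentages, one value per polling interval.
    cpu = {
        "node1": [5, 6, 5, 6, 90, 95, 90, 90],
        "node2": [5, 6, 5, 6, 6, 5, 6, 60],
        "node3": [5, 6, 5, 6, 6, 80, 85, 80],
        "node4": [5, 6, 5, 6, 5, 6, 70, 75],
    }
    print(infection_order(cpu, baseline_len=4))
```

A CPU spike alone does not prove infection, so an ordering obtained this way would still need to be checked against the SMB visualisation and the screenshots.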

4.3 CryptoLocker

In this paper, we examined three versions of CryptoLocker^{Footnote 8}, which emerged in September 2013, November 2013, and January 2014, respectively. However, during our tests, none of the CryptoLocker ransomware variants executed their payload within the twenty-minute time frame. Upon further investigation, we discovered the reason behind this behaviour. Once launched, the CryptoLocker ransomware creates a visible process in the task manager, inserts itself into the startup programs, and then deletes the original executable that initiated these processes. This behaviour was evident in the task manager (Figs. 18 and 19). While we expected no internal propagation within local area networks, we did not anticipate the lack of encryption, which revealed an intriguing situation. After removing the original executable, CryptoLocker is required to communicate with its command and control server, known as the GameOver ZeuS server. During this communication, the server registers CryptoLocker as part of its botnet and generates 2048-bit RSA keys, which it sends back to CryptoLocker on the client side to initiate file encryption. This presented two obstacles: first, the GameOver ZeuS server was taken down in 2014, rendering any communication with the server impossible; second, even if the server were hypothetically available, our testing setup operates within an internal network structure, making it unable to communicate with the server. Consequently, we encountered an unforeseen difficulty, as malware requiring external input to perform its full function would not operate fully within our setup. In essence, the limitation of our system is that it is suited to testing dropper-type malware, while it may not be as applicable to downloader types or botnets.

4.4 Locky

In contrast to CryptoLocker, Locky^{Footnote 9} presented an anticipated difficulty in malware testing. Locky, like CryptoLocker, performs self-deletion after execution. Locky, on the other hand, has contextual awareness, which means it has numerous approaches for determining whether it is running in a virtual environment. During our test, the Locky sample self-destructed and did not perform any additional actions. We searched for all reported infection indicators in the registry and system files but found none, leading us to conclude that this was the first time in our testing that a sample correctly identified that it was within a virtual machine and chose not to run. In a previous scenario, WannaCry demonstrated minimal contextual awareness by looking for the Cuckoo agent in the startup directory. However, because it was not looking for the normal indicators of a virtual machine, it was able to execute in our configuration, which employs the Cuckoo environment, as shown in Fig. 20.

4.5 Petya

NotPetya/PetrWrap is the Petya variant to watch for potential propagation, since it uses the SMB EternalBlue vulnerability to spread. To provide a full comparison, we first visualised the known non-propagating variant. Petya differs from typical ransomware in that it infects the master boot record, causing the system to boot into the ransomware rather than the operating system. Because the system becomes infected, reboots, and encrypts the files, we expected significant increases in CPU consumption, as shown in Fig. 21. The SMB activity in this test is nearly identical to that seen in the WannaCry (Wormless) test. We initially suspected that the activity in the WannaCry sample marked the beginning of the SMB EternalBlue worm's implementation; however, the findings of this test led us to reject that hypothesis. Instead, we attribute this unusual activity to spurious SMB traffic caused by our setup rather than by the samples, as shown in Fig. 22. While this may interfere with our study of SMB activity from actual EternalBlue samples, we believe it is not materially detrimental, because the WannaCry analysis revealed that a sample that uses SMB to propagate generates very noisy and distinct activity. As in the previous visualisations, little remarkable activity was observed during the test in the TCP, UDP, SSDP, or BROWSER protocols, as shown in Figs. 23, 24, 25 and 26. A deeper examination of network activity may uncover deviations from the norm, but at first inspection there are no significant differences between normal activity and the period when the sample is active and the environment is under threat.
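
As an illustration of how such a deeper examination might begin, the following minimal Python sketch compares per-protocol packet rates in a baseline window with those in the window after a sample is launched. It is not the analysis used in this study: the record format `(timestamp, protocol)`, the function names `protocol_rates` and `flag_deviations`, the threshold `ratio`, and the log values are all assumptions made for illustration.

```python
from collections import Counter


def protocol_rates(records, start, end):
    """Return packets per second for each protocol within [start, end)."""
    duration = end - start
    if duration <= 0:
        raise ValueError("window must have a positive duration")
    counts = Counter(proto for ts, proto in records if start <= ts < end)
    return {proto: n / duration for proto, n in counts.items()}


def flag_deviations(baseline, sample, ratio=3.0):
    """Return protocols whose sample rate is at least `ratio` times the baseline.

    Protocols that never appear in the baseline are flagged whenever they
    appear in the sample window.
    """
    flagged = {}
    for proto, rate in sample.items():
        base = baseline.get(proto, 0.0)
        if base == 0.0 or rate / base >= ratio:
            flagged[proto] = (base, rate)
    return flagged


# Hypothetical log entries: (seconds since capture start, protocol).
records = (
    [(t, "SSDP") for t in range(0, 1200, 60)]     # steady background traffic
    + [(t, "SMB") for t in range(0, 600, 120)]    # sparse SMB before launch
    + [(t, "SMB") for t in range(600, 1200, 5)]   # dense SMB after launch
)

baseline = protocol_rates(records, 0, 600)     # first ten minutes
sample = protocol_rates(records, 600, 1200)    # ten minutes after launch
print(flag_deviations(baseline, sample))       # only SMB is flagged here
```

A simple rate ratio of this kind would not separate setup-induced SMB traffic from sample-induced traffic on its own; it only indicates which protocols merit closer inspection.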

4.6 NotPetya/PetrWrap

This Petya variant used several exploits, including the SMB vulnerability EternalBlue, EternalRomance, and a backdoor in the M.E.Doc accounting software. Despite NotPetya's aggressive propagation in real-world incidents, neither of the two variants we tested in our environment was able to infect the other nodes, although both successfully attacked node 1. The CPU utilisation during NotPetya testing was noteworthy because it showed clear evidence of activation and sustained activity after the malware was launched, as shown in Fig. 27. Spikes in activity on other nodes also signalled possible propagation attempts. The SMB activity after activation suggested that the initially infected node was attempting to communicate with other nodes, as shown in Fig. 28. The increase in TCP traffic further supported NotPetya's promiscuous nature, as shown in Fig. 29. Standard network protocols, on the other hand, showed no deviations, as shown in Figs. 30, 31 and 32. NotPetya delays the system restart by creating a scheduled task, so propagation may have occurred outside the testing window. The next test therefore extended monitoring until the infected node rebooted, providing further information.
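
The reasoning above, in which CPU spikes on a node are read as a sign of activation or of a propagation attempt, can be sketched as a simple threshold test against each node's own pre-launch baseline. The sketch below is illustrative only and is not the method used in this study: the function name `find_spikes`, the parameters `k` and `min_margin`, and the CPU readings are assumptions.

```python
from statistics import mean, pstdev


def find_spikes(series, baseline_len, k=3.0, min_margin=5.0):
    """Return indices after the baseline whose value exceeds the threshold.

    The threshold is the baseline mean plus the larger of k standard
    deviations and `min_margin` percentage points, so that a very flat
    baseline does not turn small fluctuations into spikes.
    """
    base = series[:baseline_len]
    if len(base) < 2:
        raise ValueError("baseline needs at least two samples")
    threshold = mean(base) + max(k * pstdev(base), min_margin)
    return [i for i, value in enumerate(series)
            if i >= baseline_len and value > threshold]


# Hypothetical CPU utilisation (%) per node; the first five samples of each
# series are taken before the sample is launched.
cpu = {
    "node1": [4, 5, 6, 5, 4, 85, 90, 88, 40, 35],   # sustained activity
    "node2": [3, 4, 3, 4, 3, 4, 30, 4, 4, 3],       # one isolated spike
    "node3": [5, 4, 5, 4, 5, 5, 4, 5, 4, 5],        # no change
}

spikes = {node: find_spikes(values, baseline_len=5)
          for node, values in cpu.items()}
print(spikes)
```

Sustained runs of flagged samples (as on the hypothetical `node1`) would correspond to activation, whereas isolated spikes on other nodes would only hint at a possible propagation attempt and would need to be checked against the network logs.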

4.7 NotPetya Extended

Because of its distinct operating style, NotPetya behaved differently from other ransomware. After the first infection, it stalls while it spreads to additional machines in the network and then orchestrates a lockdown via forced restarts. In our modified tests, NotPetya successfully spread to all machines on the virtual network. The CPU utilisation during the longer test shows a clear infection pattern, with activity returning to normal levels after infection until the synchronised restart initiates the file encryption process, as shown in Fig. 33. NotPetya's SMB and TCP activity is denser and more frequent than WannaCry's, as shown in Figs. 34 and 35. Standard protocols, on the other hand, show consistent and normal activity both before and after infection, as shown in Figs. 36, 37 and 38. It was clear that our current testing approach has limits, and a qualitative approach may be more appropriate in future studies to understand malware processes and encourage complete payload deployment.

5 Discussions

In this section, we examine the findings in light of the goals stated in the introduction of this paper. We also consider other interesting observations and emphasise the most important discoveries from the testing phase.

5.1 Effective Visualizations

Our primary goal was to produce clear and simple propagation visualisations. During development, we broadened this goal to take a wider perspective while still serving the same purpose. Instead of focusing exclusively on visualising observed propagations, we collected metrics and logs in order to judge whether or not propagation had occurred. This shift allowed us to examine the effects of malware propagation on the network rather than just displaying the infection's pathways. We believe we have succeeded in developing clear and interpretable visualisations. Each visualisation conveys the relevant metrics in an appropriate format. Using multi-line charts to show CPU utilisation across nodes, for example, proved to be a simple yet effective means of communicating changes in values over time, and presenting the values together made comparisons between nodes easier. Furthermore, in accordance with the ideas outlined in Raffael Marty's^{Footnote 10} book "Applied Security Visualization", we provided points of comparison by displaying typical activity before introducing malware and by evaluating samples without network propagation capabilities. Our visualisations improved over time, gaining features such as labels, legends, and IP addresses that increase readability and give a clearer view of the network's condition. Despite these advances, we recognise that our visualisations are not perfect. The visualisations of network logs, in particular, would benefit from fine-tuning to properly depict the difference between transmitting and receiving IP addresses. Furthermore, the scalability of the CPU utilisation graphs may become an issue when they are applied to bigger networks. Overall, we feel that our visualisations fulfil the fundamental role of informing, instructing, and assisting in study and decision-making. With additional study and refinement, they could contribute more.
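
One possible way to separate transmitting from receiving IP addresses before plotting is to aggregate the network log into directed source-to-destination counts. The following Python sketch illustrates the idea; it is not part of the dashboard described in this paper, and the record layout `(source, destination, protocol)`, the function names, and the addresses and counts are hypothetical.

```python
from collections import Counter


def directed_flows(records):
    """Count packets per (source, destination) pair, preserving direction."""
    return Counter((src, dst) for src, dst, _proto in records)


def per_host_totals(flows):
    """Split the directed counts into packets sent and received per host."""
    sent, received = Counter(), Counter()
    for (src, dst), n in flows.items():
        sent[src] += n
        received[dst] += n
    return sent, received


# Hypothetical log records: (source IP, destination IP, protocol).
records = [
    ("192.168.0.10", "192.168.0.11", "SMB"),
    ("192.168.0.10", "192.168.0.12", "SMB"),
    ("192.168.0.10", "192.168.0.12", "SMB"),
    ("192.168.0.12", "192.168.0.10", "SMB"),
]

flows = directed_flows(records)
sent, received = per_host_totals(flows)
print(sent.most_common())      # hosts ranked by packets transmitted
print(received.most_common())  # hosts ranked by packets received
```

Keeping the pair `(source, destination)` as the key, rather than an unordered pair, is what would allow a chart to draw transmitted and received traffic as separate series or as arrows in a node-link diagram.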

5.2 Addressing Research Questions and Limitations

Our second set of goals included explicit questions about certain qualities and use cases, which we now address in order. Based on our research, we believe malware propagation is not completely deterministic. While malware may search for IP addresses that are similar to its own, other factors such as speed can influence the order in which it spreads. In our WannaCry experiments, for example, 192.168.0.12 became infected before 192.168.0.11, most likely because it received and accepted the SMB connection faster than the others. As a result of several contributing factors, the propagation order can appear random or pseudo-random. Unfortunately, due to restrictions in constructing virtual networks and in hardware capability, we were unable to properly investigate the impact of network structure on propagation. Our existing system and visualisations also fall short of responding to an active threat event. We collect data for later analysis and visualisation, but no real-time analysis or response is performed. Adapting our system to visualise real-time data would still introduce some latency, making it difficult to respond quickly to rapidly spreading attacks like WannaCry, which affected our four machines in less than two and a half minutes. While our visualisations can provide insights and generalisations for future decision-making under an active threat, a real-time response system that can make effective decisions on its own would require extensive development and training.
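
The propagation order and the time taken to reach all machines can be derived from the first infection indicator recorded for each node. The sketch below shows one way this might be computed; it is an illustration under stated assumptions and not the procedure used in this study. The event format `(timestamp, node)`, the function names, and all timestamps are hypothetical; the values were chosen only so that 192.168.0.12 precedes 192.168.0.11 and the total stays under two and a half minutes, as reported above.

```python
def infection_order(events):
    """Return [(node, first_seen)] sorted by the earliest indicator per node.

    `events` is an iterable of (timestamp_seconds, node) pairs, for example
    the first inbound SMB connection or the first CPU spike on a node.
    """
    first_seen = {}
    for ts, node in events:
        if node not in first_seen or ts < first_seen[node]:
            first_seen[node] = ts
    return sorted(first_seen.items(), key=lambda item: item[1])


def spread_duration(order):
    """Seconds between the first and the last node becoming infected."""
    if not order:
        return 0
    return order[-1][1] - order[0][1]


# Hypothetical indicators, in seconds after the sample was launched.
events = [
    (0, "192.168.0.10"),
    (58, "192.168.0.11"),
    (42, "192.168.0.12"),
    (60, "192.168.0.12"),    # later events for the same node are ignored
    (131, "192.168.0.13"),
]

order = infection_order(events)
for node, ts in order:
    print(f"{node} first seen at {ts} s")
print(f"spread took {spread_duration(order)} s")
```

Repeating such a calculation over many runs of the same sample would show how stable the order is, which is one way the question of determinism could be examined quantitatively.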

5.3 Future Works

We made several other observations during the testing phase that we consider significant. One is the contrast between malware's external and internal propagation approaches. External propagation happens when malware leaves the network to spread, whereas internal propagation occurs within the network by utilising exploits found in the operating system. Due to the lack of an internet connection in our testing environment, we were unable to investigate malware that relies on external connections, such as compromised email accounts or communication with command and control servers. Another is the likelihood of overlooked cases as a result of our quantitative study's standardised approach. Malware that targets specific programs rather than the operating system may not have been thoroughly tested or detected in our configuration. To address this, future studies may need to run over a longer period and adapt the testing approach to such cases. We also looked into the concept of contextual awareness, in which malware detects that it is executing in a virtual environment. While our reduced testing approach may miss certain human interaction artefacts, testing on real-world machines poses ethical challenges. Furthermore, modern evasion techniques increasingly recognise virtual environments such as VirtualBox and VMware, which may affect the effectiveness of our testing. These findings highlight the importance of a balanced approach to malware research that takes into account external versus internal propagation and remains aware of contextual evasion. Future research should address these issues to improve the efficacy of malware testing and analysis.
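
One widely known artefact that can reveal a virtual environment is the vendor prefix (OUI) of a network adapter's MAC address, since VirtualBox and VMware assign addresses from their own registered ranges by default. As a hedged illustration of how an analysis environment might be audited for this artefact before testing, the Python sketch below checks a list of MAC addresses against a small table of such prefixes. The table is deliberately incomplete, the function names and example addresses are hypothetical, and this single check does not represent the full range of detection techniques used by real samples.

```python
# Default MAC address prefixes registered to common hypervisors.
# This table is illustrative and not exhaustive.
HYPERVISOR_OUIS = {
    "08:00:27": "VirtualBox",
    "00:05:69": "VMware",
    "00:0c:29": "VMware",
    "00:1c:14": "VMware",
    "00:50:56": "VMware",
}


def normalise_mac(mac):
    """Lower-case a MAC address and use ':' as the separator."""
    return mac.strip().lower().replace("-", ":")


def audit_macs(macs):
    """Return {mac: vendor} for addresses that carry a hypervisor prefix."""
    findings = {}
    for mac in macs:
        vendor = HYPERVISOR_OUIS.get(normalise_mac(mac)[:8])
        if vendor:
            findings[mac] = vendor
    return findings


# Hypothetical adapter addresses collected from the test machines.
adapters = ["08-00-27-AB-CD-EF", "3C:52:82:11:22:33", "00:0C:29:01:02:03"]
print(audit_macs(adapters))
```

An audit of this kind only lists what a sample could observe; changing the adapter address removes one indicator but leaves others, such as guest tools, device names, and timing behaviour, untouched.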

6 Conclusions

In this paper, we developed a dashboard program capable of launching many virtual machines and collecting metrics from them. We built visualisations that provide insight into the behaviour of compromised machines and their impact on the overall network. Our findings pave the way for further research on malware propagation and characteristics, particularly in the context of Windows 7 operating systems. Moving forward, we will concentrate on improving the dashboard program by integrating real data visualisation on the same webpage. We intend to redesign the virtual network formation procedure to speed up testing phases and reduce setup and tear-down time. We also intend to investigate other approaches for acquiring metric data and to devise ways of countering malware's contextual awareness. In addition, we will continue to develop our visualisations so that they explain findings effectively and offer multiple points of view. Extending our research to larger corporate networks will provide insight into active threats on interconnected systems with varying topologies. We invite other researchers to delve deeper into this topic and contribute to ongoing efforts to combat and understand malware.

Data Availability

Not applicable.

Notes

Funding

The authors would like to thank the Deanship of Scientific Research at Shaqra University for supporting this work.

Author information

Authors and Affiliations

Department of Computer Science, College of Computing and IT, Shaqra University, Shaqra, Saudi Arabia
Fares Alharbi
IIIT Delhi, New Delhi, India
Gautam Siddharth Kashyap

Contributions

Conceptualization and methodology, G.S.K.; software, F.A.; validation, G.S.K. and F.A.; experiment analysis, F.A.; investigation, G.S.K.; writing—original draft preparation, G.S.K.; writing—review and editing, F.A. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Gautam Siddharth Kashyap.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Alharbi, F., Kashyap, G.S. Empowering Network Security through Advanced Analysis of Malware Samples: Leveraging System Metrics and Network Log Data for Informed Decision-Making. Int J Netw Distrib Comput (2024). https://doi.org/10.1007/s44227-024-00032-1

Received: 17 October 2023
Accepted: 06 June 2024
Published: 11 June 2024
DOI: https://doi.org/10.1007/s44227-024-00032-1
