Finding Software Bugs in Embedded Devices
The goal of this chapter is to introduce the reader to the domain of bug discovery in embedded systems, which are at the core of the Internet of Things. Embedded software has a number of particularities that make it slightly different from general-purpose software. In particular, embedded devices are more exposed to software attacks but have lower defense levels and are often left unattended. At the same time, analyzing their security is more difficult because they are very “opaque”, and the execution of custom and embedded software is often entangled with the hardware and peripherals. These differences have an impact on our ability to find software bugs in such systems. This chapter discusses how software vulnerabilities can be identified at different stages of the software life-cycle, for example during development, during integration of the different components, during testing, during the deployment of the device, or in the field by third parties.
11.1 The Challenges of Embedded Devices and Software
We argue that the problem of embedded software security is due to multiple factors, including a systematic lack of transparency, control, and resistance to attacks. A particular way to improve this is to analyze the software of these devices, with the particular goal of identifying software vulnerabilities in order to correct them as early as possible.
11.1.1 Lack of Transparency
Today, many smart devices are compromised during massive attacks, and may be abused to form large botnets (networks of compromised devices). Record-high Distributed Denial of Service (DDoS) attacks (i.e., network flooding) reportedly generated between 620 Gbps and 1 Tbps of traffic [241, 344]. These DDoS attacks were reported to use several hundred thousand compromised embedded/smart devices, comprising dozens of different models of Commercial Off-The-Shelf (COTS) products like IP/CCTV cameras and home routers. Most of those devices were compromised using default or hard-coded credentials set by the manufacturer. Malware running on such devices has complete control over the traffic that is generated, and most smart devices do not embed any infection detection or prevention mechanism. Worse yet, the users or owners of the device are often not aware of the problem, and unable to solve it. In fact, devices are not designed to be inspected and modified by end-users (e.g., to perform forensics as discussed in Chap. 13).
11.1.2 Lack of Control
Another important problem is that smart devices are generally provided as a fixed software (i.e., firmware) and hardware platform, often tied to a cloud service and bundled together as a closed system that the user has little control over. An example of the negative consequences of this customer lock-out is the Revolv smart home hub. Revolv’s manufacturer was acquired by its competitor Nest, and after a year Nest stopped the cloud service, rendering the Revolv hubs installed in homes impossible to use. Users often have no choice regarding which software the device should run, which cloud service to use, or what the device should do. Choosing, installing and using alternative software for such devices is difficult, if not impossible, often due to the single-purpose nature of the hardware and software design and the lack of public documentation, in addition to any anti-tampering measures imposed by the manufacturer.
11.1.3 Lack of Resistance to Attacks
To be considered trustworthy, devices need to have a certain level of resistance to attacks. In practice, Internet scanning botnets are active enough that some devices will be compromised within a few minutes after being connected to the Internet. This is astonishing, because in essence many of the recurring security issues with smart devices have already been “solved” for many years. If vulnerabilities and corresponding attack situations could ultimately be avoided, it is important to ask who is responsible for the damage caused by compromised smart devices, beyond the malware author. The device owner may be legally responsible, but often the end-user does not have any means to detect or prevent such compromises, or to apply a secure configuration. On the other hand, the manufacturers currently often have no legal liability, and thus no incentive (e.g., economic, legal) to prevent a potential vulnerability and compromise.
11.1.4 Organization of This Chapter
Solving these problems requires analyzing the software and firmware for the embedded devices, and identifying and fixing their vulnerabilities. This chapter describes the possible steps to systematically and consistently achieve this goal. We first provide a classification of embedded systems that is well adapted to their analysis. We then describe the possible steps for their analysis. We start with ways to obtain the software to analyze, which is often a challenge in itself for embedded devices. We then describe how to perform static analysis on the firmware packages obtained, which has many advantages such as speed and scalability. We then describe techniques which can be used to dynamically analyze the firmware, which in contrast to static analysis has the advantage of larger code coverage and lower false positive rates.
11.1.5 Classification of Embedded Systems
A general definition of embedded systems is hard to establish. However, two widely accepted differences separate embedded devices from modern general-purpose computers, such as ordinary desktop PCs or smartphones, namely: (a) they are designed to fulfill a specific purpose, and (b) they heavily interact with the physical world via peripherals. The aforementioned two criteria cover a wide variety of devices, ranging from hard-disk controllers to home routers, from digital cameras to Programmable Logic Controllers (PLCs). These families can be further classified according to several aspects, such as their actual computing power, the extent to which they interact with their computing and physical environment, their field of usage, or the timing constraints imposed on them.
Type-0: Multipurpose / non-embedded systems.
We use Type-0 to refer to traditional general-purpose systems.
Type-I: General purpose OS-based devices (e.g., Linux-based).
The Linux OS kernel is widely used in the embedded world. However, in comparison to the traditional GNU/Linux found on desktops and servers, embedded systems typically follow more minimalist approaches. For example, a very common configuration that can be found in consumer-oriented products as well as in Industrial Control Systems (ICS) is based on the Linux kernel coupled with BusyBox and uClibc.
Type-II: Embedded OS-based devices.
These dedicated operating systems target embedded devices and are particularly suitable for devices with low computation power, a constraint typically imposed on embedded systems for cost reasons. Operating systems such as uClinux or FreeRTOS are suitable for systems without a Memory Management Unit (MMU) and are usually adopted on single-purpose user electronics, such as IP cameras, DVD players and Set-Top Boxes (STB).
Type-III: Devices without OS-Abstraction.
These devices adopt a so-called “monolithic firmware”, whose operation is typically based on a single control loop and interrupts triggered by the peripherals in order to handle external events. Monolithic firmware can be found in a large variety of controllers of hardware components, such as CD-readers, WiFi-cards or GPS-dongles.
11.2 Obtaining Firmware and Its Components
Even though complete black box analysis of embedded devices is possible to some degree and in certain situations, obtaining the firmware significantly helps and makes more advanced analyses possible. There are two main ways to obtain the firmware for a given device—as a firmware package (e.g., online, support media) and through extraction from the device itself.
11.2.1 Collecting Firmware Packages
The environments in which embedded systems are deployed are heterogeneous, spanning a variety of devices, vendors, CPU/hardware architectures, instruction sets, operating systems, and custom components. This makes the task of compiling a representative and balanced dataset of firmware packages a difficult problem to solve. The lack of centralized points of collection, such as the ones provided by software/app marketplaces, antivirus vendors, or public sandboxes in the malware analysis field, makes it difficult for researchers to gather large and well triaged datasets. Firmware often needs to be downloaded from vendor Web pages and FTP sites, and it is not always simple, even for a human, to tell whether or not two firmware packages are for the same physical device.
One challenge often encountered in firmware analysis and reverse engineering processes is the difficulty of reliably extracting meta-data from a firmware package. This meta-data might include the device’s vendor, its product code and purpose, its firmware version, or its processor architecture, among countless other details.
11.2.2 Extracting Firmware from Devices
Obtaining the firmware from an online repository as a firmware package is convenient and thus preferred; however, it is not always possible. First, the firmware may not be available, e.g., because there is no update yet, nor one planned. Sometimes the firmware is only distributed through authorized and qualified maintenance agents, e.g., in the case of industrial or critical systems. It is also common that the firmware is not distributed at all, in an attempt to prevent counterfeit products and reverse engineering of the software, or to protect its security.
In such cases the best (and sometimes the only) solution is to extract the firmware from the device itself. There are multiple possible ways to approach this (detailed overviews of the process are available in the literature), each approach having its own set of benefits and issues. In the simplest case, the firmware can be extracted by connecting to a debug interface (e.g., JTAG) or a serial interface (e.g., UART, SPI, I2C). It is important to note that JTAG is a low level protocol and many different mechanisms can be implemented on top of it. Debug mechanisms allow dumping some memories (e.g., ROM, RAM or Flash memories behind a Flash controller), but not necessarily others. When Flash memory is soldered onto a Printed Circuit Board (PCB) and is independent from the processor, it is possible to de-solder it and extract its contents using a Flash programmer/reader. Unfortunately, the variety of Flash memory standards, types and pinouts is huge. One can design their own Flash chip adapter for reading and dumping the memory contents (e.g., code, data). However, some cheap universal programmers may be sufficient for dumping a large number of Flash memory models. Finally, advanced Flash programmers support hundreds of thousands of different Flash memory models.
However, when the device is a Flash microcontroller, the Flash memory is integrated within the microcontroller and is typically not directly accessible. In such cases, the microcontrollers themselves provide mechanisms to access Flash memory areas, but often such mechanisms come with some Flash area protection mechanisms, which are often arbitrary and microcontroller specific. Such protection mechanisms can sometimes be bypassed due to vulnerabilities in the implementation of the protections themselves [447, 556]. However, such attacks may not always succeed, and one may be left with using more costly invasive hardware attacks such as Linear Code Extraction (LCE) or direct memory readout using a microscope as the only option available.
11.2.3 Unpacking Firmware
The next step towards the analysis of a firmware package is to unpack and extract the files or resources it contains. The output of this phase largely depends on the type of firmware, as well as the unpacking and extraction tools employed. In some examples, executable code and resources (such as graphics files or HTML code) might be embedded directly into a binary blob that is designed to be directly copied into memory by a bootloader and then executed. Some other firmware packages are distributed in a compressed and obfuscated package which contains a block-by-block image copy of the Flash memory. Such an image may consist of several partitions containing a bootloader, a kernel, a file system, or any combination of these.
11.2.4 Firmware Unpacking Frameworks
- Binwalk is perhaps the best known and most used firmware unpacking tool, developed by Craig Heffner. It uses pattern matching to locate and carve files from a binary blob. Additionally, it also extracts some meta-data such as license strings.
- The Binary Analysis Toolkit (BAT), formerly known as GPLtool, was originally designed by Armijn Hemel and Tjaldur software in order to detect GPL license violations [269, 558]. To do so, it recursively extracts files from a binary blob and matches strings with a database of known strings from GPL projects and licenses. BAT also supports file carving similar to binwalk, as well as a very flexible plugin-oriented extension interface.
- Firmware.RE extends BAT with additional unpacking methods and specific analyses to perform automated large-scale analyses. When released, it achieved a lower false positive rate when unpacking firmware compared to binwalk.
Table 11.1 Comparison of the unpacking performance of Binwalk, BAT, FRAK and Firmware.RE on a few example firmware packages
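The pattern-matching idea behind tools such as Binwalk can be illustrated with a minimal sketch. The code below is not Binwalk's implementation: the signature table is a tiny, hand-picked assumption (real tools ship far larger databases and validate each hit), and the function names `scan_blob` and `carve` are hypothetical.

```python
# Minimal sketch of signature-based scanning and carving of a firmware blob.
# The table below is a small illustrative assumption, not a tool's database.
SIGNATURES = {
    b"\x1f\x8b\x08": "gzip compressed data",
    b"hsqs": "SquashFS filesystem (little endian)",
    b"\x27\x05\x19\x56": "U-Boot uImage header",
    b"\x7fELF": "ELF executable",
}


def scan_blob(blob: bytes):
    """Return a sorted list of (offset, description) for every signature hit."""
    hits = []
    for magic, description in SIGNATURES.items():
        start = 0
        while True:
            offset = blob.find(magic, start)
            if offset == -1:
                break
            hits.append((offset, description))
            start = offset + 1
    return sorted(hits)


def carve(blob: bytes, hits):
    """Cut the blob at each hit; every piece runs until the next hit."""
    offsets = [offset for offset, _ in hits] + [len(blob)]
    return [blob[offsets[i]:offsets[i + 1]] for i in range(len(hits))]
```

Short magic values inevitably match by chance inside unrelated data, which is one source of the false positives mentioned above; a practical unpacker would check header fields after each hit before carving.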
11.2.5 Modifying and Repacking Firmware
Modifying and repacking a firmware could be one optional step during the analysis of the firmware and device security. The modifications could be performed either at the level of the entire firmware package, or at the level of individually unpacked files (that are finally repacked back into a firmware package). Such a step could be useful in testing several things. First, it can check whether a particular firmware has error, modification and authenticity checks for new versions of firmware. If such checks are missing or improperly implemented, the firmware update mechanism can then be used as an attack vector, or as a way to perform further analysis of the system [57, 162]. Second, it can be used to augment the firmware with additional security-related functionality, such as exploits, benign malware and more advanced analysis tools. For example, this could be useful when there are no other ways to deliver an exploit (e.g., non-network local exploits such as kernel privilege escalation), or to provide some (partial) form of introspection into the running device/firmware.
The firmware-mod-kit tool is perhaps the most well-known (and possibly among the very few) firmware modification tools. Unfortunately, it supports a limited number of firmware formats, and while it can be extended to support more formats, doing so requires substantial manual effort. Further, for some formats it relies on external tools to perform some of the repacking. These tools are developed and maintained by different persons or entities in different shapes and forms, thus there is no uniform way to modify and repack firmware packages.
11.3 Static Firmware Analysis
Once the code is extracted, further analysis can be performed. There are two main classes of analysis that can be performed on a generic computing system: static analysis and dynamic analysis. In principle, the distinction between the two is easy: in static analysis the code is analyzed without executing it, by only reasoning about it, while in the dynamic setting the analysis is performed on the code while it is executed. With more advanced analysis techniques, however, this frontier is slightly blurred. For example, symbolic execution allows one to analyze software by considering some variables to have an unknown value (i.e., they are unconstrained). Symbolic execution is sometimes considered static analysis and at other times dynamic analysis. In this section, we will first describe simple static analysis which can be efficiently performed on firmware packages, then we will discuss more advanced static analysis approaches. Finally, we will cover the limitations of static analysis, and in the next section we focus on the dynamic analysis of firmware packages.
11.3.1 Simple Static Analysis on Firmware Packages
11.3.1.1 Configuration Analysis
For a large majority of complex embedded devices (i.e., those of Type-I as described in Sect. 11.1.5), service configuration is stored within the file-system of the device, while user-configurable information is often stored elsewhere, within a region of memory called Non-Volatile Random Access Memory (NVRAM), which retains its state between power cycles (similar to Flash memory in some ways). Many devices treat NVRAM as a key-value store and include utilities such as nvram-get and nvram-set, as well as dedicated libraries to get and set values stored there. On a router, for example, the current Wi-Fi passphrase and the credentials of the web-based configuration interface will often be stored within the NVRAM, which will be queried by software in order to facilitate the authentication of the device and its services.
All other device configuration is static unless a firmware upgrade is performed. As a result, any hard-coded passwords or certificates, for example, can be leveraged by an adversary to compromise a device. To this end, Costin et al. show many instances where devices are configured with user accounts and passwords that are weak, missing entirely, or stored in plain-text. Therefore, a first step in static analysis of firmware is to examine the configuration of its services: to check for improperly configured services, e.g., due to use of unsafe defaults and hard-coded credentials. Configuration files are of further use in estimating the set of programs utilized and the initial global configuration of a device, in the absence of physical access to it. For example, by examining its boot scripts, we are able to learn which services present in its firmware (among potentially hundreds) are actually utilized by the device. This can aid in reducing the amount of time taken by the more complex analysis approaches described later.
Manual methods are often sufficient for analysis of a few firmware images and, with limited scope, analysis of things such as the device’s configuration. For example, to estimate the set of processes started by a firmware one can inspect the contents of a boot script, e.g., /etc/rcS.
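The manual inspection of a boot script can be partly automated. The following is a minimal sketch, assuming a simple line-oriented shell script; it does not interpret shell syntax (pipelines, nested scripts, or variable expansion are ignored), and the helper name `started_programs` and the keyword list are hypothetical choices made for illustration.

```python
import posixpath

# Shell keywords and builtins that do not correspond to a started program.
# This list is an illustrative assumption and is far from complete.
SHELL_KEYWORDS = {
    "if", "then", "else", "elif", "fi", "for", "do", "done",
    "while", "case", "esac", "echo", "export", "cd",
}


def started_programs(script_text: str):
    """Estimate which programs a boot script launches, in order of appearance."""
    programs = []
    for raw_line in script_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and the shebang
        if not line:
            continue
        first = line.split()[0]
        if first in SHELL_KEYWORDS or "=" in first:  # skip syntax and assignments
            continue
        name = posixpath.basename(first)
        if name not in programs:
            programs.append(name)
    return programs
```

The resulting list can then be used to prioritize which binaries of the unpacked firmware to analyze first.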
11.3.1.2 Software Version Analysis
Many devices are not designed to receive firmware updates. On the one hand, this prevents abusing the firmware update as an attack vector. On the other hand, it prohibits patching against known security vulnerabilities and can often render a device useless to an end-user: when a vulnerability is discovered, the only effective mitigation is to replace the device with a new one.
Many devices are designed to be updated and vendors provide firmware updates. However, the mechanisms for applying those updates are often not standardized and are largely ad-hoc. They also heavily rely on the end-user’s diligence (to identify that an update is available) and action (to actually apply the updates). The end-result of this is that an overwhelming majority of devices are left unpatched against known vulnerabilities. Thus, a further step in the analysis of firmware is to identify the versions of software (both programs and libraries) it contains, and correlate those versions with known vulnerabilities (e.g., CVE database).
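The correlation step described above can be sketched as follows. This is a minimal illustration, not a complete scanner: the banner pattern covers only three component names chosen as examples, the advisory table is entirely HYPOTHETICAL placeholder data (a real analysis would query an actual vulnerability database), and all function names are assumptions.

```python
import re

# Matches version banners such as b"BusyBox v1.19.4" or b"lighttpd/1.4.28".
# The component list and separators are illustrative assumptions.
VERSION_RE = re.compile(
    rb"(BusyBox|dropbear|lighttpd)[ _/-]v?(\d+(?:\.\d+)+)", re.IGNORECASE
)

# HYPOTHETICAL advisory data: (component, first fixed version, advisory id).
ADVISORIES = [("busybox", "1.20.0", "EXAMPLE-ADVISORY-1")]


def find_versions(blob: bytes):
    """Return the set of (component, version) pairs found in a binary blob."""
    return {
        (match.group(1).decode().lower(), match.group(2).decode())
        for match in VERSION_RE.finditer(blob)
    }


def parse_version(version: str):
    """Turn '1.19.4' into (1, 19, 4) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def match_advisories(found, advisories):
    """List (component, version, advisory id) for versions older than the fix."""
    hits = []
    for name, version in sorted(found):
        for component, fixed_in, advisory_id in advisories:
            if name == component and parse_version(version) < parse_version(fixed_in):
                hits.append((name, version, advisory_id))
    return hits
```

Version banners can be stripped or patched by vendors, so a match (or its absence) is only an indication and should be confirmed by other means.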
There are several possible approaches to perform this. For example, fuzzy hashing [340, 507] can be used as a method to correlate files in firmware images. The effectiveness of the approach was demonstrated in several examples, in particular uncovering many IoT and embedded devices being so-called “white label” products. Finally, machine learning can be used to identify firmware images or to search for known vulnerabilities.
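To give an intuition of how near-identical files from different firmware images can be correlated, the sketch below uses a deliberately simpler stand-in for fuzzy hashing: the Jaccard similarity of byte n-gram sets. It is not the algorithm of any fuzzy hashing tool, it does not produce a compact digest, and the function names and the n-gram size are assumptions.

```python
def ngrams(data: bytes, n: int = 4):
    """Return the set of all byte n-grams occurring in data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def similarity(a: bytes, b: bytes, n: int = 4) -> float:
    """Jaccard similarity of the n-gram sets of a and b, between 0.0 and 1.0."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0  # two inputs too short to have n-grams are treated as equal
    return len(grams_a & grams_b) / len(grams_a | grams_b)
```

Two files that differ only in a vendor string score close to 1.0, which is the kind of signal that can reveal rebranded products; unrelated files score close to 0.0.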
11.3.2 Static Code Analysis of Firmware Packages
Developing tools for performing automated static code analysis on embedded device firmware presents a number of complexities compared to performing analyses on software for commodity PC systems (i.e., Type-0 devices). The first challenge is the diversity of CPU architectures. This alone restricts the amount of existing tooling that can be used, and when attempting large-scale analysis, tools will inevitably have to deal with firmware from a number of distinct architectures. To facilitate the analysis in this case, the algorithms will either have to be reimplemented for each architecture being analyzed, or the architecture-specific disassembled firmware instructions will have to be lifted to a common, so-called Intermediate Language (IL) or Intermediate Representation (IR). A further difficulty for simpler devices (e.g., those of Type-III) is the often non-standard means by which different device firmware executes (e.g., it could be interrupt driven) and interacts with the memory and external peripherals. More complex firmware (e.g., that of Type-I devices) tends to more closely follow the execution behavior of more conventional devices (those of Type-0).
11.3.2.1 Code Analysis of Embedded Firmware
Despite the increased complexity of performing automated analysis of embedded device firmware, a number of techniques have been proposed for both targeted and large-scale static analysis. Eschweiler et al. and Feng et al. use numeric feature vectors to perform graph-based program comparisons efficiently. They encode control-flow and instruction information in these feature vectors to identify known vulnerabilities in device firmware. Both methods provide a means of querying a data-set of binaries using a reference vulnerability as input and identifying the subset of binaries that contain constructs that are similar (but not necessarily the same) to those of the input vulnerability. Other work improves the performance of these approaches by relying on neural networks.
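The querying step can be illustrated with a minimal sketch. The feature set used here (basic blocks, edges, calls, instructions) and the use of cosine similarity are assumptions made for illustration; they are not the features or the distance measures of the cited works, and the function names are hypothetical.

```python
import math


def cosine(u, v) -> float:
    """Cosine similarity between two numeric feature vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(reference, corpus):
    """Rank functions of a corpus by similarity to a reference vulnerable function.

    corpus maps a function identifier to its feature vector; the result is a
    list of (score, identifier) pairs, most similar first.
    """
    scored = [(cosine(reference, vector), name) for name, vector in corpus.items()]
    return sorted(scored, reverse=True)
```

Because comparing short numeric vectors is cheap, such a pre-filter can narrow a large data-set of binaries down to a few candidates, which can then be examined with a more expensive graph comparison.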
11.3.2.2 Discovering Backdoors with Static Analysis
Aside from vulnerability discovery, a small body of work has attempted to automatically identify backdoor-like constructs in device firmware. Static analysis is most suited to detecting such constructs due to the fact it can achieve full program coverage. Dynamic analysis is less adequate in this case, as it relies solely on execution traces that can be captured and analyzed stemming from triggering standard program behaviors (which, by definition, a backdoor is not).
HumIDIFy uses a combination of Machine Learning (ML) and static analysis to identify anomalous and unexpected behavior in services commonly found in Linux-based firmware. ML is used first to identify the type of firmware binaries, e.g., a web-server; this then drives classification-specific static analysis on each binary. HumIDIFy attempts to validate that binaries do not perform any functionality outside of what is expected of the type of software they are identified as. For example, HumIDIFy is able to detect a backdoor within a web-server taken from Tenda router firmware that contains an additional UDP listening thread which executes shell commands provided to it (without authentication) as the root user.
Stringer attempts to locate backdoor-like behavior in Linux-based firmware. It automatically discovers comparisons with static data that lead to execution of unique program functionality, which models the situation of a backdoor providing access to undocumented functionality via a hard-coded credential pair or undocumented command. Stringer provides an ordering of the functions within a binary based on how much their control-flow is influenced by static data comparisons that guard access to functionality not otherwise reachable. The authors demonstrate Stringer is able to detect both undocumented functionality and hard-coded credential backdoors in devices from a number of manufacturers.
Firmalice is a tool for detecting authentication bypass vulnerabilities and backdoors within firmware by symbolic execution. It takes a so-called security policy as input, which specifies a condition a program (or firmware) will exhibit when it has reached an authenticated state. Using this security policy, it attempts to prove that it is possible to reach an authenticated state by discovering an input that when given to the program satisfies the conditions to reach that state. To discover such an input, Firmalice employs symbolic execution on a program slice taken from a program point acting as an input source to the point reached that signals the program is in an authenticated state. If it is able to satisfy all of the constraints such that a path exists between these two points, and an input variable can be concretised that satisfies those constraints, then it has discovered an authentication bypass backdoor (and a triggering input); such an input will not be discoverable in a non-backdoored authentication routine. Unfortunately, Firmalice requires a degree of manual intervention to perform its analysis, such as identifying the security policy, input points and privileged program locations. It is therefore not easily adaptable for large-scale analysis.
11.3.2.3 Example Static Analysis to Discover Code Parsers
In order to interact with remote servers or connecting clients (e.g., for remote configuration), most firmware for networked embedded devices will contain client/server components, e.g., a web-server, or proprietary, domain-specific client/server software. In all cases, the firmware itself or software contained within it (for more complex devices) will implement parsers for handling the messages of the protocols required to communicate with corresponding client/server entities. Such parsers are a common source of bugs, whether their implementation incorrectly handles input in a way that causes a memory corruption, or permits an invalid state transition in a protocol’s state machine logic. Thus, identifying these constructs in binary software is useful as a premise to performing targeted analyses. To this end, Cojocar et al. propose PIE, a tool to automatically detect parsing routines in firmware binaries. PIE utilizes a supervised learning classifier trained on a number of simple features of the LLVM IL representation of firmware components known to contain parsing logic. Such features include: basic block count, number of incoming edges to blocks, and number of callers (for functions). PIE provides a means to identify specific functions responsible for performing parsing within an input firmware package, or software component. Stringer, described in Sect. 11.3.2.2, similarly provides a means of automatically identifying parser routines (for text-based input); in addition to identifying routines, it is also able to identify the individual (text-based) commands processed by the parser.
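The kind of structural features mentioned above can be computed from a control-flow graph and a call graph. The sketch below is not PIE's implementation and contains no classifier; it only shows, under the assumption of a toy dictionary-based graph representation and a hypothetical function name, how the three named features could be derived for one function.

```python
def function_features(cfg, call_graph, name):
    """Compute simple structural features of one function.

    cfg maps each basic block of the function to its successor blocks;
    call_graph maps each caller function to the functions it calls.
    """
    incoming = {block: 0 for block in cfg}
    for successors in cfg.values():
        for target in successors:
            incoming[target] = incoming.get(target, 0) + 1
    callers = sum(1 for callees in call_graph.values() if name in callees)
    return {
        "basic_blocks": len(incoming),
        "max_incoming_edges": max(incoming.values(), default=0),
        "callers": callers,
    }
```

A parsing loop that dispatches on the input typically shows up as a block with many incoming edges (every case branches back to the loop head), which is why such features carry signal for a classifier.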
11.4 Dynamic Firmware Analysis
Static analysis is indeed a robust technique that can help discover a wide range of vulnerability classes, such as misconfigurations or backdoors. However, it is not necessarily best suited for other types of vulnerabilities, especially when they depend on the complex runtime state of the program.
Similar to static analysis, powerful dynamic analysis techniques and tools have been developed for traditional systems and general purpose computers. However, the unique characteristics and challenges of the embedded systems make it difficult, if not impossible, to directly apply those proven methods. To this end, there are several distinct directions for dynamic analysis of embedded systems and we briefly discuss them below.
11.4.1 Device-Interactive Dynamic Analysis Without Emulation
When the device is present for analysis, the simplest form of device-interactive dynamic analysis is to test the device in a “black-box” manner. The general idea of this approach is to set up and run the device under analysis as in normal operation (e.g., connect to Ethernet LAN, WLAN, smartphone), then test it with various tools and manual techniques (e.g., generic or specialized fuzzers, web penetration) and observe its behavior via externally observable side-effects such as device reboots, network daemon crashes, or XSS artifacts. Similar approaches and results were reported by several independent and complementary works [259, 280, 292].
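A black-box fuzzing loop of this kind can be sketched as follows. The transport is deliberately left abstract: `send` and `is_alive` are hypothetical callables supplied by the analyst (for instance, a function that writes to a TCP socket and a function that pings the device), and the two mutations are minimal illustrative choices rather than those of any particular fuzzer.

```python
import random


def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Return a mutated copy of seed: overwrite one byte or insert a long run."""
    data = bytearray(seed)
    if data and rng.random() < 0.5:
        data[rng.randrange(len(data))] = rng.randrange(256)
    else:
        position = rng.randrange(len(data) + 1)
        data[position:position] = b"A" * rng.choice([64, 256, 1024])
    return bytes(data)


def fuzz(send, is_alive, seeds, rounds, rng=None):
    """Send mutated inputs and record the input after which the device died.

    Only externally observable liveness is checked, which is the main
    limitation of the black-box setting.
    """
    rng = rng or random.Random(0)
    suspicious = []
    for _ in range(rounds):
        case = mutate(rng.choice(seeds), rng)
        try:
            send(case)
        except OSError:
            pass  # a dropped connection alone is only a hint, not a verdict
        if not is_alive():
            suspicious.append(case)
            break  # the device must be reset before testing can continue
    return suspicious
```

In practice the loop would be wrapped with a device reset step (manual or via a controllable power supply) so that testing can resume after each detected failure.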
While being simple and easy to perform, this type of dynamic analysis has certain limitations, some of which are due to the “black-box” nature of the approach. For example, it is challenging to know what is happening with the entire system/device while the dynamic analysis is performed, i.e., the introspection of the system is missing or is hard to achieve. Also, in this approach it is not easy to control in detail what specifically is being executed and analyzed, the analysis being mostly driven by the data and actions fed to the device. In addition to this, some types of vulnerabilities might not have side-effects that are immediately or externally visible (e.g., a crash of a daemon which does not necessarily expose a network port), therefore those bugs could be missed or misinterpreted during the analysis.
11.4.2 Device-Interactive Dynamic Analysis with Emulation
As an extension to the aforementioned approach, emulation can be coupled with device-interactive dynamic analysis to provide the required depth and breadth, therefore outperforming other static or dynamic analysis methods. The general idea of this approach is to split the execution of the embedded firmware between the analysis host and the actual running device. The analysis host is connected to the device via a debug (e.g., JTAG) or serial (e.g., UART) interface. Therefore one requirement is that the device under analysis must provide at least such an interface, whether documented or not. The analysis host then runs a dynamic analysis environment which is typically an emulator (e.g., QEMU-based) augmented or extended with additional layers and plugins such as symbolic execution and taint analysis. The analysis host has access to the execution and memory states both for the emulator and for the running device. The firmware is being analyzed first in the extended emulator environment. During the firmware emulation and analysis, certain parts of the analyzed firmware are transferred for execution by the analysis host from the emulator to the running device. This is sometimes required, for example, when the firmware needs to perform an I/O operation with a peripheral present on the devices but not in the emulator. The execution and state transfer to and from the device occur via the connected debug or serial interface. On the one hand, by using this approach it is possible to control exactly what is to be analyzed because the emulator is under the full supervision of the analysis host. On the other hand, this approach enables broader and deeper coverage of the execution because the device can complement the execution of firmware parts that are impossible to execute within the emulator.
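The split of execution between the emulator and the device can be illustrated at the level of memory accesses. The sketch below is a conceptual model only: it is not the API of any of the tools in this area, the `device` object stands for a hypothetical proxy that would talk to the real hardware over a debug or serial link, and the address ranges are assumptions supplied by the analyst.

```python
class HybridMemory:
    """Emulator-side memory that forwards peripheral accesses to a real device."""

    def __init__(self, device, peripheral_ranges):
        self.device = device  # hypothetical proxy object with read() and write()
        self.peripheral_ranges = peripheral_ranges  # (start, end) pairs, end exclusive
        self.ram = {}  # ordinary memory is kept locally in the emulator
        self.forwarded = 0  # number of accesses sent to the device

    def _is_peripheral(self, address):
        return any(start <= address < end for start, end in self.peripheral_ranges)

    def read(self, address):
        if self._is_peripheral(address):
            self.forwarded += 1
            return self.device.read(address)
        return self.ram.get(address, 0)

    def write(self, address, value):
        if self._is_peripheral(address):
            self.forwarded += 1
            self.device.write(address, value)
        else:
            self.ram[address] = value
```

Every forwarded access costs a round trip over the debug or serial interface, so the number of forwarded accesses is a rough indicator of how much the physical link will slow down the analysis.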
This is the approach followed by Avatar, which aims at providing symbolic execution with S2E, while Avatar2 focuses on better interoperability with more tools. Prospect explores forwarding at the system call level and Surrogates provides a very fast debug interface. Inception provides an analysis environment to use during testing when source code is available.
11.4.3 Device-Less Dynamic Analysis and Emulation
Performing dynamic analysis in a device-interactive manner certainly has its benefits; however, such an approach has a number of limitations and is hard to fully automate. Firstly, it is not easy to scale the human operator’s interventions and expertise required for many of the tasks related to the approach of device interaction with emulation. Secondly, it is challenging to automate and scale the logistics operations related to acquisition, tear-down, connection, configuration and reset of a large number of devices. Therefore, dynamic analysis techniques that are easier and more feasible to scale and automate are required. One such technique is the device-less analysis based on full or partial emulation.
Davidson et al. [175] presented the FIE tool, which detects bugs in firmware for the MSP430 microcontroller family. FIE leverages KLEE [120] to perform symbolic execution of firmware in order to detect memory safety violations (e.g., buffer overflows and out-of-bounds memory accesses) and misuse of peripherals (e.g., attempted writes to read-only memory). FIE requires the source code, which is rarely available, but it is able to handle a variety of the nuances and challenges faced during automated analysis of firmware, especially when dealing with firmware for Type-III devices. However, every value read from a device I/O register is assumed to be unconstrained (completely symbolic), which leads to a state explosion problem. This limits the size of the programs that can be analyzed.
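The effect of unconstrained I/O reads on the number of states can be illustrated with a small calculation. The Python sketch below is a deliberately simplified model, not FIE or KLEE: it assumes the program branches once on each fresh peripheral read and that the branches are independent, so the number of paths is the product of the feasible outcomes per branch. The names `count_paths` and `is_ready` and the 8-bit register width are assumptions made for illustration.

```python
from typing import Callable, Iterable, Sequence


def count_paths(
    branches: Sequence[Callable[[int], bool]],
    domains: Sequence[Iterable[int]],
) -> int:
    """Count feasible paths when branch i tests a fresh value from domains[i]."""
    paths = 1
    for predicate, domain in zip(branches, domains):
        # A branch forks the state only if both outcomes are feasible.
        outcomes = {bool(predicate(value)) for value in domain}
        paths *= len(outcomes)
    return paths


def is_ready(status: int) -> bool:
    """Branch condition: test the (hypothetical) ready bit of a status register."""
    return bool(status & 0x1)


NUM_READS = 10

# Unconstrained: every read may return any 8-bit value, so each branch forks.
unconstrained = count_paths([is_ready] * NUM_READS, [range(256)] * NUM_READS)

# With a peripheral model that always reports "ready", no branch forks.
modelled = count_paths([is_ready] * NUM_READS, [[0x01]] * NUM_READS)
```

Under these assumptions ten polled reads already yield 2^10 = 1024 paths when the reads are fully symbolic, against a single path when the peripheral value is constrained.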
In [156], the authors perform device-less dynamic security analysis via automated and large-scale emulation of embedded firmware. Similarly, FIRMADYNE [137] presents an automated and scalable system for performing emulation and dynamic analysis of Linux-based embedded firmware.
The general idea of both works is to crawl and then unpack firmware packages into minimal root filesystems (i.e., rootfs) that can subsequently be virtualized and executed as a whole via “system emulation” (as opposed to “user emulation”) using, for example, QEMU [69]. The emulator is first used to start an architecture-specific emulation host OS, such as Debian for ARM or MIPS, depending on the architecture of the device whose firmware is being dynamically analyzed. Then the firmware root filesystem is uploaded to the emulation host OS, where its Linux boot sequence scripts are initiated, most likely in a chroot environment under the emulation host OS. Once the firmware’s Linux boot sequence concludes, various services (e.g., a web server, SSH, telnet, and FTP) of the device/firmware under analysis should be running and ready for logging, tracing, instrumentation and debugging. The work in [137] extends this approach by running a custom operating system kernel which is able to emulate some of the missing drivers.
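One step of such a pipeline, choosing the emulator that matches the firmware's architecture, can be sketched as follows. This is an illustrative Python sketch rather than the code of either system: it reads the byte order and `e_machine` field from the ELF header of a binary found in the unpacked rootfs and builds a QEMU system-emulation command line. The machine types (`malta`, `virt`), the file names, the kernel arguments, and the init script path are assumptions that a real setup would need to adapt.

```python
import struct
from typing import List

# e_machine values defined by the ELF specification.
EM_MIPS, EM_ARM = 8, 40

# Architecture -> (QEMU binary, machine type); the machine types are assumptions.
QEMU_TARGETS = {
    "mipseb": ("qemu-system-mips", "malta"),
    "mipsel": ("qemu-system-mipsel", "malta"),
    "armel": ("qemu-system-arm", "virt"),
}


def detect_arch(elf_header: bytes) -> str:
    """Derive an architecture name from the first 20 bytes of an ELF file."""
    if len(elf_header) < 20 or elf_header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[5] (EI_DATA): 1 = little endian, 2 = big endian.
    byte_order = {1: "<", 2: ">"}.get(elf_header[5])
    if byte_order is None:
        raise ValueError("unknown ELF data encoding")
    (machine,) = struct.unpack_from(byte_order + "H", elf_header, 18)
    family = {EM_MIPS: "mips", EM_ARM: "arm"}.get(machine)
    if family is None:
        raise ValueError(f"unsupported e_machine: {machine}")
    return family + ("el" if byte_order == "<" else "eb")


def qemu_command(arch: str, kernel: str, disk_image: str, kernel_args: str) -> List[str]:
    """Build a system-emulation command line for the emulation host OS."""
    if arch not in QEMU_TARGETS:
        raise ValueError(f"no emulator configured for {arch}")
    binary, machine = QEMU_TARGETS[arch]
    return [binary, "-M", machine, "-kernel", kernel,
            "-drive", f"file={disk_image},format=raw",
            "-append", kernel_args, "-nographic"]


def chroot_command(rootfs: str, init_script: str) -> List[str]:
    """Command run inside the emulation host to start the firmware boot scripts."""
    return ["chroot", rootfs, "/bin/sh", init_script]


# Synthetic header of a big-endian MIPS executable (e_type = 2, e_machine = 8).
header = b"\x7fELF" + bytes([1, 2, 1]) + bytes(9) + struct.pack(">HH", 2, EM_MIPS)
arch = detect_arch(header)
command = qemu_command(arch, "vmlinux", "debian.img", "root=/dev/sda1 console=ttyS0")
boot = chroot_command("/firmware-rootfs", "/etc/init.d/rcS")
```

In practice the header bytes would come from a binary inside the unpacked rootfs (for example its shell or `busybox`), and the resulting command lists could be passed to `subprocess.run`.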
We have provided a short overview of the field: from our excursus, it is clear that analyzing the software of IoT/embedded devices and finding security vulnerabilities within them is still a challenging task. While multiple directions and techniques are being actively explored and developed within the field, more research, insights and tools are still required.
Unfortunately, the existing proven techniques (e.g., static, dynamic, hybrid analysis) cannot be applied in a straightforward manner to embedded devices and their software/firmware. One reason for this is the high heterogeneity and fragmentation of the technological space that supports embedded/IoT systems. Another reason is the “opaque” nature of embedded devices, which can be seen as akin to the “security by obscurity” principle. Such reasons make embedded systems harder to analyze compared to more traditional systems.
Indeed, the current embedded firmware “population” may still contain many latent backdoors and vulnerabilities, both known and unknown. However, as we detailed in this chapter, positive and promising avenues for the detection of embedded software bugs are becoming increasingly available. Such avenues include large-scale analysis and correlation techniques, hybrid/dynamic analysis of emulated firmware or running devices, and advanced techniques to specifically detect backdoors.
- 38. Autoelectric. XGecu TL866II. http://autoelectric.cn/EN/TL866_main.html.
- 57. Zachry Basnight, Jonathan Butts, Juan Lopez Jr., and Thomas Dube. Firmware modification attacks on programmable logic controllers. International Journal of Critical Infrastructure Protection, 2013.
- 69. Fabrice Bellard. Qemu, a fast and portable dynamic translator. In USENIX Annual Technical Conference, FREENIX Track, volume 41, page 46, 2005.
- 75. Emma Benoit, Guillaume Heilles, and Philippe Teuwen. Quarkslab blog post: Flash dumping, September 2017. https://blog.quarkslab.com/flash-dumping-part-i.html.
- 120. Cristian Cadar, Daniel Dunbar, and Dawson Engler. KLEE: Unassisted and Automatic Generation of High-coverage Tests for Complex Systems Programs. In Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, OSDI ’08, 2008.
- 125. Giovanni Camurati and Aurélien Francillon. Inception: system-wide security testing of real-world embedded systems software. In USENIX Security Symposium, 2018.
- 137. Daming D Chen, Manuel Egele, Maverick Woo, and David Brumley. Towards automated dynamic analysis for linux-based embedded firmware. In ISOC NDSS 2016, 2016.
- 150. Lucian Cojocar, Jonas Zaddach, Roel Verdult, Herbert Bos, Aurélien Francillon, and Davide Balzarotti. PIE: Parser Identification in Embedded Systems. Annual Computer Security Applications Conference (ACSAC), December 2015.
- 151. SEC Consult. House of Keys: Industry-Wide HTTPS Certificate and SSH Key Reuse Endangers Millions of Devices Worldwide. Blog, Nov, 25, 2015.
- 155. Andrei Costin, Jonas Zaddach, Aurélien Francillon, and Davide Balzarotti. A Large Scale Analysis of the Security of Embedded Firmwares. In Proceedings of the 23rd USENIX Security Symposium (USENIX Security), August 2014.
- 156. Andrei Costin, Apostolis Zarras, and Aurélien Francillon. Automated Dynamic Firmware Analysis at Scale: A Case Study on Embedded Web Interfaces. In 11th ACM Asia Conference on Computer and Communications Security (ASIACCS), ASIACCS 16, May 2016.
- 157. Andrei Costin, Apostolis Zarras, and Aurélien Francillon. Towards automated classification of firmware images and identification of embedded devices. In IFIP International Conference on ICT Systems Security and Privacy Protection, pages 233–247. Springer, 2017.
- 158. Franck Courbon, Sergei Skorobogatov, and Christopher Woods. Reverse engineering flash EEPROM memories using scanning electron microscopy. In Smart Card Research and Advanced Applications - 15th International Conference, CARDIS 2016, pages 57–72, 2016.
- 161. Ang Cui. Embedded Device Firmware Vulnerability Hunting with FRAK. DefCon 20, 2012.
- 162. Ang Cui, Michael Costello, and Salvatore J Stolfo. When Firmware Modifications Attack: A Case Study of Embedded Exploitation. In Proceedings of the 20th Symposium on Network and Distributed System Security, NDSS ’13. The Internet Society, 2013.
- 163. Ang Cui and Salvatore J. Stolfo. Defending Embedded Systems with Software Symbiotes. In Robin Sommer, Davide Balzarotti, and Gregor Maier, editors, Recent Advances in Intrusion Detection, volume 6961 of Lecture Notes in Computer Science, pages 358–377. Springer, 2011.
- 171. Lyla B Das. Embedded Systems: An Integrated Approach. Pearson Education India, 2012.
- 175. Drew Davidson, Benjamin Moench, Thomas Ristenpart, and Somesh Jha. FIE on Firmware: Finding Vulnerabilities in Embedded Systems Using Symbolic Execution. In Proceedings of the 22nd USENIX Security Symposium, SEC ’13, 2013.
- 191. Thomas Dullien and Rolf Rolles. Graph-based comparison of executable objects. In Symposium sur la Sécurité des Technologies de l’Information et des Communications, SSTIC ’05, 2005.
- 198. Elnec. Elnec beeprog2. https://www.elnec.com/en/products/universal-programmers/beeprog2/.
- 202. Sebastian Eschweiler, Khaled Yakdan, and Elmar Gerhards-Padilla. discovRE: Efficient Cross-Architecture Identification of Bugs in Binary Code. In ISOC NDSS 2016, 2016.
- 212. Qian Feng, Rundong Zhou, Chengcheng Xu, Yao Cheng, Brian Testa, and Heng Yin. Scalable Graph-based Bug Search for Firmware Images. In ACM CCS 2016, 2016.
- 241. Dan Goodin. Record-breaking DDoS reportedly delivered by > 145k hacked cameras. Ars Technica, September 2016.
- 259. Mária Hatalová. Security of small office home routers. PhD thesis, Masarykova univerzita, Fakulta informatiky, 2015.
- 261. Steve Heath. Embedded systems design. Newnes, 2002.
- 262. C Heffner and J Collake. Firmware mod kit-modify firmware images without recompiling, 2015.
- 263. Craig Heffner. binwalk – firmware analysis tool designed to assist in the analysis, extraction, and reverse engineering of firmware images. https://github.com/ReFirmLabs/binwalk.
- 269. Armijn Hemel, Karl Trygve Kalleberg, Rob Vermaas, and Eelco Dolstra. Finding Software License Violations Through Binary Code Clone Detection. In Proceedings of the 8th Working Conference on Mining Software Repositories, MSR ’11. ACM, 2011.
- 271. Alex Hern. Revolv devices bricked as Google’s Nest shuts down smart home company. The Guardian, April 2016. https://www.theguardian.com/technology/2016/apr/05/revolv-devices-bricked-google-nest-smart-home.
- 280. Hewlett Packard Enterprise (HPE). Internet of things research study – 2015 report, 2015.
- 292. Independent Security Evaluators. Exploiting SOHO Routers, April 2013.
- 312. Markus Kammerstetter, Christian Platzer, and Wolfgang Kastner. Prospect: peripheral proxying supported embedded code testing. In Proceedings of the 9th ACM symposium on Information, computer and communications security, pages 329–340. ACM, 2014.
- 340. Jesse D. Kornblum. Identifying Almost Identical Files Using Context Triggered Piecewise Hashing. In Proceedings of the Digital Forensic Workshop, 2006.
- 341. Karl Koscher, Tadayoshi Kohno, and David Molnar. Surrogates: enabling near-real-time dynamic analyses of embedded systems. In Proceedings of the 9th USENIX Conference on Offensive Technologies. USENIX Association, 2015.
- 344. Brian Krebs. KrebsOnSecurity Hit With Record DDoS. Krebs On Security, September 2016.
- 345. Brian Krebs. Who Makes the IoT Things Under Attack? Krebs On Security, October 2016.
- 429. Marius Muench, Dario Nisi, Aurélien Francillon, and Davide Balzarotti. Avatar2: A Multi-target Orchestration Platform. In Workshop on Binary Analysis Research (colocated with NDSS Symposium), BAR 18, February 2018.
- 430. Marius Muench, Jan Stijohann, Frank Kargl, Aurélien Francillon, and Davide Balzarotti. What you corrupt is not what you crash: Challenges in fuzzing embedded devices. In ISOC NDSS 2018, 2018.
- 439. Marcus Niemietz and Jörg Schwenk. Owning your home network: Router security revisited. In 9th Workshop on Web 2.0 Security and Privacy (W2SP) 2015, 2015.
- 447. Johannes Obermaier and Stefan Tatschner. Shedding too much light on a microcontroller’s firmware protection. In 11th USENIX Workshop on Offensive Technologies (WOOT 17), Vancouver, BC, 2017. USENIX Association.
- 507. Vassil Roussev. Data Fingerprinting with Similarity Digests. In IFIP International Conference on Digital Forensics, pages 207–226, 2010.
- 528. Yan Shoshitaishvili, Ruoyu Wang, Christophe Hauser, Christopher Kruegel, and Giovanni Vigna. Firmalice-automatic detection of authentication bypass vulnerabilities in binary firmware. In NDSS, 2015.
- 529. O. Shwartz, Y. Mathov, M. Bohadana, Y. Oren, and Y. Elovici. Reverse engineering iot devices: Effective techniques and methods. IEEE Internet of Things Journal, pages 1–1, 2018.
- 549. Olivier Thomas and Dmitry Nedospasov. On the impact of automating the ic analysis process. BlackHat 2015, August 2015.
- 550. Sam L. Thomas, Tom Chothia, and Flavio D. Garcia. Stringer: Measuring the Importance of Static Data Comparisons to Detect Backdoors and Undocumented Functionality. In Proceedings of the 22nd European Symposium on Research in Computer Security, ESORICS ’17, 2017.
- 551. Sam L. Thomas and Aurélien Francillon. Backdoors: Definition, Deniability and Detection. In Symposium on Research in Attacks, Intrusion, and Defenses (RAID). Springer, September 2018.
- 552. Sam L. Thomas, Flavio D. Garcia, and Tom Chothia. HumIDIFy: A Tool for Hidden Functionality Detection in Firmware. In Proceedings of the 14th International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, DIMVA ’17, 2017.
- 556. Andrew Tierney (@cybergibbons). Bypassing code readout protections on microcontrollers, January 2018.
- 558. Tjaldur Software Governance Solutions. Binary Analysis Tool (BAT).
- 564. S. Vasile, D. Oswald, and T. Chothia. Breaking all the things - a systematic survey of firmware extraction techniques for iot devices. In CARDIS, 2018.
- 585. Xiaojun Xu, Chang Liu, Qian Feng, Heng Yin, Le Song, and Dawn Song. Neural network-based graph embedding for cross-platform binary code similarity detection. In ACM SIGSAC Conference on Computer and Communications Security, CCS ’17, 2017.
- 591. Jonas Zaddach, Luca Bruno, Aurélien Francillon, and Davide Balzarotti. Avatar: A Framework to Support Dynamic Security Analysis of Embedded Systems’ Firmwares. In NDSS 2014, February 2014.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.