An example from 2015 illustrates how compilers can be used to spread malware. Xcode is Apple’s development tool for iOS applications. Attackers added infectious malware to Xcode and uploaded the modified version to a Chinese file-sharing service. Chinese iOS developers downloaded the malicious version of Xcode, compiled iOS applications with it, inadvertently creating infected executables, and then distributed these infected executables through Apple’s App Store [9]. This technique has allegedly long been known to the CIA [5], which is claimed to have exploited Xcode in this way to add malware to iOS applications.

In this chapter, we consider the processes behind the production and maintenance of information and communications technology (ICT) equipment. We discuss how hardware and executable software running on this hardware are produced and maintained. Our discussion is not related to the traditional fields of software or hardware engineering. Rather, we look at the process from a toolchain point of view, to understand from which vantage points a vendor can introduce hidden functionality into the equipment. Finally, we discuss the advantages, from the perpetrator’s point of view, of using development and production line tools to insert malicious functionality into a product, and we clarify the implications of this for the equipment buyer.

4.1 Software Development

The executable code running on a device is far from easily understood by humans. We therefore distinguish between the source code and the executable code. The source code of a program is written in a programming language that is designed to be humanly understandable. The source code of the program needs to be translated into executable code before it is ready to be executed on a device. This translation is carried out by a compiler. The process of producing the code that actually runs on a device is illustrated by the three boxes at the bottom of Fig. 4.1.
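The translation step can be made concrete with a small sketch. Here, Python’s built-in in-memory compilation stands in for a C compiler producing a binary; the point is only that what the machine runs is the translated form, not the source text itself.

```python
# A toy illustration of the source-to-executable translation step.
# Python's built-in compile() stands in for a real compiler here:
# it turns human-readable source text into an executable code object.
source = (
    "def greet(name):\n"
    "    return 'hello, ' + name\n"
)

executable = compile(source, "<toy>", "exec")  # the translation step
namespace = {}
exec(executable, namespace)                    # run the executable form

print(namespace["greet"]("world"))             # prints "hello, world"
```

The device (here, the Python interpreter) only ever sees `executable`; the readable `source` plays no part at run time.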

Fig. 4.1

The structure of compiler code and equipment source code that eventually make up the executable running on the equipment. The first compiler is programmed directly in executable code. Version 2 of the compiler is programmed in a high-level language defined by the compiler. Version 1 of the compiler is then used to create an executable form of version 2 of the compiler. This iterative process is repeated for each new compiler generation. Thompson demonstrated how malware inserted into any historic version of the compiler can survive forever down the development chain and eventually result in backdoors inserted into present-day executable code by present-day compilers

The compiler that produces the executable code is itself a program. It was written in a programming language and was itself compiled by a compiler. That compiler was, in turn, compiled by another compiler and, thus, the full structure of programs, source code, and executable code that leads to the executable of a product is quite complex. A simplified picture of the dependencies is given in Fig. 4.1. Many aspects are left out of this figure: We have not shown how this structure allows new programming languages to emerge, we have not considered how compilers for programs running on new hardware architectures are built, and we have not considered how pre-compiled libraries or elements of an operating system would interact with the final executable. For our discussion, it suffices to know that, for most modern programs, the dependency chain backwards to older compilers, all the way back to the first compiler that was written in executable code, is quite long and, for some very common programming languages, can be traced back to the 1970s [4].
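The chain of compiler generations can be mimicked in a few lines. In this toy model, Python’s `exec` plays the role of running executable code and each compiler generation is produced by the one before it; all names are invented for illustration.

```python
def first_compiler(src):
    """Generation 0: imagine this one hand-written directly in executable code."""
    namespace = {}
    exec(src, namespace)
    return namespace

# Source code of every later compiler generation, written in the
# high-level language (here: Python itself).
COMPILER_SRC = (
    "def compile_program(src):\n"
    "    namespace = {}\n"
    "    exec(src, namespace)\n"
    "    return namespace\n"
)

# Generation 1 is built by generation 0; each later generation is
# built by its predecessor, mirroring the chain in Fig. 4.1.
gen = first_compiler(COMPILER_SRC)["compile_program"]
for _ in range(3):  # generations 2 through 4
    gen = gen(COMPILER_SRC)["compile_program"]

program = gen("answer = 6 * 7")  # finally, compile an ordinary program
```

Every executable compiler in the chain depends on all of its ancestors, which is exactly what makes the dependency chain back to the first hand-written compiler so long.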

It is generally accepted that executable code is humanly understandable only with a tremendous amount of effort (see Chap. 6 for more details). The source code of a program is therefore at the centre of many discussions. It is usual for companies to release only the executable code of their products, to ensure that their ideas are not copied by competitors. The debates and initiatives around open-source software further illustrate how opaque executable code is. Equipment vendors have therefore occasionally countered customer worries by making the source code of their products available.

Given the above, the following question presents itself: Where in the process of Fig. 4.1 can malicious code be inserted so that the intended malicious functionality is part of the final executable? It is obvious that such code can be part of the source code of the product itself, but can it be placed in the compiler without altering the source code of the product? The answer is given by Ken Thompson [11] in his Turing Award lecture in 1983. In it, he gave a very simple example of how a Trojan horse can be inserted into the final executable of a product through the source code of any single one of the compiler versions that, at some point in the past, played a part in producing that executable.

What Thompson demonstrated is how one could alter the C compiler so that it introduces a backdoor into the UNIX operating system whenever the operating system is compiled. Furthermore, he showed how one could code this property into a ‘gene’ so that any later version of the C compiler would inherit this capability. Finally, he showed how to remove any trace that this had happened from the source code of both the compiler and the operating system. The backdoor would be inserted into any future version of UNIX by any future version of the C compiler and neither the UNIX developers nor future developers of the C compilers would ever know.
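The essence of the construction can be sketched as follows. This is a heavily simplified toy model, not Thompson’s actual construction: Python functions stand in for executables, the target program and passwords are invented, and the self-reproduction is modelled by calling a factory function rather than by Thompson’s quine technique.

```python
# A toy model of Thompson's attack. Python functions stand in for
# executables and exec() stands in for running compiled code.

CLEAN_COMPILER_SRC = (
    "def compile_program(src):\n"
    "    namespace = {}\n"
    "    exec(src, namespace)\n"
    "    return namespace\n"
)

LOGIN_SRC = (
    "secret = 'letmein'\n"
    "def login(password):\n"
    "    return password == secret\n"
)

def make_trojaned_compiler():
    def compile_program(src):
        # Gene 1: recognize the target program and plant a backdoor.
        if "def login" in src:
            src = src.replace("password == secret",
                              "password in (secret, 'joshua')")
        # Gene 2: recognize a compiler being compiled and, whatever its
        # (clean) source says, emit another trojaned compiler.
        if "def compile_program" in src:
            return {"compile_program": make_trojaned_compiler()}
        namespace = {}
        exec(src, namespace)
        return namespace
    return compile_program

# One trojaned generation is enough to poison all that follow:
gen2 = make_trojaned_compiler()(CLEAN_COMPILER_SRC)["compile_program"]
program = gen2(LOGIN_SRC)
# Neither CLEAN_COMPILER_SRC nor LOGIN_SRC contains the backdoor,
# yet the compiled login accepts the intruder's password 'joshua'.
```

Both source files are clean and can be inspected freely; the backdoor lives only in the executable generations of the compiler.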

The insights of Thompson are well known in the scientific community. Still, as pointed out by [4], such methods are not considered a real threat to the security of computer systems. The danger of someone using a compiler to inject backdoors to break into some system yet to be made has been considered unlikely. If we put ourselves in the position of a company that wishes to include backdoors into its products without being caught, the discoveries of Thompson can be viewed from a different angle. Such a company would like to leave as few traces as possible of this ever having happened. Using the compiler to introduce backdoors would leave no trace in the source code, so the code can safely be given to any suspicious customer. Furthermore, such a company would benefit from keeping knowledge of the secret confined to as small a group of people as possible to minimize the risk of information on their actions leaking out. Altering compiler tools means that the development team itself need not know that the backdoors are being introduced.

From the above discussion, we now draw the following conclusions:

  • The absence of malicious elements from the source code of a software product does not prove that such elements do not exist in the executable code.

  • If a vendor wants to install malicious code in a product, it is not necessary for the development team to be aware of this. The malicious code can be installed in the compiler tools that the developers are instructed to use.

4.2 Hardware Development

Putting malicious functionality into hardware can be very effective in gaining control over an entire system. Furthermore, it could require very little space on the integrated circuit. As an example, it has been demonstrated that a backdoor can be inserted into an entire system by adding as few as 1,341 additional gates to a chip [7]. These additional gates inspect the checksum of each incoming IP packet and, when a packet with the right checksum arrives, install that packet’s payload as new firmware on the processor. The firmware could, in principle, do anything; what is described in the paper by King et al. [7] is an attack where the installed firmware gives a particular login username root access to the system.
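The trigger mechanism can be illustrated with a small software simulation. The magic value, packet handling, and crafting routine below are all invented for illustration; the actual hardware design in [7] differs in its details.

```python
# A software simulation of the kind of trigger described above.
MAGIC = 0x1F2E  # hypothetical trigger value

def internet_checksum(data: bytes) -> int:
    """Standard 16-bit ones'-complement Internet checksum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(int.from_bytes(data[i:i + 2], "big")
                for i in range(0, len(data), 2))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

class BackdooredProcessor:
    def __init__(self):
        self.firmware = b"legitimate firmware"

    def receive(self, packet: bytes):
        # The hidden gates: when a packet's checksum hits MAGIC,
        # its payload silently replaces the firmware.
        if internet_checksum(packet) == MAGIC:
            self.firmware = packet

def craft_trigger(payload: bytes) -> bytes:
    """Brute-force a 2-byte suffix that makes the checksum hit MAGIC."""
    for suffix in range(0x10000):
        packet = payload + suffix.to_bytes(2, "big")
        if internet_checksum(packet) == MAGIC:
            return packet
    raise ValueError("no suffix found")
```

Ordinary traffic leaves the firmware untouched, while a single crafted packet replaces it; nothing in the device’s documented behaviour reveals the difference.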

The process for the development of integrated circuits is somewhat different from that of software. Still, there are similarities, in the sense that hardware can be defined through a high-level language, where the details of gates and transistors are abstracted away. The transition from descriptions understandable by humans into a physical chip coming off a production line will go through a process that can roughly be described as follows:

  1. An algorithmic description of the desired behaviour (usually specified in a dialect of the C programming language) is synthesized into a register-transfer level (RTL) hardware design language by a high-level synthesis tool [8].Footnote 1

  2. The RTL is translated into a gate-level description of the chip by a logic synthesis tool [10].

  3. The gate-level description is used by production lines to produce the actual chip.
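A toy version of step 2 conveys the idea: a boolean expression (standing in for RTL) is mechanically translated into a gate-level netlist, and a rigged tool can quietly append gates that no source description mentions. All names here are invented for illustration.

```python
# A toy logic synthesis tool: "synthesize" a boolean expression
# into a gate-level netlist of (GATE, inputs..., output_wire) tuples.
import ast

def synthesize(expr: str):
    """Translate e.g. 'a and (b or not c)' into a list of gates."""
    gates = []

    def walk(node):
        if isinstance(node, ast.Name):           # an input signal
            return node.id
        if isinstance(node, ast.UnaryOp):        # 'not' -> NOT gate
            src = walk(node.operand)
            wire = f"n{len(gates)}"
            gates.append(("NOT", src, wire))
            return wire
        kind = "AND" if isinstance(node.op, ast.And) else "OR"
        inputs = [walk(v) for v in node.values]  # 'and'/'or' -> AND/OR gate
        wire = f"n{len(gates)}"
        gates.append((kind, *inputs, wire))
        return wire

    walk(ast.parse(expr, mode="eval").body)
    return gates

def malicious_synthesize(expr: str):
    """Same interface, but quietly adds gates that no RTL mentions."""
    gates = synthesize(expr)
    gates.append(("AND", "dbg_pin", gates[-1][-1], "hidden_unlock"))
    return gates
```

The designer reviews only the expression; the extra `hidden_unlock` logic appears nowhere in it, yet ends up on the chip.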

Again, we acknowledge that our description is vastly simplified. We have not considered difficulties related to, for example, layout or thermal issues. Still, the model is sufficient for the conclusions we need to draw.

In the case of integrated circuits, malicious functionality can, of course, be placed directly into the RTL or the algorithmic description of the chip. Learning from Sect. 4.1 above, we also need to consider the synthesis tools that translate the hardware descriptions from humanly understandable languages down to gate-level descriptions. The important observation here is that all of these synthesis tools are themselves pieces of software. They are therefore subject to the same considerations made at the end of the previous section. Knowing that as few as 1,341 extra gates on a chip can leave an entire system wide open, it is easy to see that any tool involved in the making of a chip can insert serious malicious functionality into it. Such functionality could even be inserted by the production line that produces the final chip from the gate-level descriptions [1, 3, 6]. We are therefore forced to draw similar conclusions for hardware as we did for software:

  • The absence of malicious elements from the source code of a hardware product does not prove that such elements do not exist in the chip.

  • If a vendor wants to install malicious code in a chip, it is not necessary for the development team to be aware of this. The malicious functionality can be installed in the synthesis tools that the developers use.

4.3 Security Updates and Maintenance

An electronic device consists of two parts: one is the hardware that the customer can touch and feel and the other is the software that guides the device to behave as intended.Footnote 2 Whereas the hardware of a device can be considered fixed at the time of purchase, the software of the device is generally updated and changed several times during the lifetime of the device.

The reasons for updating the software running on network equipment can be the following:

  1. There could be a bug in the device that needs to be fixed.

  2. New and optimized code could have been developed that increases the performance of the device.

  3. New functionality that was defined after the purchase of the device, for example, new protocol standards, needs to be supported.

  4. New security threats have emerged, creating the need for new protection mechanisms in the equipment.

In particular, points 3 and 4 in the above list make software updates inevitable. The requirement that equipment support new protocols that will be defined after the time of purchase will prevail for the foreseeable future. Furthermore, the security threats that the equipment must handle will continuously take on new forms.

Software updates come in different sizes, depending on what part of the software is updated. Device driver updates will have different implications than an update of the operating system of the entire device. For our discussion, we do not need to analyse these differences in more depth. It suffices to observe that changes to the operating system and the device drivers will be required from time to time.

Obviously, since software updates are themselves software, they are subject to the same conclusions as those we drew in Sect. 4.1. In addition, we can draw the following from the discussion in this section:

  • Malicious elements in the software code on a device can be introduced through software updates at any time in the life cycle of the device.

  • If a vendor, at some point in time, wants to install malicious code in a sold device through a software update, it is not necessary that the development team of the update be aware of this. The malicious code can be installed in the compiler tools that developers use.

4.4 Discussion

Protecting an ICT infrastructure from attacks and data leakage is a daunting task. We are currently witnessing an arms race between the developers of equipment and intruders in which ever more sophisticated attacks are designed and need to be countered with ever more sophisticated defence mechanisms. When we assume that the perpetrators could be the designers of the equipment themselves, the issue looks entirely different. The problem is new and no real arms race has started. The only technical discussion of the matter that we are aware of arose when Huawei suggested making its software source code available to its customers [2] in response to doubts over whether it could be trusted.

Fig. 4.2

Schematic overview of the elements that contribute to the final integrated product

There is every reason to applaud the effort made by Huawei to build trust through openness. Still, there are two separate and independent reasons why giving access to the software source code comes nowhere near giving insight into the true capabilities of an electronic system. First, it has been demonstrated beyond any doubt that full backdoors into a system can be created with a microscopic number of additional gates on a chip. A backdoor existing in the hardware will clearly not be visible in the software running on top of it. Second, the code running on a machine is not the source code, but instead some executable code that was generated from the source code by a compiler. Again, it has been demonstrated beyond any doubt that backdoors can be introduced into the executable code by the compiler and, thus, not even a software-based backdoor need be visible in the source code.

Figure 4.2 shows a schematic of an integrated electronic product and parts of the production line leading up to it. The yellow boxes represent the product itself, coarsely divided up into one hardware layer and three software layers. The green boxes are representations of parts of the final product that are visible to the engineers building it. These are the source code of the software and the high-level representations of the hardware. The blue boxes represent the tools that transform the source code representations into the finished product. As we have seen above, a backdoor can be inserted by any tool used in the process, so any blue box is a potential point of attack. To fathom the full complexity of the design process, we also need to understand that all of the tools in this chain – all of the blue boxes – are themselves electronic products that are built in exactly the same way, with source code, compilers, synthesis tools, logic synthesis tools, and a hardware production line. The recursion of tools that is implied by this observation is illustrated for compilers alone in Fig. 4.1.

Defence against an untrusted equipment maker should therefore focus on the actual hardware that has been purchased and on the actual machine code running on the system. It is depressingly simple for a dishonest equipment provider to introduce unwanted functionality through its development tools, and this approach has the clear advantage for the provider that very few people in the company need to know about it. Looking at source code and verifying the toolchain itself is most likely futile; we would probably end up in a recursive pit of tools building tools. For compilers alone, we are easily faced with a sequence of tools going all the way back to the early days of computing.
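One modest step in this direction is to fingerprint the binaries actually deployed and compare them against reference digests obtained through an independent channel. The sketch below (invented file names, standard hashing only) shows the mechanics; note that it detects divergence between two copies, not malicious intent, and a reference digest produced by the same untrusted toolchain proves nothing.

```python
# Fingerprint the machine code actually installed on a system, so it
# can be compared against a reference obtained out of band.
import hashlib

def fingerprint(path: str, algo: str = "sha256") -> str:
    """Hash the bytes of an installed binary or firmware image."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path: str, reference_digest: str) -> bool:
    """Compare against a digest obtained via an independent channel."""
    return fingerprint(path) == reference_digest
```

The value of such a check rests entirely on where the reference digest comes from, which brings us straight back to the trust questions raised in this chapter.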