Constructing software countermeasures against instruction manipulation attacks: an approach based on vulnerability evaluation using fault simulator

Fault injection attacks (FIA), which cause information leakage by injecting intentional faults into the data or operations of devices, are one of the most powerful methods of compromising the security of confidential data stored on these devices. Previous studies related to FIA report that attackers can skip instructions running on many devices through many means of fault injection. Most existing software-based anti-FIA countermeasures are designed to protect against instruction skip (IS). On the other hand, recent studies report that attackers can use laser fault injection to manipulate instructions running on devices as they want. Although previous studies have shown that instruction manipulation (IM) can defeat the existing countermeasures against IS, no effective countermeasures against IM have been proposed. This paper is the first work tackling this problem, aiming to construct software-based countermeasures against IM faults. Constructing such countermeasures requires evaluating program vulnerabilities to IM faults. We propose three IM simulation environments for that purpose and compare them to reveal their performance differences. The GDB (GNU debugger)-based simulator that we newly propose in this paper outperforms the QEMU-based simulator that we previously presented in AICCSA:1–8, 2020, being up to $\times 400$ faster in terms of evaluation time. Evaluating a target program using the proposed IM simulators reveals that the IM faults leading to successful attacks are classified into four classes. We propose secure coding techniques as countermeasures against each of the four classes of IM and show the effectiveness of the countermeasures using the IM simulators.


Introduction
Modern ICT (Information and Communication Technology) society is supported by many devices, such as smartphones and IC cards, containing confidential information. These devices provide confidentiality of internal data by using various kinds of security functions, such as access control and cryptography. Fault injection attacks (FIA) [1], which physically inject intentional faults into devices' operations, can bypass or corrupt the normal operation of the security functions. Increasing the number of devices around us in the IoT era enables attackers to obtain physical access to the target devices easily; therefore, protecting the devices from FIA becomes more critical to realize a secure information society.
Modeling the possible faults [2] that attackers can inject is a necessary step in constructing countermeasures against FIA. Previous studies report that many fault injection means can skip assembly instructions running on many types of architecture. Such a fault effect is modeled as the instruction skip (IS) model and is widely studied as a possible threat to cryptography on actual devices [3,4]. Some studies report that injecting IS at short intervals is difficult, and they propose anti-IS countermeasures under the assumption that IS does not occur at consecutive instructions [5,6].
The instruction replacement (IR) model is another software-level fault model, in which attackers can make the devices run faulty instructions, that is, instructions that replace the originally programmed ones. Some experimental results report that IR faults were observed on some devices [7][8][9]. Most of the IR faults in these studies are uncontrollable; therefore, some studies conclude that IR faults are not useful for establishing successful attacks [7]. On the other hand, recent studies [10][11][12][13] show that attackers can manipulate the instructions running on the devices, partly but as the attackers want, by laser fault injection (LFI). We call such faults instruction manipulation (IM) to distinguish them from the above (uncontrollable) IR. IM faults are considerably more powerful than conventional IS and IR faults; however, the IM faults in the previous studies have some restrictions because they are injected by single-spot laser equipment. One of the significant restrictions is that a single-spot laser can manipulate only one specific bit of each target instruction. Even under this restriction, IM faults pose a significant threat to devices. For example, manipulating one bit of a branch instruction can cause a control flow alteration. To prevent such attacks by LFI, hardware countermeasures such as light detectors are essentially effective. Nevertheless, some manufacturers demand anti-FIA countermeasures without any hardware modification for cost reasons.

Related works
In the general context of fault tolerance, faults such as soft errors caused by radiation occur at random; therefore, time/space redundancy is effective since the probability that faults occur in the redundant parts at the same time is considered negligible. Some compiler-based and formally provable fault mitigation methods achieve good results in such cases [14,15]. On the other hand, in the context of FIA, attackers can intentionally inject faults into vulnerable points. One challenge in constructing software-based anti-FIA countermeasures is that the countermeasure part of the software can also be fault-injected by attackers. Previous studies related to anti-FIA countermeasures deal with faults in the countermeasure part by restricting the number or the frequency of faults. Moro et al. [6] propose an anti-IS countermeasure under the assumption that the attackers can skip only one instruction. Barenghi et al. [5] propose another anti-IS countermeasure under the assumption that attackers can skip instructions, but not consecutive instructions. On the other hand, Sakamoto et al. [11] report that attackers can inject multiple IM faults into consecutive instructions with a laser, since the laser's effect on the electric circuit lasts for the laser duration. Constructing countermeasures against IM is therefore considered a more challenging task than constructing conventional software countermeasures.
We presented a part of the work in this paper in advance in [16], where countermeasures against IMs and evaluation results using a QEMU-based IM simulator are shown. The QEMU-based IM simulator was useful for constructing countermeasures against IM. We proposed four secure coding schemes against the IM vulnerabilities found by the QEMU-based IM simulator. Meanwhile, the QEMU-based IM simulator has the drawback that the vulnerability evaluation takes a long time. It took approximately 132 min to evaluate a short (15-instruction) program. Reducing the computation time for IM simulation is a critical milestone toward evaluating realistic application programs that contain a large number of instructions. This paper is the extended version of the previous work; therefore, in addition to the contents of [16], it includes proposals of effective simulation environments for evaluating IM vulnerabilities and a computational complexity reduction technique.

Contributions
To clarify the problems to be solved in constructing anti-IM countermeasures, we first restrict the IM attacker's ability and formally define it as the one-bit-set instruction manipulation (OSIM) model. This model is based on the fact that existing studies [10,11] related to IM use single-spot lasers on flash memories for injecting IM faults. Colombier et al. [10] explain that laser irradiation onto a flash memory causes only bit-set faults and that a single-spot laser affects only one bit. Kumar et al. observe one-bit-reset IM (ORIM) faults on an AVR microcontroller [13]. Section 7 discusses the efficiency of our anti-OSIM countermeasure against ORIM. We do not consider faults injected into more than two bits in this paper in order to simplify countermeasure construction. The ordering of adjacent bits is device-dependent and can be too complicated for several reasons, for example, bus scrambling. We construct the anti-OSIM countermeasure through an attack-based approach. We first investigate how the OSIM model attacks an existing anti-FIA countermeasure. For this purpose, we propose three simulation environments to simulate the effects of the OSIM model faults, and then we attack the existing anti-FIA countermeasure with the simulators. Among the simulators, the GDB-based simulator outperforms the QEMU-based simulator presented in our previous work [16] in terms of evaluation time. However, the GDB simulator's performance alone is insufficient for evaluating large programs because the evaluation time increases exponentially with the number of instructions. We therefore also propose a computational cost reduction technique for OSIM simulation. The OSIM model can manipulate instruction bits only to '1' but not to '0'; therefore, we can skip simulating some IMs, such as manipulations from '1' to '1'. This method reduces the computational complexity of the IM simulation from approximately $2^n$ to $2^{n/2}$, where $n$ is the number of instructions.
Using the computational complexity reduction technique and the GDB-based simulator, we achieve approximately $\times 400$ faster evaluation when $n = 15$ compared to the results in [16].
A data integrity verification (DIV) scheme, which is often used as a common countermeasure against fault injection attacks (FIA) [1,5,17] is chosen as the attack target. The attack results are categorized into four classes. We propose coding methods as countermeasures to prevent each of these four attacks. The proposed coding methods improve the DIV, and the improved DIV implementation is attacked on the IM simulator to evaluate the efficiency of the proposed countermeasure.
The remainder of this paper is organized as follows. In Sect. 2, we review works related to FIA and give preliminaries for explaining the proposed methods. A formal description of the OSIM model and its computational complexity for simulation is presented in Sect. 3. A technique for reducing the computational complexity is also described in Sect. 3. Three different OSIM model simulation environments are presented in Sect. 4. In Sect. 5, a target program is evaluated using the three simulation environments, and the results show that the GDB-based simulator outperforms the other simulators. Based on the evaluation results, coding methods that serve as countermeasures against the OSIM model are proposed in Sect. 6. The simulators also evaluate the countermeasure-applied implementation to confirm the effectiveness of the proposed countermeasures. Discussions are given in Sect. 7, and finally, Sect. 8 concludes this paper.

Fault injection attack (FIA)
Fault attacks are physical attacks that inject intentional computation errors (so-called 'faults') and use them to compromise security. Since fault attacks can disturb the normal flow of an algorithm, they can cause information leakage from securely implemented devices. Fault injection is performed by various means, such as clock glitches [7,17,19], underpowering [20], electromagnetic perturbation [8,21,22], laser irradiation [9,11,23], and body biasing [24]. As a simple example of a fault attack, Anderson and Kuhn introduce an attack that forcibly outputs sensitive data by changing the addresses of output data [25]. Differential fault analysis (DFA) [26,27] is a sophisticated technique for attacking cryptosystems. When faults are injected during a cryptosystem computation, incorrect ciphertexts are output from the device. DFA computes the candidates for the secret key of the cryptosystem from pairs of a correct (fault-free) and an incorrect (fault-injected) ciphertext.

Fault model
No correct operation is guaranteed on devices exposed to the threat of fault attacks; therefore, assuming a realistic attacker's ability and making it the basis of trust is necessary for constructing countermeasures against fault attacks. Such a process is called 'fault modeling' [2]. Fault models strictly define the faults attackers can inject. By defining appropriate fault models based on an assumed attacker, we can construct secure countermeasures against the target fault model. While various levels of fault models can be considered, the fault model should be chosen according to the required security level because countermeasures against stronger fault models cost more. The level of faults that can be injected depends on the attackers' knowledge and their fault injection equipment. The instruction skip (IS) and instruction replacement (IR) models are popular fault models that describe faults attackers can inject into actual devices. Data integrity verification (DIV) schemes are widely used as countermeasures against DFA. The DIV schemes output no data if the verification fails, so that attackers cannot perform DFA.

Instruction skip (IS) model
The IS model allows an attacker to skip an assembly instruction. Previous studies have reported the occurrence of IS faults for several fault injection means on various devices, such as the clock glitch on the 8-bit AVR ATMega163 [7] and LEON3 [19], electromagnetic irradiation on the 32-bit ARM Cortex-M3 [8] and the 32-bit ARMv7-M [21], and laser irradiation on the 32-bit ARM Cortex-M3 [9]. Actual key-retrieval attacks based on IS faults are reported in several papers [12,19]; constructing countermeasures against IS is a major topic in fault attack studies.

Instruction replacement (IR) model
IR model allows an attacker to replace a running instruction with another. IS model can be considered part of IR model, wherein a target instruction is replaced with a NOP instruction; therefore, IR model faults represent a more significant threat than IS model faults. Some experiment results reported that IR faults were observed on some devices [7,8,12]. Balasch et al. reported the occurrence of IR faults on an ATMega163 smartcard via a clock glitch at the time of instruction pre-fetch [7]. In their study, some instructions, e.g., EOR (Exclusive OR) and SER (SEt all bits in Register), were targeted, and the instructions after replacement depend on the period of the clock glitch. However, Balasch et

Instruction manipulation (IM) model
On the other hand, recent studies [10,11,13] show that means of accurate fault injection, such as laser irradiation, can control IR faults and launch practical attacks. We call such faults instruction manipulation (IM) to distinguish them from (uncontrollable) IR. In these studies, flash memories on the target devices are irradiated with lasers. This laser irradiation injects faults into the instruction fetching process; the bit position to be fault-injected depends on the irradiated points. If an attacker records the irradiated points and the corresponding faulty bit positions in advance, the attacker can replace a target instruction with another one. However, the authors in [10] describe that IM faults injected by laser irradiation pose some limitations. One of the major restrictions is that a single-spot laser can manipulate only one specific bit of each target instruction. Even under this restriction, IM faults pose a threat to devices.

One-bit set instruction manipulation (OSIM) model
The OSIM model allows an attacker to set one bit of a machine code to 1. According to [10], only bit-set faults are observed on their target device; thereby, we assume that attackers can set a bit to 1 but not to 0. Moreover, we assume that the attacker uses single-spot laser equipment. This assumption limits the number of bits that can be fault-injected simultaneously. We assume that a single-spot laser can inject faults into only a single bit because it can illuminate only one spot at a time. These assumptions reflect the actual faults that attackers can inject. Note that the OSIM model does not limit the number of faults that can be injected. Sakamoto et al. [11] show that the number of laser-injected IM faults increases with a longer laser duration.

OSIM model attacker's ability
We summarize an attacker's ability to inject OSIM faults with single-spot laser equipment as follows:
1. The attacker can set one bit per machine code to 1.
2. The attacker can inject the bit-set faults into multiple instructions.
3. The attacker cannot inject the bit-set faults into different bit positions during program execution.
We define the OSIM model formally according to the above conditions in the next section.

OSIM model and its computational complexity
This section evaluates the computational complexity of the OSIM simulation and proposes a technique to reduce it. We first define the OSIM model formally and then evaluate the computational complexity. The computational complexity of simulating the OSIM model is exponential in the number of instructions. The reduction technique becomes more effective as the number of instructions increases.

Formal definition of the OSIM model
Definition 1 (Instruction code) Let the bit length of an instruction be $w$. An instruction code $ins$ is represented as a $w$-dimensional row vector,
$$ins = (b_{w-1}, \ldots, b_1, b_0), \quad b_j \in \{0, 1\}.$$
Definition 2 (Running program) A running program $P$ is an $n \times w$ matrix consisting of $n$ instructions,
$$P = (ins_1, ins_2, \ldots, ins_n)^{\mathsf{T}}.$$
Note that $P$ is not the program stored in ROM. The IM faults in previous studies inject bit-set faults into program memory read-out circuits rather than into ROM.
OSIM faults into the $c$th bit are represented as follows:
$$P' = P \parallel_c fv, \quad fv = (e_1, e_2, \ldots, e_n)^{\mathsf{T}}, \quad e_i \in \{0, 1\},$$
where the operator $\parallel_c$ represents the component-wise OR operation between the right operand and the $c$th column of the left operand, and $\parallel$ represents the bit OR operation.
Here, $c$ corresponds to the position irradiated by the laser, namely which bit position is targeted, and $e_i = 1$ corresponds to the laser irradiation timing. For example, the laser turns on at the timing of the fetching of the 1st, 3rd, and $n$th instructions when $e_1$, $e_3$, and $e_n$ in an $fv$ are equal to 1. Executing $P'$, which differs from $P$ because of the attacker's manipulation, can cause confidential information leakage from the devices.
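The fault operation described above can be sketched in a few lines of Python. This is only an illustrative sketch under our own assumptions: the function name `apply_osim`, the representation of the program as a list of 16-bit integers, and the sample machine codes are hypothetical and are not taken from the simulators in this paper.

```python
def apply_osim(program, c, fv, w=16):
    """Return the faulty program P': bit c of instruction i is ORed with fv[i].

    program: list of n instruction codes (integers of w bits each)
    c:       targeted bit position (0 = least significant bit)
    fv:      fault vector of n entries, each 0 (laser off) or 1 (laser on)
    """
    if len(program) != len(fv):
        raise ValueError("fault vector must have one entry per instruction")
    if not 0 <= c < w:
        raise ValueError("bit position out of range")
    # A bit-set fault can only turn a 0 into a 1; bits that are already 1 stay 1.
    return [ins | (e << c) for ins, e in zip(program, fv)]


# Hypothetical 3-instruction program; the laser fires on the 1st and 3rd fetch.
faulty = apply_osim([0x428B, 0xD000, 0xE001], c=8, fv=[1, 0, 1])
print([hex(x) for x in faulty])
```

Because the same bit position `c` is used for every instruction, the sketch reflects the restriction that the attacker cannot change the targeted bit position during program execution.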

Computational complexity of the OSIM model
The abstract workflow of the OSIM model simulators presented in Sect. 4 is shown in Fig. 1. The .text section of the target executable file is manipulated according to all possible OSIM model faults by the Python program, and the Python program generates all the manipulated executables. Let $w$ be the instruction length and $n$ be the number of instructions to be targeted by the faults. For Thumb instructions, $w = 16$. The number of all possible OSIM model faults is $N_{all} = w \times 2^n$ because $2^n$ possible fault vectors can be injected into each of the $w$ bit positions. We can obtain the $N_{all}$ results from the simulator and analyze them for constructing countermeasures. Since $N_{all}$ is exponential in the number of instructions $n$, increasing $n$ makes OSIM simulation difficult.
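As a worked instance of this count, using only values that appear in this paper ($w = 16$ for Thumb, $n = 8$ for the PIVD evaluation in Sect. 5, and $n = 15$ for the program evaluated in [16]), the number of executables to simulate without any reduction would be:

$$N_{all}\big|_{n=8} = 16 \times 2^{8} = 4096, \qquad N_{all}\big|_{n=15} = 16 \times 2^{15} = 524288.$$

Adding seven instructions therefore multiplies the number of simulations by $2^7 = 128$.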

Computational complexity reduction technique
As shown above, the number of OSIM faults increases exponentially with the number of instructions $n$. Simulating all OSIM faults becomes more challenging as the number of instructions in the examined program increases. We propose a technique that reduces the number of OSIM faults we must simulate. The idea is based on the fact that the OSIM model can manipulate instruction bits to '1' but not to '0'; therefore, some fault vectors have no effect on the program execution. Fig. 2 shows the classification of all possible fault vectors (FVs) injected into the first column of an examined program. In this situation, the number of all possible FVs is $N_{all} = 2^4 = 16$. The 16 distinct FVs, $fv_0, \ldots, fv_{15}$, are classified into two groups: four ineffective FVs, which produce no change in the program binaries, and effective FVs, which produce some changes in the program binaries. Let $n_{i,1}$ be the number of '1' bits in the $i$th column, and let $n_{i,0} = n - n_{i,1}$ be the number of '0' bits. There are $2^{n_{i,1}}$ ineffective FVs in the $i$th column, as in the figure. Clearly, we do not need to simulate the ineffective FVs. The remaining $2^n - 2^{n_{i,1}}$ FVs are the effective FVs that can change the program's behavior. In fact, we do not need to simulate all effective FVs either, because some of them lead to the same results. For example, the four FVs ($fv_4, fv_6, fv_{12}, fv_{14}$) produce the same manipulated binaries because their components $e_2$ and $e_4$, which are the only ones that contribute to the instruction manipulation, are identical. By simulating one FV in such a group, we know the execution results of the rest of the FVs in the group without simulation. There are $(2^n - 2^{n_{i,1}}) / 2^{n_{i,1}} = 2^{n_{i,0}} - 1$ such groups for every column; therefore, we can obtain all $2^n$ results generated by every FV by simulating only $2^{n_{i,0}} - 1$ executions. For the entire program, we can obtain all $w \times 2^n$ results by simulating only $\sum_{i=0}^{w-1} (2^{n_{i,0}} - 1)$ executions.
Assuming that the ratio of '1' to '0' bits in target programs is 1:1, the number of FVs we have to simulate becomes $w \times (2^{n/2} - 1)$, which grows on the order of $2^{n/2}$ as the number of instructions $n$ increases.
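The grouping argument can be illustrated with a short Python sketch. It is a minimal sketch under our own assumptions, not the implementation used in the simulators: the function names are hypothetical, the program is modeled as a list of integers, and one representative FV per group is chosen by leaving the entries for rows that already hold '1' at 0.

```python
from itertools import product


def column_bits(program, c):
    """Bits of column c (one bit per instruction, top to bottom)."""
    return [(ins >> c) & 1 for ins in program]


def representative_fvs(program, c):
    """Yield one effective fault vector per group for bit column c.

    Only rows whose bit c is 0 can be changed by a bit-set fault, so the
    entries for the other rows are fixed to 0 in the representative.
    """
    col = column_bits(program, c)
    zero_rows = [i for i, b in enumerate(col) if b == 0]
    for pattern in product((0, 1), repeat=len(zero_rows)):
        if not any(pattern):
            continue  # all-zero pattern: ineffective, the binary is unchanged
        fv = [0] * len(program)
        for row, e in zip(zero_rows, pattern):
            fv[row] = e
        yield tuple(fv)


def count_required_simulations(program, w=16):
    """Sum over all columns of (2 ** number_of_zero_bits - 1)."""
    return sum(2 ** column_bits(program, c).count(0) - 1 for c in range(w))


# Hypothetical 4-instruction program whose column 0 reads 1, 0, 1, 0.
program = [0b1, 0b0, 0b1, 0b0]
print(list(representative_fvs(program, 0)))
print(count_required_simulations(program, w=1))
```

For this column, only the entries for the 2nd and 4th instructions matter, so 3 of the 16 possible FVs need to be simulated, which matches $2^{n_{i,0}} - 1$ with $n_{i,0} = 2$.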

OSIM model simulator and its variations
This section describes several constructions of the OSIM model simulator. The OSIM model simulator workflow is already shown in Fig. 1; the variations differ in the execution environment of the workflow. The main computational bottleneck of the workflow is the execution time of the Thumb instruction set simulator (ISS). Our previous work uses the user-mode QEMU as the Thumb ISS, which takes approximately 10 ms to simulate each binary executable. This section presents three different approaches to OSIM simulators: the user-mode QEMU-based, the full-system QEMU-based, and the GDB-based simulators.

User-mode QEMU-based OSIM simulator
This approach uses user-mode QEMU (arm-qemu-static) as the Thumb ISS, as shown in Fig. 3. The QEMU emulator has two execution modes: user-mode emulation and full-system emulation. The former achieves high performance by omitting emulation of the low-level parts of the system, such as hard disk and Ethernet controllers. We adopted the user-mode QEMU ISS for our IM simulator presented in [16]. The concrete system description is given in Fig. 3. The Python script (InstructionManipulation.py) executed on Ubuntu 20.04 Linux manages the entire OSIM simulation workflow. The Python script launches arm-qemu-static as a child process and passes the manipulated binary executable to the child process as an argument. Although user-mode emulation can quickly execute the passed binary executables, generating the child process is a large performance bottleneck because it involves launching the large arm-qemu-static system.

Full-system QEMU-based OSIM simulator
This approach uses full-system QEMU (runqemu core-image-minimal) as the Thumb ISS, as shown in Fig. 4. The full-system QEMU virtual machine is emulated on Ubuntu 20.04, and we can run arbitrary processes on the virtual machine. The Python script for managing IM simulation is executed on the virtual machine. The Python script directly launches the manipulated binaries as child processes on the virtual machine. Although this approach has the disadvantage that the emulation is slow, the

GDB-based OSIM simulator
This approach uses GDB (GNU debugger) as the Thumb ISS as shown in Fig. 5.
Two processes, the Python process for managing IM simulation and the GDB process for executing Thumb binary executables, run on Ubuntu 20.04. The Python script passes the manipulated binaries to the GDB process via inter-process communication, and the GDB executes the received binaries and sends back the execution results. The GDB process stays resident on Ubuntu. The Python script controls the GDB interactively with the pexpect library. An advantage of this approach is that it requires no process launch for each binary execution. Therefore, this approach is expected to be a promising way to improve OSIM simulation performance. However, this approach has some disadvantages as well. First, this approach is considered unsuitable for simulating large systems, such as applications on Linux, because we have to prepare the Linux kernel and shared libraries ourselves. On the other hand, this approach is sufficient for simulating bare-metal applications and applications on small operating systems, which are often used in embedded devices. Second, the ARM simulator in GDB (GDB 10.1) was not able to handle some exceptions, such as undefined instruction and abort exceptions. We enabled this exception handling by modifying parts of the source code.

OSIM simulators comparison
This section compares the different OSIM simulator approaches described in the previous section by examining (attacking) a target program.

Plaintext integrity verification with default fail (PIVD)
We chose plaintext integrity verification with default fail (PIVD) [17] as the attack target for the OSIM simulators. PIVD, a DIV scheme, is designed as a countermeasure against IS faults. Listing 1 shows an example of the PIVD implementation written in Thumb. To prevent DFA, this program outputs CTXT (the encrypted outcome of PTXT) only if the two plaintexts, PTXT and PTXT', are equal. The encryption and decryption processes are executed by the EncAndDec function at line 11. We specify no cipher algorithm but implicitly assume a well-known block cipher such as AES-128. To verify a 128-bit block at one time, 32-bit hashed values of PTXT and PTXT' are stored at the addresses r3 and r3 + 0x04, respectively. This hash process is not essential. We avoid multiple-word comparison to ease anti-OSIM countermeasure construction.
OutCTXT is a function that outputs CTXT (ciphertext). FaultHandler is a function that securely aborts the program to prevent fault attacks. The attacker tries to inject DFA faults during the encryption process to obtain faulty ciphertext; therefore, the attacker's goal is to execute the OutCTXT function in the state where PTXT != PTXT'.
[Listing 1: Example PIVD implementation in Thumb assembly; the listing ends with a comment noting that the FaultHandler function is located at an upper address, followed by .endfunc.]
This program is written in an anti-IS manner called default fail [28], where important functions (OutCTXT) or instructions (at LBL_SUCCESS) are placed at a lower address than the conditional branch for integrity verification (at line 8). The default operation of the branch (LBL_FAIL and FaultHandler) is placed at a higher address. The default fail manner is secure against skipping the beq instruction because IS can increment a program counter but not decrement it. Note that this program is intentionally written to be vulnerable to IM, in order to find as many vulnerabilities as possible for the attack-based countermeasure construction.

PIVD evaluation results by the OSIM model simulators
The results of the attacks on the PIVD by the simulators are shown in Table 1. In the attacks, PTXT and PTXT' are set to different values under the assumption that the attacker has already injected faults during the encryption process. The number of target instructions is $n = 8$. The branch instruction at line 11 is excluded from the targets of the OSIM faults because we assume that the faults are already injected during the execution of the EncAndDec function. The table shows that the results of the user-mode QEMU-based and the full-system QEMU-based simulators are the same except for evaluation time. On the other hand, the GDB-based simulator exhibits a few different results. These differences are considered to originate from the fact that the GDB-based simulator executes programs as bare-metal applications, whereas the QEMU-based simulators execute them as Linux applications (for example, program loading addresses and startup routines differ). The GDB-based simulator achieves an approximately $7\times$ faster evaluation time than the user-mode QEMU-based simulation presented in [16]; therefore, it is suitable for evaluating bare-metal and small-OS applications. Moreover, we apply the computational complexity reduction technique described in Sect. 3.3 to the GDB-based simulator. The evaluation results are shown in the last row of Table 1. The evaluation time is reduced to 1/5 of that before applying the technique and to 1/40 of that of the user-mode QEMU-based simulator.
We found that the faults that lead to faulty ciphertext outputs can be categorized into four patterns. In the next section, we describe how the four fault patterns cause faulty ciphertext outputs and propose secure coding methods as countermeasures against the OSIM faults.

OSIM model attacks on the PIVD and the proposal of countermeasures against them
This section describes which OSIM faults defeat the PIVD and proposes coding methods that are secure against such OSIM faults.

Operand register manipulation (ORM)
This attack manipulates the operand register of the cmp instruction at line 7, as shown in Fig. 6. The cmp instructions in Thumb take two operands, a source register Rs and a destination register Rd. In the original program, two distinct registers, r1 and r3, are used for the source register Rs and the destination register Rd, respectively. When the contents of the two registers differ, the PIVD program detects a fault injection and enters the fault handler function during normal flow. Attackers can make the cmp instruction compare a register with itself by injecting the OSIM fault shown in Fig. 6. In this case, the cmp instruction always sets the Z flag to 1; therefore, the integrity verification process passes for any values. Similar faults are used to increase the number of attack trials by changing a counter register to another one [10].
where HD(·, ·) is a function that evaluates the Hamming distance of the two input operands, and bin(reg) means the binary expression of the reg.
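Whether a given register pair is exposed to this kind of manipulation can be checked mechanically. The following Python sketch is an illustration under our own assumptions and does not reproduce the exact condition used in this paper: the function names are hypothetical, registers are given by their 3-bit Thumb register numbers, and a pair is flagged when a single bit-set fault in either register field can make both fields name the same register.

```python
def reachable_by_one_bit_set(src, dst):
    """True if dst equals src with exactly one 0 bit changed to 1."""
    diff = src ^ dst
    is_single_bit = diff != 0 and (diff & (diff - 1)) == 0
    # The differing bit must be 0 in src, because a bit-set fault cannot clear bits.
    return is_single_bit and (src & diff) == 0


def orm_vulnerable(rs, rd):
    """True if one bit-set fault can make the two register fields identical."""
    return reachable_by_one_bit_set(rs, rd) or reachable_by_one_bit_set(rd, rs)


print(orm_vulnerable(1, 3))  # r1 = 0b001, r3 = 0b011
print(orm_vulnerable(1, 2))  # r1 = 0b001, r2 = 0b010
```

Under these assumptions, the pair r1 and r3 used in the original program is flagged, because setting one bit of 0b001 yields 0b011, whereas r1 and r2 differ in two bit positions and cannot be made identical by a single bit-set fault.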

Branch address manipulation (BAM)
This attack forcibly executes the OutCTXT function by manipulating the address of the unconditional branch instruction at line 9. An unconditional branch instruction in the Thumb ISA specifies the branch address by two's complement, program counter (pc)-relative addressing, as shown in Fig. 7. The FaultHandler function is located at a higher address than the PIVD since the program follows the default fail manner; therefore, the sign bit (the 10th bit) of the unconditional branch instruction is zero. An OSIM fault at the sign bit changes the branch address to an address lower than the PIVD. If the location of the OutCTXT function coincidentally corresponds to the changed address, as shown in Fig. 7, the attackers can obtain faulty ciphertexts.
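The effect of a bit-set fault on the sign bit can be reproduced with a small decoder for the 16-bit Thumb unconditional branch (upper five bits 11100, followed by an 11-bit signed halfword offset relative to the instruction address plus 4). The sketch below is illustrative only: the function name, the instruction address 0x1000, and the sample encoding 0xE010 are hypothetical values, not taken from Listing 1.

```python
def thumb_b_target(addr, halfword):
    """Branch target of a 16-bit Thumb unconditional branch located at addr."""
    if halfword >> 11 != 0b11100:
        raise ValueError("not a 16-bit Thumb unconditional branch")
    imm11 = halfword & 0x7FF
    if imm11 & 0x400:          # bit 10 is the sign bit of the offset
        imm11 -= 0x800         # two's complement sign extension
    return addr + 4 + (imm11 << 1)


original = 0xE010                 # hypothetical forward branch (sign bit = 0)
faulty = original | (1 << 10)     # OSIM fault on the sign bit

print(hex(thumb_b_target(0x1000, original)))
print(hex(thumb_b_target(0x1000, faulty)))
```

With these sample values, the fault-free branch goes forward to 0x1024, while the faulty one lands at 0x824, which is below the branch itself. If the branch is instead encoded as a backward branch from the start, a bit-set fault on the sign bit changes nothing, and bit-set faults in the remaining offset bits can only move the target to a higher address.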

Countermeasure against the BAM
To prevent the above attack on the address of the branch instruction, the FaultHandler function is placed at an address between the OutCTXT function and the PIVD. Because the FaultHandler function is then at a lower address than the PIVD, a bit-set fault in the 10th bit of the branch instruction has no effect: the 10th bit is already one before the fault injection. Other OSIM faults in the address field (the 9th-0th bits) change the branch destination to addresses higher than the FaultHandler function. The OutCTXT function is therefore placed at a lower address than the FaultHandler function so that such address manipulations cannot reach it.

Load address manipulation (LAM)
Figure 8 shows the attack that changes the value to be compared by manipulating the load address of the load instruction (ldr) at line 12. The ldr instruction loads the value at the address indicated by r3 into r1. Since r3 holds the address of PTXT, PTXT is loaded when no fault is injected; however, manipulating the ldr machine code as shown in Fig. 8 makes the ldr instruction load PTXT′ instead. As a result, the PIVD program compares two identical values and always outputs ciphertexts, even if faults are injected.

Figure 9 shows another example of ldr instruction manipulation leading to a successful attack, this time by a multiple-instruction manipulation. From the OSIR model assumption, the attacker can inject bit-set faults into the same bit position of multiple instructions. The faults are injected into the 7th bits (in the offset fields) of the ldr instructions so that unrelated values are loaded into both registers used for integrity verification, r1 and r3. In the experiment, zeros were loaded from both manipulated addresses because the simulator environment (QEMU) initializes the values at unused addresses to zeros. As a result, the PIVD program always outputs ciphertexts. The fault at the 7th bit is one example of this type of attack; faults at other bits of the offset, Rb, and Rd fields can also lead to it.
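How a bit-set fault moves the load address can be illustrated by decoding the ldr fields before and after the fault. The sketch below assumes the Thumb load-word format with an immediate offset (bits 15-11 equal to 0b01101, a 5-bit word offset in bits 10-6, Rb in bits 5-3, Rd in bits 2-0). The example instruction `ldr r1, [r3, #0]` is a hypothetical stand-in; the actual encodings in the PIVD listing may differ.

```python
def decode_ldr(halfword: int):
    """Return (Rd, Rb, byte offset) of a Thumb `ldr Rd, [Rb, #imm]`."""
    if (halfword >> 11) != 0b01101:
        raise ValueError("not a Thumb ldr with immediate offset")
    offset_bytes = ((halfword >> 6) & 0x1F) * 4   # offset field counts words
    rb = (halfword >> 3) & 0x7
    rd = halfword & 0x7
    return rd, rb, offset_bytes


def show(halfword: int) -> str:
    rd, rb, off = decode_ldr(halfword)
    return f"ldr r{rd}, [r{rb}, #{off}]"


LDR = 0x6819  # hypothetical encoding of: ldr r1, [r3, #0]

# Bits 10-0 cover the offset, Rb and Rd fields; bits 15-11 are the opcode.
for bit in range(11):
    faulted = LDR | (1 << bit)
    note = "" if faulted != LDR else "  (bit already one, no effect)"
    print(f"bit {bit:2d}: {show(faulted)}{note}")
```

For this assumed encoding, the fault at the 7th bit turns the zero offset into an offset of 8 bytes, so the load reads from an address unrelated to PTXT.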

Countermeasure against LAM
Choosing proper offsets in the binary, similarly to the countermeasure against ORM, is effective against the first LAM attack. To prevent OSIM faults from changing the address of PTXT into the address of PTXT′, the offsets from the base address must satisfy the following expression: where HD(·, ·) is a function that evaluates the Hamming distance of its two input operands, and d_PTXT represents the offset of PTXT from the base address.
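The property that the offsets must guarantee can also be checked mechanically. The sketch below does not reproduce the expression above; it tests the underlying requirement directly, namely that no single bit-set fault turns the encoded offset of PTXT into the encoded offset of PTXT′. Offsets are treated as 5-bit field values, which is an assumption, and all function and variable names are hypothetical.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def reachable_by_one_bit_set(src: int, dst: int, width: int = 5) -> bool:
    """True if forcing one bit of src to one yields dst (and dst differs)."""
    return any(src != dst and (src | (1 << k)) == dst for k in range(width))


# Hypothetical offset pairs (d_PTXT, d_PTXT').
vulnerable = (0b00000, 0b00010)   # one cleared bit separates them
reversed_pair = (0b00010, 0b00000)  # would need a bit reset, not a bit set
distant = (0b00001, 0b00110)      # Hamming distance 3

for d_ptxt, d_ptxt_prime in (vulnerable, reversed_pair, distant):
    print(f"{d_ptxt:05b} -> {d_ptxt_prime:05b}:",
          reachable_by_one_bit_set(d_ptxt, d_ptxt_prime))
```

A bit-set fault can only raise the field value, so a pair is reachable exactly when the Hamming distance is one and the destination is the larger value; the assertions used to test this sketch confirm that equivalence over all 5-bit pairs.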
We propose unused memory randomization as a countermeasure against the second LAM attack. This randomization requires that different values are stored at all unused addresses, but strong randomness is not required for the values. This countermeasure might consume a lot of time if the device has a vast address space; however, whether it is necessary is device-dependent because the treatment of unused addresses differs by device. We consider that unused memory randomization is unnecessary for most embedded devices because SRAM values are unstable at boot time.
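The idea can be sketched on a toy memory model. The example below is an illustration under stated assumptions, not the authors' implementation: memory is a list of 32-bit words, the word indices of PTXT, PTXT′ and the two faulted load addresses are hypothetical, and the fill pattern (index times an odd constant, modulo 2^32) is just one simple way to obtain distinct, non-cryptographic values.

```python
WORDS = 64
USED = {0, 1}            # hypothetical word indices holding PTXT and PTXT'
FAULTED_LOADS = (2, 3)   # hypothetical unused words read after the ldr faults


def fill_unused(memory, used):
    """Store a distinct value in every unused word (weak randomness suffices)."""
    for i in range(len(memory)):
        if i not in used:
            # Multiplying by an odd constant is a bijection modulo 2**32,
            # so distinct indices always receive distinct values.
            memory[i] = (i * 2654435761) & 0xFFFFFFFF


def verification_passes(memory, idx_a, idx_b):
    """Model the integrity check: it passes when both loaded words are equal."""
    return memory[idx_a] == memory[idx_b]


memory = [0] * WORDS                 # zero-initialised, as in the QEMU setup
memory[0], memory[1] = 0x1234, 0x9999   # PTXT and a faulty PTXT'

passes_with_zeros = verification_passes(memory, *FAULTED_LOADS)
fill_unused(memory, USED)
passes_after_fill = verification_passes(memory, *FAULTED_LOADS)

print(passes_with_zeros, passes_after_fill)
```

In this model the faulted loads compare two zeros and pass when unused memory is zero-initialised, whereas after the fill they read different values and the check fails.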

Branch condition manipulation (BCM)
This attack manipulates the branch condition field of the beq instruction at line 8. An actual BCM attack has been performed on an ARM SC100 device [11]. The conditional branch is one of the most important instructions for determining software control flow, so manipulating its operation can lead to serious information leakage. As shown in Fig. 10, a branch instruction in the Thumb ISA determines the branch condition by the four-bit condition field (the 11th-8th bits). When the condition field is 0b0000, it is decoded as a beq instruction, which branches if Z = 1. The beq instruction is used in the PIVD to jump to the OutCTXT function only if PTXT is equal to PTXT′. When an attacker injects a fault into the 8th bit of the beq instruction, it changes to a bne instruction holding the same branch address. The bne instruction branches if Z = 0; therefore, this IM changes the PIVD program into a program that outputs CTXT when PTXT is not equal to PTXT′. Bit-set faults at other bits of the condition field cause similar effects. For example, a fault at the 9th bit changes the beq instruction to a bcs instruction, which branches if C = 1. The C flag is set to 1 by the cmp instruction at line 7 when r1 ≥ r3. We assume that the attacker cannot control the PTXT′ value, so about half of the cmp executions set the C flag to 1. Since the IM to the bcs instruction outputs CTXT with high probability, countermeasures against the IM to bcs (and also to bmi and bhi), not only the IM to bne, are necessary for preventing DFA.
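The four instructions reachable from beq by a single bit-set fault in the condition field can be enumerated directly. This is a minimal sketch assuming the 16-bit Thumb conditional branch encoding (bits 15-12 equal to 0b1101, condition in bits 11-8, 8-bit offset in bits 7-0); the offset value in the example encoding is hypothetical.

```python
# Thumb condition codes 0b0000 .. 0b1101 in encoding order.
CONDITIONS = ["eq", "ne", "cs", "cc", "mi", "pl", "vs",
              "vc", "hi", "ls", "ge", "lt", "gt", "le"]


def branch_mnemonic(halfword: int) -> str:
    """Decode the mnemonic of a 16-bit Thumb conditional branch."""
    if (halfword >> 12) != 0b1101:
        raise ValueError("not a Thumb conditional branch")
    cond = (halfword >> 8) & 0xF
    if cond == 0b1110:
        return "udf"   # permanently undefined encoding
    if cond == 0b1111:
        return "svc"   # supervisor call shares this opcode space
    return "b" + CONDITIONS[cond]


BEQ = 0xD000 | 0x05   # beq with a hypothetical 8-bit offset field

faulted = {bit: branch_mnemonic(BEQ | (1 << bit)) for bit in range(8, 12)}
print(faulted)
```

Because the condition field of beq is all zeros, each of the four bits can be set, and the offset field is untouched, so every faulted instruction keeps the original branch address.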

Countermeasure against the BCM
Unlike the other attacks, which change the operand part of an instruction, BCM attacks change the opcode part; therefore, a countermeasure against BCM requires a more elaborate method than the countermeasures against other OSIM faults. As shown in Listing 2, the countermeasure against BCM requires some extra instructions. The instructions at lines 14-16 perform the comparison in place of a cmp instruction, and the branch instruction at line 17 determines whether verification succeeds or fails. The instructions at lines 8-12 are responsible for detecting BCM attacks. If the beq instruction at line 17 is changed to another conditional branch by BCM (for example, to a bne instruction), the identical conditional branch among these instructions detects the fault and jumps to the FaultHandler function. Consider the case in which the beq instruction at line 17 changes to a bcs instruction through the 9th-bit manipulation. In this case, the attacker can pass the verification at line 17 when the C flag is one; however, the attacker cannot reach the OutCTXT function because of another bcs instruction at line 9. Moreover, the attacker cannot inject a fault into the second bcs instruction because its 9th bit is originally one.
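The last point, that a guard branch is immune to the bit-set fault that produced its own condition, can be illustrated as follows. The sketch assumes the Thumb conditional branch encoding with the condition in bits 11-8 and does not reproduce Listing 2: the encodings below are hypothetical stand-ins for the verification beq and for guard branches with the bne, bcs, bmi and bhi conditions.

```python
NAMES = {0b0000: "beq", 0b0001: "bne", 0b0010: "bcs",
         0b0100: "bmi", 0b1000: "bhi"}


def cond_name(halfword: int) -> str:
    """Name of the condition held in bits 11-8 of a Thumb conditional branch."""
    return NAMES[(halfword >> 8) & 0xF]


def bit_set_fault(halfword: int, bit: int) -> int:
    return halfword | (1 << bit)


VERIFY_BEQ = 0xD005   # hypothetical verification branch (beq)

results = []
for bit in range(8, 12):
    faulted_verify = bit_set_fault(VERIFY_BEQ, bit)
    # Hypothetical guard: a branch with the same condition as the faulted beq
    # (its target, the FaultHandler, would differ; only the condition matters).
    guard = 0xD000 | (faulted_verify & 0x0F00) | 0x20
    guard_after_fault = bit_set_fault(guard, bit)
    results.append((bit, cond_name(faulted_verify),
                    cond_name(guard_after_fault), guard_after_fault == guard))

for row in results:
    print(row)
```

For each of the four bit positions, the same fault that rewrites the beq leaves the corresponding guard unchanged, since that bit of the guard is already one.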

Evaluation of the proposed countermeasures
A PIVD implementation written according to the proposed anti-OSIM techniques is shown in Listing 2.
The anti-OSIM PIVD implementation is evaluated by two OSIM simulators. Table 2 shows the evaluation results obtained with the user-mode QEMU-based simulator. The anti-OSIM PIVD implementation outputs no faulty ciphertexts even though it has a large number of extra instructions (n = 15). This result shows that the proposed countermeasures are effective against the corresponding attacks. On the other hand, the evaluation time increases greatly with the number of instructions: although the number of instructions only roughly doubles, the evaluation time becomes approximately 132 times longer. Table 3 shows the evaluation results obtained with the GDB-based OSIM simulator with the computational complexity reduction technique. For n = 15, the evaluation is approximately 400× faster than with the user-mode QEMU-based simulator.
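The figures quoted from Table 2 can be cross-checked with a few lines of arithmetic. The sketch below uses only the values reported in Table 2 and the 400× factor. The observation that each row total equals 16 × 2^n (16 bit positions times all subsets of n instructions) is an inference from the numbers and from the same-bit-position assumption, not a statement taken from the text.

```python
ROWS = {
    # name: (faulty, detected, EXP4, EXP11, EXP19, n, evaluation time in seconds)
    "Original PIVD": (528, 2368, 288, 848, 64, 8, 60),
    "Anti-OSIM PIVD": (0, 346112, 45056, 116736, 16384, 15, 132 * 60),
}

totals = {}
for name, (*counts, n, seconds) in ROWS.items():
    totals[name] = sum(counts)
    print(f"{name}: {totals[name]} simulated faults, "
          f"16 * 2**{n} = {16 * 2 ** n}")

qemu_original = ROWS["Original PIVD"][-1]
qemu_anti_osim = ROWS["Anti-OSIM PIVD"][-1]

slowdown = qemu_anti_osim / qemu_original   # growth from n = 8 to n = 15
gdb_estimate = qemu_anti_osim / 400         # seconds, using the 400x factor

print(f"slowdown: {slowdown:.0f}x, GDB-based estimate: {gdb_estimate:.1f} s")
```

The ratio reproduces the 132-fold increase, and dividing 132 min by 400 gives roughly 20 s, which is in line with the GDB-based evaluation time reported for n = 15.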

Limitation of OSIM model simulator
Our OSIM model simulator simulates fault effects by manipulating the bits of executable files, whereas actual OSIM faults are injected into running code. Therefore, the simulator cannot simulate the OSIM model perfectly. One example is a manipulated instruction in a loop that is executed multiple times. In this case, the simulator can simulate only the situation in which the fault is injected into every execution of the looped instruction, whereas an attacker under the actual OSIM model can choose which executions to inject faults into. We do not consider such cases because the target PIVD implementation has no loop.
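The static-patching approach described above can be sketched as a generator of faulted program images. This is an illustration only: whether the authors' simulators enumerate faults in exactly this way is an assumption, the three halfwords are hypothetical, and the program is modelled as a plain list of 16-bit instruction words rather than an executable file.

```python
def osim_variants(program, width=16):
    """Yield (bit, subset mask, patched program) for every bit position and
    every subset of instructions (the empty subset leaves the program as is)."""
    n = len(program)
    for bit in range(width):
        mask = 1 << bit
        for subset in range(2 ** n):
            patched = [word | mask if (subset >> i) & 1 else word
                       for i, word in enumerate(program)]
            yield bit, subset, patched


PROGRAM = [0x6819, 0x4299, 0xD005]   # hypothetical three-instruction snippet

variants = list(osim_variants(PROGRAM))
print(len(variants))                 # 16 bit positions * 2**3 subsets
```

Because each variant is a patched image, an instruction inside a loop would carry the fault on every iteration, which is the limitation noted above; the number of variants also doubles with each additional instruction.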
Moreover, our proposed countermeasures do not ensure 100% fault tolerance, because whether faulty ciphertexts are output depends on the initial device state, such as register and memory values. Simulating all possible initial register and memory values is impossible. Even in the current settings (simulating only one initial value), the simulation takes approximately 1.5 and 20 s for n = 8 and 15, respectively. Although we achieve a large reduction in evaluation time in this paper, we consider that vulnerability evaluation by the OSIM simulators is not applicable to large programs, since the simulation time increases exponentially with the number of instructions n. We estimate that evaluating an n = 50 program takes about a day. Mitigating the attacker's model, for example, restricting the number of instructions the attacker can

Listing 2 An example of the anti-OSIM PIVD implementation.

Performance evaluation
The proposed countermeasures against ORM, LAM, and BAM, which modify only the operands of instructions, have almost no effect on device performance. The performance decline caused by the countermeasure against BCM and by the unused memory randomization, however, must be taken into account. The countermeasure against BCM adds some instructions to the original PIVD implementation. These extra instructions increase the program footprint and the execution time. However, the overhead of the extra instructions can be considered negligible compared to the encryption and decryption process.
The unused memory randomization countermeasure might consume a lot of time if the device has a vast address space; however, whether it is necessary is device-dependent because the treatment of unused addresses differs by device. We consider that unused memory randomization is unnecessary for most embedded devices because SRAM values are unstable at boot time.

Application to other programs and ISAs
Although the proposed countermeasure protects only a narrow code snippet, the data verification process is a basis for other security mechanisms; therefore, we consider that the proposed method can be embedded in other programs. One example is an instruction duplication scheme [5]. This scheme executes all ldr instructions in AES encryption twice and verifies all the loaded values. Since the verification processes are almost the same as the PIVD, we consider that our proposed method is also applicable to the instruction duplication scheme. Automatically finding the parts of large programs that are vulnerable to OSIM faults and replacing them with the proposed anti-OSIM code remains future work.
While we developed an anti-OSIM countermeasure for the ARM Thumb ISA, Kumar et al. report IM faults on an AVR microcontroller [13]. We suppose that the proposed countermeasures against ORM, BAM, and LAM attacks are effective in protecting other ISAs because they require only operand modification. On the other hand, the proposed countermeasure against the BCM attack, which requires opcode modification, is unlikely to apply to other ISAs in the same manner. How opcode modification influences program behavior is strongly ISA-dependent and hard to predict; therefore, we must carefully investigate the effects using a simulator.
Furthermore, Kumar et al. observe one-bit-reset IM (ORIM) faults on an AVR device rather than OSIM faults. This implies that the direction of manipulation (set/reset) is device-dependent. We must use different countermeasures depending on the fault model. For example, the inequality signs in inequalities (1) and (2) must be reversed to make the proposed countermeasures effective against ORIM faults.

Conclusion
We propose secure coding schemes as countermeasures against the OSIM model, which is one of the IM models.
Constructing countermeasures against the IM model in software is a big challenge because the attackers can inject faults into the countermeasure part of the software. We construct countermeasures against the OSIM model by an attack-based approach. Three simulation environments for simulating OSIM faults have been constructed, and we investigated which parts of the software are vulnerable to OSIM faults. We have proposed countermeasure schemes according to the simulation results and shown the effectiveness of the countermeasures by attacks using the simulator. In addition, we propose an OSIM simulation complexity reduction technique and the GDB-based OSIM simulator. As an outcome of these improvements, the evaluation time is reduced by a factor of at most 400.

Table 2 Evaluation results on original PIVD (Listing 1) and proposed anti-OSIM PIVD (Listing 2) using the user-mode QEMU-based OSIM simulator

Target implementation   Faulty ciphertext   Fault detected   EXP4*   EXP11*   EXP19*   n    Evaluation time
Original PIVD           528                 2368             288     848      64       8    60 s
Anti-OSIM PIVD          0                   346112           45056   116736   16384    15   132 min

*EXP4, EXP11, and EXP19 represent undefined instruction, segmentation fault, and timeout exceptions, respectively
Author contributions All authors contributed equally to the final dissemination of the research investigation as a full article. All authors have read and agreed to the published version of the manuscript.
Data availability Not applicable.
Code availability This paper uses a modified version of GDB (GNU debugger) v10.1, which is open-source software provided at https://www.gnu.org/software/gdb/download/.

Conflict of interest Not applicable.
Ethical approval Not applicable.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Junichi Sakamoto received his M.I.S. and Ph.D. degrees from Yokohama National University, Japan, in 2017 and 2020, respectively. He is currently working as a researcher at Yokohama National University and the National Institute of Advanced Industrial Science and Technology. He has been engaged in research on hardware security, efficient implementations of cryptographic algorithms, side-channel attacks, and laser-based fault attacks. He is currently interested in remote side-channel attacks and hardware implementation of homomorphic encryption.
Shungo Hayashi is a master's course student working on information security at Yokohama National University, Japan. He received a B.S. degree in Information Technology from Yokohama National University in 2016. He is currently working on research on fault injection attacks against embedded systems.
Daisuke Fujimoto received B.E., M.E., and Ph.D. degrees from Kobe University, Japan, in 2009, 2011 and 2014, respectively. He is currently an assistant professor at the Graduate School of Information Science, Nara Institute of Science and Technology, Nara, Japan. He is also a visiting assistant professor in the Institute of Advanced Sciences, Yokohama National University. His research interests include hardware security and the implementation of security cores. He is a member of IEEE and IEICE.
Tsutomu Matsumoto is a professor of the Faculty of Environment and Information Sciences, Yokohama National University, and the Director of the Cyber Physical Security Research Center at the National Institute of Advanced Industrial Science and Technology. He received a Doctor of Engineering degree from the University of Tokyo in 1986. He has been interested in research and education of Embedded Security Systems such as IoT Devices, Cryptographic Hardware, In-vehicle Networks, Instrumentation and Control Security, Tamper Resistance, Biometrics, Artifact-metrics, and Countermeasure against Cyber-Physical Attacks. He serves as the chair of the Japanese National Body for ISO/TC68 (Financial Services) and the Cryptography Research and Evaluation Committees (CRYPTREC) and as an associate member of the Science Council of Japan (SCJ). He received the IEICE Achievement Award, the DoCoMo Mobile Science Award, the Culture of Information Security Award, the MEXT Prize for Science and Technology, and the Fuji Sankei Business Eye Award.