
A high security and efficiency protection of confidentiality and integrity for off-chip memory

  • Yang Su
  • Jun-Wei Shen
  • Min-Qing Zhang
Original Research

Abstract

Because of cost, power and capacity constraints, embedded systems store most of their data off-chip rather than on-chip. The data exchanged between the processor and off-chip memory may be sensitive, so the security and efficiency of off-chip memory protection are a major concern in embedded systems. However, existing protection solutions often require significant overhead to achieve adequate protection. This paper proposes a security management method that guarantees stronger data confidentiality and integrity with reduced overhead in area, memory footprint and performance. To reduce on-chip memory consumption and latency, protection scheduling and storage strategies are studied first, and our security management architecture is derived from them. Confidentiality and integrity modules are then designed separately. Data confidentiality is protected by a Binary Additive Stream Cipher-AES-Time Stamp (BASC-AES-TS) circuit combined with an address-scrambling Benes network, and data integrity is safeguarded by an integrity checking module based on SHA3-TAH and an MSet-XOR-Hash tree. The hardware is implemented and synthesized in a 0.18 µm process. The area is about 1.1 mm² and the frequency reaches 563 MHz. The attack complexity exceeds \(2^{224}\), while the additional storage consumption is limited to about 26.6%.

Keywords

Memory security · Cryptographic algorithm · Confidentiality · Integrity · SHA3

1 Introduction

With the rapid development of electronic information technology, embedded system devices (Musa et al. 2018) are everywhere in daily life, for example mobile phones (Hong and Xuebin 2018), tablet computers (Liang et al. 2018), automated teller machines (ATMs) and vehicle GPS units (Jaime et al. 2016). These embedded systems often handle sensitive information, including mobile payments, bank account details and passwords. However, embedded systems face a wide range of hardware and software attacks. These attacks threaten the data confidentiality, data integrity and availability of embedded systems, endanger users' privacy and property, and restrict the use of embedded systems in security-sensitive areas. The security of embedded systems is therefore attracting increasing attention.

The hardware of a typical embedded system can be divided into three main parts: the System on Chip (SoC) (Xiaojun et al. 2017), the off-chip memory, and the external bus between the chip and the memory. The SoC usually provides protection against physical attacks, abnormal-environment detection, emergency destruction and similar countermeasures, so it can be considered safe. Off-chip memory stores a large amount of system information. Because it is usually a third-party product with relatively simple functions and weak security protection, it is a key target for attackers (Yuan et al. 2018). The external bus carries the communication between the SoC and the off-chip memory; it is exposed and vulnerable to probing and other attacks. Therefore, from the point of view of SoC security, the high-security SoC is treated as the secure area, while the vulnerable external bus and off-chip memory are treated as the areas that need protection.

Attacks on off-chip memory (Mădălin and Salvador 2017) are mainly divided into hardware attacks and software attacks. (Because a protection method applied to off-chip memory is also effective for the external bus, the rest of this paper does not distinguish between the two.) Hardware attacks mainly include electromagnetic analysis attacks (Duan et al. 2015), template attacks (Hailong and Yongbin 2016) and power attacks (Damla and Heba 2018). An electromagnetic analysis attack collects the electromagnetic emissions of a cryptographic chip during operation (which depend on intermediate results of the cryptographic computation) and then applies statistical and cryptanalytic analysis to recover the key. A template attack collects side-channel information from the target chip during operation and analyzes its correlation with the operations performed in order to attack the chip. Power attacks are divided into simple power attacks and differential power attacks. A simple power attack directly measures and analyzes the power consumption of a security chip while it executes cryptographic operations and recovers all or part of the key. A differential power attack also uses the chip's power consumption but, unlike the simple power attack, extracts the key information by statistical analysis.

Software attacks mainly include man-in-the-middle attacks (Kemal et al. 2014), spoofing attacks (Monali and Krishna 2017), resetting attacks (Ryan and George 2018) and replay attacks (Riccardo et al. 2017). A man-in-the-middle attack usually targets the transmission bus: the attacker intercepts information by inserting a monitoring device into the channel and sometimes even impersonates a data terminal to complete the exchange. A spoofing attack replaces data in memory with forged data and thereby causes the system to malfunction; it is easy to carry out and poses a serious threat to systems that store plaintext. A resetting attack copies the data at one address to another address in order to gain control of the system. A replay attack replaces the data currently stored at an address with data previously stored at the same address, thereby defeating the system's updates of the stored content. In other words, a resetting attack operates along the spatial axis of data exchange (or replacement), while a replay attack operates along the time axis.

Against the attacks above, simple data encryption alone cannot achieve the desired protection. For example, a resetting attack can target system instructions by replacing a specific instruction sequence, enabling the attacker to force the system into a special operating mode. Against such complex attacks, several protection mechanisms must be combined to reach the intended security goal.

Some dedicated solutions for embedded systems have emerged. However, existing protection solutions often require significant overhead, and their protection ability is not strong enough to keep pace with the continuous development of attack techniques. This paper presents a high-efficiency security management scheme that ensures confidentiality and integrity, meets the strict requirements of embedded systems and maintains strong data protection. The proposed scheme is compared with existing solutions in terms of performance, memory footprint, logic area and protection ability.

The rest of the paper is organized as follows. Section 2 reviews published off-chip memory protection strategies. Section 3 proposes a new secure storage management scheme. Section 4 details a new security architecture for data confidentiality, and Sect. 5 details a new security architecture for data integrity. Section 6 presents a complete set of results, and Sect. 7 concludes the paper.

2 Off-chip memory protection strategies in literature

The most effective way to protect off-chip memory data is to guarantee their confidentiality and integrity. In our study, attacks are assumed to be restricted to the external memory and bus (Faheem et al. 2018). Currently known off-chip protection strategies can be classified into two types: the Key Dependent Method (KDM) and the Algorithm Dependent Method (ADM).

KDM protection is built on complex key management and is achieved through a key and a cipher algorithm. The key is the core of the algorithm's security: the scheme is safe as long as the key is safe. XOM, proposed at Stanford University, is a classic KDM secure storage strategy; it relies on a complex key management protocol to reduce the dependence on cooperation between algorithms (Lie et al. 2003). The Adaptive Tree-log Scheme was proposed in 2003 and improved in 2015 (Schraml et al. 2015). Based on multiple sets of hash function validators and a hash tree, it achieves higher checking efficiency by tracking write and read operations. The National University of Defense Technology put forward a distributed secure storage structure (Fangyong 2005). It mainly targets the protection of shared memory in multiprocessor systems, distributes the protection process across the processors and the memory controller, and combines One Time Pad (OTP) encryption with timestamp-based integrity checking to effectively reduce the performance cost. The secure page allocation strategy PIFT was proposed in 2009 and optimized in 2013 (Santos and Fei 2013). It performs verification based on process timing and indicates the credibility of each memory page; by using a public label for each storage page, it improves the utilization of storage space and reduces the dependence on the on-chip operating system. The strategy is flexible, efficient and lightweight.

ADM protection is built on cooperating cryptographic algorithms and is achieved through several cipher algorithms. Effective cooperation between the algorithms is the prerequisite for securing information in off-chip storage. The GCM check scheme was proposed in 2004 and improved in 2018 (Zhe et al. 2018). In this scheme, the checking cost is hidden in the memory encryption latency, so encryption and verification are executed concurrently. AEGIS was proposed in 2005 and improved in 2017 (Shubhabrata and Jörn 2017). It obtains confidentiality through OTP encryption and integrity through a hash tree with a cache. PE-ICE (Elbaz et al. 2006) guarantees confidentiality and integrity through the diffusion property of a block cipher (i.e. AES): a label is extracted first and the data are then encrypted. PE-ICE differs fundamentally from the common approach based on data encryption plus digest extraction: it relies entirely on AES to ensure data integrity, and no hash algorithm is included in the architecture. TEC-tree, an improved strategy based on PE-ICE, further adds an optimized hash tree (Elbaz et al. 2007). The UMAC-based hash tree verification scheme HHS (Hu and Sunar 2010) is built on an incremental hash function to improve integrity checking, and a random bit sequence is introduced to encrypt the hash values. In addition, the Toeplitz technique is adopted to enhance security without increasing the key length.

KDM provides protection through secure key management. It has a simple structure and is easy to implement, but its security is low and it is difficult to embed into systems. ADM increases the attack complexity and spreads the protection burden across several security sources; however, its execution efficiency and architectural scalability still need improvement. This paper therefore proposes a hierarchical, co-dependent secure storage strategy that combines a variety of algorithms with a cache acceleration mechanism and a hierarchical partition protection mechanism, achieving comprehensive, efficient and flexible protection.

3 Design of the security management architecture

3.1 Operation scheduling

Data can be ciphered and hashed in different orders, as shown in Fig. 1. Since the security of the hash algorithm is independent of the cipher algorithm, the three orders offer almost equal security. Let \(T_{cipher}\) be the cipher operation time, \(T_{XOR}\) the key XOR time, \(T_{hash}\) the hash operation time, \(T_{check}\) the tag checking time, \(T_{IN}\) the data input time, \(T_{AD}\) the data addressing time and \(T_{AT}\) the tag addressing time. The total read and write access times are denoted \(T_{RD}\) and \(T_{WR}\), respectively.

Fig. 1

Different schemes of confidentiality and integrity checking. a ETH for write operation; b ETH for read operation; c THE for write operation; d THE for read operation; e E&H for write operation; f E&H for read operation

For Encrypt-then-Hash (ETH), shown in (a) and (b), the plaintext is encrypted and the ciphertext is then hashed during the write operation. As a result, decryption and integrity checking can be parallelized in the read operation. The latency added by protection, \(T_{ETH}\), is then:
$$\left\{ \begin{aligned} & T_{RD} = T_{AD} + \max\left\{ T_{hash} + T_{check},\ T_{cipher} \right\} \\ & T_{WR} = T_{cipher} + \max\left\{ T_{hash},\ T_{AD} \right\} + T_{AT} \\ & T_{ETH} = T_{cipher} + \max\left\{ T_{hash},\ T_{AD} \right\} + T_{AT} + T_{AD} + \max\left\{ T_{hash} + T_{check},\ T_{cipher} \right\} \end{aligned} \right.$$
(1)
For Hash-then-Encrypt (HTE), shown in (c) and (d), the plaintext is hashed first, and the tag and plaintext are then encrypted during the write operation. Hence, in the read operation the ciphertext must be decrypted before integrity checking can start. The latency added by protection, \(T_{HTE}\), is:
$$\left\{ \begin{aligned} & T_{RD} = T_{AD} + T_{cipher} + T_{hash} + T_{check} \\ & T_{WR} = T_{hash} + T_{cipher} + T_{AD} + T_{AT} \\ & T_{HTE} = T_{hash} + T_{cipher} + T_{AD} + T_{AT} + T_{AD} + T_{cipher} + T_{hash} + T_{check} + T_{IN} \end{aligned} \right.$$
(2)
For Encrypt-and-Hash (E&H), shown in (e) and (f), the write step can be parallelized, but the ciphertext must be decrypted before integrity checking in the read step. The latency added by protection, \(T_{E\&H}\), is:
$$\left\{ \begin{aligned} & T_{RD} = T_{AD} + T_{cipher} + T_{hash} + T_{check} \\ & T_{WR} = \max\left\{ T_{cipher} + T_{AD},\ T_{hash} + T_{AT} \right\} \\ & T_{E\&H} = \max\left\{ T_{cipher} + T_{AD},\ T_{hash} + T_{AT} \right\} + T_{AD} + T_{cipher} + T_{hash} + T_{check} + T_{IN} \end{aligned} \right.$$
(3)

The read latency of ETH and the write latency of E&H are the smallest. In embedded systems the bus frequently stalls waiting for read accesses, so read latency matters most. Moreover, the longer static integrity checking takes, the larger the window for an attack, so the static integrity checking time should be as short as possible. Static integrity checking in ETH only requires the hash algorithm, without any encryption or decryption, and therefore takes less time than in HTE and E&H. We therefore adopt the ETH scheme to protect off-chip memory.
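The read and write lines of Eqs. (1)-(3) can be evaluated directly to compare the three orders. The Python sketch below is illustrative only: the cycle counts are hypothetical (the cipher and hash values merely echo the AES and SHA3 figures in Table 2), the names `Timing`, `eth`, `hte` and `e_and_h` are made up for the example, and \(T_{IN}\) is left out because it appears only in the combined totals.

```python
from dataclasses import dataclass


@dataclass
class Timing:
    """Per-operation latencies in clock cycles (illustrative values only)."""
    cipher: int  # T_cipher
    hash_: int   # T_hash
    check: int   # T_check
    ad: int      # T_AD, data addressing
    at: int      # T_AT, tag addressing


def eth(t: Timing) -> tuple:
    """(T_RD, T_WR) for Encrypt-then-Hash, Eq. (1)."""
    return (t.ad + max(t.hash_ + t.check, t.cipher),
            t.cipher + max(t.hash_, t.ad) + t.at)


def hte(t: Timing) -> tuple:
    """(T_RD, T_WR) for Hash-then-Encrypt, Eq. (2)."""
    return (t.ad + t.cipher + t.hash_ + t.check,
            t.hash_ + t.cipher + t.ad + t.at)


def e_and_h(t: Timing) -> tuple:
    """(T_RD, T_WR) for Encrypt-and-Hash, Eq. (3)."""
    return (t.ad + t.cipher + t.hash_ + t.check,
            max(t.cipher + t.ad, t.hash_ + t.at))


if __name__ == "__main__":
    t = Timing(cipher=10, hash_=120, check=1, ad=20, at=20)
    for name, f in (("ETH", eth), ("HTE", hte), ("E&H", e_and_h)):
        rd, wr = f(t)
        print(f"{name}: read={rd} write={wr}")
```

With these numbers the script prints read/write latencies of 141/150 for ETH, 151/170 for HTE and 151/140 for E&H. For any non-negative latencies the same ordering holds: the ETH read path replaces a sum with a max, and the E&H write path is the max of two partial sums.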

3.2 Hierarchical storage strategy

Manufacturing and maintenance overhead increases with the security level. Balancing security needs against cost, our design stores data in zones of different security grades, as shown in Fig. 2. The large volume of scrambled and encrypted data (M″), the hash tree (HT) and the check root (H) are stored in the external insecure zone. A small amount of temporary data and the safety root are stored in the on-chip secure zone. Temporary data include the temporary label (T), encrypted data (Mʹ) and decrypted system data (M). The safety root includes a random data segment (m) and the root tag (ROOT).

Fig. 2

Data structure of security management

The regions are linked by cryptographic algorithms. For example, hash values or data held in a higher-security area are used, through encryption or integrity checking, to raise the security of data stored in a lower-security area. This classified storage scheme both protects sensitive information and saves scarce on-chip memory.

4 Confidentiality management based on BASC-AES-TS and address scrambling

4.1 BASC-AES-TS mechanism

Confidentiality is generally obtained through cipher algorithms. A Binary Additive Stream Cipher (BASC) combines the parallelism of a block cipher with the ability to hide keystream generation time, which makes it well suited to real-time encryption/decryption and latency optimization (Hojoon et al. 2018). The idea is to compute a random keystream during the data fetch delay. The keystream must be generated before writing, so that the plaintext is encrypted before it is stored, and it must be reproducible so that it is available for decryption. The main issue of BASC is therefore how to generate the keystream quickly and securely.

In most systems, retrieving data from memory incurs a significant latency, which may be long enough to generate the BASC keystream. AES and AES-based BASC (BASC-AES) operations are compared in Fig. 3. Encryption/decryption takes less time with BASC-AES than with AES. With plain AES, the processed data are the AES input, so during decryption the ciphertext must first be fetched from memory before AES decryption can start. With BASC-AES, the data only take part in the final key XOR operation, which makes BASC-AES much more efficient.

Fig. 3

Encryption sequence diagram in read addressing

For the encryption to be secure, the key must be different for every encryption. Because the BASC-AES key is generated by AES, the AES inputs must never repeat. A time stamp (TS) is therefore added to the AES input so that the data at each address are encrypted with a different key at each write. We call this encryption strategy, based on BASC and AES with TS, BASC-AES-TS. The time stamps must be stored in a secure area so that the data can be decrypted at the next read.
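The key property of BASC-AES-TS is that the keystream depends only on values known before the data arrive (the address and its TS), so it can be computed while the memory fetch is in flight. The sketch below illustrates this under loud assumptions: Python's standard library has no AES, so HMAC-SHA256 stands in as the pseudorandom keystream source, and the (address, TS, counter) input layout is a hypothetical PAD format. `basc_keystream` and `xor_bytes` are illustrative names.

```python
import hashlib
import hmac


def basc_keystream(secret: bytes, address: int, ts: int, nbytes: int = 16) -> bytes:
    """Keystream for one memory block, derived from (address, TS) only.

    HMAC-SHA256 is a stand-in for AES here; the input layout
    (address || TS || counter) is a hypothetical PAD format.
    """
    out = bytearray()
    counter = 0
    while len(out) < nbytes:
        msg = address.to_bytes(8, "big") + ts.to_bytes(8, "big") + counter.to_bytes(4, "big")
        out += hmac.new(secret, msg, hashlib.sha256).digest()
        counter += 1
    return bytes(out[:nbytes])


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """The only data-dependent step: C = D xor key (or D = C xor key)."""
    return bytes(x ^ y for x, y in zip(a, b))


if __name__ == "__main__":
    secret = b"on-chip secret"
    ks = basc_keystream(secret, 0x1000, ts=7)  # can be prepared before the data arrive
    cipher = xor_bytes(b"sixteen byte blk", ks)
    print(xor_bytes(cipher, ks))  # b'sixteen byte blk'
```

Incrementing the TS for the same address yields an unrelated keystream, which is why the TS must be stored securely and read back before decryption.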

4.2 Addresses scrambling based on Benes network

The initialization code of a given processor is relatively fixed, so an attacker who knows the processor type can easily obtain some of its instruction sequences. A bit permutation module is therefore introduced to scramble the address mapping and strengthen protection (Zhongyun et al. 2018). The scrambling strategy is shown in (4), where E is the scrambling operation, D is the descrambling operation, a is the address before scrambling and h is the address after scrambling:
$$\left\{ \begin{aligned} & h={E_{{k_1}}}\left( a \right) \\ & a={D_{{k_2}}}\left( h \right)={D_{{k_2}}}\left( {{E_{{k_1}}}\left( a \right)} \right) \\ \end{aligned} \right.$$
(4)
Operations E and D are realized by a Benes network (Mohsen and Fathollah 2015). Let N be the data width, m the order of the network and R the number of switch units, so that \(N = 2^m\). An N-input, N-output network built from basic 2 × 2 switch units, denoted B(m) with \(m = \log_2 N\), can realize all N! permutations. The Benes network is constructed recursively from two layers of N/2 switch units and two sub-networks B(m − 1), as shown in Eq. (5). It therefore has \(2\log_2 N - 1\) layers of N/2 switch units each, that is \(R = N\log_2 N - N/2\) switch units in total, and \(N\log_2 N - N/2\) configuration bits are needed.
$$B\left( m \right)=\left[ {N/2,\begin{array}{*{20}{c}} {B\left( {m - 1} \right)} \\ {B\left( {m - 1} \right)} \end{array},N/2} \right],\;m={\text{log}}N$$
(5)
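The recursive construction in Eq. (5) maps directly onto a short recursive function. The Python sketch below, with hypothetical names, routes a bit vector through a Benes network whose control bits are laid out as in (5) (input layer, upper sub-network, lower sub-network, output layer), and runs the same network backwards for descrambling, since each 2 × 2 switch is its own inverse. Permuting the 32 bits of an address needs \(32 \cdot 5 - 16 = 144\) control bits; the random configuration used here only stands in for the LFSR-based generator.

```python
import random


def benes_switches(n: int) -> int:
    """2x2 switches in an n-input Benes network: (n/2)(2 log2 n - 1)."""
    m = n.bit_length() - 1
    return n * m - n // 2


def benes_route(bits, cfg, inverse=False):
    """Route `bits` (length n = 2**m, m >= 1) through a Benes network.

    `cfg` holds n*log2(n) - n/2 control bits: input layer, upper sub-network,
    lower sub-network, output layer. A 1 crosses the switch. With
    inverse=True the same bits undo the forward permutation.
    """
    n = len(bits)
    if n == 2:
        return [bits[1], bits[0]] if cfg[0] else list(bits)
    half, sub = n // 2, benes_switches(n // 2)
    first, last = cfg[:half], cfg[half + 2 * sub:]
    upper_cfg, lower_cfg = cfg[half:half + sub], cfg[half + sub:half + 2 * sub]
    # The inverse traversal applies the output layer first and the input layer last.
    in_col, out_col = (last, first) if inverse else (first, last)
    upper, lower = [], []
    for j in range(half):  # layer that splits pairs into the two sub-networks
        a, b = bits[2 * j], bits[2 * j + 1]
        if in_col[j]:
            a, b = b, a
        upper.append(a)
        lower.append(b)
    upper = benes_route(upper, upper_cfg, inverse)
    lower = benes_route(lower, lower_cfg, inverse)
    out = []
    for j in range(half):  # layer that merges the sub-network outputs
        a, b = upper[j], lower[j]
        if out_col[j]:
            a, b = b, a
        out.extend([a, b])
    return out


def scramble(addr: int, cfg, width: int = 32, inverse: bool = False) -> int:
    """h = E(a), or a = D(h) with inverse=True, as a permutation of address bits."""
    bits = [(addr >> i) & 1 for i in range(width)]
    return sum(b << i for i, b in enumerate(benes_route(bits, cfg, inverse)))


if __name__ == "__main__":
    cfg = [random.getrandbits(1) for _ in range(benes_switches(32))]  # 144 bits
    a = 0x0040_1000
    h = scramble(a, cfg)
    print(hex(h), hex(scramble(h, cfg, inverse=True)))
```

Because the network only permutes bit positions, a scrambled address always has the same number of 1 bits as the original, and descrambling needs no extra hardware beyond traversing the configuration in reverse.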
To improve security, a configuration parameter generator and a dynamic update controller are designed. The parameter generator consists of three linear feedback shift registers (LFSRs) with primitive feedback polynomials (Susil and Vashek 2018), shown in (6):
$$\left\{ {\begin{array}{ll} { f_{1} \left( x \right) = x^{{41}} \oplus x^{6} \oplus x^{3} \oplus x^{2} } \\ { f_{2} \left( x \right) = x^{{47}} \oplus x^{8} \oplus x^{6} \oplus x^{3} } \\ { f_{3} \left( x \right) = x^{{53}} \oplus x^{7} \oplus x^{5} \oplus x^{2} } \\ \end{array} } \right.$$
(6)
The LFSRs are clock-controlled as shown in Eq. (7): the current value of bit 21 of LFSR1 is taken as X1, bit 24 of LFSR2 as X2 and bit 27 of LFSR3 as X3, and X1, X2, X3 control the stepping of the three LFSRs. Together, the three LFSRs generate a pseudo-random sequence with a period of \(2^{144}\). This sequence is loaded into the permutation network according to the parameter configuration rules. To recover the original address, the scrambled address is descrambled with the inverted configuration parameters:
$$\left\{ {\begin{array}{ll} {{M_1}=\left( {{X_1} \oplus {X_2}} \right)\cdot \left( {{X_2} \oplus {X_3}} \right)} \\ { {M_2}=\left( {{X_2} \oplus {X_3}} \right)\cdot \left( {{X_1} \oplus {X_3}} \right)} \\ {{M_3}=\left( {{X_1} \oplus {X_3}} \right)\cdot \left( {{X_1} \oplus {X_2}} \right)} \end{array}} \right.$$
(7)
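Eqs. (6) and (7) can be exercised in software. The sketch below makes several assumptions that the text does not state: each polynomial in (6) is taken to include the constant term 1 (a Fibonacci LFSR with recurrence \(s_{t+n} = s_t \oplus s_{t+a} \oplus \dots\)), X1, X2 and X3 are read from list positions 21, 24 and 27, and only the clock-control function itself is evaluated, because how M1 to M3 drive the registers is not spelled out. Names such as `lfsr_step`, `clock_enables` and `control_bits` are hypothetical.

```python
import random
from itertools import product

# (length n, middle taps) read from Eq. (6); a "+1" constant term is assumed.
LFSR_SPECS = {"LFSR1": (41, (6, 3, 2)), "LFSR2": (47, (8, 6, 3)), "LFSR3": (53, (7, 5, 2))}
CTRL_BITS = {"LFSR1": 21, "LFSR2": 24, "LFSR3": 27}  # positions of X1, X2, X3 (assumed indexing)


def lfsr_step(state: list, taps) -> int:
    """One clock of a Fibonacci LFSR; returns the bit shifted out.

    state[0] is the oldest bit; the new bit is s[t] ^ s[t+a] ^ ... for a in taps,
    i.e. characteristic polynomial x^n + x^a + ... + 1.
    """
    new = state[0]
    for a in taps:
        new ^= state[a]
    out = state.pop(0)
    state.append(new)
    return out


def control_bits(states: dict) -> tuple:
    """Read X1, X2, X3 from the three registers."""
    return tuple(states[name][pos] for name, pos in CTRL_BITS.items())


def clock_enables(x1: int, x2: int, x3: int) -> tuple:
    """Eq. (7): M1, M2, M3 from the control bits X1, X2, X3."""
    m1 = (x1 ^ x2) & (x2 ^ x3)
    m2 = (x2 ^ x3) & (x1 ^ x3)
    m3 = (x1 ^ x3) & (x1 ^ x2)
    return m1, m2, m3


if __name__ == "__main__":
    rng = random.Random(2024)
    states = {name: [rng.getrandbits(1) for _ in range(n)] for name, (n, _) in LFSR_SPECS.items()}
    for name, (_, taps) in LFSR_SPECS.items():
        states[name][0] = 1  # guarantee a non-zero register state
        lfsr_step(states[name], taps)
    print("X1, X2, X3 =", control_bits(states))
    for xs in product((0, 1), repeat=3):
        print(xs, "->", clock_enables(*xs))
```

Enumerating the eight input patterns shows that, as written, Eq. (7) asserts exactly one of M1, M2, M3 when one control bit differs from the other two, and none when all three agree.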

4.3 Operation process of confidentiality management

In the write process, BASC-AES-TS first assigns a TS to each address segment and increments it at every write. Next, the data are encrypted with the key generated from the TS, while address scrambling is completed in parallel. Finally, the TS and ciphertext are written to the corresponding memory areas.

Memory write addressing process:

1) Time stamp increment: TS(@) = TS(@) + 1;

2) Key generation: key = BASC-AES-TS{TS(@), PAD};

3) Data encryption: C = D ⊕ key;

4) Address scrambling: A’ = S(A);

5) Ciphertext writing: C → MEMORY;

6) Time stamp storing: TS(@) → TS-MEMORY.

Memory read addressing process:

1) Time stamp reading: TS(@) ← TS-MEMORY;

2) Key generation: key = BASC-AES-TS{TS(@), PAD};

3) Address descrambling: A’ = S(A);

4) Ciphertext reading: C ← MEMORY;

5) Data decryption: D = C ⊕ key;

6) Plaintext output: D → CACHE-MEMORY.

In the read process, when the controller receives a read request, BASC-AES-TS fetches the TS from the label storage area and passes it to the AES module to regenerate the key. The regenerated key is identical to the one used for encryption, so the data are decrypted correctly and the right plaintext is delivered to the bus.
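The two step lists can be tied together in a toy model. In the sketch below, the `SecureMemory` class, HMAC-SHA256 as a stand-in for AES, the assumption that PAD carries the address, and a fixed random bit permutation standing in for the Benes scrambler are all illustrative choices; only the order of the steps follows the text. One dictionary plays the role of off-chip MEMORY and another the role of TS-MEMORY.

```python
import hashlib
import hmac
import random


class SecureMemory:
    """Toy model of the write/read flow above (class and method names are hypothetical).

    HMAC-SHA256 stands in for AES as the keystream source, and a fixed random
    bit permutation stands in for the Benes address scrambler.
    """

    def __init__(self, secret: bytes, addr_bits: int = 32, seed: int = 0):
        self.secret = secret
        self.perm = list(range(addr_bits))
        random.Random(seed).shuffle(self.perm)
        self.memory = {}     # off-chip MEMORY: scrambled address -> ciphertext
        self.ts_memory = {}  # TS-MEMORY: address -> time stamp (trusted storage)

    def scramble(self, addr: int) -> int:
        """A' = S(A): output bit dst is input bit perm[dst]."""
        return sum(((addr >> src) & 1) << dst for dst, src in enumerate(self.perm))

    def _key(self, addr: int, ts: int, n: int) -> bytes:
        """key = BASC-AES-TS{TS(@), PAD}; PAD is assumed to carry the address."""
        if n > 32:
            raise ValueError("toy model supports blocks of at most 32 bytes")
        msg = addr.to_bytes(8, "big") + ts.to_bytes(8, "big")
        return hmac.new(self.secret, msg, hashlib.sha256).digest()[:n]

    def write(self, addr: int, data: bytes) -> None:
        ts = self.ts_memory.get(addr, 0) + 1              # 1) TS(@) = TS(@) + 1
        key = self._key(addr, ts, len(data))              # 2) key generation
        cipher = bytes(d ^ k for d, k in zip(data, key))  # 3) C = D xor key
        self.memory[self.scramble(addr)] = cipher         # 4)-5) A' = S(A); C -> MEMORY
        self.ts_memory[addr] = ts                         # 6) TS(@) -> TS-MEMORY

    def read(self, addr: int) -> bytes:
        ts = self.ts_memory[addr]                         # 1) TS(@) <- TS-MEMORY
        cipher = self.memory[self.scramble(addr)]         # 3)-4) A' = S(A); C <- MEMORY
        key = self._key(addr, ts, len(cipher))            # 2) key generation (overlaps the fetch in hardware)
        return bytes(c ^ k for c, k in zip(cipher, key))  # 5) D = C xor key


if __name__ == "__main__":
    mem = SecureMemory(b"on-chip secret")
    mem.write(0x100, b"sixteen byte blk")
    print(mem.read(0x100))  # b'sixteen byte blk'
```

Writing the same plaintext twice to one address yields different ciphertexts, because the time stamp, and hence the key, changes on every write.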

5 Integrity management based on SHA3-TAH and MSet-XOR-Hash

5.1 Hash strategy of SHA3-TAH

Hash algorithms are usually used for integrity protection. The algorithm should provide sufficiently strong security and have an efficient hardware structure. It should also be extensible with additional attributes to resist tampering, resetting and replay attacks. Finally, its control mechanism should be streamlined to reduce its dependence on, and burden to, the system. SHA3 meets these demands well (Xiongwei et al. 2016), so we choose SHA3 for integrity protection.

SHA3 operates on a three-dimensional state array and supports four modes: 1152-224, 1088-256, 832-384 and 576-512 (block length/hash length, in bits). The algorithm consists of 24 rounds, and each round is divided into 5 steps (additions are in GF(2); the x and y indices are taken modulo 5):

\(\theta : a[x][y][z] \leftarrow a[x][y][z] + \sum_{y'=0}^{4} a[x-1][y'][z] + \sum_{y'=0}^{4} a[x+1][y'][z-1]\);

\(\rho : a[x][y][z] \leftarrow a[x][y][z - (t+1)(t+2)/2]\), where \(0 \le t < 24\) and \(\begin{pmatrix} 0 & 1 \\ 2 & 3 \end{pmatrix}^{t} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} x \\ y \end{pmatrix}\) in \(GF(5)^{2\times 2}\), or \(t = -1\) if \(x = y = 0\);

\(\pi : a[x][y] \leftarrow a[x'][y']\), where \(\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 2 & 3 \end{pmatrix} \begin{pmatrix} x' \\ y' \end{pmatrix}\);

\(\chi : a[x] \leftarrow a[x] + (a[x+1] + 1)\, a[x+2]\);

\(\iota : a \leftarrow a + RC[i_r]\).
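The five step mappings can be checked against a software model. Below is a compact pure-Python Keccak-f[1600] with the SHA3-224 sponge (the 1152-224 mode); the ρ offsets and the round constants are derived from the rules above rather than hard-coded, and the result can be compared with Python's `hashlib.sha3_224`. It is a readability sketch with hypothetical helper names, not the hardware datapath of this design.

```python
import hashlib

MASK = (1 << 64) - 1


def rot(v: int, n: int) -> int:
    """Rotate a 64-bit lane left by n bits (bit z moves to z + n)."""
    n %= 64
    return ((v << n) | (v >> (64 - n))) & MASK


def rho_offsets():
    """rho offsets (t+1)(t+2)/2 mod 64, walking (x, y) with the matrix [[0,1],[2,3]]."""
    r = [[0] * 5 for _ in range(5)]  # lane (0, 0) keeps offset 0 (t = -1)
    x, y = 1, 0
    for t in range(24):
        r[x][y] = (t + 1) * (t + 2) // 2 % 64
        x, y = y, (2 * x + 3 * y) % 5
    return r


def rc_bit(t: int) -> int:
    """Output bit t of the LFSR x^8 + x^6 + x^5 + x^4 + 1 that defines RC."""
    reg = 1
    for _ in range(t % 255):
        reg <<= 1
        if reg & 0x100:
            reg ^= 0x171
    return reg & 1


RHO = rho_offsets()
RC = [sum(rc_bit(j + 7 * i) << ((1 << j) - 1) for j in range(7)) for i in range(24)]


def keccak_f(a):
    """Keccak-f[1600] on lanes a[x][y]; one loop iteration per round."""
    for i in range(24):
        # theta: XOR in the parities of columns x - 1 and x + 1 (the latter rotated by 1)
        c = [a[x][0] ^ a[x][1] ^ a[x][2] ^ a[x][3] ^ a[x][4] for x in range(5)]
        d = [c[(x - 1) % 5] ^ rot(c[(x + 1) % 5], 1) for x in range(5)]
        a = [[a[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi: rotate each lane, then move lane (x, y) to (y, 2x + 3y)
        b = [[0] * 5 for _ in range(5)]
        for x in range(5):
            for y in range(5):
                b[y][(2 * x + 3 * y) % 5] = rot(a[x][y], RHO[x][y])
        # chi: a[x] + (a[x+1] + 1) a[x+2] along each row
        a = [[b[x][y] ^ (~b[(x + 1) % 5][y] & b[(x + 2) % 5][y]) for y in range(5)]
             for x in range(5)]
        # iota: add the round constant to lane (0, 0)
        a[0][0] ^= RC[i]
    return a


def sha3_224_ref(msg: bytes) -> bytes:
    """SHA3-224 sponge: 1152-bit rate (144 bytes), 224-bit output."""
    rate = 144
    p = bytearray(msg) + b"\x06"  # SHA3 domain bits plus the first padding bit
    p += b"\x00" * (-len(p) % rate)
    p[-1] |= 0x80                 # final padding bit
    a = [[0] * 5 for _ in range(5)]
    for off in range(0, len(p), rate):
        for i in range(rate // 8):  # lane index i = x + 5y, little-endian bytes
            a[i % 5][i // 5] ^= int.from_bytes(p[off + 8 * i:off + 8 * i + 8], "little")
        a = keccak_f(a)
    out = b"".join(a[i % 5][i // 5].to_bytes(8, "little") for i in range(25))
    return out[:28]


if __name__ == "__main__":
    print(sha3_224_ref(b"abc") == hashlib.sha3_224(b"abc").digest())
```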

SHA3 is adapted in our architecture. First, because the AES block length is 128 bits, the keystream generated by the BASC-AES-TS encryption mechanism is an integer multiple of 128 bits. To make better use of the algorithm, the SHA3 data block length should therefore also be a multiple of 128 bits. Second, attribute parameters are placed in the last 128 bits of the SHA3 input block to strengthen integrity protection: the TS counters replay attacks, the address label counters resetting attacks, and the hash key (HKey) ties the hashes to the system. For example, in the 1152-224 mode the data block can be set to 1024 bits, eight times the AES block length, and the remaining 128 bits are used for the attached attributes. We call this verification strategy, based on SHA3, TS and address, SHA3-TAH.
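A sketch of the SHA3-TAH tag computation in the 1152-224 mode, using Python's `hashlib.sha3_224`, whose rate is 1152 bits. The split of the 128-bit attribute field into a 32-bit TS, a 32-bit address and a 64-bit HKey is an assumption; the text only fixes the total of 128 bits. `sha3_tah` is a hypothetical name.

```python
import hashlib

DATA_BYTES = 128  # 1024-bit data block (eight 128-bit AES blocks)
HKEY_BYTES = 8    # assumed 64-bit hash key


def sha3_tah(data: bytes, ts: int, address: int, hkey: bytes) -> bytes:
    """Tag = SHA3-224(data || TS || address || HKey); field widths are assumptions."""
    if len(data) != DATA_BYTES or len(hkey) != HKEY_BYTES:
        raise ValueError("expected a 1024-bit data block and a 64-bit HKey")
    attrs = ts.to_bytes(4, "big") + address.to_bytes(4, "big") + hkey  # 128-bit attribute field
    # 1024 data bits + 128 attribute bits = 1152 bits, one full SHA3-224 rate block before padding
    return hashlib.sha3_224(data + attrs).digest()


if __name__ == "__main__":
    block = bytes(range(128))
    hkey = bytes(8)
    fresh = sha3_tah(block, ts=5, address=0x4000, hkey=hkey)
    replayed = sha3_tah(block, ts=4, address=0x4000, hkey=hkey)   # old TS: replay attempt
    relocated = sha3_tah(block, ts=5, address=0x8000, hkey=hkey)  # other address: reset attempt
    print(fresh != replayed, fresh != relocated)
```

Because the TS and address enter the hash input, identical data replayed from an earlier write or copied to another address produce a tag that no longer matches.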

5.2 Integrity trees based on MSet-XOR-Hash

Security requirements prevent the compression ratio from being too high, so a large number of hashes is produced. Because the number of tags is limited by memory capacity, the on-chip memory consumed by hashes must be kept small. Resolving the conflict between protection ability and storage space is therefore critical.

A hash tree builds a multi-layer label tree by stacking hash operations (Yun and Viktor 2016), as shown in Fig. 4. With the root node as the secure source, the hash tree achieves security by diffusion across layers: any change to a leaf node eventually changes the root node. The integrity of the hash tree is therefore guaranteed as long as the root node is stored in a trusted area.

Fig. 4

Architecture of hash tree
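The diffusion property can be illustrated with an ordinary hash tree. The sketch below uses SHA3-224 for every node and a configurable fan-out `arity` (a stand-in for the compression ratio); it is a plain hash tree, not the MSet-XOR-Hash tree adopted below, and `tree_root` is a hypothetical helper.

```python
import hashlib


def tree_root(leaves: list, arity: int = 2) -> bytes:
    """Root of an `arity`-ary hash tree; each parent hashes its children's digests."""
    if not leaves:
        raise ValueError("need at least one leaf")
    if arity < 2:
        raise ValueError("arity must be at least 2")
    level = [hashlib.sha3_224(leaf).digest() for leaf in leaves]
    while len(level) > 1:
        level = [hashlib.sha3_224(b"".join(level[i:i + arity])).digest()
                 for i in range(0, len(level), arity)]
    return level[0]


if __name__ == "__main__":
    blocks = [bytes([i]) * 128 for i in range(16)]  # sixteen 1024-bit data blocks
    root = tree_root(blocks, arity=4)
    blocks[5] = b"\xff" * 128                       # tamper with one leaf
    print(tree_root(blocks, arity=4) != root)
```

Only the root has to live in the trusted area: any modified leaf propagates through every layer above it to a different root.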

However, node updates and hash checks are time-consuming, so both the construction function and the checking method need improvement. Incremental multiset hash (iHASH) functions are a family of hash functions based on set operations. They map multisets of arbitrary finite size to hashes of fixed length, and they are incremental: when members are added to the multiset, the hash can be updated quickly with addition and subtraction operations on the affected elements.

Definition 5.1

Let H, \(+_H\) and \(\equiv_H\) be probabilistic polynomial-time algorithms (Andrea and Holger 2015). The triple \((H, +_H, \equiv_H)\) is a multiset hash function if it satisfies the following requirements:

Compression: H maps multisets of a set B to elements of a set with cardinality \(2^m\), where m is some integer.

Comparability: Since H may be a probabilistic algorithm, a multiset need not always hash to the same value, so \(\equiv_H\) is needed to compare hashes. For every multiset M of B, \(H(M) \equiv_H H(M)\) must hold.

Incrementality: For multisets M and M′ of B, \(H(M \cup M') \equiv_H H(M) +_H H(M')\). In particular, \(H(M \cup \{b\}) \equiv_H H(M) +_H H(\{b\})\) can be computed easily from H(M) and the element b of B.

Definition 5.2

Hk is a pseudorandom function with key k, and Hk: \({\left\{ {0,1} \right\}^l} \to {\left\{ {0,1} \right\}^m}\). Expression \(r\xleftarrow{R}{\left\{ {0,1} \right\}^m}\) means r is randomly selected from \({\left\{ {0,1} \right\}^m}\) (Jie and Jianliang 2018).

MSet-XOR-Hash is an iHASH function based on the XOR operation and is very efficient compared with other iHASH constructions, so we choose it as the hash tree management algorithm. It is defined in Eq. (8) (Xingyuan et al. 2018). When a leaf node changes, a subtraction removes the old leaf's contribution from the hash tree, and an addition then inserts the new leaf node. Because a leaf change requires only one subtraction and one addition per layer, the MSet-XOR-Hash tree is highly efficient.
$$\begin{aligned} & H_{k}(M) = \left[ H_{k}(0,r) \oplus \bigoplus_{v \in V} M_{v} H_{k}(1,v);\ \sum_{v \in V} M_{v} \bmod 2^{m};\ r \right]_{r \xleftarrow{R} \{0,1\}^{m}}; \\ & (h,c,r) +_{H_{k}} (h',c',r') = \left[ H_{k}(0,r'') \oplus \left( h \oplus H_{k}(0,r) \right) \oplus \left( h' \oplus H_{k}(0,r') \right);\ c + c' \bmod 2^{m};\ r'' \right]_{r'' \xleftarrow{R} \{0,1\}^{m}} \end{aligned}$$
(8)
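Eq. (8) can be sketched in Python as follows, assuming HMAC-SHA256 as the keyed pseudorandom function \(H_k\) (so m = 256) and Python `Counter` objects as multisets; all function names are hypothetical. Because the first component combines terms with XOR, \(M_v H_k(1,v)\) depends only on the parity of \(M_v\), while the second component keeps the exact element count modulo \(2^m\). The subtraction used for leaf updates is the same combination with the count subtracted.

```python
import hashlib
import hmac
import secrets
from collections import Counter

M_BITS = 256  # output length m of the PRF; HMAC-SHA256 is the assumed H_k


def prf(key: bytes, prefix: int, value: bytes) -> int:
    """H_k(prefix, value) as an m-bit integer."""
    return int.from_bytes(hmac.new(key, bytes([prefix]) + value, hashlib.sha256).digest(), "big")


def mset_hash(key: bytes, multiset: Counter):
    """H_k(M) = [H_k(0,r) xor (xor of M_v H_k(1,v)); sum M_v mod 2^m; r]."""
    r = secrets.token_bytes(M_BITS // 8)
    h = prf(key, 0, r)
    for v, mult in multiset.items():
        if mult % 2:  # M_v copies of H_k(1,v) under XOR reduce to the parity of M_v
            h ^= prf(key, 1, v)
    return h, sum(multiset.values()) % (1 << M_BITS), r


def _combine(key: bytes, a, b, sign: int):
    (h, c, r), (h2, c2, r2) = a, b
    r3 = secrets.token_bytes(M_BITS // 8)  # fresh randomizer r''
    h3 = prf(key, 0, r3) ^ (h ^ prf(key, 0, r)) ^ (h2 ^ prf(key, 0, r2))
    return h3, (c + sign * c2) % (1 << M_BITS), r3


def mset_add(key, a, b):
    """+_{H_k} from Eq. (8): hash of the union of the two multisets."""
    return _combine(key, a, b, +1)


def mset_sub(key, a, b):
    """Inverse update (b must be contained in a): XOR cancels, the count is subtracted."""
    return _combine(key, a, b, -1)


def mset_equal(key, a, b) -> bool:
    """The comparison: strip each randomizer, then compare XOR part and count."""
    (h, c, r), (h2, c2, r2) = a, b
    return (h ^ prf(key, 0, r)) == (h2 ^ prf(key, 0, r2)) and c == c2


if __name__ == "__main__":
    key = b"HKey"
    h = mset_hash(key, Counter({b"node0": 1, b"node1": 1}))
    h = mset_sub(key, h, mset_hash(key, Counter({b"node1": 1})))   # leaf node1 changes...
    h = mset_add(key, h, mset_hash(key, Counter({b"node1'": 1})))  # ...to node1'
    print(mset_equal(key, h, mset_hash(key, Counter({b"node0": 1, b"node1'": 1}))))  # True
```

Updating a leaf touches only one subtraction and one addition, independent of how many other leaves the node covers.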

There is a trade-off between the compression ratio, the secure memory footprint and the security level, which is discussed in more detail in Sect. 6.2. It lets designers build a secure system that fits the requirements of their application.

5.3 Operation process of integrity management

Integrity protection consists of a static check and a dynamic check. The dynamic check includes the read addressing check, the write addressing check and the hash tree check. Each check operates as follows.

The static check verifies data integrity before the system has fully booted, to ensure that the loaded program and data have not been damaged. In a read addressing check, a new label is computed by hashing and compared with the stored label to verify data integrity; in a write addressing check, the new label is written into the tag memory for later checks. To keep the data and the hash tree safe, the hash tree must be checked regularly. The hash tree check proceeds layer by layer from the leaf nodes to the root node; if the root node is unchanged, the entire hash tree is safe.

6 Hardware implementation and performance analysis

6.1 Security management hardware core

Following a modular design principle, the protection hardware is organized as shown in Fig. 5. The external interface consists of a 32-bit address bus, a pair of 128-bit input and output data buses and a set of control buses. Under the scheduling of the control module, the data encryption module fetches the TS from the tag memory, appends the padding (PAD) and sends the result to the plaintext input port of the AES module. Data from the data bus are combined in the key XOR module with the output generated by AES. In addition, the address signals are scrambled by the address scrambling module.

Fig. 5

Block diagram of security management

The internal interface is similar to the external one. Data from the data bus are first sent to the hash module, which generates a label together with the TS and address information. The check label is then sent to the hash management module, which builds the hash tree according to the address information. A dedicated label bus carries the check labels and a node bus carries the hash tree nodes, enabling efficient data transfer.

6.2 Analysis of security level and overhead

6.2.1 Analysis of security level

Table 1 summarizes existing off-chip memory protection approaches that provide some level of confidentiality and integrity. For confidentiality, our design is strengthened not only by a more complex encryption scheme but also by data address scrambling. For integrity, our design is improved by adding address, time stamp and hash key attributes to the SHA3 and iHASH algorithms. The security of PE-ICE and TEC-tree depends on the label, whose security level is no higher than \(2^{64}\). The security of XOM and AEGIS depends on the hash algorithm, which improves their security; however, the complexity of the hash algorithms they use is no more than \(2^{160}\). The complexity of SHA3 in our strategy is at least \(2^{224}\), so the attack complexity of our scheme is far higher than that of the others.

Table 1

Comparison of security ability and protection means

Strategies

XOM

PE-ICE

TEC-tree

AEGIS

HHS

OURS

Memory space occupied

 On-chip

Keys TS

Random

Counter number

Root node

Root node

Random

Keys hashes

TS hashes

 Off-chip

Hashes

Address random

Counter number

TS hashed

Random

Hashes

Confidentiality management

AES-OTP

AES

PE-ICE

OTP

AES

BASC-AES-TS

Benes Network

Integrity management

Hash

Tag check

PE-ICE

Cache hash tree

NH hash tree

SHA3-TAH

iHASH-tree

Security

\(2^{128}/2^{160}\)

\(2^{32}\)

\(2^{64}\)

\(2^{160}\)

\(2^{128}\)

\(\ge 2^{224}\)

6.2.2 Resource overhead of security

Each module is described in Verilog HDL and synthesized in a 0.18 µm process. The synthesis results of the key modules are shown in Table 2. The area overhead of our solution is about 1.1 mm², and the frequency reaches 563 MHz, which is the lowest module frequency in Table 2 (the SHA3 module). The hardware design thus achieves the goal of high speed with a small area overhead.

Table 2

Synthesis results of area and frequency (selected modules)

| Module | Function | Cipher algorithm | Execution time (clock cycles) | Area (µm²) | Frequency (MHz) |
|---|---|---|---|---|---|
| Confidentiality management | Data encryption/decryption | AES | 10 | 384,259 | 576 |
| | Address scrambling | Benes | 0 | 19,121 | |
| Integrity management | Hash | SHA3 | 120 | 671,075 | 563 |
| | Hash tree addition/subtraction | iHASH | 1 | 36,836 | 756 |
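As a quick cross-check of Table 2, the listed module areas can be summed and converted to mm², and the overall clock taken as the slowest clocked module. Treating the design frequency as the minimum module frequency is an assumption of this sketch; the Benes network has no reported frequency and is skipped.

```python
# Area (µm²) and frequency (MHz) of the modules listed in Table 2.
MODULES = {
    "AES": (384_259, 576),
    "Benes": (19_121, None),  # combinational, no frequency reported
    "SHA3": (671_075, 563),
    "iHASH": (36_836, 756),
}


def total_area_mm2(modules) -> float:
    """Sum module areas and convert µm² to mm² (1 mm² = 1e6 µm²)."""
    return sum(area for area, _ in modules.values()) / 1e6


def max_clock_mhz(modules) -> int:
    """Assumed: the design runs at the frequency of its slowest clocked module."""
    return min(freq for _, freq in modules.values() if freq is not None)


print(f"{total_area_mm2(MODULES):.2f} mm^2, {max_clock_mhz(MODULES)} MHz")  # 1.11 mm^2, 563 MHz
```

The listed modules alone account for about 1.11 mm², in line with the reported total of about 1.1 mm².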

Since the TS and the hash tree consume most of the storage resources, the memory overhead can be evaluated from the hash-tree and TS storage space. Assume 1 Gb of memory is to be protected; the total number of leaf nodes is then \(N_0 = 1\,\text{Gb} / 1024\,\text{bit} = 2^{30}/2^{10} = 1048576\). Three hash modes are discussed below: \(1024 - 224\), \(768 - 256\) and \(512 - 384\). The compression ratio L is set to 256, 128, 64, 32 and 16. The trend of the total number of tags is shown in Fig. 6: under a given hash mode, the total number of nodes to be stored is roughly inversely proportional to the compression ratio L. Hence, the greater L is, the smaller the memory overhead. The original data are listed in Table 3.

Fig. 6

Memory overhead with variable L under different modes

Table 3

Comparison of space resource consumption of hash tree

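| Hash mode | L | \(N_1\) | \(N_2\) | \(N_3\) | \(N_4\) | \(N_5\) | \(\sum_{i \ge 2} N_i\) |
|---|---|---|---|---|---|---|---|
| 1024–224 | 16 | 229,376 | 14,336 | 896 | 56 | 1 | 15,289 |
| | 32 | 229,376 | 7168 | 224 | 1 | | 7393 |
| | 64 | 229,376 | 3584 | 56 | 1 | | 3641 |
| | 128 | 229,376 | 1792 | 1 | | | 1793 |
| | 256 | 229,376 | 896 | 1 | | | 897 |
| 768–256 | 16 | 349,525 | 21,845 | 1365 | 85 | 1 | 23,297 |
| | 32 | 349,525 | 10,923 | 341 | 1 | | 11,266 |
| | 64 | 349,525 | 5461 | 85 | 1 | | 5548 |
| | 128 | 349,525 | 2731 | 1 | | | 2732 |
| | 256 | 349,525 | 1365 | 1 | | | 1367 |
| 512–384 | 16 | 786,432 | 49,152 | 3072 | 192 | 1 | 52,417 |
| | 32 | 786,432 | 24,576 | 768 | 24 | 1 | 25,369 |
| | 64 | 786,432 | 12,288 | 192 | 1 | | 12,481 |
| | 128 | 786,432 | 6144 | 48 | 1 | | 6193 |
| | 256 | 786,432 | 3072 | 1 | | | 3073 |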
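The inverse relationship with L follows from a geometric series. Reading each hash mode as input block size and digest size in bits, \(N_1 = N_0 \times \text{digest}/\text{block}\) reproduces the \(N_1\) column of Table 3, and if each level above it is \(1/L\) of the level below, the upper levels sum to about \(N_1/(L-1)\). The sketch below (an illustrative model, not the authors' exact node-counting rule) compares that estimate with Table 3; the small differences presumably come from rounding level counts to whole nodes and ending each tree at a single root.

```python
N0 = 2**30 // 1024  # 1 Gb protected, 1024-bit leaf blocks -> 1,048,576 leaves
L_VALUES = [16, 32, 64, 128, 256]

# sum_{i>=2} N_i copied from Table 3, in the order of L_VALUES
TABLE3 = {
    (1024, 224): [15_289, 7_393, 3_641, 1_793, 897],
    (768, 256): [23_297, 11_266, 5_548, 2_732, 1_367],
    (512, 384): [52_417, 25_369, 12_481, 6_193, 3_073],
}


def first_level(block_bits: int, digest_bits: int) -> float:
    """N_1 as listed in Table 3: N0 scaled by digest size / input block size."""
    return N0 * digest_bits / block_bits


def upper_levels_estimate(n1: float, L: int) -> float:
    """Geometric series n1/L + n1/L**2 + ... = n1 / (L - 1)."""
    return n1 / (L - 1)


for (block_bits, digest_bits), totals in TABLE3.items():
    n1 = first_level(block_bits, digest_bits)
    for L, table_total in zip(L_VALUES, totals):
        est = upper_levels_estimate(n1, L)
        err = abs(est - table_total) / table_total
        print(f"{block_bits}-{digest_bits} L={L:3d}: estimate {est:9.1f}, "
              f"Table 3 {table_total:6d} ({err:.2%})")
```

Every estimate lands within 1% of the corresponding Table 3 total, which supports the reading that doubling L roughly halves the number of stored upper-level nodes.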

Taking 256 Kb of instructions (code) and 256 Kb of data as the protected object, the memory footprints of XOM, PE-ICE, TEC-tree, AEGIS, HHS and our scheme (1024–224 mode) are compared in Table 4. Our strategy requires considerably less storage space than most of the others: its additional storage is limited to about 26.6% (25.4% with L = 64), which is about half or less of that of the other secure storage strategies, with HHS (24.6%) as the only exception.

Table 4

Comparison of memory footprint

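| Strategies | Intra-chip additional, code (Kb) | Intra-chip additional, data (Kb) | Off-chip additional, code (Kb) | Off-chip additional, data (Kb) | Memory footprint (Kb) | Additional proportion | Security level |
|---|---|---|---|---|---|---|---|
| XOM | 0 | 24 | 128 | 128 | 792 | 54.7% | \(2^{32}\) |
| PE-ICE | 0 | 0 | 128 | 262 | 902 | 76.2% | \(2^{64}\) |
| TEC-tree | 0 | 0/32 | 128 | 128 | 768/800 | 50%/56% | \(2^{128}/2^{160}\) |
| AEGIS | 0 | 32 | 160 | 276 | 980 | 91% | \(2^{160}\) |
| HHS | 3 | 3 | 60 | 60 | 638 | 24.6% | \(2^{128}\) |
| Ours (L = 64) | 0 | 0 | 65 | 65 | 642 | 25.4% | \(2^{224}\) |
| Ours (L = 16) | 0 | 0 | 68 | 68 | 648 | 26.6% | \(\ge 2^{224}\) |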
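The footprint and additional-proportion figures in Table 4 follow from simple arithmetic: footprint = 512 Kb of protected code and data plus all on-chip and off-chip additions, and the proportion is the additions divided by 512 Kb. The sketch below recomputes them, assuming all Table 4 quantities are in Kb and taking the 0 Kb variant of TEC-tree's "0/32" on-chip data entry.

```python
BASE_KB = 256 + 256  # protected code + protected data

# (intra-chip code, intra-chip data, off-chip code, off-chip data) in Kb, from Table 4
OVERHEADS = {
    "XOM": (0, 24, 128, 128),
    "PE-ICE": (0, 0, 128, 262),
    "TEC-tree": (0, 0, 128, 128),  # "0/32" entry: the 0 Kb variant
    "AEGIS": (0, 32, 160, 276),
    "HHS": (3, 3, 60, 60),
    "Ours (L = 64)": (0, 0, 65, 65),
    "Ours (L = 16)": (0, 0, 68, 68),
}


def footprint(extra):
    """Return (total footprint in Kb, additional proportion in percent)."""
    added = sum(extra)
    return BASE_KB + added, 100 * added / BASE_KB


for name, extra in OVERHEADS.items():
    total, pct = footprint(extra)
    print(f"{name:14s} {total:4d} Kb  +{pct:.1f}%")
```

Adding TEC-tree's 32 Kb on-chip variant gives 800 Kb and 56.25%, matching the second value in its Table 4 entries.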

6.2.3 Latency of security

Taking a 1024-bit parallel processing task as an example, the latencies of several secure storage strategies are listed in Table 5. The main cost of PE-ICE lies in its check stage, because its encryption/decryption cannot be parallelized with data addressing. The main delay of XOM comes from the hash operation, since it uses the MD5/SHA1 algorithm. The delay of AEGIS depends on the number of hash-tree layers and on the compression ratio; the table lists the time consumed by one layer. The delay of TEC-tree depends on the hash-tree operations. Our strategy introduces parallelism by hiding keystream generation behind the addressing operation on reads, and the hash operation behind the addressing operation on writes. Furthermore, the iHASH-tree mechanism reduces the execution time of hash-tree updates. As Table 5 shows, our additional latency (+6 cycles for reads and −8 cycles for writes) is much shorter than that of the other strategies.
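Why hiding keystream generation behind addressing helps can be shown with a toy timing model. Because the pad depends only on the key, address and TS, it can be generated while the memory access is in flight, so the read latency becomes a maximum rather than a sum. The cycle counts below are hypothetical (a 50-cycle memory access, the 10-cycle AES from Table 2, a 1-cycle XOR), and the model ignores the TS fetch and integrity checking, so it does not attempt to reproduce the Table 5 figures.

```python
def serial_read(mem_cycles: int, pad_cycles: int, xor_cycles: int = 1) -> int:
    """Pad generation starts only after the ciphertext has arrived."""
    return mem_cycles + pad_cycles + xor_cycles


def overlapped_read(mem_cycles: int, pad_cycles: int, xor_cycles: int = 1) -> int:
    """The pad depends only on key, address and TS, so it is generated while the
    memory access is in flight; only the final XOR stays on the critical path."""
    return max(mem_cycles, pad_cycles) + xor_cycles


# Hypothetical figures: 50-cycle memory access, 10-cycle AES, 1-cycle XOR.
print(serial_read(50, 10), overlapped_read(50, 10))  # 61 51
```

The same max() pattern describes hiding the hash computation behind addressing on writes, as long as the hashed work is shorter than the memory access.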

Table 5

Comparison of latency in read and write addressing

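| Security management | AES | XOM | PE-ICE | TEC-tree | AEGIS | HHS | Ours |
|---|---|---|---|---|---|---|---|
| Read delay (cycles) | 50 | +64/80 | +3 | >80 | >25 | +62 | +6 |
| Write delay (cycles) | 50 | +64/80 | +4 | +54/70 | +4 | +69 | −8 |
| Security level | | \(2^{128}/2^{160}\) | \(2^{32}\) | \(2^{160}\) | \(2^{64}\) | \(2^{128}\) | \(2^{224}\) |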

7 Conclusion

This paper systematically studied off-chip memory protection for embedded systems and proposed a high-security, high-efficiency protection scheme that combines BASC-AES-TS data encryption, a Benes-network address scrambling mechanism, the SHA3-TAH hash strategy, and an iHASH-tree architecture based on MSet-XOR-Hash. The experimental results show that our protection strategy provides confidentiality and integrity protection with an attack complexity of at least \(2^{224}\), while keeping the additional memory footprint to about 26.6% and the latency overhead low. Furthermore, the hardware implementation reaches a frequency of 563 MHz with an area overhead of about 1.1 mm². In summary, the proposed protection offers a higher security level than the existing methods compared here, with comparable or lower memory and latency overhead, making it a promising solution for off-chip memory protection in embedded systems.

Acknowledgements

The authors thank all the reviewers and editors for their valuable comments and work. This paper is supported by the National Natural Science Foundation of China (No. 61772550).

References

  1. Andrea T, Holger H (2015) Polynomial time decision algorithms for probabilistic automata. Inf Comput 224:134–171
  2. Damla Y, Heba Y (2018) Power analysis based side-channel attack on visible light communication [online]. Phys Commun. https://doi.org/10.1016/j.phycom.2018.04.013
  3. Duan L, Hongxin ZH, Qiang L et al (2015) Electromagnetic side-channel attack based on PSO directed acyclic graph SVM. J China Univ Posts Telecommun 22(5):10–15
  4. Elbaz R, Torres L, Sassatelli G et al (2006) A parallelized way to provide data encryption and integrity checking on a processor-memory bus. In: Proceedings of the 43rd annual design automation conference. ACM, pp 506–509
  5. Elbaz R, Champagne D, Lee RB et al (2007) Tec-tree: a low-cost, parallelizable tree for efficient defense against memory replay attacks. In: Cryptographic hardware and embedded systems-CHES 2007. Springer, Berlin Heidelberg, pp 289–302
  6. Faheem U, Matthew E, Rajiv R (2018) Data exfiltration: A review of external attack vectors and countermeasures. J Netw Comput Appl 101:18–54
  7. Fangyong H (2005) Research on key techniques of memory system data confidentiality and integrity protection. National University of Defense Technology, Changsha
  8. Hailong ZH, Yongbin ZH (2016) How many interesting points should be used in a template attack. J Syst Softw 120:105–113
  9. Hojoon L, Minsu K, Yunheung P et al (2018) A dynamic per-context verification of kernel address integrity from external monitors [online]. Comput Secur. https://doi.org/10.1016/j.cose.2018.02.013
  10. Hong A, Xuebin CH (2018) Research on embedded access control security system and face recognition system. Measurement 123:309–322
  11. Hu Y, Sunar B (2010) An improved memory integrity protection scheme. Trust and trustworthy computing. Springer, Berlin, pp 273–281
  12. Jaime AB, Leonel PT, Rolf Fredi M et al (2016) An embedded system approach for energy monitoring and analysis in industrial processes. Energy 115:811–819
  13. Jie L, Jianliang ZH, Paula W (2018) Efficient deterministic and non-deterministic pseudorandom number generation. Math Comput Simul 143:114–124
  14. Kemal B, Devrim U, Nadir A et al (2014) Mobile authentication secure against man-in-the-middle attacks. Procedia Comput Sci 34:323–329
  15. Liang Y, Lichen SH, Junyan ZH et al (2018) Heterogeneous information network model for equipment standard system. Phys A 490:935–943
  16. Lie D, Thekkath CA, Horowitz M (2003) Implementing an untrusted operating system on trusted hardware. In: ACM SIGOPS operating systems review. ACM, pp 178–192
  17. Mădălin N, Salvador M (2017) Defending cache memory against cold-boot attacks boosted by power or EM radiation analysis. Microelectron J 62:85–98
  18. Mohsen J, Fathollah B (2015) Improving the reliability of the Benes network for use in large-scale systems. Microelectron Reliab 55(3–4):679–695
  19. Monali M, Krishna A (2017) Modeling and analyses of IP spoofing attack in 6LoWPAN network. Comput Secur 70:95–110
  20. Musa A, Gonzalez V, Barragan D, Ambient J (2018) A new strategy to optimize the sensors placement in wireless sensor networks [online]. J Ambient Intell Hum Comput. https://doi.org/10.1007/s12652-018-0868-2
  21. Ferrari RMG, Teixeira AMH (2017) Detection and isolation of replay attacks through sensor watermarking. IFAC Pap OnLine 50(1):7363–7368
  22. Ryan H, George L (2018) Detecting semantic social engineering attacks with the weakest link: Implementation and empirical evaluation of a human-as-a-security-sensor framework. Comput Secur 76:101–127
  23. Santos JCM, Fei Y (2013) Leveraging speculative architectures for runtime program validation. ACM Trans Embedd Comput Syst 13(1):3
  24. Schraml R, Hofbauer H, Petutschnigg A et al (2015) Tree log identification based on digital cross-section images of log ends using fingerprint and iris recognition methods. In: International conference on computer analysis of images and patterns, CAIP 2015, pp 752–765
  25. Shubhabrata S, Jörn WJ (2017) Aegis: reliable application execution over the mobile cloud. Procedia Comput Sci 109:482–489
  26. Susil KB, Vashek M (2018) Investigating results and performance of search and construction algorithms for word-based LFSRs, σ-LFSRs. Discret Appl Math 243:90–98
  27. Xiaojun ZH, Amine ASA, Abbes A et al (2017) ECG encryption and identification based security solution on the Zynq SoC for connected health systems. J Parallel Distrib Comput 106:143–152
  28. Xingyuan W, Xiaoqiang ZH, Xiangjun W et al (2018) Image encryption algorithm based on multiple mixed hash functions and cyclic shift. Opt Lasers Eng 107:370–379
  29. Xiongwei F, Kenli L, Wangdong Y et al (2016) A secure and efficient file protecting system based on SHA3 and parallel AES. Parallel Comput 52:106–132
  30. Yuan G, Hong A, Zenghui F et al (2018) Mobile network security and privacy in WSN. Procedia Comput Sci 129:324–330
  31. Yun RQ, Viktor KP (2016) Compact hash tables for decision-trees. Parallel Comput 54:121–127
  32. Zhe L, Hwajeong S, Chien-Ning CH et al (2018) Secure GCM implementation on AVR. Discret Appl Math 21:58–66
  33. Zhongyun H, Shuang Y, Yicong ZH (2018) Medical image encryption using high-speed scrambling and pixel adaptive diffusion. Signal Process 144:134–144

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Engineering University of PAP, Xi’an, China
  2. State Key Laboratory of Cryptology, Beijing, China
