1 Introduction

Traditional centralized data storage cannot meet the requirements for efficient management of big data in many cases, so most data centers or clouds apply Distributed Storage Systems (DSS) involving tens of thousands of storage nodes [1,2,3]. Among so many storage nodes, some will likely be unable to provide access to the data due to problems such as cyberattacks (e.g. Distributed Denial of Service, DDoS), crashes of the operating system or applications, failures of Internet connections, or hardware failures [4,5,6]. According to the report in [7], approximately 2% of data nodes can be unavailable each day.

To recover the data of failing nodes, redundant nodes are usually deployed in the DSS network, and coding schemes are applied to introduce constraints among the nodes. Different versions of erasure codes (EC) and network codes have been proposed for this purpose [8,9,10]. Maximum Distance Separable (MDS) codes are probably the most commonly used scheme for DSS. The MDS code was first applied to DSS by Acedanski in 2005 [11], and it was proved to achieve the Singleton bound, so it became a popular choice for data recovery in DSS. In a DSS with MDS codes, N nodes store the original data (called information nodes) and R redundant nodes store the checking data (called checking nodes). When a node crashes, N nodes must be accessed to recover the data that was stored on the failing node. This process introduces non-negligible bandwidth overhead to the DSS. To reduce the network traffic (also called recovery fan-in), regenerating codes (RGC) [12] and locally repairable codes (LRC) [13] were proposed in 2007 and 2011, respectively. However, these schemes involve a large number of matrix operations during the recovery process, so their complexity is too high for practical applications. In 2014, Rashmi and his co-authors first created the Piggybacking framework in [14], and then proposed a practical two-stripe MDS scheme (Hitchhiker) [15]. When a single information node crashes, the Hitchhiker scheme can recover the data by solving the Piggybacking function, which involves much less data access than the basic MDS scheme.

In recent years, some new EC schemes have been proposed. For example, minimum bandwidth regenerating (MBR) codes were proposed in [16] to further reduce the storage cost and repair bandwidth for a wide range of exact repair mechanisms. Wang and his co-authors created the quasi-systematic (QS) code to introduce the privacy-preserving property into DSS [17]. However, most of these works are variants of the normal MDS scheme for special purposes, and in practical applications most DSS still apply the original MDS scheme. For example, Ren et al. proposed to combine the EC scheme with simple copies to achieve a balance between repair complexity and storage efficiency [18]. For an RS based DSS, [19] applied a weighted binary tree to select the best replacement node when a node failure occurs. In [20], Redundancy First Sleep (RFS) and Data First Compensation (DFC) algorithms were proposed to reduce the repair complexity of RS based DSS.

Most of the above applications of DSS are for cloud based storage, where the data access latency is large because the cloud is usually far from the users. To meet the requirements of low-latency applications, edge computing and storage has become a hot topic in recent years [21, 22], and EC schemes are also applied for reliable storage at the edge. For example, [23] relies on edge servers to provide efficient data access for the Internet of Things (IoT), and the MDS code is applied directly to improve the storage reliability at the edge with small overhead. The work in [24] focuses on Content Delivery Networks (CDN) that push hot contents to edge servers. To achieve a balance between storage efficiency and repair complexity, the authors proposed to apply the MDS scheme to data with large volume, and simply maintain copies of data with small volume. The authors in [25] consider the cooperative storage of cloud and edge, and applied the general EC scheme to protect the storage on both sides. As illustrated in [26], edge servers are usually less reliable than those in the cloud, so the probability of multiple node failures increases dramatically. In this case, protection schemes based on normal MDS codes introduce large network traffic and high recovery complexity.

In this paper, a new DSS structure with efficient data protection capabilities is proposed based on the Redundant Residue Number System (RRNS). The features of the second version of the Chinese Remainder Theorem (CRT-II) are fully utilized to recover the crashed nodes with much fewer node accesses and lower complexity.

The remainder of this paper is organized as follows. In Sect. 2, we will first build the model for DSS, and then introduce the procedure for the basic MDS scheme and its improved scheme, including the analysis of network traffic and complexity. In Sect. 3, we first introduce the basics of Residue Number Systems (RNS) and RRNSs, and then propose the RRNS based DSS structure, including the analysis of network traffic and complexity for different cases. Experiments are performed in Sect. 4 to compare the proposed scheme with the MDS based schemes in terms of network traffic and complexity. After discussions of the security advantages of the proposed scheme in Sect. 5, the paper is concluded in Sect. 6.

Before describing the system model, the parameters used in this paper are listed in Table 1 for convenience.

Table 1 List of main parameters

2 System model and MDS based DSS

2.1 System model of coding based DSS


The basic structure of a traditional coding based DSS is shown in Fig. 1. The original data is first divided into N units (or symbols), and then R redundant checking units are generated by coding. Finally, all the \(N+R\) data units are distributed over \(N+R\) nodes. When one of the nodes crashes, the data on it is not available and needs to be recovered based on the coding scheme [27,28,29]. Usually, the communication overhead of the recovery consists of the data that has to be collected from normal nodes and the data stored on the new nodes. In Sect. 4, the proposed scheme will be compared with the basic MDS scheme and the Hitchhiker scheme, so we introduce the details of these two schemes in the following two subsections, including the analysis of network traffic and complexity.

Fig. 1
figure 1

Storage and recovery of data in coding based DSS

2.2 Introduction of basic MDS scheme

2.2.1 Principle of encoding and decoding

The most representative MDS code is the Reed-Solomon (RS) code [30,31,32]. An (N, R) RS erasure code (RS-EC) is composed of N data symbols \(\left( {\varvec{d}}=\left[ d_{1}, d_{2}, \ldots , d_{N}\right] ^{T}\right)\) and R parity symbols \(\left( {\varvec{p}}=\left[ p_{1}, p_{2}, \ldots , p_{R}\right] ^{T}\right)\). Each symbol is expressed as a w-bit word, and the combination of the \(N+R\) symbols is defined as a codeword \({\varvec{c}}=\left[ {\varvec{d}}^{T}\ {\varvec{p}}^{T}\right] ^{T}\) over the Galois field GF \(\left( 2^{w}\right)\). Any combination of N symbols (data or parity) can be used to recover the original N data symbols [33]. In other words, the maximum number of erasures is R.

In the encoder, the R parity symbols of RS-EC can be generated from N data symbols by matrix multiplication over \(\textrm{GF}\left( 2^{w}\right)\) [34] as follows:

$$\begin{aligned} {\varvec{p}}={\varvec{G}} \times {\varvec{d}} \end{aligned}$$

where \({\varvec{G}}\) is an \(R \times N\) generator matrix defined over \(\textrm{GF}\left( 2^{w}\right)\) and is commonly constructed in the form of a Vandermonde matrix [35] as shown in equation (2).

$$\begin{aligned} {\varvec{G}}=\left[ \begin{array}{cccc} 1^{0} &{} 2^{0} &{} \cdots &{} N^{0} \\ 1^{1} &{} 2^{1} &{} \cdots &{} N^{1} \\ \vdots &{} \vdots &{} \ddots &{} \vdots \\ 1^{R-1} &{} 2^{R-1} &{} \cdots &{} N^{R-1} \end{array}\right] \end{aligned}$$

The symbols are stored on \(N+R\) nodes in the DSS. If e data symbols are erased, or equivalently e data nodes crash \((e \le R)\), they can be recovered as follows. First, we construct an \(e \times 1\) vector \(\varvec{p_{e}}\) of the e available parity symbols, an \(N \times 1\) vector \(\varvec{d_{e}}\) of the available data symbols with 0 s at the positions of the erasures, an \(e \times N\) matrix \({\varvec{G}}^{\prime }\) with the e rows of \({\varvec{G}}\) corresponding to the e available parity symbols in \(\varvec{p_{e}}\), and an \(e\times e\) matrix \({\varvec{G}}^{\prime \prime }\) with the e columns of \({\varvec{G}}^{\prime }\) corresponding to the e erased data symbols. Then the vector of erased data symbols \(\varvec{d_{r}}\) can be recovered according to equations (3) and (4) as explained in [36], and the process is shown in Fig. 2 for a (3, 3) RS-EC.

Fig. 2
figure 2

RS-EC based data recovery (for (3, 3) RS-EC)

$$\begin{aligned} \varvec{p_{I}}=\varvec{p_{e}}-{\varvec{G}}^{\prime } \times \varvec{d_{e}} \end{aligned}$$
$$\begin{aligned} \varvec{d_{r}}=\left( {\varvec{G}}^{\prime \prime }\right) ^{-1} \times \varvec{p_{I}} \end{aligned}$$
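The recovery in equations (3) and (4) can be illustrated with a small runnable sketch. For simplicity it works over the prime field GF(257) instead of GF\(\left( 2^{w}\right)\), and the parameters (N = 4 data symbols, the first two Vandermonde parity rows, e = 2 erasures) are illustrative assumptions rather than values from the paper.

```python
# Sketch of RS-EC erasure recovery over the prime field GF(257).
P = 257
d = [10, 20, 30, 40]                       # original data symbols
N = len(d)

# two parity rows of the Vandermonde generator matrix (equation (2))
G = [[pow(j, i, P) for j in range(1, N + 1)] for i in range(2)]
p = [sum(G[i][j] * d[j] for j in range(N)) % P for i in range(2)]

erased = [1, 3]                            # positions of erased data symbols
d_e = [0 if j in erased else d[j] for j in range(N)]

# p_I = p_e - G' x d_e  (equation (3))
p_I = [(p[i] - sum(G[i][j] * d_e[j] for j in range(N))) % P
       for i in range(2)]

# d_r = (G'')^{-1} x p_I  (equation (4)): invert the 2x2 submatrix of G
# formed by the erased columns, modulo 257
g00, g01 = G[0][erased[0]], G[0][erased[1]]
g10, g11 = G[1][erased[0]], G[1][erased[1]]
det_inv = pow((g00 * g11 - g01 * g10) % P, -1, P)
d_r = [(g11 * p_I[0] - g01 * p_I[1]) * det_inv % P,
       (g00 * p_I[1] - g10 * p_I[0]) * det_inv % P]
assert d_r == [d[1], d[3]]                 # the two erased symbols
```

The 2x2 submatrix here is always invertible because its determinant reduces to a difference of distinct column indices, which is nonzero modulo 257.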

Since the code is defined over the Galois field \(\textrm{GF}\left( 2^{w}\right)\), addition (or subtraction) between elements is actually performed with XOR, and multiplication (or division) is performed based on a logarithmic table (gflog) and an inverse logarithmic table (gfilog) over \(\textrm{GF}\left( 2^{w}\right)\), as explained in [37] and shown in Eq. (5).

$$\begin{aligned} x \times y=\textrm{gfilog}\left( \textrm{gflog}(x)+\textrm{gflog}(y)\right) \end{aligned}$$
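As a sketch of the table-based multiplication in Eq. (5), the following builds the gflog/gfilog tables for \(\textrm{GF}(2^{8})\). The primitive polynomial 0x11d is an assumption (any degree-8 primitive polynomial works).

```python
# Build log/antilog tables for GF(2^8) and multiply via table lookup.
def build_tables(prim=0x11d):
    gflog, gfilog = [0] * 256, [0] * 255
    x = 1
    for i in range(255):
        gfilog[i] = x            # gfilog[i] = alpha^i
        gflog[x] = i             # gflog[alpha^i] = i
        x <<= 1                  # multiply by the generator alpha = 2
        if x & 0x100:
            x ^= prim            # reduce modulo the primitive polynomial
    return gflog, gfilog

GFLOG, GFILOG = build_tables()

def gf_mul(x, y):
    if x == 0 or y == 0:         # zero has no logarithm
        return 0
    return GFILOG[(GFLOG[x] + GFLOG[y]) % 255]
```

For example, `gf_mul(0x80, 2)` wraps past degree 8, reduces modulo the primitive polynomial, and yields `0x1d`.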

2.2.2 Network traffic and complexity for node recovery

Based on the recovery process of RS-EC, we can see that N data symbols need to be collected from the node cluster no matter how many data symbols are erased \((e \le R)\). Since e symbols need to be stored on the new nodes, the network traffic is

$$\begin{aligned} N_{T}=N+e. \end{aligned}$$

The main complexity of the recovery process comes from computing the inverse of the \(e \times e\) matrix \({\varvec{G}}^{\prime \prime }\) and performing N multiplications.

2.3 The Hitchhiker scheme

Hitchhiker is an erasure-coded storage system that reduces the network traffic during the reconstruction of unavailable data without requiring any additional storage, while maintaining the same level of fault tolerance as RS based systems [15]. Hitchhiker is built on top of the RS-EC scheme, and two stripes are stored on each node in the DSS. The (10, 4) Hitchhiker code is taken here as an example to explain the encoding and decoding process. The structure of the (10, 4) Hitchhiker code is shown in Fig. 3, where \(a_{i}\) and \(b_{i}\ (i=1,2, \ldots , 10)\) are data symbols (bytes in this example, \(w=8\)), \({\varvec{a}}=\left[ a_{1}, a_{2}, \ldots , a_{10}\right]\), \({\varvec{b}}=\left[ b_{1}, b_{2}, \ldots , b_{10}\right]\), and \(f_{1}, f_{2}, f_{3}\) and \(f_{4}\) are the functions that generate the 4 parity symbols in RS-EC. In Fig. 3, the two columns are called two stripes. The first stripe is a normal RS codeword. The second stripe is generated based on the Piggybacking framework, which operates on pairs of stripes of an RS code and allows arbitrary functions of the data of one stripe to be added to the other stripe. In this example, 4 XOR based functions are defined for the 4 parity symbols of the second stripe as

$$\begin{aligned} \left\{ \begin{array}{l} g_{1}=f_{1}({\varvec{b}}) \\ g_{2}=f_{2}({\varvec{b}}) \oplus a_{1} \oplus a_{2} \oplus a_{3} \\ g_{3}=f_{3}({\varvec{b}}) \oplus a_{4} \oplus a_{5} \oplus a_{6} \\ g_{4}=f_{4}({\varvec{b}}) \oplus a_{7} \oplus a_{8} \oplus a_{9} \oplus a_{10} \end{array}\right. \end{aligned}$$

In the DSS with the Hitchhiker code, each pair \(\left( a_{i}, b_{i}\right)\) is stored on a different node, so 14 nodes are used for the (10, 4) Hitchhiker code. If one node crashes, both \(a_{i}\) and \(b_{i}\) need to be recovered. Based on [15], the number of data units required for the recovery is related to the position of the pair. For \(i \in \{1,2,3,4,5,6\}\), 13 data units need to be collected \((d=13)\), and for \(i \in \{7,8,9,10\}\), 14 data units need to be collected \((d=14)\). Taking \(i=1\) as an example (\(a_{1}\) and \(b_{1}\) are unavailable), the 13 data units are \(\left\{ a_{2}, a_{3}, b_{2}, b_{3}, \ldots , b_{10}, f_{1}({\varvec{b}}), f_{2}({\varvec{b}}) \oplus a_{1} \oplus a_{2} \oplus a_{3}\right\}\), and the recovery of \(a_{1}\) and \(b_{1}\) involves 4 steps: 1) \(b_{1}\) is recovered from \(\left\{ b_{2}, b_{3}, \ldots , b_{10}, f_{1}({\varvec{b}})\right\}\) according to the normal RS-EC decoding procedure; 2) \(f_{2}({\varvec{b}})\) is generated from \({\varvec{b}}\) according to the normal RS encoding process; 3) \(f_{2}({\varvec{b}})\) is XORed with \(f_{2}({\varvec{b}}) \oplus a_{1} \oplus a_{2} \oplus a_{3}\) to get the value of \(a_{1} \oplus a_{2} \oplus a_{3}\); 4) \(a_{1} \oplus a_{2} \oplus a_{3}\) is XORed with \(a_{2}\) and \(a_{3}\) to recover \(a_{1}\). Therefore, both \(a_{1}\) and \(b_{1}\) are reconstructed using only 13 bytes, as opposed to 20 bytes in RS codes \((d=20)\), resulting in a saving of 35%. In this recovery process, step 1) requires 10 multiplications, step 2) requires 10 multiplications, and steps 3) and 4) require 3 XOR operations. For a traditional RS-EC based DSS, 10 multiplications are required for each of \(a_{1}\) and \(b_{1}\), so the recovery of a single pair with the Hitchhiker code introduces 3 additional XOR operations.
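The four recovery steps can be sketched as follows. The RS parity functions \(f_{1}\) and \(f_{2}\) are replaced here by simple stand-ins (an XOR parity and a byte sum) purely for illustration; a real system uses the actual RS-EC parities, but the piggyback logic is the same.

```python
# Sketch of the 4-step Hitchhiker recovery of (a1, b1) with stand-in parities.
from functools import reduce

def f1(b):                        # stand-in for the first RS parity
    return reduce(lambda u, v: u ^ v, b)

def f2(b):                        # stand-in for the second RS parity
    return sum(b) % 256

a = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]        # first stripe data
b = [2, 7, 1, 8, 2, 8, 1, 8, 2, 8]        # second stripe data
g2 = f2(b) ^ a[0] ^ a[1] ^ a[2]           # piggybacked second-stripe parity

# Node 1 crashes, losing a1 = a[0] and b1 = b[0]. Collect the 13 units
# {a2, a3, b2..b10, f1(b), g2} and run the four steps:
b1 = reduce(lambda u, v: u ^ v, [f1(b)] + b[1:])  # 1) decode b1
b_full = [b1] + b[1:]
xor123 = f2(b_full) ^ g2          # 2) re-encode f2(b); 3) strip it off g2
a1 = xor123 ^ a[1] ^ a[2]         # 4) remove a2, a3 to expose a1
assert (a1, b1) == (a[0], b[0])
```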

The Hitchhiker code introduced above is named Hitchhiker-XOR in [15]. Since the traffic saving applies only to the recovery of a single node, the average network traffic can be expressed as

$$\begin{aligned} N_{T}=(N+z) / 2+1, \end{aligned}$$

where z is the number of XORs for the parity symbols in the second stripe. Two other schemes are proposed in [15]: Hitchhiker-XOR+ and Hitchhiker-nonXOR. Since the network traffic and complexity are similar for the three Hitchhiker codes, we only take Hitchhiker-XOR for comparison in Sect. 4. It should be noted that the Hitchhiker code reduces the network traffic only for single pairs (a single node crash). When multiple nodes crash, the recovery process is the same as that of the normal RS-EC, and the network traffic and complexity are also similar.

Fig. 3
figure 3

Two stripes for the (10, 4) Hitchhiker code

3 Proposed DSS based on RRNS

In this section, we first introduce the basics of RNS and RRNS, and then propose the RRNS based DSS, including the encoding and decoding processes. Finally, the RS based schemes and the proposed scheme are compared in the third subsection.

3.1 Introduction of RNS and RRNS

3.1.1 Basics of RNS

The RNS consists of a set of modules \(\Psi _{N}=\left\{ m_{1}, m_{2}, m_{3}, \ldots , m_{N}\right\}\), where each \(m_{i}\) is called a residue base and any two residue bases are relatively prime. The product of all modules \(M=\prod _{i=1}^{N} m_{i}\) is called the “dynamic range”. If X is a non-negative integer less than M, it can be uniquely represented as a remainder vector \(\Phi _{N}=\left\{ x_{1}, x_{2}, x_{3}, \ldots , x_{N}\right\}\), where \(x_{i}\) is the residue of X modulo \(m_{i}\). In turn, if \(\Psi _{N}\) and \(\Phi _{N}\) are known, X can be recovered by

Fig. 4
figure 4

Structure of RRNS based DSS

$$\begin{aligned} X=\left( \sum _{i=1}^{N} x_{i} t_{i} M_{i}\right) \bmod M, \end{aligned}$$

in which \(M_{i}=M / m_{i}\), and \(t_{i}=\left| M_{i}^{-1}\right| _{m_{i}}\) is the multiplicative inverse of \(M_{i}\) modulo \(m_{i}\), so that \(M_{i} t_{i} \equiv 1\left( \bmod m_{i}\right)\). This recovery process is known as the Chinese Remainder Theorem (CRT) [38], and it involves 2N multiplications, N divisions, \(N+1\) modular operations and \(N(N-1)\) addition operations.
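The CRT reconstruction above can be implemented directly; the moduli below are illustrative.

```python
# Direct CRT reconstruction: X = (sum_i x_i * t_i * M_i) mod M.
from math import prod

def crt(residues, moduli):
    """Recover X from its residue vector (pairwise coprime moduli)."""
    M = prod(moduli)
    X = 0
    for x_i, m_i in zip(residues, moduli):
        M_i = M // m_i
        t_i = pow(M_i, -1, m_i)   # t_i such that M_i * t_i = 1 (mod m_i)
        X += x_i * t_i * M_i
    return X % M

moduli = [3, 5, 7, 11]            # pairwise coprime, M = 1155
X = 1000
assert crt([X % m for m in moduli], moduli) == X
```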

CRT-II is an improvement of the CRT [39] with lower implementation complexity. If the remainder vector \(\Phi _{N}\) is divided into two subsets \(\left\{ x_{1}, x_{2}, \ldots , x_{N / 2}\right\}\) and \(\left\{ x_{N / 2+1}, x_{N / 2+2}, \ldots , x_{N}\right\}\), two numbers \(X_{1}\) and \(X_{2}\) can be recovered from the two subsets according to the CRT, respectively. Then the original X can be recovered as

$$\begin{aligned} X=X_{2}+\left\| t\left( X_{1}-X_{2}\right) \right\| _{P_{1}} P_{2}, \end{aligned}$$

in which \(P_{1}=\prod _{i=1}^{N / 2} m_{i}\), \(P_{2}=\prod _{i=N / 2+1}^{N} m_{i}\), and \(t=\left\| P_{2}^{-1}\right\| _{P_{1}}\) is the multiplicative inverse of \(P_{2}\) modulo \(P_{1}\), so that \(t P_{2} \equiv 1\left( \bmod P_{1}\right)\). This process actually divides an RNS into two smaller RNSs. Based on such recursive partitions, the RNS with N remainders can be divided into N/2 minimum RNSs with 2 remainders each. Then the recovery of X starts with CRT over the N/2 pairs of remainders, and the N/2 results are used to form N/4 pairs of remainders. This process is repeated \(\log _{2} N\) times to generate the original X.
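The recursive reconstruction can be sketched as follows: residues are merged pairwise, layer by layer, until only the original X remains. The moduli are illustrative, and N is assumed to be a power of two.

```python
# Recursive CRT-II reconstruction via pairwise merging.
def merge(x1, P1, x2, P2):
    """Combine X mod P1 and X mod P2 into X mod P1*P2 (the CRT-II step)."""
    t = pow(P2, -1, P1)                  # t = ||P2^{-1}||_{P1}
    return x2 + ((t * (x1 - x2)) % P1) * P2

def crt2(residues, moduli):
    vals, mods = list(residues), list(moduli)
    while len(vals) > 1:                 # log2(N) merging layers
        vals = [merge(vals[i], mods[i], vals[i + 1], mods[i + 1])
                for i in range(0, len(vals), 2)]
        mods = [mods[i] * mods[i + 1] for i in range(0, len(mods), 2)]
    return vals[0]

moduli = [3, 5, 7, 11]
X = 1000
assert crt2([X % m for m in moduli], moduli) == X
```

The merged value is correct because it is congruent to \(X_{2}\) modulo \(P_{2}\) by construction, and congruent to \(X_{1}\) modulo \(P_{1}\) since \(t P_{2} \equiv 1\left( \bmod P_{1}\right)\).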

3.1.2 RRNS for erasure recovery

An RRNS is constructed by adding R redundant residue bases to an RNS [40, 41]. The set of modules becomes \(\psi _{N+R}=\left\{ m_{1}, m_{2}, \ldots , m_{N}, m_{N+1}, \ldots , m_{N+R}\right\}\), where the first N bases are called information bases and the last R bases are called redundant bases, and the remainder vector of data X becomes \(\Phi _{N+R}=\left\{ x_{1}, x_{2}, \ldots , x_{N}, x_{N+1}, x_{N+2}, \ldots , x_{N+R}\right\}\). Usually, the bases are ordered such that \(m_{1}<m_{2}<\cdots<m_{N+R-1}<m_{N+R}\), and the dynamic range is still \(M=\prod _{i=1}^{N} m_{i}\). In this case, any \(X<M\) can be uniquely represented and recovered from any subset \(\Phi _{N}^{S} \subset \Phi _{N+R}\) of N remainders, so that a maximum of R erasures can be tolerated.
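The erasure tolerance can be verified with a small sketch: with N = 2 information bases and R = 2 redundant bases (illustrative moduli), any \(X<M\) is recoverable from any 2 of the 4 remainders.

```python
# Any N of the N+R remainders of an RRNS suffice to recover X < M.
from math import prod
from itertools import combinations

def crt(residues, moduli):
    M = prod(moduli)
    return sum(x * (M // m) * pow(M // m, -1, m)
               for x, m in zip(residues, moduli)) % M

bases = [11, 13, 17, 19]          # information bases first, ascending
M = 11 * 13                       # dynamic range: product of information bases
X = 100                           # any X < M = 143
remainders = [X % m for m in bases]

for alive in combinations(range(4), 2):   # any R = 2 erasures tolerated
    assert crt([remainders[i] for i in alive],
               [bases[i] for i in alive]) == X
```

Uniqueness holds because the product of any two bases is at least \(11 \times 13 = M\), so X lies inside every such subsystem's range.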

3.2 Proposed DSS based on RRNS

3.2.1 Basic structure and general idea

For a DSS with \(N+R\) storage nodes, we can construct an RRNS with \(\psi _{N+R}=\left\{ m_{1}, m_{2}, \ldots , m_{N}, m_{N+1}, \ldots , m_{N+R}\right\}\). To store data X, we first compute its remainder vector \(\Phi _{N+R}=\left\{ x_{1}, x_{2}, \ldots , x_{N}, x_{N+1}, x_{N+2}, \ldots , x_{N+R}\right\}\), and then distribute the remainders to the \(N+R\) nodes. For efficient recovery from node failures, the basic idea of CRT-II is used. The residue bases (or equivalently the nodes) are first divided into \(N_{g}\) groups (\(G_{g}, g=1,2, 3, \ldots , N_{g}\)), and the set of modules of \(G_{g}\) is expressed as \(\psi _{n+r}^{g}=\left\{ m_{1}^{g}, m_{2}^{g}, \ldots , m_{n}^{g}, m_{n+1}^{g}, \ldots , m_{n+r}^{g}\right\}\), which includes \(n=N / N_{g}\) information bases and \(r=R / N_{g}\) redundant bases. In this way, the original RRNS is actually divided into \(N_{g}\) RRNS subsystems, and each sub-RRNS can tolerate at most r erasures (failing nodes). Then the \(N_{g}\) groups are organized into \(N_{g} / 2\) pairs, \(\left( G_{1}, G_{2}\right)\), \(\left( G_{3}, G_{4}\right) , \ldots , \left( G_{N_{g}-1}, G_{N_{g}}\right)\), so that CRT-II can be applied for efficient data recovery.

The data recovery process can be represented as a binary tree as shown in Fig. 4. The tree has \(\log _{2}N_{g}+1\) layers, where layer 0 produces the CRT result for each group \(\left( Y_{g}, g=1,2, \ldots , N_{g}\right)\), layer 1 produces the CRT result for each group pair (e.g. \(Y_{12}\) ), and the last layer produces the original X. When no node crashes, the data X can be directly accessed by collecting N remainders from the N information nodes (n in each group), so the network traffic is the same as that of the MDS based scheme.

3.2.2 Recovery of remainders on failure nodes

When e nodes crash \((1 \le e \le R)\), the complexity of the recovery process is related to the number of crashed nodes and their positions. Typical cases are analyzed as follows.

1) Recovery within single group (layer 0)

If all crashed nodes are located in the same group (e.g. \(G_{g}\)), and their number is not larger than r \((e \le r)\), then the data on these nodes can be recovered in three steps: a) collecting n remainders from the normal nodes within the group; b) performing CRT over the n remainders to get \(Y_{g}\); c) performing \(\left\| Y_{g}\right\| _{m_{i}^{g}}\) to recover the remainder of the i-th (crashed) node in \(G_{g}\).

2) Recovery within a group pair (layer 1)

If the number of crashed nodes in a group (e.g. \(G_{1}\)) is larger than r, but the total number of crashed nodes for the group pair \(\left( G_{1}, G_{2}\right)\) does not exceed 2r, all the data on the crashed nodes can be recovered in three steps: a) collecting 2n remainders from the normal nodes within the group pair; b) performing CRT over the 2n remainders to get \(Y_{12}\); c) performing \(\left\| Y_{12}\right\| _{m_{i}^{1}}\) or \(\left\| Y_{12}\right\| _{m_{i}^{2}}\) to recover the remainders of the crashed nodes in \(G_{1}\) or \(G_{2}\).

3) Recovery within adjacent group pairs (layer 2)

If the number of crashed nodes in a group pair \(\left( \text {e.g. }G_{1}, G_{2}\right)\) is larger than 2r, but the total number of crashed nodes for the adjacent group pairs \(\left( G_{1}, G_{2}\right)\) and \(\left( G_{3}, G_{4}\right)\) does not exceed 4r, all the data on the crashed nodes can be recovered in three steps: a) collecting 4n remainders from the normal nodes within the two group pairs; b) performing CRT over the 4n remainders to get \(Y_{1234}\); c) performing \(\left\| Y_{1234}\right\| _{m_{i}^{g}}\ (g=1,2,3,4)\) to recover the remainders of the crashed nodes in \(G_{1}, G_{2}, G_{3}\) and \(G_{4}\).

When the number of crashed nodes exceeds the tolerance limit of two adjacent group pairs, more pairs need to be involved to get the intermediate value of Y at a higher layer, and the procedure is the same as that of case 3).
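The case analysis above can be condensed into a small helper, given here as a hedged sketch (not taken from the paper): failures are recoverable at layer j once every block of \(2^{j}\) adjacent groups contains at most \(2^{j} r\) crashed nodes. A real system would handle each block at its own lowest feasible layer; this helper returns the worst-case layer over all groups.

```python
# Decide the worst-case recovery layer for a failure pattern.
def recovery_layer(crashed_per_group, r):
    n_g = len(crashed_per_group)
    j, block = 0, 1
    while block <= n_g:
        sums = [sum(crashed_per_group[i:i + block])
                for i in range(0, n_g, block)]
        if all(c <= block * r for c in sums):
            return j              # every block is recoverable at layer j
        j, block = j + 1, 2 * block
    return None                   # exceeds the tolerance of the RRNS

# N_g = 4 groups, r = 2 redundant bases per group:
assert recovery_layer([1, 2, 0, 1], 2) == 0   # case 1): within groups
assert recovery_layer([3, 1, 0, 0], 2) == 1   # case 2): group pair
assert recovery_layer([5, 2, 1, 0], 2) == 2   # case 3): adjacent pairs
```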

3.2.3 Network traffic and complexity for node recovery in the RRNS based DSS

For the case where the e crashed nodes \((1 \le e \le R)\) are distributed among the \(N+R\) nodes, the number of recoveries at layer j can be counted as \(L_{j}, j=0,1, \ldots , \log _{2} N_{g}\). For a recovery at layer j, the number of collected remainders is \(2^{j} n\), and the recovery process involves \(2^{j+1} n\) multiplications, \(2^{j} n\) divisions and \(2^{j} n+1\) modular operations. So the total network traffic can be expressed as

$$\begin{aligned} N_{T}=\sum _{j} L_{j} 2^{j} n+e \end{aligned}$$

and, defining \(d_{RRNS}=\sum _{j} L_{j} 2^{j} n\), the total numbers of multiplications, divisions, and modular operations are \(2 d_{RRNS}\), \(d_{RRNS}\) and \(d_{RRNS}+\sum _{j} L_{j}\), respectively.

3.3 Comparison of network traffic and complexity in the failure free case

3.3.1 Advantages and disadvantages of RRNS based DSS

In the RS based DSS, the original data can be directly obtained by combining the data from the N information nodes when all of them work normally. But updating data in the system is complex, as it involves three steps: recovering the original data from the system, updating the original data, and updating the data on all nodes by re-encoding.

By contrast, since the values stored on all nodes of the RRNS based DSS are residues, the original data needs to be recovered with the CRT algorithm even when there are no failing nodes, which introduces extra complexity. However, based on the principle of RNS, we have

$$\begin{aligned} \left( (a X+b) \bmod m_{i}\right) =\left( a x_{i}+b\right) \bmod m_{i}, \end{aligned}$$

in which \(x_{i}=X \bmod m_{i}\). So, if the original data X in the DSS are numbers that are updated in a linear form of \(a X+\) b, e.g. bank accounts or score records, then the data stored in the i-th node in RRNS based DSS can be directly updated by

$$\begin{aligned} x_{i}^{\text{ new }}=\left( a x_{i}^{\text{ old }}+b\right) \bmod m_{i}, \end{aligned}$$

without recovering the original X. Since data recovery and re-encoding are avoided, the cost in network traffic and computation is much lower than in the RS based DSS. So, in practice, the network traffic and complexity of the RS based schemes and the proposed RRNS based scheme in the normal case (no failing nodes) should be compared with both the read frequency and the update frequency taken into account.
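The in-place update property can be demonstrated directly: updating each stored remainder with \(\left( a x_{i}+b\right) \bmod m_{i}\) yields exactly the remainders of the updated value \(a X+b\). Moduli and values are illustrative.

```python
# In-place linear update of RRNS remainders without reconstructing X.
moduli = [11, 13, 17, 19]
X = 100
remainders = [X % m for m in moduli]

a, b = 3, 7                       # linear update X -> a*X + b
updated = [(a * x + b) % m for x, m in zip(remainders, moduli)]

# each node's new remainder matches the remainder of the new value
assert updated == [(a * X + b) % m for m in moduli]
```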

3.3.2 Network traffic and complexity comparison in the normal case

We assume the numbers of data reads (or fetches) and data updates per unit time are \(N_{r}\) and \(N_{u}\), respectively, and that the data is updated by adding a number. Then, based on the procedure of the RS based schemes, the network traffic of the RS and Hitchhiker based schemes per unit time can be calculated as

$$\begin{aligned} N_{T}^{R S}=N_{T}^{r}+N_{T}^{u}=N_{r} N+N_{u}(2 N+R), \end{aligned}$$

where the \(N_{T}^{u}\) part includes \(N_{u} N\) data units for recovery of the original data and \(N_{u}(N+R)\) data units for node updates. Since the computational complexity per unit time only comes from data updates, it can be estimated based on the encoding process in Sect. 2.2 as

$$\begin{aligned} N_{C}^{R S}=N_{u}(N(N+R) GFMs+(N-1)(N+R) Adds), \end{aligned}$$

in which GFMs and Adds denote multiplications over \(\textrm{GF}\left( 2^{w}\right)\) and addition operations, respectively.

For the RRNS based DSS, N data units are needed from the nodes for data recovery, and \((N+R)\) data units are transmitted to all nodes for data updates, so the network traffic per unit time can be expressed as

$$\begin{aligned} N_{T}^{RRNS}=N_{T}^{r}+N_{T}^{u}=N_{r} N+N_{u}(N+R). \end{aligned}$$

Based on the analysis in Sect. 3.1.1, the complexity overhead per unit time can be expressed as

$$\begin{aligned} N_{C}^{RRNS}&=N_{C}^{r}+N_{C}^{u} \\&=N_{r}\left( 2N\ Muls + N\ Divs + (N+1)\ Mods + N(N-1)\ Adds\right) \\&\quad +N_{u}\left( 2(N+R)\ Mods + (N+R)\ Adds\right) , \end{aligned}$$

in which Muls, Divs, Mods and Adds denote multiplication, division, modular and addition operations, respectively.

Defining \(\gamma =N_{u} / N_{r}\) and \(\rho =R / N\) as the update/read ratio and the coding rate, we have

$$\begin{aligned} \dfrac{N_{T}^{R S}}{N_{T}^{R R N S}}=\dfrac{N_{r} N+N_{u}(2 N+R)}{N_{r} N+N_{u}(N+R)}=1+\dfrac{1}{1+\rho +1 / \gamma } \end{aligned}$$

which means that the network traffic of the proposed RRNS based scheme is always smaller than that of the RS based schemes, and the advantage of the RRNS scheme grows for lower coding rates and higher update frequencies. The complexities of the RS based scheme and the RRNS based scheme cannot be compared numerically due to the different nature of their operations. But it can be expected that the complexity of the RS based scheme is lower than that of the RRNS based scheme for very low update frequencies, and that the complexity of both schemes increases linearly with the update/read ratio.
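The traffic ratio can be checked numerically; the values of N, R, \(N_{r}\) and \(N_{u}\) below are illustrative.

```python
# Numerical check: N_T^RS / N_T^RRNS equals 1 + 1/(1 + rho + 1/gamma).
N, R = 10, 4
N_r, N_u = 4, 5                        # reads and updates per unit time

nt_rs = N_r * N + N_u * (2 * N + R)    # RS/Hitchhiker traffic per unit time
nt_rrns = N_r * N + N_u * (N + R)      # RRNS traffic per unit time

gamma, rho = N_u / N_r, R / N
assert abs(nt_rs / nt_rrns - (1 + 1 / (1 + rho + 1 / gamma))) < 1e-12
```

The identity follows because the numerator exceeds the denominator by exactly \(N_{u} N\), the cost of reconstructing the original data before each RS re-encoding.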

4 Experimental evaluation

This section compares the performance of the three schemes: RS, Hitchhiker, and the proposed RRNS based scheme. We first introduce the experimental setup and evaluate the encoding complexity of the three schemes in Sect. 4.1. The three schemes are then compared for the cases with and without failing nodes in Sects. 4.2 and 4.3, respectively, in terms of complexity and network traffic.

4.1 Experiment setup and encoding complexity

We emulate the DSS on a single PC with an Intel(R) Core(TM) i5-6300HQ at 2.30 GHz and 8 GB of RAM. A Python implementation based on NumPy is used to emulate the distributed storage of 10 MB of data on 14 nodes. For the RS and Hitchhiker based DSS \((N=10, R=4)\), the byte-based encoding procedure in [17] is adopted: 10 bytes of data are encoded into 14 bytes according to the RS or Hitchhiker codes each time, which are then assigned to the 14 nodes. This process is repeated \(10^6\) times so that the 10 MB of data are stored distributedly on the 14 nodes. For an equal comparison, the same byte-based storage scheme is applied in the RRNS based DSS with 14 nodes. First, 10 bytes of data are combined into an 80-bit integer (X). Second, 14 remainders are obtained by performing the modular operation of the integer X over the 14 modules. Third, the 14 remainders are assigned to the 14 nodes. This process is repeated until all the 10 MB of data are processed. Finally, each node stores 1 MB of data in all three DSS schemes.
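One round of the byte-based RRNS storage step can be sketched as follows. The primes below are illustrative stand-ins for the actual modules in Table 2; the only real requirement is that the product of the 10 information moduli exceeds \(2^{80}\).

```python
# Pack 10 bytes into an 80-bit integer and reduce modulo 14 moduli.
from math import prod

def crt(residues, moduli):
    M = prod(moduli)
    return sum(x * (M // m) * pow(M // m, -1, m)
               for x, m in zip(residues, moduli)) % M

data = bytes(range(1, 11))                # 10 bytes of data
X = int.from_bytes(data, "big")           # 80-bit integer

moduli = [257, 263, 269, 271, 277, 281, 283, 293, 307, 311,
          313, 317, 331, 337]             # 14 pairwise coprime primes (assumed)
remainders = [X % m for m in moduli]      # one remainder per node

# with no failures, the 10 information remainders suffice to read back X
assert crt(remainders[:10], moduli[:10]).to_bytes(10, "big") == data
```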

The 14 modules used in the experiments are listed in Table 2, and the encoding times of the three schemes are compared in Table 3. As we can see, the encoding complexity of the RRNS based DSS is less than half of that of the RS or Hitchhiker based DSS.

Table 2 Modules used for RRNS based DSS
Table 3 Comparison of encoding complexity

4.2 Comparison with failing nodes

4.2.1 Complexity comparison for data recovery

For data recovery in the RRNS based DSS, the 14 nodes are divided into two groups \((N_{g}=2)\). Since the Hitchhiker scheme is proposed mainly for the case of a single failing node, the complexity comparison for a single failing node is performed for all three DSS schemes, and that for two failing nodes only for the RS and RRNS based DSS. All results are averaged over 100 tests.

Based on Table 4, we can see that when one information node needs to be recovered, the RRNS based scheme only takes about 62 s to recover the 1 MB of data, which is about 50% and 63% of the time of the RS based and Hitchhiker based schemes, respectively. For the case of one parity node failure, the recovery time of the RRNS based scheme is the same, that of the RS scheme decreases by 10%, and that of the Hitchhiker based scheme increases by 18%. So, the recovery time of the RRNS based scheme is about 54% of that of the other two schemes.

Table 4 Comparison of recovery complexity (1 failure node)

Table 5 compares the recovery times of the RS code based and the RRNS based schemes when two nodes crash. Two cases are considered: 1) both nodes are information nodes; 2) both nodes are parity nodes. The recovery times of the RS code based scheme for cases 1) and 2) are 174.66 s and 152.68 s, respectively. For the RRNS based scheme, the recovery time is not related to the node type (information or parity), but is determined by the positions of the nodes. When both nodes are in the same group, only 62.33 s are required to recover both nodes, which is about 35% of the time of the RS code based scheme. But when the two failing nodes are in different groups, the recovery time increases to 139.64 s, which is still 20% and 9% less than that of the RS code based scheme for the two cases, respectively.

Table 5 Comparison of recovery complexity (2 failure nodes)

4.2.2 Comparison of network traffic

Similar to the complexity comparison, the comparison of network traffic for a single failing node is performed for all three DSS schemes, and that for multiple failing nodes only for the RS based DSS and the RRNS based DSS.

For the failure of a single information node, we fix the number of redundant nodes \((R=4)\) and increase the number of information nodes \((N=4, 8, 12, 16)\). For each N, we count the number of data bytes transmitted during the data recovery process of each DSS scheme. The results are shown in Fig. 5. As predicted by equations (6), (8) and (11), the network traffic is linearly proportional to the number of information nodes for all three schemes, and the network traffic of the RRNS scheme is much lower than that of the Hitchhiker code based scheme. For example, RRNS reduces the network traffic by 22.2% relative to the Hitchhiker scheme for \(N=12\).

Fig. 5
figure 5

Network traffic for recovery of single information node

For the case with multiple failing information nodes, we fix the numbers of information nodes and parity nodes to \(N=8\) and \(R=8\), respectively, and increase the number of failing information nodes from 1 to 8. For the RRNS based DSS, the 16 nodes are divided into 4 groups \(\left( N_{g}=4\right)\), and each group includes two information nodes and two redundant nodes \((n=r=2)\). The network traffic for each number of failing information nodes is measured for the RS based scheme and the RRNS based scheme, and the results are shown in Fig. 6. Since the network traffic of the RRNS scheme is related to the positions of the failing nodes, the average value is measured over 100 tests with random distribution of the failing nodes among the 4 groups. As we can see, the network traffic of the RS based scheme is linearly proportional to the number of failing information nodes, which matches the prediction of equation (6). By contrast, the network traffic of the RRNS based scheme is much lower for a small number of failing information nodes, and approaches that of the RS based scheme as that number grows. Since the number of failing nodes is small in practice most of the time, the RRNS based DSS produces much less network traffic than the RS based DSS.

Fig. 6 Network traffic for recovery of multiple information nodes
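The random-placement methodology above can be sketched as a small Monte Carlo simulation. The cost model here is an assumption for illustration only (it is not the paper's exact traffic formula): a group with at most \(r\) failures is repaired locally by reading its surviving group members, while any group with more than \(r\) failures forces a global repair that reads all surviving nodes.

```python
import random

# Hypothetical sketch of the grouped RRNS recovery experiment: 4 groups,
# each with n = 2 information and r = 2 redundant nodes (16 nodes total).
# Cost model (assumed, not the paper's exact formula): a group with
# f <= r failures is repaired locally from its surviving members; a group
# with f > r failures triggers a global repair reading all survivors.
GROUPS, N_INFO, N_RED = 4, 2, 2
GROUP = N_INFO + N_RED            # 4 nodes per group
NODES = GROUPS * GROUP            # 16 nodes in total

def recovery_traffic(failed_nodes):
    """Blocks read (block size normalised to 1) to repair the given failures."""
    traffic = 0
    for g in range(GROUPS):
        members = range(g * GROUP, g * GROUP + GROUP)
        f = sum(1 for m in members if m in failed_nodes)
        if f > N_RED:                       # beyond local capability
            return NODES - len(failed_nodes)
        if f:                               # local repair inside the group
            traffic += GROUP - f
    return traffic

def average_traffic(num_failures, trials=100):
    """Average traffic over random placements, as in the Fig. 6 methodology."""
    total = 0
    for _ in range(trials):
        failed = set(random.sample(range(NODES), num_failures))
        total += recovery_traffic(failed)
    return total / trials

for k in range(1, 9):
    print(k, round(average_traffic(k), 2))
```

Under this assumed model, a single failure always costs 3 block reads (local repair), while the cost for many failures approaches that of a full global repair, mirroring the trend of Fig. 6.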

4.3 Comparison with no failing nodes

In the following experiments, the number of data fetches within a unit time is fixed at 4, and the number of data updates varies from 0 to 8, so that the update/read ratio \(\gamma\) varies from 0 to 2. Each fetch or update accesses the entire 10 MB of data. With these settings, the experiment is repeated 100 times for each DSS scheme, and the average processing time in seconds is recorded for comparison. The results are shown in Fig. 7. Two important conclusions can be drawn from the figure. First, the processing time of each scheme is almost linearly proportional to the update/read ratio, which matches the theoretical predictions of equations (15) and (17). Second, the processing time of the RRNS based DSS is higher than that of the other two schemes when the update/read ratio is below 1.25, and becomes lower for larger ratios. This is expected, because the RRNS scheme introduces extra overhead for normal data fetching, while its updating complexity is very low thanks to in-place computation on each node without recovering the original data. This shows that the RRNS based DSS is more suitable for scenarios with frequent data updates.
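The crossover behaviour described above follows directly from the linear cost model \(T(\gamma) = T_{\mathrm{fetch}} + \gamma\, T_{\mathrm{update}}\). The sketch below uses hypothetical coefficients (not measured values from the paper), chosen only so that the RRNS line starts higher but grows more slowly, which places the crossing at \(\gamma = 1.25\) as observed in Fig. 7.

```python
# Illustrative linear cost model behind Fig. 7.  All coefficients are
# hypothetical placeholders: RRNS pays more per fetch (extra overhead for
# normal reads) but less per update (in-place computation on each node).

def total_time(fetch_cost, update_cost, gamma, fetches_per_unit=4):
    """Processing time per unit time: fetches plus gamma*fetches updates."""
    return fetches_per_unit * fetch_cost + gamma * fetches_per_unit * update_cost

RS   = dict(fetch_cost=1.0, update_cost=1.0)   # hypothetical coefficients
RRNS = dict(fetch_cost=1.5, update_cost=0.6)   # hypothetical coefficients

def crossover(a, b):
    # Solve T_a(g) = T_b(g):  g = (fetch_b - fetch_a) / (update_a - update_b)
    return (b["fetch_cost"] - a["fetch_cost"]) / (a["update_cost"] - b["update_cost"])

g = crossover(RS, RRNS)
print(round(g, 2))  # 1.25 with these illustrative coefficients
```

At the crossover ratio the two model lines coincide; below it the RS-style scheme is cheaper, above it the RRNS-style scheme wins, matching the qualitative shape of Fig. 7.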

Fig. 7 Complexity comparison of the three DSS schemes

The data volume transmitted during data fetching and updating is recorded to measure the network traffic. After performing the experiments 100 times, the average results for the three DSS schemes are compared in Fig. 8. As the figure shows, the network traffic increases linearly with the update/read ratio, and the traffic of the RRNS scheme is always lower than that of the RS or Hitchhiker scheme. In particular, when there are no data updates, all DSS schemes have the same network traffic. These results match the predictions of equations (14), (16) and (18), and show that the advantage of the RRNS based DSS in terms of network traffic grows with the frequency of data updates.

Fig. 8 Network traffic comparison of the three DSS schemes

5 Enhanced security and flexibility

The storage characteristics of the proposed RRNS based DSS bring additional security and flexibility to the system. In an RS based DSS, the original data are stored on the information nodes, so the data can be directly revealed if one or more of these nodes are compromised. In an RRNS based DSS, by contrast, the data can only be recovered when at least N nodes are controlled by the attacker, which significantly increases the difficulty of a successful attack. In addition, all nodes in an RS based DSS must store the same amount of data; if the storage volume of some nodes is smaller than that of the others, they may not be usable in the DSS. In contrast, the data volume on each node in the RRNS based DSS is determined by the bit width of the modulus assigned to that node. We are therefore free to design an RRNS based DSS with moduli of different bit widths, assigning larger moduli to nodes with large storage volume and smaller moduli to nodes with small storage volume. In this way, the RRNS based DSS can adapt to the storage volumes of different nodes. This feature is particularly useful when the DSS is built cooperatively by multiple operators, since they do not need to provide nodes with the same storage volume.
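The flexibility described above can be sketched with a minimal residue-encoding example. The moduli and data chunk below are illustrative choices, not the paper's parameters: pairwise coprime primes of different bit widths stand in for nodes of different capacities, and recovery uses the standard Chinese Remainder Theorem.

```python
from math import prod

# Minimal sketch: a data chunk is stored as residues under pairwise coprime
# moduli of *different* bit widths, so nodes with more capacity can hold
# wider residues.  Moduli and chunk are illustrative, not the paper's values.
moduli = [251, 509, 65521, 4294967291]   # 8-, 9-, 16- and 32-bit primes

def encode(value, mods):
    """Split a value into its residues, one per storage node."""
    return [value % m for m in mods]

def crt(residues, mods):
    """Recover the value from all residues via the Chinese Remainder Theorem."""
    M = prod(mods)
    x = 0
    for r, m in zip(residues, mods):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)     # pow(Mi, -1, m): modular inverse
    return x % M

chunk = 123456789012345
residues = encode(chunk, moduli)
assert chunk < prod(moduli)              # dynamic range must cover the data
assert crt(residues, moduli) == chunk    # the full residue set recovers it
```

Any proper subset of the residues only determines the value modulo a smaller product, leaving it ambiguous; this is the security property noted above, since an attacker holding fewer than N residues learns essentially nothing definite about the chunk.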

6 Conclusions

In this paper, a new distributed storage system (DSS) is proposed based on the RRNS scheme to enhance reliability, and an efficient data recovery scheme is designed based on CRT-II. The data storage and recovery procedures are introduced, and the complexity and network traffic of data recovery are analyzed theoretically. Furthermore, the advantages and disadvantages of the proposed scheme are analyzed theoretically for different update/read ratios. A Python based implementation is used to compare the performance of the proposed scheme with the traditional RS code based scheme and its improved version (the Hitchhiker code based scheme) for the cases with and without node failures. Experimental results show that the proposed scheme achieves lower encoding complexity, lower data recovery complexity and lower network traffic when there are failing nodes, and the advantages are more pronounced when the fraction of failing nodes is small. When the DSS works normally with no failing nodes, the RRNS scheme always achieves lower network traffic, but its complexity is higher at low data update frequencies. These results show that the RRNS scheme is more suitable for DSSs with frequent data updates. In addition, the proposed scheme enhances the security of the DSS, so that the data cannot be recovered even when some storage nodes are controlled by attackers.

In the future, we plan to optimize the RRNS based DSS scheme for cloud-edge cooperative storage, e.g. CDNs. As reported in [42], CDNs carry most of the Internet traffic and continue to grow. The storage of edge servers is limited, and the contents on the edge need to be frequently updated based on their popularity [43,44,45]. In this case, the advantages of the RRNS based DSS may be even more pronounced. In addition, since the cost of edge storage is usually higher than that of cloud storage, it is preferable to devote most of the edge storage to the raw content instead of the parity data; however, less parity data on the edge decreases the storage reliability. To resolve this contradiction, we plan to keep part of the parity data on the edge and move the rest to the cloud. Thanks to the tiered architecture of the CRT-II based data recovery, the basic protection capability against single failures can be maintained within a group of edge servers, and the parity data in the cloud is only accessed in the rare situations where multiple nodes fail in an edge group. Such RRNS based DSS schemes for cloud-edge cooperative storage need to be optimized based on the storage costs of cloud and edge, the data access latency, and the network traffic between cloud and edge.