1 Introduction

In today’s data-driven landscape, the management and preservation of vast amounts of information have become critical. Large-scale data storage systems are used in a variety of domains, from social networking and video streaming sites to the ever-expanding realm of cloud storage solutions. In these contexts, ensuring the integrity and availability of stored data is paramount, especially when storage nodes are individually unreliable. The fundamental challenge of maintaining data resiliency and accessibility in the face of node failures is traditionally addressed by introducing redundancy through data replication and erasure coding [3]. Distributed storage systems are a key solution to this challenge. These systems effectively distribute data across multiple nodes or servers, which are often geographically dispersed. In addition to ensuring reliable data access, they have a unique property: after a regeneration process, the resulting system retains the essential characteristics of the original configuration. The regeneration process may require a balance between optimizing certain metrics, such as the speed of regeneration and the minimum number of nodes involved. The design of coding techniques tailored to meet these objectives has consequently become an area of research interest [3, 4, 12, 16, 18].

In the general framework, a Distributed Storage System (DSS) with parameters (n, k, d), or simply an (n, k, d)-DSS, consists of n storage nodes. Each of these nodes stores packets, or equivalently symbols, in such a way that a data collector can reconstruct the stored file by contacting any k nodes. This property is referred to as the Maximum Distance Separability (MDS) property of the system. In the event of a node failure, the system connects to any set of \(d \ge k\) surviving nodes and downloads \(\beta \) packets from each of them. This collective operation results in a total repair bandwidth of \(\gamma = d \beta \) packets. The parameters d, \(\beta \), and \(\gamma \) are called the repair degree, normalized repair bandwidth, and total repair bandwidth, respectively.

In certain DSSs, replacement nodes are designed to connect only to predetermined subsets of nodes for repair. In these systems, the repair process is exact and uncoded, accomplished by simply downloading packets from the surviving nodes [9, 14, 17, 19, 24]. The encoding process begins with the use of an outer MDS code. The encoded symbols are then placed into storage nodes using an inner fractional repetition (FR) code to meet specific desired characteristics. This dual-layered approach allows the entire DSS to function without requiring any computation at the surviving nodes or the replacement nodes [1, 10, 11, 18, 19, 22]. Notably, the known constructions for exact repair processes typically operate at one of two distinct points: the Minimum Bandwidth Regenerating (MBR) point, which minimizes the repair bandwidth \(\gamma \), or the Minimum Storage Regenerating (MSR) point, which minimizes the storage per node. In parallel, several studies have investigated code constructions, called local codes, aimed at minimizing the number of nodes involved in the regeneration process [4, 12, 15, 21]. In these codes, the repair degree d is strictly less than k.

In this paper, we present novel constructions based on an extension method that adds new symbols to the storage nodes of existing FR codes. In particular, we obtain an infinite family of codes through the extension process associated with affine planes and projective planes. The resultant codes may achieve optimality without altering the minimum distances. Additionally, we introduce families of locally repairable codes originating from net FR codes. Furthermore, we outline the construction of extensions to certain FR codes that arise from partitioning the edges of complete bipartite graphs.

2 Fractional repetition codes

Let \(\theta \) represent the number of encoded symbols generated by an outer (n, k)-MDS code, such as the Reed–Solomon code. Copies of these symbols are distributed across n nodes, with each symbol occurring \(\rho \) times and each node containing \(\alpha \) symbols. In the event of a node failure, recovery is possible by downloading exactly \(\beta \) packets from a predetermined set of d surviving nodes, resulting in a total repair bandwidth of \(d\beta \). In other words, \(\alpha =d \beta \).

Example 1

Let \(X = (x_1,x_2,x_3,x_4,x_5) \in \mathbb F_q^5\) be a file of 5 packets to be stored in a DSS. Consider the (6, 5)-MDS code that inputs the file X, and outputs the coded file \(Y = (y_1,y_2,y_3,y_4,y_5,y_6)\) of 6 packets, where \(y_i = x_i\) for \(1 \le i \le 5\), and \(y_6 = \sum _{i=1}^5 x_i\). The coded packets are then distributed among the storage nodes \(V_1 = \{y_1,y_2, y_5\}\), \(V_2=\{y_1, y_4, y_6\}\), \(V_3=\{y_2, y_3, y_6\}\) and \(V_4=\{y_3, y_4, y_5\}\), as shown in Fig. 1. In this configuration, every node contains a subset of 3 packets from the set \(\{y_1, y_2, \ldots , y_6 \}\), and each \(y_i\) is replicated twice in the storage system. When a node fails, the system can connect to the remaining 3 nodes to download 1 packet from each to repair the failed node. As any distinct pair of nodes shares exactly one symbol in common, any two storage nodes contain at least 5 distinct symbols together. Consequently, the entire file can be recovered from any two nodes by means of the MDS property.

Fig. 1: A (4, 2, 3)-DSS

The nodes of a DSS can be represented by a binary matrix \(N = [n_{ij}]\) of size \(\theta \times n\) defined as follows:

$$\begin{aligned} n_{ij} = \left\{ \begin{array}{cc} 1;&{} \text {if } y_i \in V_j, \\ 0;&{} \text {otherwise}. \end{array} \right. \end{aligned}$$

For example, the nodes of the DSS in Example 1 can be represented by the following matrix.

$$\begin{aligned} \begin{aligned}&\qquad V_1 \; V_2 \; V_3 \; V_4 \\ N&= \left[ \begin{array}{cccc} \,1\, &{} \,1\, &{} \,0\, &{} \,0\, \\ 1 &{} 0 &{} 1 &{} 0 \\ 0 &{} 0 &{} 1 &{} 1 \\ 0 &{} 1 &{} 0 &{} 1 \\ 1 &{} 0 &{} 0 &{} 1 \\ 0 &{} 1 &{} 1 &{} 0 \\ \end{array} \right] \!\! \begin{array}{c} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \\ \end{array} \end{aligned} \end{aligned}$$
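The matrix above can be rebuilt mechanically from the node sets of Example 1. The following is a minimal sketch, assuming packets \(y_i\) are represented by their indices; the variable names are illustrative and not part of the original text.

```python
# Storage nodes of Example 1, written as sets of packet indices (y_i -> i).
nodes = [{1, 2, 5}, {1, 4, 6}, {2, 3, 6}, {3, 4, 5}]
theta = 6  # number of coded packets

# N[i][j] = 1 exactly when packet y_{i+1} is stored on node V_{j+1}.
N = [[1 if i in node else 0 for node in nodes] for i in range(1, theta + 1)]

for row in N:
    print(row)

# Row sums give the repetition degree of each packet,
# column sums give the number of packets per node.
row_sums = [sum(row) for row in N]
col_sums = [sum(col) for col in zip(*N)]
print("row sums:", row_sums)
print("column sums:", col_sums)
```

Every row sum equals the repetition degree 2 and every column sum equals the node size 3, in line with the description in Example 1.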

Definition 1

Let \(\Omega = \{ 1,2,\dots , \theta \}\) denote the index set of coded packets. An FR code \(\mathcal {C}\), with repetition degree \(\rho \), for an (n, k, d)-DSS is a collection of n subsets \(V_1, V_2, \ldots , V_n\) of \(\Omega \) that satisfy the following conditions:

  1. Each subset \(V_i\) has cardinality d for all \(1 \le i \le n\).

  2. Each element of \(\Omega \) belongs to exactly \(\rho \) sets within the collection.

Constructions of FR codes based on combinatorial structures, including regular graphs and combinatorial designs such as affine planes and projective planes, were presented in [9, 13, 14, 17, 19, 21, 23].

In a DSS with parameters (n, k, d), the stored file should be reconstructible from an arbitrary set of k nodes. The file size depends on the parameter k and is defined as follows.

Definition 2

The supported file size of an FR code \(\mathcal {C}\) for an (n, k, d)-DSS, denoted as \(M_k(\mathcal {C})\), corresponds to the minimum number of distinct symbols stored within any set of k nodes. In other words,

$$\begin{aligned} M_k(\mathcal {C}) = \min _{I \subset [n], \left| I \right| =k} \left| \cup _{i \in I} V_i \right| . \end{aligned}$$
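For small codes, the minimum in Definition 2 can be evaluated by brute force over all k-subsets of nodes. The sketch below does this for the nodes of Example 1; the function name `file_size` is a hypothetical helper, and the approach is only practical when the number of k-subsets is small.

```python
from itertools import combinations

def file_size(nodes, k):
    """Minimum number of distinct symbols held by any k nodes (Definition 2)."""
    return min(len(set().union(*subset)) for subset in combinations(nodes, k))

# Nodes of Example 1, with packet y_i represented by the index i.
nodes = [{1, 2, 5}, {1, 4, 6}, {2, 3, 6}, {3, 4, 5}]
print(file_size(nodes, 2))
```

For Example 1 this returns 5, matching the observation that any two nodes together hold at least 5 distinct symbols.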

Rouayheb and Ramchandran [17] derived two upper bounds for the file size that an FR code can support. The first bound is given by the formula

$$\begin{aligned} M_k(\mathcal {C}) \le \left\lfloor \frac{nd}{\rho } \left( 1 - \frac{ \left( {\begin{array}{c}n-\rho \\ k\end{array}}\right) }{\left( {\begin{array}{c}n\\ k\end{array}}\right) } \right) \right\rfloor . \end{aligned}$$

The second bound is sharper than the first bound, and it is given as \(M_k(\mathcal {C}) \le \phi (k)\), where \(\phi (k)\) is recursively defined by

$$\begin{aligned} \phi (1)&= d, \\ \phi (k+1)&= \phi (k) + d - \Bigg \lceil \frac{ \rho \phi (k) - kd }{ n - k} \Bigg \rceil . \end{aligned}$$
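Both bounds are straightforward to evaluate numerically. The following is a minimal sketch using exact integer and rational arithmetic; the function names `first_bound` and `phi` are illustrative, and the parameters used in the demonstration are those of Example 1 (\(n = 4\), \(d = 3\), \(\rho = 2\), \(k = 2\)).

```python
from fractions import Fraction
from math import comb, floor

def first_bound(n, d, rho, k):
    """First upper bound: floor(n*d/rho * (1 - C(n-rho, k) / C(n, k)))."""
    ratio = Fraction(comb(n - rho, k), comb(n, k))
    return floor(Fraction(n * d, rho) * (1 - ratio))

def ceil_div(x, y):
    """Ceiling of x / y for a positive integer y."""
    return -(-x // y)

def phi(n, d, rho, k):
    """Second upper bound, computed from the recursion for phi."""
    value = d  # phi(1) = d
    for j in range(1, k):
        value = value + d - ceil_div(rho * value - j * d, n - j)
    return value

print(first_bound(4, 3, 2, 2), phi(4, 3, 2, 2))
```

For Example 1 both bounds evaluate to 5, so the code of Example 1 attains them for \(k = 2\).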

The minimum distance, denoted as \(d_{\min }\), is another metric considered in DSSs. It represents the size of the smallest subset of nodes whose failure ensures that the file stored in the system cannot be reconstructed from the remaining nodes. The following lemma establishes a Singleton-like bound on the minimum distance of an FR code.

Lemma 1

[14, Lemma 1] The minimum distance of an FR code \(\mathcal {C}\) for an (n, k, d)-DSS is bounded from above by

$$\begin{aligned} d_{\min } (\mathcal {C}) \le n- \Bigg \lceil \frac{M_k(\mathcal {C})}{\alpha } \Bigg \rceil + 1. \end{aligned}$$
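To illustrate Lemma 1, the minimum distance of a small FR code can be found by exhaustive search and compared with the bound. The sketch below assumes that a file of size M is recoverable exactly when the surviving nodes still hold at least M distinct symbols (which is what the outer MDS code provides); the function names are hypothetical.

```python
from itertools import combinations

def min_distance(nodes, file_size):
    """Smallest number of node failures after which some surviving set
    holds fewer than `file_size` distinct symbols."""
    n = len(nodes)
    for failures in range(1, n + 1):
        for survivors in combinations(nodes, n - failures):
            if len(set().union(*survivors)) < file_size:
                return failures
    return None

def singleton_like_bound(n, file_size, alpha):
    """Right-hand side of Lemma 1: n - ceil(M / alpha) + 1."""
    return n - (-(-file_size // alpha)) + 1

# Example 1: n = 4 nodes, alpha = 3 symbols per node, file size M = 5.
nodes = [{1, 2, 5}, {1, 4, 6}, {2, 3, 6}, {3, 4, 5}]
print(min_distance(nodes, 5), singleton_like_bound(4, 5, 3))
```

For Example 1 both quantities equal 3, so that code meets the bound of Lemma 1 with equality.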

An FR code \(\mathcal {C}\) designed for an (n, k, d)-DSS is called locally recoverable if \(d < k\). Locally repairable codes suffer a penalty on the maximum possible minimum distance [4].

Lemma 2

[15, Theorem 1] Consider an (n, k, d)-DSS. If the file size is M, then

$$\begin{aligned} d_{\min }(\mathcal {C}) \le n- \Bigg \lceil \frac{M}{\alpha } \Bigg \rceil - \Bigg \lceil \frac{M}{d\alpha } \Bigg \rceil + 2. \end{aligned}$$

Assume that each node is part of a local structure that forms a (local) FR code. We have the following bound for locally recoverable codes.

Lemma 3

[21, Theorem 3] Let \(\mathcal {C} = (\Omega ,\mathcal {V})\) be a locally recoverable FR code designed for an \((n,k,d = \alpha )\)-DSS, where each node is part of a local FR code \(\mathcal {C}' = (\Omega ',\mathcal {V}')\) with \(\left| \Omega ' \right| = \theta '\), \(\left| \mathcal {V}' \right| =n'\), per-node storage \(\alpha '=\alpha \) and repetition degree \(\rho '=\rho \). Then the minimum distance of \(\mathcal {C}\) is bounded from above by

$$\begin{aligned} d_{\min } (\mathcal {C}) \le n- \Bigg \lceil \frac{(\rho '-1)\theta ' \big \lfloor \frac{M_k(\mathcal {C}) - 1}{\theta '} \big \rfloor + M_k(\mathcal {C})}{\alpha } \Bigg \rceil + 1. \end{aligned}$$
(1)

3 Extension-based code constructions

In this section, we will construct new FR codes based on existing FR codes. Our strategy is to keep the number of storage nodes in the DSS constant, while increasing the number of symbols within each node. To this end, we begin with a motivating example. Consider the following FR code:

$$\begin{aligned} \mathcal {C} = \left\{ 123, 456, 789, 147, 258, 369, 159, 267, 348 \right\} . \end{aligned}$$

The file size of \(\mathcal {C}\) is \(M_3(\mathcal {C}) = 6\). For example, users connecting to the second, fourth, and eighth nodes can download six distinct symbols. In other words, only six symbols are retrievable from the nodes containing symbols 456, 147, and 267. However, the upper bound of the file size is \(\phi (3) = 7\). Now, we extend the number of symbols in each node by adding a certain symbol from the set \(\{a,b,c\}\) as follows:

$$\begin{aligned} \mathcal {C}' = \left\{ 123a, 456a, 789a, 147b, 258b, 369b, 159c, 267c, 348c \right\} . \end{aligned}$$

The file size of \(\mathcal {C}'\) is \(M_3(\mathcal {C}')=9\), which meets the upper bound. Recall that an FR code for an \((n,k,d)\)-DSS attains the Singleton-like bound if \(k = \big \lceil \frac{M_k (\mathcal {C})}{\alpha } \big \rceil \). Both \(\mathcal {C}\) and \(\mathcal {C}'\) require the failure of at least seven nodes to guarantee that the file stored in the DSS cannot be reconstructed from the surviving nodes. Consequently, \(\mathcal {C}\) does not meet the Singleton-like bound while \(\mathcal {C}'\) achieves optimality with respect to this bound.

In summary, we have constructed an optimal FR code for a (9, 3, 4)-DSS using an existing code without altering the minimum distance. To generalize this idea, we can now define code extensions.
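The file sizes quoted for the motivating example can be checked by direct enumeration. The sketch below encodes each node as a set of single-character symbols; `file_size` is a hypothetical brute-force helper that implements Definition 2.

```python
from itertools import combinations

def file_size(nodes, k):
    """Minimum number of distinct symbols held by any k nodes."""
    return min(len(set().union(*subset)) for subset in combinations(nodes, k))

base = ["123", "456", "789", "147", "258", "369", "159", "267", "348"]
extra = "aaabbbccc"  # symbol adjoined to each node, in the same order

C = [set(node) for node in base]
C_ext = [set(node) | {sym} for node, sym in zip(base, extra)]

M, M_ext = file_size(C, 3), file_size(C_ext, 3)
print(M, M_ext)

# Singleton-like condition k = ceil(M_k / alpha) for k = 3.
print(-(-M // 3) == 3, -(-M_ext // 4) == 3)
```

The enumeration gives file sizes 6 and 9 for \(k = 3\), and only the extended code satisfies \(k = \lceil M_k / \alpha \rceil \).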

Definition 3

Consider an (n, k, d)-DSS, whose inner fractional repetition code \(\mathcal {C} = (\Omega _1, \mathcal {V})\) is characterized by the parameters \((n, \theta _1, \alpha _1, \rho _1)\). Let \(\Omega _2\) be a finite set of symbols. Extend each node of the DSS by adding \(\alpha _2\) symbols from \(\Omega _2\), resulting in the creation of the following nodes:

$$\begin{aligned} \mathcal {W}=\left\{ (V|W) : V \subset \Omega _1, W \subset \Omega _2, \left| V \right| = \alpha _1, \left| W \right| = \alpha _2 \right\} . \end{aligned}$$

If \(\mathcal {C}' =(\Omega _1 \cup \Omega _2, \mathcal {W}')\) forms a fractional repetition code for \(\mathcal {W}' \subseteq \mathcal {W}\), then we say that \(\mathcal {C}'\) is an extension of \(\mathcal {C}\).

It is worth highlighting the operation that defines the nodes of \(\mathcal {C}'\). This operation adjoins symbols to each node of a given FR code \(\mathcal {C}\) to form a new FR code \(\mathcal {C}'\). In particular, it takes each node V of size \(\alpha _1\) from \(\mathcal {C}\), and adjoins to it a set W of size \(\alpha _2\), resulting in a node (V|W) of size \(\alpha _1+\alpha _2\). This set W comprises specific symbols from \(\Omega _2\), rather than ranging over all possible \(\alpha _2\)-element subsets of \(\Omega _2\).

Projective planes and affine planes serve as important tools for constructing code extensions. These structures belong to the category of balanced incomplete block designs. Let us now revisit their definitions.

Definition 4

A \((\theta , \alpha , \lambda )\) Balanced Incomplete Block Design (BIBD) is a pair \((\Omega , \mathcal {V})\), where \(\Omega \) is a set with \(\theta \) elements, called points, and \(\mathcal {V}\) is a collection of subsets of \(\Omega \), called blocks. These blocks satisfy the following conditions: each block has precisely \(\alpha \) points, and every 2-subset of \(\Omega \) is contained in exactly \(\lambda \) blocks.

Let \(|\mathcal {V}| = n\). Then it is easy to verify that

$$\begin{aligned} n \alpha = \theta \rho \quad \text {and} \quad \rho (\alpha - 1) = \lambda (\theta - 1) \end{aligned}$$

where \(\rho \) represents the replication number of each point in the design. We can use the notation \((\theta , \rho , \alpha , \lambda )\)-BIBD in order to capture the parameter \(\rho \).

Remark 1

A BIBD is in fact an FR code, with the additional property that every pair of distinct points is contained in exactly \(\lambda \) blocks [13].

A BIBD with parameters \((n^2 + n + 1, n + 1, 1)\) where \(n \ge 2\), is referred to as a projective plane of order n. Correspondingly, an \((n^2,n,1)\)-BIBD is called an affine plane of order n. Projective and affine planes of order n exist for prime power values of n. We refer to [20] for further details.

Definition 5

Let \(\mathcal {C} = (\Omega , \mathcal {V})\) be an FR code with \(\mathcal {V} =\{ V_1,V_2, \dots , V_n \}\). A subset \(P \subset \mathcal {V}\) is called a parallel class if \(V_i \cap V_j = \emptyset \) for all \(V_i,V_j \in P\) with \(i \ne j\), and \(\cup _{V_i \in P} V_i = \Omega \).

Theorem 1

An extension of an FR code for a \((p^2, p, p+1)\)-DSS exists if p is an odd prime. Moreover, the code extension is optimal with respect to the Singleton-like bound for \(k = 3\).

Proof

Let p be an odd prime. Then there exists an affine plane of order p, which is a \((p^2,p,1)\)-BIBD. This affine plane has \(p(p+1)\) lines that fall into \(p+1\) parallel classes, with each class consisting of p lines. A projective plane can be formed by adjoining a set of \(p+1\) points in a one-to-one correspondence with parallel classes. In this case, each new point is incident to all lines of the corresponding parallel class. Observe that any p distinct parallel classes of the affine plane form an FR code \(\mathcal {C}\) with parameters \((n=p^2,\theta =p^2,\alpha =p,\rho =p)\). By adjoining p new symbols in one-to-one correspondence with these p parallel classes, we obtain an extension \(\mathcal {C}'\) of \(\mathcal {C}\) with parameters \((n=p^2,\theta =p^2+p,\alpha =p+1,\rho =p)\).

Next, we show that \(\mathcal {C}'\) is optimal for \(k=3\). In the projective plane, any two lines intersect in exactly one point. Since the new nodes are part of the projective plane of order p, the union of any \(k = 3\) nodes will have at least \(3(p+1) - {3 \atopwithdelims ()2} = 3p\) symbols. As \(3 = \lceil \frac{3p}{p+1}\rceil \), \(\mathcal {C}'\) is optimal with respect to the Singleton-like bound for \(k = 3\). This completes the proof. \(\square \)
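The construction in this proof can be reproduced for small primes. The sketch below models the affine plane of order p by the lines \(y = mx + b\) over the integers modulo p, takes the p parallel classes with slopes \(m = 0, \dots , p-1\) (leaving out the class of vertical lines), and adjoins one new symbol per class. The coordinate model, the choice of which class to omit, and all function names are assumptions made for illustration.

```python
from itertools import combinations

def extended_affine_code(p):
    """Nodes of the extended code: each line y = m*x + b (mod p) together
    with a new symbol ("inf", m) shared by its parallel class. p must be prime."""
    nodes = []
    for m in range(p):            # one parallel class per slope
        for b in range(p):        # p lines in each class
            line = {(x, (m * x + b) % p) for x in range(p)}
            nodes.append(line | {("inf", m)})
    return nodes

def file_size(nodes, k):
    """Minimum number of distinct symbols held by any k nodes."""
    return min(len(set().union(*subset)) for subset in combinations(nodes, k))

for p in (3, 5):
    nodes = extended_affine_code(p)
    print(p, len(nodes), file_size(nodes, 3))
```

For \(p = 3\) and \(p = 5\) the computed file size for \(k = 3\) is \(3p\), as used in the proof.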

Theorem 2

Let q be a prime power. Consider the FR code \(\mathcal {C}\) obtained from q distinct parallel classes of the affine plane. Then the file size of the extension \(\mathcal {C}'\) of \(\mathcal {C}\) is given by \(M_k(\mathcal {C}') = q^{2} + q - \left( {\begin{array}{c}q\\ 2\end{array}}\right) \) for \(k=q\).

Proof

It can be deduced from Lemma 9 in [14] that the DSS achieves the minimum file size by selecting storage nodes from different parallel classes. These storage nodes introduce q additional new symbols through the extension process. By applying the inclusion–exclusion principle, it can be seen that at least \(q^{2} + q - \left( {\begin{array}{c}q\\ 2\end{array}}\right) \) symbols are covered by any q storage nodes. Consequently, we have demonstrated a set of nodes that attains this bound. \(\square \)
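As an illustrative check (not part of the original argument), take \(q = 3\). The formula of Theorem 2 then gives

$$\begin{aligned} M_3(\mathcal {C}') = 3^{2} + 3 - \left( {\begin{array}{c}3\\ 2\end{array}}\right) = 9 + 3 - 3 = 9, \end{aligned}$$

which agrees with the value \(M_3(\mathcal {C}') = 9\) found for the motivating example at the beginning of this section, whose nodes come from three parallel classes of the affine plane of order 3.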

Next, we present an extension family of FR codes where the new code has a larger minimum distance than the existing code. Before proceeding, let us state a result known as Corrádi’s bound [2], which establishes bounds on the size of the union of certain subsets.

Lemma 4

Let \(V_1, V_2, \dots , V_n\) be \(\alpha \)-element sets and let X be their union. If \(|V_i \cap V_j| \le k\) for all \( i \ne j\), then

$$\begin{aligned} |X| \ge \frac{\alpha ^2 n}{ \alpha + (n-1) k}. \end{aligned}$$
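Corrádi's bound is easy to evaluate exactly. The following sketch computes the bound, rounded up to the next integer, and compares it with the blocks of the Fano plane (the projective plane of order 2); the function name `corradi_bound` and the chosen labelling of the Fano plane are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations
from math import ceil

def corradi_bound(alpha, n, k):
    """Smallest integer not below alpha^2 * n / (alpha + (n - 1) * k)."""
    return ceil(Fraction(alpha * alpha * n, alpha + (n - 1) * k))

# Fano plane: 7 blocks of size 3, any two of which share exactly one point.
fano = [{1, 2, 4}, {2, 3, 5}, {3, 4, 6}, {4, 5, 7}, {5, 6, 1}, {6, 7, 2}, {7, 1, 3}]
union_size = len(set().union(*fano))
print(corradi_bound(3, 7, 1), union_size)
```

Here the bound and the actual size of the union both equal 7, an instance in which the bound is attained.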

Theorem 3

Let q be a prime power. Consider an FR code \(\mathcal {C} = (\Omega , \mathcal {V})\) obtained from a projective plane of order q. Suppose there exists an FR code \(\mathcal {C}'\) where extra symbols are adjoined to the nodes of \(\mathcal {C}\) in a one-to-one correspondence with the elements of \(\Omega \) such that any pair of distinct symbols appears together in at most 2 nodes in the resulting FR code. Then,

$$\begin{aligned} d_{\min }(\mathcal {C}') \ge d_{\min }(\mathcal {C}) + 1. \end{aligned}$$

Proof

Corrádi’s bound is optimal for projective planes [6]. Therefore,

$$\begin{aligned} M_{q^2} (\mathcal {C}) = \Bigg \lceil \frac{(q+1)^2q^2 }{q+1 + (q^2-1) 1} \Bigg \rceil = q^2+q. \end{aligned}$$

Given that each symbol is present in \(q + 1\) nodes, and any distinct pair of nodes intersect in exactly one symbol, the minimum distance \(d_{\min }(\mathcal {C})\) is equal to \(2q+1\). Now assume that there exists an extension \(\mathcal {C}'\) with nodes \((N_i\,|\,i)\), where \(i \notin N_i\) and for every \(j \in \Omega \), the count of nodes containing both i and j is at most 2. The failure of certain \(2q+1\) nodes guarantees that 2 symbols from the projective plane cannot be recovered. However, according to our assumption, there will be at least one copy of these symbols that will survive in the new FR code obtained by the extension technique. Therefore, \(d_{\min }(\mathcal {C}') \ge d_{\min }(\mathcal {C}) + 1\). \(\square \)

In the rest of this section, we will focus on locally recoverable codes. In particular, we will investigate an infinite family of locally recoverable FR codes obtained by the extension technique. But let us recall the definition of resolvable FR codes first [13].

Definition 6

Let \(\mathcal {C} = (\Omega , \mathcal {V})\) be an FR code. A resolution is a partition of \(\mathcal {V}\) into r parallel classes. If a resolution exists, then the code is called resolvable.

Before stating our next theorem, let us present a motivating example. Consider the FR code \(\mathcal {C}_1 = \{12, 34,13, 24 \}\) and its isomorphic copy \(\mathcal {C}_2 = \{56, 78, 57, 68 \}\). Adjoin symbols from \(x_1,x_2\) and \(y_1, y_2\) to the non-parallel classes of \(\mathcal {C}_1\) and \(\mathcal {C}_2\) as follows:

$$\begin{aligned} \mathcal {C} = \left\{ \begin{array}{cccccccc} 12x_1, &{} 34x_1, &{} 13x_2, &{} 24x_2, &{} 56x_1, &{} 78x_1, &{} 57x_2,&{} 68x_2 \\ 12y_1, &{} 34y_1, &{} 13y_2, &{} 24y_2, &{} 56y_1, &{} 78y_1, &{} 57y_2, &{} 68y_2 \end{array} \right\} . \end{aligned}$$
(2)

This forms a locally recoverable FR code for a (16, 5, 3)-DSS. Each node is part of a local FR code with parameters \((n=4, \theta =6, \alpha =3, \rho =2)\) where the local FR codes are obtained from isomorphic copies of the extension of \(\mathcal {C}_1\) and \(\mathcal {C}_2\) by adding a symbol to each node in a given parallel class.

We are now ready to generalize the idea illustrated in this example.

Theorem 4

Consider a resolvable FR code \(\mathcal {C}_1 = (\Omega _1, \mathcal {V}_1)\) with parameters \((n = a^2, \theta = a^2, \alpha = a, \rho = a)\) such that any pair of non-parallel nodes intersect in exactly one symbol. Let \(\mathcal {C}_2 = (\Omega _2,\mathcal {V}_2)\) be an isomorphic copy of \(\mathcal {C}_1\) with \(\Omega _1 \cap \Omega _2 = \emptyset \). Also, let \(x_1, \dots , x_a\) and \(y_1, \dots , y_a\) be symbols not in \(\Omega _1 \cup \Omega _2\). Denote the j-th node in the i-th parallel classes of \(\mathcal {C}_1\) and \(\mathcal {C}_2\) by \(T_{i,j}\) and \(U_{i,j}\), respectively. Then the collection \(\mathcal {C}\) of all the extended nodes

$$\begin{aligned} \left( T_{i,j} \,|\, x_i \right) , \left( T_{i,j} \,|\, y_i \right) , \left( U_{i,j} \,|\, x_i \right) , \left( U_{i,j} \,|\, y_i \right) \end{aligned}$$

forms an FR code with parameters \((n =4a^2,\theta = 2(a^2 +a), \alpha = a+1, \rho = 2a)\). Moreover, \(\mathcal {C}\) is optimal with respect to the bound given in (1) for the file size \(a^2 + a + 1\).

Proof

Let \(\mathcal {C}\) be defined as above. It is easy to observe from its construction that \(\mathcal {C}\) is a locally recoverable FR code for a \((4a^2,a^2+1,a+1)\)-DSS. In this setting, each node is part of a local FR code with parameters \((n'=a^2, \theta '=a^2+a, \alpha '=a+1, \rho '=a)\). Consequently, we have the following bound on the minimum distance:

$$\begin{aligned} d_{\min }(\mathcal {C}) \le 4a^2 - \Bigg \lceil \frac{(a-1)(a^2+a) \left\lfloor \frac{a^2+a}{a^2+a} \right\rfloor + a^2+a+1}{a+1} \Bigg \rceil +1. \end{aligned}$$

Thus, \(d_{\min }(\mathcal {C}) \le 3a^2\). Now choose \(k < 3a^2\) nodes and suppose that the file is not recoverable. This implies that a minimum of \(a^2 + a\) symbols are inaccessible within the DSS. Without loss of generality, assume that all symbols in \(\Omega _1\cup \{ x_1,\dots , x_a \} \) are not recoverable in the DSS. Hence, all the nodes \(\left( T_{i,j} \,|\, x_i \right) \), \(\left( T_{i,j} \,|\, y_i \right) \), and \(\left( U_{i,j} \,|\, x_i \right) \) should fail. As a result, we are left with only \(a^2 + a\) symbols, and in this situation, recovering the file in the DSS is not possible. Since all the local codes are isomorphic to each other, the above calculation remains valid when \(x_i\) is replaced by \(y_i\) and \(\Omega _1\) is replaced by \(\Omega _2\). Now consider the case where \(a^2+a\) symbols are chosen from \(\Omega _1 \cup \Omega _2\). This implies that we need to select at least a of the symbols from \(\Omega _1\) and \(\Omega _2\). To minimize the number of node failures required to ensure that a symbols are not recoverable, such a symbols should be contained in a single node. Based on this observation, eliminating the symbols would necessitate a minimum of \(a(a-1)+1\) node failures in each of the four categories of extended nodes. However, this contradicts our assumption that \(k < 3a^2\). A similar calculation for the case in which \(a^2+a\) symbols are chosen from \(\Omega _1 \cup \Omega _2\) and \(\Omega _3 = \{x_1, \dots , x_a, y_1, \dots , y_a\}\) will lead to the conclusion that \(d_{\min }(\mathcal {C}) = 3a^2\). \(\square \)

Example 2

Consider the FR code \(\mathcal {C}\) constructed in (2). Let us compare our code with the bound given in (1). Here, we have \(n = 16\), \(\rho ' = 2\), \(\theta ' = 6\), and \(M_5(\mathcal {C}) = 7\). It can easily be seen that \(d_{\min } \le 12\). Hence, the failure of at least 12 nodes guarantees that the file stored in the DSS cannot be reconstructed from the surviving nodes.
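The numbers in Example 2 can be reproduced by enumeration. The sketch below rebuilds the 16 nodes of (2), computes \(M_5(\mathcal {C})\) by brute force, and evaluates the right-hand side of (1). It also estimates the minimum distance under the assumption that the file is recoverable exactly when the surviving nodes hold at least \(M_5(\mathcal {C})\) distinct symbols; the helper names are hypothetical.

```python
from itertools import combinations

def file_size(nodes, k):
    """Minimum number of distinct symbols held by any k nodes."""
    return min(len(set().union(*subset)) for subset in combinations(nodes, k))

# Blocks of C_1 and C_2, grouped by parallel class index i = 1, 2.
blocks = {1: ["12", "34", "56", "78"], 2: ["13", "24", "57", "68"]}
nodes = [set(block) | {letter + str(i)}
         for i, class_blocks in blocks.items()
         for block in class_blocks
         for letter in "xy"]

n, alpha, rho_loc, theta_loc = len(nodes), 3, 2, 6
M = file_size(nodes, 5)

# Right-hand side of (1), using integer ceiling division.
numerator = (rho_loc - 1) * theta_loc * ((M - 1) // theta_loc) + M
bound = n - (-(-numerator // alpha)) + 1

# Largest number of surviving nodes that can still hold fewer than M symbols.
r = max(r for r in range(1, n + 1) if file_size(nodes, r) < M)
d_min = n - r
print(M, bound, d_min)
```

Under these assumptions the enumeration gives \(M_5(\mathcal {C}) = 7\), a bound of 12, and a minimum distance of 12, so the bound is met with equality for this code.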

4 FR codes from complete bipartite graphs

In this section, we introduce a family of FR codes that can be obtained by edge partitioning of the complete bipartite graph \(K_{c,d}\) into copies of the complete bipartite graph \(K_{a,b}\). We begin our discussion with an example.

Example 3

Consider the complete bipartite graph \(K_{4,2}\) whose vertices are partitioned into the subsets \(V = \{1, 2, 3, 4 \}\) and \(W = \{a, b \}\). We can partition \(K_{4,2}\) into four copies of \(K_{2,1}\). Each copy of \(K_{2,1}\) can be represented by the vertices involved in the partition. For example, \(\{12a, 34a, 14b,23b\}\) represents one such partition. Note that this partition served as the set of storage nodes in Example 1.

Fig. 2: The complete bipartite graph \(K_{4,2}\)

A partition of \(K_{c,d}\) into the copies of \(K_{a,b}\) may or may not exist. The following theorem by Hoffman and Liatti [5] gives necessary and sufficient conditions for the partition problem on complete bipartite graphs.

Theorem 5

Let a, b, c, and d be positive integers. Let \(g = (a, b)\), the greatest common divisor of a and b; let e, f be integers satisfying \(ae-bf = g\), and let \(h = ae+bf\). For each integer x, let \( \alpha (x) = \lceil \frac{xf}{a} \rceil \), \(\beta (x) = \lfloor \frac{xe}{b} \rfloor \), and \(\gamma (x) = \frac{x}{ab}\). Then the edges of the complete bipartite graph \(K_{c,d}\) can be partitioned into copies of the complete bipartite graph \(K_{a,b}\) if and only if the following conditions hold:

  1. \(ab \mid cd\),

  2. \(g \mid c\) and \(\alpha (c) \le \beta (c)\),

  3. \(g \mid d\) and \(\alpha (d) \le \beta (d)\),

  4. \(c \alpha (d) + d \alpha (c) \le h \gamma (cd) \le c \beta (d) +d \beta (c)\).
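The four conditions can be checked mechanically. The sketch below evaluates them exactly as stated in Theorem 5, using integer arithmetic throughout. The function name `can_partition` is hypothetical, and the particular solution of \(ae - bf = g\) used here (the smallest positive e) is an assumption, since the theorem only asks for some pair e, f.

```python
from math import gcd

def can_partition(a, b, c, d):
    """Evaluate the four conditions of Theorem 5 for K_{c,d} and K_{a,b}."""
    g = gcd(a, b)
    # Pick integers e, f with a*e - b*f = g (smallest positive e; an assumption).
    e = next(e for e in range(1, b + 1) if (a * e - g) % b == 0)
    f = (a * e - g) // b
    h = a * e + b * f

    def alpha(x):   # ceil(x*f / a)
        return -(-(x * f) // a)

    def beta(x):    # floor(x*e / b)
        return (x * e) // b

    cond1 = (c * d) % (a * b) == 0
    cond2 = c % g == 0 and alpha(c) <= beta(c)
    cond3 = d % g == 0 and alpha(d) <= beta(d)
    # Condition 4, multiplied through by a*b to stay within the integers.
    lower = (c * alpha(d) + d * alpha(c)) * a * b
    upper = (c * beta(d) + d * beta(c)) * a * b
    cond4 = lower <= h * c * d <= upper
    return cond1 and cond2 and cond3 and cond4

print(can_partition(2, 1, 4, 2))   # K_{4,2} into copies of K_{2,1}
print(can_partition(2, 1, 3, 3))   # fails already because 2 does not divide 9
```

For \(K_{4,2}\) and \(K_{2,1}\) all four conditions hold, consistent with the partition exhibited in Example 3.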

The task of partitioning the edges of the complete bipartite graph \(K_{c,d}\) into copies of the complete bipartite graph \(K_{a,b}\) can be formalized as an instance of the exact cover problem. Formally, given a set X and a collection S of subsets of X, the exact cover problem aims to find a subset \(S^*\) of S such that every element of X is included in exactly one subset of \(S^*\). The exact cover problem can effectively be addressed using Knuth’s Algorithm X which employs a straightforward recursive, nondeterministic, depth-first, backtracking approach to find all possible exact covers for a given collection of sets [8]. The problem is formulated as a binary matrix in Algorithm X. Then the objective is to identify a subset of the rows in such a way that the number 1 appears exactly once in each column.

Next, we will demonstrate how this partitioning can provide an infinite family of FR codes to which enumeration algorithms can be applied. Let \(\Gamma (V, W )\) be the complete bipartite graph with vertex classes V and W. Let X be the set of all pairs (i, j) with \(i\in V\) and \(j \in W\), that is, the edge set of the graph. Let K be the set of all s-element subsets of V. Let \(S = \{(v|w): v \in K, w \in W \}\). We will define a matrix M whose rows are indexed by the elements of S and columns are indexed by the elements of X as follows:

$$\begin{aligned} M_{(v \mid w),(i,j)} = \left\{ \begin{array}{rl} 1; &{} \text {if } i \in v \text { and } w=j,\\ 0; &{} \text {otherwise}. \end{array} \right. \end{aligned}$$

We now apply Algorithm X to the matrix M to construct FR codes.

Example 4

Consider the complete bipartite graph \(K_{4,2}\) given in Fig. 2. Suppose we want to partition the edges of \(K_{4,2}\) into copies of \(K_{2,1}\). Let us index the columns of M with the pairs (1, a), (2, a), (3, a), (4, a), (1, b), (2, b), (3, b), (4, b) and the rows of M with the blocks \((12 \,|\, a)\), \((13 \,|\, a)\), \((14 \,|\, a)\), \((23 \,|\, a)\), \((24 \,|\, a)\), \((34 \,|\, a)\), \((12 \,|\, b)\), \((13 \,|\, b)\), \((14 \,|\, b)\), \((23 \,|\, b)\), \((24 \,|\, b)\), \((34 \,|\, b)\). Both the rows and columns are indexed in the given order. Then the binary matrix M has the block diagonal form

$$\begin{aligned} M = \left[ \begin{array}{cc} N &{} 0 \\ 0 &{} N \end{array}\right] , \end{aligned}$$

where

$$\begin{aligned} N = \left[ \begin{array}{rrrr} 1 &{} 1 &{} 0 &{} 0 \\ 1 &{} 0 &{} 1 &{} 0 \\ 1 &{} 0 &{} 0 &{} 1 \\ 0 &{} 1 &{} 1 &{} 0 \\ 0 &{} 1 &{} 0 &{} 1 \\ 0 &{} 0 &{} 1 &{} 1 \end{array}\right] . \end{aligned}$$

The matrix N has 3 exact covers. Specifically, the first and sixth, second and fifth, as well as the third and fourth rows all provide exact covers. Since M has a block diagonal structure with repeated blocks of N, it has \(3^2 = 9\) exact covers. Therefore, by extending the exact covers of N or applying Algorithm X to the matrix M, we can find all 9 distinct solutions to the exact cover problem. These solutions correspond to the following FR codes:

$$\begin{aligned} \begin{array}{ccc} \{ 12a, 34a, 13b, 24b \}, &{} \{ 12a, 34a, 14b, 23b \}, &{} \{ 13a, 24a, 12b, 34b \},\\ \{ 13a, 24a, 14b, 23b \}, &{} \{ 14a, 23a, 12b, 34b \}, &{} \{ 14a, 23a, 13b, 24b \}, \\ \{ 12a, 34a, 12b, 34b \}, &{} \{ 13a, 24a, 13b, 24b \}, &{} \{ 14a, 23a, 14b, 23b \}. \end{array} \end{aligned}$$

Note that the codes in the first two rows have the property \(M_2(\mathcal {C}) = 5\). The remaining codes have \(M_2(\mathcal {C}) = 4\).
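The enumeration in Example 4 can be reproduced with a short backtracking search. The sketch below is a simplified exact cover solver in the spirit of Algorithm X: it always branches on a fixed uncovered column and does not use Knuth's column-selection heuristic or dancing links. The function names and the data layout are assumptions made for illustration.

```python
from itertools import combinations

def exact_covers(universe, rows):
    """Yield every exact cover of `universe`; `rows` maps a label to the set it covers."""
    def search(remaining, chosen):
        if not remaining:
            yield chosen
            return
        col = min(remaining)  # branch on one fixed uncovered column
        for label, cells in rows.items():
            if col in cells and cells <= remaining:
                yield from search(remaining - cells, chosen + [label])
    yield from search(frozenset(universe), [])

def file_size(nodes, k):
    """Minimum number of distinct symbols held by any k nodes."""
    return min(len(set().union(*subset)) for subset in combinations(nodes, k))

V, W = [1, 2, 3, 4], ["a", "b"]
universe = {(i, j) for i in V for j in W}                 # edges of K_{4,2}
rows = {(v, w): {(i, w) for i in v}                       # block (v | w)
        for v in combinations(V, 2) for w in W}

covers = list(exact_covers(universe, rows))
codes = [[set(v) | {w} for v, w in cover] for cover in covers]
sizes = sorted(file_size(code, 2) for code in codes)
print(len(codes), sizes)
```

The search finds 9 exact covers, six of which give \(M_2(\mathcal {C}) = 5\) and three of which give \(M_2(\mathcal {C}) = 4\), in agreement with the list above.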

We can perform similar calculations for larger complete bipartite graphs. For instance, we can obtain FR codes with parameters \((n = 9, \theta = 9, \alpha = 3, \rho = 3)\) by partitioning the edges of \(K_{6,3}\) into copies of \(K_{2,1}\). This case has a total of \(15^3 = 3375\) solutions. Out of these solutions, 375 of them have the property \(M_k(\mathcal {C})=5\), while the remaining 3000 solutions have \(M_k(\mathcal {C})=6\) for \(k=3\). The table below provides a summary of the enumeration results for larger values of k.
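The counts quoted for \(K_{6,3}\) and \(k = 3\) can be reproduced without a general exact cover solver, because the problem splits into one independent perfect matching of V for each vertex of W. The sketch below relies on this block structure; the helper names and the labelling of the vertices are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations, product

def perfect_matchings(vertices):
    """Yield all ways to split a list of vertices into unordered pairs."""
    if not vertices:
        yield []
        return
    first, rest = vertices[0], vertices[1:]
    for idx, partner in enumerate(rest):
        remaining = rest[:idx] + rest[idx + 1:]
        for matching in perfect_matchings(remaining):
            yield [(first, partner)] + matching

def file_size(nodes, k):
    """Minimum number of distinct symbols held by any k nodes."""
    return min(len(set().union(*subset)) for subset in combinations(nodes, k))

matchings = list(perfect_matchings([1, 2, 3, 4, 5, 6]))   # 15 matchings of V
tally = Counter()
for choice in product(matchings, repeat=3):               # one matching per vertex of W
    code = [{i, j, w} for w, matching in zip("abc", choice) for i, j in matching]
    tally[file_size(code, 3)] += 1

print(len(matchings) ** 3, dict(tally))
```

The enumeration visits \(15^3 = 3375\) codes and finds 375 with file size 5 and 3000 with file size 6, matching the figures stated above.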

5 Conclusion

Addressing multiple node failures in DSSs is crucial for ensuring data integrity and availability. FR codes have emerged as powerful tools offering efficient solutions to this challenge, finding applications in diverse fields such as cloud storage, network coding, and data center architectures. Recognizing their importance in modern information systems, this paper presented extension-based constructions of new FR codes, and studied their characteristics. In this framework, we derived an infinite family of optimal FR codes based on the proposed method related to affine and projective planes. We then constructed locally recoverable FR codes that can achieve a Singleton-like upper bound with equality. In addition, we provided another construction of FR codes, realized as an extension of FR codes, by partitioning the edges of complete bipartite graphs. After presenting the necessary and sufficient conditions for the existence of such partitions, we applied Algorithm X to investigate two specific cases and obtained comprehensive enumeration results.