1 Introduction

The emergence of digital cellular communication networks dates back to the late twentieth century, with the first generation of such networks known as 1G. Subsequent developments in the field led to the introduction of encryption techniques in the second generation of digital cellular communication networks, or 2G. Stream cipher systems, such as A5/1 and its successor A5/2, have been used to ensure the privacy and security of 2G mobile communications, as described in [1].

The growing demand for data in wireless communications highlighted the limitations of 2G technology, leading to the development of the third generation, also known as 3G. Its main objective was to increase data transmission capacity to support services such as Internet connectivity, file downloads and video conferencing.

Towards the end of the 2G era, however, vulnerabilities were discovered in the A5/1 and A5/2 stream ciphers, leading to their replacement by the Kasumi block cipher as part of the 3G (UMTS) technology that replaced GSM [2]. This change allowed better protection of user privacy and information. However, by 2010 this block cipher had also been found to be vulnerable to attacks.

As the demand for bandwidth-intensive applications such as multimedia streaming continued to grow, 3G networks became inadequate. In response, the industry began to develop fourth-generation (4G) technologies specifically optimized for data. The main goal of 4G was to increase speeds by up to 10 times compared to existing 3G technologies. The commercialized version of 4G was the Long Term Evolution (LTE) standard, which used the UEA2 confidentiality algorithm and the UIA2 integrity algorithm. In addition, an enhanced version of LTE called LTE-Advanced (LTE-A) was introduced, which used the EEA1 and EIA1 algorithms [3].

In early 2018, fifth-generation (5G) technology was launched for commercial use, promising a latency as low as 1 millisecond and peak speeds ranging from 20 to 50 Gbps [1]. Aside from these improvements, 5G technology is also expected to meet the communication needs of billions of connected devices while striking the right balance between speed, latency, and cost. In terms of security, the 3GPP standards organization is exploring the possibility of increasing the key size of EIAx and EEAx from the current 128 bits to 256 bits to enhance security, although the current standard is still considered secure [2].

SNOW-V is a stream cipher proposed by Ekdahl et al. [3] to be used as a primary cipher in 5G systems. It is heavily based on SNOW 3G, which was originally developed as a primary encryption function in 3G and 4G systems. In order to meet the requirements of the new 5G systems and to improve deployment flexibility, Ericsson Research and Lund University collaborated to create the new SNOW-V cipher, where the letter “V” stands for virtualization [4].

SNOW-V was developed primarily to address the industry’s need for a very fast cipher that could be implemented in a virtualized environment, meaning it could run in the cloud using software rather than hardware devices. This was done to make 5G deployments more flexible [3] and to introduce new security and privacy algorithms, while enabling fast download speeds and minimal latency. With this technology, industries such as autonomous car monitoring and remote patient intervention in hospitals could be greatly improved.

The structure of this paper is as follows. In Sect. 2, we give a brief overview of the current state of the art. Section 3 introduces the key concepts and notations used in this study, as well as a theoretical explanation of the SNOW-V generator. The tools and devices used for the analysis and implementation are described in Sect. 4. Five implementation techniques for the LFSR are investigated in Sect. 5 and the results are presented in Sect. 6. Finally, Sect. 7 provides some concluding remarks.

2 State of the art

The SNOW family of stream ciphers was introduced with the release of SNOW 1.0 in 2000 [5]. The initial design was relatively simple, using a Linear Feedback Shift Register (LFSR) [6] and a Finite State Machine (FSM). Over time, however, several weaknesses were discovered, including a distinguishing attack that required knowledge of a generator output of length \(2^{95}\) [7].

SNOW was not included in the algorithm set of the NESSIE project due to the discovered weaknesses. In response, the authors developed a new version of the stream cipher, SNOW 2.0, in 2003 [8]. They claimed that this new design addressed the previous inconsistencies and improved overall performance. These improvements ultimately led to the inclusion of SNOW 2.0 as one of the selected stream ciphers in the ISO/IEC 18033-4 standard [9]. In addition, SNOW 2.0 shares design principles with the SOSEMANUK stream cipher, which is one of the four software ciphers selected for the eSTREAM portfolio [10].

SNOW 3G was developed as an improvement of SNOW 2.0 with the goal of achieving high resistance to algebraic attacks [11]. It was chosen as the stream cipher for the UEA2 and UIA2 confidentiality and integrity algorithms published by 3GPP in 2006 [12]. SNOW 3G was designed to provide secure and efficient protection of the confidentiality and integrity of data transmitted over 3G and later 4G mobile communication systems. It was also intended to be used on resource-constrained devices to enable higher data transmission speeds on mobile devices. SNOW 3G was designed to be resistant to linear and differential attacks, but it is vulnerable to others, such as a cache-timing attack based on empirical timing data, which allows the full initial state to be retrieved in seconds without the need to know any bit beforehand [13].

Ekdahl et al. in collaboration with Lund University released the SNOW-V cipher in 2019 [3]. This updated version of the SNOW 3G cipher is designed to improve security while maintaining the high-speed performance that characterizes the SNOW family of algorithms. The use of vectorized instructions, also known as Single Instruction, Multiple Data (SIMD), is particularly important in the software implementation of the cipher.

3 Theoretical description of the SNOW-V

SNOW-V consists of two main parts: an LFSR part, which contains two shift registers, called LFSR-A and LFSR-B, that follow a circular construction, and an FSM part, which consists of three states of 128 bits each, called \(R_1\), \(R_2\) and \(R_3\), that are updated by two instances of an AES encryption round function.

Let \({\mathbb {F}}_2\) be the finite field with two elements, and consider the irreducible polynomials:

$$\begin{aligned} g_A(x)= & {} x^{16}+x^{15}+x^{12}+x^{11}+x^{8}+x^3+x^2+x+1\in {\mathbb {F}}_2[x]\\ g_B(x)= & {} x^{16}+x^{15}+x^{14}+x^{11}+x^8+x^6+x^5+x+1\in {\mathbb {F}}_2[x] \end{aligned}$$

From these polynomials we obtain two different representations of the extension field \({\mathbb {F}}_{2^{16}}\).

Indeed, let \(\alpha\) be a root of \(g_A(x)\); then \({\mathcal {B}}= \{1, \alpha , \ldots , \alpha ^{15}\}\) forms a basis of \({\mathbb {F}}_{2^{16}}\) over \({\mathbb {F}}_2\), that is, every element of \({\mathbb {F}}_{2^{16}}\) is of the form \(f(\alpha )\) for some \(f\in {\mathbb {F}}_2[x]\) with \(\deg (f)\le 15\). Therefore, we can consider any element of \({\mathbb {F}}_{2^{16}}\) as

$$\begin{aligned} f(\alpha ) = f_0 + f_1 \alpha + \ldots + f_{15}\alpha ^{15} = (f_0, \ldots , f_{15})_{{\mathcal {B}}} \hbox { with } f_i \in {\mathbb {F}}_2 \end{aligned}$$

where \((f_0, \ldots , f_{15})_{{\mathcal {B}}}\) represents the coordinates of \(f(\alpha )\) with respect to the basis \({\mathcal {B}}\). This tuple represents 16 bits, where \(f_0\) denotes the least significant bit in the word \(f(\alpha )\).

Similarly, if \(\beta\) is a root of \(g_B(x)\), we get a different representation for the elements of the field \({\mathbb {F}}_{2^{16}}\). We will denote these representations by \({\mathbb {F}}_{2^{16}}^A = {\mathbb {F}}_2(\alpha )\) and \({\mathbb {F}}_{2^{16}}^B = {\mathbb {F}}_2(\beta )\), although both representations of the field \({\mathbb {F}}_{2^{16}}\) are isomorphic.
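In this bit representation, multiplying a field element by \(\alpha\) (or \(\beta\)) is a left shift followed by a conditional reduction, and multiplying by the inverse root is a right shift followed by a conditional addition. The following C++ fragment is a minimal sketch of that arithmetic. The constant and function names are illustrative assumptions; the four constants are derived here from \(g_A(x)\) and \(g_B(x)\) (the low 16 coefficients of each polynomial, and the coordinates of \(\alpha^{-1}\) and \(\beta^{-1}\) obtained by dividing \(g(x)+1\) by \(x\)).

```cpp
#include <cstdint>

// Low 16 coefficients of g_A(x) and g_B(x); the x^16 term is implicit.
constexpr std::uint16_t kGA = 0x990f;  // x^15+x^12+x^11+x^8+x^3+x^2+x+1
constexpr std::uint16_t kGB = 0xc963;  // x^15+x^14+x^11+x^8+x^6+x^5+x+1

// Coordinates of alpha^{-1} and beta^{-1} in the bases of F^A and F^B.
constexpr std::uint16_t kGAInv = 0xcc87;
constexpr std::uint16_t kGBInv = 0xe4b1;

// Multiply v by the root: shift left and reduce when the x^15 coefficient was set.
constexpr std::uint16_t mul_x(std::uint16_t v, std::uint16_t g) {
    return static_cast<std::uint16_t>((v & 0x8000) ? ((v << 1) ^ g) : (v << 1));
}

// Multiply v by the inverse root: shift right and add the inverse when f_0 was set.
constexpr std::uint16_t mul_x_inv(std::uint16_t v, std::uint16_t g_inv) {
    return static_cast<std::uint16_t>((v & 0x0001) ? ((v >> 1) ^ g_inv) : (v >> 1));
}
```

Because the two operations are mutually inverse, applying `mul_x_inv` and then `mul_x` with matching constants returns the original word, which gives a quick sanity check of the constants.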

At time \(t\ge 0\) we denote the states of the LFSR-A as \(\left( a_{15}^t, \ldots , a_1^{t}, a_0^t\right)\) with \(a_i^t \in {\mathbb {F}}_{2^{16}}^A\) and the states of the LFSR-B as \(\left( b_{15}^t, \ldots , b_1^{t}, b_0^t\right)\) with \(b_i^t\in {\mathbb {F}}_{2^{16}}^B\). The update functions of the LFSRs are defined by the following linear recurrence relations:

$$\begin{aligned} a^{t+16} = b^t + \alpha a^t + a^{t+1} + \alpha ^{-1}a^{t+8} \mod g_A(\alpha ) \\ b^{t+16} = a^t + \beta b^t + b^{t+3} + \beta ^{-1}b^{t+8} \mod g_B(\beta ) \end{aligned}$$

Each time we update the LFSR part, we clock both LFSRs 8 times, so that 256 bits of the 512-bit LFSR state are renewed per update.

Now we turn to the FSM part, which constitutes the nonlinear part of the generator. The FSM takes two blocks \(T_1\) and \(T_2\) from the LFSRs as inputs (Fig. 1) and produces the 128-bit word \(z^t\). This part consists of three states of 128 bits each, called \(R_1\), \(R_2\) and \(R_3\).

Fig. 1 SNOW-V keystream generation

The output word \(z^t\) at time \(t>0\) is given by

$$\begin{aligned} z^t =\left( R_1^t \boxplus T_1^t\right) \oplus R_2^t, \end{aligned}$$

where \(\oplus\) denotes the bitwise XOR operation and \(\boxplus\) denotes a parallel application of four additions modulo \(2^{32}\), one for each 32-bit sub-word of the 128-bit operands.

  • The taps \(T_1\) and \(T_2\) are given by:

    $$\begin{aligned} T_1^t= & {} \left( b_{15}^{8t}, \ldots , b_8^{8t}\right) \\ T_2^t= & {} \left( a_7^{8t}, \ldots , a_0^{8t}\right) \end{aligned}$$
  • The update expressions for the registers \(R_1\), \(R_2\) and \(R_3\) are given by:

    $$\begin{aligned} R_1^{t+1}= & {} \sigma \left( R_2^t \boxplus \left( R_3^t \oplus T_2^t\right) \right) \\ R_2^{t+1}= & {} \hbox {AES}\left( R_1^t, 0\right) \\ R_3^{t+1}= & {} \hbox {AES}\left( R_2^t, 0\right) \end{aligned}$$

    where \(\sigma\) is a byte permutation given by

    $$\begin{aligned} \sigma = (0,4,8,12,1,5,9,13,2,6,10,14,3,7,11,15) \end{aligned}$$

    that is, it replaces byte 0 by the byte in position 0, byte 1 by the byte in position 4, byte 2 by the byte in position 8, and so on, and \(\hbox {AES}(\hbox {In}, \hbox {Key})\) represents one AES encryption round function.
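The byte permutation \(\sigma\) can be sketched in C++ as follows. This is only an illustration, assuming that a 128-bit register is held as an array of 16 bytes with byte 0 first; the type name `Block128` is hypothetical, and the function name mirrors the `permute_sigma` method mentioned later in the text without claiming to reproduce it.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical representation of one 128-bit FSM register as 16 bytes.
using Block128 = std::array<std::uint8_t, 16>;

constexpr std::array<std::uint8_t, 16> kSigma = {
    {0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15}};

// Output byte i is the input byte found at position sigma[i].
inline Block128 permute_sigma(const Block128& in) {
    Block128 out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = in[kSigma[i]];
    }
    return out;
}
```

Viewed as a 4x4 byte matrix, \(\sigma\) is a transposition, so applying it twice restores the original register.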

It would be interesting to consider whether this design is vulnerable to fault injection attacks like the previous versions of SNOW (see [14] and the references therein). The attack model assumes that the adversary is in possession of a physical device capable of generating transient faults, and that the adversary knows the initialization value and the keystream generated by the stream cipher. In addition, the adversary is able to reset the cryptographic device to its original state and then inject a fault into the same device on each new execution.

The first step would be to find the right place to inject the fault. The next step would be to derive a system of equations (not necessarily linear) with enough equations and with the smallest possible degree, so that it can be solved “efficiently” with Gröbner basis techniques and can therefore be considered an effective attack against this design.

4 SNOW-V software implementation and analysis

The implementation presented in this section focuses on identifying the most time- and resource-consuming method, in order to improve the efficiency and speed of this cryptographic algorithm. It is expected that the most time- and resource-consuming function will be the one responsible for the LFSR (Linear Feedback Shift Register) [6]. This method handles all the shifting of both LFSRs, including the feedback value that they receive as input. For this reason, this section presents the implementation of various modifications to reduce the time taken by this method. In addition, an analysis is performed to determine which modification is the most efficient and gives the greatest improvement.

4.1 Devices and tools for analysis and implementation

The implementation and analysis of the code used in this work were carried out using the C++ language on a Windows 10 computer. In addition, an online tool called Codeanywhere [15] was used. The tests were performed under the same conditions, with the Authenticated Encryption with Associated Data (AEAD) mode disabled. Finally, the tests were performed with test vectors #1 from the SNOW-V document [3].

Table 1 describes the main characteristics of the device used, an Asus GL502VM-FY213T.

Table 1 Specifications of the device used for the evaluation

Codeanywhere gives us access to a higher-performance IDE in the cloud, unaffected by processes running internally. This is because Codeanywhere uses containers instead of virtual machines: only the operating system is virtualized, so containers are lighter and take up less memory. The main characteristics of the container used to develop this work are shown in Table 2. It should be noted that, for commercial reasons, not all server information is available on the Codeanywhere website.

Table 2 Codeanywhere specifications

4.2 Tools used for measurement

In the analysis and implementation of the SNOW 3G generator [7], the LFSR implementation was by far the most time-consuming function. The same result was expected for the SNOW-V implementation, since the generator structure is similar.

It is important to emphasize that for the efficiency analysis of the SNOW-V functions code, we focused on two measurements: the total execution time consumption of each function and the CPU time consumption.

Two C++ libraries were used to determine the total execution time of each function: Ctime [16] and Chrono [17]. However, the Ctime library was discarded because Chrono gave results directly in microseconds and provided better performance. Finally, ten thousand cipher sequences were generated to analyze the data. Table 3 shows the time in microseconds (μs) that each function took to execute on the device described in Table 1, using Atom as the IDE and Windows PowerShell. As expected, the most time-consuming function was lfsr_update, which is responsible for calculating the feedback register of both LFSRs. The keyiv_setup function took a very similar amount of time because it includes a call to the LFSR function, which drastically increases its time.

Table 3 Execution of SNOW-V functions on the laptop
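As an illustration of how such a measurement could be taken with Chrono, the sketch below times repeated calls to a function and reports the total in microseconds. It is not the measurement code used in this work: the helper name `time_us` and the choice of `std::chrono::steady_clock` are assumptions made for the example.

```cpp
#include <chrono>
#include <cstdint>

// Call fn() `iterations` times and return the total elapsed time in microseconds.
// steady_clock is used because it is monotonic (an assumption of this sketch).
template <typename Fn>
std::int64_t time_us(Fn&& fn, int iterations) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < iterations; ++i) {
        fn();
    }
    const auto stop = clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

A call such as `time_us([&] { lfsr_update(); }, 10000)` would then accumulate the cost of ten thousand updates, assuming a callable `lfsr_update` is in scope.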

Table 4 shows the results obtained after executing the same code in the Codeanywhere online IDE. As can be seen, all functions executed significantly faster. The total time was reduced by about a third. However, the lfsr_update function was again the most time-consuming.

Table 4 Execution of SNOW-V functions in Codeanywhere

Once the total time each function took to execute was determined, the CPU time consumed by each function was analyzed. For this task, the Performance Profiler included in the Visual Studio 2022 IDE was used. All of these tests were performed under the same conditions presented above and using the same code. After running the code with the Performance Profiler, the results shown in Table 5 were obtained.

Table 5 CPU time consumed by the functions

As shown in Table 5, more than two thirds of the CPU time was used by the function lfsr_update, 13.49% by the method aes_enc_round, which is responsible for performing AES operations, 4.76% by permute_sigma, which is responsible for performing the sigma permutation, and 5.95% by the method permute_sigma. The rest of the processes and methods were not shown by the application due to their low percentage. These data can be better appreciated in Figs. 2 and 3.

Fig. 2 CPU usage percentage

Fig. 3 Processes used by each function

As expected, the lfsr_update function consumed a large portion of the resources in terms of total execution time for each function. The same is true for the CPU time consumed by each function. For these reasons, in this paper we will try to improve the efficiency of this method. Section 5 presents five different software implementations to try to reduce the time and resources consumed.

5 Analysis of implementation techniques for the LFSR

Since SNOW-V runs in a virtualized environment, the LFSR must be implemented in software. LFSRs have traditionally been designed to operate on the binary Galois field [6]. This approach is suitable for hardware implementations, such as the previous SNOW family of ciphers, but not for software implementations, such as SNOW-V. This lack of efficiency in software implementations is due to two major drawbacks. The first is the state update of an LFSR, which occurs simultaneously in a single clock cycle in a hardware implementation. In a software implementation, however, the processor must spend many clock cycles performing this operation. In addition, if the length of the LFSRs exceeds the word size of the processor, these operations will take more time. Second, the binary LFSR provides a single output bit per clock cycle. This makes software implementations very inefficient and wastes the capabilities of modern processors.

Therefore, this section will focus on analyzing some of the existing techniques to improve execution time and CPU usage. As mentioned in the previous section, we will focus on the LFSR function called lfsr_update, since it is the one that consumes the most time and CPU.

5.1 Traditional implementation

The Traditional implementation is the one implicit in the code used by the authors when they published the SNOW-V paper [3]. In this case, two loops are used, so the efficiency order is quadratic, O(\(n^2\)). This is because this software implementation stores the register values in an array. Unlike SNOW 3G, where there was only one LFSR, there are two in SNOW-V. Thus, the output of LFSR-A is stored in variable u, while the output of LFSR-B is stored in variable v.

Fig. 4 LFSR of the Traditional implementation

As shown in Fig. 4, to obtain the values stored in variables u and v, a series of operations are performed using registers A[0], A[1], and A[8] for the variable that feeds LFSR-A, and B[0], B[3], and B[8] for LFSR-B. Finally, the registers of both LFSRs are shifted to the right. In this case, each LFSR has 16 registers, but the loop only shifts the first fifteen because the last register is fed back by the value of u for LFSR-A and v for LFSR-B.
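A minimal C++ sketch of this two-loop update is given below, assuming the registers are stored as arrays A[16] and B[16] of 16-bit words with index 0 holding the oldest element. The struct `LfsrState`, the helper names and the function name `lfsr_update_traditional` are illustrative and are not taken from the authors' code; the reduction constants are derived from \(g_A\) and \(g_B\) in Sect. 3.

```cpp
#include <cstdint>

using u16 = std::uint16_t;

// Multiplication by the root (mul_x) and by its inverse (mul_x_inv) in GF(2^16).
inline u16 mul_x(u16 v, u16 c) { return static_cast<u16>((v & 0x8000) ? ((v << 1) ^ c) : (v << 1)); }
inline u16 mul_x_inv(u16 v, u16 d) { return static_cast<u16>((v & 0x0001) ? ((v >> 1) ^ d) : (v >> 1)); }

// Feedback words u (for LFSR-A) and v (for LFSR-B) from the tapped registers.
inline u16 feedback_a(u16 a0, u16 a1, u16 a8, u16 b0) {
    return static_cast<u16>(mul_x(a0, 0x990f) ^ a1 ^ mul_x_inv(a8, 0xcc87) ^ b0);
}
inline u16 feedback_b(u16 b0, u16 b3, u16 b8, u16 a0) {
    return static_cast<u16>(mul_x(b0, 0xc963) ^ b3 ^ mul_x_inv(b8, 0xe4b1) ^ a0);
}

struct LfsrState {  // hypothetical container for both registers
    u16 A[16];
    u16 B[16];
};

// One update = 8 clocks (outer loop); each clock shifts fifteen registers
// one by one (inner loop) and writes the feedback into the last register.
inline void lfsr_update_traditional(LfsrState& s) {
    for (int i = 0; i < 8; ++i) {
        const u16 u = feedback_a(s.A[0], s.A[1], s.A[8], s.B[0]);
        const u16 v = feedback_b(s.B[0], s.B[3], s.B[8], s.A[0]);
        for (int j = 0; j < 15; ++j) {
            s.A[j] = s.A[j + 1];
            s.B[j] = s.B[j + 1];
        }
        s.A[15] = u;
        s.B[15] = v;
    }
}
```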

5.2 Hardcode implementation

The Hardcode implementation was implicitly used by the authors to create the SNOW 3G stream cipher. This version is very similar to the Traditional implementation. The difference is that the loop responsible for shifting all registers is removed in order to assign each value directly to the next. In this way, the efficiency order is linear O(n), instead of quadratic as in the traditional option. This is an efficiency improvement, which will be analyzed in Sect. 6, at the cost of a significant increase in code size.

Fig. 5 LFSR of the Hardcode implementation
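The Hardcode idea can be sketched in C++ as follows: the feedback computation is unchanged, but the shifting loop is replaced by explicit assignments. This is an illustrative sketch under the same assumptions as before (array storage with index 0 as the oldest element); `LfsrState` and `lfsr_update_hardcode` are hypothetical names.

```cpp
#include <cstdint>

using u16 = std::uint16_t;

inline u16 mul_x(u16 v, u16 c) { return static_cast<u16>((v & 0x8000) ? ((v << 1) ^ c) : (v << 1)); }
inline u16 mul_x_inv(u16 v, u16 d) { return static_cast<u16>((v & 0x0001) ? ((v >> 1) ^ d) : (v >> 1)); }

struct LfsrState {  // hypothetical container for both registers
    u16 A[16];
    u16 B[16];
};

inline void lfsr_update_hardcode(LfsrState& s) {
    for (int i = 0; i < 8; ++i) {
        const u16 u = static_cast<u16>(mul_x(s.A[0], 0x990f) ^ s.A[1] ^ mul_x_inv(s.A[8], 0xcc87) ^ s.B[0]);
        const u16 v = static_cast<u16>(mul_x(s.B[0], 0xc963) ^ s.B[3] ^ mul_x_inv(s.B[8], 0xe4b1) ^ s.A[0]);

        // Shift of LFSR-A written out assignment by assignment (no inner loop).
        s.A[0] = s.A[1];   s.A[1] = s.A[2];   s.A[2] = s.A[3];   s.A[3] = s.A[4];
        s.A[4] = s.A[5];   s.A[5] = s.A[6];   s.A[6] = s.A[7];   s.A[7] = s.A[8];
        s.A[8] = s.A[9];   s.A[9] = s.A[10];  s.A[10] = s.A[11]; s.A[11] = s.A[12];
        s.A[12] = s.A[13]; s.A[13] = s.A[14]; s.A[14] = s.A[15]; s.A[15] = u;

        // Same for LFSR-B.
        s.B[0] = s.B[1];   s.B[1] = s.B[2];   s.B[2] = s.B[3];   s.B[3] = s.B[4];
        s.B[4] = s.B[5];   s.B[5] = s.B[6];   s.B[6] = s.B[7];   s.B[7] = s.B[8];
        s.B[8] = s.B[9];   s.B[9] = s.B[10];  s.B[10] = s.B[11]; s.B[11] = s.B[12];
        s.B[12] = s.B[13]; s.B[13] = s.B[14]; s.B[14] = s.B[15]; s.B[15] = v;
    }
}
```

The longer function body is the code-size cost mentioned above; in exchange, the loop counter and its bounds check disappear from the shift.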

5.3 Circular buffer implementation

The third software implementation was the Circular Buffer. This technique avoids the computationally expensive shift register. The registers remain static and two pointers are moved through each register of the LFSR. The reason this implementation is called Circular Buffer is that when the pointers of the LFSR (buffer) reach its end, they wrap back to the beginning, creating the illusion of a circular structure. A schematic of the operation is shown in Fig. 6.

Regarding the terminology used in Fig. 6, Pi_A and Pi_B point to the first position of each LFSR, while Pf_A and Pf_B point to the last one. They are shifted to the right every clock cycle. In addition, two three-value arrays called coefA[3] and coefB[3] are used to assign their respective states to the LFSRs. To achieve a lightweight security solution and avoid the cost of moving the registers, this approach keeps the registers in a fixed state while only shifting the pointers of each LFSR.

Fig. 6 Operation of the Circular Buffer implementation
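The following C++ fragment sketches the Circular Buffer idea in a simplified form. It is an assumption-laden illustration rather than the implementation evaluated here: a single index `head` plays the role of the start pointers of both registers (they advance in lockstep), the end pointers and the coefA/coefB arrays are replaced by explicit tap offsets, and all names are hypothetical.

```cpp
#include <cstdint>

using u16 = std::uint16_t;

inline u16 mul_x(u16 v, u16 c) { return static_cast<u16>((v & 0x8000) ? ((v << 1) ^ c) : (v << 1)); }
inline u16 mul_x_inv(u16 v, u16 d) { return static_cast<u16>((v & 0x0001) ? ((v >> 1) ^ d) : (v >> 1)); }

struct CircularLfsr {  // hypothetical container
    u16 A[16];
    u16 B[16];
    unsigned head;     // physical slot that holds logical element 0
};

// Read logical element k (0 = oldest, 15 = newest) of each register.
inline u16 logical_a(const CircularLfsr& s, unsigned k) { return s.A[(s.head + k) % 16]; }
inline u16 logical_b(const CircularLfsr& s, unsigned k) { return s.B[(s.head + k) % 16]; }

inline void lfsr_update_circular(CircularLfsr& s) {
    for (int i = 0; i < 8; ++i) {
        const u16 a0 = logical_a(s, 0);
        const u16 b0 = logical_b(s, 0);
        const u16 u = static_cast<u16>(mul_x(a0, 0x990f) ^ logical_a(s, 1) ^ mul_x_inv(logical_a(s, 8), 0xcc87) ^ b0);
        const u16 v = static_cast<u16>(mul_x(b0, 0xc963) ^ logical_b(s, 3) ^ mul_x_inv(logical_b(s, 8), 0xe4b1) ^ a0);
        // The slot of logical element 0 is free now: store the feedback there.
        // After the head advances, that slot is logical element 15.
        s.A[s.head] = u;
        s.B[s.head] = v;
        s.head = (s.head + 1) % 16;  // wrap around at the end of the buffer
    }
}
```

No register is moved, but every tap access pays for an index computation with a modulo, which is the cost discussed in Sect. 6.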

5.4 Sliding windows implementation

The Sliding Windows technique uses a method of the same name to traverse the LFSR using a “window”. In this version, the buffer is doubled in length, expanding both LFSRs from 16 to 32 registers. To update the LFSR, the algorithm must calculate the new value to be added to it and then write it to two positions: the first position of the window and the position indicated by the pointer. These positions are shown in Fig. 7 in gray color.

A different pointer is implemented for each LFSR, called pointerK for LFSR-A and pointerKB for LFSR-B. Each pointer starts at state 16, which is the first position in the second half of the register, and indicates the final position of the register for each iteration. When the pointers reach the end of their respective LFSRs, they cyclically return to state 16, as in the Circular Buffer. Unlike the Circular Buffer, this implementation has the advantage of avoiding costly modulo operations, resulting in improved computational efficiency. In addition, this approach eliminates the need for two loops, resulting in a linear O(n) efficiency order.

Fig. 7 Operation of the Sliding Windows implementation
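A possible C++ sketch of the Sliding Windows update is shown below. It follows the description above (32-register buffers, pointers starting at 16, each new value written twice) but is only an illustration: the struct, the helper functions and the exact indexing are assumptions, with the window taken to be the sixteen registers immediately before the pointer.

```cpp
#include <cstdint>

using u16 = std::uint16_t;

inline u16 mul_x(u16 v, u16 c) { return static_cast<u16>((v & 0x8000) ? ((v << 1) ^ c) : (v << 1)); }
inline u16 mul_x_inv(u16 v, u16 d) { return static_cast<u16>((v & 0x0001) ? ((v >> 1) ^ d) : (v >> 1)); }

struct SlidingLfsr {     // hypothetical container
    u16 A[32];
    u16 B[32];
    unsigned pointerK;   // one past the window of LFSR-A, always in [16, 31]
    unsigned pointerKB;  // one past the window of LFSR-B, always in [16, 31]
};

// Load the 16-register states into the first halves and place the pointers at 16.
inline void sliding_init(SlidingLfsr& s, const u16* a, const u16* b) {
    for (int j = 0; j < 32; ++j) {
        s.A[j] = (j < 16) ? a[j] : 0;
        s.B[j] = (j < 16) ? b[j] : 0;
    }
    s.pointerK = 16;
    s.pointerKB = 16;
}

// Logical element k (0 = oldest) lives at buffer position pointer - 16 + k.
inline u16 logical_a(const SlidingLfsr& s, unsigned k) { return s.A[s.pointerK - 16 + k]; }
inline u16 logical_b(const SlidingLfsr& s, unsigned k) { return s.B[s.pointerKB - 16 + k]; }

inline void lfsr_update_sliding(SlidingLfsr& s) {
    for (int i = 0; i < 8; ++i) {
        u16* a = s.A + (s.pointerK - 16);   // start of the LFSR-A window
        u16* b = s.B + (s.pointerKB - 16);  // start of the LFSR-B window
        const u16 u = static_cast<u16>(mul_x(a[0], 0x990f) ^ a[1] ^ mul_x_inv(a[8], 0xcc87) ^ b[0]);
        const u16 v = static_cast<u16>(mul_x(b[0], 0xc963) ^ b[3] ^ mul_x_inv(b[8], 0xe4b1) ^ a[0]);
        // Write the new value twice: at the pointer and at the start of the window,
        // so that both halves agree when the pointer returns to 16.
        a[16] = u; a[0] = u;
        b[16] = v; b[0] = v;
        if (++s.pointerK == 32) s.pointerK = 16;
        if (++s.pointerKB == 32) s.pointerKB = 16;
    }
}
```

The taps are read at fixed offsets from the window start, so the only bookkeeping per clock is one comparison per pointer instead of a modulo per access.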

5.5 Loop unrolling implementation

The final implementation, Loop Unrolling, is a widely recognized optimization technique aimed at improving code execution performance. The primary goal of this approach is to minimize the number of iterations by increasing the size of the loop body.

This method strikes a balance between the Traditional (Fig. 4) and Hardcode (Fig. 5) implementations, retaining loops and achieving a quadratic order of efficiency of O(\(n^2\)), but significantly reducing the number of iterations required from fifteen to only three. This is achieved by increasing the index by five at each iteration. To accomplish this, the algorithm shifts five states per LFSR during each loop iteration, trading code size for increased speed.
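The unrolled shift can be sketched in C++ as follows, again assuming array storage with index 0 as the oldest element; `LfsrState` and `lfsr_update_unrolled` are hypothetical names. The inner loop runs three times and moves five registers of each LFSR per iteration.

```cpp
#include <cstdint>

using u16 = std::uint16_t;

inline u16 mul_x(u16 v, u16 c) { return static_cast<u16>((v & 0x8000) ? ((v << 1) ^ c) : (v << 1)); }
inline u16 mul_x_inv(u16 v, u16 d) { return static_cast<u16>((v & 0x0001) ? ((v >> 1) ^ d) : (v >> 1)); }

struct LfsrState {  // hypothetical container for both registers
    u16 A[16];
    u16 B[16];
};

inline void lfsr_update_unrolled(LfsrState& s) {
    for (int i = 0; i < 8; ++i) {
        const u16 u = static_cast<u16>(mul_x(s.A[0], 0x990f) ^ s.A[1] ^ mul_x_inv(s.A[8], 0xcc87) ^ s.B[0]);
        const u16 v = static_cast<u16>(mul_x(s.B[0], 0xc963) ^ s.B[3] ^ mul_x_inv(s.B[8], 0xe4b1) ^ s.A[0]);
        // Three iterations (j = 0, 5, 10), five shifts per LFSR in each one.
        for (int j = 0; j < 15; j += 5) {
            s.A[j] = s.A[j + 1];
            s.A[j + 1] = s.A[j + 2];
            s.A[j + 2] = s.A[j + 3];
            s.A[j + 3] = s.A[j + 4];
            s.A[j + 4] = s.A[j + 5];
            s.B[j] = s.B[j + 1];
            s.B[j + 1] = s.B[j + 2];
            s.B[j + 2] = s.B[j + 3];
            s.B[j + 3] = s.B[j + 4];
            s.B[j + 4] = s.B[j + 5];
        }
        s.A[15] = u;
        s.B[15] = v;
    }
}
```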

6 Results

Having described the different software implementations of the LFSR, we now compare them to determine whether the Traditional version used in the SNOW-V stream cipher is optimal. The goal is to determine if the efficiency of the algorithm can be improved by exploring other options. To accomplish this task, the Asus device presented in Table 1 was used in combination with Atom and the online IDE Codeanywhere. This approach covers both mid-range and high-end devices, allowing us to later analyze the differences between them and observe how encryption works in different environments.

The procedure for extracting the data is the same as described in Sect. 4. In addition, each implementation was run 30 times to ensure more accurate results. The execution times of the lfsr_update function were averaged and recorded in a table. The data closest to the average execution time is selected, and this value is used as a reference to determine the efficiency improvement of the total execution time across all implementations. The code with these modifications can be found in the GitHub repository created for this purpose [18].
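The selection step can be illustrated with a short C++ sketch: given the measured times of the runs, it picks the run closest to the mean and expresses the gain of one implementation over another as a percentage. The function names are hypothetical, and the percentage formula (time saved relative to the baseline time) is an assumption of this sketch, since the exact formula is not spelled out here.

```cpp
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Arithmetic mean of the measured times (xs must not be empty).
inline double mean(const std::vector<double>& xs) {
    return std::accumulate(xs.begin(), xs.end(), 0.0) / static_cast<double>(xs.size());
}

// Index of the run whose time is closest to the mean (first one on ties).
inline std::size_t closest_to_mean(const std::vector<double>& xs) {
    const double m = mean(xs);
    std::size_t best = 0;
    for (std::size_t i = 1; i < xs.size(); ++i) {
        if (std::fabs(xs[i] - m) < std::fabs(xs[best] - m)) {
            best = i;
        }
    }
    return best;
}

// Assumed definition: percentage of the baseline time saved by the candidate.
inline double improvement_percent(double baseline, double candidate) {
    return 100.0 * (baseline - candidate) / baseline;
}
```

For example, with run times of 10, 12 and 20 μs the mean is 14 μs and the second run is selected as the representative value.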

6.1 Analysis of results with Asus device

The results of running each implementation 30 times on the Asus device are presented in Appendix Fig. 8. The data point closest to the mean is highlighted in green and is used to populate Table 6. This table shows the execution times of all functions for each implementation, as well as the total time.

Table 6 Total times for each implementation using the Asus device
Table 7 Improved overall efficiency of Sliding Windows compared to other implementations using the Asus device
Table 8 Improved efficiency of the Sliding Windows LFSR over other implementations using the Asus device

The tables indicate that the Traditional implementation is not the most efficient option, whether in terms of the total time taken by all functions or of the lfsr_update method alone. Instead, the Sliding Windows version performs best, being 8.38% faster than the Traditional method used in the SNOW-V cipher. The Sliding Windows implementation is also 10.07% faster than the Circular Buffer method, a gap that could be attributed to the modular arithmetic that the Circular Buffer uses to update the LFSR indices, an operation known to be less efficient in software implementations of an LFSR. Compared with the Hardcode implementation, which is the version used in SNOW 3G, Sliding Windows is 5.89% faster. The Hardcode implementation is in turn more efficient than the Traditional SNOW-V method. Table 7 presents the results for all implementations, indicating that the Loop Unrolling method is the second most efficient after Sliding Windows, with a performance difference of only 5.28%.

If we focus only on the execution time of the lfsr_update function instead of the total time, the results are quite similar. According to the data in Table 8, the Sliding Windows implementation provides an improvement of 8.36% over the Traditional method, 5.62% over the Hardcode implementation, 9.07% over the Circular Buffer method, and 4.66% over the Loop Unrolling method.

6.2 Analysis of results with Codeanywhere

This section presents the data extracted from Codeanywhere, an online IDE with superior performance to the Asus device. This will allow a comparison of the behavior of SNOW-V encryption on different systems and architectures.

Table 9 shows the results, which demonstrate that, as observed on the Asus device, the Traditional implementation is not the most effective method for SNOW-V. Instead, the Sliding Windows implementation outperforms the Traditional method by 21.41%. Among the other modifications, Loop Unrolling is the second fastest but is still 11.13% less efficient than the Sliding Windows implementation. The Hardcode version is faster than the Traditional one, but it is 14.66% slower than the optimal implementation. The Circular Buffer modification is the most expensive, being 22.47% slower than Sliding Windows. These percentages are listed in Table 10.

See Appendix Fig. 9 for the values of the 30 code executions for each implementation in Codeanywhere. Table 11 shows that the Sliding Windows modification reduces the execution time of the lfsr_update function by more than a third compared to the Traditional version, specifically by 37.15%. The gaps with respect to the other implementations are also significant: Sliding Windows is 31.9% faster than Hardcode, 38.68% faster than Circular Buffer, and 26.5% faster than Loop Unrolling.

Table 9 Total time for each implementation using Codeanywhere
Table 10 Improved overall efficiency of Sliding Windows compared to other Codeanywhere implementations
Table 11 Improved efficiency of the Sliding Windows LFSR over other implementations using Codeanywhere

7 Conclusions and future work

In this work, a theoretical and practical analysis of the software implementation of SNOW-V for 32-bit architectures has been presented. This stream cipher is intended to be used as a primary cipher in 5G systems to protect the security and privacy of such communications. This study concludes that the software implementation of SNOW-V can be improved. In particular, we address the most resource-intensive function, which is responsible for updating the LFSR. Specifically, four alternatives to the implementation originally proposed by the authors of the generator were developed. The results show that three of these implementations improve the execution times compared to the original proposal, with the Sliding Windows technique standing out. It offers an improvement of between 8.38% and 21.41% compared to the Traditional method in terms of total execution time. Considering only the results obtained for the LFSR function, the efficiency improvement is even greater: the execution time is reduced by up to 37.15% with respect to the Traditional technique.

Possible future research on the LFSR could investigate more efficient software implementations and the use of precomputed tables for multiplication. The decision to reduce the LFSR register size to 16 bits could be explored, and the benefits of increasing the register size to 32 or 64 bits could be studied. In addition, it would be interesting to implement this cipher on mobile devices, as it is designed for 5G telephony.

Moreover, this paper outlines a possible algebraic attack that remains as a future line of work: trying to exploit this potential vulnerability using Gröbner basis techniques.