1 Introduction

The IoT has become more prevalent and integrated into various aspects of our daily lives. It has transformed traditional objects into smart, connected entities that can communicate with each other and with us, providing enhanced functionalities and convenience. IoT applications encompass various fields, such as smart homes or healthcare. Given their widespread presence, IoT devices are often targeted by cybersecurity attacks. For instance, a notable incident in 2016 involved a massive Distributed Denial of Service (DDoS) attack that utilized thousands of compromised IoT devices to disrupt internet services like Netflix and Spotify [1]. In another case, the Verkada hack exposed 150,000 security cameras used in schools, jail cells, hospitals, and major companies, leading to the theft of sensitive client information [2]. Legal regulations have been developed to mitigate the threat of leaking private data, such as the General Data Protection Regulation (GDPR) or the IoT Cybersecurity Improvement Act. These regulations strongly recommend using encryption and implementing encrypted IoT communication, which would be a crucial initial measure to counteract the numerous risks effectively.

Unlike the encryption used in standard 1-to-1 communication, the n-to-n communication utilized by IoT devices is more difficult to encrypt because messages must be encrypted for a group of recipients, not just one. Fig. 1 illustrates this problem. To address the encryption challenges in n-to-n communication, Secure Group Communication (SGC) schemes can be employed [3]. Due to their benefits, various SGC schemes have been proposed in the literature, which makes selecting a specific scheme challenging. This challenge is amplified by the fact that these schemes offer different features, exhibit diverse performance (depending on the aspect in question), and are based on various architectures.

Fig. 1 Illustration of a typical IoT group communication scenario in (a), and a naive and an efficient way to encrypt this group communication in (b) and (c), respectively. (Used image sources [4,5,6,7,8], based on [9].)

(a) Illustration of a typical IoT group communication situation. A smartwatch collects data on a patient’s heart rate and wants to share this data with the doctor treating the patient and the health insurance company. Based on the transmitted heart rate data, the doctor can create a treatment plan and share it with the patient and the health insurance company. The health insurance company, in turn, sends the authorization for the assumption of treatment costs, or its rejection, to the patient and doctor.

(b) As sensitive information is sent in the case described in (a), it should be transmitted in encrypted form to protect it from unauthorized access. This partial illustration shows a naive approach to how the smartwatch can transmit its data securely: the smartwatch encrypts the data individually for each recipient.

(c) The naive approach presented in (b) would work in itself, but it would be inefficient. The smartwatch must perform as many encryptions as there are other group members, so the naive approach does not scale to larger groups. For this reason, a more efficient approach, SGC schemes, is presented in this subfigure. With the help of SGC schemes, the encrypted data transmission requires only one encryption process, independent of the number of additional group members.

To assist in selecting an appropriate scheme, we have provided a comprehensive overview and guidelines in our previous work [10]. This recommendation considers both theoretical performance and features as essential factors. However, it is possible that, in theory, multiple schemes perform equally well, which raises the question of determining the best one in practical terms. Furthermore, in our previous work, we solely conducted a theoretical analysis of performance using the Landau notation. However, developers and users are primarily concerned with real-world performance. To illustrate this, consider an SGC scheme where the computation times remain constant regardless of the group size, which may seem favorable at first glance. However, in practice, it may be that it constantly takes a week to calculate a group key, making the scheme less appealing. Therefore, real measured performance values are crucial in selecting an appropriate scheme. As a result, this work aims to build upon our previous work [10] by introducing a benchmark for SGC schemes.

To facilitate the benchmarking of SGC schemes, the contribution of our work is three-fold:

  1. We designed two use cases for the different SGC scheme classes, namely, (i) a smart city with traffic light control and (ii) vehicles forming and leaving a platoon.

  2. We implemented a benchmark for each use case and made the specification and the implementation available.

  3. We deployed and analyzed the performance of different SGC schemes on typical IoT hardware to assess both the calculation time of diverse group operations and the memory consumption of the schemes.

The remainder of this paper is structured as follows. In Sect. 2, we introduce the foundations of SGC schemes and benchmarking. Section 3 elaborates on business problems necessitating SGC schemes and defines two specific IoT scenarios. Section 4 introduces the benchmark for centralized and decentralized/hybrid SGC schemes, and Sect. 6 introduces the benchmark for distributed/contributory SGC schemes. In Sects. 5 and 7, we evaluate the proposed benchmarks. In Sect. 8, we provide an overview of related work and differentiate our work from it. In Sect. 9, we conclude the paper.

2 Background

This section introduces the term SGC schemes, including its definition and classification. Afterward, we discuss the basics of benchmarking.

2.1 SGC schemes

To introduce the term Secure Group Communication schemes, also abbreviated SGC, we first define this term. Next, we present how SGC schemes can be classified and highlight what performance aspects of SGC schemes are considered in the literature.

2.1.1 Definition

We define the term SGC scheme in analogy to our previous works [9] and [10]. An SGC scheme consists of two components: (i) the Group Key Management (GKM) and (ii) the Group Membership Management (GMM). The task of the GKM component is to define how a group key is generated, distributed, and updated. The group key is often referred to in the literature as a Traffic Encryption Key (TEK). The TEK allows group members to encrypt messages to the group or decrypt messages from the group [11]. The task of the GMM component is to specify how group membership operations are carried out securely. This includes, for example, the authentication of new group members who wish to join the group. The keys needed for new group members, or when group members are added or removed, are then generated by the GKM component.

2.1.2 Classification

Figure 2 illustrates the different classes of SGC schemes. Specifically, there are three classes: centralized, distributed/contributory, and decentralized/hybrid. These three classes differ respectively regarding the actors involved and their communication patterns. For example, in centralized and decentralized/hybrid schemes, the actors consist of a trusted CI and the group members. In the distributed/contributory schemes, the actors consist only of the group members. In the centralized schemes for creating or updating the group key, there is only communication between the CI and the group members, but not among the group members. In the decentralized/hybrid schemes, the group members also communicate. In the case of distributed/contributory schemes, communication only takes place between the group members.

Fig. 2 Illustration of the three classes of SGC schemes ([9] and [10])

2.1.3 Performance aspects

The performance of SGC schemes is usually analyzed based on the following four aspects ([3, 9,10,11,12,13,14,15,16,17,18,19]): computation times, message sizes, message counts, and memory requirements. Regarding the computation times, the literature distinguishes between the following group operations: group creation, addition of members, and removal of members. In addition, the literature differentiates whether the computation times for these group operations apply to the group members or, if available, the CI. The message count and size are also analyzed depending on the group operation. For the memory requirements, a distinction is likewise made as to whether they apply to the group members or the CI, if present. Interested readers can find a detailed overview of the theoretical performance and features of over 40 common SGC schemes in the survey [9].

2.2 Benchmarking foundations

For the introduction to the basics of benchmarking, we refer to the publication [20]. Specifically, we first introduce the different types of benchmarks and then address the quality characteristics of benchmarks.

2.2.1 Benchmark types

In general, benchmarks can be distinguished into three classes: (i) specification-based benchmark, (ii) kit-based benchmark, and (iii) hybrid benchmark. The specification-based benchmark includes the definition of the functions under consideration. This also includes specifying the function’s input parameters and the expected result by calling the function with these parameters. However, the concrete implementation of the considered functions is not part of the specification-based benchmark. The concrete implementation of the considered functions is part of the kit-based benchmark. However, since implementations usually depend on the software or hardware used, these must be defined in advance for the kit-based benchmark. If, thereby, functional differences occur, they must be resolved accordingly. Hybrid benchmarks are a mixture of kit-based benchmarks and specification-based benchmarks. Hybrid benchmarks specify the functions under consideration, including input parameters and expected results, but also provide implementations. The difference to kit-based benchmarks is that these implementations can be incomplete. A commonality among all three types of benchmarks is the initial necessity of defining a business problem. This serves as the foundation from which the functions to be considered by the benchmark can be derived.

2.2.2 Quality criteria

To ensure a sound benchmark, five characteristics are mentioned in [20], which we present in more detail below:

  • Relevance: The quality attribute relevance indicates how close the behavior considered by the benchmark is to the behavior in which the benchmark user is interested.

  • Reproducibility: The quality characteristic reproducibility describes whether running the benchmark with the same configuration consistently produces similar results.

  • Fairness: The quality attribute fairness is fulfilled if benchmark runs with different configurations can be compared without artificial restrictions.

  • Verifiability: The quality characteristic verifiability is fulfilled if the accuracy of the results, which a benchmark delivers, is correspondingly validly justified.

  • Usability: The quality attribute usability is fulfilled if a user executes the benchmark without obstacles.

3 Business problems

The initial step in creating our benchmark for SGC schemes is identifying a suitable business problem in the IoT environment. First, we must investigate whether a common business problem exists for all three SGC classes; otherwise, we need to define several business problems. To answer this, we consider which SGC classes can be meaningfully compared with each other. According to [10], only centralized and decentralized/hybrid SGC schemes can be meaningfully compared. This is mainly because, in both cases, the same actors are present: the group members and a trusted CI. The distributed/contributory SGC schemes cannot be meaningfully compared with the other classes of SGC schemes since their only actors are the group members, with no other parties such as a CI. Therefore, we must identify a separate business problem for the distributed/contributory SGC schemes. In contrast, we can use the same business problem for the centralized and decentralized/hybrid SGC schemes.

To define a business problem of the centralized and decentralized/hybrid SGC schemes, we are guided by a smart city scenario, representing a suitable environment for these two SGC classes. Specifically, we consider a scenario wherein the city administration aims to enhance the “intelligence” of the traffic lights in a city, enabling them to collaborate for an optimal traffic flow. Considering the magnitude of a possible attack, such as the complete shutdown of all traffic lights, ensuring the security of communication between the traffic lights is imperative. Therefore, the city administration wants to encrypt the communication of the traffic lights. When selecting a suitable SGC scheme, the city administration is guided by two objectives: the scheme should be as efficient as possible, and the city administration wants to control the encryption at all times. Therefore, the decision is to use a centralized or decentralized/hybrid scheme, as in these two cases, the city administration can take over the role of the CI to always have control over the encryption. Using centralized or decentralized/hybrid schemes, the city administration wants to be able to create a traffic light group with encrypted communication and add new traffic lights to this group or remove traffic lights from this group in the future. Adding a traffic light to the initially created traffic light group may be necessary, for example, because new traffic lights are installed due to road works or a traffic light-free intersection is extended with traffic lights. The removal of traffic lights occurs, for example, when a construction site is finished, and its traffic lights are dismantled. Finally, we want to underline the relevance and novelty of our business problem for centralized and decentralized/hybrid SGC schemes. The novelty of our business problem is given by the fact that cities such as Vienna or New York are planning or already implementing smart traffic lights ( [21] and [22]). 
To demonstrate the relevance of the chosen business problem, let us consider the following example: If in the USA, the delay per car could be reduced by only one second, for example, by using smart traffic lights, 1.5 million tonnes of CO2 could be saved or 3.9 million barrels of oil [23]. Smart traffic lights can potentially be used to save drivers time and money and protect the environment.

After presenting our business problem for centralized and decentralized/hybrid SGC schemes, we present the business problem for distributed/contributory SGC schemes. The peculiarity of distributed/contributory SGC schemes is that they do not require a CI. This may be because the group members cannot agree on a trusted party providing the CI or because communication with the CI is difficult or impossible. The latter would occur, for example, in a platooning scenario in which trucks want to join together to form a platoon. Since the trucks travel over rural roads where internet connectivity is often poor, an SGC scheme that relies on communication with a distant CI would be unsuitable. However, this IoT scenario would be ideally suited for deploying distributed/contributory SGC schemes. Therefore, we choose the platooning scenario as their business problem. Here, we assume that trucks can also be added to or removed from the platoon after the initial creation of a platoon group. This can be necessary, for example, if trucks leave from different locations and thus meet with a time delay, or have other destinations and therefore have to take different exits. Analogous to the business problem for centralized and decentralized/hybrid SGC schemes, we would also like to show the novelty and relevance of the platooning business problem. A strong argument for both points is that the EU and the European car manufacturers are very interested in truck platooning. For example, the EU funded the ENabling SafE Multi-Brand pLatooning for Europe (ENSEMBLE) project under Horizon 2020 to investigate the feasibility and benefits of truck platoons [24]. The government of the Netherlands has even officially stated the realization of truck platoons as a goal and has already passed laws allowing the testing of such vehicles on public roads [25]. 
The European Automobile Manufacturers Association has even developed a detailed multi-stage plan to enable trucks from different manufacturers to form a platoon by 2025 [26].

4 Benchmark for centralized and decentralized/hybrid SGC schemes

After defining the business problems, we must decide whether to develop a specification-based, kit-based, or hybrid benchmark. Since in the IoT environment, the hardware used is extremely heterogeneous [27] and thus the functionalities provided differ significantly, we decided to develop a hybrid benchmark. Developing a kit-based benchmark would have required resolving all the functionality differences and defining the possible hardware, so we decided against this type of benchmark. We first define a specification-based benchmark, which we then extend to a hybrid benchmark using corresponding implementations.

Table 1 Definitions of the functions required by the city hall (=CI) with input parameters and expected results for the creation of our specification-based benchmark for centralized and decentralized/hybrid SGC schemes

4.1 Specification-based benchmark for centralized and decentralized/hybrid SGC schemes

Creating our specification-based benchmark for centralized and decentralized/hybrid SGC schemes requires considering all relevant functions, including input parameters and expected results. The functions to be considered differ only minimally between these two classes of SGC schemes. In centralized schemes, there is only communication between the CI and the group members, but the group members do not communicate with each other to generate the group key. In decentralized/hybrid SGC schemes, on the other hand, the group members additionally communicate among themselves. For this purpose, the group is divided into subgroups, which a Subgroup Controller (SgC) represents. This SgC communicates with the CI for its subgroup so that the remaining group members only need to communicate with their SgC, not the CI. Therefore, the functions of these two classes of SGC schemes differ in that (1) in centralized SGC schemes, the results of the CI are for the group members, while in decentralized/hybrid SGC schemes, they are for the SgCs, and (2) there are SgC functions in decentralized/hybrid SGC schemes and not in centralized SGC schemes. We address these differences accordingly in the functions specified below.

In Table 1, we first define the functions of the CI, which includes the initial group creation and the subsequent addition and removal of group members. This table accounts for the differences between the two SGC classes by distinguishing whether the messages are intended for the group members or SgCs. Thus, the set of parameters that must be computed for the group members or SgCs is the empty set in the case of the decentralized/hybrid and the centralized schemes, respectively.

The functions of the group members are listed in Table 2, which includes the group creation and the subsequent addition and removal of members. When adding members, we consider whether the function is performed by a new or an old group member. In specifying the functions of the group members, we also need to consider the differences between centralized and decentralized/hybrid SGC schemes. This is necessary because, in the case of centralized schemes, no group members act as SgCs. Thus, the functions of the SgC can be ignored in this case.

Table 2 Definitions of the functions required by the traffic lights (=group members) with input parameters and expected results for creating our specification-based benchmark for centralized and decentralized/hybrid SGC schemes

4.2 Hybrid benchmark for centralized and decentralized/hybrid SGC schemes

As stated earlier, we are extending our specification-based benchmark for centralized and decentralized/hybrid SGC schemes. According to [20], we must extend the specification-based benchmark with corresponding implementations; however, they do not have to be complete. Specifically, as centralized SGC schemes, we have implemented CFKM, GKMP, LKH, LKH+, OFT, S2RP, SBSA, SGCSH, SKDC and XFKS. Regarding the decentralized/hybrid SGC schemes, we have implemented the following schemes: CKA, D-OFT, DH-LKH, G-DH, D-LKH, BD and CKA. These schemes are selected based on our previous work [10].

We provided complete implementations for the above SGC schemes but specified the hardware and software to be used in advance. This way, benchmark users have a full implementation, which can serve as an orientation if the benchmark user wants to use other hardware or software.

In the following, we present our implementation in more detail and show how we have addressed the five quality features of a benchmark: relevance, reproducibility, fairness, verifiability, and usability. To this end, we present the workload for centralized and decentralized/hybrid SGC schemes and show how we fulfill the quality criteria of relevance, reproducibility, and fairness. Then, we go into more detail about the metrics and the concrete implementation, focusing on the quality criteria of verifiability and usability.

4.2.1 Centralized and decentralized/hybrid SGC schemes: workload definition

Our workloads for centralized and decentralized/hybrid SGC schemes must reflect as closely as possible the behavior in which the benchmark user is interested. Since the benchmark user is a city administration that wants to make its traffic lights “smarter” and therefore wants them to communicate with each other in encrypted form, we first consider what behavior interests the city administration. To realize smart traffic lights, the city administration must be able to roll out the linking of traffic lights step by step. Therefore, it is vital to be able to add traffic lights to a set of already linked traffic lights or remove traffic lights from this set. This would enable the city administration to dynamically link traffic lights in different parts of the city to gain experience with the smart traffic light system. This way, the number of traffic lights communicating with each other can be gradually increased, and the smart traffic light system can be tested in ever larger scenarios. Should problems arise, such as errors in the software for smart traffic control, the city administration can reduce the group of communicating traffic lights at any time. Concerning group operations, we believe that the city administration has an interest in the following group operations being taken into account by the workloads: (1) the initial creation of a group of traffic lights that can communicate with each other in encrypted form, (2) the addition of more traffic lights to a group of traffic lights that already communicate in encrypted form, and (3) the exclusion of a traffic light from the encrypted communication of a group of traffic lights.

Thus, our workloads include the creation of a group and the addition of members to and the removal of members from this group. For a more precise definition of the respective functions, we refer to the corresponding specifications of our specification-based benchmark. We only have to decide which group sizes should be considered in each case. For this purpose, we are guided by the city of New York, which pursues the smart control of its traffic as a goal [22]. To obtain an upper bound on the number of possible traffic lights, we use the number of intersections in New York, which is 13,543 [28]. If we assume that four traffic lights can be installed at each of these intersections, we get 54,172 possible traffic lights. We use this number as an upper limit for our group size and round it up to 55,000 for simplicity. Thus, groups with sizes between 2 and 55,000 could theoretically be of interest to city administrations. Ideally, we would consider all possible group sizes from the interval [2; 55,000] in our measurements. However, since the measurements must be performed several times for each group size to be able to calculate a corresponding accuracy measure, we have decided not to run through the range with step size one but with step size 100. If a benchmark user wants to sample the interval at a finer granularity, the step size can be conveniently changed by adjusting a variable. For decentralized/hybrid SGC schemes, we also need to determine the number of SgCs and the assignment of group members to subgroups. For the number of SgCs, we are guided by the number of districts in New York and assume that each district represents a subgroup. Since there are 51 districts in New York [29], we would have 51 subgroups. Since we also want to assume that the group members are evenly distributed among the subgroups, we increase the number of subgroups from 51 to 55 as we consider group sizes of up to 55,000. The even distribution works better with 55 subgroups than with 51.
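The arithmetic behind these workload parameters can be retraced with a minimal sketch. The constants come from the text above; rounding up to the next multiple of 1,000 and the helper `subgroup_sizes` are illustrative assumptions and are not taken from our scripts.

```python
import math

INTERSECTIONS = 13_543        # intersections in New York [28]
LIGHTS_PER_INTERSECTION = 4   # assumption made in the text
SUBGROUPS = 55                # number of subgroups chosen in the text

# Upper bound on the number of traffic lights: 13,543 * 4 = 54,172.
possible_lights = INTERSECTIONS * LIGHTS_PER_INTERSECTION

# Rounded up for simplicity (assumed here: next multiple of 1,000).
upper_limit = math.ceil(possible_lights / 1_000) * 1_000

# With 55 subgroups, the largest group divides evenly; with 51 it does not.
members_per_subgroup = upper_limit // SUBGROUPS
remainder_with_51 = upper_limit % 51


def subgroup_sizes(group_size, subgroups=SUBGROUPS):
    """Hypothetical helper: spread group_size members as evenly as possible."""
    base, extra = divmod(group_size, subgroups)
    return [base + 1 if i < extra else base for i in range(subgroups)]
```

For the maximum group size, `subgroup_sizes(upper_limit)` yields 55 subgroups of equal size, whereas smaller group sizes produce subgroup sizes that differ by at most one member.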

In addition to relevance, it is also essential for our workloads that they fulfill the quality criterion of reproducibility. At first glance, the reproducibility of the measurement results is given since the group sizes can be set equal, and the steps of the group encryption algorithms are deterministic. However, these algorithms require random numbers as input parameters since otherwise, the complete key calculation would be deterministic, and thus, an attacker could also easily calculate the key. However, using random numbers in the SGC scheme algorithms poses a problem for reproducibility. Consider, for example, the centralized SGC scheme SKDC. In this case, the agreement on a group key proceeds so that the CI randomly generates a group key. The CI then sends the group key generated to the respective group members and encrypts it for each group member beforehand. Subsequently, each group member decrypts the encrypted group key with the symmetric key on which it agreed with the CI in advance. Let us consider two runs of a group creation for a group of size n, and the CI would choose the key \(g_1\) as the group key in the first case and \(g_2\) in the second case, whereby \(g_2 \ne g_1\) applies with a very high probability. Thus, the CI would encrypt the group key n times in both runs, but the necessary calculations would differ since the same group key would not be encrypted. To guarantee the reproducibility of our measurements, we must ensure that the same values are always calculated. To ensure this, we generate static variables before starting the measurements, which are used instead of the generated random variables. In the case of the centralized SGC scheme SKDC, for example, this would mean that we define a group key \(g_s\) before the actual measurements, which is always encrypted. 
During the measurements, our implementation would still generate a random group key \(g_r\), but after generating this random key, it would continue to calculate with the key \(g_s\). If we need more than one random variable, we generate the static variables by defining an initial static variable and then hashing it several times. The second static variable would thus correspond to the hash value of the first static variable, the third static variable to the hash value of the second static variable, and so on. In this way, we can ensure the reproducibility of our measurements. We calculate the static variables before the measurement to avoid falsifying the measurement results, such as the required calculation time.
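The derivation of the static variables can be sketched as follows. This is a minimal illustration under assumptions: SHA-256 as the hash function, byte strings as values, and the function names `static_values` and `choose_group_key` are hypothetical and do not necessarily match our implementation.

```python
import hashlib
import os


def static_values(seed: bytes, count: int) -> list:
    """Return `count` static values: the seed, then the hash of each previous value."""
    values = [seed]
    while len(values) < count:
        # Value i+1 is the hash of value i (SHA-256 is an assumption).
        values.append(hashlib.sha256(values[-1]).digest())
    return values[:count]


def choose_group_key(static_key: bytes) -> bytes:
    """Generate a random key as the scheme requires, then continue with the static one."""
    _random_key = os.urandom(len(static_key))  # still generated, so its cost is measured
    return static_key                          # all later calculations use the static key


# The static values are prepared once, before any measurement starts.
STATIC = static_values(b"initial static variable", 3)
group_key = choose_group_key(STATIC[1])
```

Because the chain depends only on the initial value, two benchmark runs that start from the same seed operate on identical inputs, while the cost of generating the random key is still part of the measured operation.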

The last quality criterion to address is fairness. This means we can perform our measurements with different configurations without artificial limitations. That is, (1) the measurements for the respective group operations automatically adapt to the respective group size under consideration, and (2) we have to determine which schemes can be meaningfully compared with each other. For the realization of (1), we have created the three variables \(k_{min}\), \(k_{max}\) and rep in our scripts accordingly, with which the interval \([ k_{min}, k_{max} ]\) is specified from which the group sizes originate and which is run through in steps of hundreds. The variable rep specifies how many measurements will be carried out per group size. We have already discussed (2) in our previous work [10]. SGC schemes can be meaningfully compared within their class. Meaningful cross-class comparisons are only possible between centralized and decentralized/hybrid schemes.
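How the three variables could drive a measurement run is sketched below. The names `K_MIN`, `K_MAX`, `REP`, and `STEP` mirror \(k_{min}\), \(k_{max}\), and rep from the text; the function names and the choice to start sampling exactly at \(k_{min}\) are assumptions for illustration.

```python
K_MIN = 2        # smallest group size considered
K_MAX = 55_000   # largest group size considered
STEP = 100       # step size through the interval [K_MIN, K_MAX]
REP = 1000       # measurements per group size


def group_sizes(k_min=K_MIN, k_max=K_MAX, step=STEP):
    """Group sizes sampled from [k_min, k_max] in steps of `step`."""
    return range(k_min, k_max + 1, step)


def run_benchmark(operation, k_min=K_MIN, k_max=K_MAX, rep=REP, step=STEP):
    """Call operation(group_size) `rep` times per group size and collect the results."""
    results = {}
    for k in group_sizes(k_min, k_max, step):
        results[k] = [operation(k) for _ in range(rep)]
    return results
```

Here, `operation` stands for any measured group operation (group creation, member addition, or member removal) that accepts the group size, so the same loop adapts to every configuration without scheme-specific restrictions.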

4.2.2 Centralized and decentralized/hybrid SGC schemes: metric definitions

For the definition of our metrics, we are guided by Sect. 2.1.3, in which we have captured the performance aspects for SGC schemes that are common in the literature. These consist of memory requirements, computation times, and communication overhead. In this work, however, we focus on memory requirements and computation times, because analyzing the communication overhead would additionally require network simulation and network test tools to evaluate the performance of SGC schemes under different network conditions.

Concerning storage requirements, we distinguish which group operation is considered and whether the storage requirements apply to the CI or the group members. At this point, we would like to mention that the memory requirements do not consider the statically generated variables needed for reproducibility.

Regarding the calculation times, we first look at Figs. 3 and 4, illustrating the workflow of centralized and decentralized/hybrid SGC schemes. These figures show that the workflow for both types of SGC schemes runs sequentially in phases. In centralized SGC schemes, the CI calculates the corresponding parameters in the first phase, which the group members then process in the second phase. The CI does not need any input from the group members. The decentralized/hybrid SGC scheme workflow behaves similarly, where the CI calculates parameters in the first phase. However, these are first processed by the SgCs in the second phase before the group members use them to calculate the group key in the third phase. Since these phases run sequentially one after the other, we define the metric calculation time as the time required for calculating the individual phases. We take into account (1) which actor performs the calculation, that is, the CI, group member, and, if available, SgC; (2) for which group operation the calculations are performed, that is, the group creation, the addition of members or the removal of members. When adding members, we also distinguish whether the calculations are performed by an old or a new group member. An overview and more detailed description of the functions whose computation times we evaluate can be found in our specification-based benchmark, which consists of the specification of these functions. Simply put, we measure the execution times of the functions listed in the specification-based benchmark (see Tables 1 and  2). The static parameters required are generated before the actual function is executed and are not included in the calculation time.
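The measurement of a single phase can be sketched as follows. This is an illustration under assumptions: the helper `timed` is hypothetical, and `time.perf_counter_ns()` is the desktop Python clock; on MicroPython, `time.ticks_us()` together with `time.ticks_diff()` would play the same role.

```python
import time


def timed(func, *args):
    """Return (result, elapsed_ns) for one call of func(*args)."""
    start = time.perf_counter_ns()
    result = func(*args)
    elapsed = time.perf_counter_ns() - start
    return result, elapsed


# Static parameters are prepared first, so their generation is not timed.
static_params = list(range(1000))

# Only the function under test runs between the two clock readings.
total, elapsed_ns = timed(sum, static_params)
```

Keeping the preparation of the static parameters outside of `timed` reflects the rule that only the execution of the specified function contributes to the calculation time.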

Finally, we want to discuss how our metrics ensure the quality criterion of verifiability. This criterion describes how accurate our benchmark results are. We have decided to use the concept of confidence intervals, which is common in the literature, to indicate the accuracy of our measurement results. Specifically, we repeat each measurement 1000 times and then calculate the associated mean and 95% confidence interval.
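One common way to obtain the mean and a 95% confidence interval from repeated measurements is sketched below. The normal approximation with the factor 1.96 is an assumption that is reasonable for 1000 repetitions; the exact method used in our evaluation scripts may differ, and the function name `mean_with_ci95` is hypothetical. The `statistics` module is part of desktop Python, so this step is assumed to run on the evaluation host rather than on the microcontroller.

```python
import math
import statistics


def mean_with_ci95(samples):
    """Return (mean, (lower, upper)) using a normal-approximation 95% interval."""
    n = len(samples)
    mean = statistics.fmean(samples)
    # Standard error of the mean, based on the sample standard deviation.
    sem = statistics.stdev(samples) / math.sqrt(n)
    half_width = 1.96 * sem
    return mean, (mean - half_width, mean + half_width)
```

A narrow interval indicates that the repeated measurements for a given group size agree closely; a wide interval signals that more repetitions or a closer look at disturbances on the device may be needed.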

Fig. 3
figure 3

Illustration of the sequential workflow of centralized SGC schemes, which consists of two phases

Fig. 4
figure 4

Illustration of the sequential workflow of decentralized/hybrid SGC schemes, which consists of three phases

4.2.3 Centralized and decentralized/hybrid SGC schemes: implementation

The last remaining quality criterion is usability. To show how we implemented this criterion, we go into more detail about the implementation of our benchmark for centralized and decentralized/hybrid SGC schemes. Specifically, we had to implement the functionality of the CI, the group members, and the SgCs. The code of the CI, the group members, and the SgCs is not executed on the same hardware. The group members and SgCs represent IoT devices, which tend to be less powerful. The CI, on the other hand, can also run on more powerful hardware. Since we developed a hybrid benchmark, we decided to use the ESP32 microcontroller as the hardware for the group members and SgCs. The ESP32 is a 32-bit system-on-a-chip with a 240 MHz dual-core CPU, 520 kB of SRAM, and the ability to establish a 2.4 GHz Wi-Fi connection. The ESP32 can be considered a typical IoT device as it is already used in many IoT projects, such as smart surveillance [30], smart saline level monitoring [31], or solar water pumping systems [32]. Specifically, we use the developer board version of this chip, as it can be easily connected to a computer via USB to flash it or read out its serial pin. The serial pin is, in simple terms, the console output of the chip. We programmed the chip in MicroPython, a lean variant of Python 3 optimized for use on microcontrollers and in constrained environments [33]. Thus, our implementation should run directly on all other IoT devices that support MicroPython. On devices that do not, our implementation must be ported accordingly, replacing the MicroPython libraries used. To show how the measurements can be performed on the ESP32 using our scripts, we have provided a step-by-step tutorial in the appendix, which guides the benchmark user through the execution of a measurement.

Since the CI hardware is significantly more powerful than that of the group members, we believe it is reasonable to assume that a modern operating system can run on this hardware. As hardware for our CI, we chose a Lenovo-B50-50 with a 2 GHz CPU and 4 GB RAM, running Ubuntu 16.04 LTS. Based on this assumption, we decided to implement the scripts that realize the CI’s functionality and the measurement scripts using Python. We chose Python because (1) it is independent of the operating system and (2) it is considered very user-friendly [34], which is why we believe it should make it easy to run our benchmark. Analogous to the group members’ software implementation, we added instructions for executing our benchmark software for the CI in the appendix. We provide the benchmark itself via [35].

5 Evaluation of centralized and decentralized/hybrid SGC schemes

To evaluate the performance of centralized and decentralized/hybrid SGC schemes, we first consider the centralized schemes and then the decentralized/hybrid schemes.

5.1 Evaluation of centralized SGC schemes

We begin our evaluation of the centralized SGC schemes by analyzing the performance of the group members. Specifically, we look at the computation times for the different group operations for various group sizes and the memory requirements. Then, we analyze the CI analogously.

5.1.1 Centralized SGC schemes: group member performance

In the following, we illustrate the calculation times of the members for the group operations group creation (see Fig. 5), the addition of a member (see Fig. 6), and the removal of a member (see Fig. 7) for different group sizes when a centralized SGC scheme is used. Based on these diagrams, we can state that, across the group sizes considered, the XFKS scheme requires the longest calculation times, followed by the SGCSH and OFT schemes in second and third place. However, the gaps between the individual ranks are uneven. For example, the two fastest schemes are less than 7 ms apart, while the gap between the second and third fastest schemes is more than 34 ms. The calculation times of some schemes, such as XFKS or SGCSH, are independent of the respective group size. In contrast, other schemes, such as OFT or CFKM, have calculation times that increase with increasing group size. Overall, for our traffic light use case, all centralized SGC schemes required less than 47 ms for all group operations and sizes considered.

Fig. 5
figure 5

Calculation times of group members for joining a newly created group when using a centralized SGC scheme

Fig. 6
figure 6

Calculation times of group members for joining an already existing group when using a centralized SGC scheme

Fig. 7
figure 7

Calculation times of group members when a group member is removed from their group using a centralized SGC scheme

The storage requirements of group members in centralized SGC schemes are shown in Fig. 8. This clearly shows that the SGCSH scheme has the highest memory requirement and needs more than 100 Kilobytes more than the other centralized schemes. These required less than 0.5 Kilobytes each. However, the memory requirement of the SGCSH scheme is independent of the respective group size. In contrast, the memory requirement of the S2RP scheme, for example, increases with increasing group size. However, we believe this increase is negligible for the considered group sizes. If we take into account that the ESP32 has 448 KB ROM and 520 KB SRAM [36], i.e., a total of 968 KB memory, even the most memory-intensive centralized scheme requires less than 12% of the available memory for the group sizes considered.
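The arithmetic behind the 12% bound can be made explicit with a short sketch. The ROM and SRAM sizes are the values stated above; the helper `memory_share` and the 100-Kilobyte example value are hypothetical and only illustrate the calculation.

```python
ROM_KB = 448     # ESP32 ROM as stated in the text
SRAM_KB = 520    # ESP32 SRAM as stated in the text
total_kb = ROM_KB + SRAM_KB

def memory_share(required_kb, available_kb=total_kb):
    """Fraction of the available memory that a scheme occupies."""
    return required_kb / available_kb

# 12% of the total memory, expressed in Kilobytes.
budget_kb = 0.12 * total_kb

# Hypothetical example: a scheme that needs 100 Kilobytes.
share_of_100_kb = memory_share(100)
```

Any scheme whose footprint stays below `budget_kb` (roughly 116 Kilobytes) occupies less than 12% of the 968 KB considered here.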

Fig. 8
figure 8

Storage requirements for the group members if centralized SGC schemes are used

5.1.2 Centralized SGC schemes: CI performance

To evaluate the performance of the CI for centralized schemes, we proceed analogously to the performance evaluation of the members and consider first the required computation times and then the required memory. The computation times of the CI for the group operations group creation, adding a group member, and removing a group member are illustrated in Figs. 9, 10, and 11, respectively. The first observation is that the calculation times for the considered group sizes lie in different value ranges depending on the group operation. Group creation can take up to 100 seconds, while adding and removing members always takes less than 20 seconds. Likewise, there is no consistent runtime ranking of the schemes across the three group operations. For example, the slowest scheme for creating a group is S2RP, followed by SGCSH and LKH+. For adding a member, on the other hand, the three schemes with the longest calculation times are CFKM, SKDC, and SBSA. For removing a member, the three slowest schemes are XFKS, SKDC, and SBSA. Additionally, when adding and removing members, the remaining schemes always require less than 1 second. This is not the case for group creation, where the gaps between the remaining schemes are much greater.

Fig. 9
figure 9

Calculation times of the CI to create a completely new group when using a centralized SGC scheme

Fig. 10
figure 10

Calculation times of the CI to add a member to an existing group when using a centralized SGC scheme

Fig. 11
figure 11

Calculation times of the CI to revoke a member from an existing group when using a centralized SGC scheme

The storage requirements of the CI for centralized SGC schemes are illustrated in Fig. 12. The S2RP scheme has the highest storage requirements, followed by the OFCT and SKDC schemes. The memory requirement of the mentioned schemes also increases with increasing group size and can reach 25 Megabytes for the group sizes considered. The remaining schemes require significantly less than 0.1 Megabytes for the group sizes considered and can, in our opinion, be considered constant.

Fig. 12
figure 12

Storage requirements for the CI if centralized SGC schemes are used

5.2 Evaluation of decentralized/hybrid SGC schemes

To evaluate the decentralized/hybrid SGC schemes, we first examine the group members’ performance in terms of required computation times and memory requirements. Then, we similarly assess the SgCs and the CI.

5.2.1 Decentralized/hybrid SGC schemes: group member performance

Group member computation times for the decentralized/hybrid SGC schemes are illustrated for the group operations group creation, member addition, and member removal in Figs. 13, 14, and 15, respectively. Note that Marks and Alohali are only shown in Fig. 13, as these schemes support only group creation. The figures show that the computation times do not depend on the respective group sizes and are less than 1 ms for all group operations, except for the Marks scheme, which needs about 100 ms on average for group creation.

Fig. 13
figure 13

Calculation times of group members for joining a newly created group when using decentralized/hybrid SGC schemes

Fig. 14
figure 14

Calculation times of group members for joining an already existing group when using a decentralized/hybrid SGC scheme

Fig. 15
figure 15

Calculation times of group members when a group member is removed from their group using a decentralized/hybrid SGC scheme

The storage requirements for members for decentralized/hybrid SGC schemes are shown in Fig. 16. Similar to the computation times for group members for decentralized/hybrid SGC schemes, we assume the storage requirements to be independent of the respective group size. In addition, the LNT scheme has the highest memory requirements with 0.03 Kilobytes, while the remaining schemes only require around 1.6 Bytes of memory.

Fig. 16
figure 16

Storage requirements for the group members if decentralized/hybrid SGC schemes are used

5.2.2 Decentralized/hybrid SGC schemes: SgC performance

The computation times of the SgCs when using decentralized/hybrid SGC schemes for the group operations group creation, adding group members, and removing members can be seen in Figs. 17, 18, and 19, respectively. Again, note that the two schemes Alohali and Marks only support the creation of a group and are therefore missing in Figs. 18 and 19. Looking at the computation times for group creation in Fig. 17, we see similarities to the computation times of the group members during group creation. The Marks scheme has the longest computation time, while the other schemes have almost negligible computation times. In addition, the computation times of Marks increase with increasing group size and can exceed 400 s.

Looking at the SgC computation times for adding and removing members in Figs. 18 and 19, we see that the Kronos scheme has the longest computation times, followed by the LNT, DEP, and IGKMP schemes, whose confidence intervals overlap. Likewise, these figures show that the computation times of the SgCs for most decentralized/hybrid SGC schemes depend on the respective group size and increase accordingly with increasing group sizes.

Fig. 17
figure 17

Calculation times of the SgCs to create a new group using a decentralized/hybrid SGC scheme

Fig. 18
figure 18

Calculation times of the SgCs to add a member to an existing group when using a decentralized/hybrid SGC scheme

Fig. 19
figure 19

Calculation times of the SgC to revoke a member from an already existing group when using a decentralized/hybrid SGC scheme

The storage requirements of the SgCs in decentralized/hybrid SGC schemes are depicted in Fig. 20. The Marks scheme has the most extensive storage requirements, followed by the DEP and Riseg schemes. Additionally, it can be seen that, apart from the DEP scheme, the memory requirements of the decentralized/hybrid SGC schemes do not depend on the respective group size. For the DEP scheme, however, the memory requirement increases with the group size. It is also evident from the figure that the memory requirements of the schemes vary significantly. For example, the Marks scheme requires a constant 16 Kilobytes, whereas Riseg, Alohali, IGKMP, Kronos, and LNT each require a constant value below 1 Kilobyte. The memory requirements of the DEP scheme range between these two values.

Fig. 20
figure 20

Storage requirements for the SgCs if decentralized/hybrid SGC schemes are used

5.2.3 Decentralized/hybrid SGC schemes: CI performance

Figures 21, 22, and 23 illustrate the computation times of the CI for decentralized/hybrid SGC schemes for the group operations group creation, adding group members, and removing group members, respectively. In Figs. 22 and 23, the schemes Marks and Alohali are missing again as they only support group creation. In Fig. 21, the Marks scheme requires the longest calculation times, at just under 17.5 s. The remaining schemes each need significantly less than 0.5 s for group creation. A similar picture emerges for the remaining schemes when adding a member (Fig. 22) and removing a member (Fig. 23): they require less than 75 ms. The DEP scheme takes the longest, followed by the IGKMP and LNT schemes, which share second place. It is also evident that the calculation times are independent of the respective group size.

Fig. 21
figure 21

Calculation times of the CI to create a completely new group when using a decentralized/hybrid SGC scheme

Fig. 22
figure 22

Calculation times of the CI to add a member to an existing group when using a decentralized/hybrid SGC scheme

Fig. 23
figure 23

Calculation times of the CI to revoke a member from an existing group when using a decentralized/hybrid SGC scheme

The last aspect we analyze concerning decentralized/hybrid SGC schemes is the memory requirement of the CI, which is illustrated in Fig. 24. The Riseg scheme has one of the highest storage requirements, which increases with the group size to over 100 Kilobytes. The other schemes have storage requirements of less than 20 Kilobytes, which are independent of the respective group size.

Fig. 24
figure 24

Storage requirements for the CI if decentralized/hybrid SGC schemes are used

6 Benchmark for distributed/contributory SGC schemes

After presenting our benchmark and evaluating centralized and decentralized/hybrid SGC schemes, we introduce our benchmark for distributed/contributory SGC schemes in this section. To do this, we proceed analogously to the benchmark for centralized and decentralized/hybrid SGC schemes. We first create a specification-based benchmark and then expand this into a hybrid benchmark.

6.1 Specification-based benchmark for distributed/contributory SGC schemes

For the definition of our specification-based benchmark, we consider the workflow of distributed/contributory SGC schemes illustrated in Fig. 25. In this figure, the CI, which controls the flow of the respective SGC scheme in decentralized/hybrid SGC schemes and limits the number of phases to a maximum of 3, is missing. In distributed/contributory SGC schemes, the absence of the CI means that the individual group members have to interact with each other to perform the necessary calculations, which leads to a significant increase in the number of phases. We want to map this behavior meaningfully using a specification-based benchmark and define the corresponding functions. We faced the challenge that the functions are now cross-phase and that corresponding pauses are required for communication with other group members. Since the length of these pauses is largely network-dependent and our benchmark is primarily intended to benchmark the respective SGC schemes and not the network protocols used, we assumed that the intermediate pauses each have a duration of 0 seconds and are, therefore, non-existent. We plan to analyze the effects of network influences on the individual SGC schemes in our future work. Based on these assumptions, we can define our specification-based benchmark for distributed/contributory SGC schemes based on our platooning business problem in Table 3. This includes the functions that the group members must execute to (1) join an initially created group, (2) add a member to the group, and (3) remove a member from the group. Since we assume there is no pause during the respective calculations, and the group members immediately receive all the necessary parameters, we can also pass these parameters directly as a list to the respective functions right at the beginning.
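The effect of the zero-pause assumption on the function signatures can be illustrated with a small sketch. It is a toy and not one of the SGC schemes considered: the function `process_all_phases`, the modular exponentiation used as the per-phase step, and all numeric values are hypothetical. It only shows how a formerly cross-phase computation collapses into a single call once the inputs of all phases are passed as a list.

```python
def process_all_phases(secret, received_values, modulus):
    """Process the inputs of every phase in one call.

    received_values[i] is the value the member would have received from
    other group members in phase i. With pauses of 0 seconds, all of
    them are available up front and can be passed as one list.
    """
    outputs = []
    for value in received_values:           # one iteration per former phase
        # Toy per-phase step: raise the received value to the member's secret.
        outputs.append(pow(value, secret, modulus))
    return outputs

# Hypothetical toy values (far too small for real use).
outputs = process_all_phases(secret=6, received_values=[5, 8, 19], modulus=23)
```

Because the whole computation is one function call, its execution time can be measured in the same way as for a single phase of a centralized scheme.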

Fig. 25
figure 25

Illustration of the workflow of distributed/contributory SGC schemes

Table 3 Definitions of the functions required by the group members with input parameters and expected results for the creation of our specification-based benchmark for distributed/contributory SGC schemes

6.2 Hybrid benchmark for distributed/contributory SGC schemes

Analogous to the procedure for centralized and decentralized/hybrid SGC schemes, we now expand our specification-based benchmark for distributed/contributory SGC schemes to a hybrid benchmark. To do this, we must again provide corresponding implementations, although these can be incomplete. More precisely, we provide implementations for the distributed/contributory SGC schemes BD, DH-LKH, D-OFT, G-DH, CKA, and D-LKH. The selection of distributed/contributory SGC schemes is again based on our previous work [10]. Also analogous to the procedure for centralized and decentralized/hybrid SGC schemes, we fully implement the distributed/contributory SGC schemes but again specify the software and hardware used. This means benchmark users already have a basis if they want to use other hardware or software and only need to port our implementations accordingly. In the following, we present our implementations for distributed/contributory SGC schemes in more detail and show how we fulfill the five quality criteria: relevance, reproducibility, fairness, verifiability, and usability. We first address the quality features relevance, reproducibility, and fairness by presenting our workload for distributed/contributory SGC schemes. Then, we show how we have considered the quality features verifiability and usability by presenting our implementation and metrics in more detail.

6.2.1 Distributed/contributory SGC schemes: workload definition

To achieve the quality feature relevance, our benchmark must reflect the behavior the benchmark user is interested in as closely as possible. As a business problem, we defined a platoon of vehicles that should be able to form spontaneously, whereby the number of vehicles participating in the platoon is dynamic over time. Therefore, our workloads must support the initial creation of a platoon and the addition and removal of vehicles to and from the platoon. Thus, our workloads must include the following group operations: (1) initial creation of a group, (2) adding members to the group, and (3) removing members from the group. For a more detailed description of the associated workloads and functions, please refer to our specification-based benchmark (see Table 3), which we still need to supplement with the specific group sizes. Specifically, this means we must determine how large the platoons in our workloads can be. To do this, we rely on the work of Sala et al. [37], which examines the capacity of freeway lanes with platoons. Although the maximum accepted platoon length is 20 [37], we set the maximum number of vehicles in a platoon to 30 to create a more challenging security scenario. Our workloads therefore include all group sizes from the interval [2; 30].

In addition to the relevance quality feature, our workloads should also fulfill the reproducibility criterion. Consequently, we proceed analogously to the workloads of the centralized and decentralized/hybrid SGC schemes: instead of generating random variables at run time, we use predefined variables. More precisely, we set a value by hand and then hash it as often as the corresponding variables are required, whereby each hash value corresponds to one variable value. We carry out this static generation of variables before executing the respective function. This procedure ensures that all calculations are always carried out with the same numbers, thus ensuring reproducibility.
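This static generation of variables can be sketched as a simple hash chain. The sketch below is illustrative only: the use of SHA-256, the seed value, and the name `static_values` are assumptions, since the text does not prescribe a particular hash function or starting value.

```python
import hashlib

def static_values(seed: bytes, count: int):
    """Derive `count` deterministic values by hashing the seed repeatedly."""
    values = []
    digest = seed
    for _ in range(count):
        # Each hash of the previous digest yields the next variable value.
        digest = hashlib.sha256(digest).digest()
        values.append(int.from_bytes(digest, "big"))
    return values

SEED = b"hand-picked seed"   # hypothetical; any fixed value works
values = static_values(SEED, 5)
```

Since the chain depends only on the seed, every run of the benchmark operates on exactly the same numbers.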

Next, we consider the quality criterion fairness for our workload for distributed/contributory SGC schemes. We achieve this feature analogously to the workloads for centralized and decentralized/hybrid SGC schemes by using the three variables \(k_{min}, k_{max}\), and rep to conveniently define the measurement interval for the group sizes \([k_{min}; k_{max}]\), as well as the number of repetitions rep per group size. In addition, we only compare distributed/contributory SGC schemes with each other and not with schemes from different classes, as this is not reasonable and fair [10].
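How the three variables drive a measurement run can be sketched as follows. The function `run_workload` and the stand-in `toy_operation` are hypothetical and not part of our benchmark scripts; only the roles of \(k_{min}\), \(k_{max}\), and rep are taken from the description above.

```python
import time

def run_workload(k_min, k_max, rep, operation):
    """Time operation(k) rep times for every group size k in [k_min; k_max]."""
    results = {}
    for k in range(k_min, k_max + 1):       # interval bounds are inclusive
        timings = []
        for _ in range(rep):
            start = time.perf_counter_ns()
            operation(k)
            timings.append(time.perf_counter_ns() - start)
        results[k] = timings                # nanoseconds per repetition
    return results

# Hypothetical stand-in for a group operation whose cost grows with k.
def toy_operation(k):
    return sum(pow(3, i, 2**61 - 1) for i in range(k))

results = run_workload(k_min=2, k_max=30, rep=10, operation=toy_operation)
```

Applying the same interval and the same number of repetitions to every scheme keeps the comparison within a class of schemes on equal terms.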

6.2.2 Distributed/contributory SGC schemes: metrics and implementation

Recall the assumptions made for the workflow of the distributed/contributory SGC schemes, as depicted in Fig. 25. These assumptions include that the pauses between the phases are 0 seconds each, and individual group members can access all the parameters they would receive during individual phases right from the start. Consequently, individual group members can complete all required calculations in a single iteration. Thus, the workflow of the particular group members is no longer cross-phase and corresponds to that of the centralized or decentralized/hybrid SGC schemes. For this reason, we define our metrics for the distributed/contributory SGC schemes analogously to those for the centralized or decentralized/hybrid SGC schemes. We have realized the quality feature verifiability with the help of confidence intervals by repeating each measurement 1000 times and calculating the corresponding 95% confidence interval.

When implementing the distributed/contributory SGC schemes, we also followed the implementation of the centralized or decentralized/hybrid SGC schemes to ensure the usability quality feature. Thus, we used the ESP32 microcontroller as hardware for the group members, which we programmed using (Micro)Python. The instructions listed in the appendix for carrying out the measurements on the ESP32 for centralized or decentralized/hybrid SGC schemes also apply to the distributed/contributory SGC schemes. We provide the benchmark itself via [35].

7 Evaluation of distributed/contributory SGC schemes

After presenting our benchmark for distributed/contributory SGC schemes, we evaluate it in this section. In contrast to the centralized and decentralized/hybrid SGC schemes, we only have the group members as actors and no CIs. In the following, we first consider the computation times of the group members for the individual group operations and then the memory requirements.

The calculation times of the individual group members for the group operations (1) group creation, (2) adding a group member, and (3) removing a group member are illustrated in Figs. 26, 27, and 28, respectively. Please note that the schemes BD and CKA support only the creation of a group and not the subsequent change of its composition. Thus, they are not depicted in Figs. 27 and 28. Moreover, the confidence intervals of almost all schemes are barely visible in these figures, except for the D-LKH scheme. Regardless of which group operation is considered or which scheme is analyzed, the calculation time is less than 0.3 seconds for the group sizes considered. For the BD, DH-LKH, D-OFT, G-DH, and CKA schemes, the calculation times increase with increasing group size, while D-LKH is almost unaffected. Group creation takes the longest for all distributed/contributory SGC schemes analyzed. These graphs showcase the impact of a missing CI: the centralized schemes require less than 0.1 s even for group sizes of up to 55,000 members, whereas the distributed/contributory SGC schemes sometimes require more than 0.1 s for 30 members.

Fig. 26
figure 26

Calculation times of group members for joining a newly created group when using distributed/contributory SGC schemes

Fig. 27
figure 27

Calculation times of group members for joining an already existing group when using a distributed/contributory SGC scheme

Fig. 28
figure 28

Calculation times of group members when a group member is removed from their group using a distributed/contributory SGC scheme

Finally, we investigate the memory requirements of the distributed/contributory SGC schemes in Fig. 29. The memory requirements of the schemes BD and CKA do not depend on the respective group size and are below 0.1 Kilobytes. The memory requirements of the remaining schemes, which overlap in this figure, increase with the group size to over 0.8 Kilobytes. Let us compare these memory requirements with those of the centralized SGC schemes. All but one of the centralized SGC schemes require less memory than the distributed/contributory SGC schemes. This is particularly interesting as we have considered group sizes of up to 55,000 for the centralized schemes. The lack of a CI harms calculation times and memory requirements but also offers the advantage that no central party must be trusted.

Fig. 29
figure 29

Storage requirements for the group members if distributed/contributory SGC schemes are used

8 Related work

In this section, we review related work and highlight the novelty of our contributions.

There are several works in the literature that perform a theoretical performance analysis of SGC schemes [3, 9,10,11,12,13,14,15,16]. Our approach differs from these works in that we examine the performance of SGC schemes through measurements on actual hardware, providing a concrete and practical evaluation. For this purpose, we have implemented the schemes and defined corresponding workloads that meet the quality requirements of a benchmark.

Concerning practical performance evaluations, no benchmark in the literature covers all classes of SGC schemes [18]. However, our former work [18] provides a benchmark focusing on centralized SGC schemes. In addition to this category-specific benchmark, there are several performance analyses (e.g., [17, 38] and [19]) for individual SGC schemes, but no systematic benchmarks. In contrast to these works, we have developed a benchmark that includes systematic performance analyses and covers all classes of SGC schemes.

Apart from these works, measurement-based performance analyses in the literature evaluate only the encryption and decryption process, but not the key agreement process (e.g., [39,40,41,42], and [43]). We distinguish ourselves from these studies by explicitly incorporating the key agreement process into our analysis.

In addition to works that analyze the use of SGC schemes for IoT, there are also works that explore the use of other types of schemes for IoT. For example, the works [44] and [45] consider the use of transport layer encryption (TLS), and the works [46, 47] and [48] focus on attribute-based encryption. However, we do not discuss these works in more detail since TLS, for example, provides only transport encryption, while SGC schemes offer end-to-end encryption. Attribute-based schemes are designed for use cases in which a group may contain subgroups with different permissions. SGC schemes, on the other hand, assume that group members are only part of one group.

9 Conclusion

The pervasive integration of IoT devices in our daily lives has brought forth heightened cybersecurity concerns. SGC schemes have been proposed to address the challenges presented by n-to-n communication in IoT. While prior works provide valuable theoretical analyses based on performance and features to guide SGC scheme selection, more structured real-world performance assessments are required. To fill this gap, our work contributes by designing two use cases that specifically demand SGC solutions. We implement corresponding benchmarks and deploy and analyze the performance of varied SGC schemes on microcontrollers. The outcomes of our experiments shed light on the substantial impact of a missing CI in distributed/contributory SGC schemes on calculation times and storage.

In future work, we plan to analyze the network influences on the performance of SGC schemes. To this end, using our benchmark results, we plan to use network simulation tools such as OMNeT++ to simulate different typical IoT network scenarios and the performance of IoT devices. In addition, we plan to use attribute-based encryption schemes and compare them with SGC schemes. We also plan to investigate the use of (1) signature schemes such as [49] and (2) signature encryption schemes that combine signing and encryption in one logical step, such as [50, 51] for IoT use cases.