Introduction

Nowadays, the security of IT systems in companies and organizations depends on a variety of factors. These systems must provide good protection against external risks and attacks such as malware and phishing. One threat to corporate IT systems that should not be underestimated is internal malicious use, whereby employees manipulate data and exploit their position in the company to enrich themselves. Such cases are referred to as occupational fraud. It has been shown that fraudulent or erroneous behavior by employees can lead to significant business losses [1].

One approach to addressing this problem is to apply thorough authorization management and access control mechanisms. A widely used approach of this kind is role-based access control (RBAC). Instead of assigning permissions directly to users, permissions are grouped into roles, which are then assigned to users [2]. The corresponding optimization problem is called the role mining problem. It consists of finding a minimum number of roles together with a corresponding assignment of permissions to roles and of roles to users, and it was shown to be NP-complete [3]. The underlying permission-to-user assignments are usually assumed to be invariant in time. However, this assumption does not match the requirements of real-world use cases. For instance, employees change positions and departments, and they join or leave the company. Such behavior leads to structural changes, meaning that the assignment of permissions to users must be understood as a dynamic construct. Furthermore, users of role mining software should be able to influence the role mining process, e.g., by specifying preferences or by manually manipulating solutions. This leads to additional events that need to be incorporated dynamically into the role mining process.

To address this, we previously proposed a definition of the Dynamic Role Mining Problem and explored different strategies for assigning roles to new employees of an organization [4]. The goal of this research is to provide a comprehensive overview of dynamically occurring events that are relevant in the context of RBAC and of their integration into evolutionary algorithms for role mining. For events arising from structural changes in an enterprise, corresponding event-handling methods are provided and investigated in detail. Moreover, all presented event-handling methods are evaluated in a series of experiments to examine the advantages of dynamic optimization compared to static role mining.

The remainder of this paper is organized as follows: Sect. “Role Mining in Static Environments” presents the basic role mining problem and corresponding solution strategies. In Sect. “The Dynamic Role Mining Problem”, we introduce dynamic optimization problems in general before defining the Dynamic Role Mining Problem. Sect. “Role Mining in Dynamic Environments” provides a broad overview of different event types relevant for role mining and their integration into the framework of an evolutionary algorithm. Furthermore, for events triggered by structural changes in business environments, corresponding event-handling methods are presented. In Sect. “Experiments and Evaluation”, we evaluate the results of the experiments performed. Sect. “Conclusion and Future Works” provides a conclusion of the presented research and points out avenues for future research.

Role Mining in Static Environments

This section introduces the Basic Role Mining Problem (RMP), presents some of the most common solution strategies, and illustrates how evolutionary algorithms can be adapted to solve the RMP. These provide the framework for integrating the dynamically occurring events considered in the following sections.

The Basic Role Mining Problem

A first definition of the RMP was proposed by Vaidya et al. as the Minimum Biclique Cover Problem [3]. However, for the application of evolutionary algorithms, the matrix representation of the RMP is more suitable:

  • \(U=\{u_1,u_2,...,u_m\}\) a set of \(M=|U |\) users

  • \(P = \{p_1,p_2,...,p_n\}\) a set of \(N=|P |\) permissions

  • \(R = \{r_1, r_2,...,r_k\}\) a set of \(K=|R |\) roles

  • \(UPA \in \{0,1\}^{M \times N}\) the targeted permission-to-user assignment matrix, where \(UPA_{ij}=1\) implies that permission \(p_j\) shall be assigned to user \(u_i\).

  • \(UA \in \{0,1\}^{M \times K}\) a possible assignment of roles to users, where \(UA_{ij}=1\) implies that role \(r_j\) is assigned to user \(u_i\).

  • \(PA \in \{0,1\}^{K \times N}\) a possible assignment of permissions to roles, where \(PA_{ij}=1\) implies that permission \(p_j\) is assigned to role \(r_i\).

Based on the presented matrix representations, a definition of the Basic Role Mining Problem in its matrix decomposition version can be provided.

The Basic Role Mining Problem (Matrix Decomposition Version)

Given a set of users U, a set of permissions P and a permission-to-user assignment matrix UPA, find a minimal set of roles R, a corresponding role-to-user assignment matrix UA and a permission-to-role assignment matrix PA, such that each user has exactly the set of permissions granted by the UPA matrix:

$$\begin{aligned} {{\text {Basic RMP}}} = {\left\{ \begin{array}{ll} \text {min } &{} |R |\\ \text {s.t.} &{} {\text {UA}} \otimes {\text {PA}} = {\text {UPA}}. \end{array}\right. } \end{aligned} \tag{1}$$

where \(\otimes \) denotes Boolean matrix multiplication:

$$\begin{aligned} ({\text {UA}} \otimes {\text {PA}})_{ij} = \bigvee _{l=1}^k(UA_{il} \wedge PA_{lj}). \end{aligned}$$

A role concept \(\pi =\left\langle R^\pi , {\text {UA}}^\pi , {\text {PA}}^\pi \right\rangle \), consisting of a set of roles \(R^\pi \), a role-to-user assignment \({\text {UA}}^\pi \) and a permission-to-role assignment \({\text {PA}}^\pi \), denotes a candidate solution for a given Basic RMP. It is called a feasible solution if it satisfies \({\text {UA}}^\pi \otimes {\text {PA}}^\pi ={\text {UPA}}\). For the Basic RMP in particular, a feasible solution is also called 0-consistent.

The objective of the role mining problem is to find a feasible solution that involves a minimum number of roles. An upper bound on the minimum number of roles can easily be calculated from the given permission-to-user assignment matrix UPA. If the number of users is less than or equal to the number of permissions (\(m \le n\)), one role is created for each user and assigned exactly the permissions of the considered user. Hence, \({\text {UA}}=I_m\) and PA = UPA, where \(I_m\) denotes the m-dimensional identity matrix. Since \({\text {UA}} \otimes {\text {PA}} = I_m \otimes {\text {UPA}} = {\text {UPA}}\), this solution complies with the 0-consistency constraint, and \(|R |= m\). If \(n < m\), one role is created for each permission and assigned only the considered permission. The created roles are then assigned to users according to UPA. Hence, UA = UPA and \({\text {PA}} = I_n\). Since \({\text {UA}} \otimes {\text {PA}} = {\text {UPA}} \otimes I_n = {\text {UPA}}\), this solution also complies with the 0-consistency constraint, and \(|R |= n\). Therefore, the minimum number of roles is bounded from above by \({\text {min}}\{m,n\}\).
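Both constructions can be checked mechanically. The following minimal Python sketch uses dense 0/1 lists; the helper names `bool_matmul`, `identity` and `trivial_role_concept` are our own illustrative choices. It implements the Boolean product and the \({\text {min}}\{m,n\}\) construction, using the three user classes of the running example (described for Fig. 5) as input.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def bool_matmul(ua: Matrix, pa: Matrix) -> Matrix:
    """Boolean product: (UA ⊗ PA)_ij = OR over l of (UA_il AND PA_lj)."""
    n = len(pa[0]) if pa else 0
    return [[int(any(row[l] and pa[l][j] for l in range(len(pa))))
             for j in range(n)] for row in ua]


def identity(size: int) -> Matrix:
    return [[int(i == j) for j in range(size)] for i in range(size)]


def trivial_role_concept(upa: Matrix) -> Tuple[Matrix, Matrix]:
    """Return a 0-consistent (UA, PA) pair with min(m, n) roles."""
    m, n = len(upa), len(upa[0])
    if m <= n:
        # one role per user: UA = I_m, PA = UPA
        return identity(m), [row[:] for row in upa]
    # one role per permission: UA = UPA, PA = I_n
    return [row[:] for row in upa], identity(n)


# The three user classes of the running example: m = 3 <= n = 6
upa = [[1, 1, 1, 0, 0, 0],
       [0, 0, 1, 1, 1, 0],
       [0, 0, 0, 1, 1, 1]]
ua, pa = trivial_role_concept(upa)
assert bool_matmul(ua, pa) == upa  # the trivial solution is 0-consistent
print(len(pa))  # 3
```

Any role concept found by an optimizer should therefore need at most three roles for this instance.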

Figure 1 shows an example of the schematic representation of the UPA, UA and PA matrix. For better visualization, black cells indicate 1’s and white cells represent 0’s. This representation technique is also used in the further course of this paper to illustrate binary matrices.

Fig. 1

Schematic representation of UPA, UA and PA matrix

There are many other variants in which the RMP is modified, for example by relaxation of the 0-consistency constraint or the inclusion of more business-driven objectives like administrative costs [5] or compliance aspects and license costs [6]. A detailed survey on the different variants of the RMP is provided by Mitra et al. [7].

General Solution Strategies

The role mining problem is well studied in the literature. Accordingly, there are many established solution techniques. An overview of different RMP variants and solution strategies can be found in Ref. [7]. Therefore, only the most important contributions and solution strategies are summarized below. A widely used approach to tackle the RMP is to group permissions and to create a set of roles from these groups, using different grouping methods. These roles are then assigned to users to obtain appropriate role concepts [8,9,10,11,12]. Often, the role mining problem is approached by mapping it to other well-known problems in data mining, such as the Minimum Tiling Problem [3], the Set Cover Problem [13] or the Minimum Biclique Cover Problem [14]. Other approaches are based on graph optimization [15] or formal concept analysis [16].

Evolutionary Algorithms for the RMP

Since the RMP is NP-complete, evolutionary algorithms (EAs) are a common strategy to search for good solutions. In particular, Saenko and Kotenko have published several EA-based approaches to the RMP [17,18,19]. Other publications that also apply EAs to the RMP can be found, for example, in Refs. [20] or [21]. In our contribution, we also use an evolutionary algorithm called addRole-EA, as presented in Ref. [22], as the basis for integrating the developed event-handling methods. Therefore, the following paragraphs give a brief introduction to evolutionary algorithms in the context of the role mining problem.

EAs represent a population-based optimization strategy following the principle of survival of the fittest. At the beginning, an initial population of individuals, each representing one possible solution of the RMP, is generated, e.g., at random. From this population, the best individuals in terms of a predefined fitness function are selected for mutation (modification of the genome of an individual) and crossover (exchange of genetic information between individuals), which leads to the creation of additional individuals. For the Basic RMP, the number of roles \(|R^\pi |\) of an individual constitutes its fitness value. Again, only the best individuals are selected to survive and thus become part of the next generation. This procedure is repeated iteratively until a stopping condition is met (usually a maximum number of iterations, a maximum number of iterations without improvement of the global best fitness value, or a given solution quality). A top-level description of a general evolutionary algorithm is given in Fig. 2. A more detailed introduction to evolutionary algorithms is provided, for example, in Ref. [23].

Fig. 2

Top-level description of an EA based on [4]
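The general loop described above can be sketched as follows. This is a minimal, generic illustration and not the addRole-EA. Binary tournament selection, one child per iteration and truncation-based elitist replacement are our own illustrative choices, and `evolve` and its parameters are hypothetical names. The three stopping conditions mentioned in the text correspond to `max_iters`, `max_stall` and `target`. Lower fitness is better, matching \(|R^\pi |\) for the Basic RMP.

```python
import random
from typing import Callable, List, TypeVar

T = TypeVar("T")


def evolve(init: Callable[[], T], fitness: Callable[[T], float],
           mutate: Callable[[T], T], crossover: Callable[[T, T], T],
           pop_size: int = 20, max_iters: int = 5000, max_stall: int = 500,
           target: float = float("-inf"), seed: int = 0) -> T:
    """Minimize `fitness` (e.g. the number of roles) with a simple elitist EA."""
    rng = random.Random(seed)
    population: List[T] = [init() for _ in range(pop_size)]
    best = min(population, key=fitness)
    stall = 0
    for _ in range(max_iters):                      # stop 1: iteration budget
        # parent selection by binary tournament (illustrative assumption)
        p1 = min(rng.sample(population, 2), key=fitness)
        p2 = min(rng.sample(population, 2), key=fitness)
        child = mutate(crossover(p1, p2))
        # elitist survival: keep only the pop_size best individuals
        population = sorted(population + [child], key=fitness)[:pop_size]
        if fitness(population[0]) < fitness(best):
            best, stall = population[0], 0
        else:
            stall += 1
        if stall >= max_stall or fitness(best) <= target:  # stops 2 and 3
            break
    return best


# Toy usage: minimize the number of ones in a bit string
toy_rng = random.Random(1)


def flip_one(x):
    y = list(x)
    i = toy_rng.randrange(len(y))
    y[i] ^= 1                      # mutation: flip one random bit
    return y


def one_point(a, b):
    c = toy_rng.randrange(1, len(a))
    return a[:c] + b[c:]           # crossover: one-point recombination


best = evolve(lambda: [1] * 10, sum, flip_one, one_point, target=0)
print(sum(best))
```

For the RMP, `init`, `mutate` and `crossover` would operate on role concepts instead of bit strings.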

In the addRole-EA, each individual corresponds to a possible role concept \(\pi =\left\langle R^\pi , {\text {UA}}^\pi , {\text {PA}}^\pi \right\rangle \). Since in real-world use cases roles usually contain a small number of permissions and only a small set of roles is assigned to each user, it is reasonable to represent UA and PA in a sparse matrix format to save memory. The encoding of an individual in the addRole-EA is illustrated in Fig. 3.

Fig. 3

Encoding of individual for addRole-EA
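A sparse encoding of this kind might look as follows in Python. This is a sketch under our own assumptions, not the data structure of Ref. [22]. Only the 1-entries of PA and UA are stored, as a permission set per role and a role set per user class. `RoleConcept` and its methods are hypothetical names, and the role contents in the example are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Set


@dataclass
class RoleConcept:
    """Sparse role concept: only the 1-entries of PA and UA are stored."""
    pa: Dict[int, FrozenSet[int]] = field(default_factory=dict)  # role -> permissions (rows of PA)
    ua: Dict[int, Set[int]] = field(default_factory=dict)        # user class -> roles (rows of UA)

    def permissions_of(self, user: int) -> Set[int]:
        """Row `user` of UA ⊗ PA: union of the permission sets of the assigned roles."""
        perms: Set[int] = set()
        for role in self.ua.get(user, set()):
            perms |= self.pa[role]
        return perms

    def is_zero_consistent(self, upa: Dict[int, FrozenSet[int]]) -> bool:
        """Check UA ⊗ PA = UPA row by row."""
        return all(self.permissions_of(u) == perms for u, perms in upa.items())

    def fitness(self) -> int:
        """Basic RMP fitness: the number of roles |R|."""
        return len(self.pa)


upa = {1: frozenset({1, 2, 3}), 2: frozenset({3, 4, 5}), 3: frozenset({4, 5, 6})}
rc = RoleConcept(pa={1: frozenset({1, 2}), 2: frozenset({3}),
                     3: frozenset({4, 5}), 4: frozenset({6})},
                 ua={1: {1, 2}, 2: {2, 3}, 3: {3, 4}})
print(rc.is_zero_consistent(upa), rc.fitness())  # True 4
```

Memory grows with the number of assignments rather than with \(M \times K\) and \(K \times N\).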

The unique feature of the addRole-EA is its addRole-method. It allows the addition of new roles to \(R^\pi \), \({\text {UA}}^\pi \) and \({\text {PA}}^\pi \) of an individual \(\pi \) such that the 0-consistency constraint is fulfilled at all times. Subsequently, all roles that became obsolete through the addition of the new roles are deleted from the individual, ideally resulting in a reduction of the total number of roles. For mutation, new roles are created (e.g., by intersecting the permission sets of different users or by merging or splitting existing roles) and then added to individuals by means of this method. In recombination, the roles of different individuals are exchanged, again using the addRole-method. The addRole-EA is a steady-state evolutionary algorithm. For replacement, an elitist selection scheme based on the total number of roles (Basic RMP) is applied. A detailed description of the addRole-EA is provided in [22].
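The following sketch shows one plausible way to realize the two steps of an addRole-style operation. It is our own guess for illustration, not the method of Ref. [22], and `add_role` is a hypothetical name. A new role is assigned only to user classes whose required permissions contain all of its permissions, which keeps \({\text {UA}} \otimes {\text {PA}} = {\text {UPA}}\) intact. An old role is then treated as obsolete if every holder is still fully covered without it. Checking smaller roles first is an arbitrary assumption.

```python
from typing import Dict, FrozenSet, Set

PA = Dict[int, FrozenSet[int]]   # role -> permission set
UA = Dict[int, Set[int]]         # user class -> assigned roles
UPA = Dict[int, FrozenSet[int]]  # user class -> required permissions


def add_role(pa: PA, ua: UA, upa: UPA, new_perms: FrozenSet[int]) -> int:
    """Hypothetical addRole: insert a role while keeping UA ⊗ PA = UPA,
    then delete old roles made obsolete by it. Returns the new role id."""
    rid = max(pa, default=0) + 1
    pa[rid] = frozenset(new_perms)
    for user, required in upa.items():
        if pa[rid] <= required:              # grants nothing the user must not have
            ua.setdefault(user, set()).add(rid)
    # an old role is obsolete if every holder is still fully covered without it
    for role in sorted((r for r in pa if r != rid), key=lambda r: len(pa[r])):
        holders = [u for u, roles in ua.items() if role in roles]
        if all(pa[role] <= set().union(*(pa[q] for q in ua[u] if q != role))
               for u in holders):
            for u in holders:
                ua[u].discard(role)
            del pa[role]
    return rid


upa = {1: frozenset({1, 2}), 2: frozenset({1, 2, 3})}
pa = {1: frozenset({1}), 2: frozenset({2}), 3: frozenset({1, 2, 3})}
ua = {1: {1, 2}, 2: {3}}
add_role(pa, ua, upa, frozenset({1, 2}))
print(sorted(pa), {u: sorted(r) for u, r in ua.items()})  # [3, 4] {1: [4], 2: [3, 4]}
```

In this example, adding role 4 with permissions {1, 2} makes roles 1 and 2 redundant, so the number of roles drops from three to two.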

The Dynamic Role Mining Problem

This section introduces the Dynamic Role Mining Problem. First, dynamic optimization problems and the corresponding sources of dynamics are defined in general. Subsequently, a definition of the Dynamic Role Mining Problem is derived.

Dynamic Optimization Problems

Dynamic optimization problems are characterized by objective functions or restrictions that change with time. These changes are triggered by dynamically occurring events with direct or indirect influence on the specifications of the optimization problem or the associated optimization process.

One possible source of dynamics is events triggered by external factors. Tour and route planning problems are a good example of this. If the corresponding Vehicle Routing Problem is not properly adapted to dynamically occurring delivery requests or cancellations and to travel times between destinations that change due to uncertain and varying traffic conditions, significantly worse or even infeasible optimization results are obtained [24].

Another source of dynamics is the interaction of users of optimization software, so-called decision makers (DM), with the corresponding optimization process. To classify events triggered by the interaction of a DM with an optimization process, König and Schneider distinguish between direct and indirect manipulations [25]. Direct manipulations imply the modification of solution candidates, while indirect manipulations comprise changes of optimization objectives or constraints as well as the adaptation of parameters of the applied optimization algorithm. Nascimento considers interaction possibilities with evolutionary algorithms for dynamic optimization. In addition to direct and indirect manipulation, this approach allows for dynamic focusing on manually chosen sub-problems. Furthermore, interaction possibilities aimed at the specifics of evolutionary algorithms, such as the deliberate inclusion of certain individuals into the population of an evolutionary algorithm, are presented [26].

A survey on optimization in dynamic environments is provided by Cruz [27], offering the following formal definition of a dynamic optimization problem in its most general form:

$$\begin{aligned} {\text {DOP}} = {\left\{ \begin{array}{ll} \text {optimize } &{}f(x,t)\\ \text {s.t.,} &{}x\in F(t)\subseteq S, t\in T. \end{array}\right. } \end{aligned}$$

where:

  • \(S\subseteq {\mathbb {R}}^n\) is the search space.

  • \(t\in T\) is the time.

  • \(f: S\times T\rightarrow {\mathbb {R}}\) is the objective function that assigns a numerical value \(f(x,t)\) to each possible solution \(x\in S\) at time t.

  • \(F(t)\) is the set of feasible solutions \(x\in F(t)\subseteq S\) at time t.

It can be seen that in this definition both the objective function and the constraints are time-dependent. In the following, this concept is applied to role mining, resulting in a definition of the Dynamic Role Mining Problem.

The Dynamic Role Mining Problem

Although it would appear natural given the multitude of changes that occur in business environments, there is little research on dynamic role mining. A first approach was presented by Saenko and Kotenko [19]. Changes concerning the assignment of permissions to users are aggregated into a matrix \({\text {UPA}}_1\). After a certain period of time, the \({\text {UPA}}_1\) matrix is compared to the original user-permission assignment \({\text {UPA}}_0\) and the corresponding role concept \(\pi _0 = \left\langle R_0, {\text {UA}}_0, {\text {PA}}_0 \right\rangle \), which is currently implemented at the considered company. Based on that, the so-called RBAC Scheme Reconfiguration Problem is defined. It consists of finding a new role concept \(\pi _1 = \left\langle R_1, {\text {UA}}_1, {\text {PA}}_1 \right\rangle \), where \({\text {UA}}_1 = {\text {UA}}_0+\Delta {\text {UA}}\) and \({\text {PA}}_1 = {\text {PA}}_0 + \Delta {\text {PA}}\), such that:

$$\begin{aligned}&{\text {RBAC Scheme Reconfiguration Problem}}=\\&\quad {\left\{ \begin{array}{ll} \text {min } &{}\left\| \Delta {\text {UA}} \right\| _1 + \left\| \Delta {\text {PA}} \right\| _1 \\ \text {s.t.} &{}{\text {UA}}_1 \otimes {\text {PA}}_1 = {\text {UPA}}_1. \end{array}\right. } \end{aligned}$$

Hence, the RBAC Scheme Reconfiguration Problem is about finding a new role concept \(\pi _1\) that fulfills the conditions defined by \({\text {UPA}}_1\) and contains as few changes as possible compared to the old role concept \(\pi _0\). However, only permission changes of already existing employees are considered; new employees or employees leaving the company are not taken into account in this approach. Another disadvantage is the aggregation of changes over a certain period of time. Events such as the arrival of new employees or events triggered by user interaction, however, must be integrated into the optimization process, ideally in real time. To reflect this, we provide a definition of the Dynamic Role Mining Problem:

$$\begin{aligned}&{\text {DynRMP}}=\\&\quad {\left\{ \begin{array}{ll} \text {min } &{}|R(t) |\\ \text {s.t.} &{} {\text {UA}}(t) \otimes {\text {PA}}(t) = {\text {UPA}}(t), t\in T. \end{array}\right. } \end{aligned}$$

Analogous to the definition of general dynamic optimization problems in Sect. “Dynamic Optimization Problems”, \({\text {UPA}}(t)\) as well as \({\text {UA}}(t)\) and \({\text {PA}}(t)\) are modeled as time-dependent. Solving the DynRMP therefore means finding an optimal role concept \(\pi (t)=\left\langle R^\pi (t), {\text {UA}}^\pi (t), {\text {PA}}^\pi (t) \right\rangle \) for each point in time \(t\in T\).
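Returning briefly to the RBAC Scheme Reconfiguration Problem of Saenko and Kotenko [19], its objective \(\left\| \Delta {\text {UA}} \right\| _1 + \left\| \Delta {\text {PA}} \right\| _1\) can be computed as in the following sketch. Two details are our own assumptions rather than statements from [19]: we read \(\left\| \cdot \right\| _1\) as the entry-wise sum of absolute values (i.e., the number of changed assignments), and rows or columns that exist in only one of the matrices (added or removed users or roles) are compared against zeros. `l1_change` and `reconfiguration_cost` are hypothetical names.

```python
from typing import List

Matrix = List[List[int]]


def l1_change(old: Matrix, new: Matrix) -> int:
    """Entry-wise L1 norm of new - old; rows/columns that exist in only one
    matrix (added or removed users/roles) are compared against zeros."""
    rows = max(len(old), len(new))
    cols = max((len(r) for r in old + new), default=0)

    def at(m: Matrix, i: int, j: int) -> int:
        return m[i][j] if i < len(m) and j < len(m[i]) else 0

    return sum(abs(at(new, i, j) - at(old, i, j))
               for i in range(rows) for j in range(cols))


def reconfiguration_cost(ua0: Matrix, pa0: Matrix, ua1: Matrix, pa1: Matrix) -> int:
    """||ΔUA||_1 + ||ΔPA||_1 with ΔUA = UA_1 - UA_0 and ΔPA = PA_1 - PA_0."""
    return l1_change(ua0, ua1) + l1_change(pa0, pa1)


# Two users; a new role r3 = {p2} is created and additionally assigned to user 2
ua0 = [[1, 0], [0, 1]]
pa0 = [[1, 1, 0], [0, 0, 1]]
ua1 = [[1, 0, 0], [0, 1, 1]]
pa1 = [[1, 1, 0], [0, 0, 1], [0, 1, 0]]
print(reconfiguration_cost(ua0, pa0, ua1, pa1))  # 2
```

The DynRMP, in contrast, keeps \(|R(t)|\) as its objective and updates the matrices event by event instead of in aggregated batches.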

Role Mining in Dynamic Environments

In this section, we describe and classify the various events relevant to role mining. Subsequently, we explain how the presented events can be integrated into an evolutionary algorithm. For events emerging from structural changes in business landscapes, we present detailed event-handling methods.

Events

In the following, the main events relevant for dynamic role mining are presented and classified. There are two different sources that can trigger events relevant in role mining: (1) interactions of a decision maker (DM) with the role mining software, (2) changes in the company’s structure or staffing.

Interaction Events

This section provides an overview of user interaction events. These events are given the identifiers I01 to I17 and are listed in Tables 1, 2, 3 and 4. They can be classified into the categories defined by König and Schneider [25] or Nascimento [26] (see Sect. “Dynamic Optimization Problems”). The first category contains events that lead to a direct manipulation of the solution candidates (see Table 1). Generally, the main issue here is editing, adding or removing roles (I01–I06). In the context of compliance regulations, certain combinations of permissions may be prohibited from being assigned to the same role. This is addressed by interaction I07, which allows a DM to manually specify or edit segregation of duty (SoD) conflicts, i.e., critical combinations of permissions. For more details on SoD conflicts, refer to Ref. [6], where benchmark instances that include compliance restrictions are provided. There, critical combinations of permissions are aggregated in a compliance matrix C. To assess the severity of an SoD conflict in C, a corresponding weight vector is introduced. From this, a compliance score can be calculated for each individual.

Table 1 List of events induced by manipulating solutions

Another interaction possibility of a DM with the optimization process is to adjust the parameters of the evolutionary algorithm used. Furthermore, specific mutation or crossover operators that have proven effective in certain situations can be selected manually. Since these indirect interactions strongly depend on the algorithm used, Table 2 provides only a few examples.

Table 2 List of events induced by adapting parameters

If a DM is already satisfied with the optimization results achieved for certain areas, such as the users of certain departments of the company or other sets of users, or if the DM would like to intensify the optimization in other areas, the focus of the optimization process can be adjusted. For example, the optimization focus could be set on a certain set of users (I14). In addition, a DM can exclude users and roles that he or she is already satisfied with from further role optimization and thus reduce the problem size. An overview of such interaction events is given in Table 3.

Table 3 List of events induced by adjusting the optimization focus

EAs bear the risk of getting stuck in local optima. For this reason, it can be useful to store some solution candidates from previous iterations. This makes it possible to return to the stored solution candidates without a complete restart of the optimization process. In dynamic optimization in particular, the fitness landscape is subject to change over time. Hence, some of the stored individuals may have better fitness values than the individuals of the current population. Therefore, injecting such stored individuals into the current population appears to be a promising approach. This concept is known as memory-based evolutionary algorithms, which have been covered in previous publications. A survey on memory-based evolutionary algorithms is provided by Branke [28].

Here, we propose to adapt the concept of memory-based evolutionary algorithms to the domain of role mining with user interaction. In this scenario, the DM can store interesting role concepts in a so-called role concept repository. The DM may also analyze them and possibly deploy them later. The resulting interaction possibilities are listed in Table 4. In multi-objective role mining, in which the 0-consistency constraint is relaxed or business-driven objectives such as license costs [6] or administrative costs [5] are included, further interaction events can arise, such as the weighting or ranking of the different optimization objectives or setting thresholds for certain objectives. However, this paper focuses on single-objective role mining problems, so such events are not examined any further.

Table 4 List of events induced using a role concept repository
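A role concept repository of this kind could be sketched as follows. The class `RoleConceptRepository`, its capacity limit, first-in-first-out eviction and the rule of replacing the worst individuals are all our own illustrative assumptions. The one idea taken from the text is that stored concepts are re-evaluated under the *current* fitness function before injection, because the fitness landscape may have changed since they were stored.

```python
import copy
from typing import Callable, List


class RoleConceptRepository:
    """Hypothetical role concept repository in the spirit of memory-based EAs."""

    def __init__(self, capacity: int = 10):
        self.capacity = capacity
        self.stored: List[object] = []

    def store(self, concept: object) -> None:
        self.stored.append(copy.deepcopy(concept))   # snapshot, not a live reference
        if len(self.stored) > self.capacity:
            self.stored.pop(0)                        # FIFO eviction (assumption)

    def inject(self, population: List[object],
               fitness: Callable[[object], float]) -> List[object]:
        """Replace the worst individuals with stored concepts that are better
        under the current fitness function (lower is better)."""
        pop = sorted(population, key=fitness)
        for concept in sorted(self.stored, key=fitness):
            if fitness(concept) < fitness(pop[-1]):
                pop[-1] = copy.deepcopy(concept)
                pop.sort(key=fitness)
        return pop


# Toy usage: role concepts as lists of role names, fitness = number of roles
repo = RoleConceptRepository()
repo.store(["r1", "r2"])
repo.store(["r1"])
new_pop = repo.inject([["a", "b", "c"], ["a", "b", "c", "d"]], fitness=len)
print([len(c) for c in new_pop])  # [1, 2]
```

Deep copies keep the stored snapshots independent of later mutations in the population.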

Structural Events

In addition to events triggered by a DM, there are events which are caused by changes in the structure of a company as employees change positions and responsibilities, or as they join or leave the company. An overview of such structural events (S01–S04) is shown in Table 5.

Table 5 List of events induced by structural change

Since these events are described in more detail in Sect. “Event Handling”, they will not be discussed further at this point.

Event Handling

To process the events defined in Sect. “Events” close to real time, it is important to forward them to the optimization process immediately after they occur. For this purpose, the iterative course of evolutionary algorithms is of great advantage, as it can be checked at the beginning of each iteration whether one or more events are pending. If so, the corresponding event-handling methods are executed to adapt the individuals of the current population of the EA to the new conditions of the business environment. Figure 4 shows how the sequential process of the EA is altered to integrate the event-handling methods.

Fig. 4

Integration of the event-handling methods into an evolutionary algorithm [4]
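The integration scheme described above can be sketched as a loop that drains an event queue before each regular iteration. This is a minimal illustration under our own assumptions. Events are modeled as `(type, payload)` tuples in a `collections.deque`, handlers are looked up by event type, and `run_dynamic_ea`, `step` and `handlers` are hypothetical names. The toy handler mimics S01a Case 1 by incrementing a UserCount.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, List, Tuple

Event = Tuple[str, Dict[str, Any]]   # (event type such as "S01a", payload)


def run_dynamic_ea(population: List[Any],
                   step: Callable[[List[Any]], List[Any]],
                   handlers: Dict[str, Callable[[Any, Dict[str, Any]], None]],
                   pending: Deque[Event],
                   iterations: int) -> List[Any]:
    """EA main loop that applies pending events at the start of every iteration."""
    for _ in range(iterations):
        # 1. drain the event queue and adapt every individual to the new situation
        while pending:
            kind, payload = pending.popleft()
            for individual in population:
                handlers[kind](individual, payload)
        # 2. one regular EA iteration (selection, variation, replacement)
        population = step(population)
    return population


def on_join_same_class(individual: Dict[str, Any], payload: Dict[str, Any]) -> None:
    individual["user_count"][payload["cls"]] += 1


population = [{"user_count": {1: 3, 2: 2, 3: 1}} for _ in range(2)]
events: Deque[Event] = deque([("S01a", {"cls": 1})])  # a new user joins class C1
result = run_dynamic_ea(population, step=lambda pop: pop,
                        handlers={"S01a": on_join_same_class},
                        pending=events, iterations=5)
print([ind["user_count"][1] for ind in result])  # [4, 4]
```

Events arriving while an iteration is running simply wait in the queue until the next check.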

Aggregation of Users

There are some differences between static and dynamic problems regarding the encoding of role concepts. Most companies staff the majority of their positions at least twice to prevent disruptions of the operational process in case of vacation, illness or departure of employees. Users that are assigned the same set of permissions are therefore often aggregated into classes prior to the actual role mining process [13, 22]. Each class of users and the associated permission set can be represented by one row of the UPA matrix to reduce the problem size. To be able to identify all users even after aggregation, each user is assigned a unique UserID, and the UserIDs belonging to each user class are stored in a separate user mapping. After optimization, all users of one user class are assigned the same set of roles. In static role mining, and especially in the Basic Role Mining Problem presented above, the cardinality of these classes is disregarded. In the dynamic case, however, it plays an important role, as users join or leave the company over time. For this purpose, the cardinality of each user class is added as UserCount to each row of the UPA (and UA) matrix.

Furthermore, a temporary users list \(U_{\text {temp}}(t)\) is introduced. It can be considered a technical auxiliary tool to ensure that users who are known to be leaving the company or changing positions can still be provided with permissions for a certain period of time, independent of the ongoing role optimization process. The set of permissions of a user results from the permissions that he or she is assigned by the currently implemented role concept as well as from the permissions in \(U_{\text {temp}}(t)\). In this way, a user known to be leaving the company can continue to do his or her work until the day of departure without affecting the further role optimization process.

Figure 5 shows an example of a user mapping as well as the temporary users list \(U_{\text {temp}}(t)\) corresponding to the exemplary UPA matrix of Fig. 1. It currently includes 7 users. The users u101, u102 and u103 belong to the first user class and are assigned \(p_1\), \(p_2\) and \(p_3\). Users u104 and u105 belong to the second user class and are assigned \(p_3\), \(p_4\) and \(p_5\). User u106 belongs to the third user class and is assigned permissions \(p_4\), \(p_5\) and \(p_6\). User u107 will leave the company in the foreseeable future and is therefore no longer part of the optimization process, but is still assigned permissions \(p_2\) and \(p_4\) by means of \(U_{\text {temp}}(t)\).

Fig. 5

Example of user mapping incl. \(U_{\text {temp}}(t)\)
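The aggregation step and the role of \(U_{\text {temp}}(t)\) can be illustrated with the data of Fig. 5. In this minimal sketch, `aggregate_users` and `effective_permissions` are hypothetical names, user classes are 0-based (C1 becomes 0), and UserCount is derived as the length of each mapping entry. As a simplification, the implemented role concept is represented by the class's UPA row, which it reproduces exactly under 0-consistency.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def aggregate_users(user_perms: Dict[str, Set[str]]
                    ) -> Tuple[List[FrozenSet[str]], Dict[int, List[str]]]:
    """Group users with identical permission sets into user classes.
    Returns the aggregated UPA rows and the user mapping (class -> UserIDs)."""
    index: Dict[FrozenSet[str], int] = {}
    upa: List[FrozenSet[str]] = []
    mapping: Dict[int, List[str]] = {}
    for uid, perms in user_perms.items():
        key = frozenset(perms)
        if key not in index:                 # first user with this permission set
            index[key] = len(upa)
            upa.append(key)
            mapping[index[key]] = []
        mapping[index[key]].append(uid)
    return upa, mapping


def effective_permissions(uid: str, upa: List[FrozenSet[str]],
                          mapping: Dict[int, List[str]],
                          u_temp: Dict[str, FrozenSet[str]]) -> Set[str]:
    """Permissions from the role concept (the class's UPA row under
    0-consistency) plus any permissions granted via U_temp(t)."""
    perms = set(u_temp.get(uid, frozenset()))
    for cls, uids in mapping.items():
        if uid in uids:
            perms |= upa[cls]
    return perms


users = {"u101": {"p1", "p2", "p3"}, "u102": {"p1", "p2", "p3"},
         "u103": {"p1", "p2", "p3"}, "u104": {"p3", "p4", "p5"},
         "u105": {"p3", "p4", "p5"}, "u106": {"p4", "p5", "p6"}}
upa, mapping = aggregate_users(users)
user_count = {cls: len(uids) for cls, uids in mapping.items()}
u_temp = {"u107": frozenset({"p2", "p4"})}   # u107 is leaving the company
print(user_count)                                                  # {0: 3, 1: 2, 2: 1}
print(sorted(effective_permissions("u107", upa, mapping, u_temp)))  # ['p2', 'p4']
```

The six active users collapse into three UPA rows, while u107 keeps its permissions only through \(U_{\text {temp}}(t)\).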

In the following, event-handling methods corresponding to the events S01–S04 are presented, with a focus on the update of \({\text {UPA}}(t)\), \({\text {UA}}(t)\) and \({\text {PA}}(t)\). These are illustrated by a continuous example building on the situation in Fig. 5.

User Joins Company (S01)

The knowledge that a new employee will join a company triggers event S01 and the associated event-handling method. Since it is usually known in advance that a new user will be joining a company, a distinction is made between the occurrence of the information and the actual entry of the user. To make the best use of this lead time, the event can be further segmented: the information on the future arrival of the new user is included in the optimization process as soon as it becomes available, whereas the role concept is adapted when the new user actually joins the company, see Fig. 6.

Fig. 6

Sequential handling of S01

To update the \({\text {UPA}}(t)\) matrix accordingly in S01a, it is necessary to distinguish between two cases:

Case 1: Permission set of new user equals permission set of existing user.

In case the new user is assigned exactly the same permissions as at least one of the already existing users, the processing of the event is straightforward, as there is already a user class corresponding to the new user. Therefore, only the UserCount of the corresponding user class must be increased. \({\text {UPA}}(t)\), \({\text {UA}}(t)\) and \({\text {PA}}(t)\) remain unchanged.

An example of the handling of this case of S01a is shown in Fig. 7. Here, a new user u108 joins the company shown in Fig. 5, is assigned permissions \(p_1\), \(p_2\) and \(p_3\), and is thus categorized into the first user class \(C_1\).

Fig. 7

Exemplary handling of S01a Case 1 based on [4]
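S01a Case 1 reduces to a lookup and a counter update, as the following sketch illustrates with the u108 example. The function name `join_existing_class` and the list/dict layout are hypothetical, and classes are 0-based (C1 becomes 0). A return value of `None` signals that Case 2 applies.

```python
from typing import Dict, FrozenSet, List, Optional


def join_existing_class(uid: str, perms: FrozenSet[str],
                        upa: List[FrozenSet[str]],
                        mapping: Dict[int, List[str]],
                        user_count: Dict[int, int]) -> Optional[int]:
    """S01a, Case 1: a class with exactly `perms` already exists, so only the
    user mapping and that class's UserCount change; UPA(t), UA(t) and PA(t)
    stay untouched. Returns the class index, or None if Case 2 applies."""
    for cls, row in enumerate(upa):
        if row == perms:
            mapping[cls].append(uid)
            user_count[cls] += 1
            return cls
    return None


# Running example of Fig. 5 (0-based classes: C1 -> 0, C2 -> 1, C3 -> 2)
upa = [frozenset({"p1", "p2", "p3"}), frozenset({"p3", "p4", "p5"}),
       frozenset({"p4", "p5", "p6"})]
mapping = {0: ["u101", "u102", "u103"], 1: ["u104", "u105"], 2: ["u106"]}
user_count = {0: 3, 1: 2, 2: 1}
cls = join_existing_class("u108", frozenset({"p1", "p2", "p3"}), upa, mapping, user_count)
print(cls, user_count[cls])  # 0 4
```

No individual of the EA population has to be modified in this case.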

Case 2: Permission set of new user does not equal permission set of existing user.

If there is no user with the same permissions as the new user in the current company structure, a new user class must be created. This is done by adding a new row with \(UserCount=1\) to \({\text {UPA}}(t)\) and \({\text {UA}}(t)\). The new row of \({\text {UPA}}(t)\) corresponds to the permissions assigned to the new user. The new row in \({\text {UA}}(t)\) contains the roles which are assigned to him or her. Again, two cases must be distinguished:

Case 2.1: Permission set of new user can be covered completely by existing roles.

In this case, existing roles are assigned to the new user to provide him or her with the required permissions. In principle, every role that respects the 0-consistency constraint, i.e., whose permission set is a subset of the new user’s permission set, can be assigned to the new user. However, this may result in the user receiving some permissions multiple times via different roles. Thus, it might be worthwhile to assign only a subset of those roles to the new user. To address this, we presented and investigated different strategies for assigning roles to new users in [4].

Figure 8 shows an example of this case, where a new user u109, being assigned \(p_1\), \(p_2\), \(p_3\) and \(p_4\), joins the company. Since there is no other user being assigned the same permissions, a new user class \(C_4\) is created for the new user. In addition, all permissions required for u109 can be covered by assigning roles \(r_1\), \(r_2\) and \(r_4\).

Fig. 8

Exemplary handling of S01a Case 2.1 based on [4]
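The "assign all admissible roles" variant of Case 2.1 can be sketched as follows. Only roles whose permission sets are subsets of the new user's permissions are admissible under 0-consistency. `assign_existing_roles` is a hypothetical name, and the role contents below are invented for illustration (chosen so that the outcome resembles the u109 example); they are not taken from Fig. 8. The subset-selection strategies studied in [4] are not shown.

```python
from typing import Dict, FrozenSet, Set, Tuple


def assign_existing_roles(new_perms: FrozenSet[str],
                          pa: Dict[str, FrozenSet[str]]) -> Tuple[Set[str], bool]:
    """S01a, Case 2.1 ("assign all" strategy): select every role whose
    permissions are a subset of the new user's permissions, since only such
    roles respect the 0-consistency constraint. Returns the roles and whether
    they cover the user's permission set completely."""
    roles = {r for r, perms in pa.items() if perms <= new_perms}
    covered = set().union(*(pa[r] for r in roles))
    return roles, covered == new_perms


# Hypothetical role contents, for illustration only
pa = {"r1": frozenset({"p1", "p2"}), "r2": frozenset({"p3"}),
      "r3": frozenset({"p5", "p6"}), "r4": frozenset({"p4"})}
roles, complete = assign_existing_roles(frozenset({"p1", "p2", "p3", "p4"}), pa)
print(sorted(roles), complete)  # ['r1', 'r2', 'r4'] True
```

If `complete` is true, the selected roles form the new row of \({\text {UA}}(t)\); otherwise Case 2.2 applies.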

Case 2.2: Permission set of new user cannot be covered completely by existing roles.

Even after assigning all roles permitted by the 0-consistency constraint, some permissions of the new user may still remain uncovered. In this case, a new role must be created for the new user and added as a new row to \({\text {PA}}(t)\) and as a new column to \({\text {UA}}(t)\). The new role can either be assigned all of the new user’s permissions or only those permissions that remain uncovered after assigning the existing roles. This is also examined in Ref. [4].

Figure 9 shows an example of a new user u110 who is assigned permissions \(p_1\), \(p_3\) and \(p_4\). Again, there is no other user that is assigned the same set of permissions, so a new user class \(C_5\) is created for u110. By assigning role \(r_2\) to u110, permission \(p_3\) can be covered. Based on that, a new role \(r_5\) is created containing the remaining uncovered permissions \(p_1\) and \(p_4\) for user u110.

Fig. 9

Exemplary handling of S01a Case 2.2 based on [4]
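Both variants of Case 2.2 can be expressed in a few lines. In this sketch, `cover_with_new_role` and the integer role ids are hypothetical. The existing roles {3} and {5, 6} are illustrative; the first mirrors the way \(r_2\) covers \(p_3\) for u110. With `full_set=True`, we assume that the new role, which then contains all of the user's permissions, is assigned alone. The text does not specify this detail.

```python
from typing import Dict, FrozenSet, Set


def cover_with_new_role(new_perms: FrozenSet[int], pa: Dict[int, FrozenSet[int]],
                        full_set: bool = False) -> Set[int]:
    """S01a, Case 2.2: assign all admissible existing roles; if permissions
    remain uncovered, add a new role (new PA row, new UA column). With
    full_set=True the new role gets all of the user's permissions and is
    assigned alone (an assumption about that variant)."""
    roles = {r for r, perms in pa.items() if perms <= new_perms}
    missing = new_perms - set().union(*(pa[r] for r in roles))
    if missing:
        rid = max(pa, default=0) + 1
        pa[rid] = frozenset(new_perms) if full_set else frozenset(missing)
        roles = {rid} if full_set else roles | {rid}
    return roles


pa = {2: frozenset({3}), 3: frozenset({5, 6})}
roles = cover_with_new_role(frozenset({1, 3, 4}), pa)   # user with p1, p3, p4
print(sorted(roles), sorted(pa[4]))  # [2, 4] [1, 4]
```

The "uncovered only" variant creates smaller roles that are more likely to be reusable, while the full-set variant avoids assigning permissions multiple times.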

User Leaves Company (S02)

Analogous to the first event type, a distinction is also made in S02 between the occurrence of the event information and the actual exit of the employee. As soon as the information about the imminent departure of a user is transmitted, it is included in the optimization process. For this purpose, the \({\text {UPA}}(t)\) matrix needs to be updated and the affected user is moved to \(U_{\text {temp}}(t)\). As soon as the user leaves the company, he or she is removed from \(U_{\text {temp}}(t)\) and the current best role concept is implemented, see Fig. 10.

Fig. 10

Sequential handling of S02

Again, the update of \({\text {UPA}}(t)\) required for S02a leads to the consideration of two different cases:

Case 1: Permission set of leaving user equals permission set of remaining user.

In this case, other users remain in the user class of the leaving user after his or her departure. The handling of this case is similar to case 1 of S01. \({\text {UPA}}(t)\), \({\text {UA}}(t)\) and \({\text {PA}}(t)\) remain unchanged. Only the UserCount of the user class corresponding to the leaving user must be reduced by 1. Additionally, the leaving user is moved to \(U_{\text {temp}}(t)\).

Figure 11 shows the exemplary departure of user u102, belonging to user class \(C_1\). Since there are other users in \(C_1\), only the corresponding UserCount is updated and u102 is moved to \(U_{\text {temp}}(t)\).

Fig. 11 Exemplary handling of S02a Case 1 based on [4]

Case 2: The permission set of the leaving user does not equal the permission set of any remaining user.

The case in which the leaving user is the only member of the corresponding user class requires closer examination. First, \({\text {UPA}}(t)\) and \({\text {UA}}(t)\) are updated by removing the row corresponding to the user class of the leaving user. In addition, it must be checked whether any roles are assigned exclusively to the leaving user. If so, the corresponding rows of \({\text {PA}}(t)\) and the corresponding columns of \({\text {UA}}(t)\) are removed. Again, the leaving user is moved to \(U_{\text {temp}}(t)\).

Figure 12 shows the departure of u110. Since u110 is the only member of the corresponding user class, \(C_5\) and the corresponding rows in \({\text {UPA}}(t)\) and \({\text {UA}}(t)\) are removed, while u110 is moved to \(U_{\text {temp}}(t)\). In addition, role \(r_5\), which is assigned exclusively to u110, can be removed from \({\text {UA}}(t)\) and \({\text {PA}}(t)\).
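Both cases of S02 can be sketched on a class-compressed representation of \({\text {UPA}}(t)\) in which each user class stores its permission set, its UserCount and its assigned roles. This is an illustrative sketch only: the dictionary layout and the function name `handle_user_leaves` are hypothetical, and "assigned exclusively to the leaving user" is interpreted as "not assigned to any remaining user class".

```python
def handle_user_leaves(user_id, class_id, classes, roles, u_temp):
    """Sketch of S02 handling on a class-compressed UPA(t).

    classes: dict class id -> {"perms": frozenset, "count": int, "roles": set}
             (rows of UPA(t) and UA(t) plus the UserCount of each class).
    roles:   dict role name -> frozenset of permissions (rows of PA(t)).
    u_temp:  list standing in for U_temp(t).
    Returns the names of the roles removed from PA(t) and UA(t).
    """
    u_temp.append(user_id)  # the leaving user keeps access until the exit date
    cls = classes[class_id]
    if cls["count"] > 1:
        # Case 1: other users remain in the class; only the UserCount changes.
        cls["count"] -= 1
        return []
    # Case 2: the user class disappears from UPA(t) and UA(t).
    del classes[class_id]
    still_used = set().union(*(c["roles"] for c in classes.values()))
    removed = sorted(cls["roles"] - still_used)
    for name in removed:
        # Roles no remaining class uses are dropped from PA(t) (and UA(t)).
        del roles[name]
    return removed
```

In the Fig. 12 situation, a call for u110 would remove its class and return the exclusively used role, while a shared role such as \(r_2\) stays in place.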

Fig. 12 Exemplary handling of S02a Case 2 based on [4]

Change of Job Position (S03)

Another change in the structure of a company results from changes in users’ job positions, which usually take place in the context of relocations and promotions. To enable a user to continue his or her previous work for a certain transition period, the permissions of the old job position must not be withdrawn immediately. At the same time, the user must already be assigned the permissions of the new job position to be able to perform the new tasks. This can lead to a state in which a user is allowed to execute transactions that would normally not be controlled by the same person. To mitigate such compliance conflicts, it is essential to ensure that the permissions of the old job position are revoked once the transition period has expired.

The sequential handling of S03 is illustrated in Fig. 13. Here, the mechanisms of S01 and S02 are used in parallel. As soon as the impending change of position becomes known, the user concerned is added to the optimization process using the methods of S01, but with an unchanged UserID, while the user corresponding to the old job position is moved to \(U_{\text {temp}}(t)\), again with an unchanged UserID, using the methods presented for S02. At the beginning of the transition period, the currently best role concept is implemented. Hence, the user obtains the permissions of the new job position directly from the implemented role concept, while the permissions of the old job position remain assigned via \(U_{\text {temp}}(t)\). At the end of the transition period, the user is removed from \(U_{\text {temp}}(t)\), such that only the permissions required for the new job position remain in the user’s permission set.
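The effect of the transition period on a user's effective permissions can be illustrated with a small sketch. It assumes, for illustration only, that \(U_{\text {temp}}(t)\) is a dictionary mapping a UserID to the permissions of the old job position and the end of the transition period; the helpers `effective_permissions` and `expire_transitions` are hypothetical and not part of the described framework.

```python
def effective_permissions(user_id, t, role_concept_perms, u_temp):
    """Permissions a user holds at time t during a change of job position (S03).

    role_concept_perms: dict user ID -> permissions granted by the implemented
                        role concept (already reflecting the new position).
    u_temp: dict user ID -> (old-position permissions, end of transition),
            a hypothetical stand-in for U_temp(t).
    """
    perms = set(role_concept_perms.get(user_id, set()))
    if user_id in u_temp:
        old_perms, transition_end = u_temp[user_id]
        if t < transition_end:
            # During the transition period the old permissions are kept on top.
            perms |= old_perms
    return perms


def expire_transitions(t, u_temp):
    """Remove users whose transition period has ended; return their IDs."""
    expired = [u for u, (_, end) in u_temp.items() if end <= t]
    for u in expired:
        del u_temp[u]
    return expired
```

Calling `expire_transitions` at the end of the transition period is what guarantees that the temporarily combined permission set, and thus the potential compliance conflict, does not persist.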

Fig. 13 Sequential handling of S03

Permission Request (S04)

In the day-to-day business of companies, users may lack certain permissions needed to perform the tasks at hand. In this case, a permission request can be submitted that specifies which additional permissions are required. The request is then reviewed by a supervisor and either approved or rejected. Although this event provides far less lead time than S01-03, it can be worthwhile to include it in the optimization process as soon as the permission request is submitted, see Fig. 14. The corresponding user in \({\text {UPA}}(t)\) is replaced by a new user with the requested additional permissions, but with an unchanged UserID, using the mechanisms of S01 and S02. When the decision on the permission request is made, the currently best role concept \(\pi ^*(t)\) can be implemented in case of approval. If the permission request is rejected, the changes to \({\text {UPA}}(t)\) are revoked.
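The tentative update of \({\text {UPA}}(t)\) and its revocation on rejection can be sketched as a small object that remembers the user's previous permission set. This is a simplified sketch: \({\text {UPA}}(t)\) is reduced to a per-user dictionary, the S01/S02 user-class bookkeeping is omitted, and the class `PermissionRequest` is hypothetical.

```python
class PermissionRequest:
    """Tentative S04 update of UPA(t), kept on approval and revoked on rejection.

    upa: dict user ID -> frozenset of permissions (a simplified, per-user UPA(t)).
    """

    def __init__(self, upa, user_id, requested):
        self.upa = upa
        self.user_id = user_id
        self.previous = upa[user_id]
        # The user is replaced by a "new" user with the same ID and the extra
        # permissions, so the ongoing optimization can already account for them.
        upa[user_id] = self.previous | frozenset(requested)

    def decide(self, approved):
        """Resolve the request and return the user's resulting permission set."""
        if not approved:
            # Rejection: revoke the tentative change to UPA(t).
            self.upa[self.user_id] = self.previous
        # On approval, the currently best role concept would be implemented (not shown).
        return self.upa[self.user_id]
```

Keeping the previous permission set makes the rejection path a constant-time rollback, which matters because the optimization may already have adapted its individuals to the tentative state.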

Fig. 14 Sequential handling of S04

Experiments and Evaluation

In this section, the difference between dynamic and static optimization is discussed in more detail. In practice, once a good role concept has been found and implemented, the roles contained in this role concept usually remain unchanged over time. In many cases, this leads to a large number of unnecessary roles, which contradicts the minimization objective of role mining. In dynamic role mining, however, dynamically occurring events can be integrated into the optimization process, such that roles can be adapted to the new circumstances. In the following, this is examined for the four events triggered by structural change (S01-04) and the associated event-handling methods presented in the previous section.

Preparation of Benchmarks

The experiments were conducted on two benchmark instances of the PLAIN_x-Benchmark of RMPlib [6]. The first benchmark instance, PLAIN_small_02 (PS_02), includes 50 users and 50 permissions. The second benchmark instance, PLAIN_small_05 (PS_05), comprises 100 users and 100 permissions. The number of roles used to create the benchmark instances can serve as a reference value and upper bound for the optimal number of roles: 25 roles were used to create PS_02 and 50 roles to create PS_05 [6]. To simulate users joining the company (S01) or new job positions (S03), the instances were initially reduced by a certain number of users. For this purpose, before each experiment, users were removed uniformly at random until the desired number of users remained. The removed users and their associated permissions can then be added as new users (S01) or be treated as new job positions for users that remained in the benchmark instances (S03). To simulate event S02, randomly selected users leave the company. For permission requests (S04), users request random subsets of the permissions that are not assigned to them at the time of event occurrence.
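Holding back users for S01 and S03 amounts to sampling users uniformly without replacement. A minimal sketch, assuming the benchmark instance has already been loaded into a dictionary from user ID to permission set (the RMPlib file format and any loader are not shown, and `split_benchmark` is a hypothetical helper):

```python
import random


def split_benchmark(upa, n_remaining, seed):
    """Hold back randomly chosen users of a benchmark instance for S01/S03 events.

    upa: dict user ID -> frozenset of permissions (the full benchmark instance).
    n_remaining: number of users that stay in the initial instance.
    Returns (initial instance, held-back users), both as dicts.
    """
    rng = random.Random(seed)
    users = sorted(upa)
    # Uniform sampling without replacement is equivalent to removing users one
    # by one at random until n_remaining users are left.
    held_back = set(rng.sample(users, len(users) - n_remaining))
    initial = {u: upa[u] for u in users if u not in held_back}
    removed = {u: upa[u] for u in users if u in held_back}
    return initial, removed
```

Passing a different seed per repetition reproduces the "different random seeds" setup while keeping each individual run reproducible.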

User Joins Company (S01)

In [4], we explored different ways to assign existing roles to new users in case of S01. A very simple approach is to create a new role for a new user that contains all of his or her required permissions. This assumes that the new role can be integrated well into the role concept and that the number of roles, even if it increases at first, can be reduced again in the course of the ongoing optimization process. However, we were able to show that this approach is not very effective [4]. Other approaches exploit the roles that are already contained in the role concept at the time of event occurrence. The new user is iteratively assigned one of the existing roles until no role is left that both satisfies the 0-consistency constraint (Eq. 1) for this user and covers at least one of his or her remaining uncovered permissions. For this purpose, a randomized approach, a greedy approach, and approaches exploiting additional information, such as the number of users to which certain roles are assigned or the similarity of existing users to the new user, were examined. Only if some of the permissions of the new user remain uncovered after applying these methods is a new role created, which contains the remaining uncovered permissions. This may prevent the unnecessary creation of new roles. Even though all of these methods achieved better results than automatically creating a new role containing all permissions of each new user, none of them achieved significantly better results than the others in a direct comparison. Therefore, to evaluate the effectiveness of dynamic optimization in case of S01, existing roles are selected randomly and assigned to the new user.

To represent the static role mining process, the currently best role concept \(\pi ^*_{{\text {static}}}(t_i)\) (in terms of the number of roles) is assumed to be implemented, and thus fixed, at different points in time \(t_i\). All events that take place afterwards have no influence on the roles in \(R^*_{{\text {static}}}(t_i)\) and the corresponding assignment of permissions to roles in \(PA^*_{{\text {static}}}(t_i)\). This corresponds to real-world practice, where roles are usually not changed once a role concept has been implemented. However, the roles in \(R^*_{{\text {static}}}(t_i)\) can still be assigned to a new user as described above. Hence, a new role is only created if the new user still has uncovered permissions after all roles in \(R^*_{{\text {static}}}(t_i)\) that fulfill the 0-consistency constraint have been assigned to him or her. After the static role concept has been implemented, instances of S01 are simulated every 5000 iterations, resulting in new users joining the company. This procedure is repeated 5 times, i.e., over a period of 25,000 iterations. Subsequently, the dynamic optimization algorithm is given another 10,000 iterations to process the events. The parameters of the different test cases are shown in Table 6. Each test setup was repeated 20 times with different random seeds, and the results were averaged.
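The timing of this simulation can be made explicit with a small helper. It assumes that the first batch of events occurs at \(t_i\) itself and that each batch is followed by 5000 iterations before the next one; the name `event_schedule` and its parameters are hypothetical. The resulting end of the evaluation window, \(t_i+35,000\), matches the definition of \(t_{i,{\text {end}}}\) used in the evaluation.

```python
def event_schedule(t_i, interval=5_000, repetitions=5, processing=10_000):
    """Iterations at which batches of events occur, and the end of the evaluation.

    Assumption: the first batch occurs at t_i itself, and each batch is followed
    by `interval` iterations before the next one.
    """
    batches = [t_i + k * interval for k in range(repetitions)]
    # 5 * 5,000 iterations for the events plus 10,000 for processing them.
    t_end = t_i + repetitions * interval + processing
    return batches, t_end


batches, t_end = event_schedule(10_000)
# batches == [10000, 15000, 20000, 25000, 30000], t_end == 45000
```

With \(|E|=2\) events per batch, the 5 batches correspond to the 10 new users of the Fig. 15 example.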

Table 6 Parameter values for the evaluation of all events (S01-04)

Figure 15 shows the typical progression of the number of roles of the best individual of the EA over iterations for event S01. In this example on benchmark instance PS_02, starting at iteration \(t_1=10,000\), two simultaneous occurrences of S01 (\(|E |=2\)) were simulated and repeated 5 times, so that 10 new users joined the company in total.

Fig. 15 Number of roles over iterations for \(|E|=2\) and \(t_1=10,000\) in PS_02 considering event S01

In both static and dynamic role mining, the number of roles increases abruptly whenever an event occurs. This shows that the roles existing at the time of event occurrence are usually not sufficient to provide the new users with the required permissions, so new roles are created as described. In the static case, these are added to the set of roles \(R^*_{{\text {static}}}(t_i)\) of the implemented role concept \(\pi ^*_{{\text {static}}}(t_i)\) and cannot be optimized any further, which explains the staircase-shaped curve. In the dynamic case, the created roles are included in the ongoing optimization process, resulting in a further reduction of the number of roles. Figure 15 also reveals an additional disadvantage of static role mining: the optimization process may be stopped too early, so that optimization potential is wasted. Figure 16, where the static role concept is fixed at iteration \(t_4=100,000\), shows that even with dynamic role mining the number of roles hardly improves over a long period of time. Only after the events occur at iteration \(t_4=100,000\) can improvements over the static approach be achieved.

Fig. 16 Number of roles over iterations for \(|E|=2\) and \(t_4=100,000\) in PS_02 considering event S01

This is also reflected in Table 7, which shows the resulting number of roles \(|R^*(t_{i,{\text {end}}})|\) of the best individual at \(t_{i,{\text {end}}}:=t_i+35,000\), i.e., after the 25,000 iterations needed to simulate the events and the subsequent 10,000 iterations provided to process them in the dynamic case. First, the dynamic approach leads to better results in all test cases. Second, the later the static role concept \(\pi ^*_{{\text {static}}}(t_i)\) is implemented, i.e., the larger \(t_i\), the smaller the difference between the resulting numbers of roles for static and dynamic role mining. While the difference amounts to approximately 6 roles for PS_02 (and between 12.25 and 16.6 roles for PS_05) for \(t_1=10,000\), it is reduced to 0.55 and 1.3 roles, respectively, for PS_02 (and 0.8 and 3.0 roles, respectively, for PS_05) for \(t_4=100,000\).

Table 7 Resulting number of roles for event S01

Using the static approach did not necessarily lead to the creation of a new role for each new user. In the example in Fig. 16, where 10 new users joined the company in total, 8.2 new roles were created. Another interesting indicator is therefore the impact I(E) of a set of events E. It is defined as the difference between the number of roles of the best individual immediately after application of the corresponding event-handling method, \(t_i^+\), and at the time before event occurrence, \(t_i\). If several events occur at the same time (\(|E |> 1\)), the resulting difference is divided by the number of events:

$$\begin{aligned} I(E):=\frac{|R^*(t_i^+)|- |R^*(t_i)|}{|E |} \end{aligned}$$
(2)
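Eq. (2) and its averaging over repeated runs translate directly into code. The sketch below is illustrative; the function names are hypothetical, and any numbers used with them here are not measured values.

```python
def impact(roles_before, roles_after, n_events):
    """Impact I(E) from Eq. (2).

    roles_before: |R*(t_i)|, number of roles of the best individual before the events.
    roles_after:  |R*(t_i^+)|, number of roles right after the event handling.
    n_events:     |E|, number of simultaneously occurring events.
    """
    if n_events < 1:
        raise ValueError("E must contain at least one event")
    return (roles_after - roles_before) / n_events


def mean_impact(measurements):
    """Average I(E) over (|R*(t_i)|, |R*(t_i^+)|, |E|) triples, e.g. over runs and batches."""
    values = [impact(before, after, n) for before, after, n in measurements]
    return sum(values) / len(values)
```

Negative values arise when the event-handling method removes roles. As simple arithmetic on the static example of Fig. 16, 8.2 new roles for 10 joining users correspond to an average of 0.82 new roles per joining user.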

The average impact of S01 for the different test cases is shown in Table 8. The impact values are relatively independent of the time of event occurrence. For PS_02, the values vary between 0.61 and 0.85. This means that, on average, a new role had to be created for a new user joining the company in between 61 and \(85\%\) of all cases. This is probably due to the role structure of the benchmark instances, where one role is often assigned to multiple users, so that there is a certain probability that all roles necessary to provide a new user with the required permissions are already available in \(R^*_{{\text {static}}}(t_i)\) or in the corresponding set of roles of the individuals of the dynamic optimization process at the time of event occurrence. For PS_05, the impact values are considerably smaller. This may be explained by the fact that, when the benchmark instances were created, a user in PS_02 was assigned 5 roles on average, whereas a user in PS_05 was assigned only 2.5 roles on average [6].

Table 8 Impact values for event S01

User Leaves Company (S02)

The event in which users leave the company is evaluated in a similar way. Again, in the static case, a role concept \(\pi ^*_{{\text {static}}}(t_i)\) is fixed at specific points in time. Subsequently, instances of S02 are simulated. For this purpose, the leaving users are selected uniformly at random from the set of all current users of the company. In dynamic role mining, the events are handled by the event-handling method for S02 presented in Sect. “Role Mining In Dynamic Environments”. The parameters of the different experiments are shown in Table 6. Again, each test setup was repeated 20 times with different random seeds, and the results were averaged.

Fig. 17 Number of roles over iterations for \(|E|=2\) and \(t_1=10,000\) in PS_02 considering event S02

Fixing \(\pi ^*_{{\text {static}}}(t_i)\) too early clearly leads to poor results in the static case. In Fig. 18, where \(\pi ^*_{{\text {static}}}(t_4)\) is considered, dynamic role mining can only reduce the number of roles further once the events occur. In Fig. 17, where \(\pi ^*_{{\text {static}}}(t_1)\) is considered, dynamic role mining reduces the number of roles significantly after the role concept has been fixed at \(t_1=10,000\).

Fig. 18 Number of roles over iterations for \(|E|=2\) and \(t_4=100,000\) in PS_02 considering event S02

Table 9 shows the resulting number of roles \(|R^*(t_{i,{\text {end}}})|\) of the best individual at \(t_{i,{\text {end}}}\) for the simulation of S02. Similar to the observation for event S01, the difference between the results obtained from static and dynamic role mining decreases for larger values of \(t_i\). Furthermore, the dynamic approach yields better results in all test cases, analogous to the simulation of S01.

Table 9 Resulting number of roles for event S02

As visible in Figs. 17 and 18, in the static case the role concept \(\pi ^*_{{\text {static}}}(t_i)\) does not change when a user leaves the company. The corresponding number of roles \(|R^*_{{\text {static}}}(t_i) |\) therefore remains unchanged by the occurring events. Since all values of I(E) are zero in the static case, they are omitted from Table 10, which only reports the impact of events in the dynamic case. The impact values of event S02, which range between \(-0.08\) and \(-0.26\), are small in magnitude compared to the values obtained for S01. This can be explained by the fact that roles in a well-designed role concept are assigned to multiple users, so they are still needed even if a user leaves the company.

Table 10 Impact values for event S02

Change of Job Position (S03)

The evaluation for S03, where users change their job position in a company, is performed in the same way as for the two previous events. As described in Sect. “Role Mining In Dynamic Environments”, to handle event S03, a combination of the event-handling methods for S01 and S02 is needed. For the event handling of S01, again random selection is used for the assignment of roles to users.

Figures 19 and 20 show the progression of the number of roles over iterations for event S03 on benchmark instance PS_02 for \(t_1=10,000\) and \(t_4=100,000\).

Fig. 19 Number of roles over iterations for \(|E|=2\) and \(t_1=10,000\) in PS_02 considering event S03

Fig. 20 Number of roles over iterations for \(|E|=2\) and \(t_4=100,000\) in PS_02 considering event S03

The progression of the number of roles in both figures resembles that for event S01 (see Figs. 15 and 16). This is also confirmed by the values for \(|R^*(t_{i,{\text {end}}})|\), which are close to the values obtained for S01. This can be explained by the fact that a change of job position is represented as a combination of the events S01 and S02, as described in Sect. “Events”. Since the impact values of S01 are considerably larger in magnitude than those of S02, S03 behaves similarly to S01. Again, the advantages of dynamic role mining become apparent, as it leads to better results than the static approach in all considered test cases, see Table 11.

Table 11 Resulting number of roles for event S03

Regarding the impact of event S03, shown in Table 12, the values obtained from dynamic role mining are slightly smaller than the corresponding values for S01. This is again due to the fact that event S03 is represented by a combination of S01 and S02: the event-handling method for S01 increases the number of roles, while handling S02 slightly reduces it, as reflected in the corresponding impact values for S01 (see Table 8) and S02 (see Table 10). In the static case, however, there is no major difference between the impact values of S03 and S01, since S02 does not change the number of roles in the static case.

Table 12 Impact values for event S03

Permission Request (S04)

To simulate a permission request (S04), the user submitting the request is first selected uniformly at random. Subsequently, depending on the current number of his or her permissions, up to 20% additional permissions are drawn uniformly at random from the set of permissions that are not assigned to the selected user at that moment. Analogous to S03, a permission request is handled by combining the event-handling methods for S01 and S02, as described in Sect. “Events”.
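The S04 simulation step can be sketched as follows. The text only states that up to 20% additional permissions are drawn; this sketch assumes that the number of requested permissions is drawn uniformly between 1 and 20% of the user's current permission count (but at least 1). The function name `simulate_permission_request` is hypothetical.

```python
import random


def simulate_permission_request(upa, all_permissions, rng, max_fraction=0.2):
    """Draw one random S04 event: a requesting user and the requested permissions.

    upa: dict user ID -> frozenset of currently assigned permissions.
    all_permissions: iterable of all permissions of the instance.
    Assumption: the request size is uniform in 1..max(1, floor(max_fraction * n)),
    where n is the user's current number of permissions.
    """
    user = rng.choice(sorted(upa))  # uniform choice of the requesting user
    current = upa[user]
    missing = sorted(set(all_permissions) - current)
    if not missing:  # nothing left that could be requested
        return user, frozenset()
    limit = max(1, int(max_fraction * len(current)))
    k = rng.randint(1, min(limit, len(missing)))
    return user, frozenset(rng.sample(missing, k))  # uniform, without replacement
```

Because the requested permissions are drawn without regard to any role, such requests tend to cut across the role structure of the benchmark instance, which is consistent with the high impact values discussed below.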

Fig. 21 Number of roles over iterations for \(|E|=2\) and \(t_1=10,000\) in PS_02 considering event S04

Figure 21 shows the progression of the number of roles of the best individual for PS_02 and \(|E |= 2\), where the role concept is implemented at \(t_1=10,000\), while \(\pi ^*_{{\text {static}}}(t_4)\) is considered in Fig. 22, which means that the role concept is fixed at \(t_4=100,000\) in this case.

Fig. 22 Number of roles over iterations for \(|E|=2\) and \(t_4=100,000\) in PS_02 considering event S04

Again, the curves closely resemble those obtained for events S01 and S03. As for the other events, the dynamic approach outperformed static role mining in all test cases. However, the values obtained for \(|R^*(t_{i,{\text {end}}})|\) in Table 13 show that in almost all cases significantly worse results were obtained than for S01 and S03.

Table 13 Resulting number of roles for S04

This is also reflected in the impact of event S04, shown in Table 14. In all cases, the associated impact values are approximately 1, meaning that almost every time a user submitted a permission request, a new role had to be created. This may be due to the way permission requests are simulated. The random selection of permissions for S04 can result in the user being assigned a set of permissions that no longer corresponds to the role structure underlying the benchmark instance. Therefore, it may not be possible to create a role that meets the requirements of the permission request and, at the same time, can be assigned to other users while respecting the 0-consistency constraint.

Table 14 Impact values for S04

Conclusion and Future Works

In this paper, an overview and a classification of the different sources of dynamic events in the context of role mining were presented. A distinction was made between events triggered by user interaction with role mining software and events that emerge from changes in the structure of a company, as employees change positions and responsibilities or join or leave the company. For the events triggered by structural change, detailed event-handling methods were presented and integrated into the framework of an evolutionary role mining algorithm. Subsequently, the different events and the corresponding event-handling methods were examined in a series of experiments. The advantages of dynamic optimization over static role mining were demonstrated, as dynamic role mining outperformed the static approach in all test cases.

However, the experiments were performed on synthetically created benchmark instances of RMPlib, and the events were simulated. In particular, the simulation procedure for permission requests, where the random selection of permissions dissolved the role structure embedded in the benchmark instance, needs to be improved. This is important to cope with real-world use cases, where employees do not request random permissions but the permissions required to fulfill the tasks of their work. In addition, analogous to the events triggered by structural change considered in this paper, events emerging from the interaction of users with role mining software must be examined. The development of corresponding event-handling methods and their evaluation is subject to further investigation. Since our research aims at addressing events emerging from the environment of real companies, it would be interesting to apply and evaluate the presented methods in real-world use cases.
Another interesting approach would be to investigate the applicability of the techniques developed in this paper to other problems that are subject to dynamic change, such as tour and route planning problems, image processing or dynamic machine scheduling.