1 Introduction

In many industries, particularly in the service sector, employees account for a major part of the direct costs and have a key impact on the quality of the products and services delivered by organizations. Making effective use of the workforce by devising high-quality personnel schedules is thus a critical success factor. For a good overview of the literature dealing with personnel scheduling problems, see, e.g., Van den Bergh et al. (2013).

This paper deals with the Multi-Activity Shift Scheduling Problem (MASSP) consisting of composing anonymous work shifts that cover demands specified in terms of a time period (e.g. an interval of 15 min) and an activity (a type of work). The composition of work shifts needs to account for a set of rules governing aspects such as the length and placement of breaks, the total work hours and the maximum number of consecutive periods for which a certain activity can be performed. The objective function typically involves minimizing the total work hours scheduled and/or penalties for violating rules and for under- and/or over-covering demand. Typically, both the shift composition rules and the objective function vary between different MASSP variants.

A classical formulation for shift scheduling problems is the set covering formulation proposed by Dantzig (1954) where each of the main decision variables corresponds to a complete feasible shift. For many shift scheduling problems, however, the number of shifts is too large to enumerate explicitly. In particular, this is the case for multi-activity shift scheduling problems. As an example, the biggest instances considered in this paper exhibit billions of feasible shifts.

To avoid an explicit enumeration of all feasible shifts, many approaches for the MASSP rely on Branch-and-Price techniques in which the shift variables are generated as needed throughout the solution process by solving column generation subproblems. As an example, Demassey et al. (2005) use Constraint Programming for solving the shift generation subproblem; more specifically, they use a cost-regular constraint that allows modeling shift feasibility using a regular language. Restrepo et al. (2012) solve the subproblems using a resource-constrained shortest path algorithm on an expanded graph, and Côté et al. (2013) formulate the subproblem using a grammar and solve it by dynamic programming.

Interestingly, the idea of describing the set of feasible shifts with formal languages is also used in approaches using (monolithic) MILP formulations: As an example, Côté et al. (2011a) represent the set of feasible shifts by a network flow component where the network is a directed acyclic graph derived from an automaton. Furthermore, Côté et al. (2011b) present a formulation in which shift rules are expressed using context-free grammars. The grammars are used to derive a hypergraph which is embedded into a MIP formulation. This formulation is very efficient since it does not require using a separate hypergraph per shift/employee, but only a single hypergraph with an integer hyperflow. A set of shifts can then be determined in a postprocessing step by decomposing the hyperflow into hyperpaths.

Another stream of shift scheduling research deals with so-called implicit formulations: In these formulations, shifts are not modeled explicitly; instead, certain aspects such as the placement of breaks are considered implicitly using dedicated types of constraints such as the forward and backward constraints proposed by Bechtold and Jacobs (1990). After solving the implicit problem, shifts can be recovered from the solution in a postprocessing step. Dahmen et al. (2018) are the first to propose such an implicit model for multi-activity shift scheduling problems. Their approach relies on enumerating partial shift schedules (pre- and post-break work stretches) forming the main decision variables; the allocation of breaks between these stretches is then implicitly modeled using forward and backward constraints.

In addition to the exact approaches sketched so far, there are also some heuristic approaches for the MASSP that have been proposed in the literature, for example a large neighbourhood search (Quimper & Rousseau, 2010) and a recently published Lagrangian relaxation-based matheuristic (Hernández-Leandro et al., 2019). The latter article focuses on the personalized MASSP, but it also includes experiments with the MASSP instances from Demassey et al. (2005) that we also consider in this paper.

This paper presents an efficient MILP formulation for the MASSP that is based on the idea of encoding MASSP rules in state-expanded networks. Formulations based on state-expanded networks were previously applied to personnel scheduling problems such as airline crew scheduling (Mellouli, 2001) and nurse rostering (Römer & Mellouli, 2016). Recently, Porrmann and Römer (2021) introduced a formulation based on a state-expanded network for the MASSP variant introduced by Demassey et al. (2005).

In a state-expanded network, nodes are associated with rule-relevant states and arcs represent transitions between states, typically induced by assigning pieces of work (e.g. flights in airline crew scheduling or shifts in nurse rostering). The network is designed in a way that each path in the network corresponds to a schedule (or crew pairing) that is feasible with respect to a set of rules. The network is then embedded into a MILP model as a network flow component. An important advantage of this type of model is demonstrated in Mellouli (2001) for the case of the airline crew pairing chain problem: If certain employees (crew members) can be considered to be identical/anonymous and if the state-expanded network encodes the full set of schedule (pairing) legality rules, then one can use an aggregated integer-valued flow in a single network for these employees instead of introducing a separate network for each employee. Similar to the implicit grammar model from Côté et al. (2011b), one can then decompose the integer flow into paths to obtain feasible schedules (crew pairing chains).

Since in the MASSP, all employees are considered identical, a state-expanded network model for this problem can also use a single network with integer flow variables. It turns out, however, that a state-expanded network that encodes all shift composition rules quickly becomes huge; as a result, for complex large-scale MASSPs, a “plain” state-expanded network model is neither practically useful nor competitive with state-of-the-art approaches such as the grammar-based models from Côté et al. (2011b) or the implicit formulation proposed in Dahmen et al. (2018). To address this issue, Porrmann and Römer (2021) proposed to employ a machine learning approach for heuristically reducing the size of the state-expanded network. The downside of such a heuristic reduction is that the state-expanded network no longer represents all feasible shifts, rendering the whole approach non-exact. The main contribution of this paper is to propose a new formulation based on block-based state-expanded networks that can be solved efficiently without resorting to such heuristic network reduction, to the extent that it outperforms other state-of-the-art exact approaches.

1.1 Contributions

This paper presents a new efficient formulation for the MASSP that relies on efficiently representing shifts in block-based state-expanded networks. A key design decision in the proposed model is to use full activity blocks (consecutive assignments of the same activity) as the basis for constructing arcs of the state-expanded network. While this idea by itself does not yield a model that is competitive with the state of the art, it provides the basis for two exact techniques that, in particular when combined, help to substantially reduce the size of the model instances. First, we introduce a set of constraints that implicitly ensure that the (aggregated) flow in the state-expanded network can be decomposed into shifts in which two consecutive activity blocks do not have the same activity. This means that this requirement does not need to be encoded in the network, resulting in a reduction in the number of nodes (and activity block arcs) by a factor of about \(|A|-1\) where |A| is the number of activities in the instance. Second, we introduce the idea of activity block templates: An activity block template is an activity-agnostic block associated with the arcs in the state-expanded network. By making the state-expanded network activity-agnostic, its size is reduced by a factor of \(|A| -1\). The assignment of activities to the block templates is then performed by assignment variables that are linked to the flow in the state-expanded network. Finally, we show how these two ideas can be combined by embedding the activity assignment variables in a second state-expanded network that is used for composing (activity-specific) work blocks. For complex MASSP problems such as the problem considered in Dahmen et al. (2018), applying both ideas results in a reduction in the size of the main state-expanded network in the order of the square of the number of activities in a given instance.

In the experiments with two sets of problem instances from the literature that include more than 70 previously unsolved instances, we show that our approach is able to solve all instances to optimality on a notebook computer in less than one hour; more specifically, only five out of more than 1000 instances require more than 30 min to be solved to optimality.

The remainder of this paper is structured as follows: The next section provides a description of multi-activity shift scheduling problems (MASSPs), in particular including the MASSP variants introduced in Demassey et al. (2005) and Dahmen et al. (2018) that are studied in our computational experiments. Section 3 describes how to construct block-based state-expanded networks encoding the shift composition rules typically arising in MASSPs, and in particular how to construct networks encoding the rules from the two problem variants presented in Demassey et al. (2005) and Dahmen et al. (2018). Section 4 presents how to embed these networks into a MILP formulation for MASSPs, and it also presents a set of implicit constraints ensuring that an aggregate flow can be decomposed into feasible paths without having to encode the activity change requirement in the network. Section 5 presents the idea of activity block templates and shows how they can be used for reducing the overall model size; furthermore, it shows how they can be used to create models involving two coupled state-expanded networks operating on different levels of detail. Section 6 presents the results from our computational experiments, followed by the conclusions in Sect. 7.

2 Multi-activity shift scheduling problems

A MASSP deals with compiling (anonymous) work shifts that are used to cover a fixed work demand given per period \(p \in P\) where P is a set of time periods constituting a work day. The number of shifts to be compiled can either be given or free, and the objective typically consists of minimizing a function involving total scheduled work time and penalties for under- or overcovering demand or for violating soft shift legality rules. What sets the MASSP apart from a classical (mono-activity) shift scheduling problem is the fact that demand in each period p is provided for different work activities \(a \in A\). For each of these activities, there is usually a minimum and a maximum number of consecutive periods for which it can be assigned before switching to another activity or to a break. Note that in the “plain” MASSP considered in this paper, it is assumed that each employee can be assigned to each activity. The presence of employee-specific activities turns the problem into a personalized MASSP, which is studied in Côté et al. (2013), for example. A key feature of shift scheduling problems is the fact that the shifts need to respect a (problem-specific) rule set that governs aspects such as break requirements and the number and duration of consecutive work periods. In many MASSP problem variants, the rule set involves multiple shift types (e.g. short and long shifts) affecting the parameters of certain rules. As an example, the maximum allowed shift duration is typically different for short and long shift types.

To facilitate the description of typical types of rules (and to lay the ground for the modeling approach presented later), let us introduce the following terms that view a shift as a hierarchical composition of blocks:

  • An elementary assignment or elementary block is an assignment of an activity \(a \in A\) to a period \(p \in P\)

  • An activity block is a block of consecutive elementary assignments of one type of activity

  • A work block is a block of consecutive activity blocks

  • A break block is a block of consecutive break periods

  • A shift is an alternating sequence of work and break blocks starting and ending with a work block
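The block hierarchy above can be mirrored by a small set of data types. The following is a minimal sketch and not part of the formulation itself; the class names, the field names and the helper `is_well_formed_shift` are hypothetical and only illustrate how a shift decomposes into work and break blocks, and how work blocks decompose into activity blocks.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ActivityBlock:
    """Consecutive elementary assignments of one activity."""
    activity: str
    start: int      # index of the first period covered
    duration: int   # number of periods


@dataclass(frozen=True)
class BreakBlock:
    """Consecutive break periods."""
    start: int
    duration: int


@dataclass(frozen=True)
class WorkBlock:
    """Consecutive activity blocks without a break in between."""
    activity_blocks: Tuple[ActivityBlock, ...]

    @property
    def duration(self) -> int:
        return sum(b.duration for b in self.activity_blocks)


def is_well_formed_shift(blocks) -> bool:
    """Check that work and break blocks alternate, that the shift starts and
    ends with a work block, and that all blocks are contiguous in time."""
    if not blocks or len(blocks) % 2 == 0:
        return False
    t = None  # period at which the next block has to start
    for i, block in enumerate(blocks):
        expected = WorkBlock if i % 2 == 0 else BreakBlock
        if not isinstance(block, expected):
            return False
        if isinstance(block, WorkBlock):
            if not block.activity_blocks:
                return False
            for ab in block.activity_blocks:
                if ab.duration <= 0 or (t is not None and ab.start != t):
                    return False
                t = ab.start + ab.duration
        else:
            if block.duration <= 0 or block.start != t:
                return False
            t = block.start + block.duration
    return True


w1 = WorkBlock((ActivityBlock("a", 0, 2), ActivityBlock("b", 2, 2)))
br = BreakBlock(4, 1)
w2 = WorkBlock((ActivityBlock("a", 5, 3),))
print(is_well_formed_shift([w1, br, w2]))
```

Rules on block properties (e.g. minimum durations) and on block composition (e.g. the activity change requirement) would be checked on top of this structure; they are deliberately left out here.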

These components or “building blocks” of shifts can be used to express shift composition rules typically arising in a MASSP in a straightforward way. This is due to the fact that many of these rules can be expressed as constraints regarding the properties of blocks (e.g. minimum or maximum duration of activity blocks, work blocks or break blocks; and number of breaks and number of work periods in shifts) or as constraints affecting the composition of blocks from other blocks (e.g. alternation of work and break blocks in shifts, no consecutive blocks of the same work activity in one work block). To see examples for concrete rule sets and concrete MASSP variants, let us now discuss two MASSP variants from the literature.

2.1 MASSP variant from (Demassey et al., 2005)

In this problem variant, which will be referred to as the Demassey problem in the rest of this paper, each instance has a fixed number of employees and a given number of different activities. There are two shift types (short and long shifts); short shifts have a single short break and long shifts have two short breaks and one long break. Shifts must completely fall into a work day consisting of 96 periods of 15 min; demand is given for each activity and for each period. Under- and overcovering of demand are allowed, but penalized in the objective function. The other part of the (minimization) objective involves the costs associated with each scheduled work period. The shift legality rules can be stated as follows:

  1. Each activity block has a minimum duration of four periods

  2. Each work block is composed of a single activity block

  3. The duration and composition of a shift depend on the shift type:

    (a) Short shifts exhibit at least 3 h and less than 6 h of work; they consist of two work blocks separated by a short break of 15 min

    (b) Long shifts exhibit at least 6 h and at most 8 h of work; they consist of four work blocks separated by two short breaks of 15 min and a long break of 1 h (the breaks can be placed in arbitrary order)
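To make the rule set concrete, the following sketch classifies a candidate shift as short, long or infeasible. It is only an illustration under stated assumptions: the function name `demassey_shift_type` and the input format (a start period plus a list of `(kind, duration)` pairs in periods of 15 min) are hypothetical, and "less than 6 h" is read as at most 23 work periods.

```python
PERIODS_PER_DAY = 96             # work day of 96 periods of 15 min
MIN_ACTIVITY_BLOCK = 4           # rule 1: at least four periods
SHORT_BREAK, LONG_BREAK = 1, 4   # 15 min and 1 h, expressed in periods


def demassey_shift_type(start, blocks):
    """Return "short", "long" or None (infeasible) for a shift given as a
    start period and a list of (kind, duration) pairs, kind in
    {"work", "break"}. Each "work" entry is one activity block."""
    if not blocks or len(blocks) % 2 == 0:
        return None
    # rule 2: work blocks are single activity blocks, so work and break
    # entries have to alternate strictly, starting and ending with work
    for i, (kind, _) in enumerate(blocks):
        if kind != ("work" if i % 2 == 0 else "break"):
            return None
    works = [d for kind, d in blocks if kind == "work"]
    breaks = sorted(d for kind, d in blocks if kind == "break")
    if any(d < MIN_ACTIVITY_BLOCK for d in works):
        return None
    # the shift has to fall completely into the work day
    if start < 0 or start + sum(d for _, d in blocks) > PERIODS_PER_DAY:
        return None
    total_work = sum(works)
    # rule 3(a): one short break, 3 h <= work < 6 h
    if breaks == [SHORT_BREAK] and 12 <= total_work < 24:
        return "short"
    # rule 3(b): two short breaks and one long break in any order,
    # 6 h <= work <= 8 h
    if breaks == [SHORT_BREAK, SHORT_BREAK, LONG_BREAK] and 24 <= total_work <= 32:
        return "long"
    return None


short = [("work", 8), ("break", 1), ("work", 8)]
long_ = [("work", 8), ("break", 1), ("work", 8), ("break", 4),
         ("work", 8), ("break", 1), ("work", 8)]
print(demassey_shift_type(0, short), demassey_shift_type(0, long_))
```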

2.2 MASSP variant from (Dahmen et al., 2018)

In contrast to the Demassey problem, the MASSP variant presented by Dahmen et al. (2018) (subsequently called the Dahmen problem) does not deal with a fixed number of employees/shifts; instead, the number of shifts can be freely chosen. Depending on the instance, there are different numbers of activities and different shift types affecting the length of the break (each shift involves a single break, irrespective of the shift type), the length of the work blocks and the total shift duration.

The time granularity is either 15, 30 min or 1 h, and shifts can exceed the planning horizon of 1 day. Activity blocks exceeding the planning horizon do not contribute to demand covering, but they contribute to the objective function that consists in minimizing the total number of periods worked. The demand has to be covered (undercovering is not allowed), and overcovering is allowed and not penalized.

The shift legality rules in the Dahmen problem can be stated as follows:

  1. A shift can only start at certain periods \(P^\textrm{start} \subset P\).

  2. There are activity-specific minimum and maximum activity block durations.

  3. There are minimum and maximum durations for work blocks that depend on the type of shift and on whether the work block is before or after the break.

  4. A work block can be composed of multiple consecutive activity blocks; if a work block is composed of multiple activity blocks, then two consecutive activity blocks need to have different activity types (we will subsequently call this rule the activity change requirement).

  5. Each instance may have multiple shift types; the type of shift governs:

    (a) The minimum and maximum duration of a shift (break periods are counted).

    (b) The minimum and maximum duration of the pre- and post-break work blocks.

    (c) The (fixed) duration of the break.

In addition to these rules, Dahmen et al. (2018) discuss a “restricted” variant of their problem in which it is only allowed to assign two different activities within a single work block. In other words, in that variant, each work block contains activity blocks with at most two different activity types.

3 Modeling shifts with block-based state-expanded networks

Our approach for the MASSP relies on the central idea to encode all shift composition rules in a (directed) state-expanded network \(G=(N,E)\) in a way that each path from the source node \(v^\textrm{source}\) to the sink \(v^\textrm{sink}\) corresponds to a feasible shift and the set of all source-sink paths in G corresponds to the set of all feasible shifts. Each node in \(N^\textrm{state} = N {\setminus } \{ v^\textrm{source}, v^\textrm{sink}\}\) is associated with a rule-related state \(s_v\) which typically consists of a tuple of state attributes. The arc \(e^\textrm{circ} = (v^\textrm{sink}, v^\textrm{source}) \in E\) is denoted as the flow circulation arc, and in case of a given number of employees n, its flow value is fixed to n. All arcs between the nodes in \(N^\textrm{state}\) represent state transitions induced by assigning a feasible work activity block or a break block. Note that by associating arcs with full activity and break blocks, all rules related to activity blocks (e.g. minimum and maximum duration) and break blocks (e.g. the possible break lengths) are satisfied by design since only legal activity and break blocks are considered in the network. Observe that this is a difference to other related approaches such as the automaton models discussed in Côté et al. (2011a), where each transition is associated with an elementary (single-period) assignment, and also to the state-expanded network representation proposed by Porrmann and Römer (2021), where transitions are associated with partial activity blocks.

Example

To illustrate the construction of such a block-based state-expanded network, let us consider a small and simplified single-activity shift scheduling problem with a planning horizon of seven periods. In the example problem, a feasible shift needs to satisfy the following (hard) shift composition rules: A shift needs to contain either five or six periods of work spread across two work blocks that have to be separated by a single-period break. An activity block has a minimum duration of two periods and a maximum duration of three periods.

To model this problem, we assign each node \(v \in N^\textrm{state}\) a state \(s_v\) consisting of four state attributes:

  • \(s^\textrm{p}_v\) is the period index of the node.

  • \(s^\textrm{prevWork}_v\) is a Boolean attribute denoting whether the last assignment was a work activity or not.

  • \(s^\mathrm {\#work}_v\) is a counter of the number of work periods assigned so far.

  • \(s^\mathrm {\#break}_v\) corresponds to the number of breaks taken so far (in the example, either 0 or 1).

The resulting network with all feasible nodes and arcs is displayed in Fig. 1. The attributes \(s^\textrm{p}_v\) and \(s^\mathrm {\#work}_v\) of the nodes in \(N^\textrm{state}\) are visualized using the position of the node (\(s^\textrm{p}_v\) corresponds to the x-axis, \(s^\mathrm {\#work}_v\) to the y-axis); the attribute \(s^\mathrm {\#break}_v\) is used as node label and the value of \(s^\textrm{prevWork}_v\) is indicated by the color of the node.

Fig. 1 Block-based state-expanded network for the example

The arcs between nodes \(v \in N^\textrm{state}\) represent state transitions induced by assigning activity blocks and break blocks. In particular, the arcs emanating from a node v for which the value of \(s^\textrm{prevWork}_v = \textrm{false}\) represent the feasible activity blocks that can start from the state \(s_v\) and end in a feasible state (e.g. for a node v with \(s^\textrm{prevWork}_v=\textrm{false}\) and \(s^\mathrm {\#work}_v = 4\), only a single activity block with a duration of two results in a state that respects the maximum number of work periods per shift).

The nodes for which \(s^\textrm{prevWork}_v =\textrm{true}\) and \(s^\mathrm {\#break}_v = 0\) have an outgoing break arc representing a single-period break. The rule that shifts need to be composed of two work blocks separated by a rest block is ensured by the fact that there are only arcs to the sink from a node with \(s^\mathrm {\#break}_v=1\) and \(s^\textrm{prevWork}_v=\textrm{true}\). Similarly, the rule limiting the number of total work periods to be either five or six is ensured by the fact that there are only arcs from nodes \(v \in N^\textrm{state}\) to \(v^\textrm{sink}\) if \(5 \le s^\mathrm {\#work}_v \le 6\). The remaining arcs represent the connections between \(v^\textrm{source}\) and the initial state nodes (for which \(s^\textrm{prevWork}_v = \textrm{false}\)) and the flow circulation arc \(e^\textrm{circ}\) connecting the sink to the source node.
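The construction rules of the example can be sketched in a few lines. The code below is an illustration under assumptions that are not fixed by the text: node period indices range over 0..7, the source is connected to an initial state in every period, and nodes that cannot reach the sink are not pruned. All identifiers (`successors`, `enumerate_paths`, the arc labels) are hypothetical.

```python
HORIZON = 7              # node period index p ranges over 0..7 (assumed)
BLOCK_LENGTHS = (2, 3)   # feasible activity block durations
MIN_WORK, MAX_WORK = 5, 6
SOURCE, SINK = "source", "sink"


def successors(node):
    """Yield (label, target) for every arc leaving a node. A state node is a
    tuple (p, prev_work, n_work, n_break)."""
    if node == SINK:
        return
    if node == SOURCE:
        for p in range(HORIZON + 1):
            yield "start", (p, False, 0, 0)
        return
    p, prev_work, n_work, n_break = node
    if not prev_work:
        # activity block arcs; here one activity block forms a work block
        for d in BLOCK_LENGTHS:
            if p + d <= HORIZON and n_work + d <= MAX_WORK:
                yield f"work{d}", (p + d, True, n_work + d, n_break)
    elif n_break == 0:
        # single-period break arc after the first work block
        if p + 1 <= HORIZON:
            yield "break", (p + 1, False, n_work, 1)
    elif MIN_WORK <= n_work <= MAX_WORK:
        # second work block finished with five or six work periods
        yield "end", SINK


def enumerate_paths(node=SOURCE):
    """Return every path from node to the sink as a list of (label, target)."""
    if node == SINK:
        return [[]]
    return [[(label, w)] + rest
            for label, w in successors(node)
            for rest in enumerate_paths(w)]


def as_shift(path):
    """Convert a source-sink path into (start period, block labels)."""
    start = path[0][1][0]
    return start, tuple(label for label, _ in path[1:-1])


shifts = sorted(as_shift(p) for p in enumerate_paths())
for s in shifts:
    print(s)
```

Under these assumptions, the enumeration yields five shifts: the block sequences (2, break, 3) and (3, break, 2) starting in period 0 or 1, and (3, break, 3) starting in period 0. Explicit enumeration is only viable for such a toy instance; in the model, the network is used as a flow component instead.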

Multi-activity work blocks. A simplifying assumption in the example above is that there is a single type of activity. In such a setting, each work block consists of a single activity block. In a multi-activity setting, a work block can consist of multiple consecutive activity blocks with different work activities. Activity blocks with the same start period and end period and different activities are represented by a separate (parallel) activity block arc for each activity.

In order to model constraints on the duration of work blocks, we introduce a state attribute \(s^\mathrm {\#pWb}_v\) counting the number of work periods in the work block. Given an activity block arc from node v to node w with length p, the transition function with respect to that attribute is \(s^\mathrm {\#pWb}_w = s^\mathrm {\#pWb}_v + p\). After a break, this attribute is reset, that is, a node w that forms the target of a break arc has \(s^\mathrm {\#pWb}_w = 0\).

If a work block can consist of multiple activity blocks, a natural rule is that two consecutive activity blocks must be assigned to different activities. To encode this rule in the network, we can introduce a state attribute \(s^\textrm{prevAct}_v\) indicating the previously assigned activity. Then, if the previous block was assigned to activity a, that is, \(s^\textrm{prevAct}_v =a\), only arcs representing blocks with activities \(a' \ne a\) can emanate from v. If the number of different activities per work block is limited, this can be represented by a set-valued state attribute \(s^\textrm{actWb}_v\) that records the set of activity types assigned in a work block. That state attribute is reset to the empty set after the end of a work block.

Modelling different shift types. Let us now see how we can deal with multiple shift types imposing different break patterns and shift lengths, denoting the set of shift types with Q. It turns out that we do not need to include an extra state attribute for representing shift types. Instead, for each partial shift ending at node v, we can determine the subset \(Q_{s_v} \subset Q\) of shift types for which \(s_v\) is a feasible state. At the beginning of a shift, say, in the first work block, a state is likely to be feasible for various or all shift types. Later on, in particular depending on the break periods assigned, a state may only be feasible for a single shift type. In the construction of the state-expanded network, this means that it is checked for each possible transition after a node v whether the target state after the transition is feasible for at least one shift type. Only if this is the case are the corresponding target node and the arc corresponding to the transition included in the network.

3.1 State model for the Demassey problem

The Demassey problem is a MASSP, but it has the special (and simplifying) rule that every work block consists of a single activity block. As a result, we do not have to explicitly model rules governing the composition of work blocks from activity blocks, and the work block duration rules are also dealt with by the fact that each activity block assignment represents a full work block. The state variable thus only needs to keep track of whether the last assignment was work or not; we use the state attribute \(s^\textrm{prevWork}_v\) for this purpose. There are, however, two types of shifts that not only vary with respect to the minimum and maximum duration but also with respect to their break configuration. To keep track of the break configuration, our state variable represents the number of short and long breaks assigned so far in the state attributes \(s^\mathrm {\#shortBreak}_v\) and \(s^\mathrm {\#longBreak}_v\). Finally, we use the attribute \(s^\mathrm {\#work}_v\) to count the total number of work periods.

To summarize, the state attributes needed are:

  • \(s^\textrm{p}_v\) period p.

  • \(s^\textrm{prevWork}_v\) Boolean state indicating whether the previous assignment was work or not.

  • \(s^\mathrm {\#shortBreak}_v\) number of short breaks assigned.

  • \(s^\mathrm {\#longBreak}_v\) number of long breaks assigned.

  • \(s^\mathrm {\#work}_v\) number of total work periods assigned in the path so far.
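With these attributes, the set of shift types for which a state is still feasible can be derived from the state itself. The sketch below illustrates one possible check for the Demassey rules; it is a simplified necessary condition that ignores the period index and the end of the work day, and the names `ShiftType` and `compatible_types` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShiftType:
    name: str
    n_short_breaks: int
    n_long_breaks: int
    min_work: int   # in periods of 15 min
    max_work: int   # in periods of 15 min


SHORT = ShiftType("short", 1, 0, 12, 23)   # 3 h <= work < 6 h
LONG = ShiftType("long", 2, 1, 24, 32)     # 6 h <= work <= 8 h
MIN_BLOCK = 4                              # minimum activity block duration


def compatible_types(n_short, n_long, n_work, prev_work, types=(SHORT, LONG)):
    """Return the names of the shift types for which the partial shift
    described by the state attributes can still be completed."""
    result = []
    for q in types:
        if n_short > q.n_short_breaks or n_long > q.n_long_breaks:
            continue
        # every break follows a finished work block; if the last assignment
        # was work, one more work block has been finished
        blocks_total = q.n_short_breaks + q.n_long_breaks + 1
        blocks_done = n_short + n_long + (1 if prev_work else 0)
        remaining = blocks_total - blocks_done
        if remaining == 0:
            feasible = q.min_work <= n_work <= q.max_work
        else:
            # each remaining work block adds at least MIN_BLOCK periods
            feasible = n_work + MIN_BLOCK * remaining <= q.max_work
        if feasible:
            result.append(q.name)
    return result


print(compatible_types(0, 0, 0, False))   # start of a shift
print(compatible_types(0, 1, 8, False))   # after a long break
```

A target node would only be created during network construction if this list is non-empty for its state.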

3.2 State model for the Dahmen problem

While in the Demassey problem, each work block consists of a single activity block, the Dahmen problem permits work blocks with multiple consecutive activity blocks as long as two consecutive activity blocks within a work block do not exhibit the same activity (the activity change requirement) and work block length constraints are respected. As explained above, we can model these multi-activity work blocks using the state attributes \(s^\mathrm {\#pWb}_v\) and \(s^\textrm{prevAct}_v\). In the Dahmen problem, a shift only has a single break and the length of the break depends on the shift type. In order to determine the shift type implied by the break, we need an attribute that records not only whether there was a break but also the length of that break; we use the attribute \(s^\textrm{breakLength}_v\) (which is 0 if there was no break) for this purpose.

To summarize, the state attributes needed to represent the shift rules for the Dahmen problem are:

  • \(s^\textrm{p}_v\) period p.

  • \(s^\textrm{prevAct}_v\) previously assigned activity if the previous assignment was work, otherwise (e.g. in case of a break), the value is None.

  • \(s^\mathrm {\#pWb}_v\) number of work periods assigned in the current work block.

  • \(s^\mathrm {\#work}_v\) number of total work periods assigned in the path so far.

  • \(s^\textrm{breakLength}_v\) length of the break in the path (0 if there was no break).

As explained above, the restricted variant of the Dahmen problem only permits two different activities to be assigned per work block. To model this constraint, we introduce an additional state attribute:

  • \(s^\textrm{actWb}_v\) set of activity types that have occurred in the work block so far.

This state attribute is set to the empty set for each node representing the beginning of a work block. A transition induced by the assignment of a work activity adds the type of the activity to the set \(s^\textrm{actWb}_v\). If the set \(s^\textrm{actWb}_v\) has cardinality \(|s^\textrm{actWb}_v| = 2\), only one of the two activities in the set can be chosen next, namely the activity in \(s^\textrm{actWb}_v \setminus \{s^\textrm{prevAct}_v\}\), which does not induce a violation of the activity change requirement.
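The effect of the two attributes on the outgoing activity block arcs of a node can be illustrated as follows. This is a minimal sketch; the function names and the parameter `max_distinct` (2 in the restricted variant, `None` in the unrestricted one) are hypothetical.

```python
def allowed_next_activities(activities, prev_act, act_wb, max_distinct=None):
    """Return the activities that may be assigned to the next activity block.

    prev_act: previously assigned activity, or None after a break or at the
              start of a shift
    act_wb:   set of activities used in the current work block so far
    """
    # activity change requirement: never repeat the previous activity
    candidates = [a for a in activities if a != prev_act]
    if max_distinct is not None:
        # a new activity is only allowed while the limit is not reached
        candidates = [a for a in candidates
                      if a in act_wb or len(act_wb) < max_distinct]
    return candidates


def after_assignment(act_wb, activity):
    """State transition of the set-valued attribute for a work assignment."""
    return frozenset(act_wb) | {activity}


acts = ["a", "b", "c"]
wb = after_assignment(frozenset(), "a")   # first block of the work block
print(allowed_next_activities(acts, "a", wb, max_distinct=2))
wb = after_assignment(wb, "b")            # second block uses activity b
print(allowed_next_activities(acts, "b", wb, max_distinct=2))
```

Once two activities have been used in a work block, the only remaining candidate is the one that differs from the previous activity, which matches the set difference stated above.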

4 MILP formulations

The state-expanded network explained in the previous section constitutes a core element of our MILP formulation for the MASSP. In this section, we first describe the basic formulation and then an implicit formulation of the activity change requirement that allows moving this rule out of the state-expanded network, reducing its size by a factor of about \(|A|-1\) where |A| is the number of activities in the problem.

4.1 Basic MILP formulation

The state-expanded network enters the model in the form of a network flow component. The flow on an arc \(e \in E\) is represented by the integer decision variable \(X_e\). The cost of a unit flow on arc e is denoted as \(c_e\). As an example, if arc e represents a work activity block, this cost factor may include the cost induced by the number of work periods in that block. The other two sets of decision variables are \(Y^\textrm{u}_{a,p}\) and \(Y^\textrm{o}_{a,p}\) which model the under- and overcovering of the demand \(d_{a,p}\) of activity a in period p; these variables are associated with penalties \(c^\textrm{u}\) and \(c^\textrm{o}\) for under- and overcovering. Note that in case of hard demand covering limits as in the Dahmen problem, the corresponding variables can be forced to 0.

Using the described symbols, the MILP model can be written as follows:

$$\begin{aligned} \textrm{min} \sum _{e \in E} c_e X_e + \sum _{a \in A}\sum _{p \in P} \bigl ( c^\textrm{o} Y^\textrm{o}_{a,p} + c^\textrm{u} Y^\textrm{u}_{a,p} \bigr ) \end{aligned}$$
(1)
$$\begin{aligned} \sum \limits _{e \in v^\textrm{in}} X_{e}&= \sum \limits _{e \in v^\textrm{out}} X_{e}&\forall v \in N \end{aligned}$$
(2)
$$\begin{aligned} X_{e^\textrm{circ}}&=n \end{aligned}$$
(3)
$$\begin{aligned} \sum \limits _{e \in E^\textrm{cov}_{a,p}} X_{e} + Y^\textrm{u}_{a,p} - Y^\textrm{o}_{a,p}&= d_{a,p}&\forall a \in A, p \in P \end{aligned}$$
(4)
$$\begin{aligned} X_e&\in \mathbb {Z}^+_0&\forall e \in E \end{aligned}$$
(5)
$$\begin{aligned} Y^\textrm{o}_{a,p} \ge 0, \quad Y^\textrm{u}_{a,p}&\ge 0&\forall a \in A, p \in P \end{aligned}$$
(6)

The objective function (1) contains the cost induced by the flow in the state-expanded network and the penalties for over- and undercovering demand. Constraints (2) are the flow balance constraints for each node v ensuring that the flow on the incoming arcs \(v^\textrm{in}\) equals the flow on the outgoing arcs \(v^\textrm{out}\), and constraint (3) fixes the flow on the circulation arc \(e^\textrm{circ}\) to the number n of employees. If the number of employees is not fixed but part of the scheduling problem, this constraint can be dropped, and the cost of an employee can be modeled in the cost coefficient \(c_{e^\textrm{circ}}\). Constraint set (4) models the demand covering for each activity and period; the set \(E^\textrm{cov}_{a,p} \subset E\) is the set of arcs representing an assignment that covers activity a in period p. The other two constraint sets determine the domains of the decision variables.
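To illustrate how (1), (2) and (4) interact, the following sketch evaluates a given candidate flow instead of solving the model. It is not a solver: the function `evaluate_flow`, the arc encoding and the tiny network are hypothetical, and the values of \(Y^\textrm{u}_{a,p}\) and \(Y^\textrm{o}_{a,p}\) are set to the smallest values satisfying (4), which is what one would expect with positive penalties.

```python
from collections import defaultdict


def evaluate_flow(arcs, flow, demand, c_under, c_over):
    """Evaluate a candidate flow against (1), (2) and (4).

    arcs:   arc id -> (tail, head, cost, covered), where covered lists the
            (activity, period) pairs whose demand the arc covers
    flow:   arc id -> non-negative integer flow X_e
    demand: (activity, period) -> d_{a,p}, given for every pair of interest
    """
    net = defaultdict(int)     # inflow minus outflow per node
    cover = defaultdict(int)   # left-hand sum of (4) per (a, p)
    cost = 0
    for e, (tail, head, c_e, covered) in arcs.items():
        x = flow.get(e, 0)
        net[tail] -= x
        net[head] += x
        cost += c_e * x
        for key in covered:
            cover[key] += x
    balanced = all(v == 0 for v in net.values())            # constraints (2)
    under = {k: max(d - cover[k], 0) for k, d in demand.items()}
    over = {k: max(cover[k] - d, 0) for k, d in demand.items()}
    cost += c_under * sum(under.values()) + c_over * sum(over.values())
    return balanced, under, over, cost


# one activity block arc covering activity "a" in periods 0 and 1
arcs = {
    "start": ("source", "v0", 0, []),
    "work_a": ("v0", "v1", 2, [("a", 0), ("a", 1)]),
    "end": ("v1", "sink", 0, []),
    "circ": ("sink", "source", 0, []),
}
flow = {e: 2 for e in arcs}             # two units, i.e. X_circ = n = 2
demand = {("a", 0): 2, ("a", 1): 3}
balanced, under, over, cost = evaluate_flow(arcs, flow, demand,
                                            c_under=10, c_over=1)
print(balanced, under, over, cost)
```

In this example, the flow is balanced, one unit of demand for activity a in period 1 remains uncovered, and the objective value is the flow cost plus one undercovering penalty.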

A solution to problem (1)–(6) contains the integer flow in network G; the solution flow \(X^*_{e^\textrm{circ}}\) on the circulation arc corresponds to the total number of units flowing through the network. Using flow decomposition, we can obtain \(X^*_{e^\textrm{circ}}\) paths between the source and the sink each of which corresponds to a shift. Note that in general, such a flow decomposition is not unique; that is, a given flow solution may be decomposable in different paths (representing different sets of shifts).
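The decomposition step can be sketched as a simple greedy procedure that repeatedly traces one unit of flow from the source to the sink. The code is an illustration only; it assumes an acyclic network without the circulation arc and a balanced integer flow, and the name `decompose_flow` is hypothetical.

```python
from collections import defaultdict


def decompose_flow(arcs, flow, source, sink):
    """Decompose a balanced integer flow into source-sink paths.

    arcs: arc id -> (tail, head), circulation arc excluded
    flow: arc id -> non-negative integer flow
    Returns a list of paths, each given as a list of arc ids.
    """
    remaining = dict(flow)
    out = defaultdict(list)
    for e, (tail, _) in arcs.items():
        out[tail].append(e)
    paths = []
    while True:
        path, v = [], source
        while v != sink:
            # follow the first outgoing arc that still carries flow
            e = next((e for e in out[v] if remaining.get(e, 0) > 0), None)
            if e is None:
                if path:
                    raise ValueError("flow is not balanced")
                return paths   # no flow left at the source
            remaining[e] -= 1
            path.append(e)
            v = arcs[e][1]
        paths.append(path)


# two parallel arcs into and out of node u, each carrying one unit
arcs = {"a1": ("s", "u"), "b1": ("s", "u"),
        "a2": ("u", "t"), "b2": ("u", "t")}
flow = {"a1": 1, "b1": 1, "a2": 1, "b2": 1}
paths = decompose_flow(arcs, flow, "s", "t")
print(paths)
```

The example also shows the non-uniqueness: pairing `a1` with `b2` and `b1` with `a2` would be an equally valid decomposition of the same flow.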

4.2 Implicit activity change constraints

As explained in Sect. 3, representing the rule that two consecutive work activity blocks need to have different activities requires introducing a state attribute that stores the activity assigned in the previous block. Given that |A| is the number of activities, introducing this state attribute increases the number of nodes (and arcs) by a factor of about \(|A|-1\). In order to avoid this increase, we propose a set of linear constraints that implicitly enforce that the flow in the state-expanded network is decomposable into a set of shifts respecting the activity change requirement.

If we assume that the activity change rule is not embedded in the state-expanded network, it may happen that the flow through a node v representing the connection of two consecutive activity blocks is not decomposable in such a way that, for each path through v, the activity associated with the in-arc of v in the path is different from the activity of the out-arc of v. We refer to a node connecting two activity blocks as an interior node of a work block; the set of these nodes will be written as \(N^\textrm{inWb}\). Using the state attribute \(s^\mathrm {\#pWb}_v\) introduced in Sect. 3, we can state that \(N^\textrm{inWb}\) is the set containing all nodes with \(s^\mathrm {\#pWb}_v > 0\) that have at least one outgoing arc representing an activity block.

To enforce that for each of the nodes \(v \in N^\textrm{inWb}\), there exists a flow decomposition respecting the activity change rule, we impose a set of constraints ensuring that the total flow on the out-arcs of v representing a block with activity a (the set of these arcs is denoted as \(v^\textrm{out}_{a}\)) is less than or equal to the total flow on the arcs in \(v^\textrm{in}_{\lnot a}\), that is, the in-arcs associated with activity blocks for activities other than a. This set of constraints can be written as:

$$\begin{aligned} \qquad \sum \limits _{e \in v^\textrm{in}_{\lnot a}} X_{e} \ge \sum \limits _{e \in v^\textrm{out}_a} X_{e} \qquad \forall v \in N^\textrm{inWb}, a \in A \end{aligned}$$
(7)
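To make the meaning of constraints (7) concrete, the following sketch checks them for a single node, given the flow per activity on its in-arcs and out-arcs. The function name and the dictionary-based input format are assumptions for illustration only.

```python
def violated_change_constraints(in_flow, out_flow):
    """Return the activities a for which constraint (7) fails at one node.

    in_flow[a] / out_flow[a]: total flow on the in-arcs / out-arcs of the node
    that represent activity blocks assigned to activity a.
    """
    total_in = sum(in_flow.values())
    violated = []
    for activity, out_a in out_flow.items():
        # Left-hand side of (7): in-flow on blocks with an activity other than a.
        in_not_a = total_in - in_flow.get(activity, 0)
        if in_not_a < out_a:
            violated.append(activity)
    return violated


# Hypothetical flows through one interior node of a work block.
ok_case = violated_change_constraints({"a1": 2, "a2": 1}, {"a1": 1, "a2": 2})
bad_case = violated_change_constraints({"a1": 3}, {"a1": 1, "a2": 2})
```

In the second call, all incoming blocks belong to activity a1, so the outgoing a1 block cannot be preceded by a block of a different activity and the constraint for a1 is reported as violated.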

Proposition 4.1

Constraints (7) ensure that the flow through node \(v \in N^\textrm{inWb}\) can be decomposed in a way that the resulting (partial) work blocks respect the activity change requirement, that is, that a work block does not contain two consecutive activity blocks assigned to the same activity.

Proof

We will now give a constructive proof for the existence of such a feasible decomposition.

We consider a node \(v \in N^\textrm{inWb}\) for which constraints (7) hold, and we assume that we have a non-negative flow through v. A flow unit on an incoming (outgoing) arc of such a node represents an activity block that we denote as incoming (outgoing) activity block. Recall that we want to show that each outgoing block with a certain activity a can be linked to an incoming block with an activity \(a' \ne a\).

Constraints (7) ensure that in a feasible solution, the number of outgoing activity blocks with \(activity(b^\textrm{out})=a\) is less than or equal to the number of incoming blocks not assigned to a. The question we address in this proof is whether this is sufficient to guarantee that we can assign a correct incoming block \(b^\textrm{in}\) to every outgoing block \(b^\textrm{out}\) in a way that \(activity(b^\textrm{in}) \ne activity(b^\textrm{out})\). We will show this by providing a decomposition procedure that, given that constraints (7) hold, is guaranteed to find such an assignment.

A key concept for our decomposition procedure is that of complementary pairs: Two activity block pairs \((b^\textrm{in}_1,b^\textrm{out}_1)\) and \((b^\textrm{in}_2,b^\textrm{out}_2)\) are complementary if \(activity(b^\textrm{in}_1) = activity(b^\textrm{out}_2)\) and \(activity(b^\textrm{in}_2) = activity(b^\textrm{out}_1)\). If two such complementary pairs exist in the flow through a node v, we can extract the corresponding flow and obtain two partial work blocks respecting the activity change requirement.

Observe that if we have a flow solution for which the constraints (7) hold for a node v and if we extract the flow corresponding to two complementary pairs from that solution, then the two constraints associated with v and the involved activities will also hold after that operation since for both constraints, both the left-hand side and the right-hand side are reduced by one.

Our decomposition procedure for a flow through a node v starts with the full flow solution and extracts complementary pairs until no more such pairs are found. After having removed all complementary pairs, either each outgoing activity block is assigned a feasible incoming block (there is no residual flow on an activity-block arc going out of v) or there are outgoing blocks left which are not part of any complementary pair given the remaining incoming blocks. As noted above, however, the constraints are still valid for the residual flow solution; thus, for each outgoing block with activity a, there must exist at least one incoming block with an activity other than a. In addition, since there do not exist any complementary pairs in the residual flow, the set of activities associated with residual incoming blocks is disjoint from the set of activities associated with the outgoing activity blocks. This means that we can arbitrarily assign one of the incoming blocks represented by the residual flow to each of the outgoing blocks to obtain a feasible decomposition. \(\square \)
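For readers who want to compute such a pairing in practice, the following sketch links every outgoing block of a node to a distinct incoming block with a different activity. Note that it does not implement the complementary-pair procedure of the proof; it uses a standard augmenting-path bipartite matching instead, which finds a complete pairing whenever one exists. The function name and the list-based input (one entry per unit of flow) are assumptions for illustration.

```python
def pair_blocks(in_blocks, out_blocks):
    """Match each outgoing block to a distinct incoming block of another activity.

    in_blocks / out_blocks: lists holding the activity of each unit of flow.
    Returns partner with partner[j] = index of the incoming block linked to
    outgoing block j, or None if no complete pairing exists.
    """
    owner = [None] * len(in_blocks)  # owner[i] = outgoing block using in-block i

    def try_assign(j, seen):
        for i, activity in enumerate(in_blocks):
            if activity == out_blocks[j] or i in seen:
                continue  # same activity is forbidden; visited blocks are skipped
            seen.add(i)
            # Use in-block i if it is free or if its current owner can move on.
            if owner[i] is None or try_assign(owner[i], seen):
                owner[i] = j
                return True
        return False

    for j in range(len(out_blocks)):
        if not try_assign(j, set()):
            return None

    partner = [None] * len(out_blocks)
    for i, j in enumerate(owner):
        if j is not None:
            partner[j] = i
    return partner


# Hypothetical node flows: constraints (7) hold in the first case only.
ins, outs = ["a1", "a1", "a2"], ["a1", "a2", "a2"]
partner = pair_blocks(ins, outs)
no_pairing = pair_blocks(["a1", "a1"], ["a1", "a2"])
```

The recursion is adequate for the small per-node flows considered here; for very large flows, an aggregated (activity-level) transportation formulation would be the more economical choice.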

5 Reducing model size and symmetry with activity block templates and coupled networks

The state-expanded network formulation discussed above gives rise to large model instances exhibiting a considerable amount of symmetry. To illustrate the symmetry which is inherent in many MASSPs, see Fig. 2 depicting four shifts for a small example with three activities. Each shift consists of two activity blocks separated by a two-period break. In this schedule, certain activity blocks with the same start and end period can be exchanged between the shifts without affecting the quality of the solution: As an example, the first activity blocks in the first two shifts can be exchanged. In addition, the last activity block in the first shift and the first activity block in the third shift can be exchanged, as well as the last block in the second shift and the first block in the fourth shift.

In a state-expanded network in which activity blocks are associated with arcs, these symmetrical shifts correspond to symmetrical paths in the network. In particular, the exchangeable blocks in shifts one and three (and those in shifts two and four) correspond to arcs emanating from nodes associated with different states. For each of the states and for each feasible block, the network contains a parallel arc for each activity. This means that the decision which activity should be assigned to a block is “repeated” for each possible state, despite the fact that for a solution like the one displayed in Fig. 2, many solutions with arcs starting in different states are equivalent.

Fig. 2 Example: a multi-activity shift schedule with a lot of symmetry

We propose to avoid this symmetry by moving the activity assignment decision out of the state-expanded network and replacing the activity-specific activity block arcs with arcs associated with “anonymous” activity block templates. This means that the flow in the state-expanded network only decides that certain activity blocks with a given start and end time are placed in a shift, but not to which activity each block is assigned. The assignment of concrete activities is then delegated to a separate model part that ensures that for each activity block template (characterized by start and end period), the number of matching “concrete” activity blocks that are assigned equals the flow on all arcs representing the corresponding activity block template.

For the example from Fig. 2, this idea is illustrated in Fig. 3: In the top part, the shifts from Fig. 2 are displayed as “anonymous” blocks or activity block templates. The bottom part of the figure displays the number of times each activity type is assigned to each template block. As an example, there exist two templates representing an activity block from period 7 to period 9. The bottom part of the figure shows that these two templates will be “filled” with one block assigned to activity 2 and one block assigned to activity 3. The crucial idea here is that the template-based model does not explicitly assign the activity-specific blocks to the activity-agnostic template blocks but merely ensures that they can be assigned. In other words, in contrast to the original model, the template-based model does not need to decide between solutions that are equivalent anyway: All solutions that are symmetric with respect to the assignment of activity blocks to template blocks correspond to a single solution in the template-based model, while all of them represent different solutions (paths in the state-expanded network G) in the original model.

Fig. 3 Avoiding symmetry by decoupling shift composition and activity assignment

5.1 Model reformulation based on activity block templates

To formulate the MASSP using activity block templates, we first create a state-expanded network that uses activity block templates instead of activity blocks. The network construction follows the description in Sect. 3, assuming that there is a single “template activity” (which can, depending on the rule set, appear multiple times in a single work block) that represents the activity block template. The lower / upper bound for the duration of the template activity is chosen as the minimum / maximum of the lower / upper bounds of all activities in the problem instance under consideration, and the set of all possible activity block templates is referred to as \(B^{\square }\). In the following exposition (and in the mathematical model discussed below), we use the symbol G to represent the template-based network, that is, we assume that it replaces the original activity-specific network. Note that compared to the original network, the number of activity block arcs in the template block network is reduced by a factor of \(|A| - 1\), where |A| is the number of activities in the problem instance under consideration.
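The choice of the template duration bounds and the resulting template set can be sketched as follows. The function names, the encoding of a template as an inclusive `(start, end)` period pair and the example bounds are assumptions for illustration; the sketch also assumes that block feasibility depends only on the duration bounds, whereas a real instance may impose further rules.

```python
def template_blocks(duration_bounds, num_periods):
    """Enumerate activity block templates as inclusive (start, end) pairs.

    duration_bounds maps each activity to (min_len, max_len) in periods. The
    template activity takes the smallest lower and the largest upper bound.
    """
    lo = min(lb for lb, _ in duration_bounds.values())
    hi = max(ub for _, ub in duration_bounds.values())
    templates = [
        (start, start + length - 1)
        for start in range(1, num_periods + 1)
        for length in range(lo, hi + 1)
        if start + length - 1 <= num_periods
    ]
    return lo, hi, templates


def feasible_activities(block, duration_bounds):
    """Activities whose duration bounds admit the given template block."""
    start, end = block
    length = end - start + 1
    return {a for a, (lb, ub) in duration_bounds.items() if lb <= length <= ub}


# Hypothetical instance: two activities, six periods.
bounds = {"a1": (2, 4), "a2": (3, 5)}
lo, hi, templates = template_blocks(bounds, num_periods=6)
```

Because the template bounds are the envelope of all activity bounds, some templates are feasible for only a subset of the activities; in the example, a block of length 2 can only be filled with a1.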

The assignment of a concrete activity \(a\in A\) to an activity block template \(b \in B^{\square }\) is represented by the nonnegative integer decision variable \(X^{\square }_{b,a}\). Observe that while there is a single variable \(X^{\square }_{b,a}\) for each \(b \in B^{\square }\) and each activity for which b is a feasible block, every activity block template \(b \in B^{\square }\) is associated with multiple arcs in G (these arcs start in nodes with different states, e.g. before and after a break); we denote the set of all arcs \(e \in E\) associated with a block b as \(E^{\square }_{b}\).

The new model then consists of the objective function (1), the flow balance constraints (2), the flow size constraint (3) to model a fixed number of employees, the variable domains (5) and (6), and the constraints (8)–(10) to be explained next.

$$\begin{aligned} \sum \limits _{e \in E^{\square }_{b}} X_{e}&= \sum \limits _{a \in A^b} X^\square _{b,a}&\forall b \in B^{\square } \end{aligned}$$
(8)
$$\begin{aligned} \sum \limits _{b \in B^\textrm{cov}_{a,p}} X^\square _{b,a} + Y^\textrm{u}_{a,p} - Y^\textrm{o}_{a,p}&= d_{a,p}&\forall a \in A, p \in P \end{aligned}$$
(9)
$$\begin{aligned} X^{\square }_{b,a}&\in \mathbb {Z}^{+}_{0}&\forall a \in A, b \in B_a \end{aligned}$$
(10)

The linking constraints (8) ensure that for each activity block template b, the total number of assigned concrete activity blocks from activities \(a \in A^b\) for which block b is a valid activity block equals the total flow on the arcs \(e \in E^{\square }_{b}\) in G that represent b. The constraints (9) reformulate the cover constraints (4) using the activity block assignment variables \(X^{\square }_{b,a}\); the set \(B^\textrm{cov}_{a,p}\) is the set of blocks that are valid for activity a and demand period p. Finally, (10) establish the domains of the block assignment variables, using the set \(B_a\) representing the set of all feasible activity blocks for activity type a.
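The interplay of constraints (8) and (9) can be illustrated with a small checker that takes a candidate solution and evaluates both constraint sets. The function name, the input dictionaries and the demand values are hypothetical; the template flow and the assignment mirror the two period 7–9 templates from Fig. 3. The sketch assumes that a block covers every period from its start to its end period inclusively.

```python
from collections import defaultdict


def check_template_assignment(template_flow, assignment, demand):
    """Evaluate constraints (8) and (9) for a candidate solution.

    template_flow[b]: total flow on the arcs E_b of template b = (start, end).
    assignment[(b, a)]: value of the block assignment variable for b and a.
    demand[(a, p)]: demand for activity a in period p.
    Returns (linking_ok, under, over), where under/over hold the smallest
    under- and over-coverage values that satisfy (9).
    """
    assigned = defaultdict(int)  # right-hand side of (8) per template
    cover = defaultdict(int)     # covered demand per (activity, period)
    for (block, activity), x in assignment.items():
        assigned[block] += x
        start, end = block
        for period in range(start, end + 1):
            cover[(activity, period)] += x

    blocks = set(template_flow) | set(assigned)
    linking_ok = all(template_flow.get(b, 0) == assigned[b] for b in blocks)

    keys = set(demand) | set(cover)
    under = {k: max(demand.get(k, 0) - cover[k], 0) for k in keys}
    over = {k: max(cover[k] - demand.get(k, 0), 0) for k in keys}
    return linking_ok, under, over


# Two templates for periods 7-9, filled with one a2 block and one a3 block.
template_flow = {(7, 9): 2}
assignment = {((7, 9), "a2"): 1, ((7, 9), "a3"): 1}
demand = {("a2", 7): 1, ("a2", 8): 1, ("a2", 9): 1,
          ("a3", 7): 1, ("a3", 8): 1, ("a3", 9): 2}  # hypothetical demand
linking_ok, under, over = check_template_assignment(template_flow, assignment, demand)
```

Note that the checker never needs to know which of the two template arcs carries which activity, which is exactly the symmetry the template-based model removes.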

5.2 Dealing with work block composition rules by coupling state-expanded networks

The template-based state-expanded network discussed above does not consider concrete activities at all, and the mathematical model does not relate the (concrete) activity block variables \(X^{\square }_{b,a}\) to each other. This means that the model above can be used for problems such as the Demassey problem, but not for problems such as the Dahmen problem in which a work block can be composed of multiple activity blocks and there are rules governing the feasibility of this composition.

In order to account for work block composition rules, we propose a more complex formulation in which the composition of activity blocks into feasible work blocks is ensured by a separate state-expanded network \(\mathcal {G}=(\mathcal {N},\mathcal {E})\) that we will refer to as the work block composition network in the following. Like the original network, \(\mathcal {G}\) contains a source and a sink node that are connected by a circulation arc from the sink to the source. All arcs \(e \in \mathcal {E}\) other than the circulation arc and the arcs from the source and to the sink are associated with activity blocks assigned to concrete work activities \(a \in A\), and each path in \(\mathcal {G}\) corresponds to a feasible (activity-assigned) work block. Depending on the rules to be considered, the state variable \(s_v\) associated with each node \(v \in \mathcal {N}^\textrm{state}\) (that is, each node that is neither the source nor the sink) has (a subset of) the following attributes:

  • \(s^\textrm{p}_v\) The period p.

  • \(s^\mathrm {\#pWb}_v\) The number of work periods in the current work block.

  • \(s^\textrm{prevAct}_v\) The activity type of the previous activity (None if it is the first activity in the block).

  • \(s^\textrm{actWb}_v\) The set of activity types that have occurred in the work block so far.
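The four attributes listed above can be represented compactly as follows. This is a sketch under stated assumptions: the class and function names are hypothetical, the period attribute is interpreted as the first period after the blocks placed so far, and only the activity change requirement and an optional limit on the number of different activities per work block are checked.

```python
from typing import FrozenSet, NamedTuple, Optional


class State(NamedTuple):
    period: int               # s^p: current period
    work_periods: int         # s^#pWb: work periods in the current work block
    prev_act: Optional[str]   # s^prevAct: activity of the previous block
    acts: FrozenSet[str]      # s^actWb: activities used in the work block so far


def extend(state, activity, length, max_acts=None):
    """Return the state reached by appending an activity block, or None if
    the block would violate one of the modeled rules."""
    if activity == state.prev_act:
        return None  # activity change requirement
    acts = state.acts | {activity}
    if max_acts is not None and len(acts) > max_acts:
        return None  # too many different activities in one work block
    return State(state.period + length, state.work_periods + length, activity, acts)


start = State(period=1, work_periods=0, prev_act=None, acts=frozenset())
s1 = extend(start, "a1", 3)
s2 = extend(s1, "a2", 2, max_acts=2)
```

Every arc of the work block composition network then corresponds to one successful call of `extend`, and dropping an attribute from `State` merges nodes and shrinks the network accordingly.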

The flow variables associated with the edges \(e \in \mathcal {E}\) are denoted as \(X^\mathcal {G}_{e}\), and, like for G, the network flow component associated with \(\mathcal {G}\) needs to respect the flow balance constraints:

$$\begin{aligned} \qquad \sum \limits _{e \in v^\textrm{in}} X^\mathcal {G}_{e} = \sum \limits _{e \in v^\textrm{out}} X^\mathcal {G}_{e} \qquad \forall v \in \mathcal {N} \end{aligned}$$
(11)

To make sure that the work blocks corresponding to the flow in \(\mathcal {G}\) (consisting of “concrete” activity blocks) match the work blocks corresponding to the flow in G (consisting of activity block templates), the flows in both networks need to be linked. In order to achieve this, it does not suffice to simply link blocks based on the start and end period of the block; one also needs to account for the position of the activity blocks in the work blocks. This can be achieved by using the relative start period k of a block b within the work block as an additional matching criterion. Note that this information is present in the form of the state attribute \(s^\mathrm {\#pWb}_v\) in the state nodes of both networks G and \(\mathcal {G}\). Using the block b (characterized by start and end period) associated with an activity block arc and the relative position k obtained from the arc’s source node, we can define the arc set \(E^{\square }_{b,k}\) of all arcs in E representing a template activity block b (identified by start and end period) that starts in the relative period k of a work block. Analogously, we define the corresponding arc sets \(\mathcal {E}^{a}_{b,k}\) in \(\mathcal {E}\) that represent the activity blocks with start and end period given by block b that are assigned to activity \(a \in A\) and start at the relative period k in a work block.

Assuming that the set \(K_b\) represents all possible work-block relative start periods of an activity block template b, we are ready to formulate the linking constraint that connects the flows in G and \(\mathcal {G}\):

$$\begin{aligned} \qquad \sum \limits _{e \in E^{\square }_{b,k}} X_{e} = \sum \limits _{a \in A} \sum \limits _{e \in \mathcal {E}^{a}_{b,k}} X^\mathcal {G}_{e} \qquad \forall b \in B^{\square }, k \in K_b \end{aligned}$$
(12)
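The role of the relative start period k in the linking constraints (12) can be illustrated by a small check that aggregates the flow of both networks per pair (b, k). The function name, the tuple-based arc encoding and the example flows are assumptions made for this sketch.

```python
from collections import Counter


def linking_violations(template_arcs, activity_arcs):
    """Compare both sides of the linking constraints (12).

    template_arcs: iterable of (block, k, flow) for activity block arcs of G.
    activity_arcs: iterable of (block, k, activity, flow) for arcs of the
    work block composition network.
    Returns {(block, k): (template flow, activity flow)} for violated pairs.
    """
    lhs, rhs = Counter(), Counter()
    for block, k, flow in template_arcs:
        lhs[(block, k)] += flow
    for block, k, _activity, flow in activity_arcs:
        rhs[(block, k)] += flow
    return {key: (lhs[key], rhs[key])
            for key in set(lhs) | set(rhs) if lhs[key] != rhs[key]}


# Hypothetical flows: block (7, 9) starts a work block twice (k = 0) and
# follows three earlier work periods once (k = 3).
template_arcs = [((7, 9), 0, 2), ((7, 9), 3, 1)]
matching = [((7, 9), 0, "a2", 1), ((7, 9), 0, "a3", 1), ((7, 9), 3, "a1", 1)]
mismatching = [((7, 9), 0, "a2", 3)]  # same total flow, wrong positions

ok = linking_violations(template_arcs, matching)
bad = linking_violations(template_arcs, mismatching)
```

In the second case, the total flow per block agrees in both networks (three units each), so a link based on start and end period alone would not detect that the work blocks do not match.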

Then, using set \(\mathcal {E}^\textrm{cov}_{a,p}\) referring to the set of all arcs in \(\mathcal {E}\) representing activity blocks that cover activity a in period p, we formulate the following covering constraints, followed by the domains of the flow variables associated with the arcs \(e \in \mathcal {E}\).

$$\begin{aligned} \sum \limits _{e \in \mathcal {E}^\textrm{cov}_{a,p}} X^\mathcal {G}_{e} + Y^\textrm{u}_{a,p} - Y^\textrm{o}_{a,p}&= d_{a,p}&\forall a \in A, p \in P \end{aligned}$$
(13)
$$\begin{aligned} X^\mathcal {G}_e&\in \mathbb {Z}^+_0&\forall e \in \mathcal {E} \end{aligned}$$
(14)

In addition to the constraints (11)–(14) described in this subsection, the full model with two coupled state-expanded networks contains the objective function (1), the flow balance constraints (2) for G, the flow size constraint (3) to model a fixed number of employees, and the variable domains (5) and (6).

Finally, observe that for MASSP variants such as the flexible variant of the Dahmen problem that only need to consider the activity change rule and minimum and maximum duration rules, the implicit activity change constraints introduced in Subsection 4.2 ensuring the correct decomposability into work blocks can be applied for the work block composition network \(\mathcal {G}\). This way, the state attribute \(s^\textrm{prevAct}_v\) can be dropped from the state definition of the work block composition network. The implicit activity change constraints read as follows when applied to \(\mathcal {G}\):

$$\begin{aligned} \qquad \sum \limits _{e \in v^\textrm{in}_{\lnot a}} X^\mathcal {G}_{e} \ge \sum \limits _{e \in v^\textrm{out}_a} X^\mathcal {G}_{e} \qquad \forall v \in \mathcal {N}^\textrm{inWb}, a \in A \end{aligned}$$
(15)

6 Computational results

In this section, we report the results from our experiments with two sets of instances, one for the Demassey problem, and one for the Dahmen problem. We start in Subsection 6.1 with experiments comparing the different model variants proposed in the last section both with respect to the size of the model instances and with respect to solution times. In Subsection 6.2, we compare the results from the best model variants to those obtained with state-of-the-art exact approaches from the literature.

All models were implemented in Python and solved with Gurobi 9.1 with standard settings, except that the barrier method was used to solve the root relaxation. The computer used for the experiments was a notebook with an Intel Core i7-10750H processor clocked at 2.66 GHz with 6 cores and 32 GB RAM.

6.1 Experiments with different model variants

In this section, we compare the model variants proposed in Sects. 4 and 5 for two sets of instances, one for the Demassey problem, and one for the Dahmen problem.

6.1.1 Demassey problem

The first set of experiments is conducted with the Demassey problem described in Sect. 2.1. The instances were first introduced in Demassey et al. (2005) and later used in several other papers such as Côté et al. (2011b), Côté et al. (2013) and Dahmen et al. (2018). The instance set comprises 100 instances divided into 10 groups. Each of the groups is characterized by a given number of activities from 1 to 10, and the instances within each group vary with respect to the demand profile and with respect to the number of employees.

As described in Sect. 2.1, in the Demassey problem, each block of work needs to be assigned to a single activity, or, in other words, a change between different activities is only legal if there is a break in between. From a modeling perspective, this means that we do not need to deal with the composition of work blocks from multiple activity blocks, making it unnecessary to deal with work block composition rules such as the activity change requirement. As a consequence, we only compare two modeling approaches for the Demassey instances:

  • ActivitySEN: The plain model from Sect. 4 in which all rules are encoded in a single state-expanded network that is based on concrete activity blocks.

  • TemplateSEN: The model from Sect. 5.1 using a template block-based state-expanded network. Note that for the Demassey problem, we do not need a second state-expanded network for composing work blocks.

Table 1 Results for the Demassey instances

Table 1 presents the results from experiments with these two model variants. For both approaches, the table reports results for each of the 10 instance groups. Note that within one instance group, the problem structure is identical which means that for each instance group and for each model variant, the number of variables (Vars) and constraints (Cons) is identical. The other columns reported for each model variant are the number of instances solved to optimality with a time limit of 30 min and the average solution time in seconds; instances not solved to optimality are counted with 1800 s.
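The aggregation convention used for the solution time columns can be stated precisely with a short sketch; the function name and the example values are hypothetical and do not correspond to any entry of Table 1.

```python
def summarize(times, optimal, time_limit=1800.0):
    """Return (#instances solved to optimality, average solution time in s).

    Instances not solved to optimality are counted with the time limit.
    """
    counted = [t if ok else time_limit for t, ok in zip(times, optimal)]
    return sum(optimal), sum(counted) / len(counted)


# Hypothetical group of three instances, one of which hits the time limit.
solved, avg_time = summarize([12.0, 1800.0, 48.0], [True, False, True])
```

Under this convention, the reported averages are lower bounds on the true average time to optimality whenever a group contains unsolved instances.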

The results show that for the group with a single activity (yielding a mono-activity shift scheduling problem), the “plain” activity-based model yields smaller model instances and a shorter solution time. This is due to the fact that the template-based model introduces a “template flow” that is then only mapped to a single activity and thus, in that case, using the template-based model does not make much sense.

With an increasing number of activities, the template-based models can play out their strength: While the number of variables rapidly increases for the plain activity-based model, the size of the template-based model instances only grows moderately. Specifically, for the 10-activity instances, the activity-based model exhibits about eight times as many variables as the template-based model. Clearly, the model size impacts the performance: For the biggest instances, the activity-based model instances were not always solved to optimality within 30 min; in total, only 90 of the 100 instances are solved to optimality and the average solution time is 365 s. This contrasts with the template-based model: Using this model, all instances are optimally solved within half an hour, and the average solution time is only 60 s.

6.1.2 Dahmen problem

The second set of experiments deals with the Dahmen problem described in Sect. 2.2. For this problem, Dahmen et al. (2018) present experiments with 540 instances that vary with respect to the following features:

  • Time granularity / length of a single time period (15, 30, or 60 min).

  • Number of activities (3, 6 or 9).

  • Degree of block flexibility (1, 2 or 3).

  • Number of shift types (1 or 2).

  • The set of possible shift start periods (shifts can start every ith period with \(i \in \{1,2,3\}\)).

  • Demand profile index (1–5).

By grouping the 540 instances according to the first five features, one obtains 108 instance groups. Within each of these groups, all five instances have the same problem structure and only vary with respect to the demand profile.

In contrast to the Demassey problem, in the Dahmen problem, activity changes can happen within a work block. Furthermore, Dahmen et al. (2018) consider two problem variants: A basic (“flexible”) variant that imposes no restriction on the number of different activities assigned in a single work block, and a “restricted” variant in which at most two different activities are allowed per work block. In this section, we first deal with the flexible variant, followed by results for the restricted variant.

Table 2 Results for the Dahmen instances with time granularity of 15 min (flexible variant of the Dahmen problem)

In our computational experiments with the flexible variant of the Dahmen problem, we use the following model variants:

  • ActivitySEN: The plain model from Sect. 4 in which all rules are encoded in a single state-expanded network that is based on concrete activity blocks.

  • ActivitySEN+ChangeCons: A model with a single activity block-based network that does not encode the activity change requirement and instead uses the activity change constraints introduced in Sect. 4.2.

  • TemplateSEN: The model from Subsection 5.2 in which a template-block-based state-expanded network is linked to a second state-expanded network for work block composition that encodes the activity change requirement.

  • TemplateSEN+ChangeCons: The model from Subsection 5.2 with a template-block-based state-expanded network linked to a second state-expanded network for work block composition; the activity change requirement is enforced by the activity change constraints (15).

The results from using these model variants on the 15-minute instances from the Dahmen instance set are displayed in Table 2. Results for the other instances (all of which are solved to optimality, most in less than 10 s) can be found in Tables 6 and 7 in the appendix.

Regarding the model sizes, it turns out that the last model variant TemplateSEN+ChangeCons yields much smaller instances than the “plain” model: For the largest model instances, the plain model has almost 20 times as many variables and 5 times as many constraints. The difference in the model sizes is reflected in the solution time and the solution quality: With the plain model variant, only 150 of the 180 instances are solved to optimality within the time limit of 30 min. With the last model variant, in contrast, all instances are solved to optimality in slightly more than one minute on average, with the biggest instances being solved in 5 min on average. Interestingly, when comparing the two variants in the middle, the variant with a single network and activity change constraints yields bigger model instances than the model variant with template blocks without the constraints, but at the same time allows solving more instances to optimality.

Let us now turn to the restricted variant of the Dahmen problem in which at most two different activities are allowed to be assigned per work block. Table 3 compares the results from using the best model (TemplateSEN) for this variant with those from the best model for the flexible variant (TemplateSEN+ChangeCons). Note that we do not use the model with the activity change constraints here since the state information needed for modeling the “at-most-two activities” rule already contains all the information needed to model the activity change requirement. See Sect. 3.2 for a description of this state model and Sect. 5.2 for a description of the mathematical model involving two coupled state-expanded networks. Since the model instances for the restricted model are bigger (on average, about twice as big in terms of variables and constraints) and thus harder to solve, we increased the solution time limit to one hour for these models. The results show that while the model instances are bigger and the solution times are higher, all instances could still be solved to optimality in less than 30 min on average for each instance group; only five instances needed more than 30 min.

Dahmen et al. (2018) raise the interesting question of whether the two-activity restriction has a negative impact on solution quality. They suspected that there was no negative impact, but they were not able to solve all instances to optimality and thus could not give a definite answer for all instances in the problem set. In the experiments presented here, however, all instances were solved to optimality for both variants. The rightmost column (ObjDiff) in Tables 3, 6 and 7 reports the average difference in the objective between the restricted and the flexible variant, and it turns out that indeed, for all instances, the optimal objective function value from the restricted variant is the same as the optimal objective from the flexible variant.

Table 3 Results for the flexible and the restricted variant of the Dahmen problem: instances with time granularity of 15 min

6.2 Comparison to other exact approaches from the literature

In this section, we compare the results from our experiments to the results reported in the literature for existing state-of-the-art exact approaches for the MASSP. To the best of our knowledge, while exact approaches for the Demassey problem were considered in various publications, for example in Demassey et al. (2005), Demassey et al. (2006), Côté et al. (2011b), Côté et al. (2013) and Dahmen et al. (2018), the only publication dealing with the Dahmen problem is the original publication.

Table 4 Comparing our approach to the state-of-the-art for the Demassey instances

As pointed out in Dahmen et al. (2018), probably the best existing exact approach for the Demassey problem is the implicit grammar model proposed in Côté et al. (2011b). In Table 4, we compare the results from our best model variant to those reported in Dahmen et al. (2018) for the implicit grammar model which are, to the best of our knowledge, the most recent results reported for the grammar model. As in the original paper (Côté et al., 2011b), Dahmen et al. (2018) solved the grammar model only until a MIP gap of 1% was reached. To allow a better comparison, we also ran a series of experiments with the same MIP gap, the results of which are reported in Table 4. When comparing the number of variables and the number of constraints, our model is smaller than the grammar model; this difference increases with the number of activities per instance. For the 10-activity instances, the grammar model instances exhibit about 25% more variables and 50% more constraints than our model instances. Both approaches solved all instances to 1%-optimality within less than a minute on average.

Observe, however, that the results were obtained on different hardware, and with different MILP solvers. Specifically, regarding hardware, our computer is a notebook with a newer processor clocked at 2.66 GHz with 6 cores, 12 threads and 32 GB RAM, while the experiments from Dahmen et al. (2018) with the grammar models were carried out on a server with a Dual Intel Xeon X5650 processor clocked at 2.66 GHz with in total 12 cores and 24 threads and 72 GB of RAM. With respect to the solvers, it can be expected that the solver used in our experiments (Gurobi 9.1) is faster than CPLEX 12.6 used in Dahmen et al. (2018). Taking into account all these aspects, at this point, we cannot tell which of the models performs better with respect to solution time. However, it appears unlikely that if evaluated with the same hardware and the same solver, one approach would completely outperform the other.

Table 5 Comparing our approach to the state-of-the-art for the Dahmen instances with time granularity of 15 min

Table 5 deals with the Dahmen instances. For these instances, Dahmen et al. (2018) show that their implicit formulation yields much better results than the grammar formulation. Table 5 compares the results obtained with this implicit formulation to those obtained with our best model (TemplateSEN+ChangeCons) for the instances with a time granularity of 15 min intervals for the flexible variant of the Dahmen problem.

It turns out that the implicit model has far fewer constraints than our model for all instances. This is due to the fact that in the implicit model, all pre- and post-break work blocks are explicitly enumerated while in our model, the work blocks are composed by a flow in a state-expanded network in which each node corresponds to a constraint. When it comes to the number of variables, for the smallest instances with three activities, the instances from the implicit formulation also exhibit a much smaller number of variables. This is again related to the enumeration of the pre- and post-break work blocks in the implicit model: For the instances with only three activities and little flexibility, the number of enumerated work blocks is relatively small. The opposite is the case for the instances with nine activities and a lot of flexibility where the number of work blocks to enumerate is very large. For these instances, our model exhibits far fewer variables; for the biggest instance, the implicit model exhibits more than 20 times as many variables as our model.

With regard to the solution performance of the models, the caveats stated above regarding hardware and solver software also apply here. However, the differences in performance are much larger than those reported for the Demassey problem, in particular for the large instances: with our model, all instances are solved to optimality, and even the largest instances are solved to optimality in 5 min or less on average. The implicit model from Dahmen et al. (2018), however, seems to struggle with the large instances: even within 3 h of computation time, 45 of the instances are not solved to optimality. These results, in combination with the huge difference in the number of variables, indicate that our model performs substantially better than the implicit model for large instances, although such statements must be made with care given the differences in hardware and software mentioned above.

Regarding the “restricted” variant, we do not include an explicit comparison here since Dahmen et al. (2018) do not report detailed figures with respect to model size and instances solved to optimality per instance group. Nonetheless, it is interesting to note that while our model instances for the restricted variant are larger than those for the basic flexible variant (see above), the opposite is the case for the implicit model from Dahmen et al. (2018): since it relies on enumerating feasible work blocks, the instances get smaller for the restricted variant (by 29 % on average, as reported in that paper). This makes the restricted models easier to solve: instead of 45 instances not being solved to optimality within 3 h for the flexible problem, only 28 instances cannot be solved to optimality within that time for the restricted variant. Recall, however, that our model instances for the restricted problem can be solved to optimality within 296 s on average for the 15-minute instances; for the largest instance groups, the average solution time to optimality is around 20 min.

Table 6 Results for both variants of the Dahmen problem: instances with time granularity of 30 min
Table 7 Results for both variants of the Dahmen problem: instances with time granularity of 60 min

7 Conclusions

This paper presents a new MILP modeling approach for multi-activity shift scheduling problems based on state-expanded networks. In particular, it presents two techniques for dealing with the explosion of the size of the networks for large-scale instances: a set of implicit constraints ensuring that an aggregated flow can be decomposed into shifts respecting the activity change requirement, and the concept of template blocks, which allows modeling different aspects of composing shifts in different, coupled state-expanded networks. When combined, the two techniques allow reducing the size of the main state-expanded network by a factor on the order of the square of the number of activities in the instance under consideration.

We show how this approach can be used to model different MASSP variants. In particular, we provide experimental results for two large sets of MASSP instances with 100 and 540 instances, respectively.

Our experiments show that our approach is at least competitive with the best approach on the first set of instances, and is clearly better than the previously best approach for large instances from the second set. This is shown by the fact that we are able to solve all instances to optimality, including 70 previously unsolved instances from the second set, within less than 45 min of computation time, with most instances being solved much faster.

8 Preprint and relation to prior work

A preprint version of this paper can be found at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3798667. The previous title of the preprint was “State-Expanded Network Formulations for Multi-Activity Shift Scheduling”. In its latest revision, the title was changed to “Block-based state-expanded network formulations for multi-activity shift scheduling” in order to emphasise the important modeling decision of basing the state-expanded network on blocks of consecutive assignments and to highlight the difference from the state-expanded network model used in (Porrmann & Römer, 2021).

The article (Porrmann & Römer, 2021) was published before submitting this work to the Journal of Scheduling. The key differences between the contributions of (Porrmann & Römer, 2021) and those presented in this manuscript are discussed in the main text and can be summarised as follows:

  • Porrmann and Römer (2021) use a different state model which implies a different structure of the state-expanded network underlying the MILP formulation. In (Porrmann & Römer, 2021), a single activity block, that is, a sequence of consecutive assignments of the same activity, can be composed of multiple arcs. This requires that the state model contains a state attribute that allows computing the duration of an activity block. In the model in the present paper, every arc that represents an assignment corresponds to a full activity or break block, making it unnecessary to track any activity block-related aspects in the state model.

  • This block-based network structure allows us to develop the exact techniques (the implicit modeling of the activity change rule and the two-level model combining a model layer based on template blocks with an activity assignment layer) that form key contributions of this paper. While, as can be seen in the computational results, the block-based model itself still struggles with the largest instances, applying both techniques eventually permits solving all Demassey and Dahmen instances to optimality, many of them for the first time. Since the model employed in (Porrmann & Römer, 2021) did not permit solving all Demassey instances efficiently, the key contribution of that paper was to introduce a Machine Learning-based approach to heuristically remove nodes and arcs from the state-expanded network.

  • Porrmann and Römer (2021) consider only the Demassey MASSP variant; the present paper considers both the Demassey and the Dahmen variants and the respective instances.
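The structural difference described in the first item above can be made concrete with a minimal sketch. It is a simplified illustration under stated assumptions and not the state model of either paper: the function names are hypothetical, and we assume that the only block-related rule is a maximum block duration `max_run` and that the block-based variant keeps the pair (period, last activity) as its state. When every arc covers a single period, the state has to carry the duration of the current activity block so far; when every arc covers a full block, this attribute is no longer needed.

```python
from itertools import product


def period_based_states(num_periods, activities, max_run):
    """States if each arc covers one period (toy model, assumed).

    The state records the period, the current activity and how many
    periods the current activity block has lasted so far.
    """
    return {
        (t, a, run)
        for t, a, run in product(range(1, num_periods + 1),
                                 activities,
                                 range(1, max_run + 1))
        if run <= t  # a block cannot be longer than the elapsed time
    }


def block_based_states(num_periods, activities):
    """States if each arc covers a full block (toy model, assumed).

    Only the period at which a block ends and its activity are recorded;
    the block duration is encoded in the arc itself.
    """
    return set(product(range(1, num_periods + 1), activities))


if __name__ == "__main__":
    acts = ["A", "B", "C"]
    print(len(period_based_states(32, acts, 8)))
    print(len(block_based_states(32, acts)))
```

In this sketch, the number of states of the period-based variant grows with the maximum block duration, while the block-based variant is independent of it; the price is a larger number of arcs per node, since one arc is needed for each admissible block duration.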