1 Introduction

The ongoing transformation of production industries causes a paradigm shift in manufacturing processes toward new technologies and innovative concepts, called the cyber, smart, digital or connected factory. The sector is entering its fourth revolution, characterized by a merging of computer networks and factory machines. At each link in production and supply chains, tools and workstations communicate constantly via the Internet and local networks. Machines, systems and products exchange information both among themselves and with the outside world.

Such production systems are installed for goods that are produced in high quantities but in different configurations. By optimizing the flow of production, manufacturers hope to speed up individualized production at a lower cost and in a more environmentally sound way. In the manufacturing practice of smart factories, there are not only lines with stations arranged one behind the other, but also more complex networks of stations at which assembly operations are performed. The considerable difference from flow lines, which can be analyzed by known methods, is that a number of required components are brought together to form a single unit for further processing.

For the optimization of the production, we explore the state space induced by a system of reactive modules, probably the most widely used simulation technique. Modern software systems support lightweight processes or threads. With a growing amount of non-determinism, however, such systems reach their limits when optimizing the concurrent acting of individual processes.

Agent-based simulation is a relatively new technique and consists of autonomous agents (self-directed objects that move about the system) and rules (which the agents follow to achieve their objectives). Agents move about the system, interacting with each other and with the environment, and are used to model situations in which the entities have some form of intelligence. While the underlying implementation to control the production is in fact a multiagent system, with goods that are associated with software agents that have desires of what they eventually become, and the intention to avoid too much queuing, in the optimization we rather look at the network of deterministic reactive processes.

Performance analysis of production systems is generally needed during the planning phase regarding the system design, when the decision for a concrete configuration of such a system has to be made. The planning problem arises, e.g., with the introduction of a new model or the installation of a new manufacturing plant. Because of the investments involved, optimization of the system is crucial. The expenditure for new machines, for buffer or handling equipment, and the holding costs for the expected work-in-process face revenues from sold products. The performance of a concrete configuration is characterized by the throughput, i.e., the number of items that are produced per time unit. Other performance measures are the expected work-in-process or the idle times of machines or workers.

In this paper, we utilize the state-of-the-art model checker SPIN [31] as a performance analysis and optimization tool for such manufacturing systems, together with its input language Promela to express the flow production of goods. Some twists are needed to adapt SPIN to the optimization of the system; these are described in the remainder of the text. Language features from the latest version of the model checker (including loops and native c-code verification) are exploited. The text pushes forward the exchange of modeling and exploration techniques between model checking and action planning, while studying an industrial case for the techniques to reach out to the real world.

Our running case study is the Z2, a physical monorail system for the assembly of taillights. Unlike most production systems, the Z2 employs agent technology to represent autonomous products and assembly stations. We formalize the production floor in this manufacturing process as a system of communicating processes and apply SPIN to analyze its behavior. Using optimization mechanisms implemented on top of SPIN, in addition to verifying the correctness of the model, we exploit its exploration process for the optimization of the production.

Instead of a simple objective like the number of steps in the plan, we look at a more complex cost function that computes the makespan based on travel and queue waiting times over a set of transport agents.

The algorithmic contribution of this text is general cost optimization via constrained branch-and-bound. The optimization approaches originally invented by Ruys for SPIN were designed for traversing state space trees [45, 46], while the proposed approaches also support the traversal of state space graphs. Our argument in favor of using SPIN in this case study is that there is a natural correspondence between the queuing of messages in a communication channel and of products on a monorail to be processed in assembling stations. We also look into a framework of Monte Carlo search for comparison.

The paper is structured as follows. First, we review related work on industrial production and planning via model checking. Next, we introduce our Z2 case study of taillight assembly with its various manufacturing stages. The modeling as a distributed set of communicating processes (agents) is assisted by one possible formalization. We define the problem objective to be minimized and give a proof that the production flow problem is at least NP-hard. Afterward, we turn to the intricacies of the Promela model specification, introducing the rail switches as proctypes and the monorails as communication channels. The deterministic optimization scheme and its extensions are discussed, covering depth-first branch-and-bound, guarded branching, C-expressions and process synchronization. Then, we describe an alternative for analyzing the Z2 model applying Monte Carlo search. In the empirical part, we study the effectiveness of the proposed approaches. In addition to the single-player Monte Carlo game framework implementation, we look at the parameterization of SPIN and at two different models of simulation time.

2 Related work

Since the origin of the term artificial intelligence, the automated generation of plans for a given task has been seen as an integral part of problem solving in a computer. In action planning [23], we are confronted with the descriptions of the initial state, the goal (states) and the available actions. Based on these, we want to find a plan containing as few actions as possible (in case of unit-cost actions, or if no costs are specified at all) or with the lowest possible total cost (in case of general action costs).

The process of fully automated property validation and correctness verification is referred to as model checking [12]. Given a formal model of a system and a property specification in some form of temporal logic (such as LTL [22]), the task is to validate, whether or not the specification is satisfied in the model. If not, the model checker returns a counterexample trace as a witness for the falsification of the property.

Planning and model checking have much in common [10, 24]. Both rely on the exploration of a potentially large state space of system states. Usually, model checkers only search for the existence of specification errors in the model, while planners search for a path from the initial state to one of the goal states. Moreover, there is rising interest in planners that prove unsolvability [30], and in model checkers that produce minimal counterexamples [18].

This form of planning via model checking was pioneered by Abdeddaïm and Maler, looking at job-shop scheduling problems with timed automata [1]; by Brinksma and Mader [6] in the verification and optimization of a PLC control schedule; and by Wijs [51] in a general treatment of analyzing and optimizing system behavior in time.

In terms of leveraging state space search, over the last decades there has been much cross-fertilization between the fields. For example, based on Satplan [?], bounded model checkers exploit SAT and SMT representations [2, 4] of the system to be verified, while directed model checkers [16, 35] exploit planning heuristics to improve the exploration for falsification; partial-order reduction [25, 50] and symmetry detection [20, 38] limit the number of successors, while symbolic planners [11, 17, 33] apply functional data structures like BDDs to represent sets of states succinctly.

Especially in open, unpredictable, dynamic, and complex environments, multiagent systems are applied to determine adequate solutions for transport problems. For example, agent-based commercial systems are used within the planning and control of industrial processes [13, 29] as well as within other areas of logistics [7, 19]. A comprehensive survey is provided by [42].

Flow line analysis is often done exploiting queuing theory [8, 39]. Pioneering work in analyzing assembly queuing systems with synchronization constraints considers assembly-like queues with unlimited buffer capacities [27]. It shows that the time an item has to wait for synchronization may grow without bound, while limiting the number of items in the system works as a control mechanism and ensures stability. Existing work on assembly-like queues with finite buffers assumes exponential service times [3, 32, 36].

Model checking production flow is rare. Timed automata were used for simulating material flow in agricultural production [28]. There are, however, numerous attempts to apply model checking to validate the work of multiagent systems. The LORA framework [52, 53] uses labeled transition and Kripke systems for characterizing the behavior of the agents (their beliefs, desires and intentions), and temporal logics for expressing their interplay, as well as the progression of knowledge. Alternatives consider a multiagent system as a game, in which agents, either in separation or cooperatively, optimize their individual outcome [47]. Communication between the agents is available via writing to and reading from channels or via common access to shared variables. Other formalization approaches include work in the context of the MCMAS tool by Lomuscio. Recently, there have been some approaches to formalize multiagent systems as planning problems [41].

Fig. 1

Assembly scenario for taillights [40]

Fig. 2

Assembly states of taillights [21]

3 Case study: Z2

One of the few successful real-world implementations of multiagent flow production is the so-called Z2 production floor unit [21, 40]. The Z2 unit consists of six workstations where human workers assemble parts of automotive taillights. The system allows production of certain product variations and reacts dynamically to any change in the current order situation, e.g., a decrease or an increase in the number of orders of a certain variant. As individual production steps are performed at the different stations, all stations are interconnected by a monorail transport system. The structure of the transport system is shown in Fig. 1. On the rails, autonomously moving shuttles carry the products from one station to another, depending on the product's requirements. The monorail system has multiple switches which allow the shuttles to enter, leave or pass workstations and the central hubs.

The goods transported by the shuttles are also autonomous, which means that each product decides on its own which variant to become and which station to visit. This way, a decentralized control of the production system is possible.

The modular system consists of six different workstations, each operated manually by a human worker and dedicated to one specific production step. At production steps III and V, different parts can be used to assemble different variants of the taillights as illustrated in Fig. 2. At the first station, the basic metal-cast parts enter the monorail on a dedicated shuttle. The monorail connects all stations; each station is assigned to one specific task, such as adding bulbs or electronics. Each taillight is transported from station to station until it is assembled completely.

From the given case study, we derive a more general notion of flow production for an assembly line network. System progress is non-deterministic and asynchronous, while the progress of time is monitored.

Definition 1

(Flow production)

A flow production is a tuple \(F = (A,P,G,\prec ,S,Q)\) where

  • A is a set of all possible assembling actions;

  • P is a set of n products; each \(P_i \in P\), \(i \in \{1,\ldots ,n\}\), is a set of assembling actions, i.e., \(P_i \subseteq A\);

  • \(G = (V,E,w,s,t)\) is a graph with start node s, goal node t, and weight function \(w: E \rightarrow I\!\!R_0\);

  • \(\prec = (\prec _1,\ldots ,\prec _n)\) is a vector of assembling plans with each \(\prec _i \subseteq A \times A\), \(i \in \{1,\ldots ,n\}\), being a partial order;

  • \(S \subseteq E\) is the set of assembling stations induced by a labeling function \(\rho : E \rightarrow A \cup \{\emptyset \}\), so that \(S = \{e \in E \mid \rho (e) \ne \emptyset \}\);

  • Q is a set of (FIFO) queues, all of finite size together with a labeling \(\chi : E \rightarrow Q\).

Products \(P_i\), \(i \in \{1,\ldots ,n\}\), travel through the network G, meeting their assembling order \(\prec _i \subseteq A \times A\) of the assembling actions A.

Definition 2

(Run, plan, path)

Let \(F = (A, P, G, \prec , S, Q)\) be a flow production. A run \(\pi \) is a set of triples \((e_j, t_j, l_j)\) of edges \(e_j\), execution time-stamps \(t_j\), and queue insertion positions \(l_j\). The run partitions into a set of n plans \(\pi _i = (e_1,t_1,l_1),\ldots ,(e_{m_i},t_{m_i},l_{m_i})\), one for each product \(P_i\), \(i \in \{1,\ldots ,n\}\). Each plan \(\pi _i\) corresponds to a path in graph G, starting at the initial node s and terminating at the goal node t.

In the real-world implementation of the Z2 system, every assembly station, every monorail shuttle and every product is represented by a software agent. Even the RFID readers which keep track of product positions are represented by software agents which decide when a shuttle may pass or stop. The agent representation is based on the well-known Java Agent Development Framework (JADE) and relies heavily on its Foundation for Intelligent Physical Agents (FIPA)-compliant messaging components.

Fig. 3

Preconditions of the various manufacturing stages

Most agents in this system resemble simple reflex agents. These agents just react to requests or events which were caused by other agents or the human workers involved in the manufacturing process. In contrast, the agents which represent products are actively working toward their individual goal of becoming a complete taillight and reaching the storage station. In order to complete its task, each product has to reach sub-goals which may change during production as the order situation may change. The number of possible actions is limited by sub-goals which already have been reached, since every possible production step has preconditions as illustrated in Fig. 3.

The product agents constantly request updates regarding queue lengths at the various stations and the overall order situation. The information is used to compute the utility of the expected outcome of every action which is currently available to the agent. High utility is given when an action leads to fulfillment of an outstanding order and takes as little time as possible. Time, in this case, is spent either on actions, such as moving along the railway or being processed, or on waiting in line at a station or a switch. Each agent individually makes its decisions [43].
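The utility computation described above can be sketched in a few lines. This is a hypothetical reconstruction, not the actual Z2 agent code: the weighting constant, the field names and the queueing estimate are all illustrative assumptions.

```python
def action_utility(action, outstanding_orders, queue_lengths):
    """Hypothetical utility of a candidate action for a product agent:
    fulfilling an outstanding order is rewarded, expected time is penalized.
    ORDER_BONUS and all field names are illustrative assumptions."""
    ORDER_BONUS = 100.0
    # expected time = travel time plus estimated waiting in line at the station
    expected_time = (action["travel_time"]
                     + queue_lengths.get(action["station"], 0) * action["service_time"])
    bonus = ORDER_BONUS if action["variant"] in outstanding_orders else 0.0
    return bonus - expected_time
```

An agent would evaluate this for every currently available action and pick the maximizer.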

More generally, the objective of products in such a flow production system can be formalized as follows.

Definition 3

(Optimization objective, travel and waiting time)

Let \(F = (A, P, G, \prec , S, Q)\) be a flow production. The optimization objective is to minimize

$$\begin{aligned} \max _{1\le i \le n} wait(\pi _i)+time(\pi _i) \end{aligned}$$

over all possible paths with initial node s and goal node t, where

  • \(time(\pi _i)= \sum _{e \in \pi _i} w(e)\) is the travel time of product \(P_i\), alias the sum of edge costs;

  • \(wait(\pi _i)= \sum _{(e,t,l),(e',t',l') \in \pi _i,\, e' \in Pred(e)} t - (t' + w(e'))\) is the waiting time, where \(Pred(e)= \{e' = (u,v) \in E \mid e = (v,v') \text{ for some } v' \in V\}\) is the set of predecessor edges of e.
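A direct reading of Definition 3 can be sketched as follows. The sketch simplifies the formal definition by assuming that consecutive plan entries traverse predecessor edges, so the Pred relation is implicit in the plan order; plans are lists of (edge, time-stamp, queue position) triples and w maps edges to weights.

```python
def travel_time(plan, w):
    """time(pi_i): sum of edge weights along the plan."""
    return sum(w[e] for (e, t, l) in plan)

def waiting_time(plan, w):
    """wait(pi_i): for consecutive edges, departure time t minus
    arrival time t' + w(e') on the predecessor edge."""
    return sum(t - (tp + w[ep])
               for (ep, tp, lp), (e, t, l) in zip(plan, plan[1:]))

def makespan(plans, w):
    """Objective of Definition 3: maximum of wait + travel over all products."""
    return max(travel_time(p, w) + waiting_time(p, w) for p in plans)
```

For two products sharing the edges "su" and "ut", the product that has to wait one time unit before entering "ut" determines the makespan.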

It is not difficult to show that the production flow problem is at least NP-hard. To obtain a decision rather than an optimization problem, we choose the time threshold large enough so that only goal achievement is of interest. The reduction from 3-SAT then is as follows.

We take an instance of 3-SAT with n variables and m clauses. Each clause is a disjunction of literals \(l_1 \vee l_2 \vee l_3\), where each literal is an element of \(\{x_1,\ldots ,x_n\} \cup \{\lnot x_1,\ldots ,\lnot x_n\}\). Every clause shuttle (of m shuttles) consists of variable assemblies, representing its 3 literals. At 2n stations, the variables are set either to true or to false (like a clear or colored bulb). The assignment, therefore, avoids a variable being set to both true and false. Once a variable \(x_i\) is fixed to a truth value, all assembling plans of the other shuttles are dynamically adjusted to only allow this truth value assignment to the variable \(x_i\) (and the opposite assignment to \(\lnot x_i\)).

The terminal state is to have all shuttles arrive at the target location, so it is reached if and only if the 3-SAT formula is satisfiable.

The Z2 multiagent system was developed strictly for the purpose of controlling the Z2 monorail hardware setup. Nonetheless, due to its hardware abstraction layer [40], the Z2 multiagent system can be adapted into other hardware or software environments. By replacing the hardware with other agents and adapting the monorail infrastructure into a directed graph, the Z2 multiagent system can be transferred to a virtual simulation environment [26]. Such an approach, which treats the original Z2 agents like black boxes, can easily be hosted by a JADE-based event-driven multiagent system simulation platform. Experiments show that the simulated and the real-world scenarios match. For this study [26], the authors extended the model with timers to measure the time taken between two graph nodes. For the analysis, we simplify the system to reactive processes.

4 Promela model

Promela is the input language of the model checker SPIN, the popular ACM-awarded open-source software verification tool, designed for the formal verification of multi-threaded software applications and used by thousands of people worldwide. Promela defines asynchronously running communicating processes, which are compiled to finite-state machines. It has a c-like syntax and supports bounded channels for sending and receiving messages.

Channels in Promela follow the FIFO principle. They implicitly maintain the order of incoming messages and can be limited to a certain buffer size. Consequently, we are able to map edges to communication channels. Unlike in the original Z2 multiagent system, the products are not considered to be decision-making entities within our Promela model. Instead, the products are represented by messages which are passed along the node processes, which resemble switches, station entrances and exits.

Fig. 4

Weighted graph model of the assembly scenario

We provided the physical model with timers to measure the time taken between two graph nodes. Since the hardware includes many RFID readers along the monorail, which all are represented by an agent and a node within the simulation, we simplified the graph and kept only three types of nodes: switches, production station entrances and production station exits. The resulting abstract model of the system is a weighted graph (see Fig. 4), where the weight of an edge denotes the traveling/processing time of the shuttle between two respective nodes.

The Promela model is designed to apply branch-and-bound optimization in order to evaluate the optimal throughput of the original system. Instead of local decision making, the various processes have certain non-deterministic options of handling incoming messages, each leading to a different system state. The model checker systematically computes these states and memorizes paths to desirable outcomes when it ends up in final states. As mentioned before, decreasing production time for a given number of products increases the utility of a final state.

The important aspect for branch-and-bound optimization is that SPIN does not stop with proving but with disproving a property, and that it can be forced to go on searching via its termination code. Either there is no continuation, in which case time is progressed automatically (by the so-called watchdog process), or the goal of the simulation has been reached, in which case the assertion assert(!final) is violated, a counterexample is generated, the trail file is copied, and the number of trails is counted.

We derive a Promela model of the Z2 as follows. First, we define global settings on the number of stations and number of switches. We also define the data type storing the index of the shuttle/product to be byte. In the model, switches are realized as processes and edges between the units by the following channels.

figure a

As global variables, we have bit-vectors for marking the different assemblies.

figure b

Additionally, we have a bit-vector that denotes when a shuttle with a fully assembled item has finally arrived at its goal location. A second bit-vector is used to set for each shuttle whether it has to acquire a colored or a clear bulb.

figure c

A switch is a process that controls the flow of the shuttles. In the model, a non-deterministic choice is added to either enter the station or to continue traveling onwards on the cycle. Three of the four switching options are made available, as immediately re-entering a station from its exit is prohibited.

figure d

The entrance of a manufacturing station takes the item from the according switch and moves it to the exit. It also ensures that the manufacturing complies with the capabilities of the station.

First, the assembling of product parts is different at each station: In Stations 1 and 3, we have the insertion of bulbs (Station 1 provides colored bulbs, Station 3 provides clear bulbs); Station 2 assembles the seal, Station 4 the electronics, and Station 0 the cover. Station 5 is the storage station, where empty metal casts are placed on the monorail shuttles and finished products are removed to be taken into storage. Second, there is a partial order on the respective product parts to allow flexible processing and a better optimization based on the current load of the ongoing production.

figure e

An exit is a node that is located at the end of a station, at which assembling took place. It is connected to the entrance of the station and the switch linked to it.

figure f

A hub is a switch that is not connected to a station but provides a shortcut in the monorail network. Again, three of the four possible shuttle movement options are provided.

figure g

In the initial state, we start the individual processes, which represent switches and hereby define the network of the monorail system.

Moreover, initially, we have that the metal cast of each product is already present on its carrier, the shuttle. The coloring of the taillights can be defined at the beginning or in the progress of the production. Last, but not least, we initialize the process by inserting shuttles on the starting rail (at Station 5).

figure h

We also made heavy use of the keyword atomic, which enhances the exploration for the model checker by allowing it to merge states within the search. In contrast to the more aggressive d_step keyword, in an atomic block communication queue actions may block, which is why we chose to use an atomic block around each loop.

5 Optimized scheduling

Inspired by [6, 37] and [45], we applied and improved branch-and-bound (BnB) for the optimization. Branching is the process of spawning subproblems, while bounding refers to ignoring partial solutions that cannot be better than the current best solution. To this end, lower and upper bounds are maintained as global control values on the solution quality, which improves over time.

For applying BnB to flow production systems, we extend depth-first search (DFS) with upper (and lower) bounds. In this context, branching corresponds to the generation of successors, so that DFS can be cast as generating a branch-and-bound search tree. One way of obtaining a lower bound L for the problem state u is to apply an admissible heuristic h with \(L(u) = g(u)+h(u)\), where g denotes the cost for reaching the current node from the root, and h is admissible, i.e., it never overestimates the remaining cost to reach a goal.

As with standard DFS, the first solution obtained might not be optimal. With depth-first branch-and-bound (DFBnB), however, the solution quality improves over time together with the global value U until eventually the lower bound L(u) at some node u is equal to U. In SPIN, the trivial heuristic \(h =0\) is used, but in HSF-Spin [16], a number of heuristic functions have been implemented. We obtain the following result.

For an admissible heuristic function h, the DFBnB procedure will eventually find the optimal solution to the flow production problem \(F = (A,P,G,\prec ,S,Q)\). It computes costs for partial runs and extends partial schedules incrementally. The objective function is monotonically increasing. Only inferior paths, i.e., ones that cannot be extended to a solution better than the currently best one, are pruned. As the state space is finite, the search will eventually terminate and return the optimal solution.
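The DFBnB procedure may be sketched as follows; the interface (a successor function, a goal test and an edge-cost function as plain callables) is an illustrative assumption, and the trivial heuristic \(h = 0\) is used as the default.

```python
import math

def dfbnb(start, successors, is_goal, cost, h=lambda s: 0.0):
    """Depth-first branch-and-bound: returns the cost of an optimal solution.
    successors(s) yields next states; cost(s, t) is the edge cost;
    h must be admissible (never overestimates the remaining cost)."""
    best = math.inf
    def dfs(s, g, path):
        nonlocal best
        if g + h(s) >= best:          # bounding: prune inferior branches
            return
        if is_goal(s):
            best = g                  # improved upper bound U
            return
        for t in successors(s):       # branching: generate successors
            if t not in path:         # avoid cycles on the current path
                dfs(t, g + cost(s, t), path | {t})
    dfs(start, 0.0, {start})
    return best
```

On a toy graph, the first (suboptimal) solution found in depth-first order is later improved until the bound proves optimality.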

There are different options for finding optimized schedules with the help of a model checker that have been proposed in the literature.

First, in the Soldier (alias Hippie) model of [?], rendezvous communication to an additional synchronized process was used to increase the cost, depending on the transition chosen, together with a specialized LTL property to limit the total cost for the model checking solver. This approach, however, turned out to be too limited for our purpose.

An alternative proposal for branch-and-bound search is based on the support of native c-code in SPIN (introduced in version 4.0) [45]. One running example is the traveling salesman problem (TSP), but the approach is generally applicable to many other optimization problems. However, as implemented, there are certain limitations to the scalability for general state space problem graphs. Recall that the problem graph induced by the TSP is in fact a tree, generated by enumerating all possible permutations of the cities.

Following [6, 37] and [45], we applied improved branch-and-bound optimization within SPIN. Essentially, the model checker can find traces of several hundreds of steps and provides trace optimization by finding a minimized counterexample (if run with the parameter -i). As these traces are step-optimized, not cost-optimized, Ruys [45] proposed the introduction of a variable cost, which we extend as follows.

figure i

While the cost variable increases the amount of memory required for each state, it also limits the power of SPIN's built-in duplicate detection, as two otherwise identical states are considered different if reached by different accumulated costs. If the search space is small, so that it can be explored even with the enlarged state vector, then this option is sound and complete, and finally returns the optimal solution to the optimization problem. However, there might simply be too many repetitions in the model, so that introducing cost to the state vector leads to a drastic increase in state space size, and otherwise checkable instances become intractable. We noticed that even by concentrating on safety properties (such as the failed assertion mentioned above), the insertion of costs causes trouble.
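The blowup caused by adding accumulated cost to the state vector can be illustrated on a toy cyclic model; the ring topology and the edge weights below are invented for illustration only.

```python
from collections import deque

def count_states(include_cost, max_cost=20):
    """Breadth-first enumeration of a toy cyclic model (a single shuttle on
    a 4-node ring with one shortcut). A state is the position alone, or the
    pair (position, accumulated cost) as in the cost-variable encoding."""
    edges = {0: [(1, 2)], 1: [(2, 1), (3, 3)], 2: [(3, 2)], 3: [(0, 1)]}
    start = (0, 0)
    seen = {start if include_cost else start[0]}
    frontier = deque([start])
    while frontier:
        pos, c = frontier.popleft()
        for nxt, w in edges[pos]:
            if c + w > max_cost:      # bounded exploration horizon
                continue
            key = (nxt, c + w) if include_cost else nxt
            if key not in seen:
                seen.add(key)
                frontier.append((nxt, c + w))
    return len(seen)
```

Without cost in the state, duplicate detection collapses the cycle to the four positions; with cost included, every distinct accumulated cost yields a new state.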

5.1 Guarded branching

For our model, cost is tracked for every shuttle individually. The cost of the most expensive shuttle ride fixes the duration of the whole production process. Furthermore, the total cost value provides insight regarding unnecessary detours or long waiting times. Hence, minimizing both criteria are possible optimization goals of this model.

In Promela, every non-deterministic choice (in do-od blocks) is allowed to contain an unlimited number of possible options for the model checker to choose from. In theory, the model checker chooses randomly between these options, but at least in the deterministic enumeration of the state space, the SPIN model checker obeys the ordering of statements in non-deterministic choices. We used this feature extensively to order the statements so that SPIN is able to find a valid trace on one of its first execution branches.

It is also possible to add a condition: if the first statement of an option in a do-od block holds, SPIN will start to execute the statements that follow it; otherwise, it will pick a different option.

Since the model checker explores every possible state of the system, many of these states are technically reachable but completely useless from an optimization point of view. In order to reduce the state space size to a manageable level, we add constraints to the relevant receiving options in the do-od block of every node process.

Peeking into the incoming queue to find out which shuttle is waiting to be received is already considered a complete statement in Promela. Therefore, we exploit C-expressions to combine several operations into one atomic statement. For every station t and every incoming channel q, a function prerequisites(t, q) determines if the first shuttle in q meets the prerequisites for t, as given by Fig. 3.

figure j
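A sketch of such a prerequisites check might look as follows; the precondition table is an assumption in the spirit of Fig. 3 and Section 3, not the actual encoding used in the model.

```python
# Hypothetical precondition table in the spirit of Fig. 3: a station may
# process a shuttle only if the parts listed here are already assembled.
# The exact partial order is an assumption for illustration.
PRECONDITIONS = {
    "seal":        set(),                            # Station 2
    "bulb":        {"seal"},                         # Stations 1 and 3
    "electronics": {"seal"},                         # Station 4
    "cover":       {"seal", "bulb", "electronics"},  # Station 0, last step
}

def prerequisites(task, queue, state):
    """True iff the first shuttle in `queue` may be processed for `task`.
    `state[s]` is the set of parts already assembled on shuttle s."""
    if not queue:
        return False
    first = queue[0]
    return PRECONDITIONS[task] <= state[first]
```

Only the head of the queue is inspected, mirroring the peek into the incoming channel described above.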

At the termination of a successful run, we now extend the proposal of [45]. We use the integer array cost of the Promela model, which enables each process to keep track of its local cost vector; it is increased by the cost of each action as soon as the action is executed. This enables the model checker to print values to the output only if the current maximum and total cost values have improved.

figure k

For solution reconstruction, we write a file for each new cost value obtained, temporarily renaming the trail file as follows.

figure l

5.2 Process synchronization

Due to the nature of the state space search of the model checker, processes in the Promela model itself do not make decisions. Nonetheless, the given model is distributed, consisting of a varying number of processes which potentially influence each other if executed in parallel.

We address this problem by introducing event-based progression to the Promela model. Whenever a shuttle s travels along one of the edges, the corresponding message is put into a channel and the cost of the respective shuttle is increased by the cost of the given edge.

figure m

We introduce another atomic function receives(q) that returns true if the first item s of q has minimal cost, changing the receiving constraint to the following.

figure n

Within SPIN, the global Boolean variable timeout is automatically set to true when all current processes are unable to proceed, e.g., because they cannot receive a message. Consequently, when no shuttle can move, all processes will be blocked and timeout will be set to true. As suggested by Bosnacki and Dams [5], we add a process that enforces time progress whenever timeout occurs (final is a macro for reaching the goal).

figure o

Time delay is enforced as follows: if the minimum event in the future event list is blocked (e.g., a shuttle is not first in its queue), we compute the wake-up time of the second-best event. If the two are of the same time, a time increment of 1 is enforced. Otherwise, the second-best event time is taken as the new wake-up time for the first. It is easy to see that this strategy eventually resolves all possible deadlocks. Its implementation is as follows.

figure p
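Under the stated rule, the time advance can be sketched in Python; events are represented as (wake-up time, blocked) pairs with at least two events pending (the representation is ours, not the paper's embedded C code):

```python
def advance(events):
    """Resolve a blocked minimum event by delaying it.

    If the earliest event in the future event list is blocked, it is delayed
    to the wake-up time of the second-best event, or by one time unit if both
    share the same time; otherwise nothing changes. Returns the new minimum
    wake-up time. Assumes at least two pending events.
    """
    events.sort(key=lambda e: e[0])
    (t1, blocked), (t2, _) = events[0], events[1]
    if blocked:
        events[0] = (t1 + 1, True) if t1 == t2 else (t2, True)
    return events[0][0]
```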

In summary, the constraint-bounded depth-first exploration has turned into the automated generation of the underlying state space of the event system, using embedded C code to preserve the causality of actions and to simulate the future event list.

6 Monte Carlo search

In the encoding of the production problem as a single-player game, the player starts with an empty floor and chooses in each step an agent and a case (one for entrance and exit, three for switches) until the goal is achieved or a predefined length is exceeded. The smaller the makespan for each agent found by the algorithm, the higher the score of the play. More formally, a position of the game consists of all queue contents, shuttle locations, and their respective cost values. In the start position, the floor and all queues are empty; the moves are the set of allowed actions for each queue. The set of final positions consists of all states in which either all individual goals are reached or the maximal step size is exceeded, and the score function adds a constant for each individual unreached goal, on top of the maximum of the individual cost values.

The components of the game induce a search tree in the natural way with the final positions as leaves. A play(out) is a path from the root to some leaf.
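A minimal Python sketch of this game encoding follows (the data layout and the penalty constant are assumptions; the text only states that a constant is added per unreached goal):

```python
GOAL_PENALTY = 1000  # hypothetical constant per unreached goal

def initial_position(num_shuttles, queue_names):
    """Start position: empty floor, empty queues, zero cost values."""
    return {"queues": {q: [] for q in queue_names},
            "locations": [None] * num_shuttles,
            "costs": [0] * num_shuttles}

def score(position, reached):
    """Lower is better: a constant per unreached goal plus the makespan."""
    return (GOAL_PENALTY * sum(1 for r in reached if not r)
            + max(position["costs"]))
```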

6.1 UCT

The randomized optimization scheme we consider belongs to the wider class of Monte Carlo search (MCS) algorithms. The main concept of MCS is the random playout (or rollout) of a position, whose outcome, in turn, changes the likelihood of generating successors in subsequent trials. A prominent member of this class of reinforcement learning algorithms is upper confidence bounds applied to trees (UCT) [34]. Each iteration of UCT has four stages: selection (following a tree policy), expansion (adding a leaf node), simulation (performing a rollout), and backpropagation (adapting the values at nodes based on the observed outcome). Culminating in the success of AlphaGo and AlphaZero in winning matches against professional human Go players [48] and top chess and shogi engines [49], the impact of MCS in playing games can no longer be doubted.
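The four stages can be sketched generically in Python (a compact sketch of textbook UCT over an abstract game interface; all names are ours, not the paper's implementation):

```python
import math
import random

class Node:
    """Search tree node: visit count and accumulated reward."""
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = {}, 0, 0.0

def uct_search(root_state, moves, apply_move, is_final, reward,
               iterations=200, c=1.4):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend along the UCB1-maximal child.
        while node.children and not is_final(node.state):
            node = max(node.children.values(),
                       key=lambda n: n.value / (n.visits + 1e-9)
                       + c * math.sqrt(math.log(node.visits + 1)
                                       / (n.visits + 1e-9)))
        # 2. Expansion: add a child per legal move, then pick one at random.
        if not is_final(node.state):
            for m in moves(node.state):
                node.children[m] = Node(apply_move(node.state, m), node)
            node = random.choice(list(node.children.values()))
        # 3. Simulation: random rollout down to a final position.
        state = node.state
        while not is_final(state):
            state = apply_move(state, random.choice(moves(state)))
        r = reward(state)
        # 4. Backpropagation: update values on the path to the root.
        while node:
            node.visits += 1
            node.value += r
            node = node.parent
    # Recommend the root move with the highest visit count.
    return max(root.children, key=lambda m: root.children[m].visits)
```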

6.2 Nested Monte Carlo search

Nested Monte Carlo Search (NMCS) [9] is a randomized search method that has been successfully applied to many challenging combinatorial problems, including Klondike Solitaire, Morpion Solitaire, and SameGame, to name a few. A large fraction of TSP instances have been solved efficiently at or close to the optimum, and NMCS compares well with other heuristic methods that include much more domain-specific information. NMCS is parameterized with the recursion level of the search (the depth of the recursion tree) and with the number of iterations (the branching factor of the recursion tree). At each leaf of the recursive search, a rollout is executed, which performs and evaluates a random run.
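The recursion scheme can be sketched in Python, following the standard formulation of NMCS (a sketch; the game interface names are ours, and scores are maximized here):

```python
import random

def rollout(state, moves, play, terminal, score):
    """Level 0: a uniformly random run, returning score and move sequence."""
    seq = []
    while not terminal(state):
        m = random.choice(moves(state))
        state = play(state, m)
        seq.append(m)
    return score(state), seq

def nmcs(state, level, moves, play, terminal, score):
    """Level n tries every move, recurses at level n-1, and always follows
    the best sequence found so far."""
    if level == 0 or terminal(state):
        return rollout(state, moves, play, terminal, score)
    best_score, best_seq, played = float("-inf"), [], []
    while not terminal(state):
        for m in moves(state):
            s, seq = nmcs(play(state, m), level - 1,
                          moves, play, terminal, score)
            if s > best_score:
                best_score, best_seq = s, played + [m] + seq
        m = best_seq[len(played)]       # follow the globally best sequence
        state = play(state, m)
        played.append(m)
    return best_score, played
```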

What makes the NMCS variant Nested Rollout Policy Adaptation (NRPA) [14, 44] notably different is the concept of learning a policy through an explicit mapping of moves to selection probabilities.
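The policy-learning loop of NRPA can be sketched as follows (textbook NRPA; for brevity, the policy is keyed by the move itself rather than a (state, move) code, and all names are ours):

```python
import math
import random

def playout(state, pol, moves, play, terminal, score):
    """Sample a run, choosing moves softmax-weighted by the policy."""
    seq = []
    while not terminal(state):
        ms = moves(state)
        w = [math.exp(pol.get(m, 0.0)) for m in ms]
        m = random.choices(ms, weights=w)[0]
        seq.append(m)
        state = play(state, m)
    return score(state), seq

def adapt(pol, seq, state, moves, play, alpha=1.0):
    """Shift the policy toward the best sequence (gradient-style update)."""
    pol = dict(pol)
    for m in seq:
        ms = moves(state)
        z = sum(math.exp(pol.get(x, 0.0)) for x in ms)
        for x in ms:
            pol[x] = pol.get(x, 0.0) - alpha * math.exp(pol.get(x, 0.0)) / z
        pol[m] = pol.get(m, 0.0) + alpha
        state = play(state, m)
    return pol

def nrpa(level, pol, root, moves, play, terminal, score, iterations=10):
    """Each level runs 'iterations' searches of the level below and adapts
    the policy toward the best sequence found (scores are maximized)."""
    if level == 0:
        return playout(root, pol, moves, play, terminal, score)
    best = (float("-inf"), [])
    pol = dict(pol)
    for _ in range(iterations):
        result = nrpa(level - 1, pol, root, moves, play, terminal, score,
                      iterations)
        if result[0] >= best[0]:
            best = result
        pol = adapt(pol, best[1], root, moves, play)
    return best
```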

The production flow problem is implemented in a Monte Carlo search framework for single-agent games, so that the interface adopts the nomenclature of a game: agents meet in arenas, they have legal moves to execute, they enter, they play, they score, and they terminate.

figure q

A move (or play) corresponds to a movement of shuttles and is a combination of agent ID and applied case. Moves are stored in the rollout, which is bounded by a Boolean condition (terminal). The length of the rollout and the score are recorded; the score, which is the result of a maximization of costs over all agents, is minimized. Finding the potential set of successors (legalMoves) finalizes the implementation. In NRPA, each rollout (calling the constructor) is initially empty.
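The move representation just described can be sketched as a small Python data structure (field and class names are hypothetical; the actual framework is not shown here):

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Move:
    """A move pairs an agent ID with the case applied at its position."""
    agent_id: int
    case: int  # one case for entrance/exit, three for switches

@dataclass
class Rollout:
    """Records the moves of one play; initially empty, as in NRPA."""
    moves: list = field(default_factory=list)

    def add(self, move):
        self.moves.append(move)

    def score(self, costs):
        """The score maximizes cost over all agents; the search minimizes it."""
        return max(costs)
```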

The abstract class that is needed for each individual agent looks as follows.

figure r

Next we provide one example for implementing an agent. Operating the switch leads to three different cases.

figure s

Last, but not least, we provide the implementation of incrementing virtual time,

figure t

and the channel requirements to obey the imposed constraints on a valid operation. Most queues are unbounded, but some are restricted to size 1, namely the ones responsible for assembling an item.

figure u

7 Experiments

We first compare the results of the exploration minimizing local virtual time (LVT) [15] to the ones simulating the discrete event system (DES) described in this paper. For comparison, we also present timing results of simulation runs of the original multiagent system implementation (MAS) on Z2 hardware [26]. We use MCS for Monte Carlo search with nested rollout policy adaptation.

Unlike the original software system, the Promela models do not rely on local decision making and induce a systematic global search for an optimal solution. Therefore, in contrast to the distributed multiagent system being analyzed, both models realize centralized planning via model checking.

For executing the model checking, we chose version 6.4.3 of SPIN. As a compiler, we used gcc version 4.9.3, with the posix thread model. For the standard setting of trace optimization for safety checking (option SAFETY), we compiled the model as follows.

figure v
Table 1 Simulated production times for n products in the original MAS and SPIN simulation, including the amount of RAM required to compute the given result

Parameter i stands for the incremental optimization of the counterexample length. We regularly increased the maximal trail length with option m, as in some cases of our running example the traces turned out to be longer than the default setting of at most 10,000 steps (Table 1). Option REACH is needed to warrant minimal counterexamples.

For smaller problems, we experimented with SPIN's parallel BFS, as it computes optimal-length counterexamples; its hash table is shared based on compare-and-swap (CAS). We also tried bitstate hashing (supertrace) as a trade-off. Unfortunately, BFS interacts with c_track, so we had to drop these experiments for cost optimization. Swarm search (./swarm -c3 -m16G -t1 -f) found many solutions, some of them shorter than the ones offered by option i (indicating ordering effects), but due to the increased amount of randomness, in general no better results than with ordinary DFS were found for the optimized scheduling problem of the Z2. In each experimental run, a number of \(n \in \{2,\ldots ,20\}\) shuttles carry products through the facility. All shuttles with even IDs acquire clear bulbs; all shuttles with odd IDs acquire colored ones.

A close look at the results of every simulation run reveals that, given the same number of products to produce, all three approaches result in different sequences of events. LVT and DES propose the same sequence of production steps for the product of each shuttle (4, 2, 1, 0, 5), while the MAS approach proposes (2, 1, 4, 0, 5). All three simulation models keep track of the local production time of each shuttle's product. In MAS and LVT simulation, minimizing the maximum local production time is the optimization goal. Steady, synchronized progress of time is maintained centrally after every production step. Hence, whenever a shuttle has to wait in a queue, its total production time increases. For the DES model, progress of time is managed differently. The results show that the maximum production time in DES is lower than the LVT and MAS production times in all cases.

For every experiment, the amount of RAM required by DES to determine an optimal solution is slightly lower than the amount required by LVT, as shown in Table 2. While LVT required several iterations to find an optimal solution, the first valid solution found by DES was already the optimal solution in every conducted experiment. However, the LVT model is able to search the whole state space within the 16 GB RAM limit (given by our machine) for \(n \le 3\) shuttles, whereas the DES model is unable to search the whole state space for \(n > 2\). For every experiment with \(n > 3\) (LVT) or \(n > 2\) (DES) shuttles, respectively, searching the state space for better results was canceled when the 16 GB RAM limit was reached.

The experiments indicate that the DES is faster and more memory efficient than the LVT approach. In our experiment, if one element in a process queue was delayed, all the ones behind it were delayed as well. While DES and LVT are both sound in resolving deadlocks for the given case study, LVT has the more accurate representation for the progress of time.

Table 2 Efficiency of MCS for a rising number of shuttles

Table 2 shows that the MCS implementation also scales well. It shows the length of the plan, the simulation time (Time), and the runtime for a growing number of vehicles. In contrast to the other models, we do not enforce prerequisites, namely that shuttles are prevented from driving into a station if they do not have all required components available. Given that SPIN is a full-fledged model checker that analyzes the encoding of the problem on the source-code level (resulting in traces of thousands of steps), the result could have been expected, even though the search space is huge.

Table 3 Efficiency of MCS and LVT for a rising number of shuttles in dynamic context

The runtime for MCS was less than 5 seconds, and the RAM requirements were rather small (less than 4 MB for the largest instance). As with the DES/LVT model, the cost measures travel time plus some initial waiting time. To help the solver find valid solutions, we extended the objective function \((1000 \cdot reached)+maximum\) by the term \((10 e_r)+(10 b_r)+(10 s_r)+(100 d_r)\), where \(e_r\), \(b_r\), \(s_r\), and \(d_r\) are the violations of the assembling status of electronics, bulbs, seals, and diffusors, respectively. We observe that there is a noticeable difference in the simulation times of LVT and DES even for two shuttles. Hence, we decided to reimplement LVT to bring the two cost functions into a close match.
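For illustration, the extended objective reads as follows in Python (a sketch; we interpret the reached term as the number of unreached individual goals, matching the score function of Sect. 6):

```python
def objective(costs, unreached, e_r, b_r, s_r, d_r):
    """Lower is better: goal penalty plus makespan plus assembly-status
    violation terms for electronics, bulbs, seals, and diffusors."""
    return (1000 * unreached + max(costs)
            + 10 * e_r + 10 * b_r + 10 * s_r + 100 * d_r)
```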

Table 3 shows that (due to RAM usage) the time to find the first solution for the manually tuned model in SPIN is extremely short, and that MCS remained sufficiently fast. The resulting cost values of the schedules are about the same, with small advantages for MCS. However, this is not a deficiency in SPIN: we attribute the differences to a slight shift in the evaluation of the terminal state, where SPIN invokes the watchdog, while MCS extracts the shuttles from the rails.

In Table 3, we also added the measurements of the simulated multiagent system, where the agents chose the color of the lamp dynamically based on fuzzy logic decision rules that take the incoming orders and observed current queue lengths into account. Even though the task is slightly different in the real world with dynamically acting agents, we can derive that both analytical methods show a potential for optimization in the dynamic schedule.

At this point, we emphasize that both the model checker and the Monte Carlo solver generate traces from top to bottom, so it is possible to include dynamics such as a different choice of bulb colors based on incoming orders and queue length into the state space search. On the other hand, input models for SPIN and MCS are finite; they do not feature optimization in infinite open loops.

8 Conclusion and discussion

In this work, we presented a novel approach for optimizing a modern industrial production line. The research is motivated by our interest in finding and comparing centralized and distributed solutions to the optimization problems in autonomous manufacturing.

The formal model reflects the routing of shuttles in the monorail multiagent system. Using model checking for optimizing multiagent and discrete event systems is a relatively new playground for formal method tools in form of a new analysis paradigm. Switches of the rail network were modeled as active processes, the edges between the switches as communication channels.

Our results clearly indicate a lot of room for improvement in the decentralized solution, since the model checker found more efficient ways to route the shuttles on several occasions. Furthermore, the model checker could derive optimized plans of several thousand steps.

With all the additional C code, we have worked hard to make SPIN do what it originally was not supposed to do: real-time model checking and optimization of the makespan of traveling agents in an industrial setting. As a surplus, however, we arrive at a larger generality of systems that can be optimized. As with directed model checking, additional heuristics are expected to guide the search toward finding a good plan even faster. Even though discrete event simulation is a powerful method, the limits and possibilities of representing time suggest that alternatives to discrete-time model checking based on ticks have to be looked at.

Monte Carlo search performed even better than systematic exploration. The advanced NRPA algorithm also has a smaller memory footprint than exhaustive methods (assuming that they store states for eliminating duplicates). The main advantage over a systematic enumeration of the state space is that the random choices do not rely on a fixed traversal ordering and that, for good performance, we did not have to assist the model checker by ordering the non-deterministic choices manually.

The success in many other application domains illustrates the effectiveness and generality of this search option: compared to depth-first search and branch-and-bound, Monte Carlo search is less dependent on finding a very good ordering of the successors. Even hand-coded pruning like checking prerequisites was not needed to arrive at high-quality solutions in a short amount of time. Nonetheless, there is noticeable impact of hand-coded information, as additional guidance information encoded in the cost function increased the performance of the search substantially, preventing the planner from exploring large parts of the state space.

So far, we have not extended SPIN by Monte Carlo search but gave an alternative implementation based on an existing single-player game optimization framework. Thus, at first glance, a head-to-head comparison is seemingly unfair: SPIN is aimed at general software verification and, with a granularity at the level of lines of the running Promela code, it produces much longer traces than the search framework, which is based on non-deterministic action choices only.

Moreover, SPIN already supports randomization in its swarm search wrapper, but swarm algorithms are based on random depth-first search, while Monte Carlo search with policy adaptation improves over time, and, thus, learns the structure of the underlying problem. Nestedness of the search leads to exponential refreshment of policies, and, therefore, offers a trade-off between exploitation and exploration.

The interface for randomized search is simple and flexible. We are confident that the implementation of a model checker based on Monte Carlo search is only a matter of time.