
1 Introduction and Related Work

In contrast to traditional mass production, manufacturing demands have shifted towards shorter innovation cycles and small-batch production. This has raised the demand for flexible manufacturing systems that can quickly be adapted to customized products by domain experts in small and medium enterprises [6]. Considering in addition recent advances in collaborative robotics towards flexible partial automation, robot programs must be adapted to various sources of variety [1, 4]: Product variety arises when different product instances of a product family are manufactured by assembling parts with varying features (e.g. color) to suit individual customer demands [9]. In this field, we particularly focus on process-specific variations [7] that additionally yield process variety. Relevant robot task parameters that may change with process-specific variations are e.g. pickup or placement locations, or even the ordering of process steps [1].

Visual end-user robot programming is an established approach to cope with such variety [6]. Corresponding approaches [14, 17, 18, 19] are mostly based on skill frameworks. These let users combine skills with human-readable semantics into tasks (e.g. [15]), even for human-robot collaboration [16, 18]. Modularity and intuitive usability support convenient (re-)programming and, in consequence, quick adaptation. In contrast, our contribution seeks to reduce recurrent programming efforts by applying the visual programming paradigm to a task model that intrinsically encodes a subset of feasible variations (e.g. different part types or locations) and adapts online (Fig. 1). We hypothesize that this would further contribute to the economic efficiency of intelligent robot systems.

Fig. 1

Visual programming enables frequent end-user robot task adaptation to customer demands in flexible manufacturing (a). We seek to reduce programming efforts by online adaptation (b). To this end, we propose to explicitly encode different situations with product variety (A1, A2) or process variety (B1, B2) in a single task model

Corresponding task models with variety have also been addressed in literature. Among them, especially precedence graphs and hierarchical AND/OR Trees are frequently used in intelligent robot systems (e.g. [4, 13, 16]). They seek to encode all feasible assembly sequences [10], hence focussing on process variety. Similarly, hierarchical models emphasizing product variety [7, 9, 11], approaches at the intersection of assembly and product family oriented goals [5], and ontologies to exchange production data under variety [8] have been proposed. They commonly decompose products into functional entities [11] until inseparable, constituent components referred to as primary generic products [9] or parts families [7] are reached. A group of feasible variants for assigning a part in concrete product instances is associated with each component. Analogously, groups of feasible locations can be expressed with spatial relations [14], or more specifically with areas in the workspace [18]. Taking inspiration from this group notion for feasible part types and locations, we propose end-user programming of assembly task models with skills accepting parts families and partly known locations as input. This way, parameters can be partially left underspecified at modelling time to create a single task model for several instances of the task. Consider e.g. a pick-and-place task that involves fetching five bolts from the imprecise location conveyor and putting them into a box—with our approach, a single task model is sufficient to robustly conduct this kitting task for any positions and orientations of bolts on the conveyor, and for any size of bolts.

When a skill is executed, one of the physically present entities with precisely known parameters, as sensed by the robot, must be assigned to the symbolic part description in the task model. Establishing such a link between symbolic parts and the world is referred to as the anchoring problem [3]. This in particular includes deciding between multiple sensed entities that equally match an ambiguous part description (e.g. bolts of different sizes all being of type bolt). Related approaches perform anchoring with local decisions [4, 14, 18]: ambiguity is resolved within the scope of a single skill without considering subsequent process steps, e.g. by choosing from all matching entities the one closest to the robot [14], or by drawing randomly [18]. However, such decisions can render the overall process infeasible (Fig. 2): Despite being suitable for the currently considered skill, an entity may be strictly required by some subsequent skill with more strongly constrained input parts. Choosing the “wrong” entity will thus lead to an error when trying to anchor this subsequent skill. Therefore, we propose an algorithmic procedure with global decisions which considers the constraints of all skills during the anchoring process.

Fig. 2

Our task models may be underspecified, e.g. by skills accepting any kind of gear (1 and 2) for adaptation to sensed parts in a world model (a-d). Locally correct anchoring decisions, e.g. assigning red_gear c to skill 1, can render the process infeasible when subsequent skills have strictly specified input parts (3 and 4)

All in all, our contribution is twofold: (i) We propose a task model and visual programming procedure with robot skills that accept parts families and flexible locations rather than fully specified, uniquely identified parts as input parameters. (ii) We present a computationally efficient method for anchoring and executing such task models in unknown environments with ambiguous parts.

2 Our Approach

An overview of our approach is given in Fig. 3. Users first use a visual programming task editor to create a precedence graph model (Sect. 2.2) capturing different instances of the task (Sect. 2.3). After that, the robot workspace is prepared by supplying concrete parts. The task model provides partly underspecified information about the types and approximate locations of parts to be expected when executing the task (Sect. 2.1). From this information, a path for exploring points of interest in the workspace with a camera attached to the robot hand is calculated. A world model is then built by active vision, i.e. by approaching each point of interest and performing object recognition. The world model enables the computational process of plan instantiation for the perceived situation in the workspace (Sect. 2.4): Detected entities in the world model are assigned to parts referenced in the task model by an assignment planner solving the anchoring problem. Together with the task model, the resulting assignment solution is passed to a task sequencer. The sequencer applies a scheduling algorithm to the task model and completes skill parametrization by replacing underspecified parameters with precise information from the world model. The resulting operation sequence is finally passed to a skill execution engine. After task completion, further materials can be supplied, and the plan instantiation process can be re-iterated from the workspace exploration step onwards without manually adapting the task model.
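The following minimal Python sketch summarizes one such production cycle; the four injected callables are placeholders for the exploration, assignment planning, sequencing, and execution components described in the remainder of this section, not our actual interfaces.

```python
# Minimal sketch of one production cycle; the injected callables stand in
# for the components of Sects. 2.1-2.4 and are not the authors' actual API.

def run_cycle(task_model, explore, plan_assignment, sequence, execute):
    world_model = explore(task_model)                      # active vision
    assignment = plan_assignment(task_model, world_model)  # anchoring (Sect. 2.4)
    if assignment is None:
        raise RuntimeError("some part template could not be anchored: missing parts")
    for operation in sequence(task_model, assignment, world_model):
        execute(operation)                                 # fully parametrized skill
```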

Fig. 3

Our approach adapts generalized task models emerging from a visual programming procedure by means of active workspace exploration, assignment planning, task sequencing, and skill execution

2.1 Part Types and Locations

We describe parts in terms of their type and location in the workspace. To this end, a part type is an entry taken from a tree-shaped part type ontology. This ontology is a required input to the approach. It captures “is-a”-relations between a set of nodes \(O = \{o_1, o_2, \ldots , o_{|O|}\}\). Leaf nodes \(P \subset O\) denote the concrete part types as which parts in the physical world can be classified. We assume a CAD model to be given for each \(o \in O\) for the purpose of grasp and placement planning. When ascending from the leaf nodes towards the root node, encountered inner ontology nodes encode increasingly generic part descriptions. The ontology thus encodes parts families with an increasing level of generalization over part types. An example inspired by the benchmark domains used in our experiments is shown in Fig. 4. Here, the different gear and conductor leaf part types are summarized under the more general terms gear and conductor. The approach can be intuitively adapted to other domains by specifying a corresponding tree with several levels of generalized part types. Formally, the ontology is characterized by the function \(\text {is\_a}: O \times O \rightarrow \{\textsc {True}, \textsc {False}\}\) with \(\text {is\_a}(o, o') = \textsc {True}\) whenever \(o = o'\) or o is a child of \(o'\). In all other cases, \(\text {is\_a}(o, o')\) is \(\textsc {False}\).
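As an illustration, a minimal Python sketch of such an ontology and its is_a check is given below. The leaf names and the root node part are assumptions based on Fig. 4 and the benchmark tasks; the sketch walks up the tree, which coincides with the definition above for the two-level example.

```python
# Hedged sketch of the part type ontology of Fig. 4; node names are assumed.

PART_ONTOLOGY = {          # child -> parent ("is-a" edges); root "part" has no entry
    "gear": "part", "conductor": "part",
    "red_gear": "gear", "green_gear": "gear", "blue_gear": "gear",
    "red_conductor": "conductor", "green_conductor": "conductor",
    "blue_conductor": "conductor",
}

def is_a(o: str, other: str) -> bool:
    """True iff o equals other or other lies on the path from o to the root."""
    while True:
        if o == other:
            return True
        if o not in PART_ONTOLOGY:   # reached the root without finding a match
            return False
        o = PART_ONTOLOGY[o]         # ascend one ontology level

assert is_a("red_gear", "gear") and not is_a("red_gear", "conductor")
```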

Fig. 4

A part ontology tree encodes “is-a”-relations to group different part leaf types into more generic type descriptions represented by inner tree nodes

Regarding the part location, we distinguish two cases: A location can be known precisely and, hence, be specified by a rigid body transform \(^{\text {w}}T_{\text {part}}\in \mathbb {R}^{4\times 4}\) indicating the object translation and rotation with respect to some world frame \(\text {w}\). This is the case e.g. for object recognition results or for parts provided on workpiece carriers. In the second case, a part location is not given precisely, but only within a certain tolerance. These two concepts can be captured by a unified formalization: Let \(L = \{l_1, l_2, \ldots , l_{|L|}\}\) denote a set of locations relevant to the task. A location \(l_i \in L\) may describe the precise position and orientation of some place where parts are usually located (e.g. the output slot of a parts feeder). Let \(L^{\text {prec}} \subseteq L\) denote these precisely known locations, each associated with a rigid body transform \(\text {pose}(l_i) \in \mathbb {R}^{4 \times 4}\) (\(l_i \in L^{\text {prec}}\)). In addition to these precisely known locations, elements of L may also describe a 2-dimensional area on the workbench surface, a 3-dimensional volume defining the interior of a box, etc. We will see in Sect. 2.3 how L emerges from the visual programming process. For the planning process (Sect. 2.4), each location \(l_i \in L\) is associated with a location function \(\text {is\_at}_{l_i}: O \times \mathbb {R}^{4 \times 4} \rightarrow \{\textsc {True}, \textsc {False}\}\). These functions are designed to output \(\text {is\_at}_{l_i}(o, ^{\text {w}}T_{\text {part}}) = \textsc {True}\) for a part type \(o \in O\) and transformation \(^{\text {w}}T_{\text {part}}\in \mathbb {R}^{4\times 4}\) if and only if some part of type o with the pose described by \(^{\text {w}}T_{\text {part}}\) is at the location denoted by \(l_i\). Our system currently supports \(\text {is\_at}_{l_i}\) functions for comparing equality of precise positions, and for checking whether parts lie in planar workspace areas considering their axis-aligned bounding boxes \(\text {aabb}(o)\) (Fig. 5). The formalism allows for integrating more complex location specifications in future work (e.g. spatial relations between parts).
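The two location-function variants could be realized along the lines of the following sketch; the tolerance value and the per-type bounding-box footprints (derived from the CAD models) are assumptions, and the actual containment test we use may differ in detail.

```python
# Hedged sketch of the two supported location-function variants; tolerances
# and the per-type bounding-box footprints are assumed.

import numpy as np

def make_precise_location(pose: np.ndarray, pos_tol: float = 1e-3):
    """is_at for a precisely known location given as a 4x4 rigid body transform
    (equality of positions, as described in the text)."""
    def is_at(part_type: str, T_part: np.ndarray) -> bool:
        return bool(np.allclose(T_part[:3, 3], pose[:3, 3], atol=pos_tol))
    return is_at

def make_area_location(x_min: float, y_min: float, x_max: float, y_max: float,
                       footprint: dict):
    """is_at for a planar workbench area: the part's axis-aligned bounding box
    footprint (size_x, size_y per part type) must lie inside the area."""
    def is_at(part_type: str, T_part: np.ndarray) -> bool:
        half_x, half_y = footprint[part_type][0] / 2, footprint[part_type][1] / 2
        x, y = T_part[0, 3], T_part[1, 3]
        return (x_min + half_x <= x <= x_max - half_x
                and y_min + half_y <= y <= y_max - half_y)
    return is_at
```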

2.2 Task Models with Degrees of Freedom

Our goal is to program tasks that can be adapted to product and process variety at execution time. To this end, we first define the notion of part templates, which capture boundary conditions that parts used in a task must satisfy. A part template \(p = (p^{\text {type}}, p^{\text {loc}})\) combines an arbitrary node \(p^{\text {type}} \in O\) from the part type ontology with a location \(p^{\text {loc}} \in L\). It describes a part whose parameters are possibly only partly known during the visual programming procedure, e.g. a conductor that may be either red, green, or blue and that lies at any position within a larger area on the workbench. Part templates enable task models with a certain degree of generality regarding part types and locations: In our framework, each task \((T, \prec _T)\) is composed of partially ordered operations \(T = \{\tau _1, \tau _2, \ldots , \tau _{|T|}\}\). The partial order \(\prec _T\) defines assembly precedence relations between operations, i.e. some operation \(\tau _i \in T\) must be done before \(\tau _j \in T\) (\(i \ne j\)) if and only if \(\tau _i \prec _T \tau _j\). This task model is well known from the assembly planning domain [10] and well suited for flexible production settings. We further describe each operation by a pair \(\tau _i = \left( p_i, l_i\right) \) of a part template \(p_i\) and a part goal location \(l_i \in L\). The model thus covers any sort of operation in which a part is transferred to a new location by the robot. This comprises basic pick-and-place actions as well as operations during which the transfer requires more sophisticated robot control (e.g. force-supervised gear meshing, see Sect. 3).
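The resulting model maps directly onto simple data structures, sketched below in Python; the class and field names follow our notation but are otherwise illustrative.

```python
# Illustrative data structures for the task model; names follow the paper's
# notation but are not taken from the actual implementation.

from dataclasses import dataclass, field

@dataclass(frozen=True)
class PartTemplate:
    part_type: str          # any ontology node o in O, possibly generic
    location: str           # symbolic location l in L, possibly an area

@dataclass(frozen=True)
class Operation:
    template: PartTemplate  # part consumed by the operation
    goal_location: str      # location the part is transferred to

@dataclass
class Task:
    operations: list[Operation]
    # precedence relations as index pairs (i, j) meaning "operation i before j"
    precedence: set[tuple[int, int]] = field(default_factory=set)
```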

Fig. 5

Our task editor (left) combines icon-based precedence graph modelling (a) with part creation in a virtual workspace (b). The modelling process outputs task models with associated operators to compare locations and part types (right)

Task models as defined above are underspecified, and each part template must be anchored to a physical entity when the task is executed (Sect. 1). To this end, the robot builds a world model \(W = \{\hat{p}_1, \ldots , \hat{p}_{|W|}\}\) containing all entities perceived on camera images. Entities are encoded as part states. In contrast to part templates, part states \(\hat{p} = (\hat{p}^{\text {type}}, \hat{p}^{\text {loc}})\) combine an ontology leaf node \(\hat{p}^{\text {type}} \in P\) with a precise location \(\hat{p}^{\text {loc}} \in L^{\text {prec}}\) as detected by object recognition. We say that an operation \(\tau _i \in T\) may be applied to a part state \(\hat{p} \in W\) if and only if \(\hat{p}\) satisfies the part template \(p_i\). This connection between part templates and states is validated with the satisfies function (Eq. 1).

$$\begin{aligned} \text{satisfies}(\hat{p}, p) = \begin{cases} \textsc{True} & \text{if } \text{is\_a}(\hat{p}^{\text{type}}, p^{\text{type}}) \wedge \text{is\_at}_{p^{\text{loc}}}\bigl(\hat{p}^{\text{type}}, \text{pose}(\hat{p}^{\text{loc}})\bigr)\\ \textsc{False} & \text{otherwise} \end{cases} \end{aligned}$$
(1)
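A direct transcription of Eq. (1) is sketched below; the PartState container and the way is_a and the per-location is_at functions are passed in are assumptions of ours (cf. the sketches in Sects. 2.1 and 2.2).

```python
# Hedged transcription of Eq. (1); PartState and the argument layout are assumed.

from dataclasses import dataclass
import numpy as np

@dataclass
class PartState:
    part_type: str      # ontology leaf node as classified by object recognition
    pose: np.ndarray    # precise 4x4 rigid body transform in the world frame

def satisfies(part_state: PartState, template, is_a, is_at_functions) -> bool:
    """A detected part state fulfils a template iff its leaf type specializes the
    template type and its precise pose lies at the template's location."""
    type_ok = is_a(part_state.part_type, template.part_type)
    at_location = is_at_functions[template.location](part_state.part_type,
                                                     part_state.pose)
    return type_ok and at_location
```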

2.3 Visual Programming

Users create task models by interacting with the graphical editor shown in Fig. 5. To this end, it is first necessary to specify a part template for each part to be used during the task. A new template is added by choosing its part type and initial part location. The user is in charge of selecting from the part type ontology appropriately so that the desired level of task generalization is reached. The selection of locations is supported by a virtual representation of the workspace. In the virtual workspace, a workspace layout as introduced in our prior work [16] offers pre-defined regions to be chosen as part locations (e.g. \(l_4\) in Fig. 5, left). For each area defined by the layout, a location function based on the area corner vertices is instantiated and added to the location set L (Sect. 2.1). If the user prefers to specify part poses precisely (\(l_1, l_2, l_3\) in Fig. 5), additional location functions are defined by the corresponding precise poses. Once all parts have been specified, pick-and-place operations may be added. Finally, the operations are connected with precedence relations using the icon-based editor component. Currently, the system is based on a single pick-and-place skill; suitable control algorithms are derived from annotations to the part type ontology (e.g. force-supervised gear meshing vs. position-controlled placement of our benchmark conductor parts). Further classes of skills, e.g. for visual inspection or for presenting parts to the user in collaborative steps, can be added in the future.
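To make the output of the modelling step tangible, the following purely illustrative example builds a small task with the data structures sketched in Sect. 2.2; the part type and location identifiers are assumptions in the spirit of Fig. 5.

```python
# Illustrative editor output, built with the Sect. 2.2 sketch; identifiers are assumed.

feeder_area = "l4"        # region chosen from the workspace layout
fixture_pose = "l1"       # precisely taught goal pose
kit_area = "l2"           # area in which the kit may be placed

task = Task(
    operations=[
        Operation(PartTemplate("gear", feeder_area), fixture_pose),       # 0: any gear
        Operation(PartTemplate("red_conductor", feeder_area), kit_area),  # 1: exactly red
        Operation(PartTemplate("conductor", feeder_area), kit_area),      # 2: any conductor
    ],
    precedence={(0, 1), (1, 2)},  # operation 0 before 1, and 1 before 2
)
```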

2.4 Plan Instantiation

Having modelled a task with operations \(T = \{\tau _1, \ldots , \tau _{|T|}\}\), users need to prepare the workspace by supplying the necessary parts to the robot. After an active vision exploration procedure (see [2] for an overview of applicable methods), the robot has all detected parts stored in its world model \(W = \{\hat{p}_1, \ldots , \hat{p}_{|W|}\}\). The next step is solving the anchoring problem introduced in Sect. 1, i.e. matching each part template \(p_i\) of operation \(\tau _i\) with a part state \(\hat{p}_j\) such that \(\text {satisfies}(\hat{p}_j, p_i)\) holds. Assuming that the user has provided at least one part for each operation (\(|W| \ge |T|\)), there are \(\mathcal {O}(|W|!)\) possible assignments. Enumerating and testing all of them to find a valid solution quickly becomes computationally infeasible, already for moderately sized |W|. However, we can apply efficient combinatorial optimization algorithms to this unbalanced assignment problem, e.g. the well-known Kuhn-Munkres algorithm [12] with \(\mathcal {O}(|W|^3)\) runtime complexity:

Let \({\textbf {C}} = (c_{i, j})\) denote a \(|T| \times |W|\) cost matrix with a row for each part template and a column for each part state. Any invalid assignment of \(\hat{p}_j\) to \(p_i\) is modelled with infinite cost, whereas a valid assignment incurs no cost, i.e.

$$\begin{aligned} c_{i, j} = \begin{cases} 0 & \text{if } \text{satisfies}(\hat{p}_j, p_i)\\ \infty & \text{otherwise} \end{cases}, \qquad i \in \{1, \ldots , |T|\},\ j \in \{1, \ldots , |W|\}. \end{aligned}$$
(2)

Given \({\textbf {C}}\), combinatorial optimization computes an optimal, injective assignment \(f: \{1, \ldots , |T|\} \rightarrow \{1, \ldots , |W|\}\) which minimizes the total assignment costs \(\sum _i c_{i, f(i)}\) (\(i \in \{1, \ldots , |T|\}\)). In our case, f states that part template \(p_i\) of operation \(\tau _i\) must be associated with part state \(\hat{p}_{f(i)}\) to obtain the minimum-cost assignment. By construction of \({\textbf {C}}\), any solution involving an invalid assignment (cf. Fig. 2) has infinite overall costs. If even the optimal solution is of this kind, no valid assignment exists; in practice this means that the user has not supplied all required parts to the workspace, and our system outputs an error message to inform about the missing parts. By contrast, a solution f with zero overall costs means that each part template was matched with a suitable entity in the workspace. The process can then proceed to the task sequencing step.
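A compact sketch of this anchoring step is given below. It uses SciPy's linear_sum_assignment, which solves the same rectangular assignment problem (with a modified Jonker-Volgenant algorithm rather than the classical Kuhn-Munkres method), and replaces the infinite cost by a large finite constant so that infeasibility can be detected from the returned solution; satisfies is assumed to be a two-argument predicate as in Eq. (1) with the ontology and location functions already bound.

```python
# Hedged sketch of assignment planning; BIG replaces the paper's infinite cost.

import numpy as np
from scipy.optimize import linear_sum_assignment

BIG = 1e9  # surrogate cost for invalid template/state pairings

def plan_assignment(templates, states, satisfies):
    """Return {operation index -> world model index}, or None if some
    template cannot be anchored (i.e. parts are missing in the workspace)."""
    cost = np.full((len(templates), len(states)), BIG)
    for i, template in enumerate(templates):
        for j, state in enumerate(states):
            if satisfies(state, template):
                cost[i, j] = 0.0
    rows, cols = linear_sum_assignment(cost)   # polynomial-time optimal assignment
    if cost[rows, cols].sum() >= BIG:          # at least one invalid pairing remained
        return None
    return dict(zip(rows, cols))
```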

The task sequencing procedure prepares a fully specified sequence of operations to be executed by the skill engine. For each operation \(\tau = (p, l)\), a suitable input entity matching p is known from the above assignment f. We further use a grid-based placement planner that determines precise part goal locations whenever the operation goal location l is an area. Finally, the precedence graph is linearized into a sequence that complies with all “earlier-later” relations, i.e. a topological ordering of the operations. The fact that we are using a graph structure as task model opens a range of future possibilities here: Aside from searching for an operation sequence that optimizes energy consumption or other secondary criteria, planning of collaborative action with a human-robot scheduler would also be feasible at this point in the process.
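A minimal sequencing sketch based on a topological sort is shown below, using Python's standard-library graphlib; our actual sequencer may order operations differently as long as all precedence relations are respected.

```python
# Hedged sketch of the sequencing step: one valid topological order of operations.

from graphlib import TopologicalSorter

def sequence_operations(num_operations: int,
                        precedence: set[tuple[int, int]]) -> list[int]:
    """Return one operation order respecting all (i before j) precedence relations."""
    sorter = TopologicalSorter({k: set() for k in range(num_operations)})
    for i, j in precedence:
        sorter.add(j, i)          # j depends on i, so i is scheduled first
    return list(sorter.static_order())

# Example: with precedence {(0, 1), (1, 2)} the only valid order is [0, 1, 2].
```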

3 Experimental Validation

We have modelled four benchmark tasks (Sect. 2.3) which are designed to illustrate specific aspects of product and process variety (Fig. 6a): Product variety is represented by task S1, in which gears of arbitrary types (red, blue, green, cf. Fig. 4) are assembled with force-supervised robot control. Task S2 is a kitting task in which a conductor of each type is added to a bundle of three. Tasks S3 and S4 replicate assembly tasks of electrical circuits with a serial/parallel connection. The tasks S2–S4 use region-based initial locations, thus enabling convenient part feeding by the user. Task S2 furthermore allows the bundle to be placed anywhere within an area. We have executed each task with different workspace configurations (e.g. S1 with different part types, S4 with orderly or arbitrarily placed conductors, cf. Fig. 1). Online adaptation and task execution in these differing settings were achieved successfully.

Fig. 6

Our experiments comprise different benchmark tasks S1-S4 (a, goal states are rendered transparently). Adaptation time measurements enable a comparison of our approach and manual re-programming for different lot sizes (b)

Moreover, we conducted a theoretical comparison of the adaptation effort needed with our approach versus the traditional re-programming method. We say that a production cycle consists of executing a task N times, i.e. finishing N instances of a product. By introducing the flexibility demand ratio \(FD = \frac{1}{N}\), we characterize the manufacturing setting, i.e. traditional mass production with hardly any adaptations for \(FD \rightarrow 0\), decreasing lot sizes for \(FD \rightarrow 1\), and one-off products for \(FD = 1\). The adaptation effort per cycle of our approach depends on N, as each program execution is preceded by exploration and assignment planning; re-programming effort is not required during a cycle, since the task models for S1–S4 already cover all necessary adjustments. During the experiments with our benchmark tasks, an exploration time of about 9 s was measured, whereas the planning time was negligible. By our definition, the effort per cycle for adaptation by visual re-programming is independent of N and therefore constant. However, the re-programming time, including loading and saving the task model, depends on the degree of necessary changes. We have considered three cases in which only one operation or corresponding part (minimum effort), half of the involved parts (medium effort), or all parts (maximum effort) need to be adjusted in the task model between consecutive cycles. Representative durations of these three re-programming types were gathered by observing an expert operating our task editor (min. \(\approx 31\) s; med. \(\approx 80\) s; max. \(\approx 110\) s).
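Under the simplifying assumption that exploration is the only recurring per-execution cost of our approach and that re-programming is a one-off cost per cycle, the break-even lot size can be estimated as follows (\(t_{\text{explore}}\), \(t_{\text{reprog}}\), and \(N^{*}\) are our own notation):

$$\begin{aligned} t_{\text{ours}}(N) = N \cdot t_{\text{explore}}, \qquad t_{\text{manual}}(N) = t_{\text{reprog}}, \qquad N^{*} = \frac{t_{\text{reprog}}}{t_{\text{explore}}} \approx \frac{31\,\text{s}}{9\,\text{s}} \approx 3.4. \end{aligned}$$

For the minimum re-programming effort, this rough estimate places the break-even lot size between three and four products, consistent with the comparison in Fig. 6b.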

Figure 6b compares the time allocated for adaptation within a production cycle depending on FD. In general, our approach achieves better results than manual re-programming in highly flexible domains, i.e. for higher FD values, since task variants are already encoded in the task model. In particular, it performs better for lot sizes of three or less, even when considering re-programming with minimum effort. In other words, finishing three products requires less adaptation effort with our approach than with manual re-programming, which confirms our hypothesis regarding economic efficiency (Sect. 1). For medium and maximum re-programming effort, this amortization threshold shifts towards larger lot sizes. Conversely, the effort for exploring the workspace before each task iteration renders re-programming more efficient in mass production settings with relatively few changes. These quantitative results must of course be interpreted within the limits of our benchmark tasks. Yet, our analysis illustrates qualitative relationships that are transferable to other scenarios and applications.

4 Conclusion and Future Work

In this paper, we have contributed a visual programming and robot task execution approach that incorporates product and process variety. For this, part templates are specified as input to robot skills in terms of approximate locations and generalized parts families. This leads to partly ambiguous, underspecified task models capturing a set of task variants. Adaptation to concrete parts is achieved online by workspace exploration and combinatorial optimization to anchor ambiguous part templates to perceived concrete parts. Our experiments with a set of characteristic benchmarks show how this approach helps to reduce the (re-)programming effort of robots in flexible manufacturing settings.

We will address several limitations of the approach in future work: Currently, the task structure and the number of processed parts are fixed. Further task variety could be achieved by augmenting the task model with constructs such as loops for situation-dependent repetition of operations. Furthermore, we will extend the approach towards human-robot co-working by integrating multi-agent scheduling. Finally, our concept needs a comparison with other visual programming systems to evaluate the impact of generic part and location descriptions on usability.