1 Introduction and Motivation

Automation of production processes is a trend that traces back to the industrial revolution or even beyond. In today’s industrial production, machines are an integral part, and their importance will rise further in the future [7]. With the further digitalization and interconnection of production systems, which is sometimes referred to as Industry 4.0 [8], Smart Industry or Smart Manufacturing [11], the role of human work changes significantly. The tasks of production workers and knowledge workers, such as product development and production planning, will intertwine, and manual production work will shift to planning, control and monitoring tasks for machines and processes [7]. But even manual production work will not vanish: with decreasing lot sizes caused by shorter product life cycles and more product variants (mass customization), manual work processes will still be required. However, processes will become more complex and error-prone, since human workers will need to produce a larger number of product variants and consequently master a larger variety of individual work steps.

Due to the aforementioned trends, the complexity of tasks in industrial work environments and the skills they require will increase. Given the demographic shift [2] and reports about a shortage of high-skilled labor in OECD countries [5], the question arises of how this more complex work can actually be handled in the future. There is a strong need to support employees’ work in the future of manufacturing by making the complexity of the new industrial environments more manageable. Assistance systems have the potential to address this need. However, whereas in other domains such as car driving, navigation or computer configuration assistance systems are already state of the art and widely accepted, assistance systems to support workers in industrial settings are still not in widespread use. This situation may change dramatically in the future due to ever-increasing computing power, new sensors and actuators, and new interaction technologies. These advances make smart environments in the context of production feasible [13], which are a prerequisite for advanced assistance systems. We will call these new smart environments “smart factories”. Based on the definition of Lucke et al. [13], a smart factory is defined as “a factory that context-aware assists people and machines in execution of their task”. This supportive environment spans different levels of the factory from the top floor to the shop floor. To establish a smart factory, a multitude of systems and subsystems are required. While Lucke et al. [13] distinguish between calm-systems (hardware) and context-aware-applications (software), we will focus on only one type of context-aware-applications in this paper, namely assistance systems. We understand an assistance system in production as a context-aware system consisting of hardware and software that supports a user in the execution of a task and adapts depending on the progress of the task (cf. [1]). Potentially, the system can also adapt to other context information, e.g. to specific users and their physical and emotional states or to objects in, or the state of, the physical environment.

In the last decades, many assistance systems for industrial tasks have been proposed. However, the research landscape in this context is heterogeneous, and a clear and coherent overview is missing. Many studies present or evaluate specific systems, e.g. [10, 16, 17]. Surveys have been done with respect to a single aspect or with a focus on specific technologies, such as work on Industrial Augmented Reality (IAR) [6], work with a focus on Augmented Reality (AR) and Virtual Reality (VR) [3] or the consideration of Human-Machine Interaction (HMI) in the domain of Industry 4.0 [9, 14]. Still, to our knowledge, a holistic overview and classification of the previous research work on assistance systems in smart factories is missing. This is even more surprising as these systems are becoming more and more relevant for industrial practice. Both researchers and practitioners would benefit from a clear and coherent set of attributes structured in a framework to compare work done in this research field. Scientists new to the field could get a quick insight into it, while others could use the overview to retrieve relevant works from the mass of publications. Practitioners could use the classification to derive design decisions more easily from existing work and evaluations. Furthermore, they could use the overview to get an insight into the possibilities that assistance systems could offer for the smart factories of the future. Last but not least, a coherent framework will establish a common ground for discussion in the field and help to identify open research topics and new research questions.

With this paper, we want to close this gap and provide the first framework for assistance systems in smart factories. The framework contains key characteristics of assistance systems and is meant for both researchers and practitioners. We construct the framework by conducting a morphological analysis based on existing work.

The rest of the paper is organized as follows. Section 2 discusses the method used to create the framework. Section 3 describes and visualizes the framework. In Sect. 4 we classify three research projects using our framework, followed by a conclusion and outlook in Sect. 5.

2 Methodological Considerations

In the following, we briefly introduce the research procedure that was executed to construct our framework for work assistance systems (cf. Fig. 1).

Fig. 1. Research procedure

The construction process can be divided into three main stages. In the stage screening of current works, we analyzed related works. All five authors, who are engaged in researching assistance systems, contributed multiple research works from their academic knowledge base. We then performed a forward and backward search on these articles. In the stage analysis and conceptualization, we constructed the framework and subsequently applied it to the description of existing assistance systems. We then entered the stage critical assessment and discussed needs for revision that emerged from applying the framework. In line with [15], we furthermore applied all reasoning techniques, namely deductive, inductive and intuitive reasoning. Deductive reasoning (conceptual-to-empirical) was performed when the framework was initially constructed based on literature and when applying the framework. Inductive reasoning (empirical-to-conceptual) was performed when the application of the framework to real-world systems led to revisions. All reasoning techniques were applied when moving back and forth between framework development (stage analysis and conceptualization) and discussion and revision (stage critical assessment). In order to continuously refine our framework, an incremental approach was used in line with [15]. To this end, the activities in the last two stages form a cycle. Moreover, we added an inner cycle via a bidirectional relation between framework development and discussion and revision, which was performed in several extensive discussion sessions. All objective and subjective ending conditions of the incremental process suggested in [15] have been met, except the criterion “All objects or a representative sample of objects have been examined”. Since the knowledge bases of the researchers in conjunction with forward and backward search were used, we cannot guarantee representativeness and hence consider our framework preliminary.

3 Framework

A framework is helpful in organizing the huge variety of heterogeneous assistance systems and revealing the areas in which further developments will be required to meet user demands [18]. Our framework shown in Table 1 has been developed adopting an interdisciplinary perspective due to the different scientific backgrounds of the researchers involved, such as computer science, engineering, psychology, economics and design. It is organized in four major categories. These categories integrate features, which characterize assistance systems by selected attributes.

Table 1. Framework

The category information is divided into the features generation and presentation. The first feature focuses on how relevant data is created: either through the power of software developers’ algorithms (automated), through a combination of human and machine intelligence (partly automated) or mainly through manual work processes (manually). Information presentation, in contrast, describes how complex the form is in which information is passed on. The spectrum ranges from basic (e.g. simple graphics, beeps), through intermediate (e.g. symbols, steps) to complex (e.g. process models).

We chose intelligence as a category for all features that sum up aspects of the system that are a result of data-driven predictions or decisions, regardless of the underlying technique in use. Techniques might range from classical Artificial Intelligence approaches leveraging declarative or procedural knowledge representations to more recently discussed techniques such as collaborative interactive machine learning [20]. State detection refers to the ability of the assistance system to gather data about the current condition of tools (e.g. tool tracking), machines (e.g. log files), products (e.g. target/actual comparison) and finally also the user (e.g. vital data). Context sensitivity partly builds on these data and describes the application in fields such as task (e.g. task-specific instructions), environment (e.g. adaptation of the screen due to incidence of light) and user (e.g. individual knowledge and experience). Finally, the dichotomous feature learning aptitude characterizes the ability of the assistance system to learn from past data in order to improve future behavior.

The category interaction describes the specification of the interface between humans and the assistance systems. The feature control characterizes the execution of the jobs and is therefore partitioned into the attributes human, cooperation and machine. Furthermore, the feature user involvement classifies the level of cognitive, visual and manual distraction and depends on characteristics of the interaction mode used (attributes low, middle and high). We classify the feature input into traditional input devices (e.g. keyboard, joystick, touchscreen) and modern ones (e.g. motion-based or touchless devices such as gesture control, speech recognition, eye-tracking). The feature output, in turn, is grouped into visual (e.g. displays, projection), haptic/tactile (e.g. vibration/haptic technology) and acoustical (e.g. speaker, structure-borne sound). The feature extent of immersion describes the level to which the assistance systems are capable of delivering an inclusive, extensive, surrounding and vivid illusion of reality to the senses of a human participant [12]. For our field of work, the attributes none, augmented reality and virtual reality are of relevance.

The fourth category, system characteristics, specifies aspects concerning the construction of the assistance system. The transportability of the system is grouped into stationary (e.g. system integrated into machinery), restricted (e.g. transportation and setup require some effort) and unrestricted (e.g. mobile devices such as tablet computers). We define robustness as the ability of the assistance system to withstand unintentional events (e.g. soft- and hardware actions) or the consequences of human error without being damaged (attributes: low, middle, high). Finally, we group the technology readiness level, which describes technology maturity in accordance with the European Commission definition [2], into low (level 1–3), middle (level 4–6) and high (level 7–9).
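The grouping of technology readiness levels into three attributes can be written as a small lookup. The following is a minimal sketch; the function name trl_attribute is hypothetical and only the level ranges are taken from the text above.

```python
def trl_attribute(level: int) -> str:
    """Map a technology readiness level (1-9) to the framework attribute.

    Ranges follow the text: low (1-3), middle (4-6), high (7-9).
    The function name is an illustrative assumption.
    """
    if not 1 <= level <= 9:
        raise ValueError(f"technology readiness level must be 1-9, got {level}")
    if level <= 3:
        return "low"
    if level <= 6:
        return "middle"
    return "high"


if __name__ == "__main__":
    for level in (2, 5, 8):
        print(level, trl_attribute(level))
```
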

All in all, the final framework resembles a faceted classification, since multiple properties (facets or features) with multiple values are captured. However, since facets normally represent “clearly defined, mutually exclusive, and collectively exhaustive aspects […] of a class or specific subject” [19], it is not a faceted classification in a strict sense since we allow that multiple property values hold when a subject is classified.
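The categories, features and attributes described in this section can be written down as a nested mapping, which makes the difference to a strict faceted classification concrete: a feature may carry several attributes at once, or none. The sketch below is only one possible encoding. The names FRAMEWORK and validate are hypothetical, and the attribute "yes" for learning aptitude is an assumption derived from the feature being described as dichotomous.

```python
# Category -> feature -> allowed attributes, transcribed from Sect. 3.
FRAMEWORK = {
    "information": {
        "generation": ["automated", "partly automated", "manually"],
        "presentation": ["basic", "intermediate", "complex"],
    },
    "intelligence": {
        "state detection": ["tools", "machines", "products", "user"],
        "context sensitivity": ["task", "environment", "user"],
        # "yes" is assumed; the text only calls the feature dichotomous.
        "learning aptitude": ["yes", "no"],
    },
    "interaction": {
        "control": ["human", "cooperation", "machine"],
        "user involvement": ["low", "middle", "high"],
        "input": ["traditional", "modern"],
        "output": ["visual", "haptic/tactile", "acoustical"],
        "extent of immersion": ["none", "augmented reality", "virtual reality"],
    },
    "system characteristics": {
        "transportability": ["stationary", "restricted", "unrestricted"],
        "robustness": ["low", "middle", "high"],
        "technology readiness level": ["low", "middle", "high"],
    },
}


def validate(classification):
    """Return a list of problems; an empty list means the classification fits.

    `classification` maps a feature name to a set of chosen attributes.
    Several attributes per feature are allowed, as is an empty set.
    """
    allowed = {
        feature: set(attributes)
        for features in FRAMEWORK.values()
        for feature, attributes in features.items()
    }
    problems = []
    for feature, chosen in classification.items():
        if feature not in allowed:
            problems.append(f"unknown feature: {feature}")
            continue
        for attribute in sorted(set(chosen) - allowed[feature]):
            problems.append(f"unknown attribute for {feature}: {attribute}")
    return problems


if __name__ == "__main__":
    # Two output attributes at once are accepted.
    print(validate({"output": {"visual", "acoustical"}}))
    # An attribute outside the framework is reported.
    print(validate({"output": {"olfactory"}}))
```

Sets are used for the chosen attributes because the order of attributes carries no meaning in the framework.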

4 Framework Application

In order to demonstrate the application of our framework, we selected and classified three assistance systems. We selected the systems to be as diverse as possible in order to showcase the generality of our framework. The results are presented in the following sub-sections.

4.1 Intelligent Worker Assistance (Büttner et al. 2017)

The first system, presented in [4], can be described as an intelligent assistance system supporting workers in stationary manual assembly by means of projection-based augmented reality (AR) and hand tracking. Using depth cameras, the system can track the hands of the user and notifies the user about wrong picking actions or errors in the assembly process. The system automatically adapts the digital projection-based overlay according to the current work situation. Such a system helps the worker to deal with increasing requirements regarding quality, accuracy and clocking of the assembly processes. The system identifies the respective work piece via computer vision and provides the worker with the corresponding assembly instructions. Depth cameras that are fixed on the ceiling of the assembly station capture the worker’s movements, the workspace and all objects that are situated in it, such as material boxes. This makes it possible to monitor single steps (for example picking a component), to control the production process intuitively by means of gesture recognition and thereby to ensure a correct assembly of the product. Visual aids in the form of text, graphics or video sequences can be displayed directly on the assembly workplace via projections. In connection with the gesture recognition by the depth camera system, the assistance system can also be operated via touch detection.

The generation of information is partly automated while its presentation is clearly complex. The system features a wide range of media from video to rich graphics to simple icons.

With regard to intelligence, the system features state detection insofar as it is capable of tracking tools, the machine or product and the user via its depth cameras. Context sensitivity applies to the task and the user because the system may identify false assembly steps by tracking user hand movements. Learning aptitude is classified as no, since the system is not able to learn over time.

In terms of interaction, control is to be classified as cooperation because actions and commands can be initiated by both the human and the machine or system. User involvement is low since information is generated by the system. The input modalities are modern, e.g. soft buttons via projection, and the output is visual. The extent of immersion is clearly augmented reality.

Regarding system characteristics, the transportability is to be labelled as stationary while the robustness classifies as middle. The system’s technology readiness level falls within the middle (4–6) category.

4.2 TeleAdvisor (Gurevich et al. 2012)

TeleAdvisor, presented in [10], supports remote assistance tasks by enabling live in-situ projections. The system comprises a video camera and a pico-projector mounted on top of a tele-operated robotic arm. Thus, using a desktop interface a remote expert can guide a worker through e.g. a maintenance task by annotating the workspace with visual information such as pointers and text. Active tracking of the projection space is employed in order to reliably correlate between the camera’s view and the projector space. Using the robotic arm, the expert can also control the field of view.

All projected information is generated manually by the expert operating TeleAdvisor via the desktop interface. The expert may create annotations such as text, free-hand sketches, and choose from a set of images and icons, all available in different colours. Hence, information presentation classifies as intermediate.

The system’s inherent intelligence does not feature state detection but allows for recognizing the environment in order to align the projection correctly (context sensitivity). The system is suitable for learning only insofar as it facilitates human-to-human mentoring (learning aptitude: no).

In terms of interaction, TeleAdvisor relies on human operation and control. The user involvement is high since all information is generated and processed by the users (expert and worker). Input can be described as traditional since the desktop interface is operated by mouse and keyboard, while output modalities are restricted to visuals (the source does not mention acoustics). Because of the in-situ projection, the extent of immersion is clearly augmented reality.

Regarding system characteristics, the authors rate their system high in terms of transportability. However, we classify TeleAdvisor as restricted, since the source does not mention a battery supply, which suggests that the system is bound to a power cord. Robustness is difficult to assess due to a lack of information; since the creators describe the system as very robust (high), we follow their assessment. The technology readiness level falls in the middle category.

4.3 Smart-Glasses-Based Service Support System (Niemöller et al. 2017)

The prototype system presented in [16] aims to support service technicians in executing service tasks. It is motivated by the complexity of today’s high-tech products that require an increasing amount of information during service work. In order to provide this information directly within the work process and to guide the service technician, information is displayed on smart glasses controlled via voice recognition. With that, the service technician can work hands-free and interference with manual tasks is minimized. The system is implemented with the glass development kit on Android.

All displayed information is generated manually by experts who can create contents for the Smart-Glasses-based Service Support System using a desktop computer. The expert can create step-by-step guidance and provide a detailed description for each step. Such information can comprise spare part information, pictures, wiring diagrams, videos and technical details. Due to these rich options for the presentation of information including multi-media, presentation of complex information is possible.

With regard to intelligence, neither state detection nor context sensitivity is implemented. However, a limited form of learning aptitude is indirectly available through the features for easily providing feedback, e.g. by taking photos and commenting on the pictures using voice recording. Such feedback can be processed and the information support could be improved on that basis, which might be considered a form of learning.

In terms of interaction, the control of the system can be considered as cooperative since the user triggers the display of information, but the system can also guide the user with step-by-step descriptions. User involvement is high since all information belonging to an information object such as an activity has to be requested by the user, whereby several requests may be required if the information is complex and must be displayed on multiple screens. Input can be made with modern interfaces such as voice recognition and touch displays attached to the side pieces of the smart glasses. Regarding output, visual and acoustic output is possible. Since the smart glasses create an information overlay on what is seen in reality, the extent of immersion is augmented reality.

Regarding system characteristics, transportability of smart glasses is unrestricted, which is an advantage. However, with regard to robustness, such devices are prone to mechanical damage and hence have to be carried with caution. In terms of the technology readiness level, a distinction has to be made between hardware and software. While the maturity of the hardware is high since smart glasses are offered by major vendors, software maturity is low since (as of now) it is an academic prototype.
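The three classifications in this section can be compared directly once they are encoded as data. The following minimal sketch restates eight features whose attributes are given unambiguously in Sects. 4.1 to 4.3 and computes the attributes that all three systems share. The names SYSTEMS and shared_attributes are hypothetical, and the acoustic output of the smart glasses system is written with the framework term acoustical.

```python
# Feature -> set of attributes per system, transcribed from Sects. 4.1-4.3.
SYSTEMS = {
    "Intelligent Worker Assistance": {
        "generation": {"partly automated"},
        "presentation": {"complex"},
        "control": {"cooperation"},
        "user involvement": {"low"},
        "input": {"modern"},
        "output": {"visual"},
        "extent of immersion": {"augmented reality"},
        "transportability": {"stationary"},
    },
    "TeleAdvisor": {
        "generation": {"manually"},
        "presentation": {"intermediate"},
        "control": {"human"},
        "user involvement": {"high"},
        "input": {"traditional"},
        "output": {"visual"},
        "extent of immersion": {"augmented reality"},
        "transportability": {"restricted"},
    },
    "Smart-Glasses-Based Service Support System": {
        "generation": {"manually"},
        "presentation": {"complex"},
        "control": {"cooperation"},
        "user involvement": {"high"},
        "input": {"modern"},
        "output": {"visual", "acoustical"},
        "extent of immersion": {"augmented reality"},
        "transportability": {"unrestricted"},
    },
}


def shared_attributes(systems):
    """Return feature -> attributes that every system has in common."""
    classifications = list(systems.values())
    # Only features classified for every system can be compared.
    features = set.intersection(*(set(c) for c in classifications))
    shared = {}
    for feature in sorted(features):
        common = set.intersection(*(c[feature] for c in classifications))
        if common:
            shared[feature] = common
    return shared


if __name__ == "__main__":
    for feature, attributes in shared_attributes(SYSTEMS).items():
        print(feature, sorted(attributes))
```

For these eight features, the systems agree only on visual output and augmented reality as the extent of immersion.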

5 Conclusion and Outlook

In this paper, we presented a framework for classifying assistance systems in the context of smart factories. The framework has been iteratively developed by five experts from different backgrounds (computer science, engineering, psychology, economics and design) based on an analysis of literature in the related field of research. The framework consists of four major categories: information, intelligence, interaction and system characteristics. Each of the major categories contains multiple features, where each of the features represents a certain aspect of the system that can be described with the attributes provided in the context of the feature. The attributes are not mutually exclusive, so for some of the features, multiple attributes can be used to classify a certain system, e.g. for the major category interaction and the feature output, the attributes visual and acoustical can be used together to describe an aspect of the system.

To verify the functionality of the framework, we described and classified three assistance systems that have been provided by previous research projects. For this presentation, we chose three systems from the related literature that were as diverse as possible and had only few attributes in common. The classification of the three systems demonstrates the functionality of the framework well and shows that the framework provides all major aspects that characterize the three systems.

With the framework, we pursue the following three objectives: First, we want to provide a tool for classifying existing and new systems to better understand the aspects of these systems and to identify common key characteristics of assistance systems. Second, we want to establish a common vocabulary in the research field. Third, we want to support the identification of research gaps, which will be possible by looking for aspects that are not present in the current generation of assistance systems. These three objectives are not only valuable for researchers in the field. We also aim to provide a better understanding for practitioners in the field, who are welcome to use the framework as input for the development process of new assistance systems.
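The third objective, looking for aspects that are not present in existing systems, amounts to a set difference between the attributes the framework offers and the attributes that classified systems actually use. The following is a minimal sketch for four features, using the three systems classified in Sect. 4 as data. The names ATTRIBUTES, CLASSIFIED and unused_attributes are hypothetical, and three systems are far too few to indicate actual research gaps, so the output only illustrates the mechanism.

```python
# Allowed attributes for four features of the framework (Sect. 3).
ATTRIBUTES = {
    "generation": {"automated", "partly automated", "manually"},
    "control": {"human", "cooperation", "machine"},
    "output": {"visual", "haptic/tactile", "acoustical"},
    "extent of immersion": {"none", "augmented reality", "virtual reality"},
}

# Attributes used by the three systems classified in Sect. 4.
CLASSIFIED = [
    {  # Intelligent Worker Assistance
        "generation": {"partly automated"},
        "control": {"cooperation"},
        "output": {"visual"},
        "extent of immersion": {"augmented reality"},
    },
    {  # TeleAdvisor
        "generation": {"manually"},
        "control": {"human"},
        "output": {"visual"},
        "extent of immersion": {"augmented reality"},
    },
    {  # Smart-Glasses-Based Service Support System
        "generation": {"manually"},
        "control": {"cooperation"},
        "output": {"visual", "acoustical"},
        "extent of immersion": {"augmented reality"},
    },
]


def unused_attributes(attributes, classifications):
    """Return feature -> attributes that no classified system uses."""
    gaps = {}
    for feature, allowed in attributes.items():
        used = set().union(*(c.get(feature, set()) for c in classifications))
        missing = allowed - used
        if missing:
            gaps[feature] = missing
    return gaps


if __name__ == "__main__":
    for feature, missing in unused_attributes(ATTRIBUTES, CLASSIFIED).items():
        print(feature, sorted(missing))
```
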

In our future work, we plan to further validate and revise our initial framework by classifying a large set of systems and to identify patterns and common characteristics e.g. of more research-oriented systems and industry-oriented systems.