Introduction

Digitisation and the increasing demand for unique and customised products have motivated manufacturing industries to move beyond the mass-production paradigm and develop smarter production systems with humans at their core (Bragança et al., 2019; Sherwani et al., 2020). This has led to the concept of Industry 5.0 (I5.0) (Industry 5.0: Towards More Sustainable, Resilient and Human-Centric Industry, 2021), in which humans work in parallel with robots. Although smart machines, a key development in Industry 4.0, are more precise and accurate than humans, robot-centric manufacturing systems suffer from limitations in more customised production environments because robots lack the flexibility and adaptability of human workers. Therefore, by combining humans’ coordination, proficiency and cognitive potential with the accuracy, efficiency and dexterity of robots, I5.0 brings together the best of both to enable a mass-customisation production environment.

Human–robot interaction (HRI) can generally be classified into three categories, as illustrated in Fig. 1b–d, while Fig. 1a shows the traditional setting in which robots are completely isolated from human operators. The first category is human–robot coexistence, in which humans and robots work in the same environment but never overlap each other’s workspace and carry out different tasks. The second is human–robot cooperation, in which the human operator and the robot share common goals or tasks; both agents work in a shared space but do not come into direct contact. The third category is Human–Robot Collaboration (HRC), in which both agents interact with each other in one of two ways:

  • Physical collaboration, in which robots are enabled to identify and predict human interactions using force or torque signals.

  • Non-physical collaboration, in which information between human and robot is exchanged using either direct (gestures, speech, etc.) or indirect communication (facial expressions, human presence, etc.) (Hentout et al., 2019).

Fig. 1

Various levels of human–robot working in an industrial environment

Multiple robotic systems and architectures are present in industry, including serial robot manipulators (either with a fixed base or mounted on mobile platforms) and parallel systems (for example delta robots and gantry or Cartesian systems). A robotic manipulator is an arm-like structure that is mainly used to handle materials without direct contact with the operator. A manipulator can be mounted on a fixed base and carry out specific tasks by moving its end-effector [e.g. a PUMA robot (Jin et al., 2017)], or it can be mounted on a mobile platform so that it can move around. Robots have been utilised in the manufacturing industry for decades to enhance production speed and accuracy. Industrial robots traditionally operate inside cages, isolated from humans for safety reasons. The ability to have robots sharing the workspace and working in parallel with humans is a key factor within the I5.0 concept and is at the core of the smart, flexible factory.

In HRC, a robot may help its human co-worker carry and manipulate delicate or heavy objects safely (Solanes et al., 2018) or position them precisely through hand-guiding (Safeea et al., 2019a, 2019b). Moreover, as robots become cheaper, more flexible and more autonomous through the incorporation of artificial intelligence, some may replace human workers entirely, while others work alongside workers and complement them; the latter are called collaborative robots, or cobots.

Cobots have now been implemented in many fields and sectors. In production, they are used in manufacturing, transportation (autonomous guided vehicles and logistics) and construction (brick or material transfer). There has also been a rise in cobot applications in the medical field, including robotic surgery (Sefati et al., 2021) (manipulation of needles or surgical grippers), assistance (autonomous wheelchairs or walking aids) and diagnosis [automatic positioning of endoscopes or ultrasound probes (Zhang et al., 2020a, 2020b)]. The use of cobots in the service sector is also rapidly increasing, with potential for growth in the coming years for applications such as companionship, domestic cleaning, object retrieval or as chat partners (Nam et al., 2021; Zhong et al., 2021). Sectors such as the military and space exploration are also taking advantage of collaborative robots (Roque et al., 2016). For instance, the Space Rider (a planned uncrewed orbital lifting-body spaceplane) is intended to spend months on orbital missions during which it may encounter debris; a cobot was therefore proposed for the inspection and thorough cleaning of the plane (Bernelin et al., 2019). More recently, cobots have also been seen working actively as front-line workers during the global COVID-19 pandemic (Deniz & Gökmen, 2021).

Manufacturing industries are eager to replace traditional robot manipulators with cobots due to their cost-effectiveness, safety and intuitive user interfaces (Ma et al., 2020). Cobots are especially affordable for Small and Medium Enterprises (SMEs), which face difficulties in automating their manufacturing with traditional industrial robots (Collaborative Robotics for Assembly and Kitting in Smart Manufacturing, 2019). On the other hand, Multi-National Corporations (MNCs) are equally interested in deploying cobots to maintain competitiveness and ensure their factories adapt to the next level of advancement in manufacturing. For example, cobots have been introduced at the BMW Group’s Spartanburg site to improve worker efficiency by taking over repetitive, precision tasks such as fitting sound and moisture insulation inside car doors (Innovative Human–robot Cooperation in BMW Group Production, 2013).

An industrial cobot is designed for direct interaction with human co-workers, providing an efficient manufacturing environment in which to complete tasks. Cobots can assist humans in various industrial tasks such as co-manipulation (Ibarguren & Daelman, 2021), handover of objects during assembly (Raessa et al., 2020), picking and placing materials (Borrell et al., 2020), soldering (Mejia et al., 2022), inspection (Trujillo et al., 2019), drilling (Ayyad et al., 2023), screwing (Koç & Doğan, 2022) and packaging on a manufacturing line. They can also relieve human operators (Li et al., 2022) and lift and place loads precisely and quickly (Javaid et al., 2021). To perform these tasks, cobots need to actively perceive human actions.

Perception of human actions and intentions is critical for safe and efficient collaboration, and it is achieved in cobots in different ways. In some scenarios, the human operator guides the cobot manually using hand or facial gestures, or voice commands (Neto et al., 2018). In other cases, the cobots are equipped with their own sensing modalities to establish awareness of their environment (Han et al., 2019; Siva & Zhang, 2020), recognise objects (Juel et al., 2019) and the behaviour of surrounding entities (Berg et al., 2019; Ragaglia et al., 2018; Sakr et al., 2020), or detect and avoid collisions (Su et al., 2020; Zabalza et al., 2019) to ensure their own safety and that of their co-workers. Since human workers in HRC operate in very close proximity to cobots, the safety of the human operators is of utmost importance.

To equip the cobot with the perception needed to keep track of its working space and the entities present in it, sensors are used. Sensors are an essential part of any robotic task. In robotics, sensors are generally categorised into two groups. First, internal or proprioceptive sensors are fixed within the robot, for example position and torque sensors at the joints; these obtain information about the robot itself. Second, external or exteroceptive sensors, for example cameras, laser scanners and IMUs, gather information about the environment. The data acquired from both types of sensors can be used to analyse the state of the workspace and of the robot, which in turn helps to control and regulate the defined tasks. In a collaborative environment, where humans work side by side with robots, one of the vital tasks is to enable the robot to detect the presence of humans in its workspace.

Recent review studies in robotics focus on applications of cobots (Afsari et al., 2018; Hentout et al., 2019; Knudsen & Kaivo-oja, 2020; Wang et al., 2020a, 2020b), programming methods used in collaborative environments for various purposes (De Pace et al., 2020; Villani et al., 2018; Wang et al., 2019; Zaatari et al., 2019), specific tasks such as gesture recognition (Liu & Wang, 2018) and path planning (Manoharan & Kumaraguru, 2018), methods for human–robot collaboration (Halme et al., 2018; Martínez-Villaseñor & Ponce, 2019; Wang et al., 2019, 2020a), or modality-specific systems such as vision or inertial sensing (Majumder & Kehtarnavaz, 2021; Mohammed et al., 2016).

Li and Liu (2019) reviewed the standard sensors used in industrial robots and briefly described their working principles. They further discussed the applications in which these sensors have been used, such as HRI, Automated Guided Vehicle (AGV) navigation, manipulator control and object grasping. Ogenyi et al. (2021) presented a survey on robotic systems, sensors, actuators and collaborative strategies for a cobot workspace; the robotic systems are discussed in detail under categories including collaborative arms, wearable robotic arms and robot assistive devices. Ding et al. (2022) presented an overview of state-of-the-art perception technologies used with collaborative robots, focusing on algorithms for fusing heterogeneous data from different sensors. Existing sensor-based control methods for various applications in a human–robot environment were discussed in recently published work by Cherubini and Navarro-Alarcon (2021), who surveyed the sensor types, their integration methods and application domains.

The existing literature, discussed above, reviews various aspects of the human–robot collaborative environment, including applications of cobots, programming methods for collision detection, specific tasks such as gesture recognition or path planning, and modality-specific systems such as vision or inertial sensors. In terms of sensor-specific literature, reviews on standard sensors used in industrial robots and their applications in HRI are documented. In contrast to the existing literature, this paper provides an in-depth review of the sensors and methods that have been utilised specifically for detection and tracking of humans in an industrial collaborative environment (with both fixed-base and mobile-base cobots). This study provides a complete overview of sensor types, models and locations (where they were mounted in the workspace) for obstacle detection. Moreover, we discuss in detail the pros and cons of each sensing approach and the emerging technologies for human detection and tracking. Given the broad scope of the obstacle-detection problem, we limit our review to manipulator workspaces.

The rest of the manuscript is organised as follows: section “Obstacle detection and collision avoidance in cobots” describes the cobot systems available in the industrial domain and the importance of external sensors in completing cobot tasks. Section “Sensors used for detecting obstacles” discusses in detail the types of sensors presented in the relevant literature. Section “Methods used for detecting the human in a human–robot collaboration environment” highlights the various algorithms used for detecting humans in industrial collaborative environments. Section “Discussion” considers the limitations and advantages of the various sensor modalities and briefly reviews the research on sensor fusion and its benefits in this context.

Obstacle detection and collision avoidance in cobots

The interest in HRC has increased vastly in recent years from both the research and the industrial perspective. One of the main reasons for this is the advancement in technology that has made robot systems safer around human co-workers, typically using geometric design to limit pinching risks and force calculation or estimation to detect collisions. With respect to the robot itself, there are several ‘safe’ generations of robots which allow collaborative work between the robot and human workers. Rethink Robotics (https://www.rethinkrobotics.com/) offers a 7-degree-of-freedom robot arm, Sawyer. Its key feature is that each joint motor incorporates a series elastic element (a mechanical spring) to ensure that the robot arm remains soft (compliant) to external contact even in the case of software failure. The collaborative robots from Universal Robots (https://www.universalrobots.com/) look like traditional industrial robots but are certifiable for most HRC tasks according to ISO/TS 15066 (ISO/TS 15066:2016, 2016). They include several features, such as force detection and speed reduction if a human is detected by external sensors. KUKA Robotics (https://www.kuka.com/) has introduced the LBR iiwa (KUKA AG, 2021), a powerful but lightweight robot with extremely sensitive torque sensors in its joints. The torque sensors provide immediate information about contact with the environment, which can be used to avoid unsafe collisions. More recently, KUKA has complemented the LBR iiwa with the LBR iisy system (KUKA AG, 2024).

Moreover, almost every robot manufacturer now includes cobots in its portfolio. For example, Fanuc launched its CRX Collaborative Robot Series, Franka (https://www.franka.de/) offers lightweight cobot arms, the ABB group (https://global.abb/group/en) provides a dual-arm cobot, Yaskawa has the Motoman HC10 (https://www.yaskawa.eu.com/) and COMAU offers its cobot AURA (https://www.comau.com/en/). Most of these new generations of robots offer maximum flexibility, so they can be programmed even by those without specialised robotics training, with intuitive methods to teach the robot about its environment and surrounding obstacles (De Gea Fernández et al., 2017).

However, it should be noted that cobots cannot be considered intrinsically safe solely because of their design features; indeed, prior to deployment, a cobot is considered a partially completed machine. For instance, a cobot will decelerate and stop when the force limit due to a collision with an object is exceeded; however, the definition of that force limit is application dependent. To ensure the system is safe, time-consuming mechanical testing must be carried out to demonstrate that a collision at any configuration in the reachable workspace does not exceed the force and pressure limits defined per body part and per collision type in ISO/TS 15066. Additionally, these tests must account for any potential tool and/or environmental changes. Thus, to improve system productivity, it is often imperative to implement efficient collision avoidance in a human–robot collaborative environment, i.e., obstacles need to be detected and their motion predicted so that contact is prevented from occurring.

For this study, we have grouped previous research work into three categories: entirely simulated (simulated or generated data and a simulated environment), partially simulated (real sensor data but a simulated environment) and hardware implementation (real sensor data and a real physical environment). Moreover, in a robot workspace the obstacles can be static/fixed (e.g. machine equipment) or dynamic (moving objects such as human co-workers) (Majeed et al., 2021). The three categories are detailed below and shown in Fig. 2.

  • Entirely Simulated (ES): a simulated version of the cobot is used in different scenarios, with pre-defined or recorded data. For example, Safeea et al. (2020) presented a work in which they applied a Newton-based method for collision avoidance; the method was tested in a simulated environment where obstacles and targets were represented only by their coordinate positions. In another work (Safeea et al., 2019a), Safeea et al. calculated the minimum distance between obstacles and human operators using a QR factorisation method; these minimum-distance evaluations support efficient collision avoidance. Flacco and De Luca (2010) presented a novel approach to collision avoidance by formulating a probabilistic cell-decomposition framework using both single and multiple depth sensors; the work was implemented in a purely simulated environment using simulated data.

  • Partially Simulated (PS): in this category, data are collected from real sensors but processed in a simulated environment. For example, Yang et al. (2018) collected multi-source data using Kinect and Leap Motion sensors. Comprehensive information about the surroundings was obtained by fusing the vision data, represented as a point cloud, with the operator’s current movement data (captured using Leap Motion sensors attached to the worker’s hand). In this category, the developed models are not tested on real manufacturing robots and would require further tuning and testing before deployment in a real-time environment.

  • Hardware Implementation (HI): in this category, both dynamic and static obstacles are detected using real data, and the experiments are physically performed on a real robot. For example, Safeea and Neto (2019) investigated the use of laser scanners together with inertial measurement units (IMUs). Data from both kinds of sensors were fused to find the position of the human worker, and the potential field method was then used to avoid collisions between the worker and a KUKA iiwa robot.

Fig. 2

Three categories of obstacle detection in an industrial workspace containing a manipulator

The use of external sensors monitoring the workspace environment can facilitate the adaptation of classical robot systems to collaborative environments by providing extra layers of safety for human co-workers. Therefore, the main focus of this paper is the third category listed above, where real-time data are captured using different types of sensing modalities and the system is implemented in a real industrial workspace. However, the second category (PS) is also discussed to some extent.

Sensors used for detecting obstacles

Sensors are of utmost importance for collaborative industrial robots to complete their operations. In particular, when the robot cannot be considered safe, for instance due to a dangerous end-effector, the use of external sensors to monitor the shared workspace can enable the system to be used in a coexistence environment. Even as the manufacturing industry increasingly introduces new safe cobot technologies, additional sensing modalities in the workspace provide additional information and thus an extra layer of safety for human operators. For example, if the cobot holds a sharp object at its end-effector, any collision with a human operator will be very dangerous, likely exceeding the pressure threshold even if the contact force is minimal; in this case, it is necessary to guarantee that no collision with a human can possibly occur. Furthermore, even in scenarios where contact below the threshold is allowable, the robot must be stopped on contact. This reduces productivity, as there is usually a delay while the system is manually restarted. Hence, external sensors not only provide safety by avoiding collisions but can also help to improve productivity.
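As a simple illustration of how external distance sensing can improve both safety and productivity, the sketch below scales the commanded robot speed with the measured human–robot separation instead of stopping outright. This is a minimal, hypothetical example: the thresholds are arbitrary, and the normative protective-separation calculation of ISO/TS 15066 is not reproduced here.

```python
def speed_scale(separation_m: float,
                stop_dist: float = 0.5,
                slow_dist: float = 1.5) -> float:
    """Return a speed-scaling factor in [0, 1] from the measured
    human-robot separation (illustrative thresholds, not ISO values)."""
    if separation_m <= stop_dist:
        return 0.0                      # protective stop
    if separation_m >= slow_dist:
        return 1.0                      # full programmed speed
    # linear ramp between the stop and slow distances
    return (separation_m - stop_dist) / (slow_dist - stop_dist)


if __name__ == "__main__":
    for d in (0.3, 0.8, 1.2, 2.0):
        print(f"separation {d:.1f} m -> speed factor {speed_scale(d):.2f}")
```

Instead of a binary stop/go decision, the robot keeps working at reduced speed when a human is nearby, which is the productivity argument made above.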

Sensors acquire information from the shared human–robot workspace about the state of the robot and of the environment. With the help of this information, the controller issues instructions for the robot to complete the appointed tasks. Sensor-based obstacle detection strongly resembles the role of our central nervous system (The Brain’s Sense of Movement, 2002), and its origins can be traced back to the servomechanism problem (Davison & Goldenberg, 1975). For instance, in image-based visual servoing (Cherubini & Chaumette, 2012; Perdereau et al., 2002), vision sensors provide visual feedback to control the motion of the robot. Shared-workspace information can be acquired using different kinds of sensing modalities in an industrial environment. In a shared workspace there are two phases, pre-impact/collision and post-impact/collision, and the sensors can likewise be divided into these two categories, which are discussed in this section. Examples of sensors used for obstacle detection in a human–robot collaboration environment are shown in Fig. 3.

Fig. 3

Examples of sensors used for obstacle detection in a human–robot collaborative environment

Sensors used in pre-impact/collision phase

Visual sensors

Visual sensors have evolved rapidly over the past few years. They are now used in many fields, such as autonomous vehicles (Guang et al., 2018), face recognition for security (Kortli et al., 2020), detection of abnormal behaviour in scenes (Fang et al., 2021) and human arm-motion tracking in robotics (Palmieri et al., 2020). Visual sensing technology includes various camera types, such as RGB cameras, hyperspectral and multispectral cameras, and depth cameras (Li & Liu, 2019). Different types of cameras provide different data. RGB cameras, the most common in daily life, create two-dimensional images that mimic the human visual system, capturing light in three colour wavelengths, i.e. red, green and blue, and their combinations (INFINITI ELECTRO-OPTICS (https://www.infinitioptics.com/)). Depth cameras add distance information to 2D RGB images, thus enabling 3D imaging. According to their operating principle, they can be categorised as stereo (binocular) RGB, time-of-flight or structured-light sensors. Even though processing the data captured from visual sensors can be time consuming and complicated, they remain highly popular because they are economical and convenient and supply a vast amount of information.

The majority of the research literature on obstacle detection and avoidance in a robotic environment is based on vision sensors (Melchiorre et al., 2019; Mohammed et al., 2016; Perdereau et al., 2002; Schmidt & Wang, 2014; Wang et al., 2013). The camera may be fixed somewhere in the workspace or mounted on a moving part of the robot. Khatib et al. (2017) used a depth camera (Microsoft Kinect) mounted on the end-effector (eye-in-hand) or on the worker’s head, achieving coordinated motion between a KUKA LWR IV robot and a human in a ROS environment; however, three markers had to be placed around the robot for continuous camera localisation and the detection was not real time. In another work, Flacco et al. (2012) proposed a real-time collision avoidance approach based on calculating the distance between the robot and possible moving objects, using a Microsoft Kinect depth sensor mounted above the robot workspace. Indri et al. (2020a, 2020b) and Rashid et al. (2020) used IP and simple RGB cameras to acquire imaging data; however, in most of the literature, 3D or depth cameras have been used (De Luca & Flacco, 2012; Wang et al., 2016). The work on human detection using vision sensors in the industrial field is presented in Table 1, categorised by the main method used to detect the human/obstacle from the sensor data. Some of the methods, particularly skeleton/joint detection, are used not only with vision sensors but also with inertial sensors (section “Inertial sensors”), hence those works are also included in the table. The algorithms are discussed further in section “Methods used for detecting the human in a human–robot collaboration environment”.
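To illustrate the kind of distance evaluation underlying depth-camera approaches such as Flacco et al. (2012), the sketch below computes the minimum distance between an obstacle point cloud (e.g. from a depth sensor, assumed already expressed in the robot base frame) and a robot link approximated by a line segment. This is a generic illustration under simplifying assumptions, not the authors’ actual algorithm.

```python
import numpy as np

def point_to_segment_distance(points: np.ndarray,
                              seg_a: np.ndarray,
                              seg_b: np.ndarray) -> float:
    """Minimum Euclidean distance between a set of 3D points (N x 3)
    and the line segment from seg_a to seg_b (robot link approximation)."""
    ab = seg_b - seg_a                       # segment direction
    t = (points - seg_a) @ ab / (ab @ ab)    # projection parameter per point
    t = np.clip(t, 0.0, 1.0)                 # clamp onto the segment
    closest = seg_a + t[:, None] * ab        # closest point on the segment
    return float(np.linalg.norm(points - closest, axis=1).min())

if __name__ == "__main__":
    # Example: random obstacle points around a link from (0,0,0) to (0,0,0.8)
    rng = np.random.default_rng(0)
    cloud = rng.uniform(-1.0, 1.0, size=(1000, 3))
    d_min = point_to_segment_distance(cloud, np.array([0.0, 0.0, 0.0]),
                                      np.array([0.0, 0.0, 0.8]))
    print(f"minimum obstacle-to-link distance: {d_min:.3f} m")
```

In practice, this evaluation is repeated for every link and every control cycle, and the resulting distance feeds a repulsive or speed-scaling reaction.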

Table 1 Summary of obstacle detection using vision and inertial sensors in hardware implementation category

Laser sensors

Due to their homogeneity, directionality and brightness, lasers are widely used in many fields for various applications (Dubey & Yadava, 2008). Laser sensors usually consist of an emitter, a detector and a measuring circuit. They are mainly used to measure physical quantities such as distance, velocity and vibration. The main types are laser displacement sensors, laser trackers and laser scanners. The fundamental principles of laser range measurement are triangulation, time of flight (TOF) and optical interference (Bosch, 2001). TOF refers to the time between projecting and receiving the laser pulse; TOF laser sensors are among the most used range finders, especially for objects at long distances. The triangulation principle, primarily implemented in laser displacement sensors, uses trigonometric functions and similar-triangle geometry to compute the distance to objects. Optical interference works on the principle that the superposition of two light beams with different phases generates fringes of different brightness; it is mainly employed in laser tracker sensors.
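For illustration, the sketch below encodes the two simplest ranging relations mentioned above: the TOF distance d = c·t/2 and a basic triangulation relation d = f·b/x (focal length f, baseline b and spot offset x on the detector). The numerical values are arbitrary examples, not the specifications of any particular sensor.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s

def tof_distance(round_trip_time_s: float) -> float:
    """Time-of-flight range: the pulse travels to the target and back."""
    return SPEED_OF_LIGHT * round_trip_time_s / 2.0

def triangulation_distance(focal_len_m: float,
                           baseline_m: float,
                           spot_offset_m: float) -> float:
    """Simple similar-triangles relation used by laser displacement sensors."""
    return focal_len_m * baseline_m / spot_offset_m

if __name__ == "__main__":
    print(f"TOF:           {tof_distance(20e-9):.3f} m")   # 20 ns echo -> ~3 m
    print(f"Triangulation: {triangulation_distance(0.025, 0.05, 0.0005):.3f} m")
```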

Huang et al. (2021) present a robotic disassembly cell comprising two cobots and a human operator. A safety laser scanner, together with active compliance control, was used to achieve complex disassembly operations with safe human–robot interaction. In another work on safe human–robot collaboration, Safeea and Neto (2019) used a 2D LIDAR along with IMUs to actively avoid collisions between human workers and a KUKA iiwa robotic manipulator arm. Only a few examples are discussed here; further details of laser scanners used in industrial environments, especially for manipulators detecting obstacles, are summarised in Table 2.

Table 2 Summary of obstacle detection using single laser sensors in hardware implementation category

Inertial sensors

Inertial sensors are among the most widely used motion-tracking sensors. They comprise an accelerometer, a gyroscope and a magnetometer, and the combination of the three is known as an inertial measurement unit (IMU). IMU sensors are mostly placed on the body of the human operator, generally close to the joints (Safeea & Neto, 2019). The literature shows wide use of IMU sensors for obstacle detection in industrial environments, as these sensors are cheap (compared to other sensors) and fast. Moreover, as IMU sensors can be attached to every joint or body part of interest, they can increase the overall effectiveness and performance of the obstacle detection system (Glonek & Wojciechowski, 2017).
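A common way to limit the drift that accumulates when gyroscope rates are integrated is to blend them with the gravity direction estimated from the accelerometer. The complementary filter below is a minimal, generic illustration of this idea (the blending gain and sample rate are arbitrary assumptions), not a description of any of the cited systems.

```python
import math

def complementary_filter(prev_angle_deg: float,
                         gyro_rate_dps: float,
                         accel_x: float, accel_z: float,
                         dt: float, alpha: float = 0.98) -> float:
    """Fuse the integrated gyro rate with an accelerometer tilt estimate.
    alpha close to 1 trusts the gyro in the short term, while the
    accelerometer slowly corrects the accumulated drift."""
    gyro_angle = prev_angle_deg + gyro_rate_dps * dt
    accel_angle = math.degrees(math.atan2(accel_x, accel_z))
    return alpha * gyro_angle + (1.0 - alpha) * accel_angle

if __name__ == "__main__":
    angle = 0.0
    # Simulated samples: small gyro bias, accelerometer sees a ~10 deg tilt
    for _ in range(200):
        angle = complementary_filter(angle, gyro_rate_dps=0.5,
                                     accel_x=math.sin(math.radians(10)),
                                     accel_z=math.cos(math.radians(10)),
                                     dt=0.01)
    print(f"estimated tilt after 2 s: {angle:.2f} deg")
```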

IMUs are mostly used in combination with other sensors or technologies to provide an extra layer of information. For example, Corrales et al. (2008) used a GypsyGyro-18 inertial system together with a Ubisense tag to track a human operator in the workspace of a PA-10 robotic manipulator arm. Digo et al. (2020) used MTx IMUs with an optical tracking system (a V120:Trio tracking bar with 17 passive reflective markers) to acquire data from the upper limbs of a human operator performing typical pick-and-place movements in an industrial environment. Amorim et al. (2021) used 2 IMUs together with 6 FLEX3 cameras for pose estimation of a human operator working in a robotic cell. Further details on the usage of IMU sensors for robot obstacle detection are given in Table 3 and in Table 4, which summarises multi-modal sensor systems.

Table 3 Summary of obstacle detection using single other sensors in hardware implementation category
Table 4 Summary of obstacle detection using multiple sensors

Other sensors: proximity, ultrasonic, radar and acoustic

Apart from the aforementioned types of sensors, some other sensors have been utilised to a limited extent in the literature to detect obstacles/humans in robot workspaces, namely proximity, ultrasonic, radar, acoustic and magnetic sensors.

Proximity sensors detect an object’s presence without coming into contact with it. Based on their operating principle, they can be categorised as capacitive or inductive: capacitive proximity sensors can detect anything that carries an electrical charge, while inductive ones can only detect metallic targets. The use of these types of sensors for dynamic obstacle avoidance is very limited in the literature. Another common type is the distance proximity sensor, used to detect the presence of objects within a sensing area; these typically operate using ultrasonic, infrared or radar waves. Sahu et al. (2014) developed a customised sensor base comprising a force sensor, two capacitive proximity sensors, one inductive proximity sensor, an ultrasonic sensor and a tactile sensor; the sensors were interfaced through a microcontroller and operated as an integral part of the robotic arm.

Ultrasonic sensors detect obstacles by estimating the object’s distance from the sensor: they emit ultrasonic sound waves towards it and measure the time of the returned echo. Although they are small and cheap, they can only be used over short ranges, so in a human–robot collaborative environment they are mostly used in combination with other sensors. Dániel et al. (2012) used the combination of an ultrasonic sensor and an infrared proximity sensor to avoid joint-level collisions on a NACHI MR-20 robot.

Similarly, radar sensors detect the presence of objects by emitting electromagnetic waves, and infrared sensors do the same by emitting energy at infrared wavelengths (Stetco et al., 2020). Note that proximity distance sensors (ultrasonic, infrared and radar proximity sensors) and general distance sensors (ultrasonic, infrared and radar distance sensors) operate in a similar manner but provide different outputs: proximity sensors sense the presence of an object within a specific range but do not necessarily provide distance information, whereas distance sensors detect the object and report its distance. Some sensors on the market, such as the HC-SR04 ultrasonic sensor (Ultrasonic Distance Sensor - HC-SR04 (5V). https://www.sparkfun.com/products/15569), can be used in both roles. A summary of the usage of these sensors for obstacle detection in a robotic environment is given in Table 3.
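To make the distinction concrete, the sketch below converts an ultrasonic echo time into a distance (d = v·t/2, with the speed of sound taken as roughly 343 m/s in air at 20 °C) and then applies a presence threshold, i.e. the same measurement used either as a distance sensor or as a proximity sensor. It is a generic illustration, not the driver of any specific device.

```python
SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at 20 degrees C

def echo_to_distance(echo_time_s: float) -> float:
    """Distance-sensor view: convert the round-trip echo time to metres."""
    return SPEED_OF_SOUND * echo_time_s / 2.0

def object_present(echo_time_s: float, range_limit_m: float = 0.5) -> bool:
    """Proximity-sensor view: only report whether something is in range."""
    return echo_to_distance(echo_time_s) <= range_limit_m

if __name__ == "__main__":
    t = 2.0e-3  # a 2 ms echo
    print(f"distance: {echo_to_distance(t):.3f} m, "
          f"in range: {object_present(t)}")
```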

Multiple sensors

Just as humans use different sensing capabilities, robots also benefit from multiple artificial senses to acquire information about the environment. Acquisition of information from multiple sensors is achieved in two ways:

  • Multiple sensors of the same modality: such as multiple cameras or laser scanners to cover a wider area. Research shows that if two or more sensors are scanning a workspace and one sensor is unable to cover a space (for example, because it is set up in such a way that there are blind spots, or because it fails for any reason), the other sensor will compensate (Schmidt & Wang, 2014).

  • Data from multiple sensors of different modalities: such as a camera and a LIDAR installed in a workspace (Kousi et al., 2018), usually known as a multi-modal sensor system. A typical obstacle detection system works on a single modality (discussed in the previous sections). However, in complex environments, no single sensor modality can handle every situation in real time. For example, if a system contains both a laser scanner and a camera and the obstacle is out of the laser scanner’s range, the camera can still detect it. Therefore, the use of various kinds of sensors for the same operation increases the chance of the task being completed successfully; for example, Safeea and Neto (2019) used laser and inertial sensors to calculate the minimum distance between the human operator and a KUKA iiwa robot. A minimal sketch of such a conservative combination of readings is given after this list.
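The sketch below, referenced in the last bullet, is a simple illustration of combining heterogeneous separation estimates (it is not the fusion scheme of any cited work): the per-modality readings that are currently valid are collected and the most conservative (smallest) one is kept, which is the value a safety controller would act on.

```python
from typing import Optional

def fused_separation(readings: dict[str, Optional[float]]) -> Optional[float]:
    """Combine per-sensor human-robot separation estimates (metres).
    A reading of None means that modality currently has no valid detection
    (out of range, occluded, sensor fault). The most conservative valid
    value is returned; None means no modality sees the human."""
    valid = [d for d in readings.values() if d is not None]
    return min(valid) if valid else None

if __name__ == "__main__":
    sample = {"laser_scanner": 1.8, "depth_camera": None, "imu_tracker": 1.2}
    print(f"fused separation: {fused_separation(sample)} m")  # -> 1.2 m
```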

Table 4 summarises the studies that incorporate multi-sensor or multi-modal systems to detect obstacles.

Sensors used in post-impact/collision phase

Tactile (touch) and torque/force sensors

In most human–robot collaborative environments, when a robot approaches a human, it reduces its speed. However, several kinematic considerations must also be taken into account. For example, when a robot approaches a singular configuration, the angular velocities of its joints can be exceptionally high even though the tip is hardly moving (Frigola et al., 2006). In these situations, the robot can be extremely dangerous. Therefore, it is always better to reduce the speed of every part of the robot (joints as well as end-effector), and, in the case of contact, the robot must minimise its contact pressure and force.

For a robot to recognise contact, tactile sensors are used and, in the case of force, torque sensors. These two types of sensors are discussed in this section, as both come into contact with the obstacle/human operator in the event of a collision. A summary of these sensors used in an industrial environment is given in Table 5.

Table 5 Summary of obstacle detection using single torque or tactile sensors in hardware implementation category

Tactile sensors equip collaborative robots with a sense of touch and thus help enhance the intelligence of the robot. There are several types, including capacitive, piezoelectric, piezoresistive and optical (Girão et al., 2013). In an industrial environment, tactile sensing technologies are mainly used for object exploration or recognition; however, they have also been used for safe human collaboration. For example, Cho et al. (2017) used a custom-made tactile sensor to grasp an irregularly shaped object while avoiding collision with the cup holding the object. O’Neill et al. (2015) developed a custom stretchable smart skin, made of tactile sensors and wrapped around the robot arm, so that the robot can intelligently interact with its environment, in particular sensing and localising physical contact on its link surfaces. In another work, Vogel et al. (2016) used a floor with 1536 tactile sensors in the workspace of a KUKA KR60 L45 to detect dynamic obstacles/human workers as soon as they step onto the sensing floor.

A torque sensor converts the torque applied to a mechanical axis into an electrical signal. Nowadays, most newly developed cobots come with torque sensors built into their arm joints: the DLR-III (Burger et al., 2010), KUKA LBR iiwa [57], UR5e (https://www.universalrobots.com/products/ur5-robot/) and many more use built-in joint torque sensors to detect collisions. In the literature, some work reports detecting and avoiding collisions using these built-in sensors. Popov et al. (2017) used the joint torque sensors of a KUKA iiwa LBR 14 R820 to detect an obstacle; in addition to detecting the collision, they also computed the point of contact. To evaluate their point-of-contact results, they used data from a 3D LIDAR and a camera as ground truth to assess the performance and the true position of the collision.

Similarly, Likar and Žlajpah (2014) and Hur et al. (2014) also reported detecting obstacles using the robot’s joint torque sensors. The use of external force/torque sensors for obstacle detection in a collaborative environment is, however, less common. In one study, Li et al. (2020) used a torque sensor at the bedplate of a JAKA ZU7 robot to detect collisions; a novel method was proposed, based on the dynamic model, that measures the reaction force caused by the robot’s dynamics at its bedplate. In another work, Lu et al. (2006) used a torque sensor on a wristband of the robot to detect collisions.
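The general idea behind joint-torque-based collision detection is to compare the measured joint torques against the torques predicted by the robot’s dynamic model and to flag a collision when the residual exceeds a threshold. The sketch below is a schematic illustration of this residual test only (in practice the model torques come from the robot’s dynamics and the threshold is application specific); it is not the method of any cited paper.

```python
import numpy as np

def detect_collision(tau_measured: np.ndarray,
                     tau_model: np.ndarray,
                     threshold_nm: float = 5.0) -> tuple[bool, int]:
    """Flag a collision when any external-torque residual exceeds the
    threshold; also return the joint with the largest residual, which
    gives a rough hint about where along the arm the contact occurred."""
    residual = np.abs(tau_measured - tau_model)   # estimated external torque
    worst_joint = int(np.argmax(residual))
    return bool(residual[worst_joint] > threshold_nm), worst_joint

if __name__ == "__main__":
    tau_meas = np.array([1.2, 14.8, 3.1, 0.9, 0.4, 0.2, 0.1])   # N*m, measured
    tau_pred = np.array([1.0,  4.0, 3.0, 1.0, 0.5, 0.2, 0.1])   # from dynamics
    hit, joint = detect_collision(tau_meas, tau_pred)
    print(f"collision: {hit}, largest residual at joint {joint}")
```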

Methods used for detecting the human in a human–robot collaboration environment

In this section the main algorithms used to detect/highlight the human in a collaborative workspace are discussed. When a human and a robot work in close proximity there are two general phases: pre-impact/collision and post-impact/collision.

Pre-impact/collision methods

In the pre-collision phase, collision avoidance is the primary task, and at least local knowledge of the workspace and the location of obstacles is required. Therefore, like the sensing modalities, the methods used in this phase also focus on detecting the object of interest before it comes into contact with the robot.

Visual background/foreground isolation

Background/foreground isolation is a generic process to differentiate the background (the workspace including the robot) from the foreground (objects and humans). It can be done in several ways. For example, Cefalo et al. (2017) used a virtual depth image of the workspace, including the robot: the robot kinematics was used to move the virtual model to match the real robot configuration, and a sequence of linear transformations was defined to obtain a depth image with the same viewpoint and field of view as the Kinect camera. The virtual depth image was then subtracted from the real camera image to cancel out the robot and the fixed workspace and to build a map containing only the obstacles.

Another way to isolate the background from the foreground is to use a reference frame, which can be any image that does not contain the unknown objects to be detected. Rashid et al. (2020) used a reference frame to detect humans in the workspace. Henrich and Gecks (2008) used multiple images to create a reference image, which was then used to compute a difference image by evenly subdividing the current captured image into non-overlapping grid tiles, each containing a set of pixels.

Instead of using a fixed reference image, an alternative is to use one or more previous frames as the reference and to detect the changed pixels/regions as objects. Kahlouche and Ali (2007) and Rea et al. (2019) used optical flow to identify dynamic instants, while Bascetta et al. (2011) used a static camera to produce one reference image for each time step.
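As a minimal sketch of the reference-frame/difference-image idea (assuming the opencv-python and numpy packages and a grayscale reference image of the empty workspace), the code below thresholds the absolute difference between the reference and the current frame and returns the bounding boxes of the changed regions. Real systems add robot masking, lighting compensation and tile-based processing on top of this.

```python
import cv2
import numpy as np

def foreground_boxes(reference_gray: np.ndarray,
                     current_gray: np.ndarray,
                     diff_threshold: int = 30,
                     min_area_px: int = 500) -> list[tuple[int, int, int, int]]:
    """Return (x, y, w, h) boxes of regions that differ from the reference."""
    diff = cv2.absdiff(reference_gray, current_gray)          # pixel-wise change
    _, mask = cv2.threshold(diff, diff_threshold, 255, cv2.THRESH_BINARY)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours
            if cv2.contourArea(c) >= min_area_px]

if __name__ == "__main__":
    # Synthetic example: an empty "workspace" and a frame with a bright blob
    ref = np.zeros((240, 320), dtype=np.uint8)
    cur = ref.copy()
    cv2.rectangle(cur, (100, 80), (160, 200), 255, thickness=-1)  # "intruder"
    print(foreground_boxes(ref, cur))   # one box around the blob
```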

This method is easy and straightforward in scenarios where exact information about the obstacle is not required and the workspace remains static. It is also well suited when the focus is on detecting not only humans but also other objects that may get in the way of the robot. However, one of the main limitations of this method occurs when a frame contains overlapping objects. For instance, Kühn et al. (2006) used the difference-image method to detect obstacles; when objects are lined up behind each other, the algorithm treats them as one object, since they project onto the same area of the difference image.

Visual object detection

Trained neural networks are used to detect and locate objects of interest within an image. Convolutional Neural Networks (CNNs) have been widely adopted, as they are highly successful in object detection due to their ability to automatically learn features. Indri et al. (2020a) and Kenk et al. (2019) used You Only Look Once (YOLO) based object detectors to detect and track humans.

Furthermore, objects can be detected using distance-based sensing modalities such as laser, ultrasonic or proximity sensors. With these types of sensors, only the distance to the object can be obtained, without identification of the object itself. For example, Huang et al. (2021) used a safety laser scanner to configure protection zones so that the human and the robot do not come close to each other.

Object detection models like YOLO are popular for their speed and accuracy. Additionally, new object classes outside the training set can easily be added to the network using transfer learning. Combining YOLO with a depth camera extends its application beyond object detection to estimating the object’s distance. However, object detection models only register the global position and orientation of a person. In a collaborative environment it is also necessary to know the location of the human joints, and especially how near the hands are to the system; existing networks such as YOLO are not currently useful in this respect.
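A hedged sketch of the YOLO-plus-depth idea described above is given below. It assumes the ultralytics Python package, a pretrained COCO model file (yolov8n.pt) and a depth image registered to the colour image; the model name and the way the distance is sampled (median depth inside each box) are illustrative choices, not those of the cited works.

```python
import numpy as np
from ultralytics import YOLO  # assumed available; pip install ultralytics

PERSON_CLASS_ID = 0  # 'person' in the COCO label set

def detect_people_with_distance(color_bgr: np.ndarray,
                                depth_m: np.ndarray,
                                model: YOLO) -> list[tuple[tuple, float]]:
    """Run a pretrained detector on the colour frame and attach a distance
    (median registered depth inside each person box) to every detection."""
    detections = []
    result = model(color_bgr, verbose=False)[0]
    for box in result.boxes:
        if int(box.cls) != PERSON_CLASS_ID:
            continue
        x1, y1, x2, y2 = map(int, box.xyxy[0].tolist())
        valid = depth_m[y1:y2, x1:x2]
        valid = valid[valid > 0]                      # drop missing depth pixels
        distance = float(np.median(valid)) if valid.size else float("nan")
        detections.append(((x1, y1, x2, y2), distance))
    return detections

if __name__ == "__main__":
    model = YOLO("yolov8n.pt")                        # illustrative model choice
    color = np.zeros((480, 640, 3), dtype=np.uint8)
    depth = np.full((480, 640), 2.5, dtype=np.float32)  # fake 2.5 m everywhere
    print(detect_people_with_distance(color, depth, model))
```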

Skeleton/joints detection

Human–robot interaction in an industrial environment may be dangerous, especially when human operators work in close proximity to robots. Therefore, not only the global position and orientation of the human are required, but also the precise localisation of joints/body parts. To detect human joints, different sensors can be used, such as vision, IMUs and motion capture systems. In terms of vision, mainly the OpenNI library with the Kinect is used for human skeleton estimation (Zanchettin et al., 2016; Zhang et al., 2020a, 2020b). There are also numerous vision-based pose estimation neural networks, such as OpenPose, an open-source system for 2D multi-person body detection, including joints, which uses deep learning and part affinity fields to achieve high accuracy while estimating pose in real time. Antão et al. (2019) used OpenPose with the COCO model to detect the body of a human operator working with a UR5 robot.

As well as vision systems, inertial sensors are widely used to detect human body joints: IMUs are placed on the joints of interest, which can then be tracked (Uzunović et al., 2018; Zhang et al., 2020a, 2020b). For example, Neto et al. (2018) placed five IMUs on the upper body of a human operator and tracked joint motion using an extended Kalman filter. Another way to capture human joint information is to use a Motion Capture (MoCap) system, in which users wear tags or sensors near each body joint and the system calculates each joint’s movement by comparing the positions and angles of the worn tags/sensors. Tuli et al. (2022) used an Xsens motion capture system with eight tags on the human upper body to locate and track the human position.
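To illustrate the kind of filtering used for joint tracking, the sketch below applies a simple linear Kalman filter with a constant-velocity model to noisy 1D position measurements of a tracked joint. It is a generic illustration with arbitrary noise covariances, not the extended Kalman filter of Neto et al. (2018).

```python
import numpy as np

def kalman_track(measurements, dt=0.02, meas_var=0.01, accel_var=1.0):
    """Track position and velocity of one joint coordinate from noisy
    position measurements using a constant-velocity Kalman filter."""
    F = np.array([[1.0, dt], [0.0, 1.0]])              # state transition
    H = np.array([[1.0, 0.0]])                         # we observe position only
    Q = accel_var * np.array([[dt**4 / 4, dt**3 / 2],  # process noise
                              [dt**3 / 2, dt**2]])
    R = np.array([[meas_var]])                         # measurement noise
    x = np.zeros((2, 1))                               # [position, velocity]
    P = np.eye(2)
    estimates = []
    for z in measurements:
        x = F @ x                                      # predict
        P = F @ P @ F.T + Q
        y = np.array([[z]]) - H @ x                    # innovation
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)                 # Kalman gain
        x = x + K @ y                                  # update
        P = (np.eye(2) - K @ H) @ P
        estimates.append(float(x[0, 0]))
    return estimates

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    true_pos = np.linspace(0.0, 0.5, 100)              # wrist moving 0.5 m in 2 s
    noisy = true_pos + rng.normal(0.0, 0.1, size=100)  # noisy sensor readings
    est = kalman_track(noisy)
    print(f"last true {true_pos[-1]:.3f} m, filtered {est[-1]:.3f} m")
```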

Skeleton/joint detection is the safest solution when humans and robots work very closely together, as it can detect and track the pertinent parts of the human body, such as the head, hands, arms and torso.

Marker based detection

Another method that has been used in industrial environments to detect a human operator is marker-based detection. In this method, human operators wear specific coloured clothing or markers on their body, which are detected and tracked by receivers such as cameras or radio-frequency systems. Tan and Arai (2011) sewed coloured patches onto the operator’s clothes to represent the shoulders, elbows and head. Diab et al. (2020) attached RFID tags to the target object. Hawkins et al. (2013) created brightly coloured gloves to allow detection of human hands by the vision system. Marker-based detection is quick and easy; however, it requires the human to wear the tags, leading to dangerous scenarios if someone enters the workspace without the dedicated markers.
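A minimal sketch of colour-marker detection is shown below, assuming the opencv-python package, a BGR camera frame and an illustrative HSV range for a bright green glove; in practice the range must be calibrated for the actual clothing and lighting.

```python
import cv2
import numpy as np

# Illustrative HSV range for a bright green marker/glove (needs calibration).
LOWER_HSV = np.array([45, 120, 120], dtype=np.uint8)
UPPER_HSV = np.array([75, 255, 255], dtype=np.uint8)

def find_marker_centroids(frame_bgr: np.ndarray,
                          min_area_px: int = 300) -> list[tuple[int, int]]:
    """Return pixel centroids of regions matching the marker colour."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, LOWER_HSV, UPPER_HSV)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    centroids = []
    for c in contours:
        if cv2.contourArea(c) < min_area_px:
            continue
        m = cv2.moments(c)
        centroids.append((int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])))
    return centroids

if __name__ == "__main__":
    frame = np.zeros((240, 320, 3), dtype=np.uint8)
    cv2.circle(frame, (160, 120), 30, (0, 255, 0), thickness=-1)  # green "glove"
    print(find_marker_centroids(frame))   # roughly [(160, 120)]
```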

Discussion

Sensors constitute a vital part of an industrial workspace where humans and robots work together. In particular, relying simply on power and force limiting to ensure operator safety has stifled the deployment of fenceless collaborative robots in manufacturing. From Tables 1, 2, 3, 4 and 5, it may be noted that several research works involve a combination of different sensors. The primary reason is that a combination of sensors of different modalities (or even of the same modality) provides more reliable detection, with the sensors complementing each other’s limitations. Moreover, as different types of sensors suit different purposes in each application, Table 6 summarises some useful characteristics of sensors specifically for the scenario of human detection in a collaborative environment.

Table 6 Direct sensing properties of different sensors used for detecting/avoiding obstacles in a human robot collaborative environment

One of the main advantages of pre-impact/collision methods is that they are capable of detecting the human before any contact takes place. Below we discuss the pros and cons of each sensor type outlined in section “Sensors used in pre-impact/collision phase”.

  • Vision sensors: the significant advantage of vision sensors is that they are non-intrusive and allow a robot to perform various tasks using only one sensor. However, as with any other safety-related system, implementing redundancy through multiple or multi-modal sensors is imperative to ensure safety in case of sensor failure. Moreover, vision sensing modalities not only have difficulty producing robust information in cluttered environments, as lighting conditions can easily interfere with the visual results, but they also place a high computational burden on the system when advanced vision algorithms are used for stereo matching and depth estimation.

  • Laser sensors: laser scanners are fast, accurate and able to detect objects at a significant distance, but these sensors, especially for 3D scanning, are quite expensive and require high computational power. They also provide a low number of scans per second.

  • Inertial sensors: inertial sensors provide data with good precision over short periods, but over time the error accumulates. Moreover, they only work if the human is wearing them and therefore cannot be used to detect humans who may accidentally enter the robot manipulator workspace without being fitted with sensors.

  • Proximity sensors: proximity sensors are low cost, have low power consumption and operate at high speed. However, they cover a limited range and can easily be disturbed by environmental conditions such as temperature.

Post-collision sensors are used to ensure minimal injury to a human operator if a collision occurs. Cobots equipped with only these types of sensors must operate at low speed and with low force when a human is present. Additionally, reliance on power and force limiting requires assurance that the limits are respected throughout the workspace, a process that typically requires extensive testing. The pros and cons of these sensors are discussed below:

  • Tactile sensors: tactile sensors can detect an object even when a vision sensor cannot, for example due to an occluded surface. Although tactile sensors are receiving more and more attention, their performance is not yet very reliable, and their development requires advances in various technological fields such as electronics and materials. Therefore, despite their significant potential, there is a long way to go before they can be successfully used in the obstacle-detection modules of cobots.

  • Force sensors: force sensors only trigger when an object comes into contact with the robot; therefore, they are not suitable for avoiding an obstacle.

Along with using multiple kinds of sensors in a system, it is also important to consider where these sensors are installed in the workspace. A sensor must be placed in such a way that it covers the maximum area/angle of the robot workspace. There is no prescribed location or position for each kind of sensor; in the literature, researchers have proposed different locations to enhance the system’s efficiency. For example, Perdereau et al. (2002) used 5 IMU sensors placed on the upper body of a human (one at each forearm, one at each upper arm and one at the chest), while Digo et al. (2020) used 7 IMU sensors, one on the table as a reference and six on the human body (one at the right forearm, one at the right upper arm, one at each shoulder, one at the sternum and one at the pelvis). Therefore, in Table 4 we also list the location of the installed sensors for each study.

This review of the literature was conducted up to January 2023, and 171 papers were identified as relevant to the topic of external sensors for object detection in the robotic manipulator environment. Of these, 76 papers satisfied the hardware-implementation criteria described in section “Obstacle detection and collision avoidance in cobots”. Overall, the use of vision sensors (either alone or in combination) dominates, accounting for about 41.3% of the literature. The usage of the other sensors discussed in this article, whether alone or in a multi-sensor system, is presented in Fig. 5. Moreover, comparing single-sensor systems with systems using a combination of sensors, the use of multiple sensors (51%) is roughly equal to the use of a single-sensor system (48.7%), as shown in Fig. 4. Within multi-sensor systems, multiple vision sensors are used most often, while among different modalities the combinations of vision with laser and vision with inertial sensors are used most frequently. Focusing on single-sensor systems, the vision sensor again dominates, having been used in 40% of the relevant studies.

Fig. 4

Multiple/multi-modal sensors vs single sensors used for obstacle detection in a robotic manipulator workspace

Multi-modal sensing systems aim to get the best out of different sensors. However, using external sensors increases the price and computational cost, so adding more sensors adds extra levels of complexity (Wang et al., 2020a, 2020b). The best solution is to utilise appropriate mathematical and statistical methods to achieve the best possible accuracy at minimal cost. This requires efficient coding and algorithms that can make optimal use of noisy data from low-cost sensors without a high computational burden. A recent review (Ding et al., 2022) indicates that the main algorithms used so far in the literature for robot perception are stochastic algorithms such as the Kalman filter and particle filters. Researchers have also explored artificial-intelligence-based algorithms, including fuzzy logic and neural networks; however, to date the AI-based approaches are not as mature in terms of accuracy or computational cost.

For the specific task of detecting a human in the workspace, various kinds of algorithms are used in the literature, mainly with vision sensors. From the review, we can summarise that the background/foreground method (41.0%) is used most often to isolate the human operator, or any dynamic moving obstacle, from the static environment. The second most used method is skeleton/joint detection (37.2%), using both vision and inertial sensors. Figure 5 illustrates the percentage of research works using each method to detect humans. Moreover, Tables 2, 3 and 5 detail the methods and algorithms used when vision sensors are excluded.

Fig. 5

External sensors used in reviewed articles for obstacle detection in a robotic manipulator workspace and breakdown of most common vision based object detection techniques

Conclusions

In this study, a comprehensive review of sensors for human detection in a robotic-manipulator industrial environment is presented. We have grouped the literature into three categories: entirely simulated (both the data and the environment are simulated), partially simulated (real data but a simulated environment) and hardware implementation (both data and environment are real). Focusing on hardware implementation, a range of different sensor modalities have been implemented on a wide range of industry-standard manipulators. From the review, it can be concluded that the most commonly used sensor technology for obstacle detection is the vision sensor, and specifically depth cameras, due to the rich information they provide about the environment. However, a major drawback of vision sensors is their difficulty in producing robust information in cluttered environments and the high computational burden they place on the system. To overcome these limitations and obtain more accurate information (especially distance) under a wider range of environmental conditions, laser sensors have been proposed; they provide fast and accurate information but tend to be expensive, especially for 3D scanning. Tactile sensors can be useful in aiding the robot in grasping tasks but are of limited application in obstacle detection and avoidance. Inertial sensors are particularly useful for detection and tracking of humans in the robot workspace, but their major limitation is that they have to be worn by the human operators and cannot protect a human who, for example, unintentionally enters the workspace. Many works have used multiple sensors for enhanced environment perception, but this also increases the complexity and the cost. Therefore, future research should focus on methods that not only improve the reliability and accuracy of the system but also reduce the cost and computational burden of the perception system.