
1 Introduction

In recent years, the application of unmanned aircraft (UA) for cargo transport has gained importance due to rapid technological advancements and increases in efficiency, payload and flight time. UA are already used to deliver units of stored blood, emergency medication, parcels, food and other goods in urban and rural areas. In industrial settings, UA are increasingly being considered for transport tasks as they allow direct routes, reduce delivery times and can provide a cost-efficient complement to existing logistics solutions. The applications considered in this paper include not only the transport of urgently needed spare parts and tools, but also the cyclical delivery of small and lightweight parts and components [1].

Besides periodic cargo transport, UA are also well suited to interacting with and manipulating objects, for example in search and rescue scenarios, construction or maintenance. Focusing on industrial applications, UA equipped with suitable grippers can provide efficient solutions for extracting individual parts from various load carriers. Within the load carriers, the components can be provided parcelled and ordered (e.g. cardboard boxes), parcelled and unordered (e.g. plastic bags) as well as loose and unordered.

Key requirements for these applications are the precise localization of the UA relative to the interaction object as well as a gripping device suitable for the specific task. Based on these requirements, we developed a solution for the precise localization and grasping of different objects. The proposed method ensures precise relative localization between the UA and the interaction object and is suitable for applications with grippers or other tools.

To validate the proposed methodology, we use an industrial use case in which the UA picks up components parcelled in cardboard boxes from a small load carrier (SLC). The evaluation comprises the automated localization and pick-up of the cargo objects, the subsequent transport and the defined drop-off of the goods.

The paper is structured as follows: After presenting related work and selecting a suitable gripping technology for the targeted use case, we present the system architecture and provide details on the approaches used for object localization and positioning. The overall system is then evaluated within the aforementioned use case, and possible improvements and future work are derived.

2 Related Work

The working principles of grippers used for ground-based robots do not differ from those designed for aerial grasping. Nevertheless, a successful aerial grasping operation is more challenging than handling objects with industrial robots: UA are constantly moving, even when hovering at a fixed position, so it is crucial to compensate for these movements with an intelligent gripper design. Many different approaches to the aerial gripping of cargo objects have been presented, including impactive, ingressive, astrictive and contiguitive grippers. The most relevant of these approaches are discussed below [2, 3].

Extensive research has already been conducted in the field of aerial gripping. Many approaches focus on mechanical grippers. These grippers show promising results and allow the grasping of various objects, but are limited in their ability to deal with position errors of a UA [4, 5]. Other highly flexible manipulators with several joints, such as the one designed by Zhang et al., require a complex control system to be usable [6].

Furthermore, magnetic grippers as used by Gawel et al. [7] or Bähnemann et al. [8] require additional constructions to be mechanically compliant, and their use is restricted to ferrous objects. Nevertheless, they have a significant advantage over mechanical grippers: only one object surface needs to be accessible, which is an important requirement in many industrial picking scenarios using SLCs.

Another possibility is adhesive grippers, which offer similar advantages to magnetic grippers but can also handle non-ferrous objects. An exemplary use case applied to UA was studied by a research group at the University of Pennsylvania [9], who evaluated a self-constructed adhesive gripper regarding its ability to perch a UA on inclined and vertical surfaces. Adhesion may also be used for object grasping, provided certain prerequisites are met. In particular, the surface of the object must be clean so that a grasping force can be applied. In an industrial environment this is hard to ensure unless a clean room is available, as is required for the production of electronics.

Besides the grippers mentioned above, suction grippers are a valid choice for UA as they can handle large and broad objects and require less positioning accuracy than jaw grippers. On the other hand, the transportable objects are limited by their surface structure, and heavy, energy-consuming vacuum pumps are necessary to obtain the required holding forces, leading to lower payloads and reduced flight times. Kessens et al. [10] have presented a system featuring an autonomous quadrotor, an on-board vacuum pump and a gripper with four individual self-sealing suction cups. The gripper provides a maximum holding force of 6 N and is capable of gripping multiple objects with irregularly structured surfaces. Another gripper designed by Kessens et al. [11] even withstands forces of up to 150 N and is suitable for lateral grasping.

In accordance with the aforementioned forms of provision, suction grippers are especially suitable for grasping objects out of an SLC: they enable a frictional connection along the surface normal, thus do not require the part to be enclosed, and can apply forces to objects with the different surface structures typically found in storage environments, such as cardboard boxes and plastic bags.

Fig. 1 Steps of the presented aerial grasping and transport process. The movement to an initial position (1) roughly above the interaction object is followed by the detection (2), localization (3), grasping (4), transport (5) and drop-off at the destination (6)

3 Methodology and System Design

In this section, the individual approaches for precisely localizing the UA and for detecting and localizing the interaction objects are presented. As indicated before, we focus on an industrial picking process and the corresponding general steps required for aerial object interaction and grasping, as depicted in Fig. 1. After the take-off phase (not depicted), the UA flies to an initial position roughly above the interaction object (1). While hovering above the interaction object, it is first identified (2) and then located in order to calculate a suitable grasping point from the object's estimated pose (3). As soon as the grasping point is reached, the suction gripper is activated and the object is extracted from the SLC (4). After the transport phase (5), the object is dropped off at the destination (6).
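
To make this sequence explicit, the following Python sketch outlines the six steps as a minimal state machine. It is purely illustrative: the state names and the UA interface methods (e.g. `detect_object`, `estimate_grasp_point`) are our own placeholders, not part of the actual flight software.

```python
from enum import Enum, auto

class State(Enum):
    GOTO_INITIAL = auto()  # (1) fly roughly above the interaction object
    DETECT = auto()        # (2) identify the object in the camera image
    LOCALIZE = auto()      # (3) estimate the object pose, derive grasp point
    GRASP = auto()         # (4) descend, activate suction gripper, extract
    TRANSPORT = auto()     # (5) fly to the destination
    DROP_OFF = auto()      # (6) release the object

def grasping_mission(ua):
    """Run the mission until the drop-off state is reached."""
    state = State.GOTO_INITIAL
    while state != State.DROP_OFF:
        if state == State.GOTO_INITIAL and ua.reached_initial_position():
            state = State.DETECT
        elif state == State.DETECT and ua.detect_object():
            state = State.LOCALIZE
        elif state == State.LOCALIZE and ua.estimate_grasp_point():
            state = State.GRASP
        elif state == State.GRASP and ua.object_gripped():
            state = State.TRANSPORT
        elif state == State.TRANSPORT and ua.reached_destination():
            state = State.DROP_OFF
    ua.release_object()
```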

3.1 Object Identification and Localization

Approaches available for object identification and localization range from tags and optical markers attached to the objects to feature matching approaches and machine learning solutions. Each method has its own advantages and disadvantages, which can be found in the relevant literature.

Within our procedure, we apply an object matching algorithm based on features extracted from gray-scale images. Even though dedicated optical markers provide more accurate results and convolutional neural networks allow faster identification and localization of objects in images, the feature-based approach is more suitable here: it requires neither the attachment of additional markers nor the extensive acquisition and annotation of training data. The feature-based solution can therefore easily be extended to different objects, which is especially beneficial for warehouse applications where a large number of products is stored.

To identify and locate the required objects, the procedure is implemented as follows. A gray-scale image of every object side is taken, leading to six images in the case of a cuboid cardboard box. To uniquely identify the individual objects, we calculate a set of Speeded Up Robust Features (SURF) for every image as described by Bay et al. [12] and store them in a database to speed up the subsequent matching process. During the flight phase, the camera image is analysed and the calculated SURF descriptors are compared to those stored in the database. The matching between the six training images of the grasping object and the camera image is done using a k-nearest neighbors algorithm in combination with Lowe's ratio test [13].
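
The following sketch illustrates this matching step with OpenCV. SURF is only available in the non-free opencv-contrib build (`cv2.xfeatures2d`), and the Hessian threshold and ratio value are assumed defaults, not parameters taken from the paper.

```python
import cv2

# SURF is part of the non-free opencv-contrib modules; the Hessian
# threshold is an assumed value.
surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)

def compute_surf(gray):
    """Extract SURF keypoints and descriptors from a gray-scale image."""
    return surf.detectAndCompute(gray, None)

def match_ratio_test(desc_camera, desc_database, ratio=0.7):
    """k-NN matching (k = 2) followed by Lowe's ratio test: a match is
    kept only if it is clearly better than the second-best candidate."""
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    good = []
    for pair in matcher.knnMatch(desc_camera, desc_database, k=2):
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good.append(pair[0])
    return good
```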

Based on the matched features, the perspective transformation between the camera image and the corresponding database image is calculated using the Random Sample Consensus (RANSAC) algorithm. For further processing, only the detected contour of the grasping object is required, described by its four corner points in the image plane. To increase the robustness of the object detection, the length-width ratio of the rectangular contour is compared with the real object dimensions stored in the object database.
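
A corresponding sketch of the contour estimation and the plausibility check, again using OpenCV; the RANSAC reprojection threshold and the ratio tolerance are assumed values.

```python
import cv2
import numpy as np

def object_contour(kp_cam, kp_db, good, db_shape):
    """Estimate the perspective transform (homography) with RANSAC and
    project the database image outline into the camera image."""
    src = np.float32([kp_db[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_cam[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)  # threshold assumed
    if H is None:
        return None
    h, w = db_shape[:2]
    outline = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)
    return cv2.perspectiveTransform(outline, H)

def ratio_plausible(contour, real_w, real_h, tol=0.2):  # tolerance assumed
    """Compare the contour's length-width ratio with the real dimensions."""
    p = contour.reshape(4, 2)
    side_a = np.linalg.norm(p[1] - p[0])
    side_b = np.linalg.norm(p[2] - p[1])
    return abs(side_a / side_b - real_w / real_h) < tol
```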

The spatial position of the interaction object relative to the camera coordinate system is described as a Perspective-n-Point problem using the detected corner points, the known object dimensions and the intrinsic camera parameters. The problem is solved with the direct linear transformation followed by the Levenberg-Marquardt algorithm [14].
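
In OpenCV this corresponds to `cv2.solvePnP` with the iterative solver, which refines an initial linear estimate with Levenberg-Marquardt. The camera matrix `K`, the distortion coefficients and the object dimensions are assumed to be known inputs.

```python
import cv2
import numpy as np

def object_pose_from_corners(corners_px, obj_w, obj_h, K, dist):
    """Solve the PnP problem for the four detected corner points of the
    object's (planar) top face. SOLVEPNP_ITERATIVE refines a linear
    initialization with Levenberg-Marquardt."""
    # 3D corner coordinates in the object frame (top face, z = 0).
    object_points = np.float32([
        [0, 0, 0], [obj_w, 0, 0], [obj_w, obj_h, 0], [0, obj_h, 0],
    ])
    ok, rvec, tvec = cv2.solvePnP(
        object_points, np.float32(corners_px), K, dist,
        flags=cv2.SOLVEPNP_ITERATIVE,
    )
    return (rvec, tvec) if ok else None
```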

To transform the object's location from the camera coordinate system into the world frame, the known pose of the UA and the camera mounting position are used. The overall transformation is given in Eq. 1; its illustration and the meaning of the indices are shown in Fig. 2.

$$\begin{aligned} {}_\mathrm {W}^\mathrm {O}T = {}_\mathrm {W}^\mathrm {U}T \cdot {}_\mathrm {U}^\mathrm {C}T \cdot {}_\mathrm {C}^\mathrm {O}T \end{aligned}$$
(1)
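
With the poses available as 4 × 4 homogeneous matrices, Eq. 1 reduces to two matrix products. Below is a minimal numpy sketch, assuming the PnP result is converted via `cv2.Rodrigues` and the UA pose and camera mounting transform are given.

```python
import cv2
import numpy as np

def to_homogeneous(rvec, tvec):
    """Build a 4x4 transform from the Rodrigues rotation vector and the
    translation vector returned by solvePnP (i.e. the object pose in the
    camera frame)."""
    T = np.eye(4)
    T[:3, :3], _ = cv2.Rodrigues(rvec)
    T[:3, 3] = np.asarray(tvec).ravel()
    return T

# Eq. 1: T_W_O = T_W_U @ T_U_C @ T_C_O, where T_W_U is the UA pose in the
# world frame (from the fused localization) and T_U_C the fixed camera
# mounting transform.
def object_pose_in_world(T_W_U, T_U_C, T_C_O):
    return T_W_U @ T_U_C @ T_C_O
```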

3.2 UA Localization

As stated before, we focus on a system architecture suitable for industrial applications. Therefore, we do not use a motion capture or satellite-based localization system to localize the UA, but a Kinexon ultra-wideband (UWB) tracking system, which provides a tracking accuracy of better than ten centimetres and is widely used to track automated guided vehicles or cargo objects in industrial facilities.

While the accuracy of the UWB system is adequate for basic flight operations within a production plant, it is not sufficient for gripping the desired objects. Therefore, AprilTags (optical black-and-white markers) are used to achieve the required positioning accuracy. They are placed on the ceiling above the object's storage place, facing downwards, so that they can be located by a camera mounted on the upper side of the UA.
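
As an illustration, such ceiling-mounted AprilTags can be detected with the `pupil_apriltags` Python bindings as sketched below; the tag family and tag size are assumptions and must match the markers actually used.

```python
from pupil_apriltags import Detector

# Tag family and size are assumptions; they must match the printed tags.
detector = Detector(families="tag36h11")

def detect_ceiling_tags(gray, fx, fy, cx, cy, tag_size_m=0.16):
    """Detect AprilTags in the upward-facing camera image and estimate
    the pose of each tag relative to the camera."""
    tags = detector.detect(
        gray,
        estimate_tag_pose=True,
        camera_params=(fx, fy, cx, cy),
        tag_size=tag_size_m,
    )
    return [(t.tag_id, t.pose_R, t.pose_t) for t in tags]
```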

The methodology used to fuse UWB tracking data and optical positioning information, an evaluation of the achieved in-flight positioning accuracy and the general system architecture used for object interaction with UA are presented in [15]. There, it has been shown that the positioning accuracy based on UWB and additional optical tracking information is sufficient for grasping objects with lateral dimensions of 10 cm × 10 cm or more.

Since the suction cup used can only compensate for vertical position deviations of up to 2 cm, an additional distance sensor measures the vertical distance between the suction cup and the interaction object with millimetre precision. The sensor data is then used to detect contact between the suction cup and the object as well as to determine whether the object was successfully gripped.
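
A minimal sketch of how such threshold logic might look; the threshold value is an illustrative assumption, not a figure from the paper.

```python
# Gap below which the suction cup is considered to touch the object;
# the value is an assumption, not taken from the paper.
CONTACT_THRESHOLD_M = 0.005

def cup_on_object(tof_distance_m: float) -> bool:
    """True if the gap between suction cup and object has closed.

    Used twice in the procedure: (a) while descending, a closed gap
    signals contact and triggers the gripper; (b) after a short test
    ascent, a still-closed gap confirms the object moved with the UA,
    i.e. it was gripped successfully."""
    return tof_distance_m < CONTACT_THRESHOLD_M
```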

Fig. 2 Coordinate transformations used to determine the interaction object's global position. The indices have the following meaning: world (W), unmanned aircraft (U), camera (C), interaction object (O)

4 Evaluation

To evaluate the presented system, several tests using the developed UA shown in Fig. 3 were conducted; the results are discussed below. The evaluation includes performance tests of the suction gripper as well as the detection and grasping of objects.

The UA is based on the DJI F450 airframe with suitable motors and electronic speed controllers, a Pixracer flight control unit (FCU) running the 1.9.2 stable release of the PX4 autopilot, and an Aaeon UP Board as companion computer. To generate the required retention force, we use an electric vacuum pump provided by Schmalz [16]. To detect and locate the interaction objects, two additional sensors are mounted on the UA: an Intel RealSense D435 camera and an STM VL6180X time-of-flight distance sensor.

Fig. 3 Hardware setup of the UA developed for object grasping

4.1 Retention Force of the Suction Gripper

To evaluate the retention force of the suction gripper, a specimen with adjustable weight is grasped. The specimen's weight is incrementally increased until the generated retention force is no longer sufficient to hold the specimen. After the static test, the retention force is evaluated in-flight: the specimen is gripped and a take-off to a flight altitude of 1.0 m is conducted with a maximum velocity of 1.0 m/s and a maximum acceleration of 2.0 m/s\({}^{2}\), followed by a square-shaped flight trajectory with a maximum velocity of 1.5 m/s and a maximum acceleration of 2.0 m/s\({}^{2}\). The weight is again increased until the specimen falls off during the flight maneuvers.

The static test shows that the suction gripper can produce a holding force of 10.81 N, whereas during the flight phase objects with a weight force of up to 10.01 N can be retained safely.

4.2 Object Detection and Localization

To evaluate the performance of the object detection, the detection rate and the corresponding computation time are measured. The evaluation is again divided into a static test, where the UA is placed on an elevated platform, and a dynamic in-flight test, where the UA hovers above the grasping objects, to analyse the influence of flight movements.

In each evaluation, 100 measurements are taken at different distances to the interaction object to determine the influence of the object size in the image plane. The images are taken at a fixed resolution of 640 \(\times \) 480 pixels and the evaluation is performed on a desktop computer equipped with an i7-8700 CPU, 32 GB RAM and a GTX 1080 GPU.

The results for the detection rate are shown in Fig. 4. The size of each green circle represents the percentage of successful detections and its center the distance to the object. In both cases the detection rate decreases as the object distance increases, but the effect is more pronounced when the UA is hovering. This is due to unavoidable in-flight movements of the UA, which reduce the image quality so that fewer features can be matched. With the image resolution used, the detection rate drops significantly as soon as the distance between camera and interaction object exceeds 60 cm. Therefore, the UA should hover around 30 cm to 50 cm above the interaction object to ensure reliable detection and localization of the object.

Fig. 4 Detection rate dependent on the distance between camera and grasping object (left) and time required for object detection and localization while hovering above the object (right)

Besides the detection rate, the time required to locate the interaction object is evaluated as described before. The measured execution times for the in-flight tests are shown in the right diagram of Fig. 4. With increasing distance to the object, the execution time decreases, as the number of extracted object features decreases and the feature matching therefore requires less time. The worst execution time observed during flight is 242.20 ms, while the median execution time is 100 ms at a distance of 36 cm and 80 ms at a distance of 76 cm. As the execution time directly depends on the number of detected features, the time required to locate the object in a static test under the same conditions can be larger than in-flight, since more features need to be compared.

Based on the results of the examined detection rate, the UA should hover about 30 cm to 50 cm above the interaction object during the following evaluation of the grasping success rate in order to guarantee a reliable detection and localization of the object.

4.3 Grasping Success Rate

To evaluate the grasping success rate, the UA performs the procedure described in Sect. 3. The storage position of the SLC containing the cardboard boxes is roughly determined by placing a UWB tag inside the SLC. The initial position setpoint sent to the UA is around 40 cm above the SLC to allow a precise localization of the cardboard box. After grasping, the UA ascends 50 cm and drops the object off at a random position 1.5 m away from the SLC. The whole process is conducted autonomously and repeated ten times. An attempt is only considered successful when the object is precisely located, grasped and extracted from the SLC without being dropped. Based on this metric, the object was grasped successfully nine out of ten times, resulting in a grasping success rate of 90 %. A successful grasp of a cardboard box is shown in Fig. 5.

Fig. 5 In-flight grasping of a cardboard box out of a small load carrier. The box tilts to the side as it is not grasped exactly at its center

The failed grasping attempt was caused by the cardboard box tilting so that it was caught on the rim of the SLC and the suction gripper could not provide enough retention force to hold the object.

5 Summary and Conclusion

In this paper, we have presented an approach for object interaction and the aerial grasping of lightweight cargo objects using a UA equipped with an industrial-grade suction gripper. While the overall system works well and the gripping was successful in 90 % of the attempts, there are still limitations that need to be addressed in future research.

As shown in the evaluation, using a single suction cup causes objects to lean to the side and reduces the load restraint during transport. In future research we will therefore use additional suction cups in a linear or triangular arrangement.

This leads to further issues, such as improving the positioning accuracy to guarantee that all suction cups fit on the object's surface. To improve the positioning accuracy and, at the same time, eliminate the currently required optical markers, we plan to replace the marker-based localization with an alternative tracking approach such as optical flow or visual odometry.