Introduction

Problem Definition

Autonomous landing of UAVs is a complex task that requires particular attention during the final phase, where GPS is ineffective. A novel cognitive computation approach based on perception and decision-making is proposed to recognize a colored man-made platform (target) on which the quadrotor (Fig. 1) must land safely.

Fig. 1 The UAV flying during the testing process

The strategy combines computer vision and pattern recognition techniques with a geometric analysis to make the final decision. The cognitive approach outperforms several existing methods, including the one described in [1], which uses a monochrome landing platform [2].

Other groups have designed markers and platforms, with various algorithms, to address this problem. Shape recognition algorithms [3], based on UAV vision (perception) systems [4, 5], suggest how the recognition task can be approached.

Images of the platform must frequently be acquired under extreme conditions that cause blind areas, due to sunlight and reflections, or perspective distortion, due to the camera pose.

This requires the application of specific intelligent techniques to transfer human reasoning (perception, recognition, and decision-making) to the computational cognitive system which, to the best of our knowledge, has not to date been addressed.

To this end, we address the same problem as in [1], i.e., the identification of the landing platform with the perception (vision) system on board the UAV, while overcoming the problems mentioned above to achieve an effective and robust cognitive system. The same quadrotor md-400 UAV (Fig. 1), along with a Samsung Galaxy S6 mobile device, has been used.

Related Works

Different active and passive methods based on computer vision strategies have been proposed for precise autonomous UAV landing [6, 7]. For guidance during landing, active methods require external signals such as lamps [8] or infrared (IR) emitters [9, 10]. IR camera-based systems installed on the ground detect a near-IR laser lamp fixed to the nose of the UAV [11]; stereo IR vision systems have also been proposed [12,13,14]. Alternatively, passive systems only require the use of perception-based techniques on board. Some approaches attempt to recognize the UAV in flight, which is guided and landed from a ground control station [15, 16]. Other methods determine a secure landing area by analyzing the terrain [17, 18], based on machine learning [19], by mounting either optical flow sensors [20, 21] or IR light-emitting diode (LED) lamps [22], by 3D terrain reconstruction [23,24,25,26], or based on 3D positioning with two cameras on the ground for fixed-wing UAVs [6]. Another common practice is the use of markers on the ground, i.e., landing platforms with specific designs, where tailored image processing techniques can recognize these markers even when the UAV is in motion [27,28,29,30,31,32], including on board ships [33, 34]. We have designed a landing platform that uses colored markers, i.e., that includes spectral information. These markers are recognized by the cognitive system using a monocular RGB camera built into the mobile device.

Methods using mixed black/white markers or a single color on the platform are classified as follows:

1. Based on specific geometric figures including circles, ellipses, polygons (squares, pentagons): In our previous work [1], we designed an expert system which provides both the angle of orientation and the degree of confidence in the recognition of a black and white platform [2]. Lange et al. [35, 36] proposed a figure similar to that proposed in [37] based on several concentric white rings on a black background. Nguyen et al. [7] used three concentric circles defining eight black/white areas. Polvara et al. [38] applied convolutional neural networks (CNNs) to detect a black circle enclosing a black cross on a white background.

Cocchioni et al. [39] designed a new marker based on a large circle and a second, much smaller circle, together with two small equilateral triangles of different sizes. Li et al. [40] also created a marker (pattern) that used two concentric circles with a geometric figure inside. Chen et al. [41] proposed a landing platform based on two concentric circles, black inside and light gray outside, with an irregular pentagon inside the black circle. The recognition algorithm used the well-known faster regions with convolution (Faster R-CNN) technique to recognize the figure based on its feature map. Sharp et al. [42] also used black-white squares with the aim of applying corner detection as a descriptor, encountering identical difficulties and limitations. The use of complex figures, based on curvatures, causes distortions because of image perspective projection. Moreover, the absence of relevant spectral information (color) produces saturated areas with high reflectivity due to intense illumination sources (e.g., the sun). Missed areas ensue where descriptors of full figures, corners, or feature maps fail.

2. Based on letter shapes: Wang et al. [43] described a monochrome H-shaped figure [44] enclosed in a circle. The image was binarized and segmented to detect the corner (interest) points of the figure and reveal the geometric correlation between them. Zhao and Pei [45] described a single green H-shaped figure, applying the speeded-up robust features (SURF) descriptor [46, 47] with matching, which is invariant to small rotations and scale changes. Saripalli et al. [37, 48] used a white H-shaped figure and applied the following computer vision techniques: filtering, thresholding, segmentation, and labeling of related components. Guili et al. [49] designed a T-shaped gray figure painted with high-emissivity black powder for use with infrared systems, which allows the determination of the relative orientation angle between the platform and the UAV. These letter-shaped markers are based on a large figure with a single color (green or gray), but do not prevent saturation, and consequently blind regions, caused by reflections or distortions.

3. Based on fiducial tags: AprilTags [50, 51], which are based on edge detectors, designed for low image resolution, and intended to deal with occlusions, rotations, and lighting variation, were used by several authors [30, 52,53,54]. Subsequent improvements have been described [54]. ArUco markers, similar to AprilTags, were used in OpenCV [55]. Chaves et al. [56] used this type of marker with a Kalman filter to guide the drone during the final phase of the landing sequence to land UAVs on ships. However, it is well known that edges are noise-sensitive features, and incorrect detection can result from broken edges. Araar et al. [57] used the AprilTags system to generate markers that are printed on the surface of the landing platform. Since these markers are black and white, the missing parts problem stubbornly persists.

In summary, several approaches [37, 41, 43, 44, 49] used a single figure, producing missing regions in the image due to saturation from direct illumination that causes reflections. Other methods [1, 30, 35, 40, 52,53,54,55, 58] used intertwined or interspersed figures, or complex figures based on circles [59] and/or ellipses, to address the missing parts problem; some of these [1] tolerate a certain plane distortion, but others, like [35], assume the UAV is parallel to the ground during the landing phase, ignoring the angle of inclination. Edges and interest points are also considered, despite their high noise sensitivity [42, 43, 45, 49].

Contributions

Table 1 classifies and summarizes the previously described methods. Active and passive approaches are distinguished according to category and sub-category (I, II, and III). Descriptions, analysis, and comments on drawbacks are given in Table 1. The proposed new approach to autonomous landing addresses these shortcomings by making the following contributions:

1. The use of spectral information based on different colors, which reduces the likelihood of intensity saturation due to direct/indirect sunlight and compensates for high variability in outdoor environments.

2. Regions that are more robust than edges from the point of view of the missing parts problem: these are designed to deal with blind spots. Indeed, the recognition of the platform is still possible when only half of the markers (3) are identified, even if they are only partially perceived.

3. Combination of spectral and geometric information under a decision-making algorithm that uses the set of figures detected according to their color and relative positions. Based on the number of figures detected for each region, a probability is computed from the geometric relationships.

4. Recognition can be carried out at different distances and inclination angles between the vision system and the platform, and is not affected by distortions in the figures caused by the image perspective projection.

Table 1 Summary of comparisons of previous and proposed designs
Fig. 2 Landing platform with labeling and geometrical markers

Table 2 Numerical values for each marker displayed in Fig. 2 and used during the color image segmentation process in order to obtain a binary image from the CIELAB color image

To develop our color-based method, the following image segmentation approaches have been studied:

1. Energy optimization under different approaches: (A) minimizing a higher-order color model built from normalized foreground and background color histograms [60]; (B) minimizing the conductance of a graph consisting of nodes (pixels) and edge weights representing image intensity changes [61]. In the latter case, the user selects seeds belonging to the foreground and background. Our platform is built with different foreground colors and is placed in complex outdoor environments with multiple colors around the target; thus, two color-based histograms become ineffective, requiring a multi-classification approach [61] without user interactivity. (C) High-order minimization is also applied for stereo image segmentation [62], where the energy function utilizes the corresponding relationships of the pixels between the stereo image pairs. The proposed cognitive approach is monocular, and no stereo pairs can be obtained while flying the drone during the landing phase; hence, the recognition of the platform must guide this operation, and drone stabilization is not guaranteed. (D) In [63], seeds are initially selected interactively. By then applying the lazy random walk algorithm, superpixels and their boundaries are defined and compacted using energy minimization, and these boundaries are adjusted to existing objects in the scene.

2. Motion in videos, assuming that any movement of the platform appears in the image sequence during landing. In this regard, some approaches [64] apply sub-modular optimization by selecting a small number of initial trajectories for each entity. Color or texture information is extracted for each moving entity to build the energy and moving clusters. However, the apparent movement of objects in the sequence during landing is generated by the drone, not by moving entities within the scene. The static platform, once identified and located in the image, must guide the drone. This is the opposite of the problem described in [64].

With the proposed cognitive approach, candidate regions, that is, those potentially belonging to the platform, are identified with their corresponding bounding boxes, which enclose the background (platform) and the figures inside. For each region, a probability is computed. In this way, the method can be used in the future as a region proposal approach in the context of convolutional neural networks. This is the case in the quadruplet model, inspired by Siamese networks [68], where four network branches are defined, consisting of the exemplar, instance, positive, and negative branches according to their inputs. Comparing this approach with our method, our proposed regions should be considered as exemplars whose probabilities determine positive or negative branches, along with random instances. Additionally, such bounding boxes can also be considered as crop candidates in the attention box prediction network and supplied to the aesthetics assessment network, both proposed in [69]. In the aesthetics network, binary labels can be assigned according to whether the probability (ranging between 0 and 1) is less than or greater than 0.5.

Proposed Cognitive Method

General

The premises described above have been addressed by mapping the human reasoning scheme to generate the computational cognitive process, based on perception, recognition and final decision-making:

1. A landing platform containing markers in the form of specific geometric and colored shapes (Fig. 2). Humans have high color perceptual ability.

2. The combination of color identification approaches and shape descriptors based on Hu moments [70], which are invariant to translations, rotations, and scale changes. Humans identify figures under different appearances.

3. A recognition process that uses a combination of color, providing isolated figures, and shapes with geometric relations, based on a Decision-Making approach (the final phase of human reasoning) together with the Euclidean Distance Smart Geometric Analysis (EDSGA).

Landing Platform Design and Characteristics

The new landing platform shown in Fig. 2 contains six uniquely shaped and colored figures labeled LEB, LE, REB, RE, N and M. These markers are printed on white A3 paper. Each marker is filled with a different and unique color to facilitate the region detection procedure, and to address the problems that occur when regions are missing.

The center of mass (centroid) of the blue region (N) is placed in the middle of the platform. This is the point where the eight identical right-angled triangles (marked with black dotted lines) converge, arising as a result of dividing the squares into equal parts. This geometric pattern was designed to help determine the orientation of the image and to simplify the geometric relationships required to recognize the landing platform based on a computed probability.

When using RGB-based devices, color is an excellent perception attribute in computer vision [71]. The use of colors on the landing platform is essential because it provides two important advantages. Firstly, using different colors increases the probability of image processing techniques extracting several, or all, regions (more regions can be seen); if some regions (represented by colors) are not clearly detected, then others should be properly seen and recognized by the system, even if the whole platform is only partially visible in the image. Secondly, the use of a different color for each marker allows the system to perform an additional classification (described below) which is invariant to plane distortion due to orientation, rotation, or scaling, in addition to being invariant due to image perspective projection.

Proposed Solution

The proposed algorithm follows the architecture shown in Fig. 3 and is based on human reasoning, as expressed above, following the standard image processing stages of preprocessing, feature extraction, and recognition. Finally, the recognition probability is calculated by analyzing the geometric relations between the set of markers that have been identified. The processes shown in Fig. 3 that represent novel implementations are:

1. Computation of Hu moments, which are invariant to rotation, translation, and scaling. These are robust descriptors for dealing with marker distortions that may result from image perspective projection, from the UAV deviating from the vertical inclination, and from deviations from the zenithal position over the platform, all of which are typical during landing.

2. A new recognition process that mixes L*a*b* color discrimination with the Hu moments. This is based on a Decision-Making technique [72] that uses distance measurements together with the EDSGA algorithm, and it resolves difficult cases like those shown in Fig. 4.

Fig. 3 Design and architecture of the proposed image recognition algorithm

Fig. 4 Illustrative example to solve adverse problems, even with specular reflections. Note that the markers’ colors depicted in the labeled image are for display purposes only

Preprocessing

During the landing phase, the platform is imaged with the digital camera of the mobile phone (Samsung Galaxy S6) located on board the UAV. The standard high-resolution image is scaled down to 640 × 480, 1024 × 768, or 1707 × 1280 pixels in each of the three RGB spectral channels. The lowest resolution is used first; if identification fails (evidenced by a recognition score below 50%), the entire process is repeated at the next resolution level, and so on. After image reduction, the RGB image is transformed into the L*a*b* color space. This color space contains luminosity as a separate component, which effectively minimizes the impact of reflections caused by intense light sources, of which the sun is the most problematic.

The a* and b* components of the chosen marker colors are sufficiently separated to avoid overlaps during the segmentation process. Each marker is defined with a unique color to compensate for any parts of the landing platform image that might be missing. CIELAB is categorized as a uniform color space, where changes in the color coordinates correspond to identical or similar recognizable changes in visible color tone and saturation; it was designed to facilitate color measurement in accordance with the Munsell color order system [71]. Moreover, CIELAB is device-independent and has been sufficiently tested for discriminating similar spectral component values during segmentation. Such similarity arises in adverse lighting conditions, as found in outdoor environments, where high illumination levels lead to saturation and hence low contrast between image colors; low contrast also appears at low lighting levels. According to these considerations, several color spaces were analyzed during experimentation. The experiments showed that the best results were obtained with CIE 1976 L*a*b*, which was therefore chosen for segmentation.
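As a concrete illustration of this preprocessing stage, the following is a minimal sketch assuming a Python/OpenCV environment (the published implementation used the MATLAB image processing toolbox); the resolution list and the 50% threshold come from the description above, while the function names and the `recognize` callback are hypothetical:

```python
import cv2

# Internal resolutions tried in order of increasing cost (see text).
RESOLUTIONS = [(640, 480), (1024, 768), (1707, 1280)]

def preprocess(bgr, size):
    """Downscale the camera frame and convert it to CIELAB.

    Note: OpenCV encodes 8-bit L*a*b* with a* and b* offset by 128,
    so the Table 2 reference values would need the same encoding
    (an assumption of this sketch).
    """
    small = cv2.resize(bgr, size, interpolation=cv2.INTER_AREA)
    return cv2.cvtColor(small, cv2.COLOR_BGR2LAB)

def cascade_recognition(bgr, recognize):
    """Run the pipeline at each resolution until the recognition
    score (Eq. (7), in [0, 1]) reaches the 50% success threshold."""
    for size in RESOLUTIONS:
        score = recognize(preprocess(bgr, size))
        if score >= 0.5:
            return score
    # Final attempt at the native resolution (see "Results").
    return recognize(cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB))
```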

Feature Extraction

The aim of this stage is to segment the image, identifying a group of regions and their measurements. These regions will be used in the recognition stage to match a single region with each marker, where the group of regions will be used as candidate markers. Feature extraction can be broken down into several steps:

1. Color-based segmentation (described in Algorithm 1) based on the selected CIELAB color space [73] with foreground segmentation [74]. The underlying idea of this algorithm is a measure of similarity in the color space at the pixel level. This is also a common concept in color-based image segmentation, as expressed in [75], where color distances measure similarities between the pixel being labeled, adjacent pixels, and the seeds that guide the segmentation process.

Algorithm 1 returns a binary image that is used in the connected-component labeling process. The threshold (Tl) and estimated values (TIa, TIb) used for each figure appear in Table 2. These averaged values are obtained by applying the supervised naïve Bayes training approach to 500 images obtained under different attitudes (distances and inclination angles) of the UAV with respect to the platform, and under different and adverse lighting conditions (sunny, cloudy, and alternating clouds with sun). Several rotations and scalings with respect to the UAV were also considered.

For this estimation process, each color marker was manually identified by human inspection of random samples for each figure, computing the averaged values and covariance matrices. The threshold, Tl, was manually obtained by carrying out a heuristic and supervised method through a fine-tuning process during the training stage. To support extreme changes in color shades, the best limit according to Algorithm 1 was found. Correct recognition can now be expected even under extreme and disparate conditions.

2. Image labeling: this is based on the 8-connected-components approach [76, 77]. For each binary region, we compute four properties (area, centroid, orientation, and bounding box), where some segmented regions are potential markers.

3. Hu moments: for each binary region, the seven Hu-invariant moments are obtained [78]. As before, the naïve Bayes estimation approach is used with the same 500 images to compute the averaged values and covariance matrices for each moment and candidate region. These are displayed in Table 3. A sketch of these labeling and moment computations follows this list.
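The sketch below illustrates steps 2 and 3 under the same Python/OpenCV assumption (the actual implementation used MATLAB): 8-connected labeling, the four region properties, and the seven Hu moments per candidate region.

```python
import cv2
import numpy as np

def extract_regions(binary):
    """Label a binary segmentation with 8-connectivity and compute,
    for each region, the four properties used by the method (area,
    centroid, orientation, bounding box) plus the seven Hu moments."""
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(
        binary.astype(np.uint8), connectivity=8)
    regions = []
    for i in range(1, n):                       # label 0 = background
        mask = (labels == i).astype(np.uint8)
        m = cv2.moments(mask, binaryImage=True)
        hu = cv2.HuMoments(m).flatten()         # seven invariants
        # Principal-axis orientation from the central moments.
        theta = 0.5 * np.arctan2(2.0 * m["mu11"], m["mu20"] - m["mu02"])
        regions.append({
            "area": int(stats[i, cv2.CC_STAT_AREA]),
            "centroid": tuple(centroids[i]),
            "orientation": theta,
            "bbox": tuple(stats[i, :4]),        # left, top, width, height
            "hu": hu,
        })
    return regions
```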

Table 3 Averaged values for the seven Hu moments for each marker

Recognition

Based on the properties of the binary regions, the goal now is to identify each unique marker (r = LEB, LE, REB, RE, N, and M). This is part of the discrimination procedure derived from human reasoning, and which belongs to the cognitive process.

Single Marker Identification

Using a combination of CIELAB (color) and Hu (shape) moment properties, the marker identification is performed by comparing the segmented candidate markers (binary regions) with the values provided in Tables 2 and 3. In this way, each candidate marker is identified and assigned to the label with the highest similarity, as based on a minimum distance criterion (Euclidean) [79].

Color and shape distortions often appear due to adverse outdoor environmental conditions, leading to failures during the marker identification process. To minimize these frequent misclassifications, color and shape properties are combined, considering target colors (Table 2) and shapes (Table 3). We compute similarity/dissimilarity values using the following Decision-Making process (an important part of human reasoning):

For each binary region with an area ranging from 50 to 80,000 pixels, compute the average spectral values c̄a and c̄b for channels a* and b*, respectively, together with the seven Hu moments (Hu).

Compute the spectral distance Dc, where TIa(r) and TIb(r) are the reference values for each marker provided in Table 2, as follows:

$$\begin{array}{c}{Dc}_{a}\left(r\right)=\left|\overline{{c }_{a}}-{TI}_{a}\left(r\right)\right|\\ {Dc}_{b}(r)=\left|\overline{{c }_{b}}-{TI}_{b}(r)\right|\\ Dc(r)=\sqrt{{\left({Dc}_{a}(r)\right)}^{2}+{\left({Dc}_{b}(r)\right)}^{2}}\end{array}$$
(1)

Candidate markers must meet the following constraints:

$$\begin{array}{c}{Dc}_{a}\left(r\right)<Tl\left(r\right)\\ {Dc}_{b}\left(r\right)<Tl\left(r\right)\\ Dc\left(r\right)<Tl\left(r\right)\end{array}$$
(2)

Compute the shape distance DHu with respect to the reference Hu moments for each marker r (see Table 3), as follows:

$$DHu(r)=\sqrt{\sum_{i=1}^{7}{\left({Hu}_{i}-{\phi }_{i}(r)\right)}^{2}}$$
(3)

Compute the total distance of the candidate binary regions with respect to each marker r:

$$D(r)=\left(\frac{\left(\frac{{Dc}_{a}(r)}{\mathit{max}\left({Dc}_{a}(r)\right)}\right)+\left(\frac{{Dc}_{b}(r)}{\mathit{max}\left({Dc}_{b}(r)\right)}\right)}{2}\right)+\left(\frac{DHu(r)}{\mathit{max}\left(DHu(r)\right)}\right)$$
(4)

Algorithm 1 Color image segmentation. Foreground extraction.
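The listing of Algorithm 1 appears as a figure in the original article and is not reproduced here; the following is a hedged reconstruction from its textual description (per-pixel a*/b* distances to the Table 2 references TIa(r) and TIb(r), thresholded by Tl(r); taking the union over the six markers is our assumption):

```python
import numpy as np

def segment_marker(lab, TIa, TIb, Tl):
    """Foreground mask for one marker: a pixel is kept when its a*
    and b* distances to the marker's reference values, and their
    Euclidean combination, all fall below Tl (cf. Eqs. (1)-(2)
    applied at pixel level). TIa, TIb, Tl come from Table 2."""
    a = lab[..., 1].astype(np.float32)
    b = lab[..., 2].astype(np.float32)
    da = np.abs(a - TIa)
    db = np.abs(b - TIb)
    return (da < Tl) & (db < Tl) & (np.hypot(da, db) < Tl)

def segment_platform(lab, table2):
    """Binary image for the labeling step: union of the per-marker
    masks. `table2[r] = (TIa, TIb, Tl)` is assumed per marker r."""
    fg = np.zeros(lab.shape[:2], dtype=bool)
    for TIa, TIb, Tl in table2.values():
        fg |= segment_marker(lab, TIa, TIb, Tl)
    return fg
```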

The minimum D(r) value with respect to r allows the classification of the unknown candidate binary region as one of the possible labeled markers, i.e., r = {LEB, LE, REB, RE, N, M}. From this point on, the markers are considered labeled.

It is anticipated that each marker will have none, one, or several candidate regions associated with it; none indicates that the marker has not been detected. When several regions exist, the region with the lowest D(r) value is finally assigned to the marker r, because a smaller distance means greater similarity. For a better understanding, consider the following pedagogical example: LEB: {1, 2, 3}; LE: {10, 11}; REB: {22, 23, 24, 25}; RE: {37, 38, 39}; N: {}; M: {45}.

The first number in each set indicates the candidate region with the minimum D(r) value with respect to r for each marker. So, LEB has candidate regions {1, 2, 3}, LE has candidate regions {10, 11}, N could not be detected, and so on. Hence, for this example, the base group (B) of regions obtained is GB = {LEB: 1, LE: 10, REB: 22, RE: 37, M: 45}. The N marker does not appear because it is not detected in this example, and the total number of elements for the grouping (tM) is 5.
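A minimal sketch of this single-marker identification step, implementing Eqs. (1)-(4), is given below. It assumes (since the text leaves it implicit) that the max normalization in Eq. (4) runs over the candidate markers of one region, and all container names are hypothetical:

```python
import numpy as np

def marker_distances(region, refs):
    """D(r) of one candidate region for every marker r. `region`
    holds the mean a*/b* values (ca, cb) and Hu vector from feature
    extraction; `refs[r]` holds TIa, TIb, Tl (Table 2) and the
    reference Hu moments phi (Table 3)."""
    dca = {r: abs(region["ca"] - f["TIa"]) for r, f in refs.items()}   # Eq. (1)
    dcb = {r: abs(region["cb"] - f["TIb"]) for r, f in refs.items()}
    dhu = {r: float(np.linalg.norm(region["hu"] - f["phi"]))           # Eq. (3)
           for r, f in refs.items()}
    D = {}
    for r, f in refs.items():
        dc = np.hypot(dca[r], dcb[r])
        if not (dca[r] < f["Tl"] and dcb[r] < f["Tl"] and dc < f["Tl"]):
            continue                                                   # Eq. (2)
        D[r] = 0.5 * (dca[r] / max(dca.values())
                      + dcb[r] / max(dcb.values())) \
               + dhu[r] / max(dhu.values())                            # Eq. (4)
    return D

# Assignment: each region is labeled with argmin over r of D(r);
# per marker, the region with the smallest D(r) enters the base
# group GB, as in the pedagogical example above.
```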

Euclidean Distance Smart Geometric Analysis (EDSGA)

Once the markers have been identified, groups of candidate markers are built to identify group coherences that are compatible with the full set of figures drawn on the platform. This is carried out by computing compatibilities between the centroids of the grouped markers in the image and comparing them to the grouping of markers on the platform. Geometric distances between centroids are obtained for comparison. This approach assumes that the platform is made up of a group of markers.

This process of determining group associations was designed as follows:

Groups of markers: build groups using the candidate regions for each, applying the following rules:

The length of each group is always the same and must coincide with the total number of markers, tM. There is no minimum limit on the tM value as far as groupings are concerned, but the maximum value is 6.

At least (tM/2) + 1 members of each group must be equal to those in GB. Hence, each group must contain at least (tM/2) + 1 regions with the minimum D(r) value for their respective markers, and we build as many groups as there are permutations of alternative candidate regions for the remaining, at most (tM/2) − 1, markers that do not match the elements in GB. Considering the pedagogical example above, each group must contain at least three members belonging to GB, with permutations for the remainder. Some valid groups could be:

G1 = {LEB: 1, LE: 10, REB: 22, RE: 38, M: 45}

G2 = {LEB: 1, LE: 10, REB: 22, RE: 39, M: 45}

G3 = {LEB: 2, LE: 11, REB: 22, RE: 37, M: 45}

G4 = {LEB: 2, LE: 10, REB: 23, RE: 37, M: 45}

Normal font characters indicate candidate regions matching GB, while the permuted candidate regions, i.e., those not belonging to GB but labeled as specific markers (RE: 38 in G1; RE: 39 in G2; LEB: 2 and LE: 11 in G3; LEB: 2 and REB: 23 in G4), were marked in bold in the original notation.

A candidate region must belong to only one marker. The group must not contain repeated regions.

We compute and sum the Euclidean distances (total distance) between all pairs of centroids of all regions belonging to each group.

The group with the minimum total distance is finally chosen as the most likely candidate to be the platform.

Once the most likely group (GC) is chosen, a reassignment is still feasible for up to a maximum of (tM/2) − 1 regions of this group, i.e., all regions of GB showing discrepancies with respect to GC are replaced by those of the latter.

For example, if GC is G2 = {LEB: 1, LE: 10, REB: 22, RE: 39, M: 45}, considering that GB = {LEB: 1, LE: 10, REB: 22, RE: 37, M: 45} the region of marker RE is reassigned, changing from 37 to 39.
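Under the same assumptions as the earlier sketches, EDSGA can be expressed as an exhaustive search over valid groups; `candidates[r]` lists the candidate region ids for marker r with the minimum-D(r) one first (as in GB), and `centroids[i]` is the image centroid of region i:

```python
from itertools import combinations, product
import numpy as np

def edsga(candidates, centroids):
    """Choose the group of regions whose summed pairwise centroid
    distance is minimal, among groups that keep at least (tM/2)+1
    members of the base group GB and contain no repeated region."""
    markers = list(candidates)
    tM = len(markers)
    base = {r: candidates[r][0] for r in markers}   # GB
    best, best_dist = None, np.inf
    for combo in product(*(candidates[r] for r in markers)):
        if len(set(combo)) < tM:                    # repeated region
            continue
        group = dict(zip(markers, combo))
        if sum(group[r] == base[r] for r in markers) < tM // 2 + 1:
            continue                                # too many swaps
        d = sum(float(np.hypot(*np.subtract(centroids[a],
                                            centroids[b])))
                for a, b in combinations(combo, 2)) # total distance
        if d < best_dist:
            best, best_dist = group, d
    return best   # GC; discrepancies in GB are reassigned from it
```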

This EDSGA approach greatly improves the accuracy of the recognition process and, therefore, the probability of recognition. It works well even in adverse situations. Indeed, EDSGA can correctly recognize groups of markers even if there are regions with similar colors and shapes.

This is demonstrated in Fig. 4, where both the standard and reflected platform images are close to each other because of their proximity to a glass door. The algorithm identifies the original image and rejects the reflected one. In this example, the LEB and M regions appear twice due to the reflection in the glass. Without EDSGA, the recognition would have detected five figures correctly but would have failed on the LEB marker, having detected its reflection to the left. However, after applying this smart analysis, all elements of the group are compared with the real figures. Hence, the region related to the LEB marker is reassigned to the region selected by the EDSGA group. Without this step, the recognition would have scored 80% instead of 100%.

This illustrative example demonstrates the utility of EDSGA. However, its scope is most evident when hundreds of candidate regions are generated (Fig. 6A); some regions match markers but are partially blind or significantly distorted. In such cases, the EDSGA approach ensures that regions with similarity scores too close to be discriminated individually are still correctly assigned.

Recognition Score Computation

The objective of this stage is to provide a score in the range [0,1] that describes the recognition probability of the landing platform as a group. Therefore, we must determine whether the total number of markers and their geometric characteristics/relationships are similar to the expected values, considering GC after the reassignment.

Metric Descriptors

In the recognition stage, a series of analyses based on metric descriptors is performed to study the geometric relationships between the detected objects (selected markers) shown in Fig. 2. The following descriptors are used: area, distances between centroids, and the angles between the straight lines connecting the centroids with respect to the base of the image, i.e., the bottom horizontal line of the image. The areas and distances are strongly affected by the image scale, which varies with the distance between the camera and the platform.

Therefore, instead of performing the analysis using only absolute values, relative values are established between the figures using a technique analogous to the comparative partial analysis (CPA) introduced in [1]. In this instance, however, the probability function using these similarity measurements must also consider two independent events, in the sense of probability theory.

The method considers all possible combinations between regions to ensure an adequate number of similar measurements. In this way, sufficient measurements will be obtained even if there are undetected markers, as was the case with marker N in the example above. Therefore, the proposed cognitive method is more robust and accurate. Each combination generates a measure of similarity which has two inequalities. All inequalities are shown in Tables 4, 5, and 6 in the “Appendix”.

In these inequalities, all numeric values were calculated using the zenithal position of the camera relative to the platform. Additional flexibility is required to withstand distortions in the markers due to image perspective projection. This is achieved by considering three thresholds, ArT, DT, AgT, as estimated by hundreds of tests and a trial-and-error procedure under different attitudes and distances between the vision system and the platform. All the values for these thresholds are reported in Tables 4, 5, and 6, respectively.

Table 4 shows a comparison of the area between regions, Table 5 compares the distances between centroids of the regions, and Table 6 compares the difference in the angles generated by the intersection of the straight lines connecting the centers of the regions with respect to the horizontal x-axis of the image. Tables 4, 5, and 6 contain all possible relationships involving areas, distances, and angles for the full set of markers with the flexibility expressed above, based on the three thresholds. Thus, the maximum number of possibilities (relations) when all markers are detected is 15 for the area (Ari, i = 1,…,15), 105 for the distances between the centroids (Dstj, j = 1,…,105), and 105 for the differences between the angles (Angk, k = 1,…,105).
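These counts follow from simple combinatorics: areas are compared pairwise over the tM markers, while distances and angles are compared pairwise over the centroid-to-centroid segments. A short check, consistent with the figures quoted above and with the five-marker example used earlier:

```python
from math import comb

tM = 6                          # markers on the platform
n_area = comb(tM, 2)            # 15 pairwise area relations
n_seg = comb(tM, 2)             # 15 centroid-to-centroid segments
n_dist = comb(n_seg, 2)         # 105 distance relations
n_angle = comb(n_seg, 2)        # 105 angle relations

# With marker N missing (5 markers detected), the possibilities
# reduce to 10 area relations and 45 distance / 45 angle relations:
assert comb(5, 2) == 10
assert comb(comb(5, 2), 2) == 45
```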

Recognition Probability

The final recognition probability function returns a recognition score in the range [0,1] by combining the probabilities of two events (A and B) defined below. In this regard, the terms recognition score and recognition probability refer to the same concept (herein used interchangeably), which can also be expressed as a percentage. Firstly, the ratio between the number of markers actually identified and the total number of markers that could be identified is considered. This event is represented by A and its probability is computed as:

$$P\left(A\right)=\frac{tMR}{tM}$$
(5)

where tMR is the total number of markers finally recognized and tM is the total number of possible markers on the landing platform, i.e., six in the proposed design. The method also takes into account the probability based on the geometric relationships between the regions belonging to the candidate group, GC. This probability is computed by applying the geometric relations defined in Tables 4, 5, and 6 (Appendix) and requires the following considerations:

(a) The number of possible relations for each topic (area, distance, angles) is defined by the number of markers detected. Following the example above for the group GB, where the marker N is missing, this marker is involved in five relations in Table 4, so the number of possible area relationships in this case is 10. The same applies to the remaining relationships, i.e., the maximum number of possible relationships for this example is 45 and 45 in Tables 5 and 6, respectively. On this basis, the number of possible relationships for each group is denoted aTp (area), dTp (distance), and agTp (angles).

(b) The number of relationships that are met for each topic among all the possibilities is defined by aTs (area), dTs (distance), and agTs (angles).

The probability of event B is therefore defined as follows:

$$P\left({B}_{a}\right)=\frac{{aT}_{s}}{{aT}_{p}};\quad P\left({B}_{d}\right)=\frac{{dT}_{s}}{{dT}_{p}};\quad P\left({B}_{ag}\right)=\frac{{agT}_{s}}{{agT}_{p}};\quad P\left(B\right)=\frac{P\left({B}_{a}\right)+P\left({B}_{d}\right)+P\left({B}_{ag}\right)}{3}$$
(6)

The overall recognition probability from events A and B is modeled as their intersection under the assumption that they are two independent events; i.e., on the basis of probability theory [81], the final probability of detection of the landing platform is computed as follows:

$${P}_{d}= P\left(A\cap B\right)=P\left(A\right)\bullet P(B)$$
(7)

The assumption of independence is sufficient for this approach, although the events could be considered partially dependent: if a marker is missing, this affects event A and also event B, since a different number of relations (equations) will be used. Inspired by fuzzy set theory [82], we also modeled P(A) and P(B) as membership degrees and combined them with t-norms and t-conorms (including the drastic, Einstein, and Hamacher products and sums), without apparent improvement over the results provided by (7).
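Putting Eqs. (5)-(7) together, the final score reduces to a few lines; the worked call reflects the Fig. 10B case discussed later (four of six markers detected and, by assumption here, all geometric relations among them satisfied):

```python
def recognition_score(tMR, tM, aTs, aTp, dTs, dTp, agTs, agTp):
    """P_d = P(A) * P(B): fraction of recognized markers times the
    mean fraction of satisfied area/distance/angle relations."""
    p_a = tMR / tM                                     # Eq. (5)
    p_b = (aTs / aTp + dTs / dTp + agTs / agTp) / 3.0  # Eq. (6)
    return p_a * p_b                                   # Eq. (7)

# Four markers detected: 6 area relations and 15 distance/angle
# relations are possible; if all of them hold, P_d = 4/6 ~ 0.667.
print(recognition_score(4, 6, 6, 6, 15, 15, 15, 15))
```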

Results

The tests are based on 800 images and follow the same strategy designed in [1]. Thus, 80 of the 800 images were captured using the same settings for angles, distances, and lighting as in [1]. The remaining 720 tests were similar, and both series of tests were carried out in the same environment and under the same considerations.

Because failures and limitations were previously observed with the 80 images in [1], we started by reproducing similarly adverse test conditions. A further 720 new images, tested under very different conditions, were also used. In both cases, we set out to analyze the performance in terms of distances, inclination angles, and lighting conditions.

We also tested the following platforms and methods: circles/ellipses [7]; square markers [53, 54]; and T/H/X-shaped figures [43, 48], under identical considerations to those in [1], i.e., without the richer color information, verifying the extent to which this new approach outperforms previous works [1]. A direct comparison is not possible because no additional information is provided about the three main parameters used to evaluate this approach: distance, inclination angle, and lighting conditions.

These parameters verify the value of our proposed cognitive method, demonstrating effective and robust recognition of images obtained under extreme conditions. Moreover, the superior performance of the proposed cognitive approach relative to [1] also implies better performance relative to the abovementioned methods.

For each image, the same visual supervision process is carried out as defined previously [1]. An outcome is considered successful when the user, by observation, determines that there are at least three red regions identified in the original image and that they match the correct markers on the platform. In Figs. 4, 6, 8, and 10 (see “Platform Recognized” columns), the boundary of each detected region is colored in red for debugging.

Image Acquisition Environment

All the images were acquired in a real test environment using a mobile phone (Samsung Galaxy S6) on board a quadrotor with four 1000 rpm/V brushless engines each driving a 10-inch propeller and powered by its own battery. In addition to acquiring the images, the telephone acts as a control unit that manages sensors, engines, and other quadrotor components. The experiments carried out to test the proposed cognitive approach were performed using the following:

1. Landing platform: the markers shown in Fig. 2 are printed on a white background on A3 paper.

2. All images were acquired using the integrated mobile phone camera (16 MP, F/1.9) in automatic mode, without zoom or flash.

3. 800 pictures were acquired in different environments and conditions: distances, inclination, and lighting (outdoor: sunny, cloudy, sun and shadow; indoor: artificial light).

4. Each image was captured using the following settings: 3 bytes (24 bits) of color depth, i.e., 1 byte (8 bits) per RGB channel; a resolution of approximately 5 MP with dimensions of 2560 × 1944 pixels; and JPEG with standard compression.

5. The test UAV flights were conducted in the courtyard of a building within a residential area.

6. This test stage was implemented using the image processing toolbox provided with MATLAB Drive [83], which is used by the MATLAB Mobile App, i.e., with the code running in the cloud (Drive). The Galaxy mobile platform was connected to the cloud via WiFi or 3G wireless networks and ran Android 5.0.2 on an octa-core (4 × 1.5 GHz Cortex-A53) CPU and Mali-T760MP8 GPU, with 128 GB of internal memory and 3 GB of RAM.

The aim of the tests is to determine the robustness and accuracy of the proposed cognitive method under different conditions in terms of the distance between the quadrotor and the platform, the angles of inclination under different perspectives, and the illumination.

From the point of view of computational cost, three internal resolutions were considered for each image: 640 × 480, 1024 × 768, and 1707 × 1280 pixels. If a lower resolution succeeds at recognition, the others are not tested. If all downscaled images fail, a final attempt using the native image resolution is performed.

Distance Test

Eight hundred images were used, with distances from the UAV to the platform in the range of 0.6 to 12 m. The results are summarized in Fig. 5, where the percentage effectiveness (i.e., probability of detection) against distance in meters is graphically displayed and compared between the proposed cognitive method and the method described in [1]. In Fig. 5A, the distance intervals are mutually exclusive (i.e., 0–2 m, 2–4 m, 4–6 m, 6–8 m, and > 8 m). In Fig. 5B, the distance intervals are aggregated (0 to 2 m, 0 to 4 m, 0 to 6 m, 0 to 8 m, and > 8 m). The proposed cognitive approach achieves, on average, an acceptable and improved performance (95%) compared to the previous strategy [1], labeled as “Others” in the graphs. Generally, better performance is achieved at shorter distances, as expected.

Fig. 5 Effectiveness against distance (m) for the proposed cognitive method and the method in [1]: (A) exclusive intervals, (B) aggregate (cumulative) intervals

Although distance has a significant impact on effectiveness, the proposed cognitive method successfully recognizes an acceptable number of images up to 12 m, the maximum distance we considered. The proposed cognitive approach is more effective than the one described in [1]. Additionally, it can be concluded that only the lower image resolution is needed at shorter distances, as expected.

Illustrative Example Related to Distance

In Fig. 6, we compare two images that demonstrate the versatility of the proposed cognitive method in terms of distance. Both images were obtained under similar lighting conditions, but with distances of 8.94 m and 0.67 m in A and B, respectively. The platform was successfully recognized with scores of 91.11% and 100%, respectively. In contrast, the method in [1] fails at distances > 8 m.

Fig. 6 Illustrative example at two very different distances under identical lighting conditions: (A) at 8.9 m, (B) at 0.67 m

Inclination Angle Test

Eight hundred images were captured and evaluated with a combination of inclination angles of 1°–68° and distances as described previously. This angle is defined, in degrees, between the vertical axis of the image and the imaginary straight line connecting the center of the camera lens to the centroid of the N marker. As before, the results are displayed graphically in Fig. 7 as the percentage effectiveness against angle of inclination for the proposed cognitive method and that described in [1]. In Fig. 7A, the intervals of angles are mutually exclusive (i.e., 0–10°, 10–20°, 20–30°, 30–40°, and > 40°), whereas in Fig. 7B the intervals are aggregated (0° to 10°, 0° to 20°, 0° to 30°, 0° to 40°, and > 40°). Again, acceptable performance is achieved with the proposed cognitive approach (95%), which outperforms the strategy in [1]. In the main, better performance is achieved at lower inclination angles.

Fig. 7 Effectiveness against inclination angles (degrees) for the proposed cognitive method and the method in [1]: (A) exclusive intervals, (B) aggregate (cumulative) intervals

Illustrative Examples for the Inclination Angle

Figure 8 contains two illustrative images showing the versatility of the proposed cognitive method at high inclination angles (A: 59.78° and B: 61.17°), where the effect of the image perspective projection is acutely observed. In both cases, the proposed cognitive approach successfully recognizes the platform, outperforming the approach in [1], where the experiments failed at these angles. The results of this example are representative of what was commonly observed.

Fig. 8 Meaningful sample of high inclination angles due to the distortion in the plane caused by the perspective. In both cases, successful recognition was achieved, with scores of (A) 96.19% and (B) 84.44%. *Both labeled and platform recognized images were zoomed for better illustration

Lighting Condition Test

Eight hundred images were acquired on different days and under different daylight conditions in outdoor environments (sunny, cloudy, sun and shade), and indoors with artificial lighting. These images were acquired at the discretion of the quadrotor’s operator during different flight experiments. Figure 9 graphically displays the values averaged over the total number of images used for this test. The proposed cognitive approach clearly outperforms the method in [1] in outdoor environments, with similar results for indoor environments. This is because indoor brightness is constant and provides sufficient contrast between the black figure and the white background, thus compensating for the additional contribution of the color markers in outdoor environments.

Fig. 9 Effectiveness against different lighting conditions

Illustrative Examples Under Different Lighting Conditions

Two illustrative images are shown in Fig. 10 to demonstrate the proposed cognitive method’s versatility in terms of lighting conditions. Figure 10A was acquired in an artificially lit indoor environment, where the reflections caused by this type of light and the high inclination angle produce an excess of luminosity that hinders recognition of the landing platform. Figure 10B, acquired in an outdoor environment, is also affected by reflection caused by intense sunlight and a high inclination angle.

Fig. 10 Illustrative examples under different lighting conditions in indoor (A) and outdoor (B) environments. *Both labeled and platform recognized images were zoomed for better illustration

In Fig. 10A, all the markers of the landing platform are correctly identified, and the platform was detected with a probability of 83.49%. The best performance was not obtained because marker M is partially blind. The lower recognition score in Fig. 10B results because markers RE and M are totally blind and could not be detected. In this case, the maximum recognition probability value was 4/6 (≈66%); i.e., despite these missing markers, the score obtained is the maximum possible, and the landing was successfully achieved. This result, which was generally observed across different experiments, demonstrates the robustness of the proposed cognitive method, even when markers are missing.

Overall Assessment

It is important to check the robustness and reliability of the recognition feature of the system. The score returned by the probability function should be consistent with the input image. This is achieved by checking that the region identified and associated with each marker correctly matches the expected marker.

To carry out this process, and, therefore, to decide the likelihood of success, we have performed a manual, visual “supervised” human inspection. This supervised process requires a visual inspection of each result to determine whether the recognition system has detected at least three regions (edges marked in red), and whether these regions were matched with the correct markers.

For conventional state-of-the-art trackers [84, 85], a minimum of two frames is needed for a successful outcome: the positive example manually provided in the first frame is compared with the following frames. Our system works in a more efficient fashion, since it needs only a single frame for assessment. Several criteria can be used to decide what really constitutes a successful outcome; these criteria were already applied in [1] and allow an objective evaluation of system behavior.

In addition to the performance reported above, we also provide details about the recognition scores obtained according to Eq. (7) for different distances, angles and lighting, as before.

As displayed in Fig. 11, the minimum score obtained is 35%, while the average was 92.94%. A visual inspection of the result was carried out to determine whether the recognition system detected at least three markers. As a result of this process, it was observed that images with a score lower than 50% resulted from the detection of two markers or less, while those with higher values always correctly detected at least three markers.

Fig. 11 Score returned by the recognition probability function: (A) score by distance, (B) score by inclination angle, (C) score by lighting conditions

From the above, it can be inferred that the result returned by the probability function is coherent with the visual inspection and, therefore, the probability function is reliable because it is consistent with observations made during the flight tests. In addition, images with more extreme lighting issues, blind areas, and large deformations due to perspective have lower scores, which is again consistent with the images with inferior results in [1].

Conclusions and Future Trends

The proposed cognitive approach can recognize a landing platform in a robust manner under conditions that are representative of real-world scenarios. The average recognition time per image is 0.5 s, which is 1 s faster than in previous work [1]. The cognitive method operates across a broad variety of conditions: different distances and inclination angles, varied lighting (sunny, cloudy, sun and shadow, etc.), complex environments (indoor, outdoor), and blind regions resulting from intense light sources and reflections. Robustness and accuracy are the main characteristics that define the cognitive method.

Cognitive computation, which is the applied paradigm involving visual perception and decision-making, consistently outperforms the previous approach [1], and by extension those that were previously evaluated [1]. With the new design, 760 of 800 new images were successfully recognized (95%), while before [1], 63 images were correctly identified from a total of 80 (78.75%). Although the sensitivity to the inclination angle due to perspective may still be a limiting factor for the practical exploitation of this method, the improvements are still so great that we strongly believe that the proposed algorithm, along with this new platform design, will be highly effective for both rotary- and fixed-wing UAVs under somewhat restricted circumstances of approach angle.

As in [1], the closer the drone is to the landing platform the lower the image resolution that is needed. For short distances (0–4 m), the system works better when using low resolution (1024 × 768). As the distance increases, the best results are obtained using high resolution (2560 × 1944). If high resolution is used at short distances, the number of regions obtained through the segmentation process increases and reduces the likelihood of success. Using low resolutions for long distances may cause the platform to be segmented into a single region, which prevents recognition. In addition, the more regions there are, the more important EDSGA is.

The thresholds are considered optimal because the training set used to obtain the Tl values was acquired from many different perspectives and under various weather conditions and environments. In addition, the images used in the “Results” section to evaluate the performance of the method were obtained in environments totally different from those of the training set. From these premises, the support provided by EDSGA for decision-making, and the good results obtained, we can infer that the values shown in Tables 2 and 3 are optimal and effective for complex environments and diverse weather conditions. Hence, no additional calibration is needed (unsupervised method).

It has also been proven that recognition can be carried out using a single frame with difficult images that present several blind areas. In the most extreme cases, where half of the markers (3) were totally blind, the system was still able to carry out the recognition with a score of 50%, given that the remaining markers could be correctly identified. The system also properly recognized images with angles of inclination of up to 68° with respect to the horizontal (axis OX). It was also evidenced that this new approach overcomes some challenging aspects of an angled approach, such as the presence of shadows or severe occlusions in the scene, as well as the overexposure of the images, and distortions.

In the future, one of the main objectives is to use additional recognition approaches, perhaps similar to those used for facial recognition [86] but as applied to the landing platform. In this regard, the face detection feature of the mobile device attached to the UAV frames the landing platform in a single rectangle in the same way as it does for human faces. In this case, more sophisticated training processes will be needed.

As mentioned in the “Contributions” section, neural networks can be used for video and image processing [68, 69] and applied to perform autonomous landing. This would extend the method to identify a family of targets, with a view to the aerial distribution of goods. However, convolutional neural networks, like any neural network model, are computationally expensive. They also have many hyperparameters that need to be adjusted for good training; hence, the number of images needed by the neural network would be huge compared to the current approach. The presented method meets the objective of identifying a unique landing pattern in a faster, simpler, and more efficient way than a deep learning-based approach.

In later stages of development, an upgrade of the mobile device is planned. The aim is to obtain better performance with a lower power usage and to upgrade to 4G networks.