MechaTag: A Mechanical Fiducial Marker and the Detection Algorithm

Fiducial markers are fundamental components of many computer vision systems: their distinctive features (e.g., shape, color) enable fast localization of objects in unstructured scenarios. They find applications in many scientific and industrial fields, such as augmented reality, human-robot interaction, and robot navigation. To overcome the limitations of traditional paper-printed fiducial markers (i.e., deformability of the paper surface, incompatibility with industrial and harsh environments, and shapes too complex to reproduce directly on the workpiece), we exploit existing, or additionally fabricated, structural features of rigid bodies (e.g., holes) in a mechanical fiducial marker system called MechaTag. The system, equipped with a dedicated detection algorithm, minimizes recognition errors and improves repeatability even under adverse boundary conditions (e.g., partial illumination). We assess MechaTag in a pilot study, in which a robotic platform is guided through different poses to cover a wide range of working conditions, and achieve a robustness of marker recognition above 95% across different environmental conditions and position configurations. These results make MechaTag a reliable fiducial marker system for a wide range of robotic applications in harsh industrial environments, where the shape and material of the marker do not degrade recognition accuracy.


Introduction
Machine Vision (MV) consists of the analysis and elaboration of digital images for extracting specific pieces of information.
The applications of MV cover a wide range of purposes, such as localization [1], object tracking [2][3][4], and the recognition and measurement of objects in specific environments [5]. To accomplish these tasks, a vision system requires references that accelerate data processing and guide robotic systems precisely and accurately (Fig. 1).
These references, also called fiducial markers, are artificial planar elements with known features (e.g., shape, color, dimension). They usually have an external shape that works as a frame and an internal pattern that encodes specific information, and they are designed to meet specific criteria such as resistance to partial occlusion and to different lighting conditions. Inter-marker confusion is usually taken into account to assess their performance [8]. Researchers have mostly used square markers, since corners allow accurate calibration and recognition even with a single marker. The most used marker-based systems are the following: ARTag [6], AprilTag [7], CALTag [8], Pi-Tag [10], ChromaTag [9], and CCTag [21] (Fig. 1). Most markers are monochromatic to minimize the sensitivity to different lighting conditions: ARTag, for instance, uses 2002 square black-and-white planar markers that are detected through their edges [6]. ARTag is used for AR applications and achieves very low inter-marker confusion and false-positive rates [6]. AprilTag uses the same markers as ARTag but with a different detection algorithm: the detection procedure minimizes the number of false positives by using a graph-based clustering method instead of an edge-based method [7]. Pi-Tag, in contrast, presents 12 dot-shaped markers equally spaced along the sides of an imaginary square. Its main advantages are the following: the circular shape of the dots, arranged in a pattern that is identical on the four sides of the square, which minimizes the localization error; and the robustness to adverse conditions such as moderate occlusion of single dots, blurring, heavy artificial noise, and varying illumination [10]. Moreover, if the dots are sufficiently small, a circular shape still guarantees a negligible positioning error even under severe perspective distortion. Other groups have developed markers with circular shapes, such as the Circular Data Matrix Marker [8].
This circular marker is divided into black and white sectors, with small black and white circles that indicate its orientation. Bergamasco et al. [22] provided a fiducial marker system with strong occlusion resilience, while Calvet et al. developed a circular fiducial system, composed of three concentric circles, that is able to deal with severe conditions such as partial occlusion and varying distances and angles of view [21]. The marker provides a high-frequency image, since the circles are black over a white background. Furthermore, the researchers used the thickness of the rings to encode the unique ID of the marker, providing a simple and reliable method to recognize the marker in the scene [21]. Researchers have also developed fiducial markers with further features; the most relevant addition is the use of color. DeGol et al. [9] introduced colored fiducial markers for real-time robot navigation applications, named ChromaTag. It uses the color channels of the LAB (Lightness, Red/Green value, Blue/Yellow value) opponent color space to reduce false detections. The color channels constitute the unique ID of the marker but create issues for ID encoding and tag localization [9]. The SCR Marker system, designed by Siemens, instead employs markers with circular and square signs; despite its high reliability and accuracy, the associated detection algorithm is quite slow. Finally, all these systems are usually printed on paper to reduce costs and to simplify preparation and set-up [21].
To date, rigid mechanical parts with known shapes have not been used to estimate the position of a target; instead, they have been employed in other tasks. Bartindale et al. developed a method for identifying the order of stacked items by using fiducial markers made of reflective areas on mechanical parts [23]. For pick-and-place robotic applications, Vijayalaxmi et al. provided a machine vision application that recognizes objects with simple shapes, such as circles, squares, or triangles, in order to create a vision-assisted robotic platform [24] [25]. For defect inspection, Wang et al. developed an automatic optical inspection system to check the integrity of holes on a printed circuit board (PCB) [26].
In this paper, we introduce MechaTag, a vision-based fiducial marker system that localizes specific targets, with a dedicated algorithm, through existing or additionally fabricated mechanical features that are part of the targets themselves (markers). We use mechanical markers with a circular shape, since circular features (e.g., holes) are the most common in mechanics and are easy to reconstruct. MechaTag significantly reduces the time required for recognition thanks to a simple and fast image-processing pipeline, which adapts to different working conditions, such as varying illumination or harsh environments, without losing precision and accuracy. In contrast to state-of-the-art fiducial marker systems, the shape of our marker is simple, reliable, and robust to reproduce directly on the target, and the rigidity of the surface on which it is located minimizes the localization error. State-of-the-art paper-based markers can be deformed by the surface to which they are applied, which increases the localization error. Furthermore, their shape is too complex to be reproduced directly on the target, and they are less compatible with industrial and harsh environments characterized by high temperature, humidity, and highly variable illumination.

Fig. 1 Examples of the most common fiducial markers with different features (e.g., shape, colour, configuration): (a) [6] and (b) [7] are two-colour markers with a square shape; (g) and (c) [8] have the same features as (a) and (b) but a circular shape; (d) [9] and (e) [10] are hybrid markers that combine several features: (e) adds an RGB colour configuration, and (d) combines an external square shape with internal circular dots. Panel labels: CCTag (g), Intersense (d)

MechaTag Configuration and Design
The markers used to assess MechaTag's performance consist of two holes with increasing diameters (D1 and D2), 8 and 10 mm respectively, an inter-axis distance (b) of 15 mm, and a depth (d) of 3 mm (Fig. 2a). We select these specific dimensions to minimize the intra-class variation error between the two circles during the recognition process and the effect of light reflection on the internal walls of the markers. We analyze two different pairs of markers (through holes and blind holes) on two samples made of different materials. Blind holes present conical ends, and the depth d is defined as the distance from the vertex of the cone to the top surface of the sample.
Holes are drilled with a CNC machine (Kern, HSPC 2522, with 1 μm accuracy) on two rigid blocks made of either polymethylmethacrylate (PMMA, a non-reflective material) or steel (a reflective material), as shown in Fig. 2b, c, and d, respectively. These materials are selected because of their wide use in industrial and research applications (e.g., biomedical prosthetics [27,28], lithography, mechatronics [29]).

Detection Algorithm Architecture
MechaTag is driven by a dedicated parametric, closed-loop detection algorithm, schematically shown in Fig. 3. The first step pre-processes and segments the acquired image before its elaboration. Pre-processing includes three sub-steps: the binarization of the image, the application of a non-linear filter to minimize the noise while preserving edges, and the extraction of the marker's edges. The segmentation procedure uses a threshold to distinguish the marker from the background: it is a simple, parametric method already validated in a previous work [30]. Then, we apply a median filter to the thresholded image to reduce the noise caused by external lighting conditions while preserving the edges of the markers [32], and we extract the edges with the Canny detector [31].
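The first two pre-processing sub-steps can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' C++/OpenCV implementation: the function names (`binarize`, `median_filter_3x3`), the 3x3 window, the threshold value of 100, and the toy image are all assumptions made for illustration, and the edge-extraction sub-step is omitted.

```python
from statistics import median


def binarize(image, threshold):
    """Threshold a grayscale image: 1 for pixels darker than `threshold`, else 0."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in image]


def median_filter_3x3(image):
    """Apply a 3x3 median filter; border pixels are copied unchanged."""
    height, width = len(image), len(image[0])
    filtered = [row[:] for row in image]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            window = [image[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            filtered[y][x] = int(median(window))
    return filtered


# Toy 8x8 grayscale image (hypothetical values): bright background,
# a dark 3x3 "hole", and one isolated dark pixel standing in for noise.
image = [[200] * 8 for _ in range(8)]
for y in range(3, 6):
    for x in range(3, 6):
        image[y][x] = 30
image[1][1] = 10

binary = binarize(image, threshold=100)
filtered = median_filter_3x3(binary)
```

In this toy case the isolated pixel is marked as foreground by the threshold and removed by the median filter, while the hole region survives (slightly eroded at its corners), which is the behaviour the non-linear filtering step is meant to provide.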
The second step consists of a first attempt to detect the marker. First, we analyze the hierarchy and the topology of the contours in the image, and we choose the innermost edges (S. Suzuki 1985). Then, we run an algorithm to iteratively extract any further parametric feature. If the image does not contain any internal contours, it is pre-processed once again with an incremented threshold value for the segmentation. Otherwise, we approximate each contour with a tight-fitting convex boundary around its points [33], and we evaluate the geometric features of each contour. The most important parameters are the area of the marker (in pixels) and its circularity, which are used to avoid false positives due to noise [34]. The algorithm thus detects a number of different contours that are further processed. Among them, we choose the contour with the maximum area and apply a spatial filter to it. Specifically, we create a marker-dependent spatial filter that reduces both the amount of data to process and the number of false positives. We extract only the targeted information by convolving the spatial filter with the entire image. From the result of the convolution, we analyze the marker to extract its main features (e.g., area, circularity, convexity). In this step, we perform a stricter evaluation of the geometric features by adding the difference between the actual ratio of the squared perimeter to the area of the single marker and the ratio estimated by the algorithm. This adds information on the real dimension of the fabricated marker, which is useful during the recognition process to discard larger or smaller blobs in the environment. At this step, we introduce a second feedback loop: if the detected marker does not satisfy the stricter evaluation of the geometric features described above (e.g., area, circularity), the image is pre-processed again.
Otherwise, the main geometric properties of the fiducial marker are extracted, visualized, and stored: the planar position of the center of each circle, the area (in pixels) and circularity values, and the identity of the marker (viz., whether the circle is the smaller or the larger one).
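The closed-loop logic described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the published algorithm: the two feedback checks are merged into a single acceptance test, circularity is taken as 4πA/P² (a common definition that the source does not spell out), and every name and default value (`detect_with_feedback`, `fake_segment`, the threshold range, `min_area`, `min_circularity`) is hypothetical. The `segment` argument stands in for thresholding plus contour extraction.

```python
import math


def area_and_perimeter(polygon):
    """Shoelace area and perimeter of a closed polygon of (x, y) vertices."""
    twice_area, perimeter = 0.0, 0.0
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        twice_area += x0 * y1 - x1 * y0
        perimeter += math.hypot(x1 - x0, y1 - y0)
    return abs(twice_area) / 2.0, perimeter


def circularity(polygon):
    """Assumed definition: 4*pi*A / P**2, equal to 1 for a perfect circle."""
    area, perimeter = area_and_perimeter(polygon)
    return 4.0 * math.pi * area / perimeter ** 2


def classify_holes(contours, min_area, min_circularity):
    """Keep large, round contours; return (smaller, larger) or None."""
    candidates = []
    for contour in contours:
        area, _ = area_and_perimeter(contour)
        if area >= min_area and circularity(contour) >= min_circularity:
            candidates.append((area, contour))
    if len(candidates) != 2:
        return None  # triggers another pre-processing pass
    candidates.sort(key=lambda item: item[0])
    return candidates[0][1], candidates[1][1]


def detect_with_feedback(segment, start=50, step=10, stop=250,
                         min_area=100.0, min_circularity=0.9):
    """Re-segment with an incremented threshold until two holes pass."""
    threshold = start
    while threshold <= stop:
        holes = classify_holes(segment(threshold), min_area, min_circularity)
        if holes is not None:
            return threshold, holes
        threshold += step
    return None


def circle(cx, cy, radius, n=64):
    """Polygonal approximation of a circle, used only to build test data."""
    return [(cx + radius * math.cos(2 * math.pi * k / n),
             cy + radius * math.sin(2 * math.pi * k / n)) for k in range(n)]


def fake_segment(threshold):
    """Hypothetical segmentation: contours only appear from threshold 80."""
    if threshold < 80:
        return []
    elongated_blob = [(0, 0), (100, 0), (100, 5), (0, 5)]
    return [circle(100, 100, 40), circle(250, 100, 50), elongated_blob]


found = detect_with_feedback(fake_segment)
```

Here the loop retries at thresholds 50, 60, and 70, then accepts at 80, where the two round contours pass the checks and the elongated blob is rejected by its low circularity; the returned pair is ordered from the smaller to the larger hole, mirroring the identity assignment described in the text.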

Implementation Environment
The image is processed with C++ code that uses the OpenCV libraries and is built on the Visual Studio platform. The implementation environment is compatible with the operating system of the robot controller. Furthermore, in the host-client architecture the main computer acts as host: it sends input signals to the different clients, such as the robot controller and the vision system, through asynchronous communication, and it manages the overall information to estimate the desired output.

Experimental Design
We acquire images with a 2D camera (Baumer VCXG-24C) with a CMOS sensor (1920 × 1200 resolution) and a 12-mm fixed focal length lens, whose acquisition parameters (i.e., gain, illumination, contrast) can be changed dynamically.
We equip the camera with a radial light source that creates a concentrated bright white light on a 25 × 25 mm area at a working distance of 200 mm (Fig. 4). In the presence of reflective material, tests are performed with the light source. The camera is connected to the main workstation and transfers information and images automatically through an Ethernet port. We perform the recognition process in stationary conditions, without any relative movement between the camera and the target. We set up the fiducial marker system in an arbitrary position, while the camera is installed on a controllable mechanical arm (Mitsubishi Electric Industrial Robot CR750/700/500 series) to acquire a large number of frames from different orientations. For each robot configuration, our algorithm performs the detection of the markers.
To assess MechaTag's performance, we analyze 8 cases with different conditions (Table 1).
For each of these, we analyze the outcomes of the detection algorithm, from the pre-processing of the images to the identification and classification of the marker (Fig. 5). We test each case by performing a sensitivity analysis on a set of parameters.
We estimate the Root Mean Square Error (RMSE) by taking into account the planar contour of each hole and their centers, as features, according to the following equation, where θ̂ is the parameter for the estimated frame and θ is that of the reference image. In our case study, the estimated image consists of the image with the contours of the detected and estimated mechanical markers, while the reference image consists of the image with the contours of ideal markers. We also evaluate the recognition rate from the "recognized features", which are the real features detected by the algorithm, and the "total features", by which we refer to the contours, centers, and classification of the two holes of MechaTag. In addition to assessing the performance of the algorithm, this parameter gives a quantitative measurement of the efficiency of the system across different acquisition conditions. Concerning the geometrical evaluation of the markers, we estimate two features, the circularity and the eccentricity of the holes, according to the following equations, where A, P, c and a are the area, perimeter, and semi-axes of the ellipsoidal contours, respectively, which serve to measure the circularity of the markers. For each case, we present these parameters as mean ± standard deviation values.
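The equations referred to in this paragraph are not reproduced in this text. Assuming the standard definitions of these quantities, they would take the form below. The following are assumptions rather than statements from the source: the number of compared features N, the index i, the ratio form of the recognition rate, and the reading of a as the semi-major axis and c as the focal distance of the fitted ellipse. If c instead denotes the semi-minor axis, the eccentricity would read e = \sqrt{1 - c^2/a^2}.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{\theta}_i - \theta_i\right)^2}

\%\mathrm{RR} = \frac{\text{recognized features}}{\text{total features}} \times 100

f_{circ} = \frac{4\pi A}{P^{2}}, \qquad e = \frac{c}{a}
```

Under these definitions a perfect circle gives f_circ = 1 and e = 0, which matches the circularity values close to 1 and the low eccentricity values reported in the Results.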
To assess the performance of MechaTag along with its detection algorithm, we test three different working conditions: distance, orientation and illumination variability.
Performance under distance variability is assessed by changing the working distance from 100% to 150% of its nominal value of 15 cm, using steps of 10% each.
Performance under rotation variability is assessed by rotating the camera along the three axes of the fixed reference system of the robot base, in which the z-axis is oriented perpendicularly to the ground, while the test samples lie on a surface parallel to the x-y plane. The rotation along the x- and y-axes ranges from −10° to 10°, with 5° steps; the rotation along the z-axis ranges from −15° to 15°, with 5° steps. All the tests are performed with an industrial illuminator, installed on the robot arm and collinear with the camera, both switched on and switched off, in order to assess MechaTag's performance under illumination variability.
This operational variability allows us to test the robustness of the system against blurring, distortion, and defocusing.
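The test settings implied by these ranges can be enumerated with a short Python sketch. The helper names (`distance_levels`, `angle_levels`) are hypothetical, and treating each rotation axis as a separate sweep is an assumption based on the per-axis results reported later.

```python
NOMINAL_DISTANCE_CM = 15.0


def distance_levels(nominal=NOMINAL_DISTANCE_CM,
                    start_pct=100, stop_pct=150, step_pct=10):
    """Working distances in cm, from 100% to 150% of nominal in 10% steps."""
    return [nominal * pct / 100.0
            for pct in range(start_pct, stop_pct + 1, step_pct)]


def angle_levels(limit_deg, step_deg=5):
    """Rotation angles in degrees, symmetric around zero."""
    return list(range(-limit_deg, limit_deg + 1, step_deg))


distances = distance_levels()   # distance sweep
x_angles = angle_levels(10)     # rotation along the x-axis
y_angles = angle_levels(10)     # rotation along the y-axis
z_angles = angle_levels(15)     # rotation along the z-axis
illumination = ("on", "off")    # each setting is repeated for both states
```

With the values stated in the text this gives six working distances (15.0 to 22.5 cm), five angles each for the x- and y-axes, and seven angles for the z-axis, each acquired with the illuminator on and off.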

Results
Figures 5, 6, 7, and 8 report the performance of the detection algorithm under the array of working conditions described above. Nomenclature for the case studies is reported in Table 1. Figure 4 shows the principal image-processing phases of the detection algorithm. The first one comprises the pre-processing and segmentation phases: the thresholding produces a binary (black and white) image (Fig. 4a). These operations enhance the features of the image and minimize the noise from the sensor acquisition. The second step is the edge detection: the edges, the most important features, are extracted from the pre-processed image so that the computational effort is spent only on the target features (Fig. 4b). The third step is the edge evaluation and spatial filtering: this additional step removes false positives, identifies the region of interest, and speeds up the image processing in terms of computational time and iterations (Fig. 4c). The final steps are the marker analysis and identification: a detailed survey of the geometrical features of the edge of each circle and of the relationship between the two circles (Fig. 4d). Then, the algorithm classifies the circles according to their geometrical features (Fig. 4e).
Under different working distances, asymmetric boxplots with significant skewness are observed in all cases except for PBNI and PTCI, for which a Gaussian model describes the distributions. Figure 5 shows the boxplots related to the estimation of the RMSE. This statistical value is correlated to distance, defocusing, and illumination variability.
RMSE ranges from 0.36 to 0.65, with the only exception of the PBNI and PBCI cases. For these two cases, an RMSE value could not be estimated, since the detection algorithm is not able to detect these kinds of markers if the working distance is above 130% of its nominal value, independently of the illumination conditions, because of the very low contrast between the target and the background. If we consider only the illumination variability (light on/off), the mean value of RMSE reaches 0.72 and 0.47 for PMMA and steel, respectively. The third benchmark that we studied is the orientation variability along every axis of the 3D reference frame, together with the illumination conditions. Figure 5b, c and d show the variation of the RMSE values with the camera orientation along the x-, y- and z-axis, respectively, and with the illumination. The rotation along each robot axis is performed not only in the clockwise direction but also in the opposite one, so that the target is defocused when the rotation brings the working distance above or below its nominal value.
Considering the variations along the x-axis, the detection algorithm is always able to recognize MechaTag, with results comparable between materials: the average RMSE equals 0.59, ranging from 0.57 to 0.64, with a coefficient of variability of 3%. For the variations along the y-axis, the mean value equals 0.60, ranging from 0.52 to 0.72, with a coefficient of variability of 7%. For the variations along the z-axis, the mean value equals 0.60, ranging from 0.56 to 0.64, with a coefficient of variability of 3%. The results for the y-axis present more variability but are not significantly different from those for the x-axis. This scenario describes all tested benchmarks except for PBNI, for which the RMSE is higher or undefined, since the detection algorithm is not able to detect the marker for all kinds of orientation. Figures 6 and 7 show the boxplots related to the estimation of the eccentricity and circularity measurements in all the environmental conditions and for each MechaTag hole, with minimum and maximum diameters. As Fig. 8 shows, the value of the circularity is, in all cases, at its maximum value (~1). As for the eccentricity, Fig. 7 shows, under distance variability, a mean eccentricity of 0.16 with a mean variability of 3% for the minimum-diameter hole, and a mean eccentricity of 0.12 with a mean variability of 2% for the maximum-diameter hole. Along the x-axis, the mean value is 0.19 with a mean variability of 4% for the minimum-diameter hole, and 0.20 with a variability of 5% for the maximum-diameter hole; along the y-axis, the mean value is 0.18 with a mean variability of 6% for the minimum-diameter hole, and 0.18 with a mean variability of 7% for the maximum-diameter hole; along the z-axis, the mean value is 0.24 with a mean variability of 6% for the minimum-diameter hole, and 0.24 with a mean variability of 7% for the maximum-diameter hole. Except for the PBNI case study, where the value is higher, the eccentricity is close to its minimum value in all case studies. Table 2 shows the recognition rate (%RR) using a colorbar ranging from dark red (0%) to light green (100%): dark red means that the marker cannot be recognized, while light green means the maximum recognition rate, with the marker and all its features recognized. The recognition rate is maximum for through holes, for both the PMMA and the steel samples, in all tested working conditions. It is minimum for the blind hole in PMMA samples without the illuminator. In this case, the intensity-based segmentation fails, since the object to be recognized and the background have similar light intensities. Finally, concerning the specific case of the blind hole, the best performance is achieved with steel samples, since PMMA samples do not show sufficient reflectivity; therefore, while the center of the hole can be recognized, the edge cannot be fully identified. All the case studies and their associated measurements are reported in Table 3.
Fig. 6 Boxplots of the estimated RMSE: (a) RMSE under distance, defocusing, and illumination variability; (b), (c), and (d) RMSE under camera orientation along the x-, y-, and z-axis, respectively, and illumination variability
Fig. 7 Boxplots of the estimated eccentricity of the hole with the maximum diameter, (a), (b), (c), and (d), and with the minimum diameter, (e), (f), (g), and (h). The values have been tested in different benchmarks: (a) and (e) distance variability; (b) and (f), (c) and (g), (d) and (h) x-, y-, and z-axis variability, respectively
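The summary statistics used in this section can be sketched in Python as follows. The source does not define them explicitly, so this sketch assumes that the coefficient of variability is the sample standard deviation divided by the mean, and that the recognition rate is the percentage of recognized features over total features. The function names and the sample values are purely illustrative and are not measurements from the study.

```python
from statistics import mean, stdev


def summarize(values):
    """Return (mean, sample standard deviation, coefficient of variability in %)."""
    m = mean(values)
    s = stdev(values)
    return m, s, 100.0 * s / m


def recognition_rate(recognized, total):
    """Assumed definition: percentage of recognized features over total features."""
    return 100.0 * recognized / total


# Illustrative values only (not data from the paper).
sample = [0.50, 0.55, 0.60, 0.65, 0.70]
m, s, cv = summarize(sample)

# Six features per marker is an assumption: contour, center, and
# classification for each of the two holes.
full_detection = recognition_rate(6, 6)
partial_detection = recognition_rate(3, 6)
```

A low coefficient of variability, as reported for the x- and z-axis sweeps, indicates that the metric changes little across the tested settings relative to its mean.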

Discussion
We present a mechanical fiducial marker, named MechaTag, along with its detection algorithm, which has the potential to be useful in a large number of robotic applications in which intrinsic mechanical features (e.g., holes) can be used to reference objects in three-dimensional space. To test our approach, we use as a case study the recognition of blind and through holes fabricated on two materials (viz., steel and PMMA), evaluating the identification capabilities of a robotic system in different environmental and working conditions: distance, orientation, and illumination. DeGol et al. introduced a square fiducial marker, called ChromaTag [9], with a detection algorithm based on the color information integrated in the marker, in order to minimize the number of false positives from grayscale acquisitions. They compared their system with other state-of-the-art fiducial marker systems, such as AprilTag and RuneTag, in terms of accuracy, varying the working distance and the colour information. ChromaTag is a good choice for short and long working distances, achieving an accuracy of 94.4%, and it is robust to colour variation, with an accuracy of 68%. In contrast, considering the tested set-up and environmental conditions, MechaTag achieves an accuracy of 100% for the majority of the case studies, even when specific boundary conditions vary. Only for the PBNI and SBCI cases does the accuracy decrease if working conditions vary, due to the low contrast between the target and the background. In particular, the poor reflectivity of PMMA decreases the contrast in the image, so that the blind hole area cannot be distinguished from the background. In terms of color robustness, our fiducial marker is not affected by the white balancing of the image, because of the constant illumination conditions granted by the illuminator. With the light off, only PBNI and SBNI are affected by the lack of additional and constant lighting. As a consequence, the target is not distinguishable from the background, and our proposed detection approach cannot detect it.
Table 2 Recognition Rate for each of the case studies
Bergamasco et al. [10] proposed a fiducial marker system, called Pi-Tag, for estimating camera pose. Their marker is composed of 12 dots distributed with specific patterns on a square profile. They chose the dot since it is easy to fabricate and, if the dots are small enough, the perspective error is negligible.
They tested the detection algorithm under heavy artificial noise, blur, and severe illumination conditions. They observed a good recognition rate under Gaussian noise and at very angled camera positions, taking into account the marker acquisition modality and blurring effects. Furthermore, they observed another possible cause of error when dots are used as markers: if the marker is acquired from too far away and at too steep an angle with respect to the reference system, the dots become too small and blend together, so in this case the error is related to the resolution of the vision system. In contrast, MechaTag achieves the highest recognition rate independently of the boundary conditions, except for PBNI and SBNI, where the lack of additional illumination prevents the target from being distinguished from the background. Moreover, the high resolution of the employed camera does not introduce any blending or overlapping of the marker images, even if they are far away from the camera and the hole edges are small to capture. In fact, the two targets are always distinguishable from each other and their features are preserved. Calvet et al. presented a circular fiducial consisting of a planar pattern with concentric rings [21]. They tested their fiducial marker system on synthetic images, evaluating its performance under blurring, different lighting conditions, and varying working distances. The detection rate of their system equals 94%, but decreases to 22% under blurring. By varying the distance, the detection rate decreases from 100% to 80%. In our case, we obtain a higher or comparable detection rate in the same conditions without decreasing the accuracy of detection. A final but important difference between the fiducial marker systems proposed by other research groups and ours is the material of the marker.
Researchers have proposed and tested fiducial marker systems designed on paper supports, where the contrast between the target and the background can be set, and increased, for fast and accurate recognition of the marker. We introduce a fiducial marker system on rigid materials, suitable for a wide range of applications, such as robotics, manufacturing, localization, and tracking.
Moreover, it is reliable in harsh industrial environments, where its shape and material do not degrade recognition accuracy. Its rigidity prevents any deformation of the marker itself, which could influence the calibration and the estimation of the distance between the camera and the target. However, it presents some light reflection issues, due to the material, and low contrast between marker and background, because it is built directly on the material to be localized. The marker is made with a high-precision mechanical process, as it is fabricated with a Computer Numerical Control (CNC) machine [35], and it can therefore lead to a more precise recognition. Compared with state-of-the-art fiducial markers, our tests show that MechaTag can be recognized in a wide variety of working conditions (distance, orientation, and illumination), while its material allows its use in many industrial and research applications and in many environmental conditions.

Conclusion
We present MechaTag, a fiducial marker and its detection algorithm that exploit mechanical features (e.g., holes) of a three-dimensional component as a precise reference in robotic applications. In contrast to state-of-the-art systems, MechaTag does not rely on paper-based markers but on intrinsic mechanical features. This peculiarity makes our system a reliable tool in challenging industrial environments. MechaTag also shows high robustness under different lighting conditions, with varying colour and reflectivity of the material in which the markers are fabricated (PMMA is white with low reflectivity, while steel is grey with high reflectivity), and under blurring, defocusing, and different working distances, showing a very satisfactory detection rate thanks to its dedicated detection algorithm. Due to its robustness, MechaTag can be applied to a very wide range of fields, including industrial environments, since it is inert to humidity, operating incidence, or overheating: for example, it can be used as a reference for recognizing and localizing components in manufacturing cells, for machining or pick-and-place purposes. The system is very promising and can be improved by testing more kinds of materials, making it suitable also for outdoor use. Additionally, for the cases in which an explicit recognition algorithm has been shown not to be effective, i.e., the PBCI and SBNI cases, a neural network approach (for example, a Convolutional Neural Network) can be introduced in order to extend the applicability of MechaTag to low-contrast images; a neural network could also help to increase the recognition rate. With a deep learning approach, the algorithm would be able to classify the image to identify the presence of the target, and then localize the target in the image.
Next steps will explore the influence not only of different shapes but also of different environmental conditions, such as dusty environments and different colors, in order to improve MechaTag's robustness.
Acknowledgments This work was possible thanks to the fruitful joint action between the BioRobotics Institute of Scuola Superiore Sant'Anna and Baker Hughes Company, which has strongly believed in the collaboration between industrial companies and research centers.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material.
Table 3 For each case study, a statistical evaluation is performed: the Root Mean Square Error (RMSE), circularity (f_circ), and eccentricity (e) are analyzed in terms of mean value and standard deviation, and the Recognition Rate is recorded to track the performance of the recognition algorithm and of the developed custom vision system in four different kinds of conditions. These conditions are related to the variability of the environment and to the position of the vision system during the acquisition. We tested different orientations of the vision system along each axis of the 3D space of the robot. As the main consequence of varying the position of the vision system, we registered different environmental conditions, such as defocusing of the target, illumination variability on the target when moving away from or closer to it, inclusion of other disturbing features or occlusion of the marker, and blurring of the target