Platform
The stone wall was created using HEAP (Hydraulic Excavator for an Autonomous Purpose), a highly customized Menzi Muck M545 12t walking excavator that was developed for autonomous applications and advanced teleoperation (Fig. 2J). The fully mobile 23-axis excavator is capable of lifting objects to a height of 9 m and can freely manipulate items weighing up to 3000 kg. Adaptable to complex terrain, the machine is equipped with force-controllable hydraulic cylinders that allow for control of ground interaction forces, and cabin- and chassis-mounted inertial measurement units (IMUs) that track machine orientation. Global cabin localization is achieved by a Leica iCON iXE3 with two global navigation satellite system (GNSS) antennas and a receiver (Fig. 2h). Real-time kinematic (RTK) corrections for the GNSS signals are received over the network from permanently installed base stations for improved accuracy. Draw wire encoders are used to measure the position and velocity of the hydraulic arm- and grapple cylinders.
For exteroceptive sensing (scanning of the environment, available stones, and in-progress structure), two Velodyne Puck VLP-16 LiDAR scanners are placed at the front edge of the cabin’s roof to provide 3D scans. The LiDARs are mounted orthogonally to each other, such that scan points accumulate both while driving the robot around and while pivoting the cabin (Fig. 2g). For a more detailed description of the system’s sensors and actuators, we refer the reader to Jud et al. (2019).
All software components of the project are written in C++, and the robot operating system (ROS) is used to transfer data over the network between the different software nodes distributed on several computers. The ROS master, managing the connection between processes, is located on the on-board computer of HEAP.
Stone localization and scanning
To reconstruct and localize the stones in the surroundings of the excavator, we use the LiDARs mounted on the cabin roof (Fig. 2g). Before the construction process begins, the available stones are loosely distributed on the terrain (Fig. 3). This facilitates the detection of single stone instances by first segmenting the ground plane from the LiDAR map, and subsequently performing Euclidean clustering on the remaining point cloud to distinguish the single objects. These point clusters are used to compute an initial grasp configuration on the not yet reconstructed stone segments. Once a given stone is picked up using this relatively low-resolution data, a high-resolution scan of the full stone is completed while it is held in the gripper. In the following sections, we describe this stone reconstruction process, the LiDAR-based scene mapping, and the object pose refinement based on the reconstruction and scene map information.
Reconstruction For the 3D reconstruction, the end effector is spun with a continuous velocity in front of the LiDAR sensors while holding the stone. With the known geometry of the gripper, we filter out points that belong to the end effector while accumulating points on the stone. The complete contour of the stone is recorded by concatenating multiple point clouds from different viewing angles, and applying Poisson surface reconstruction (Kazhdan et al. 2006) to generate the surface mesh from the point cloud data.
LiDAR-based scene mapping A precise map of the excavator’s complete surrounding (including the as-built structure) is necessary for planning grasp configurations and arm motions, and for facilitating the decision of where and how to position stones. However, simple accumulation of single LiDAR scans is very temporally and spatially limited, as precise knowledge of the sensor’s motion is necessary (and error accumulation leads to drift over time). To create a consistent map over large time- and motion intervals, we use LaserSLAM (Dubé et al. 2017), a laser-based graph SLAM that considers the odometry constraints provided by the excavator’s state estimation, fusing GNSS measurements with the IMUs, and scan matching constraints provided by registering a new scan pair using the Iterative Closest Point (ICP) algorithm with a map consisting of the previous scans. Self-see points of the excavator’s arm or legs in the LiDAR scans are filtered to get a 3D point cloud, built locally around the current robot location, without being corrupted by its moving parts (Fig. 7a).
ICP refinement Being able to refine the pose of an already reconstructed stone in the environment (Fig. 7b) or in the gripper is an essential capability, as the stone might settle during placement or shift as it is being picked up. For this pose refinement, we first sample a point cloud on the reconstructed mesh of the stone. Using an ICP step, this point cloud is registered to the scene map (if the stone is placed in the environment), or to the accumulated gripper cloud (if it is grasped by the excavator). To improve the localization, we first segment the ground plane in the vicinity of the stone using RANSAC-based 3D plane fitting. Additional filtering is necessary, however, as stones might be cluttered and partially occluded, especially if they are constituents of the in-progress wall. Therefore, the scene map is further segmented using Euclidean clustering, and segments belonging to already refined stones or segments too far from the assumed pose are removed before the ICP step. This allows for the reliable pose refinement of objects even if they are partially occluded by the built structure. If the initial orientation of the stone is unknown (e.g. after it has been overturned), the ICP step is performed for multiple initial orientations and the refined transformation with the highest ICP-score is selected.
Geometric planning
The iterative selection, positioning and orienting of each stone is determined by a geometric planning software that attempts to construct a double-faced wall bounded vertically by any user-specified mesh target surface, and below by the LiDAR-scanned elevation map of the existing terrain. Given any number of available stones, the software determines the preferred placement from the available solutions, and the selected stone and the desired transformation are further processed for grasp planning and physical positioning. Rather than pre-planning the entire wall, the software computes solutions on the fly, stone by stone. This is done for two reasons:
First, for sufficiently large constructions, it is impractical to pre-scan, sort, and store hundreds or thousands of stones when the required space and complexity can instead be minimized if they are salvaged, unearthed, or delivered on-demand, as needed. Computing solutions on the fly allows for a more adaptive construction that is not held up if one pre-planned stone goes missing.
Second, the absolute rigidity of stone makes for significant error propagation that would result in undesirable deviation from the desired geometry, and higher probability of structural collapse in turn. Any minute difference in the scanned geometry can cause unexpected collisions or settling during placement, and the current setup allows for these changes to be automatically accounted for in the next solution.
Finding a suitable position of even one stone in a relatively small area can be a computationally heavy problem, and the irregular nature of the stones and substrate geometry make it generally difficult to apply stochastic solvers or continuous optimization methods. The software should generally find a valid solution in less time than it takes for the excavator to physically locate, grasp, and place a stone, such that the finding of solutions does not significantly deter progress (at present, the solver takes approximately 1 min per stone on a laptop running Ubuntu 18.04 with an Intel i7-8750H 4.1 GHz Processor and 16GB RAM).
Much like the algorithm presented by Liu et al. (2019), we make use of a number of heuristics derived from conventional stone masonry techniques to reduce the size of the solution space. However, our approach differs significantly in the specification and sequencing of these heuristics: we use both the geometric properties of the stones and the constraints of the designer-specified goal surface to inform an algorithm that begins with fast computational checks, and increases in complexity as the solution space is substantially reduced.
For any given stone placement, the following high level goals are established: determine a structurally stable solution that minimizes the volume directly under the stone in question (Fig. 4a) and also minimizes the volume between the exposed stone face and the designated geometry of the wall surface (Fig. 4b). To achieve these objectives, we implement the following subroutines:
Stone attributes At all times, the software holds an inventory of stones that have been scanned. Upon receipt of new information from the scanning node, the stone mesh is immediately cleaned by removing any degenerate faces and disjoint remnants that might remain from the Poisson reconstruction. The mass properties are computed using the method described by Bacher et al. (2014), and each stone is repositioned such that its centroid is at the origin, and its principal axes (computed using PCA on the mesh vertices) are aligned with the global frame. An iterative implementation of the variational shape approximation method (Cohen-Steiner et al. 2004) is used to determine approximate partitioned face surfaces on the stones given an allowable normal deviation per clustered region of mesh triangles. From these partitioned regions, a face-adjacency matrix is generated, stone edges and edge midpoints are identified, and the mean weighted normal of each region is used to represent the normal direction of each partitioned face surface (Fig. 4c).
Search space A volumetric search space (Fig. 2c) is defined as the region bounded by the designated outer wall surfaces on all sides and above, and bounded below by the ground and the upper surface of the in-progress wall. While this upper surface is in reality generally composed of many disjointed stones, it is approximated as a single continuous surface using a simulated LiDAR scanner on the known mesh data of the ground and placed stones, and meshed using Poisson reconstruction. Thus, both the empty ground and the upper bounds of a half-built wall are treated similarly as the foundation for the next stone. The intersection between this reconstructed “ground” surface and the target wall face surfaces defines the rim path, which we identify as the upper perimeter edge of the structure at any given state (the contour where the top surface of the as-built structure intersects with the vertical goal wall surfaces)(Fig. 2d).
An even sampling along the rim path serves as a one-dimensional list of possible placement positions, and this list is further reduced to only include lower points within a given distance of identified keypoints (Fig. 2d) along this path: keypoints are defined as any point along the rim path that lies either between two stones, where one stone meets the ground, or at the corners or ends of designated structure. This filtering process ensures that the wall is generally built up evenly, and reduces the likelihood of running joints that weaken the bonding of the wall.
Fit estimation Given the known edge midpoints, edge directions, and face-normal vectors on the stones, and the normal vectors of the goal wall surface at each point on the rim path, it is trivial to determine the transformations necessary to place any specified stone on this path—such that a given stone edge is tangent to the rim path, and the stone face is aligned with the wall surface. While each combination of path point and stone edge could be sampled (Fig. 4d), this still represents a substantially large solution space if the wall is long and there are multiple known stones in the available inventory. Just as a stone mason would likely only try placing stones that look like they will fit (instead of brute force sampling every possible position), we first check if a stone is geometrically similar to the region in question using rotationally invariant shape descriptors before attempting any more involved processing (Figs. 4e, f, 5). Specifically, we use the FPFH descriptor (Rusu et al. 2009) at each edge midpoint and at each rim path vertex coupled with a k-nearest neighbor (KNN) search to find the k-closest geometric matches at each included point on the rim path (\(k\approx \)10). In this demonstrator, the KNN search typically reduced over a million possible combinations of stone edges with rim path points to some thousands. Because of the rotationally invariant nature of such shape descriptors, each match must consider both possible orientations of an edge with the rim path (Figs. 4g, 4h).
Stability heuristics Prior to running any involved physics simulations, we remove potential candidates that are likely unstable, or likely to reduce future stability, informed by guidance from stone masonry literature. In dry stone walls, it is generally advised to not place stones in such an orientation that their thinnest dimension defines the depth into the wall—thus acting as a thin veneer that is more likely to fall away from the wall (Environmental Action Foundation 2019). We generalize this requirement as needing to meet a minimum ratio between the horizontal (\( d \)) and vertical (\( h \)) distance measured between the specified point on the rim path and the stone centroid (Fig. 4i):
$$\begin{aligned} d / h > 0.5. \end{aligned}$$
(1)
As the construction process uses a double-faced method, the structure benefits from having an inward slope at the top of each placed stone—such that any settling is minimized and supported by stones on the opposing side of the wall (Vivian 1976).Footnote 1 For each intended position, we take an even sampling of the exposed stone surface using a raycasting method, and compute the mean top-exposed surface normal (\(\hat{\mathbf {n}}\)) from the stone normal direction at each hit point. We then verify that \(\hat{\mathbf {n}}\) points into the wall and that the stone surface is not excessively steep by
$$\begin{aligned} \hat{\mathbf {e}}_\mathbf {y} \cdot \hat{\mathbf {n}}>= 0 \quad and \quad \hat{\mathbf {e}}_\mathbf {z} \cdot \hat{\mathbf {n}} > 0.7 , \end{aligned}$$
(2)
where \(\hat{\mathbf {e}}_\mathbf {y}\) is the unitized inward-facing surface normal of the target wall surface projected onto the horizontal plane, and \(\hat{\mathbf {e}}_\mathbf {z}\) is a unit vector pointing against the direction of gravity (Fig. 4I).
ICP refinement Once the pool of potential matches has been reduced by the aforementioned heuristics, each candidate is refined using ICP to better align it with the surrounding stones and goal wall surface.
Physics Following ICP refinement, the remaining solutions are sorted by their ICP score, and (if specified) reduced to a preset maximum number of solutions. These are then simulated for stability with the Bullet physics engine (Coumans 2015), and solutions that do not reach equilibrium within a fixed simulated-world-time period or within a distance moved are ruled out.
Scoring The remaining stones after the physics simulation step are sorted using the combined parameters from Fig. 4a and b, where the gap volume is calculated with a raycasting technique measuring the displacement of hits measured from local plan- and elevation projection planes with and without the placed stone (and considering the known stone volume). The best match is then sent for physical placement. A provided Imgui interface (Cornut 2005) allows for alternative high-scoring solutions to be selected manually, if desired.
The geometric planning software makes extensive use of libigl (Jacobson et al. 2018) for mesh processing, PCL for point cloud operations (Rusu and Cousins 2011), and openframeworks (Lieberman et al. 2005) for visuals (Fig. 6).
Grasp and placement execution
The goal of the grasp pose planning is to find viable grasping configurations that allow the excavator to pick a desired stone and place it at the planned location without collisions with the existing wall (Fig. 7e). A grasp configuration is defined as 6-DoF pose of the excavator’s gripper, where contact with the desired stone can be performed. To find a viable grasp configuration that respects the collision constraints, we sample a large amount of grasp hypotheses on the stone of interest in the map point cloud. Those grasp hypotheses are validated for collisions by intersecting a polyhedral gripper model with the map cloud and searching for inliers. Note that both the grasp configuration and the corresponding placement configuration must be verified to be without collisions (Fig. 7d, f). Due to occlusions, the map point cloud may contain holes that can result in undetected collisions. To prevent these situations, the map point cloud is augmented with point clouds generated from the meshes of already localized objects (Fig. 7g). After ruling out colliding configurations, the remaining grasp poses are evaluated for force closure and scored by task-specific criteria (alignment to ground, grasp encompassment, and distance to stone CoM) to obtain the final grasp.
A motion planner is used to generate a spline trajectory for reaching the desired grasp pose and for moving from the grasp pose to the placement pose (Fig. 7c). The approach direction during placement is chosen by raycasting to provide the largest margin to obstacles. Figure 7 shows the spline trajectory (c, blue) for moving the grasped stone from the ground to its desired placement pose (e, red). Each time a stone is grasped or released, its pose with respect to the map or end effector is updated with an ICP refinement step using its previous location as the initial guess (Sect. 3.2).