1 Introduction

Enabling spatial play, construction toys serve an important role as metaphors for scientific principles, spatial skills, and math aptitude (Eisenberg 1998; Jirout and Newcombe 2015; Martin 2007a; Tracy 1987). Improving players' performance, measured by building time and the number of building errors, contributes to their overall learning (Christensen and Biskjaer 2018). Performance depends heavily on the instructions that guide the construction process. Designing instructions involves two primary tasks: (1) planning a sequence of assembly operations that users can understand and follow easily, and (2) presenting the assembly operations clearly in a series of diagrams (Agrawala et al. 2003).

Given the significant potential of augmented reality (AR) to benefit manufacturing, building construction, and part assembly, this project aims to investigate and develop an AR-based instruction method and to validate its advanced features, such as highly accurate registration and realistic object/hand occlusions, for enabling small parts assembly, represented by construction toys. The research question is how AR instructions can be created to enable such assembly. The major finding is that integrating these advanced features makes AR instructions feasible for small parts assembly, validated through a working AR prototype for constructing the LEGO Arc de Triomphe, quantitative measures of registration and occlusion accuracy, and a heuristic evaluation of AR instruction features. The heuristic evaluation indicates that the present method could advance AR instructions in terms of enhanced part visibility, a match between mental models and visualization, alignment of physical and virtual parts in perspective views and spatial transformations, a tangible user interface, consolidated structural diagrams, and virtual cutaway views, among other benefits for guiding construction.

2 Background and literature review

Major assembly and construction projects are increasingly complex (Wood and Ashton 2010). In recent years, LEGO sets have also become more complex and labor intensive (Lindstrom 2016). Some sets have thousands of pieces, e.g., LEGO 75192 Star Wars Millennium Falcon has 7,541 pieces and an instruction booklet of 496 pages. Meanwhile, augmented reality (AR) can superimpose digital images on the user's real-world view and has the potential to benefit manufacturing, building construction, and part assembly significantly. AR has been studied as an education and research tool, e.g., in the interactive AR assembly guidance and learning of Tou-Kung (a sophisticated cantilevered bracket system in traditional East Asian architecture) (Chen et al. 2018), in tutoring machine tasks with AR-assisted spatial interactions (Cao et al. 2020), and in the study of spatial design and urban planning problems by projecting data visualizations onto LEGO models (Alonso et al. 2018). The LEGO Group uses AR to mix touch-screen game components, such as audiovisual effects, with physical models, in order to make LEGO play more engaging and fun (Kobie 2017). However, one of the core LEGO building experiences, the construction process, may also be enriched by transformational AR technology, which is the goal of BRICKxAR.

2.1 Benefits and limitations of 3D model-based AR instructions

A recent study of a 2D projection-based AR-assistive system shows only small benefits in the training scenario, in which AR training does not match in-person training in terms of speed and recall precision after 24 h (Büttner et al. 2020). However, many studies have verified that 3D model-based AR instructions can significantly reduce task completion time (Henderson and Feiner 2011) and the error rate (Richardson et al. 2014; Tatić and Tešić 2017). Funk et al. proposed using Duplo and an artificial industry task for evaluating AR instructions in terms of assembly time and errors, with paper-based instructions as baselines (Funk et al. 2015). Duplo is a larger version of LEGO whose width, depth, and height each double the corresponding dimensions of standard LEGO, meaning that AR instructions for Duplo generally require less accuracy than for LEGO. Westerfield et al. evaluated participants' performance in assembling computer motherboards and found significant improvement using AR with a feedback system (Westerfield et al. 2013). They also pointed out the drawback of limited accuracy in AR tracking. Schwald and de Laval presented an AR prototype for assisting in equipment maintenance and concluded that the results were positive but that improvements were needed in the accuracy of 3D augmentation (Schwald and De Laval 2003). Tang et al. compared an AR-based instruction system with a printed manual, computer-assisted instructions on a monitor, and a head-mounted display for Duplo assembly (Tang et al. 2003). Their user evaluations support the proposition that AR systems improve assembly performance significantly. They found, however, that the limitations of tracking and calibration techniques were the biggest obstacles. Similarly, Hoover et al. (2020) confirmed the benefits of AR instructions over traditional digital or paper instructions and found that even though HoloLens AR instructions are a preferable alternative to traditional 2D assembly instructions and tablet-based AR, the positions of the virtual objects in HoloLens AR did not align correctly with the assembly parts. Research further reveals that AR instructions for assembly tasks can advance the learning experience in sophisticated subjects. For example, AlNajdi et al. (2020) conducted a study based on assembling a robot module and demonstrated that, in their experiments, the AR approach is more effective than the paper-based approach in learning achievement, learning activity enjoyment, and usefulness. In their assembly process, the AR system guides the learner step by step to assemble the robot module and learn the functions of the module components. Their superimposed virtual objects for instruction are virtual arrows, text overlays, and images, shown in front of the physical module through an iOS device (iPad) to guide the assembly process. As the research focuses more on the learning experience during the assembly process than on the assembly process per se, no 3D models of the module components are visualized in the AR instruction.

2.2 AR registration

In 3D model-based AR instructions, the required registration accuracy (of localization) depends on the actual task. To enable small parts assembly (such as LEGO) using AR instructions, model registration needs to be improved. For example, many LEGO bricks are very small: 1 LEGO unit is 1.6 mm (the thickness of the plastic wall), and a stud's diameter is 3 units (= 4.8 mm). Low accuracy will result in significant misalignment between virtual and physical models, and thus errors in construction. A comprehensive review of AR for assembly points out that accuracy and latency are the two critical issues (Wang et al. 2016). In a wood structure construction project using AR, the virtual object overlaid on the physical component caused uncertainty in positioning; the virtual object was also not stable but shook (Qian 2019). Using GearVR and a marker-based approach to construct a timber structure produced errors of 10–50 mm between virtual and physical objects (Abe et al. 2017). A recent workshop presented a HoloLens AR platform that enables interactive holographic instructions, with a prototypical project to design and construct a pavilion from bent mild steel tubes (Jahn et al. 2018). The digital design model and the digitized physical model differed by at most 46 mm, with an average of 20 mm across all parts, attributed to human errors in construction, the physical model's self-weight and contortion, and holographic drift from inside-out device tracking (Jahn et al. 2018). In another collaborative construction project, some construction workers wore the HoloLens to instruct others without the device in building the structure, but the localization had an accuracy problem (Hahm et al. 2019).
One finding from a survey of AR in the manufacturing industry over the last decade is that marker-based solutions are usually the preferred tracking technology due to their ease of implementation and higher accuracy compared with markerless solutions (Bottani and Vignali 2019). A recent marker-based registration for an AR application with a tangible user interface in building design is demonstrated in (Son et al. 2020), in which a major limitation is that the system supports only a single-floor (essentially not 3D) AR scene.

2.3 Object and hand occlusions

Object occlusion is critical for correct depth perception in AR to ensure realistic and immersive AR experiences, but existing occlusion methods suffer from various limitations, e.g., the assumption of a static scene or high computational complexity (Du et al. 2016). One example uses geospatial data to construct geometric models of buildings in an outdoor environment to simulate occlusion between virtual content and real buildings in AR; while simulated occlusion is achieved, the outcome is not realistic (Kasperi et al. 2017).

Model-based object occlusion has been studied for various AR applications, but there are few studies of object occlusion in AR instructions. Occlusion-by-contours was found to aid users by removing ambiguity in depth perception, using contour rendering for virtual parts occluded by physical parts in assembly (MacAllister et al. 2017). However, the resulting images of that work reveal a registration accuracy problem, and the contour lines drawn for the occluded portions of the virtual parts are not realistic enough to reveal an accurate spatial relationship between physical and virtual parts. Thus, accurate object occlusion remains challenging to achieve for revealing the correct spatial relationship between physical and virtual parts in AR instructions.

Similar to the need for object occlusion, hand/people occlusion research is found in prior AR literature (Abate et al. 2014). A relatively recent study optimizes the consistency of object boundaries between RGB and depth data obtained by an RGB-D sensor. While the approach achieves accurate hand occlusion, its performance is only near real time on a tablet platform, i.e., 30 FPS (frames per second) at a screen resolution of 640 × 480 and roughly 15 FPS at 720p (Du et al. 2016). Still, there is a lack of research on hand occlusion in AR instructions.

3 Methodology

The methodology of the research consists of designing and prototyping an augmented reality (AR)-based instruction method—BRICKxAR—and validating the method through experiments and heuristic evaluation.

BRICKxAR's software architecture diagram is shown in Fig. 1, and the user interface (UI) design of BRICKxAR is illustrated in Fig. 2. The BRICKxAR software and its UI are designed to guide physical toy construction (LEGO as an example) by visualizing the construction process using virtual bricks in the right place at the right time, step by step. In the UI (Fig. 2), information augmentation is created for specific bricks about architecture and construction knowledge. Physical and virtual object occlusion is implemented to give virtual bricks a natural appearance on the physical model. Players' hand detection and occlusion allow a realistic, immersive AR experience, in which virtual bricks can be "grasped" by the real hand, revealing the correct spatial relationship of objects. Enabled by the software architecture, the physical model can be moved and rotated freely on a desk surface, and the AR device camera can move with 6 degrees of freedom (DoF). Meanwhile, high accuracy of AR registration, i.e., the virtual model's alignment with the physical model, is achieved through an effective image marker design in BRICKxAR, working with the built-in computer vision-powered marker-based registration method of an iOS device's AR platform and with the device's camera and motion sensors (including gyroscope, accelerometer, and magnetometer).

Fig. 1

BRICKxAR’s software architecture diagram

Fig. 2

User interface of BRICKxAR. Left: UI screen (on iPhone) showing a construction step; right: UI screen (partial) when inserting a physical brick guided by a virtual brick

To validate the project contributions, LEGO Architecture 21036 Arc de Triomphe in Paris is built completely, through all 386 steps, with the working prototype of BRICKxAR. The experiments provide quantitative measures of the accuracies of registration and occlusions, and a heuristic evaluation of AR instruction features that compares BRICKxAR with design principles of assembly instructions and AR design guidelines suggested by the literature.

4 Prototyping and implementation

The prototyping process in BRICKxAR research includes virtual model preparation, marker-based registration, step-by-step instructions, object and hand occlusions, and implementation of the prototype—an app on AR-enabled iOS device.

4.1 Virtual model preparation

A virtual model for AR needs to be prepared with CAD modeling tools. LEGO Arc de Triomphe is used as an example due to the availability of both the physical set and its virtual model (Legolizer 2018), as well as the moderate complexity of the set. The virtual model can be rendered in shaded and wireframe modes in AR (Fig. 3).

Construction steps need to be introduced to the builder in a logical order. Thus, virtual bricks are stored in that order in an indexed array. In general, a brick at a lower elevation is assembled before one at a higher elevation. However, in some cases a brick must be attached to an upper brick instead of a lower one, when the upper brick acts like a bridge and the new brick needs to be attached to its bottom.
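
This ordering constraint can be sketched as a simple scheduling pass (a minimal Python illustration, not the BRICKxAR implementation; the `Brick` class and its `supported_by` field are hypothetical):

```python
from dataclasses import dataclass, field

@dataclass
class Brick:
    name: str
    elevation: float              # height of the brick's base, in LEGO units
    supported_by: list = field(default_factory=list)  # bricks this one attaches to

def order_steps(bricks):
    """Sort bricks bottom-up, but never schedule a brick before a brick it
    attaches to (e.g., a brick hung under a bridging brick above it)."""
    ordered, placed = [], set()
    pending = sorted(bricks, key=lambda b: b.elevation)
    while pending:
        for b in pending:
            if all(s.name in placed for s in b.supported_by):
                ordered.append(b)
                placed.add(b.name)
                pending.remove(b)
                break
        else:
            raise ValueError("circular support relationship")
    return ordered

# Example: a bridging brick at elevation 3 supports a brick hung beneath it,
# so the hung brick comes later despite its lower elevation.
base = Brick("base", 0)
bridge = Brick("bridge", 3, supported_by=[base])
hung = Brick("hung", 2, supported_by=[bridge])
print([b.name for b in order_steps([base, bridge, hung])])  # → ['base', 'bridge', 'hung']
```

The sketch simply formalizes the rule stated above: elevation is the default ordering key, and attachment dependencies override it.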

Fig. 3

Left to right: (1) LEGO Arc de Triomphe, (2) and (3) shaded and wireframe virtual models, and (4) in BRICKxAR, the virtual model (green wireframe) aligns with the physical model

4.2 Marker-based registration

Images can be set up as AR markers for the alignment of virtual and physical models. To transform the virtual model to the right scale, location, and orientation for accurate alignment between the virtual and physical models, different marker design options were tested using images inspired by AprilTags (Olson 2011). The most accurate and robust registration was achieved by combining two images into a larger, single marker, with black pixels filling the empty areas (Fig. 4 left). The design is also very flexible: even if part of the marker is covered (for example, the upper left black area is gradually covered by the LEGO set step by step), the image marker is detected and tracked with high accuracy at each step (Fig. 4 middle and right), as evaluated in Sect. 5.
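
Once the marker is tracked, registration amounts to composing the marker's pose with the virtual model's fixed offset from the marker. A minimal sketch in Python with homogeneous 4 × 4 transforms (the AR framework supplies the marker pose at run time; all numbers here are illustrative, not BRICKxAR's calibration values):

```python
import math

def matmul(a, b):
    """Multiply two 4x4 matrices (row-major lists of lists)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def translation(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]

def rot_z(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

# Pose of the detected image marker in world space (tracked by the AR framework).
marker_pose = matmul(translation(0.10, 0.0, 0.30), rot_z(90))

# Fixed offset of the virtual model's origin relative to the marker,
# calibrated once (values made up for illustration).
model_offset = translation(0.05, 0.0, 0.02)

# World transform that registers the virtual model to the physical one.
model_world = matmul(marker_pose, model_offset)

def transform(m, p):
    """Apply a 4x4 transform to a 3D point."""
    x, y, z = p
    v = [x, y, z, 1.0]
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(3))

print(transform(model_world, (0.0, 0.0, 0.0)))  # model origin in world space
```

Because the offset is rigid, re-detecting the marker after the physical model is moved or rotated re-registers the whole virtual model in one step.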

Fig. 4

Left: a scanned image of the LEGO pieces-made plate serving as the marker for AR registration. Middle and right: high accuracy of BRICKxAR registration. The virtual model (green wireframe) aligns accurately with the physical model

4.3 Step-by-step instructions

The virtual brick of each current step is rendered in the right place of the model to guide the player for physical construction (Fig. 5). During the construction process, important architecture information is displayed on the user interface for specific, relevant steps. For example, at Step 137, the photograph of the sculpture—Napoleon crowned by the goddess of victory—and text explanation are shown and linked to the LEGO micro-figure that represents the sculpture (Fig. 6).

Fig. 5

BRICKxAR: shaded or wireframe virtual bricks guiding physical LEGO construction. Left and middle: 1st and 2nd virtual Bricks 15254 (below the red labels), and right: player assembling the 2nd physical Brick 15254

Fig. 6

Step 137: photograph and text information about the sculpture—Napoleon crowned by the goddess of victory—are shown on the upper right, and linked to a LEGO micro-figure (Brick 90398)

4.4 Physical–virtual object occlusion in step-by-step AR instructions

The algorithm performing object occlusion in the step-by-step BRICKxAR instructions is shown below.

figure a

In the algorithm, Shader #1 is a shaded or partially transparent wireframe rendering function for depicting the transformation of the current virtual brick in the AR image to guide construction. It allows the current step's virtual brick to be fully or partially visible, depending on whether existing physical bricks are in front of it. Shader #1 is made through a combination of shader properties, including:

  • Rendering mode: fully opaque or transparent for mesh faces but opaque for edges

  • Cull: back, determined by depth testing in the camera space

Shader #2 is a transparent and occlusive rendering function for showing the previous physical brick after it is inserted. It makes physical bricks appear to occlude a virtual brick behind them. Shader #2 is made through a combination of shader properties, including:

  • Rendering mode: transparent (for both mesh faces and mesh edges)

  • Cull: back, determined by depth testing in the camera space
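
The combined effect of the two shaders can be illustrated with a toy depth buffer (a Python sketch of the rendering principle, not Unity shader code; here "phantom" fragments stand in for the transparent, occlusive rendering of placed physical bricks):

```python
FAR = float("inf")

def render(fragments, depth, color):
    """Z-test over one scanline: a fragment wins only if it is closer than
    what is stored. 'phantom' fragments (stand-ins for placed physical
    bricks) update the depth buffer but leave the camera image untouched,
    so the real brick shows through while still hiding virtual geometry
    behind it."""
    for x, z, kind in fragments:
        if z < depth[x]:
            depth[x] = z
            if kind != "phantom":
                color[x] = kind
    return color

depth = [FAR] * 6
color = ["camera"] * 6            # live camera image (physical scene)

# A placed physical brick covers pixels 2-4 at depth 1.0 (phantom pass);
# the current step's virtual brick covers pixels 0-5 at depth 2.0.
phantoms = [(x, 1.0, "phantom") for x in (2, 3, 4)]
virtual = [(x, 2.0, "virtual") for x in range(6)]
print(render(phantoms + virtual, depth, color))
# → ['virtual', 'virtual', 'camera', 'camera', 'camera', 'virtual']
```

The output shows the desired behavior: the virtual brick is drawn only where no nearer physical brick exists, so it appears partially occluded by the physical model.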

4.5 Hand occlusion and real hand grasping virtual objects

Hand detection and occlusion are implemented to enable "grasping virtual objects with real hands," potentially enhancing the realism and immersion of the players' AR experience. Hand occlusion in AR instructions becomes necessary in situations such as Step 350 (Fig. 7), where the virtual brick otherwise appears in front of the fingers, demonstrating an incorrect spatial relationship between the virtual brick and the hand.

Fig. 7

Step 350: virtual Brick 3028 appears in front of the hand

Using computer vision, the hand area can be detected. Then virtual objects can be inserted into the scene to cover the hand area and rendered to occlude the virtual bricks, while being transparent to reveal the hand.

4.5.1 Hand detection

Color segmentation is used to detect hands by their skin colors. The target skin colors can be selected in real time by the player touching the hand area on the AR screen multiple times. The colors at the two most recently touched points are used as target hand colors, each with a predefined tolerance value. A grid of points on screen is compared with the target colors, and points are labeled as hand points if their color values fall into any of the target color ranges. The grid density can be adjusted.

A “flood-fill” method is used to fill any holes within the detected hand points. This enables the entire hand area to be filled with hand occlusion objects, in case color detection leaves holes because of inconsistent colors on the hand due to, e.g., lighting. In addition, very small blobs resulting from color segmentation are removed, as they are likely other objects with colors similar to hands.

In computer vision, the YCbCr color space is suggested for effective and efficient performance of skin color segmentation (Shaik et al. 2015). BRICKxAR is able to access the Cb and Cr channels in the video frames captured by the AR device camera (iPhone’s rear camera) in real time. Therefore, the Cb and Cr channels are used in hand color segmentation based on the video frames.
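
The detection pipeline, i.e., color thresholding followed by small-blob removal and hole filling, can be sketched as follows (a simplified Python illustration on a small binary grid; the grid, tolerance, and blob-size threshold are illustrative, not BRICKxAR's actual parameters):

```python
from collections import deque

def in_range(c, target, tol):
    """Match a (Cb, Cr) color pair against a target skin color within a tolerance."""
    return all(abs(a - b) <= tol for a, b in zip(c, target))

def connected_blobs(mask):
    """4-connected components of True cells in a 2D boolean grid."""
    h, w = len(mask), len(mask[0])
    seen, blobs = set(), []
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] and (sx, sy) not in seen:
                blob, q = [], deque([(sx, sy)])
                seen.add((sx, sy))
                while q:
                    x, y = q.popleft()
                    blob.append((x, y))
                    for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                        if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] \
                                and (nx, ny) not in seen:
                            seen.add((nx, ny))
                            q.append((nx, ny))
                blobs.append(blob)
    return blobs

def clean_mask(mask, min_blob=3):
    """Drop tiny blobs (likely skin-colored clutter), then fill holes:
    background cells not connected to the border lie inside the hand."""
    for blob in connected_blobs(mask):
        if len(blob) < min_blob:
            for x, y in blob:
                mask[y][x] = False
    inverted = [[not v for v in row] for row in mask]
    h, w = len(mask), len(mask[0])
    for blob in connected_blobs(inverted):
        if not any(x in (0, w - 1) or y in (0, h - 1) for x, y in blob):
            for x, y in blob:           # hole inside the hand: fill it
                mask[y][x] = True
    return mask

# A 5x5 detection grid: a ring-shaped hand blob with a hole at (2, 2)
# and a lone skin-colored speck at (0, 4).
mask = [
    [False, False, False, False, False],
    [False, True,  True,  True,  False],
    [False, True,  False, True,  True ],
    [False, True,  True,  True,  False],
    [True,  False, False, False, False],
]
clean = clean_mask(mask)   # speck removed, hole filled
```

The flood fill here works on the inverted mask: any background component that never touches the grid border must be a hole inside the hand and is filled.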

4.5.2 Hand occlusion

Hand occlusion objects (small 2D hexagons) are instantiated to cover all the detected hand points (within the point grid on screen) and the area around each hand point. The objects are transformed from the screen coordinate system to the world coordinate system to be located in between the camera near-clipping plane and the virtual bricks. These hexagon objects are rendered in such a way that they are transparent in order to show the hands, while occluding the virtual bricks (Fig. 8).
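
Placing an occluder over a detected hand point can be sketched as a pinhole unprojection from screen space to camera space (Python; the focal length, principal point, and occluder depth below are illustrative, not BRICKxAR's actual parameters):

```python
def unproject(u, v, depth, f, cx, cy):
    """Map a screen pixel (u, v) to a camera-space 3D point at the given
    depth, chosen between the near-clipping plane and the virtual bricks,
    using a pinhole model with focal length f and principal point (cx, cy)."""
    x = (u - cx) * depth / f
    y = (v - cy) * depth / f
    return (x, y, depth)

# A hand point detected at pixel (800, 600); the occluder is placed
# 0.15 m in front of the camera, well before the virtual bricks.
print(unproject(800, 600, 0.15, f=1500.0, cx=621.0, cy=1344.0))
```

An occluder instantiated at this camera-space point, rendered transparent but depth-writing, hides any virtual brick behind it while letting the camera image of the hand show through.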

Fig. 8

Left: without hand occlusion, the virtual bricks (wireframe or otherwise shaded) are not “grasped” by the hand but appear in front of it. Right: with hand occlusion, the virtual bricks (wireframe or otherwise shaded) are “grasped” by the real hand, just like the physical bricks

In BRICKxAR, the algorithm performing hand occlusion is shown below.

figure b

4.5.3 Real hand grasping virtual bricks

Compared with the literature, BRICKxAR has achieved not only accurate occlusion in real time, but also the realistic visual effect of “grasping” virtual objects, which is enabled partially by hand occlusion and partially by the actual grasping of the virtual model’s counterpart—the physical model (Fig. 8 right).

4.6 Implementation

For implementing the BRICKxAR prototype, AR hardware and software were reviewed, and the iPhone XS Max, Apple's ARKit, the Unity game development software, and UnityARKitPlugin were selected, due to the iPhone's advanced camera and motion sensors accessible from ARKit, Unity's interactive gaming and graphics capabilities, and UnityARKitPlugin's enabling Unity games to interact with ARKit. C# was used for Unity programming and Objective-C for the iOS app development.

5 Evaluation and validation

3D model-based AR has been shown in the literature to improve assembly performance significantly (Henderson and Feiner 2011; Richardson et al. 2014; Tang et al. 2003; Tatić and Tešić 2017), but accuracy and latency are critical issues (Wang et al. 2016), and the lack of concrete design guidelines for AR applications is a key barrier (Ashtari et al. 2020). Based on these findings, the validation of the BRICKxAR research is designed as follows: an example LEGO set (Arc de Triomphe) has been built completely with BRICKxAR, through all 386 steps, in experiments for (1) quantitative measures of the accuracy of model registration, compared with the state of the art from the literature, (2) quantitative measures and visual examination of the accuracies of object and hand occlusions, and (3) a heuristic evaluation of BRICKxAR features for enhancing instruction design, compared with major design principles of assembly instructions and AR application design guidelines, both suggested by the literature.

5.1 Improvement of registration accuracy

To evaluate the registration accuracy comprehensively, the LEGO Arc de Triomphe set has been built completely with BRICKxAR, and the process was recorded using the AR camera. By examining the complete video, of total length 1 h 34′51″, at a resolution of 2336 × 1080 pixels (different from the AR screen resolution of 2689 × 1242), at 29.92 FPS, the average registration error is found to be less than 1 mm throughout the entire model when at least the major area of the marker is within the AR camera's field of view. For example, Fig. 6 shows the camera's entire field of view with the marker partially out of view, yet the registration is still accurate, as seen in the alignment between the green wireframe virtual tiles and the physical tiles. Error is defined as the distance between corresponding edges on the virtual and physical models, obtained through visual examination of the video images and measurement on sample images. The error is affected by the following factors: (1) the fidelity of the original CAD model, (2) the attachment of physical bricks being tighter or looser, with slight free rotations made by a player, (3) ARKit's image tracking algorithm, and (4) BRICKxAR's marker design and parameter settings, e.g., the marker's claimed size, which can be calibrated. The registration error propagates and increases from locations closer to the AR device camera and the marker to locations farther away. The marker image size also contributes to the error: the smaller the size, the bigger the error.

If the major area of the marker is covered (resulting in the loss of AR tracking), BRICKxAR fixes the virtual model at the latest tracked location. If the physical model is then moved or rotated, the registration error will grow from less than 1 mm to misalignment of millimeters, then centimeters, up to the worst case where the virtual and physical models drift apart (the drifting effect). However, as soon as the major area of the marker is uncovered inside the camera's field of view, i.e., in most cases, highly accurate registration and tracking resume immediately. In another experiment, a much larger LEGO set, 10243 Parisian Restaurant, containing 2469 bricks, verified the high accuracy of registration again (Fig. 9). Compared with the state of the art (Sect. 2), significantly improved registration accuracy is achieved in BRICKxAR.

Fig. 9

a Physical LEGO Parisian Restaurant model. b Virtual model of the 2nd and 3rd floors superimposed on the physical model. c and d Virtual cutaway views for revealing the hidden structure of the 2nd and 3rd floors in BRICKxAR

The large marker design with two patterned areas improves marker-based registration accuracy, making AR-based LEGO construction possible. In addition to the marker-based approach, 3D point cloud SLAM (simultaneous localization and mapping) (Durrant-Whyte and Bailey 2006; Bailey and Durrant-Whyte 2006) was also tested in BRICKxAR, but its accuracy was not sufficient for LEGO construction. However, if in the future a high-resolution true-depth camera or LiDAR can be used to understand the physical LEGO model's 6-DoF poses, accuracy for markerless tracking can be further investigated.

5.2 Accuracy of object and hand occlusions

The accuracy of object occlusion is measured by examining all 386 steps in the experiment of constructing the LEGO Arc de Triomphe with BRICKxAR and watching the recorded video. During each step, as the AR camera moves around the LEGO set, the physical–virtual brick occlusion works correctly, showing a realistic and accurate spatial relationship to enable instructions. For example, in Fig. 10, virtual Brick 87079 appears realistically, similar to the real Brick 87079 that is closer to the camera, in terms of object occlusions: the virtual brick occludes the physical bricks behind it and is partially occluded by the physical bricks in front of it. The same realistic and accurate occlusions are shown in Fig. 5 for virtual Bricks 15254. A rate of 50 to 60 FPS at a screen resolution of 2689 × 1242 pixels is achieved when running the AR instructions with object occlusion.

Fig. 10

Step 136: virtual Brick 87079 (below the red label) appears realistically in terms of occlusions

The accuracy of hand occlusion is measured on sample images from the recorded AR video using intersection over union (IoU)—a widely used method for evaluating image segmentation models:

$$\mathrm{IoU}= \frac{\mathrm{Intersection} \, \mathrm{Area}}{\mathrm{Union} \, \mathrm{Area}}$$

where the intersection area is the number of pixels in the intersection between the manually selected hand area and the algorithm-detected hand area, and the union area is the number of pixels in the union of the two areas. The manually selected hand area is the ground truth. The algorithm-detected hand area can be automatically rendered in BRICKxAR as blue pixels to visualize the hand detection results, as shown in Fig. 11. The intersection and union of the two areas are shown in Fig. 12. The IoU of hand occlusion equals 88.3% on the sample images. With hand detection and occlusion turned on, the AR session performs in real time, between 28 and 60 FPS, on the iPhone XS Max's high-resolution screen (Table 1), which compares favorably with the near-real-time performance and lower resolutions of other studies, e.g., Du et al. (2016).
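
The IoU over two binary pixel masks is straightforward to compute (a Python sketch with a toy 1-D example; real masks are 2-D but flatten to the same computation):

```python
def iou(ground_truth, detected):
    """Intersection over union of two equal-length binary pixel masks."""
    inter = sum(1 for g, d in zip(ground_truth, detected) if g and d)
    union = sum(1 for g, d in zip(ground_truth, detected) if g or d)
    return inter / union if union else 1.0

# Ground-truth hand pixels vs. algorithm-detected hand pixels.
gt = [1, 1, 1, 1, 0, 0, 0, 0]
det = [0, 1, 1, 1, 1, 0, 0, 0]
print(iou(gt, det))  # → 0.6 (3 shared pixels / 5 covered pixels)
```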

Fig. 11

Hexagon objects are rendered in blue in the detected hand area to show the detection result. If the hexagons are rendered in a transparent yet occlusive material, the hand will appear occluding the virtual bricks, as in Fig. 8 right

Fig. 12

Both left and right: The manually selected real hand area (i.e., ground truth) is superimposed on the detected hand area. Left: Intersection between the two areas (within the lasso) and right: union of the two areas (within the lasso), for calculating intersection over union (IoU)

Table 1 AR experiment parameters

5.3 Heuristic evaluation

Heuristic evaluation, i.e., comparison with rules of thumb, is a human–computer interaction research method (Endsley et al. 2017; Nielsen and Molich 1990). BRICKxAR is compared with the major instruction design guidelines from the literature and all nine AR design guidelines proposed in recent literature (Endsley et al. 2017). These comparisons lead to the following findings about how BRICKxAR advances the AR instruction methodology.

  1. Current parts visibility (CPV). “Each part in the current subset should be visible with respect to the other parts in the subset” (Heiser et al. 2004). In BRICKxAR, the current part is the virtual brick. If the virtual brick is occluded by other parts (physical bricks), it can be made visible by the player rotating the physical model or moving the AR camera. The current part always has a correct spatial relationship with respect to other parts.

  2. Previous parts visibility (PPV). “Some portion of the parts attached in earlier steps should remain visible for context” (Heiser et al. 2004). In BRICKxAR, the parts from earlier steps are the physical bricks and are naturally visible.

  3. Future parts visibility (FPV). “We want to ensure that parts added in an earlier assembly step do not occlude parts added in a later assembly step” (Heiser et al. 2004). In BRICKxAR, the player can rotate the model or move the AR camera to reveal a future part (virtual brick) at any time.

  4. Instructions should support and build on the players’ existing schemas and mental models to minimize extraneous cognitive load (Martin 2007b). The choice of visualizations and metaphors should match the mental models of users based on their physical environment and task (Endsley et al. 2017). In BRICKxAR, the realistic shaded rendering, occlusions, and the aligned perspective views between the physical and virtual models naturally match the player’s level of graphic literacy and mental models, which further supports basing the form of virtual objects on users’ existing metaphors for communicating affordances and capabilities (Endsley et al. 2017).

  5. Alignment of physical and virtual worlds (Endsley et al. 2017) is achieved with BRICKxAR’s model registration.

  6. In an instruction booklet, limited angles of view (mostly isometric) may obscure parts (Martin and Smith-Jackson 2008). In BRICKxAR, the continuously changing, unrestricted viewing angles and the perspective views are consistent with the player’s natural graphics literacy, eliminating the obscuring problem.

  7. In a previous study, when a digital model on a tablet was used as instructions, the rotational view was used often by players, much more than zoom and time-lapse views (Christensen and Biskjaer 2018). In BRICKxAR, rotation and zoom of the virtual model can be done by physically manipulating the LEGO model or the AR device, minimizing distraction and overload (Endsley et al. 2017).

  8. Model scale 1:1 is shown from time to time in instruction booklets to distinguish bricks of similar shapes but different sizes; however, the graphic syntax of model scale could be confusing (Martin 2007a). In addition, players had difficulty selecting some components correctly when the instruction colors did not accurately reflect the true colors of components (Martin 2007a). For example, in some LEGO instruction booklets, “orange parts appeared yellow, and black was depicted as a dark grey, presumably to allow edges and features to be shown in black” (Martin 2007a). In AR, to fit with users’ perceptual abilities, designers should consider size, color, motion, distance, and resolution (Endsley et al. 2017). In BRICKxAR, the virtual model’s scale or size automatically matches the physical model due to correct model registration. The screen resolution used is high (Table 1). The completed bricks are the physical bricks with their natural colors, and the virtual brick can be rendered photo-realistically with little color mismatch.

  9.

    For instruction booklets, a guideline is to depict the assembly on a work surface or relative to the player’s position (Martin and Smith-Jackson 2008). Similarly, an AR design guideline is to adapt to user position and motion (Endsley et al. 2017). In BRICKxAR, this is achieved automatically: once the player positions the LEGO set on the desk surface, the virtual brick instructions appear relative to the player’s position with the correct spatial relationship.

  10.

    Minimal “look times” and “look duration” of gazing at the instructions in between gazing at the assembly are important measures of the success of instruction booklets (Martin 2007b). In BRICKxAR, the player always looks at the virtual and physical bricks at the same time; thus, minimal “look times” and “look duration” are achieved straightforwardly. In addition, all physical motion required in AR should be easy (Endsley et al. 2017), and BRICKxAR requires no physical motion different from conventional LEGO assembly.

  11.

    With instruction booklets, users must compare two consecutive Structural Diagrams to infer which parts are to be attached (Agrawala et al. 2003; Heiser et al. 2004); in BRICKxAR, all the diagrams are consolidated into a single physical LEGO model under construction plus the virtual brick to be attached.

  12.

    Line drawings of cutaway views or cross-sectional views can show normally hidden or hard-to-see parts (Smith et al. 2003). BRICKxAR can superimpose a virtual cutaway view on the physical model to help players recall the hidden structure of the completed part of the assembly (Fig. 9).

  13.

    Accessibility of offscreen objects is suggested as an AR design guideline (Endsley et al. 2017). For a specific step of construction using BRICKxAR, the guiding virtual brick can be seen within the screen and is sufficient for the task without recalling other virtual bricks. A corresponding physical brick will need to be found offscreen from the box of parts. The virtual brick on the assembly and its animated (rotating) copy in the sub-window (Fig. 10 middle–right) are intended to help find the physical brick offscreen. Virtual bricks that are completely occluded or offscreen (as in Fig. 9c and d) can be made visible by physically rotating the LEGO assembly or the AR device.

  14.

    AR experiences should be designed to accommodate the capabilities and limitations of the hardware platform (Endsley et al. 2017). The BRICKxAR prototype takes advantage of the AR-enabled iOS platform, with its powerful sensors for SLAM and high-resolution screen for display. The hardware limitation of being tablet-based, compared with advanced head-mounted display systems (Hoover et al. 2020), is addressed by the integration of improved features in BRICKxAR.
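The virtual cutaway view discussed in item 12 can be understood as a clipping test: virtual geometry on the near side of a cutting plane is culled so that the normally hidden structure behind it becomes visible. The paper does not give implementation details, so the following is a minimal, hypothetical Python sketch of the principle only; the brick names, coordinates, and plane parameters are illustrative, and a real renderer would clip per-fragment on the GPU rather than per-brick.

```python
from dataclasses import dataclass


@dataclass
class Brick:
    name: str
    position: tuple  # (x, y, z) brick centre in model coordinates, millimetres


def cutaway_visible(bricks, plane_normal, plane_offset):
    """Return the bricks kept by a cutaway view.

    A brick is culled when its centre lies on the positive side of the
    clipping plane (signed distance > 0), exposing the interior behind it.
    """
    def signed_distance(p):
        return sum(n * c for n, c in zip(plane_normal, p)) - plane_offset

    return [b for b in bricks if signed_distance(b.position) <= 0.0]


# Hypothetical two-brick example: clip everything with z > 20 mm,
# so the facade is culled and the interior structure is revealed.
bricks = [Brick("facade", (0, 0, 30)), Brick("interior", (0, 0, 10))]
visible = cutaway_visible(bricks, plane_normal=(0, 0, 1), plane_offset=20)
# visible contains only the "interior" brick
```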

The evaluation also found the following heuristic violations, i.e., unmet guidelines.

  1.

    Skipping repeated instructions is suggested for the repeated actions when building sub-models in current instruction booklets (Agrawala et al. 2003). This requires a “reference” to the repeated instructions for multiple sub-models. Currently, BRICKxAR provides the repeated instructions step by step; thus, the hierarchical structure of the model is “flattened.” If instruction in the structural hierarchy is important for learning the modeling method, additional visualization of sub-models needs to be added.

  2.

    While instructions should build on the players’ mental models to minimize extraneous cognitive load, they should also allow for some cognitive drag (Martin 2007b). Compared with current instruction booklets, where 2D isometric CAD drawings are often used to guide assembly, BRICKxAR eliminates the training opportunity for players to learn and understand 2D isometric drawings. However, serious games with different levels of challenge may potentially be built into BRICKxAR, and engaging spatial and STEM training may be possible in the future.
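The “flattening” noted in violation 1 can be made concrete: a hierarchical build plan containing repeated sub-models expands into one long step-by-step sequence, losing the explicit sub-model structure. The sketch below is hypothetical Python; the plan encoding and brick names are illustrative and are not BRICKxAR’s internal data structure.

```python
def flatten_steps(plan):
    """Expand a hierarchical build plan into a flat step sequence.

    Each entry of `plan` is either a brick id (str) or a
    ("repeat", count, sub_steps) tuple denoting a repeated sub-model.
    Flattening repeats the sub-model inline; a booklet-style "reference"
    would instead show the sub-model once and point back to it.
    """
    flat = []
    for entry in plan:
        if isinstance(entry, tuple) and entry[0] == "repeat":
            _, count, sub = entry
            for _ in range(count):
                flat.extend(flatten_steps(sub))
        else:
            flat.append(entry)
    return flat


# Hypothetical plan: a base, four identical column sub-models, then an arch.
plan = ["base", ("repeat", 4, ["column", "capital"]), "arch"]
steps = flatten_steps(plan)  # the column sub-model is expanded four times
```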

6 Discussion

Based on the above evaluation and validation, and compared with the existing AR applications and features reviewed in the section “Background and literature review,” BRICKxAR achieves significant improvements in AR instructions: (1) high-accuracy, high-frame-rate AR registration, with an average error of less than 1 mm throughout the model (Sect. 5.1); (2) realistic physical–virtual object occlusion (Sect. 5.2); and (3) real-time, accurate visualization of hand occlusion, i.e., real hands grasping virtual objects (Sect. 5.2). The integration of these improved features makes AR instructions possible for small parts assembly such as LEGO construction. Furthermore, compared with the major instruction design guidelines and AR design guidelines found in the literature, BRICKxAR’s features advance the AR instruction methodology with respect to most of the guidelines, while future work is needed to address the unmet guidelines pointed out in Sect. 5.3.
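The sub-millimetre figure above is an average registration error over sample points. As a hedged illustration of such a measure (the paper’s exact sampling protocol is described in Sect. 5.1; the coordinates below are hypothetical), it can be computed as the mean Euclidean distance between corresponding virtual and physical points:

```python
import math


def mean_registration_error(virtual_pts, physical_pts):
    """Mean Euclidean distance between corresponding virtual and physical
    sample points, in the same units as the inputs (e.g. millimetres)."""
    assert len(virtual_pts) == len(physical_pts) and virtual_pts
    return sum(math.dist(v, p)
               for v, p in zip(virtual_pts, physical_pts)) / len(virtual_pts)


# Hypothetical samples: each virtual point is offset 0.5 mm from its
# physical counterpart, giving a sub-millimetre mean error of 0.5 mm.
virtual = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
physical = [(0.5, 0.0, 0.0), (10.0, 0.5, 0.0)]
err = mean_registration_error(virtual, physical)  # 0.5
```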

7 Conclusions and future work

A novel augmented reality (AR) instruction method has been investigated, developed, and prototyped as an iOS app, BRICKxAR, for small parts assembly, using a construction toy (LEGO) as an example. With BRICKxAR, physical LEGO construction can be guided by 3D virtual bricks with accurate AR model registration. Algorithms for step-by-step physical–virtual object occlusion and real hand–virtual object occlusion have been developed. The major finding of the research is that the integration of accurate AR model registration and the occlusion algorithms makes AR instructions possible for small parts assembly, validated through the working AR prototype for constructing the LEGO Arc de Triomphe and quantitative measures of the accuracies of registration and occlusions. In addition, a heuristic evaluation of BRICKxAR’s features has shown that the present method could advance AR instructions in terms of enhanced part visibility, the match between mental models and visualization, the alignment of physical and virtual parts in perspective views and spatial transformations, a tangible user interface, consolidated structural diagrams, and virtual cutaway views, among other benefits for guiding construction. The major contributions of the BRICKxAR project include the following:

  1.

    Compared with the state of the art, the accuracy of the virtual–physical model alignment is significantly improved through a unique design of marker-based registration, which can achieve an average error of less than 1 mm throughout the model.

  2.

    Realistic object occlusion is accomplished to reveal the true spatial relation between physical and virtual bricks.

  3.

    Hand detection and occlusion are realized to visualize the correct spatial relation between real hands and virtual bricks, and to allow virtual bricks to be “grasped” by real hands in AR.

  4.

    The integration of the above features makes AR instructions possible for small parts assembly tasks, validated through the working prototype for constructing LEGO Arc de Triomphe.
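Contributions 2 and 3 both rest on the same principle: a per-pixel depth comparison, in which a virtual brick is drawn only where it is nearer to the camera than the sensed physical surface or hand. The following Python sketch illustrates the principle only; real systems perform this on the GPU with depth buffers, and the tiny “images,” pixel labels, and depth values below are hypothetical.

```python
def composite_with_occlusion(camera_rgb, physical_depth, virtual_rgb, virtual_depth):
    """Per-pixel composite of a camera frame with rendered virtual bricks.

    A virtual pixel is drawn only when it is nearer to the camera than the
    sensed physical depth at that pixel, so real bricks and hands correctly
    occlude virtual bricks. Images are nested lists (rows of pixels);
    depths are in metres; None in virtual_depth means no virtual geometry.
    """
    out = []
    for y in range(len(camera_rgb)):
        row = []
        for x in range(len(camera_rgb[y])):
            vd = virtual_depth[y][x]
            if vd is not None and vd < physical_depth[y][x]:
                row.append(virtual_rgb[y][x])   # virtual brick in front
            else:
                row.append(camera_rgb[y][x])    # physical surface/hand occludes
        out.append(row)
    return out


# Hypothetical 1x2 frame: the left virtual pixel is in front of the wall,
# the right one is behind it and therefore occluded.
cam = [["wall", "wall"]]
pdepth = [[0.40, 0.40]]
vrgb = [["brick", "brick"]]
vdepth = [[0.35, 0.45]]
frame = composite_with_occlusion(cam, pdepth, vrgb, vdepth)
# frame == [["brick", "wall"]]
```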

Future work will build on the following ideas. The marker-based AR in this project can be applied to the construction of many other LEGO sets and to other small parts assembly, but may not be applicable to all block toys or all types of assembly and construction. For example, with LEGO Technic, the player needs to grasp parts and translate/rotate them in 6 DoF to connect the parts in hand. Investigating more comprehensive SLAM methods for more flexible yet accurate AR registration will continue to be necessary. Integrating BRICKxAR’s features with 3D model- and machine learning-based AR registration may be investigated for AR instructions in various other real-world applications beyond the tabletop scenario, including furniture assembly and building construction. BRICKxAR has not been tested with AR glasses; however, the design and techniques it demonstrates can be applied to other AR devices through future development. AR has great potential for enhancing spatial and STEM learning (Ibáñez and Delgado-Kloos 2018), beyond assembly performance. Future development of BRICKxAR and user studies may further reveal its potential applications in education; these will be conducted after the COVID-19 pandemic.