1 Introduction

Fiducial markers are often claimed to be useful for 3D reconstruction [1,2,3,4,5,6,7]. Markers provide highly detectable and identifiable features that 3D reconstruction can use to overcome challenging scene characteristics such as low-texture surfaces (e.g. blank walls), reflective surfaces (e.g. windows), and repetitive patterns (e.g. columns and door frames). Figure 1 shows an example of a dataset with exactly these challenging characteristics. Figure 1 also shows that approaches that treat markers as texture, use them only as additional tracks, or rely on them exclusively perform no better, and often worse, than ignoring the markers entirely.

In this paper, we present an incremental structure from motion (SfM) algorithm that significantly outperforms these other approaches when markers are present in the scene. We exploit the fact that markers can be identified with very low false positive rates (e.g. AprilTag2 with 36h11 markers has a false positive rate of 0.000044% [2]) to create a reliable marker match graph that guides image matching and resectioning. We encode constraints on marker size, shape, and planarity in bundle adjustment to further improve results. Importantly, our approach benefits from any detected markers without sacrificing performance when markers are not detected, and benefits from even a small number of markers.

Fig. 1.

We introduce a new dataset of unordered image collections of challenging indoor scenes with markers placed throughout (example images along top row). We process the data using OpenSfM [8] with (a) markers ignored, (b) markers used as texture, and (c) markers used as additional tracks; with (d) MarkerMapper [9], which uses markers exclusively; and with (e) our approach, which uses markers to limit image matches, dictate resectioning order, and constrain bundle adjustment. Clearly, our method (e) outperforms the others. Moreover, the other approaches often perform worse than ignoring the markers, highlighting the importance of our method.

To evaluate our method, we introduce a new dataset with 16 image collections of indoor scenes. The scenes present challenging circumstances for SfM (e.g. blank hallways, reflective glass facades, and repetitive brick walls). Each indoor scene has tens to hundreds of markers (depending on scene size) placed approximately uniformly throughout. We evaluate our system against several state-of-the-art baselines on this data and show that our system performs favorably. We also selectively mask markers and show that performance degrades gracefully towards markerless SfM as the number of markers in the scene decreases.

In summary, the contributions of this paper are: (1) an SfM algorithm that uses both fiducial markers (when available) and interest point features for improved results; (2) a large, challenging dataset of indoor scenes with markers placed throughout; and (3) experiments showing the effectiveness of our approach, even when only a small number of markers are visible.

2 Related Work

Incremental SfM: Early works by Schaffalitzky and Zisserman [10] and Snavely et al. [11] establish the pipeline for feature extraction, matching, and incremental SfM for unordered image collections. Focus then turns to large image collections with work by Agarwal et al. [12] and Frahm et al. [13], who use appearance-based clustering to limit potential image matches, enabling reconstructions of Rome from thousands of internet photos. Work by Wu [14] shows that preemptive feature matching and well-timed global bundle adjustments can maintain high accuracy while reducing the runtime of SfM to roughly \(\mathcal {O}(n)\). More recently, several new SfM systems have become available, including COLMAP [15] by Schönberger and Frahm and OpenSfM [8] by Mapillary. These impressive works provide the baseline for the work in this paper.

Fig. 2.

Example images from the Neunert et al. [16] dataset: desk (top left), dataset1 (top right), cube (bottom left), and pavilion (bottom right). Experiments in Sect. 5 show that our method and current SfM methods perform well on this data, motivating our new dataset that offers new challenges and better distinguishes between approaches.

3D Reconstruction Using Fiducial Markers: Early works using markers for 3D reconstruction focus on tracking the markers in simultaneous localization and mapping (SLAM) systems. Work by Klopschitz and Schmalstieg [17] tracks both feature points and marker matches in video frames to estimate the camera pose and triangulate the marker positions in 3D. Lim and Lee [18] and Yamada et al. [19] add an extended Kalman filter (EKF) for estimating robot camera pose and marker positions in 3D. Neunert et al. [16] integrate IMU measurements into the EKF-SLAM system to improve pose estimates during marker tracking. Feng et al. [20] propose an incremental SfM approach to marker-based 3D reconstruction. They use markers to create an initial reconstruction, add new images using marker matches, and add constraints to bundle adjustment to enforce the square shape and planarity of markers. The work of Muñoz-Salinas et al. [9] introduces MarkerMapper, which overcomes the pose ambiguity problem [21] in planar marker pose estimation to create an initial proposal of 3D camera and marker locations and refines the proposal using global bundle adjustment. Only MarkerMapper [9] and Feng et al. [20] pursue 3D reconstruction from unordered image collections. However, neither method uses both image features and marker detections for 3D reconstruction. Experiments in Sect. 5 show that image features and marker detections can be used together to achieve the best results, and, when few or no markers are available, our system performs no worse than non-marker-based SfM.

Fig. 3.

The top diagrams are floor plans of ECE. The paths for image collection are superimposed in red, green, and magenta. The colors correspond to the image set names and example images. For example, ECE Floor5 Stairs appears in the ECE Floor4 and 5 floor plan as a magenta line, and its name and example images are also magenta. (Color figure online)

Datasets: Datasets for testing marker-based 3D reconstruction are limited. Only the dataset of Neunert et al. [16] is publicly available. Figure 2 provides snapshots from the four video sequences of this dataset. With only four sequences (two of which are of very small environments with only 1–3 markers), this dataset is no longer challenging for the current state of the art (e.g. in Sect. 5, we process this data with our method and other current SfM approaches, and all perform well). Our new dataset (Sect. 3) consists of 16 new image collections in environments with challenging characteristics for SfM (e.g. many low-texture walls and reflective glass). We hope our dataset will offer new challenges for future work on SfM both with and without marker assistance.

3 Indoor Image Collections with Fiducial Markers

We introduce 16 new unordered image sets for evaluating structure from motion on scenes containing fiducial markers. Each set is from one of three buildings: ECE, CEE, or MUF. Figures 3 and 4 provide floor plans for the sections of these buildings used to collect this data. Paths are drawn on each floor plan, and the colors of the paths match the respective image sets in the figures (e.g. the green path on Floors 4 and 5 of ECE matches the ECE Floor5 Hall image set). For each set, fiducial markers are placed around the scene densely enough that at least one is visible in every image (and images are captured so that this holds). All images are captured with an iPhone 7 camera and have a resolution of \(4032 \times 3024\) pixels.

Seven image sets are not shown in the figures because they are combinations or subsets of the shown sets. Specifically, ECE Floor5 includes all the images of ECE Floor5 Hall and ECE Floor5 Stairs. ECE Floor3 Loop includes all the images of ECE Floor3 Loop CW and ECE Floor3 Loop CCW. CEE Day includes all the images of CEE Day CW and CEE Day CCW (plus some extra images). Collecting data in this way lets us test progressively larger image sets that present different circumstances, which can make a set easier or more difficult. For example, the results in Sect. 5 show that ECE Floor3 Loop CW and ECE Floor3 Loop CCW are typically more difficult than their combination, ECE Floor3 Loop. This is most likely because of the additional overlap between images: all locations are seen more often and from more viewing directions.

We use ECE, CEE, and MUF because they are large indoor scenes with characteristics that are challenging for SfM (as shown in Sect. 5). Specifically, ECE has long plain hallways, large glass walls separating conference rooms, large exterior windows, and the hallways form a loop. CEE has a two-floor glass facade and repetitive brick walls. MUF is currently under construction and has large open spaces and limited texture. See supplementary material for more examples.

Fig. 4.

The top diagrams are floor plans for CEE and MUF. The paths for image collection are superimposed in red. Image set names and example images are shown. (Color figure online)

4 Improving SfM with Markers

Figure 5 diagrams our marker-assisted incremental SfM algorithm. The blue boxes represent the components of our algorithm that differ from typical state-of-the-art incremental SfM approaches: detecting markers, filtering image pairs, resectioning images, and marker constraints for bundle adjustment.

4.1 Incremental SfM Overview

Incremental SfM takes a collection of images as input. For each image, the focal length (and other priors) is estimated from metadata (or using heuristics when metadata is unavailable). Next, image features (e.g. SIFT features [22]) are extracted from each image. These features are matched across image pairs. Matching is attempted between all image pairs or a subset of pairs selected based on filtering criteria (e.g. GPS locations [13], Vocab Tree [12]). A fundamental matrix is estimated from the feature matches to filter bad matches and verify that each image pair is a good match.

After matching, reconstruction begins. Feature matches in two images are used to create an initial 3D reconstruction (pose of the two images with triangulated 3D points). Then, one at a time, a new image is added to the reconstruction (resectioning). This image is typically chosen based on the number of feature matches this image shares with the already reconstructed images. These shared feature matches are used to estimate the pose of this new camera and triangulate new 3D points. Bundle adjustment then optimizes all camera poses and 3D point positions to minimize reprojection error. Lastly, outlier points are removed. Resectioning is repeated to add all images to the reconstruction. The final output is a point cloud and set of camera poses, one for each image that is successfully resectioned.

Fig. 5.

This diagram depicts the typical incremental SfM approach: extracting priors from metadata (e.g. focal length), detecting features, matching features, and reconstruction. The blue boxes are the areas we added or changed in our method. (Color figure online)

4.2 Detect Markers

We run a square marker detection algorithm on each input image, processing the images in parallel. For each detection, we save the image name, marker ID, corner locations, and corner pixel colors.
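Concretely, each detection can be stored as a small record. A minimal sketch (the class and field names are illustrative, not the format used by any particular detector):

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class MarkerDetection:
    """One square-marker detection in one image (illustrative record)."""
    image_name: str
    marker_id: int
    corners: List[Tuple[float, float]]         # four corner pixel locations, in order
    corner_colors: List[Tuple[int, int, int]]  # pixel color sampled at each corner
```

One such record per detection is enough to build the marker match graph used in the following sections.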

4.3 Marker Informed Image Pairs

Prior to matching and verification, we create a set of image pairs that potentially match; matching is attempted only on pairs in this set. One approach is to add all possible image pairs; however, this greatly increases matching time and can lead to bad image matches that cause errors in the reconstruction. Instead, we apply three rules that use marker detections to dictate which pairs are added. Rule 1: we add an image pair if the same marker (at least one) is detected in both images. Rule 2: if an image does not share a detected marker with any other image, we add all possible pairs that contain that image. Rule 3: if the set of all added pairs does not form one connected component, we connect separate components by adding pairs from each image in a separate component to each image not in that component.

As an example, consider the top left diagram in Fig. 6. Each lettered box represents an image, and each numbered edge represents the number of marker matches those images share. Applying rule 1, we add the following possible image pairs (A, B), (A, C), (B, C), (B, D), (C, E), and (F, H). No pair is added that includes G, so based on rule 2, we add (G, A), (G, B), ..., (G, H). Lastly, since (F, H) is a separate component (rule 3), we add (F, A), (F, B), ..., (F, E) and (H, A), (H, B), ..., (H, E). We show in the results that this strategy can greatly speed up processing and eliminate many bad image matches. Note that other filtering approaches (e.g. Vocab Tree [12]) can be used in conjunction with our approach to add or filter image pairs.
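The three pair-selection rules above can be sketched in Python as follows. This is a minimal illustration under our own naming (`marker_informed_pairs` and its arguments are hypothetical); the real system operates on marker detections rather than a precomputed edge list:

```python
def marker_informed_pairs(images, marker_edges):
    """images: list of image ids; marker_edges: (i, j) pairs sharing >= 1 detected marker."""
    # Rule 1: pair images that share at least one detected marker.
    pairs = {frozenset(e) for e in marker_edges}
    with_markers = {i for e in marker_edges for i in e}

    # Rule 2: an image sharing no marker with any other image is paired with all images.
    for img in images:
        if img not in with_markers:
            pairs |= {frozenset((img, other)) for other in images if other != img}

    # Rule 3: if the rule-1 match graph has several connected components,
    # connect each smaller component to every image outside it (union-find).
    parent = {i: i for i in with_markers}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for a, b in marker_edges:
        parent[find(a)] = find(b)
    components = {}
    for i in with_markers:
        components.setdefault(find(i), set()).add(i)
    for comp in sorted(components.values(), key=len)[:-1]:  # keep the largest as "main"
        pairs |= {frozenset((a, b)) for a in comp for b in with_markers - comp}
    return pairs
```

On the Fig. 6 example this yields the six rule-1 pairs, seven pairs for G (rule 2), and ten pairs connecting {F, H} to {A, ..., E} (rule 3), i.e. 23 pairs instead of the 28 possible.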

Fig. 6.

The top left diagram depicts images as lettered boxes with edges representing the number of matched markers between image pairs. The top middle and top right diagrams depict the number of common feature matches between images. The bottom diagram depicts the resectioning order of images A to G based on two rules: (1) add the image that shares the most marker matches with the reconstruction; (2) break ties using most shared feature matches.

4.4 Marker Informed Resectioning

Resectioning is the process of adding a new image to the existing reconstruction. The order in which images are added is important because poorly registered images can propagate errors that result in failure. One approach is to choose the image to resection that shares the most feature matches with the images in the reconstruction. This approach works well when image features are distinct and plentiful; however, for the challenging scenes we are targeting, failure can occur. Instead, we apply two rules to use marker detections to dictate resectioning order. Rule 1: the next image to resection shares the most marker matches with the current reconstruction. Rule 2: if multiple images share the same number of marker matches with the current reconstruction, choose the image that shares the most feature matches.

For example, consider the diagrams in Fig. 6. In the top left diagram, each edge represents the number of marker matches those images share. In the top middle and top right diagram, each numbered edge represents the number of image feature matches those images share. The bottom diagram depicts the resectioning procedure. First, images A and B are used for the initial reconstruction (step 1). The next image that is resectioned is C because it shares 4 (3 with A and 1 with B) marker matches with the current reconstruction (step 2). After that, image E is added because E and D both share 3 marker matches with the reconstruction, but E shares 100 feature matches and D only shares 60 (step 3). Image D is then added (step 4). No remaining images share marker matches with the current reconstruction, so image H is added based on shared image feature matches (step 5). F is added next (step 6) because it now shares marker matches with the reconstruction (because H was added). Lastly, G is added (step 7).
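The two resectioning rules amount to a greedy selection with a lexicographic key: marker matches first, feature matches as the tie-breaker. A minimal sketch, with pairwise match counts chosen to reproduce the Fig. 6 walkthrough (the helper name and any feature-match numbers beyond those stated in the text are assumptions):

```python
def resection_order(images, init_pair, marker_matches, feature_matches):
    """Greedy resectioning order: most shared marker matches with the current
    reconstruction first (rule 1), ties broken by most shared feature matches (rule 2)."""
    recon = list(init_pair)  # step 1: initial two-view reconstruction
    remaining = [im for im in images if im not in recon]

    def shared(im, counts):
        return sum(counts.get(frozenset((im, r)), 0) for r in recon)

    while remaining:
        best = max(remaining, key=lambda im: (shared(im, marker_matches),
                                              shared(im, feature_matches)))
        recon.append(best)
        remaining.remove(best)
    return recon

# Pairwise counts reproducing the Fig. 6 walkthrough (feature counts are illustrative).
mm = {frozenset(p): n for p, n in [(("A", "B"), 2), (("A", "C"), 3), (("B", "C"), 1),
                                   (("B", "D"), 3), (("C", "E"), 3), (("F", "H"), 2)]}
fm = {frozenset(p): n for p, n in [(("A", "E"), 40), (("C", "E"), 60), (("B", "D"), 60),
                                   (("E", "H"), 50), (("E", "G"), 20), (("E", "F"), 10)]}
order = resection_order(list("ABCDEFGH"), ("A", "B"), mm, fm)
# order == ['A', 'B', 'C', 'E', 'D', 'H', 'F', 'G']
```

Note how F overtakes G only after H joins the reconstruction, exactly as in the walkthrough: marker matches appear (and count) as soon as a matching image is resectioned.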

4.5 Marker Constraints for Bundle Adjustment

In bundle adjustment, we solve for camera poses \(\varvec{P}\) and 3D points \(\varvec{X}\) that optimize the following:

$$\begin{aligned} \min _{\varvec{P},\varvec{X}} \left[ w_R E_R \left( \varvec{P},\varvec{X} \right) + w_S E_S \left( \varvec{V} \right) + w_O E_O \left( \varvec{V} \right) \right] \,. \end{aligned}$$
(1)

\(\varvec{V}\) is the set of vectors formed between neighboring 3D corners on each marker (i.e. there are four vectors per marker). \(w_R\), \(w_S\), and \(w_O\) are weights. Reprojection error [23] is

$$\begin{aligned} E_R \left( \varvec{P},\varvec{X} \right) = \sum \limits _{i=1}^C \sum \limits _{j=1}^N L \left( x_{ij}, \varvec{P}_i\left( \varvec{X}^j\right) \right) \, \end{aligned}$$
(2)

where C is the number of cameras, N is the number of 3D points (both marker and feature points), L is a loss function, \(x_{ij}\) is the 2D location in image i of 3D point \(\varvec{X}^j\), and \(\varvec{P}_i\) is the projection function of camera i. Similar to [20], we also include error terms for marker scale (\(E_S\), Eq. 3) and marker orthogonality (\(E_O\), Eq. 4).
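For illustration, one summand of Eq. 2 for a pinhole camera can be computed as follows. This is a sketch under our own naming; the soft L1 form shown is the Ceres-style robust loss we use for L (Sect. 4.6):

```python
import numpy as np

def soft_l1(s):
    """Ceres-style soft L1 robust loss applied to a squared residual s."""
    return 2.0 * (np.sqrt(1.0 + s) - 1.0)

def project(K, R, t, X):
    """Pinhole projection P_i(X): world point X into pixel coordinates."""
    x = K @ (R @ X + t)
    return x[:2] / x[2]

def reprojection_term(x_obs, K, R, t, X):
    """One L(x_ij, P_i(X^j)) summand of Eq. 2 for observation x_obs in camera i."""
    r = x_obs - project(K, R, t, X)
    return soft_l1(r @ r)
```

A point projecting exactly onto its observation contributes zero; for large residuals the soft L1 loss grows roughly linearly rather than quadratically, limiting the influence of outliers.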

Marker Scale: The distance between marker corners in the reconstruction should match the known marker size. We define this error as \(E_S\left( \varvec{V} \right) =\)

$$\begin{aligned} \sum \limits _{i=1}^T \left( \left\| \varvec{V}^i_{12} \right\| _2-S \right) ^2 + \left( \left\| \varvec{V}^i_{23} \right\| _2-S \right) ^2 + \left( \left\| \varvec{V}^i_{34} \right\| _2-S \right) ^2 + \left( \left\| \varvec{V}^i_{41} \right\| _2-S \right) ^2 \end{aligned}$$
(3)

where \(\varvec{V}^i_{NM}\) is the 3D vector from the 3D point of corner N to the 3D point of corner M on marker i, T is the number of markers, and S is the marker size.

Marker Orthogonality: Adjacent sides of the marker should be perpendicular. We define this error as \(E_O\left( \varvec{V} \right) =\)

$$\begin{aligned} \sum \limits _{i=1}^T \left( \varvec{V}^i_{12} \cdot \varvec{V}^i_{23} \right) ^2 + \left( \varvec{V}^i_{23} \cdot \varvec{V}^i_{34} \right) ^2 + \left( \varvec{V}^i_{34} \cdot \varvec{V}^i_{41} \right) ^2 + \left( \varvec{V}^i_{41} \cdot \varvec{V}^i_{12} \right) ^2 \end{aligned}$$
(4)

where \(\varvec{V}^i_{NM}\) is the 3D vector from the 3D point of corner N to the 3D point of corner M on marker i, and T is the number of markers.
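Both marker error terms are straightforward to evaluate given one marker's four reconstructed corners. A sketch (the function name is ours; a perfect square of side S contributes zero to both terms):

```python
import numpy as np

def marker_constraint_errors(corners, S):
    """corners: (4, 3) array of one marker's reconstructed 3D corners, in order.
    S: known physical side length of the marker. Returns this marker's
    contribution to (E_S, E_O)."""
    corners = np.asarray(corners, dtype=float)
    # V[0..3] are the edge vectors V12, V23, V34, V41 between neighboring corners.
    V = [corners[(k + 1) % 4] - corners[k] for k in range(4)]
    e_scale = sum((np.linalg.norm(v) - S) ** 2 for v in V)              # Eq. 3
    e_ortho = sum(np.dot(V[k], V[(k + 1) % 4]) ** 2 for k in range(4))  # Eq. 4
    return e_scale, e_ortho
```

In bundle adjustment, these per-marker terms are summed over all T markers and weighted by \(w_S\) and \(w_O\) as in Eq. 1.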

4.6 Implementation Details

We implement our approach on top of OpenSfM v0.1.0 [8]. We use default parameters, which work well for unordered image collections, and AprilTag2 [2] to detect markers. For all experiments, we use a soft L1 loss for L; cost weights of \(w_R=62500\), \(w_S=100\), and \(w_O=100\); and marker size \(S=0.21\) m. In principle, our approach works with any square marker detector and can be integrated with any incremental or global [24, 25] SfM method (marker informed resectioning applies only to incremental methods).

5 Results

We process our new dataset using: (1) OpenSfM [8], an open-source state-of-the-art SfM algorithm that is actively used and maintained by Mapillary [26]; (2) OpenSfM with all feature points on markers masked; (3) MarkerMapper [9], a state-of-the-art algorithm for marker-based SfM; (4) OpenSfM with the four marker corners used as tracks in reconstruction; and (5) our method. Table 1 provides quantitative results on the number of images localized, number of points, and reprojection errors. Failed reconstructions are denoted by a “-”. Figures 7 and 8 provide qualitative results of the 3D reconstructions. The green pyramids are the camera locations. The floor plans in Figs. 3 and 4 indicate how each reconstruction should look (e.g. ECE Floor3 Loop should be a rectangle). Because of the challenging nature of these datasets, the algorithms often fail or make large, noticeable mistakes; we therefore focus on the qualitative results because they illustrate the differences clearly.

We also process the Neunert et al. [16] dataset. Since it is video data, we subsample the frames by a factor of 5 to simulate an unordered image collection. All OpenSfM variants and our method successfully reconstruct all image sets. MarkerMapper has trouble with this dataset because each image contains few markers (often only one). Reconstruction and timing results are shown in Tables 1 and 2, respectively. Qualitative results are in the supplementary material.

We perform an ablation study on marker informed matching (Sect. 4.3) and marker informed resectioning (Sect. 4.4). For each dataset and method, we calculate the percentage of images localized. The average percentages are 98% (our full method), 68% (no marker informed resectioning), 50% (no marker informed matching), and 42% (OpenSfM with markers masked, the next best method). These percentages show that marker informed matching and resectioning are each useful individually but most effective together. We also test our method without the marker scale (\(E_S\), Eq. 3) and orthogonality (\(E_O\), Eq. 4) constraints and find that these constraints provide little to no gain, sometimes making the results worse. See the supplementary material for more details about the ablation study.

Fig. 7.

Reconstructions for OpenSfM, OpenSfM with markers masked, MarkerMapper, OpenSfM with marker tracks, and our method on the ECE image collections. Using the markers as texture often produces worse results (e.g. ECE Floor2 Hall, ECE Floor3 Loop CW, ECE Floor3 Loop, and ECE Floor5 Stairs). Our method produces complete reconstructions that are as good or better than the other methods for all image collections. The best results are denoted by a green check mark. (Color figure online)

Fig. 8.

Reconstructions for OpenSfM, OpenSfM with markers masked, MarkerMapper, OpenSfM with marker tracks, and our method on the CEE and MUF image collections. Again, using the markers as texture often produces worse results (e.g. CEE Day CCW, CEE Day, and CEE Night). Our method produces complete reconstructions that are as good or better than the other methods for all image collections. The best results are denoted by a green check mark. (Color figure online)

Table 1. Reconstruction results for OpenSfM [8], OpenSfM with markers masked (denoted by [8]*), MarkerMapper [9], OpenSfM with marker tracks (denoted by MT), and our method. Failed reconstructions (see Figs. 7 and 8) are left blank because their numbers can be misleading (e.g. all cameras localized to one spot). Our method achieves similar or better results in the number of registered images and points for all reconstructions.

All experiments use an Intel Xeon E5-2620 V4 2.1 GHz 16 cores (32 virtual cores) processor with 128 GB of RAM. No graphics card is used.

Using Markers as Texture Often Makes Reconstructions Worse. Masking the markers shows how OpenSfM performs when the scenes have no markers. Comparing column 1 (OpenSfM) and column 2 (OpenSfM with masked markers) in Figs. 7 and 8 shows that masking the markers often produces better results. For example, ECE Floor2 Hall should have an “L” shape, which OpenSfM with masked markers achieves but OpenSfM does not. Other examples where masking markers is clearly better are ECE Floor3 Loop CW, ECE Floor5 Stairs, CEE Day CCW, CEE Day, and CEE Night.

Marker texture does not always produce bad results (e.g. MUF Floor3), but it can cause bad feature matches because the markers look similar to one another (i.e. black and white squares). This reinforces the need for our approach, which takes advantage of visible markers to improve results.

Table 2. Reconstruction timings for OpenSfM [8], OpenSfM with markers masked (denoted by [8]*), MarkerMapper [9], OpenSfM with marker tracks (denoted by MT), and our method. Using the markers to limit possible image pairs decreases the matching time significantly. Also, because more images are resectioned, the reconstruction time increases. Overall, our method produces better reconstructions in a shorter time.

Using Marker Detections as Tracks Has Little Effect. Comparing column 4 (OpenSfM with marker tracks) to columns 1 and 2 (OpenSfM and OpenSfM with markers masked) of Figs. 7 and 8 shows that the marker tracks rarely improve the reconstructions and sometimes make them worse (e.g. ECE Floor5 and CEE Night). We suspect this is because the marker corners can be localized less accurately (e.g. off by 3–5 pixels [3]) than image features.

Our Approach Succeeds Where Others Fail. From Figs. 7 and 8, we see that our method produces a successful reconstruction for every image set and better results than the other methods on the challenging sets. Most notable are ECE Floor3 Loop CW, ECE Floor3 Loop, and CEE Day CCW, where all other methods fail or make significant mistakes. For ECE Floor5 Stairs, ECE Floor5, and CEE Day CW, other methods produce reasonable results, but our approach is more complete.

Our Approach Succeeds Where Others Succeed. There are several image sets where all (or most) of the methods produce successful reconstructions (e.g. ECE Floor5 Hall and CEE Night CW). In these cases, our method also produces high-quality reconstructions. This is important because our algorithm improves on the challenging image sets without sacrificing accuracy on the easier ones.

Using Markers Improves Reconstruction Time. Table 2 provides the run times for marker detection, matching, and reconstruction for all image sets. Timings for other parts of SfM are not included since they do not change between methods. Only the total run time of MarkerMapper is reported because it does not follow the same pipeline as the others. Notably, using markers to limit pairs for image matching can decrease run times significantly (e.g. for MUF Floor2, our method took 5596 s while the other OpenSfM approaches took 5–6 times longer). Detecting markers adds time per image, but it is typically negligible compared to the time saved in matching. Reconstruction time, on the other hand, often increases because our method registers more images.

Few Marker Detections Still Improve Reconstructions. Figure 9 demonstrates how marker density affects the reconstructions. In particular, the left six images show how the reconstruction of ECE Floor3 Loop CCW improves as the marker density increases. Here, AMD stands for average marker detections per image (e.g. AMD = 0.0 means no markers are detected, and AMD = 6.0 means an average of 6 markers are detected per image).

The plot in Fig. 9 shows how the percentage of localized images increases with AMD for seven datasets. These datasets were chosen because our method achieves clear improvements over the other methods on them. The trend line is plotted in black. The plot shows that markers help even when AMD is less than 1 (sometimes 100% of the images are localized). As AMD increases, the percentage of localized images approaches 100%. Placing enough markers for an AMD of 6 will likely produce accurate, complete reconstructions with over 90% of images localized. However, markers are most useful in areas with challenging conditions for SfM, so placing more markers in these challenging areas and fewer (or none) in easier areas can help our method achieve accurate, complete reconstructions with drastically fewer total marker detections.

Fig. 9.

The left six images show how the reconstruction of ECE Floor3 Loop CCW improves as marker density increases. Here, AMD means average marker detections per image. The right plot shows the percentage of images localized as AMD increases. Each color represents a different dataset, and the trend line is shown in black. As AMD increases, the percentage of localized images approaches 100%. (Color figure online)

6 Conclusion

We present an incremental SfM method that significantly outperforms existing methods when fiducial markers are detected in the scene. We introduce a new dataset with 16 image collections of indoor scenes with square markers placed throughout. We use the unique marker IDs to guide image matching and resectioning order and show that these changes greatly improve reconstruction results compared to other methods. Lastly, we show that even a small number of visible markers often improves reconstruction results.