Skip to main content

Computer Vision for Mobile Augmented Reality

Abstract

Mobile augmented reality (AR) employs computer vision capabilities in order to properly integrate the real and the virtual, whether that integration involves the user’s location, object-based interaction, 2D or 3D annotations, or precise alignment of image overlays. Real-time vision technologies vital for the AR context include tracking, object and scene recognition, localization, and scene model construction. For mobile AR, which has limited computational resources compared with static computing environments, efficient processing is critical, as are consideration of power consumption (i.e., battery life), processing and memory limitations, lag, and the processing and display requirements of the foreground application. On the other hand, additional sensors (such as gyroscopes, accelerometers, and magnetometers) are typically available in the mobile context, and, unlike many traditional computer vision applications, user interaction is often available for user feedback and disambiguation. In this chapter, we discuss the use of computer vision for mobile augmented reality and present work on a vision-based AR application (mobile sign detection and translation), a vision-supplied AR resource (indoor localization and post estimation), and a low-level correspondence tracking and model estimation approach to increase accuracy and efficiency of computer vision methods in augmented reality.

Keywords

  • Mobile AR
  • Augmented Reality (AR)
  • Nonuniform Sampling Scheme
  • Cell Phone Images
  • Keypoint Correspondences

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

This is a preview of subscription content, access via your institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • DOI: 10.1007/978-3-319-24702-1_1
  • Chapter length: 40 pages
  • Instant PDF download
  • Readable on all devices
  • Own it forever
  • Exclusive offer for individuals only
  • Tax calculation will be finalised during checkout
eBook
USD   99.00
Price excludes VAT (USA)
  • ISBN: 978-3-319-24702-1
  • Instant PDF download
  • Readable on all devices
  • Own it forever
  • Exclusive offer for individuals only
  • Tax calculation will be finalised during checkout
Softcover Book
USD   129.00
Price excludes VAT (USA)
Hardcover Book
USD   119.99
Price excludes VAT (USA)
Fig. 1.1
Fig. 1.2
Fig. 1.3
Fig. 1.4
Fig. 1.5
Fig. 1.6
Fig. 1.7
Fig. 1.8
Fig. 1.9
Fig. 1.10
Fig. 1.11
Fig. 1.12
Fig. 1.13
Fig. 1.14
Fig. 1.15
Fig. 1.16
Fig. 1.17
Fig. 1.18
Fig. 1.19

Notes

  1. 1.

    https://developers.google.com/translate/.

  2. 2.

    Libcurl is available at http://curl.haxx.se/.

References

  1. Arthur, D., Vassilvitskii, S.: K-means++: the advantages of careful seeding. In: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA’07, pp. 1027–1035. Society for Industrial and Applied Mathematics, New Orleans, Louisiana (2007)

    Google Scholar 

  2. Bay, H., Ess, A., Tuytelaars, T., Van Gool, L.: Speeded-up robust features (SURF). Comput. Vis. Image Underst. 110(3), 346–359 (2008)

    CrossRef  Google Scholar 

  3. Beis, J.S., Lowe, D.G.: Shape indexing using approximate nearest-neighbour search in high-dimensional spaces. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (1997)

    Google Scholar 

  4. Benhimane, S., Malis, E.: Real-time image-based tracking of planes using efficient second-order minimization. Proc. IEEE Int. Conf. Intell. Robot. Syst. (IROS 2004) 1, 943–948 (2004)

    Google Scholar 

  5. Bissacco, A., Cummins, M., Netzer, Y., Neven, H.: Photoocr: reading text in uncontrolled conditions. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2013)

    Google Scholar 

  6. Brahmachari, A.S., Sarkar, S.: Blogs: balanced local and global search for non-degenerate two view epipolar geometry. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1685–1692 (2009)

    Google Scholar 

  7. Brown, M., Winder, S., Szeliski, R.: In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2005)

    Google Scholar 

  8. Canny, J.: A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 8(6), 679–698 (1986)

    CrossRef  Google Scholar 

  9. Castillo, E., Hadi, A.S., Balakrishnan, N., Sarabia, J.M.: Extreme Value and Related Models with Applications in Engineering and Science. Wiley, Hoboken (2005)

    MATH  Google Scholar 

  10. Cheng, C.-C., Peng, G.-J., Hwang, W.-L.: Subband weighting with pixel connectivity for 3-d wavelet coding. IEEE Trans. Image Process. 18(1), 52–62 (2009)

    MathSciNet  CrossRef  Google Scholar 

  11. Chum, O., Matas, J.: Matching with prosac—progressive sample consensus. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2005)

    Google Scholar 

  12. Coles, S.: An Introduction to Statistical Modeling of Extreme Values. Springer, Berlin (2001)

    CrossRef  MATH  Google Scholar 

  13. Crandall, D., Owens, A., Snavely, N., Huttenlocher, D.: SfM with MRFs: discrete-continuous optimization for large-scale reconstruction. IEEE Trans. Pattern Anal. Mach. Intell. 35(12), 12 (2013)

    CrossRef  Google Scholar 

  14. Epshtein, B., Ofek, E., Wexler, Y.: Detecting text in natural scenes with stroke width transform. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2010)

    Google Scholar 

  15. Fischler, M.A., Bolles, R.C.: Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24(6), 381–395 (1981)

    MathSciNet  CrossRef  Google Scholar 

  16. Fragoso, V., Turk, M.: SWIGS: a swift guided sampling method. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2013)

    Google Scholar 

  17. Fragoso, V., Gauglitz, S., Zamora, S., Kleban, J., Turk, M.: TranslatAR: a mobile augmented reality translator. In: Proceedings of the IEEE Workshop on Applications of Computer Vision (WACV’11) (2011)

    Google Scholar 

  18. Fragoso, V., Sen, P., Rodriguez, S., Turk, M.: EVSAC: accelerating hypotheses generation by modeling matching scores with extreme value theory. In: Proceedings of IEEE International Conference on Computer Vision (ICCV) (2013)

    Google Scholar 

  19. Gao, J., Yang, J.: An adaptive algorithm for text detection from natural scenes. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2001)

    Google Scholar 

  20. Gauglitz, S., Höllerer, T., Turk, M.: Evaluation of interest point detectors and feature descriptors for visual tracking. Int. J. Comput. Vis. 94(3), 335–360 (2011)

    CrossRef  MATH  Google Scholar 

  21. Gauglitz, S., Sweeney, C., Ventura, J., Turk, M., Höllerer, T.: Live tracking and mapping from both general and rotation-only camera motion. In: Proceedings of the 11th IEEE International Symposium on Mixed and Augmented Reality (ISMAR’12), pp. 13–22. Atlanta, Georgia (2012)

    Google Scholar 

  22. Goshen, L., Shimshoni, I.: Balanced exploration and exploitation model search for efficient epipolar geometry estimation. IEEE Trans. Pattern Anal. Mach. Intell. 30(7), 1230–1242 (2008)

    CrossRef  Google Scholar 

  23. Hartley, R.I., Zisserman, A.: Multiple View Geometry in Computer Vision. Cambridge University Press, Cambridge (2000). ISBN 0521623049

    MATH  Google Scholar 

  24. Jaderberg, M., Vedaldi, A., Zisserman, A.: Deep features for text spotting. In: Computer Vision ECCV 2014. Lecture Notes in Computer Science, vol. 8692, pp. 512–528. Springer International Publishing, Berlin (2014)

    Google Scholar 

  25. Kato, H., Billinghurst, M.: Marker tracking and hmd calibration for a video-based augmented reality conferencing system. In: Proceedings of the 2nd IEEE and ACM International Workshop on Augmented Reality, 1999 (IWAR’99), pp. 85–94 (1999)

    Google Scholar 

  26. Klein, G., Murray, D.: Parallel tracking and mapping for small AR workspaces. In: Proceedings of the Sixth IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR’07), Nara, Japan (2007)

    Google Scholar 

  27. Kneip, L., Li, H., Seo, Y.: UPnP: an optimal O(n) solution to the absolute pose problem with universal applicability. In: Computer Vision ECCV 2014. Lecture Notes in Computer Science, vol. 8689, pp. 127–142. Springer International Publishing, Berlin (2014)

    Google Scholar 

  28. Lee, C.W., Jung, K., Kim, H.J.: Automatic text detection and removal in video sequences. Pattern Recognit. Lett. 24(15), 2607–2623 (2003)

    CrossRef  Google Scholar 

  29. Lee, C.-Y., Bhardwaj, A., Di, W., Jagadeesh, V., Piramuthu, R.: Region-based discriminative feature pooling for scene text recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2014)

    Google Scholar 

  30. Lepetit, V., Moreno-Noguer, F., Fua, P.: EPnP: an accurate O(n) solution to the PnP problem. Int. J. Comput. Vis. 81(2), 155–166 (2009)

    CrossRef  Google Scholar 

  31. Levenberg, K.: A method for the solution of certain non-linear problems in least squares. Q. J. Appl. Math. II(2), 164–168 (1944)

    Google Scholar 

  32. Levenshtein, I.V.: Binary codes capable of correcting deletions, insertions, and reversals. Cybern. Control Theory 10(8), 707–710 (1966)

    MathSciNet  MATH  Google Scholar 

  33. Liu, Y., Goto, S., Ikenaga, T.: A contour-based robust algorithm for text detection in color images. IEICE—Trans. Inf. Syst. E89–D(3), 1221–1230 (2006)

    Google Scholar 

  34. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)

    CrossRef  Google Scholar 

  35. Lucas, S.M.: LCDAR 2005 text locating competition results. Proc. IEEE Conf. Doc. Anal. Recognit. 1, 80–84 (2005)

    CrossRef  Google Scholar 

  36. Mikolajczyk, K., Schmid, C.: A performance evaluation of local descriptors. IEEE Trans. Pattern Anal. Mach. Intell. 27(10), 1615–1630 (2005)

    CrossRef  Google Scholar 

  37. Neumann, L., Matas, J.: Real-time scene text localization and recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2012)

    Google Scholar 

  38. Neumann, L., Matas, J.: Scene text localization and recognition with oriented stroke detection. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2013)

    Google Scholar 

  39. Park, A., Jung, K.: Automatic word detection system for document image using mobile devices. In: Human-Computer Interaction. Interaction Platforms and Techniques. Lecture Notes in Computer Science, vol. 4551, pp. 438–444. Springer, Berlin (2007)

    Google Scholar 

  40. Paucher, P., Turk, M.: Location-based augmented reality on mobile phones. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPR Workshops) (2010)

    Google Scholar 

  41. Petter, M., Fragoso, V., Turk, M., Baur, C.: Automatic text detection for mobile augmented reality translation. In: Proceedings of IEEE International Conference on Computer Vision Workshops (ICCV Workshops) (2011)

    Google Scholar 

  42. Raguram, R., Frahm, J.-M., Pollefeys, M.: A comparative analysis of ransac techniques leading to adaptive real-time random sample consensus. In: Computer Vision ECCV 2008. Springer, Berlin (2008)

    Google Scholar 

  43. Rousseeuw, P.J.: Least median of squares regression. J. Am. Stat. Assoc. 79(388), 871–880 (1984)

    MathSciNet  CrossRef  MATH  Google Scholar 

  44. Scheirer, W.J., Rocha, A., Micheals, R.J., Boult, T.E.: Meta-eecognition: the theory and practice of recognition score analysis. IEEE Trans. Pattern Anal. Mach. Intell. 33(8), 1689–1695 (2011)

    CrossRef  Google Scholar 

  45. Smith, R.: An overview of the tesseract ocr engine. In: Proceedings of the Ninth International Conference on Document Analysis and Recognition, ICDAR’07, vol. 02, pp. 629–633. IEEE Computer Society (2007)

    Google Scholar 

  46. Sweeney, C., Fragoso, V., Hllerer, T., Turk, M.: gDLS: a scalable solution to the generalized pose and scale problem. In: Computer Vision ECCV 2014. Lecture Notes in Computer Science, vol. 8692, pp. 16–31. Springer International Publishing, Berlin (2014)

    Google Scholar 

  47. Tordoff, B.J., Murray, D.W.: Guided-MLESAC: faster image transform estimation by using matching priors. IEEE Trans. Pattern Anal. Mach. Intell. 27(10), 1523–1535 (2005)

    CrossRef  Google Scholar 

  48. Torr, P.H.S., Zisserman, A.: MLESAC: a new robust estimator with application to estimating image geometry. Comput. Vis. Image Underst. 78(1), 138–156 (2000)

    CrossRef  Google Scholar 

  49. Wagner, D., Schmalstieg, D.: Artoolkitplus for pose tracking on mobile devices. In: Proceedings of the 12th Computer Vision Winter Workshop (CVWW’07), pp. 139–146 (2007)

    Google Scholar 

  50. Wagner, D., Mulloni, A., Langlotz, T., Schmalstieg, D.: Real-time panoramic mapping and tracking on mobile phones. In: IEEE Virtual Reality Conference (VR). IEEE, pp. 211–218 (2010)

    Google Scholar 

  51. Ye, Q., Gao, W., Wang, W., Zeng, W.: A robust text detection algorithm in images and video frames. Proc. IEEE Int. Conf. Inf. Commun. Signal Process. 2, 802–806 (2003)

    Google Scholar 

  52. Ye, Q., Huang, Q., Gao, W., Zhao, D.: Fast and robust text detection in images and video frames. Image Vis. Comput. 23(6), 565–576 (2005)

    CrossRef  Google Scholar 

Download references

Acknowledgments

We wish to acknowledge our colleagues who were involved in various aspects of the research reported on in this chapter: Steffen Gauglitz, Shane Zamora, Jim Kleban, Marc Petter, Charles Baur, Pradeep Sen, Sergio Rodriguez. This work was partially supported by UC MEXUS-CONACYT (Fellowship 212913) and NSF award 1219261. Parts of this chapter present research originally published in references [1618, 40, 41].

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Matthew Turk .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and Permissions

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Turk, M., Fragoso, V. (2015). Computer Vision for Mobile Augmented Reality. In: Hua, G., Hua, XS. (eds) Mobile Cloud Visual Media Computing. Springer, Cham. https://doi.org/10.1007/978-3-319-24702-1_1

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-24702-1_1

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-24700-7

  • Online ISBN: 978-3-319-24702-1

  • eBook Packages: Computer ScienceComputer Science (R0)