
Mobile Augmented Reality Framework - MIRAR

  • João M. F. Rodrigues
  • Ricardo J. M. Veiga
  • Roman Bajireanu
  • Roberto Lam
  • João A. R. Pereira
  • João D. P. Sardo
  • Pedro J. S. Cardoso
  • Paulo Bica
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10908)

Abstract

The increasing immersion of technology in our daily lives demands additional investment in many areas, including, as in the present case, the enhancement of museum experiences. Augmented Reality is one of the technologies that improves our relationship with everything that surrounds us. This paper presents the architecture of MIRAR, a Mobile Image Recognition based Augmented Reality framework. MIRAR supports the development of systems in which mobile devices interact with the museum environment by: (a) recognizing and tracking museum objects on-the-fly on the client side (mobile device), (b) detecting and recognizing walls and their respective boundaries, and (c) performing person detection and segmentation. The object, wall, and person segmentations allow the projection of different contents (text, images, videos, clothes, etc.). Promising results are presented for these topics, although some of them are still at a development stage.
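To make point (a) concrete, below is a minimal sketch of client-side object recognition using ORB binary keypoints matched with a brute-force Hamming matcher and verified by a RANSAC homography, in OpenCV. This is an illustrative pipeline under stated assumptions, not the authors' exact method; the reference image path and the inlier threshold are hypothetical.

```python
# Minimal sketch: recognize a known museum object in a camera frame
# via ORB feature matching (OpenCV). Illustrative only; the file name
# "exhibit_01.jpg" and min_inliers threshold are assumptions.
import cv2
import numpy as np

orb = cv2.ORB_create(nfeatures=1000)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

# Reference image of one museum object (hypothetical path).
ref = cv2.imread("exhibit_01.jpg", cv2.IMREAD_GRAYSCALE)
ref_kp, ref_des = orb.detectAndCompute(ref, None)

def recognize(frame_gray, min_inliers=15):
    """Return True if the reference object appears in the frame."""
    kp, des = orb.detectAndCompute(frame_gray, None)
    if des is None or ref_des is None:
        return False
    matches = matcher.match(ref_des, des)
    if len(matches) < min_inliers:
        return False
    src = np.float32([ref_kp[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    # RANSAC homography rejects clutter matches; the surviving
    # inliers indicate a geometrically consistent object view.
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H is not None and int(mask.sum()) >= min_inliers
```

Because this recognition runs on the mobile device itself, a binary descriptor such as ORB keeps matching cheap compared with floating-point descriptors like SIFT, which matters for real-time, on-device tracking.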

Keywords

Augmented reality · Object recognition · Wall detection · Human detection · HCI


Acknowledgements

This work was supported by the Portuguese Foundation for Science and Technology (FCT), project LARSyS (UID/EEA/50009/2013), CIAC, and project M5SAR I&DT nr. 3322, financed by CRESC ALGARVE2020, PORTUGAL2020, and FEDER. We also thank the Faro Municipal Museum and the M5SAR project leader, SPIC - Creative Solutions [www.spic.pt].


Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. LARSyS (ISR-Lisbon) and ISE, University of the Algarve, Faro, Portugal
  2. SPIC - Creative Solutions, Loulé, Portugal
