Abstract
Cognitive robots are robots equipped with artificial intelligence capabilities that allow them to interact properly with people and with objects in an a priori unknown environment. For instance, a humanoid robot can be perceived as a plausible tourist guide in a museum. In this context, we show how recent findings in machine learning and pattern recognition can be applied to equip a robot with perception capabilities advanced enough to guide visitors through the halls and attractions of a museum.
The challenge of running all these algorithms in real time on a mobile, embedded platform is tackled at the architectural level: all the artificial intelligence features are tuned to run with a low computational burden, and a neural network accelerator is included in the hardware setup. Avoiding cloud services in the system improves robustness and makes latency predictable.
Our robot, which we call MIVIABot, can decode and understand speech and extract soft biometrics from its interlocutor, such as age, gender and emotional state. The robot can integrate all these elements into a dialog using basic Natural Language Processing capabilities.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Saggese, A., Vento, M., Vigilante, V. (2019). MIVIABot: A Cognitive Robot for Smart Museum. In: Vento, M., Percannella, G. (eds) Computer Analysis of Images and Patterns. CAIP 2019. Lecture Notes in Computer Science, vol 11678. Springer, Cham. https://doi.org/10.1007/978-3-030-29888-3_2
DOI: https://doi.org/10.1007/978-3-030-29888-3_2
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29887-6
Online ISBN: 978-3-030-29888-3
eBook Packages: Computer Science, Computer Science (R0)