Autonomous Robots, Volume 34, Issue 3, pp 217–232

The ManyEars open framework

Microphone array open software and open hardware system for robotic applications
  • François Grondin
  • Dominic Létourneau
  • François Ferland
  • Vincent Rousseau
  • François Michaud

Abstract

ManyEars is an open framework for microphone array-based audio processing. It consists of a sound source localization, tracking, and separation system that can provide an enhanced speaker signal for improved speech and sound recognition in real-world settings. The ManyEars software framework is composed of a portable and modular C library, along with a graphical user interface for tuning parameters and for real-time monitoring. This paper presents the integration of the ManyEars library with Willow Garage’s Robot Operating System. To facilitate the use of ManyEars on various robotic platforms, the paper also introduces a customized microphone board and sound card, distributed as an open hardware solution for implementing robotic audition systems.

Keywords

Open source, Sound source localization, Sound source separation, Mobile robotics, USB sound card, Open hardware, Microphone array

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • François Grondin (1)
  • Dominic Létourneau (1)
  • François Ferland (1)
  • Vincent Rousseau (1)
  • François Michaud (1)

  1. IntRoLab – Intelligent, Interactive, Integrated Robotics Lab, Interdisciplinary Institute for Technological Innovation, Université de Sherbrooke, Sherbrooke, Canada
