The ManyEars open framework

Microphone array open software and open hardware system for robotic applications

Abstract

ManyEars is an open framework for microphone array-based audio processing. It consists of a sound source localization, tracking and separation system that can provide an enhanced speaker signal for improved speech and sound recognition in real-world settings. The ManyEars software framework is composed of a portable and modular C library, along with a graphical user interface for tuning the parameters and for real-time monitoring. This paper presents the integration of the ManyEars library with Willow Garage’s Robot Operating System (ROS). To facilitate the use of ManyEars on various robotic platforms, the paper also introduces the customized microphone board and sound card, distributed as an open hardware solution for implementing robotic audition systems.

Notes

  1. Set by parameter GLOBAL_MICSNUMBER in the file parameters.h

Acknowledgments

This work was supported in part by the Natural Sciences and Engineering Research Council of Canada, the Canadian Foundation for Innovation and the Canada Research Chair program.

Author information

Corresponding author

Correspondence to François Michaud.

About this article

Cite this article

Grondin, F., Létourneau, D., Ferland, F. et al. The ManyEars open framework. Auton Robot 34, 217–232 (2013). https://doi.org/10.1007/s10514-012-9316-x

Keywords

  • Open source
  • Sound source localization
  • Sound source separation
  • Mobile robotics
  • USB sound card
  • Open hardware
  • Microphone array