Abstract
In static scenarios, binaural sound localization is fundamentally limited by front-back ambiguity and distance non-observability. Over the past few years, “active” schemes have been shown to overcome these shortcomings by combining spatial binaural cues with the motor commands of the sensor. In this context, given a Gaussian prior on the relative position to a source, this paper determines an admissible motion of a binaural head which leads, on average, to the most informative one-step-ahead audio-motor localization. To this end, a constrained optimization problem is set up, which consists in maximizing the entropy of the next predicted measurement probability density function over a cylindrical admissible set. The method is appraised through geometrical arguments and validated in simulation and in real-life robotic experiments.
Notes
Consider again the dynamic equation (2) with no dynamic noise, and assume that the posterior covariance \(\overline{P}_{k|k}\) of the full state \(X_k\) (defined in \(\mathbb {R}^3\)) is \(\overline{P}_{k|k} = {{\mathrm{diag}}}(0,P_{k|k})\). As the vector \(R^T(\phi ){T}\) is constant, the next “full” predicted covariance reads \(\overline{P}_{k+1|k} = R^T(\phi )\overline{P}_{k|k}R(\phi )\), with \(R(\phi ) = {{\mathrm{diag}}}(1,r(\phi ))\) and \(|R(\phi )| = |r(\phi )| = 1\). Consequently, \(\overline{P}_{k+1|k} = {{\mathrm{diag}}}(0,P_{k+1|k})\) with \(|P_{k+1|k}| = |r^T(\phi )P_{k|k}r(\phi )| = |P_{k|k}|\).
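The determinant argument of this note can be checked numerically. The sketch below assumes the usual counterclockwise convention for the planar rotation \(r(\phi )\) and an illustrative covariance; all helper names are hypothetical and not taken from the paper.

```python
import math

def rot2(phi):
    """Planar rotation r(phi) as nested lists (assumed counterclockwise convention)."""
    c, s = math.cos(phi), math.sin(phi)
    return [[c, -s], [s, c]]

def matmul2(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose2(a):
    """Transpose of a 2x2 matrix."""
    return [[a[0][0], a[1][0]], [a[0][1], a[1][1]]]

def det2(a):
    """Determinant of a 2x2 matrix."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def predict_cov(P, phi):
    """Noise-free prediction P_{k+1|k} = r(phi)^T P_{k|k} r(phi)."""
    r = rot2(phi)
    return matmul2(matmul2(transpose2(r), P), r)

P = [[0.5, 0.1], [0.1, 0.2]]      # illustrative posterior covariance
P_next = predict_cov(P, 0.7)      # illustrative rotation angle (rad)
```

Since the differential entropy of a Gaussian depends on its covariance only through the log-determinant, the equality of `det2(P_next)` and `det2(P)` means that, under these assumptions, the motion alone leaves the entropy of the predicted state unchanged.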
Acknowledgements
The authors would like to thank Matthieu Herrb, Anthony Mallet, and Xavier Dollat for their invaluable help.
Additional information
This work was partially supported by EU FET Grant Two!Ears, ICT-618075, www.twoears.eu.
This is one of several papers published in Autonomous Robots comprising the Special Issue on Active Perception.
Appendix
Consider the posterior state pdf \(p(x_k|z_{1:k})\) of the sensor-to-source position at time k, and \({\mathcal {N}}(x_k;{\hat{x}}_{k|k},P_{k|k})\) the approximate Gaussian belief. This pdf can be mapped into the 1D Gaussian approximation \({\mathcal {N}}(z_{k+1};\hat{z}_{k+1|k},S_{k+1|k})\) of the predicted measurement pdf \(p(z_{k+1}|z_{1:k})\), by using the unscented transform. The aim is then to maximize the variance \(S_{k+1|k}\) so as to increase the entropy \(h(z_{k+1}|z_{1:k})\). This involves the composition of several functions.
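The link between variance and entropy used here follows from the closed form of the differential entropy of a scalar Gaussian. Under the Gaussian approximation above (natural logarithm, entropy in nats):

\[
h(z_{k+1}|z_{1:k}) \approx \frac{1}{2}\log \left( 2\pi e\, S_{k+1|k}\right) ,
\]

which is strictly increasing in \(S_{k+1|k}\), so maximizing \(S_{k+1|k}\) or \(\log S_{k+1|k}\) maximizes the approximate entropy.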
First, the sigma-points \(\left\{ X_{i}^{-}\right\} \) corresponding to \({p(x_{k}|z_{1:k}) = {\mathcal {N}}(x_k;{\hat{x}}_{k|k},P_{k|k})}\) are computed from the posterior mean \({\hat{x}}_{k|k}\) of the state vector at time k and the Cholesky decomposition \(P_{k|k} = L_{k|k}L_{k|k}^T\) of the posterior covariance:
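A minimal sketch of this construction for a 2D state is given below, using the basic unscented transform. The scaling parameter `lam` and its value, the weight formulas, and the function names are assumptions for illustration; the paper's exact parametrization may differ.

```python
import math

def cholesky2(P):
    """Lower-triangular L with P = L L^T, for a 2x2 symmetric positive-definite P."""
    l11 = math.sqrt(P[0][0])
    l21 = P[1][0] / l11
    l22 = math.sqrt(P[1][1] - l21 * l21)
    return [[l11, 0.0], [l21, l22]]

def sigma_points(x_hat, P, lam=1.0):
    """Return the 2n+1 sigma-points of N(x_hat, P) for n = 2.

    lam is the unscented-transform scaling parameter (assumed value).
    """
    n = len(x_hat)
    L = cholesky2(P)
    gamma = math.sqrt(n + lam)
    points = [tuple(x_hat)]                  # central point
    for j in range(n):
        col = [L[i][j] for i in range(n)]    # j-th column of L
        points.append(tuple(x_hat[i] + gamma * col[i] for i in range(n)))
        points.append(tuple(x_hat[i] - gamma * col[i] for i in range(n)))
    return points

def ut_weights(n, lam=1.0):
    """Weights of the basic unscented transform (mean and covariance weights coincide)."""
    w0 = lam / (n + lam)
    wi = 1.0 / (2.0 * (n + lam))
    return [w0] + [wi] * (2 * n)

# Illustrative belief: mean and covariance are made-up values.
points = sigma_points([1.0, 2.0], [[0.5, 0.1], [0.1, 0.2]])
weights = ut_weights(2)
```

With these weights, the weighted sample mean and covariance of `points` reproduce `x_hat` and `P` exactly, which is the defining property of the sigma-point set.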
The sigma-points \(\left\{ X_{i}^{+}\right\} \) of the next predicted state pdf \(p(x_{k+1}|z_{1:k}) = {\mathcal {N}}(x_{k+1};{\hat{x}}_{k+1|k},P_{k+1|k})\) can be obtained by applying the translation and rotation to each sigma-point in the set \(\left\{ X_{i}^{-}\right\} \). Note that (2) is defined as a function of \((T_y,T_z,\phi )\), so that \(X_{i}^{+} = \Phi _{X_{i}^{-}}(T_y,T_z,\phi )\).
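As an illustration, the propagation of one sigma-point can be sketched as follows. Equation (2) is not reproduced in this section, so the noise-free model \(x_{k+1} = r^T(\phi )(x_k - T)\) with \(T = (T_y,T_z)^T\) used below is an assumption, chosen to be consistent with the covariance prediction \(r^T(\phi )P_{k|k}r(\phi )\) of the Notes; the function names are hypothetical.

```python
import math

def propagate(point, T_y, T_z, phi):
    """Assumed noise-free motion model: x_{k+1} = r(phi)^T (x_k - T).

    The head translates by T = (T_y, T_z) and then rotates by phi, so the
    source position is re-expressed in the new head frame.
    """
    dy, dz = point[0] - T_y, point[1] - T_z
    c, s = math.cos(phi), math.sin(phi)
    # r(phi)^T = [[c, s], [-s, c]] for a counterclockwise r(phi)
    return (c * dy + s * dz, -s * dy + c * dz)

def propagate_all(points, T_y, T_z, phi):
    """Apply the same finite translation and rotation to every sigma-point."""
    return [propagate(p, T_y, T_z, phi) for p in points]

# Illustrative sigma-points and motion command (made-up values).
X_minus = [(1.0, 0.0), (3.0, 4.0)]
X_plus = propagate_all(X_minus, 0.0, 0.0, math.pi / 2)
```

Because the same rigid motion is applied to all points, the spread of the set is preserved; only its position and orientation relative to the head change.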
Then the set of sigma-points \(\left\{ Z_{i}^+\right\} \) of the predicted measurement pdf \(p(z_{k+1}|z_{1:k}) = {\mathcal {N}}(z_{k+1};\hat{z}_{k+1|k},S_{k+1|k})\) can be obtained from \(\left\{ X_{i}^+\right\} \) defined in (16) by:
with \(X_{i}^+(1)\) and \(X_{i}^+(2)\) the components of \(X_{i}^+\), and \(l(\cdot )\) the measurement equation used to guide the exploration. Finally, the mean \(\hat{z}_{k+1|k}\) and variance \(S_{k+1|k}\) of \(p(z_{k+1}|z_{1:k})\) are computed by
with \(\left\{ w_m^i\right\} \) and \(\left\{ w_c^i\right\} \) the classic weights of the unscented transform.
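The measurement step and the weighted moments can be sketched as below. Several choices are assumptions made for illustration only: the argument order of `atan2`, the use of `math.sin` as a stand-in for the cue model \(l(\cdot )\), the omission of any measurement-noise variance in \(S_{k+1|k}\), and the numerical values.

```python
import math

def measurement(point, l=math.sin):
    """Map a propagated sigma-point to a scalar cue.

    The azimuth is taken as atan2(X(1), X(2)) (assumed argument order),
    and math.sin is only a placeholder for the measurement equation l(.).
    """
    azimuth = math.atan2(point[0], point[1])
    return l(azimuth)

def ut_mean_var(values, w_m, w_c):
    """Weighted mean and variance of scalar sigma-points.

    No measurement-noise variance is added here (assumption).
    """
    z_hat = sum(w * z for w, z in zip(w_m, values))
    S = sum(w * (z - z_hat) ** 2 for w, z in zip(w_c, values))
    return z_hat, S

# Illustrative propagated sigma-points and basic unscented weights (n = 2).
X_plus = [(0.0, 1.0), (1.0, 1.0), (-1.0, 1.0), (0.0, 2.0), (0.0, 0.5)]
w = [1.0 / 3.0] + [1.0 / 6.0] * 4
Z_plus = [measurement(p) for p in X_plus]
z_hat, S = ut_mean_var(Z_plus, w, w)
```

In this example the points are spread symmetrically about the boresight direction, so the predicted mean is zero and only the two lateral points contribute to the variance.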
The log of the variance \(S_{k+1|k}\) comes as a function of the finite translation and rotation, i.e., \(\log S_{k+1|k} = {F}_{k}(T_y,T_z,\phi )\). However, the maximum of this function is not analytically tractable. Its gradient at \({U} = (T_y,T_z,\phi )\) is therefore computed as follows.
The first-order Taylor expansions of the functions \(\Phi _{X_{i}^-}\), \(\mathrm {atan2}\), l, and \(\log \) are composed around U with infinitesimal translations and rotation \({du} = (dT_y, dT_z, d\phi )^T\):
with \({\nabla }\) the gradient operator and \(J{\Phi _{X_i^-}}({U})\) the Jacobian of \(\Phi _{X_i^-}\) at U. Then the result of the composition, denoted \(Z_{i}(dT_y,dT_z,d\phi )\), is used to retrieve the mean and the variance with (18) and (19). Finally, the first-order Taylor expansion of \(F_k(dT_y,dT_z,d\phi )\) is obtained, highlighting the gradient \({\nabla }F_k\):
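A numerical cross-check of such a gradient can be sketched with central finite differences, which is a substitute for, not a reproduction of, the composed analytic expansion described above. It relies on the generic first-order relation \(F_k(U+du) \approx F_k(U) + {\nabla }F_k(U)^T du\). The motion model \(r^T(\phi )(x - T)\), the placeholder cue `math.sin`, the scaling `lam`, and all numerical values are assumptions.

```python
import math

def sigma_points(x_hat, P, lam=1.0):
    """Sigma-points and weights of N(x_hat, P), n = 2 (basic unscented transform)."""
    n = 2
    l11 = math.sqrt(P[0][0])
    l21 = P[1][0] / l11
    l22 = math.sqrt(P[1][1] - l21 * l21)
    g = math.sqrt(n + lam)
    pts = [tuple(x_hat)]
    for col in ((l11, l21), (0.0, l22)):      # columns of the Cholesky factor
        pts.append((x_hat[0] + g * col[0], x_hat[1] + g * col[1]))
        pts.append((x_hat[0] - g * col[0], x_hat[1] - g * col[1]))
    w = [lam / (n + lam)] + [0.5 / (n + lam)] * (2 * n)
    return pts, w

def F(U, x_hat, P):
    """log S_{k+1|k} as a function of U = (T_y, T_z, phi), under the stated assumptions."""
    T_y, T_z, phi = U
    pts, w = sigma_points(x_hat, P)
    c, s = math.cos(phi), math.sin(phi)
    zs = []
    for y, z in pts:
        dy, dz = y - T_y, z - T_z
        yp, zp = c * dy + s * dz, -s * dy + c * dz   # assumed motion model
        zs.append(math.sin(math.atan2(yp, zp)))      # placeholder for l(.)
    z_hat = sum(wi * zi for wi, zi in zip(w, zs))
    S = sum(wi * (zi - z_hat) ** 2 for wi, zi in zip(w, zs))
    return math.log(S)

def grad_F(U, x_hat, P, eps=1e-6):
    """Central finite-difference approximation of the gradient of F at U."""
    grad = []
    for i in range(3):
        up, um = list(U), list(U)
        up[i] += eps
        um[i] -= eps
        grad.append((F(up, x_hat, P) - F(um, x_hat, P)) / (2.0 * eps))
    return grad

x_hat = (0.5, 2.0)                     # illustrative prior mean
P = [[0.3, 0.05], [0.05, 0.4]]         # illustrative prior covariance
U = (0.0, 0.0, 0.0)
g = grad_F(U, x_hat, P)
```

An ascent step along `g` would still have to be kept inside the admissible set of the constrained problem; that projection is not shown here.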
Cite this article
Bustamante, G., Danès, P., Forgue, T. et al. An information based feedback control for audio-motor binaural localization. Auton Robot 42, 477–490 (2018). https://doi.org/10.1007/s10514-017-9639-8
DOI: https://doi.org/10.1007/s10514-017-9639-8