SpeakerSense: Energy Efficient Unobtrusive Speaker Identification on Mobile Phones

Lu, Hong; Bernheim Brush, A. J.; Priyantha, Bodhi; Karlson, Amy K.; Liu, Jie

doi:10.1007/978-3-642-21726-5_12

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6696)

Included in the following conference series:

International Conference on Pervasive Computing

Abstract

Automatically identifying the person you are talking with using continuous audio sensing has the potential to enable many pervasive computing applications, from memory assistance to annotating life-logging data. However, a number of challenges, including energy efficiency and training data acquisition, must be addressed before unobtrusive audio sensing is practical on mobile devices. We built SpeakerSense, a speaker identification prototype that uses a heterogeneous multi-processor hardware architecture that splits computation between a low-power processor and the phone's application processor to enable continuous background sensing with minimal power requirements. Using SpeakerSense, we benchmarked several system parameters (sampling rate, GMM complexity, smoothing window size, and amount of training data needed) to identify thresholds that balance computation cost with performance. We also investigated channel compensation methods that make it feasible to acquire training data from phone calls, and an automatic segmentation method for training speaker models based on one-to-one conversations.
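
As a rough illustration of two of the parameters named in the abstract (GMM complexity and smoothing window size), the sketch below scores a feature vector against per-speaker Gaussian mixture models and then smooths the frame-level decisions. It is not the authors' implementation: the diagonal-covariance form, the majority-vote smoothing rule, and every name in it (gmm_log_likelihood, classify_frame, smooth_labels, the toy "alice" and "bob" models) are assumptions made for this example only.

```python
import math
from collections import Counter


def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM.

    The number of mixture components (len(weights)) is the "GMM complexity";
    more components cost more computation per frame.
    """
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            log_p += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        log_terms.append(log_p)
    # Log-sum-exp over components for numerical stability.
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))


def classify_frame(x, speaker_models):
    """Return the speaker whose (hypothetical) model gives x the highest likelihood."""
    return max(
        speaker_models,
        key=lambda name: gmm_log_likelihood(x, *speaker_models[name]),
    )


def smooth_labels(labels, window):
    """Majority vote over the most recent `window` frame-level labels.

    Majority voting is an assumed smoothing rule, chosen for simplicity.
    """
    smoothed = []
    for i in range(len(labels)):
        recent = labels[max(0, i - window + 1): i + 1]
        smoothed.append(Counter(recent).most_common(1)[0][0])
    return smoothed


# Toy one-dimensional, single-component models: (weights, means, variances).
models = {
    "alice": ([1.0], [[0.0]], [[1.0]]),
    "bob": ([1.0], [[5.0]], [[1.0]]),
}
frame_labels = [classify_frame([v], models) for v in (0.2, 0.1, 4.5, 0.3, -0.4)]
print(smooth_labels(frame_labels, window=3))
```

In this toy run the single frame near 5.0 is labelled "bob" on its own, and a window of three frames outvotes it. A longer window suppresses more isolated errors but reacts more slowly when the speaker actually changes, which is the kind of trade-off a benchmark of window size would explore.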

Author information

Authors and Affiliations

Microsoft Research, One Microsoft Way, Redmond, WA, 98052, USA
Hong Lu, A. J. Bernheim Brush, Bodhi Priyantha, Amy K. Karlson & Jie Liu

Editor information

Editors and Affiliations

Intel Labs Santa Clara, Intel Corporation, 2200 Mission College Blvd., 95052, Santa Clara, CA, USA
Kent Lyons
Google, Seattle, 651 N 34th Street, 98103, Seattle, WA, USA
Jeffrey Hightower
Department of Informatics, University of Zurich, Binzmühlestrasse 14, 8050, Zurich, Switzerland
Elaine M. Huang

About this paper

Cite this paper

Lu, H., Bernheim Brush, A.J., Priyantha, B., Karlson, A.K., Liu, J. (2011). SpeakerSense: Energy Efficient Unobtrusive Speaker Identification on Mobile Phones. In: Lyons, K., Hightower, J., Huang, E.M. (eds) Pervasive Computing. Pervasive 2011. Lecture Notes in Computer Science, vol 6696. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21726-5_12

DOI: https://doi.org/10.1007/978-3-642-21726-5_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21725-8
Online ISBN: 978-3-642-21726-5
eBook Packages: Computer Science (R0)
