Future Perspective

Ellis, Dan; Virtanen, Tuomas; Plumbley, Mark D.; Raj, Bhiksha

doi:10.1007/978-3-319-63450-0_14

Abstract

This book has covered the underlying principles and technologies of sound recognition and described several current application areas. However, the field is still very young. This chapter briefly outlines several emerging areas, particularly those concerned with assembling the very large training sets that deep learning approaches can exploit. We also forecast some of the technological and application advances we expect in the short to medium term.

Author information

Authors and Affiliations

Google Inc, 111 8th Ave, New York, NY, 10027, USA
Dan Ellis
Laboratory of Signal Processing, Tampere University of Technology, Tampere, Finland
Tuomas Virtanen
Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford, Surrey, GU2 7XH, UK
Mark D. Plumbley
Carnegie Mellon University, Pittsburgh, PA, USA
Bhiksha Raj

Corresponding author

Correspondence to Tuomas Virtanen.

Editor information

Editors and Affiliations

Laboratory of Signal Processing, Tampere University of Technology, Tampere, Finland
Tuomas Virtanen
Centre for Vision, Speech and Signal Processing, University of Surrey, Surrey, United Kingdom
Mark D. Plumbley
Google Inc., New York, New York, USA
Dan Ellis

About this chapter

Cite this chapter

Ellis, D., Virtanen, T., Plumbley, M.D., Raj, B. (2018). Future Perspective. In: Virtanen, T., Plumbley, M., Ellis, D. (eds) Computational Analysis of Sound Scenes and Events. Springer, Cham. https://doi.org/10.1007/978-3-319-63450-0_14

DOI: https://doi.org/10.1007/978-3-319-63450-0_14
Published: 22 September 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63449-4
Online ISBN: 978-3-319-63450-0
eBook Packages: Engineering, Engineering (R0)