An Approach to Hummed-tune and Song Sequences Matching

Part of the Communications in Computer and Information Science book series (CCIS, volume 1688)


A melody stuck in your head, also known as an “earworm”, is tough to get rid of unless you listen to it again or sing it out loud. But what if you cannot find the name of the song? It must be an intolerable feeling. Recognizing a song's name from a hummed sound is not an easy task for a human being and should be done by machines. However, no research paper has been published on hummed-tune recognition. This work is adapted from Hum2Song, a task of Zalo AI Challenge 2021: a competition on retrieving the name of a song from a tune hummed by the user, similar to Google’s Hum to Search. The paper details how the data is preprocessed from its original format (mp3) into a form usable for training and inference. To train an embedding model for the feature extraction phase, we ran experiments with several state-of-the-art architectures, such as ResNet, VGG, AlexNet, and MobileNetV2. For the inference phase, we use the Faiss module to efficiently search for the song that matches the hummed sequence. The approach reaches nearly 94% on the MRR@10 metric on the public test set and ranks first on the public leaderboard.
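As a reading aid for the reported metric, the following is a minimal sketch of how MRR@10 could be computed, assuming the standard definition of mean reciprocal rank truncated at the top 10 results; the challenge's exact scoring script is not described here. The function name `mrr_at_k` and the toy song identifiers are hypothetical and used only for illustration.

```python
def mrr_at_k(ranked_lists, true_ids, k=10):
    """Mean reciprocal rank, counting only hits within the top k results.

    ranked_lists: one ranked list of candidate song ids per hummed query.
    true_ids: the correct song id for each query, in the same order.
    """
    if len(ranked_lists) != len(true_ids):
        raise ValueError("ranked_lists and true_ids must have the same length")
    if not ranked_lists:
        return 0.0

    total = 0.0
    for ranked, truth in zip(ranked_lists, true_ids):
        # Ranks are 1-based; a query with no hit in the top k contributes 0.
        for rank, song_id in enumerate(ranked[:k], start=1):
            if song_id == truth:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)


# Hypothetical toy data: three hummed queries and their ranked candidates.
predictions = [
    ["s1", "s2", "s3"],  # correct song at rank 1 -> contributes 1
    ["s9", "s4", "s7"],  # correct song at rank 2 -> contributes 1/2
    ["s5", "s6", "s8"],  # correct song not retrieved -> contributes 0
]
truths = ["s1", "s4", "s0"]

score = mrr_at_k(predictions, truths)
print(score)  # 0.5
```

Under this definition, a score near 94% means the correct song appears at or very close to the first position for most queries.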


  • Humming sound recognition
  • Deep learning
  • Faiss module
  • Sound preprocessing

H. H. Luong, T. P. Tran, H. P. Ngo, H. V. Nguyen, T. Nguyen—These authors contributed equally to this work.


Author information

Corresponding authors

Correspondence to Bao Loc Pham or Huong Hoang Luong.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Pham, B.L., Luong, H.H., Tran, T.P., Ngo, H.P., Nguyen, H.V., Nguyen, T. (2022). An Approach to Hummed-tune and Song Sequences Matching. In: Dang, T.K., Küng, J., Chung, T.M. (eds) Future Data and Security Engineering. Big Data, Security and Privacy, Smart City and Industry 4.0 Applications. FDSE 2022. Communications in Computer and Information Science, vol 1688. Springer, Singapore.

  • DOI:

  • Published:

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-19-8068-8

  • Online ISBN: 978-981-19-8069-5

  • eBook Packages: Computer Science, Computer Science (R0)