Abstract
A melody stuck in your head, also known as an “earworm”, is tough to get rid of unless you listen to the song again or sing it out loud. But what if you cannot find the name of that song? It must be an intolerable feeling. Recognizing a song from a hummed tune is not an easy task for a human being and is better handled by machines. However, no research paper has yet been published on hummed-tune recognition. This work builds on the Hum2Song task of Zalo AI Challenge 2021, a competition on retrieving the name of a song from a user's hummed tune, similar to Google's Hum to Search. The paper details how the data are pre-processed from the original format (mp3) into a form usable for training and inference. To train an embedding model for the feature extraction phase, we ran experiments with several state-of-the-art architectures, namely ResNet, VGG, AlexNet, and MobileNetV2. For the inference phase, we use the Faiss module to efficiently search for the song that matches the hummed sequence. The method reaches nearly 94% on the MRR@10 metric on the public test set and ranks first on the public leaderboard.
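As a rough illustration of the inference and evaluation steps named in the abstract, the sketch below ranks songs by cosine similarity between embeddings and then scores the rankings with MRR@10. It is not the authors' implementation: the exhaustive search is a plain-Python stand-in for a Faiss index, the two-dimensional embeddings and song ids are invented toy data, and the function names `l2_normalize`, `top_k`, and `mrr_at_k` are hypothetical. MRR@10 is assumed to follow the usual definition: the reciprocal rank of the correct song if it appears in the first 10 results, otherwise 0, averaged over all queries.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def top_k(query, index, k=10):
    """Return the ids of the k indexed songs most similar to the query.

    `index` maps a song id to its embedding. This is an exhaustive
    inner-product search, used here in place of a Faiss index.
    """
    q = l2_normalize(query)
    scored = []
    for song_id, emb in index.items():
        e = l2_normalize(emb)
        scored.append((sum(a * b for a, b in zip(q, e)), song_id))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [song_id for _, song_id in scored[:k]]


def mrr_at_k(ranked_lists, true_ids, k=10):
    """Mean reciprocal rank, counting only hits within the first k results."""
    total = 0.0
    for ranked, truth in zip(ranked_lists, true_ids):
        for rank, song_id in enumerate(ranked[:k], start=1):
            if song_id == truth:
                total += 1.0 / rank
                break
    return total / len(true_ids)


# Toy data (hypothetical): three song embeddings and two hummed queries.
songs = {"song_a": [1.0, 0.0], "song_b": [0.0, 1.0], "song_c": [1.0, 1.0]}
queries = [[0.9, 0.1], [0.6, 0.8]]
truths = ["song_a", "song_b"]

rankings = [top_k(q, songs, k=10) for q in queries]
score = mrr_at_k(rankings, truths, k=10)
print(rankings)
print(score)
```

In this toy run the first query retrieves its true song at rank 1 and the second at rank 2, so the score is (1 + 1/2) / 2 = 0.75. A real system would replace the toy vectors with embeddings produced by the trained model and the loop in `top_k` with a vector index.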
Keywords
- Humming sound recognition
- Deep learning
- Faiss module
- Sound preprocessing
H. H. Luong, T. P. Tran, H. P. Ngo, H. V. Nguyen, and T. Nguyen contributed equally to this work.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Pham, B.L., Luong, H.H., Tran, T.P., Ngo, H.P., Nguyen, H.V., Nguyen, T. (2022). An Approach to Hummed-tune and Song Sequences Matching. In: Dang, T.K., Küng, J., Chung, T.M. (eds) Future Data and Security Engineering. Big Data, Security and Privacy, Smart City and Industry 4.0 Applications. FDSE 2022. Communications in Computer and Information Science, vol 1688. Springer, Singapore. https://doi.org/10.1007/978-981-19-8069-5_49
DOI: https://doi.org/10.1007/978-981-19-8069-5_49
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-8068-8
Online ISBN: 978-981-19-8069-5
eBook Packages: Computer Science, Computer Science (R0)