Abstract
Dhwani is a project that generates sound from a given image, based on the content or background of the picture. Generating audio from images has many uses, especially for reliving past moments or exploring places without visiting them. With a waving hand gesture, the user can open an image and listen to the sound generated from one photograph at a time. The same waving gesture moves to the next or previous image, which improves the user experience. Processing is performed on a folder linked to Google Drive, so the user can work directly with images stored in the cloud, saving the time and hassle of transferring pictures. Dhwani combines deep learning, object tracking, and computer vision to deliver this experience.
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Bhola, T., Gupta, S., Kaul, S., Jain, H., Gupta, J., Jain, R. (2021). Dhwani: Sound Generation from Google Drive Images Controlled with Hand Gestures. In: Abraham, A., Castillo, O., Virmani, D. (eds) Proceedings of 3rd International Conference on Computing Informatics and Networks. Lecture Notes in Networks and Systems, vol 167. Springer, Singapore. https://doi.org/10.1007/978-981-15-9712-1_23
DOI: https://doi.org/10.1007/978-981-15-9712-1_23
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-9711-4
Online ISBN: 978-981-15-9712-1
eBook Packages: Engineering, Engineering (R0)