Dhwani: Sound Generation from Google Drive Images Controlled with Hand Gestures

Bhola, Tanish; Gupta, Shubham; Kaul, Satvik; Jain, Harsh; Gupta, Jatin; Jain, Rachna

doi:10.1007/978-981-15-9712-1_23

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 167)

Abstract

Dhwani is a project that generates sound from a given image based on the content or background of the picture. Generating audio from images has many uses, especially for reliving past moments or exploring places without visiting them. With a waving hand gesture, the user can open an image and listen to the sound generated from one photograph at a time. The same kind of gesture moves to the next or previous image, which improves the user experience. Processing is performed on a folder linked to Google Drive, so the user can work directly with images already stored in Google's cloud and is spared the time and hassle of transferring them. Dhwani combines deep learning, object tracking, and computer vision to deliver this functionality with a high-quality user experience.
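The gesture-based navigation described in the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' tracking or classification method, so everything here is an assumption made for illustration: the function names `classify_wave` and `step_index`, the `min_travel` threshold, the use of normalized horizontal hand positions as input, and the mapping of a rightward wave to "next" are all hypothetical.

```python
from typing import Optional, Sequence


def classify_wave(xs: Sequence[float], min_travel: float = 0.3) -> Optional[str]:
    """Classify a horizontal hand motion as 'next', 'previous', or None.

    xs holds normalized horizontal hand positions (0.0 = left edge,
    1.0 = right edge), oldest first. How these positions are obtained
    (for example from an object tracker) is outside this sketch.
    """
    if len(xs) < 2:
        return None
    # Net displacement between the first and last tracked position.
    travel = xs[-1] - xs[0]
    if travel >= min_travel:
        return "next"      # assumed mapping: rightward wave
    if travel <= -min_travel:
        return "previous"  # assumed mapping: leftward wave
    return None


def step_index(index: int, gesture: Optional[str], count: int) -> int:
    """Return the image index after applying a gesture, clamped to the folder."""
    if count <= 0:
        raise ValueError("count must be positive")
    if gesture == "next":
        return min(index + 1, count - 1)
    if gesture == "previous":
        return max(index - 1, 0)
    return index
```

In such a setup, each recognized wave would update the index of the currently open image, and the sound for that single image would then be generated and played. Clamping at the ends of the folder is one possible design choice; wrapping around would be equally plausible.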

Author information

Authors and Affiliations

Bharati Vidyapeeth’s College of Engineering, New Delhi, 110063, India
Tanish Bhola, Shubham Gupta, Satvik Kaul, Harsh Jain, Jatin Gupta & Rachna Jain

Corresponding author

Correspondence to Tanish Bhola.

Editor information

Editors and Affiliations

Scientific Network for Innovation and Research Excellence, Machine Intelligence Research Labs (MIR Labs), Auburn, WA, USA
Ajith Abraham
Tijuana Institute of Technology, Tijuana, Mexico
Oscar Castillo
Bhagwan Parshuram Institute of Technology, New Delhi, India
Deepali Virmani

Cite this paper

Bhola, T., Gupta, S., Kaul, S., Jain, H., Gupta, J., Jain, R. (2021). Dhwani: Sound Generation from Google Drive Images Controlled with Hand Gestures. In: Abraham, A., Castillo, O., Virmani, D. (eds) Proceedings of 3rd International Conference on Computing Informatics and Networks. Lecture Notes in Networks and Systems, vol 167. Springer, Singapore. https://doi.org/10.1007/978-981-15-9712-1_23

DOI: https://doi.org/10.1007/978-981-15-9712-1_23
Published: 15 March 2021
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-9711-4
Online ISBN: 978-981-15-9712-1
eBook Packages: Engineering, Engineering (R0)
