Dhwani: Sound Generation from Google Drive Images Controlled with Hand Gestures

  • Conference paper
Proceedings of 3rd International Conference on Computing Informatics and Networks

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 167)

Abstract

Dhwani is a project that generates sound from a given image using the content or background of the picture. Producing audio from images has many uses, such as reliving past moments or exploring places without visiting them. With a waving hand gesture, the user can open an image and listen to the sound generated from one photograph at a time. The same waving gesture is used to move to the next or previous image, which improves the user's experience. Processing is performed on a folder linked to Google Drive, so the user can work directly with images stored in the cloud and avoid the time and hassle of transferring pictures. By combining deep learning, object tracking and computer vision, Dhwani gives users an innovative way to experience their photographs, with a strong focus on user experience.
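The abstract does not say how a wave is turned into a "next" or "previous" command. The sketch below shows one simple way to do it, under several assumptions.

- An upstream hand tracker is assumed to supply the hand centroid's horizontal position in each frame, normalized to [0, 1].
- The wave is treated as a single sideways sweep and is recognized by its net displacement.
- The names `classify_wave`, `ImageBrowser` and `WAVE_THRESHOLD`, the 0.25 threshold and the left-to-right = "next" convention are all hypothetical. They are not the authors' method.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical threshold: minimum net horizontal travel (as a fraction of
# the frame width) for a movement to count as a deliberate wave.
WAVE_THRESHOLD = 0.25


def classify_wave(xs: Sequence[float], threshold: float = WAVE_THRESHOLD) -> Optional[str]:
    """Map a track of hand-centroid x positions (0 = left edge, 1 = right edge)
    to a navigation command, or None if the hand did not travel far enough.

    Assumed convention: left-to-right means "next", right-to-left means "previous".
    """
    if len(xs) < 2:
        return None  # a single frame carries no motion information
    displacement = xs[-1] - xs[0]
    if displacement >= threshold:
        return "next"
    if displacement <= -threshold:
        return "previous"
    return None  # small jitter is ignored


@dataclass
class ImageBrowser:
    """Steps through image names, e.g. the files listed in a linked Drive folder."""
    images: Sequence[str]
    index: int = 0

    def current(self) -> str:
        return self.images[self.index]

    def apply(self, command: Optional[str]) -> str:
        # Wrap around at both ends so repeated waves cycle through the folder.
        if command == "next":
            self.index = (self.index + 1) % len(self.images)
        elif command == "previous":
            self.index = (self.index - 1) % len(self.images)
        return self.current()


if __name__ == "__main__":
    browser = ImageBrowser(["beach.jpg", "forest.jpg", "street.jpg"])
    print(browser.apply(classify_wave([0.2, 0.4, 0.7])))    # forest.jpg
    print(browser.apply(classify_wave([0.8, 0.5, 0.3])))    # beach.jpg
    print(browser.apply(classify_wave([0.5, 0.55, 0.52])))  # beach.jpg (no wave)
```

A net-displacement rule is easy to tune, but it can miss a true back-and-forth wave, whose start and end positions are close together. Detecting that kind of gesture would need a different rule, for example counting direction reversals.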

Author information

Corresponding author

Correspondence to Tanish Bhola.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Bhola, T., Gupta, S., Kaul, S., Jain, H., Gupta, J., Jain, R. (2021). Dhwani: Sound Generation from Google Drive Images Controlled with Hand Gestures. In: Abraham, A., Castillo, O., Virmani, D. (eds) Proceedings of 3rd International Conference on Computing Informatics and Networks. Lecture Notes in Networks and Systems, vol 167. Springer, Singapore. https://doi.org/10.1007/978-981-15-9712-1_23

  • DOI: https://doi.org/10.1007/978-981-15-9712-1_23

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-9711-4

  • Online ISBN: 978-981-15-9712-1

  • eBook Packages: Engineering, Engineering (R0)
