
Talking Detection in Collaborative Learning Environments

Conference paper
Computer Analysis of Images and Patterns (CAIP 2021)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 13053)

Abstract

We study the problem of detecting talking activities in collaborative learning videos. Our approach uses head detection and projections of the log-magnitude of optical flow vectors to reduce the problem to a simple classification of small projection images, without the need to train complex 3-D activity classification systems. The small projection images are then classified by a simple majority vote of standard classifiers. For talking detection, the proposed approach is shown to significantly outperform single-activity systems, with an overall accuracy of 59% compared to 42% for Temporal Segment Network (TSN) and 45% for Convolutional 3D (C3D). In addition, our method can detect multiple talking instances from multiple speakers, while also detecting the speakers themselves.

This material is based upon work supported by the National Science Foundation under Grant No. 1613637, No. 1842220, and No. 1949230.
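As a rough illustration of the pipeline described in the abstract, the sketch below computes the log-magnitude of an optical-flow field, projects it onto the image axes to form a small fixed-length feature, and combines classifier outputs by majority vote. This is a minimal sketch, not the authors' implementation: the use of simple axis sums, `log1p`, and a projection length of 32 are assumptions, and the flow field itself would come from a method such as Farnebäck's two-frame estimation (reference 3).

```python
import numpy as np

def flow_projections(flow, size=32):
    """Reduce a dense optical-flow field to a small 1-D projection feature.

    flow: array of shape (H, W, 2) holding per-pixel (dx, dy) motion,
          e.g. from Farneback's method applied to a cropped head region.
    """
    # Per-pixel flow magnitude, then log-compress so large motions
    # do not dominate the projection.
    mag = np.sqrt(flow[..., 0] ** 2 + flow[..., 1] ** 2)
    logmag = np.log1p(mag)

    # Project the log-magnitude onto the vertical and horizontal axes.
    row_proj = logmag.sum(axis=1)   # one value per image row
    col_proj = logmag.sum(axis=0)   # one value per image column

    # Resample both profiles to a fixed length so that crops of
    # different sizes yield comparable feature vectors.
    row_small = np.interp(np.linspace(0, len(row_proj) - 1, size),
                          np.arange(len(row_proj)), row_proj)
    col_small = np.interp(np.linspace(0, len(col_proj) - 1, size),
                          np.arange(len(col_proj)), col_proj)
    return np.concatenate([row_small, col_small])

def majority_vote(predictions):
    """Combine binary talking/not-talking votes from several classifiers."""
    return int(sum(predictions) > len(predictions) / 2)
```

A feature vector like this could then be fed to several standard classifiers (e.g. from scikit-learn), whose binary decisions are fused with `majority_vote`; the specific classifiers used in the paper are not reproduced here.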


References

  1. Darsey, C.J.: Hand movement detection in collaborative learning environment videos (2018)


  2. Eilar, C.W., Jatla, V., Pattichis, M.S., LópezLeiva, C., Celedón-Pattichis, S.: Distributed video analysis for the advancing out-of-school learning in mathematics and engineering project. In: 50th Asilomar Conference on Signals, Systems and Computers, pp. 571–575. IEEE (2016)


  3. Farnebäck, G.: Two-frame motion estimation based on polynomial expansion. In: Bigun, J., Gustavsson, T. (eds.) SCIA 2003. LNCS, vol. 2749, pp. 363–370. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-45103-X_50


  4. Jacoby, A.R., Pattichis, M.S., Celedón-Pattichis, S., LópezLeiva, C.: Context-sensitive human activity classification in collaborative learning environments. In: 2018 IEEE Southwest Symposium on Image Analysis and Interpretation (SSIAI), pp. 1–4. IEEE (2018)


  5. Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767 (2018)

  6. Shi, W.: Human attention detection using AM-FM representations. Master's thesis, The University of New Mexico, Albuquerque, New Mexico (2016)


  7. Shi, W., Pattichis, M.S., Celedón-Pattichis, S., LópezLeiva, C.: Robust head detection in collaborative learning environments using AM-FM representations. In: 2018 IEEE Southwest Symposium on Image Analysis and Interpretation (SSIAI). IEEE (2018)


  8. Shi, W., Pattichis, M.S., Celedón-Pattichis, S., LópezLeiva, C.: Dynamic group interactions in collaborative learning videos. In: 2018 52nd Asilomar Conference on Signals, Systems, and Computers, pp. 1528–1531. IEEE (2018)


  9. Tapia, L.S., Pattichis, M.S., Celedón-Pattichis, S., LópezLeiva, C.: The importance of the instantaneous phase for face detection using simple convolutional neural networks. In: 2020 IEEE Southwest Symposium on Image Analysis and Interpretation (SSIAI), pp. 1–4. IEEE (2020)


  10. Tran, D., Bourdev, L., Fergus, R., Torresani, L., Paluri, M.: Learning spatiotemporal features with 3D convolutional networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4489–4497 (2015)


  11. Tran, D., Wang, H., Torresani, L., Ray, J., LeCun, Y., Paluri, M.: A closer look at spatiotemporal convolutions for action recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6450–6459 (2018)


  12. Wang, L., et al.: Temporal segment networks: towards good practices for deep action recognition. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 20–36. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46484-8_2



Author information

Corresponding author

Correspondence to Wenjing Shi.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Shi, W., Pattichis, M.S., Celedón-Pattichis, S., LópezLeiva, C. (2021). Talking Detection in Collaborative Learning Environments. In: Tsapatsoulis, N., Panayides, A., Theocharides, T., Lanitis, A., Pattichis, C., Vento, M. (eds) Computer Analysis of Images and Patterns. CAIP 2021. Lecture Notes in Computer Science, vol 13053. Springer, Cham. https://doi.org/10.1007/978-3-030-89131-2_22


  • DOI: https://doi.org/10.1007/978-3-030-89131-2_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-89130-5

  • Online ISBN: 978-3-030-89131-2

  • eBook Packages: Computer Science, Computer Science (R0)
