
Toward a System of Visual Classification, Analysis and Recognition of Performance-Based Moving Images in the Artistic Field

  • Conference paper
Image Analysis and Processing - ICIAP 2023 Workshops (ICIAP 2023)


This paper proposes a research program focused on the design of a model for the recognition, analysis and classification of video art works and documentations based on their semiotic aspects and audiovisual content. Focusing on a corpus of art cinema, video art, and performance art, the theoretical framework involves bringing together semiotics, film studies, visual studies, and performance studies with the innovative technologies of computer vision and artificial intelligence. The aim is to analyze the performance aspect to interpret contextual references and cultural constructs recorded in artistic contexts, contributing to the classification and analysis of video art works with complex semiotic characteristics. Underlying the conceptual framework is the simultaneous use of a set of technologies, such as pose estimation, facial recognition, object recognition, motion analysis, audio analysis, and natural language processing, to improve recognition accuracy and create a large set of labeled audiovisual data. In addition, the authors propose a prototype application to explore the primary challenges of such a research project.
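The simultaneous use of several analyzers to build one labeled dataset can be illustrated with a minimal sketch. It is not the authors' prototype: the `Label` record, the `merge_labels` function, the analyzer names, and the one-second window are all assumptions made for illustration, and the sample labels are written by hand rather than produced by any model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    """One label produced by one analyzer for a time span (in seconds)."""
    source: str   # hypothetical analyzer name, e.g. "pose" or "audio"
    start: float
    end: float
    value: str


def merge_labels(labels, window=1.0):
    """Group labels from all analyzers into fixed-length time windows.

    Returns {window_index: {source: sorted list of values}}.
    """
    merged = defaultdict(lambda: defaultdict(set))
    for label in labels:
        first = int(label.start // window)
        # A label ending exactly on a window boundary does not spill over.
        last = max(first, int((label.end - 1e-9) // window))
        for index in range(first, last + 1):
            merged[index][label.source].add(label.value)
    return {
        index: {source: sorted(values) for source, values in by_source.items()}
        for index, by_source in sorted(merged.items())
    }


# Hand-written stand-ins for analyzer output; no real model is run here.
sample = [
    Label("pose", 0.0, 2.0, "arms_raised"),
    Label("object", 0.5, 1.5, "chair"),
    Label("audio", 1.0, 3.0, "speech"),
]
timeline = merge_labels(sample)
```

Aligning every analyzer on a shared time axis is one simple way to let labels from different modalities corroborate each other; a real system would likely need finer windows and per-label confidence scores.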


  1. Archival facilities in the GLAM (Galleries, Libraries, Archives and Museums) and MAB (Museums, Archives, Libraries) sectors are invested in the European Union’s strategic program for digitization, preservation and online accessibility of cultural heritage, supported by the Plan for Recovery, which will be completed by 2030. (last accessed 17 August 2023).


Author information

The paper was conceived, discussed and planned by all three authors. Michael Castronuovo wrote Sects. 3-4, Alessandro Fiordelmondo planned and implemented the prototype application, and Cosetta Saba wrote Sects. 1-2.

Corresponding author

Correspondence to Alessandro Fiordelmondo.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Castronuovo, M., Fiordelmondo, A., Saba, C. (2024). Toward a System of Visual Classification, Analysis and Recognition of Performance-Based Moving Images in the Artistic Field. In: Foresti, G.L., Fusiello, A., Hancock, E. (eds) Image Analysis and Processing - ICIAP 2023 Workshops. ICIAP 2023. Lecture Notes in Computer Science, vol 14366. Springer, Cham.

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-51025-0

  • Online ISBN: 978-3-031-51026-7

  • eBook Packages: Computer Science, Computer Science (R0)
