
Vision-based drone control for autonomous UAV cinematography

Multimedia Tools and Applications

Abstract

One of the most important aesthetic concepts in autonomous Unmanned Aerial Vehicle (UAV) cinematography is the UAV/Camera Motion Type (CMT), describing the desired UAV trajectory relative to a (still or moving) physical target/subject being filmed. Usually, for the drone to autonomously execute such a CMT and capture the desired shot, the 3D states (positions/poses within the world) of both the UAV/camera and the target are required as input. However, the target’s 3D state is not typically known in non-staged settings. This paper proposes a novel framework that reformulates each desired CMT as a set of requirements interrelating 2D visual information, UAV trajectory and camera orientation. A set of CMT-specific, vision-driven Proportional-Integral-Derivative (PID) UAV controllers can then be implemented by exploiting these requirements to form suitable error signals. Such signals drive continuous adjustments to instantaneous UAV motion parameters, separately at each captured video frame/time instant. The only inputs required for computing each error value are the current 2D pixel coordinates of the target’s on-frame bounding box, obtainable from an independent, off-the-shelf, real-time, deep neural 2D object detector/tracker vision subsystem. Importantly, neither the UAV’s nor the target’s 3D state ever needs to be known or estimated, and no depth maps, target 3D models or camera intrinsic parameters are necessary. The method was implemented and successfully evaluated in a robotics simulator, by reformulating a set of standard, formalized UAV CMTs.
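To make the control scheme concrete, the following Python snippet is a minimal sketch of one such vision-driven PID channel: the horizontal deviation of the detected bounding-box centre from a desired on-frame position serves as the error signal, and the PID output is interpreted as a yaw-rate command. This is an illustrative assumption, not the paper's actual controller: the (x_min, y_min, x_max, y_max) pixel bounding-box format, the 1280x720 frame, the gains and the command interface are all placeholders.

```python
# Hypothetical sketch of one vision-driven PID channel: the error is the
# horizontal deviation of the target's bounding-box centre from a desired
# on-frame position, and the output is a yaw-rate command. Names, gains
# and the command interface are illustrative assumptions.

from dataclasses import dataclass


@dataclass
class PID:
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, error: float, dt: float) -> float:
        """Standard discrete PID update for one error sample."""
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt if dt > 0 else 0.0
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def bbox_center(bbox):
    """Centre (pixels) of a box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = bbox
    return 0.5 * (x_min + x_max), 0.5 * (y_min + y_max)


# Desired on-frame target position for a hypothetical 1280x720 frame;
# e.g. a "centered target" requirement places it at the frame centre.
DESIRED_X, DESIRED_Y = 640.0, 360.0

yaw_controller = PID(kp=0.002, ki=0.0001, kd=0.0005)  # illustrative gains


def control_step(bbox, dt):
    """One per-frame control update: 2D pixel error -> yaw-rate command."""
    cx, _ = bbox_center(bbox)
    error_x = DESIRED_X - cx  # positive when the target is left of the setpoint
    return yaw_controller.step(error_x, dt)


# Example: target detected slightly right of centre in the current frame.
yaw_rate_cmd = control_step((700, 300, 820, 460), dt=1.0 / 30.0)
```

In the same spirit, additional PID channels (e.g. driven by the vertical deviation of the box centre, or by the on-frame box size) could command camera tilt or forward velocity; how each CMT maps 2D requirements to specific channels is the subject of the paper itself.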



Data Availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

Notes

  1. https://drive.google.com/drive/folders/1RhxG0PbWIzrNHk3_1YQGaKU6CNz5LxpN

  2. The Benchmark_RAI and Annotations_Bicycles_Raw datasets were downloaded from https://aiia.csd.auth.gr/open-multidrone-datasets/

  3. A default level of noise is already inserted by the simulator.


Funding

The research leading to these results has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 731667 (MULTIDRONE). This publication reflects only the authors’ views. The European Union is not liable for any use that may be made of the information contained therein.

The authors have no competing interests to declare that are relevant to the content of this article.

Author information


Corresponding author

Correspondence to Ioannis Mademlis.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Mademlis, I., Symeonidis, C., Tefas, A. et al. Vision-based drone control for autonomous UAV cinematography. Multimed Tools Appl 83, 25055–25083 (2024). https://doi.org/10.1007/s11042-023-15336-7

