An improved deep learning method for flying object detection and recognition

  • Original Paper
  • Published in: Signal, Image and Video Processing

Abstract

The demand for detecting and classifying flying objects such as birds, planes, and drones (unmanned aerial vehicles, UAVs) is increasing in fields such as the military and surveillance. A distant object appears point-sized in an image, which makes faraway objects difficult to classify. There is always a trade-off between correct object detection and the confidence value, so it is important to detect and classify objects correctly with high confidence. This paper introduces a hybrid model that combines a convolutional neural network (CNN) and long short-term memory (LSTM) to detect and classify UAVs. We first present a comparative study of algorithms from the You Only Look Once (YOLO) family. We also gathered and prepared an image dataset from sources such as GitHub, the University of California Irvine (UCI) Machine Learning Repository, and the International Conference on Computer Vision (ICCV) for experimentation. The proposed CNN-LSTM model extracts spatial characteristics from the input video sequence, while the memory capacity of the LSTM retains information across frames for object detection. Bayesian optimization is used for hyper-parameter tuning, which makes the results of the proposed hybrid CNN-LSTM model more promising than those of other state-of-the-art algorithms such as YOLO, R-CNN, faster R-CNN, SGD, and CNN. We also report detection accuracy at varying distances. The proposed model performs best with respect to precision, recall, training and validation accuracy, and loss, and its processing speed (frames per second, FPS) is nearly equivalent to that of faster R-CNN.
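The abstract's architecture can be illustrated with a minimal sketch: a small CNN extracts per-frame spatial features, and an LSTM aggregates them over the video sequence before classification. This is a hedged illustration only; the layer sizes, class count, and pooling choices below are assumptions for demonstration, not the authors' actual configuration.

```python
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    """Illustrative CNN-LSTM hybrid: per-frame spatial features from a CNN,
    temporal aggregation with an LSTM. All layer sizes are hypothetical."""

    def __init__(self, num_classes=3, hidden=64):
        super().__init__()
        # Small CNN backbone applied to each frame independently
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),          # fixed 4x4 spatial output
        )
        self.lstm = nn.LSTM(input_size=32 * 4 * 4, hidden_size=hidden,
                            batch_first=True)  # remembers across frames
        self.head = nn.Linear(hidden, num_classes)  # e.g. bird / plane / drone

    def forward(self, x):                      # x: (batch, time, 3, H, W)
        b, t = x.shape[:2]
        feats = self.cnn(x.flatten(0, 1))      # fold time into the batch dim
        feats = feats.flatten(1).view(b, t, -1)  # (batch, time, features)
        out, _ = self.lstm(feats)              # temporal feature sequence
        return self.head(out[:, -1])           # classify from the last step

model = CNNLSTM()
clip = torch.randn(2, 5, 3, 64, 64)   # two clips of five 64x64 RGB frames
logits = model(clip)
print(tuple(logits.shape))            # (2, 3): one logit vector per clip
```

In practice a full detector would also regress bounding boxes and be tuned (as the paper does) with Bayesian optimization over hyper-parameters such as the hidden size and learning rate; this sketch shows only the classification path.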


Availability of data and materials

All data were collected from well-known repositories on the Internet. The dataset prepared here is available on request.


Funding

The authors state that no funding was involved.

Author information

Contributions

SSA performed algorithm design and implementation, result analysis, and writing (original draft and editing). NW carried out the literature survey, result analysis, and writing (review and editing). AP performed algorithm implementation, result analysis, and writing (review and editing). NM provided the YOLO experimentation and result analysis. HA contributed the introduction and proofreading.

Corresponding author

Correspondence to Shailendra S. Aote.

Ethics declarations

Competing interests

The authors declare that they have no relevant financial or non-financial competing interests to disclose regarding any material discussed in this article.

Ethical approval

The proposed study does not require ethical approval.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Aote, S.S., Wankhade, N., Pardhi, A. et al. An improved deep learning method for flying object detection and recognition. SIViP 18, 143–152 (2024). https://doi.org/10.1007/s11760-023-02703-y

