Developing approaches in building classification and extraction with synergy of YOLOV8 and SAM models

Published in Spatial Information Research

Abstract

The ability to extract meaningful information from visual material, such as photographs and videos, has significantly enhanced the potential for object recognition in various disciplines. However, challenges arise in the geospatial domain when features are extracted. Existing approaches primarily focus on remotely sensed images, emphasizing semantic segmentation tasks. This study, in contrast, prioritizes the extraction of buildings as well as the classification of structures into residential and non-residential types using instance segmentation. The proposed model pipeline combines the YOLOv8 detection algorithm with the Segment Anything Model (SAM) to achieve these objectives. The approach outlined in this research produces segmentation outcomes whose evaluation metrics are comparable to those of earlier instance segmentation methods and of the segmentation strategies used to assess building extraction performance. Additionally, the segmentation results are georeferenced using extracted geospatial information, and vector images of the identified building rooftops are generated. The approach demonstrates robustness in segmenting target objects effectively, regardless of diverse characteristics such as shape, size, or orientation. The model pipeline exhibits superior precision (0.929), recall (0.838), and mean average precision (0.899) values. Moreover, the model produces results approximately 50% faster in inference time than other instance segmentation models. The proposed model pipeline holds significant applicability in various fields, including urban planning, transportation planning, and urban development.
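Since the full text is paywalled, the pipeline summarized above can only be sketched from the abstract: YOLOv8 supplies classified building boxes, each box prompts SAM for an instance mask, and the masks are georeferenced and vectorized into rooftop footprints. The sketch below is a minimal, hypothetical illustration, not the authors' implementation; the checkpoint names, tile path, class labels, and output file are placeholders, and 8-bit RGB imagery is assumed.

```python
import numpy as np
import rasterio
from rasterio import features
import geopandas as gpd
from shapely.geometry import shape
from ultralytics import YOLO
from segment_anything import sam_model_registry, SamPredictor

# Stage 1 detector: a YOLOv8 model assumed to be fine-tuned on two
# classes (residential / non-residential). The checkpoint name is made up.
detector = YOLO("yolov8_buildings.pt")

# Stage 2 segmenter: a pretrained SAM checkpoint (ViT-B backbone here).
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

# Read a georeferenced tile and keep its transform/CRS for later export.
with rasterio.open("tile.tif") as src:
    image = np.moveaxis(src.read([1, 2, 3]), 0, -1)  # HWC, RGB, uint8
    transform, crs = src.transform, src.crs

# Detect buildings (Ultralytics treats raw numpy input as BGR), then
# hand the same image to SAM as its RGB prompt canvas.
result = detector(image[..., ::-1].copy())[0]
predictor.set_image(image)

records = []
for box, cls in zip(result.boxes.xyxy.cpu().numpy(),
                    result.boxes.cls.cpu().numpy()):
    # Each detection box becomes a SAM box prompt -> one instance mask.
    masks, _, _ = predictor.predict(box=box, multimask_output=False)
    # Trace the binary mask into polygons in map coordinates.
    for geom, val in features.shapes(masks[0].astype("uint8"),
                                     transform=transform):
        if val == 1:
            records.append({"geometry": shape(geom),
                            "class": result.names[int(cls)]})

# Export the classified rooftop footprints as a vector layer.
gpd.GeoDataFrame(records, crs=crs).to_file("rooftops.gpkg", driver="GPKG")
```

Prompting SAM with detector boxes keeps the class decision in the lightweight YOLOv8 stage and delegates mask delineation to SAM, which is consistent with the roughly 50% inference-time advantage the abstract reports over end-to-end instance segmentation models.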

Data availability

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Acknowledgements

We thank the Indian Institute of Technology Kharagpur and the Ranbir and Chitra Gupta School of Infrastructure Design and Management for financial and infrastructure support.

Funding

The authors are thankful to the Indian Institute of Technology Kharagpur for financial and infrastructure support.

Author information

Authors and Affiliations

Authors

Contributions

BHA contributed to formulating the strategy, data collection, technical inputs, funding for the work, and paper writing as the major part of his contribution to this work. AK contributed to data collection and data analysis, with major inputs in paper writing. AB was responsible for the application and analysis of the work. AKG was responsible for formulating the overall analysis and the final writing. All authors have read and agreed to the submitted version of the manuscript.

Corresponding author

Correspondence to Bharath H. Aithal.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest regarding the publication of the paper.

Ethical approval

All authors have read, understood, and have complied as applicable with the statement on "Ethical responsibilities of Authors" as found in the Instructions for Authors and are aware that with minor exceptions, no changes can be made to authorship once the paper is submitted.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (DOCX 19,773 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Cite this article

Khatua, A., Bhattacharya, A., Goswami, A.K. et al. Developing approaches in building classification and extraction with synergy of YOLOV8 and SAM models. Spat. Inf. Res. (2024). https://doi.org/10.1007/s41324-024-00574-0
