Developing approaches in building classification and extraction with synergy of YOLOV8 and SAM models

Published in Spatial Information Research

Abstract

The ability to extract meaningful information from visual material, such as photographs and videos, has significantly enhanced the potential for object recognition in various disciplines. However, challenges arise in the geospatial domain when features are extracted. Existing approaches primarily focus on remotely sensed images, emphasizing semantic segmentation tasks. This study, in contrast, prioritizes the extraction of buildings as well as the classification of structures into residential and non-residential types using instance segmentation. The proposed model pipeline combines the YOLOv8 detection algorithm with the Segment Anything Model (SAM) to achieve these objectives. The approach outlined in this research produces segmentation outcomes whose evaluation metrics are comparable to those of earlier instance segmentation methods and of the segmentation strategies used to assess building extraction performance. Additionally, the segmentation results are georeferenced using extracted geospatial information, and vector images of the identified building rooftops are generated. The approach demonstrates robustness in segmenting target objects effectively, regardless of diverse characteristics such as shape, size, or orientation. The model pipeline exhibits superior precision (0.929), recall (0.838), and mean average precision (0.899) values. Moreover, the model produces results approximately 50% faster in inference time than other instance segmentation models. The proposed model pipeline holds significant applicability in various fields, including urban planning, transportation planning, and urban development.
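Since the full text is paywalled, the pipeline summarized above can only be sketched from the abstract: YOLOv8 supplies classified building boxes, each box prompts SAM for an instance mask, and the masks are georeferenced and vectorized into rooftop footprints. The sketch below is a minimal, hypothetical illustration, not the authors' implementation; the checkpoint names, tile path, class labels, and output file are placeholders, and 8-bit RGB imagery is assumed.

```python
import numpy as np
import rasterio
from rasterio import features
import geopandas as gpd
from shapely.geometry import shape
from ultralytics import YOLO
from segment_anything import sam_model_registry, SamPredictor

# Stage 1 detector: a YOLOv8 model assumed to be fine-tuned on two
# classes (residential / non-residential). The checkpoint name is made up.
detector = YOLO("yolov8_buildings.pt")

# Stage 2 segmenter: a pretrained SAM checkpoint (ViT-B backbone here).
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

# Read a georeferenced tile and keep its transform/CRS for later export.
with rasterio.open("tile.tif") as src:
    image = np.moveaxis(src.read([1, 2, 3]), 0, -1)  # HWC, RGB, uint8
    transform, crs = src.transform, src.crs

# Detect buildings (Ultralytics treats raw numpy input as BGR), then
# hand the same image to SAM as its RGB prompt canvas.
result = detector(image[..., ::-1].copy())[0]
predictor.set_image(image)

records = []
for box, cls in zip(result.boxes.xyxy.cpu().numpy(),
                    result.boxes.cls.cpu().numpy()):
    # Each detection box becomes a SAM box prompt -> one instance mask.
    masks, _, _ = predictor.predict(box=box, multimask_output=False)
    # Trace the binary mask into polygons in map coordinates.
    for geom, val in features.shapes(masks[0].astype("uint8"),
                                     transform=transform):
        if val == 1:
            records.append({"geometry": shape(geom),
                            "class": result.names[int(cls)]})

# Export the classified rooftop footprints as a vector layer.
gpd.GeoDataFrame(records, crs=crs).to_file("rooftops.gpkg", driver="GPKG")
```

Prompting SAM with detector boxes keeps the class decision in the lightweight YOLOv8 stage and delegates mask delineation to SAM, which is consistent with the roughly 50% inference-time advantage the abstract reports over end-to-end instance segmentation models.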

Data availability

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Acknowledgements

We thank the Indian Institute of Technology Kharagpur and the Ranbir and Chitra Gupta School of Infrastructure Design and Management for financial and infrastructure support.

Funding

The authors are thankful to the Indian Institute of Technology Kharagpur for financial and infrastructure support.

Author information

Authors and Affiliations

Authors

Contributions

BHA contributed to formulating the strategy, data collection, technical inputs, funding for the work, and paper writing as the major part of his contribution to this work. AK contributed to data collection and data analysis, with major inputs in paper writing. AB was responsible for the application and analysis of the work. AKG was responsible for formulating the overall analysis and the final writing. All authors have read and agreed to the submitted version of the manuscript.

Corresponding author

Correspondence to Bharath H. Aithal.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest regarding the publication of the paper.

Ethical approval

All authors have read, understood, and have complied as applicable with the statement on "Ethical responsibilities of Authors" as found in the Instructions for Authors and are aware that with minor exceptions, no changes can be made to authorship once the paper is submitted.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (DOCX 19,773 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Cite this article

Khatua, A., Bhattacharya, A., Goswami, A.K. et al. Developing approaches in building classification and extraction with synergy of YOLOV8 and SAM models. Spat. Inf. Res. (2024). https://doi.org/10.1007/s41324-024-00574-0
