A CNN-Based Approach for Automatic Building Detection and Recognition of Roof Types Using a Single Aerial Image

  • Fatemeh Alidoost
  • Hossein Arefi
Original Article


Automatic detection and reconstruction of buildings have become essential in many remote sensing and computer vision applications. In this paper, the capability of Convolutional Neural Networks (CNNs) is investigated for building detection and roof shape recognition from a single aerial image. The major steps are training dataset generation, model training, image segmentation, building detection and roof shape recognition. First, a CNN is trained to extract urban objects such as trees, roads and buildings. Next, a second trained CNN classifies the roof types into flat, gable and hip shapes. The assessment results demonstrate the effectiveness of the proposed method, with quality rates of approximately 97% and 92% in the detection and recognition steps, respectively.
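The abstract reports "quality rates" without giving the formula. As a minimal sketch, assuming the commonly used object-based definitions (completeness, correctness, and quality = TP / (TP + FP + FN)), the rates could be computed as below. The class name `DetectionCounts`, the function names and the example counts are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectionCounts:
    """Hypothetical container for evaluation counts of one experiment."""
    true_positives: int   # detected objects that match the reference
    false_positives: int  # detected objects with no reference match
    false_negatives: int  # reference objects that were missed


def _ratio(numerator: int, denominator: int) -> float:
    # Return 0.0 for an empty denominator instead of raising.
    return numerator / denominator if denominator else 0.0


def completeness(c: DetectionCounts) -> float:
    """Share of reference objects that were detected (recall)."""
    return _ratio(c.true_positives, c.true_positives + c.false_negatives)


def correctness(c: DetectionCounts) -> float:
    """Share of detections that are correct (precision)."""
    return _ratio(c.true_positives, c.true_positives + c.false_positives)


def quality(c: DetectionCounts) -> float:
    """Assumed quality rate: TP / (TP + FP + FN)."""
    total = c.true_positives + c.false_positives + c.false_negatives
    return _ratio(c.true_positives, total)


if __name__ == "__main__":
    # Made-up counts, purely for illustration.
    counts = DetectionCounts(true_positives=90, false_positives=5, false_negatives=5)
    print(f"completeness: {completeness(counts):.3f}")
    print(f"correctness:  {correctness(counts):.3f}")
    print(f"quality:      {quality(counts):.3f}")
```

Under this definition, quality penalises both missed and spurious detections, so it is never higher than completeness or correctness.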


Convolutional neural network (CNN), Deep learning (DL), 3D modelling, Fine-tuning, Pattern recognition, Selective search


A CNN-based approach for the automatic detection of buildings and roof types in a single aerial image. The automatic detection and reconstruction of buildings has become indispensable in many remote sensing and computer vision applications. This contribution examines the ability of Convolutional Neural Networks (CNNs) to detect buildings and roof shapes in a single image. The most important steps are the creation of training datasets, model training, image segmentation, and building and roof shape recognition. First, a CNN is trained to extract urban objects such as trees, roads and buildings, and the dataset is classified. Next, the roofs are classified into flat, gable and hip roofs using the second trained CNN. The results confirm the success of the proposed method, with classification accuracies of approx. 97% and 92% for building detection and roof shape classification, respectively.



The Vaihingen and Potsdam data sets were provided by the German Society for Photogrammetry, Remote Sensing and Geoinformation (DGPF) (ISPRS 2012; Cramer 2010), which the authors acknowledge.


  1. Alidoost F, Arefi H (2016) Knowledge based 3D building model recognition using convolutional neural networks from lidar and aerial imageries. Int Arch Photogramm Remote Sens Spat Inf Sci XLI-B3:833–840.
  2. Awrangjeb M, Zhang C, Fraser CS (2013) Automatic extraction of building roofs using lidar data and multispectral imagery. ISPRS J Photogramm Remote Sens 83:1–18.
  3. Ballard DH, Brown CM (1982) Computer vision. Prentice-Hall Inc, New Jersey
  4. Benedek C, Descombes X, Zerubia J (2012) Building development monitoring in multitemporal remotely sensed image pairs with stochastic birth-death dynamics. IEEE Trans Pattern Anal Mach Intell 34(1):33–50.
  5. Bengio Y (2009) Learning deep architectures for AI. Found Trends® Mach Learn 2(1):1–127.
  6. Bengio Y (2012) Deep learning of representations for unsupervised and transfer learning. JMLR 27:17–37
  7. Chatfield K, Simonyan K, Vedaldi A, Zisserman A (2014) Return of the devil in the details: delving deep into convolutional nets. Proc B Mach Vision Conf. arXiv:1405.3531
  8. Chen Y, Zhao X, Jia X et al (2015) Spectral-spatial classification of hyperspectral data based on deep belief network. IEEE J Sel Top Appl Earth Obs Remote Sens 8(6):2381–2392.
  9. Cheng L, Gong J, Li M, Liu Y (2011) 3D building model reconstruction from multi-view aerial imagery and lidar data. Photogramm Eng Remote Sens 77(2):125–139.
  10. Cramer M (2010) The DGPF-test on digital airborne camera evaluation – overview and test design. Photogramm Fernerkundung Geoinf 2010:73–82.
  11. Deng L, Yu D (2014) Deep learning: methods and applications. Found Trends® Signal Process 7(3–4):197–387.
  12. Deng J, Dong W, Socher R et al (2009) ImageNet: a large-scale hierarchical image database. In: Proceedings of IEEE conference on computer vision and pattern recognition (CVPR2009). IEEE, Miami, FL, USA, pp 248–255.
  13. Donahue J, Jia Y, Vinyals O et al (2014) DeCAF: a deep convolutional activation feature for generic visual recognition. Proc 31st Int Conf Mach Learn, PMLR 32(1):647–655
  14. Dornaika F, Moujahid A, El Merabet Y, Ruichek Y (2016) Building detection from orthophotos using a machine learning approach: an empirical study on image segmentation and descriptors. Expert Syst Appl 58:130–142.
  15. Dorninger P, Pfeifer N (2008) A comprehensive automated 3D approach for building extraction, reconstruction, and regularization from airborne laser scanning point clouds. Sensors 8:7323–7343.
  16. Felzenszwalb PF, Huttenlocher DP (2004) Efficient graph-based image segmentation. Int J Comp Vision 59(2):167–181.
  17. Gamba P, Houshmand B (2000) Digital surface models and building extraction: a comparison of IFSAR and LIDAR data. IEEE Trans Geosci Remote Sens 38(4):1959–1968.
  18. Ghaffarian S, Ghaffarian S (2014) Automatic building detection based on purposive FastICA (PFICA) algorithm using monocular high resolution google earth images. ISPRS J Photogramm Remote Sens 97:152–159.
  19. Girshick R (2015) Fast R-CNN. In: Proceedings of IEEE international conference on computer vision (ICCV2015). IEEE, Santiago, Chile, pp 1440–1448.
  20. Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of IEEE conference on computer vision and pattern recognition (CVPR2014). IEEE, Columbus, Ohio, pp 580–587.
  21. Girshick R, Donahue J, Darrell T, Malik J (2015) Region-based convolutional networks for accurate object detection and segmentation. IEEE Trans Pattern Anal Mach Intell 38(1):142–158.
  22. Guo L, Chehata N, Mallet C, Boukir S (2011) Relevance of airborne lidar and multispectral image data for urban scene classification using random forests. ISPRS J Photogramm Remote Sens 66:56–66.
  23. Haala N, Brenner C (1999) Extraction of buildings and trees in urban environments. ISPRS J Photogramm Remote Sens 54:130–137.
  24. He K, Gkioxari G, Dollár P, Girshick R (2017) Mask R-CNN. In: Proceedings of IEEE international conference on computer vision (ICCV2017). IEEE, Venice, Italy, pp 2980–2988.
  25. Hermosilla T, Ruiz LA, Recio JA, Estornell J (2011) Evaluation of automatic building detection approaches combining high resolution images and lidar data. Remote Sens 3:1188–1210.
  26. Höfle B, Mücke W, Dutter M, Rutzinger M (2009) Detection of building regions using airborne lidar: a new combination of raster and point cloud based GIS methods. Proc Geoinformatics Forum Salzburg. pp 66–75. Accessed 15 Jan 2017
  27. Huang J, Rathod V, Sun C et al (2017) Speed/accuracy trade-offs for modern convolutional object detectors. In: Proceedings of IEEE conference on computer vision and pattern recognition (CVPR2017). IEEE, Honolulu, HI, USA, pp 3296–3297.
  28. ISPRS (2012) Web site of the ISPRS test project on urban classification and 3D building reconstruction. Accessed 17 Sep 2016
  29. Izadi M, Saeedi P (2012) Three-dimensional polygonal building model estimation from single satellite images. Geosci Remote Sens IEEE Trans 50(6):2254–2272.
  30. Kabolizade M, Ebadi H, Ahmadi S (2010) An improved snake model for automatic extraction of buildings from urban aerial images and lidar data. Comput Environ Urban Syst 34:435–441.
  31. Karantzalos K, Koutsourakis P, Kalisperakis I, Grammatikopoulos L (2015) Model-based building detection from low-cost optical sensors onboard unmanned aerial vehicles. Int Arch Photogramm Remote Sens Spat Inf Sci 40:293–297.
  32. Khurana M, Wadhwa V (2015) Automatic building detection using modified grab cut algorithm from high resolution satellite image. Int J Adv Res Comput Commun Eng 4(8):158–164.
  33. Kim K, Shan J (2011) Building roof modeling from airborne laser scanning data based on level set approach. ISPRS J Photogramm Remote Sens 66:484–497.
  34. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. Proc 25th Int Conf Neural Infor Proc Syst, NIPS’12 1:1097–1105.
  35. LeCun Y, Bengio Y (1995) Convolutional networks for images, speech, and time-series. In: Arbib MA (ed) The handbook of brain theory and neural networks. MIT Press
  36. Li E, Femiani J, Xu S et al (2015) Robust rooftop extraction from visible band images using higher order CRF. IEEE Trans Geosci Remote Sens 53(8):4483–4495.
  37. Liu T, Fang S, Zhao Y et al (2015) Implementation of training convolutional neural networks. arXiv:1506.01195
  38. Liu W, Anguelov D, Erhan D et al (2016) SSD: single shot multibox detector. In: Leibe B, Matas J, Sebe N, Welling M (eds) Computer vision – ECCV 2016. ECCV 2016. Lecture notes in computer science. Springer, Cham
  39. Maas HG, Vosselman G (1999) Two algorithms for extracting building models from raw laser altimetry data. ISPRS J Photogramm Remote Sens 54:153–163.
  40. Maitra DS, Bhattacharya U, Parui SK (2015) CNN based common approach to handwritten character recognition of multiple scripts. In: Proceedings of international conference on document analysis recognition (ICDAR2015). IEEE, Tunis, Tunisia, pp 1021–1025.
  41. Makantasis K, Karantzalos K, Doulamis A, Doulamis N (2015) Deep supervised learning for hyperspectral data classification through convolutional neural networks. IEEE Int Geosci Remote Sens Symp 2015:4959–4962.
  42. Maltezos E, Ioannidis C (2015) Automatic detection of building points from lidar and dense image matching point clouds. ISPRS Ann Photogramm Remote Sens Spat Inf Sci II-3/W5:33–40.
  43. Manno-Kovacs A, Ok AO (2015) Building detection from monocular VHR images by integrated urban area knowledge. IEEE Geosci Remote Sens Lett 12(10):2140–2144.
  44. McGlone JC, Shufelt JA (1994) Projective and object space geometry for monocular building extraction. In: Proceedings of IEEE conference on computer vision and pattern recognition (CVPR94). IEEE, Seattle, WA, USA, pp 54–61.
  45. McKeown DM, Bulwinkle T, Cochran S, Harvey W, McGlone C, Shufelt JA (2000) Performance evaluation for automatic feature extraction. Int Arch Photogramm Remote Sens Spat Inf Sci XXXIII(B2):379–394
  46. Nalani HA (2014) Automatic reconstruction of urban objects from mobile laser scanner data. Dissertation for awarding the academic degree Doktor-Ingenieur. Dresden, Germany
  47. Ok AO, Senaras C, Yuksel B (2013) Automated detection of arbitrarily shaped buildings in complex environments from monocular VHR optical satellite imagery. IEEE Trans Geosci Remote Sens 51(3):1701–1717.
  48. Oztimur Karadag O, Senaras C, Yarman Vural FT (2015) Segmentation fusion for building detection using domain-specific information. IEEE J Sel Top Appl Earth Obs Remote Sens 8(7):3305–3315.
  49. Phung SL, Bouzerdoum A (2009) Matlab library for convolutional neural networks. Technical report, ICT research institute, visual and audio signal processing lab, university of Wollongong. Accessed 15 Aug 2016
  50. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: Proceedings of IEEE conference on computer vision and pattern recognition (CVPR2016). IEEE, Las Vegas, NV, USA, pp 779–788.
  51. Ren S, He K, Girshick R, Sun J (2016) Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 39:1137–1149.
  52. Rottensteiner F, Trinder J, Clode S, Kubik K (2007) Building detection by fusion of airborne laser scanner data and multi-spectral images: performance evaluation and sensitivity analysis. ISPRS J Photogramm Remote Sens 62:135–149.
  53. Saito S, Aoki Y (2015) Building and road detection from large aerial imagery. Proc. SPIE 9405, Image processing: machine vision applications VIII:94050K.
  54. Sampath A, Shan J (2010) Segmentation and reconstruction of polyhedral building roofs from aerial lidar point clouds. IEEE Trans Geosci Remote Sens 48(3):1554–1567.
  55. Schmidhuber J (2015) Deep learning in neural networks: an overview. Neural Networks 61:85–117.
  56. Senaras C, Vural FTY (2016) A self-supervised decision fusion framework for building detection. IEEE J Sel Top Appl Earth Obs Remote Sens 9(5):1780–1791.
  57. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556
  58. Singh G, Jouppi M, Zhang Z, Zakhor A (2015) Shadow based building extraction from single satellite image. Comput Imaging XIII:94010F.
  59. Tuia D, Flamary R, Courty N (2015) Multiclass feature learning for hyperspectral image classification: sparse and hierarchical solutions. ISPRS J Photogramm Remote Sens 105:272–285.
  60. Uijlings JRR, Van De Sande KEA, Gevers T, Smeulders AWM (2013) Selective search for object recognition. Int J Comput Vis 104(2):154–171.
  61. Vakalopoulou M, Karantzalos K, Komodakis N, Paragios N (2015) Building detection in very high resolution multispectral data with deep learning features. In: Proceedings of IEEE international geoscience remote sensing symposium (IGARSS2015). IEEE, Milan, Italy, pp 1873–1876.
  62. Vedaldi A, Lenc K (2015) MatConvNet-Convolutional neural networks for MATLAB. In: Proceedings of the ACM international conference on multimedia. ACM, Brisbane, Australia, pp 689–692.
  63. Von Gioi RG, Jakubowicz J, Morel J-M, Randall G (2010) LSD: a fast line segment detector with a false detection control. IEEE Trans Pattern Anal Mach Intell 32(4):722–732.
  64. Vu TT, Yamazaki F, Matsuoka M (2009) Multi-scale solution for building extraction from lidar and image data. Int J Appl Earth Obs Geoinf 11(4):281–289.
  65. Yu B, Liu H, Wu J et al (2010) Automated derivation of urban building density information using airborne lidar data and object-based method. Landsc Urban Plan 98(3–4):210–219.
  66. Yuan J (2016) Automatic building extraction in aerial scenes using convolutional networks. arXiv:1602.06564. Accessed 15 Jan 2017
  67. Zeiler MD, Fergus R (2014) Visualizing and understanding convolutional networks. Comput vision–ECCV 2014 8689:818–833.
  68. Zhang K, Yan J, Chen SC (2006) Automatic construction of building footpoints from airborne lidar data. IEEE Trans Geosci Remote Sens 44(9):2523–2533.
  69. Zhang Y, Sohn K, Villegas R et al (2015) Improving object detection with deep convolutional networks via bayesian optimization and structured prediction. In: Proceedings of IEEE conference on computer vision and pattern recognition (CVPR2015), Boston, MA, USA, pp 249–258.
  70. Zhang Q, Wang Y, Liu Q et al (2016) CNN based suburban building detection using monocular high resolution google earth images. In: Proceedings of IEEE international geoscience remote sensing symposium (IGARSS2016). IEEE, Beijing, China, pp 661–664.
  71. Zuo Z, Wang G (2014) Learning discriminative hierarchical features for object recognition. IEEE Signal Process Lett 21(9):1159–1163.

Copyright information

© Deutsche Gesellschaft für Photogrammetrie, Fernerkundung und Geoinformation (DGPF) e.V. 2019

Authors and Affiliations

  1. School of Surveying and Geospatial Engineering, College of Engineering, University of Tehran, Tehran, Iran
