A CNN-Based Approach for Automatic Building Detection and Recognition of Roof Types Using a Single Aerial Image

  • Fatemeh Alidoost
  • Hossein Arefi
Original Article

Abstract

Automatic detection and reconstruction of buildings have become essential in many remote sensing and computer vision applications. In this paper, the capability of Convolutional Neural Networks (CNNs) for building detection and roof shape recognition from a single image is investigated. The major steps are training dataset generation, model training, image segmentation, building detection and roof shape recognition. First, a CNN is trained to extract urban objects such as trees, roads and buildings. Next, a second trained CNN classifies the different roof types into flat, gable and hip shapes. The assessment results demonstrate the effectiveness of the proposed method, with quality rates of approximately 97% and 92% in the detection and recognition steps, respectively.
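The abstract outlines a cascade in which a first CNN labels urban objects (trees, roads, buildings) and a second CNN assigns a roof type, with results reported as "quality rates". The Python sketch below illustrates only that control flow and the evaluation arithmetic; it is not the authors' implementation. Assumptions: the region dictionaries and their feature names (`vegetation`, `compactness`, `ridges`) are hypothetical, `toy_object_cnn` and `toy_roof_cnn` are rule-based stand-ins rather than trained networks, and "quality rate" is assumed to follow the widely used definition quality = TP / (TP + FP + FN), alongside completeness = TP / (TP + FN) and correctness = TP / (TP + FP).

```python
"""Sketch of a two-stage detect-then-classify cascade plus common
building-detection evaluation measures. The classifiers are hypothetical
rule-based stand-ins for trained CNNs, used only so the sketch runs."""
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Region = Dict[str, float]             # hypothetical per-region feature dict
Classifier = Callable[[Region], str]  # stand-in for a trained CNN's prediction


@dataclass
class Detection:
    region_id: int
    object_class: str          # stage 1 label: "building", "tree" or "road"
    roof_type: Optional[str]   # stage 2 label ("flat", "gable", "hip") or None


def two_stage_cascade(regions: List[Region],
                      object_cnn: Classifier,
                      roof_cnn: Classifier) -> List[Detection]:
    """Label every region with stage 1; run stage 2 only on 'building' regions."""
    detections = []
    for i, region in enumerate(regions):
        label = object_cnn(region)
        roof = roof_cnn(region) if label == "building" else None
        detections.append(Detection(i, label, roof))
    return detections


def completeness(tp: int, fn: int) -> float:
    """Share of reference objects that were detected: TP / (TP + FN)."""
    return tp / (tp + fn)


def correctness(tp: int, fp: int) -> float:
    """Share of detections that match a reference object: TP / (TP + FP)."""
    return tp / (tp + fp)


def quality(tp: int, fp: int, fn: int) -> float:
    """Quality penalises both error types: TP / (TP + FP + FN)."""
    denom = tp + fp + fn
    if denom == 0:
        raise ValueError("quality is undefined when TP + FP + FN == 0")
    return tp / denom


# Hypothetical stand-ins (NOT CNNs): thresholds on made-up features.
def toy_object_cnn(region: Region) -> str:
    if region.get("vegetation", 0.0) > 0.5:   # mostly vegetated -> tree
        return "tree"
    if region.get("compactness", 0.0) > 0.6:  # compact blob -> building
        return "building"
    return "road"                             # elongated remainder -> road


def toy_roof_cnn(region: Region) -> str:
    ridges = int(region.get("ridges", 0))     # made-up count of ridge lines
    return {0: "flat", 1: "gable"}.get(ridges, "hip")


if __name__ == "__main__":
    regions = [{"vegetation": 0.8},
               {"compactness": 0.9, "ridges": 1},
               {"compactness": 0.1},
               {"compactness": 0.8, "ridges": 4},
               {"compactness": 0.7}]
    for det in two_stage_cascade(regions, toy_object_cnn, toy_roof_cnn):
        print(det)
    # Made-up counts, not results from the paper.
    print(round(quality(tp=45, fp=3, fn=2), 2))  # 0.9
```

Running the roof classifier only on regions labelled `building` reflects the order of steps in the abstract; in practice the two callables would wrap trained CNNs and the regions would come from the segmentation step.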

Keywords

Convolutional neural network (CNN) · Deep learning (DL) · 3D modelling · Fine-tuning · Pattern recognition · Selective search

Zusammenfassung (German abstract)

A CNN-based approach for the automatic detection of buildings and roof types in a single aerial image. The automatic detection and reconstruction of buildings has become indispensable in many remote sensing and computer vision applications. This paper investigates the ability of convolutional neural networks (CNNs) to detect buildings and roof shapes in a single image. The main steps are training dataset generation, model training, image segmentation, and building and roof shape recognition. First, a CNN is trained to extract urban objects such as trees, roads and buildings, and the dataset is classified. Next, the roofs are classified into flat, gable and hip roofs using the second trained CNN. The results demonstrate the success of the proposed method, with classification accuracies of approximately 97% and 92% for building detection and roof shape classification, respectively.

Notes

Acknowledgements

The authors acknowledge the German Society for Photogrammetry, Remote Sensing and Geoinformation (DGPF) for providing the Vaihingen and Potsdam data sets (ISPRS 2012; Cramer 2010).

References

  1. Alidoost F, Arefi H (2016) Knowledge based 3D building model recognition using convolutional neural networks from lidar and aerial imageries. Int Arch Photogramm Remote Sens Spat Inf Sci XLI-B3:833–840. https://doi.org/10.5194/isprsarchives-xli-b3-833-2016
  2. Awrangjeb M, Zhang C, Fraser CS (2013) Automatic extraction of building roofs using lidar data and multispectral imagery. ISPRS J Photogramm Remote Sens 83:1–18. https://doi.org/10.1016/j.isprsjprs.2013.05.006
  3. Ballard DH, Brown CM (1982) Computer vision. Prentice-Hall Inc, New Jersey
  4. Benedek C, Descombes X, Zerubia J (2012) Building development monitoring in multitemporal remotely sensed image pairs with stochastic birth-death dynamics. IEEE Trans Pattern Anal Mach Intell 34(1):33–50. https://doi.org/10.1109/TPAMI.2011.94
  5. Bengio Y (2009) Learning deep architectures for AI. Found Trends® Mach Learn 2(1):1–127. https://doi.org/10.1561/2200000006
  6. Bengio Y (2012) Deep learning of representations for unsupervised and transfer learning. JMLR 27:17–37
  7. Chatfield K, Simonyan K, Vedaldi A, Zisserman A (2014) Return of the devil in the details: delving deep into convolutional nets. Proc Br Mach Vis Conf. arXiv:1405.3531
  8. Chen Y, Zhao X, Jia X et al (2015) Spectral-spatial classification of hyperspectral data based on deep belief network. IEEE J Sel Top Appl Earth Obs Remote Sens 8(6):2381–2392. https://doi.org/10.1109/JSTARS.2015.2388577
  9. Cheng L, Gong J, Li M, Liu Y (2011) 3D building model reconstruction from multi-view aerial imagery and lidar data. Photogramm Eng Remote Sens 77(2):125–139. https://doi.org/10.14358/PERS.77.2.125
  10. Cramer M (2010) The DGPF-test on digital airborne camera evaluation – overview and test design. Photogramm Fernerkundung Geoinf 2010:73–82. https://doi.org/10.1127/1432-8364/2010/0041
  11. Deng L, Yu D (2014) Deep learning: methods and applications. Found Trends® Signal Process 7(3–4):197–387. https://doi.org/10.1561/2000000039
  12. Deng J, Dong W, Socher R et al (2009) ImageNet: a large-scale hierarchical image database. In: Proceedings of IEEE conference on computer vision and pattern recognition (CVPR2009). IEEE, Miami, FL, USA, pp 248–255. https://doi.org/10.1109/CVPR.2009.5206848
  13. Donahue J, Jia Y, Vinyals O et al (2014) DeCAF: a deep convolutional activation feature for generic visual recognition. Proc 31st Int Conf Mach Learn, PMLR 32(1):647–655
  14. Dornaika F, Moujahid A, El Merabet Y, Ruichek Y (2016) Building detection from orthophotos using a machine learning approach: an empirical study on image segmentation and descriptors. Expert Syst Appl 58:130–142. https://doi.org/10.1016/j.eswa.2016.03.024
  15. Dorninger P, Pfeifer N (2008) A comprehensive automated 3D approach for building extraction, reconstruction, and regularization from airborne laser scanning point clouds. Sensors 8:7323–7343. https://doi.org/10.3390/s8117323
  16. Felzenszwalb PF, Huttenlocher DP (2004) Efficient graph-based image segmentation. Int J Comp Vision 59(2):167–181. https://doi.org/10.1023/B:VISI.0000022288.19776.77
  17. Gamba P, Houshmand B (2000) Digital surface models and building extraction: a comparison of IFSAR and LIDAR data. IEEE Trans Geosci Remote Sens 38(4):1959–1968. https://doi.org/10.1109/36.851777
  18. Ghaffarian S, Ghaffarian S (2014) Automatic building detection based on purposive FastICA (PFICA) algorithm using monocular high resolution google earth images. ISPRS J Photogramm Remote Sens 97:152–159. https://doi.org/10.1016/j.isprsjprs.2014.08.017
  19. Girshick R (2015) Fast R-CNN. In: Proceedings of IEEE international conference on computer vision (ICCV2015). IEEE, Santiago, Chile, pp 1440–1448. https://doi.org/10.1109/ICCV.2015.169
  20. Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of IEEE conference on computer vision and pattern recognition (CVPR2014). IEEE, Columbus, Ohio, pp 580–587. https://doi.org/10.1109/CVPR.2014.81
  21. Girshick R, Donahue J, Darrell T, Malik J (2015) Region-based convolutional networks for accurate object detection and segmentation. IEEE Trans Pattern Anal Mach Intell 38(1):142–158. https://doi.org/10.1109/TPAMI.2015.2437384
  22. Guo L, Chehata N, Mallet C, Boukir S (2011) Relevance of airborne lidar and multispectral image data for urban scene classification using random forests. ISPRS J Photogramm Remote Sens 66:56–66. https://doi.org/10.1016/j.isprsjprs.2010.08.007
  23. Haala N, Brenner C (1999) Extraction of buildings and trees in urban environments. ISPRS J Photogramm Remote Sens 54:130–137. https://doi.org/10.1016/S0924-2716(99)00010-6
  24. He K, Gkioxari G, Dollár P, Girshick R (2017) Mask R-CNN. In: Proceedings of IEEE international conference on computer vision (ICCV2017). IEEE, Venice, Italy, pp 2980–2988. https://doi.org/10.1109/ICCV.2017.322
  25. Hermosilla T, Ruiz LA, Recio JA, Estornell J (2011) Evaluation of automatic building detection approaches combining high resolution images and lidar data. Remote Sens 3:1188–1210. https://doi.org/10.3390/rs3061188
  26. Höfle B, Mücke W, Dutter M, Rutzinger M (2009) Detection of building regions using airborne lidar: a new combination of raster and point cloud based GIS methods. Proc Geoinformatics Forum Salzburg. pp 66–75. https://ezproxy2.utwente.nl/login?url=https://webapps.itc.utwente.nl/library/2009/chap/rutzinger_det.pdf. Accessed 15 Jan 2017
  27. Huang J, Rathod V, Sun C et al (2017) Speed/accuracy trade-offs for modern convolutional object detectors. In: Proceedings of IEEE conference on computer vision and pattern recognition (CVPR2017). IEEE, Honolulu, HI, USA, pp 3296–3297. https://doi.org/10.1109/CVPR.2017.351
  28. ISPRS (2012) Web site of the ISPRS test project on urban classification and 3D building reconstruction. Available at http://www2.isprs.org/commissions/comm3/wg4/detection-and-reconstruction.html. Accessed 17 Sep 2016
  29. Izadi M, Saeedi P (2012) Three-dimensional polygonal building model estimation from single satellite images. IEEE Trans Geosci Remote Sens 50(6):2254–2272. https://doi.org/10.1109/TGRS.2011.2172995
  30. Kabolizade M, Ebadi H, Ahmadi S (2010) An improved snake model for automatic extraction of buildings from urban aerial images and lidar data. Comput Environ Urban Syst 34:435–441. https://doi.org/10.1016/j.compenvurbsys.2010.04.006
  31. Karantzalos K, Koutsourakis P, Kalisperakis I, Grammatikopoulos L (2015) Model-based building detection from low-cost optical sensors onboard unmanned aerial vehicles. Int Arch Photogramm Remote Sens Spat Inf Sci 40:293–297. https://doi.org/10.5194/isprsarchives-xl-1-w4-293-2015
  32. Khurana M, Wadhwa V (2015) Automatic building detection using modified grab cut algorithm from high resolution satellite image. Int J Adv Res Comput Commun Eng 4(8):158–164. https://doi.org/10.17148/IJARCCE.2015.4833
  33. Kim K, Shan J (2011) Building roof modeling from airborne laser scanning data based on level set approach. ISPRS J Photogramm Remote Sens 66:484–497. https://doi.org/10.1016/j.isprsjprs.2011.02.007
  34. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. Proc 25th Int Conf Neural Inf Process Syst, NIPS’12 1:1097–1105
  35. LeCun Y, Bengio Y (1995) Convolutional networks for images, speech, and time-series. In: Arbib MA (ed) The handbook of brain theory and neural networks. MIT Press
  36. Li E, Femiani J, Xu S et al (2015) Robust rooftop extraction from visible band images using higher order CRF. IEEE Trans Geosci Remote Sens 53(8):4483–4495. https://doi.org/10.1109/TGRS.2015.2400462
  37. Liu T, Fang S, Zhao Y et al (2015) Implementation of training convolutional neural networks. arXiv:1506.01195
  38. Liu W, Anguelov D, Erhan D et al (2016) SSD: single shot multibox detector. In: Leibe B, Matas J, Sebe N, Welling M (eds) Computer vision – ECCV 2016. ECCV 2016. Lecture notes in computer science. Springer, Cham
  39. Maas HG, Vosselman G (1999) Two algorithms for extracting building models from raw laser altimetry data. ISPRS J Photogramm Remote Sens 54:153–163. https://doi.org/10.1016/S0924-2716(99)00004-0
  40. Maitra DS, Bhattacharya U, Parui SK (2015) CNN based common approach to handwritten character recognition of multiple scripts. In: Proceedings of international conference on document analysis recognition (ICDAR2015). IEEE, Tunis, Tunisia, pp 1021–1025. https://doi.org/10.1109/icdar.2015.7333916
  41. Makantasis K, Karantzalos K, Doulamis A, Doulamis N (2015) Deep supervised learning for hyperspectral data classification through convolutional neural networks. IEEE Int Geosci Remote Sens Symp 2015:4959–4962. https://doi.org/10.1109/IGARSS.2015.7326945
  42. Maltezos E, Ioannidis C (2015) Automatic detection of building points from lidar and dense image matching point clouds. ISPRS Ann Photogramm Remote Sens Spat Inf Sci II-3/W5:33–40. https://doi.org/10.5194/isprsannals-ii-3-w5-33-2015
  43. Manno-Kovacs A, Ok AO (2015) Building detection from monocular VHR images by integrated urban area knowledge. IEEE Geosci Remote Sens Lett 12(10):2140–2144. https://doi.org/10.1109/LGRS.2015.2452962
  44. McGlone JC, Shufelt JA (1994) Projective and object space geometry for monocular building extraction. In: Proceedings of IEEE conference on computer vision and pattern recognition (CVPR94). IEEE, Seattle, WA, USA, pp 54–61. https://doi.org/10.1109/CVPR.1994.323810
  45. McKeown DM, Bulwinkle T, Cochran S, Harvey W, McGlone C, Shufelt JA (2000) Performance evaluation for automatic feature extraction. Int Arch Photogramm Remote Sens Spat Inf Sci XXXIII(B2):379–394
  46. Nalani HA (2014) Automatic reconstruction of urban objects from mobile laser scanner data. Dissertation for awarding the academic degree Doktor-Ingenieur. Dresden, Germany
  47. Ok AO, Senaras C, Yuksel B (2013) Automated detection of arbitrarily shaped buildings in complex environments from monocular VHR optical satellite imagery. IEEE Trans Geosci Remote Sens 51(3):1701–1717. https://doi.org/10.1109/TGRS.2012.2207123
  48. Oztimur Karadag O, Senaras C, Yarman Vural FT (2015) Segmentation fusion for building detection using domain-specific information. IEEE J Sel Top Appl Earth Obs Remote Sens 8(7):3305–3315. https://doi.org/10.1109/JSTARS.2015.2403617
  49. Phung SL, Bouzerdoum A (2009) Matlab library for convolutional neural networks. Technical report, ICT research institute, visual and audio signal processing lab, university of Wollongong. https://www.uow.edu.au/~phung. Accessed 15 Aug 2016
  50. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: Proceedings of IEEE conference on computer vision and pattern recognition (CVPR2016). IEEE, Las Vegas, NV, USA, pp 779–788. https://doi.org/10.1109/CVPR.2016.91
  51. Ren S, He K, Girshick R, Sun J (2016) Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 39:1137–1149. https://doi.org/10.1109/TPAMI.2016.2577031
  52. Rottensteiner F, Trinder J, Clode S, Kubik K (2007) Building detection by fusion of airborne laser scanner data and multi-spectral images: performance evaluation and sensitivity analysis. ISPRS J Photogramm Remote Sens 62:135–149. https://doi.org/10.1016/j.isprsjprs.2007.03.001
  53. Saito S, Aoki Y (2015) Building and road detection from large aerial imagery. Proc. SPIE 9405, Image processing: machine vision applications VIII:94050K. https://doi.org/10.1117/12.2083273
  54. Sampath A, Shan J (2010) Segmentation and reconstruction of polyhedral building roofs from aerial lidar point clouds. IEEE Trans Geosci Remote Sens 48(3):1554–1567. https://doi.org/10.1109/TGRS.2009.2030180
  55. Schmidhuber J (2015) Deep Learning in neural networks: an overview. Neural Networks 61:85–117. https://doi.org/10.1016/j.neunet.2014.09.003
  56. Senaras C, Vural FTY (2016) A self-supervised decision fusion framework for building detection. IEEE J Sel Top Appl Earth Obs Remote Sens 9(5):1780–1791. https://doi.org/10.1109/JSTARS.2015.2463118
  57. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556
  58. Singh G, Jouppi M, Zhang Z, Zakhor A (2015) Shadow based building extraction from single satellite image. Comput Imaging XIII:94010F. https://doi.org/10.1117/12.2083500
  59. Tuia D, Flamary R, Courty N (2015) Multiclass feature learning for hyperspectral image classification: sparse and hierarchical solutions. ISPRS J Photogramm Remote Sens 105:272–285. https://doi.org/10.1016/j.isprsjprs.2015.01.006
  60. Uijlings JRR, Van De Sande KEA, Gevers T, Smeulders AWM (2013) Selective search for object recognition. Int J Comput Vis 104(2):154–171. https://doi.org/10.1007/s11263-013-0620-5
  61. Vakalopoulou M, Karantzalos K, Komodakis N, Paragios N (2015) Building detection in very high resolution multispectral data with deep learning features. In: Proceedings of IEEE international geoscience remote sensing symposium (IGARSS2015). IEEE, Milan, Italy, pp 1873–1876. https://doi.org/10.1109/igarss.2015.7326158
  62. Vedaldi A, Lenc K (2015) MatConvNet-Convolutional neural networks for MATLAB. In: Proceedings of the ACM international conference on multimedia. ACM, Brisbane, Australia, pp 689–692. https://doi.org/10.1145/2733373.2807412
  63. Von Gioi RG, Jakubowicz J, Morel J-M, Randall G (2010) LSD: a fast line segment detector with a false detection control. IEEE Trans Pattern Anal Mach Intell 32(4):722–732. https://doi.org/10.1109/TPAMI.2008.300
  64. Vu TT, Yamazaki F, Matsuoka M (2009) Multi-scale solution for building extraction from lidar and image data. Int J Appl Earth Obs Geoinf 11(4):281–289. https://doi.org/10.1016/j.jag.2009.03.005
  65. Yu B, Liu H, Wu J et al (2010) Automated derivation of urban building density information using airborne lidar data and object-based method. Landsc Urban Plan 98(3–4):210–219. https://doi.org/10.1016/j.landurbplan.2010.08.004
  66. Yuan J (2016) Automatic building extraction in aerial scenes using convolutional networks. http://jiangyeyuan.com/bldgExt.html. arXiv:1602.06564. Accessed 15 Jan 2017
  67. Zeiler MD, Fergus R (2014) Visualizing and understanding convolutional networks. Comput vision–ECCV 2014 8689:818–833. https://doi.org/10.1007/978-3-319-10590-1_53
  68. Zhang K, Yan J, Chen SC (2006) Automatic construction of building footprints from airborne lidar data. IEEE Trans Geosci Remote Sens 44(9):2523–2533. https://doi.org/10.1109/TGRS.2006.874137
  69. Zhang Y, Sohn K, Villegas R et al (2015) Improving object detection with deep convolutional networks via bayesian optimization and structured prediction. In: Proceedings of IEEE conference on computer vision and pattern recognition (CVPR2015), Boston, MA, USA, pp 249–258. https://doi.org/10.1109/cvpr.2015.7298621
  70. Zhang Q, Wang Y, Liu Q et al (2016) CNN based suburban building detection using monocular high resolution google earth images. In: Proceedings of IEEE international geoscience remote sensing symposium (IGARSS2016). IEEE, Beijing, China, pp 661–664. https://doi.org/10.1109/IGARSS.2016.7729166
  71. Zuo Z, Wang G (2014) Learning discriminative hierarchical features for object recognition. IEEE Signal Process Lett 21(9):1159–1163. https://doi.org/10.1109/LSP.2014.2298888

Copyright information

© Deutsche Gesellschaft für Photogrammetrie, Fernerkundung und Geoinformation (DGPF) e.V. 2019

Authors and Affiliations

  1. School of Surveying and Geospatial Engineering, College of Engineering, University of Tehran, Tehran, Iran