A CNN-Based Approach for Automatic Building Detection and Recognition of Roof Types Using a Single Aerial Image
Abstract
Automatic detection and reconstruction of buildings have become essential in many remote sensing and computer vision applications. This paper investigates the capability of Convolutional Neural Networks (CNNs) for building detection and roof shape recognition from a single image. The main steps are training dataset generation, model training, image segmentation, building detection and roof shape recognition. First, a CNN is trained to extract urban objects such as trees, roads and buildings. Next, a second trained CNN classifies the roofs into flat, gable and hip shapes. The evaluation results demonstrate the effectiveness of the proposed method, with quality rates of approximately 97% and 92% in the detection and recognition steps, respectively.
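The two-stage design described above can be sketched as a simple cascade: every segment is labelled by the first network, and only segments labelled as buildings are passed to the second network for roof-type classification. This is a minimal sketch under stated assumptions, not the authors' implementation: the two CNNs are replaced by hypothetical callables that return one score per class, the `Segment`/`Result` types and function names are illustrative, and the urban class list is restricted to the three examples the abstract names.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

# Assumption: only the three urban classes named in the abstract are modelled here.
URBAN_CLASSES = ("tree", "road", "building")
ROOF_CLASSES = ("flat", "gable", "hip")


@dataclass
class Segment:
    """One region produced by an image segmentation step (hypothetical type)."""
    segment_id: int
    patch: object  # placeholder for the image patch fed to both networks


@dataclass
class Result:
    segment_id: int
    urban_class: str
    roof_type: Optional[str]  # only set when the segment is a building


def argmax_label(scores: Sequence[float], labels: Sequence[str]) -> str:
    """Return the label with the highest score (first one wins on ties)."""
    if len(scores) != len(labels):
        raise ValueError("score vector length must match number of labels")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best]


def two_stage_pipeline(
    segments: List[Segment],
    urban_cnn: Callable[[object], Sequence[float]],  # stand-in for the first CNN
    roof_cnn: Callable[[object], Sequence[float]],   # stand-in for the second CNN
) -> List[Result]:
    results = []
    for seg in segments:
        urban = argmax_label(urban_cnn(seg.patch), URBAN_CLASSES)
        # Stage two runs only on segments that stage one accepted as buildings.
        roof = argmax_label(roof_cnn(seg.patch), ROOF_CLASSES) if urban == "building" else None
        results.append(Result(seg.segment_id, urban, roof))
    return results


if __name__ == "__main__":
    # Hypothetical stub "networks": fixed score vectors keyed by a toy patch id.
    fake_urban = {"a": [0.1, 0.2, 0.7], "b": [0.8, 0.1, 0.1]}
    fake_roof = {"a": [0.2, 0.1, 0.7], "b": [0.3, 0.3, 0.4]}
    segs = [Segment(1, "a"), Segment(2, "b")]
    for r in two_stage_pipeline(segs, fake_urban.__getitem__, fake_roof.__getitem__):
        print(r)
```

Because the roof classifier only sees segments that the first stage accepts as buildings, a building missed in stage one never receives a roof type, so detection errors propagate into the recognition step.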
Keywords
Convolutional neural network (CNN), Deep learning (DL), 3D modelling, Fine-tuning, Pattern recognition, Selective search
German Abstract
A CNN-based approach for the automatic detection of buildings and roof types in a single aerial image. The automatic detection and reconstruction of buildings has become indispensable in many remote sensing and computer vision applications. This paper investigates the ability of convolutional neural networks (CNNs) to detect buildings and roof shapes in a single image. The main steps are training dataset generation, model training, image segmentation, and building and roof shape detection. First, a CNN is trained to extract urban objects such as trees, roads and buildings, and the dataset is classified. Next, the roofs are classified into flat, gable and hip roofs using the second trained CNN. The results demonstrate the success of the proposed method, with approximately 97% and 92% classification accuracy for building detection and roof shape classification, respectively.
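Neither abstract defines how the reported percentages are computed. Building-detection studies commonly evaluate results with completeness, correctness and quality derived from true positive (TP), false positive (FP) and false negative (FN) counts, where quality = TP / (TP + FP + FN). The sketch below assumes that definition; the function name and the counts in the example are hypothetical and do not come from the paper.

```python
from typing import NamedTuple


class DetectionScores(NamedTuple):
    completeness: float  # TP / (TP + FN): share of reference objects that were found
    correctness: float   # TP / (TP + FP): share of detections that are correct
    quality: float       # TP / (TP + FP + FN): penalises both error types at once


def _ratio(num: int, den: int) -> float:
    # Return 0.0 for an empty evaluation instead of dividing by zero.
    return num / den if den else 0.0


def detection_scores(tp: int, fp: int, fn: int) -> DetectionScores:
    """Compute completeness, correctness and quality from TP/FP/FN counts."""
    if min(tp, fp, fn) < 0:
        raise ValueError("counts must be non-negative")
    return DetectionScores(
        completeness=_ratio(tp, tp + fn),
        correctness=_ratio(tp, tp + fp),
        quality=_ratio(tp, tp + fp + fn),
    )


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the arithmetic.
    scores = detection_scores(tp=90, fp=5, fn=5)
    print(f"completeness={scores.completeness:.3f} "
          f"correctness={scores.correctness:.3f} quality={scores.quality:.3f}")
    # prints: completeness=0.947 correctness=0.947 quality=0.900
```

For example, 90 correctly detected buildings with 5 false detections and 5 missed buildings give a quality of 90 / 100 = 0.9. Quality can never exceed completeness or correctness, because its denominator counts both kinds of error.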