International Journal of Computer Vision, Volume 117, Issue 3, pp 290–316

Learning Grammars for Architecture-Specific Facade Parsing

  • Raghudeep Gadde
  • Renaud Marlet
  • Nikos Paragios


Parsing facade images requires an optimal grammar for a given class of buildings. Such a grammar is usually handcrafted by experts. In this paper, we present a novel framework to learn a compact grammar from a set of ground-truth images. To this end, parse trees of ground-truth annotated images are obtained by running existing inference algorithms with a simple, very general grammar. From these parse trees, repeated subtrees are sought and merged together to share derivations and produce a grammar with fewer rules. Furthermore, unsupervised clustering is performed on these rules, so that rules corresponding to the same complex pattern are grouped together, leading to a rich, compact grammar. Experimental validation and comparison with state-of-the-art grammar-based methods on four different datasets show that the learned grammar leads to much faster convergence while producing parsing results that are as accurate as, or more accurate than, those obtained with handcrafted grammars or with grammars learned by other methods. In addition, we release a new dataset of facade images in the Art-deco style and demonstrate the general applicability and potential of the proposed framework.
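The subtree-merging step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes parse trees are given as nested tuples and merges only exactly identical subtrees (a hash-consing scheme), and it does not cover the clustering step. The tree encoding, the split operators `hsplit` and `vsplit`, and the function names are all hypothetical.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical encoding: a parse tree is either a terminal label (str)
# or a tuple (operator, child, child, ...).
Tree = Union[str, Tuple]


def count_internal_nodes(tree: Tree) -> int:
    """Number of rules needed if every internal node gets its own rule."""
    if isinstance(tree, str):
        return 0
    return 1 + sum(count_internal_nodes(child) for child in tree[1:])


def merge_repeated_subtrees(trees: List[Tree]) -> Tuple[Dict[str, Tuple], List[str]]:
    """Return (rules, start_symbols) with one rule per distinct subtree.

    Identical subtrees map to the same nonterminal, so their
    derivations are shared across and within the input trees.
    """
    symbol_of: Dict[Tuple, str] = {}  # rule body -> nonterminal name
    rules: Dict[str, Tuple] = {}      # nonterminal name -> rule body

    def visit(node: Tree) -> str:
        if isinstance(node, str):
            return node  # terminals are kept as-is
        # Children are rewritten first, so equal subtrees yield equal bodies.
        body = (node[0],) + tuple(visit(child) for child in node[1:])
        if body not in symbol_of:
            name = f"N{len(symbol_of)}"  # assumes no terminal is named N<k>
            symbol_of[body] = name
            rules[name] = body
        return symbol_of[body]

    starts = [visit(tree) for tree in trees]
    return rules, starts


# Two toy facades that share the same floor layout.
floor = ("hsplit", "wall", "window", "wall", "window", "wall")
facade_a = ("vsplit", "roof", floor, floor, ("hsplit", "wall", "door", "wall"))
facade_b = ("vsplit", "roof", floor, floor, floor, "shop")

naive = count_internal_nodes(facade_a) + count_internal_nodes(facade_b)
rules, starts = merge_repeated_subtrees([facade_a, facade_b])

print("rules without sharing:", naive)
print("rules with sharing:", len(rules))
for name, body in rules.items():
    print(name, "->", body)
```

In this toy input, the five occurrences of the floor subtree collapse into a single rule, so the eight internal nodes of the two trees are covered by four rules. Exact matching is the simplest possible criterion; grouping rules that are similar but not identical would need a separate clustering step, as the abstract describes.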


Grammar learning · Facade parsing · Subtree isomorphism · Clustering



We thank Prof. Nikos Komodakis for providing the code for LP-based clustering. This work was partly carried out in IMAGINE, a joint research project between Ecole des Ponts ParisTech (ENPC) and the Scientific and Technical Centre for Building (CSTB). It was partly supported by ANR project Semapolis ANR-13-CORD-0003 and the European Research Council Starting Grant ERC-STG-259112.



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Raghudeep Gadde (1)
  • Renaud Marlet (1)
  • Nikos Paragios (2)

  1. Université Paris-Est, LIGM (UMR CNRS 8049), ENPC, Marne-la-Vallée, France
  2. Center for Visual Computing, CentraleSupélec, Inria, Université Paris-Saclay, Châtenay-Malabry, France
