A Hierarchical and Contextual Model for Aerial Image Parsing

Porway, Jake; Wang, Qiongchen; Zhu, Song Chun

doi:10.1007/s11263-009-0306-1

Open access
Published: 03 November 2009

Volume 88, pages 254–283, (2010)
International Journal of Computer Vision

Jake Porway, Qiongchen Wang & Song Chun Zhu

2396 Accesses
50 Citations

Abstract

In this paper we present a hierarchical and contextual model for aerial image understanding. Our model organizes objects (cars, roofs, roads, trees, parking lots) in aerial scenes into hierarchical groups whose appearances and configurations are determined by statistical constraints (e.g. relative position and relative scale). Our hierarchy is a non-recursive grammar for objects in aerial images, composed of layers of nodes that can each decompose into a number of different configurations. This allows us to generate and recognize a vast number of scenes with relatively few rules. We present a minimax entropy framework for learning the statistical constraints between objects and show that this learned context allows us to rule out unlikely scene configurations and hallucinate undetected objects during inference. A similar algorithm was proposed for texture synthesis (Zhu et al. in Int. J. Comput. Vis. 2:107–126, 1998) but did not incorporate hierarchical information. We use a range of bottom-up detectors, namely AdaBoost (Freund and Schapire in J. Comput. Syst. Sci. 55, 1997), TextonBoost (Shotton et al. in Proceedings of the European Conference on Computer Vision, pp. 1–15, 2006) and Compositional Boosting (Wu et al. in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8, 2007), to propose locations of objects in new aerial images, and we employ a cluster sampling algorithm, C4 (Porway and Zhu, 2009), to choose the subset of detections that best explains the image according to our learned prior model. The C4 algorithm can quickly and efficiently switch between alternative competing sub-solutions, for example whether an image patch is better explained by a parking lot with cars or by a building with vents. We also show that our model can predict the locations of objects our detectors missed. We conclude by presenting parsed aerial images and experimental results showing that our cluster sampling and top-down prediction algorithms use the learned contextual cues from our model to improve detection results over traditional bottom-up detectors alone.
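The idea of choosing the subset of detections that best explains an image patch can be illustrated with a toy sketch. This is not the C4 algorithm or the learned minimax entropy prior: it replaces cluster sampling with exhaustive search over a handful of detections, and every label, detector score, and context weight below is a hypothetical value chosen for illustration.

```python
from itertools import combinations

# Hypothetical bottom-up detections on one image patch: label -> detector score.
detections = {
    "parking_lot": 0.6,
    "car_1": 0.7,
    "car_2": 0.5,
    "building": 0.65,
    "vent_1": 0.4,
}

# Hypothetical pairwise context terms (not learned from data).
# Positive: the two detections support each other; negative: they compete.
context = {
    ("parking_lot", "car_1"): 0.5,
    ("parking_lot", "car_2"): 0.5,
    ("building", "vent_1"): 0.5,
    ("parking_lot", "building"): -2.0,
    ("building", "car_1"): -1.0,
    ("building", "car_2"): -1.0,
    ("parking_lot", "vent_1"): -1.0,
}


def pair_term(a, b):
    """Look up the symmetric context term for a pair (0.0 if unrelated)."""
    return context.get((a, b), context.get((b, a), 0.0))


def score(subset):
    """Unary evidence (score minus 0.5) plus pairwise context for a subset."""
    unary = sum(detections[d] - 0.5 for d in subset)
    pairwise = sum(pair_term(a, b) for a, b in combinations(subset, 2))
    return unary + pairwise


def best_subset():
    """Exhaustively search all subsets; feasible only for tiny examples."""
    names = sorted(detections)
    candidates = (
        frozenset(c)
        for r in range(len(names) + 1)
        for c in combinations(names, r)
    )
    return max(candidates, key=score)


if __name__ == "__main__":
    print(sorted(best_subset()))
```

With these made-up numbers the "parking lot with cars" explanation outscores "building with vents", even though the building detector fired more strongly than the parking lot detector, because the context terms reward mutually consistent detections. Exhaustive search grows exponentially with the number of detections, which is why a sampling scheme is needed at realistic scale.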

References

Barbu, A., & Zhu, S.-C. (2005). Generalizing Swendsen-Wang to sampling arbitrary posterior probabilities. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27, 1239–1253.
Berg, A., Grabler, F., & Malik, J. (2007). Parsing images of architectural scenes. In IEEE 11th international conference on computer vision.
Chen, H., Xu, Z., Liu, Z., & Zhu, S.-C. (2006). Composite templates for cloth modeling and sketching. In Proceedings of the IEEE conference on computer vision and pattern recognition (Vol. 1, pp. 943–950).
Chi, Z., & Geman, S. (1998). Estimation of probabilistic context-free grammars. Computational Linguistics, 24(2).
Felzenszwalb, P., & Huttenlocher, D. (2005). Pictorial structures for object recognition. International Journal of Computer Vision, 61(1), 55–79.
Fischler, M., & Elschlager, R. (1973). The representation and matching of pictorial structures. IEEE Transactions on Computers, 22(1), 67–92.
Freund, Y., & Schapire, R. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55.
Fu, K. S. (1981). Syntactic pattern recognition and applications. New York: Prentice Hall.
Han, F., & Zhu, S.-C. (2005). Bottom-up and top-down image parsing by attribute graph grammar. In Proceedings of the international conference on computer vision (Vol. 2).
Hinz, S., & Baumgartner, A. (2000). Road extraction in urban areas supported by context objects. International Archives of Photogrammetry and Remote Sensing, 33.
Jin, Y., & Geman, S. (2006). Context and hierarchy in a probabilistic image model. In Proceedings of the IEEE conference on computer vision and pattern recognition (Vol. 2, pp. 2145–2152).
Johnson, M., Geman, S., Canon, S., Chi, Z., & Riezler, S. (1999). Estimators for stochastic unification-based grammars. In Proceedings ACL’99, Maryland.
Keselman, Y., & Dickinson, S. (2001). Generic model abstraction from examples. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27, 1141–1156.
Li, F.-F., & Perona, P. (2005). A Bayesian hierarchical model for learning natural scene categories. In Proceedings of the IEEE conference on computer vision and pattern recognition (Vol. 2, pp. 524–531).
Li, Y., Atmosukarto, I., Kobashi, M., Yuen, J., & Shapiro, L. (2005). Object and event recognition for aerial surveillance. In SPIE—the international society for optical engineering.
Maloof, M. A., Langley, P., Binford, T. O., Nevatia, R., & Sage, S. (2003). Improved rooftop detection in aerial images with machine learning. Machine Learning.
Matsuyama, T., & Hang, V. (1990). Sigma: A framework for image understanding integration of bottom-up and top-down analyses. New York: Plenum.
Moissinac, H., Maître, H., & Bloch, I. (1994). Urban aerial image understanding using symbolic data. In Image and signal processing for remote sensing, proc. SPIE.
Nicolas, B., Viglino, J., & Cocquerez, J. (2000). Knowledge based system for the automatic extraction of road intersections from aerial images. International Archives of Photogrammetry and Remote Sensing.
Ohta, Y. (1985). Knowledge-based interpretation of outdoor natural color scenes. London: Pitman.
Porway, J., & Zhu, S. C. (2009). C4: Stochastic inference on graphical models with positive and negative edges for rapidly exploring competing solutions (Technical Report).
Porway, J., Wang, K., Yao, B., & Zhu, S.-C. (2008). A hierarchical and contextual model for aerial image understanding. In Proceedings of the IEEE conference on computer vision and pattern recognition.
Shotton, J., Winn, J., Rother, C., & Criminisi, A. (2006). TextonBoost: Joint appearance, shape and context modeling for multi-class object recognition and segmentation. In Proceedings of the European conference on computer vision (pp. 1–15).
Siddiqi, K., Shokoufandeh, A., Dickinson, S., & Zucker, S. W. (1999). Shock graphs and shape matching. International Journal of Computer Vision, 35(1), 13–32.
Singhal, A., Luo, J., & Zhu, W. (2003). Probabilistic spatial context models for scene content understanding. In IEEE computer society conference on computer vision and pattern recognition (Vol. 1).
Sivic, J., Russell, B., Efros, A., Zisserman, A., & Freeman, W. (2005). Discovering objects and their location in images. In Tenth IEEE international conference on computer vision.
Sudderth, E. B., Torralba, A., Freeman, W. T., & Willsky, A. S. (2005). Describing visual scenes using transformed Dirichlet processes. In Neural information processing systems.
Swendsen, R., & Wang, J. (1987). Nonuniversal critical dynamics in Monte Carlo simulations. Physical Review Letters.
Todorovic, S., & Ahuja, N. (2006). Extracting subimages of an unknown category from a set of images. In Proceedings of the IEEE conference on computer vision and pattern recognition (Vol. 1, pp. 927–934).
Tu, Z., & Zhu, S.-C. (2002). Image segmentation by data-driven Markov chain Monte Carlo. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(5), 657–673.
Ullman, S., Sali, E., & Vidal, M. (2001). A fragment-based approach to object representation and classification. In Proceedings of the 4th international workshop on visual form.
Vestri, C., & Devernay, F. (2001). Using robust methods for automatic extraction of buildings. In Proceedings of the IEEE conference on computer vision and pattern recognition (Vol. 1).
Viola, P., & Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 511–518).
Wainwright, M., & Jordan, M. (2008). Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1(1), 1–305.
Weber, M., Welling, M., & Perona, P. (2000). Towards automatic discovery of object categories. In Proceedings of the IEEE conference on computer vision and pattern recognition (Vol. 2, pp. 101–108).
Wei, L., & Prinet, V. (2005). Building detection from high-resolution satellite image using probability model. In Geoscience and remote sensing symposium, IGARSS (pp. 25–29).
Wu, T. F., Xia, G. S., & Zhu, S.-C. (2007). Compositional boosting for computing hierarchical image structures. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1–8).
Yao, B., Yang, X., & Zhu, S.-C. (2007). Introduction to a large scale general purpose groundtruth dataset: methodology, annotation tool, and benchmarks. Energy Minimization Methods in Computer Vision and Pattern Recognition, 4697, 169–183.
Zhao, T., & Nevatia, R. (2001). Car detection in low resolution aerial image. In IEEE international conference on computer vision (Vol. 1).
Zhu, S.-C., & Mumford, D. (2006). A stochastic grammar of images. Foundations and Trends in Computer Graphics and Vision, 2(4), 259–362.
Zhu, S.-C., Wu, Y.-N., & Mumford, D. (1998). Frame: Filters, random fields, and minimax entropy towards a unified theory for texture modeling. International Journal of Computer Vision, 2, 107–126.
Zhu, L., Lin, C., Huang, H., Chen, Y., & Yuille, A. (2008). Unsupervised structure learning: Hierarchical recursive composition, suspicious coincidence and competitive exclusion. In Proceedings of the 10th European conference on computer vision: Part II.

Author information

Authors and Affiliations

Department of Statistics, University of California, Los Angeles, USA
Jake Porway, Qiongchen Wang & Song Chun Zhu
Lotus Hill Institute for Computer Vision and Information Science, Ezhou, China
Qiongchen Wang & Song Chun Zhu

Corresponding author

Correspondence to Jake Porway.

Rights and permissions

Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

About this article

Cite this article

Porway, J., Wang, Q. & Zhu, S.C. A Hierarchical and Contextual Model for Aerial Image Parsing. Int J Comput Vis 88, 254–283 (2010). https://doi.org/10.1007/s11263-009-0306-1

Received: 28 July 2008
Accepted: 20 October 2009
Published: 03 November 2009
Issue Date: June 2010
DOI: https://doi.org/10.1007/s11263-009-0306-1
