An applet for the Gabor similarity scaling of the differences between complex stimuli
It is widely accepted that after the first cortical visual area, V1, a series of stages achieves a representation of complex shapes, such as faces and objects, so that they can be understood and recognized. A major challenge for the study of complex shape perception has been the lack of a principled basis for scaling the physical differences between stimuli so that their similarity can be specified, unconfounded by early-stage differences. Without such a specification, it is difficult to make sound inferences about the contributions of later stages to neural activity or psychophysical performance. We describe a Web-based app, built on the von der Malsburg Gabor-jet model (Lades et al., 1993), that allows easy specification of the V1 similarity of pairs of stimuli, no matter how intricate. The model predicts the psychophysical discriminability of metrically varying faces and complex blobs almost perfectly (Yue, Biederman, Mangini, von der Malsburg, & Amir, 2012), and it serves as the input stage of a large family of contemporary neurocomputational models of vision.
Keywords: 2-D shape and form; Similarity; Face perception
Consider the problem of determining whether people or monkeys are more sensitive to differences in nonaccidental properties (NAPs), such as whether a contour is straight or curved, than to differences in metric properties (MPs), such as differences in degree of curvature. If we assume that sensitivity to differences in NAPs arises at a stage in the ventral pathway later than V1, how can the physical properties of the stimuli be selected in a principled manner, so that the comparisons are not confounded with differences in V1 activation? The same methodological problem arises if an investigator wishes to determine whether observers are more sensitive to differences in facial expression than to differences in identity (or sex, or orientation in depth, etc.). The problem arises not only in the psychophysical scaling of stimuli, but also in studies designed to reflect the underlying neural correlates more directly, such as fMRI fast-adaptation designs and single-unit recordings. It can be argued that this problem of scaling shape similarity has been a major reason why, despite shape being the major input into visual cognition, the rigorous study of shape perception has clearly lagged behind the study of other perceptual attributes, such as color, motion, or stereo.
The value of an intuitive implementation of the Gabor-jet model
Despite the utility of such a scaling system, the Gabor-jet model is somewhat mathematically dense and cumbersome to explain to the uninitiated, which diminishes its accessibility. Here, we introduce a Web-based applet designed to provide an engaging, graphically oriented guided tour of the model. The applet allows users to upload their own images, observe the transformations and computations made by the algorithm, customize the visualization of different processes, and retrieve a ranking of dissimilarity values for pairs of images. Such interactive experiences can be valuable in fostering an understanding of otherwise challenging methodologies, making this tool accessible to a broad range of users. Because almost all contemporary neurocomputational models of vision assume some form of Gabor filtering as their input stage, an understanding of the Gabor-jet model also provides an introduction to the first stage of a larger family of computer vision approaches, including GIST (Oliva & Torralba, 2001), HMAX (Riesenhuber & Poggio, 1999), and the recently popular convolutional neural network (CNN) approaches (e.g., Krizhevsky, Sutskever, & Hinton, 2012). Of course, the model also lends itself to frivolous applications. When Suri, the daughter of Tom Cruise and Katie Holmes, was a toddler, a (much too) lively debate raged as to which parent Suri most resembled. People Magazine asked author I.B. to weigh in with the model’s choice (http://celebritybabies.people.com/2006/09/19/who_does_suri_r/).
An applet for the Gabor-jet model
The Gabor-jet model (Lades et al., 1993) is designed to capture the response properties of simple cells in V1 hypercolumns, whose receptive field spatial profiles can be described by two-dimensional Gabor functions (De Valois & De Valois, 1990; Jones & Palmer, 1987; Ringach, 2002). Gabor modeling of cell tuning in early visual cortex has also enjoyed great success in other computational models of visual processing (Kay, Naselaris, Prenger, & Gallant, 2008; Serre & Riesenhuber, 2004). By representing image inputs as feature vectors derived from convolution with Gabor filters, the Gabor-jet model can be used to compute a single value that represents the similarity of two images with respect to V1 cell filtering. These values have been shown to almost perfectly predict psychophysical similarity in discriminating metrically varying, complex visual stimuli such as faces and blobs (resembling teeth; Yue, Biederman, Mangini, von der Malsburg, & Amir, 2012). Under the assumption that V1 captures metric variation, sensitivity to the “qualitative” differences between complex stimuli, such as nonaccidental (i.e., viewpoint-invariant) properties (NAPs) versus metric (i.e., viewpoint-dependent) properties (MPs), or to differences in facial identity versus expression (which are presumably rendered explicit in later stages) can be more rigorously evaluated.
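To make the computation concrete, here is a minimal sketch in Python (standard library only) of a Gabor-jet dissimilarity. It assumes the DC-free complex kernel and half-octave wave-vector spacing commonly associated with Lades et al. (1993): σ = 2π, k_v = π·2^(−(v+2)/2), and orientations μπ/8. The function names, the use of 3 scales (rather than 5, for speed), the 8-pixel sampling grid, the truncation of each kernel at about three envelope widths, the zero padding at the image border, and the Euclidean distance between concatenated response magnitudes are all illustrative assumptions; they are not taken from the applet or from the MATLAB code.

```python
import cmath
import math


def gabor_kernel(k, phi, sigma=2 * math.pi):
    """Complex Gabor kernel in the DC-free form associated with Lades et al. (1993).

    Returns {(dx, dy): weight}. k is the spatial frequency in radians per pixel and
    phi the orientation of the wave vector; sigma = 2*pi is an assumed default.
    """
    kx, ky = k * math.cos(phi), k * math.sin(phi)
    dc = math.exp(-sigma ** 2 / 2)        # makes the continuous integral exactly zero
    half = math.ceil(3 * sigma / k)       # truncate at ~3 envelope widths (assumption)
    kernel = {}
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            envelope = (k * k / sigma ** 2) * math.exp(
                -k * k * (dx * dx + dy * dy) / (2 * sigma ** 2))
            kernel[(dx, dy)] = envelope * (cmath.exp(1j * (kx * dx + ky * dy)) - dc)
    return kernel


def jet(image, x, y, kernels):
    """Response magnitudes of every kernel centered on pixel (x, y).

    Pixels outside the image count as 0 (zero padding, an assumption).
    """
    h, w = len(image), len(image[0])
    magnitudes = []
    for kernel in kernels:
        total = 0j
        for (dx, dy), weight in kernel.items():
            px, py = x + dx, y + dy
            if 0 <= px < w and 0 <= py < h:
                total += image[py][px] * weight
        magnitudes.append(abs(total))
    return magnitudes


def gabor_jet_dissimilarity(img_a, img_b, grid_step=8, n_scales=3, n_orient=8):
    """Euclidean distance between concatenated jet magnitudes on a regular grid."""
    kernels = [gabor_kernel(math.pi * 2 ** (-(v + 2) / 2), math.pi * mu / n_orient)
               for v in range(n_scales) for mu in range(n_orient)]
    h, w = len(img_a), len(img_a[0])
    total = 0.0
    for y in range(grid_step // 2, h, grid_step):
        for x in range(grid_step // 2, w, grid_step):
            ja, jb = jet(img_a, x, y, kernels), jet(img_b, x, y, kernels)
            total += sum((a - b) ** 2 for a, b in zip(ja, jb))
    return math.sqrt(total)


# Usage: a 16 x 16 checkerboard compared with itself and with vertical stripes.
checker = [[float((x // 4 + y // 4) % 2) for x in range(16)] for y in range(16)]
stripes = [[float(x // 4 % 2) for x in range(16)] for _ in range(16)]
d_same = gabor_jet_dissimilarity(checker, checker)
d_diff = gabor_jet_dissimilarity(checker, stripes)
print(d_same)   # 0.0 for identical images
print(d_diff)
```

Keeping only the magnitudes and discarding phase makes each jet relatively tolerant of small positional shifts. Jets can also be compared with other measures, such as a normalized dot product of the magnitude vectors, instead of a Euclidean distance.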
Before the Gabor-jet model was applied to this problem, a common scaling technique was to examine differences in pixel energy between pairs of stimuli. This method, however, neglects the information carried by orientation and scale. Yue et al. (2012) provided an example in which relatively slight differences in the orientations of two straight contours yielded pixel-energy differences equivalent to those between a pair of contours, one straight and the other curved, that would have been much more readily discriminated. Perhaps the most general and well-documented effect in shape perception is that differences in NAPs of shape, such as straight versus curved, are much more readily discriminated than differences in MPs, such as degree of curvature (e.g., Amir, Biederman, & Hayworth, 2012). However, this inference could not be made without a scaling that equated the NAP and MP differences according to early-stage filtering. Otherwise, one could not know how large a difference in curvature would be equivalent to the NAP difference between straight and curved.
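The blindness of pixel-energy scaling to orientation can be illustrated with a toy example. The sketch below assumes one simple definition, the sum of squared pixel-wise differences (a hypothetical stand-in for whichever variant a given study used), and shows that it assigns exactly the same value to a 90° rotation of a short bar and to a sideways translation of it: the measure registers how many pixels changed, not whether orientation changed.

```python
def pixel_energy_difference(img_a, img_b):
    """Sum of squared pixel-wise differences (one simple form of pixel-energy scaling)."""
    return sum((a - b) ** 2
               for row_a, row_b in zip(img_a, img_b)
               for a, b in zip(row_a, row_b))


def bar_image(size, cells):
    """Blank size x size image with the given (row, col) cells set to 1."""
    img = [[0.0] * size for _ in range(size)]
    for r, c in cells:
        img[r][c] = 1.0
    return img


SIZE = 12
horizontal = bar_image(SIZE, [(5, c) for c in range(3, 8)])  # 5-pixel horizontal bar
vertical = bar_image(SIZE, [(r, 5) for r in range(3, 8)])    # same bar rotated 90 deg about (5, 5)
shifted = bar_image(SIZE, [(5, c) for c in range(7, 12)])    # same bar moved 4 pixels right

print(pixel_energy_difference(horizontal, vertical))  # 8.0
print(pixel_energy_difference(horizontal, shifted))   # 8.0
```

A Gabor-based measure can tell these two cases apart, because the rotation moves the response from horizontally tuned to vertically tuned filters, whereas the translation does not.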
Gabor-like filters emerge from the linear decomposition of natural images (Olshausen & Field, 1996), so the Gabor-like filtering characteristic of V1 simple cells is not unexpected. Such basis sets appear in the first layer of leading CNNs for image recognition (e.g., Krizhevsky et al., 2012) or are simply assumed, as in the GIST model of Oliva and Torralba (2001). GIST uses multi-scale, multi-orientation Gabor filters to create a sparse description of image locations, in much the same way that each jet in the Gabor-jet model is composed of a set of Gabor filters at different scales and orientations that share a common center in the image space. Similarly, the first layer of HMAX (Riesenhuber & Poggio, 1999) convolves image pixels with oriented Gabor filters before pooling the responses (and then repeating those operations). So, although the Gabor-jet model was developed almost a quarter of a century ago, its explicit measure of V1-based image similarity remains relevant, given the widespread incorporation of Gabor filtering as the input stage of contemporary neurocomputational models of vision.
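The idea of a jet as a bank of filters at several scales and orientations sharing one center can be written down directly. This sketch enumerates the wave vectors under the 5-scale, 8-orientation parameterization commonly cited for Lades et al. (1993); GIST, HMAX, and CNNs use their own filter counts and spacings, so the numbers here are illustrative and do not describe those models. The function name is hypothetical.

```python
import math


def gabor_jet_bank(n_scales=5, n_orientations=8):
    """List the wave vectors of one jet; all filters share the jet's center.

    Assumed spacing (Lades et al., 1993 style): k_v = pi * 2 ** (-(v + 2) / 2),
    i.e. half-octave steps in scale, and orientations mu * pi / n_orientations.
    """
    bank = []
    for v in range(n_scales):
        k = math.pi * 2 ** (-(v + 2) / 2)          # radians per pixel
        for mu in range(n_orientations):
            phi = mu * math.pi / n_orientations   # 22.5-degree steps when n_orientations = 8
            bank.append({
                "scale": v,
                "orientation_deg": math.degrees(phi),
                "wavelength_px": 2 * math.pi / k,
                "wave_vector": (k * math.cos(phi), k * math.sin(phi)),
            })
    return bank


bank = gabor_jet_bank()
print(len(bank))                                              # 40
print(sorted({round(f["wavelength_px"], 2) for f in bank}))   # [4.0, 5.66, 8.0, 11.31, 16.0]
```

Successive scales differ by a factor of √2 in wavelength (half an octave), so five scales span two octaves, from 4 to 16 pixels per cycle.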
The value of Gabor-jet scaling
Following a brief set of upload instructions, the user is prompted to upload three images. JPG and PNG files can be uploaded with a file-selector button or, on most mobile devices, with the camera application. If necessary, the applet resizes images to 256 × 256 pixels. Although resizing distorts the aspect ratio of nonsquare images, this approach may be preferable to cropping the images (which deletes information altogether) or rescaling them so that the longest side is 256 pixels (which could pad the image space with uninformative values and introduce artificial boundaries).
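The stretch-to-square strategy amounts to resampling rows and columns independently. The applet's own resizing code is built on a third-party JavaScript library; the pure-Python bilinear version below, with its "align corners" mapping and the hypothetical name `resize_stretch`, is only an assumed stand-in that illustrates the idea.

```python
import math


def resize_stretch(image, out_h=256, out_w=256):
    """Bilinearly resample a grayscale image (list of rows) to out_h x out_w.

    Height and width are scaled independently, so a nonsquare input is
    stretched rather than cropped or padded.
    """
    in_h, in_w = len(image), len(image[0])
    out = []
    for r in range(out_h):
        # Map output corners onto input corners ("align corners" convention).
        y = r * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(math.floor(y))
        y1 = min(y0 + 1, in_h - 1)
        fy = y - y0
        row = []
        for c in range(out_w):
            x = c * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(math.floor(x))
            x1 = min(x0 + 1, in_w - 1)
            fx = x - x0
            # Interpolate along x on the two neighboring rows, then along y.
            top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
            bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


wide = [[float(10 * r + c) for c in range(256)] for r in range(128)]  # 128 rows x 256 columns
square = resize_stretch(wide)
print(len(square), len(square[0]))   # 256 256
```

For this 128 × 256 input, vertical distances are roughly doubled while horizontal distances are unchanged; that distortion is the price paid for neither discarding nor padding any part of the image.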
An aspect of the representation of faces, readily appreciated from a perusal of Fig. 4, is that we can often distinguish two similar faces without being able to articulate just what it is about them that differs. That is, the differences between similar faces are ineffable (Biederman & Kalocsai, 1997). This ineffability seems to be specific to faces; it rarely characterizes perceptible differences among objects, which are typically coded by their edges. A possible reason is that faces, but not objects, may retain aspects of the original spatial filtering (Yue et al., 2006), in the present case the activation of Gabor-like kernels, and these kernels are not directly available to consciousness. The faces in Fig. 4 vary in the vertical distances between the eyes, nose, and mouth and in the heights of the cheekbones. The horizontally oriented Kernel C is directly affected by the variation in vertical distance, and Fig. 8 shows that it strongly signals that Face 3 differs from Faces 1 and 2. The output of the kernels can thus signal differences without an awareness of the particular kernels signaling that difference.
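Why a horizontally oriented kernel is driven by vertical displacements can be checked on a toy stimulus. The sketch below assumes the same DC-free Gabor form as the Lades et al. (1993) kernels with an arbitrary 8-pixel wavelength, places a single full-width horizontal line at the kernel center and then 4 pixels lower, and compares how much the response magnitude changes for a kernel with horizontal stripes (wave vector pointing vertically) versus one with vertical stripes. It is not a reconstruction of Kernel C or of the faces in Fig. 4; all names and parameter values are hypothetical.

```python
import cmath
import math


def gabor_response(image, cx, cy, k, phi, sigma=2 * math.pi):
    """Complex response of one DC-free Gabor kernel centered at pixel (cx, cy)."""
    kx, ky = k * math.cos(phi), k * math.sin(phi)
    dc = math.exp(-sigma ** 2 / 2)
    total = 0j
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if value == 0.0:
                continue  # blank pixels contribute nothing
            dx, dy = x - cx, y - cy
            envelope = (k * k / sigma ** 2) * math.exp(
                -k * k * (dx * dx + dy * dy) / (2 * sigma ** 2))
            total += value * envelope * (cmath.exp(1j * (kx * dx + ky * dy)) - dc)
    return total


def horizontal_line(size, row):
    """Blank size x size image with one full-width horizontal line of 1s."""
    return [[1.0 if r == row else 0.0 for _ in range(size)] for r in range(size)]


SIZE, CENTER, K = 33, 16, math.pi / 4             # 8-pixel wavelength (illustrative)
line_at_center = horizontal_line(SIZE, CENTER)
line_shifted = horizontal_line(SIZE, CENTER + 4)  # same feature, 4 px lower

changes = {}
for name, phi in [("horizontal", math.pi / 2),    # horizontal stripes: wave vector vertical
                  ("vertical", 0.0)]:             # vertical stripes: wave vector horizontal
    before = abs(gabor_response(line_at_center, CENTER, CENTER, K, phi))
    after = abs(gabor_response(line_shifted, CENTER, CENTER, K, phi))
    changes[name] = abs(before - after)
    print(f"{name}-stripe kernel: change in response magnitude = {changes[name]:.2e}")
```

With these settings, the horizontal-stripe kernel's response changes far more than that of the vertical-stripe kernel, which barely responds to a horizontal line at all.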
When learning about visual neuroscience, a typical student is given a detailed explanation of low-level vision, beginning with retinal optics and extending to simple and complex cells in V1, before jumping to topics such as object or face recognition. Often neglected, however, is an explicit account of how the functions of the early visual system give rise to the representations computed in later stages of the visual pathway. Textbooks often reflect this chasm between our understanding of low-level and high-level visual processing. Educational tools that help demystify these processes may therefore have significant didactic potential. The Gabor-jet model Web application, for example, illustrates how V1 simple-cell activation can be used to differentiate similar, metrically varying objects such as faces. The psychophysical similarity of faces, as well as of objects that vary metrically, such as the blobs in Fig. 3, can be predicted from the Gabor-jet model (Yue et al., 2012). However, only the neural coding of faces, not of blobs, retains aspects of the initial spatial coding, in that their release from adaptation in the fusiform face area depends on a change in the specific combinations of orientation and scale (spatial frequency) values (Yue et al., 2006). Xu, Biederman, and Shah (2014) showed how face configural effects, which had previously defied neurocomputational explanation, could readily be derived from the action at a distance afforded by kernels with large, overlapping receptive fields. Furthermore, an understanding of V1-like convolution algorithms provides a strong foundation for understanding more recent and intricate algorithms.
Beyond its didactic value, the Gabor-jet Web application has methodological utility. Researchers can use the simplified, user-friendly Web interface to test stimuli before employing the full MATLAB model available at http://geon.usc.edu/GWTgrid_simple.m. Unlike the Web app described here, the full-featured MATLAB code offers a wider range of parameters and lends itself more readily to batch processing, which is often necessary for stimulus scaling. The Web application can thus serve as an introduction to the Gabor-jet model that may encourage more frequent use of this valuable scaling system.
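Batch scaling of a stimulus set amounts to computing a dissimilarity for every pair of stimuli and sorting the results, much like the ranking the applet returns for its uploaded images. The Python sketch below is not a port of GWTgrid_simple.m; the function names, the feature vectors, and the Euclidean stand-in distance are hypothetical placeholders for jet magnitudes and a Gabor-jet dissimilarity.

```python
import math
from itertools import combinations


def rank_pairs(stimuli, dissimilarity):
    """Score every unordered pair of named stimuli; most similar pair first."""
    scored = [(dissimilarity(stimuli[a], stimuli[b]), a, b)
              for a, b in combinations(sorted(stimuli), 2)]
    return sorted(scored)


def euclidean(u, v):
    """Placeholder dissimilarity; a Gabor-jet distance would be substituted here."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


# Hypothetical precomputed feature vectors (e.g., concatenated jet magnitudes).
features = {
    "face1": [0.0, 1.0, 2.0],
    "face2": [0.0, 1.0, 2.5],
    "face3": [3.0, 1.0, 2.0],
}
ranking = rank_pairs(features, euclidean)
for d, a, b in ranking:
    print(f"{a} vs {b}: {d:.2f}")
# face1 vs face2: 0.50
# face1 vs face3: 3.00
# face2 vs face3: 3.04
```

Because the dissimilarity function is passed in, the same loop works unchanged for any pairwise measure, and the number of comparisons grows as n(n − 1)/2 with the size of the stimulus set.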
This research was supported by NSF Grant No. BCS 0617699 and by the Dornsife Research Fund. Our 2-D FFT code utilizes a 1-D FFT implementation by Nayuki, which can be found at https://www.nayuki.io/page/free-small-fft-in-multiple-languages. Our code for resizing images, along with a fix for iPhone camera images, uses an image-rendering library by Shinichi Tomita, https://github.com/stomita/ios-imagefile-megapixel. Line charts were created with Chart.js, www.chartjs.org/.
- De Valois, R. L., & De Valois, K. K. (1990). Spatial vision. New York, NY: Oxford University Press.
- Günther, M., Haufe, D., & Würtz, R. (2012). Face recognition with disparity corrected Gabor phase differences. In A. E. P. Villa, W. Duch, P. Érdi, F. Masulli, & G. Palm (Eds.), Artificial neural networks and machine learning—ICANN 2012, Part 1 (Lecture Notes in Computer Science Vol. 7552, pp. 411–418). Berlin, Germany: Springer.
- Jahanbin, S., Choi, H., Jahanbin, R., & Bovik, A. C. (2008). Automated facial feature detection and face recognition using Gabor features on range and portrait images. In Proceedings of the 15th IEEE International Conference on Image Processing (pp. 2768–2771). Piscataway, NJ: IEEE Press.
- Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (pp. 1097–1105). Cambridge, MA: MIT Press.
- Margalit, E., Shah, M. P., Tjan, B. S., Biederman, I., Keller, B., & Brenner, R. (in press). The lateral occipital complex shows no net response to object familiarity. Journal of Vision.
- Serre, T., & Riesenhuber, M. (2004). Realistic modeling of simple and complex cell tuning in the HMAX model, and implications for invariant object recognition in cortex (Technical Report No. AI-MEMO-2004-017). Massachusetts Institute of Technology, Cambridge Computer Science and Artificial Intelligence Lab.
- Xu, X., & Biederman, I. (2010). Loci of the release from fMRI adaptation for changes in facial expression, identity, and viewpoint. Journal of Vision, 10(14), 36:1–13. doi:10.1167/10.14.36
- Xu, X., Biederman, I., & Shah, M. P. (2014). A neurocomputational account of the face configural effect. Journal of Vision, 14(8), 9:1–9. doi:10.1167/14.8.9