Introduction

A central question in the study of visual perception is how, and under what circumstances, the visual system is able to separate the physically invariant reflectance of a surface from its potentially changing illumination. The intensity distribution falling on the photoreceptor array is the product of these two sources, and their independent recovery is an ill-posed problem: a myriad of combinations of illumination and reflectance can give rise to any particular intensity distribution, and in the absence of additional information there is no way to uniquely recover the physically correct solution. Much of the current debate surrounding brightness (perceived intensity) and lightness (perceived reflectance) perception therefore centers on the nature of the prior assumptions and processing strategies the visual system uses to parse (correctly or incorrectly) the intensity distribution at the retina into components of surface reflectance and illumination.

Blakeslee and McCourt (1999) developed the Oriented Difference of Gaussians (ODOG) model to assess the degree to which early visual processes sufficed to account for brightness (perceived intensity) in a set of canonical stimuli which included the White effect stimulus (White, 1979, 1981; White & White, 1985), the classical simultaneous brightness contrast (SBC) stimulus (Heinemann, 1972), and the grating induction (GI) stimulus (Blakeslee & McCourt, 1997; Foley & McCourt, 1985; McCourt, 1982, 1994; McCourt & Blakeslee, 1994; McCourt & Foley, 1985), including the variations introduced by Zaidi (1989). The defining features of the ODOG model are characteristics exhibited at early stages of cortical visual processing, e.g., spatial frequency selectivity, orientation selectivity, and contrast gain control. The ODOG model can account for the modulation in the strength of the White effect (Blakeslee & McCourt, 2004) and GI (Blakeslee & McCourt, 2011) with changes in the spatial frequency of the inducing gratings. The ODOG model has also been shown to account for brightness perception in a wide variety of additional displays including the Wertheimer-Benary Cross stimulus (Benary, 1924; Blakeslee & McCourt, 2001, 2003), the Hermann Grid stimulus (Blakeslee & McCourt, 2003), the Gelb Staircase stimulus (Blakeslee, Reetz, & McCourt, 2009; Cataliotti & Gilchrist, 1995), Howe's variations on White's stimulus (Blakeslee et al., 2005; Howe, 2001), Todorovic's (1997) and Williams, McCoy, & Purves’ (1998) variations on the SBC stimulus (Blakeslee & McCourt, 1999, 2012), the checkerboard induction stimulus (Blakeslee & McCourt, 2004; DeValois & DeValois, 1988), the shifted White stimulus (Blakeslee & McCourt, 2004; White, 1981), Adelson's Checker-Shadow stimulus (Adelson, 1993; Blakeslee & McCourt, 2012), Adelson's Corrugated Mondrian stimulus (Adelson, 1993; Blakeslee & McCourt, 2003) including Todorovic's (1997) variation (Blakeslee & McCourt, 2001, 2003), Adelson's Snake stimulus (Adelson, 2000; Blakeslee & McCourt, 2003, 2012; Somers & Adelson, 1997), Hillis and Brainard's (2007) Paint/Shadow stimulus (Blakeslee & McCourt, 2012), so-called “remote” brightness induction stimuli (Blakeslee & McCourt, 2003, 2005; Logvinenko, 2003; Shapley & Reid, 1985), and the mid-gray probes inserted into photographs by Cartier-Bresson (Blakeslee & McCourt, 2012; Gilchrist, 2006). A modified version of the ODOG model (LODOG) which replaces the image-based contrast normalization of ODOG with a local computation (Robinson, Hammon & de Sa, 2007) can explain illusory brightness effects in a somewhat wider variety of stimuli including the zig-zag White stimulus (Spehar & Clifford, 2015) and the radial White stimulus (Anstis, 2005).

Critically, unlike competing explanations for brightness perception such as anchoring theory (Gilchrist, 2006; Gilchrist, Kossyfidis, Bonato, Agostini, Cataliotti, Li, Spehar, Annan & Economou, 1999), filling-in (Grossberg & Todorovic, 1988), edge-integration (Land & McCann, 1971; Rudd & Zemach, 2004, 2007), or layer decomposition (Anderson, 1997), the spatial filtering approach embodied by the ODOG model readily accounts for the often overlooked but ubiquitous gradient (i.e., non-uniform) structure of induction which, while most striking in grating induction (Blakeslee & McCourt, 1999, 2013; Kingdom, 1999; McCourt, 1982; McCourt & Blakeslee, 2015), also occurs in the Hermann grid illusion (Hermann, 1870; Spillmann, 1994), the Chevreul staircase (Chevreul, 1890), Mach Bands (Mach, 1865), and within the test fields of classical simultaneous brightness contrast and the White stimulus (Blakeslee & McCourt, 1999, 2015; see Footnote 1). Also, because the ODOG model does not require defined regions of interest, it is generalizable to any stimulus, including natural images.

We acknowledge that the ODOG model is imperfect. It would, for example, benefit from modifications such as the replacement of ODOG filters with balanced Gabor functions (Cope, Blakeslee & McCourt, 2009) and the substitution of local contrast gain control (Cope, Blakeslee & McCourt, 2013; 2014) for the image-based (global) normalization procedure which is currently implemented. Nonetheless, the utility of the spatial filtering approach lies in the ODOG model’s success in accounting for brightness in a wide variety of stimuli, ranging from simple to complex, without the adjustment of any parameter values, and its parsimony, which acts as a scientifically necessary counterweight to high-level theories which posit only vaguely specified mechanisms such as unconscious inference, perceptual transparency, Gestalt grouping, intrinsic image layer decomposition, and the like.

Because of its rigor and simplicity the ODOG model has proven both influential and provocative (Kingdom, 2011). There are ten principal publications in which the ODOG model (or its earlier non-oriented DOG version) has been invoked to explain various aspects of brightness perception (Blakeslee & McCourt, 1997, 1999, 2001, 2003, 2004, 2005, 2012, 2013; Blakeslee et al., 2005, 2009). These papers have collectively been cited over 400 times (Google Scholar). In response to persistent requests from colleagues for source code with which to test their psychophysical results and/or their own model predictions against those of the ODOG model, we here provide fully annotated Wolfram Mathematica notebooks accompanied by a brief mathematical description of the ODOG model.

ODOG model filters

The ODOG model consists of 42 oriented difference-of-gaussians filters (i.e., receptive fields) taken over six orientations and seven octave-interval spatial scales. Input patterns are linearly processed by each filter and the filter outputs are combined by a particular nonlinear weighting which approximates the shallow low-frequency falloff of the suprathreshold contrast sensitivity function (Georgeson & Sullivan, 1975).

The ODOG filters are given by:

$$ \begin{aligned} f_{ODOG}(\sigma_1,\sigma_2,\alpha;\, x_1,x_2) &= \frac{1}{2\pi\sigma_1^2}\exp\left(-\frac{y_1^2+y_2^2}{2\sigma_1^2}\right) - \frac{1}{2\pi\sigma_1\sigma_2}\exp\left(-\frac{1}{2}\left(\frac{y_1^2}{\sigma_2^2}+\frac{y_2^2}{\sigma_1^2}\right)\right) \\ &= \frac{1}{2\pi\sigma_1}\exp\left(-\frac{y_2^2}{2\sigma_1^2}\right)\left(\frac{1}{\sigma_1}\exp\left(-\frac{y_1^2}{2\sigma_1^2}\right) - \frac{1}{\sigma_2}\exp\left(-\frac{y_1^2}{2\sigma_2^2}\right)\right) \end{aligned} $$
(A.1)

where σ2 > σ1 > 0 and y1, y2 are rotated variables given by:

$$ y_1 = +\cos(\alpha)\, x_1 + \sin(\alpha)\, x_2 \quad \text{and} \quad y_2 = -\sin(\alpha)\, x_1 + \cos(\alpha)\, x_2 $$
(A.2)

The condition σ2 > σ1 ensures that regions of excitation and inhibition are aligned along the y1-axis. The filters are simple differences of unit-volume gaussians and are thus perfectly balanced (i.e., total filter volume = 0).
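To make the definition concrete, Eqs. A.1 and A.2 translate directly into the Wolfram Language. The following is a minimal sketch (the name odogFilter and its curried argument convention are ours, not those of the accompanying notebooks):

```mathematica
(* Space-domain ODOG filter of Eq. A.1, using the rotated coordinates of Eq. A.2. *)
(* s1, s2 are the gaussian space constants (s2 > s1 > 0); a is the orientation    *)
(* in radians. Both gaussians have unit volume, so the filter is balanced.        *)
odogFilter[s1_, s2_, a_][x1_, x2_] :=
 Module[{y1, y2},
  y1 =  Cos[a] x1 + Sin[a] x2;   (* rotated coordinate y1, Eq. A.2 *)
  y2 = -Sin[a] x1 + Cos[a] x2;   (* rotated coordinate y2, Eq. A.2 *)
  Exp[-(y1^2 + y2^2)/(2 s1^2)]/(2 Pi s1^2) -
   Exp[-(y1^2/s2^2 + y2^2/s1^2)/2]/(2 Pi s1 s2)]
```

For example, Plot[odogFilter[1, 2, 0][x1, 0], {x1, -6, 6}] reproduces a cross-section along the y1-axis of the kind shown in Fig. 1(a).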

The Fourier Transform of ODOG filters is given by:

$$ F_{ODOG}(\sigma_1,\sigma_2,\alpha;\, s_1,s_2) = \exp\left(-2\pi^2\sigma_1^2 t_2^2\right)\left(\exp\left(-2\pi^2\sigma_1^2 t_1^2\right) - \exp\left(-2\pi^2\sigma_2^2 t_1^2\right)\right) $$
(A.3)

where t1, t2 are rotated variables given by:

$$ t_1 = +\cos(\alpha)\, s_1 + \sin(\alpha)\, s_2 \quad \text{and} \quad t_2 = -\sin(\alpha)\, s_1 + \cos(\alpha)\, s_2 $$
(A.4)
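Notebook A generates files containing these filter transforms; a sketch of Eqs. A.3 and A.4 in the same hypothetical style as above:

```mathematica
(* Fourier transform of the ODOG filter (Eq. A.3), with the rotated frequency *)
(* coordinates of Eq. A.4. The transform is real and even, since the          *)
(* space-domain filter is even-symmetric.                                     *)
odogFilterFT[s1_, s2_, a_][f1_, f2_] :=
 Module[{t1, t2},
  t1 =  Cos[a] f1 + Sin[a] f2;
  t2 = -Sin[a] f1 + Cos[a] f2;
  Exp[-2 Pi^2 s1^2 t2^2] (Exp[-2 Pi^2 s1^2 t1^2] - Exp[-2 Pi^2 s2^2 t1^2])]
```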

ODOG filters possess six orientations at 30° intervals:

$$ \alpha = 0^{\circ}, 30^{\circ}, 60^{\circ}, 90^{\circ}, 120^{\circ}, 150^{\circ} $$
(A.5)

and seven spatial scales arranged at octave intervals:

$$ \sigma_1 = 0.046875^{\circ},\ 0.09375^{\circ},\ 0.1875^{\circ},\ 0.375^{\circ},\ 0.75^{\circ},\ 1.5^{\circ},\ 3.0^{\circ}, \quad \text{with}\ \sigma_2 = 2\sigma_1 $$
(A.6)
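Given the hypothetical odogFilterFT defined above, the full 42-filter bank is simply the outer product of these two parameter lists (again a sketch; variable names are ours):

```mathematica
orientations = Range[0, 5] Pi/6;        (* 0, 30, ..., 150 deg in radians (Eq. A.5) *)
sigma1s      = 0.046875 2^Range[0, 6];  (* seven octave-spaced scales in deg (Eq. A.6) *)
(* 6 x 7 array of filter-transform functions, one per orientation/scale pair, *)
(* with sigma2 = 2 sigma1 throughout                                          *)
filterBank   = Outer[odogFilterFT[#2, 2 #2, #1] &, orientations, sigma1s];
```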

Figure 1(a) illustrates a space-domain representation of an ODOG filter in cross-section along the (oriented) y1-axis. Figure 1(b) illustrates the same filter in the spatial frequency (Fourier) domain along the (oriented) t1-axis.

Fig. 1 (a) Space-domain representation of an ODOG filter in cross-section along the (oriented) y1-axis; (b) the same filter in the spatial frequency (Fourier) domain along the (oriented) t1-axis

Input patterns

Input patterns p(x1, x2) are non-negative functions on the plane (i.e., images). The working region of the model is a square patch subtending 32° × 32° of visual angle, and the size of input patterns should be scaled accordingly (see Footnote 2). The ODOG filters map input patterns p to output patterns q (which may be negative):

$$ q(\sigma_1,\sigma_2,\alpha;\, x_1,x_2) = \int_{\mathbb{R}\times\mathbb{R}} f_{ODOG}(\sigma_1,\sigma_2,\alpha;\, y_1-x_1,\, y_2-x_2)\, p(y_1,y_2)\, dy_1\, dy_2 $$
(B.1)

Note that the linear operator is a (reversed) convolution, i.e., a cross-correlation, in which the kernel has the form f(y − x)dy instead of f(x − y)dy. Because the ODOG filters are even-symmetric, however, this is identical to an ordinary convolution.

The convolution form is exploited for computational efficiency. In practice, patterns are represented as 1024 × 1024 RGB pixel matrices, and the convolution is calculated using the Fast Fourier Transform.
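Because the filter transforms are known analytically (Eq. A.3), one way to implement Eq. B.1 is to sample them on the DFT frequency grid and multiply in the frequency domain. A minimal sketch, assuming the hypothetical odogFilterFT defined earlier and a 1024 × 1024 numeric matrix pat of luminances (grid constants and function names are ours, not the notebooks'):

```mathematica
n   = 1024; deg = 32.;  (* pixels and visual angle of the working region *)
(* DFT frequencies in cycles/deg, in the wrapped bin order used by Fourier[] *)
freqs = (Mod[Range[0, n - 1] + n/2, n] - n/2)/deg;
(* sample a filter transform on the full frequency grid (rows index f1) *)
ftGrid[filt_] := Outer[filt, freqs, freqs];
(* pointwise multiplication in the frequency domain; the result is real      *)
(* because the ODOG transforms are real and even, which also makes the FFT   *)
(* sign convention immaterial here. Re[] strips numerical imaginary residue. *)
filterPattern[pat_, filt_] := Re[InverseFourier[Fourier[pat] ftGrid[filt]]]
```

Here filterPattern[pat, filterBank[[i, j]]] would return one output pattern q. Note that circular (wraparound) convolution is implicit in this approach, which is one reason input patterns are padded as described under "Notes concerning implementation" below.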

Output patterns

Let q(σ1, σ2, α; x1, x2) be the output pattern produced by convolving an ODOG filter with spatial parameters σ1, σ2 and orientation α with a given input pattern p(x1, x2), as described in Eq. B.1. The 42 output patterns undergo two additional stages of processing.

First, for each orientation α a weighted summation over filter size is taken:

$$ Q(\alpha;\, x_1,x_2) = \int_{\text{all}\ \sigma_1} w(\sigma_1)\, q(\sigma_1, 2\sigma_1, \alpha;\, x_1,x_2)\, d\sigma_1 $$
(C.1)

where the weight function is \( w(\sigma_1) = \left(\frac{8}{3}\,\sigma_1\right)^{-1/10} \) with σ1 in degrees. The integral is approximated by the sum:

$$ Q(\alpha;\, x_1,x_2) \simeq \sum_{\sigma_1} w(\sigma_1)\, q(\sigma_1, 2\sigma_1, \alpha;\, x_1,x_2) $$
(C.2)

where the values of σ1 are given in Eq. A.6 above.
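In the sketch below (our variable names), qs is assumed to be a list of the seven filter outputs at one orientation, ordered as in Eq. A.6; the weighted across-scale sum of Eq. C.2 is then a one-liner:

```mathematica
sigma1s = 0.046875 2^Range[0, 6];   (* the seven values of sigma1 (Eq. A.6) *)
w = ((8/3) sigma1s)^(-1/10);        (* the weight function w of Eq. C.1     *)
(* weighted sum over scales for one orientation (Eq. C.2); the weight       *)
(* vector w threads elementwise over the list of seven output matrices      *)
Qalpha = Total[w qs];
```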

The root mean square magnitude ‖Q(α; x1, x2)‖ of the output pattern at each orientation α is calculated by:

$$ \left\Vert Q(\alpha;\, x_1,x_2)\right\Vert^2 = \int_{\mathbb{R}\times\mathbb{R}} \left(Q(\alpha;\, x_1,x_2)\right)^2 dx_1\, dx_2 $$
(C.3)

and is used as a contrast normalization factor.

The final output pattern R(x1, x2) is obtained by averaging the normalized output patterns over all orientations (0 ≤ α ≤ π rad):

$$ R(x_1,\, x_2) = \frac{1}{\pi}\int_{\text{all}\ \alpha} \frac{Q(\alpha;\, x_1,x_2)}{\left\Vert Q(\alpha;\, x_1,x_2)\right\Vert}\, d\alpha $$
(C.4)

The ODOG model approximates the integral by averaging over the six discrete orientations, which are spaced at intervals of 30°:

$$ R(x_1,\, x_2) \simeq \frac{1}{6}\sum_{k=0}^{5} \frac{Q(k\pi/6;\, x_1,x_2)}{\left\Vert Q(k\pi/6;\, x_1,x_2)\right\Vert} $$
(C.5)
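Continuing the sketch (names again ours), with Qs a list of the six across-scale sums, one per orientation (e.g., the Qalpha matrices above collected into a list), the normalization of Eq. C.3 and the orientation average of Eq. C.5 reduce to:

```mathematica
(* root mean square of a matrix over all pixels; the integral of Eq. C.3     *)
(* differs from this only by a constant factor common to all orientations,   *)
(* which is absorbed by the arbitrary output scaling described below         *)
rms[m_] := Sqrt[Mean[Flatten[m]^2]];
(* normalize each orientation's output by its RMS magnitude and average over *)
(* the six orientations (Eq. C.5) to obtain the final output pattern         *)
R = Mean[(#/rms[#]) & /@ Qs];
```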

Notes concerning implementation

It should be kept in mind that the model assumes a square region of space subtending 32° × 32°, which corresponds to an image size of 1,024 × 1,024 pixels (0.03125°/pixel). The space constant (σ2) of the largest ODOG filter measures 6° (192 pixels), so output patterns corresponding to input patterns that are restricted to the central 16° × 16° (512 × 512 pixel) region, and that are padded beyond this area with zeros (or with the pattern mean value), will be essentially free from distortion. In practice, input patterns larger than 512 × 512 pixels can be tolerated, although users may want to vary input pattern size and examine the output patterns to assess whether significant distortion is occurring.

Whereas input patterns are images (matrices of non-negative integers ranging from 0 to 255), the convolution of these patterns with the volume-balanced ODOG filters produces output patterns of positive and negative real numbers whose mean is zero. To display ODOG model output as an image, and to compare it with psychophysical brightness matches (expressed as percent maximum luminance), output patterns are typically offset additively to possess a mean of 128 and scaled to possess integer values between 0 and 255. The scaling factor is arbitrary, but is usually chosen to maximize the correlation between ODOG model output and brightness-matching data.
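A minimal sketch of this display convention, continuing from the output R above (the value of scale is an arbitrary placeholder; in practice it would be fitted to the brightness-matching data):

```mathematica
(* offset the zero-mean output R to a mean of 128, scale, and clip to 0-255 *)
scale   = 40;  (* arbitrary placeholder; chosen per data set *)
display = Clip[Round[128 + scale R], {0, 255}];
Image[display/255.]  (* view the result as a grayscale image *)
```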

Mathematica notebooks

Four executable Mathematica (.nb) notebooks are included with this paper. They are:

  • Blakeslee_Cope_&_McCourt_(Notebook_A_ODOG_Filter_FT_Generation).nb

  • Blakeslee_Cope_&_McCourt_(Notebook_B_File_Format_Conversion_TIF_to_DAT).nb

  • Blakeslee_Cope_&_McCourt_(Notebook_C_ODOG_Pattern_Processing).nb

  • Blakeslee_Cope_&_McCourt_(Notebook_D_Examine_Results).nb

These fully annotated Notebooks are written to step users through setting up appropriate directories, generating the library of ODOG filter files, processing a sample (image) pattern (White_Stimulus.tif) through the ODOG model, and examining the Input and Output patterns.

About Wolfram Mathematica

This implementation of the ODOG model uses Mathematica, a general-purpose mathematical software platform from Wolfram Research, Inc. Extensive knowledge of or experience with Mathematica is not required, but the following background may be helpful:

  • Mathematica files are called Notebooks and the corresponding filenames have the extension .nb.

  • Notebooks are divided into cells which are identified by cell delimiters at the extreme right edge of the display.

  • To select a cell, click once on the cell delimiter, which highlights the selected cell.

  • Types of cells include Text cells (which contain text material), Input cells (which contain executable Mathematica commands), and Output cells (which display the results of evaluating Input cells). Cell types can be identified by their different fonts; when Input cells are evaluated, Output cells will appear. Another way to identify a cell type is to select the cell (click on the delimiter) and look under Format: Style in the Menu Bar, where a check mark appears next to the cell type.

  • To evaluate an Input cell, select the cell (click on the delimiter) and press SHIFT+ENTER. This evaluates the commands in the Input cell. Text and Output cells cannot be evaluated. The delimiter of an Input cell is highlighted when the cell is evaluated and remains highlighted while the evaluation proceeds. In the ODOG model, some steps, such as generating the filter FT files in Notebook A or the pattern processing stage in Notebook C, may take a few minutes to evaluate.

  • To stop an evaluation, click on Evaluation: Abort Evaluation in the Menu Bar.

  • To delete a cell (such as an Output cell), select the cell (click on the delimiter) and press the Delete key. After running the notebooks it is good practice to Delete All Output by selecting that option under Cell in the Menu Bar.

  • In this implementation of the ODOG model the only commands requiring interaction by the user are ones where directory or file names need to be set to specify storage locations. Automatic checks are provided to help.

  • The commands in an Input cell often end with semicolons, which suppress the display of the output of evaluating the command. You can add or remove a semicolon at the end of a command without affecting its evaluation, and you may find it helpful to remove one to see the result that is displayed. However, if the output is, say, a 1,024 × 1,024 matrix of complex-valued numerical data, the display may be too extensive to be helpful, although the experience will be memorable.

  • Mathematica is available for download on a 15-day trial basis on the Wolfram website (https://www.wolfram.com/mathematica/trial).