When a target object (such as a letter) is presented to the peripheral retina flanked by similar non-target objects (other letters), a human observer’s ability to discriminate or identify the target object is impaired relative to conditions where no flankers are present. This “crowding” phenomenon (Andriessen and Bouma 1975; Levi et al. 1985; Greenwood et al. 2009; Bouma 1970; Parkes et al. 2001; Toet and Levi 1992; Strasburger 2014; Herzog et al. 2015; Harrison and Bex 2015) is characterized by a reduction in sensitivity to peripheral image structure. One way to physically change image structure is to apply spatial distortion, in which the positions of local elements (pixels) are perturbed in some fashion (for example, by stretching or shifting). Characterizing human sensitivity to spatial distortions is one way to investigate the perceptual encoding of local image structure. For example, showing that perception is invariant to a certain type of distortion (i.e., things look the same whether physically distorted or not) implies that the human visual system does not encode the distortion in question, either directly or indirectly. Arguably, measuring sensitivity to the distortion of highly familiar shapes such as letters (as we do in this paper) allows one to characterize human perception in a more complex task than (for example) grating orientation discrimination, but one that is more tractable from a modeling perspective than (for example) letter identification, which may require a full model of letter encoding. In addition, psychophysical investigation of spatial distortions is relevant to metamorphopsia (the perception of persistent spatial distortions in everyday life), which is commonly associated with retinal diseases that affect the macula (Wiecek et al. 2014).

Human sensitivity to spatial distortions has been investigated previously in images of faces (Spence et al. 2014; Rovamo et al. 1997; Dickinson et al. 2010; Hole et al. 2002) and natural scenes (Kingdom et al. 2007; Bex 2010). To our knowledge, only one study has assessed the impact of spatial distortion for letter stimuli. Wiecek et al. (2014) had observers identify letters (26-alternative identification task) distorted with bandpass noise distortion (see below) while varying the spatial scale of distortion, the letter size and the viewing distance. Interestingly, they report an interaction between the spatial scale of distortion (CPL; cycles per letter) and viewing distance (changing letter size), such that for small letters (subtending 0.33 degrees of visual angle) performance was worst for coarse-scaled distortions (2.4 CPL), whereas for large letters (5.4 deg) the most detrimental distortion shifted to a finer scale (4 CPL). This result has important implications for patients with metamorphopsia: a stable retinal distortion may affect letter recognition for some letter sizes but not others, influencing acuity assessments using letter charts (a primary outcome measure for clinical vision assessment; Wiecek et al., 2014).

Here we investigate sensitivity to spatial distortions in letters, under crowded (flanked) and uncrowded (unflanked) conditions. Note that our goal here is distinct from that of Wiecek et al. (2014), who measured the impact of distortions on letter identification. We do not measure letter identification here, but instead use letters as a class of relatively simple, artificial, but highly familiar stimuli to investigate sensitivity to the presence of distortion per se. We quantify the detectability of two different types of spatial distortion commonly used in the literature (see also Stojanoski & Cusack, 2014, for another distortion not employed here). In bandpass noise distortions (hereafter referred to as BPN distortion; Bex, 2010), pixels are warped according to bandpass filtered noise; this ensures that the distortion occurs on a defined and limited spatial scale. In radial frequency distortions (hereafter referred to as RF distortion; Dickinson et al., 2010; Wilkinson, Wilson, & Habak, 1998), the image is warped by modulating the radius (defined from the image center) according to a sinusoidal function of some frequency defined in polar coordinates. For our purposes, they serve to produce two different graded changes in letter images. A successful model of form discrimination in humans would explain sensitivity to both types of distortion and any dependence on surrounding letters (potentially, different mechanisms may be required to explain sensitivity to each distortion type).

Experiment 1

Methods

Stimuli, data and code associated with this paper are available to download from 10.5281/zenodo.159360. This document was prepared using the knitr package (Xie 2013; 2015) in the R statistical environment (Core Development Team 2016; Wickham and Francois 2016; Wickham 2009; 2011; Auguie 2016; Arnold 2016) to increase its reproducibility.

Observers

Five observers with normal or corrected-to-normal vision participated in this experiment: two of the authors, one lab member and two paid observers (10 euros per hour) who were unaware of the purpose of the study. All of the observers had prior experience with psychophysical experiments and were between 20 and 31 years of age. All experiments conformed to Standard 8 of the American Psychological Association’s Ethical Principles of Psychologists and Code of Conduct (2010).

Apparatus

Stimuli were displayed on a VIEWPixx LCD (VPIXX Technologies; spatial resolution 1920×1200 pixels, temporal resolution 120 Hz). Outside the stimulus image the monitor was set to mean grey. Observers viewed the display from 60 cm (maintained via a chinrest) in a darkened chamber. At this distance, pixels subtended approximately 0.024 degrees on average (41.5 pixels per degree of visual angle). The monitor was carefully linearized (maximum luminance 212 cd/m²) using a Gamma Scientific S470 Optometer. Stimulus presentation and data collection were controlled via a desktop computer (12 core i7 CPU, AMD HD7970 graphics card) running Kubuntu Linux (14.04 LTS), using the Psychtoolbox Library (Brainard, 1997; Kleiner, Brainard, & Pelli, 2007; Pelli, 1997, version 3.0.11) and our internal iShow library (10.5281/zenodo.34217) under MATLAB (The MathWorks, Inc., R2013B). Responses were collected using a RESPONSEPixx button box.

Stimuli

The letter stimuli were a subset of the Sloan alphabet (Sloan 1959), used commonly on acuity charts to measure visual acuity in the clinic. Target letters were always the letters D, H, K, and N; flanker letters were always C, O, R, and Z. Letter images were 64×64 pixels. To prevent border artifacts in distortion, each image was padded with 14 white pixels on each side, creating 92×92 pixel images. These padded letter images were distorted according to distortion maps generated from the BPN or RF algorithms (see below) in a Python (v2.7.6) environment, using Scipy’s griddata function with linear 2D interpolation to remap pixels from the original to the distorted image. That is, the distortion map specifies where to move the pixels from the original image; pixel values in intermediate spaces are linearly interpolated from surrounding pixels to produce smooth distortions.
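As a rough illustration of this remapping step, the sketch below moves every pixel by a displacement field and then resamples the scattered values back onto the regular pixel grid with `scipy.interpolate.griddata` (linear method). The function name `warp_image`, the argument layout, and the white `fill` value for any pixel the interpolation cannot reach are assumptions made for this example; the authors' own code is in the archive cited in the Methods.

```python
import numpy as np
from scipy.interpolate import griddata


def warp_image(image, dx, dy, fill=1.0):
    """Forward-map each pixel by (dx, dy) and resample onto the pixel grid.

    image  : 2D array of luminance values (1.0 assumed to be white)
    dx, dy : 2D arrays of horizontal / vertical displacements in pixels
    fill   : value for grid points outside the moved point cloud (assumption)
    """
    height, width = image.shape
    rows, cols = np.mgrid[0:height, 0:width]
    # Where each original pixel ends up after the distortion.
    moved = np.column_stack([(cols + dx).ravel(), (rows + dy).ravel()])
    # Regular grid on which the distorted image is sampled.
    grid = np.column_stack([cols.ravel(), rows.ravel()])
    warped = griddata(moved, image.ravel(), grid,
                      method="linear", fill_value=fill)
    return warped.reshape(height, width)
```

With an all-zero displacement field the output should reproduce the input, which is a convenient sanity check before applying a real distortion map.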

Bandpass noise (BPN) distortion

Bex (2010; see also Rovamo et al. 1997; Wiecek et al. 2014) describes a method for generating spatial distortions that are localized to a particular spatial passband (see Fig. 1a–d). Two random 92×92 samples of zero-mean white noise were filtered by a log exponential filter (see Equation 1 in Bex, 2010):

$$A(\omega) \propto \exp \left( - \frac{|\ln (\omega / \omega_{peak})|^{3} \ln 2}{(b_{0.5} \ln2)^{3}} \right) $$
Fig. 1

Distortion methods for bandpass noise (BPN; A–D) and radial frequency (RF; E–G). a A Sloan letter (D) with 14 pixels of white padding. b A sample of bandpass filtered noise, windowed in a circular cosine. Two such noise samples determine the BPN distortion map. c The letter distorted by the BPN technique. d The effects of varying the frequency (columns) and amplitude (rows) of the BPN distortion. e An original letter image, showing the original radius r from the centre to an arbitrary pixel. f RF distortion modulates the radius of every pixel according to a sinusoid, producing a new radius \(r^{\prime }\). g The effects of varying the frequency (columns) and amplitude (rows) of the RF distortion. More examples of distortions applied to letters are provided in the ??

where \(\omega_{peak}\) specifies the peak frequency, ω is the spatial frequency and \(b_{0.5}\) is the half bandwidth of the filter in octaves. Noise was filtered at one of six peak frequencies (2, 4, 6, 8, 16, 32 cycles per image; corresponding to 1.3, 2.6, 4, 5.3, 10.6, and 21.3 c/deg under our viewing conditions) with a bandwidth of one octave. The filtered noise was windowed by multiplying with a circular cosine of value one, falling to zero at the border over the space of 14 pixels, ensuring that letters did not distort beyond the borders of the padded image region. The amplitude of the filtered noise was then rescaled to have max/min values at 0.25, 0.5, 1, 1.5, 2, 2.5, 3, or 5 pixels; this controlled the strength of the distortion. For presentation of the results (thresholds, below), these amplitude units were transformed from pixels to degrees. One filtered noise sample controlled the horizontal pixel displacement, the other controlled vertical displacement (together giving the distortion map for the griddata algorithm).
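The filtering, windowing, and rescaling steps can be sketched as follows. This is a minimal reconstruction from the description above, not the authors' implementation. It assumes that frequencies are expressed in cycles per filtered array, that a one-octave bandwidth corresponds to a half bandwidth of 0.5 octaves, and that the window ramp ends at the edge of the array; the function names are hypothetical.

```python
import numpy as np


def log_exp_filter(size, peak, half_bw_octaves=0.5):
    """Gain A(w) of the log exponential filter on an FFT frequency grid."""
    freqs = np.fft.fftfreq(size) * size           # cycles per array
    fx, fy = np.meshgrid(freqs, freqs)
    w = np.hypot(fx, fy)
    gain = np.zeros_like(w)                       # DC term stays at zero
    nz = w > 0
    log_ratio = np.abs(np.log(w[nz] / peak))
    gain[nz] = np.exp(-(log_ratio ** 3) * np.log(2)
                      / (half_bw_octaves * np.log(2)) ** 3)
    return gain


def cosine_window(size, ramp=14):
    """Circular window: one in the centre, cosine ramp to zero at the border."""
    centre = (size - 1) / 2.0
    y, x = np.mgrid[0:size, 0:size]
    r = np.hypot(x - centre, y - centre)
    t = np.clip((r - (centre - ramp)) / ramp, 0.0, 1.0)
    return 0.5 * (1.0 + np.cos(np.pi * t))


def bpn_displacement(size, peak, amplitude_px, rng, ramp=14):
    """One displacement map (horizontal or vertical), in pixels."""
    noise = rng.standard_normal((size, size))     # zero-mean white noise
    spectrum = np.fft.fft2(noise) * log_exp_filter(size, peak)
    filtered = np.real(np.fft.ifft2(spectrum)) * cosine_window(size, ramp)
    # Rescale so that the largest absolute displacement equals amplitude_px.
    return filtered * (amplitude_px / np.abs(filtered).max())
```

Two independent calls to `bpn_displacement` (one for the horizontal map, one for the vertical map) would together give a distortion map in the sense described above. The gain is 1 at the peak frequency and falls to 0.5 at `half_bw_octaves` octaves on either side of it.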

Radial frequency (RF) distortion

Here, the distortion map was created by modulating the distance of each pixel from the centre of the padded image according to a sinusoid defined in polar coordinates (see Equation 3 in Wilkinson et al., 1998, and Fig. 1, panels E–G):

$$r^{\prime}(\theta) = r_{0} (1 + A \sin (\omega \theta + \phi)) $$

where \(r^{\prime }\) is the distorted radius from the center, \(r_{0}\) the undistorted (mean) radius, A is the amplitude of distortion (the proportion of the unmodulated distance from the centre), 𝜃 is the polar angle and ω is the radial frequency of distortion (here 2, 3, 4, 5, 8, or 12 cycles in 2π radians). The angular phase of the modulation (ϕ) on each trial was drawn from a random uniform distribution spanning [0, 2π]. The amplitude of the distortion was set to one of 0.0075, 0.01, 0.0617, 0.1133, 0.1650, 0.2167, 0.2683, or 0.3200. The distortion map was windowed in a circular cosine as above, then the cosine and sine values were passed to griddata as the horizontal and vertical offsets.
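For a single pixel, the radial modulation and its conversion to horizontal and vertical offsets might be computed as below. This is a sketch under stated assumptions: coordinates are taken relative to the centre of the padded image, the circular cosine window is omitted, and `rf_offset` is a hypothetical helper name.

```python
import math


def rf_offset(x, y, amplitude, frequency, phase):
    """Return (dx, dy) for a pixel at (x, y) relative to the image centre.

    Implements r' = r0 * (1 + A * sin(w * theta + phi)) and splits the
    change in radius into its cosine (horizontal) and sine (vertical) parts.
    """
    r0 = math.hypot(x, y)
    theta = math.atan2(y, x)
    r_new = r0 * (1.0 + amplitude * math.sin(frequency * theta + phase))
    shift = r_new - r0
    return shift * math.cos(theta), shift * math.sin(theta)


# A pixel 10 px to the right of centre, at the peak of the modulation,
# is pushed outward by amplitude * r0 = 1 px.
dx, dy = rf_offset(10.0, 0.0, amplitude=0.1, frequency=4, phase=math.pi / 2)
```

Because the displacement is proportional to the radius, pixels near the centre of the image barely move, whereas pixels near the letter's outer strokes move the most.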

To facilitate future modeling of our experiment, we pregenerated all images presented to observers (see below) and saved them to disk. In total we generated 1920 images: two distortion types (BPN, RF) × two conditions (flanked, unflanked; see below) × eight amplitudes × six frequencies, each repeated ten times. BPN distortions are generated from new random noise images and RF distortions with random phases, meaning that these ten repetitions were unique images. Target positions, letter identities, and distortions were randomized on each repeat. In addition, we generated the same 1920 images without applying distortion to one of the target letters and saved them to disk. An image-based model of pattern recognition could be evaluated on the same stimuli as we have shown to our observers, using an undistorted “full-reference” image as a baseline (all images are provided online at 10.5281/zenodo.159360).
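One way such a full-reference evaluation might look is sketched below. The root-mean-square pixel difference is a deliberately trivial stand-in chosen for illustration only; it is not a model proposed in this paper, and the function names and the dictionary layout (one image patch per letter location) are assumptions.

```python
import numpy as np


def rms_difference(test_patch, reference_patch):
    """Root-mean-square pixel difference between a patch and its reference."""
    diff = np.asarray(test_patch, float) - np.asarray(reference_patch, float)
    return float(np.sqrt(np.mean(diff ** 2)))


def choose_location(test_patches, reference_patches):
    """Pick the location whose patch differs most from the undistorted one.

    Both arguments are dicts mapping a location name (for example "above")
    to a 2D array cut from the test image or from the reference image.
    """
    scores = {loc: rms_difference(test_patches[loc], reference_patches[loc])
              for loc in test_patches}
    return max(scores, key=scores.get)
```

A model's proportion of correct choices at each amplitude could then be compared with the human psychometric data for the same images.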

Procedure

On each unflanked trial, observers saw the four target letters and indicated the location (relative to fixation) of the distorted letter. The letters subtended approximately 1.5×1.5 dva and were located above, below, right and left of fixation (see Fig. 2a); letter identity at each location was randomly shuffled on each trial. The target letters were centered at a retinal eccentricity of 320 pixels (7.7 dva), and observers were instructed to maintain fixation on the central fixation cross (best for steady fixation from Thaler, Schütz, Goodale, & Gegenfurtner, 2013). The entire letter array was presented on a square background of maximum luminance (side length 1024 pixels or 24.3 dva); the remainder of the monitor area was set to mean grey. Letter strokes were set to minimum luminance (i.e., the letters were approximately 100 % Michelson contrast). The letter array was presented for 150 ms (abrupt onset and offset), after which the screen was replaced with a fixation cross on the same square bright background. The observer had up to 2000 ms to respond (a response triggered the next trial with ITI 100 ms), and received auditory feedback as to whether their response was correct.

Fig. 2

Example stimulus arrays showing BPN distortions. a An unflanked trial example. In this example, the correct response is “above”. b A flanked trial example. The correct response is “below”

On flanked trials (Fig. 2b), four undistorted flanking letters the same size as the target were presented above, below, left, and right of each target letter (center-to-center separation 1.9 dva, corresponding to approximately 0.25 of the eccentricity, well within the spacing of “Bouma’s law”; Bouma (1970)). The arrangement of the four flanking letters was randomly determined on each trial.
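The stimulus geometry can be checked with a few lines of arithmetic. The sketch below uses the 41.5 pixels per degree reported in the Apparatus section; the critical-spacing factor of 0.5 times the eccentricity is the commonly quoted form of Bouma's law and is an assumption here, not a value stated in this paper.

```python
PIXELS_PER_DEGREE = 41.5      # from the Apparatus section
BOUMA_FACTOR = 0.5            # assumed rule-of-thumb critical spacing


def px_to_dva(pixels):
    """Convert a length in pixels to degrees of visual angle."""
    return pixels / PIXELS_PER_DEGREE


eccentricity_dva = px_to_dva(320)           # about 7.7 dva
letter_size_dva = px_to_dva(64)             # about 1.5 dva
flanker_spacing_dva = 1.9
spacing_ratio = flanker_spacing_dva / eccentricity_dva   # about 0.25
bouma_limit_dva = BOUMA_FACTOR * eccentricity_dva        # about 3.9 dva
```

Under that assumed factor, the flanker spacing is roughly half the critical spacing, so the flankers should fall well inside the crowding zone.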

Different distortion frequencies (six levels) and amplitudes (seven levels; see Footnote 1) were randomly interleaved within a block of trials, whereas the distortion type (BPN or RF) and letter condition (unflanked or flanked) were presented in separate blocks. Each pairing of frequency and amplitude was repeated ten times (corresponding to the unique images generated above), creating 420 trials per block. Breaks were enforced after every 70 trials. Blocks of trials were arranged into four-block sessions, in which observers completed one block of each pairing of distortion type and letter condition. Observers always started the session with an unflanked letter condition in order to familiarize them with the task (see Footnote 2). Each session took approximately 2 h. All observers participated in at least four sessions. Before the first block of the experiment observers completed 70 practice trials to familiarize themselves with the task. In total, we collected 20,160 trials on each of the unflanked and flanked conditions.
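The trial counts follow directly from the design, as the short calculation below shows. The total number of observer-sessions (24) is inferred here from the reported totals and is not stated in the text; it is consistent with five observers completing at least four sessions each.

```python
n_frequencies = 6
n_amplitudes = 7
n_repeats = 10

trials_per_block = n_frequencies * n_amplitudes * n_repeats   # 420
segments_per_block = trials_per_block // 70                   # breaks split a block into 6 runs
blocks_per_session = 2 * 2                                    # distortion type x letter condition
trials_per_session = blocks_per_session * trials_per_block    # 1680

# Each session contributes one BPN and one RF block per letter condition.
trials_per_condition_per_session = 2 * trials_per_block       # 840
inferred_sessions = 20160 // trials_per_condition_per_session # 24 (inferred)
```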

Data analysis

Data from each experimental condition were fit with a cumulative Gaussian psychometric function using the psignifit 4 toolbox for Matlab (Schütt et al. 2016), with the lower asymptote fixed to chance performance (0.25). The posterior mode of the threshold parameter (midpoint of the unscaled cumulative function) and 95 % credible intervals were calculated using the default (weak) prior settings from the toolbox. The 95 % credible intervals mean that the parameter value has a 95 % probability of lying in the interval range, given the data and the prior. Psychometric function widths (slopes) either did not vary appreciably over experimental conditions (Experiment 1) or, when they did (Experiment 2), patterns of variation showed effects consistent with the threshold estimates. This paper therefore presents only threshold data for brevity.
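For readers unfamiliar with this class of model, a cumulative Gaussian psychometric function for a four-alternative task can be written as in the sketch below. The parametrization by a standard deviation `sd` and the explicit `lapse` term are choices made for this example and need not match the toolbox's internal parametrization; only the guess rate of 0.25 comes from the text.

```python
import math


def cumulative_gaussian(x, mean, sd):
    """Standard cumulative Gaussian, rising from 0 to 1."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def psychometric(x, threshold, sd, guess=0.25, lapse=0.0):
    """Proportion correct at distortion amplitude x.

    threshold : midpoint of the unscaled cumulative function
    guess     : lower asymptote (chance level for four alternatives)
    lapse     : assumed upper-asymptote shortfall (illustrative only)
    """
    unscaled = cumulative_gaussian(x, threshold, sd)
    return guess + (1.0 - guess - lapse) * unscaled


# At threshold the unscaled function is 0.5, so with no lapses
# performance is 0.25 + 0.75 * 0.5 = 0.625 proportion correct.
p_at_threshold = psychometric(0.05, threshold=0.05, sd=0.02)
```

Defining threshold as the midpoint of the unscaled function means that the reported thresholds correspond to 62.5 % correct when the lapse rate is zero, and slightly less otherwise.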

Results

Thresholds for detecting the distorted target letter are shown in Fig. 3. For both distortion types, observers were less sensitive to letter distortion (thresholds were higher) when the target letters were surrounded by four flanking letters (light triangles) compared to when targets were isolated (dark circles). This pattern is an example of crowding. Furthermore, we observe that the two distortion types (BPN and RF) show different dependencies on their respective frequency parameters (which are not themselves comparable). RF distortions become harder to detect the higher their frequency (c / 2π radians). BPN distortions show evidence of tuning, such that thresholds are lowest for frequencies in the range of 4–10 c/deg and rise for both lower and higher frequencies (note the log-log scaling in Fig. 3). To quantify these effects, we fit curves to the thresholds as a function of the log distortion frequency (BPN: four-parameter Gaussian fit by minimizing the sum of squared errors with the BFGS method of R’s optim function, see Footnote 3; RF: linear model fit with R’s lm function; see lines in Fig. 3 for model fits).

Fig. 3

Results of Experiment 1. Top panels show threshold amplitude for detecting letters distorted with BPN distortions, as a function of distortion frequency (c/deg) for five observers. Note both the x- and y-axes are logarithmic. Points show the posterior MAP estimate for the psychometric function threshold; error bars show 95 % credible intervals. Thresholds are higher (observers are less sensitive to distortions) when flanking letters are present (light triangles) compared to unflanked conditions (dark circles). Additionally, thresholds appear to show tuning, being lowest at approximately 6–8 c/deg. Lines show fits of a Gaussian function to the log frequencies and linear thresholds (see text for details). The bottom row of panels shows RF distortions. Flanking letters again impair performance. Unlike in the BPN distortions, for RF distortions performance simply worsens for higher distortion frequencies. Lines show fits of a linear model to the log frequencies and linear thresholds. The reader can appreciate these results for themselves by examining how distortion visibility changes as a function of frequency in Fig. 1d and g

To quantify the overall decrease in performance caused by the presence of flanking letters, we examined how the area under these curves (estimated numerically) changed from unflanked to flanked conditions (see Footnote 4). Larger areas mean higher thresholds (i.e., lower sensitivity). We quantify these differences using paired t tests of both frequentist and Bayesian (Rouder et al. 2012; Morey and Rouder 2015) flavors. For the BPN distortion type, flanking letters raised the mean area under the Gaussian threshold curve from 0.09 (SD = 0.01) to 0.14 (SD = 0.02); t(4) = 6.26, p = 0.0033, BF = 15.7. For the RF distortion type, flanking letters raised the mean area under the linear fit from 0.17 (SD = 0.01) to 0.33 (SD = 0.05); t(4) = 7.17, p = 0.002, BF = 22.6. Thus, both crowding effects we observe appear reasonably robust.

Next, we consider the peak distortion frequency at which thresholds were lowest for the BPN distortions (there is no peak in our data for the RF distortions). There was a reasonable effect of flanking, such that when flanking letters were present, distortion sensitivity peaked at higher frequencies (M = 8.73 c/deg, SD = 0.88) than when target letters were unflanked (M = 6.44, SD = 0.88; a difference in peaks of 0.44 octaves; t(4) = 5.9, p = 0.0041, BF = 13.4). While the effect is therefore large compared to the relevant error variance, note that it ignores the precision with which the peak frequency is determined by the data, and so should be interpreted with a degree of caution.
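The octave difference quoted above is the base-2 logarithm of the ratio of the two mean peak frequencies. A minimal check (the helper name `octave_difference` is ours):

```python
import math


def octave_difference(f_high, f_low):
    """Number of octaves separating two frequencies."""
    return math.log2(f_high / f_low)


# Mean peak frequencies (c/deg) for flanked and unflanked conditions.
peak_shift = octave_difference(8.73, 6.44)   # about 0.44 octaves
```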

Experiment 2

Our first experiment showed that sensitivity to both BPN and RF distortions was reduced in the presence of undistorted flanking letters. Interestingly, our observers reported experiencing “pop-out” in the flanked condition, such that the distorted letter appeared relatively more salient than the three undistorted targets by virtue of its contrast with neighboring undistorted flankers. That is, the distorted letter strokes appeared subjectively more noticeable when next to undistorted strokes. While the data quantitatively argue against such a pop-out effect (since flanking letters impaired performance), we nevertheless decided to conduct a series of follow-up experiments to determine whether there was any dependence of the thresholds on the kind of flankers employed. Flankers more similar to the target are known to cause stronger crowding (e.g. Bernard & Chung, 2011; Kooi, Toet, Tripathy, & Levi, 1994); it is therefore plausible that distorted flankers would produce even greater performance impairment.

We test this hypothesis in three related sub-experiments. Because we will directly compare the data from each experiment, we present the similarities and differences in the experimental procedures first, followed by all data collectively. Three of the observers from Experiment 1 (two authors plus one lab member) participated in these experiments; all other experimental procedures were as in Experiment 1 except as noted below. As in Experiment 1, all test images were pregenerated and saved along with undistorted reference images to facilitate future modeling work.

Methods

Experiment 2a: varying the number of distorted flankers

This experiment was identical to Experiment 1, with the primary exception that in some trials either two or four of the flanker letters in every letter array (above, left, below, and right) were also distorted (see Fig. 4a–c). That is, observers reported the location of the distorted target letter, sometimes in the presence of distorted flankers. If distorted targets pop out from undistorted flankers and undistorted targets pop out from distorted flankers (symmetrical popout), we might expect that settings in which two of four flankers are distorted would be hardest. In the case of no distorted flankers (i.e., the same as the flanked condition in Experiment 1), the distorted target pops out from the flankers. In the case of four distorted flankers, the undistorted targets pop out in three of the four possible locations, alerting the observer to the correct response by elimination. Finally, when two flanking letters are distorted, any differential pop-out signal is minimized because the nontarget letter arrays contain two distorted letters whereas the letter array corresponding to the correct response contains three distorted letters. This account would therefore predict that thresholds in the two distorted flanker letter condition should be higher than those for zero or four distorted flankers.

Fig. 4

Example stimulus displays from Experiment 2 (all examples show the BPN distortion type at high distortion amplitudes). In Experiment 2a, observers detected the distorted middle letter when surrounded by zero (a), two (b) or four (c) distorted flankers. d In Experiment 2b, observers indicated the undistorted middle letter surrounded by four distorted flankers. e In Experiment 2c, flankers were always distorted at a highly-detectable distortion level. The correct responses in panels a–e are down, left, down, left and right

In this experiment, we selected one distortion frequency for each distortion type: 2.6 c/deg for the BPN and 4 c/ 2π for the RF distortions. Because our pilot testing indicated these tasks were more difficult than those in Experiment 1, we generated distortions at higher amplitudes than those in the first experiment: 0.024, 0.048, 0.072, 0.096, 0.120, 0.144, and 0.168 for BPN and 0.05, 0.125, 0.2, 0.275, 0.35, 0.425, and 0.5 for RF. Flanking letters were distorted with the same frequency and amplitude distortion as the target letter on every trial.

Trials of different distortion types (BPN, RF) and flanker conditions (zero, two or four distorted flankers) were presented in separate blocks in which the seven amplitudes were randomly interleaved. Ten unique images were created for each amplitude, each repeated three times to give 30 trials per amplitude (210 per block). Blocks of trials were arranged into six-block sessions, consisting of each distortion type and flanker condition in a random order for each observer. All observers participated in two sessions, creating a total of 7560 trials.

Experiment 2b: detect the undistorted letter in the presence of distorted flankers

In Experiment 1, observers detected which of four letters was distorted when surrounded by four undistorted flanking letters. In Experiment 2b we examine the inverse task: to detect which middle letter is undistorted in the presence of four distorted flankers (Fig. 4d). If distortion detection is symmetric, performance in this condition should be as good as in the zero distorted flanker condition of Experiment 2a. That is, distorted letters should pop out from undistorted flankers just as undistorted letters pop out from distorted flankers. The procedure was otherwise identical to Experiment 2a, with the exception that observers did two blocks (BPN and RF distortion types) of 210 trials (totaling 1260 trials).

Experiment 2c: flanker distortion at fixed high amplitude

In Experiments 2a and 2b, flanker distortions had the same amplitude as the target letter distortion. Therefore, for low target distortion amplitudes the flanker distortions were also subthreshold. Popout, if it exists, may require detectable levels of distortion in the flanking elements. To test this possibility, we repeated the four distorted flanker condition from Experiment 2a, with the exception that the flankers were distorted at a fixed amplitude that rendered distortions easily detectable (amplitude 0.144 for BPN, 0.425 for RF; see Fig. 4e). If popout requires suprathreshold distortions in flanking letters, then sensitivity in this condition should be higher than the four distorted flanker condition from Experiment 2a (i.e., more similar to the zero distorted flanker condition for Experiment 2a). Observers performed at least two blocks, one for each distortion type (2520 trials total).

Results

Threshold levels of distortion are shown in Fig. 5. The results for the BPN and RF distortions show qualitatively similar effects of the experimental conditions. First, thresholds increase as more flanking letters are distorted: detecting distortions in arrays with two or four distorted flankers is more difficult than when no flankers are distorted (Experiment 2a; Fig. 5 circles). There is therefore no support for the prediction that thresholds would be higher in the two distorted flanker condition which, had it occurred, would be consistent with targets popping out from (un)distorted flankers in the zero and four distorted flanker conditions.

Fig. 5

Results of Experiment 2. Top panels show threshold amplitude for detecting the target letter as a function of the number of distorted flankers, for three observers in the BPN distortion condition (Experiment 2a). Note the logarithmic y-axis. Points show the posterior MAP estimate for the psychometric function threshold; error bars show 95 % credible intervals. Different shapes and shading denote Experiments 2a, 2b and 2c. Points for four distorted flankers have been shifted in the x direction to aid visibility. Bottom panels show the same as the top for RF distortions

The results of Experiment 2b (Fig. 5, triangles) also provided no support for symmetrical popout. There was no evidence that detecting an undistorted target letter amongst four distorted flankers was as easy as detecting a distorted target in the zero distorted flanker condition of Experiment 2a; instead, thresholds for detecting the undistorted target letter were more similar to those for detecting a distorted target letter amongst four distorted flankers.

Finally, thresholds in Experiment 2c (Fig. 5, squares) show that detecting a distorted letter amongst four distorted flankers requires substantially more distortion amplitude than detecting one amongst undistorted flankers (Experiment 2a with no distorted flankers), despite the flanker distortions always being easily detectable. This result confirms the absence of symmetrical popout found in Experiments 2a and 2b: it is not the case that the three undistorted targets pop out from their distorted surrounds (which if it occurred would allow the observer to choose the correct response by selecting the array with no popout).

It is additionally interesting to consider the pattern of results for Experiment 2c relative to the other four-distorted-flanker conditions. Here we see opposite patterns of results for the BPN and RF distortions. For BPN distortions, Experiment 2c produces the highest thresholds of all the experiments, suggesting that highly visible flanker distortions produce even stronger masking. Conversely, for the RF distortions, Experiment 2c thresholds are the lowest among the four-distorted-flanker data in two of three observers. This could reflect some facilitation for this distortion type, but given the inconsistency between observers we would want to collect more data before drawing strong conclusions.

Discussion

We have measured human sensitivity to geometric distortions of letter stimuli presented to the peripheral retina. For two types of distortion, Experiment 1 showed that distortion sensitivity is reduced when target letters are surrounded by task-irrelevant flankers. This result is therefore an example of crowding (Bouma 1970). In the follow-up studies of Experiment 2 we found that this impairment became more severe (see Footnote 5) when flanking letters were themselves distorted; that is, we do not find evidence of distortion “pop-out”. That distortion sensitivity can be crowded is perhaps unsurprising; nevertheless, we find it worthwhile to demonstrate the impairment and measure its strength. The second result is more curious, because a consideration of the stimulus dimensions that may underlie distortion detection suggests we should have found the opposite result.

Relevance to crowding

Crowding has previously been shown to exist for both letter identification (Bouma 1970; Pelli et al. 2004; Chung et al. 2002; Estes 1982) and orientation discrimination (Andriessen and Bouma 1975; Parkes et al. 2001; Wilkinson et al. 1997; Pelli et al. 2004; Harrison and Bex 2015). Our experiments could be considered to probe an intermediate level of representation: geometric distortions can change the contours of these simple but highly familiar shapes.

It is therefore relevant to ask what more primitive dimensions might underlie the effects we report. Detecting deviations from expected shape potentially involves local orientation processing, position, curvature, contour alignment and spatial frequency changes. What does the crowding literature tell us about these potential cues? As mentioned above, there is strong evidence from a number of studies that local orientation processing is impaired by crowding. Sensitivity to local position (Dakin et al. 2010; Greenwood et al. 2009; 2012), spatial frequency (Wilkinson et al. 1997), curvature (Kramer and Fahle 1996), and contour alignment (Robol et al. 2012; Dakin and Baruch 2009; May and Hess 2007; Chakravarthi and Pelli 2011) is also impaired by flanking elements. Some or all of these potential cues could therefore be related to the effects we observe.

The results from our second experiment show that distorted targets do not pop out from undistorted flankers (and vice versa). This is interesting in light of the extensively documented effects of target-flanker similarity in crowding (Estes 1982; Wilkinson et al. 1997; Kooi et al. 1994; Bernard and Chung 2011; Chung et al. 2001; Chakravarthi and Pelli 2011; Glen and Dakin 2013; Livne and Sagi 2007; 2010; Herzog et al. 2015; Manassi et al. 2013; Saarela et al. 2009; Sayim and Cavanagh 2013). If we define “similarity” at the level of “distortedness”, then in Experiment 1 the distorted target becomes less similar to the undistorted flankers as distortion amplitude increases. The degree of target-flanker similarity in the non-target letter arrays is constant, and determined only by the confusability of the undistorted letters in those arrays. The same holds true for Experiment 2a in the zero distorted flankers condition. When the four flankers are also distorted in Experiment 2a, the similarity between target and flankers in the target array is held constant (as the target becomes distorted with increasing amplitude, so do the flankers), whereas in the non-target letter arrays the central (undistorted) letters and the distorted flankers become less similar. If observers were able to use this decreasing similarity to rule out the non-target arrays, we would expect them to be sensitive to the target location. Instead, their thresholds are much higher relative to the zero distorted flankers case. Experiment 2c provides the opposite case to Experiment 1: because flankers were distorted with a strong amplitude distortion, the target letter becomes more similar to the flankers as its distortion amplitude increases. Therefore, target-flanker similarity effects defined at the level of “distortedness” do not appear to be generally consistent with the pattern of results we observe.

A more parsimonious account consistent with the results of Experiment 2 is that performance decays as the “complexity” of the stimulus array increases (under the assumption that flanker distortion increases complexity). When all four flanking letters were distorted (Fig. 5), thresholds for target detection were higher than in other conditions, whether the observers were trying to discriminate a distorted middle letter from undistorted ones (Experiment 2a), the undistorted middle letter from distorted middle letters (Experiment 2b), or the distorted middle letter in the presence of strong flanker distortions (Experiment 2c). On this account, flanker distortion increases complexity, making the task more difficult. Letter complexity effects have indeed been demonstrated to play a distinct role from target-flanker similarity in crowded letter identification (Bernard and Chung 2011), an effect attributed to the number of features to be detected within a character (see also Pelli et al. 2006; Suchow and Pelli 2012). It seems plausible then that in our Experiment 2, it is difficult to detect distorted letters in the presence of distorted flankers because of feature crowding.

The model of letter complexity presented by Bernard and Chung (2011) requires a letter skeleton to be known (their paper compared different fonts). We require an image-based metric. We made a coarse attempt to quantify the complexity account above by investigating whether two metrics of visual clutter (Rosenholtz et al. 2007) could qualitatively mimic the effects—on the assumption that a complex display is a cluttered display. Feature congestion is a multiscale measure of the covariance of the luminance contrast, orientation, and color in a given input image. Subband entropy is determined by the bit depth required for wavelet image encoding, expressed as Shannon entropy in bits. These metrics have previously been associated with performance in tasks such as visual search (Asher et al. 2013; Henderson et al. 2009; Rosenholtz et al. 2007). While both metrics showed a robust increase in clutter from unflanked to flanked displays, there was only weak evidence that they were able to capture the other effects in our data (see Supplementary Material). One would need to find a more appropriate measure of complexity—perhaps something similar to these clutter metrics—to capture the full range of the data we report.
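To make the idea of an entropy-based clutter measure concrete, the following is a minimal sketch, not the Rosenholtz et al. (2007) implementation (which is not reproduced in this paper and operates on a proper wavelet decomposition of luminance and chrominance channels). Here the subbands are simply differences between successively blurred copies of a grayscale image, and the function names, the box-blur filter, the number of scales, and the histogram binning are all illustrative assumptions.

```python
import numpy as np

def shannon_entropy_bits(values, n_bins=64):
    """Shannon entropy (bits) of a fixed-range histogram of coefficients.

    Assumes the input image lies in [0, 1], so that band-pass
    coefficients (differences of blurred images) lie in [-1, 1].
    """
    counts, _ = np.histogram(values, bins=n_bins, range=(-1.0, 1.0))
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())

def blur(image, passes=1):
    """Crude low-pass filter: repeated 3x3 box blur with edge padding."""
    out = image.astype(float)
    rows, cols = out.shape
    for _ in range(passes):
        padded = np.pad(out, 1, mode="edge")
        out = sum(padded[i:i + rows, j:j + cols]
                  for i in range(3) for j in range(3)) / 9.0
    return out

def toy_subband_entropy(image, n_scales=3):
    """Mean entropy over band-pass subbands (illustrative stand-in only)."""
    current = image.astype(float)
    entropies = []
    for scale in range(n_scales):
        lower = blur(current, passes=2 ** scale)   # coarser copy
        entropies.append(shannon_entropy_bits(current - lower))
        current = lower
    return float(np.mean(entropies))

# Hypothetical inputs: a blank field and a field of uniform noise.
blank = np.zeros((32, 32))
noise = np.random.default_rng(0).random((32, 32))
blank_entropy = toy_subband_entropy(blank)
noise_entropy = toy_subband_entropy(noise)
```

A blank field has no band-pass structure and so scores zero, whereas the noise field scores higher. Whether a measure of this kind tracks the flanker-distortion effects reported here is exactly the open question raised above.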

Two dominant classes of crowding models are “averaging” models, in which crowding occurs because task-relevant features from the target and flankers are averaged together, and “substitution” models in which properties of the flankers are sometimes mistakenly reported as properties of the target. The present study was not designed to discriminate between these accounts of crowding, and it is somewhat unclear what predictions models of either class would make for our results (can the appearance of distortion be substituted?). Interestingly, recent work shows that because both averaging- and substitution-like errors can be accounted for under a simple population coding model and decision criterion, observing either of these behaviors experimentally does not necessarily discriminate between mechanisms (at least for orientation discrimination; Harrison & Bex, 2015). It may be fruitful to consider what such a letter-agnostic population coding model might predict for our experiments.

Relevance to other investigations of distortions

How do our results fit with previous investigations of human perception of these two distortion types? We first consider BPN distortions. Our Experiment 1 revealed that distortion sensitivity is tuned to mid-range distortion frequencies (approximately 6–9 c/deg). Bex (2010) also found bandpass tuning for detecting BPN distortions introduced into one quadrant of natural scenes. Observers were maximally sensitive to distortions of approximately 5 c/deg, and these peaks were relatively stable for distortions centered at retinal eccentricities of 1.5, 2.8 and 5.6 deg. These estimates are at the lower bound of those we observe here. This might suggest that distortion detection sensitivity peaks at higher distortion frequencies for letter stimuli than for natural scene content. However, the results of Wiecek et al. (2014) imply that the peaks we observe will also depend on letter size, so it may not be generally meaningful to compare the peaks we observe to those of Bex (2010).

In Wiecek et al. (2014), letters of different sizes were presented foveally, and participants identified the letter after BPN distortion. Letter identification performance showed different tuning for distortion frequency at different letter sizes. Filtering with a peak frequency of 8 c/deg produced the poorest identification performance for letters subtending 0.33 deg. These results fit with our data if we assume that when a distortion is maximally detectable (peak sensitivities in our experiment) it maximally reduces letter identification (Wiecek et al. 2014); the difference in letter size likely reflects a size scaling constant in detectability as letters move away from the fovea (Chung et al. 2002; Song et al. 2014).
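Because Wiecek et al. (2014) report distortion scale in cycles per letter (CPL) whereas we report cycles per degree, a short conversion may help when comparing the two. The sketch below assumes that the distortion frequency in c/deg is simply CPL divided by the angular size of the letter; the function name is hypothetical, and the input values are the ones quoted in the Introduction.

```python
def cycles_per_degree(cycles_per_letter, letter_size_deg):
    """Convert a distortion frequency from cycles per letter to cycles per degree.

    Assumes one letter spans exactly `letter_size_deg` degrees of visual angle.
    """
    if letter_size_deg <= 0:
        raise ValueError("letter size must be positive")
    return cycles_per_letter / letter_size_deg

# Most detrimental distortions quoted in the Introduction (Wiecek et al. 2014).
small = cycles_per_degree(2.4, 0.33)   # small letters, coarse distortion
large = cycles_per_degree(4.0, 5.4)    # large letters, finer distortion

print(f"small letters: {small:.1f} c/deg")   # small letters: 7.3 c/deg
print(f"large letters: {large:.2f} c/deg")   # large letters: 0.74 c/deg
```

Under this assumption the small-letter case works out to roughly 7 c/deg, which is close to the 8 c/deg figure quoted above and falls within the 6–9 c/deg range of peak sensitivity in our Experiment 1.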

What causes the bandpass tuning for BPN distortions? Potentially, sensitivity to whatever primitive feature dimensions are used to detect the distortions (e.g., contrast, curvature changes) also follows a bandpass shape. Note, however, that an analysis of the spatial frequency and orientation energy changes induced by distortions (Supplemental Material) reveals no obvious relationship to performance for those dimensions. Additionally, BPN distortions of sufficient amplitude (when the pixel shift exceeds half the distortion wavelength) will cause reversals in pixel positions, producing “speckling” at high frequencies but leaving the mean position of low-frequency components unchanged (see for example Fig. 1d, the highest amplitude distortions for the two highest frequencies). The bandpass tuning might reflect sensitivity to this speckling: detecting high-frequency distortions requires detecting high-frequency speckles (see also spectral analysis in the Supplemental Material), which are difficult to see in the periphery due to acuity loss. Thresholds therefore rise again compared to mid-frequency distortions, which observers can detect well before speckling occurs. Experiment 1 also showed that when flankers are present, peak sensitivity shifts to higher frequencies than when flankers are absent. This could be because flanking letters selectively reduce sensitivity to position changes at lower spatial scales, or because flanking letters increase sensitivity to higher-frequency speckles. Given that there is no plausible mechanism that might support the latter possibility, we favor the former.

As to RF distortions, Wilkinson et al. (1998) measured thresholds for detecting RF distortions applied to spatially-bandpass circular shapes as a function of radial distortion frequency. As in our data, they found that threshold amplitudes decreased as radial frequency increased, but in their data thresholds appeared to asymptote at higher frequencies. For RF1 patterns (which we do not test in our study), thresholds were ≈0.2, for RF2 patterns thresholds dropped to ≈0.01, and for higher frequencies (3–24 c/2π) thresholds asymptoted at an average amplitude of 0.003 (in the “hyperacuity” range). Thresholds in our data (Experiment 1 unflanked condition) were much higher (for example, average thresholds for our RF2 patterns were ≈0.15, which is about fifteen times higher than in their data). This is likely because distortions in our experiment were applied to more complex shapes (letters as opposed to bandpass circles) that were presented peripherally (whereas in Wilkinson et al.’s experiment stimuli were nearer to the fovea). Nevertheless, there is little evidence that the asymptotic sensitivities in their results also hold in ours. This may be because the asymptote occurs for higher radial frequencies in the periphery, which conceivably reflects an interaction between the image content of our letter stimuli and the sensitivity of the peripheral retina. Dickinson et al. (2010; see also Dickinson et al. 2012) applied RF distortions to complex broadband images (faces) but did not characterize the radial frequency sensitivity function of these manipulations, so their results are not informative for this question.
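For readers unfamiliar with the amplitude units in this comparison, the sketch below illustrates the circular-contour case studied by Wilkinson et al. (1998), in which the radius is modulated sinusoidally as a function of polar angle and amplitude is expressed as a proportion of the mean radius. It does not reproduce how RF distortions were applied to letter images in our experiments; the function name and sampling density are illustrative assumptions.

```python
import numpy as np

def rf_radius(theta, mean_radius, amplitude, frequency, phase=0.0):
    """Radius of a radial-frequency contour at polar angle(s) theta.

    amplitude is a proportion of mean_radius; frequency is in cycles per 2*pi.
    """
    return mean_radius * (1.0 + amplitude * np.sin(frequency * theta + phase))

theta = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)

circle = rf_radius(theta, 1.0, 0.0, 2)        # zero amplitude: a plain circle
rf2_ours = rf_radius(theta, 1.0, 0.15, 2)     # approx. RF2 threshold, our data
rf2_theirs = rf_radius(theta, 1.0, 0.01, 2)   # approx. RF2 threshold, Wilkinson et al.

# Largest departure from the mean radius, as a proportion of that radius.
max_dev_ours = np.max(np.abs(rf2_ours - 1.0))
max_dev_theirs = np.max(np.abs(rf2_theirs - 1.0))
ratio = max_dev_ours / max_dev_theirs
```

Under these assumptions the ratio of the two maximum deviations is 15, matching the fifteen-fold difference in RF2 thresholds noted above.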

Caveats

The experiments in the present paper should be considered with a number of caveats. First, we measure performance for a single target-flanker spacing distance. While this distance was selected to be well within “Bouma’s law” for crowding, and we indeed find an influence of flanking letters, our data provide only a snapshot of the spatial interference profile for these stimuli. Successful models could also be expected to account for the spatial extent of crowding for letter distortions, and so measuring the spatial interference zones would be a useful experimental contribution. In the interests of brevity, we leave those investigations to future studies.

Second, our results do not allow a direct comparison between the two distortion techniques. The frequency and amplitude parameters for each distortion type represent different physical image changes. Radial frequency distortions are highly correlated both tangentially and radially, whereas BPN distortions are not, and these correlations will interact with the original structure of the letter. Each distortion type produces different patterns of human sensitivity as a function of its distortion parameters. Therefore, the distortions and psychophysical results we present here define distinct physical shape changes that produce different patterns of sensitivity, providing a challenge for future accounts of shape perception.

Finally, the generality of our results should be considered with a degree of caution. The detectability of a given distortion will depend on the image content to which it is applied (for example, distorting a blank image region results in no image change). In our experiments, we used only four target letter stimuli. This choice was motivated by the fact that our intention was not to quantify the visibility of distortions across a broad range of stimuli, but to investigate sensitivity in highly familiar simple patterns. Nevertheless, the research discussed above (Bex 2010; Wiecek et al. 2014) corroborates the pattern of bandpass tuning we observe for the BPN distortions in our small set of letter stimuli, suggesting that this pattern applies more generally than just our limited stimulus set. As to RF distortions, we cannot say with any degree of certainty how the patterns of RF sensitivity we observe will generalize to new stimuli, because the previous investigations we are aware of either have not characterized distortion sensitivity as a function of frequency, or have done so in much simpler stimuli (see above).

Other implications

The results of Wiecek et al. (2014) imply that the visibility and functional impairment caused by distortions originating in the retina (such as in metamorphopsia) will depend on viewing distance. Alongside the functional impact of these distortions for the patients in the real world, this result has important consequences for visual acuity testing in the clinic. Interestingly, patients with metamorphopsia often fail to notice their distortions in the real world (Wiecek et al. 2015) and even when tested with artificially regular stimuli (Crossland and Rubin 2007; Schuchard 1993; Wiecek et al. 2015). “Filling-in” processes (Crossland and Rubin 2007) and binocular masking (Wiecek et al. 2015) undoubtedly contribute to this insensitivity. To the extent that the results we report here are generalizable (see above), they (along with Bex, 2010) offer an additional explanation for why patients with metamorphopsia often fail to notice their distortions: in the real world, distortions caused by retinal disease will often be crowded by cluttered visual environments.

Conclusions

Taken together, the pattern of results presented here provides a challenge for models of 2D form processing in humans. A successful model of form discrimination would need to explain sensitivity to two distinct distortion types, the dependence of distortion sensitivity on flanking letters, and the dependence on the type of flanking letters (distorted flankers reduce sensitivity). Directly comparing the BPN and RF distortions would require an image-based similarity metric that captured the perceptual size of the distortions on a common scale. One test of such a similarity metric would be to rescale the results of the BPN and RF data reported here such that the different sensitivity patterns as a function of distortion frequency overlap (assuming that they are detected by a common mechanism). We have provided our raw data and images of the stimuli used in these experiments (10.5281/zenodo.159360) to facilitate future efforts along these lines.