Introduction

Numerous studies have demonstrated that a brief visual stimulus can elicit sustained perceptual states that can contribute to discriminations and memory. Sperling (1960) and Averbach and Coriell (1961) used partial report paradigms to show that letter arrays persisted for several hundred milliseconds, preserving information about shape and spatial location as one might expect from a weak afterimage. Rensink (2014) described this persistent activity as a “visual echo” that serves as a surrogate for the stimulus. The persistent activity may provide a perceptual state as though the stimulus were still present, what Coltheart (1980) and others have called visible persistence. It may also include persistence of information that is not visible, which has been called by various names, including iconic memory (Neisser, 1967), short-term visual storage (Haber & Standing, 1969), non-visible persistence (Phillips, 1974), and information persistence (Erwin, 1976; Coltheart, 1980). Many of the studies make no effort to distinguish and identify the specific kind of post-stimulus information that is being manifested, which complicates the task of relating new data to what has gone before.

The present work follows a well-established strategy for studying the visual processing of verbal material using brief displays of letters and words (Bethlehem, 1968; Chou, 1929; Cormier & Tomlinson-Keasey, 1991; Robinson, 1934; Segalowitz, 1987; Tinker, 1929; Voboril et al., 2014). Prior studies have displayed stimuli for tens of milliseconds or longer, these durations being sufficient to drive a concurrent perceptual state, what might be called stimulus-driven perception. The challenge then is to find ways to show how sustained activity makes a contribution to the perception, e.g., allows for visible persistence of the stimulus. The studies reported here displayed each letter as a pattern of discrete dots, each being activated (flashed) for only 10 microseconds (μs). The only physiological response that is concurrent with such an ultrabrief flash is photopigment isomerization, and possibly the beginning of the photochemical cascade (Cobbs & Pugh, 1987). All neuronal interactions within and beyond the retina are essentially post-stimulus responses, so any and all conscious perception of the stimulus can be attributed to visible persistence.

Displaying letters with dot patterns has the additional advantage of allowing for unbiased sampling of the pattern as a way to manipulate information content. One can reduce the density of the dots to determine how this affects recognition of the letters. When the dots themselves are readily perceived but the depleted dot pattern is not able to elicit letter recognition, one can discuss the role of information persistence, e.g., iconic memory, as distinct from visible persistence.

This system was previously used to study recognition with a diverse inventory of 360 shapes, e.g., animals, vehicles, tools, and furniture (Greene & Ogden, 2013; Greene, 2013, 2014). The shapes were rendered as a continuous string of dots that marked the major contours of each shape, in particular the outer boundary, similar to a silhouette. While those experiments yielded interesting insights about mechanisms for registering visual content, the use of letters has several advantages with respect to experimental control and modeling.

First, with the inventory of complex namable shapes, one can display a given shape only once. If a respondent identifies a shape on the basis of an accommodating display condition, there is a higher likelihood of doing so again under more adverse conditions. With letters the choice is among 26 known options and the fact that a letter was recognized once should not change the odds that it will be identified in a subsequent display. Second, with the 360-inventory one picks a sample of shapes for display under a given treatment condition, trusting that the random selection has provided samples that are equivalent in difficulty. With the 26-letter alphabet one can present them all at each treatment level, which assures that differences in recognition can be attributed to the treatment, per se. Third, letters are highly over-learned, so under suitable display conditions each letter can be identified by each respondent, i.e., a hit rate of 1.0. This is a better match to assumptions of the logistic regression models that are used for statistical analysis.

Here we report four experiments designed to further evaluate visual mechanisms for registering shape patterns, with special attention to the roles of visible persistence and information persistence. The first experiment displayed each letter pattern across a range of intensities to quantitatively establish the probability of recognition as a function of stimulus intensity. This provided what can be described as an “intensity activation curve,” which is comparable to the classic “frequency of seeing” curve except that it relates specifically to the visibility that allows for recognition. The second experiment displayed each letter pattern at a near-threshold intensity using a two-pulse protocol, i.e., displayed twice with manipulation of the interval between the two displays. The near-threshold intensity provides a stimulus that is very dim and relatively weak at eliciting letter recognition. With two displays within a short time interval the activation produced by the first display can sum with that of the second to generate a percept that has greater visibility, providing a much higher level of recognition. As the interval becomes longer, there is decay of the initial activation, and the hit rate (probability of recognition) declines. The interval across which the decline of a two-pulse condition reaches the level of recognition that can be elicited by a single-pulse control condition reflects the duration of visible persistence.

The third experiment examined how reductions in dot density affected the probability of letter recognition, deriving a “density activation curve.” Here the dots themselves were displayed at a relatively high intensity and were readily visible, but the dot patterns became less effective as density was reduced. This provided a quantitative basis for specifying what level of density to use in the two-pulse protocol of the fourth experiment. Low-density patterns were displayed in the fourth experiment with an intensity that provided for good visibility of the individual dots. A given pulse, i.e., low-density display, could elicit a relatively low level of recognition when delivered alone as a control condition. But with display of two complementary patterns, each being a random low-density sample from the full letter pattern with no dots shared between the two, the information provided by the two could sum to provide a higher probability of recognition. The interval between the two displays was varied, as with the initial two-pulse experiment, to determine the duration of the information persistence.

Materials and methods

Stimulus display board

The display board was composed of a 64 × 64 array of AlGaInP LEDs (part # RL5-R8030 from Super Bright LEDs, Inc.) which emit at a wavelength of 630 nm (red). LED diameters and center-to-center spacings were 5 mm and 9.4 mm, respectively, and the horizontal and vertical spans of the array were each 60 cm. At the observation distance (3.5 m), the visual angles subtended by the LED diameter, the LED spacing, and the span across the array were 4.92 min, 9.23 min, and 9.80°, respectively.
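As a check on the geometry, the visual angles can be recomputed from the stated sizes and the 3.5-m observation distance. The sketch below is illustrative only; the function name is hypothetical, and it assumes each object is centered on the line of sight. It reproduces the reported values to within rounding (about 0.01 min for the LED diameter).

```python
import math

def visual_angle_deg(size_mm: float, distance_mm: float) -> float:
    """Visual angle in degrees subtended by an object of the given size,
    assuming it is centered on the line of sight (illustrative helper)."""
    return math.degrees(2.0 * math.atan(size_mm / (2.0 * distance_mm)))

DISTANCE_MM = 3500.0  # observation distance of 3.5 m, from the text

# Sizes taken from the text: 5 mm diameter, 9.4 mm spacing, 60 cm span
led_diameter_arcmin = visual_angle_deg(5.0, DISTANCE_MM) * 60.0
led_spacing_arcmin = visual_angle_deg(9.4, DISTANCE_MM) * 60.0
array_span_deg = visual_angle_deg(600.0, DISTANCE_MM)

print(f"LED diameter: {led_diameter_arcmin:.2f} arc min")
print(f"LED spacing:  {led_spacing_arcmin:.2f} arc min")
print(f"Array span:   {array_span_deg:.2f} deg")
```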

The stimulus display was controlled by a Propox MMnet101 microcontroller running at 16 MHz; the microcontroller crystal had stability of 50 ppm. Average speed for processing firmware instructions was 12 MIPS. This system allowed for nominal specification of treatment durations as short as 1 μs. A Mac G4 Cube provided instructions to the microcontroller, these being Tcl/Tk custom applications written for OS X.

Ambient lighting, flash duration, and flash intensity

Test room lights (standard fluorescent fixtures) were fitted with occluding panels that provided ambient illumination of 10 lux (measured with a calibrated Tektronix J1811 photometer) without changing color balance.

A given letter was displayed using a single and simultaneous flash of all the dots in the pattern representing the letter. This was accomplished by applying a specified voltage to all the LEDs forming the pattern for 10 μs. Oscilloscope traces taken with a Thorlabs photodiode having a temporal resolution of 15 nanoseconds revealed that an additional pulse of light was generated at the offset of the requested duration that lasted for 0.3 μs, measured at mid-amplitude. Therefore the full duration of the flash was 10.3 μs, which has been rounded for convenience in the body of this report.

Light emission was controlled by the amount of voltage that was applied to the LEDs. Specifically, the ATmega128 portion of the microcontroller specified the control voltage using a 13-bit digital-to-analog circuit; this voltage was delivered to a Kepco RKW programmable power supply and then to the LEDs forming the letter pattern.

Flash intensities were measured with a Thorlabs PM100USB radiometer with an S120C calibrated sensor. The radiometer provided calibrated measures of radiant intensity for continuous (steady) emission of light. To assess the intensities of the 10-μs flashes, oscilloscope traces taken with the fast photodiode were scaled in relation to the measured readings. The 0.3-μs Off pulse described above adds a quantity of light that depends on the intensity being requested, the maximum for the present report being 1.2 %. This seems negligible, so the reported intensities have not been adjusted to include the Off pulse.

Stimulus letters

A standard Windows device-independent bitmap (DIB) service was used to render regular Arial 33-point TrueType fonts (without anti-aliasing) as discrete dots that were positioned for display by the LED array (see Fig. 1). Heights and widths of letters, specified as dot count and degrees of visual angle, are provided in Table 1 along with the total number of dots in each dot pattern. All letters except Q were the same height, with Q having two additional dots at the tip of the tail. The letters were centrally positioned on the board, each being shown as simultaneous 10-μs flashes of all the LEDs forming the letter pattern or low-density pattern sample.

Fig. 1

Discretized versions of letters were displayed on an array of LEDs as simultaneous 10-μs flashes of the dots forming the letter pattern; four examples are provided here. The upper portion of the panel shows how the letter would be positioned within the full array of LEDs. Each treatment level in each of the four experiments displayed all 26 letters of the alphabet

Table 1 The height and width of each letter is specified as dot count and as degrees of visual angle. The latter was measured from the outer edges of the most extreme dots. Mean width of letters was 17.3 dots (2.60 arc°). Mean number of dots in the letter patterns was 153.6

Experiment approvals and informed consent

The experimental protocols for these experiments were approved by the Institutional Review Board at the University of Southern California. The work was carried out in accordance with the Code of Ethics of the World Medical Association (Declaration of Helsinki). Each respondent was provided with a description of the nature of the research and the judgments they would be asked to make. Further, each was informed that he or she could discontinue participation at any time and for any reason (or no reason) without penalty. Respondents were free to choose among a broad range of experiments and a record of their choice has been retained by the Psychology Department, providing documentation of informed consent. Thirty-two respondents contributed data, eight in each of the four experiments.

Experimental treatments and protocols

All 26 letters were displayed once at each treatment level in each of the four experiments. In each experiment the letter pattern was displayed as a single and simultaneous 10 μs flash of all the LEDs forming the pattern. The display order for the combinations of treatments and letters was random.
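The fully crossed, randomly ordered design can be sketched as follows. This is a minimal illustration, not the authors' Tcl/Tk control software; the function and variable names are hypothetical, and the number of treatment levels per experiment is taken from the experiment descriptions in this section (including the single-flash control conditions of Experiments 2 and 4).

```python
import random
import string

LETTERS = string.ascii_uppercase  # all 26 letters appear at every treatment level

def build_schedule(n_levels, rng):
    """Cross every treatment level with every letter, then shuffle the order."""
    trials = [(level, letter) for level in range(n_levels) for letter in LETTERS]
    rng.shuffle(trials)
    return trials

# Treatment levels per experiment, as stated in the text
TREATMENT_LEVELS = {
    "Experiment 1": 16,  # radiant intensity steps
    "Experiment 2": 11,  # ten T2 intervals plus a single-flash control
    "Experiment 3": 12,  # dot density steps
    "Experiment 4": 12,  # eleven T2 intervals plus a single-flash control
}

rng = random.Random(0)  # fixed seed only so the sketch is repeatable
schedules = {name: build_schedule(n, rng) for name, n in TREATMENT_LEVELS.items()}
trial_counts = {name: len(trials) for name, trials in schedules.items()}
print(trial_counts)
```

The resulting trial counts correspond to the totals reported for each experiment.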

Experiment 1 determined the radiant intensity required to elicit letter recognition, with the change in probability of recognition (hit rate) as a function of intensity being described as an “intensity activation curve.” Radiant intensity was varied from 2.73 μW/sr to 69.1 μW/sr in 16 steps. Each letter was displayed once at each intensity level for a total of 416 trials.

Experiment 2 was designed to evaluate summation of near-threshold activation. Respondents were pretested to determine the radiant intensity that would elicit a hit rate that was no greater than 12 %. Based on the individual pretest results, the radiant intensity used for four of the respondents was 6.64 μW/sr, and the remaining four were tested with an intensity set at 13.6 μW/sr. The experiment flashed all the dots in each letter twice at the same radiant intensity separated by stimulus onset asynchrony intervals, designated as T2. T2 was varied from 10 to 100 milliseconds (ms) in 10-ms steps (ten treatment levels). A control treatment was also provided wherein the letter was flashed only once. Each letter was displayed once at each of the treatment levels for a total of 286 trials.

Experiment 3 derived a “density activation function” that specified how changes in density affected hit rate. Figure 2 provides an example of how progressively greater reductions in dot density affect the appearance of a letter.

Fig. 2

Experiments 3 and 4 varied the density of dots forming the letters, and all reduced-density patterns were displayed with simultaneous 10-μs flashes. Experiment 3 examined how hit rate changed as a function of dot density and Experiment 4 examined the ability of the visual system to combine two low-density patterns with flash separations of up to 1 s

Radiant intensity for all displays in Experiment 3 was set at 255 μW/sr, this being well above the level that was able to produce high recognition. Pilot work had determined that letters could be readily identified with densities that were over 50 %, so dot density for this experiment ranged from 50 % to 4.29 % in 12 steps, each being 20 % lower than the preceding density. Specifically, the density percentages were: 50, 40, 32, 25.6, 20.5, 16.4, 13.1, 10.5, 8.39, 6.71, 5.37, 4.29. To maximize even distribution of dots at a given density, each letter was partitioned into 10-dot clusters and the sampling removed dots from each cluster as a stratified random process. Each letter was displayed once at each density for a total of 312 trials.
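The density series and the stratified thinning can be sketched as below. This is a hedged illustration rather than the procedure actually used: the text does not say how the 10-dot clusters were formed or how fractional dot counts within a cluster were handled, so the sketch assumes clusters are consecutive runs of a dot list and resolves fractions probabilistically. All names, and the 150-dot example pattern, are hypothetical.

```python
import random

def density_levels(start=50.0, ratio=0.8, n_levels=12):
    """Densities in percent, each 20 % lower than the preceding one."""
    return [start * ratio ** k for k in range(n_levels)]

def stratified_sample(dots, density, rng, cluster_size=10):
    """Keep about `density` (0 to 1) of the dots, sampling within each cluster.

    Assumption: clusters are consecutive runs of `cluster_size` dots in list
    order, and a fractional target is rounded up with matching probability.
    """
    kept = []
    for start in range(0, len(dots), cluster_size):
        cluster = dots[start:start + cluster_size]
        target = density * len(cluster)
        n_keep = int(target)
        if rng.random() < target - n_keep:
            n_keep += 1
        kept.extend(rng.sample(cluster, n_keep))
    return kept

levels = density_levels()
print([round(level, 2) for level in levels])

# Hypothetical 150-dot pattern given as (row, column) LED positions
dots = [(row, col) for row in range(15) for col in range(10)]
sample = stratified_sample(dots, 0.50, random.Random(1))
print(len(sample), "of", len(dots), "dots retained at 50 % density")
```

Sampling within clusters, rather than across the whole letter, keeps the retained dots spread along the contour so that no stroke is emptied by chance at moderate densities.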

Experiment 4 evaluated the summation of non-redundant low-density dot samples such as the pattern shown in the third panel of Fig. 2. Each respondent was pretested to determine the density that would provide for a hit rate in the 70 % range. On the basis of those measures, the density sample for the eight respondents ranged from 12 % to 17 %, with the mean being 14.6 %.

The experiment then used a random sample that provided two complementary subsets that were half this density for each letter that was displayed. The subsets were complementary in the sense that the dots in one subset did not overlap those of the other. If the activation provided by each were combined on the basis of persistence of neuronal activity, the total quantity of shape information would be increased, thus providing a more effective stimulus for eliciting letter recognition. An independent random selection was made for each display condition. Each low-density subset was flashed with a T2 separation (stimulus onset asynchrony) that ranged from 0 to 1 s in 100-ms increments (11 treatment levels) plus a control treatment wherein a single flash displayed only one of the low-density subsets. Each letter was displayed once at each of the treatment levels for a total of 312 trials.
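The construction of two complementary subsets can be illustrated with a short sketch. It is a simplified example under stated assumptions: it uses simple random sampling from the whole letter rather than any stratification, the function name is hypothetical, and the 154-dot pattern and 14.6 % density merely echo the mean values reported in the text.

```python
import random

def complementary_subsets(dots, combined_density, rng):
    """Draw a sample at `combined_density` (0 to 1) and split it into two
    equal, non-overlapping halves, each at half the combined density."""
    n_combined = round(combined_density * len(dots))
    n_combined -= n_combined % 2  # force an even count so the halves match
    chosen = rng.sample(dots, n_combined)  # random order, no repeats
    half = n_combined // 2
    return chosen[:half], chosen[half:]

# Hypothetical 154-dot letter pattern given as (row, column) positions
dots = [(row, col) for row in range(22) for col in range(7)]
first, second = complementary_subsets(dots, 0.146, random.Random(2))

print(len(first), len(second), "dots per flash")
print("shared dots:", len(set(first) & set(second)))
```

Because the two halves come from a single draw without replacement, no dot can appear in both flashes, which is the sense of "complementary" used above.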

Radiant intensity for all displays in Experiment 4 was set at 255 μW/sr, the same as for Experiment 3. At this intensity the dots were clearly visible.

Respondents were required to name a letter even if it was a guess, and each response was recorded by the experimenter. There was no suggestion that the respondent should give a quick answer but most voiced a name immediately with no hesitation. A given display and response usually took 3–5 s and the entire test session was completed in about 40 min. Respondents were not provided with any feedback about the accuracy of their responses.

The experimenter was not informed of the treatment that was used for a given display.

Modeling and statistics

For each of the experiments random effects semi-parametric logistic regression was used to model the treatment effects. Conceptually, a smoothly varying “average” curve was fitted for the probability of correct response over all respondents, and the idiosyncratic deviations from the average response curve were incorporated using smooth respondent-specific random effects. The average effect was modeled by a cubic spline with two equally spaced internal knots, while the respondent-specific deviations were modeled with a random intercept and a penalized cubic spline that had three equally spaced knots.

To be more explicit, let $P_i(x)$ denote the probability of correct response for subject $i$ when tested at value $x$ of a continuous predictor of interest. Then the following model is used, with the first bracket giving the fixed effect group model and the second bracket the random individual deviation from it:

$$ \log \frac{P_i(x)}{1-P_i(x)}=\left[a+\sum_j b_j B_j^{(f)}(x)\right]+\left[\alpha_i+\sum_k \beta_{ki} B_k^{(r)}(x)\right] $$

where $B_j^{(f)}(x)$ is the B-spline basis for the fixed effects, $B_k^{(r)}(x)$ is the B-spline basis for the random effects, $\alpha_i \sim N(0, \sigma^2)$ is the random subject-intercept, and $\beta_{ki} \sim N(0, \sigma_p^2)$ are random spline coefficients.

Based on the shape of the fitted smooth curves, further quantification of the effect was performed by replacing the splines (average) with a piecewise linear effect, with a break-point selected where the slope of the curve changed.
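The piecewise linear form can be illustrated with a small sketch. It shows only the shape of such a model on the log-odds scale; the intercept, slopes, and break-point below are made-up illustrative values, not fitted estimates, and the actual analysis was carried out in SAS rather than with this code.

```python
import math

def piecewise_log_odds(x, intercept, slope_below, slope_above, break_point):
    """Continuous two-segment linear predictor on the log-odds scale."""
    if x <= break_point:
        return intercept + slope_below * x
    at_break = intercept + slope_below * break_point
    return at_break + slope_above * (x - break_point)

def hit_rate(log_odds):
    """Inverse logit: convert log-odds to a probability of correct response."""
    return 1.0 / (1.0 + math.exp(-log_odds))

# Illustrative (hypothetical) parameters: steep decline, then a shallower one
PARAMS = dict(intercept=2.0, slope_below=-0.05, slope_above=-0.02, break_point=60.0)

xs = [10, 30, 60, 80, 100]
curve = [hit_rate(piecewise_log_odds(x, **PARAMS)) for x in xs]
for x, p in zip(xs, curve):
    print(f"x = {x:3d}  predicted hit rate = {p:.3f}")
```

The two segments share the same value at the break-point, so the predicted curve changes slope there without a jump.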

For Experiments 2 and 4, to test whether the average-effect spline had reached the estimate of the single-flash response, single-flash responses were included in the model at the highest value of time, with a fixed effect term measuring their deviation from the spline.

The method provided fitted curves that followed the data closely by assuming a lack of sudden jumps; otherwise, the approach invoked only a minimal number of assumptions. It should be understood that the models are purely descriptive; they do not reflect any sort of response-production process. Thus, the parameters have no psychological meaning.

Analyses were performed using SAS version 9.3 (SAS Institute, Cary, NC, USA), using the Glimmix procedure for the primary analysis. Raw data are available from the first author upon request.

Results and discussion

Letter recognition as a function of flash intensity

Experiment 1 derived an intensity activation function that specified how hit rate changed as a function of radiant intensity. This is essentially a classic frequency of seeing curve that assessed stimulus salience at the various levels of flash intensity, except here the judgments were based on recognition of letter patterns rather than whether the stimuli could be seen. Each letter was shown only once at each treatment level so the effectiveness of the treatment is reflected in the overall hit rate across all the letters of the alphabet.

Figure 3 shows the models for each respondent along with the model for the group, also including a 95 % confidence band for the group model. The group model describes the average pattern over all respondents, whereas an individual model captures the deviation of each individual from the average. Each model was derived using a logistic regression that takes account of the judgments being binary.

Fig. 3

(Experiment 1) Models of individual respondents are shown with dashed lines and the group model is shown as a solid line. The 95 % confidence band for the group model is shown as a solid band of color. For all respondents the probability of successful recognition of letters increased monotonically as the radiant intensity of flashes was increased

For assessing the effectiveness of intensity for producing differential hit rates, we chose a break point for the rise in the group activation function at 11.5 μW/sr, designating this as a near-threshold level of flash intensity. The slope at intensities below this break was not significant (log-odds slope = 11.5, SE = 8.85, p = 0.20). Above the break-point intensity, the hit rate of the group model increased significantly up to 65.9 μW/sr (log-odds slope = 61.2, SE = 2.9, p < 0.0001). Between-respondent variability showed significant differences in the shapes of individual response curves (p = 0.019).

Early physiological mechanisms – photoreceptor impulse response

As noted in the Introduction, all of the perceptual influence from the flashes can be attributed to post-stimulus neuronal activity. Articles reporting on psychophysical judgments and/or modeling often use the term “impulse response” to describe a presumed prolonged activation from such displays. Retinal physiologists have long used this term to describe long-duration changes in current and/or voltage that can be recorded from photoreceptors in response to a very brief flash. Using the same term as retinal physiologists could be intended to imply that the perception is evoked and driven by the sustained activity in photoreceptors. To see if this is a reasonable inference, it would be useful to briefly review findings on the duration of photoreceptor impulse responses.

Schnapf et al. (1990) measured the photocurrent of macaque cones in response to 10.7-ms flashes, varying the wavelength to favor each of the three cone classes. Response profiles were biphasic and were similar for red, green, and blue cones. The time courses of the positive and negative components were similar irrespective of flash intensity. For red cones the time to peak of the positive component was 55 ms and this phase of the impulse response returned to baseline in about 125 ms. The negative component peaked at about 175 ms and the response returned to baseline in about 250 ms.

Nikonov et al. (2006) provided physiological measures of mouse M-cones and S-cones in response to 20-μs light flashes and found that photocurrent peaked in 60–70 ms. Absolute sensitivity of the cones was about the same whether or not a background light was used. The rise time of the impulse response was similar irrespective of flash intensity (their Fig. 2), and response duration to a dim flash was just over 200 ms (their Fig. 4). Responses to 7-ms flashes were about the same as for 20-μs flashes, both of which manifested monotonic declines from the peak.

Fig. 4

(Experiment 2) The respondent and group two-pulse models are shown in pink. The mean hit rate for the single pulse is shown as a horizontal blue line for purposes of comparison, and the standard error of the one-flash data provides the equivalent of a confidence band (blue). All the two-pulse models showed high levels of shape recognition when all dots in the letters were flashed twice with a separation of only 10 ms. The effectiveness of the paired flashes dropped as the interval between them was increased, reaching the one-pulse baseline in about 100 ms

Although we generally assume that the early visual anatomy and physiology of humans is comparable to that found in macaque, van Hateren and Lamb (2006) have reported that the dim-flash impulse response of human cones is very fast, with responses to an 11-ms flash peaking in about 20 ms and returning to baseline in about 100 ms. Additional single-cell recordings from ground squirrel cones provided curves having profiles and time courses that were almost identical to the human results. It is not clear why van Hateren and Lamb (2006) and Nikonov et al. (2006) found only monophasic responses, whereas Schnapf et al. (1990) recorded responses that were biphasic.

Perceptual mechanisms – visible persistence

The physiological studies affirm that a very brief stimulus can elicit a sustained photoreceptor impulse response. That sustained activity could provide the perceptual states needed for recognition, some portion of that activity producing visible persistence. Various methods have been used to determine how long the post-stimulus activity remains visible. Efron (1970) displayed an orange disk of light, shown about 3 arc degrees (°) from a fixation point, with luminance of 2.6 fL and with durations ranging from 10 to 500 ms. He asked respondents to adjust the timing on a supplementary marker to be simultaneous with the end of the visible sensation. He reported that a visual stimulus with a duration of 130 ms or more visibly persisted for a fixed amount of time – about 100 ms. For a stimulus of less than 130 ms the persistence was inversely related to the duration of the stimulus.

Haber and Standing (1969) provided flash sequences and assessed the separation interval at which the perception was fused to appear as continuous stimulation. With the stimulus (a circle) displayed for 15 ms, they reported visible persistence of about 300 ms. Long (1980) displayed letters and small photos at foveal and parafoveal locations using a 50-ms duration. He found background and target luminance to be major factors in the duration of visible persistence, which calls into question the utility of specifying a single value. Others have found stimulus intensity and background intensity to be important variables, though numerous experiments have yielded diverse and even opposite results (see Coltheart, 1980; Di Lollo & Bischof, 1995).

Prior results from this laboratory are most relevant to the current results. Greene (2007a) examined shape recognition with very brief LED flashes, but also included a direct measure of the duration of visibility with a task that was inspired by Newton (1730/1952), with initial experimental results provided by D’Arcy (1765, reported by Allen, 1926). The task flashed successive pairs of dots comprising two columns of an LED array, each pair being displayed for 100 μs, followed with some delay after flash offset by activation of the next pair. The flash sequence passed from the top of the array to the bottom, and then back to begin again at the top as a continuous loop. Because of visible persistence a zone of the array appeared to be continuously lighted, so the respondent perceived an unbroken bar of light moving from top to bottom as successive strokes. Respondents had control of the time interval between successive flashes, with changes of that interval serving to shorten or lengthen that bar. They were instructed to adjust the bar length until it appeared to span half of the full height of the array, and from that judgment the duration of visible persistence could be calculated. Mean duration of visible persistence was 88, 111, and 122 ms with room illumination that was bright, dim, and dark, respectively. Ambient illumination in the present work was the same as the dim condition of the earlier study. We may reasonably infer, therefore, that letter recognition in the first experiment was based on the pattern being visible for up to about 100 ms. The second experiment contributes to that conclusion.

At least for the present experimental conditions, the duration of visible persistence is about 100 ms, which is a reasonable match to the duration of the photoreceptor impulse response. Additional considerations are discussed below.

Two-pulse summation of near-threshold intensity

One can use a “two-pulse protocol” to assess the duration of persistence. Each pulse of the sequence can provide a stimulus that is relatively ineffective, producing only a small amount of neural and perceptual activation. If this weak stimulus is presented twice with very little time between each pulse, the combined strength of the influence is increased, which can be described as summation. Greene (2014) found that shape patterns that were flashed twice, each being displayed at a near-threshold level of intensity, would produce hit rates that were above a one-flash baseline with flash-separations up to 100 ms.

Experiment 2 examined two-pulse summation with displays that were at near-threshold levels of intensity. A single near-threshold flash that produced minimal recognition in Experiment 1 provided a baseline condition for judging the duration of persistence. The respondent and group models for the one- and two-pulse conditions are plotted in Fig. 4. The hit rate was quite high for the two-pulse models (colored in pink) when the temporal separation between flashes was only 10 ms; this was followed by a monotonic decline in letter recognition.

Statistical evaluation showed that the initial decline in hit rate, evaluated up to an arbitrary break point of 65 ms, was significant (log-odds slope = −0.05, SE = 0.0052, p < 0.0001). The decline continued at a lower rate for up to 100 ms (log-odds slope = −0.0165, SE = 0.007, p = 0.026). The average hit rate at 100 ms was no greater than the single-flash hit rate (log-odds = −0.61, SE = 0.31, p = 0.055), indicating that neural persistence from the first flash lasts no longer than 100 ms. Between-respondent variability showed significant differences in the shapes of individual response curves (p = 0.0088).

It is clear that the one-pulse (single flash) control display was very weak at eliciting recognition, providing for identification of only 9 % of the letters where chance performance is 4 %. At this intensity the observer may see only partial zones within the pattern, and on many trials the display is simply not seen. Letter patterns that were displayed twice with the shortest separation (10 ms) were readily seen, providing for recognition in this experiment at about 85 %. With an increase in delay between the two pulses the hit rate declined to the one-pulse level in 80–100 ms.

Prior work on two-pulse intensity summation

Granit and Davis (1931) provided some of the earliest data using the two-pulse method. They studied absolute threshold detection wherein the first flash had duration of 11.3 ms and the duration of the second flash was varied as a way to provide quantitative control over its strength. Making the second flash longer, thus delivering more light, allowed the pair to be perceived when the interval between the two was less than 20 ms. This summed influence manifested a monotonic decline over an interval of about 100 ms. A follow-up study found similar results when they varied the intensity of the second flash rather than its duration.

Bouman and van den Brink (1952) provided two-pulse data using red and green flashes against a dark-adapted field. They also reported a monotonic decline of summation across an interval of about 100 ms for each color, irrespective of retinal eccentricity. A subsequent study found that this interval shortened to 70 ms if the flashes were judged against a moderately bright background (van den Brink & Bouman, 1954).

Blackwell (1963) was the first to observe an inhibitory component for two-pulse summation, these displays being done against a high background radiant. Ikeda (1965) followed up on this work. Against a high background radiant, using 12.5-ms flashes of a red spot having a 30 arc° diameter, he observed a monotonic decline to zero detection by 50–75 ms, followed by a rise in hit rate across the next 30–40 ms.

Ikeda (1965) modeled this behavioral data to derive estimates of the underlying activation provided by each flash. His models manifested excitatory and inhibitory components which when combined produced a decline and then a subsequent rise in detection. Herrick (1972) found results with two 5-ms flashes that were very similar to those reported by Ikeda (1965).

Uetsuki and Ikeda (1970) followed up with a greater range of background radiant levels and longer flash separations to better probe the contribution of the inhibitory component. Their results showed clear biphasic responses with high background intensities, demonstrating that the inhibitory phase could extend out to 300 ms. Low adapting backgrounds produced mostly monotonic declines with only minor increases of hit rate after that decline reached a minimum.

Meijer et al. (1978) determined the interval over which two 10-ms flashes of red LEDs could sum to provide for flash detection. Their 5.5 arc° targets (similar to the present dot size) had 100 % detection with 0 ms of separation, with a monotonic decline of hit rate to chance with 100 ms of separation.

Burr and Morrone (1993) used 8-ms flashes, increment/increment as well as increment/decrement of intensity in relation to a baseline. They examined chromatic differentials using sinusoidal gratings that were restricted to a disk. They found that chromatic summation was monophasic, having an integration window of 200–250 ms. Achromatic summation was biphasic. They noted that Kremers et al. (1993) found responses of retinal ganglion cells to be similar to the response properties reported in their psychophysics study.

Uchikawa and Yoshizawa (1993) used a two-pulse protocol wherein the “flash” was the detection of change in a test patch that was surrounded by a black field. The reference “white” of the test patch was an equal mixture of isometric red and green light (630 and 500 nm). Their conditions included chromatic flash combinations and radiant flashes, plus/plus and plus/minus. They found simple monotonic declines to zero in about 100 ms for red+red and green+green, but other pairings produced more complex changes in threshold detection levels. They proposed a formal model that had four separate chromatic and radiant channels for the various combinations of output from red and green sources (presumably cones).

One relevant two-pulse study of retinal physiology might be mentioned. From electroretinogram recordings of human bipolar cells, Cameron and Lam (2012) reported summation of peak positive response to dim blue flashes. The study also found that two-pulse summation occurred across separations up to 100 ms.

Overall, a majority of the prior studies using a two-pulse protocol have found a monotonic decline of simple summation with pulse separations of about 100 ms, some of them reporting a later inhibitory component. Biphasic components were observed with the greatest consistency when the target was seen against a bright background and with the least consistency when the stimulus was dim. For the present work the ambient illumination of the test room was dim, the board itself appeared black, and light emissions from the LEDs were at or not far above threshold levels of intensity. Under these test conditions it seems likely that a negative component of the cone impulse response would be weak or nonexistent.

It is reasonable to infer that the combined stimulus energy of the two pulses made the letters more visible, allowing for successful recognition as a function of pulse separation. At least some and perhaps all of the perceptual state produced by the two-pulse summation can be attributed to visible persistence.

Perceptual mechanisms of information persistence

There is abundant evidence that stimulus information persists for durations longer than 100 ms, indeed that it can be stored indefinitely in a relatively permanent state that we call long-term memory. But we also have evidence for more ephemeral forms of storage that have been called by various names, including iconic memory, short-term visual memory, non-visible persistence, information persistence, and working memory (Baddeley, 1975; Coltheart, 1980; Erwin, 1976; Haber & Standing, 1969; Neisser, 1967; Phillips, 1974). This persistence is generally characterized as having the ability to preserve substantial amounts of information with spatial specificity, as being relatively insensitive to ambient, background, and stimulus intensity, and as lasting no longer than 300–500 ms. For present purposes we will follow a popular practice of identifying this kind of persistence as “iconic memory.” Stimulus information lasting longer than half a second will be described as “working memory.” Doubtless working memory has a number of characteristics that differ from iconic memory, but the only factor relevant here is an expectation that the stimulus information manifests persistence for more than half a second.

It is usually assumed that the various stages of information persistence are mediated by cortical mechanisms. Neuronal response latencies from retina to primary visual cortex in monkeys are approximately 40–50 ms, and Inui et al. (2006) recorded electrical potentials that peaked in human V1 in about 30 ms. Activation of other occipital locations is not much longer, and visual signals reach inferotemporal neurons within about 100 ms (Baylis, Rolls & Leonard, 1987; Nowak & Bullier, 1997; Rolls & Deco, 2002). Under normal viewing conditions this transfer is likely taking place while the stimulus is still visible, so one should not view the perceptual states as having segregated temporal domains.

A useful method for assuring that one is studying iconic (or working) memory is to manipulate the information content of the stimulus. For the present work this was done by varying the density of dots in each letter pattern, which alters the relative effectiveness of the resulting pattern for eliciting recognition. One must first establish the quantitative relationship between dot density and recognition, after which one can use low-information patterns with a two-pulse protocol to evaluate the duration of information persistence.

Letter recognition as a function of dot density

Letter patterns of varying density were displayed using a high-salience flash intensity, i.e., bright flashes that could be readily seen. Letter dots were sampled at random, displaying reduced-density patterns that ranged from 4.29 % to 50 % of the dots in a given letter. Figure 5 plots the respondent and group models, each manifesting a simple monotonic rise in hit rate as density levels were increased.

Fig. 5

(Experiment 3) Letter recognition increased progressively as the density of the dot pattern was increased. Again, the dashed lines reflect respondent models, the solid line shows the group model, and the solid colored zone specifies the 95 % confidence band for the group model

Up to a chosen break point of 29 % density, the increase in hit rate was significant (log-odds slope = 0.14, SE = 0.011, p < 0.0001). The hit rate continued to increase significantly up to 50 % density (log-odds slope = 0.047, SE = 0.016, p = 0.005). Between-respondent variability showed significant differences in the shapes of individual response curves (p = 0.0065).

The results from Experiment 3 (Fig. 5) clearly show that respondents can identify letters when only a small fraction of the dot pattern is displayed. The fact that exceptionally sparse dot patterns can serve as shape cues is rather remarkable. None of the random low-density samples provided clear boundaries or alignments that one would need to activate orientation-selective neurons. This seems at odds with popular theories about the role of lines and edges as elemental and essential features for defining letters and other shapes, as previously discussed by Greene (2007b). That work manipulated the density of dots in the outer boundary of diverse shapes and respondents were able to identify all of the shapes in the inventory when only 18 % of the dots were displayed. Some shapes could be identified by all the respondents with display of far fewer dots. Recognition was possible when the spacing of dots was too great to effectively activate orientation-selective cells of primary visual cortex, which was one of the reasons that Greene (2007b) questioned the concept that contour attributes serve as elemental building blocks for shape perception. An alternative possibility is that each activated location within the shape pattern contributes to a summary, wherein the essential information consists of the distances among pattern dots or from each dot to a functional center, e.g., a centroid. Examples of how shapes can be summarized using distance information are provided by Greene and Waksman (1987); Waksman and Greene (1988); and Greene (2010).

Two-pulse summation of low-density patterns

Experiment 2 examined two-pulse summation with low-salience displays, i.e., showing the letters with low-intensity flashes. Experiment 4 also used the two-pulse paradigm, but with high-salience displays of complementary low-density letter patterns, the goal being to see across what interval the shape information of the complementary patterns could be combined. Density levels were determined on the basis of individual respondent pretests, finding the density that produced a hit rate of about 0.7 (70 % recognition). This hit rate was chosen because at higher levels the density activation function shown in Fig. 5 becomes asymptotic, making it difficult to see a change from combining low-density subsets. At hit rates below this level the density of subsets becomes so low that judgments of respondents become less consistent, as evidenced from pilot data.

For each presentation of a given letter, two random and non-redundant subsamples were then derived at half that density, i.e., one being used for the first flash and the other for the second. Based on the results of Experiment 3, a single flash with a half-density subsample was expected to produce a hit rate of about 0.45, so recognition as a function of weakening information persistence was expected to decline from 0.70 to 0.45 as the interval between flashes increased. The test results confirmed our expectations.
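The derivation of two non-redundant subsamples can be illustrated with a minimal sketch. It assumes each letter is stored as a list of dot coordinates and that the pretest density refers to the two subsamples combined; the function name, the coordinate format, and the rule for handling an odd dot count are hypothetical and are not taken from the experimental software.

```python
import random


def complementary_subsamples(dots, density, rng=None):
    """Split a letter's dots into two disjoint random subsets of equal size.

    dots: list of (row, col) positions for the full letter (assumed format).
    density: fraction of the full letter shown across both flashes combined.
    Each flash receives half of that combined sample.
    """
    if rng is None:
        rng = random.Random()
    n_total = round(len(dots) * density)
    n_total -= n_total % 2  # assumption: drop one dot so the halves match
    chosen = rng.sample(dots, n_total)  # random order, no repeated dots
    half = n_total // 2
    return chosen[:half], chosen[half:]


# Hypothetical 40-dot letter sampled at a combined density of 30 %.
letter = [(r, c) for r in range(8) for c in range(5)]
first_flash, second_flash = complementary_subsamples(letter, 0.30, random.Random(1))
```

Because the sample is drawn without replacement and then cut in two, no dot appears in both flashes, which is the property the text describes as non-redundant.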

As can be seen in Fig. 6, there was an initial decline of hit rate that proved to be significant (log-odds slope = -0.0055, SE = 0.0008, p < 0.0001, break point arbitrarily selected at 225 ms). The hit rate did not decline significantly in the interval that followed the break point (log-odds slope = -0.000036, SE = 0.00018, p = 0.86). At 1000 ms the summation of two low-density dot patterns provided for a hit rate that was significantly greater than for a single flash of a low-density pattern (log-odds = -0.40, SE = 0.20, p = 0.047). Interpretation of this long plateau of recognition above the baseline level will be discussed subsequently. Between-respondent variability estimates reflected quantitatively different individual averages (p = 0.037), but little or no indication of different shapes for the individual response curves.
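To make the log-odds slopes easier to read as hit rates, the following sketch evaluates a two-segment linear log-odds curve. It assumes the model is continuous at the break point and that the slopes are expressed per millisecond of pulse separation. The fitted intercept is not reported here, so the zero-delay hit rate `p0` is an assumed input (0.80 is borrowed from the approximate short-separation value described in the discussion of Experiment 4). The function names are illustrative.

```python
import math


def logit(p):
    """Convert a probability to log-odds."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def piecewise_hit_rate(delay_ms, p0, slope1, slope2, break_ms):
    """Hit rate at a given delay under a two-segment linear log-odds model.

    p0 is the hit rate at zero delay (an assumed value, not a fitted one).
    slope1 applies up to break_ms, slope2 applies beyond it.
    """
    first = min(delay_ms, break_ms)          # time spent on the first segment
    second = max(delay_ms - break_ms, 0.0)   # time spent on the second segment
    return inv_logit(logit(p0) + slope1 * first + slope2 * second)


# Reported slopes, with an assumed zero-delay hit rate of 0.80.
at_break = piecewise_hit_rate(225, 0.80, -0.0055, -0.000036, 225)
at_1000 = piecewise_hit_rate(1000, 0.80, -0.0055, -0.000036, 225)
```

Under these assumptions the curve falls from 0.80 to roughly 0.54 at the 225-ms break point and changes very little between there and 1000 ms, in keeping with the plateau described above.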

Fig. 6

(Experiment 4) Two complementary low-density letter-patterns, each flashed at an intensity that could be readily seen, summed to elicit higher hit rates than was provided by flashing only one of the low-density patterns (this baseline level being shown by the horizontal yellow line). The information persistence lasted for at least 200 ms. At longer intervals the two-flash models did not drop all the way to the one-flash baseline, which could reflect the contribution of information that has been stored in working memory

The pulses (flashed displays) in Experiment 4 made use of very sparse complementary sub-patterns, each being quite visible but providing few dots, such that a given sub-pattern was relatively ineffective at eliciting letter recognition. The two were effective at eliciting recognition on about 80 % of the trials when the separation between the pulses was short, but recognition declined to an asymptote above the one-pulse level beyond about 200 ms of separation. This is twice the interval across which the low-intensity pulses could sum (see Fig. 4), and almost twice as long as the duration of visible persistence under similar dim room illumination (Greene, 2007a). The task requires integration of stimulus information rather than intensity, so the decline in recognition should not be attributed to visible persistence. The dots were relatively bright and on the basis of stimulus intensity and room lighting one would expect them to be visible for about 100 ms (Greene, 2007a). Nonetheless, the probability of recognition began to drop within tens of milliseconds of pulse separation. Temporal separation may have precluded effective integration (encoding) of the partial cues, which is not quite the same as a decay of icon salience. However, in the absence of more comprehensive data that would justify formulation of an alternative model, it suffices to continue describing the decline as the decay of iconic memory.

After about 200 ms the recognition performance became asymptotic just above the one-pulse control. The fact that the two-pulse asymptote remained above the one-pulse level out to a full second suggests continued summation of information for the two sub-pattern displays, though clearly providing only a weak boost to recognition. This plausibly represents continued persistence of information in working memory, which can preserve information across much longer time spans.

Information persistence in iconic and working memory

Experiment 4 follows on a large body of work that has examined the role of persistence for the integration of complementary stimulus cues (see Coltheart, 1980; Di Lollo, 1977; Di Lollo & Dixon, 1988; Neisser, 1967). A report by Eriksen and Collins (1967) is most pertinent. These investigators displayed two dot patterns that were complementary so that when combined, i.e., displayed simultaneously, the respondent could see a three-letter trigram. The respondents' ability to guess the trigram based on display of only one of the patterns was nil; the density of the dots was simply too low to allow the trigram to be identified. Each of the low-information patterns was displayed for 6 ms. The simultaneous display condition produced a mean identification level of 93 %. When the two displays were separated by intervals up to 100 ms, the performance dropped to about 50 %. The level of recognition had not yet reached asymptote by 100 ms, so they tested the same respondents in a brief follow-up using intervals of 100, 300, and 500 ms. Percent recognition dropped to a minimum by 300 ms, but this level of recognition was still higher than the recognition elicited by a one-flash treatment. They further reported that at the longer durations the respondents were seeing the displays as two distinct flashes.

For Eriksen and Collins (1967) as well as for Experiment 4 of the present report, one is dealing with synthesis of complementary information for each letter being displayed. The time intervals may be quite different where the observer must synthesize across groups of complete letters in order to perceive a word. Forget et al. (2010) flashed two three-letter components of a six-letter word, e.g., B_A_N_ and _R_I_S to form BRAINS. They found effective two-pulse summation that allowed recognition of the whole word out to 80 ms. Between 80 ms and 300 ms they saw a competition between integration and segregation mechanisms that elicited many errors in recognition and other masking effects. Beyond 300 ms the two trigrams were perceived as different stimuli.

Graziano and Sigman (2008) conducted a partial-recall task where respondents were shown an eight-letter array for 106 ms. After intervals ranging from 24 to 1000 ms they were told which cell in the array to recall. The results indicated that after 300 ms, errors were limited to letters that were of similar morphology to the correct stimulus. With an interval of 1000 ms, errors were randomly distributed throughout the alphabet. This might suggest that iconic memory for shape is maintained for intervals out to 300 ms. For statistical purposes we set the break point at 225 ms, but do not see the 300-ms interval claimed by Graziano and Sigman (2008) to be at odds with our results. Differences in task demands could easily account for this difference. Both results are congruent with the inference that summation below 200–300 ms should be attributed to iconic memory.

As the icon is fading, information is being transferred into working memory, which can also be described as visual short-term memory (Irwin, 1991, 1996; Phillips, 1974; Vogel et al., 2001). This memory stage is generally thought to have a limited storage capacity, though Sligte et al. (2008) have provided evidence that a substantial amount of information may be held in a “fragile” state, meaning that it is readily disrupted by post-stimulation events. The limitations in capacity and/or fragility of working memory likely relate to the encoding process that is taking place. In particular, spatial and temporal shape cues must be combined to determine what object is being displayed. That process would require suppression of irrelevant stimuli, but it would also be subject to disruption by such unrelated information.

Phillips (1974) examined iconic and working memory using block patterns, with size of the block array being a major variable and including masking and manipulation of spatial location in the experimental protocols. The performance differentials affirmed working memory as a distinct process that is limited in capacity, is not dependent on spatial coincidence, and is not necessarily subject to masking effects.

Jacob et al. (2013) examined iconic and working memory using a two-pulse protocol that varied in color and shape. They found that persistence of stimulus information was significantly longer when participants were required to compare the two displays and report whether they were same or different. That comparison requires that the stimulus attributes be classified or encoded, so the information must be kept active in working memory while that integration is taking place.

A number of other studies have examined processing of nonverbal information across sequential displays with both detection and identification tasks, e.g., Bouman and van den Brink, 1952; Broadbent and Broadbent, 1981; Blackwell, 1963; Purcell and Stewart, 1971; Watson and Nachmias, 1977; Bergen and Wilson, 1985; Jiang, 2004; Kumar and Jiang, 2005. Those who examined interactions beyond the period of iconic memory most often found that the content of later displays impaired discrimination or recall of stimuli in earlier displays. Stimulus information was integrated only under limited task conditions (Brockmole et al., 2002; Jiang et al., 2005). Busey (1998) provided quantitative modeling of inhibitory interactions with successive displays of digits, and Forget et al. (2010) found competition in the perception of trigrams.

One might infer that working memory is intrinsically fragile, as suggested by Sligte et al. (2008), such that information from an initial display is degraded by the image content of subsequent displays. On the other hand, our ability to see stable scenes from successive frames of a movie or video attests to the fact that our visual system can combine successive images that have complementary information. Experiment 4 used stimuli that could be combined in this manner.

The present work used complementary dot patterns that allowed for integration of information rather than disruption. Non-overlap of the dots might be a factor in allowing for synthesis. That cannot be the full reason, however, for a random-dot mask wherein the dots do not overlap a shape-pattern can impair or preclude recognition of the shape (unpublished data). Disruption of processing by successive displays is not obligatory but rather depends on the degree to which the stimuli are complementary and can contribute to a coherent summary. Summation was possible with the present displays because each sparse pattern provided an appropriate link to specific memory stores. If they had not, information from the first pattern would have been destroyed upon arrival of the second pattern.

Further evaluation of locus

The analysis above has assumed that visibility of the stimulus should be attributed to retinal mechanisms. For the persistence being mediated by the retina, many stimulus attributes are thought to be irrelevant to the duration of sustained activity. One should see a similar time course for visible persistence irrespective of whether low- or high-level information is being processed, i.e., the decay rate would likely be the same with dots, oriented bars, or letters.

The duration of persistence observed in Experiment 2 is certainly consistent with the time-course of the photoreceptor impulse response, and the strong role of stimulus and ambient light levels on duration of persistence is consistent with a retinal locus of effect (Coltheart, 1980; Long, 1985; Di Lollo & Bischof, 1995). To this one could add a report by Engel (1970), who formulated a quantitative model that apportioned the contribution of each eye to a stereoscopic perception and differentially registered monocular and binocular persistence. He found the duration of monocular persistence was a function of stimulus energy, but the duration of cortically-based persistence was not.

Meyer et al. (1975) assessed the duration of information persistence by asking when a flickering stimulus was perceived as continuous (following Haber & Standing, 1969). They displayed oriented stimulus arrays and found that prior adaptation to a given orientation decreased the duration of persistence when the test orientations were subsequently presented. A second experiment adapted one eye and tested duration of persistence with the other with the same outcome. This provides compelling evidence that the persistence was being generated at the cortex rather than by retinal mechanisms. One might add that persistence of orientation information has been tested using cued discrimination (Bradley & Pearson, 2012), change detection (Sligte et al. 2008), and flicker search (Rensink, 2014). Neuronal responses to orientation are first manifested in primary visual cortex and are most precise and vigorous at that site (Bair, 2005; Hubel & Wiesel, 1959; Snodderly & Gur, 1995). One might reasonably attribute the persistence of orientation information seen in those studies to this cortical location.

Gur (2015) makes the case that conscious perception is mediated by activity in primary visual cortex, this being for dot patterns as well as for lines and edges. Given that, it would be tempting to attribute the iconic memory effects observed in Experiment 4 to primary visual cortex. One source of caution is the relative absence of evidence of persistent, post-stimulus firing by V1 neurons. The post-stimulus time histograms of neuron firing patterns, adjusted for latency of transmission, typically show the production of spikes concurrent with the time-course of stimulation, then ceasing with termination of the stimulus or before. This can be observed for orientation-selective neurons (Hubel & Wiesel, 1959; Wurtz, 1969) as well as color-sensitive neurons (Livingstone & Hubel, 1984; Gur & Snodderly, 1997). The same is found in recordings from behaving animals as they focus on stimulus displays and execute differential responses as required by the stimulus attributes (Snodderly & Gur, 1995; Snodderly et al., 2001). Albrecht et al. (2002) observed firing patterns in primary visual cortex of macaque and cat in response to gratings displayed for 200 ms. The firing patterns of responding neurons were generally far shorter than the duration of the stimulus. Xing et al. (2012) evaluated V1 of macaque using multiunit spike activity and local field potentials, produced in response to a drifting sinusoidal grating. All recorded responses were stimulus driven and did not persist following offset of the stimulus. Moshe Gur comments that in the many thousands of orientation selective neurons that he has recorded, he has not observed responding that extended much beyond the end of stimulation (personal communication).

Within 40 ms from the beginning of display, the stimulus information has passed well beyond early visual areas (Kirchner et al., 2009). It is possible that sustained neuronal activity providing for persistence of information is taking place in the higher visual centers of occipital and temporal cortices. A number of laboratories have reported sustained post-stimulus firing of neurons in inferotemporal cortex that may persist for several hundred milliseconds (Fuster & Jervey, 1981; Rolls & Tovee, 1994). Further, presenting a new stimulus can terminate the persistent response from a previous stimulus, similar to the information disruption produced by masking (Keysers et al., 2001; Kovacs et al., 1995; Rolls & Tovee, 1994).

Miyashita and Chang (1988) monitored firing of neurons in anterior ventral temporal cortex of macaque during a match-to-sample task. Fractal patterns and photos of scenery were displayed for 200 ms, producing a high rate of firing that did not decline during a 16-s delay. Similarly, inferotemporal neurons are found to be active during the delay period when animals must remember an object (Chelazzi et al., 1998; Miller et al., 1993).

Hemodynamic responses from fMRI recordings are generally too slow for assessing iconic memory, but might reflect the longer persistence provided by working memory. Offen et al. (2009) found robust activation of V2, V3 and LO (a portion of the lateral occipital complex) in human observers in a delayed discrimination task. Some activation was seen in V1, V3A/B and hV4. Ferber et al. (2003) found an increased hemodynamic response in LO from movement-triggered persistence that lasted a few seconds, and Wong et al. (2009) were able to produce similar effects by briefly displaying a segmented figure upon a background of random line segments.

Very little of the single-unit or fMRI evidence focuses specifically on persistence of information for the first few hundred milliseconds after stimulus termination, so the evidence derived from delayed response tasks may relate more to working memory than to iconic memory. A report by Mukamel et al. (2004) may be an exception. These investigators successively displayed photos of animals, houses, and faces at frequencies of 1 Hz or 4 Hz. In MT/V5 the 4-Hz displays produced twice the amplitude of hemodynamic response as did the 1-Hz sequence, but the increase was only 25 % in occipito-temporal cortex. They attributed the difference to sustained activation of occipito-temporal cortex by iconic memory, i.e., activity that would survive for 250 ms but not for a full second. Controls were run to counter the argument that the reduced activation might be due to saturation of hemodynamic response.

The sustained single-unit and hemodynamic activity of cortical structures is most often attributed to attentional mechanisms, these being for coordination and control of activity more than for basic stimulus encoding. Our intuition is that the stimulus itself evokes the visible persistence and information persistence that is seen within the first 200 ms. The signal can be transmitted up through successive stages very quickly, and it is entirely possible that activity in temporal lobe structures is mediating information-evoked persistence. Future work will be needed to distinguish these bottom-up mechanisms from attention-related activity.