Recently, interest has been growing in the study of time perception and representation (e.g., Di Luca & Rhodes, 2016; Grondin, 2010; Matthews & Meck, 2016; Shi, Church, & Meck, 2013; Wittmann, 2009). The majority of these studies addressed short intervals, ranging from milliseconds to a few minutes. Despite the relevance of longer time scales (i.e., days, months, or years) to human decision making, only a few studies have investigated how we perceive and use calendar time when making financial decisions (Ray & Bossaerts, 2011; Takahashi, Oono, & Radford, 2008). For instance, many economic intertemporal choices, such as taking a loan, postponing the purchase of a car, choosing among different options for investment, or deciding on withdrawing money from a retirement plan, are made on those longer time scales. Our internal representation of longer time intervals, along with several possible cognitive biases, likely affects how we make related decisions. For instance, when choosing between prospects with different associated delays, humans apply hyperbolic discounting of future rewards (e.g., Frederick, Loewenstein, & O’Donoghue, 2002; Green & Myerson, 2004), instead of the normative exponential discounting (Samuelson, 1937). This leads to irrational choice behavior from a theoretical economic perspective, such as preference reversals. Biases in subjective time scale are hypothesized to explain this inconsistency, along with alternative explanations such as uncertainty about the delivery or reception of the reward (Dasgupta & Maskin, 2005; Sozou, 1998).

Long-range time representation, by which we mean calendar times in the range of days to years, is commonly assessed via a cross-modal matching paradigm, frequently using line length. This approach consists of asking participants to indicate, on a straight line, the perceived length of the duration of different long-range time intervals. It requires the translation of time representation into a spatial representation (the length of a line), based on a presumed isomorphism between the representations of these two dimensions. There is substantial variability in the ways this paradigm is used across different studies. In most versions, the line is presented to the participants with a predefined length. Participants then have to move the mouse cursor along the line to create a segment that best represents how long they perceive a certain time interval to be (Zauberman, Kim, Malkoc, & Bettan, 2009). Another version allows participants to stretch the line to a desired length, unlimited by the size of the screen, by means of a scroll bar (Kim & Zauberman, 2009, 2013). The extremes of the line are typically labeled with reference words, such as “very short” and “very long” (but see Kim & Zauberman, 2009, 2013).

The results of several studies (Han & Takahashi, 2012; Kim & Zauberman, 2009, 2013; Zauberman et al., 2009) have suggested that long-range subjective time follows a nonlinear, compressed function; that is, participants do not increase the length of their spatial representation in proportion to the increase in the estimated time interval. For example, the estimated spatial (line) length of a 36-month-long interval is only twice as long as the estimated spatial length of a 3-month-long interval (Zauberman et al., 2009). A natural way to approach both accelerating and decelerating psychophysical functions is to fit them with a power function. Power functions have been used to describe the perception of several physical quantities since the early days of experimental psychology (Stevens, 1957, 1961). Stevens’s power law maps the intensity of a physical stimulus, M, to a subjective magnitude, ψ(M). The power function used toward this end, ψ(M) = αM^β, has two parameters: α is a multiplicative constant that depends on the measurement units, and the exponent β changes from one stimulus attribute to the next, determining the shape of the psychophysical function (decelerating for β < 1, accelerating for β > 1, and proportional for β = 1). The resultant psychophysical functions become straight lines with a slope equal to β when they are expressed on a log–log scale. Over a large number of studies, Stevens identified how subjective intensity increases as a function of physical intensity (namely, the β parameter) for different attributes.
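The log–log property of the power law can be illustrated with a minimal Python sketch. The stimulus values, the value of α, and the helper names `power_law` and `loglog_slope` are illustrative assumptions, not part of the original study; the sketch only shows that the slope of log ψ(M) on log M recovers β.

```python
import math

def power_law(m, alpha, beta):
    """Stevens's power law: psi(M) = alpha * M**beta."""
    return alpha * m ** beta

def loglog_slope(xs, ys):
    """Ordinary least-squares slope of log(y) regressed on log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx

# Hypothetical stimulus magnitudes and scale constant (assumptions).
intensities = [3, 6, 12, 24, 36]
for beta in (0.5, 1.0, 1.5):
    psi = [power_law(m, 2.0, beta) for m in intensities]
    # The recovered slope equals the exponent used to generate the data.
    print(beta, round(loglog_slope(intensities, psi), 6))
```

Because log ψ(M) = log α + β log M, the constant α only shifts the line vertically and leaves the slope unchanged.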

Although power functions were also found to account for subjective time (Eisler, 1976, with exponents lower than but close to 1; Grondin & Laflamme, 2015, with exponents higher than but close to 1), the perception of short intervals in the range of seconds is better described by linear, or quasilinear, functions (e.g., Allan, 1983; Wearden & Jones, 2007). Here we approached the issue by first reevaluating how well a linear model explains the long-range timing data, in comparison to power function models. We also evaluated how robust these results are across experiments that differed in key procedural details.

Method

Participants

Thirty-five healthy undergraduate students (ages 19–29 years, mean 21.9; 19 women, 16 men) participated as volunteers. Each participant was randomly assigned to one of five experimental groups, so that each group consisted of seven participants. The experimental protocol for Groups I–IV followed the conventional line paradigm (Zauberman et al., 2009), whereas in Group V a modified version was used. Prior to the beginning of the task, participants were told that they would participate in a time perception experiment and provided written consent for their participation. All experimental protocols were approved by the Research Ethics Committee at the Federal University of ABC.

Stimuli, apparatus, and procedures

Group I. Participants were seated in an isolated laboratory room, 70 cm from a computer monitor. The following instructions were presented on screen, in Portuguese: “In this study you will be asked to indicate your subjective feeling of duration between today and many days in the future. Time intervals may vary between 3 and 36 months. Please read the instructions carefully and indicate your response.” (Footnote 1) After confirming that the participant had understood these instructions, the following text was displayed on the upper part of the screen: “Imagine the interval below. Move the bar to indicate how long you consider the duration between today and the given interval to be.” (Footnote 2) The time interval in months was presented below these instructions and was chosen pseudorandomly from the set {3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36}, with the restriction that, by the end of the session, the number of trials with each interval was the same. Below the numeric time interval, a 180-mm line (681 pixels) was presented with the labels “Very short” and “Very long” placed at the left and right line extremes, respectively. The initial position of the mouse cursor was always at the center of the line. Participants had to move the cursor to the right or left to arrive at the desired segment length and click the left mouse button to confirm their response. The maximum response window was 10 s, after which a new trial was initiated. Each of the 12 time intervals was presented five times, totaling 60 trials per session block. Four training trials with random selection of intervals were presented at the beginning of the task to familiarize participants with the procedure and were not included in the analyses.

Groups II, III, and IV. The experimental designs for these groups allowed us to evaluate how the reference labels at the extremes of the line affected performance. The “Very short” and “Very long” labels were replaced by “Today” and “36 months,” for Group II, and “Today” and “10 years,” for Group III. For the participants in Group IV, the lines had no labels.

Group V. The participants in Group V were explicitly asked to imagine how long each time interval lasted, with the following instructions presented on screen in Portuguese:

Please read the instructions carefully. In this experiment, you will be presented with time intervals. Imagine how long each interval lasts. The beginning and end will be signaled with a cross in the middle of the screen. Next, a line will be presented. The left extreme represents a short duration time interval. The right extreme represents a long duration time interval. Mark on the line, with a finger touch, the length that corresponds to your sensation in relation to the duration of the time interval. (Footnote 3)

A fixation cross then appeared for 2 s in the middle of the screen, to improve attentional focus and reduce trial-to-trial variability. Next, the time interval was presented at the center of the screen for 4 s, followed by another 2-s presentation of the fixation cross. Finally, the line appeared in a random location around the center of the screen (randomized coordinates, with a radius distance of 100 pixels from the center of the screen). The line position was randomized because participants might otherwise memorize the previous line and response positions and use them to guide subsequent responses, which would reduce the independence of the trials. For Group V, the lines were presented with no reference labels. Figure 1 illustrates the general procedure.

Fig. 1. Experimental procedures. (A) Instructions given to the participants in Group I. The same overall design was used for Groups II–IV, but with variations on the labels at the extreme ends of the line (see the text for details). (B) Sequence of events for Group V. The main differences from the procedure used in the other groups were the presentation of events on different screens, the presentation of a fixation point, and the presentation of the response line at a randomized location in each trial.

Analyses

All responses (line lengths) R_M were transformed into month units, that is, R_M/MU, where R_M is the individual average line length response generated for M months, and a month unit MU was defined as R_18/18. A reference of M = 18 was used because the coefficient of variation was relatively low for stimuli at and above 18 months. The data gathered were submitted to different types of analyses, to evaluate the different quantitative models at the level of averaged and individual data. Linear and power models were fit.
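The month-unit transformation can be sketched as follows. The two participants and their mean line lengths are hypothetical values chosen for illustration, and the function name `to_month_units` is an assumption; only the rule MU = R_18/18 comes from the text.

```python
def to_month_units(mean_lengths, reference=18):
    """Rescale mean line lengths so the response at the reference
    interval equals `reference` subjective months (MU = R_ref / ref)."""
    month_unit = mean_lengths[reference] / reference
    return {m: r / month_unit for m, r in mean_lengths.items()}

# Hypothetical mean responses in pixels, keyed by interval in months.
# The second participant uses half the line length of the first.
participant_a = {3: 40.0, 18: 180.0, 36: 400.0}
participant_b = {3: 20.0, 18: 90.0, 36: 200.0}

norm_a = to_month_units(participant_a)
norm_b = to_month_units(participant_b)
print(norm_a)
print(norm_b)
```

In this toy case the two participants differ only by a scale factor, so their normalized curves coincide; any difference in the shape of the function would survive the transformation.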

We compared the performance of a simple linear approach to one in line with nonlinear behavior using the nonnormalized data. Specifically, we used the power function R = αM^β, with R being the nonstandardized rating by the respondents, and M being the interval to be evaluated, expressed in months. A constraint of β = 1 is equivalent to a no-intercept linear model. Results with 0 < β < 1 indicate deceleration of the ratings with increasing time, which is typically reported in the literature based on aggregate data and is numerically similar to a Weber–Fechner psychophysical relationship. Results with β > 1 indicate acceleration as a function of time. Whereas the β metric quantifies the downward or upward curvature of the psychophysical metric function, the α metric is a simple scaling parameter that varies according to how participants use the rating scale. It is specifically related to the upper limit of the lengths used across the range of the independent variable. Estimates of α and β are bound to be inversely correlated across the range of the independent variable, but this is a numerical rather than a conceptual issue.
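One way to fit R = αM^β by least squares is sketched below. This is not the nlme procedure used in the study: it is a simplified profile least-squares approach, which coincides with maximum likelihood only under the assumption of independent, equal-variance normal errors. The function names, the search bounds for β, and the synthetic data are all assumptions made for illustration.

```python
import math

def best_alpha(ms, rs, beta):
    """Closed-form least-squares alpha for a fixed beta."""
    num = sum(r * m ** beta for m, r in zip(ms, rs))
    den = sum(m ** (2 * beta) for m in ms)
    return num / den

def rss(ms, rs, alpha, beta):
    """Residual sum of squares of R = alpha * M**beta."""
    return sum((r - alpha * m ** beta) ** 2 for m, r in zip(ms, rs))

def fit_power(ms, rs, lo=0.1, hi=3.0, tol=1e-8):
    """Golden-section search over beta, with alpha profiled out.
    Assumes the profiled cost has a single minimum in [lo, hi]."""
    def cost(beta):
        return rss(ms, rs, best_alpha(ms, rs, beta), beta)
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c, d = b - g * (b - a), a + g * (b - a)
        if cost(c) < cost(d):
            b = d   # minimum lies in [a, d]
        else:
            a = c   # minimum lies in [c, b]
    beta = (a + b) / 2
    return best_alpha(ms, rs, beta), beta

# Synthetic, noise-free ratings generated with alpha = 2, beta = 1.2.
months = list(range(3, 37, 3))
ratings = [2.0 * m ** 1.2 for m in months]
alpha_hat, beta_hat = fit_power(months, ratings)
print(alpha_hat, beta_hat)
```

Setting β = 1 in `best_alpha` gives the slope of the no-intercept linear model, which is the constrained case described above.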

The power function, as given in Stevens (1961), includes an additional parameter to capture an “offset” term, which he called the effective threshold. We assumed the offset to be zero in order to make comparisons with other published models (usually with two free parameters) easier. The offset is also more meaningful in the original context of sensory coding, where it indicates the effective sensory threshold; it is not clear what it would mean in symbolically represented input information, as is the case for the experiment reported here.

Comparisons of Bayesian information criteria (BICs) and log-likelihood ratio tests were used to evaluate the extents to which participants had a compressed subjective experience as a function of time (β < 1), a near-linear view of time (β ≈ 1), or an accelerating view of time (β > 1). Note that we do not advocate the hypothesis that people cluster in three distinct groups, but rather that this nonlinearity coefficient might have a continuous distribution across the population, rather than a fixed value of 1, possibly similar to a normal distribution. The BIC comparisons and log-likelihood tests merely allowed us to identify for which participants the hypothesis that responses were linear could be rejected with some amount of statistical confidence.
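The two comparison tools can be sketched for a single participant as follows, assuming least-squares fits with independent normal errors. The residual sums of squares are hypothetical numbers, and the parameter counts (which include the error variance) are one common convention rather than a detail taken from the study.

```python
import math

def gaussian_loglik(rss, n):
    """Maximized log-likelihood of a least-squares fit with
    i.i.d. normal errors (variance estimated as rss / n)."""
    return -0.5 * n * (math.log(2 * math.pi * rss / n) + 1)

def bic(rss, n, k):
    """BIC = k * ln(n) - 2 * logL; lower values indicate a better model."""
    return k * math.log(n) - 2 * gaussian_loglik(rss, n)

def lr_statistic(rss_restricted, rss_full, n):
    """Likelihood-ratio statistic for two nested Gaussian models."""
    return 2 * (gaussian_loglik(rss_full, n) - gaussian_loglik(rss_restricted, n))

n_trials = 60          # 12 intervals x 5 repetitions per participant
rss_linear = 1500.0    # hypothetical fit with beta fixed at 1
rss_power = 1350.0     # hypothetical fit with beta free

lr = lr_statistic(rss_linear, rss_power, n_trials)
bic_linear = bic(rss_linear, n_trials, k=2)  # alpha, error variance
bic_power = bic(rss_power, n_trials, k=3)    # alpha, beta, error variance

# With one extra parameter, lr is compared with the 5% chi-square
# critical value for 1 degree of freedom (about 3.84).
print(lr, lr > 3.84, bic_linear - bic_power)
```

For nested Gaussian models the statistic reduces to n·ln(RSS_restricted/RSS_full), so both criteria depend on the data only through the ratio of the two residual sums of squares.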

In all models, unless mentioned otherwise, a zero-value intercept constant was assumed. All analyses were programmed in the R programming environment with the nlme package, to obtain maximum-likelihood fits of the nonlinear models (Pinheiro et al., 2016).

Results

Linear model

Figure 2A shows the lengths (in pixels) of the generated line segments for each long-range interval, averaged across all trials for each of the seven participants in Group I. Since there was considerable variability across participants, responses were normalized through a transformation to month units, as described above. This normalization greatly reduced the variability across participants (Fig. 2B), which suggests that, although individuals differed in their criteria, their psychophysical functions for time representation were similar. A linear regression analysis showed that a linear function, fitted to the averaged normalized data, explained over 99% of the variance (Fig. 2C). Figure 3 shows that the normalization absorbed the differences between the other conditions and Group III, who had the instruction to consider a 10-year horizon, which prompted people to adopt shorter line lengths for corresponding intervals than in the other groups.

Fig. 2. Time interval estimations of Group I. (A) Average lengths of responses (line segments, in pixels) for the seven participants in Group I. (B) Normalized average responses for each participant. (C) Average responses across participants, with the best linear function superimposed (error bars represent SEMs).

Fig. 3. Comparison of the normalized time interval estimations from the different experimental groups (I to V). The data points are averages across all group participants.

The same analysis was carried out with the data from all five experimental groups. A comparison of the averaged responses across participants shows only minor differences between the groups (Fig. 3). In all cases, the fitted linear functions explained more than 98% of the variance in the data. Thus, at the level of aggregated data, a simple linear model provided excellent absolute fits and did not support the suggestion of subjective perceptual compression of calendar time.

Linear model versus nonlinear model

We first applied a standard simple linear regression and a power model fit to the aggregated data (across groups), obtained by averaging the ratings, in normalized subjective months, across volunteers per time interval. The linear regression with an intercept (intercept = –1.23 ± 0.266, slope = 1.10 ± 0.012) outperformed the power model with the same number of parameters (α = 0.823 ± 0.048, β = 1.07 ± 0.018) for the aggregated data, with BICs of 19.22 versus 20.22. Note that a lower BIC indicates a better model, and that a BIC difference of less than 2 is considered “not worth more than a bare mention” (Kass & Raftery, 1995). If participants, at a group level, were using a function of subjective time experience that was compressed, we would expect an estimate of β < 1. We can see that this is not the case: The better fit of the linear model and the estimate of β, which was larger than (and not significantly different from) unity, show no evidence of the compression of time.

The model applied to aggregated data masks important variability between individuals. A subgroup of the participants showed upwardly curving psychophysical functions, whereas others showed a downwardly curving trend in their ratings as a function of time interval, and yet other participants seemed to map months almost perfectly linearly onto the visual scale. The averaging of individual scores that themselves followed nonlinear power laws might produce a linear function at the group level, depending on the specific sample distributions of the α and β parameters. The effect of averaging across observers before fitting a model is often overlooked, but it can give misleading results (Estes, 1956; Gallistel, Fairhurst, & Balsam, 2004). It is, of course, not possible to obtain individual psychophysical functions in experimental designs that do not rely on repeated measurements, which is the general case of studies in long-range time representation (Han & Takahashi, 2012; Kim & Zauberman, 2009, 2013; Zauberman et al., 2009). To investigate the extent to which participants varied in their psychophysical functions, we used two complementary approaches. First, the power function was fit to the data from individual observers, and point estimates of α and β were obtained. Variation in α should be natural and is what is generally accounted for by standardization, but variation in β around the unity reference value should reveal whether observers had a compressed or an accelerated view of time at long time scales. In a second approach, variations in α and β values across volunteers were parametrically modeled and incorporated in a nonlinear mixed model.
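The averaging effect described above can be illustrated with a small simulation. The two exponents are hypothetical, and each individual curve is written in normalized form, 18·(M/18)^β, which assumes that the response at the 18-month reference lies exactly on that individual's power function.

```python
def normalized_power(m, beta, reference=18):
    """Individual power function rescaled so that the response at the
    reference interval equals the reference (in subjective months)."""
    return reference * (m / reference) ** beta

months = list(range(3, 37, 3))
betas = [0.8, 1.2]  # hypothetical: one compressing, one accelerating observer

# Group curve obtained by averaging across observers at each interval.
group_mean = [sum(normalized_power(m, b) for b in betas) / len(betas)
              for m in months]

# Largest relative departure from the identity line (perfect linearity).
group_dev = max(abs(g - m) / m for g, m in zip(group_mean, months))
indiv_dev = max(abs(normalized_power(m, b) - m) / m
                for b in betas for m in months)
print(round(group_dev, 3), round(indiv_dev, 3))
```

In this toy case each observer departs markedly from linearity while their average stays much closer to the identity line, which is why a near-linear group function by itself says little about the individual functions.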

The linear model had a higher log-likelihood for 14 out of the 35 participants, whereas a power model provided the best fit for 21 participants. Out of these 21, 12 participants yielded estimates of β > 1, consistent with acceleration in their respective psychophysical functions, whereas only nine yielded estimates of β < 1, the latter being in line with the compression-of-time hypothesis. The estimates of α and β were correlated (Fig. 4), as expected, but this fact does not bear on whether the participants were represented better by accelerating or decelerating psychophysical functions. Clearly, most β estimates were above 1.

Fig. 4. Point estimates of the α and β parameters in Stevens’s power law, ψ(t) = αt^β, for all volunteers in the five experimental groups. The dashed horizontal reference line represents linearity (power estimates above the line support an accelerating curvature, and values below the line support a decelerating curvature). The insets present results from the participants in experimental Group I (the arrows indicate their respective β and α function values).

Note that, because the transformation to subjective month units was linear (though with scale parameters for each individual), the model with the best fit for each participant was independent of whether the raw line lengths or the subjective month-unit scale was used. The values of the estimated slope parameters, represented by α in the power function, would be different, however. One notices, in Fig. 4, that Group III, in which participants were told to consider the end of the line as representing 10 years, shows lower estimates for α than do the other groups when the raw length scores are used. This is to be expected: If the line has to span 120 months rather than 36, the subjective months-unit references in Group III should have a length that is lower than the references in the other groups. In the next paragraph, we test this formally by using mixed models. The β parameter estimates, however, which respond to nonlinearity, are not affected by scaling. A histogram, Gaussian fit, and kernel density estimate are shown in Fig. 5. Out of the 35 participants, the null hypothesis β = 1 was rejected for 20 (F test, df = 1, p < .05). Although this drops to 13 when a Bonferroni–Holm correction is applied, it points clearly toward a wide distribution of this nonlinearity parameter across volunteers. Figure 5 shows that a normal distribution across the population is a useful approximation. Both Kolmogorov–Smirnov and Shapiro–Wilk tests of normality showed that the Gaussian distribution fit the distribution of βs excellently (ps > .50 for both tests).
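The Bonferroni–Holm step-down procedure mentioned above can be sketched as follows. The p values in the example are hypothetical and are not the values obtained in the study; the function name `holm_reject` is an assumption.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm's step-down procedure. Returns a list of booleans, True where
    the corresponding null hypothesis is rejected at family-wise level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p value is compared with alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p values fail as well
    return reject

# Hypothetical per-participant p values for the test of beta = 1.
p_values = [0.001, 0.04, 0.03, 0.012]
uncorrected = [p < 0.05 for p in p_values]
corrected = holm_reject(p_values)
print(sum(uncorrected), sum(corrected))
```

As in the reported analysis, the corrected procedure rejects fewer null hypotheses than the uncorrected tests, because each successive p value has to clear a threshold that starts at α/m and relaxes only gradually.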

Fig. 5. Density of β estimates across the sample. The solid lines form a traditional histogram with 0.1-width bins, the dashed line is the best-fitting normal distribution (mean 1.070, standard deviation 0.200), and the dotted line is the kernel density estimate (see the legend). In this density estimation method, the bandwidth is determined in a data-driven fashion, after Sheather and Jones (1991). In this case, the bandwidth value thus produced, and used in this graph, was 0.1059.

By applying a mixed nonlinear model, the observations made above were quantified in the form of the estimated population mean and standard deviation of the distribution of α and β values. Since the volunteers participated in groups with different scaling instructions, which particularly affected the α estimates for Group III (10-year horizon) when using line length as the dependent variable, we included Group as an additional fixed-effect factor. Specifically, indicator variables were used to identify which instruction manipulation each of the volunteers pertained to. Likelihood ratio tests for hierarchical models were used in order to identify whether the group variable affected α or β, for pixel lengths as well as for the normalized subjective month scale.

According to the Akaike information criterion (AIC) and likelihood ratio (LR) tests, for raw pixel length, group significantly affected α (LR 20.49, df = 4, p = .0004). To verify whether Group III alone was responsible for this significant likelihood ratio test, we constructed a model in which a single indicator variable was used to code whether participants did or did not belong to Group III. The results of these model fits showed that this was, indeed, the case: There was no significant decrease in fit from the substitution of all previous dummy indicator variables by this single one (LR 2.58, df = 3, p = .46), but including this single indicator variable yielded a significantly better fit, judging from both information criteria and the likelihood ratio test (LR 17.91, df = 1, p < .0001). This sequence indicates that Group III alone was responsible for the significant group effect on α when pixel line length was used.
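The p values attached to these likelihood ratios can be checked with a short sketch, assuming that each LR statistic is referred to a chi-square distribution with the stated degrees of freedom. The helper `chi2_sf` is a hypothetical name; it uses the closed-form series for integer degrees of freedom and only the Python standard library.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable
    with a positive integer number of degrees of freedom."""
    h = x / 2
    if df % 2 == 0:
        # Even df: exp(-h) * sum_{k < df/2} h**k / k!
        return math.exp(-h) * sum(h ** k / math.factorial(k)
                                  for k in range(df // 2))
    # Odd df: normal-tail term plus a finite gamma series.
    series = sum(h ** (k - 0.5) / math.gamma(k + 0.5)
                 for k in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(h)) + math.exp(-h) * series

# (LR statistic, degrees of freedom) pairs as reported in the text.
reported = [(20.49, 4), (2.58, 3), (17.91, 1), (4.05, 4)]
for lr, df in reported:
    print(lr, df, chi2_sf(lr, df))
```

The first three pairs are the tests on α reported above, and the last is the test of group effects on β reported in the following paragraph. Note also that 20.49 − 17.91 = 2.58, so the three statistics for α form a consistent nested sequence.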

In a similar fashion, the effect of group on β was tested. However, when we took the most parsimonious previous model, in which α was allowed to differ from the value for the other groups only for Group III, including estimates for β that varied across groups did not significantly improve the model (LR = 4.05, df = 4, p = .40). This shows that, although the different experimental instructions did change the ways that volunteers (specifically, the group that was requested to observe a 10-year maximum) used the visual scale, the psychophysical functions were neither more nor less linear in any group than in the others. The best model for nonnormalized responses had, as its population distribution estimates for α, a mean of 17.37 and a population standard deviation of 10.62; for Group III specifically, the mean was 11.13 pixels lower. The population mean estimate for β was 1.052, and its standard deviation was 0.146. The standard error of the mean β estimate was 0.028, and a 95% two-sided confidence interval still included 1, so in principle we cannot reject the hypothesis that the population average corresponds to linear treatment of the scale, although we did find a marginally significant p value when testing this (.064). Fixing β at either 1 (linearity) or its best estimate in the mixed model for all participants failed to capture the variability in the data completely. In likelihood ratio testing, there are some caveats for assigning p values to model comparisons when one of the parameters is at a boundary. However, with a likelihood ratio of 189.6 and considerable increases in the AIC and BIC due to the constraints, there is no doubt that it is inappropriate to consider β to be fixed in the population. This underscores the importance of understanding individual variation.
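The confidence interval and the marginal p value for the mean β can be approximated from the reported estimate and standard error. The sketch below uses a normal (Wald) approximation, which is an assumption; the original analysis may have used a t reference distribution, so small differences from the reported .064 are to be expected.

```python
import math

mean_beta = 1.052   # reported population mean of beta
se_beta = 0.028     # reported standard error of that mean
z_crit = 1.959964   # 97.5th percentile of the standard normal distribution

# 95% two-sided Wald confidence interval.
ci = (mean_beta - z_crit * se_beta, mean_beta + z_crit * se_beta)

# Two-sided test of the null hypothesis beta = 1.
z = (mean_beta - 1.0) / se_beta
p_two_sided = math.erfc(abs(z) / math.sqrt(2))

print(ci, z, p_two_sided)
```

Under this approximation the interval runs from just below 1 to about 1.11, and the p value comes out slightly above .06, in line with the reported result that linearity of the population average cannot be rejected at the 5% level.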

Although sex differences in time perception of short intervals have been reported in the literature (e.g., Glicksohn & Hadad, 2012), we did not find systematic differences due to sex in follow-up analyses. The mean β estimates for males and females, respectively, were 1.11 (SD = 0.237) and 1.04 (SD = 0.186). A t test for independent samples on the β estimates did not come close to reaching statistical significance (p > .30). Neither did age correlate significantly with β, with a Pearson correlation of .10 (p > .50 in a t test). Nonparametric equivalents, as well as an analysis of variance of age group and sex, including an interaction, did not reveal any straightforward relation between age, sex, and long-range time representations.
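The t statistic for the sex comparison can be reconstructed from the reported summary statistics. The sketch assumes that all 16 men and 19 women contributed a β estimate and that a pooled-variance (Student) t test was used; neither detail is stated explicitly in the text.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t statistic and degrees of freedom for two independent
    samples, computed from summary statistics with a pooled variance."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df

# Reported means and SDs of beta; group sizes taken from the Participants section.
t_stat, df = pooled_t(1.11, 0.237, 16, 1.04, 0.186, 19)
print(t_stat, df)
```

The resulting t is close to 1 on 33 degrees of freedom, far below the two-sided 5% critical value of roughly 2.03, which agrees with the reported absence of a sex difference.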

Discussion

In this study, the line paradigm was used to investigate the relationship between subjective and objective calendar time. A conventional experimental paradigm was used with variations on the reference labels, and a modified version was tested with participants who imagined the durations of different long-range intervals and responded on a line with no reference labels. The group-averaged data gathered from all experiments were described well by a linear function. These results are discrepant from the nonlinear (i.e., compressed) functions that have related subjective to objective calendar time reported in previous studies (Han & Takahashi, 2012; Kim & Zauberman, 2009, 2013; Zauberman et al., 2009). The source of the discrepancy is uncertain, but the experimental setting is not a plausible source, since the demonstrated effects were stable across different conditions and all participants were members of the same Western culture. The experimental designs and statistical analyses of earlier studies are more plausible sources. All previous studies measured the performance of many participants but ignored their individual differences, thus not allowing for estimation of the individual psychophysical functions. However, any claim made regarding subjective time scales necessitates the assessment of temporal judgments regarding multiple targets within an individual participant. In the present study, fewer participants were engaged in the experiments, but repeated measurements allowed for estimates of the individual subjective time representation functions. The analyses revealed significant differences across individuals, demonstrating the amount of information lost by averaging across participants.

Taken together, the best individual model fits and the mixed power model lead to a conclusion that is sobering and unavoidably unspecific: Some participants produced near-linear evaluations of time on time scales up to 3 years; others produced a compressed pattern, as is commonly reported in the literature; and yet others, possibly the largest group, produced ratings that, as compared to a linear model, accelerated as a function of time. The experimental design had sufficient statistical power to show that the data do not support a general compressed-time model. In fact, the best generalization for the aggregated data was a simple linear function. Interestingly, the reference labels at the extremes of the response line did not affect the subjective functions after normalization at the midpoint. This indicates that humans maintain relatively constant ratios between different subjective magnitudes of the time intervals in different conditions of line labeling.

The near-linearity of the subjective time scales for long intervals provides some insights regarding the representational basis of long time intervals. The mechanisms (e.g., integration) and processing dynamics that are widely implicated for timing short intervals are not directly applicable to intervals in the range of days, months, and years. However, magnitude-based representations of short intervals might still be used for assigning values to the semantic categories of calendar units. The kind of transformation that interfaces between these symbolic and magnitude-based representations appears to preserve proportionality (as unitless quantities) to a large extent. Similar arguments have previously been made regarding the nature of the mapping between numerals and magnitude-based representations of numerosities (e.g., Gallistel & Gelman, 1992), as well as for the mapping between the magnitude-based representations of time and numerosity (e.g., Balci & Gallistel, 2006). Thus, humans might be able to process and operate on long intervals by translating them into a functional representational space that originally constitutes the representational raw material of other quantities.

A theory-of-magnitude approach formulated by Walsh (2003), in fact, assumes that the representations of time, numerosity, and space are part of a generalized magnitude system with overlapping cortical substrates. This would provide similar metric properties for these representations, making it possible to use other dimensions in addition to short time intervals (such as spatial distances) as the metric basis for quantifying calendar units. According to the same rationale, it is also possible that similar results would be observed for long-scale distances (in addition to calendar times)—for instance, those presented in conventional units. It is important to note that one would not expect the individual differences observed in this study to emerge from personal prioritization of one dimension or another as the metric basis for calendar units, since from a formal theoretical perspective these dimensions would be expected to abide by the same representational transformations during both encoding and decoding.

In summary, the results of the present study shed new light on the psychophysical functions that govern long-range time representation. Instead of a highly compressed, biased time representation, the results suggest that, at least on average, people perceive time in a near-linear manner, with considerable individual differences toward slightly compressed or accelerating power functions. This weakens the basis for attributing deviations from exponential discount rates in intertemporal choice to a bias in subjective time (Lucci, 2013; Wittmann & Paulus, 2008), and calls for the reevaluation of results that have suggested otherwise. Given that this work emphasizes individual differences in psychophysical functions, future studies could focus on the reliability of β estimates by testing the same group of participants periodically over a long test period. Finally, it needs to be noted that the age range of the participants in the present study, 17–29, only spanned early adulthood, and the correlation analysis conducted might therefore not capture tendencies across a larger portion of the lifespan. Clearly, to understand the patterns of representational differences across the population, further studies are in order.