Three types of data were obtained from the three phases of the experiment, each with a different level of description of depth and each requiring somewhat different analysis techniques. Nevertheless, observers’ performance in the individual phases of the experiment could be compared directly, because all of the data referred to perceptual judgments of interobject distances in the same spatial configuration of stimuli. To assess the precision (Footnote 1) of the observers’ judgments measured at the different scales, it was necessary to transform the data collected using ordinal, ratio, and magnitude estimations into a common format. In principle, two methods of transformation were possible. First, the data could be transformed to the weakest shared level of measurement (in this case, an ordinal scale). Such a transformation would provide information about the observers’ relative distance judgments. Importantly, however, it would necessarily reduce the informational content of the data obtained at the higher levels of measurement in the ratio and absolute-magnitude distance tasks, because the judgments would be treated nonmetrically: only ordinal information about the compared distances would enter into the analysis, and the more specific information about the magnitude of the perceived differences would be discarded. The alternative method of transformation was multidimensional scaling, which yields metric information about the perceived spatial layout and its relationship to the actual layout. This transformation, however, might result in a loss of information about possible inconsistencies in the data, such as violations of minimality, symmetry, and the triangle inequality (Wagner, 2006), which would then be reflected only in Kruskal’s stress function.
Ordinal scale of measurement
For the purpose of the data analysis, the distance judgments obtained in the three phases of the experiment were converted to pairwise comparisons. In the ordinal distance task, all pairs of responses from each row (i.e., from each rank ordering of the ten distances from a given reference) were extracted and analyzed separately. The position on the response sheet determined which of the two posts was perceived as being closer to the given reference and which as being farther; that is, which of the two distances was perceived as being shorter. In the ratio distance task, the perceived ratios of the particular interobject distances relative to the unit distance allowed us to infer which of any two distances from the given reference was perceived as being shorter, but not by how much. In the absolute-magnitude distance task, pairs of absolute-magnitude distance judgments were compared to determine which of the two distances was perceived as being shorter, again without regard to the magnitude of the difference. Using these procedures, 495 pairwise comparisons were obtained over all phases of the experiment and scales (Footnote 2). Each pairwise comparison was then coded 0 if the response was correct and 1 if it was incorrect.
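The coding scheme above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors’ analysis code: the function name `score_pairwise`, the data layout (one list of values per reference post, with a smaller value meaning "perceived as closer" in all three tasks), skipping physically equal pairs, and counting judged ties as errors are all hypothetical choices. Assuming 11 reference rows of ten distances each (an inference from 495 = 11 × 45, since each row of ten distances yields C(10, 2) = 45 pairs), the sketch produces 495 comparisons.

```python
from itertools import combinations

def score_pairwise(true_dists, judged):
    """Return one 0/1 error code per pairwise comparison (0 = correct, 1 = error).

    true_dists, judged: dicts mapping a reference post to a list of values,
    one per target post, in the same target order. For the ordinal task the
    judged value is the rank position (1 = closest); for the ratio and
    absolute-magnitude tasks it is the judged ratio or magnitude. In every
    case a smaller value means "perceived as closer" (hypothetical layout).
    """
    codes = []
    for ref, actual in true_dists.items():
        perceived = judged[ref]
        for i, j in combinations(range(len(actual)), 2):
            if actual[i] == actual[j]:
                continue  # hypothetical rule: physically equal pairs are skipped
            actual_i_shorter = actual[i] < actual[j]
            perceived_i_shorter = perceived[i] < perceived[j]
            tie = perceived[i] == perceived[j]  # hypothetical: a judged tie is an error
            codes.append(0 if (not tie and perceived_i_shorter == actual_i_shorter) else 1)
    return codes

# One reference, three targets: the ranks swap the two farthest targets
print(score_pairwise({1: [1.0, 2.0, 3.0]}, {1: [1, 3, 2]}))  # [0, 0, 1]
```

Because every task is reduced to "which of the two is shorter", the same function serves ranks, ratios, and magnitudes; only the sign of each difference is used, which is exactly the information loss discussed at the start of this section.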
We expected most pairwise comparisons to be correct, because for the majority of the interpost distances the differences were clearly visible (the distance ratios ranged from 1.00 to 5.47, with a median of 1.53). A higher frequency of erroneous responses would therefore indicate a higher level of difficulty of the experimental task. In the statistical analysis, the mean error rate and the error variance were calculated. The data for the small-scale and large-scale configurations obtained in each experimental session were analyzed separately. Table 1 shows the mean error rates and the mean error variability of the responses for the small-scale and large-scale stimulus configurations and for the ordinal, ratio, and magnitude judgments. The precision and consistency of the relative distance judgments clearly decreased as the measurement level of the response increased.
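The two summary statistics can be sketched as follows, assuming the errors are coded 0/1 as described and that "error variance" means the population variance of those codes (an assumption; the sample variance would divide by n − 1 instead). The helper `distance_ratios` is hypothetical and shows how a larger-to-smaller distance ratio, such as those in the reported range of 1.00 to 5.47, is computed for every pair of distances from one reference.

```python
from itertools import combinations
from statistics import mean, median, pvariance

def error_summary(codes):
    """Mean error rate and population variance of 0/1 error codes."""
    return mean(codes), pvariance(codes)

def distance_ratios(distances):
    """Larger-to-smaller ratio for every pair of distances from one reference."""
    return [max(a, b) / min(a, b) for a, b in combinations(distances, 2)]

rate, var = error_summary([0, 0, 1, 0])
print(rate, var)                                  # 0.25 0.1875
print(median(distance_ratios([1.0, 2.0, 4.0])))   # 2.0
```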
Table 1 Comparison of the results from the three phases of the experiment evaluated using an ordinal scale
A two-way analysis of variance (ANOVA) performed on the error rates showed that the precision of the relative distance judgments differed significantly between measurement levels, F(2, 30) = 7.23, p = .03, ηₚ² = .03, with higher precision in the ordinal distance judgments. Specifically, precision differed significantly between the ordinal and absolute-magnitude estimations (Bonferroni, p < .01). The precision of the observers’ judgments also differed across scales, F(1, 30) = 32.90, p < .001, ηₚ² = .06. The Level × Scale interaction did not attain significance, F(2, 30) = 0.23, p = .79, ηₚ² = .001.
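For readers who want to see how F ratios and partial eta-squared values of this kind are obtained, the following is a minimal sketch of a balanced two-way between-subjects ANOVA, with ηₚ² = SS_effect / (SS_effect + SS_error). It is an illustration, not the authors’ analysis: the function name and data layout are hypothetical, and treating Level and Scale as between-subjects factors with equal cell sizes is an assumption. (The reported error df of 30 would be consistent with a 3 × 2 design with six values per cell, since 3 × 2 × (6 − 1) = 30, but the source does not state the design.)

```python
def two_way_anova(data):
    """Balanced two-way between-subjects ANOVA with partial eta squared.

    data: dict mapping (level, scale) -> list of observations; every cell
    must be present and contain the same number n of observations.
    """
    levels = sorted({key[0] for key in data})
    scales = sorted({key[1] for key in data})
    n = len(next(iter(data.values())))
    if any(len(v) != n for v in data.values()):
        raise ValueError("design must be balanced")
    a, b = len(levels), len(scales)
    values = [x for v in data.values() for x in v]
    grand = sum(values) / len(values)
    cell_mean = {k: sum(v) / n for k, v in data.items()}
    level_mean = {i: sum(cell_mean[(i, j)] for j in scales) / b for i in levels}
    scale_mean = {j: sum(cell_mean[(i, j)] for i in levels) / a for j in scales}

    # Sums of squares for main effects, interaction, and within-cell error
    ss_level = b * n * sum((m - grand) ** 2 for m in level_mean.values())
    ss_scale = a * n * sum((m - grand) ** 2 for m in scale_mean.values())
    ss_cells = n * sum((m - grand) ** 2 for m in cell_mean.values())
    ss_inter = ss_cells - ss_level - ss_scale
    ss_error = sum((x - cell_mean[k]) ** 2 for k, v in data.items() for x in v)
    df_error = a * b * (n - 1)
    ms_error = ss_error / df_error

    table = {}
    effects = [("Level", ss_level, a - 1), ("Scale", ss_scale, b - 1),
               ("Level x Scale", ss_inter, (a - 1) * (b - 1))]
    for name, ss, df in effects:
        table[name] = {"SS": ss, "df": df, "F": (ss / df) / ms_error,
                       "partial_eta_sq": ss / (ss + ss_error)}
    table["Error"] = {"SS": ss_error, "df": df_error}
    return table

# Hypothetical data: 3 measurement levels x 2 scales x 6 values per cell,
# built additively so that the interaction sum of squares is zero
noise = [-1.0, 1.0, -2.0, 2.0, -0.5, 0.5]
data = {(lvl, sc): [lvl_eff + sc_eff + e for e in noise]
        for lvl, lvl_eff in enumerate([0.0, 1.0, 3.0])
        for sc, sc_eff in enumerate([0.0, 2.0])}
for effect, row in two_way_anova(data).items():
    print(effect, row)
```

P values would additionally require the F distribution (for example, from SciPy), which is omitted here to keep the sketch dependency-free.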
A two-way ANOVA performed on the error variances showed that the relative distance judgments again differed significantly between measurement levels, F(2, 2964) = 9.57, p < .001, ηₚ² = .027, with greater consistency in the ordinal distance judgments. Specifically, the variances differed significantly between the ordinal and absolute-magnitude estimations (Bonferroni, p < .001) and between the ratio and absolute-magnitude estimations (Bonferroni, p = .04). Again, the consistency of the observers’ judgments differed across scales, F(1, 2964) = 7.69, p < .01, ηₚ² = .002, and the Level × Scale interaction was nonsignificant, F(2, 2964) = 0.55, p = .58, ηₚ² = .001.
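The Bonferroni post hoc comparisons reported in this section correct for the three pairwise contrasts among the measurement levels (ordinal vs. ratio, ordinal vs. magnitude, ratio vs. magnitude). A minimal sketch of the adjustment itself, assuming the raw p values come from some pairwise test that is not specified here; the p values in the usage lines are hypothetical.

```python
from itertools import combinations

LEVELS = ["ordinal", "ratio", "magnitude"]

def bonferroni(p_values):
    """Multiply each raw p value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

pairs = list(combinations(LEVELS, 2))   # the three pairwise contrasts
raw_p = [0.20, 0.004, 0.015]            # hypothetical raw p values
for pair, p_adj in zip(pairs, bonferroni(raw_p)):
    print(pair, round(p_adj, 3))
```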
We next examined how the ordinal, ratio, and magnitude distance judgments differed when only pairwise comparisons between intervals oriented in roughly perpendicular directions were considered. It has commonly been observed that the in-depth dimension is perceived differently from the frontal dimension; specifically, compared with physical space, visual space is perceptually contracted in the in-depth dimension relative to the frontal dimension (Foley et al., 2004; Kudoh, 2005; Levin & Haber, 1993; Loomis et al., 1992; Loomis & Philbeck, 1999; Norman et al., 1996; Toye, 1986; Wagner, 1985). Consequently, in-depth distances must be made larger in order to be perceived as equal to frontal distances. We asked whether visual space shows the same degree of anisotropy regardless of the computational demands of the different experimental tasks. To evaluate the observers’ relative distance judgments of stimuli in different orientations, the interpost intervals were divided into two categories according to whether their orientation was more frontal or more radial. An interval was classified as radial if it lay within 45° of the line of sight and as frontal if it lay within 45° of the perpendicular to the line of sight, where the line of sight ran from the observer’s position to the midpoint of the interval. Only pairs in which one distance was radial and the other frontal were included in the statistical analysis. For instance, the pair comprising the frontal distance [1, 5] and the radial distance [1, 7] met this criterion (see Fig. 2), whereas pairs comprising two radial or two frontal distances were excluded. Of the 495 pairwise comparisons available, 260 satisfied this criterion.
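The orientation rule and the pair-selection criterion can be sketched as follows. This is a minimal illustration assuming 2-D ground-plane coordinates with the observer at the origin; the function names, the coordinates, and the handling of an interval lying at exactly 45° are hypothetical, and the post labels merely echo the example above rather than the actual layout in Fig. 2.

```python
import math
from itertools import combinations

def orientation(p, q, observer=(0.0, 0.0)):
    """Classify the interval p-q as 'radial' or 'frontal'.

    The angle is taken between the interval and the line of sight from the
    observer to the interval's midpoint, folded into the range 0-90 degrees.
    """
    mx, my = (p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0
    sight = (mx - observer[0], my - observer[1])
    interval = (q[0] - p[0], q[1] - p[1])
    dot = abs(sight[0] * interval[0] + sight[1] * interval[1])
    norm = math.hypot(*sight) * math.hypot(*interval)
    angle = math.degrees(math.acos(min(1.0, dot / norm)))
    # Hypothetical tie rule: an interval at exactly 45 degrees counts as frontal
    return "radial" if angle < 45.0 else "frontal"

def mixed_pairs(posts, ref, observer=(0.0, 0.0)):
    """Pairs of intervals from `ref` in which one is radial and one frontal."""
    targets = [t for t in posts if t != ref]
    kind = {t: orientation(posts[ref], posts[t], observer) for t in targets}
    return [(s, t) for s, t in combinations(targets, 2) if kind[s] != kind[t]]

# Hypothetical ground-plane coordinates in meters (x lateral, y in depth)
posts = {1: (0.0, 5.0), 5: (2.0, 5.0), 7: (0.0, 8.0)}
print(orientation(posts[1], posts[5]))  # frontal
print(orientation(posts[1], posts[7]))  # radial
print(mixed_pairs(posts, 1))            # [(5, 7)]
```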
The data were fit with a logistic psychometric function with two parameters: threshold and slope. The threshold was defined as the ratio of the radial to the frontal distance required to achieve 50 % correct discriminations. For instance, a threshold of 1.080 means that, for a given frontal distance, the radial distance would need to be at least 1.08 times as large to be perceived as the larger interval. This parameter therefore reflects the systematic error. The slope reflects the consistency of the judgments: a steeper slope indicates better discrimination ability.
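A psychometric function of this kind can be fit by maximum likelihood. The sketch below assumes the parameterization P(x) = 1 / (1 + exp(−(b0 + b1·x))), where x is the radial-to-frontal distance ratio and P is the proportion of trials on which the radial interval was judged larger; the threshold is then the 50 % point, −b0/b1, and b1 serves as the slope parameter. The study’s exact parameterization and fitting software are not stated here, so the function name, the Newton-Raphson fitting loop, and the synthetic data are assumptions. The last line converts a threshold ratio into the percentage form used in the next paragraph (1.080 corresponds to 8.0 %).

```python
import math

def fit_logistic(x, k, n, iterations=50):
    """Maximum-likelihood fit of P(x) = 1 / (1 + exp(-(b0 + b1 * x))).

    x: stimulus values (radial/frontal ratios); k: number of "radial larger"
    responses at each x; n: number of trials at each x.
    Returns (threshold, slope) with threshold = -b0 / b1 (the 50 % point).
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, ki, ni in zip(x, k, n):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            r = ki - ni * p            # contribution to the score (gradient)
            w = ni * p * (1.0 - p)     # binomial weight
            g0 += r
            g1 += r * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det   # Newton step: (X'WX)^-1 times score
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-12:
            break
    return -b0 / b1, b1

# Synthetic data generated from a known function (threshold 1.08, b1 = 10)
x = [0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4]
n = [1000] * len(x)
k = [round(ni / (1.0 + math.exp(-(10.0 * xi - 10.8)))) for xi, ni in zip(x, n)]
threshold, slope = fit_logistic(x, k, n)
print(f"threshold = {threshold:.3f} ({(threshold - 1) * 100:.1f} %), slope = {slope:.2f}")
```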
For the small-scale configuration, the threshold values for detecting radial distances as larger than frontal distances were 8.0 %, 9.8 %, and 10.6 % for the ordinal, ratio, and magnitude tasks, respectively. For the large-scale configuration, the thresholds for the same tasks were 20.4 %, 25.8 %, and 23.9 %, respectively. That is, although the precision of the observers’ relative distance judgments was generally higher in the ordinal task, considerable compression of the in-depth dimension relative to the frontoparallel dimension remained. This compression was similar to the distortion reported by Loomis et al. (2002) and, interestingly, much smaller than the distortions reported by Wagner (1985), Loomis et al. (1992), Norman et al. (1996), Loomis and Philbeck (1999), and Kudoh (2005). This difference is not very surprising, however, given that both the in-depth and the frontal intervals encompassed a wide range of stimulus orientations. Likewise, the greater compression for the large-scale stimulus configuration is unsurprising when one considers the retinal sizes of the intervals compared at the two scales (see the Discussion and Conclusions section). Our findings are consistent with those reported in previous studies (Baird & Biersdorf, 1967; Loomis et al., 1992; Loomis & Philbeck, 1999; Loomis et al., 2002; Šikl & Šimeček, 2011), which showed that the compression of the radial dimension grows with increasing distance from the observer. A two-way ANOVA performed on the thresholds of the psychometric functions showed no significant difference between measurement levels, F(2, 30) = 0.75, p = .48, ηₚ² = .001, and Bonferroni post hoc tests revealed no significant pairwise differences. In contrast, the effect of scale was significant, F(1, 30) = 29.58, p < .001, ηₚ² = .004, and this difference was apparent in all three tasks. No significant interaction was observed, F(2, 30) = 0.18, p = .84, ηₚ² = .001.
The slope of the psychometric function (which reflects variability between observers) was steeper (i.e., judgments were less variable) for the ordinal estimations than for either the ratio or the absolute-magnitude estimations (see Table 2). A two-way ANOVA performed on the slope values of the fitted psychometric functions revealed a significant effect of measurement level, F(2, 30) = 8.07, p < .01, ηₚ² = .037. Specifically, the slopes for the ordinal estimations differed significantly from those for the absolute-magnitude (Bonferroni, p = .001) and ratio (Bonferroni, p = .03) estimations. The difference between the small- and large-scale data was also significant, F(1, 30) = 26.35, p < .001, ηₚ² = .06, and was more pronounced for the ordinal task. The interaction between measurement level and scale was not significant, F(2, 30) = 2.33, p = .12, ηₚ² = .01.
Table 2 Threshold and slope of the psychometric functions of the ordinal, ratio, and magnitude responses
Metric scale of measurement
It is not surprising that the data obtained in the ratio and absolute-magnitude distance judgment phases could be transformed to the ordinal level of measurement. However, the inverse transformation was also possible: multidimensional scaling (MDS) enables ordinal data to be transformed to the interval level of measurement (Young & Null, 1978). In principle, the same procedure was used in all three phases of the experiment: Using the raw data in the form of orders, ratios, and absolute magnitudes of the interpost distances, the perceived positions of the posts in each spatial configuration were calculated and compared with the actual layout within which the observers made their assessments (Footnote 3). That is, the untransformed data were used in the analysis.
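The general idea of recovering point positions from interpoint distances can be illustrated with classical (Torgerson) metric MDS. This is a different and much simpler technique than the algorithm used in the study, which handled row-conditional, asymmetric, nonmetric data; classical MDS instead requires a complete, symmetric matrix of metric distances. The sketch assumes NumPy is available, and the function name and layout are hypothetical.

```python
import numpy as np

def classical_mds(D, dims=2):
    """Classical (Torgerson) metric MDS from a symmetric distance matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                   # double-centered squared distances
    vals, vecs = np.linalg.eigh(B)                # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:dims]         # indices of the largest `dims`
    scale = np.sqrt(np.clip(vals[order], 0.0, None))
    return vecs[:, order] * scale                 # n x dims coordinates

# Hypothetical post layout (meters); recover it from its interpoint distances
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0]])
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
X = classical_mds(D)
D_hat = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
print(np.allclose(D, D_hat))  # True
```

The recovered configuration matches the original only up to translation, rotation, and reflection, which is why a Procrustes step is needed before an MDS solution can be compared with the physical layout.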
The data were analyzed as row-conditional, square-asymmetric matrices using the ALSCAL algorithm (Young & Harris, 1990). Procrustes analysis (including translation, rotation, and scaling) was then used to match the configurations obtained using MDS to a target configuration. The magnitude of the distortion of visual space relative to physical space in each task was quantified by the sum of squared errors (SSE), that is, the sum of the squared deviations of the individual data points from their correct physical locations. The units are given in meters. In addition, Kruskal’s stress formula I was calculated as a measure of the goodness of fit between the MDS configuration and the observers’ judgments.
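For two-dimensional layouts, the Procrustes step and the SSE can be sketched compactly by treating each point as a complex number; the least-squares similarity transform (translation, rotation, uniform scaling) then has a closed form. This is a minimal sketch assuming 2-D ground-plane coordinates in meters and no reflection; the function name and the example layout are hypothetical, and the software used in the study may differ in details such as whether reflections are allowed.

```python
import cmath

def procrustes_sse(perceived, target):
    """Similarity Procrustes (translation, rotation, uniform scaling) in 2-D.

    Maps `perceived` onto `target` by least squares and returns the fitted
    points and the sum of squared errors (SSE) around the target locations.
    """
    P = [complex(x, y) for x, y in perceived]
    T = [complex(x, y) for x, y in target]
    pc = sum(P) / len(P)                     # centroids
    tc = sum(T) / len(T)
    P0 = [p - pc for p in P]
    T0 = [t - tc for t in T]
    # Least-squares complex factor c = s * exp(i * theta): scaling plus rotation
    c = sum(p.conjugate() * t for p, t in zip(P0, T0)) / sum(abs(p) ** 2 for p in P0)
    fitted = [c * p + tc for p in P0]
    sse = sum(abs(f - t) ** 2 for f, t in zip(fitted, T))
    return [(f.real, f.imag) for f in fitted], sse

target = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
# Hypothetical "perceived" layout: the target rotated by 30 degrees, halved, shifted
w = 0.5 * cmath.exp(1j * cmath.pi / 6)
perceived = [(z.real, z.imag) for z in (w * complex(x, y) + complex(2, -1) for x, y in target)]
fitted, sse = procrustes_sse(perceived, target)
print(round(sse, 12))
```

Because the hypothetical perceived layout differs from the target only by a similarity transform, its SSE is essentially zero; any residual SSE in real data reflects distortions that translation, rotation, and scaling cannot remove.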
As can be seen in Fig. 3 and Table 3, the precision with which the perceived locations of the objects in the scene matched their true locations decreased as the measurement level increased. On average, for the small-scale configuration, the SSE values were 1.55, 2.53, and 3.28 for the ordinal, ratio, and magnitude tasks, respectively. For the large-scale configuration, the SSEs for the same tasks were 4.56, 8.83, and 11.86, respectively. These values correspond to mean absolute errors of approximately 4, 5, and 6 cm for the small-scale configuration and approximately 19, 27, and 30 cm for the large-scale configuration. Comparing these errors with the range of interpost distances presented in the experiment (0.5 to 3.0 m in the small-scale and 1.4 to 8.9 m in the large-scale configuration) indicates that the observers were generally precise in all types of judgments, particularly for the small-scale stimuli (cf. Levin & Haber, 1993; Toye, 1986). A two-way ANOVA performed on the SSEs yielded a significant effect of measurement level, F(2, 30) = 4.71, p = .02, ηₚ² = .063. Specifically, the ordinal and absolute-magnitude estimations differed significantly (Bonferroni, p = .01), whereas the depth structures recovered from the ordinal and ratio estimations, and from the ratio and absolute-magnitude estimations, were similar. The difference between the small- and large-scale data was significant, F(1, 30) = 24.44, p < .001, ηₚ² = .164, whereas the Level × Scale interaction was nonsignificant, F(2, 30) = 1.80, p = .18, ηₚ² = .02.
Table 3 Multidimensional-scaling stress (Kruskal I) and the root-mean squared error (RMSE)
The same pattern was also seen in the consistency of the responses. On average, for the small-scale configuration, the Kruskal’s stress values were 0.01, 0.02, and 0.05 for the ordinal, ratio, and magnitude tasks, respectively. For the large-scale configuration, the values for the same tasks were 0.01, 0.02, and 0.06, respectively. A two-way ANOVA performed on the Kruskal’s stress values revealed a significant effect of measurement level, F(2, 30) = 29.08, p < .001, ηₚ² = .243. Specifically, the absolute-magnitude estimations differed significantly from both the ordinal (Bonferroni, p < .001) and the ratio (Bonferroni, p < .001) estimations, indicating less consistency in the observers’ judgments in the absolute-magnitude task. Even in this case, however, the values were below 0.1, which is conventionally regarded as an acceptable representation with little risk of misinterpreting the data. No significant main effect of scale emerged, F(1, 30) = 2.84, p = .10, ηₚ² = .012, and there was no significant interaction, F(2, 30) = 0.62, p = .55, ηₚ² = .005.
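Kruskal’s stress formula I compares the distances d_ij in the MDS configuration with the disparities d̂_ij (the monotonically transformed data): Stress-1 = sqrt(Σ(d_ij − d̂_ij)² / Σ d_ij²). A minimal sketch, assuming the two lists are already matched pair by pair; the function name and example values are hypothetical.

```python
import math

def kruskal_stress1(distances, disparities):
    """Kruskal's stress formula I for matched configuration distances and disparities."""
    num = sum((d - dh) ** 2 for d, dh in zip(distances, disparities))
    den = sum(d ** 2 for d in distances)
    return math.sqrt(num / den)

print(round(kruskal_stress1([1.0, 2.0, 3.0], [1.0, 2.0, 2.0]), 4))  # 0.2673
```

A value of 0 means the configuration distances reproduce the disparities exactly; the values reported above (0.01 to 0.06) are all well under the 0.1 benchmark.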