Observation oriented modeling revised from a statistical point of view

Observation Oriented Modeling was proposed to overcome some of the problems in the application of statistical inference methods in the behavioral sciences. In this paper, we refine one part of this approach and show how it is connected to methods that are well known in statistical learning. Specifically, we argue that the Moore-Penrose pseudo inverse is superior to the initial solution from a statistical point of view. With this we also show that Observation Oriented Modeling can indeed be appropriate for some tasks in the analysis of observed data. To provide a practical example, we demonstrate the revised method by analyzing the effect of mindfulness training on attentional processes.

The aim of the present paper is to further refine OOM and to present a version that is based on well-known classification methods. Specifically, we present a matrix algebra computation approach that differs from the one proposed by Grice, and we add an accuracy gain coefficient that is novel to OOM. The paper is structured as follows: First, we describe some problems with current inference practices and explain why OOM may be able to alleviate some of them. Next, we describe the original OOM and our revised version. The final part of the manuscript reports a practical application of OOM in some detail as a reanalysis of published data.

Criticisms of NHST
At least in applied research settings, the dominant paradigm for statistical inference is null hypothesis significance testing (NHST), which in research practice is often understood as a "hybrid" version (Gigerenzer, 2004, p. 591) of the approaches by Fisher (1935) and Neyman and Pearson (1966). Despite its popularity, not only consumers of statistical methods but also statisticians have lamented the pitfalls of the concept (Cohen, 1990; Hubbard & Lindsay, 2008; Meehl, 1978). A thorough yet accessible criticism was brought forward by Lambdin (2012).
The criticism of NHST is twofold. One criticism is that, when applying NHST, it is easy to misinterpret the p-value. Another criticism is that the p-value is ill-suited for science, particularly because it does not address the right questions or is based on unrealistic assumptions (Halsey, Curran-Everett, Vowler, & Drummond, 2015; Wagenmakers, 2007). The American Statistical Association published a statement expressing the view that the p-value as a single number is often "misused" but "can be a useful statistical measure" (Wasserstein and Lazar, 2016, p. 131).
There might be some basis for the criticism that p-values are frequently misunderstood, at least by less statistically trained researchers. What might cause the confusion is that the p-value is based on the tail probability of $p(D \mid H_0)$, with $D$, the data, summarized in an appropriate (i.e., sufficient) test statistic, and with $H_0$, the null hypothesis; it is not, as is often mistakenly assumed, $p(H_0 \mid D)$. Gigerenzer (2004), for instance, presented empirical evidence that many research psychologists (including those teaching statistics) fail to interpret the p-value correctly.
The second criticism, that the p-value does not adequately address the more important questions of science, can be approached from several directions. Bayesians claim that scientists are interested in $p(H_0 \mid D)$ instead of $p(D \mid H_0)$. Further, Bayesians claim that only Bayesian inference can estimate $p(H_0 \mid D)$ by including "subjective" prior beliefs, as opposed to Frequentists, who insist on considering only actual data for inference (Edwards, Lindman, & Savage, 1963).
Furthermore, there are also some doubts about whether the assumptions on which an inference is based (i.e., $p(D \mid H_0)$) hold in typical research scenarios. These assumptions often include the normal distribution or at least uni-modality, and, in bivariate cases, linear or monotone relations, respectively. Quite often it is unclear what the consequences of violations of the assumptions are. An additional problem needs to be considered in some research branches, including most of psychology: Are psychological (i.e., "latent") attributes additive in structure, that is, are they quantitative (Stevens, Volkmann, & Newman, 1937)? Some have expressed serious doubts about whether psychological attributes possess an additive structure (Michell, 1997, 2000). It is also well known that even when results achieve statistical significance, real effects are often small. As a result, overconfidence may occur. The p-value does not address relevance: It does not say how big a significant difference is, which is the problem of "sizeless science" (Ziliak and McCloskey, 2008).
Apart from Bayesian inference, the remedies that have been proposed to address the perceived problems with the p-value within the Frequentist framework include confidence intervals and effect sizes, to name a couple of concepts (Thompson, 1999; Wagenmakers, 2007). Nevertheless, both Frequentist and Bayesian approaches may struggle if the aforementioned assumptions are not met.

Psychological theories are about organisms, not parameters
Many theories in the behavioral sciences are not about means or parameters but about individual organisms. For example, the so-called bystander effect in social psychology holds that if an individual is not alone, his or her willingness to help in emergencies will decrease. In other words, victims are more likely to receive assistance when only a single individual witnesses the emergency (Latané & Nida, 1981). Although not uncontested, social inhibition in helping as a stable phenomenon has become common textbook knowledge (Aronson, Wilson, & Sommers, 2015). This theory states that an individual will decrease his or her willingness to help others in certain situations; the theory does not make any assertions about population means. Thus, inferences about population means might not be suitable for such theories, although they are frequently computed (Levine, Cassidy, Brazier, & Reicher, 2002). In order to evaluate such a theory, one should rather be interested in knowing for how many individuals the model holds than in knowing how likely a mean is under the assumption of randomness.

Observation oriented modeling
Observation Oriented Modeling (OOM) includes a data analytic approach that is aimed at overcoming the aforementioned challenges (Grice, 2011; Grice, Barrett, Schlimgen, & Abramson, 2012). OOM is an exploratory analysis of model-data fit. The most prominent feature of OOM is that it focuses only on counts of observations (e.g., individuals) and not on parameters. Analytically, OOM is based on the cross-tabulation of data.
Epistemologically, OOM is founded on moderate realism, which goes back to Aristotle and Thomas Aquinas (Grice, 2011). In contrast to other concepts of causality (e.g., Pearl, 2009), causality is analyzed by checking for the number of observations for which the effect conforms to the cause (i.e., counting and cross-tabulation). The observed pattern should therefore closely match the hypothesized pattern, similar to Ordinal Pattern Analysis (Thorngate & Ma, 2016). It should be noted that theory-driven modeling is important, as missing factors can lead to biased results, such as in Simpson's paradox (Simpson, 1951). In sum, for some scientific questions, OOM may present an approach to circumvent some of the problems attributed to the common inference tools discussed above. It has been applied and extended in different fields of the behavioral and social sciences (Craig, Varnon, Sokolowski, Wells, & Abramson, 2014; Dinges et al., 2013; Grice, 2015; Grice, Cohn, Ramsey, & Chaney, 2015a; Grice, Craig, & Abramson, 2015b; Grice, Yepez, Wilson, & Shoda, 2016; Valentine & Buchanan, 2013).

Non-technical overview of OOM
In non-technical terms, OOM is based on matrix algebra approaches that can quantify the degree to which data fit a model by cross-tabulation. Basically, OOM provides a descriptive and visual data summary plus a measure of association based on the cross-tabulation and classification of data. In OOM, the generalization of results (i.e., inference to out-of-sample data) relies on replication only. One important method in OOM, primarily exploratory in nature, is called "Binary Procrustes Rotation" (Grice, 2011). A refinement of this method is the focus of the present work. In this context, the goal of the proposed transformation is to gauge the similarity of two sets of data. These two sets are the observed behavior pattern $y_i$ and the predicted behavior $\hat{y}_i$. As a data set can be seen as a collection of vectors (i.e., a matrix), we can project one of the data sets onto the other. To find the closest match between $y_i$ and $\hat{y}_i$, one has to identify the optimal projection. The degree of proximity is taken as an indicator of model fit.
OOM converts all values to indicator (0/1) values. For example, the variable gender (with the two distinct values female and male) would be converted to two binary variables female and male. Thus, the proximity simplifies to the (relative) number of matches (i.e., the number of "conformed" observations). Grice (2011) referred to matrices of this binary indicator form as "Deep Structure" matrices.
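The conversion to indicator values described above can be sketched in a few lines of Python. This is a minimal illustration and not the OOM software's implementation; the function name `deep_structure` and the toy `gender` data are assumptions made for this example.

```python
def deep_structure(values):
    """Return (levels, matrix) with one 0/1 indicator column per distinct value."""
    levels = sorted(set(values))
    matrix = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, matrix


# Hypothetical toy data: four observations of the variable gender.
gender = ["female", "male", "male", "female"]
levels, X = deep_structure(gender)

for value, row in zip(gender, X):
    print(value, row)
```

Each row contains exactly one 1, so comparing two observations reduces to checking whether their rows match.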
When the measurement scale of the events is continuous or the event space is (very) large, some kind of discretization or binning may be needed (Dougherty, Kohavi, & Sahami, 1995). This transformation into a nominal scale can of course cause information loss if the data are truly metric. But this preprocessing step can also enhance researchers' understanding of their data (e.g., in terms of the relevance of the values in the model). Szepannek, Luebke, and Weihs (2005) employed a similar idea in a classification context; they spoke of "relevant regions" for pattern recognition.
By converting a variable to indicator variables, the problem of whether a variable possesses metric level is circumvented. In addition, assumptions such as a monotonic or linear relationship, uni-modality, or the absence of outliers are no longer needed. In OOM, statistical reasoning is based on the observations and their model fit. In contrast, probability distributions and their parameters are of no concern in OOM, thus avoiding overconfidence in the explanation or prediction of individual observations by significant parameters. Of course, model fit plays an important role in OOM. For this purpose, Grice (2011) proposed a number of coefficients. The most important is the "Percent Correct Classification" (PCC) statistic. The PCC is the percentage of correctly assigned target observations (i.e., the relative number of observations where predicted values $\hat{y}_i$ equal observed values $y_i$), just like the success rate for accuracy in machine learning (Labatut and Cherifi, 2012):
$$\mathrm{PCC} = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}(\hat{y}_i = y_i)$$
Note that, in contrast to p-values, the PCC is not a function of the sample size n, although it becomes more stable with larger samples (e.g., when resampling techniques are applied or when the PCC is regarded as a realization of a random variable).
To estimate the probability of encountering a certain PCC by chance alone, permutation or cross-validation statistics can be computed. Grice (2011) reported a "chance value" c, which is the probability of achieving an equally high (or even higher) accuracy (PCC) calculated on the basis of a permutation of the data, comparable to randomization tests (Edgington and Onghena, 2007). In order to avoid overconfidence or over-fitting, cross-validation techniques for calculating the PCC can be applied (Efron & Gong, 1983), especially in the case of small samples or complex models.
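The logic of the PCC and of a permutation-based chance value can be sketched as follows. This is a hedged illustration, not Grice's (2011) implementation: the stand-in classifier `modal_predictions` (predicting the most frequent y for each value of x), the function names, the default of 1,000 trials, and the toy data are all assumptions made for this example.

```python
import random
from collections import Counter, defaultdict


def pcc(observed, predicted):
    """Proportion of observations whose predicted value equals the observed one."""
    matches = sum(1 for o, p in zip(observed, predicted) if o == p)
    return matches / len(observed)


def modal_predictions(x, y):
    """Stand-in classifier (assumption): predict the most frequent y per x value."""
    counts = defaultdict(Counter)
    for xi, yi in zip(x, y):
        counts[xi][yi] += 1
    return [counts[xi].most_common(1)[0][0] for xi in x]


def chance_value(x, y, trials=1000, seed=1):
    """Share of random permutations of y whose PCC reaches the observed PCC."""
    rng = random.Random(seed)
    observed_pcc = pcc(y, modal_predictions(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # break the link between x and y
        if pcc(shuffled, modal_predictions(x, shuffled)) >= observed_pcc:
            hits += 1
    return hits / trials


# Hypothetical toy data: two x values, y mostly aligned with x.
x = ["a"] * 10 + ["b"] * 10
y = [1] * 9 + [2] + [2] * 9 + [1]
observed = pcc(y, modal_predictions(x, y))
c = chance_value(x, y)
print(observed, c)
```

A small c indicates that an accuracy at least as high as the observed one rarely arises when the pairing of x and y is random.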

Data analysis in the original OOM
In this section, we provide a more formal explanation of what Grice (2011) called "Binary Procrustes Rotation", as outlined in the previous section. For n observations, let $x_i$ be the value (the event in probability theory) of some effect for observation i, which can take one of L different values, so $x_i \in \{1, 2, \ldots, L\}$. The Deep Structure of x is then an n × L matrix X with
$$X_{il} = \begin{cases} 1 & \text{if } x_i = l, \\ 0 & \text{else.} \end{cases}$$
Note that within this framework the effect is endogenous (i.e., it is a dependent variable) but represented by x. Similarly, let $y_i$ be the value of some hypothesized cause (exogenous, independent variable) for observation i. $y_i$ can take one of K different values, so $y_i \in \{1, 2, \ldots, K\}$. The Deep Structure of y is then an n × K indicator matrix Y. In order to make Y "conform" to X, Grice (2011) employed a transformation matrix T such that
$$T = X^\top Y. \tag{2}$$
This L × K matrix is a cross-tabulation of effects and causes. Let P be the resulting matrix after the normalization of the transformation matrix T, where
$$P = U T V, \tag{3}$$
with $U \in \mathbb{R}^{L \times L}$ and $V \in \mathbb{R}^{K \times K}$ being diagonal matrices that are necessary for normalization. For both U and V, the non-zero elements $u_{ll}$ and $v_{kk}$, respectively, refer to the inverse of the sum of the squared elements of the lth column and kth row, respectively, of T. Further, let
$$\hat{Y} = X P \tag{4}$$
be the rotated and normalized conforming matrix (i.e., the matrix that is to be compared with Y); Grice (2011) denoted Y as $Y_C$.
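To make the cross-tabulation step concrete, the following sketch computes the product of the transposed indicator matrix X with the indicator matrix Y for a small hypothetical data set. It covers only the cross-tabulation; the normalization by U and V is deliberately left out. The helper names `transpose` and `matmul` and the toy matrices are assumptions made for this illustration.

```python
def transpose(M):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    """Plain matrix product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


# Hypothetical data: n = 5 observations, L = 2 effect values, K = 3 cause values.
X = [[1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]                  # x = 1, 1, 2, 2, 2
Y = [[1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]]   # y = 1, 2, 2, 3, 3

T = matmul(transpose(X), Y)   # L x K table of co-occurrence counts
for row in T:
    print(row)
```

Entry (l, k) of the result counts the observations with x = l and y = k, so the entries sum to n.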

Model fit in OOM
Visually, a researcher might check how many observations (i.e., rows) of $\hat{Y}$ match the conforming matrix Y (see example below). More precisely, predicted values $\hat{y}_i$ can be compared with $y_i$, where $\hat{y}_i = \arg\max_{k \in \{1, \ldots, K\}} \hat{y}_{ik}$. By cross-tabulating x and the classification results of $\hat{y}$ (either correct, false, or ambiguous), one can eyeball the support for the model (i.e., visualize the number of observations for which the model holds, cf. Fig. 1). This diagram is called a multigram in OOM (Grice, 2011). A further statistic in OOM is the Classification Strength Index (CSI), proposed by Grice (2011). The CSI of observation i is defined as the highest value of row i in $\hat{Y}$ (i.e., $\max_k \hat{y}_{ik}$). The mean CSI can therefore be used as an indicator of the extent to which the procedure is clearly able to assign the observations to an event (i.e., certainty within the model).
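The row-wise classification and the CSI can be illustrated with a short sketch. It assumes that the rows of the rotated matrix are already available; the function name `classify_row`, the 1-based class labels, the "ambiguous" label for tied maxima, and the toy matrix are choices made for this example only.

```python
def classify_row(row):
    """Return (label, csi) for one row of the rotated matrix.

    label is the 1-based index of the largest entry, or "ambiguous"
    when the maximum is shared by two or more classes; csi is the maximum.
    """
    csi = max(row)
    winners = [k + 1 for k, value in enumerate(row) if value == csi]
    label = winners[0] if len(winners) == 1 else "ambiguous"
    return label, csi


# Hypothetical rotated matrix for three observations and K = 2 classes.
Y_hat = [[0.8, 0.2], [0.5, 0.5], [0.1, 0.9]]
results = [classify_row(row) for row in Y_hat]
mean_csi = sum(csi for _, csi in results) / len(results)
print(results, mean_csi)
```

Comparing each label with the observed class then yields the correct, false, or ambiguous categories that the multigram displays.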

Revised version of OOM
We propose that the data analysis in OOM can be seen as a statistical classification approach that uses indicator matrices in a regression framework (Hastie, Tibshirani, & Friedman, 2009). Thus, one aims to minimize
$$\lVert Y - XP \rVert^2. \tag{5}$$
Our main contribution to OOM is that we propose that P (in Eq. 5) should be computed by using the Moore-Penrose pseudo inverse, $X^+$ (Ben-Israel and Greville, 2003). With some matrix calculations, it can be shown that in the case of indicator matrices the calculation of the Moore-Penrose pseudo inverse simplifies to
$$X^+ = D X^\top. \tag{6}$$
Here, $D \in \mathbb{R}^{L \times L}$ is a diagonal matrix that contains the inverse of the event frequencies of x:
$$d_{ll} = \begin{cases} \left( \sum_{i=1}^{n} X_{il} \right)^{-1} & \text{if } \sum_{i=1}^{n} X_{il} > 0, \\ 0 & \text{else.} \end{cases} \tag{7}$$
Instead of using a one-dimensional vector x, our revised version of OOM uses an L-dimensional indicator matrix X and thus circumvents the problem of masking (Hastie et al., 2009), which can occur in the case of the regression of indicator matrices Y. Thus, in contrast to Grice's (2011) method, we propose that Eq. 5 be optimized by
$$P = X^+ Y = D X^\top Y \tag{8}$$
in OOM. The results of Eq. 8 are identical to those obtained by using the "normalizing the conforming variable only" function in the OOM software (Grice, 2017). By using the Moore-Penrose pseudo inverse in our adapted approach to OOM, Eq. 8 differs from the initial normalization procedure in Eq. 3. One advantage of the present approach is that a generalized inverse such as the Moore-Penrose pseudo inverse is the optimal solution to Eq. 5 (Ben-Israel and Greville, 2003). Moreover, P can now be seen as a classifier under the diagnostic paradigm (Dawid, 1976); note that in Eq. 2
$$T_{ij} = \sum_{k=1}^{n} X_{ki} Y_{kj}$$
(i.e., the number of co-occurrences of x = i and y = j; Grice, 2011). This is just the cross-tabulation of the data. Hence
$$P_{ij} = \frac{T_{ij}}{\sum_{k=1}^{n} X_{ki}}$$
is the conditional probability $P(y = j \mid x = i)$, so $\hat{Y} = XP$ contains the conditional probabilities of $y_k = j$ given $x_k = i$ for the sample data (i.e., the conditional relative frequencies). So one can see that the revised version of OOM is a naive Bayes classifier, where the conditional probability is non-parametric and based on the empirical events (or binned versions thereof).
We propose that two issues regarding PCC calculation demand attention. First, should observations that fit two or more classes equally well (ambiguous observations) be counted as a success or as an error? Our PCC statistic reports the overall correct classification rate when ambiguous classifications are treated as success. In contrast, our $\mathrm{PCC}_{amb}$ statistic counts ambiguous classifications as not matching. Second, which class should be picked if two classes match equally well? In other words: How should ties be broken? Two possibilities are to (1) break ties randomly or (2) match the observation to the most frequent class. Our algorithm allows for both tie-breaking methods; in this paper, we demonstrate method 2.
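The revised procedure can be sketched on label vectors rather than explicit matrices, because for indicator matrices the entries of P reduce to conditional relative frequencies. The following is a hedged illustration, not the authors' R code: the function name `revised_oom`, the dictionary representation of P, and the toy data are assumptions. Ties are broken by the most frequent class (method 2), and tie-broken predictions count toward PCC only, following the description of the two PCC variants in the data analysis section.

```python
from collections import Counter


def revised_oom(x, y):
    """Return (P, pcc, pcc_amb) for label vectors x and y of equal length."""
    x_levels, y_levels = sorted(set(x)), sorted(set(y))
    n_x = Counter(x)           # event frequencies of x
    n_y = Counter(y)           # class frequencies of y, used for tie breaking
    T = Counter(zip(x, y))     # cross-tabulation of x and y
    # P[i][j] is the conditional relative frequency of y = j given x = i.
    P = {i: {j: T[(i, j)] / n_x[i] for j in y_levels} for i in x_levels}

    hits = hits_amb = 0
    for xi, yi in zip(x, y):
        row = P[xi]            # row of the predicted matrix for this observation
        best = max(row.values())
        winners = [j for j in y_levels if row[j] == best]
        if len(winners) == 1:
            correct = int(winners[0] == yi)
            hits += correct
            hits_amb += correct
        else:
            # Ambiguous: assign the most frequent class; counts toward PCC only.
            pick = max(winners, key=lambda j: n_y[j])
            hits += int(pick == yi)
    n = len(x)
    return P, hits / n, hits_amb / n


# Hypothetical toy data with one tied x value ("c").
x = ["a", "a", "a", "b", "b", "c", "c"]
y = [1, 1, 2, 2, 2, 1, 2]
P, pcc_value, pcc_amb = revised_oom(x, y)
print(P, pcc_value, pcc_amb)
```

Each row of P sums to 1, which reflects its reading as a set of conditional relative frequencies.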
Furthermore, we propose that the accuracy of a model should be judged against some baseline criterion. A useful baseline criterion is a model that always predicts the category for which the relative frequency of the predicted event is highest. The resulting coefficient may be called the (accuracy) gain, which is comparable to performance measures in machine learning. Thus, the gain is defined as the improvement in the classification of y that is achieved by including x in the model. Of course, it should be noted that finer data-analytic techniques exist for the analysis of cross-tabulations of events, such as association measures or independence tests (cf. Agresti, 2003; Stemmler, 2014). The same is true for classifying $y \in \{1, 2, \ldots, K\}$ on the basis of x (i.e., supervised classification methods; cf. Hastie et al., 2009). However, the different epistemological approach of OOM should be remembered. In OOM, the focus of interest lies in the empirical pattern of the frequency of observations. OOM considers the question of how many observations fit, or, as Gosset put it to Pearson (1939, p. 247): "(. . .) obviously the important thing in such is to have a low real error, not to have a 'significant' result (. . .)."
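One possible reading of the gain is the difference between the model's PCC and the accuracy of the baseline that always predicts the most frequent category. The sketch below implements this reading; treating the gain as a simple difference (rather than, say, a ratio) is an assumption of this example, as are the function names and the toy values.

```python
from collections import Counter


def baseline_accuracy(y):
    """Accuracy of always predicting the most frequent category of y."""
    return max(Counter(y).values()) / len(y)


def accuracy_gain(pcc, y):
    """ASSUMPTION: gain taken as the difference between PCC and the baseline."""
    return pcc - baseline_accuracy(y)


# Hypothetical values: a model with PCC = 5/7 on seven observations.
y = [1, 1, 1, 2, 2, 2, 2]
baseline = baseline_accuracy(y)
gain = accuracy_gain(5 / 7, y)
print(baseline, gain)
```

Under this reading, a gain near zero means that including x classifies y hardly better than the most frequent category alone.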
In sum, our refined OOM approach is optimal in terms of the regression of indicator matrices and can be seen as a naive Bayes classifier, methods that are well known in statistical learning (Hastie et al., 2009). Unless stated otherwise, we have used this revised version of OOM throughout the paper.

OOM example in mindfulness research
In the first part of the article, we presented the extended and revised version of OOM and explained its use. In the remainder, we present an empirical example to demonstrate an OOM analysis. To this end, we reanalyzed a published study (Sauer et al., 2012). The study tested whether mindfulness training is associated with an extension of the subjective now. According to the model, participants engaging in regular mindfulness training were expected to indicate a longer subjective "now". The initial analysis was based on testing for statistically significant differences and found some evidence for an effect of mindfulness on the extension of the subjective now. However, our OOM analysis presents a more nuanced picture with less evidence for the effect in question.

Mindfulness and time perception
Mindfulness has been defined as focusing attention on present experiences, both internal (e.g., thoughts), and external (e.g., visual impressions) (Kabat-Zinn, 2003). This attentional focus is accompanied by a stance of nonjudgmental awareness with regard to a stream of consciousness perception (Sauer et al., 2013). Mindfulness can be conceptualized as an inherent human capacity, and it can be trained (Sauer et al., 2013). In a similar vein, "informal" mindfulness has been shown to reduce cognitive bias in a randomized trial (Williams, Teasdale, Segal, & Soulsby, 2000). Of course, during "formal" mindfulness practice (e.g. meditation), the effects of mindfulness should be stronger. Some interesting data that support this claim have been published. For example, Carter et al. (2005) found that during concentrative meditation, ocular rivalry was reduced in mindfulness experts.
Research on mindfulness has received a surge in interest in recent years, with a rising number of publications. A growing body of research suggests its effectiveness in alleviating a wide range of health conditions including stress, anxiety, depression, pain, high blood pressure, skin diseases, eating disorders, and substance abuse, to name a few (Chiesa & Serretti, 2014; Godfrey, Gallo, & Afari, 2014; Gu, Strauss, Bond, & Cavanagh, 2015; Hofmann, Sawyer, Witt, & Oh, 2010; Keng, Smoski, & Robins, 2011; Piet & Hougaard, 2011). It is interesting that, from a neurophysiological point of view, mindfulness appears not only to affect attentional resources (Slagter et al., 2007) but has also been shown to be related to altered brain structures. For example, Gard, Hölzel, and Lazar (2014) investigated whether mindfulness could lessen the impact of age-related neurological decay and found some evidence to support this. Further, a study by Hölzel et al. (2011) suggests that mindfulness training may build up grey matter in areas that are associated with attention and emotion regulation.
Another strand of research links mindfulness to time perception. As the practice of mindfulness consists of staying as long as possible in the present moment, it appears plausible to hypothesize that mindfulness training may also prolong the "subjective now" (Wassenhove, Wittmann, Craig, & Paulus, 2011). Investigating human time perception is crucial for fostering the understanding of perceptual and cognitive processes. It should be stressed that individual temporal perception is not isomorphic to physical time duration (Wassenhove et al., 2011). From a practical point of view, time perception may be of interest, as the temporal borders of time perception limit the human capacity to perceive "zeitgestalts". Thus, extending the borders of temporal perception may have implications for working memory or perhaps enable the perception of previously imperceptible percepts. On this basis, Sauer et al. (2012) argued that regular mindfulness training should be associated with a longer subjective now (as operationalized below) compared with a control group of mindfulness-naive laypersons.

Method

Participants
The sample consisted of 38 mindfulness meditation experts (21 female, 17 male) and an age-matched control group of 38, recruited in the UK, Germany, and Switzerland. Meditators were trained in Buddhist meditation techniques. The meditation group was recruited from two sources: first, a Tibetan Buddhist meditation center in which all meditators practiced various meditation exercises including concentration and compassion meditation ("Tibet group", n = 16), and second, a Zen meditation group focusing on concentration meditation ("Zen group", n = 22). The mean age was 51 years in each group. Inclusion criteria consisted of at least 5 years of daily meditation practice for the meditation group and no meditation experience for the control group. Thus, high selection standards were applied.

Materials and procedure
Among other measures (see the original publication, Sauer et al., 2012, for details), individuals were instructed to watch a Necker cube, a basic form of a bistable image (cf. Fig. 2). The most prominent feature of the Necker cube is that perception changes after some amount of time, irrespective of the individual's resistance. After some seconds, the cube "falls" from one visual perspective, say B, to the other (i.e., C).
In block 1, each individual was instructed to watch the computerized image of the Necker cube, and indicate any change in perspective upon occurrence by pressing a button on the keyboard. In block 2, individuals were instructed to hold the perspective for as long as possible. Duration of each block was 3 min (see the additional materials, S2, Fig. 3 for an illustration).

Hypotheses and data analysis
On the basis of the theoretical account of mindfulness and time perception presented above, we posit the following hypotheses (H). Mindfulness experts will report fewer reversals in perceptions of the Necker cube compared with the laypersons in block 1 (H1) and in block 2 (H2). Reversals refer here to a change in the person's perception of the Necker cube (from Position B to C or vice versa in Fig. 2). H3 expects a drop in the number of reversals from block 1 to block 2. H4 states that the drop in reversals is greater for the meditators than for the laypersons. Specifically, H4 expects that the drop is more pronounced for the members of the Zen group than for the members of the Tibet group. We expected this order because individuals in the Zen group focused their meditative practice on sustained attention, whereas individuals in the Tibet group exercised a variety of meditative practices. Put briefly, our hypotheses on the number of reversals (NR) were as follows; note that these hypotheses are observation based:
H1: $NR_{\mathrm{control}} > NR_{\mathrm{Tibet}} > NR_{\mathrm{Zen}}$ in block 1.
H4: $NR_{\mathrm{control}} < NR_{\mathrm{Tibet}} < NR_{\mathrm{Zen}}$ for the drop of reversals between block 1 and block 2.
[Figure caption] In block 2 (right panel), participants were instructed to hold their perception constant for as long as possible. Individuals with a smaller number of reversals occurred most often in the Zen group, followed by the control group, and then by the Tibet group. This was true for both blocks, whereas the differences between the groups were more pronounced in block 1. The overall number of reversals decreased from block 1 to 2. Red triangles indicate medians. In the Zen group, the median number of reversals was lower than in the other two groups. This was true in both blocks.
One observation had to be excluded (Zen group) due to a technical problem resulting in missing values. For permutation tests, 10,000 trials were computed to achieve numeric stability for reproducibility. Outliers were left untreated in the sample.
In terms of OOM statistics, we report two variants of the PCC statistic: first, with ambiguous observations assigned to the condition with the highest frequency (PCC), and second, with ambiguous observations assigned to a class called "ambiguous", thus lowering the statistic ($\mathrm{PCC}_{amb}$). In addition, we report the accuracy gain, the c-value, and the CSI, as described above. Results are given for our revised approach as well as for Grice's (2011) original approach.
We conducted all analyses in R (3.2.3; R Core Team, 2015) including several packages (Auguie, 2015; Aust and Barth, 2016; Dahl, 2014; Fox & Weisberg, 2015; R Core Team, 2015; Wickham, 2015; Wickham & Chang, 2015; Wickham & Francois, 2015). More details on the computing environment, including the names and versions of the R packages used, are provided in S1 (CompEnv). To ensure the reproducibility of the computations, we conducted all analyses based on a CRAN image of 2016-01-18, using the package "checkpoint" (Analytics, 2015). The R code for the analyses can be freely accessed here: osf.io/u3nq9 (Sauer & Luebke, 2017b). The data set can be freely accessed here: osf.io/sgzyd (Sauer & Luebke, 2017a). We confirm that we did not alter the data in any other way than as described here. Supplemental files can be freely accessed here: https://osf.io/6vhja/. This paper was not preregistered.

Descriptive results
In sum, the median number of reversals was 44.0 in block 1, and 25.0 in block 2 (a difference of 19.0). Given a block duration of 180s, the average reversal time was 4.1s in block 1, and 7.2s in block 2. Descriptive statistics (Md, Min, Max) for the whole sample as well as for the three subgroups are given in Table 1.
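The reported reversal times appear to follow from dividing the block duration by the median number of reversals in each block; as a quick check of the figures above:

```latex
\frac{180\,\text{s}}{44.0} \approx 4.1\,\text{s}
\qquad\text{and}\qquad
\frac{180\,\text{s}}{25.0} = 7.2\,\text{s}
```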
As can be seen in Table 1, the median number of reversals differed between groups. The observations in the Tibet group and the control group tended to show more reversals than those in the Zen group; this was true in both blocks. Thus, on an aggregate level (Md), the Zen group appeared to demonstrate a clear advantage over the control group and the Tibet group, consistent with what we expected. Contrary to our expectations, the Tibet group showed more reversals at the median level than the control group (cf. Fig. 3). In sum, these results provided partial support for H1 and H2.
Apart from comparing the three groups, we were interested in comparing the differences in change from block 1 to block 2 between the three groups. As can be seen in Table 1, the median reduction in reversals was most pronounced in the Tibet group (24.5, 45%). The relative reduction was stronger in the Zen group than in the control group (36% vs. 30%). However, the absolute reduction was stronger in the control group (17.5) than in the Zen group (16.0). Thus, the absolute reduction rates showed a mixed picture and provided partial support for H4.
In each group, most of the individuals reduced their number of reversals. As expected, this effect was more pronounced in the Tibet group (15/16 ≈ .94) than in the control group (34/38 ≈ .89). However, contrary to expectation, the Zen group (18/21 ≈ .86) showed a smaller proportion of participants who lowered their reversal number; the difference between the groups was small. Figure 4 depicts this pattern. On a side note, the reduction was apparent for extreme cases as well, thus supporting the argument for not excluding extreme cases from this dataset.

OOM results
For the OOM analyses, we binned the 43 different values of the conforming variable (number of reversals) in order to reduce sparse bins and for easier comprehension (see supplementary information file S3 for problems with unbinned data and file S4 for OOM results without binning). Binning is a standard procedure, for example, for histograms in exploratory data analysis. However, there is no single rule for the correct number of bins. The most widely used choices are Sturges' rule (Sturges, 1926), the Freedman-Diaconis rule (Freedman & Diaconis, 1981), and the square root rule (Venables & Ripley, 2013), among others; see supplementary information S2 for details. For the present data, these rules resulted in 8, 14, and 9 bins in block 1, respectively. As Sturges' rule and the Freedman-Diaconis rule are acknowledged as more adequate, we decided to conduct the OOM analyses with both 8 and 14 bins and to compare the results. It should be noted that we do not exclude the possibility that different binning procedures are defensible. As a reference binning, we included a median-split binning.
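The three binning rules can be sketched as follows, using their common textbook forms; whether the original analysis used exactly these variants (e.g., the same rounding or quantile definition) is an assumption. With n = 75 (38 + 38 participants minus the one excluded observation), Sturges' rule and the square root rule reproduce the 8 and 9 bins reported above. The Freedman-Diaconis rule depends on the data themselves, so it is shown here on hypothetical values only.

```python
import math
import statistics


def sturges_bins(n):
    """Sturges' rule: ceil(log2(n) + 1) bins."""
    return math.ceil(math.log2(n) + 1)


def sqrt_bins(n):
    """Square root rule: ceil(sqrt(n)) bins."""
    return math.ceil(math.sqrt(n))


def freedman_diaconis_bins(values):
    """Freedman-Diaconis rule: bin width 2 * IQR / n^(1/3), then range / width."""
    n = len(values)
    q1, _, q3 = statistics.quantiles(values, n=4)   # quartile cut points
    width = 2 * (q3 - q1) / n ** (1 / 3)
    return math.ceil((max(values) - min(values)) / width)


n = 75                               # observations after the exclusion
toy_values = list(range(1, 76))      # hypothetical data, NOT the study data

print(sturges_bins(n), sqrt_bins(n), freedman_diaconis_bins(toy_values))
```

Because the Freedman-Diaconis width shrinks with a small interquartile range relative to the overall range, skewed data with extreme values tend to yield more bins under this rule than under the other two.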
In the results for both block 1 and 2, as well as for the change between the two blocks, we found that the results of the different binning strategies differed very little. Thus, here, we present only the results of the Freedman-Diaconis binning (PCCs: block 1: .59; block 2: .52; change: .55). A more detailed analysis of the results can be found in the supplementary information (S2). As can be seen in Fig. 5, the PCC was moderate at best. This was true for block 1 (H1), block 2 (H2), and for the change in the number of reversals from block 1 to 2 (H4).
Thus, the OOM statistics provided only a little support for H1, H2, and H4.
A summary of the results of the OOM analysis is given in Table 2 (results are based either on the Moore-Penrose pseudo inverse or on Grice's method). Comparing the tables, it is obvious that the two normalization procedures can differ to a moderate extent. H3 was supported by the OOM results, as can be seen in Fig. 6: The number of reversal times conformed well to block 1; data from the three groups were combined in this analysis. Although the PCC of .67 appears moderate, the c-value of .001 indicates that the classification was nevertheless well above the rate of chance. Note that in block 2, the number of correct classifications (green color) was much higher than in block 1. The reason is that in block 1 the number of reversals was much more spread out than in block 2.

Discussion of OOM example
The initial analysis by Sauer et al. (2012) presented results that favored the main hypothesis, (i.e., that mindfulness can prolong the subjective now). However, our reanalysis reveals a more nuanced and less favorable picture. In sum, H1, H2, and H4 were moderately supported by the descriptive statistics (cf. Figs. 3 and 4), but they were not well supported by the OOM statistics (cf. Fig. 5, Table 2). H3 was well supported by the OOM statistics. Why were the OOM results less favorable than the descriptive results? One explanation is that the differences were not pronounced enough for the OOM algorithm to separate the groups. This explanation is backed up by the marked overlap of the three distributions. Thus, although the descriptive analyses uncovered some difference between the groups, this difference should be regarded as small in size -especially on the level of the observations. We did not exploit the numeric (count) level of the data, consistent with the theoretical underpinnings of OOM.
In sum, the OOM analysis provided only weak evidence that some forms of attentive training can prolong the reversal times in bistable images. If it exists, this effect would be of interest as cognitive comprehension hinges on the size of working memory. If working memory capacity could be enhanced by mindfulness training, practical and theoretical benefits can be envisioned.

Figure: Multigram for comparing Blocks 1 and 2. Data from all three groups were merged for this analysis. Note that the classification results (correct/false/ambiguous) are labeled by c, f, or a, respectively, in the facet headers of the plot. The number of reversals could be conformed well to the Blocks.
In the original paper, the results were based on a test of statistically significant differences in reversals between meditators and non-meditators (statistical significance was found in block 2 but not in block 1). The present reanalyses shed much more light on the particularities of the data. For example, the effects of mindfulness appeared to be moderated by the type of training (i.e., Zen and Tibet data exhibited quite different results).
Future studies should provide more data and should try to find more reliable measures of when perception changes. Of course, limitations of the present study include its quasi-experimental setup. A true experimental setup is necessary to shed light on the effects and mechanisms of mindfulness with regard to attentive resources and processes. Of interest, previous research noted that the subjective now typically has a time window of around 3 seconds (Cohen, 1959). In our data, mean reversal times were often longer. However, it has been noted that whereas the distribution of reversal times may peak at 3 seconds, considerable variance prevails (Atmanspacher, Filk, & Römer, 2004). Thus, investigating the distribution of the raw reversal times would be of interest.

Overall discussion
The main objective of this paper was to present a refined version of OOM as an exploratory data modeling technique that may be suited to some of the data and research questions in the behavioral sciences, as its data analytic part is non-parametric and based on well-known methods such as cross-tabulation. Its most prominent features are the focus on counting individual observations and on modeling based on sample data. At the same time, it allows for model checking by reporting classification accuracy, that is, the number of observations for which the model holds. We believe that OOM can offer an extension of the more common statistical armory in behavioral science and may help provide insights with respect to some scientific questions. One important aspect of OOM, which was not the focus of the present paper, involves the use of iconic or "integrated models" (Grice, 2011) (cf. Fig. 3 in S2). Such visual modeling forces the analyst to make assumptions that are more explicit than what is presented in standard path diagrams.
The initial OOM method by Grice (2011) is based on matrix algebra. This is an advantage for applied scientists with limited time and interest in statistical subtleties. However, such subtleties should be addressed when the assumptions of (standard) statistical methods are not met. Here, we adopted an approach that was based on the Moore-Penrose pseudo inverse. The advantage of this method is that it is known to minimize the (least-squares) error, as detailed above. The results of the two methods (Grice vs. Moore-Penrose) can differ.
We therefore advise analysts to report their normalization method.
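The least-squares property of the Moore-Penrose pseudo inverse can be sketched on a toy example. The sketch assumes that both the binned predictor and the group membership are coded as indicator (one-hot) matrices; the helper `one_hot` and the toy data are hypothetical, and Grice's normalization is not reproduced here. For indicator-coded predictors, the pseudo-inverse solution equals the row proportions of the cross-tabulation, so classifying by the largest fitted value amounts to choosing the most frequent group in each bin.

```python
import numpy as np

def one_hot(labels, n_levels):
    """Code integer labels 0 .. n_levels - 1 as an indicator matrix."""
    out = np.zeros((len(labels), n_levels))
    out[np.arange(len(labels)), labels] = 1.0
    return out

# Toy data: bin index of the predictor and group membership (illustrative only)
bins = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 1])
groups = np.array([0, 0, 1, 1, 1, 2, 2, 2, 0, 1])

X = one_hot(bins, 3)    # observations x bins
Y = one_hot(groups, 3)  # observations x groups

# Transformation matrix minimizing the squared error ||XW - Y||
W = np.linalg.pinv(X) @ Y

# Fitted values and classification by the largest fitted entry per observation
Y_hat = X @ W
predicted = Y_hat.argmax(axis=1)
pcc_value = np.mean(predicted == groups)

# Row proportions of the cross-tabulation, for comparison with W
crosstab = X.T @ Y
row_props = crosstab / crosstab.sum(axis=1, keepdims=True)
```

Because the solution is a standard least-squares fit, it can be cross-checked against any other least-squares routine, which is one practical reason to state the normalization method when reporting results.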
In OOM, logical or arithmetic combinations of variables can be modeled, a procedure dubbed "logical hypothesis testing" by Grice (2011). This method allows researchers to potentially disentangle the effects of different causes. However, OOM is not equally suited to all settings. For monotonic relationships, for example, cross-tabulating, as is performed in OOM, may be too flexible. It may also require data pre-processing (i.e., binning). Similarly, in situations with known stochastic models and distributions, Frequentist or Bayesian approaches may be more appropriate than OOM.
For the time being, we conclude that OOM is a promising modeling approach for some research settings as it circumvents some of the problems encountered in the application and interpretation of more common methods. There is a need for more research experience with this method and certainly room for more statistical revisions or extensions.