Error statistical modeling and inference: Where methodology meets ontology

Synthese

Abstract

In empirical modeling, an important desideratum for deeming theoretical entities and processes real is that they be reproducible in a statistical sense. Current-day crises regarding replicability in science intertwine with the question of how statistical methods link data to statistical and substantive theories and models. Different answers to this question have important methodological consequences for inference, which are intertwined with a contrast between the ontological commitments of the two types of models. The key to untangling them is the realization that behind every substantive model there is a statistical model that pertains exclusively to the probabilistic assumptions imposed on the data. It is not that the methodology determines whether to be a realist about entities and processes in a substantive field. It is rather that the substantive and statistical models refer to different entities and processes, and therefore call for different criteria of adequacy.

References

  • Baggerly, K. A., & Coombes, K. R. (2009). Deriving chemosensitivity from cell lines: Forensic bioinformatics and reproducible research in high-throughput biology. Annals of Applied Statistics, 3, 1309–1334.

  • Box, G. E. P. (1976). Science and statistics. Journal of the American Statistical Association, 71, 791–799.

  • Cox, D. R., & Hinkley, D. V. (1974). Theoretical statistics. London: Chapman & Hall.

  • Cox, D. R., & Mayo, D. G. (2010). Objectivity and conditionality in frequentist inference. In D. G. Mayo & A. Spanos (Eds.), Error and inference (pp. 276–304). Cambridge: Cambridge University Press.

  • Fama, E. F., & French, K. R. (2004). The capital asset pricing model: Theory and evidence. The Journal of Economic Perspectives, 18, 25–46.

  • Fisher, R. A. (1922). On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society A, 222, 309–368.

  • Fisher, R. A. (1935). The design of experiments. Edinburgh: Oliver and Boyd.

  • Jensen, M. C. (1968). The performance of mutual funds in the period 1945–1964. Journal of Finance, 23, 389–416.

  • Lai, T. L., & Xing, H. (2008). Statistical models and methods for financial markets. New York: Springer.

  • Lintner, J. (1965). The valuation of risk assets and the selection of risky investments in stock portfolios and capital budgets. Review of Economics and Statistics, 47, 13–37.

  • McGuirk, A., & Spanos, A. (2008). Revisiting error autocorrelation correction: Common factor restrictions and Granger non-causality. Oxford Bulletin of Economics and Statistics, 71, 273–294.

  • Mayo, D. G. (1996). Error and the growth of experimental knowledge. Chicago: The University of Chicago Press.

  • Mayo, D. G. (1997). Duhem’s problem, the Bayesian way, and error statistics, or “What’s belief got to do with it?”. Philosophy of Science, 64, 222–244.

  • Mayo, D. G. (2010a). Learning from error, severe testing, and the growth of theoretical knowledge. In D. G. Mayo & A. Spanos (Eds.), Error and inference: Recent exchanges on experimental reasoning, reliability, and the objectivity and rationality of science (pp. 28–57). Cambridge: Cambridge University Press.

  • Mayo, D. G. (2010b). Learning from error: The theoretical significance of experimental knowledge. The Modern Schoolman, 87(3/4), 191–217. (Special issue, March/May 2010: Experimental and theoretical knowledge, The 9th Henle conference in the history of philosophy; guest editor, Kent Staley.)

  • Mayo, D. G., & Cox, D. R. (2010). Frequentist statistics as a theory of inductive inference. In D. G. Mayo & A. Spanos (Eds.), Error and inference: Recent exchanges on experimental reasoning, reliability, and the objectivity and rationality of science (Vol. 7, pp. 247–275). Cambridge: Cambridge University Press.

  • Mayo, D. G., & Spanos, A. (2004). Methodology in practice: Statistical misspecification testing. Philosophy of Science, 71, 1007–1025.

  • Mayo, D. G., & Spanos, A. (2006). Severe testing as a basic concept in a Neyman-Pearson philosophy of induction. British Journal for the Philosophy of Science, 57, 323–357.

  • Mayo, D. G., & Spanos, A. (2010). Error and inference: Recent exchanges on experimental reasoning, reliability, and the objectivity and rationality of science. Cambridge: Cambridge University Press.

  • Mayo, D. G., & Spanos, A. (2011). Error statistics. In D. Gabbay, P. Thagard, & J. Woods (Eds.), Philosophy of statistics, handbook of philosophy of science. Amsterdam: Elsevier.

  • Potti, A., Dressman, H. K., Bild, A., Riedel, R. F., Chan, G., Sayer, R., et al. (2006). Genomic signatures to guide the use of chemotherapeutics. Nature Medicine, 12, 1294–1300.

  • Senn, S. J. (2001). Two cheers for P-values. Journal of Epidemiology and Biostatistics, 6(2), 193–204.

  • Sharpe, W. F. (1964). Capital asset prices: A theory of market equilibrium under conditions of risk. Journal of Finance, 19, 425–442.

  • Spanos, A. (1990). The simultaneous equations model revisited: Statistical adequacy and identification. Journal of Econometrics, 44, 87–108.

  • Spanos, A. (1999). Probability theory and statistical inference: Econometric modeling with observational data. Cambridge: Cambridge University Press.

  • Spanos, A. (2006). Where do statistical models come from? Revisiting the problem of specification. In J. Rojo (Ed.), Optimality: The second Erich L. Lehmann symposium (Lecture notes-monograph series, Vol. 49, pp. 98–119). Institute of Mathematical Statistics.

  • Spanos, A. (2007). Curve-fitting, the reliability of inductive inference and the error-statistical approach. Philosophy of Science, 74(5), 1046–1066.

  • Spanos, A. (2010a). Theory testing in economics and the error statistical perspective. In D. G. Mayo & A. Spanos (Eds.), Error and inference: Recent exchanges on experimental reasoning, reliability, and the objectivity and rationality of science (pp. 202–246). Cambridge: Cambridge University Press.

  • Spanos, A. (2010b). Akaike-type criteria and the reliability of inference: Model selection versus statistical model specification. Journal of Econometrics, 158, 204–220.

  • Spanos, A. (2010c). Statistical adequacy and the trustworthiness of empirical evidence: Statistical vs. substantive information. Economic Modelling, 27, 1436–1452.

  • Spanos, A. (2010d). The discovery of argon: A case for learning from data? Philosophy of Science, 77(3), 359–380.

  • Spanos, A. (2010e). Is frequentist testing vulnerable to the base-rate fallacy? Philosophy of Science, 77, 565–583.

  • Spanos, A. (2013). A frequentist interpretation of probability for model-based inductive inference. Synthese, 190, 1555–1585.

  • Spanos, A., & McGuirk, A. (2001). The model specification problem from a probabilistic reduction perspective. American Journal of Agricultural Economics, 83, 1168–1176.

Author information

Corresponding author

Correspondence to Aris Spanos.

Additional information

Thanks are due to two anonymous reviewers for many constructive and valuable comments and suggestions.

Appendix: M-S testing and auxiliary regressions

In light of the fact that the Linear Regression model (Table 1) is specified in terms of the conditional mean and variance:

$$\begin{aligned} E\left( Y_{t} \mid X_{t}=x_{t}\right) =\beta _{0}+\beta _{1}x_{t},\quad Var\left( Y_{t} \mid X_{t}=x_{t}\right) =\sigma ^{2},\quad t\in {\mathbb {N}}, \end{aligned}$$
(14)

one can test for any departures from the linear regression assumptions: [1] Normality, [2] linearity, [3] homoskedasticity, [4] independence, and [5] t-invariance, by expanding the orthogonal decompositions stemming from (14) (Spanos 1999):

$$\begin{aligned} u_{t}=\overset{0}{\overbrace{E\left( u_{t} \mid X_{t}=x_{t}\right) }}+v_{1t},\quad u_{t}^{2}=\overset{\sigma ^{2}}{\overbrace{E\left( u_{t}^{2} \mid X_{t}=x_{t}\right) }}+v_{2t},\quad t\in {\mathbb {N}}, \end{aligned}$$
(15)

to include additional terms representing potential violations of these assumptions. Whereas the adequacy of the model assumes that \(E\left( u_{t} \mid X_{t}=x_{t}\right) =0\), the true conditional mean might be non-zero when any of the assumptions [2]-[5] is invalid; similarly for \(E\left( u_{t}^{2} \mid X_{t}=x_{t}\right) =\sigma ^{2}\). A particular example of such auxiliary regressions, whose terms are only indicative of the kind of terms one could use to seek out any remaining systematic information in the residuals, is:

$$\begin{aligned} \widehat{u}_{t}&=\overset{[1],[2],[4],[5]}{\overbrace{\gamma _{10}+\gamma _{11}x_{t}}}+\overset{\overline{[5]}}{\overbrace{\gamma _{12}t+\gamma _{13}t^{2}}}+\overset{\overline{[2]}}{\overbrace{\gamma _{14}x_{t}^{2}}}+\overset{\overline{[4]}}{\overbrace{\gamma _{15}x_{t-1}+\gamma _{16}y_{t-1}}}+v_{1t},\\ &H_{0}:\gamma _{11}=\gamma _{12}=\gamma _{13}=\gamma _{14}=\gamma _{15}=\gamma _{16}=0 \end{aligned}$$
(16)
$$\begin{aligned} \widehat{u}_{t}^{2}&=\overset{[1],[3],[5]}{\overbrace{\gamma _{20}}}+\overset{\overline{[3]}}{\overbrace{\gamma _{21}x_{t}}}+\overset{\overline{[5]}}{\overbrace{\gamma _{22}t+\gamma _{23}t^{2}}}+\overset{\overline{[3]}}{\overbrace{\gamma _{24}x_{t}^{2}}}+\overset{\overline{[4]}}{\overbrace{\gamma _{25}x_{t-1}^{2}+\gamma _{26}y_{t-1}^{2}}}+v_{2t},\\ &H_{0}:\gamma _{21}=\gamma _{22}=\gamma _{23}=\gamma _{24}=\gamma _{25}=\gamma _{26}=0 \end{aligned}$$
(17)

In each case the null hypothesis \(H_{0}\) asserts that the model assumptions hold, taking us back to (15). The terms beyond \(\gamma _{10}+\gamma _{11}x_{t}\) in (16) and beyond \(\gamma _{20}\) in (17) represent different types of systematic statistical information that the original model might have overlooked. The interesting upshot is that these additional terms represent potential violations expressed in generic terms: they capture systematic statistical information already in \(\mathbf {Z}\) and do not directly refer to any specific substantive factors. Their statistical significance, however, raises the question of how generic terms such as \(t\) and \(t^{2}\), which represent substantive ignorance, can be replaced by relevant explanatory variables for substantive adequacy purposes; see Spanos (2010c).
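As a reading aid, the joint null hypotheses in (16) and (17) can be viewed as sets of linear zero restrictions, for which a standard textbook statistic is available. The form below is not taken from the source; the symbols \(RSS_{R}\), \(RSS_{U}\), \(m\), \(k\) and \(n\) are notation introduced here for illustration, and the \(F\) distribution should be read as an approximation, since the regressand is built from estimated residuals:

$$\begin{aligned} F=\frac{(RSS_{R}-RSS_{U})/m}{RSS_{U}/(n-k)}\ \overset{H_{0}}{\approx }\ F(m,\ n-k). \end{aligned}$$

Here \(RSS_{U}\) is the residual sum of squares of the full auxiliary regression, \(RSS_{R}\) is that of the same regression with the restrictions in \(H_{0}\) imposed, \(m\) is the number of restrictions and \(k\) is the number of coefficients in the full auxiliary regression. For both (16) and (17) as written, \(m=6\) and \(k=7\), and \(n\) counts the observations that remain after one is lost to the lagged terms.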

The problem of probing for model violations is thereby reduced to testing the statistical significance of these additional terms, individually or in groups, using simple t-tests and F-tests (Spanos 1999). A rejection of a null hypothesis indicates departures from the underlying model assumption(s).
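The auxiliary regressions (16) and (17) and their joint F-type statistics can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are simulated, the function names (`ols`, `f_stat`, `simulate`, `ms_f_stats`) are hypothetical, the trend is rescaled to \(t/n\) purely for numerical stability, and only the F statistic and its degrees of freedom are reported (a p-value would require an F distribution function from a statistics library).

```python
import random

def ols(X, y):
    """Least squares via the normal equations; returns (coefficients, RSS)."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    rss = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, yi in zip(X, y))
    return beta, rss

def f_stat(X_full, y, keep):
    """F statistic for H0: every coefficient outside `keep` is zero."""
    _, rss_u = ols(X_full, y)
    _, rss_r = ols([[r[j] for j in keep] for r in X_full], y)
    m = len(X_full[0]) - len(keep)      # number of restrictions
    df = len(y) - len(X_full[0])        # residual degrees of freedom
    return ((rss_r - rss_u) / m) / (rss_u / df), m, df

def simulate(n, trend, seed=0):
    """Hypothetical data: linear regression plus an optional omitted trend."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [1.0 + 0.5 * x[t] + trend * (t / n) + rng.gauss(0, 1)
         for t in range(n)]
    return x, y

def ms_f_stats(x, y):
    """Joint F statistics for the auxiliary regressions (16) and (17)."""
    n = len(y)
    b, _ = ols([[1.0, xi] for xi in x], y)          # original model
    u = [yi - b[0] - b[1] * xi for xi, yi in zip(x, y)]
    rows16, rows17 = [], []
    for t in range(1, n):                            # one observation lost to lags
        s = t / n                                    # rescaled trend (assumption)
        rows16.append([1.0, x[t], s, s * s, x[t] ** 2, x[t - 1], y[t - 1]])
        rows17.append([1.0, x[t], s, s * s, x[t] ** 2,
                       x[t - 1] ** 2, y[t - 1] ** 2])
    u1 = u[1:]
    u2 = [v * v for v in u1]
    # Under H0 only the constant survives in both regressions.
    return f_stat(rows16, u1, keep=[0]), f_stat(rows17, u2, keep=[0])

for label, trend in [("no omitted trend", 0.0), ("omitted trend", 3.0)]:
    x, y = simulate(200, trend)
    (f16, m, df), (f17, _, _) = ms_f_stats(x, y)
    print(f"{label}: F(16) = {f16:.2f}, F(17) = {f17:.2f}, df = ({m}, {df})")
```

Testing the terms in groups, as the text suggests, amounts to passing a larger `keep` list: for example, keeping every column except the two trend columns isolates the terms that probe assumption [5].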

About this article

Cite this article

Spanos, A., Mayo, D.G. Error statistical modeling and inference: Where methodology meets ontology. Synthese 192, 3533–3555 (2015). https://doi.org/10.1007/s11229-015-0744-y
