Inductive knowledge under dominance

Campi, Marco C.

doi:10.1007/s11229-023-04172-9

Original Research
Published: 16 May 2023

Volume 201, article number 184, (2023)
Synthese

Marco C. Campi ORCID: orcid.org/0000-0002-5209-3312¹

Abstract

Inductive reasoning aims at constructing rules and models of general applicability from a restricted set of observations. Induction is a keystone in natural sciences, and it influences diverse application fields such as engineering, medicine and economics. More generally, induction plays a major role in the way humans learn and operate in their everyday life. The level of reliability that a model achieves depends on how informative the observations are relative to the flexibility of the process by which the model is constructed. When the process is articulated so that the model can incorporate descriptive details and subtleties, a large set of informative observations is required to reliably tune the model, whereas models obtained from simple procedures can be tuned with fewer observations. This article introduces the concept of “dominance”, which refers to the situation in which a reduced subset of observations suffices to reconstruct the model. A mathematical framework is presented to quantify the reliability of learning procedures as a function of the size of the subset of dominant observations. Although limited in scope, we believe that our study can contribute to the understanding of some fundamental mechanisms by which knowledge is generated from observations in inductive reasoning.

Notes

It is important to observe that not all existing models operate in the way described here. For example, in some applications one wants to construct a line of best fit. We shall describe alternatives, so as to better position our contribution, in Sect. 2 after we introduce a formal definition of set model. We also advise the reader that, throughout this article, when we use the word “model” this will stand for “set model” unless otherwise specified.
For example, social demography studies the relationships between economic, social and cultural features of a society from the analysis of a sample elicited from the population; the penetration of machine learning techniques aiming at constructing classifiers from an observed set of cases (the so-called training sequence) is getting ever more pervasive in medical diagnoses as well as in control, telecommunications and computer engineering; and, certainly, predictors built from previous measurements (e.g., rates-of-return) are broadly used in quantitative finance. In physics, too, laws are built by generalizing from a limited number of observed cases; for example, electrons are deemed to have negative charge because all electrons that have been thus far tested in a laboratory had this property (this is an example of enumerative induction; Example 3 provides another example of this type).
Bruno de Finetti, referring to Henri Poincaré, wrote in de Finetti (1989): “he has clearly understood that only an accomplished fact is certain, that science cannot limit itself to theorizing about accomplished facts but must foresee, that science is not certain.”
More precisely, in a sub-branch of statistical learning that aims to establish the coverage of set models. More generally, statistical learning studies how well models of various nature describe a population, for example according to a criterion of average fit.
In some cases, the model is allowed to fail on specific members of the sample that have an “odd” behavior compared with other members (outliers), so that a smaller model, with improved descriptive capabilities, is achieved.
Complexity is one of the most debated and controversial concepts in science, and it has attracted the attention of eminent mathematicians including Kolmogorov and Chaitin; see Kolmogorov (1965), Chaitin (1966) and Kolmogorov (1968). Definitions 1 and 2 are not meant to contribute to this discussion; they merely introduce measures of complexity within the specific setup described here, that of constructing models from observations.
In fact, it is not difficult to show that the complexity of this procedure is exactly 3.
For example, in Campi et al. (2018) viable approaches are provided to upper bound the “complexity of a procedure for a given sample” (Definition 1) based on the progressive removal of observations until no observation can be further removed without altering the model.
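As a rough illustration of the removal idea mentioned in the preceding note, the sketch below greedily discards observations as long as the model does not change. The procedure used here (`interval_model`, the smallest interval containing the observations) and all function names are hypothetical choices made for illustration; they are not taken from the article, and the approaches in Campi et al. (2018) may differ in their details.

```python
from typing import Callable, List, Sequence, Tuple


def interval_model(sample: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical procedure: smallest closed interval containing the sample."""
    return (min(sample), max(sample))


def greedy_support(
    sample: Sequence[float],
    procedure: Callable[[Sequence[float]], Tuple[float, float]] = interval_model,
) -> List[float]:
    """Remove observations one at a time while the model stays unchanged.

    Stops when no single remaining observation can be removed without
    altering the model built from the full sample.
    """
    target = procedure(sample)
    kept = list(sample)
    changed = True
    while changed:
        changed = False
        for i in range(len(kept)):
            trial = kept[:i] + kept[i + 1:]
            # Never reduce to an empty sample; accept the removal only if
            # the reduced sample still yields the same model.
            if trial and procedure(trial) == target:
                kept = trial
                changed = True
                break
    return kept


sample = [2.0, 5.5, 3.1, 9.0, 4.2]
support = greedy_support(sample)
print(support, len(support))
```

The number of observations left after the removal is an upper bound on how many are needed to reconstruct the model for this particular sample; a different removal order could, for other procedures, leave a different subset.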
Beyond this simple example, in modern decision-making problems dealing with complex systems, besides observations one does want to exploit domain knowledge that comes from various sources, often including some knowledge that, while not completely trustworthy, can still be of help to obtain a satisfactory model.
If \(N = \#(\text{members of the population})\), we are in the extreme case that all members of the population are in the sample, in which case no reliability issue arises. At a mathematical level, the condition \(N < \#(\text{members of the population})\) prevents division by zero in the definition of R(M). This condition is in force throughout the rest of this article.
\(|S |\) is the cardinality of S, i.e., the number of elements in S.
Should a complete description of the population be available, we would have nothing to learn, as the attributes of the whole population would be known, and the inductive problem would not exist at all.
The reader may also be interested in consulting the recent monograph Campi and Garatti (2018) that contains a broad presentation of real learning problems in a context of dominance.
As we have said before the statement of Proposition 1, the evaluation of R(P) provided in the proposition cannot be improved because it holds with equality for certain populations, a fact that is further commented upon in Sect. 5.2. On the other hand, it remains that the evaluation of R(P) is worst-case and R(P) can decay at a rate faster than the inverse of N for specific populations.
This positions the result within the tradition of the principle of indifference (a terminology coined by John M. Keynes). While inchoate versions of this principle were already present in Blaise Pascal, Jacob Bernoulli and Gottfried W. von Leibniz, the principle of indifference was fully developed into a theoretical apparatus mainly in de Moivre (1718), and, later, in Laplace (1814).
A convex set is a set where the line segment connecting any two points in the set is entirely contained in the set. Hence, a square or a disk is convex, but a horseshoe-shaped set is not. The “convex hull” of given points is the smallest convex set that contains all the points.
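To make the notion of convex hull in the preceding note concrete, the following minimal sketch computes the hull of a set of planar points with Andrew's monotone chain algorithm, a standard technique that is not taken from the article. The function names and the sample points are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Cross product of vectors OA and OB; positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Vertices of the convex hull in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        # Drop points that would make a clockwise turn or are collinear.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


# Illustrative data: four corners of a square, one interior point,
# and one point lying on an edge.
points = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1), (1, 0)]
hull = convex_hull(points)
print(hull)
```

In this example only the four corner points are returned: the interior point and the point on the edge can be discarded without changing the hull.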
A word of clarification is perhaps appropriate to dissipate any doubts regarding this claim. For a given procedure and a given population, the right-hand side of (3) does return a value that is equal to or bigger than that returned by the right-hand side of (2). The very point is that (3) applies also to populations that lie outside the domain of applicability of (2) (populations for which the complexity exceeds c). When (3) is applied to one of these populations, it can return a value larger than the upper bound given in Proposition 1 with \(c(P) = c\) despite the fact that one only accepts the model when the complexity is no more than the threshold c.
For simplicity, the proposition refers to the case in which the cardinality of the population tends to infinity, which is referred to as the “large population” set-up. The reader unfamiliar with the notation \({N \atopwithdelims ()c}\) is referred to Sect. 5.1 for an explanation. We also note that, in the formula, \(\frac{1}{N-c}\) is an exponent, that is, the binomial coefficient \({N \atopwithdelims ()c}\) is raised to the fractional exponent \(\frac{1}{N-c}\).
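The quantity described in the preceding note, a binomial coefficient raised to the exponent \(\frac{1}{N-c}\), can be evaluated numerically as in the sketch below. Only this single factor is computed; the full formula of the proposition is not reproduced here, and the function name `binom_root` is a hypothetical choice.

```python
import math


def binom_root(N: int, c: int) -> float:
    """Return C(N, c) ** (1 / (N - c)), assuming 0 <= c < N.

    The logarithm is taken first so that large binomial coefficients
    do not overflow when converted to floating point.
    """
    if not 0 <= c < N:
        raise ValueError("requires 0 <= c < N")
    return math.exp(math.log(math.comb(N, c)) / (N - c))


# Illustrative values of N and c (assumptions, not taken from the article).
for N, c in [(10, 0), (10, 2), (1000, 5)]:
    print(N, c, binom_root(N, c))
```

For c = 0 the binomial coefficient equals 1, so the result is 1 for every N.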
The difference between a list and a sample is that the sample is a set and does not contain a concept of ordering. Hence, the two lists (a, b, c) and (b, c, a) are different; however, they correspond to the same sample.
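The distinction in the preceding note can be mirrored with ordered and unordered containers. In the sketch below, tuples stand for lists and frozensets stand for samples; this mapping and the variable names are illustrative assumptions, and the members are assumed to be distinct.

```python
from itertools import permutations

# Two lists with the same members in a different order.
list_1 = ("a", "b", "c")
list_2 = ("b", "c", "a")

# As ordered objects they differ; as samples (sets) they coincide.
lists_differ = list_1 != list_2
same_sample = frozenset(list_1) == frozenset(list_2)

# All lists that correspond to the same three-element sample.
lists_for_sample = set(permutations(frozenset(list_1)))

print(lists_differ, same_sample, len(lists_for_sample))
```

A sample of three distinct members corresponds to 3! = 6 different lists.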
The notion of exchangeability captures, and suitably formalizes, the principle of uniformity of nature formulated by Hume (2008).
Recall that \({N \atopwithdelims ()0} = 1\), so that the first factor on the right-hand side of (4) is 1.
In Popper’s approach, a model is rejected as soon as complexity exceeds the value 0.
If \({Q \atopwithdelims ()c} {K+N-c \atopwithdelims ()N-c} > {Q \atopwithdelims ()N}\) for all \(K \ge 0\), then the right-hand side of (9) is always taken to be equal to \({Q \atopwithdelims ()N}\).
To ease the notation, in the equation below we write the set over which the summation runs as \(\{S: \ c(P,S) \le c\}\) instead of \(\{S: |S |= N \text{ and } c(P,S) \le c\}\), that is, the cardinality of the set S is omitted; a similar shorthand applies to the case \(c(P,S) > c\).

References

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716–723.
Bailer-Jones, D. (2009). Scientific models in philosophy of science. University of Pittsburgh Press.
Birkes, D., & Dodge, Y. (1993). Alternative methods of regression. Wiley.
Calafiore, G., & Campi, M. (2005). Uncertain convex programs: Randomized solutions and confidence levels. Mathematical Programming, 102(1), 25–46.
Campi, M., & Garatti, S. (2018). Introduction to the scenario approach. In MOS-SIAM series on optimization.
Campi, M., Garatti, S., & Ramponi, F. (2018). A general scenario theory for nonconvex optimization and decision making. IEEE Transactions on Automatic Control, 63, 4067–4078.
Cartwright, N. (1989). Nature’s capacities and their measurement. Oxford University Press.
Chaitin, G. (1966). On the length of programs for computing binary sequences. Journal of the ACM, 13, 547–569.
Contessa, G. (2007). Scientific representation, interpretation, and surrogative reasoning. Philosophy of Science, 74, 48–68.
Da Costa, N., & French, S. (2000). Models, theories, and structures: Thirty years on. Philosophy of Science, 67, 116–127.
de Finetti, B. (1989). Probabilism: A critical essay on the theory of probability and on the value of science. Erkenntnis, 31, 169–223 (translation of “Probabilismo. Saggio critico sulla teoria delle probabilità e sul valore della scienza”, Biblioteca di Filosofia, Napoli, 1931).
de Moivre, A. (1718). The doctrine of chances: Or, a method of calculating the probability of events in play. W. Pearson (reprinted 1967, New York, NY: Chelsea).
Forster, M., & Sober, E. (1994). How to tell when simple, more unified, or less ad hoc theories will provide more accurate predictions. British Journal for the Philosophy of Science, 45, 1–35.
Forster, M., & Sober, E. (2011). AIC scores as evidence: A Bayesian interpretation. In M. Forster & P. Bandyopadhyay (Eds.), Philosophy of statistics (Handbook of the philosophy of science) (Vol. 7, pp. 535–549). Elsevier.
Frigg, R., & Hartmann, S. (2012). Models in science. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy (Fall 2012 ed.).
Goodman, N. (1955). Fact, fiction, & forecast. Harvard University Press.
Harris, T. (2003). Data models and the acquisition and manipulation of data. Philosophy of Science, 70, 1508–1517.
Harter, H. (1982). Minimax method. Encyclopedia of statistical sciences (Vol. 4, pp. 514–516). Wiley.
Hughes, R. (1997). Models and representation. Philosophy of Science, 64, 325–336.
Hume, D. (2008). An enquiry concerning human understanding. Oxford World Classics (originally published in 1748).
Kolmogorov, A. (1965). Three approaches to the quantitative definition of information. Problems of Information Transmission (USSR), 1, 4–7.
Kolmogorov, A. (1968). Logical basis for information theory and probability theory. IEEE Transactions on Information Theory, 14, 662–664.
Laplace, P. (1814). Essai philosophique sur les probabilités (translated version Philosophical Essay on Probabilities, Springer, 1999).
Laymon, R. (1982). Scientific realism and the hierarchical counterfactual path from data to theory. In Proceedings of the Biennial Meeting of the Philosophy of Science Association (pp. 107–121).
Magnani, L., & Nersessian, N. (Eds.). (2002). Model-based reasoning: Science, technology, values. Kluwer.
Magnani, L., Nersessian, N., & Thagard, P. (Eds.). (1999). Model-based reasoning in scientific discovery. Kluwer.
Maki, U. (1994). Isolation, idealization and truth in economics. In B. Hamminga & N. De Marchi (Eds.), Idealization VI: Idealization in economics. Poznan studies in the philosophy of the sciences and the humanities (Vol. 38, pp. 147–168). Rodopi, Amsterdam.
Mayo, D. (1996). Error and the growth of experimental knowledge. University of Chicago Press.
McAllister, J. (1997). Phenomena and patterns in data sets. Erkenntnis, 47, 217–228.
Morgan, M., & Morrison, M. (1999). Models as mediating instruments. In M. Morgan & M. Morrison (Eds.), Models as mediators. Perspectives on natural and social science (pp. 10–37). Cambridge University Press.
Popper, K. (1963). Conjectures and refutations: The growth of scientific knowledge. Routledge & Kegan Paul.
Shiryaev, A. (1996). Probability. Springer.
Sober, E. (2008). Evidence and evolution: The logic behind the science. Cambridge University Press.
Sober, E. (2015). Ockham’s razors. Cambridge University Press.
Steel, D. (2010). What if the principle of induction is normative? Formal learning theory and Hume’s problem. International Studies in the Philosophy of Science, 24, 171–185.
Suppes, P. (1960). A comparison of the meaning and uses of models in mathematics and the empirical sciences. Synthese, 12, 287–301.
Suppes, P. (1962). Models of data. In E. Nagel, P. Suppes, & A. Tarski (Eds.), Methodology and philosophy of science: Proceedings of the 1960 international congress (pp. 252–261). Stanford University Press.
Swoyer, C. (1991). Structural representation and surrogative reasoning. Synthese, 87, 449–508.
Takeuchi, K. (1976). Distribution of informational statistics and a criterion of model fitting. Suri-Kagaku (Mathematical Sciences), 153, 12–18 (in Japanese).
van Fraassen, B. (1980). The scientific image. Oxford University Press.
Woodward, J. (1989). Data and phenomena. Synthese, 79, 393–472.

Acknowledgements

The author would like to thank Dr. Sean Kenny for providing suggestions on how to improve the presentation of this work. The author also gratefully acknowledges the valuable and constructive comments made by anonymous referees.

Funding

No funds, grants, or other support was received.

Author information

Authors and Affiliations

Center for the Study of Inductive Methods c/o Department of Information Engineering, University of Brescia, via Branze 38, 25123, Brescia, Italy
Marco C. Campi

Corresponding author

Correspondence to Marco C. Campi.

Ethics declarations

Conflict of interest

The author has no relevant financial or non-financial interests to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Campi, M.C. Inductive knowledge under dominance. Synthese 201, 184 (2023). https://doi.org/10.1007/s11229-023-04172-9

Received: 18 October 2021
Accepted: 24 April 2023
Published: 16 May 2023
DOI: https://doi.org/10.1007/s11229-023-04172-9
