1 Sets versus classes

Russell’s paradox of the class of all non-self-membered classes was first discovered in connection with Frege’s Grundgesetze (Frege 1964), where Frege sought to establish the logicist thesis that arithmetic is a branch of logic. The paradox caused (together with other paradoxes such as Cantor’s and Burali-Forti’s) what some have called the “third foundational crisis of mathematics”, which prompted the search for a firm foundation of mathematics (Fraenkel and Bar-Hillel 1958, pp. 14–15). This foundation was found in modern axiomatic set theory, which has its roots in the work of Cantor.

A number of authors (Maddy 1983; Lavine 1994) have argued that there were at least two different notions of class in the literature, and that only one of them is prone to paradox (Gödel 1983). Following Lavine (1994, p. 63), we may call them the logical and the combinatorial notion of class, respectively.

According to the logical notion, a class may be defined as the extension of a concept or predicate, or, to use Russell’s words, “as all the terms satisfying some propositional function”.Footnote 1 Such classes are associated with some kind of definition or rule that tells us in a principled way whether an object belongs to the class or not. This is the notion of class that was championed by Frege, Peano and Russell.Footnote 2 Extensions of concepts had been part of logic since antiquity; they can be found in the works of Leibniz and are explicit in the Port-Royal Logic (Bochenski 2002, pp. 302–303). It is this fact that allowed Frege, dialectically, to assume that a reduction of number theory to class theory is sufficient to establish his thesis that arithmetic is a branch of logic (Heck 2011b, p. 126).

According to the combinatorial notion, on the other hand, classes are obtained from some well-defined objects such as the natural numbers by ‘enumerating’ their members in an arbitrary way. Such classes exist independently of our ability to provide a defining condition or rule that characterizes their members. Arguably, this is the notion that was adopted by Cantor and Zermelo and that underlies our modern iterative concept of set.

The difference between the concept of class as given by a rule and the concept of class freed from such restriction was an important factor in the controversy about the axiom of choice. This axiom states that we can select one element out of each of a family of (non-empty) classes and collect them into one class. As Bernays (1983) remarks, the axiom of choice “is an immediate application of the combinatorial concepts in question.” On the logical notion of class, on the other hand, it is doubtful whether a class satisfying the requirements set out in the axiom of choice can always be found.Footnote 3

In what follows, we will call the combinatorial classes sets and the logical ones classes. The (logical) notion of class motivates what is commonly referred to as “the naïve calculus”, which consists in the naïve or unrestricted comprehension axiom scheme, which postulates the existence of a class corresponding to each predicate, and the axiom of extensionality, which states that two classes are identical if they have the same members. Of course, as Russell’s paradox shows, the naïve calculus is inconsistent. The standard approach to the class-theoretic paradoxes is to be found in the theory of types, which originated with Russell (1903a, 1908). In a nutshell, what happens here is that one abandons the idea of a general or unrestricted variable and replaces it with a series of variables differentiated as to type.

While Cantor never laid down explicitly the principles that he was working with, it has been argued that the naïve comprehension axiom scheme was not among them and that the notion of set was never subject to the paradoxes (e.g., Lavine 1994). By contrast, one can make a case that the class-theoretic paradoxes are still unsolved. For instance, Gödel says about the type-theoretic approach that “it cannot satisfy the condition of including the concept of concept which applies to itself or the universe of all classes that belong to themselves. To take such a hierarchy as the theory of concepts is an example of trying to eliminate the intensional paradoxes in an arbitrary manner.” (Wang 1996, p. 278) The aim of this paper is to provide reasons for developing a type-free theory of classes and to indicate one way in which this might be done.

2 The function of class talk

In a series of papers, Parsons (1982, 1983a, b) has argued that the introduction of the notion of class answers a general need to generalize on predicate places in our language (where ‘predicate’ means formula with one free variable). For example, consider the usual (first-order) principle of mathematical induction. This consists in all sentences of the form

$$\varphi (0)\wedge \forall n\,(\varphi (n)\rightarrow \varphi (n+1))\rightarrow \forall n\,\varphi (n)$$

where \(\varphi (x)\) is a predicate applying to numbers. The introduction of class terms, governed by the comprehension axiom scheme, allows us to replace the expression \(\varphi (t)\) with the materially equivalent \(t\in \{ u\mid \varphi (u)\}\), where \(\{ u\mid \varphi (u)\}\) occupies an object position and is therefore open to (objectual) quantification. Hence, the notion of class allows us to finitely axiomatize the induction schema by the single statement

$$\forall y\,(0\in y\wedge \forall n\,(n\in y\rightarrow n+1\in y)\rightarrow \forall n\, n\in y)$$

Of course, in mathematical contexts the demand for generalising predicate places is met to a considerable extent by sets. But the notion of class allows us in addition to generalize every predicate in the language of set theory. This cannot be done by sets themselves because some predicates of the language of set theory, such as ‘x is an ordinal’, have extensions that are “too big to be sets”. Examples of the use of classes in set theory include the formulation of certain schemata as single statements, such as the axiom schemes of separation and replacement, or reflection principles.Footnote 4 There are many other uses as well that are often eliminable but seem heuristically indispensable, for example, in connection with the study of elementary embeddings of the universe of sets into some inner model of ZFC (see Uzquiano 2003, section 2).

The function of class talk brings the notion of class into proximity with the notions of truth and truth-of (satisfaction). This was stressed by Parsons: just as the notion of class answers a need to generalize predicate places, so does the notion of truth answer a need to generalize sentence places (cf. Quine 1970). Moreover, Parsons (1983b) observes that the notion of satisfaction can be seen as a means to generalize predicate places as well, and that the usual predicative theories of satisfaction and classes are mutually interpretable. In Schindler (2015, 2017) it is shown that even impredicative theories of classes can be interpreted in (type-free) theories of satisfaction. The similar functions of the notions of truth and class, together with the interpretability results just mentioned, suggest that someone who already has a broadly deflationary understanding of the notions of truth and satisfaction should probably have a deflationary understanding of the notion of class as well.

While I find the idea of a deflationary account of classes intriguing, it is rather tangential to our present purposes and I won’t pursue it any further here. But let me make the following remark. That classes were merely introduced to fulfill a particular function does not imply a nominalistic account of classes, at least if one subscribes to Quine’s criterion of ontological commitment. On the contrary, classes are introduced so that we can objectually quantify over entities that would otherwise be in predicate position. However, a deflationary account of classes may help us argue that classes are “thin” objects in the sense of Linnebo (2012), where “thin” is taken in the sense that “very little is required for their existence”. But this is a task for another paper.

3 Reasons for postulating type-free classes

The literature is full of interesting attempts to overcome the restrictions imposed by the theory of types. For an overview, I refer the reader to Cantini (2009). There are various reasons why one may be interested in a type-free theory of classes. For instance, Feferman (1977, 1984), Muller (2001) and others are interested in a set- or class-theoretic foundation of category theory. The problem here is that there are certain categories that are very natural to think about, such as the category of all sets, the category of all groups or the category of all categories, that cannot be formed in standard set theory. [For a recent overview, see Schulman (2008).] In what follows, I will list four more reasons. My own interests are mainly with the first and last of them.

3.1 Unrestricted quantification

There are certain contexts in logic and philosophy where we intend our quantifiers to range over absolutely everything whatsoever, or at least to be unrestricted, for example when we say that everything is self-identical or that the empty set has no members. Presented with a counterexample, we would not regard it as open to the defendant to dismiss the counterexample on the ground that it is not in the domain of quantification. Not only does the possibility of unrestricted quantification seem plausible; its denial seems to border on the incoherent. If someone claims that one cannot quantify over everything, they seem to imply at the same time that there is something one cannot quantify over (Williamson 2003).

Despite this, the coherence of unrestricted quantification has been doubted. For an overview of this debate, see Rayo and Uzquiano (2006). One objection is related to a principle that was first discussed (but not endorsed) by Cartwright (1994), and is nowadays known as the

All-in-One Principle: The objects in a domain of discourse make up a set or some set-like object.

In modern semantics, for example, the domain of discourse is usually taken to be a set. However, according to standard set theory there is no universal set. This causes problems, in particular, when one tries to interpret set-theoretic talk itself. It seems natural to assume that when a set theorist talks about sets, she is (at least sometimes) talking about all sets. The proposal that we should trade in standard set theory for a theory that admits a universal set, such as Quine’s New Foundations (Quine 1980), has not been met with enthusiasm, because this theory does not seem to embody any intuitive picture of sets.

One popular defence of unrestricted quantification makes use of the theory of types. On this account, interpretations are not (first-order) objects but higher-order entities. But this defence is not unproblematic; see Sect. 4 below and Linnebo (2006, pp. 154–156). Therefore, one may think it is preferable to treat the domain of quantification as a (first-order) object. As Linnebo points out, there is no reason to assume that this object needs to be a set. Hence, one possible solution to the problem of unrestricted quantification consists in replacing or supplementing set theory by a theory of classes that allows for a universal class. This proposal is not unproblematic itself, because theories with a universal class are incompatible with the axiom of separation, which seems necessary for semantics. I believe, however, that this problem can be dealt with and will return to it in Sect. 6.

3.2 Reduction of properties

Another area where classes might be useful is metaphysics: one might try to reduce properties or universals to classes. An influential account of this sort was given by Lewis (1986, chap. 1.5). However, there are good reasons, mainly in connection with the semantics of natural language (see below), to assume that properties need to be type-free.Footnote 5

Another motive for self-membered properties was suggested by Allen (2016, pp. 28–31). One classical problem confronting property theory is Bradley’s Regress argument. This argument can be described as follows. Assume that a instantiates the universal or property F. This relation of instantiation is itself a universal, say \(I_1\). Now, one might ask what connects a, F and \(I_1\)? This will be another instantiation relation, \(I_2\). But then we may ask what connects \(a, F, I_1\) and \(I_2\)? This will be another instantiation relation, \(I_3\). And so on. Whether this regress is vicious or not is a hotly debated topic.

Whatever the outcome, one might try to simplify the hierarchy of instantiation relations required by the regress. There are at least two options: one could treat \(I_1, I_2, I_3, \ldots\) as instances of a single multigrade relation \(I'\) (where a relation is multigrade if the number of entities it relates can vary); or, one could treat \(I_1, I_2, I_3, \ldots\) as so-called inexactly resembling instances of a single instantiation relation \(I^{*}\) (where instances of a relation inexactly resemble each other if the resemblance is not exact similarity). Either way, \(I'\) and \(I^{*}\) need to be able to self-instantiate.

3.3 Natural language semantics

Classes (properties, concepts) have been applied in the analysis of natural language semantics (Montague 1974). However, there are many intuitively valid inferences that cannot be reconstructed in a typed framework due to the lack of self-exemplifying properties; this has motivated quite some research into type-free theories of properties (Bealer 1982; Menzel 1986, 1993; Orilia 1999; Chierchia and Turner 1988). For example, consider the inference from

  1. Everything has the property of being self-identical

to

  2. Socrates has the property of being self-identical

and the inference from (1) to

  3. The property of being red has the property of being self-identical

The intuitive soundness of both inferences requires not only the existence of the property of being red but also that the quantifier in (1) ranges over both Socrates and the property in question. Hence, this inference cannot be captured in a typed language.

3.4 Reduction of mathematics

Last, but not least, one might be interested in a theory of classes (properties) for the very same reason for which Frege and Russell were originally drawn to it, namely, the “reduction of mathematics to logic”. It has often been claimed that logicism is dead, but several reformed versions of logicism have emerged in recent decades. One should mention here, on the one hand, the works of Bealer (1982), Cocchiarella (1986), Landini (2004), and Orilia (1991), which are based on type-free theories of properties, and, on the other hand, the works of the Neo-Fregean school, which are based on abstraction principles. For a technical overview of the latter, see Burgess (2005). For philosophical discussion, see Hale and Wright (2001), Heck (2011a), and Cook (2009). The Neo-Fregean project originated with the discovery of Frege’s Theorem, namely, that the second-order Dedekind–Peano axioms for arithmetic can be derived, in second-order logic, from what is known as “Hume’s Principle”, namely

$$\forall F\forall G(\# F=\# G\leftrightarrow Eq(F, G))$$

This principle states that the number of Fs is identical to the number of Gs if and only if the Fs and Gs are equinumerous (i.e., can be put in a one-one correspondence).
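For readers who find a concrete case helpful, the following sketch checks Hume's Principle on a handful of finite extensions. It is only an illustration under strong assumptions: concepts are modelled as finite Python sets, `len` stands in for the number operator \(\#\), and the sample concepts and the names `equinumerous` and `number_of` are invented for the example. Equinumerosity is tested by pairing elements off one at a time, without counting.

```python
def equinumerous(F, G):
    """Pair off elements one at a time; succeed iff both collections run out together."""
    F, G = set(F), set(G)  # copies, so the callers' sets are not mutated
    pairing = {}
    while F and G:
        pairing[F.pop()] = G.pop()
    return (not F and not G), pairing


def number_of(F):
    """Stand-in for the number operator # (assumption: finite extensions only)."""
    return len(set(F))


# Hypothetical sample concepts, given by their (finite) extensions.
concepts = {
    "vowel": {"a", "e", "i", "o", "u"},
    "finger": {"thumb", "index", "middle", "ring", "little"},
    "season": {"spring", "summer", "autumn", "winter"},
    "empty": set(),
}

# Hume's Principle, checked for every pair of the sample concepts:
# #F = #G if and only if F and G can be put in one-one correspondence.
hume_holds = all(
    (number_of(F) == number_of(G)) == equinumerous(F, G)[0]
    for F in concepts.values()
    for G in concepts.values()
)
```

The check says nothing about infinite concepts, where the principle does its real work; it merely makes the two sides of the biconditional tangible.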

Now, one might be sceptical about the analyticity of Hume’s Principle or whether class theory should be counted as part of logic. But such reductions may still be seen as answering to Frege’s question: How are numbers given to us? The problem of epistemic access to abstract objects has been emphasized by Benacerraf (1983). How can we have knowledge of abstract objects, such as numbers, when we have no causal interactions with such objects? Wright’s idea is that an agent who is capable of second-order reasoning but has no knowledge of number theory could stumble upon Hume’s Principle, say, in a dream and decide to use terms of the form \(\#F\) in accordance with it. Then the claim is that the agent thereby acquires a concept of number without significant epistemological presupposition.

Similarly, one may claim that the concept of class (property) is acquired without significant epistemological presupposition. We “nominalize” predicates in order to generalize predicate places, and that’s all there is to class (property) talk. However, if we want to reduce mathematics to a theory of classes, then type-free classes are called for, because we need to initiate a boot-strapping process in order to generate enough objects that can serve as proxies for mathematical objects.

4 Ranges of significance

The purpose of the present section is to motivate a novel approach to the paradoxes that is loosely based on some remarks that Gödel made in (Gödel 1983) about Russell’s theory of types. Recall that a propositional function is a function that yields a proposition when given an argument. According to Russell’s theory, every propositional function \(\varphi (x)\) has “in addition to its range of truth a range of significance, i.e., a range within which x must lie if \(\varphi (x)\) is to be a proposition at all, whether true or false” (Russell 1903b, p. 523). More generally, the range of significance of a function is the collection of arguments for which said function is defined (i.e., has a value), and the range of significance of a propositional function is the collection of arguments for which the function yields a proposition. The idea of a range of significance need not be tied to the notion of propositional function. Gödel applies it to concepts,Footnote 6 but of course one can also apply it to predicates.

There are several ways in which the notion of a range of significance can be interpreted on a pre-theoretical level. The literature on philosophy of language provides many examples of grammatically well-formed sentences that, for some reason or other, do not express a proposition or lack a definite truth value. Many of these examples may be taken as instances of an object’s being a singular point of the relevant predicate or propositional function. For example, one may think that in the case of a category mistake (e.g., “The number 2 is green”), the object denoted by the name lies outside the range of significance of the predicate. Of course, one may simply treat such a sentence as false and its negation as true (perhaps for reasons of technical simplicity, e.g., in order to stay classical). On a more narrow understanding, one may think that in all and only those cases where the application of a predicate to a name yields a paradoxical sentence (e.g., “This sentence is false”), the object denoted by the name lies outside the range of significance of the predicate.

As Gödel remarks, the idea that every propositional function has a range of significance that need not exhaust the entire universe “brings in a new idea for the solution of the paradoxes, especially suited to their intensional form”, which “consists in blaming the paradoxes not on the axiom that every propositional function defines a concept or class, but on the assumption that every concept gives a meaningful proposition, if asserted for any arbitrary object or objects as arguments” (1983, p. 466). He adds that “[t]he obvious objection that every concept can be extended to all arguments, by defining another one which gives a false proposition whenever the original one was meaningless, can easily be dealt with by pointing out that the concept ‘meaningfully applicable’ need not itself be always meaningfully applicable” (otherwise Grelling’s paradox would ensue).

For reasons that I do not want to enter here, Russell thought that ranges of significance form types such that whenever a propositional function is significant for some argument x, and y belongs to the same type as x, then that function is significant for the argument y as well. This means that

  1. whenever a propositional function is significant for some argument x, its range of significance is identical with the type of x;

  2. sameness of type is an equivalence relation and, therefore, types are mutually exclusive; and

  3. if two functions are both significant for some argument x, then they must have exactly the same range of significance.

The types are then divided into orders (yielding the ramified theory of types), but this further complication need not interest us here. Unfortunately, the theory of types suffers from expressive limitations that have often been pointed out in the literature. For example, Gödel remarks that “[w]hat makes the above principle particularly suspect, however, is that its very assumption makes its formulation as a meaningful proposition impossible, because x and y must then be confined to definite ranges of significance which are either the same or different, and in both cases the statement does not express the principle or even part of it.” (Gödel 1983, p. 466)

It should be observed that Russell’s idea that every propositional function has a range of significance is logically independent of the assumption that the ranges of significance form types. One might therefore consider the possibility of construing classes based on the first but without the second assumption. In the remainder of this paper, I wish to develop the theory of classes in this direction. This approach is inspired by Gödel’s remark that:

It is not impossible that the idea of limited ranges of significance could be carried out without the above restrictive principle [i.e. that the ranges of significance form types]. It might even turn out that it is possible to assume every concept to be significant everywhere except for certain “singular points” or “limiting points”, so that the paradoxes appear as something analogous to dividing by zero. Such a system would be most satisfactory in the following respect: our logical intuitions would then remain correct up to certain minor corrections, i.e., they could then be considered to give an essentially correct, only somewhat ‘blurred’ picture of the real state of affairs. Unfortunately the attempts made in this direction have failed so far; on the other hand, the impossibility of this scheme has not been proved either, in spite of the strong inconsistency results of Kleene and Rosser. (Gödel 1983, pp. 466–467)

The following general picture emerges. Let U be the universe of all objects, and \(\varphi (x)\) be some propositional function. \(\varphi (x)\) has a range of significance, \(R(\varphi )\), which is a subset of U. If \(\varphi (x)\) has singular points, then \(R(\varphi )\) is a proper subset of U. For every object a in \(R(\varphi )\), \(\varphi (a)\) is meaningful—that is, true or false. \(\varphi (x)\) thereby determines two classes, the extension \(\{a\in R(\varphi )\mid \varphi (a)\}\) and anti-extension \(\{a\in R(\varphi )\mid \lnot \varphi (a)\}\) of \(\varphi (x)\), whose union coincides with \(R(\varphi )\), that is,

$$\{a\in R(\varphi )\mid \varphi (a)\}\cup \{a\in R(\varphi )\mid \lnot \varphi (a)\}=R(\varphi )$$
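As a rough illustration of this picture on a finite universe, the following sketch models a propositional function as a partial Boolean function that returns `None` at its singular points, and then computes the range of significance, extension and anti-extension. The universe `U`, the predicate `is_even` and the helper names are assumptions made for the example only; they are not part of the theory.

```python
# Toy model: a propositional function is a partial map from U to {True, False};
# None marks a singular point (an argument outside the range of significance).
U = [0, 1, 2, 3, "red", "Socrates"]


def is_even(a):
    """Hypothetical predicate: significant only for integers."""
    return a % 2 == 0 if isinstance(a, int) else None


def range_of_significance(phi, universe):
    return {a for a in universe if phi(a) is not None}


def extension(phi, universe):
    return {a for a in universe if phi(a) is True}


def anti_extension(phi, universe):
    return {a for a in universe if phi(a) is False}


R_phi = range_of_significance(is_even, U)
ext = extension(is_even, U)
anti = anti_extension(is_even, U)

# The extension and anti-extension are disjoint and jointly exhaust R(phi),
# which here is a proper subset of U because is_even has singular points.
exhaustive = (ext | anti) == R_phi and not (ext & anti)
```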

Gödel mentions Church’s (inconsistent) system (Church 1932) as an interesting attempt to carry out these ideas. Another possibility is to use some non-classical logic, such as the Weak or Strong Kleene logics. This route faces the notorious problem that the material conditional is not well behaved in these logics. One might therefore consider the following alternative route, which retains classical logic.

Again, let U be the universe of all objects, and \(\varphi (x)\) be some propositional function. As before, \(\varphi (x)\) has a range of significance, \(R(\varphi )\), which is a subset of U. If \(\varphi (x)\) has singular points, then \(R(\varphi )\) is a proper subset of U. This time, however, we treat \(\varphi (a)\) as meaningful (i.e., true or false) for every object a in U. As before, \(\varphi (x)\) determines two classes, the extension \(\{a\in R(\varphi )\mid \varphi (a)\}\) and anti-extension \(\{a\in R(\varphi )\mid \lnot \varphi (a)\}\) of \(\varphi (x)\), whose union coincides with \(R(\varphi )\). The difference to the previous picture is that the classes \(\{a\in R(\varphi )\mid \varphi (a)\}\) and \(\{a\in R(\varphi )\mid \lnot \varphi (a)\}\) may “underspill”: if a is an object outside the range of \(\varphi (x)\), then either \(\varphi (a)\) or \(\lnot \varphi (a)\) will be true; but since a is a singularity of \(\varphi (x)\), it is neither an element of \(\{a\in R(\varphi )\mid \varphi (a)\}\) nor of \(\{a\in R(\varphi )\mid \lnot \varphi (a)\}\).
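The classical variant can be sketched in the same toy setting. Here the predicate is total (true or false of every object), the range of significance is assigned separately, and the two classes are cut down to that range. Again, `U`, `is_even` and the choice of range are hypothetical; the point is only to make the “underspill” visible.

```python
U = [0, 1, 2, 3, "red", "Socrates"]


def is_even(a):
    """Hypothetical total predicate: true or false of every object in U."""
    return isinstance(a, int) and a % 2 == 0


# The range of significance is stipulated separately (here: the integers in U).
R_phi = {a for a in U if isinstance(a, int)}

ext = {a for a in R_phi if is_even(a)}
anti = {a for a in R_phi if not is_even(a)}

# Objects of which "not is_even" is true but which belong to neither class:
# these are the singular points at which the classes underspill.
underspill = {a for a in U if not is_even(a)} - anti
```

In this sketch the negated predicate is true of "red" and "Socrates", yet neither object is a member of the anti-extension, because both lie outside the stipulated range.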

It is the latter route that will be followed in the remainder of this paper.Footnote 7 For technical convenience, I will modify the picture above in two ways. First, I will treat classes as extensions of predicates (i.e., formulas with one free variable) rather than propositional functions (or concepts) because I am not aware of any suitable theory of propositional functions (concepts). Second, instead of assigning ranges of significance to predicates, I will directly assign them to classes. This saves us the trouble of introducing names for predicates and function symbols for syntactic operations on predicates. From a technical point of view, this does not seem to make too much of a difference, because to every predicate \(\varphi (x)\) there corresponds a unique class abstract \(\{x\mid \varphi (x)\}\). The class abstract can therefore serve as some form of Gödel code for the predicate. In the informal presentation, I will nevertheless talk frequently (as a form of shorthand) of the range of significance of \(\varphi\) instead of that of \(\{x\mid \varphi (x)\}\).

5 A type-free theory of classes

The language of the theory that we are going to present is an ordinary one-sorted first-order language with identity. It contains a binary relation symbol, \(\in\), for membership in a class. One of the expressive limitations of the theory of types is that it cannot express that some object is not in the range of significance of some propositional function (or predicate). In order not to fall prey to the same objection, we will introduce a primitive binary relation symbol, R, into our language. We may read xRy as “x is in the range of significance of y” or “x is not a singular point (singularity) of y”. Let total(x) abbreviate the formula \(\forall z\, zRx\). If x is total, then x has an unrestricted range of significance (i.e., has no singular points). According to the theory that we are going to present, every predicate determines a class. We will therefore assume that our language contains a class term \(\{u\mid \varphi \}\) for every formula \(\varphi\) containing the free variable u. Since we are aiming at a type-free system, \(\varphi\) is allowed to contain \(\in , R\) and other class terms. Moreover, it may contain other free variables as parameters.

A remark on notation. I will use \(\varphi , \psi\) for well-formed formulas, u, v, x, y, z for variables, and s, t for arbitrary terms. Some special symbols will be introduced as we go along. \(\varphi (t/x)\) denotes the result of replacing all free occurrences of x in \(\varphi\) with t. Instead of \(\lnot \, s\in t\) we will also write \(s\notin t\). The usual conventions for the use of brackets apply.

The axioms can be divided into three groups. The first group consists of ‘conceptual’ axioms that describe the general relation between a class and its range of significance. These axioms are directly suggested by the picture provided in the previous section (i.e., that every predicate, together with its range of significance, determines an extension and anti-extension in the indicated way). The second group of axioms describes the relation between the range of significance of a predicate and the logical form of that predicate. They are based on the natural assumption that classes/ranges of significance should be closed under the algebraic operations corresponding to the logical operations on predicates. The third group contains only one axiom expressing the widespread idea that the paradoxes are to be blamed on some form of circularity or non-well-foundedness, a view that goes back at least to the days of Russell. These axioms belong to the pure theory of classes, i.e., the part that deals with classes of classes; at the end of this section, we will discuss an axiom for the applied theory of classes, i.e., classes of individuals or urelements.

Our first and most basic axiom scheme is a relativized form of naïve comprehension and follows immediately from the picture presented in the previous section. The axiom states that whenever x is in the range of significance of the predicate \(\varphi\) (or equivalently: whenever x is not a singularity of \(\varphi\)), then x is an element of the class \(\{ u\mid \varphi \}\) if and only if \(\varphi (x/u)\) holds. That is:

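$$\forall x\,\big (xR\{u\mid \varphi \}\rightarrow (x\in \{u\mid \varphi \}\leftrightarrow \varphi (x/u))\big )$$
(Class Comprehension)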

Notice that \(\varphi\) may contain free variables besides u. These should be bound by universal quantifiers. A similar remark applies to the other axioms below.

The axiom scheme is completely general and topic-neutral. We can insert any formula in place of \(\varphi\), including the predicates \(u=u, u\notin u\) and uRu.

It is easily seen that the Axiom of Class Comprehension is consistent: since it is a universally quantified conditional, we can make it vacuously true (for instance, by interpreting R as the empty relation). In this framework, Russell’s paradox is simply transformed into the theorem that the Russell class \(r:=\{u\mid u\notin u\}\) does not lie in its own range of significance. Carrying out the usual reasoning, we convince ourselves that

$$rR r\rightarrow (r\in r\leftrightarrow r\notin r)$$

from which we simply conclude that \(\lnot \, rRr\). No contradiction ensues.
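The point can also be checked by brute force on a very small domain. The sketch below enumerates every interpretation of \(\in\) and R on a two-element domain and keeps those satisfying the single comprehension instance for \(u\notin u\), with one element playing the role of r. The domain `D`, the choice `r = 0` and the function names are assumptions of the example, and the check covers only this one instance, not the full theory.

```python
from itertools import product

D = [0, 1]  # hypothetical two-element domain
r = 0       # the element playing the role of the Russell class
pairs = [(x, y) for x in D for y in D]


def relations():
    """Enumerate all binary relations on D as sets of ordered pairs."""
    for bits in product([False, True], repeat=len(pairs)):
        yield {p for p, b in zip(pairs, bits) if b}


def comprehension_for_r(mem, R):
    """For all x: if x R r, then (x in r) iff (x not in x)."""
    return all(
        ((x, r) in mem) == ((x, x) not in mem)
        for x in D
        if (x, r) in R
    )


models = [
    (mem, R)
    for mem in relations()
    for R in relations()
    if comprehension_for_r(mem, R)
]

# In every surviving model, r is a singular point of itself.
r_always_singular = all((r, r) not in R for _, R in models)

# With R total (naive comprehension), no interpretation survives.
naive_models = [m for m in models if m[1] == set(pairs)]
```

Models exist (so the instance is satisfiable), every one of them makes \(\lnot \, rRr\) true, and none of them interprets R as the total relation.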

Our second axiom, which also follows from the picture provided in the previous section, states that if x is a singular point of y, then x is not an element of y:

$$\forall x\,\forall y\,(\lnot \, xRy\rightarrow x\notin y)$$
(Singularity)

In conjunction with the Axiom of Class Comprehension, the Axiom of Singularity implies:

$$\forall x\, (x\in \{ u\mid \varphi \}\rightarrow \varphi (x/u))$$
(Out)

This is a very useful theorem. If we know that x is an element of the class y, then we can deduce that x satisfies the defining condition of y. Moreover, this theorem rules out that some classes “overspill”: it is not possible that the class \(\{u\mid \varphi \}\) contains some objects that are not \(\varphi\)s.
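For readers who want the step spelled out, (Out) can be derived as follows. This is a sketch that assumes the two axioms in the form suggested by their informal statements: Class Comprehension as \(xR\{u\mid \varphi \}\rightarrow (x\in \{u\mid \varphi \}\leftrightarrow \varphi (x/u))\) and Singularity as \(\lnot \, xRy\rightarrow x\notin y\).

$$\begin{aligned} (1)\quad & x\in \{u\mid \varphi \} && \text{assumption}\\ (2)\quad & xR\{u\mid \varphi \} && \text{from (1) by the contrapositive of Singularity}\\ (3)\quad & x\in \{u\mid \varphi \}\leftrightarrow \varphi (x/u) && \text{from (2) by Class Comprehension}\\ (4)\quad & \varphi (x/u) && \text{from (1) and (3)} \end{aligned}$$

Discharging the assumption and generalizing over x gives (Out). Note that the converse direction is not available in general, since \(\varphi (x/u)\) may hold at a singular point of the class.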

We adopt the following version of extensionality, according to which two classes are identical if they have the same range of significance and the same members. (The other direction follows from the logical laws of identity.)

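$$\forall x\,\forall y\,\big (R(x)=R(y)\wedge \forall z\,(z\in x\leftrightarrow z\in y)\rightarrow x=y\big )$$
(Extensionality)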

Here, \(R(x)=R(y)\) is shorthand for \(\forall z\,(zRx\leftrightarrow zRy)\). The reason for imposing this condition is as follows. Assume it is possible to define a class w such that \(w=\{u\mid u\notin u\wedge u=w\}\). (Such self-referential classes cannot be defined in the present formalism, but one may muse about extensions of the system in which this is possible.) It is easy to prove, using the first two axioms, that \(w\notin w\). Hence, w has no members. Now assume that the class \(\varnothing :=\{u\mid u\ne u\}\) has an unrestricted range of significance. (This will actually follow from our other axioms.) Hence, if we identified classes with the same members, we would get that \(\varnothing =w\). But then w would have an unrestricted range of significance as well, which is impossible: since \(w\notin w\), Class Comprehension would yield \(w\in w\) if w were in its own range. (It should be noted that, as things stand, the ordinary axiom of extensionality is consistent with our theory as well.)
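The claims about w can be checked step by step. The following is a sketch under the hypothetical assumption that \(w=\{u\mid u\notin u\wedge u=w\}\) is available; it uses (Out), which follows from the first two axioms, and the instance of Class Comprehension for \(x:=w\).

$$\begin{aligned} (1)\quad & w\in w\rightarrow (w\notin w\wedge w=w) && \text{(Out) with } x:=w\\ (2)\quad & w\notin w && \text{from (1): } w\in w \text{ would imply } w\notin w\\ (3)\quad & x\in w\rightarrow x=w && \text{(Out), second conjunct}\\ (4)\quad & \forall x\, x\notin w && \text{from (2) and (3)}\\ (5)\quad & wRw\rightarrow (w\in w\leftrightarrow (w\notin w\wedge w=w)) && \text{Class Comprehension with } x:=w\\ (6)\quad & \lnot \, wRw && \text{from (2) and (5): otherwise } w\in w \end{aligned}$$

So w is memberless but not total, whereas \(\varnothing\) is assumed to be total; identifying the two on the basis of their members alone would therefore be contradictory.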

The Axiom of Extensionality (in either form) will not be used in any of the theorems below. The reason to include it, apart from conceptual considerations, is merely to highlight that it can be included without leading to triviality. This seems noteworthy because there are well-known problems for adding axioms of extensionality to non-classical logics that contain naïve comprehension (Field 2008, pp. 296–298).

It is perhaps interesting to remark that, given the first three axioms, we can characterize classes with the following abstraction principle (scheme), which states that the class of \(\varphi\)s is identical to the class of \(\psi\)s if and only if \(\varphi\) and \(\psi\) have the same range of significance and are satisfied by exactly the same objects:

$$\{u\mid \varphi \}=\{u\mid \psi \}\leftrightarrow \big (R\{u\mid \varphi \}=R\{u\mid \psi \}\wedge \forall x(\varphi (x)\leftrightarrow \psi (x))\big )$$

The above abstraction principle is a theorem of our theory. In contrast to ordinary abstraction principles, in the above scheme the class terms appear also on the right-hand side of the biconditional. Of course, this is a side-effect of my decision to use classes, instead of predicates, as the second relatum of the R relation. If predicates were used instead, the abstraction terms would only occur on the left-hand side of the abstraction principle.

Our next group of axioms deals with the relation between the range of significance of a predicate and the logical form of that predicate. They are based on the natural assumption that classes/ranges of significance should be closed under the algebraic operations corresponding to the logical operations on predicates. For example, if the number 2 lies in the range of significance of “is green”, then it should lie within the range of “is not green” as well; if Aristotle lies in the range of significance of the predicates “is Greek” and “is a philosopher”, then Aristotle should also lie within the range of “is a Greek philosopher”.

figure d

We will adopt similar axioms for atomic predicates. For example, consider the atomic predicate \(u\in t\), where t denotes some class. Of what objects should we say that they lie in the range of significance of \(u\in t\)? Intuitively, t simply is \(\{u\mid u\in t\}\). Hence, the following seems natural: if x is an object that already lies in the range of significance of t, then x lies in the range of \(u\in t\) as well.Footnote 8

figure e

The axioms introduced so far are compatible with every predicate having an empty range of significance. (Note that all of them are universally quantified conditionals.) They are therefore trivially consistent. In order to get our theory off the ground, we need some axioms that ensure that some predicates have non-empty ranges. The following axiom stipulates that the (class determined by the) predicate \(u=u\) has an unrestricted range of significance. Recall that total(x) abbreviates the formula \(\forall z\, zRx\).

figure f

The reason for postulating this axiom is clear, given our motives. We want to design a theory in which models with a universal domain are available. Instead of adopting the Axiom of Self-Identity, we could stipulate that the empty class \(\varnothing :=\{u\mid u\ne u\}\) is total. Given that ranges of significance are preserved under negation, it does not matter which one we choose. The totality of one follows from the totality of the other.

I find the Axiom of Self-Identity fairly innocuous. First, the predicate \(x=x\) (just as any other tautological predicate) is stable (i.e., its extension is fixed on every interpretation of the non-logical primitives). Second, the predicate \(x=x\) does not contain the membership symbol \(\in\), and should therefore be admissible. One might compare this line of argument to how the T-schema is restricted in formal theories of truth. The sentences without the truth predicate are always assumed to be admissible instances of the T-schema.

Before presenting the last axiom of the pure theory of classes, let me mention some straightforward consequences of the axioms introduced so far. I hope this will help the reader to get a better feeling for the theory.

First, observe that, as desired, the universal class \(V:=\{u\mid u=u\}\) contains every class, including itself and the Russell class r. For by the Axiom of Class Comprehension, we have

$$xRV\rightarrow (x\in V\leftrightarrow x=x)$$

By the Axiom of Self-Identity, we know that xRV for every x. Hence, \(x\in V\) for every x.

Second, since the Russell class r is contained in the universal class but not vice versa (as the reader may easily verify), we can conclude (by usual laws of identity) that \(V\ne r\). This is in stark contrast to the traditional theories of “proper classes” (i.e., theories of the Morse-Kelley variety), which do not distinguish the two.
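The verification left to the reader might run as follows. The sketch uses only (Out) and the result \(\forall x\, x\in V\) established above, with \(r=\{u\mid u\notin u\}\).

$$\begin{aligned}
& V\in V && \text{(every class is a member of } V\text{)}\\
& V\in r\rightarrow V\notin V && \text{(Out, with } \varphi := u\notin u\text{)}\\
& V\notin r && \text{(from the two lines above)}
\end{aligned}$$

So V is a member of V but not of r. If V and r were identical, \(V\in V\) would yield \(V\in r\) by substitution of identicals; hence \(V\ne r\).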

Next, we notice that total classes are closed under the following operations. Assume that \(t_1, \ldots , t_n\) are total. Then:

  1. \(\{t_1, \ldots , t_n\} :=\{u\mid u=t_1\vee \ldots \vee u=t_n\}\) is total.
  2. \((t_1, t_2):=\{\{t_1\}, \{t_1, t_2\}\}\) is total.
  3. \(t_1\cup t_2 :=\{u\mid u\in t_1\vee u\in t_2\}\) is total.
  4. \(t_1\cap t_2 :=\{u\mid u\in t_1\wedge u\in t_2\}\) is total.
  5. \(t_1\setminus t_2 :=\{u\mid u\in t_1\wedge u\notin t_2\}\) is total.
  6. \(\overline{t}_1 :=\{u\mid u\notin t_1\}\) is total.
  7. \(S(t_1) := t_1\cup \{t_1\}\) is total.
  8. \(t_1\cup \{t_2\}\) is total.

For illustrative purposes, I show (3). The other items are proved in a similar fashion. By the Axiom of Membership and the totality of \(t_1\), \(\forall x (xR\{u\mid u\in t_1\})\). Similarly, we have \(\forall x (xR\{u\mid u\in t_2\})\). Therefore, by the Axiom of Connectives, we have

$$\forall x\, (xR\{u\mid u\in t_1\vee u\in t_2\}),$$

which means that \(t_1\cup t_2\) is total.
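Some items need no fresh appeal to the axioms once others are in place. As a sketch, item (7) can be obtained from items (1) and (3), assuming \(t_1\) is total:

$$\begin{aligned}
& total(\{t_1\}) && \text{(item (1) with } n=1\text{)}\\
& total(t_1\cup \{t_1\}) && \text{(item (3) applied to } t_1 \text{ and } \{t_1\}\text{)}
\end{aligned}$$

The second line is the totality of \(S(t_1)\). Item (8) follows in the same way, with \(\{t_2\}\) in place of \(\{t_1\}\).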

Using item (7), we can successively generate the finite ordinals in the usual von Neumann style. That is, we let \(0:=\varnothing\), \(1:=\{0\}\), \(2:=\{0, 1\}\), \(3:=\{0,1,2\}\) and so on. It is easily seen that all these classes are total. However, we are not yet able to collect these classes into one.

By item (8), it follows that total classes are closed under adjunction. This means that our theory relatively interprets adjunctive set theory (=existence of empty set plus closure under adjunction), which in turn interprets Robinson arithmetic (roughly, Peano arithmetic without the induction scheme).Footnote 9

It should also be noted that our theory interprets the system known as \({\mathsf {NF}}_2\), whose axioms are extensionality, existence of the empty set, and closure under complements, intersections, and singletons (Forster 2001).

This brings us to the most interesting axiom. It expresses the widespread idea that paradoxes are to be blamed on some form of circularity or non-well-foundedness. More precisely, the axiom states that if x is a singularity of some class, then x itself has singular points:

figure g

For a typical example, consider the Russell class. The Russell class has a singular point (namely, the Russell class itself), and that singular point has a singular point itself (namely, the Russell class). And similarly for the Burali-Forti paradox: the class of all ordinals is a singular point of (the class defined by) the predicate ‘x is an ordinal’. We need not assume that all paradoxes stem from such a simple type of circularity. Perhaps there are classes x and y such that x is a singular point of y and vice versa. We may also imagine an infinitely descending chain of classes \(x_1, x_2, x_3, \ldots\) such that every class in that sequence is a singular point of its immediate predecessor.Footnote 10

The Axiom of Circularity is logically equivalent to the claim

$$\forall x\, (\forall z\, zRx\rightarrow \forall y\, xRy)$$

In words: whenever x is total, then x lies in the range of significance of every class (i.e., x is not a singularity of any class). Since there are total classes (in fact, infinitely many: our proxies for the natural numbers \(0, 1, 2, \ldots\) are all total), this means that no predicate has an empty range of significance (in fact, every predicate has infinitely many objects in its range). Thus I believe that the Axiom of Circularity actually captures to some extent Gödel’s idea that we may assume every predicate to be significant for most arguments.
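The equivalence is an instance of contraposition in classical logic. The sketch below assumes that the informal statement of the axiom is formalized by reading “x is a singularity of y” as \(\lnot xRy\), so that the axiom becomes the first line; this transcription is an assumption of the sketch.

$$\begin{aligned}
&\forall x\,(\exists y\,\lnot xRy\rightarrow \exists z\,\lnot zRx) && \text{(Axiom of Circularity, assumed form)}\\
\Leftrightarrow\;&\forall x\,(\lnot \exists z\,\lnot zRx\rightarrow \lnot \exists y\,\lnot xRy) && \text{(contraposition)}\\
\Leftrightarrow\;&\forall x\,(\forall z\, zRx\rightarrow \forall y\, xRy) && \text{(quantifier duality)}
\end{aligned}$$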

The Axiom of Circularity is quite remarkable. It justifies impredicative class formation in the sense that it entitles us (in conjunction with the first two axioms) to form classes of total classes at will. For every predicate \(\varphi\), the following is a theorem of our theory:

$$\exists y\, \forall x\, (total(x)\rightarrow (x\in y\leftrightarrow \varphi (x/u)))$$

(The usual conditions on the variables apply.) This can be seen as follows. The Axiom of Circularity implies

$$\forall x\, (total(x)\rightarrow xR\{u\mid \varphi \})$$

Thus by the Axiom of Class Comprehension we get

$$\forall x\, (total(x)\rightarrow (x\in \{ u\mid \varphi \}\leftrightarrow \varphi (x/u))),$$
(SOC)

from which the above claim follows by existential weakening.
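As an illustration, take \(\varphi\) to be the Russell predicate \(u\notin u\), so that \(\{u\mid \varphi \}\) is the Russell class r. The corresponding instance of (SOC), followed by its instantiation with r itself, reads:

$$\begin{aligned}
&\forall x\,(total(x)\rightarrow (x\in r\leftrightarrow x\notin x)) && \text{(SOC with } \varphi := u\notin u\text{)}\\
&total(r)\rightarrow (r\in r\leftrightarrow r\notin r) && \text{(instantiating } x \text{ with } r\text{)}\\
&\lnot total(r) && \text{(the consequent is contradictory)}
\end{aligned}$$

So (SOC) fixes the behaviour of r on all total classes, and the same scheme shows that r is not itself total.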

The Axiom of Circularity boosts the mathematical strength of our theory significantly. It allows us, using suitable definitions, to derive the second-order Dedekind–Peano axioms for arithmetic within our theory. Let us define

$$\omega :=\{x\mid total(x)\wedge \forall y(0\in y\wedge \forall z\in y(S(z)\in y)\rightarrow x\in y)\},$$

where 0 and S(z) are defined as above. This states that \(\omega\) is the class of total classes that are contained in every inductive class, where a class is inductive if it contains 0 and is closed under successor. This is the usual von Neumann definition of natural numbers; we have only added the condition that the natural numbers must be total.

For illustrative purposes, let us show that \(\omega\) actually contains all natural numbers (as defined above). First, we have already seen that 0 (the empty class) is total. Trivially, 0 is contained in every inductive class. Hence, 0 satisfies the defining condition for being a natural number. But then the scheme (SOC) above allows us to conclude that 0 is indeed an element of \(\omega\). Next, let us show that \(\omega\) is closed under successors. So let \(x\in \omega\). Then by (Out) we know that x satisfies the defining condition of \(\omega\). Hence x is total and contained in every inductive class. But then it trivially follows that its successor, S(x), must also be contained in every inductive class. Moreover, we have seen above that whenever x is total, so is its successor. Hence, S(x) satisfies the defining condition of being a natural number, and since it is total we can conclude that \(S(x)\in \omega\), by another application of (SOC). A complete derivation of the second-order Dedekind–Peano axioms can be found in “Appendix 1”.
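The class form of induction falls out of the definition in the same way. As a sketch, unfolding (Out) for \(\omega\) gives, for arbitrary x and y:

$$\begin{aligned}
x\in \omega &\rightarrow total(x)\wedge \forall y\,(0\in y\wedge \forall z\in y\,(S(z)\in y)\rightarrow x\in y) && \text{(Out)}\\
x\in \omega &\rightarrow (0\in y\wedge \forall z\in y\,(S(z)\in y)\rightarrow x\in y) && \text{(dropping a conjunct, instantiating } y\text{)}
\end{aligned}$$

Hence every inductive class y contains all members of \(\omega\). This covers only the induction axiom; the remaining axioms are treated in the derivation referred to above.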

The theory presented above is consistent. A proof is given in “Appendix 2”. It needs to be stressed that the axioms presented here are only basic axioms that can and should be extended by additional ones that increase the expressive (mathematical) power of the theory even more.

So far we have only considered classes of classes, that is, pure classes. One of the main motives for developing a theory of classes lies in its application to some given domain. So let us assume that our language contains additional predicates applying to objects other than classes, such as people, stones, numbers or sets, and let us introduce a distinguished predicate, U (for urelements), applying to these objects. Then we may adopt the following axiom which states that every urelement is in the range of significance of every class.Footnote 11

figure h

Now let T be some first-order theory not containing the symbols \(\in , R, U\) nor any of the abstraction terms. If T is formulated in the language of set theory, we can simply work with two copies of \(\in\). Let \(T^U\) be the theory resulting from T by relativizing all quantifiers in the axioms of T to the predicate U. Moreover, if T contains axiom schemata (such as induction or replacement), extend these so that \(\in , R, U\) and the abstraction terms are allowed to occur in the instances of the schemata. Then it is easily seen that \(T^U\), conjoined with our axioms for classes, implies that

$$\exists y\, \forall x\, (Ux\rightarrow \, (x\in y\leftrightarrow \varphi ))$$

(This follows simply from the Axioms of Urelements and Class Comprehension.) Hence, \(T^U\) together with the theory of classes interprets the second-order version of T.
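The two steps can be displayed explicitly. The sketch assumes that the Axiom of Urelements is formalized as \(\forall x\,(Ux\rightarrow \forall y\, xRy)\), which is a transcription of its informal statement rather than a quotation.

$$\begin{aligned}
&Ux\rightarrow xR\{u\mid \varphi \} && \text{(Urelements, assumed form)}\\
&xR\{u\mid \varphi \}\rightarrow (x\in \{u\mid \varphi \}\leftrightarrow \varphi (x/u)) && \text{(Class Comprehension)}\\
&\forall x\,(Ux\rightarrow (x\in \{u\mid \varphi \}\leftrightarrow \varphi (x/u))) && \text{(chaining, generalizing on } x\text{)}\\
&\exists y\,\forall x\,(Ux\rightarrow (x\in y\leftrightarrow \varphi (x/u))) && \text{(existential generalization)}
\end{aligned}$$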

It is possible to go further. For example, we may add an axiom to the effect that whenever x is a class containing only urelements, then x is in the range of every class. This would allow us to interpret the third-order version of T. This process can be iterated. We can add an axiom to the effect that whenever x is a class of classes of urelements, then x is in the range of every class, which gives us fourth-order T, and so on for every finite order. Hence, we can embed the full type hierarchy over T into our theory of classes.

That this can be done in a type-free theory of classes is something I take as a minimal adequacy result. We have claimed that a type-free theory needs to be developed because of certain expressive limitations of type theory. But then the replacing theory should be at least as expressive as type theory.

There are some theories T that are inconsistent with full second-order comprehension, e.g., abstraction principles for ordinals conceived as sui generis objects (Florio and Leach-Krouse 2017). In such cases, one can weaken the Axiom of Urelements in several ways, if desired. For example, let \(P_1, P_2, \ldots\) be the predicates of T. Then one could replace the Axiom of Urelements with the following schema:

$$\forall x(Ux\rightarrow xR\{u\mid P_iu\})$$

In this case, one only obtains comprehension for predicates that are first-order definable in T, that is, a predicative comprehension principle.

6 Unrestricted quantification

Following a suggestion of Linnebo (2006), I have mentioned that one way to approach the problem of unrestricted quantification is by adopting a theory with a universal class or property. An objection that is frequently raised against such proposals is that theories with universal classes are incompatible with the axiom scheme of separation,

$$\forall x\,\exists y\,\forall z\,(z\in y\leftrightarrow z\in x\wedge \varphi )$$

This axiom states that given a class x, we can collect all members of x that satisfy some property \(\varphi\) into one class y. But if x is the universal class, and \(\varphi\) is the Russell predicate \(z\notin z\), then y cannot exist on pain of contradiction.
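The contradiction can be traced in three lines. Here V stands for the universal class, so that \(\forall z\, z\in V\), and y is the class that separation would provide:

$$\begin{aligned}
&\forall z\,(z\in y\leftrightarrow z\in V\wedge z\notin z) && \text{(separation with } x:=V,\ \varphi := z\notin z\text{)}\\
&\forall z\,(z\in y\leftrightarrow z\notin z) && \text{(since } \forall z\, z\in V\text{)}\\
&y\in y\leftrightarrow y\notin y && \text{(instantiating } z \text{ with } y\text{)}
\end{aligned}$$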

Hence, if we admit a universal class we lose separation. However, it seems that we need separation in order to comply with the following semantic principle:

For any domain of interpretation d and any predicate \(\varphi (z)\) in the language, it should be possible to specify an interpretation such that for all individuals \(x\in d\), a predicate letter ‘P’ applies to x if and only if \(\varphi (x/z)\).

The problem emerges because in a model-theoretic semantics the semantic value of ‘P’ needs to be an object. In order that the above principle is satisfied, we need to assign the class \(\{z\in d\mid \varphi (z)\}\) to ‘P’. And this in turn requires the axiom of separation.

I believe, however, that the quasi-Gödelian strategy adopted in the present paper allows us to formulate a satisfactory response to this objection.Footnote 12 For suppose that the domain of our interpretation consists of a class d, and let ‘P’ be a predicate symbol that we want to interpret by a predicate in our language. If we take the idea that every predicate (concept, propositional function) has a range of significance seriously, then it seems reasonable to demand that a predicate be chosen that is significant for all elements in d. I am not sure whether it is plausible to insist that it must be possible to interpret ‘P’ by some predicate that is not significant for some objects in the domain of interpretation. Indeed, the type-theoretic defense can be seen as a special case of this. After all, Russell’s theory diverges from Gödel’s only insofar as a further condition is imposed on the ranges of significance, namely, that they form types. (Recall our discussion in Sect. 4.) Hence we could replace the above semantic principle by the following one:

For any domain of interpretation d and any predicate \(\varphi (z)\) that is significant for all objects in d, it should be possible to specify an interpretation such that for all objects \(x\in d\), a predicate letter ‘P’ applies to x if and only if \(\varphi (x/z)\).

This demand can be met in the theory of classes presented in this paper. For if \(\varphi (z)\) is a predicate and d is a class such that all members of d are in the range of (the class determined by) \(\varphi (z)\), then for all \(x\in d\) we will have \(x\in \{z\in d\mid \varphi (z)\}\) if and only if \(\varphi (x/z)\). Hence we can assign the class \(\{z \in d\mid \varphi (z)\}\) as extension to the predicate letter ‘P’.
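The reasoning behind this claim can be made explicit. The sketch reads \(\{z\in d\mid \varphi (z)\}\) as \(\{z\mid z\in d\wedge \varphi (z)\}\) and assumes pointwise forms of the axioms involved: that members of a class lie in its range (Singularity, contraposed), that xRd implies \(xR\{z\mid z\in d\}\) (Membership), and that ranges are preserved under conjunction (Connectives). These forms are assumptions of the sketch. Let \(x\in d\).

$$\begin{aligned}
& xRd && \text{(Singularity, contraposed; assumed form)}\\
& xR\{z\mid z\in d\} && \text{(Membership, assumed form)}\\
& xR\{z\mid \varphi (z)\} && \text{(hypothesis on } d\text{)}\\
& xR\{z\mid z\in d\wedge \varphi (z)\} && \text{(Connectives, assumed form)}\\
& x\in \{z\mid z\in d\wedge \varphi (z)\}\leftrightarrow x\in d\wedge \varphi (x/z) && \text{(Class Comprehension)}\\
& x\in \{z\mid z\in d\wedge \varphi (z)\}\leftrightarrow \varphi (x/z) && \text{(since } x\in d\text{)}
\end{aligned}$$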

How severe is the restriction imposed by the suggested principle? One may argue about this, but I do not think that it is too severe. Notice, for instance, that whenever the universe d contains only urelements, then any predicate in the language can be assigned as interpretation to ‘P’. This still holds true if d contains, in addition, total classes. Only if d contains classes that are non-total are we not free to choose arbitrary predicates. (For example, if d contains the Russell class r, then we cannot interpret ‘P’ by the Russell predicate \(u\notin u\).) However, we can still assign to ‘P’ any predicate that is total—such as the predicate \(x=x\).

7 Conclusion

I have listed a number of desiderata for a type-free theory of classes. Let us now see how the theory proposed in this paper fares with respect to them. The main function of class talk is that it enables us to generalize on predicate places in our language. Second- and higher-order quantifiers provide a means to do so directly. Our class theory can be used for the same purpose. For example, if our class theory is applied to set theory, then we can express the axioms of separation and replacement by single sentences. In addition, our theory allows us to generalize predicate places that cannot be generalized in type theory.

Another possible application for a type-free theory of classes is to serve as a foundation of category theory. In order to be applied like this, we need at least to be able to form the class of all sets and the class of all functions between sets, as well as the power class of the class of all sets and the class of all functions between these classes (see Muller 2001). This is possible if our class theory is combined with set theory and the Axiom of Urelements is iterated in the way described. How successful such a class-theoretic foundation of category theory is from a philosophical point of view is, of course, a difficult question which demands further discussion.

The problem of unrestricted quantification was cited as a main motive for a type-free theory of classes. In the previous section, I have argued that the idea of limited ranges of significance provides a response to one of the main objections against a universal domain, namely, the problem of separation.

What about the reduction of properties (universals) to classes and the corresponding analysis of natural language semantics? Obviously, this cannot be answered unless we are given a formal theory of properties (universals). But I think that the prospects here are not too bad either (assuming, of course, that one can deal effectively with the problem that classes seem to be more coarse-grained than properties, perhaps by following David Lewis’ strategy). For instance, consider the property of being a property, which applies to all properties including itself. This could be modelled by the class of all classes. There are good reasons to believe that properties are closed under the algebraic operations corresponding to the logical operations of negation, conjunction, etc. These operations can be performed on classes as well. Moreover, consider again the inference from

  1. Everything has the property of being self-identical

to

  2. Socrates has the property of being self-identical

and the inference from (1) to

  3. The property of being red has the property of being self-identical

In our theory, both inferences can be carried out if talk of properties is appropriately replaced by talk of classes.

Finally, we have seen that the pure theory of classes allows for an interpretation of the second-order Dedekind–Peano axioms of arithmetic (i.e., \({\mathsf {Z}}_2\)). If the theory is extended with an appropriate axiom for forming total power classes of total classes, it is even possible to interpret \({\mathsf {Z}}_\omega\), that is, the union of n-th order arithmetic for every \(n\in \omega\), which is roughly equivalent to Zermelo set theory (Zermelo–Fraenkel set theory without replacement and foundation). What this means for the philosophy of mathematics is an altogether different question. I have indicated that the naïve concept of class is acquired without significant epistemological presupposition, and therefore might be used in a project similar to the Neo-Fregean one. But the paradoxes force us to regiment the notion of class, and whether the regimentation proposed here preserves the epistemological status of the naïve notion is clearly in need of further discussion. This, however, is left for another occasion.