Reasoning About Truth in First-Order Logic
Abstract
First, we describe a psychological experiment in which the participants were asked to determine whether sentences of first-order logic were true or false in finite graphs. Second, we define two proof systems for reasoning about truth and falsity in first-order logic. These proof systems feature explicit models of cognitive resources such as declarative memory, procedural memory, working memory, and sensory memory. Third, we describe a computer program that is used to find the smallest proofs in the aforementioned proof systems when capacity limits are put on the cognitive resources. Finally, we investigate the correlation between a number of mathematical complexity measures defined on graphs and sentences and some psychological complexity measures that were recorded in the experiment.
Keywords
First-order logic · Proof system · Bounded cognitive resources · Truth
1 Introduction
What truths are humans able to identify in the case of first-order logic (FO) on finite models? This is a fundamental question of logic, linguistics, and psychology, which nevertheless remains largely unexplored. In this paper, we study this question systematically using psychological experiments and computational models. We combine elements of proof theory and cognitive psychology and extend earlier work on propositional logic (Strannegård et al. 2010).
1.1 Psychological Complexity
1.2 Mathematical Complexity
What factors influence the difficulty for a human to determine whether \(M \models A\) for a finite model \(M\) and FO sentence \(A\)? Are they the properties of the sentence, the model, or some combination of such properties?
The properties of the sentences are clearly important. For instance, the truth-value of \(\exists x E(x,x)\) is generally easier to determine than the truth-value of \(\forall x \exists y E(x,y)\). Which properties of sentences are important here? Is it length, quantifier depth, parse tree depth, number of negations, type of connectives, a combination of these properties, or something else?
It is also clear that the properties of the models matter. For instance, the difficulty of determining the truth-value of \(\forall x \exists y E(x,y)\) clearly depends on the properties of the model. Which properties of models are important in this connection? Is it cardinality, number of edges, some version of Kolmogorov complexity, a combination of these, or something else?
One strategy for predicting the difficulty of determining truth is to combine \(n\) complexity measures on sentences with \(m\) complexity measures on models using some function \(f\) (e.g., a polynomial) of \(n+m\) variables. Despite the apparent generality of this strategy, it might be inadequate regardless of the choice of \(f\). In fact, if the interplay between sentences and models is more intricate than that, then this strategy is bound to fail.
An entirely different strategy is to assume that truthvalues are computed and that the properties of such computations are the most promising indicators of difficulty. This is the strategy we use in this paper.
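To make this computational strategy concrete, truth in a finite graph can be computed by direct recursion on the sentence. The following is our own illustrative Python sketch, not the proof systems of this paper (which refine such naive evaluation with explicit bounds on cognitive resources); the tuple encoding of formulas and all names are invented:

```python
# Naive recursive truth evaluation on a finite colored graph. Illustration
# only: the tuple encoding and names are invented for this sketch.

def holds(model, formula, env=None):
    """Decide whether M satisfies the formula under assignment env."""
    env = env or {}
    op = formula[0]
    if op == 'E':                          # edge predicate E(x, y)
        return (env[formula[1]], env[formula[2]]) in model['edges']
    if op == 'color':                      # unary color predicate, e.g. Red(x)
        return model['colors'][env[formula[2]]] == formula[1]
    if op == 'not':
        return not holds(model, formula[1], env)
    if op == 'and':
        return holds(model, formula[1], env) and holds(model, formula[2], env)
    if op == 'or':
        return holds(model, formula[1], env) or holds(model, formula[2], env)
    if op == 'forall':                     # try every node as a value for x
        return all(holds(model, formula[2], {**env, formula[1]: a})
                   for a in model['nodes'])
    if op == 'exists':
        return any(holds(model, formula[2], {**env, formula[1]: a})
                   for a in model['nodes'])
    raise ValueError('unknown operator: %s' % op)

# A 3-node directed graph: 1 -> 2 -> 3 -> 1, one node of each color.
M = {'nodes': [1, 2, 3],
     'edges': {(1, 2), (2, 3), (3, 1)},
     'colors': {1: 'Red', 2: 'Blue', 3: 'Yellow'}}

print(holds(M, ('forall', 'x', ('exists', 'y', ('E', 'x', 'y')))))  # True
print(holds(M, ('exists', 'x', ('E', 'x', 'x'))))                   # False
```

The per-sentence computations performed by such an evaluator, rather than static properties of the inputs, are what the complexity measures of Sect. 6 are built on.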
1.3 Models of Human Reasoning
Cognitive architectures, such as SOAR (Laird et al. 1987), ACT-R (Anderson and Lebiere 1998), Clarion (Sun 2007), and Polyscheme (Cassimatis 2002), are computational models that have been used to model human reasoning in a wide range of domains. They typically include explicit models of cognitive resources, such as working memory, sensory memory, declarative memory, and procedural memory; these cognitive resources are known to be bounded in various ways, e.g., with respect to capacity, duration, and access time (Kosslyn and Smith 2006).
1.4 Models of Logical Reasoning
In cognitive psychology, several models of logical reasoning have been presented in the mental logic tradition (Braine and O’Brien 1998) and the mental models tradition (Johnson-Laird 1983). Computational models in the mental logic tradition are commonly based on natural deduction systems (Prawitz 1965); the PSYCOP system (Rips 1996) is one example. The analytical focus of the mental models tradition has been on exploring particular examples rather than developing general computational models.
Human reasoning in the domain of logic is considered from several perspectives in Adler and Rips (2008) and Holyoak and Morrison (2005). Stenning and van Lambalgen (2008) consider logical reasoning both in the laboratory and “in the wild” and investigate how problems formulated in natural language and situated in the real world are interpreted (reasoning to an interpretation) and then solved (reasoning from an interpretation).
Formalisms of logical reasoning include natural deduction (Prawitz 1965; Jaskowski 1934; Gentzen 1969), sequent calculus (Negri and von Plato 2001), natural deduction in sequent calculus style (Negri and von Plato 2001), natural deduction in linear style (Fitch 1952; Geuvers and Nederpelt 2004), and analytic tableaux (Smullyan 1995). Several formalisms have also emerged in the context of automated reasoning, e.g., Robinson’s resolution system (2001) and Stålmarck’s system (2000). None of these formalisms represent working memory explicitly, and most of them were constructed for purposes other than modeling human reasoning.
We suspect that working memory could also be a critical cognitive resource in the special case of logical reasoning (Gilhooly et al. 1993; Hitch and Baddeley 1976; Toms et al. 1993). Therefore, we will define our own proof system that includes an explicit model of working memory.
1.5 Structure of the Paper
In Sect. 2, we report the results of a psychological experiment concerning FO truth. In Sects. 3 and 4, we present proof systems for showing FO truth and FO falsity, respectively. In Sect. 5, we present resource-bounded versions of these proof systems. In Sect. 6, we present a number of mathematical complexity measures, some of which are defined in terms of the resource-bounded proof systems. In Sect. 7, we compare the psychological complexity measures that were obtained in the experiment with the mathematical complexity measures. Section 8 contains a discussion and Sect. 9, finally, presents some conclusions.
2 Experiment
In this section, we describe a psychological experiment concerning \(\mathsf{Truth}\).
2.1 Participants
The participants in our experiment were ten computer science students from Gothenburg, Sweden, who were invited by email. These students had previously studied FO in their university education. They belonged to various nationalities, were aged 20–30 years, and included one woman and nine men.
2.2 Material
Fifty test items were prepared for the experiment. Each item consisted of a finite graph and an FO sentence. The task was to determine whether the sentence was true in the graph. A screenshot of the graphical user interface is shown in Fig. 1. The test comprised 24 true items and 26 false ones, which were presented in an order that was randomized for each participant.
The logical symbols used in the experiment were the quantifiers \(\forall \) and \(\exists \) and the connectives \(\lnot \), \(\wedge \), \(\vee \), \(\rightarrow \), and \(\leftrightarrow \). The vocabulary \(\tau \) included the binary predicate \(E(x,y)\) (for edges); the unary predicates \({\mathrm{Red}}(x)\), \({\mathrm{Blue}}(x)\), and \({\mathrm{Yellow}}(x)\) (for colors); and the constants \(1\), \(2\), \(3\), and \(4\) (for nodes). Let \(L(\tau )\) be the set of FO formulas defined in the usual way over the vocabulary \(\tau \).
The test items, each consisting of a graph paired with an \(L(\tau )\)-sentence, were selected manually on the basis of estimated level of difficulty from randomly generated lists of graphs and sentences. All nodes were labeled with unique numbers and colored yellow, red, or blue.
Some of the test items are given in Appendix A and the full list can be found in Nizamani (2010).
2.3 Procedure
The experiment was conducted in a computer laboratory at the Department of Applied Information Technology, University of Gothenburg. Each participant was assigned an individual computer terminal. A short practice session was conducted before the test to familiarize the participants with some sample problems. The experiment was conducted in a 25-minute session, followed by a 10-minute break, followed by another 25-minute session.
The answers and the response times were recorded for each participant and each item. Two aggregated complexity measures were computed for each item. The accuracy is the proportion of participants who answered the item correctly. The latency is the median response time among the participants who answered the item correctly. The median was used, rather than the mean, to reduce the effects of extreme response times due to external disturbances.
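The two aggregated measures amount to a short computation per item. The following sketch uses invented response data, not the experiment's actual records:

```python
# Sketch of the two aggregated complexity measures. 'responses' holds
# invented (answered_correctly, response_time_seconds) pairs for a single
# item across participants.
from statistics import median

responses = [(True, 21.0), (True, 95.0), (False, 40.0),
             (True, 30.0), (True, 28.0)]

# accuracy: proportion of participants who answered the item correctly
accuracy = sum(ok for ok, _ in responses) / len(responses)

# latency: median response time among the correct answers; the median
# damps extreme response times caused by external disturbances
latency = median(t for ok, t in responses if ok)

print(accuracy)  # 0.8
print(latency)   # 29.0
```

Note how the single extreme response time (95.0 s) barely affects the median, whereas it would pull a mean latency up substantially.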
3 The Proof System \(\mathcal T _M\)
In this section, we describe a proof system \(\mathcal T _M\) for proving sentences to be true in a fixed finite model \(M\). Please observe that this system is not a classical proof system in which only logical truths can be derived, but a system in which all truths in some fixed model \(M\) are derivable.
The system \(\mathcal T _M\) is a rather straightforward rewrite system with some modifications and extensions. The proofs are linear sequences of sentences. The system is local in the sense that checking whether a sequence of sentences is a proof can be done by only looking at two consecutive sentences at a time. The proofs are also goaldriven, which means that they start with a proof goal, i.e., the desired conclusion, and then use rules to reduce the goal to the true statement \(\top \).
The proof system has two ingredients, axioms and rules. The axioms model declarative memory content and visual information about models and sentences, whereas the rules model the procedural memory. The main rule (Substitution) is based on the principle of compositionality, which is that meaning (and truth-value, in particular) is preserved under the substitution of logically equivalent components. The axioms are used as side conditions to this rule. For example, substitution of \(A\) for \(B\) is only allowed if \(A \leftrightarrow B\) is an axiom. Substitution is a deep rule, which means that it allows subformulas appearing deep down in the parse tree to be substituted.
3.1 Formulas
The language \(L^*(\tau )\) extends \(L(\tau )\) with three new constructs:
 (i)
bounded quantifiers \(\forall ^\varOmega \) and \(\exists ^\varOmega \), where \(\varOmega \) is a set of constant symbols in \(\tau \) and \(\forall ^\varOmega x A\) has the intended meaning \(\forall x (\bigvee _{c \in \varOmega } x=c \rightarrow A)\);
 (ii)
abstraction boxes for modeling visual perception of formulas, \({[\![A ]\!]}\); and
 (iii)
contexts for assigning values to variables, \([x_1 = c_1, \ldots , x_k=c_k]\).
Definition 1
Definition 2

\(M \models _s \exists ^\varOmega xA\) iff \(M \models _s \exists x (\bigvee _{c \in \varOmega } x=c \wedge A)\),

\(M \models _s \forall ^\varOmega x A\) iff \(M \models _s \forall x (\bigvee _{c \in \varOmega } x=c \rightarrow A)\),

\(M \models _s {[\![A ]\!]}\) iff \(M \models _s A\), and

\(M \models _s A[x_1=c_1,\ldots ,x_k=c_k]\) iff \(M\models _{s[c_1/x_1,\ldots ,c_k/x_k]} A\).
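The clauses for the bounded quantifiers amount to iterating over the elements named by constants in \(\varOmega \) only. A hedged Python illustration (our own encoding, not part of the formal system):

```python
# Bounded quantifiers from Definition 2, read computationally: forall^Omega
# and exists^Omega range only over the elements named by constants in Omega.

def holds_bounded(quantifier, omega, test):
    """omega: the elements named by the constants in Omega;
    test: the body A with the quantified variable as a parameter."""
    if quantifier == 'forall':
        return all(test(c) for c in omega)
    return any(test(c) for c in omega)

edges = {(1, 2), (2, 3), (3, 1)}

# forall^{1,2} x exists^{2,3} y E(x, y)
print(holds_bounded('forall', {1, 2},
                    lambda x: holds_bounded('exists', {2, 3},
                                            lambda y: (x, y) in edges)))  # True
```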
Next, we define what it means for a relation symbol \(R\) in a formula to be substituted by a formula.
Definition 3
(Substitution) If \(R\) is a relation symbol of arity \(k\), \(A\) a formula in \(L^*(\tau \cup {\left\{ {R}\right\} })\) and \(B\) a formula in \(L^*(\tau )\) with the free variables \(\bar{x} = x_1,\ldots ,x_k\), we define the substitution \(A[B(\bar{x})/R]\) to be the formula we obtain from \(A\) by substituting the leftmost occurrence of the symbol \(R\) with \(B[t_1,\ldots ,t_k/ x_1,\ldots ,x_k]\), where \(R\) occurs as \(R(t_1,\ldots ,t_k)\) and \(t_i\) is free for \(x_i\) in \(B\) (if not, the notation \(A[B(\bar{x})/R]\) is undefined).
Observe that substitutions are not carried out inside abstraction boxes \({[\![C ]\!]}\).
3.2 Axioms
The axiom set \(\varGamma \) is constructed as the union of three sets: the logical axioms, the sentence axioms and the model axioms.
3.2.1 Logical Axioms
The set of logical axioms is denoted \(\varGamma _T\). The members of this set are the logical truths listed in Appendix . This list was compiled from a number of textbooks on basic logic, including Huth and Ryan (2004).
3.2.2 Sentence Axioms
The second ingredient of \(\varGamma \) comes from the idea of including the parsing of a formula in the proof system itself. When checking the truth of a formula, we must first parse it. This action is modeled in the system using abstraction boxes, which are considered black boxes that the agent cannot look inside. By using the rules of the proof system, the agent may unwind the formula one step at a time. Formally, we add one abstraction box for each formula in the language. The abstraction box corresponding to the formula \(A\) is denoted by \({[\![A ]\!]}\). These boxes are in many respects similar to atomic formulas; the free variables of the abstraction box \({[\![A ]\!]}\) are defined to be the same as the free variables of \(A\). This idea is closely related to the template logic introduced in Engström (2002), which was constructed to understand nonstandard formulas in nonstandard models of arithmetic.
In a bounded version of the proof system that will be introduced later, we will restrict the complexity of the formulas that may be used in a proof. Thus, the abstraction boxes may be used to reduce this complexity. The drawback of this reduction in complexity is that the substitution rule (see Sect. 3.3.1) may not substitute inside abstraction boxes.

If \(A\) is atomic, \({[\![A ]\!]} \leftrightarrow A\).

\({[\![\lnot A ]\!]} \leftrightarrow \lnot {[\![A ]\!]}\)

\({[\![A \cdot B ]\!]} \leftrightarrow {[\![A ]\!]} \cdot {[\![B ]\!]}\), where \(\cdot \) is one of \(\vee \), \(\wedge \), \(\rightarrow \), and \(\leftrightarrow \).

If \(A\) is atomic, \({[\![A \cdot B ]\!]} \leftrightarrow A \cdot {[\![B ]\!]}\), where \(\cdot \) is one of \(\vee \), \(\wedge \), \(\rightarrow \), and \(\leftrightarrow \) (similarly for \(B\)).

\({[\![Q x A ]\!]} \leftrightarrow Q x {[\![A ]\!]}\), where \(Q\) is one of \(\forall \), \(\forall ^\varOmega \), \(\exists \), and \(\exists ^\varOmega \).

If \(A\) is atomic, \({[\![Q x A ]\!]} \leftrightarrow Q x A\), where \(Q\) is one of \(\forall \), \(\forall ^\varOmega \), \(\exists \), and \(\exists ^\varOmega \).
3.2.3 Model Axioms
 1.
\(A\) and \(A \leftrightarrow \top \), whenever \(A^{\prime }\) is a quantifierfree formula that is true in \(M\).
 2.
\(\lnot A\) and \(A \leftrightarrow \bot \), whenever \(A^{\prime }\) is a quantifierfree formula that is false in \(M\).
 3.
\(\forall x \, A \leftrightarrow \forall ^{\varOmega _0} x \, A\).
 4.
\(\exists x \, A \leftrightarrow \exists ^{\varOmega _0} x \, A\).
 5.
\(\forall ^\varOmega x \, A \leftrightarrow \forall ^{\varOmega \setminus {\left\{ {c}\right\} }} x\, A\), where \(M \models A^{\prime }[x=c]\) and \(A^{\prime }\) is quantifierfree.
 6.
\(\exists ^\varOmega x \, A \leftrightarrow \exists ^{\varOmega \setminus {\left\{ {c}\right\} }} x\, A\), where \(M \nvDash A^{\prime }[x=c]\) and \(A^{\prime }\) is quantifierfree.
 7.
\(\forall ^\varOmega x \, (B \rightarrow A) \leftrightarrow \forall ^{\varOmega ^{\prime }} x\, A\), where \(\varOmega ^{\prime }=\{a \mid M \models B[x=a]\}\) and \(B\) is an atomic formula.
 8.
\(\exists ^\varOmega x \, (B \wedge A) \leftrightarrow \exists ^{\varOmega ^{\prime }} x\, A\), where \(\varOmega ^{\prime }=\{a \mid M \models B[x=a]\}\) and \(B\) is an atomic formula.
The set of axioms \(\varGamma \) is defined as the union of \(\varGamma _T\), \(\varGamma _S\), and \(\varGamma _M\) (which depends on the model \(M\)).
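Axioms 5 and 6 can be read operationally: a bounded quantifier's \(\varOmega \) set shrinks one constant at a time as instances are checked against \(M\). An illustrative sketch (the graph and names are invented):

```python
# Operational reading of model axioms 5 and 6: for exists^Omega, a constant
# c whose instance is false in M may be discarded from Omega (axiom 6); the
# sentence is true iff some constant survives and satisfies the body.

def shrink_exists(omega, satisfies):
    """Keep only the constants whose instances are true in M."""
    return {c for c in omega if satisfies(c)}

edges = {(1, 2), (2, 3)}

# exists^{1,2,3} x E(x, 2): only the instance x = 1 is true in M, so the
# quantifier reduces to exists^{1} and the sentence is true.
print(shrink_exists({1, 2, 3}, lambda c: (c, 2) in edges))  # {1}
```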
3.3 Rules
In this subsection, we present the rules of \(\mathcal T _M\).
3.3.1 Substitution
In the case of MI, we allow all contexts in \(A\) to be used for determining whether \(C \leftrightarrow B\) is in \(\varGamma _M\). That is, if \(\gamma \) is the list of all contexts in \(A\), then \(B\) may be replaced by \(C\) if \((C\gamma ) \leftrightarrow (B\gamma ) \in \varGamma _M\).^{1} For example, in a model where \(P(c)\) holds, we may use MI to deduce \((\top \wedge B)[x=c]\) from \((P(x) \wedge B)[x=c]\).
In short, Substitution enables us to deduce \(A[C(\bar{x})/R]\) from \(A[B(\bar{x})/R]\) whenever \(C \leftrightarrow B \in \varGamma \) and \(C\) and \(B\) have the same free variables.
3.3.2 Strengthening
3.3.3 Truth Recall
This concludes the definition of the proof system \(\mathcal T _M\).
3.4 Proofs
Definition 4
(Proof in \(\mathcal T _M\)) Suppose \(M \in \text{ MOD}\) and \(A \in \text{ SEN}\). A proof of \(A\) in \(\mathcal T _M\) is a sequence of sentences \((A_0,A_1,\ldots ,A_n)\) such that \(A_0 = {[\![A ]\!]}\), \(A_n = \top \), and \(A_{i+1}\) follows from \(A_i\) by one of the rules of \(\mathcal T _M\).
By letting the first sentence in the sequence be \({[\![A ]\!]}\) rather than \(A\), we aim to model the idea that an individual who is presented with a sentence perceives the structure of the sentence gradually, starting from perceiving nothing and possibly ending before all the details have been perceived. Formula inspection can be used for gradually perceiving and parsing the sentence.
Note that the abstraction boxes may be removed at the beginning of a proof by zooming in on the formula using iterated applications of Substitution (FI). Strictly speaking, abstraction boxes are unnecessary at this point, but they will play an important role in Sect. 5, in which bounded versions of the proof systems are considered.
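The locality property from Sect. 3.3 means that checking a candidate proof only requires inspecting consecutive pairs of sentences. A minimal sketch, in which a toy rewrite relation stands in for the actual rules of \(\mathcal T _M\):

```python
# Locality of the proof system: whether a sequence is a proof can be
# checked by looking at two consecutive sentences at a time. The rules
# below are an invented stand-in.

def is_proof(seq, step, start, end):
    """A proof starts at the goal, ends at the terminal, and each
    sentence follows from its predecessor by one rule application."""
    return (len(seq) >= 1 and seq[0] == start and seq[-1] == end
            and all(step(a, b) for a, b in zip(seq, seq[1:])))

# toy steps: unwind a box, rewrite true atoms to T, then simplify
rules = {('[[P&P]]', 'P&P'), ('P&P', 'T&P'),
         ('T&P', 'T&T'), ('T&T', 'T')}
step = lambda a, b: (a, b) in rules

print(is_proof(['[[P&P]]', 'P&P', 'T&P', 'T&T', 'T'],
               step, '[[P&P]]', 'T'))   # True
print(is_proof(['[[P&P]]', 'T&T', 'T'],
               step, '[[P&P]]', 'T'))   # False (skips steps)
```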
3.5 Properties

declarative memory, which is modeled by the logical axioms in the set \(\varGamma _T\);

procedural memory, which is modeled by the rules;

sensory memory, which is modeled as a buffer to hold sentence axioms from the set \(\varGamma _S\) (modeling visual perception of sentence structure);

working memory, which is modeled as a buffer to hold temporary proof goals.
Proposition 1
Suppose \(M \in \text{ MOD}\) and \(A \in \text{ SEN}\). Then, \(M \models A\) iff \(A\) is provable in \(\mathcal T _M\).
Proof
Righttoleft is soundness: We prove that in any proof of \(\mathcal T _M\), if \(A\) occurs, then \(M \models A\). This is done by induction starting from the bottom; the trivial base case is when \(A\) is \(\top \). For the induction step, all we must check is that truth is preserved (in the sense that if \(A\) may be deduced from \(B\) and \(M \models A\), then \(M \models B\)) by the rules in \(\mathcal T _M\). This is straightforward.
Lefttoright is completeness: Suppose \(M \models A\). Note that by essentially using Substitution (FI) followed by Substitution (MI), we may reduce any sentence \(A\) to a propositional formula in which only \(\top \) and \(\bot \) are atomic formulas. Regardless of the details of this propositional formula, we can then use Substitution (LE) to produce strictly shorter formulas at each step until either \(\top \) or \(\bot \) is reached. Because \(M \models A\), however, the soundness of the system forces the proof to end with \(\top \).
4 The Proof System \(\mathcal F _M\)
In this section, we describe a proof system \(\mathcal F _M\) for showing sentences to be false in a fixed finite model \(M\). The formulas and axioms of \(\mathcal F _M\) are the same as for \(\mathcal T _M\).
4.1 Rules
Now, let us define the rules of \(\mathcal F _M\).
4.1.1 Substitution
This rule is identical to the Substitution rule of \(\mathcal T _M\).
4.1.2 Weakening
4.1.3 Falsity Recall
This rule makes use of a set \(\varGamma _F\), which contains contradictions only. We omit the exact definition of \(\varGamma _F\), which is a list of textbook contradictions in the style of Appendix .
4.2 Properties
Definition 5
(Proof in \(\mathcal F _M\)) Suppose that \(M \in \text{ MOD}\) and \(A \in \text{ SEN}\). A proof of \(A\) in \(\mathcal F _M\) is a sequence of sentences \((A_0,A_1,\ldots ,A_n)\) such that \(A_0 = {[\![A ]\!]}\), \(A_n = \bot \), and \(A_{i+1}\) follows from \(A_i\) by one of the rules of \(\mathcal F _M\).
Now, let us prove the adequacy of \(\mathcal F _M\) for showing falsity in \(M\).
Proposition 2
Suppose \(M \in \text{ MOD}\) and \(A \in \text{ SEN}\). Then, \(M \not \models A\) iff \(A\) is provable in \(\mathcal F _M\).
Proof
This proof is analogous to the proof of Proposition 1.
5 The Bounded Proof Systems \(\mathcal{ BT} _M\) and \(\mathcal{ BF} _M\)
In this section, we define resource-bounded versions of the proof systems \(\mathcal T _M\) and \(\mathcal F _M\). For this purpose, we require a precise definition of sentence length.
Definition 6
(Sentence length) The length \(|A|\) of a formula \(A\) is defined as follows:
\(|A|=1\) when \(A\) is atomic.

\(|{[\![A ]\!]}|=1\).

\(|A[x_1=c_1,\ldots ,x_k=c_k]| = 1+|A|\).

\(|\lnot A|= 1+ |A|\).

\(|A \cdot B| = 1 + |A| + |B|\), where \(\cdot \) is either \(\vee \), \(\wedge \), \(\rightarrow \), or \(\leftrightarrow \).

\(|Qx A|= 2 + |A|\), where \(Q\) is either \(\forall \), \(\forall ^\varOmega \), \(\exists \), or \(\exists ^\varOmega \).
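The length definition translates directly into a recursive function. The tuple encoding of formulas below is our own, given purely as an illustration:

```python
# Sentence length as a recursive function over tuple-encoded formulas:
# ('atom',) for atomic formulas, ('box', A) for abstraction boxes,
# ('ctx', A) for contexts, and so on. The encoding is invented.

def length(f):
    op = f[0]
    if op in ('atom', 'box'):            # atoms and abstraction boxes: 1
        return 1
    if op in ('not', 'ctx'):             # negation and contexts: 1 + |A|
        return 1 + length(f[1])
    if op in ('and', 'or', 'imp', 'iff'):
        return 1 + length(f[1]) + length(f[2])
    if op in ('forall', 'exists'):       # also the bounded variants
        return 2 + length(f[2])
    raise ValueError(op)

# |forall x (E(x,x) -> Red(x))| = 2 + (1 + 1 + 1) = 5
print(length(('forall', 'x', ('imp', ('atom',), ('atom',)))))  # 5
# an abstraction box hides its contents and counts as 1
print(length(('box', ('forall', 'x', ('atom',)))))             # 1
```

The second example shows why abstraction boxes matter for the bounded systems: boxing a subformula lowers its length to 1, at the price of blocking substitution inside it.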
Now we can define our two bounded proof systems.
Definition 7
(\(\mathcal{BT}_M\) and \(\mathcal{BF}_M\)) The bounded proof systems \(\mathcal{BT}_M\) and \(\mathcal{BF}_M\) are obtained from \(\mathcal T _M\) and \(\mathcal F _M\), respectively, by imposing the following limits:
Working memory limit. The maximum length of a sentence that can appear in a proof is 8.

Sensory memory limit. The rule Substitution (FI) is only allowed on sentence axioms of \(\varGamma _S\) of maximum length 7.
6 Mathematical Complexity Measures
 1.
Sentence length. The length of \(A\). Range: 5–9.
 2.
Quantifier count. The number of quantifiers appearing in \(A\). Range: 1–3.
 3.
Negation count. The number of negations appearing in \(A\). Range: 0–2.
 4.
Cardinality. The number of nodes of \(M\). Range: 3–4.
 5.
Edge count. The number of edges of \(M\). Range: 3–6.
 6.
Linear combination. A linear combination of sentence length, cardinality, and sentence length * cardinality, whose coefficients can be fitted to experimental data.
 7.
Working memory. The minimum size of the WM required for proving \(A\) in \(\mathcal{BT}_M\) or \(\mathcal{BF}_M\). Range: 3–8.
 8.
Proof length. The minimum number of steps of a proof of \(A\) in \(\mathcal{BT}_M\) or \(\mathcal{BF}_M\). Range: 4–38.
 9.
Proof size. The minimum size of a proof of \(A\) in \(\mathcal{BT}_M\) or \(\mathcal{BF}_M\), i.e., the minimum sum of the lengths of the formulas appearing in the proof. Range: 6–164.
Measures 1–5 are standard complexity measures in logic. Measure 6 (Linear combination) illustrates the possibility of combining such complexity measures in various ways, e.g. via polynomials. As suggested in Sect. 1.2, there may be a priori arguments for believing that these complexity measures are inadequate for the present experiment. Measure 7 (Working memory) models the maximum strain on the working memory and measure 8 (Proof length) models the length of the shortest train of thought leading to the desired conclusion.
Measure 9 (Proof size) is based on the idea that latency in the context of logical reasoning depends on some notion of computational workload. By identifying the working memory load with formula length, we get proof size as a measure of computational workload. Thus measure 9 models the minimum amount of data that must flow through the working memory for the problem to be solved.
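Under this identification, proof size is simply a length measure summed over the sentences of a proof. A toy illustration (the proof and the symbol-count measure are invented, not output of our prover):

```python
# Proof size under the identification of working-memory load with formula
# length: the sum of the lengths of the sentences in a proof.

def proof_size(proof, length):
    """Minimum data flow through working memory = total sentence length."""
    return sum(length(sentence) for sentence in proof)

# a toy proof reducing a goal to T, with symbol count as the length measure
toy_proof = ['[[P&Q]]', 'P&Q', 'T&Q', 'T&T', 'T']
print(proof_size(toy_proof, len))  # 17
```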
We implemented automated theorem provers for the systems \(\mathcal{BT}_M\) and \(\mathcal{BF}_M\) in the functional programming language Haskell. Proofs of minimum length and minimum size were generated for all items appearing in the experiment. For instance, the proof appearing in Sect. 3.4 was generated by our theorem prover, as a proof of minimum size.
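The following is not our Haskell prover but a minimal Python sketch of how minimum-length proofs can be found: breadth-first search over the rewrite relation returns a shortest rewrite sequence. The rules below are invented stand-ins; the working-memory limit of Sect. 5 would correspond to pruning successors whose length exceeds the bound.

```python
# Breadth-first search for a shortest proof in a goal-driven rewrite system.
from collections import deque

def shortest_proof(start, goal, successors):
    seen, queue = {start}, deque([[start]])
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path            # BFS: the first hit is a shortest proof
        for nxt in successors(path[-1]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None                    # goal unreachable

# toy rewrite steps standing in for the rules of BT_M
steps = {'[[P&P]]': ['P&P'], 'P&P': ['T&P', 'P&T'],
         'T&P': ['T&T', 'P'], 'P&T': ['P'], 'T&T': ['T'], 'P': ['T']}
succ = lambda s: steps.get(s, [])

print(shortest_proof('[[P&P]]', 'T', succ))
```

Minimum-size proofs would require a weighted search (e.g., uniform-cost search with formula length as the step cost) rather than plain BFS.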
7 Results
Table 1  Mean latency and mean accuracy for true and false items

                     True  False
Latency (seconds)      33     37
Accuracy (%)           74     67
Table 2  Correlations between latency and some mathematical complexity measures

                     True  False
Sentence length      0.35   0.63
Quantifier count     0.42   0.73
Negation count       0.15   0.12
Cardinality          0.01  -0.02
Edge count          -0.28   0.09
Linear combination   0.38   0.64
Working memory       0.53   0.66
Proof length         0.49   0.76
Proof size           0.54   0.76
Table 3  Correlations between accuracy and some mathematical complexity measures

                     True  False
Sentence length     -0.22  -0.02
Quantifier count    -0.22  -0.05
Negation count      -0.33  -0.56
Cardinality         -0.41   0.38
Edge count          -0.04  -0.10
Linear combination   0.46   0.50
Working memory      -0.32  -0.19
Proof length        -0.21   0.13
Proof size          -0.24   0.07
8 Discussion
In this paper, we studied human reasoning in the case of FO truth using standard methods of cognitive psychology. This reflects our view that there is nothing special about problemsolving in the domain of logic, and therefore, it can be explored using ordinary experimental methods and modeled using ordinary models of cognitive psychology that would apply equally well to mental arithmetic, Sudoku, or Minesweeper, for example.
8.1 Comments on the Experiment
In a preliminary investigation, we observed that the difficulty of determining truth was affected by the manner in which the models were presented graphically. For instance, determining whether a graph is complete seems to be simpler if the graph is drawn in a regular fashion. To mitigate this problem, we drew our graphs by placing the nodes equally distributed on a circle.
As is usual in experiments of this type, the answers given by the participants are potentially problematic because guesses, interaction errors (e.g., hitting the wrong button accidentally), and distractions in the experimental situation (e.g., coughing attacks) can affect the individual results substantially. Another potential problem relates to the instructions. In our particular experiment, we learned afterwards that some of the participants thought that if two variables \(x\) and \(y\) appeared in a sentence, then automatically \(x \ne y\). Our instructions failed to address this point.
On the aggregate level, some of the problems on the individual level might partially cancel out, but then new problems arise on the modeling side. In fact, to model experimental data on the aggregated level, one must develop a computational model that represents some sort of average of the participants. Strictly speaking, this might not be possible using our present type of computational model, which was designed for cognitive modeling on the individual level. These factors should be borne in mind when evaluating the results.
8.2 Comments on the Proof Systems
The bounded proof systems described in Sect. 5 can be modified in several ways to suit different human role models. One way is to modify the axioms; a second is to modify the rules; a third is to change the working memory and sensory memory capacities; and a fourth is to add models of other cognitive resources.
Intuitively, proof size reflects how comprehensive a thought must be, i.e., how much information must be processed to produce the correct answer. Proofs that require more information processing might take longer to process and be more prone to error. One may perhaps think of the smallest proofs as the smartest proofs, relative to a given repertoire of cognitive resources.
As is often the case with cognitive modeling, this complexity measure can be criticized for being too coarse. For instance, it does not directly reflect the effort of searching for a proof; instead, it reflects the effort required to verify the steps in a proof that has been found. Although the efforts required for finding a proof and verifying the same proof may be somewhat correlated, this limitation certainly allows for future improvements to the model.
8.3 Comments on the Results
The data in Table 1 indicate that the True items were easier to solve than the False items.
Among the complexity measures analyzed in Table 2, Proof size has the highest correlation with latency for both True and False items. Second comes the closely related complexity measure Proof length. This might indicate that complexity measures defined in terms of computations are more adequate than those defined in terms of standard properties of models and sentences.
Some of the complexity measures on sentences in Table 2 fared relatively well. In Sect. 1 it was argued that in general, properties of models can dramatically affect the difficulty of determining truth. The reason why those complexity measures on sentences were relatively successful in this particular case might be that the models that were used in the experiment were quite homogeneous. In fact, Cardinality varied between 3 and 4, and Edge count varied between 3 and 6. Therefore, the complexity measures that considered only sentences were perhaps not sufficiently challenged in the present experiment.
All of the complexity measures analyzed in Table 3 have relatively low correlations with accuracy. We do not know why the contrast to latency is so pronounced. The correlation values for Negation count stand out here and provide some support to the idea that negations increase the probability of error.
9 Conclusions
In this paper, we presented (i) the results of an experiment pertaining to \(\mathsf{Truth}\), (ii) two proof systems for deriving membership and non-membership in \(\mathsf{Truth}\) using bounded cognitive resources, and (iii) an analysis of the correlation between psychological complexity measures and different mathematical complexity measures, including proof size. The results indicate that proof size was more successful than the other complexity measures in the case of latency. The approach that we use enables predictions about latency to be made for arbitrary elements of \(\mathsf{Truth}\). To evaluate the usefulness of this approach, which combines elements of proof theory and cognitive psychology, more experiments are needed, in particular experiments with more heterogeneous test items.
Footnotes
 1.
This can be defined in a precise manner, but we omit the technical details here.
Notes
Open Access
This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.
References
 Adler, J. E., & Rips, L. J. (2008). Reasoning: Studies of human inference and its foundations. Cambridge: Cambridge University Press.
 Anderson, J. R., & Lebiere, C. (1998). The atomic components of thought. Mahwah, NJ: Lawrence Erlbaum.
 Braine, M. D. S., & O’Brien, D. P. (1998). Mental logic. UK: L. Erlbaum Associates.
 Cassimatis, N. (2002). Polyscheme: A cognitive architecture for integrating multiple representation and inference schemes. PhD thesis.
 Ebbinghaus, H. D. (1985). Extended logics: The general framework. Model-theoretic logics (pp. 25–76).
 Engström, F. (2002). Satisfaction classes in nonstandard models of arithmetic. Licentiate thesis, Chalmers University of Technology.
 Fitch, F. B. (1952). Symbolic logic: An introduction. New York: Ronald Press.
 Gentzen, G. (1969). Investigation into logical deduction, 1935. In M. E. Szabo (Ed.), The collected papers of Gerhard Gentzen. Amsterdam: North-Holland.
 Geuvers, H., & Nederpelt, R. (2004). Rewriting for Fitch style natural deductions. In Rewriting techniques and applications (pp. 134–154). Springer.
 Gilhooly, K. J., Logie, R. H., Wetherick, N. E., & Wynn, V. (1993). Working memory and strategies in syllogistic-reasoning tasks. Memory & Cognition, 21(1), 115–124.
 Hitch, G. J., & Baddeley, A. D. (1976). Verbal reasoning and working memory. The Quarterly Journal of Experimental Psychology, 28(4), 603–621.
 Holyoak, K. J., & Morrison, R. G. (2005). The Cambridge handbook of thinking and reasoning. Cambridge: Cambridge University Press.
 Huth, M., & Ryan, M. (2004). Logic in computer science: Modelling and reasoning about systems. Cambridge, UK: Cambridge University Press.
 Jaskowski, S. (1934). The theory of deduction based on the method of suppositions. Studia Logica, 1, 5–32.
 Johnson-Laird, P. N. (1983). Mental models. Cambridge, MA: Harvard University Press.
 Kosslyn, S. M., & Smith, E. E. (2006). Cognitive psychology: Mind and brain. Upper Saddle River, NJ: Prentice-Hall.
 Laird, J., Newell, A., & Rosenbloom, P. (1987). Soar: An architecture for general intelligence. Artificial Intelligence, 33(3), 1–64.
 Negri, S., & von Plato, J. (2001). Structural proof theory. Cambridge: Cambridge University Press.
 Nizamani, A. R. (2010). Anthropomorphic proof system for first-order logic. Master's thesis, Chalmers University of Technology.
 Prawitz, D. (1965). Natural deduction: A proof-theoretical study. Volume 3 of Stockholm studies in philosophy. Stockholm: Almqvist & Wiksell.
 Rips, L. (1996). The psychology of proof. Bradford.
 Robinson, A., & Voronkov, A. (2001). Handbook of automated reasoning. The Netherlands: Elsevier Science.
 Sheeran, M., & Stålmarck, G. (2000). A tutorial on Stålmarck's proof procedure for propositional logic. Formal Methods in System Design, 16(1), 23–58.
 Smullyan, R. M. (1995). First-order logic (second corrected edition). New York: Dover. (First published in 1968 by Springer, Berlin, Heidelberg, New York.)
 Stenning, K., & van Lambalgen, M. (2008). Human reasoning and cognitive science. Cambridge, MA: Bradford Books, MIT Press.
 Strannegård, C., Ulfsbäcker, S., Hedqvist, D., & Gärling, T. (2010). Reasoning processes in propositional logic. Journal of Logic, Language and Information, 19(3), 283–314.
 Sun, R. (2007). The importance of cognitive architectures: An analysis based on CLARION. Journal of Experimental & Theoretical Artificial Intelligence, 19(2), 159–193.
 Toms, M., Morris, N., & Ward, D. (1993). Working memory and conditional reasoning. The Quarterly Journal of Experimental Psychology, 46(4), 679–699.