Review of Philosophy and Psychology, Volume 1, Issue 2, pp 245–264

Similarity and Induction

Authors

• Matthew Weber, University of Pennsylvania
• Daniel Osherson, University of Pennsylvania

DOI: 10.1007/s13164-009-0017-0

Cite this article as:
Weber, M. & Osherson, D. Rev.Phil.Psych. (2010) 1: 245. doi:10.1007/s13164-009-0017-0

Abstract

We advance a theory of inductive reasoning based on similarity, and test it on arguments involving mammal categories. To measure similarity, we quantified the overlap of neural activation in left Brodmann area 19 and the left ventral temporal cortex in response to pictures of different categories; the choice of these regions is motivated by previous literature. The theory was tested against probability judgments for 40 arguments generated from 9 mammal categories and a common predicate. The results are interpreted in the context of Hume’s thesis relating similarity to inductive inference.

1 Introduction

David Hume (2006) famously asserted a role for similarity in non-deductive inference. Here is the well-known passage.

In reality, all arguments from experience are founded on the similarity which we discover among natural objects, and by which we are induced to expect effects similar to those which we have found to follow from such objects. ... From causes which appear similar we expect similar effects.

Hume’s view is consistent with his predecessor Locke (1689), for whom analogy was “the great rule of probability.” Just what Locke and Hume meant by the term “probability” is open to discussion, but their thesis is clear. Similarity often lies behind inductive inference. The goal of the present essay is to sharpen this insight.

By “induction” we’ll understand a certain relation between a list of statements and some further statement. The first statements are called premises, the last the conclusion, and the ensemble an argument. The inductive strength of an argument for a given person will be identified with the subjective conditional probability she attaches to the conclusion given the premises. This definition raises questions about subjective probability in the minds of people who misunderstand chance. (Most college students can be led to incoherent estimates of probability; see Bonini et al. 2004; Tentori et al. 2004.) So we will just assume that the probability idiom conveys a familiar kind of psychological coherence condition. An argument is strong to the extent that the reasoner would find it odd to believe the premises without believing the conclusion. Squeezing this mental sensation into the unit interval and calling it probability provides a rough measure.

Inductive inferences are of such diverse character that we may despair of treating them within a unified theory. It will make things easier to limit attention to premises and conclusion with subject-predicate syntax, the same predicate appearing throughout. Here is an illustration along with the style of abbreviation used in the sequel.
[Display: a sample argument written in the abbreviated form used below, with the shared predicate Q, premise categories a1 ⋯ an, and conclusion category c.]
In the general case there may be multiple premises.
To predict the conditional probability of the conclusion given the premises, it is necessary to start from some sort of information. We assume that two kinds of quantity are available. First, we give ourselves the similarity between all pairs of objects in play. Second, we give ourselves all relevant unconditional probabilities. Similarity will be assumed to take values in the unit interval, be symmetric, and return 1 in case of identity. Denoting the similarity function by \({{\textit similarity}}(\cdot,\cdot)\), our assumptions are thus:
$$ {{\textit similarity}}(a,b) \in [0,1], \qquad {{\textit similarity}}(a,b) = {{\textit similarity}}(b,a), \qquad {{\textit similarity}}(a,a) = 1. $$
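Concretely, these assumptions amount to a small consistency check on whatever similarity table is supplied. The sketch below (in Python) is ours and only illustrates the constraints; the function-based representation of similarity and the sample value are arbitrary choices, not part of the model.

def check_similarity(sim, objects, tol=1e-9):
    """Verify the three assumptions on a similarity function over the given objects."""
    for a in objects:
        assert abs(sim(a, a) - 1.0) < tol          # identity yields similarity 1
        for b in objects:
            s = sim(a, b)
            assert 0.0 <= s <= 1.0                 # values lie in the unit interval
            assert abs(s - sim(b, a)) < tol        # similarity is symmetric

# Example with two categories (0.65 is only a placeholder value):
pair_sim = {frozenset({"lion", "cougar"}): 0.65}
sim = lambda a, b: 1.0 if a == b else pair_sim[frozenset({a, b})]
check_similarity(sim, ["lion", "cougar"])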

As mentioned, we also help ourselves to the unconditional probabilities of each premise and the conclusion of an argument. All that’s missing is the conditional probability of the conclusion given the premises, in other words, the inductive strength of the argument. Our project is thus to forge conditional probability from unconditional probability plus similarity. The criterion of success will be conformity to the estimates of conditional probability that people typically offer. This puts a descriptive spin on Hume’s thesis, which is consistent with his doubts about the normative justification of induction.

2 An Algorithm for Constructing Conditional Probability

Bayes’ Theorem offers little help in our enterprise since it expands the desired conditional probability into an expression featuring a different conditional probability, just as challenging (namely, the likelihood):
$$ {{\textit Prob}}(Qc \mid Qa_1 \ \&\ \cdots \ \&\ Qa_n) \ =\ \frac{{{\textit Prob}}(Qa_1 \ \&\ \cdots \ \&\ Qa_n \mid Qc)\ \times\ {{\textit Prob}}(Qc)}{{{\textit Prob}}(Qa_1 \ \&\ \cdots \ \&\ Qa_n)} $$
We get more help from the usual definition of conditional probability since it suggests that we attempt to estimate the probability of conjunctions of statements.
$$ {{\textit Prob}}(Qc \mid Qa_1 \ \&\ \cdots \ \&\ Qa_n) \ =\ \frac{{{\textit Prob}}(Qc \ \&\ Qa_1 \ \&\ \cdots \ \&\ Qa_n)}{{{\textit Prob}}(Qa_1 \ \&\ \cdots \ \&\ Qa_n)} $$
Following the conjunction strategy, our problem reduces to this one: We’re given the probabilities of the conjuncts, for example, the probability that sheep have at least 18% of their cortex in the frontal lobe. We’re also told the pairwise similarities among {a1, ⋯, an, c}, for example, the similarity of goats and sheep. From this we wish to construct a sensible estimate of the probability of the conjunction. This is all we need to derive conditional probability.
So what is a sensible estimate of the probability of a conjunction Qb1 & ⋯ & Qbn?1 One constraint is that the conjunction probability fall between the minimum and maximum allowed by the laws of chance. It can be shown (Neapolitan 1990) that the lowest possible value is \( \max\{0,\ 1 - n + \sum_{i=1}^n {{\textit Prob}}(Qb_i)\} \), that is, one plus the sum of the n conjunct probabilities minus n, or zero if the latter number is negative. The highest possible value is the minimum probability of the conjuncts. Thus, we have:
$$ \max\Big\{0,\ 1 - n + \sum\nolimits_{i=1}^n {{\textit Prob}}(Qb_i)\Big\} \ \le\ {{\textit Prob}}(Qb_1 \ \&\ \cdots \ \&\ Qb_n) \ \le\ \min\big\{{{\textit Prob}}(Qb_1), \cdots, {{\textit Prob}}(Qb_n)\big\} $$
For example, if Prob(Qa) = .8 and Prob(Qb) = .4 then the probability of the conjunction cannot exceed .4, and it can’t fall below .2:
$$ .2\ =\ 1 - 2 + .8 + .4 \ \le \ {{\textit Prob}}(Qa \ \&\ Qb) \ \le\ \min\{.4, .8\}\ =\ .4. $$
Now notice that if a and b were identical—had maximal similarity—then Qa and Qb would express the same proposition. So the conjunction of Qa with Qb would be logically equivalent to Qa, and also to Qb. In this case, the probability of the conjunction would fall at the upper bound of the possible interval, namely, the minimum of the conjunct-probabilities. In contrast, the conjunction makes a logically stronger claim to the extent that a and b are dissimilar, so low similarity should situate the probability of the conjunction closer to the bottom of the permitted interval. We allow the similarity of a and b to determine a position between these extremes. The conjunction seen earlier serves as illustration. Recall that its probability must fall in the interval from .2 to .4. The probability we construct for it is the weighted sum of these endpoints, where the weights are given by the similarity of a to b. Summarizing this example:
$$ {{\textit Prob}}(Qa \ \&\ Qb) \ =\ {{\textit similarity}}(a,b) \times .4 \ +\ \big(1 - {{\textit similarity}}(a,b)\big) \times .2 $$
In the general case we’re given a conjunction with n conjuncts. As usual, we assume that we know the unconditional probabilities of its conjuncts. This allows us to compute the lower and upper bounds on its possible probability. We take the similarity-factor to be the minimum of all the pairwise similarities among the objects appearing in the conjunction. Then we estimate the probability of the conjunction to be the weighted sum of the lower and upper bounds, where similarity controls the weights. That is:
$$ {{\textit Prob}}(Qb_1 \ \&\ \cdots \ \&\ Qb_n) \ =\ s \times \min\big\{{{\textit Prob}}(Qb_1), \cdots, {{\textit Prob}}(Qb_n)\big\} \ +\ (1 - s) \times \max\Big\{0,\ 1 - n + \sum\nolimits_{i=1}^n {{\textit Prob}}(Qb_i)\Big\}, $$
where \(s\) is the minimum of the pairwise similarities among the objects \(b_1 \cdots b_n\).
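The construction can be summarized in a few lines of code. The sketch below is our own rendering (function and variable names are ours); it takes the conjunct probabilities and the pairwise similarities among the objects involved, computes the two bounds, and returns the convex combination weighted by the minimum similarity. The similarity value .5 in the usage example is an arbitrary stand-in, not a figure from the text.

def conjunction_prob(probs, pair_sims):
    """Estimated probability of Qb1 & ... & Qbn.

    probs:     unconditional probabilities Prob(Qb_i), one per conjunct.
    pair_sims: pairwise similarities among b_1 ... b_n (empty if n = 1).
    """
    n = len(probs)
    if n == 1:
        return probs[0]
    lower = max(0.0, 1 - n + sum(probs))   # lowest value allowed by the laws of chance
    upper = min(probs)                     # highest value allowed
    s = min(pair_sims)                     # minimum pairwise similarity
    return s * upper + (1 - s) * lower     # position chosen within the permitted interval

# The example above: Prob(Qa) = .8, Prob(Qb) = .4, and, say, similarity(a, b) = .5
# gives .5 * .4 + .5 * .2 = .3, inside the permitted interval [.2, .4].
print(conjunction_prob([0.8, 0.4], [0.5]))   # -> 0.3 (up to floating point)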
As a sanity check, it can be verified that our scheme satisfies the conjunction law:
$$ {{\textit Prob}}(Qb_{1} \ \&\ \cdots \ \&\ Qb_{n})\ \ge\ {{\textit Prob}}(Qb_{1} \ \&\ \cdots \ \&\ Qb_{n} \ \&\ Qb_{n+1}) $$
This law helps to motivate the use of minimum similarity in our model; for, it would not be satisfied had we relied on the average or the maximum instead.
We must also consider conditional probabilities that involve negated statements. They arise from negated premises or conclusion, as illustrated here:
$$ {{\textit Prob}}(Qc \mid \neg Qa ) = \frac{{{\textit Prob}}(Qc\ \&\ \neg Qa)}{{{\textit Prob}}(\neg Qa)} $$
In this case, we use the same procedure as before except that we substitute one minus the unconditional probability for negated statements. We also use one minus the similarity of two objects that appear in statements of opposite polarity. For example, the high similarity of lions to cougars should lower the probability of \(Q(\mbox{lion}) \ \&\ \neg Q(\mbox{cougar})\) and raise the probability of \(\neg Q(\mbox{lion}) \ \&\ \neg Q(\mbox{cougar})\).
With these principles, probabilities may be constructed for arbitrary conjunctions. Lower and upper bounds are computed as before, after negations are transformed by one-minus. The minimum pairwise similarity is also determined, using the one-minus operation on similarities associated with conjuncts of opposite polarity. Then a position in the permitted interval is selected via a convex sum, and this is taken to be the probability of the conjunction. To illustrate:
$$ {{\textit Prob}}(Qc \ \&\ \neg Qa) \ =\ s \times \min\big\{{{\textit Prob}}(Qc),\ 1 - {{\textit Prob}}(Qa)\big\} \ +\ (1 - s) \times \max\big\{0,\ {{\textit Prob}}(Qc) + \big(1 - {{\textit Prob}}(Qa)\big) - 1\big\}, $$
where \(s = 1 - {{\textit similarity}}(c,a)\), since the two conjuncts have opposite polarity.
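The full procedure, including the one-minus transforms for negation, can be sketched as follows. The code and naming are ours; the probability and similarity values in the usage example are taken from Tables 1 and 2 below (lions 0.633, cougars 0.564, lion–cougar similarity 0.650 in left VTC).

from itertools import combinations

def signed_conjunction_prob(literals, prob, sim):
    """Estimated probability of a conjunction of (possibly negated) statements.

    literals: list of (object, polarity) pairs; True means Qb, False means not-Qb.
    prob:     dict of unconditional probabilities Prob(Qb).
    sim:      function returning the similarity of two objects.
    """
    ps = [prob[b] if pos else 1 - prob[b] for b, pos in literals]   # one-minus for negations
    n = len(ps)
    if n == 1:
        return ps[0]
    lower = max(0.0, 1 - n + sum(ps))
    upper = min(ps)
    s = min(sim(a, b) if pa == pb else 1 - sim(a, b)                # one-minus for opposite polarity
            for (a, pa), (b, pb) in combinations(literals, 2))
    return s * upper + (1 - s) * lower

# The high similarity of lions to cougars keeps Prob(Q(lion) & not-Q(cougar))
# low relative to Prob(not-Q(lion) & not-Q(cougar)):
prob = {"lion": 0.633, "cougar": 0.564}
sim = lambda a, b: 1.0 if a == b else 0.650
print(signed_conjunction_prob([("lion", True), ("cougar", False)], prob, sim))
print(signed_conjunction_prob([("lion", False), ("cougar", False)], prob, sim))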
Our method satisfies important coherence conditions involving upper and lower bounds along with the conjunction law, as seen above. Moreover, it is easy to show that the method assigns zero probability to contradictions. That is, according to the scheme described above:
$$ {{\textit Prob}}\big(Qb_{\!1}\ \&\ \ldots\ \&\ Qa\ \&\ \ldots\ \&\ \neg Qa\ \&\ \ldots\ \&\ Qb_{\!n}\big) = 0. $$
Satisfaction of this principle once again relies on the role of minimum similarity in our model; use of average or maximum would assign positive probability to some contradictions.
On the other hand, there is no guarantee that the conjunctions over a given set of statements are assigned probabilities that sum to one. For example, here are all eight conjunctions over the three statements Qc, Qa, Qb.

Qc & Qa & Qb

Qc & Qa & \(\neg Qb\)

Qc & \(\neg Qa\) & Qb

\(\neg Qc\) & Qa & Qb

\(\neg Qc\) & \(\neg Qa\) & Qb

\(\neg Qc\) & Qa & \(\neg Qb\)

Qc & \(\neg Qa\) & \(\neg Qb\)

\(\neg Qc\) & \(\neg Qa\) & \(\neg Qb\)

Our method does not in general assign them eight numbers that sum to unity. The matter can be rectified through normalization but there is no need for this in the present context because conditional probabilities are ratios and normalizing has no numerical impact. For example, to construct \({{\textit Prob}}(Qc\mid Qa,\neg Qb)\), we calculate the probabilities of \(Qc \ \&\ Qa \ \&\ \neg Qb\) and \(\neg Qc \ \&\ Qa \ \&\neg Qb\), then insert them into the usual formula for conditional probability (equivalent to the definition shown earlier):
$$ {{\textit Prob}}(Qc \mid Qa, \neg Qb) \ =\ \frac{{{\textit Prob}}(Qc \ \&\ Qa \ \&\ \neg Qb)}{{{\textit Prob}}(Qc \ \&\ Qa \ \&\ \neg Qb) \ +\ {{\textit Prob}}(\neg Qc \ \&\ Qa \ \&\ \neg Qb)}. $$
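Putting the pieces together, conditional probabilities follow by the ratio just displayed. A minimal sketch, reusing the signed_conjunction_prob function from the earlier fragment (the literal names "a", "b", "c" in the comment are placeholders):

def conditional_prob(conclusion, premises, prob, sim):
    """Prob(conclusion | premises); conclusion and premises are (object, polarity) pairs."""
    obj, pol = conclusion
    num = signed_conjunction_prob([conclusion] + premises, prob, sim)
    alt = signed_conjunction_prob([(obj, not pol)] + premises, prob, sim)
    return num / (num + alt)   # ratio form; explicit renormalization is unnecessary

# For instance, Prob(Qc | Qa, not-Qb) would be
# conditional_prob(("c", True), [("a", True), ("b", False)], prob, sim).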

It remains to test whether the scheme just presented approximates human intuition about chance.

3 Behavioral Data

3.1 Eliciting Estimates of Probability

To test our theory, twenty undergraduates were asked to reason about these categories

Bears

Camels

Cougars

Dolphins

Elephants

Giraffes

Hippos

Horses

Lions

and the predicate (Q): have at least 18% of their cortex in the frontal lobe. The students offered probabilities for 40 arguments with the forms shown here:

Form                                          Number of instances
\({{\textit Prob}}(Qc\mid Qa)\)                        5
\({{\textit Prob}}(Qc\mid \neg Qa)\)                   5
\({{\textit Prob}}(\neg Qc\mid Qa)\)                   5
\({{\textit Prob}}(\neg Qc\mid \neg Qa)\)              5
\({{\textit Prob}}(Qc\mid Qa, Qb)\)                    10
\({{\textit Prob}}(Qc\mid Qa, \neg Qb)\)               10

They also estimated the nine unconditional probabilities corresponding to the nine mammal categories.2 For example, they were asked:
[Display: a sample question, e.g. asking for the probability that horses have at least 18% of their cortex in the frontal lobe.]
The probabilities obtained were averaged across participants, yielding the numbers displayed in Table 1.
Table 1
Average estimates of the 40 conditional and 9 unconditional probabilities

Argument                                  Rated prob
dolphins | horses                         0.516
bears | hippos                            0.567
hippos | elephants                        0.678
camels | giraffes                         0.677
lions | cougars                           0.758
lions | \(\neg\)camels                    0.396
cougars | \(\neg\)horses                  0.421
dolphins | \(\neg\)horses                 0.414
giraffes | \(\neg\)camels                 0.381
elephants | \(\neg\)hippos                0.383
\(\neg\)bears | horses                    0.388
\(\neg\)dolphins | elephants              0.482
\(\neg\)lions | cougars                   0.394
\(\neg\)elephants | giraffes              0.401
\(\neg\)bears | dolphins                  0.462
\(\neg\)dolphins | \(\neg\)hippos         0.596
\(\neg\)horses | \(\neg\)bears            0.559
\(\neg\)elephants | \(\neg\)hippos        0.691
\(\neg\)camels | \(\neg\)lions            0.597
\(\neg\)giraffes | \(\neg\)cougars        0.605
lions | bears, dolphins                   0.690
camels | horses, giraffes                 0.714
dolphins | elephants, hippos              0.570
cougars | lions, giraffes                 0.723
elephants | dolphins, camels              0.633
camels | elephants, horses                0.674
giraffes | cougars, hippos                0.611
hippos | horses, bears                    0.656
bears | cougars, lions                    0.696
giraffes | horses, elephants              0.763
cougars | lions, \(\neg\)bears            0.654
elephants | hippos, \(\neg\)dolphins      0.662
giraffes | camels, \(\neg\)hippos         0.622
camels | bears, \(\neg\)dolphins          0.510
horses | giraffes, \(\neg\)cougars        0.573
elephants | hippos, \(\neg\)bears         0.626
elephants | lions, \(\neg\)camels         0.455
lions | cougars, \(\neg\)horses           0.680
hippos | camels, \(\neg\)dolphins         0.534
horses | bears, \(\neg\)giraffes          0.499
horses                                    0.583
hippos                                    0.563
dolphins                                  0.559
bears                                     0.588
elephants                                 0.601
giraffes                                  0.565
camels                                    0.550
cougars                                   0.564
lions                                     0.633

Estimates of probability. The common predicate (suppressed in the table) for all arguments was: have at least 18% of their cortex in the frontal lobe. Each number is the average of 20 responses.

Then we attempted to predict the conditional probabilities on the basis of similarity plus the unconditional probabilities, using the scheme described above. The unconditional probabilities are available from the data, having been directly elicited. But what shall we use as our measure of similarity?

3.2 Similarity Untainted by Inductive Inference

We could ask the students to provide numerical estimates of the similarity of pairs of species, using a rating scale. But such a procedure would not fairly test Hume’s idea. His thesis was that perceived similarity gives rise to judged probability. We must not inadvertently test the converse idea, that perceived probability gives rise to judged similarity. After all, it could be that lions and cougars seem similar because inferences from one to the other strike us as plausible. Then similarity would indeed be related to induction but not in the way Hume intended. To focus on Hume’s idea, we need to operationalize similarity without allowing probability estimates to play an implicit role.

For this purpose, we adopt the idea that similarity of categories—like horses and camels—is determined by their respective neural representations. To quantify neural similarity, we rely on functional magnetic resonance imaging (fMRI) to identify the patterns of activation that support the categories; proximity of categories is then measured in physical terms.

4 Neurophysiological Data

4.1 Obtaining Activation Maps

Twelve new subjects were recruited to perform a classification task during scanning. On each trial, they viewed a category label for one of the nine mammals used in the behavioral study on probability estimates. Then they viewed a series of images of the designated mammal. But occasionally there was an intruder that had to be signaled via button-press. See Fig. 1. These “catch trials” were intended to ensure that the images were processed; only trials without intruders were analyzed further. There was also a series of control trials substituting phase-scrambled versions of the original images (hence, unidentifiable but with the same spatial frequencies and overall luminance). In the latter trials, subjects searched for a low-contrast cross hatch (#), present only in catch trials (excluded from analysis). Details of the procedure and analysis are provided in the Appendix.
Fig. 1

fMRI procedure. Main task: spot any animal that is not named at the outset of the trial. In the case pictured here, the zebra must be signaled (a “catch trial”). In most trials there were no intrusions, and brain activations were used only in such non-catch trials. In control trials, phase-scrambled versions of the mammal images were presented, and subjects signaled the presence of a faint cross hatch. Again, these cases served as “catch trials” and were excluded from the analysis

None of the fMRI subjects participated in the probability assessments. Also, no mention was made of similarity or probability either before or during scanning. The fMRI subjects simply verified the category of mammal images (or verified in control trials that # was absent).

The fMRI procedure parcels the brain into roughly 50,000 cubes called voxels, 3 millimeters on a side. For each voxel, we obtained a measure of the metabolic activity provoked by recognizing bears, another value for giraffes, and so forth. The measure is the β coefficient for a given mammal’s regressor in the best linear model of the voxel’s behavior in the experiment; see the Appendix. These values were averaged across the 12 subjects (after projection of each brain onto a common template). Average activations were also obtained when viewing phase-scrambled pictures of each mammal. For each mammal, the activations arising from viewing its scrambled version were subtracted from the activations produced by the verification task. The resulting distribution of corrected values (obtained from the subtraction) induces a “map” of activations over the brain. There is one such map for each mammal. We compared the maps for each pair of mammals to estimate similarity. The method of comparison will be explained shortly.
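In outline, the construction of the activation maps can be expressed as follows. This is only a schematic rendering with made-up array shapes and random placeholder data; the actual β estimation is described in the Appendix.

import numpy as np

# Placeholder arrays standing in for the estimated beta coefficients:
# one value per subject, mammal, and voxel, for intact and for phase-scrambled
# pictures (12 subjects, 9 mammals, and a reduced voxel count for illustration;
# the real analysis involves roughly 50,000 voxels on a common template).
rng = np.random.default_rng(0)
task_beta = rng.normal(size=(12, 9, 1000))
scrambled_beta = rng.normal(size=(12, 9, 1000))

# Average across subjects, then subtract the scrambled baseline for each mammal.
# The result is one activation "map" over voxels per mammal.
maps = task_beta.mean(axis=0) - scrambled_beta.mean(axis=0)   # shape (9, 1000)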

4.2 Choosing Neural Regions

First we address the question: which structure of the brain should be mapped, that is, where are mammal categories located? It has been observed that lesions to the left temporal lobe are sometimes associated with specific deficits in knowledge of biological categories including mammals, vegetables, and fruit, sparing knowledge of human artifacts like furniture and tools (Warrington and Shallice 1984; Saffran and Schwartz 1994; Capitani et al. 2003). Also, single cell recording from inferior temporal cortex in monkeys reveals neurons that are responsive to natural categories (although their specificity may be influenced by size and position in the visual field, among other features; see Zoccolan et al. 2007). Partially converging information is available from human neuroimaging. A review of studies by Martin (2001) points to activity (often bilateral) in the lateral fusiform gyrus, medial occipital cortex, and superior temporal sulcus when subjects are asked to identify and name pictures of animals. Inferior regions of the left occipital cortex seem also to be recruited when viewing pictures of animals in contrast to tools (Martin et al. 1996). Broadly consistent findings emerge from studies of processing category-names (e.g., Perani et al. 1999, who report left fusiform gyrus activations for animals). Kounios et al. (2003) reach similar conclusions in their summary of the literature. There are, however, many inconsistent findings in both the clinical and neuroimaging literature (Caramazza 2000; Joseph 2001; Gerlach 2007). It is also unclear whether any given brain locus holds an integrated animal representation rather than perceptual or abstract features associated with it (e.g., visual properties in the left fusiform gyrus; see Thompson-Schill et al. 2006).

Moreover, structures beyond the temporal and occipital lobes have been implicated in the manipulation of conceptual knowledge. For example, Freedman et al. (2001) report categorical responding to pictures of cats and dogs by neurons in the lateral prefrontal cortex of monkeys.3 Human neuropsychology and neuroimaging also implicate the premotor cortex of the frontal lobe in object categorization, especially of manipulable objects such as fruit, tools, and clothing, although such evidence is not univocal (e.g., Martin et al. 1996; Chao et al. 1999; Gerlach et al. 2002; for reviews, see Gainotti 2000 and Martin 2007). There has been no similar report for mammal categories, however.

Consonant with the appearance in the previous literature of both temporal and occipital structures underlying the representation of mammals, we investigated two broad regions of the brain that include some of the principal areas discussed above. One region is left hemispheric Brodmann Area 19 (left BA19), overlapping the lateral occipital gyri. The other is left ventral temporal cortex (left VTC) comprising the inferior temporal gyrus, fusiform gyrus, and the parahippocampal gyrus. See Fig. 2.
Fig. 2

Two regions for extracting similarity from the brain. Horizontal slices for LVTC and LBA19 (positions shown at the bottom right). The two regions were exploited separately to compute the neural similarity of mammal concepts

4.3 Comparing Activation Maps

Suppose that R is one of our two regions, comprised of voxels v1 ⋯ vn. Each vi has a level of activation hi for horse, an activation ci for camel, and so forth. Then \(\Sigma (h_i - c_i)^2\) (the sum of squared deviations) is a natural measure of the dissimilarity in R of the respective neural representations of horses and camels. The calculation of this sum can be summarized as follows.
Voxel    Activation for horse    Activation for camel    Squared deviation
v1       h1                      c1                      (h1 - c1)^2
v2       h2                      c2                      (h2 - c2)^2
v3       h3                      c3                      (h3 - c3)^2
⋮        ⋮                       ⋮                       ⋮
                                                         Sum of squared deviations

The nine mammals give rise to 36 \(= \dbinom{9}{2}\) such computations of dissimilarity. To convert them to similarity, each is first inverted (divided into 1). Then the 36 resulting numbers are linearly scaled to run from \(\frac13\) to \(\frac23\). Occupying just the middle of the unit interval leaves room for pairs less similar than ours (e.g., moles compared to dolphins), as well as pairs more similar (e.g., camels versus dromedaries).
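The passage from activation maps to the 36 similarities can be summarized in code. This is our sketch; the dict-of-arrays representation and the function name are conveniences for illustration, not the authors' implementation.

import numpy as np
from itertools import combinations

def neural_similarities(maps):
    """maps: dict mapping each mammal to its 1-D array of activations in a region.

    Returns pairwise similarities, inverted and linearly scaled to [1/3, 2/3]."""
    pairs = list(combinations(sorted(maps), 2))
    # Dissimilarity: sum of squared deviations across the region's voxels.
    dissim = {p: float(np.sum((maps[p[0]] - maps[p[1]]) ** 2)) for p in pairs}
    inv = {p: 1.0 / d for p, d in dissim.items()}        # invert each dissimilarity
    lo, hi = min(inv.values()), max(inv.values())
    # Linear rescaling so the resulting values run from 1/3 to 2/3.
    return {p: 1/3 + (v - lo) / (hi - lo) * (1/3) for p, v in inv.items()}

# e.g. sims = neural_similarities(dict(zip(mammal_names, maps)))
# where mammal_names (hypothetical) lists the nine category labels.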

Each of the two neural regions yields its own set of 36 similarities via the foregoing procedure. They are shown in Table 2. Corresponding similarities are highly correlated (r = .840, N = 36), suggesting a common mental reality.4
Table 2
Similarities computed from left ventral-temporal cortex and from left BA 19

Mammals                  Ventral temporal    BA 19
lion, cougar             0.650               0.591
hippo, elephant          0.612               0.569
giraffe, camel           0.605               0.641
giraffe, horse           0.581               0.617
camel, horse             0.628               0.632
dolphin, horse           0.365               0.422
lion, bear               0.656               0.628
horse, elephant          0.592               0.611
elephant, camel          0.593               0.583
hippo, giraffe           0.602               0.622
bear, cougar             0.658               0.510
dolphin, hippo           0.418               0.463
bear, horse              0.620               0.581
giraffe, dolphin         0.395               0.401
dolphin, elephant        0.333               0.333
elephant, giraffe        0.593               0.598
camel, hippo             0.649               0.651
hippo, bear              0.653               0.590
hippo, horse             0.608               0.605
horse, cougar            0.608               0.576
dolphin, bear            0.409               0.362
bear, elephant           0.656               0.564
horse, lion              0.603               0.666
camel, cougar            0.659               0.590
cougar, giraffe          0.660               0.579
elephant, lion           0.646               0.636
camel, lion              0.630               0.667
bear, giraffe            0.636               0.535
camel, bear              0.639               0.596
elephant, cougar         0.630               0.500
hippo, cougar            0.644               0.557
dolphin, cougar          0.433               0.451
dolphin, camel           0.382               0.453
lion, hippo              0.667               0.652
lion, giraffe            0.620               0.661
dolphin, lion            0.404               0.467

Two estimates of similarities. Each number is derived from the average squared deviation of the two β-coefficients associated with a given voxel when it is “viewing” the respective mammals. The numbers for each area are normalized to the \(\left[\frac13, \frac23\right]\) interval. (See the text for more complete explanation.) The two estimates for the 36 similarities correlate with each other at r = .840.

5 Predicting Conditional Probabilities

Our neural measure is uncontaminated by use of strength-of-inference as an index of similarity; for, the neural measure was obtained from mammal-stimuli individually, with no mention of similarity or probability. Relative to the model of inductive strength advanced above, a pure test of Hume’s thesis is therefore possible. It suffices to enter neural similarity into the model, along with the unconditional probabilities culled directly from subjects. The predictions generated thereby can then be compared to the results of the probability experiment.

Specifically, the experiment produced 40 numbers, corresponding to the arguments shown in Table 1. Each is an average estimate of conditional probability, to be paired with the corresponding probability calculated from our model based on neural similarity. Because similarity was calculated twice (on the basis of two neural regions), predictions are evaluated separately for left VTC and left BA19.

The relation between predictions and observations is plotted in Fig. 3 for each region. A reliable linear association is discernable in each case (Pearson r = .716, .728, for left VTC and left BA19, respectively).
Fig. 3

Scatter plots for results based on two neural regions. Predicted versus observed conditional probabilities: predictions are obtained using similarities computed from left ventral-temporal cortex and from left BA 19

The predictive accuracy of the model cannot be attributed exclusively to the use of neural similarity; the model also rests on estimates of unconditional probability, and these were obtained from the same subjects whose conditional probabilities are at issue. To isolate the role of similarity, we substituted for neural similarity 36 random numbers drawn uniformly from \(\left[\frac13, \frac23\right]\). In 60 random trials, the average correlation between predicted and observed conditional probabilities was .405. No random trial reached r = .716.
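The control analysis can be sketched as follows. This is schematic only: predict, observed, and mammal_pairs are assumed to be supplied by the reader, wrapping the construction of Section 2 and the ratings of Table 1.

import numpy as np

def random_similarity_baseline(predict, observed, mammal_pairs, trials=60, seed=0):
    """Correlation between model and data when neural similarity is replaced by noise.

    predict(sims) should return the 40 model predictions given one similarity
    value per mammal pair; observed holds the 40 average ratings from Table 1."""
    rng = np.random.default_rng(seed)
    rs = []
    for _ in range(trials):
        sims = {p: rng.uniform(1/3, 2/3) for p in mammal_pairs}       # random similarities
        rs.append(float(np.corrcoef(predict(sims), observed)[0, 1]))  # Pearson r per trial
    return sum(rs) / len(rs), max(rs)                                 # mean and best trial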

6 Discussion

6.1 Other Avenues to Neural Similarity

We calculated similarity from left VTC and left BA 19 because these areas are suggested by the previous literature devoted to the neural representation of natural categories like animals. Certain other areas, however, yield equally good results. For example, when similarities are computed from the left primary visual cortex, the correlation between predicted and observed estimates of conditional probability is .707.

Squared deviation is perhaps the simplest approach to neural similarity but at least one other technique works as well. Given a neural region R and mammal m, the alternative assigns a point in three-dimensional space that reflects the overall position of the activations in R in response to m. The similarity between two mammals is then measured by the Euclidean distance between the points assigned to them (with inversion and linearization to \(\left[\frac13, \frac23\right]\), as before). Used in the rhinal sulcus of the temporal lobe, this index of similarity predicts conditional probability at r = .728. On the other hand, most other neural structures have no predictive success under either of the approaches to similarity discussed here. Understanding how and where similarity is coded in the brain is a topic of current investigation (Weber et al. 2009).
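The alternative index can be sketched as follows, reading “overall position of the activations” as an activation-weighted centroid of voxel coordinates; that reading, and the use of absolute activation as the weight, are our assumptions rather than details given in the text.

import numpy as np

def centroid(coords, activations):
    """Activation-weighted mean position of a region's voxels.

    coords: (n_voxels, 3) array of voxel coordinates; activations: (n_voxels,) map."""
    w = np.abs(activations)
    return (coords * w[:, None]).sum(axis=0) / w.sum()

def centroid_dissimilarity(coords, map_a, map_b):
    """Euclidean distance between two mammals' centroids in the region
    (inversion and rescaling to [1/3, 2/3] would follow, as before)."""
    return float(np.linalg.norm(centroid(coords, map_a) - centroid(coords, map_b)))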

6.2 A Lower Bound for Conjunctions Based on Independence

In our theory of inductive strength, the value of \({{\textit Prob}}(Qb_{\!1} \ \&\ \cdots \ \&\ Qb_{\!n})\) is situated in the interval from \( \max\{0, 1 \!-\! n\! +\! \sum_{i=1}^n {{\textit Prob}}(Qb_i)\}\) to \(\min\{{{\textit Prob}}(Qb_{\!1}),\cdots, {{\textit Prob}}(Qb_{\!n})\}\). Similarity is used to choose a point in the interval, with low similarity pushing the point to the lower bound. Let us consider changing the lower bound to the product of the probabilities of the conjuncts: \(\prod_{i=1}^n {{\textit Prob}}(Qb_i)\). The latter bound embodies the idea that low similarity signals the stochastic independence of the conjuncts rather than their incompatibility. This might correspond better to what Hume had in mind since he seems to take the absence of similarity to reflect no reason for belief rather than reason for disbelief (Cohen 1980).

It is therefore worth reporting that the revised model with multiplicative lower bound underperforms the original model. In every neural region examined (and with both measures of similarity), correlations between observed and predicted probabilities are about 0.1 lower for the revised model. Also, the independence bound implies \({{\textit Prob}}(Qb\ \&\ \neg Qb) > 0\) whenever \(0 < {{\textit Prob}}(Qb) < 1\), a coherence violation.
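To see the violation concretely (our arithmetic, applying the rules of Section 2): suppose \({{\textit Prob}}(Qb) = .6\). In the conjunction \(Qb \ \&\ \neg Qb\) the similarity factor is \(1 - {{\textit similarity}}(b,b) = 0\), so the estimate falls at the lower bound. With the original bound this is \(\max\{0,\ .6 + .4 - 1\} = 0\), as required; with the multiplicative bound it is \(.6 \times .4 = .24 > 0\).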

6.3 Limitations and Extensions

Hume’s thesis about induction has here been examined through the lens of a particular model of probability judgment, which starts from unconditional probability and pairwise similarity. The model cannot generate arbitrary distributions. A joint distribution over the statements Qb1, Qb2 ⋯ Qbn (where Q is a predicate and b1 ⋯ bn are objects) requires 2n − 1 numbers in general. The scheme described here specifies distributions based on only \(n + \dbinom{n}{2}\) numbers (n unconditional probabilities and all pairwise similarities). It follows that our method must omit many potential distributions. This kind of compression, however, may be compatible with describing aspects of human judgment, which likely chooses distributions from a limited set in most circumstances.5
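For example, with the nine categories used in the experiment, a full joint distribution over \(Qb_1 \cdots Qb_9\) calls for \(2^9 - 1 = 511\) numbers, whereas the present scheme draws on only \(9 + \dbinom{9}{2} = 45\).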

Even if our method corresponds to the distributions that describe human judgment, we have not provided evidence that reasoning proceeds by constructing probabilities for conjunctions. Without such evidence, our model should be interpreted as describing just an input-output relation (unconditional probabilities and similarities in, conditional probability out).

Further progress in constructing a psychological theory will require distinctions among predicates. The predicates in the following list are adapted to the present model because they have biological content without evoking detailed knowledge in the minds of most college students.
[Display: a list of sample predicates with biological content that does not evoke detailed knowledge.]
They are also “shareable,” in contrast to the predicate eats more than half the grass in its habitat, which can’t be true of different species in the same habitat. Our model needs adjustment in the face of non-shareability.
The choice of predicate also affects similarity, as illustrated by the following arguments.
[Display: two sample arguments over the same categories but with different predicates, the first physiological, the second inviting considerations such as expense and public interest.]
The first evokes physiology and may be connected to the default representation of mammals in the mind. The second shifts the criteria for relevant similarity, perhaps allowing influence by expense and public interest. The neural activations measured in response to given categories need to be elicited and interpreted in light of the predicate in play.

Other challenges arise when arguments display distinct predicates in premise and conclusion, or involve relations like preys on. Inferences involving non-natural kinds—like artifacts and political parties—bring fresh distinctions to light. Accommodation is also needed for the tendency of even well-educated respondents to issue probabilistically incoherent estimates of chance, or to judge similarity asymmetrically (Tversky 1977; but see also Aguilar and Medin 1999). Confronting these complexities is inevitable for the development of any theory of human inductive judgment. The data presented here suggest merely that progress will involve similarity in something like the sense Hume had in mind.

6.4 Hume

In the foregoing discussion we’ve interpreted Hume’s thesis as a psychological claim, namely, that inductive inference (as people actually perform it) is driven by similarity. Our formal rendition of this claim enriches the determinants of inductive strength by appealing to the prior probabilities of premises and conclusion. Such additions notwithstanding, the predictive success of the model (limited though it be) supports Hume’s thesis.

To test the thesis as Hume intended it, we relied on a measure of similarity that is free from contamination by inferential reasoning. The measure rests on comparison of the neural representations of mammal-categories, in the absence of judgments of similarity or probability. This is not to deny that over many years, inferences about the properties of mammals might affect how they are ultimately coded in the brain. Thus, lions and cougars may be represented via a common pattern because they are perceived to share many properties. Neural similarity could therefore depend on inference via this route. Nonetheless, our measure of similarity is directly mediated by the mental representation of concepts rather than accessing the machinery of inductive cognition. This seems a fair way of making Hume’s claim precise. So perhaps our results sustain his claim that similarity provides a partial foundation for understanding inductive inference.

Footnotes

1. The model presented here is an alternative to “QPf,” described in Blok et al. (2007). It relies on some of the same concepts.

2. Participants were interviewed singly. Questions were posed in individualized random order via computer interface. Responses were made with a slider that controlled a field displaying numbers in the unit interval. The concept of conditional probability was reviewed prior to testing.

3. The same categoricity, however, was observed when the monkeys were trained on concepts involving cat/dog mixtures. Note that the LPFC is directly interconnected with inferior temporal cortex (Webster et al. 1994).

4. The results reported below are virtually identical if the activations for each mammal in a given region are “mean-centered.” To mean-center mammal M in region R, the average activation (β) in the map for M in R is subtracted from all the activations in the map prior to computing the sum of squared deviations between M and any other mammal.

5. A natural generalization of our model replaces (binary) similarity with the homogeneity of sets of n ≥ 2 objects. To illustrate, such a measure might assign greater homogeneity to { camels, horses, giraffes } compared to { camels, bears, lions }. All distributions over Qb1, Qb2 ⋯ Qbn can be generated by a model like ours that relies on n-ary homogeneity.

Acknowledgements

We thank Sergey Blok, James Haxby, Douglas Medin, and Lawrence Parsons for discussion and assistance in various stages of this work.

Copyright information

© Springer Science+Business Media B.V. 2009