
Are thought experiments “disturbing”? The case of armchair physics


Proponents of the “negative program” in experimental philosophy have argued that judgements in philosophical cases, also known as case judgements, are unreliable and that the method of cases should be either strongly constrained or even abandoned. Here we put to the test an account, offered by one of the program’s main proponents, of why philosophical cases may cause case judgements to be unreliable. We conducted our test with thought experiments from physics, which exhibit the very same supposedly “disturbing characteristics” as philosophical cases.



  1. Those results are conveniently summarized in chapter 2 of Machery (2017).

  2. Machery emphasizes that a case need not exhibit all three of these features in order to cause the unreliability of case judgements, nor does he claim that any of the characteristics necessitates unreliability. He only believes that they make unreliability more likely (Machery 2017, 112).

  3. We will discuss these and other thought experiments in detail later in the paper. See also “Appendix 2” for a detailed description of the thought experiments we used.

  4. Machery (2017) discusses other proposals for how to assess the epistemic credibility of case judgements (hopelessness and calibration), but provides arguments for why reliability is the best measure.

  5. Machery himself uses the predicates ‘reliable’/‘unreliable’ for judgments throughout his book (Machery 2017). At the same time, he doesn’t have much to say about the psychological processes generating the judgments and instead focuses on the characteristics of the environments in which, he believes, judgments become unreliable. We will turn to those in Sect. 3.2.

  6. In our experiment, we assumed that physicists would reject this judgement. See Sect. 3 and “Appendix 2” for further details.

  7. This reflects the unfortunate underrepresentation of women in physics (Sax et al. 2016).

  8. In total, 164 participants responded to our call and submitted their responses via the Qualtrics platform. Each participant had the option to leave their email address on an external Google Forms website in order to enter a lottery for 5 vouchers of $25 each or to receive information about the results of the study. We excluded partly incomplete questionnaires (n = 29), subjects who did not satisfy our minimal criteria for education, i.e. who were not enrolled in a PhD programme (n = 13), subjects who did not have at least an intermediate level of English (n = 1), and subjects who did not answer the comprehension questions correctly (n = 6).

  9. Cohen’s d is 0.79 for CORR and 1.08 for HCON.

  10. An anonymous referee for this journal suggested that Machery hasn't ruled out the very possibility of expertise in case judgements. At the very least, however, Machery is adamant that it is unlikely that the method of cases could be reformed in such a way that it would allow for expertise (see chapter 5.6 of Machery 2017).

  11. For a survey see Schlosshauer et al. (2013).

  12. We were influenced by a standard textbook in the philosophy of physics (Sklar 1992, 184).

  13. Cohen’s d is 0.51 for CORR and 0.86 for HCON.

  14. Although there is some evidence that professional philosophers are subject to the influence of extraneous effects, much of the negative program’s case (in particular outside the moral realm) rests on studies with the folk (see Machery 2017). So the expertise defense is still a live option.
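
Two of the notes above involve simple calculations that can be sketched in code. The following Python sketch first tallies the exclusions listed in note 8, assuming (the notes do not say so) that no participant falls under more than one exclusion criterion, and then shows one common way of computing Cohen’s d, using the pooled sample standard deviation. The function names and the toy scores are illustrative assumptions; they are neither the study’s data nor the authors’ analysis script.

```python
from math import sqrt
from statistics import mean, stdev


def remaining_after_exclusions(total, exclusions):
    """Subtract every exclusion count from the total.

    Assumption: the exclusion categories do not overlap.
    """
    return total - sum(exclusions.values())


def cohens_d(group_a, group_b):
    """Cohen's d for two independent samples, using the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * stdev(group_a) ** 2 + (n_b - 1) * stdev(group_b) ** 2
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_variance)


# Exclusion counts as reported in note 8.
exclusions = {
    "partly incomplete": 29,
    "education criteria not met": 13,
    "English below intermediate": 1,
    "comprehension questions failed": 6,
}
remaining = remaining_after_exclusions(164, exclusions)  # 115 if categories are disjoint

# Hypothetical toy scores, only to exercise the function.
toy_d = cohens_d([2, 4, 6], [1, 3, 5])  # 0.5 for these toy numbers
```

By Cohen’s conventional benchmarks (0.2 small, 0.5 medium, 0.8 large), the values reported in notes 9 and 13 would range from medium to large.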


  • Adleberg, T., Thompson, M., & Nahmias, E. (2015). Do men and women have different philosophical intuitions? Further data. Philosophical Psychology, 28(5), 615–641.

  • Alexander, J., & Weinberg, J. M. (2007). Analytic epistemology and experimental philosophy. Philosophy Compass, 2(1), 56–80.

  • Brown, J. R., & Fehige, Y. J. H. (2011). Thought experiments. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy.

  • Buckwalter, W., & Stich, S. (2014). Gender and philosophical intuition. In J. Knobe & S. Nichols (Eds.), Experimental philosophy. Oxford: Oxford University Press.

  • Devitt, M. (2011). Experimental semantics. Philosophy and Phenomenological Research, 82(2), 418–435.

  • Hales, S. D. (2006). Relativism and the foundations of philosophy. Cambridge, MA: MIT Press.

  • Horvath, J. (2010). How (not) to react to experimental philosophy. Philosophical Psychology, 23(4), 447–480.

  • Horvath, J., & Wiegmann, A. (2016). Intuitive expertise and intuitions about knowledge. Philosophical Studies, 173(10), 2701–2726.

  • Hyde, J. S. (2005). The gender similarities hypothesis. American Psychologist, 60(6), 581.

  • Knobe, J., & Nichols, S. (2017). Experimental philosophy. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy (winter 2017 edition).

  • Ludwig, K. (2007). The epistemology of thought experiments: First person versus third person approaches. Midwest Studies in Philosophy, 31(1), 128–159.

  • Machery, E. (2011). Thought experiments and philosophical knowledge. Metaphilosophy, 42(3), 191–214.

  • Machery, E. (2017). Philosophy within its proper bounds. Oxford: Oxford University Press.

  • Nado, J. (2014). Philosophical expertise. Philosophy Compass, 9(9), 631–641.

  • Nado, J. (2015). Philosophical expertise and scientific expertise. Philosophical Psychology, 28(7), 1026–1044.

  • Sax, L. J., Lehman, K. J., Barthelemy, R. S., & Lim, G. (2016). Women in physics: A comparison to science, technology, engineering, and math education over four decades. Physical Review Physics Education Research, 12(2), 020108.

  • Schlosshauer, M., Kofler, J., & Zeilinger, A. (2013). A snapshot of foundational attitudes toward quantum mechanics. Studies in History and Philosophy of Science Part B: Studies in History and Philosophy of Modern Physics, 44(3), 222–230.

  • Seyedsayamdost, H. (2015). On gender and philosophical intuition: Failure of replication and other negative results. Philosophical Psychology, 28(5), 642–673.

  • Sklar, L. (1992). Philosophy of physics. Boca Raton: CRC Press.

  • Starmans, C., & Friedman, O. (2012). The folk conception of knowledge. Cognition, 124(3), 272–283.

  • Williamson, T. (2011). Philosophical expertise and the burden of proof. Metaphilosophy, 42(3), 215–229.

  • Worrall, J. (2002). New evidence for old. In P. Gärdenfors (Ed.), In the scope of logic, methodology and philosophy of science (pp. 191–209). Dordrecht: Kluwer.


We are very grateful to Florence So for her invaluable advice on the methodology and her help with the statistics. We also thank Karen Brøcker and especially Anna Drożdżowicz for fruitful discussions and for many comments on earlier versions of this paper. We furthermore thank the audiences at the Aarhus workshop “Intuitions and the Expertise workshop”, the 2nd Workshop of the Experimental Philosophy Group in Osnabrück, and the work in progress seminar at the philosophy department at the University of British Columbia for their feedback. Thanks to Hreinn Gudlaugsson for producing the figures.


The work for this paper was generously supported by a Sapere Aude grant of the Danish Research Fund (DFF-4180-00071).

Author information

Corresponding author

Correspondence to Samuel Schindler.


Appendix 1

See Tables 6, 7, and 8.

Table 6 Background information about all participants and their education (p = .05)
Table 7 Information about the education of participants from the CONTROL group
Table 8 Areas of specialisation of physicists via self-identification (multiple answers allowed). Under “Other” subjects stated: particle physics (3); quantum physics (4); fluid mechanics (1); condensed matter physics (1); nuclear physics (1); applied physics and methodology (1)

Appendix 2

In our six tasks, we asked subjects to consider a scenario (S), answer a comprehension question (CS), and say whether they would agree with the judgements offered (J).

Stevin’s chain

S: “Imagine that somebody put a chain with evenly spaced metal balls with the same size and weight on top of an inclined frictionless plane.”

figure a

CS: “The inclined plane in the above scenario is …”

J: “Once the chain is released it will move sideways.” [This is incorrect.]

J is the negation of the judgment elicited in a famous thought experiment by Simon Stevin. With this thought experiment Stevin wanted to demonstrate the plausibility of his claim that for inclined planes with the same height, the force needed to keep weights in their position on those planes varies inversely with the planes’ lengths. More specifically, in the depicted scenario S Stevin used a pair of planes of which one was double the length of the other, and the number of weights placed on the longer plane was double the number on the shorter plane. According to his law, the weights on those two planes (which are connected to each other) should balance each other out. In order to drive home the point, Stevin connected the weights on those two planes with a chain of further weights (seen at the bottom of the figure). Now, if one were to deny Stevin’s ‘law’ and accept the statement that the entire chain moves to the right or to the left (as in J), it is not clear how one could deny that the chain keeps moving to the right or left indefinitely. After all, the chain is uniform (equal weights, equal distances between the weights), so any movement leaves its configuration unchanged. But this would constitute a perpetual motion, which is ruled out by the conservation of energy (the first law of thermodynamics). Interestingly, Ernst Mach, who discussed this task in his The Science of Mechanics, made this judgment on an “instinctive basis”.
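
The balance that Stevin’s law predicts can be checked numerically. The sketch below is a minimal illustration in modern terms, not Stevin’s own reasoning: it assumes frictionless planes, a uniform chain whose weight on each plane is proportional to that plane’s length, and the standard result that the pull along an incline equals the weight times height over length. The function name and the numbers are hypothetical.

```python
def pull_along_plane(weight, height, length):
    """Component of a weight acting along a frictionless incline.

    Uses sin(angle) = height / length, so the pull varies inversely
    with the plane's length for a fixed height and weight.
    """
    return weight * height / length


# Hypothetical numbers: two planes of equal height, one twice as long.
height = 1.0
short_length, long_length = 2.0, 4.0
weight_per_unit_length = 3.0  # uniform chain: weight scales with length

pull_short = pull_along_plane(weight_per_unit_length * short_length, height, short_length)
pull_long = pull_along_plane(weight_per_unit_length * long_length, height, long_length)

# Twice the weight on twice the length gives the same pull on both sides.
chain_is_balanced = abs(pull_short - pull_long) < 1e-12
```

Doubling the length doubles the weight resting on the plane but halves the fraction of that weight acting along it, which is why the two sides cancel.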

Galileo’s ships

S: “Imagine yourself standing at the coast and observing a ship moving with constant speed. The picture shows a snapshot of the ship’s movement at two points in time: t1 and t2. At t1, a cannon ball is dropped from the top of the mast of the ship and at t2 the cannon ball has reached its final position:”

figure b

CS: “As the observer you are located on …”

J: “When seen from the coast, the trajectory of the ball moving from t1 to t2 is as in the following picture:” [This is correct]

figure c

Galileo used this thought experiment in his Dialogue Concerning the Two Chief World Systems to persuade those believing in the geocentric system that a moving earth would not necessarily pose any problems for terrestrial physics (people were concerned that a moving earth would imply objects on earth flying through the air). The object falling from the top of the mast to its bottom on a moving ship illustrates that the trajectory of a falling object may appear straight when it in fact is composed of a vertical fall and a uniform horizontal motion (as in our second picture). Galileo’s ship also demonstrates what has come to be known as Galilean relativity: the classical laws of physics are the same in all inertial frames (and two inertial frames can be transformed into each other via Galilean transformations).
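
The relation between the two descriptions of the falling ball can be sketched with a Galilean transformation. The code below is an illustrative sketch under idealised assumptions (no air resistance, constant ship speed, a constant gravitational acceleration of 9.81 m/s²); the function names and the example values are hypothetical.

```python
from math import sqrt

G = 9.81  # gravitational acceleration in m/s^2 (assumed constant)


def fall_time(mast_height):
    """Time for the ball to fall from the top of the mast to the deck."""
    return sqrt(2 * mast_height / G)


def shore_frame_position(t, ship_speed, mast_height):
    """Ball position seen from the coast: uniform horizontal motion plus free fall."""
    return ship_speed * t, mast_height - 0.5 * G * t ** 2


def ship_frame_position(t, ship_speed, mast_height):
    """Ball position seen from the ship, via the Galilean transformation x' = x - v t."""
    x_shore, y = shore_frame_position(t, ship_speed, mast_height)
    return x_shore - ship_speed * t, y


# Hypothetical example: a 20 m mast on a ship moving at 5 m/s.
t_land = fall_time(20.0)
x_shore_at_landing, _ = shore_frame_position(t_land, 5.0, 20.0)
x_ship_at_landing, _ = ship_frame_position(t_land, 5.0, 20.0)
```

Seen from the ship the horizontal coordinate stays at zero, so the ball lands at the foot of the mast, while from the coast the same fall traces a curved path that carries the ball forward with the ship.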

Newton’s cannonball

S: “Imagine shooting a cannonball from a high elevation on earth into the distance. On the picture, you see the trajectories of a cannonball shot with (relatively) low speed, A, and with a higher speed, B. Cannon balls following A and B will land back on earth.”

figure d

CS: “The cannonball following trajectory B will land on …”

J: “Trajectory C is possible.” [This is correct]

Newton used this thought experiment in The System of the World to show that the orbital motion of the moon (and the planets around the sun) is accounted for by the same forces that act on earth (namely an inertial and a gravitational one).
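
For orientation, the launch speeds that separate the different kinds of trajectory can be estimated from Newtonian gravity. The sketch below assumes a horizontal launch from just above the surface of a spherical earth without air resistance, and it assumes that trajectory C in the figure is one that does not land back on earth (the figure is not reproduced here). The constants are standard approximate values; the function names are hypothetical.

```python
from math import sqrt

GRAVITATIONAL_CONSTANT = 6.674e-11  # m^3 kg^-1 s^-2
EARTH_MASS = 5.972e24               # kg
EARTH_RADIUS = 6.371e6              # m


def circular_orbit_speed(radius=EARTH_RADIUS):
    """Speed at which gravity supplies exactly the centripetal force."""
    return sqrt(GRAVITATIONAL_CONSTANT * EARTH_MASS / radius)


def escape_speed(radius=EARTH_RADIUS):
    """Speed above which the cannonball never returns."""
    return sqrt(2 * GRAVITATIONAL_CONSTANT * EARTH_MASS / radius)


def classify_launch(speed):
    """Rough classification of a horizontal launch at the surface."""
    if speed < circular_orbit_speed():
        return "lands back on earth"
    if speed < escape_speed():
        return "stays in orbit"
    return "escapes"


orbit_speed = circular_orbit_speed()  # roughly 7.9 km/s
```

Below the circular-orbit speed the cannonball behaves like trajectories A and B; at or above it, the ball keeps falling around the earth without landing.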

Galileo’s tower

S: “Imagine you connect a steel ball of 10 kg and a steel ball of 5 kg with a tight chain and drop the combined object from a high elevation in a vacuum. How does one determine the speed of fall of the combined object? One proposal is to average the speed of the two objects (when they fall separately): since 5 kg falls slower than 10 kg, the combined object will fall slower than the 10 kg ball. Another proposal is to add the weights: and since 15 kg > 10 kg, the combined object will fall quicker. Yet another proposal is that the combined object falls just as fast as the 10 kg ball on its own, since the weight makes no difference to the speed of fall.”

figure e

CS: “The combined object weighs …. kg.”

J: “The combined object will land just as fast on the ground as the 10 kg steel ball alone”. [This is correct]

This is another thought experiment by Galileo, expounded in his Dialogues Concerning Two New Sciences. Galileo used this thought experiment to demonstrate an internal contradiction in Aristotle’s physics, according to which heavier bodies fall faster to the ground than lighter ones: in situations such as the one described, Aristotelian physics implies a contradiction, namely that the combined object falls both quicker and slower than the heavy object alone. On the basis of this thought experiment (and other evidence), Galileo argued not only that Aristotelian physics is false, but also that all bodies fall at the same rate (which he could not demonstrate at the time, as he had no means for producing vacuums).
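
The contradiction and its resolution can be made concrete with the numbers from the scenario. The sketch below is an illustration only: it models the Aristotelian view as “speed of fall proportional to weight” with an arbitrary proportionality constant (an assumption made for the example), and it uses the standard free-fall formula for the vacuum case. All names are hypothetical.

```python
from math import sqrt

G = 9.81  # gravitational acceleration in m/s^2 (assumed constant)


def aristotelian_speed(weight_kg, k=1.0):
    """Toy Aristotelian rule: speed of fall proportional to weight."""
    return k * weight_kg


heavy, light = 10.0, 5.0
speed_heavy = aristotelian_speed(heavy)

# Proposal 1: the lighter ball slows the heavier one down (average the speeds).
speed_by_averaging = (aristotelian_speed(heavy) + aristotelian_speed(light)) / 2
# Proposal 2: the combined object is simply a heavier body (add the weights).
speed_by_adding = aristotelian_speed(heavy + light)

# The same rule yields "slower than" and "faster than" the 10 kg ball.
contradiction = speed_by_averaging < speed_heavy < speed_by_adding


def vacuum_fall_time(height_m, weight_kg):
    """Free-fall time in a vacuum; the weight argument is deliberately unused."""
    return sqrt(2 * height_m / G)


same_fall_time = vacuum_fall_time(50.0, 10.0) == vacuum_fall_time(50.0, 15.0)
```

The third proposal in the scenario corresponds to `vacuum_fall_time`, where the weight plays no role at all.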

Einstein’s Elevator

S: “Consider a person in the scenarios A and B. In A, the person is standing inside an elevator that sits on the ground. In B, the person is inside an elevator that is dragged through empty space somewhere in the universe with uniform acceleration (i.e., the speed increases constantly). In neither A or B can the person see what’s going on outside the elevator. In B, the person does not feel that the elevator is being dragged: the elevator appears perfectly stable to her. Suppose that the person wants to find out whether she is in A or B by dropping a ball to the floor.”

figure f

CS: “In B, the elevator is dragged through…”

J: “The person can determine whether she is in A or B by the manner in which the ball drops to the floor.” [This is incorrect]

Einstein (and Infeld) used this thought experiment to illustrate the equivalence between inertial and gravitational forces, which underlies the general theory of relativity. The trajectories of the balls will be indistinguishable in the two scenarios only if the elevator’s acceleration equals the gravitational acceleration at the surface of the earth. This is suggested in the thought experiment by the person in the elevator “not feeling” any drag.
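
The condition under which the two scenarios are indistinguishable can be sketched with elementary kinematics. The code below is a classical illustration, not a relativistic calculation: it compares the ball’s height above the floor in A (elevator at rest in a uniform gravitational field) and in B (elevator accelerating uniformly in empty space, ball moving inertially after release). The function names and values are hypothetical.

```python
G_EARTH = 9.81  # gravitational acceleration at the earth's surface in m/s^2


def height_in_resting_elevator(t, drop_height, gravity=G_EARTH):
    """Scenario A: the ball accelerates towards the floor under gravity."""
    return drop_height - 0.5 * gravity * t ** 2


def height_in_accelerating_elevator(t, drop_height, acceleration, speed_at_release=0.0):
    """Scenario B: the ball drifts at constant velocity while the floor accelerates."""
    ball = drop_height + speed_at_release * t
    floor = speed_at_release * t + 0.5 * acceleration * t ** 2
    return ball - floor


times = [0.0, 0.1, 0.2, 0.3, 0.4]
indistinguishable = all(
    abs(height_in_resting_elevator(t, 1.0)
        - height_in_accelerating_elevator(t, 1.0, G_EARTH, speed_at_release=3.0)) < 1e-9
    for t in times
)
distinguishable_if_slower = any(
    abs(height_in_resting_elevator(t, 1.0)
        - height_in_accelerating_elevator(t, 1.0, 5.0)) > 1e-3
    for t in times
)
```

With the elevator’s acceleration equal to `G_EARTH`, the ball approaches the floor in exactly the same way in both scenarios, whatever the elevator’s speed at the moment of release.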

Schrödinger’s cat

S: “Imagine a dog trapped in an opaque box. There is a very small amount of radioactive substance in the box: there is a probability of 50% that one atom of that substance decays within 1 hour. Whenever one atom of this substance decays, a Geiger counter will detect this atom and trigger the destruction of a flask containing a highly toxic substance. As soon as the flask breaks, the dog dies instantly. Suppose that the dog is kept in the box for one hour before the box is opened.”

figure g

CS: “If one atom decays, then the dog will …”

J: “Before the box is opened the dog is either dead or alive.” [Our expectation was that physicists should judge this as incorrect]

Erwin Schrödinger used this thought experiment to challenge Bohr and Heisenberg’s Copenhagen interpretation of quantum mechanics.

The wave function of quantum mechanics describes the system in terms of probabilities. According to the Copenhagen interpretation, this probabilistic description of the state of a physical system at some point in time describes the actual system and is not just an expression of our own ignorance. The system is also said to be in a “superposition” of states. When we measure the system, the wave function is said to “collapse” and the system adopts a definite state. Which state the system actually adopts upon measurement, however, cannot be determined within quantum mechanics.

Schrödinger’s reasoning was that if the Copenhagen interpretation were correct, then the cat in the box (in our case: a dog) should be in a state of superposition before the opening of the box (the “measurement”) causes a collapse of the wavefunction. However, since we would normally judge that the cat/dog does have a definite state before we open the box, the Copenhagen interpretation must be false. There are of course legitimate ways of avoiding this conclusion. In our analysis we presumed that the correct response in this task would be the denial of J (but see Sect. 3.4 for further discussion).
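
The talk of probabilities, superposition and collapse can be illustrated with a toy model. The sketch below is not a simulation of quantum mechanics; it merely assumes a two-state system with equal amplitudes (matching the 50% decay probability in the scenario), applies the Born rule to obtain outcome probabilities, and models “measurement” as drawing one definite outcome. The state labels and function names are hypothetical.

```python
import random
from math import sqrt


def outcome_probabilities(amplitudes):
    """Born rule: probability of each outcome is the squared magnitude of its amplitude."""
    norm = sum(abs(a) ** 2 for a in amplitudes.values())
    return {state: abs(a) ** 2 / norm for state, a in amplitudes.items()}


def measure(amplitudes, rng=random):
    """Toy 'collapse': pick one definite outcome with the Born-rule probabilities."""
    probabilities = outcome_probabilities(amplitudes)
    states = list(probabilities)
    weights = [probabilities[s] for s in states]
    return rng.choices(states, weights=weights, k=1)[0]


# Equal-amplitude superposition after one hour (50% decay probability).
superposition = {
    "atom decayed, dog dead": 1 / sqrt(2),
    "atom intact, dog alive": 1 / sqrt(2),
}
probabilities = outcome_probabilities(superposition)
observed = measure(superposition, random.Random(0))
```

Before `measure` is called the model contains only the amplitudes and no definite outcome, which is the feature of the Copenhagen picture that Schrödinger’s scenario was meant to challenge.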

Appendix 3

Question-by-question chi-square tests in the HCON count for correctly answered questions by physicists vs. non-physicists.

See Table 9.

Table 9 Chi-square tests in the HCON and CORR count comparing physicists and non-physicists for each of our thought experiments
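
For readers unfamiliar with the test, a question-by-question comparison of this kind can be sketched as a Pearson chi-square test on a 2 × 2 table (group by correct/incorrect). The sketch below uses the textbook formula without continuity correction, which may differ from the exact procedure the authors used, and the counts are hypothetical placeholders rather than values from Table 9.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value (1 degree of freedom) for the table
        [[a, b],
         [c, d]]
    without continuity correction.
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    statistic = n * (a * d - b * c) ** 2 / denominator
    # For 1 degree of freedom the chi-square tail probability is erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: rows = physicists / non-physicists,
# columns = answered correctly / answered incorrectly.
statistic, p_value = chi_square_2x2(30, 10, 15, 25)
```

A small p-value would indicate that the proportion of correct answers differs between the two groups for that thought experiment.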

Appendix 4

See Tables 10 and 11.

Table 10 Negative binomial regression analysis for the HCON count. According to Model 1 the marginal effect of having a PhD degree in physics (or studying towards one) is 1.45. Thus, having a PhD degree in physics (or studying towards one) is predicted to increase the number of questions answered correctly by 1.45. Model 2 includes control variables for gender, exposure of controls to physics at university, age, duration of task performance, and level of English. Model 3 interacts the factor ‘women’ with physicists and non-physicists. It predicts that women with a PhD degree in physics (or studying towards one) judge 1.52 more tasks correctly than women in the control group
Table 11 Negative binomial regression analysis for the CORR count. According to Model 1 the marginal effect of having a PhD degree in physics (or studying towards one) is 0.93. Model 2 includes control variables for gender, exposure of controls to physics at university, age, duration of task performance, and level of English. Model 3 interacts the factor ‘women’ with physicists and non-physicists. It predicts that women with a PhD degree in physics (or studying towards one) judge 0.89 more tasks correctly than women in the control group
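
To make the notion of a marginal effect in a count model concrete, the sketch below shows how the effect of a binary predictor can be read off a model with a log link, as negative binomial regression typically uses. It is an illustration under assumptions: the coefficients are invented so that the arithmetic is easy to follow, they are not the estimates from Tables 10 or 11, and the authors’ software may compute marginal effects differently (for instance, averaged over covariates).

```python
from math import exp, log


def predicted_count(intercept, coef_physicist, is_physicist):
    """Expected count under a log link: exp(intercept + coefficient * indicator)."""
    return exp(intercept + coef_physicist * (1 if is_physicist else 0))


def marginal_effect(intercept, coef_physicist):
    """Difference in expected counts between physicists and non-physicists."""
    return (predicted_count(intercept, coef_physicist, True)
            - predicted_count(intercept, coef_physicist, False))


# Hypothetical coefficients: controls expected to get 2 tasks right,
# physicists 1.5 times as many.
example_intercept = log(2.0)
example_coef = log(1.5)
effect = marginal_effect(example_intercept, example_coef)  # 3 - 2 = 1 additional task
```

Read this way, a marginal effect of 1.45 means that the model expects physicists to answer about one and a half more tasks correctly than the control group.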

Cite this article

Schindler, S., Saint-Germier, P. Are thought experiments “disturbing”? The case of armchair physics. Philos Stud 177, 2671–2695 (2020).

  • Method of cases
  • Disturbing characteristics
  • Machery
  • Armchair physics
  • Thought experiments
  • Experimental philosophy