Theorists have found it difficult to reconcile the unity of inner speech as a mental state kind with the diversity of its manifestations. I argue that existing views concerning the content of inner speech fail to accommodate both of these features because they mistakenly assume that its content is to be found in the ‘speech processing hierarchy’, which includes semantic, syntactic, phonemic, phonetic, and articulatory levels. Upon rejecting this assumption, I offer a position on which the content of inner speech is determined by voice processing, of which speech processing is but one component. The resulting view does justice to the idea that inner speech is a motley assortment of episodes that nevertheless form a kind.
Although see Hurlburt and Heavey (2018) for skepticism about reported frequencies of inner speech.
The nature of the speech processing hierarchy remains contested. Psycholinguists disagree about how information flows through the hierarchy – serially or in parallel, feedforward or feedback – and about the exact operations and sub-operations within the hierarchy (e.g., Fromkin, 1971; Dell, 1986). Despite these differences, psycholinguists tend to agree on the organization presented in Figure 1.
Although see Langland-Hassan (2018) for a contrasting position on phonemes, according to which phonemes are auditory.
Langland-Hassan (2018) and Gauker (2018) differ in their framing of concretism and abstractionism. Langland-Hassan seems to assume that inner speech always represents phonetic content, while Gauker seems to assume that inner speech never has a phonetic vehicle. This difference will not matter in my discussion of the views. For this reason, I will use ‘phonetic/auditory/speech sound component’ with the understanding that it translates as ‘phonetic/auditory/speech sound content or vehicle’.
This line of argument throws into relief a plausible alternative explanation of how I know the language of my inner speech: my knowledge that I am speaking English during inner or outer speech is non-observational in just the way that my knowledge that I am grabbing a glass may be non-observational (see Anscombe (2000)). On this account, I know that my inner speech is in English because I use English words in my inner speech, and this knowledge is not grounded in observation. Although Langland-Hassan seems to assume that knowledge of the language of our inner speech is gained by introspection, the alternative I have described rejects this restriction.
Gauker might reject Levelt’s speech control model, since Gauker may take it to depend on the doubtful existence of a language of thought. However, Levelt’s speech control model does not depend on the existence of a language of thought, since the control model is a model of peripheral processes of speech production, e.g., motor control processes, and not core processes, e.g., concept selection.
Langland-Hassan (2014) discusses two problems. One problem—call it ‘the kindhood problem’—concerns how distinct inner speech episodes with different combinations of contents from the speech processing hierarchy all count as being cases of inner speech (pp. 519–520). Another problem—call it ‘the binding problem’—concerns how it is that a single inner speech episode possesses a combination of different contents from the speech processing hierarchy (pp. 520–529). Although these problems are related, a solution to the one does not entail a solution to the other. In particular, an account that states what it is in virtue of which inner speech episodes with various combinations of contents count as being cases of inner speech does not thereby state how any one of those episodes possesses the combination of contents it does. This paper is intended to provide a solution to the kindhood problem, not the binding problem.
A remaining possibility is that a state counts as inner speech in virtue of being part of an aborted speech production process. The problem with this proposal, however, is that being part of a mental process does not individuate mental state kinds. For example, states corresponding to prediction, prediction error, precision, and data are assumed to be parts of the mental process of prediction error minimization (Clark, 2015). But these states do not belong to some further mental state kind in virtue of being part of the process of prediction error minimization. Or again, the mere fact that motor, sensory, and cognitive states are part of the speech perception process does not entail that they are themselves states of speech perception, nor that there is some further kind to which they belong.
One might object that my use of the concept of voice is equivocal. On the one hand, voice refers to information that is extracted from a signal, while on the other hand, voice refers also to the medium within which a signal is communicated. The charge of equivocation stems from the fact that I can come to represent someone’s voice – as a medium – only by extracting information from a signal. This can make it seem as if voice is also represented as information extracted from a signal (e.g., on a par with speech sound information). But this does not follow. Although I learn facts about a medium via the signal it carries, I nevertheless continue to represent the medium as a medium. The same holds for voice: I learn how a voice sounds or whose voice it is by extracting information from a signal, but in so doing I continue to represent the voice as a medium through which the signal is communicated.
One might object that this evidence bears on the phenomenon of imagining speech and not on the phenomenon of inner speech. According to this objection, imagined speech involves the representation of another’s voice, while inner speech necessarily involves the representation of one’s own voice. However, it is not clear that the ‘own voice-other voice’ distinction marks a difference in mental state kinds; the objector must show that it does.
This paper has been concerned with the nature of the contents of inner speech, and not with the further, important question concerning the mechanism by which inner speech possesses its content (see Carruthers, 2010; Langland-Hassan, 2014; Vicente and Martínez-Manrique, 2016; Knappik, 2017). I believe that it is fruitful to first provide an account of the content of inner speech before fixing on one or another specific mechanism by which it is endowed with such content.
Many thanks to Edouard Machery, Peter Langland-Hassan, Wayne Wu, James Shaw, Mark Wilson, and Zina Ward for discussion and feedback on this paper.
Alderson-Day, B., & Fernyhough, C. (2015). Inner speech: Development, cognitive functions, phenomenology, and neurobiology. Psychological Bulletin, 141(5), 931–965. https://doi.org/10.1037/bul0000021
Anscombe, G. E. M. (2000). Intention (2nd edition). Harvard University Press.
Belin, P. (2019). The “Vocal brain”: Core and extended cerebral networks for voice processing. Oxford University Press.
Belin, P., Fecteau, S., & Bédard, C. (2004). Thinking the voice: Neural correlates of voice perception. Trends in Cognitive Sciences, 8(3), 129–135. https://doi.org/10.1016/j.tics.2004.01.008
Belin, P., Zatorre, R. J., Lafaille, P., Ahad, P., & Pike, B. (2000). Voice-selective areas in human auditory cortex. Nature, 403(6767), 309–312. https://doi.org/10.1038/35002078
Buchtel, H. A., & Stewart, J. D. (1989). Auditory agnosia: Apperceptive or associative disorder? Brain and Language, 37(1), 12–25. https://doi.org/10.1016/0093-934x(89)90098-9
Carruthers, P. (2010). Introspection: Divided and partly eliminated. Philosophy and Phenomenological Research, 80(1), 76–111. https://doi.org/10.1111/j.1933-1592.2009.00311.x
Clark, A. (2015). Surfing uncertainty: Prediction, action, and the embodied mind. Oxford University Press.
Dell, G. S. (1986). A spreading-activation theory of retrieval in sentence production. Psychological Review, 93(3), 283–321. https://doi.org/10.1037/0033-295X.93.3.283
Denes, G., & Semenza, C. (1975). Auditory modality-specific anomia: Evidence from a case of pure word deafness. Cortex, 11(4), 401–411.
Fromkin, V. (1971). The non-anomalous nature of anomalous utterances. Language. https://doi.org/10.2307/412187
Frühholz, S., & Belin, P. (Eds.). (2019). The Oxford handbook of voice perception. Oxford University Press.
Gauker, C. (2018). Inner speech as the internalization of outer speech. In P. Langland-Hassan & A. Vicente (Eds.), Inner speech: New voices (pp. 53–77). Oxford University Press.
Grandchamp, R., Rapin, L., Perrone-Bertolotti, M., Pichat, C., Haldin, C., Cousin, E., Lachaux, J.-P., Dohen, M., Perrier, P., Garnier, M., Baciu, M., & Lœvenbruck, H. (2019). The ConDialInt Model: Condensation, dialogality, and intentionality dimensions of inner speech within a hierarchical predictive control framework. Frontiers in Psychology, 10, 2019. https://doi.org/10.3389/fpsyg.2019.02019
Hasan, B. A. S., Valdes-Sosa, M., Gross, J., & Belin, P. (2016). “Hearing faces and seeing voices”: Amodal coding of person identity in the human brain. Scientific Reports, 6(1), 37494. https://doi.org/10.1038/srep37494
Hurlburt, R. T., & Heavey, C. L. (2018). Inner speech as pristine inner experience. In P. Langland-Hassan & A. Vicente (Eds.), Inner speech: New voices. Oxford University Press.
Hurlburt, R. T., Heavey, C. L., & Kelsey, J. M. (2013). Toward a phenomenology of inner speaking. Consciousness and Cognition, 22(4), 1477–1494. https://doi.org/10.1016/j.concog.2013.10.003
Jescheniak, J. D., Meyer, A. S., & Levelt, W. J. M. (2003). Specific-word frequency is not all that counts in speech production: Evidence from the production of homophones in Dutch and German. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 432–438.
Jones, P. E. (2009). From ‘external speech’ to ‘inner speech’ in Vygotsky: A critical appraisal and fresh perspectives. Language & Communication, 29(2), 166–181. https://doi.org/10.1016/j.langcom.2008.12.003
Joyce, J. (2015). Ulysses (D. Kiberd, Ed.; New Edition). Penguin.
Knappik, F. (2018). Bayes and the first person: Consciousness of thoughts, inner speech and probabilistic inference. Synthese, 195(5), 2113–2140. https://doi.org/10.1007/s11229-017-1321-3
Kurby, C. A., Magliano, J. P., & Rapp, D. N. (2009). Those voices in your head: Activation of auditory images during reading. Cognition, 112(3), 457–461. https://doi.org/10.1016/j.cognition.2009.05.007
Langland-Hassan, P. (2014). Inner speech and metacognition: In search of a connection. Mind & Language, 29(5), 511–533. https://doi.org/10.1111/mila.12064
Langland-Hassan, P. (2018). From introspection to essence: The auditory nature of inner speech. In P. Langland-Hassan & A. Vicente (Eds.), Inner speech: New voices. Oxford University Press.
Langland-Hassan, P., & Vicente, A. (Eds.). (2018). Inner speech: New voices. Oxford University Press.
Latinus, M., McAleer, P., Bestelmeyer, P. E. G., & Belin, P. (2013). Norm-based coding of voice identity in human auditory cortex. Current Biology, 23(12), 1075–1080. https://doi.org/10.1016/j.cub.2013.04.055
Lavan, N., Knight, S., & McGettigan, C. (2019). Listeners form average-based representations of individual voice identities. Nature Communications, 10(1), 2404. https://doi.org/10.1038/s41467-019-10295-w
Levelt, W. J. M. (1993). Speaking: From Intention to Articulation. MIT Press.
MacKay, D. G. (1992). Constraints on theories of inner speech. In Auditory imagery (pp. 121–149). Lawrence Erlbaum Associates, Inc.
Marshall, R. C., Rappaport, B. Z., & Garcia-Bunuel, L. (1985). Self-monitoring behavior in a case of severe auditory agnosia with aphasia. Brain and Language, 24(2), 297–313. https://doi.org/10.1016/0093-934X(85)90137-3
McGuigan, F. J., & Dollins, A. B. (1989). Patterns of covert speech behavior and phonetic coding. The Pavlovian Journal of Biological Science, 24(1), 19–26.
McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264(5588), 746–748. https://doi.org/10.1038/264746a0
Oppenheim, G. M., & Dell, G. S. (2010). Motor movement matters: The flexible abstractness of inner speech. Memory & Cognition, 38(8), 1147–1160. https://doi.org/10.3758/MC.38.8.1147
Papathanasiou, I., Macfarlane, S., & Heron, C. (1998). A case of verbal auditory agnosia: missing the word missing the sound. International Journal of Language & Communication Disorders, 33(S1), 214–218. https://doi.org/10.3109/13682829809179425
Perrone-Bertolotti, M., Rapin, L., Lachaux, J. P., Baciu, M., & Lœvenbruck, H. (2014). What is that little voice inside my head? Inner speech phenomenology, its role in cognitive performance, and its relation to self-monitoring. Behavioural Brain Research, 261, 220–239. https://doi.org/10.1016/j.bbr.2013.12.034
Pickering, M. J., & Garrod, S. (2013). An integrated theory of language production and comprehension. The Behavioral and Brain Sciences, 36(4), 329–347. https://doi.org/10.1017/S0140525X12001495
Roelofs, A., Meyer, A. S., & Levelt, W. J. M. (1998). A case for the lemma/lexeme distinction in models of speaking: Comment on Caramazza and Miozzo (1997). Cognition, 69(2), 219–230. https://doi.org/10.1016/s0010-0277(98)00056-0
Tsantani, M., Kriegeskorte, N., McGettigan, C., & Garrido, L. (2019). Faces and voices in the brain: A modality-general person-identity representation in superior temporal sulcus. NeuroImage. https://doi.org/10.1016/j.neuroimage.2019.07.017
Vicente, A., & Jorba, M. (2019). The linguistic determination of conscious thought contents. Noûs, 53(3), 737–759. https://doi.org/10.1111/nous.12239
Vicente, A., & Martínez-Manrique, F. (2016). The nature of unsymbolized thinking. Philosophical Explorations, 19(2), 173–187. https://doi.org/10.1080/13869795.2016.1176234
Wu, W. (2012). Explaining Schizophrenia: Auditory verbal hallucination and self-monitoring. Mind & Language, 27(1), 86–107. https://doi.org/10.1111/j.1468-0017.2011.01436.x
Yao, B., Belin, P., & Scheepers, C. (2011). Silent reading of direct versus indirect speech activates voice-selective areas in the auditory cortex. Journal of Cognitive Neuroscience, 23(10), 3146–3152. https://doi.org/10.1162/jocn_a_00022
Conflicts of interest
The author declared that they have no conflict of interest.
About this article
Cite this article
Patel, S. From speech to voice: on the content of inner speech. Synthese (2021). https://doi.org/10.1007/s11229-021-03274-6
- Inner speech
- Speech processing
- Voice processing
- Occurrent thought