From speech to voice: on the content of inner speech


Theorists have found it difficult to reconcile the unity of inner speech as a mental state kind with the diversity of its manifestations. I argue that existing views concerning the content of inner speech fail to accommodate both of these features because they mistakenly assume that its content is to be found in the ‘speech processing hierarchy’, which includes semantic, syntactic, phonemic, phonetic, and articulatory levels. Upon rejecting this assumption, I offer a position on which the content of inner speech is determined by voice processing, of which speech processing is but one component. The resulting view does justice to the idea that inner speech is a motley assortment of episodes that nevertheless form a kind.

Fig. 1
Fig. 2

Reproduced from Belin et al. (2004)


  1. Although see Hurlburt and Heavey (2018) for skepticism about reported frequencies of inner speech.

  2. The nature of the speech processing hierarchy remains contested. Psycholinguists disagree about how information flows through the hierarchy – serially or in parallel, feedforward or feedback – and about the exact operations and sub-operations within the hierarchy (e.g., Fromkin, 1971; Dell, 1986). Despite these differences, psycholinguists tend to agree on the organization presented in Figure 1.

  3. Although see Langland-Hassan (2018) for a contrasting position on phonemes, according to which phonemes are auditory.

  4. Langland-Hassan (2018) and Gauker (2018) differ in their framing of concretism and abstractionism. Langland-Hassan seems to assume that inner speech always represents phonetic content, while Gauker seems to assume that inner speech never has a phonetic vehicle. This difference will not matter in my discussion of the views. For this reason, I will use ‘phonetic/auditory/speech sound component’ with the understanding that it translates as ‘phonetic/auditory/speech sound content or vehicle’.

  5. Recall that Langland-Hassan (2018) believes that phonemes are auditory (see footnote 3). Although I have denied this (see Sect. 2), for the sake of the present argument, I will use ‘phonological’ in the sense that Langland-Hassan intends.

  6. This line of argument puts into relief a plausible alternative explanation of how I know the language of my inner speech: my knowledge that I am speaking English during inner or outer speech is non-observational in just the way that my knowledge that I am grabbing a glass may be non-observational (see Anscombe, 2000). On this account, I know that my inner speech is in English because I use English words in my inner speech, where this knowledge is not grounded in observation. Although Langland-Hassan seems to assume that the knowledge of the language of our inner speech is gained by introspection, the alternative I have mentioned rejects this restriction.

  7. Gauker might reject Levelt’s speech control model, since Gauker may take it to depend on the doubtful existence of a language of thought. However, Levelt’s speech control model does not depend on the existence of a language of thought, since the control model is a model of peripheral processes of speech production, e.g., motor control processes, and not core processes, e.g., concept selection.

  8. Langland-Hassan (2014) discusses two problems. One problem, call it ‘the kindhood problem’, concerns how distinct inner speech episodes with different combinations of contents from the speech processing hierarchy all count as cases of inner speech (pp. 519–520). Another problem, call it ‘the binding problem’, concerns how a single inner speech episode possesses a combination of different contents from the speech processing hierarchy (pp. 520–529). Although these problems are related, a solution to the one does not entail a solution to the other. In particular, an account of what makes inner speech episodes with various combinations of contents count as cases of inner speech does not thereby state how any one of those episodes possesses the combination of contents it does. This paper is intended to provide a solution to the kindhood problem, not the binding problem.

  9. A leftover possibility is that a state counts as inner speech in virtue of being a part of an aborted speech production process. The problem with this proposal, however, is that being a part of a mental process does not individuate mental state kinds. For example, states corresponding to prediction, prediction error, precision, and data are assumed to be parts of the mental process of prediction error minimization (Clark, 2015). But these states do not belong to some further mental state kind in virtue of being parts of the process of prediction error minimization. Or again, the mere fact that motor, sensory, and cognitive states are parts of the speech perception process does not entail that they are themselves states of speech perception, nor that there is some further kind to which they belong.

  10. One might object that my use of the concept of voice is equivocal. On the one hand, voice refers to information that is extracted from a signal, while on the other hand, voice also refers to the medium within which a signal is communicated. The charge of equivocation stems from the fact that I can come to represent someone’s voice – as a medium – only by extracting information from a signal. This can make it seem as if voice is also represented as information extracted from a signal (e.g., on a par with speech sound information). But this does not follow. Although I learn facts about a medium via the signal it carries, I nevertheless continue to represent the medium as a medium. The same holds for voice: I learn how a voice sounds or whose voice it is by extracting information from a signal, but in so doing I continue to represent the voice as a medium through which the signal is communicated.

  11. One might object that this evidence bears on the phenomenon of imagining speech and not on the phenomenon of inner speech. According to this objection, imagined speech involves the representation of another’s voice, while inner speech necessarily involves the representation of one’s own voice. However, it is not clear that the ‘own voice-other voice’ distinction marks a difference in mental state kinds; the objector must show that it does.

  12. This paper has been concerned with the nature of the contents of inner speech, and not with the further, important question concerning the mechanism by which inner speech possesses its content (see Carruthers, 2010; Langland-Hassan, 2014; Vicente and Martínez-Manrique, 2016; Knappik, 2018). I believe that it is fruitful to first provide an account of the content of inner speech before fixing on one or another specific mechanism by which it is endowed with such content.

  13. Many thanks to Edouard Machery, Peter Langland-Hassan, Wayne Wu, James Shaw, Mark Wilson, and Zina Ward for discussion and feedback on this paper.


  1. Alderson-Day, B., & Fernyhough, C. (2015). Inner speech: Development, cognitive functions, phenomenology, and neurobiology. Psychological Bulletin, 141(5), 931–965.

  2. Anscombe, G. E. M. (2000). Intention (2nd edition). Harvard University Press.

  3. Belin, P. (2019). The “Vocal brain”: Core and extended cerebral networks for voice processing. Oxford University Press.

  4. Belin, P., Fecteau, S., & Bédard, C. (2004). Thinking the voice: Neural correlates of voice perception. Trends in Cognitive Sciences, 8(3), 129–135.

  5. Belin, P., Zatorre, R. J., Lafaille, P., Ahad, P., & Pike, B. (2000). Voice-selective areas in human auditory cortex. Nature, 403(6767), 309–312.

  6. Buchtel, H. A., & Stewart, J. D. (1989). Auditory agnosia: Apperceptive or associative disorder? Brain and Language, 37(1), 12–25.

  7. Carruthers, P. (2010). Introspection: Divided and partly eliminated. Philosophy and Phenomenological Research, 80(1), 76–111.

  8. Clark, A. (2015). Surfing uncertainty: Prediction, action, and the embodied mind. Oxford University Press.

  9. Dell, G. S. (1986). A spreading-activation theory of retrieval in sentence production. Psychological Review, 93(3), 283–321.

  10. Denes, G., & Semenza, C. (1975). Auditory modality-specific anomia: Evidence from a case of pure word deafness. Cortex, 11(4), 401–411.

  11. Fromkin, V. (1971). The non-anomalous nature of anomalous utterances. Language.

  12. Frühholz, S., & Belin, P. (Eds.). (2019). The Oxford handbook of voice perception. Oxford University Press.

  13. Gauker, C. (2018). Inner speech as the internalization of outer speech. In P. Langland-Hassan & A. Vicente (Eds.), Inner Speech: New Voices (pp. 53–77). Oxford University Press.

  14. Grandchamp, R., Rapin, L., Perrone-Bertolotti, M., Pichat, C., Haldin, C., Cousin, E., Lachaux, J.-P., Dohen, M., Perrier, P., Garnier, M., Baciu, M., & Lœvenbruck, H. (2019). The ConDialInt Model: Condensation, dialogality, and intentionality dimensions of inner speech within a hierarchical predictive control framework. Frontiers in Psychology, 10, 2019.

  15. Hasan, B. A. S., Valdes-Sosa, M., Gross, J., & Belin, P. (2016). “Hearing faces and seeing voices”: Amodal coding of person identity in the human brain. Scientific Reports, 6(1), 37494.

  16. Hurlburt, R. T., & Heavey, C. L. (2018). Inner speech as pristine inner experience. In P. Langland-Hassan & A. Vicente (Eds.), Inner Speech: New Voices. Oxford: Oxford University Press.

  17. Hurlburt, R. T., Heavey, C. L., & Kelsey, J. M. (2013). Toward a phenomenology of inner speaking. Consciousness and Cognition, 22(4), 1477–1494.

  18. Jescheniak, J. D., Meyer, A., & Levelt, W. J. M. (2003). Specific-word frequency is not all that counts in speech production. Evidence from the production of homophones in Dutch and German. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 432–438.

  19. Jones, P. E. (2009). From ‘external speech’ to ‘inner speech’ in Vygotsky: A critical appraisal and fresh perspectives. Language & Communication, 29(2), 166–181.

  20. Joyce, J. (2015). Ulysses (D. Kiberd, Ed.; New Edition). Penguin.

  21. Knappik, F. (2018). Bayes and the first person: Consciousness of thoughts, inner speech and probabilistic inference. Synthese, 195(5), 2113–2140.

  22. Kurby, C. A., Magliano, J. P., & Rapp, D. N. (2009). Those voices in your head: Activation of auditory images during reading. Cognition, 112(3), 457–461.

  23. Langland-Hassan, P. (2014). Inner speech and metacognition: In search of a connection. Mind & Language, 29(5), 511–533.

  24. Langland-Hassan, P. (2018). From introspection to essence: The auditory nature of inner speech. In P. Langland-Hassan & A. Vicente (Eds.), Inner Speech: New Voices. Oxford: Oxford University Press.

  25. Langland-Hassan, P., & Vicente, A. (Eds.). (2018). Inner speech: New voices. Oxford: Oxford University Press.

  26. Latinus, M., McAleer, P., Bestelmeyer, P. E. G., & Belin, P. (2013). Norm-based coding of voice identity in human auditory cortex. Current Biology, 23(12), 1075–1080.

  27. Lavan, N., Knight, S., & McGettigan, C. (2019). Listeners form average-based representations of individual voice identities. Nature Communications, 10(1), 2404.

  28. Levelt, W. J. M. (1993). Speaking: From Intention to Articulation. MIT Press.

  29. MacKay, D. G. (1992). Constraints on theories of inner speech. In Auditory imagery (pp. 121–149). Lawrence Erlbaum Associates, Inc.

  30. Marshall, R. C., Rappaport, B. Z., & Garcia-Bunuel, L. (1985). Self-monitoring behavior in a case of severe auditory agnosia with aphasia. Brain and Language, 24(2), 297–313.

  31. McGuigan, F. J., & Dollins, A. B. (1989). Patterns of covert speech behavior and phonetic coding. The Pavlovian Journal of Biological Science, 24(1), 19–26.

  32. McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264(5588), 746–748.

  33. Oppenheim, G. M., & Dell, G. S. (2010). Motor movement matters: The flexible abstractness of inner speech. Memory & Cognition, 38(8), 1147–1160.

  34. Papathanasiou, I., Macfarlane, S., & Heron, C. (1998). A case of verbal auditory agnosia: missing the word missing the sound. International Journal of Language & Communication Disorders, 33(S1), 214–218.

  35. Perrone-Bertolotti, M., Rapin, L., Lachaux, J. P., Baciu, M., & Lœvenbruck, H. (2014). What is that little voice inside my head? Inner speech phenomenology, its role in cognitive performance, and its relation to self-monitoring. Behavioural Brain Research, 261, 220–239.

  36. Pickering, M. J., & Garrod, S. (2013). An integrated theory of language production and comprehension. The Behavioral and Brain Sciences, 36(4), 329–347.

  37. Roelofs, A., Meyer, A. S., & Levelt, W. J. (1998). A case for the lemma/lexeme distinction in models of speaking: Comment on Caramazza and Miozzo (1997). Cognition, 69(2), 219–230.

  38. Tsantani, M., Kriegeskorte, N., McGettigan, C., & Garrido, L. (2019). Faces and voices in the brain: A modality-general person-identity representation in superior temporal sulcus. NeuroImage.

  39. Vicente, A., & Jorba, M. (2019). The linguistic determination of conscious thought contents. Noûs, 53(3), 737–759.

  40. Vicente, A., & Martínez-Manrique, F. (2016). The nature of unsymbolized thinking. Philosophical Explorations, 19(2), 173–187.

  41. Wu, W. (2012). Explaining schizophrenia: Auditory verbal hallucination and self-monitoring. Mind & Language, 27(1), 86–107.

  42. Yao, B., Belin, P., & Scheepers, C. (2011). Silent reading of direct versus indirect speech activates voice-selective areas in the auditory cortex. Journal of Cognitive Neuroscience, 23(10), 3146–3152.

Author information



Corresponding author

Correspondence to Shivam Patel.

Ethics declarations

Conflicts of interest

The author declared that they have no conflict of interest.

About this article

Cite this article

Patel, S. From speech to voice: on the content of inner speech. Synthese (2021).


  • Inner speech
  • Speech processing
  • Voice processing
  • Occurrent thought