
Exploratory mapping of theoretical landscapes through word use in abstracts


We present a case study of how scientometric tools can reveal the structure of scientific theory in a discipline. Specifically, we analyze patterns of word use in cognitive science by applying latent semantic analysis, a well-known semantic model, to the abstracts of over a thousand academic papers relevant to the theories of the discipline. Our results show that these theories can be linked to specific statistical distributions of words in the abstracts of papers that espouse them. We show that theories have different patterns of word use, and that the similarity relationships among them are intuitive and informative. Moreover, we show that the theory of a paper can be predicted fairly accurately by constructing a model of the theories based on their distribution of word use. These results may open new avenues for the application of scientometric tools to theoretical divides.




  1. For an example of such an overarching division, see the exchange between McClelland et al. (2010) and Griffiths et al. (2010).

  2. All raw data, scripts used for analysis, and analyzed data used for generating the figures are available at http://github.com/contreraskallens/ExploratoryMapping.

  3. We thank an anonymous reviewer for pointing out this limitation.

  4. We thank the anonymous reviewers for pointing us towards this research.

  5. We also explored the second methodology, as it holds much promise. However, our study is based on a relatively small set of words, and so choosing a new “word base” for doing this transformation proved to be more difficult, and tended to produce less stable results.

  6. Due to space constraints, we can only show a few of these word clouds. However, a more thorough exploration of the parameters can be found in the aforementioned GitHub repository.

  7. We thank an anonymous reviewer for this suggestion.


  1. Adams, F., & Aizawa, K. (2010). Defending the bounds of cognition. In R. Menary (Ed.), The extended mind (pp. 67–80). Cambridge, MA: The MIT Press.

  2. Alhazmi, F., Beaton, D., & Abdi, H. (2017). The latent semantic space and corresponding brain regions of the functional neuroimaging literature. bioRxiv. https://doi.org/10.1101/157826.

  3. Almeida e Costa, F. (Ed.). (2005). Embodied and situated cognition [Special Issue]. Artificial Life, 11(2).

  4. Anderson, M. L. (2003). Embodied cognition: A field guide. Artificial Intelligence, 149(1), 91–130.

  5. Barsalou, L. W. (1999). Perceptions of perceptual symbols. Behavioral and Brain Sciences, 22(4), 637–660.

  6. Bechtel, W., & Abrahamsen, A. (2006). Phenomena and mechanisms: Putting the symbolic, connectionist, and dynamical systems debate in broader perspective. Contemporary debates in cognitive science. Oxford, UK: Basil Blackwell.

  7. Bechtel, W., & Graham, G. (1998). A companion to cognitive science. Oxford, UK: Blackwell.

  8. Beer, R. D. (1995). A dynamical systems perspective on agent-environment interaction. Artificial Intelligence, 72(1–2), 173–215.

  9. Bergmann, T., & Dale, R. (2016). A scientometric analysis of evolang: Intersections and authorships. In S. Roberts, C. Cuskley, L. McCrohon, L. Barceló-Coblijn, O. Fehér, & T. Verhoef (Eds.), The evolution of language: Proceedings of the 11th international conference (EVOLANGX11). http://evolang.org/neworleans/papers/182.html. Retrieved 22 June 2018.

  10. Berry, M. W., Dumais, S. T., & O’Brien, G. W. (1995). Using linear algebra for intelligent information retrieval. SIAM Review, 37(4), 573–595.

  11. Bird, S., Klein, E., & Loper, E. (2009). Natural language processing with Python: Analyzing text with the natural language toolkit. Sebastopol, CA: O’Reilly Media Inc.

  12. Blatt, E. (2009). Differentiating, describing, and visualizing scientific space: A novel approach to the analysis of published scientific abstracts. Scientometrics, 80(2), 385–406.

  13. Calvo, P., & Gomila, T. (2008). Handbook of cognitive science: An embodied approach. New York, NY: Elsevier.

  14. Chemero, A. (2011). Radical embodied cognitive science. Cambridge, MA: The MIT Press.

  15. Chemero, A., & Silberstein, M. (2008). After the philosophy of mind: Replacing scholasticism with science. Philosophy of Science, 75(1), 1–27.

  16. Clark, A. (1998). Being there: Putting brain, body, and world together again. Cambridge, MA: The MIT Press.

  17. Contreras Kallens, P. A. (2016). La máquina del fantasma: unidad de análisis en la ciencia cognitiva (Unpublished master’s thesis). Universidad de Chile, Av. Capitán Ignacio Carrera Pinto 1025, Nuñoa, Santiago, Chile.

  18. Cowley, S. J. (2011). Distributed language (Vol. 34). Amsterdam, Netherlands: John Benjamins Publishing.

  19. Dale, R. (2008). The possibility of a pluralist cognitive science. Journal of Experimental and Theoretical Artificial Intelligence, 20(3), 155–179.

  20. Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., & Harshman, R. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6), 391.

  21. de Oliveira, G. S., & Chemero, A. (2015). Against smallism and localism. Studies in Logic, Grammar and Rhetoric, 41(1), 9–23.

  22. Di Paolo, E. A. (2005). Autopoiesis, adaptivity, teleology, agency. Phenomenology and the Cognitive Sciences, 4(4), 429–452.

  23. Dumais, S. T. (1991). Improving the retrieval of information from external sources. Behavior Research Methods, Instruments, & Computers, 23(2), 229–236.

  24. Edelman, S. (2008). On the nature of minds, or: Truth and consequences. Journal of Experimental & Theoretical Artificial Intelligence, 20(3), 181–196.

  25. Evangelopoulos, N., Zhang, X., & Prybutok, V. R. (2012). Latent semantic analysis: Five methodological recommendations. European Journal of Information Systems, 21(1), 70–86.

  26. Evangelopoulos, N. E. (2013). Latent semantic analysis. Wiley Interdisciplinary Reviews: Cognitive Science, 4(6), 683–692.

  27. Feldman, J. A., & Ballard, D. H. (1982). Connectionist models and their properties. Cognitive Science, 6(3), 205–254.

  28. Fellows, I. (2014). wordcloud: Word clouds. Retrieved from https://CRAN.R-project.org/package=wordcloud (R package version 2.5).

  29. Fodor, J. A. (1975). The language of thought. Cambridge, MA: Harvard University Press.

  30. Garfield, E., et al. (1970). Citation indexing for studying science. Nature, 227(5259), 669–671.

  31. Gentner, D. (2010). Psychology in cognitive science: 1978–2038. Topics in Cognitive Science, 2(3), 328–344.

  32. Gibbs, R. W., Jr. (2005). Embodiment and cognitive science. Cambridge, UK: Cambridge University Press.

  33. Gibson, E. J., & Pick, A. D. (2000). An ecological approach to perceptual learning and development. Oxford, UK: Oxford University Press.

  34. Gibson, J. J. (1979). The ecological approach to visual perception: Classic edition (2014). Hove, UK: Psychology Press.

  35. Gomila, T., & Calvo, P. (2008). Directions for an embodied cognitive science: Toward an integrated approach. In P. Calvo & T. Gomila (Eds.), Handbook of cognitive science: An embodied approach (pp. 1–25). San Diego, CA: Elsevier.

  36. Griffiths, T. L., Chater, N., Kemp, C., Perfors, A., & Tenenbaum, J. B. (2010). Probabilistic models of cognition: Exploring representations and inductive biases. Trends in Cognitive Sciences, 14(8), 357–364.

  37. Griffiths, T. L., & Steyvers, M. (2004). Finding scientific topics. Proceedings of the National Academy of Sciences, 101(suppl 1), 5228–5235.

  38. Haugeland, J. (1978). The nature and plausibility of cognitivism. Behavioral and Brain Sciences, 1(2), 215–226.

  39. Haugeland, J. (1981). Semantic engines: An introduction to mind design. In J. Haugeland (Ed.), Mind design (pp. 34–50). Cambridge, MA: MIT Press.

  40. Hohwy, J. (2013). The predictive mind. Oxford, UK: Oxford University Press.

  41. Hollan, J., Hutchins, E., & Kirsh, D. (2000). Distributed cognition: Toward a new foundation for human–computer interaction research. ACM Transactions on Computer–Human Interaction (TOCHI), 7(2), 174–196.

  42. Hu, X., Cai, Z., Wiemer-Hastings, P., Graesser, A. C., & McNamara, D. S. (2007). Strengths, limitations, and extensions of LSA. In T. K. Landauer, D. S. McNamara, S. Dennis, & W. Kintsch (Eds.), Handbook of latent semantic analysis (pp. 401–426). New York, NY: Routledge.

  43. Hutchins, E. (1995). Cognition in the wild. Cambridge, MA: The MIT Press.

  44. Jorge-Botana, G., Olmos, R., & Luzón, J. M. (2018). Word maturity indices with latent semantic analysis: Why, when, and where is Procrustes rotation applied? Wiley Interdisciplinary Reviews: Cognitive Science, 9, e1457.

  45. Kireyev, K., & Landauer, T. K. (2011). Word maturity: Computational modeling of word knowledge. In Proceedings of the 49th annual meeting of the association for computational linguistics: Human language technologies - Volume 1 (pp. 299–308). Stroudsburg, PA: Association for Computational Linguistics.

  46. Kugler, P. N., & Turvey, M. T. (2015). Information, natural law, and the self-assembly of rhythmic movement. Abingdon, UK: Routledge.

  47. Kuhn, T. S. (2000a). Commensurability, comparability, communicability. In J. Conant & J. Haugeland (Eds.), The road since Structure: Philosophical essays 1970–1993, with an autobiographical interview (pp. 33–57). Chicago, IL: The University of Chicago Press.

  48. Kuhn, T. S. (2000b). What are scientific revolutions? In J. Conant & J. Haugeland (Eds.), The road since Structure: Philosophical essays 1970–1993, with an autobiographical interview (pp. 13–32). Chicago, IL: The University of Chicago Press.

  49. Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato’s problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2), 211.

  50. Landauer, T. K., Foltz, P. W., & Laham, D. (1998). An introduction to latent semantic analysis. Discourse Processes, 25(2–3), 259–284.

  51. Langfelder, P., Zhang, B., & Horvath, S. (2008). Defining clusters from a hierarchical cluster tree: The dynamic tree cut package for R. Bioinformatics, 24(5), 719–720.

  52. Leydesdorff, L. (1998). Theories of citation? Scientometrics, 43(1), 5–25.

  53. Louwerse, M. M. (2011). Symbol interdependency in symbolic and embodied cognition. Topics in Cognitive Science, 3(2), 273–302.

  54. Martin, D. I., & Berry, M. W. (2007). Mathematical foundations behind latent semantic analysis. In T. K. Landauer, D. S. McNamara, S. Dennis, & W. Kintsch (Eds.), Handbook of latent semantic analysis (pp. 35–56). New York, NY: Routledge.

  55. Marx, W., & Bornmann, L. (2010). How accurately does Thomas Kuhn’s model of paradigm change describe the transition from the static view of the universe to the big bang theory in cosmology? Scientometrics, 84(2), 441–464.

  56. Marx, W., & Bornmann, L. (2013). The emergence of plate tectonics and the Kuhnian model of paradigm shift: A bibliometric case study based on the Anna Karenina principle. Scientometrics, 94(2), 595–614.

  57. McCauley, R. N., & Bechtel, W. (2001). Explanatory pluralism and heuristic identity theory. Theory & Psychology, 11(6), 736–760.

  58. McClelland, J. L., Botvinick, M. M., Noelle, D. C., Plaut, D. C., Rogers, T. T., Seidenberg, M. S., et al. (2010). Letting structure emerge: Connectionist and dynamical systems approaches to cognition. Trends in Cognitive Sciences, 14(8), 348–356.

  59. Menary, R. (Ed.) (2010a). 4e cognition: Embodied, embedded, enacted, extended [Special Issue]. Phenomenology and the Cognitive Sciences, 9(4), 459–463.

  60. Menary, R. (2010b). Introduction to the special issue on 4e cognition. Phenomenology and the Cognitive Sciences, 9(4), 459–463.

  61. Michaels, C. F., & Carello, C. (1981). Direct perception. Englewood Cliffs, NJ: Prentice-Hall.

  62. Moravcsik, M. J., & Murugesan, P. (1979). Citation patterns in scientific revolutions. Scientometrics, 1(2), 161–169.

  63. Newell, A. (1973). You can’t play 20 questions with nature and win: Projective comments on the papers of this symposium. In Visual Information Processing (pp. 283–308).

  64. Oaksford, M., & Chater, N. (2009). Précis of Bayesian rationality: The probabilistic approach to human reasoning. Behavioral and Brain Sciences, 32(1), 69–84.

  65. Olmos, R., Jorge-Botana, G., León, J. A., & Escudero, I. (2014). Transforming selected concepts into dimensions in latent semantic analysis. Discourse Processes, 51(5–6), 494–510.

  66. Olmos, R., Jorge-Botana, G., Luzón, J. M., Martín-Cordero, J. I., & León, J. A. (2016). Transforming LSA space dimensions into a rubric for an automatic assessment and feedback system. Information Processing & Management, 52(3), 359–373.

  67. Paxton, A. E. (2015). Coordination: Theoretical, methodological, and experimental perspectives. Ph.D. thesis, University of California, Merced.

  68. Priva, U. C., & Austerweil, J. L. (2015). Analyzing the history of cognition using topic models. Cognition, 135, 4–9.

  69. R Core Team. (2017). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.

  70. Richardson, M. J., Shockley, K., Fajen, B. R., Riley, M. A., & Turvey, M. T. (2008). Ecological psychology: Six principles for an embodied–embedded approach to behavior. In P. Calvo & T. Gomila (Eds.), Handbook of cognitive science: An embodied approach (pp. 159–187). San Diego, CA: Elsevier.

  71. Robbins, P., Aydede, M., et al. (2009). The Cambridge handbook of situated cognition. Cambridge, UK: Cambridge University Press.

  72. Rowlands, M. (2010). The new science of the mind: From extended mind to embodied phenomenology. Cambridge, MA: The MIT Press.

  73. Rumelhart, D. E., McClelland, J. L., & the PDP Research Group. (1987). Parallel distributed processing (Vol. 1). Cambridge, MA: The MIT Press.

  74. Salton, G., Wong, A., & Yang, C.-S. (1975). A vector space model for automatic indexing. Communications of the ACM, 18(11), 613–620.

  75. Sankey, H. (1997). Incommensurability: The current state of play. Theoria: An International Journal for Theory, History and Foundations of Science, 12(3), 425–445.

  76. Shapiro, L. (2010). Embodied cognition. Abingdon, UK: Routledge.

  77. Shapiro, L. (2014). The Routledge handbook of embodied cognition. Abingdon, UK: Routledge.

  78. Sidorova, A., Evangelopoulos, N., Valacich, J. S., & Ramakrishnan, T. (2008). Uncovering the intellectual core of the information systems discipline. MIS Quarterly, 32(3), 467–482.

  79. Smolensky, P. (1988). On the proper treatment of connectionism. Behavioral and Brain Sciences, 11(1), 1–23.

  80. Spivey, M. (2008). The continuity of mind. Oxford, UK: Oxford University Press.

  81. Stepp, N., Chemero, A., & Turvey, M. T. (2011). Philosophy for the rest of cognitive science. Topics in Cognitive Science, 3(2), 425–437.

  82. Stewart, J., Stewart, J. R., Gapenne, O., & Di Paolo, E. A. (2010). Enaction: Toward a new paradigm for cognitive science. Cambridge, MA: The MIT Press.

  83. Tenenbaum, J. B., Griffiths, T. L., & Kemp, C. (2006). Theory-based Bayesian models of inductive learning and reasoning. Trends in Cognitive Sciences, 10(7), 309–318.

  84. Tenenbaum, J. B., Kemp, C., Griffiths, T. L., & Goodman, N. D. (2011). How to grow a mind: Statistics, structure, and abstraction. Science, 331(6022), 1279–1285.

  85. Thelen, E., & Smith, L. B. (1996). A dynamic systems approach to the development of cognition and action. Cambridge, MA: The MIT Press.

  86. Van Gelder, T. (1995). What might cognition be, if not computation? The Journal of Philosophy, 92(7), 345–381.

  87. Varela, F. J., Thompson, E., & Rosch, E. (2017). The embodied mind: Cognitive science and human experience. Cambridge, MA: The MIT Press.

  88. Von Eckardt, B. (1995). What is cognitive science? Cambridge, MA: The MIT Press.

  89. Vygotsky, L. S. (1997). The collected works of LS Vygotsky: Problems of the theory and history of psychology (Vol. 3). Berlin, Germany: Springer.

  90. Wheeler, M. (2014). Revolution, reform, or business as usual? The future prospects for embodied cognition. In L. Shapiro (Ed.), The Routledge handbook of embodied cognition (pp. 374–383). Abingdon, UK: Routledge.

  91. Wilson, M. (2002). Six views of embodied cognition. Psychonomic Bulletin & Review, 9(4), 625–636.

  92. Wilson, R. A. (2004). Boundaries of the mind: The individual in the fragile sciences-Cognition. Cambridge, UK: Cambridge University Press.

  93. Yoshimi, J. (2012). Active internalism and open dynamical systems. Philosophical Psychology, 25(1), 1–24.

  94. Ziemke, T. (Ed.). (2002). Situated and embodied cognition [Special Issue]. Cognitive Systems Research, 3(3).

  95. Ziemke, T. (2003). What’s that thing called embodiment. In R. Alterman & D. Kirsh (Eds.), Proceedings of the 25th annual meeting of the cognitive science society (pp. 1305–1310). Mahwah, NJ: Lawrence Erlbaum.

  96. Zwaan, R. A. (2014). Embodiment and language comprehension: Reframing the discussion. Trends in Cognitive Sciences, 18(5), 229–234.



We want to thank professors Paul Smaldino and Jeff Yoshimi for their feedback on this paper. Thanks to Martin Irani for his help with coding and feedback on the preliminary results.

Author information

Correspondence to Pablo Contreras Kallens.


Appendix 1: Search filters

For the search procedure, quotation marks were placed around the keywords that produced the noisiest results because the words involved are very common. Only references classified as articles, proceedings papers, reviews, book chapters, and editorial material were downloaded.

The following subdiscipline categories provided by WoS were used (in alphabetical order):

Behavioral Sciences; Computer Science—Artificial Intelligence; Computer Science—Cybernetics; Computer Science—Information Systems; Computer Science—Interdisciplinary Applications; Computer Science—Theory Methods; History Philosophy of Science; Language—Linguistics; Linguistics; Neurosciences; Philosophy; Psychology; Psychology—Applied; Psychology—Biological; Psychology—Developmental; Psychology—Educational; Psychology—Experimental; Psychology—Mathematical; Psychology—Multidisciplinary; Psychology—Social; Robotics; Social Sciences—Interdisciplinary

Appendix 2: Entropy calculation

Each cell in our analysis was weighted following the formula given by Martin and Berry (2007, p. 38). First, each cell was weighted locally with the logarithm of its frequency plus 1, \(\log \left( f_{ij} + 1 \right)\). That value was then multiplied by a global, entropy-based weight for the corresponding term:

$$\begin{aligned} 1 + \sum \limits _{j} \frac{P_{ij} \times \log _{2}P_{ij}}{\log _{2}n} \end{aligned}$$

where \(P_{ij}\) is the number of times the term i appears in document j, divided by the number of times the term appears in all of the documents, and n is the total number of documents in the dataset. This formula assumes that terms that appear in fewer documents are more informative than terms that appear in more documents. Thus, the values of the former are relatively increased, while the values of the latter are relatively diminished.
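As an illustration, the weighting described above can be sketched in a few lines of Python. This is a minimal sketch and not the analysis code from the repository: the function name `log_entropy_weights`, the toy counts, and the use of the natural logarithm for the local weight are assumptions made for the example.

```python
import math


def log_entropy_weights(counts):
    """Apply log-entropy weighting to a term-by-document count matrix.

    counts: list of rows, one per term i, each holding the raw
    frequencies f_ij of that term in every document j.
    """
    n_docs = len(counts[0])
    if n_docs < 2:
        raise ValueError("at least two documents are needed (log2(n) must be nonzero)")

    weighted = []
    for row in counts:
        total = sum(row)  # occurrences of term i across all documents
        # Sum of P_ij * log2(P_ij); cells with f_ij = 0 contribute nothing.
        plogp = 0.0
        for f in row:
            if f > 0:
                p = f / total
                plogp += p * math.log2(p)
        global_weight = 1.0 + plogp / math.log2(n_docs)
        # Local weight log(f_ij + 1); the natural log is an assumption here.
        weighted.append([math.log(f + 1) * global_weight for f in row])
    return weighted


# Toy matrix: 2 terms x 4 documents (made-up counts).
toy = [
    [4, 0, 0, 0],  # concentrated in one document
    [1, 1, 1, 1],  # spread evenly over all documents
]
weights = log_entropy_weights(toy)
```

In this toy matrix, the term concentrated in a single document keeps a global weight of 1, whereas the term spread evenly over all four documents receives a global weight of 0, so all of its cells vanish after weighting.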

Appendix 3: Other dendrograms

In Figs. 16 and 17, we present the dendrograms obtained with values of D (\(D = 3\) and \(D = 5\)) lower than the range that produces the stable dendrogram presented in Fig. 2. Figure 18 uses the same D as Fig. 2 (\(D = 10\)), but with the theory labels attached to each paper randomized.

Fig. 16

\(D = 3\)

Fig. 17

\(D = 5\)

Fig. 18

\(D = 10\). Randomized theories. Note the low distance values (y-axis) between the tree branches in comparison to Fig. 2

Appendix 4: Selection of number of dimensions to evaluate

Fig. 19

Mean prediction performance as a function of the number of dimensions allowed to be evaluated when selecting the best predictors for each theory. Peak performance is achieved by limiting this number to 80 dimensions (red line). However, performance is robust, so this parameter can be changed without much decrease in performance. Results aggregate over 1000 iterations. (Color figure online)

Appendix 5: Prediction performance across values of D

Fig. 20

Mean performance of the GLM of each theory. D is the number of dimensions used for the model. Results aggregate over 10,000 iterations

Fig. 21

Mean performance of the 8 models across values of D. Results aggregate over 10,000 iterations

Fig. 22

Mean performance of the GLM of each theory using the new data set. D is the number of dimensions used for the model. Results aggregate over 10,000 iterations

Fig. 23

Mean performance of the 8 models across values of D using the new data set. Results aggregate over 10,000 iterations

Appendix 6: Prediction performance in randomization condition

Figure 24 shows the prediction confusion matrix for the eight theories at \(D = 20\) when the theory labels are randomized. The maximum confusion value shown is 16.9% (embodied and ecological) and the minimum is 8% (distributed and enactive). Figure 25 shows the mean prediction performance (y-axis) for each theory (x-axis) when theory labels are randomized.

Fig. 24

Confusion matrix with randomized theories, \(D = 20\)

Fig. 25

Mean performance by D. Randomized

Cite this article

Contreras Kallens, P., Dale, R. Exploratory mapping of theoretical landscapes through word use in abstracts. Scientometrics 116, 1641–1674 (2018). https://doi.org/10.1007/s11192-018-2811-x



  • Latent semantic analysis
  • Cognitive science
  • Text analysis
  • Theoretical issues