Introduction

High-quality education across all age groups plays an important role in the Sustainable Development Goals and the UN Agenda 2030 (United Nations 2015). Early childhood education (ECE) is ascribed a key role in promoting child development and preparing children for their school careers (Burchinal et al. 2000; Duncan and National Institute of Child Health and Human Development Early Child Care Research Network 2003; Li et al. 2013; Mashburn et al. 2008).

With the growing number of research activities and scientific publications in early childhood education, it is becoming increasingly difficult to keep an overview of topics and trends in the field (Linnenluecke et al. 2020). Common research synthesis approaches are systematic literature reviews and meta-analyses (Schmidt 2007; Siddaway et al. 2019). Systematic reviews provide an overview of the current state of research on a specific topic (Linnenluecke et al. 2020). In ECE research, for example, recent literature reviews and meta-analyses have addressed the role of early childhood educators’ experiences (McMullen et al. 2020), structural and process quality aspects of ECE programs (Eadie et al. 2022), specific ECE quality indicators such as the staff-child ratio (Perlman et al. 2017), strategies to promote children’s physical activity in ECE settings (Mak et al. 2021), children’s social, emotional and cognitive development (Johnstone et al. 2022), multicultural education (Khalfaoui et al. 2021), professional development of ECE professionals (Brunsek et al. 2020) and various child outcomes. Because each of these reviews covers one specific topic and the number of publications is large, it is hard to obtain a general overview of the major topics and temporal trends in this research field. One way of dealing with such large volumes of text is structural topic modelling (Roberts et al. 2019).

The aim of this paper is to identify key research topics and temporal trends in early childhood education research between 2000 and 2021 by analysing scientific publications with a structural topic modeling approach.

Structural topic modelling

Qualitative text analyses such as grounded theory (Strauss and Corbin 1997) are resource intensive and reach their limits when a large number of texts has to be analysed. Advances in computer science have led to the development of text mining approaches such as Latent Dirichlet Allocation (Blei et al. 2003) and structural topic modeling (Roberts et al. 2014, 2019) that can deal with large amounts of textual data (Tonidandel et al. 2022; Vayansky and Kumar 2020). Topic modeling is a natural language processing technique for discovering latent research themes or topics in a large number of documents (Sharma et al. 2021; Vayansky and Kumar 2020). It is an “unsupervised machine learning technique that automatically learns and discovers the latent themes and their prevalences across a collection of documents” (Sharma et al. 2021 p. 3). Unsupervised techniques derive the themes of the texts inductively, whereas supervised methods rely on codes that the researcher defines ex ante (Roberts et al. 2014; Tonidandel et al. 2022). Topic models analyse not only the frequency of words but also the relationships between words (which words frequently occur together) and documents (Tonidandel et al. 2022). “The central premise underlying topic models is that words that occur in a similar context tend to have similar meanings” (Tonidandel et al. 2022 p. 2). The advantage of structural topic modeling is that meta-data of the documents (e.g. publication year) can be included in the analyses, so that the relationships between the topics and the meta-data can be estimated (Roberts et al. 2019). This makes it possible to discover temporal trends of research topics in a specific scientific field. Structural topic modeling has been used in tourism research to analyse hotel reviews (Hu et al. 2019), to obtain a comprehensive overview of information management research (Sharma et al. 2021), in educational technology research (Chen et al. 2020) and in the field of learning analytics and educational data mining (Lemay et al. 2021).

The aim of the present study is to provide an overview of the major research topics in early childhood education research and to uncover temporal research trends. To this end, a structural topic modelling approach is used.

The following research questions were examined:

  • What are the major research topics in early childhood education research published between 2000 and 2021?

  • How have the identified research topics evolved over time?

Method

Data collection and data cleaning

Bibliometric data of early childhood research papers were collected from Web of Science (WoS Core Collection) and Scopus because these databases provide broad coverage of social science literature (Norris and Oppenheim 2007). The following search terms (applied to titles, keywords, and abstracts) were used to identify research papers for the analyses: (“early child*” OR “preschool” OR “kindergarten”) AND (“education” OR “teach*”). Only articles in the categories social science, education, psychology, multidisciplinary etc. that were published between 2000 and 2021 and written in English were exported from the databases (author names, year, title, abstract, keywords, journal etc.). The exported bibliometric data from WoS and Scopus were merged using the R (R Core Team 2017) package bibliometrix (Aria and Cuccurullo 2017), and duplicate records were removed. The remaining 39,926 articles were used for the structural topic modeling analyses (see Table 1).
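The selection and merging logic described above can be illustrated with a minimal Python sketch. It is not the procedure used in the study (which relied on the database search interfaces and the bibliometrix package): the regular expressions are one possible reading of the search string, and the record layout, the function names and the title-based duplicate rule are assumptions made for illustration.

```python
import re

# One possible reading of the Boolean search string (assumption):
# ("early child*" OR "preschool" OR "kindergarten") AND ("education" OR "teach*")
POPULATION = re.compile(r"\bearly child\w*|\bpreschool\b|\bkindergarten\b", re.IGNORECASE)
FOCUS = re.compile(r"\beducation\b|\bteach\w*", re.IGNORECASE)


def matches_query(record):
    """Return True if title, abstract or keywords satisfy both parts of the query."""
    text = " ".join([
        record.get("title", ""),
        record.get("abstract", ""),
        " ".join(record.get("keywords", [])),
    ])
    return bool(POPULATION.search(text)) and bool(FOCUS.search(text))


def normalise_title(title):
    """Lower-case a title and collapse everything except letters and digits."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def merge_and_deduplicate(*sources):
    """Merge record lists and keep the first record for each normalised title.

    Matching on the title alone is a hypothetical duplicate rule.
    """
    seen = set()
    merged = []
    for source in sources:
        for record in source:
            key = normalise_title(record["title"])
            if key not in seen:
                seen.add(key)
                merged.append(record)
    return merged
```

In this sketch, a record exported from both databases with slightly different capitalisation or punctuation in the title is kept only once.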

In the next step, the data were cleaned using the tm package (Feinerer et al. 2008) in R: all characters were transformed to lower case, and URLs, punctuation, numbers, stop words (e.g. and, by), symbols and special characters were removed. These are standard pre-processing steps in text analysis (Banks et al. 2018). Because there is no standard stop word dictionary, the stop word list included in the tm package (Feinerer et al. 2008) was used. Stop words were removed because they occur very frequently but carry little information about the content of a paper. The search terms were also removed because these words occur in all papers. The data were then lemmatized using the R package textstem (Rinker 2018). In lemmatization, different words that mean the same thing are replaced by a lemma using a dictionary (Welbers et al. 2017). For the current study, a dictionary was constructed for lemmatization. Descriptive analyses of the data were performed with the R package bibliometrix (Aria and Cuccurullo 2017).
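The cleaning steps listed above can be sketched in a few lines of Python. This is an illustration only, not the tm/textstem pipeline used in the study: the stop word set is a small illustrative subset, the lemma dictionary is hypothetical, and treating the wildcard search terms as word prefixes is an assumption.

```python
import re

# Illustrative subset of a stop word list (assumption; the study used the tm list).
STOP_WORDS = {"and", "by", "the", "of", "in", "a", "to", "for", "is", "are"}

# Search terms removed from the corpus; wildcards are read as prefixes (assumption).
SEARCH_TERM = re.compile(r"early|child\w*|preschool|kindergarten|education|teach\w*")

# Hypothetical lemma dictionary mapping word forms to a common lemma.
LEMMA_DICT = {"studies": "study", "families": "family", "skills": "skill"}


def clean_abstract(text):
    """Return the cleaned, lemmatized tokens of one abstract."""
    text = text.lower()
    # Remove URLs before punctuation is stripped, otherwise fragments remain.
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)
    # Keep only ASCII letters and whitespace: drops punctuation, numbers,
    # symbols and special characters.
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = text.split()
    tokens = [t for t in tokens if t not in STOP_WORDS]
    tokens = [t for t in tokens if not SEARCH_TERM.fullmatch(t)]
    return [LEMMA_DICT.get(t, t) for t in tokens]
```

For the input "Early childhood teachers and 25 families: see https://example.org for studies!" the sketch returns the tokens family, see and study.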

Data analysis

In structural topic modeling (Roberts et al. 2019), meta-data of the texts (here, the year of publication) can be incorporated in the analyses. This makes it possible to obtain information about developments and research trends over time. Structural topic modeling analyses were conducted using the R (R Core Team 2017) packages stm (Roberts et al. 2019) and tm (Feinerer et al. 2008).

The first step was to determine the number of topics. In an iterative process, topic models with 10 to 100 topics were calculated and compared using the semantic coherence and exclusivity metrics of each model. Semantic coherence is “a well-performing measure of topic model quality, which measured whether the top words in each topic tended to occur together across documents” (Vanhala et al. 2020 p. 51). Topic exclusivity is “the amount of overlap in top words across topics” (Tonidandel et al. 2022 p. 5). Both criteria were calculated for each model, and a model with high values on both metrics was chosen. To assess topic quality, the FREX (frequency-exclusivity) score of each topic, which combines the frequency and the exclusivity of words in a topic, was used (Roberts et al. 2019). “FREX is the weighted harmonic mean of the word’s rank in terms of exclusivity and frequency” (Roberts et al. 2019 p. 11). The words with the highest probability and the highest FREX score for each topic, which were used to name the topics, are shown in the supplementary materials of this paper. To identify temporal trends in research topics, the relationship between publication year (meta-data) and topics was analysed using linear regressions in the stm package (Roberts et al. 2019). For a detailed description of structural topic modeling and the R package stm see Roberts et al. (2019).
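The idea behind the trend analysis can be illustrated with a simplified Python sketch: an ordinary least squares regression of a topic's yearly mean proportion on publication year. This is a deliberate simplification and not the estimation performed in the study, which works with the document-level topic proportions inside the stm package and takes their uncertainty into account. Function and variable names are illustrative.

```python
import math

# Two-sided 5% critical value of the t distribution with 20 degrees of
# freedom, i.e. for 22 yearly observations (2000 to 2021) and two parameters.
CRITICAL_T_DF20 = 2.086


def linear_trend(years, proportions):
    """Fit proportion = intercept + slope * year by ordinary least squares.

    Returns (slope, intercept, t_value), where t_value is the slope divided
    by its standard error (NaN if the fit is perfect).
    """
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(proportions) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, proportions))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum(
        (y - (intercept + slope * x)) ** 2 for x, y in zip(years, proportions)
    )
    std_error = math.sqrt(residual_ss / (n - 2) / sxx)
    t_value = slope / std_error if std_error > 0 else float("nan")
    return slope, intercept, t_value


def is_significant(t_value, critical_value=CRITICAL_T_DF20):
    """Two-sided test at the 5% level for the given critical value."""
    return abs(t_value) > critical_value
```

A positive slope with a t value above the critical value would correspond to a topic whose share of the literature grew over the period; a negative one to a topic whose share declined.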

Results

Descriptive analyses

The dataset consists of 39,926 articles published between 2000 and 2021 in 3406 journals (see Table 1). Figure 1 shows that research in early childhood education has increased continuously over the years. The annual growth rate of publications is 10.5% (see Table 1; Fig. 1).
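The text does not state how the annual growth rate was derived. Assuming it is a compound annual growth rate based on the publication counts of the first and the last year (an assumption, not confirmed by the source), the calculation can be sketched as follows; the function names are illustrative.

```python
def annual_growth_rate(first_count, last_count, n_years):
    """Compound annual growth rate in percent.

    Assumes growth is compounded over n_years - 1 yearly intervals between
    the first and the last publication year.
    """
    intervals = n_years - 1
    return ((last_count / first_count) ** (1 / intervals) - 1) * 100


def implied_growth_factor(rate_percent, n_years):
    """Ratio of last-year to first-year output implied by a compound rate."""
    return (1 + rate_percent / 100) ** (n_years - 1)
```

Under this assumption, a rate of 10.48% sustained over the 22 publication years from 2000 to 2021 corresponds to roughly an eightfold increase in yearly output.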

Table 1 Description of the dataset

Fig. 1 Annual scientific productivity over the years (2000–2021; annual growth rate: 10.48%)

Topics

Topic models with 10 to 100 topics were compared using semantic coherence and exclusivity metrics. Models with 43, 44, 45, 46, 47, 48, 49 and 56 topics show good scores on both metrics. Since there is no single correct solution concerning the number of topics, the interpretability of these models was inspected in detail. The model with 48 major research topics (see Table 2) was chosen because it shows high semantic coherence and high exclusivity values and its topics are readily interpretable. To interpret and name the topics in the model, the word profile of each topic was inspected using the words with the highest probability and the highest FREX scores (Airoldi and Bischof 2016; Roberts et al. 2019) (see supplementary materials). “FREX weights words by their overall frequency and how exclusive they are to the topic” (Roberts et al. 2019 p. 13). Furthermore, the articles most strongly associated with each topic (top 10 papers per topic) were screened to interpret and name the topic (Roberts et al. 2019).
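The FREX scoring used to label the topics can be illustrated with a short Python sketch that follows the definition quoted from Roberts et al. (2019): the weighted harmonic mean of a word's rank (expressed as an empirical cumulative distribution value) in terms of exclusivity and frequency within a topic. The function names, the weight of 0.5 and the assumption that every word has a positive probability in at least one topic are choices made for this illustration, not details taken from the study.

```python
def ecdf(values):
    """Empirical CDF: for each value, the share of entries that are <= it."""
    n = len(values)
    return [sum(1 for other in values if other <= v) / n for v in values]


def frex_scores(beta, weight=0.5):
    """Compute FREX scores from a topic-word matrix.

    beta[k][v] is the probability of word v in topic k. `weight` is the
    share given to exclusivity (0.5 is an illustrative choice).
    """
    n_words = len(beta[0])
    # Total probability of each word across all topics (assumed > 0).
    column_totals = [sum(topic[v] for topic in beta) for v in range(n_words)]
    scores = []
    for topic in beta:
        exclusivity = [topic[v] / column_totals[v] for v in range(n_words)]
        excl_rank = ecdf(exclusivity)
        freq_rank = ecdf(topic)
        scores.append([
            1.0 / (weight / e + (1.0 - weight) / f)
            for e, f in zip(excl_rank, freq_rank)
        ])
    return scores


def top_frex_words(beta, vocabulary, n=3, weight=0.5):
    """Return the n highest-scoring FREX words for each topic."""
    labels = []
    for topic_scores in frex_scores(beta, weight):
        ranked = sorted(
            range(len(vocabulary)), key=lambda v: topic_scores[v], reverse=True
        )
        labels.append([vocabulary[v] for v in ranked[:n]])
    return labels
```

A word that is frequent in every topic receives a low exclusivity rank and therefore a lower FREX score than a word that is both frequent in and largely specific to one topic, which is why FREX words tend to be more useful for naming topics than the most probable words alone.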

The 10 most frequent research topics are listed below (for the full list of topics, see Table 2 and Fig. 2):

  • Topic 9: Cultural practice in early child education (ECE)

  • Topic 38: Early child education professionals’ experiences, attitudes, and beliefs (e.g. self-efficacy)

  • Topic 13: Educational institution and processes: development and evaluation issues

  • Topic 22: Assessment and measures (e.g. development assessment, social-emotional measures)

  • Topic 28: Interventions in ECE (e.g. cognitive and behavioural interventions): development improvement and effectivity

  • Topic 23: Professional development of ECE educators and leadership issues

  • Topic 25: Educational policy, reforms and systems

  • Topic 46: Inclusive practices: challenges and support issues (e.g. coaching, supervision)

  • Topic 44: (Pro)social behaviour, aggression, violence, bullying, peer rejection

  • Topic 34: Performance issues: abilities, cognition, memory, visuospatial etc.

Table 2 Extracted topics

Fig. 2 Expected topic proportions of the extracted topics

Trends

To identify research trends over time, the publication year was included in the analyses using linear regressions. Figures 3, 4 and 5 show the topics whose prevalence changed significantly (p < 0.05) over the years. Top 10 topics are marked with * to indicate that a major research topic can decrease over the years even though research activity on this topic remains high. The analyses show that research activity dealing with cultural practices (topic 9*, the most frequently studied topic between 2000 and 2021; see Table 1), development and evaluation issues regarding educational institutions and processes (topic 13*), educational practices, aims and knowledge of early childhood educators (topic 37), relationship issues (topic 20) and challenges and support issues in inclusive education (topic 46*) has significantly increased over the timespan.

Furthermore, research dealing with self-regulation and executive functions (topic 24), early childhood numeracy and mathematics (topic 29), music activities (topic 32), language development (topic 40), interests (e.g. in STEM) (topic 47), physical activity (topic 16), play environments (topic 30), children’s and ECE professionals’ creativity, critical thinking of ECE professionals and children’s spatial skills (topic 8), information and communication technologies in ECE (topic 45) and motivational issues (topic 19) has significantly increased over the years.

Research on Head Start programs, transition and school readiness (topic 1), attachment (topic 2), ADHD, autism and depression (topic 10), disability and special educational needs (topic 43), gender (topic 12) and effective teaching (e.g. instruction and feedback issues; topic 39) shows a significant decrease over the timespan 2000 to 2021.

Top 10 topics dealing with aggression and violence issues (topic 44*), assessment and measures (topic 22*), and the development of interventions (topic 28*) show high publication activity even though they decreased over time. The following top 10 topics did not change significantly over time: ECE professionals’ experiences, attitudes and beliefs (e.g. self-efficacy) (topic 38*), professional development of ECE educators and leadership issues (topic 23*), educational policy, reforms and systems (topic 25*) and performance issues: abilities, memory, visuospatial, cognition etc. (topic 34*). Publication activity on these topics was relatively stable.

Fig. 3 Topic prevalences over the years; significant topic proportion changes over the years for topics 1, 2, 8, 9*, 10, 12, 13*, 16 and 19

Fig. 4 Topic prevalences over the years; significant topic proportion changes over the years for topics 20, 22*, 24, 28*, 29, 30, 32, 37 and 39

Fig. 5 Topic prevalences over the years; significant topic proportion changes over the years for topics 40, 43, 44*, 45, 46* and 47

Discussion and conclusion

The aim of this study was to provide an overview of the major research topics and trends in the early childhood education literature between 2000 and 2021 using a structural topic modeling approach. The results show the diversity of research topics as well as a strong increase (2004–2021) in international publications in this research field.

Socially and educationally relevant topics (cf. Sustainable Development Goals 2030) such as cultural practices, the design of educational processes in educational institutions, inclusion and the support structures required for its implementation, relationship aspects in ECE, play environments, educational practices, aims and knowledge of early childhood educators as well as information and communication technologies have gained ground in scientific discourse over the years. More specific areas of development, such as self-regulation, executive functions, motivation and interest (e.g. in STEM), numeracy, creativity, critical thinking of educators, music activities, language development and physical activity, are also increasingly addressed in publications.

Further thematic areas that have shaped the research landscape between 2000 and 2021 (even if publication activity has declined or remained relatively stable over time) are (pro)social behaviour, aggression and violence issues, performance, assessment and measurement issues, intervention research, experiences and training of educational professionals, and leadership issues. The examination of various educational systems and reforms is also often the subject of academic work.

There are also research topics that show a significant decline in publication activity, for example topics that focus specifically on children with special needs. The expansion of the concept of inclusion (Education for All) (UNESCO 2005) may have contributed to this. Although topics such as physical activity and health show an upward trend, they are still less frequently represented in the research landscape than research on developmental areas such as language, self-regulation, social-emotional development, and cognitive skills.

Topics like professionalisation and ECE quality aspects (support services in the implementation of inclusive practices, further training of pedagogical staff, leadership, educational policy systems and reforms, etc.) are also given high priority in early childhood education research.

In summary, early childhood education research provides a comprehensive body of knowledge in many areas that contribute to the achievement of the Sustainable Development Goals 2030 (United Nations). These include, for example, the creation of effective inclusive learning environments for all (SDG 4.a), access to quality early childhood education and preparation for primary education (SDG 4.2), professional development of ECE teachers and cooperation (SDG 4.c), cultural diversity, non-violence and peace etc. (SDG 4.7) and health and well-being issues (SDG 3). This body of research should be taken into account when implementing interventions (e.g. creating inclusive learning environments, professional development programs, physical activity and health interventions for children) in early childhood education institutions. Furthermore, recommendations based on scientific work should be considered when ECE systems are changed and when decisions are made about the provision of resources (e.g. to build professional development programs for ECE teachers or to create favourable working conditions).

The diversity of topics in early childhood education reflects the general trend towards interdisciplinary research in education and educational sciences (Huang et al. 2020), which should be intensified. The increase in research activity in early childhood education also means that it is becoming increasingly difficult to get an overview of sub-areas of this research field. Interdisciplinary cooperation can help to pool expertise from different fields. Even though there is a wide range of research, the translation of research findings into practice and education systems remains challenging.

Limitations

Although the two scientific databases Web of Science and Scopus provide good coverage of international publication activity in this field, research papers that are not listed in these databases are not taken into account in this study. These include unpublished research papers, papers published as project reports on various websites and papers published in journals that are indexed in other databases. Publications written in languages other than English are also not included. Research papers written in English receive more citations than non-English articles, which does not mean that non-English papers are of lesser quality (Di Bitetti and Ferreras 2017), but their impact on the scientific community is assumed to be lower than that of publications in English. The role of non-English papers in the international discourse therefore tends to be seen as rather limited due to the language barrier, the lower probability of citation and the restricted readership (compared to papers published in English-language journals with high coverage due to their indexing in the major databases Web of Science and Scopus). A further limitation concerns the selection of the optimal number of topics. There is no single correct solution when choosing the number of topics in a model (Schmiedel et al. 2019). Standard metrics and interpretability were used to choose the model.