Early childhood, particularly from birth to age six, is a critical period for language acquisition and social-emotional development. Language development involves the mastery of vocabulary, sentence structures, and communication nuances through observation and imitation. Children also need to learn how to comprehend and express emotions, establish interpersonal connections, and regulate their emotions. Early development in language proficiency and social-emotional abilities lays the foundation for future social interactions, emotional regulation, and communication skills [1, 2]. These abilities also serve as early indicators of mental disorders such as autism spectrum disorders (ASD) [3, 4]. Developmental delays in these domains are widespread globally, with rates of language delay reaching 13.5–17.5% in developed countries within the first three years of life and similar rates reported in China [5,6,7,8]. Social-emotional developmental delays are even more pervasive, ranging from 11.8 to 26.2% [9,10,11]. Therefore, it is crucial to explore and comprehend the developmental shifts and key factors influencing early language and social-emotional skills in children.

Internationally, numerous large-scale longitudinal cohorts (Table 1) are dedicated to child development and health, such as the U.S. Baby Connectome Project (BCP, 2016–2022) and the Healthy Brain and Child Development (HBCD, 2021–2030), the Canadian Healthy Infant Longitudinal Development (CHILD, 2004–2012), and initiatives in Europe and the Netherlands. Similarly, since 2015, China has initiated several significant cohort studies, including the Dynamic Monitoring Cohort of Maternal and Child Health, the China National Birth Cohort, and the Beijing-Tianjin-Hebei Natural Population Cohort. While these cohorts provide valuable data on various aspects of child health and development in China, they often lack a synchronized approach integrating behavioral, neuroimaging, and detailed assessments of language and social-emotional development, which are critical for a comprehensive understanding of early childhood development. This study, as a subproject and behavioral branch under the Chinese Baby Connectome Project (CBCP) [12], funded by the Ministry of Science and Technology of China, specifically delves into the behavioral aspects of early childhood. It concentrates on the development of language and social-emotional skills in infants and young children within the context of their environment, primarily focusing on the family milieu. Engaging in data exchange with other subprojects concentrating on brain development, this endeavor aims to unveil the neurobiological mechanisms governing language and social-emotional growth. Therefore, the behavioral data collected in this study will later undergo analysis alongside neuroimaging and physiological data obtained from other subprojects, aiming to comprehend children’s developmental trajectories.

Table 1 International initiatives of longitudinal cohort construction on child development

The following sections will introduce the potential environmental factors that influence the development of social-emotional and language skills, review the current assessment tools used in these domains, and outline our specific objectives for this study.

Developmental changes and influencing factors shaping early language and social-emotional skills

The progression of language and social-emotional skills in early childhood traverses key milestones from prenatal phases to the early years. Even before birth, infants begin detecting sounds and voices in the womb [13], with cries serving as their initial mode of communication at birth, marking the onset of their developmental journey [14]. In the initial months, infants exhibit instinctual communication cues such as smiles and cooing, laying the foundation for crucial social interactions [14]. Around the age of six months, babbling and mimicking facial expressions signal the emergence of basic speech sounds and social engagement [15, 16]. As children approach the toddler years, typically between 2 and 3 years old, there is a remarkable expansion in vocabulary and the formation of simple sentence structures, denoting significant language advancements [17]. Concurrently, the development of emotional self-regulation and early empathy becomes evident, signifying the growth of social-emotional skills [18, 19]. Progressing into the ages of 3 to 6 years, children’s social circles widen, leading to enhanced interactions with peers. During this phase, language skills continue to develop significantly, with children expanding their vocabulary and refining their sentence structures [17]. Their ability to articulate thoughts and emotions with greater complexity enhances their overall communication skills. In terms of social-emotional skills, children aged 3 to 6 years typically experience further growth in emotional regulation and empathy [20, 21]. They become more adept at recognizing and responding to the emotions of others, displaying heightened social awareness and sensitivity.

Research across these developmental stages underscores the intricate interplay of environmental and social factors that shape language and social-emotional development in children. Cultural backgrounds are pivotal, contributing to disparities in developmental milestones. Variations in language acquisition speed and methods can be linked to exposure to diverse linguistic environments [22]. Cultural norms also influence family dynamics, parent-child interactions, and societal expectations, impacting a child’s social skill development [23]. Critical factors such as parental involvement, exposure to language-rich environments, and social interactions are essential in nurturing these skills [24, 25]. Meaningful exchanges between infants and caregivers can accelerate language acquisition, while a supportive and nurturing environment fosters emotional regulation, empathy, and social competence [26,27,28,29]. Understanding these developmental markers and the influencing factors provides vital insights for effective interventions and supportive systems crucial for fostering healthy language and social-emotional development in early childhood.

Assessment tools for language and social-emotional development in children

Assessment tools for evaluating language and social-emotional development are commonly categorized into screening scales and diagnostic scales (Table 2). Screening tools like the Denver Developmental Screening Test (DDST) and Ages and Stages Questionnaires (ASQ) play a crucial role in identifying potential developmental delays. Diagnostic instruments such as the Gesell Developmental Schedules (GDDS), Bayley Scales of Infant Development (BSID), and Griffiths Mental Development Scales provide more detailed assessments and diagnostic capabilities.

Table 2 Current language and social-emotional assessment tools summary

In China, challenges persist despite the availability of assessment tools for language and social-emotional development in children under 6 years old. Existing screening scales, such as the DDST and GDDS, are often outdated and lack discriminatory power because their norms and test items are obsolete. Resource utilization can be inefficient: screenings are complex and time-consuming and are conducted primarily in specialized institutions, and tools like the Bayley Scales require specialized training, which limits their use. Furthermore, some measures, such as the SEAM, lack localized research and standardized norms in China. Although tools like the ASQ and SEAM series offer a cost-effective and user-friendly approach in which primary caregivers complete the forms, they may not align well with developmental milestones for Chinese children because of cultural differences. Thus, there is a clear need for the development and implementation of culturally relevant and up-to-date assessment tools for language and social-emotional development in Chinese children under 6 years old. These tools should be designed to align with the unique developmental milestones and cultural contexts of Chinese children, while also being accessible and efficient for use in a variety of settings, including primary healthcare and community centers. By addressing these challenges, the study can ensure more effective and equitable screening and support for children’s early development in China.

Study objectives

As previously mentioned, the field faces several challenges: the absence of comprehensive and user-friendly tools for assessing language and social-emotional development in Chinese infants and young children, the underutilization of emerging techniques such as AI-powered video/audio analysis, and the lack of a robust integration of behavioral assessments with multimodal data to delve into developmental mechanisms. To tackle these issues, the study outlines three primary objectives:

  1. Develop a streamlined, accessible assessment toolkit for language and social-emotional development in children aged 0–6, ensuring that it aligns with the specific needs and contexts of the Chinese population. These tools will establish accurate norms and industry standards, enabling the sensitive detection and screening of early developmental risks.

  2. Create an intelligent assessment system that measures language and social-emotional development in children aged 0–6 within naturalistic interactive contexts. This system will incorporate technologies such as facial expression recognition, micro-expression recognition, body movement, emotion recognition, and automatic behavioral coding using machine learning and other techniques to analyze children’s social and emotional behaviors in naturalistic social interactions.

  3. Explore the influence of environmental factors on the language and social-emotional development of children aged 0–6 from both physiological and psychological perspectives. As part of a broader initiative, this study will integrate the psychological and behavioral data collected with physiological data obtained from other subprojects, including multimodal magnetic resonance imaging (MRI) and electroencephalography (EEG). The aim is to leverage this combined data to gain a comprehensive understanding of the early brain’s structure, function, and neural connectivity, thereby shedding light on the interplay between environmental influences and developmental processes.

Methods

Study design

This study is primarily funded by the Chinese Ministry of Science and Technology. The protocol underwent a thorough peer review process that included evaluation by the Chinese Ministry of Science and Technology as well as the Research Ethics Committee of ShanghaiTech University.

Part 1: 0–6 child development assessment toolkit creation

To create a comprehensive toolkit to measure early language and social-emotional development, a general procedure will be followed (Fig. 1). In the realm of language assessment, our efforts involve gathering, revising, and compiling language tests to create a repository of appropriate assessments for Chinese infants and young children. For assessing children’s receptive vocabulary skills, the study will develop the Early Chinese Vocabulary Test (ECVT), drawing inspiration from established vocabulary tests like the Peabody Picture Vocabulary Test-Revised (PPVT-R) [30]. Recognizing the potential for cultural and linguistic disparities with the PPVT-R, our vocabulary selection will incorporate words from a variety of sources, including the PPVT-R, the Child Language Data Exchange System (CHILDES) [31], and children’s books recommended by the Ministry of Education [32]. Furthermore, the vocabulary will draw from the Chinese Children’s Lexicon of Oral Words (CCLOOW) [33] and the Chinese Children’s Lexicon of Written Words (CCLOWW) [34] databases, which are derived from animated films and TV shows designed for children aged 3 to 9 in China. In selecting vocabulary, the study will include nouns, verbs, adjectives, and classifiers, while excluding overly abstract terms, onomatopoeic and compound words, idioms, proverbs, culturally or gender-biased words, and those containing English letters or translations. The ECVT will present words in a picture format, similar to the PPVT-R, with four images simultaneously displayed to the child. The child will then select the image that best corresponds to the word they have heard. A sample of children in Shanghai will be recruited to establish reference standards and assess the reliability and validity of the ECVT.

Fig. 1 Developing process of the measures

For young children under the age of two, language development will be assessed using the Putonghua (Mandarin Chinese) Communicative Development Scale (PCDI) questionnaire [35], which will assess children’s production and comprehension of gestures, vocabulary, and sentences. Furthermore, children’s language production will also be assessed in observational studies as part of their language development evaluation. For instance, during the show-and-tell segment of the peer interaction paradigm, four children will take turns sharing their birthday stories. The audio recordings from these sessions will be transcribed, and the children’s language development will be analyzed across five dimensions: grammar, vocabulary, phonology, productivity, and narrative discourse [36].

In the realm of social-emotional assessment, the study will collect a range of advanced tools commonly used internationally to evaluate social-emotional development in infants and young children aged 0–6. These tools are being customized to suit the local context. The Chinese version of the Social-Emotional Assessment Measure (SEAM) for preschoolers aged 36–66 months has been successfully adapted and its psychometric properties examined [37]. The next phase involves adapting the SEAM for infants aged 2–18 months and toddlers aged 18–36 months. This process includes professional translation following International Test Commission guidelines, cultural adaptations, validation by the original SEAM authors, assessment of readability and cultural appropriateness with teachers and parents, and evaluation of psychometric properties within the infant and toddler groups. Moreover, the current study will track children’s social-emotional development with a series of age-appropriate observational paradigms. Particular attention will be given to children’s inhibition, persistence, compliance, and social competence, especially as these behaviors emerge during mother-child separation, frustration tasks, and peer interactions [38, 39].

Part 2: Intelligent coding system implementation

To achieve the second objective, the study aims to establish a naturalistic interactive setting to observe children’s language and social-emotional development. The approach will involve an in-depth examination of parent-child interactions and peer interactions (Table 4). All observational sessions will be recorded and subsequently analyzed through a rigorous coding process. The study then intends to use AI to automate and streamline the coding and analysis of children’s behaviors, paving the way for efficient and precise data processing in future studies.

The observational study room is equipped with strategically positioned cameras to capture facial and bodily expressions throughout the experimental sessions. Every session will be recorded via video to enable post-session analysis. AI specialists will incorporate the Skinned Multi-Person Linear Model (SMPL-eXpressive) [40] and the self-developed Mixture of Uncalibrated Cameras for Robust 3D Human Body Reconstruction (MUC) method [41] to track children’s movement in the behavioral tasks (e.g. theory of mind tasks), which then can be used to generate a comprehensive report on their performances on the task. For recognizing and analyzing social emotions, long short-term memory networks (LSTM) and convolutional neural networks (CNN) will be used to estimate gaze and identify joint attention during interactions.

Part 3: Exploration of environmental influence

The third overarching objective is to systematically identify protective and risk factors from numerous variables to examine the complex interplay of these factors and how they collectively impact the mechanism underpinning social-emotional and language development in children. Employing the refined assessment scales and instruments from Part 1, alongside the intelligent assessment tools designed in Part 2 for observational contexts, an accelerated longitudinal cohort design is implemented to monitor the language and social-emotional outcome of children at various developmental stages. Simultaneously, environmental data are collected from the primary caregivers, predominantly mothers. The accelerated longitudinal design expedites data accumulation over time, offering a strategic advantage for studies that require the monitoring of changes within a condensed timeframe. In this study, distinct sub-cohorts are arranged for follow-up at different intervals, categorized by their visitation frequency: Subgroup A undergoes assessments every 3 months, Subgroup B every 6 months, and Subgroup C every 12 months. For a detailed overview of the cohort design, please refer to Table 3.

Table 3 Accelerated longitudinal design

The developmental origins of health and disease theory posits that early-life environmental exposures significantly influence individual health trajectories [42]. The maternal uterine environment and the postnatal family environment are fundamental to a child’s survival and well-being. Maternal stress during pregnancy has been linked to increased vulnerability in offspring to fear and overreaction under stress conditions, a phenomenon that has been established as a stable epigenetic pattern. This “toxic stress” can lead to alterations in the epigenome of the exposed individuals, which may be transmitted to subsequent generations [43]. From the perspective of the Social Ecological Systems Theory, the family constitutes the immediate microsystem in which children are immersed postnatally. Factors such as parental caregiving cognition, parenting styles, and parent-child attachment are likely to intersect with the development of linguistic and socio-emotional abilities. For instance, research has revealed that children with early insecure attachments are more susceptible to symptoms of anxiety and depression [44]. Moreover, overprotective, punitive, and neglectful parenting approaches have been shown to hinder the social-emotional development of infants aged 12–36 months to varying degrees [45]. In light of these insights, this study aims to gather comprehensive environmental data from maternal pregnancy through children’s first six years, encompassing material surroundings and psychosocial environments. Additionally, demographic characteristics of the infants, such as prematurity status, sex, birth order, and geographical location, will be collected. Through this rigorous methodological approach, the study endeavors to gain a deeper understanding of the developmental trajectories of linguistic and social-emotional abilities, and to delve into the nuanced relationships between environmental variables and developmental outcomes in children.

Study sample

This study adheres to specific guidelines and requirements of the Research Ethics Committee of ShanghaiTech University. As a subproject of the CBCP, this project aims to enroll a minimum of 1000 typically developing children aged 0–6 years in the East China region who do not have congenital developmental abnormalities or a history of neurological conditions.

The sample size was determined by considering several factors, including the study’s objectives, resource limitations, the risk of attrition, statistical requirements, and the stipulations of the funding body. Given the similarities to previous cohort studies, the sample size was benchmarked against the BCP [46], and we ultimately set the target at 1,000 children aged 0–6 years. The accelerated longitudinal cohort design includes three subcohorts. Subcohort A focuses on children aged 0–1 year (0–12 months). It is divided into four groups; the first three groups are followed up every three months, resulting in a total of four visits each. The last group of Subcohort A, which covers the older ages (10, 13, and 16 months), is followed up three times. Subcohort B concentrates on children aged 1–4 years (12–48 months) and comprises six cohorts. Because development slows relative to infancy, the follow-up interval is set at 6 months. Subcohort C focuses on children aged 3–6 years (36–72 months) and consists of two cohorts, with a follow-up interval of 12 months.
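The follow-up schedule described above can be sketched in a few lines. This is only an illustration: the function name is hypothetical, and the entry age of 0 months in the second call is an assumed example, whereas the 10, 13, and 16 month sequence is the one given in the text. Table 3 remains the authoritative schedule.

```python
def visit_ages(start_month, interval_months, n_visits):
    """Return the ages (in months) at which one group is assessed.

    start_month     -- age at the first visit
    interval_months -- spacing between visits (3, 6, or 12 in this design)
    n_visits        -- number of visits planned for the group
    """
    return [start_month + k * interval_months for k in range(n_visits)]


# Last group of Subcohort A: three visits, three months apart
print(visit_ages(10, 3, 3))  # [10, 13, 16]

# Assumed example: a group enrolled at birth with four quarterly visits
print(visit_ages(0, 3, 4))   # [0, 3, 6, 9]
```

Because neighboring groups overlap in the ages they cover, the pooled data span a wider age range than any single group is followed for, which is the point of an accelerated design.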

To give the study adequate statistical power to detect true effects, we set power at 0.95 and the significance level (alpha) at 0.05 to limit the likelihood of Type I errors. We projected an effect size of 0.50, which represents the anticipated standardized difference between the groups. Entering these parameters into the formula $n = \left(\frac{Z_{\alpha/2} + Z_{\beta}}{ES}\right)^{2}$, where ES is the effect size, α is the significance level, and β is the Type II error rate (so that power equals 1 − β), we derived a sample size of 96 for each cohort, which we rounded up to 100.

Attrition is a common issue in longitudinal studies, with participants potentially dropping out due to factors such as relocation, loss of interest, or health issues. Despite our efforts to minimize attrition, we estimated a maximum attrition rate of 25% at follow-up. Therefore, to ensure that each targeted age group retains at least 100 participants, we need to recruit more participants initially. For example, in Subgroup A, to maintain at least 100 infants at the fourth follow-up, we would need to recruit approximately 134 initially, as 100/(1 − 0.25) ≈ 134. For Subgroup B, with a 15% attrition rate at the third follow-up and key age groups like 12, 15, 18, 21, 24, 36 months being covered in two of the six cohorts, we would need to recruit approximately 67 initially for each of these age groups to ensure a minimum of 100 participants, as 100/(2 × (1 − 0.25)) ≈ 67. For Subgroup C, focusing on children aged 3–6 years, we would also need to recruit approximately 67 initially for each of the key age groups of 36, 48, 60, 72 months to maintain a minimum of 100 participants. Consequently, the total initial recruitment across all subgroups is 1005 (see Table 3 for a detailed overview of the cohort design).
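The attrition adjustment is simple arithmetic, reproduced below as a minimal sketch. The function and parameter names are illustrative assumptions; the inputs (a target of 100, 25% attrition, and an age group covered by either one or two cohorts) are the ones used in the worked examples in the text.

```python
import math


def initial_recruitment(target, attrition, cohorts_covering_age=1):
    """Participants to enroll per cohort so that `target` remain after dropout.

    target               -- participants needed at a given age after attrition
    attrition            -- expected proportion lost by that follow-up (0 to 1)
    cohorts_covering_age -- number of cohorts that contribute data at that age
    """
    retained = 1.0 - attrition
    return math.ceil(target / (cohorts_covering_age * retained))


print(initial_recruitment(100, 0.25))     # 134, an age covered by one cohort
print(initial_recruitment(100, 0.25, 2))  # 67, an age covered by two cohorts
```

Rounding up rather than to the nearest integer keeps the retained sample at or above the target under the assumed attrition rate.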

Instruments and measures

This study employs a wide range of assessment tools, including questionnaires and behavioral, observational, and standardized assessments, to evaluate language and social-emotional development in infants and young children while also considering their prenatal and postnatal living conditions. The role of executive functioning, which is essential for self-regulation and key to early language and social-emotional development, will be examined to better understand the mechanisms behind these developmental areas [47,48,49].

Recognizing the rapid developmental pace in early childhood, more frequent measurements will be conducted before the age of two, with shorter intervals between assessments. Due to the limitations of verbal expression and refined behavioral responses in infants aged 0–2, assessments primarily rely on observational methods and input from caregivers, focusing on language comprehension evaluations. For children aged 2–6, assessments primarily involve observation, caregiver feedback, and task-based evaluations, with a specific emphasis on language expression abilities. By integrating a variety of measurements from multiple angles and sources, this methodology not only acts as a reference for associated variables related to language and social-emotional development but also facilitates the exploration of environmental influences on children’s developmental milestones.

Behavioral Assessments and Observations. The behavioral assessments will involve neurofunctional developmental evaluations using tools such as the Griffiths Development Scales (Chinese Edition) and the Wechsler Preschool and Primary Scale of Intelligence. Additionally, the assessments will measure children’s executive functions (e.g., Head-Toes-Knees-Shoulders), social cognitive abilities (e.g., theory of mind), and language development (e.g., Early Chinese Vocabulary Test). Observational studies will incorporate classic experimental paradigms such as the still face, strange situation, delayed gratification, and peer interactions. Video recordings will capture children’s social interactions with mothers/peers during individual tasks for later coding and analysis. The coding will primarily focus on behavioral inhibition, compliance, self-control, and social-emotional regulation in children. Please refer to Table 4 for a detailed list of specific tasks.

Table 4 Behavioral/observational assessment tasks

Parent questionnaires. Thirty parent-report questionnaires are utilized to assess children’s language and social-emotional development levels, along with reporting on environmental factors related to children’s language and social-emotional development. Parental scales related to children’s developmental outcomes include the Child Behavior Checklist, SEAM, Children’s Emotional Adjustment Scale–Preschool Version (CEAS-P), Putonghua (Mandarin Chinese) Communicative Development Scale (PCDI), and others. Parental scales evaluating the family nurturing environment cover various domains such as background demographics, the child’s home environment, parental anxiety, maternal bonding, co-parenting dynamics, parenting stress, as well as assessments of sleep quality, anxiety, depression, life events, and postnatal attachment. See Table 5 for specific questionnaire details.

Table 5 Parent questionnaires

Procedures

The protocol has undergone rigorous ethical review by the Research Ethics Committee of ShanghaiTech University to ensure all processes conform to ethical standards and guidelines for human research, safeguarding participant rights, privacy, and welfare throughout the study. Prior to participation, guardians were assured that all data, encompassing personal details, scores, photographs, and video recordings, would be anonymized for the purposes of scientific research and publication.

This study has established an electronic platform for distributing questionnaires to parents. Parents can schedule and complete the questionnaires before the specified deadline, with the system sending reminders to ensure timely submission. Additionally, when children reach the appropriate age, dedicated experimenters will invite parents to bring their children to the laboratory for behavioral assessments and observational tasks. Parents will receive feedback reports within two weeks of completing the questionnaires and experiments.

Data analysis

Part 1. For toolkit creation, a representative sample will be obtained to verify its validity and reliability. Data will be analyzed using SPSS 26.0 and Mplus 7.4 to assess reliability and validity. Internal consistency reliability will be tested with Cronbach’s alpha coefficient, split-half reliability via odd/even item numbers, and test-retest reliability. Exploratory and confirmatory factor analyses will evaluate structural validity.
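The two internal-consistency statistics named above can be illustrated outside SPSS. The following is a minimal standard-library sketch, not the study's analysis script: the function names and the response matrix are made up for illustration, and the split-half estimate applies the Spearman-Brown correction, which is a common choice that the protocol does not specify.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha; `rows` holds one list of item scores per respondent."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def split_half_odd_even(rows):
    """Odd/even split-half reliability with Spearman-Brown correction."""
    odd = [sum(r[0::2]) for r in rows]   # items 1, 3, 5, ...
    even = [sum(r[1::2]) for r in rows]  # items 2, 4, 6, ...
    r = pearson(odd, even)
    return 2 * r / (1 + r)


# Made-up responses: 5 respondents x 4 items (illustration only)
responses = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 5, 4, 5],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]
print(round(cronbach_alpha(responses), 3))
print(round(split_half_odd_even(responses), 3))
```

Test-retest reliability would use the same `pearson` helper on total scores from two administrations of the same scale.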

Furthermore, this study will establish norms for evaluating language and social-emotional development in infants and young children, filling the gap of normative standards within existing assessment scales. The process for establishing these norms will include: (a) Intergroup Norms Computation: Averaging performance across age groups to determine developmental norms, (b) Intragroup Norm Computation: Converting raw scores to percentile ranks for individual assessments and clinical reference, (c) Age Conversion Tables Creation: Developing tables for easy reference of age-based percentile ranks and T scores, and (d) Guideline Development: Formulating guidelines for standardized implementation, scoring, and interpretation.
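Steps (b) and (c) amount to two score conversions, sketched below. The percentile-rank definition (proportion below plus half of the ties) and the T-score convention (mean 50, standard deviation 10) are standard, but they are assumptions here because the protocol does not state which definitions will be used; the function names and the norm sample are hypothetical.

```python
from statistics import mean, stdev


def percentile_rank(raw, norm_scores):
    """Percentage of the norm group scoring below `raw`, counting ties as half."""
    below = sum(s < raw for s in norm_scores)
    equal = sum(s == raw for s in norm_scores)
    return 100.0 * (below + 0.5 * equal) / len(norm_scores)


def t_score(raw, norm_scores):
    """Standardize `raw` against the norm group on a mean-50, SD-10 scale."""
    z = (raw - mean(norm_scores)) / stdev(norm_scores)
    return 50.0 + 10.0 * z


# Hypothetical raw scores for one age band of the norm sample
age_band = [10, 20, 30, 40, 50]
print(percentile_rank(30, age_band))  # 50.0
print(t_score(30, age_band))          # 50.0
```

An age conversion table would then be these two functions evaluated for every attainable raw score within each age band.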

Part 2. To develop an intelligent coding system, deep learning algorithms will be applied to observational recordings. Initially, the model will be trained and tested on an open visual dataset, manually coded according to established coding schemes to measure joint attention, inhibition, child compliance, etc. [28, 38, 50,51,52,53,54,55]. As a result, the trained algorithms will be capable of automatically scoring and analyzing observational recordings, providing immediate feedback on participant behaviors. For instance, utilizing the joint attention framework [56] (refer to Fig. 2), video data are segmented into seconds, each further divided into 30 frames. Analysis of each frame commences with facial feature extraction, generating an attention map by integrating these data with head position information. Simultaneously, frame processing synthesizes features, emphasizing head positions, head orientation, and attention features. After initial feature extraction, the data will undergo further processing via an LSTM network [57] to incorporate temporal information, capturing dynamic behavioral changes over consecutive frames. The model will adjust feature maps to create a final heatmap reflecting participants’ collective attention in the image, adjusting the heatmap intensity based on whether attention is directed within or outside the frame. The process, combining deep learning and temporal analysis, is intended to predict and visualize participants’ attention distribution, facilitating an efficient and nuanced analysis of observed behaviors.
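Only the first step of this pipeline, splitting a recording into one-second windows of 30 frames, is simple enough to show without a deep learning framework. The sketch below covers that step alone and is not the cited joint attention model; the function name is hypothetical and the frame rate defaults to the 30 frames per second stated in the text.

```python
def segment_into_seconds(n_frames, fps=30):
    """Group frame indices into consecutive one-second windows.

    Returns a list of windows, each a list of frame indices. The final
    window is shorter when the recording does not end on a full second.
    """
    return [
        list(range(start, min(start + fps, n_frames)))
        for start in range(0, n_frames, fps)
    ]


windows = segment_into_seconds(65)
print(len(windows))   # 3
print(windows[-1])    # [60, 61, 62, 63, 64]
```

Each window would then be passed frame by frame through the feature extractor, with the ordered per-frame features forming the input sequence for the temporal model.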

Additionally, an automatic scoring system is created by inputting multi-view videos into a 3D reconstruction algorithm to streamline processing and precisely reconstruct human body images [41]. This reconstruction process allows for the generation of intricate motion captions, effectively detailing the behaviors observed in the recordings. These annotations, in conjunction with a predefined behavioral coding scheme, are then consolidated into a comprehensive dataset. Subsequently, this dataset is employed to work with a sophisticated language model, enabling advanced behavioral analysis and automated scoring of observed behaviors. Refer to Fig. 3 for a detailed flowchart of the process.

Fig. 2 Workflow of the intelligent coding system on joint attention

Notes. The model segments the video into seconds, and then into frames (30 frames per second). It extracts facial features from each frame and generates an attention map by combining this information with the position of the head. Simultaneously, it processes the full frame to focus on head positions, head orientation, and attention features. Then, these features are further processed through a Long Short-Term Memory network to capture dynamic changes in continuous frames. Finally, the model modulates the generated feature maps to produce a final heatmap, which reflects the collective joint attention of people in the image. The intensity of the heatmap is adjusted based on whether the participant’s attention is directed inside or outside the frame

Fig. 3 Multi-view 3D reconstruction pipeline

Notes. This figure illustrates a moment in the Strange Situation, showing the participant placing toys into a box that the administrator is carrying while the mother observes from her seat. Our state-of-the-art 3D reconstruction algorithm effectively analyzes video streams to produce precise representations of the 3D human body by employing a multi-view camera configuration. This reconstruction simplifies the creation of intricate motion descriptions, which accurately describe the observed behaviors depicted in the recordings. These descriptions, along with a predetermined behavioral coding scheme, are combined into a comprehensive dataset. This dataset is subsequently employed to train an extensive language model, empowering sophisticated analysis of behaviors and automated assessment of observed actions

Part 3. To investigate the influence of environmental factors on children’s social-emotional and language development, statistical models will be used to examine the relationships among these factors. Children’s characteristics, maternal attributes, and environmental variables are treated as independent variables, and children’s social-emotional and language development as dependent variables. Stepwise multiple regression analysis will be employed to explore potential risk and protective factors using data collected from the cross-sectional cohort. Furthermore, reciprocal influences between environmental factors and the individual characteristics of both mothers and children will be investigated using structural equation modeling (SEM) based on established theoretical frameworks. For the longitudinal data, a latent growth curve model will be constructed to examine the developmental trajectories of infants’ social-emotional and language skills over time and the dynamic interplay between influential factors and these abilities.
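The stepwise regression step could look roughly like the following forward-selection sketch in pure Python. It is illustrative only: the stopping rule (`min_gain`, the share of the total sum of squares that a new predictor must explain), the function names, and the data layout are assumptions rather than the study's analysis plan, and a real analysis would normally rely on a statistical package.

```python
def ols_rss(columns, y):
    """Residual sum of squares of an OLS fit of y on an intercept plus `columns`."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    aug = [
        [sum(design[i][r] * design[i][c] for i in range(n)) for c in range(k)]
        + [sum(design[i][r] * y[i] for i in range(n))]
        for r in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    # (assumes the predictors are not perfectly collinear).
    for p in range(k):
        pivot_row = max(range(p, k), key=lambda r: abs(aug[r][p]))
        aug[p], aug[pivot_row] = aug[pivot_row], aug[p]
        for r in range(k):
            if r != p:
                factor = aug[r][p] / aug[p][p]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[p])]
    beta = [aug[r][k] / aug[r][r] for r in range(k)]
    return sum(
        (y[i] - sum(b * x for b, x in zip(beta, design[i]))) ** 2 for i in range(n)
    )


def forward_stepwise(predictors, y, min_gain=0.01):
    """Greedy forward selection over a dict of {name: column}.

    At each step, add the predictor that lowers the RSS most; stop when the
    best candidate explains less than `min_gain` of the total sum of squares.
    """
    total_ss = ols_rss([], y)  # intercept-only model
    if total_ss == 0:
        return []  # constant outcome: nothing to explain
    selected, current_rss = [], total_ss
    remaining = list(predictors)
    while remaining:
        rss_by_name = {
            name: ols_rss([predictors[s] for s in selected + [name]], y)
            for name in remaining
        }
        best = min(rss_by_name, key=rss_by_name.get)
        if (current_rss - rss_by_name[best]) / total_ss < min_gain:
            break
        selected.append(best)
        remaining.remove(best)
        current_rss = rss_by_name[best]
    return selected
```

Calling `forward_stepwise({"x1": [...], "x2": [...]}, y)` returns the names of the retained predictors in the order they entered the model. Forward selection never drops a variable once it has entered, so this is a simplification of full stepwise procedures that also test for removal.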

The study will extend its analysis to encompass children’s brain development in conjunction with behavioral growth and environmental factors using SEM. Notably, as the first comprehensive cohort capturing the Chinese infant population, this study will permit comparisons with cohort studies from diverse populations to discern developmental similarities and discrepancies across different cultural contexts. As a result, comparative analyses between Chinese infants and infants from varied social and cultural backgrounds (e.g., CBCP versus BCP) will facilitate insights into the developmental trajectories and environmental influences in infant development.

Discussion

Understanding the language and social-emotional development of infants and young children is crucial for their overall well-being and future outcomes. These developmental aspects shape communication skills and form the foundation for cognitive and socioemotional abilities essential for successful adaptation in later life stages. Comprehensive research is vital for informing early intervention strategies that promote healthy development and address potential long-term challenges for children who may be at risk. Therefore, the current study addresses these developmental aspects and advances the field in three ways.

First, this study represents a pioneering effort in China to integrate assessments of language and social-emotional development across age groups from 0 to 6 years, marking a significant advancement in early childhood development research. Most cohort studies in China tend to focus on younger children, with limited inclusion of the broader age spectrum [58, 59]. Among the few that do encompass a wider age range [60], the primary focus is often on individual risk factors for specific diseases, rather than on understanding the intricacies of children’s social-emotional and language development. By employing a combination of questionnaires, behavioral assessments, and observations, the study is able to capture the developmental changes that occur within and between age groups. This comprehensive approach allows for a more nuanced understanding of the developmental milestones and variations that exist within this critical period. Additionally, the streamlined, accessible assessment toolkit developed as part of this study is expected to help establish accurate norms and industry standards, aiding in the early detection and screening of developmental risks. This is particularly important given the prevalence of developmental delays in language and social-emotional domains among young children.

Moreover, the study introduces a novel approach using multiple cameras and advanced AI algorithms in observational research, significantly streamlining the labor-intensive coding process. This method enables precise capture and analysis of human postures through multi-view 3D reconstruction models. By integrating advanced technologies and machine learning, the proposed intelligent assessment system facilitates in-depth observation and analysis of children’s social and emotional behavior in naturalistic settings. The incorporation of the joint attention framework into deep learning algorithms enhances the scrutiny of participant behaviors, with a specific focus on attention distribution within video recordings. This approach automates scoring, provides real-time behavioral feedback, and offers detailed insights into children’s interactions and social cognition in early childhood. The integration of AI-powered technologies into the assessment system enhances the accuracy and consistency of the evaluations, enabling objective and reliable measurement of intricate social-emotional behaviors in naturalistic social settings.

Furthermore, this pioneering project focuses on the intricate developmental trajectories of children’s social-emotional skills and language proficiency, particularly targeting younger age groups that present challenges during data collection. The study stands out for its wide age range and sizable sample, enriched by multimodal data from related subprojects, which supports detailed and accurate developmental insights. By integrating behavioral, neuroimaging, and physiological data with environmental factors, we can unveil the complex web of influences that shape language and social-emotional development. Through analyzing children’s brain development alongside behavioral growth using SEM, we can gain a holistic view of developmental trajectories, shedding light on how environmental factors and individual characteristics impact language and social-emotional abilities at both behavioral and neurological levels. Comparing our findings with similar projects worldwide can elucidate the subtle variations in child development and environmental influences arising from diverse populations and cultural backgrounds, enriching our insights into early childhood development. Studies conducted in Western countries, such as the BCP, HBCD, and CHILD, may not fully capture the cultural nuances relevant to Chinese infants and young children. This highlights the importance of tailored research in understanding and supporting the unique developmental needs of children in different cultural contexts. Additionally, research focused on Eastern cultures often has a narrower age range and primarily emphasizes biological factors, rather than the broader aspects of language and social-emotional development [61]. Our research not only illuminates the complexities of development in Chinese infants and young children but also guides tailored interventions to optimize children’s language and social-emotional growth.

Despite the comprehensive approach taken in this study, certain limitations should be acknowledged. First, the complex nature of environmental factors may present challenges in disentangling their contributions to language and social-emotional development. Additionally, integrating technology-based assessment tools and multimodal neuroimaging and physiological data may raise ethical and privacy concerns. Furthermore, the generalizability of findings from this study may be impacted by factors such as cultural differences and socioeconomic disparities in the studied populations. Also, the longitudinal nature of assessing language and social-emotional development in young children may present logistical and ethical challenges that require careful consideration throughout the research process. Finally, the accelerated longitudinal design, despite its effectiveness in spanning the age range of interest in a limited period of time, may pose challenges to statistical analysis because participants are not assessed at the same time and within the same intervals.

In conclusion, the proposed study, despite these limitations, presents a substantial opportunity to deepen our understanding of environmental influences on the language and social-emotional development of infants and young children. By identifying key developmental markers indicative of healthy language and social-emotional development, the study aims to establish benchmarks for early intervention and support systems. The insights gained from this study have the potential to significantly inform early interventions and support strategies, positively shaping the developmental trajectories of Chinese children. Ultimately, this research stands to contribute to improved outcomes for future generations and to advance our collective knowledge in the field of early childhood development.