1 Introduction

Comparative study of the early years across nations can help identify trends in child development that are consistent across cohorts. Identifying common patterns in child development across studies and nations could provide important insight into the environmental factors affecting well-being and inequalities. Numerous factors have been linked to children’s socio-emotional and behavioural well-being, including maternal health and depression (e.g. Eamon and Zuehl 2001); family structure (McLanahan 1994); poverty (Williams et al. 2014; Lee 2011); and behavioural characteristics of the family and parents, as well as characteristics of the children themselves (Watson et al. 2012). Research in this area is valuable for advancing our understanding of risk and protective factors and for developing interventions to improve the well-being of children.

Conducting comparative harmonised analysis is ambitious and not without challenges. Although many child cohort studies have been conducted, data access and data comparability have been identified as primary challenges when conducting harmonised analysis (Waldfogel 2013). Further challenges arise from differences in sampling methods, study designs, measurement tools, response rates and attrition – these need to be identified and addressed before comparative study can be conducted (Bath et al. 2010; Vaus 2008). However, studies are increasingly being designed with cross-nation comparison in mind, and it is becoming more feasible to replicate studies or to conduct meta-analysis with data from different countries (for example: Bradbury et al. 2015; O'Keeffe et al. 2015; Pilkauskas and Martinson 2014; Washbrook et al. 2012). International collaborations allow common research priorities to be studied, and new comparative harmonised analysis offers an alternative perspective to the replication of studies (Wood et al. 2017). The benefits of collaborative harmonised research include the potential to validate findings by coordinating study design (Cooper et al. 2011; Hofer and Piccinin 2009).

Starting in 2016, the Growing Up Healthy in Families Across the Globe (GUH) project is an international collaboration examining the possibility of coordinating harmonised analysis using five longitudinal studies. Specifically, the aim of GUH was to gain greater insight into findings from studies carried out in New Zealand by using international comparisons (from Ireland and Scotland). It was anticipated that, by learning from similar countries through critical analysis, cross-national findings could be used to reinforce the findings from the individual studies. This is particularly important because some of the studies individually have relatively small sample sizes (see below).

Although many countries across the globe have established child cohort studies, Ireland and Scotland were identified as suitable comparators in the GUH study for several reasons. Initially, potential for comparison was established after the identification of the triad of ‘Growing Up’ studies located in New Zealand, Scotland and Ireland. These three studies, all established within five years of each other, were built on accumulated knowledge of child cohort studies to date and designed with multi-level biopsychosocial models. The studies all incorporated the best and most widely validated outcome measures in all domains of the child’s life (including, for example, the Strengths and Difficulties Questionnaire (SDQ), which is used later in this paper). Geographically and politically, the three countries have similarities that span population size, health and education systems, indigenous populations and languages, and persistent inequalities. The similarity in settings, and the similar longitudinal child cohort studies in each (see further below), enhances the potential power of harmonised analysis.

This paper will introduce the five longitudinal studies and describe the initial comparative harmonised analysis and findings. The aim of the harmonised analysis was to increase understanding of how families change over time, to determine how and why environments change and which environments are supportive of aspects of child development, and to consider how the policy context can shape these environments. Similarities and differences across the studies are identified, and the implications of the initial findings and next steps are discussed.

1.1 The Studies

New Zealand, Ireland and Scotland share aspects of a common heritage – all countries have been influenced by their colonial relationship with England – and resulting cultural norms. The three countries are similar in population size and public services provision (Table 1). There are some critical differences between the countries: New Zealand is more ethnically diverse, with a significant proportion of the population made up of Māori, Asian and Pacific Peoples. However, all three countries experience significant migration.

Table 1 Comparative demographic and public service characteristics of New Zealand, Ireland and Scotland

Three of the GUH studies come from New Zealand. The oldest study is Te Hoe Nuku Roa – Best Outcomes for Māori (THNR; Cunningham et al. 2013) and was designed specifically to capture the influence of Māori society on personal and family development. The second study, the Pacific Island Families Study (PIFS; Paterson et al. 2007) follows the development of New Zealand born children identifying as Pacific People. Next, Growing Up in New Zealand (GUiNZ; Morton et al. 2012) is a longitudinal study following the lives of children (from before birth) representing a cross-section of the diverse births in contemporary New Zealand. Comparison between New Zealand, Scotland and Ireland has been carried out using the studies Growing Up in Scotland (GUS; Anderson et al. 2007) and Growing Up in Ireland (GUI; Greene et al. 2010), both on-going child cohort studies in Scotland and Ireland respectively.

All five studies were commissioned by their respective governments and funding bodies to conduct research that would inform policy development. One objective of the GUH study was to gain further insight into the ethnically unique findings from THNR and PIFS by harmonising their data to GUiNZ, GUS and GUI; and all studies were interested in how cross-national comparisons of child outcomes could be used to inform local policy development. GUiNZ can be seen as the link study with strong similarities to GUS and GUI in terms of design and structure and to PIFS and THNR in terms of the importance put on understanding the lives of the Māori and Pacific Peoples of New Zealand.

Independently, an underpinning objective of the five studies is to understand what shapes the trajectories of child development (both positive and negative influences), and the studies were constructed using similar biopsychosocial conceptual frameworks. All the studies are interested in the dynamics of family change and work to inform policy to potentially improve population wellbeing across the life-course. These similarities made them ideal for the GUH study and for exploring cross-national child outcomes. THNR differs from the other studies as it is a household study, as opposed to a child cohort study, with a particular interest in cultural diversity. However, based on the overlaps in study objectives and domain themes, it was thought likely that some similar data would have been collected across all five studies. Through comparative analysis across the five studies, the GUH study aimed firstly to determine the potential to generate harmonised variables to study risk factors and secondly to analyse the consistency of the risk factors across the studies. The study asked the following research questions:

  1. Are the studies comparable in relation to study approach, theoretical frameworks and purpose?

  2. What are the similarities and differences across the five studies relating to key descriptives and measures (demographics, income, housing, health, social development etc.)?

  3. Do any factors consistently relate to positive or poor child development across the five studies?

2 Methods

The GUH study carried out ex post harmonised analysis using data from five longitudinal studies. The study did not attempt to merge the data into one dataset, or to conduct meta-analysis from previous studies. Rather, the study was designed by aligning the five studies (both in relation to study design and key variables) and then conducting novel harmonised analysis.

Initially the theoretical approach, sampling strategy and data collection methods applied in the five studies were compared to test for shared purpose and approach. Next sample characteristics were aligned to identify cross-over in the waves of data collection. Details were extracted in relation to the age of children when data was collected, who the data was collected from and the frequency of data collection. Overlapping time points of data collection were identified.

2.1 Measures

Available survey documentation was examined in detail to identify measures and domains for which data was collected across the studies. Survey questionnaires and data dictionaries were the primary documentation source – supplemented by summary frameworks of measures and guides to the different waves of data collection. Potential key measures and questions were identified and inputted into an Excel spreadsheet clustered by domain topics (health – of both child and primary care giver; socio-economic status – education, employment, housing tenure; childcare provision; family dynamics; social support; child development; personal demographics – of both child and primary care giver). Summary data was transferred into tables and it was identified whether potential variables of interest were present across the five studies. The potential for harmonisation of variables across studies was categorised by attributing the following criteria to identified common variables:

  1. Strong: Variable is present in all or most studies and information collected is very similar or identical.

  2. Moderate: Variable is present in all or most studies however information collected varies slightly.

  3. Weak: Variable is present in most or some studies and information collected varies slightly or significantly.

The process of categorisation of variables was first carried out by one researcher and then extensively checked by the project collaborators.

2.2 Analysis

Using five variables that could be harmonised through an identified strong match a simple cumulative risk factor model was constructed. The variables used were maternal relationship status, maternal education, smoking in pregnancy, maternal self-reported health and maternal long-standing illness. The cumulative risk model construction was based on a 12-factor model created to measure vulnerability using GUiNZ data, and following this model children were categorised as either having 0–1, 2–3 or 4–5 risks (Chittleborough et al. 2011; Morton et al. 2014; Morton et al. 2015; Wallander et al. 2019).
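As an illustration only, the cumulative risk count described above can be sketched in a few lines of Python. The class and field names below are hypothetical and are not taken from any of the study datasets; the sketch simply assumes that each of the five harmonised variables has already been reduced to a yes/no risk indicator.

```python
from dataclasses import dataclass, astuple


@dataclass
class MaternalRisks:
    """Five binary risk indicators from the first time point (hypothetical names)."""
    not_living_with_partner: bool
    low_education: bool            # lower secondary education or less
    smoked_in_pregnancy: bool
    fair_or_poor_health: bool      # maternal self-reported health
    long_standing_illness: bool


def count_risks(risks: MaternalRisks) -> int:
    """Number of risk indicators that are present (0 to 5)."""
    return sum(astuple(risks))


def risk_category(n_risks: int) -> str:
    """Group a risk count into the three bands used in the analysis."""
    if not 0 <= n_risks <= 5:
        raise ValueError("risk count must be between 0 and 5")
    if n_risks <= 1:
        return "0-1"
    if n_risks <= 3:
        return "2-3"
    return "4-5"
```

Each risk is weighted equally in this sketch, which is the usual simplification in a cumulative risk count; it says nothing about which individual risk matters most.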

The accumulated risk was reported in relation to socio-emotional behaviour through a comparison with outcomes from the Strengths and Difficulties Questionnaire (SDQ total difficulties; Goodman 1997). The SDQ scores are categorised using three established thresholds: the close to average range is reported as 0–13 points, 14–16 points relates to a borderline category, and a score of 17–40 relates to potentially problematic socio-emotional behaviours. These thresholds were established so that roughly 80% of children are in the close to average range, 10% are borderline and 10% are potentially problematic (Goodman 1997).
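The three SDQ bands can be expressed as a simple lookup. The following Python sketch uses the cut-points stated above; the function name is an assumption made for illustration.

```python
def sdq_band(total_difficulties: int) -> str:
    """Map an SDQ total difficulties score (0-40) to its reporting band."""
    if not 0 <= total_difficulties <= 40:
        raise ValueError("SDQ total difficulties score must be between 0 and 40")
    if total_difficulties <= 13:
        return "close to average"
    if total_difficulties <= 16:
        return "borderline"
    return "potentially problematic"
```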

Two time points were used in the analysis: the risk model used variables collected when children were aged nine to twelve months, and the SDQ outcomes were measured when children were aged four and a half to six years. A cross-tabulation was produced that compared accumulated risk factors measured at the first time point against being in the borderline or potentially problematic range (14–40 points) of SDQ outcomes at the second time point. The relationship was illustrated using a clustered bar graph. Borderline and potentially problematic SDQ scores were combined at this point to account for small numbers in the potentially problematic category. The analysis was carried out independently by each study, following the agreed methodological approach, and then collated by an independent researcher (*first author Initials).
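The cross-tabulation step can be sketched as follows. This is a minimal Python illustration, assuming each child is represented by a pair of (risk category at the first time point, SDQ total difficulties score at the second); the function name and the example records are invented and are not data from any of the studies.

```python
from collections import Counter

RISK_CATEGORIES = ("0-1", "2-3", "4-5")
# Borderline (14-16) and potentially problematic (17-40) scores are combined.
ELEVATED_SDQ_MIN = 14


def percent_elevated_by_risk(records):
    """Percentage of children per risk category with an SDQ score of 14 or more.

    records: iterable of (risk_category, sdq_total) pairs, one per child.
    Categories with no children are left out of the result.
    """
    totals = Counter()
    elevated = Counter()
    for category, sdq_total in records:
        totals[category] += 1
        if sdq_total >= ELEVATED_SDQ_MIN:
            elevated[category] += 1
    return {
        category: 100.0 * elevated[category] / totals[category]
        for category in RISK_CATEGORIES
        if totals[category] > 0
    }


# Invented records for illustration only.
example_records = [
    ("0-1", 5), ("0-1", 15), ("0-1", 8), ("0-1", 10),
    ("2-3", 14), ("2-3", 9),
    ("4-5", 20), ("4-5", 12),
]
print(percent_elevated_by_risk(example_records))
```

The resulting percentages per risk category are the quantities that a clustered bar graph of this kind would display, one cluster per study.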

3 Results

Analysis of the theoretical approach and research methodology from across the studies revealed similar study objectives: all seek to measure dynamic change over time (longitudinal study designs) with a focus on the well-being and development of children in the context of their families – and all studies aim to inform policy and practice. Careful attention was given to examining the sampling strategies, data collection procedures and survey design of the studies. While very similar data collection approaches were taken by GUiNZ, GUS and GUI (area-level sampling, face-to-face computer assisted interviews, widely validated measures in the questionnaires), approaches were tailored to the samples for PIFS and THNR. For example, the entire cohort for PIFS was recruited from Middlemore Hospital because its maternity division has the largest number of Pacific births in New Zealand (Paterson et al. 2007), and in THNR innovative, culturally appropriate measures were developed to capture important outcomes for the Māori population (Cunningham et al. 2013).

The key characteristics of the five studies are presented in Table 2. THNR is the oldest study and was a household panel study, therefore the children ranged in ages and the study differed significantly in methodological approach. For THNR five waves of data collection were carried out over an extended period of time, initially 1995–1998 followed by: 1998–2000, 2000–2002, 2004–2007, and the last wave of data collection was from 2011 to 2014 (Cunningham et al. 2013).

Table 2 Key characteristics of the five Growing Up Healthy studies

Time-points of data collection (in relation to the child age) were compared and three potential ages where there was significant overlapping data were identified (Table 3). At each identified time-point data had been collected from the primary care giver; in most cases by a face-to-face survey with the mother. The first identified common time-point was between ages nine and 12 months. This was the first data collection wave for GUI and GUS; however, data was first collected in pregnancy for GUiNZ, and at six weeks for PIFS. Some of the data for GUiNZ (at nine months) and PIFS (at 12 months) was therefore derived from the previous wave of data collection. The second and third overlapping time-points were at three to four years and five to six years. For THNR, data was collected about children of all ages in the household, but the data could potentially be used for comparative analysis only in cases where there is longitudinal data for the same child at nine to 12 months, three to four years and five to six years.

Table 3 Identified parallel waves for harmonised analysis from the five Growing Up Healthy studies (with ages of children at each wave)

The exploration of the measures and variables used in the five studies revealed significant overlap in areas of data collection. Information relating to health of child and primary care giver, socio-economic status (education, employment, housing tenure), childcare provision, family dynamics, child development and personal demographics was common across studies. However, many questions were asked in a variety of ways, using a mix of scales and standardised measures, and therefore represented moderate or weak potential for harmonisation. Five maternal variables with a strong match were identified at time-point one: 1) maternal relationship; 2) maternal education; 3) maternal smoking in pregnancy; 4) maternal self-reported health; 5) maternal long-standing illness/disability (see Supplementary Information for full detail of questions). SDQ was used at later time-points by all studies (except THNR) to measure child development and was identified as an appropriate outcome variable. However, the measure was not completed at every time-point, so only one additional time-point was used in the analysis (age ranged from 4.5 to 6 years old).

Descriptive statistics from the five studies were produced for the harmonised variables (Table 4). Time-point one characteristics for THNR were drawn from all children aged 0–6 years from the first wave of data collection (n = 274 children). The small sample resulted in less reliable data and results were not directly comparable to the other studies. Gender was included to illustrate the consistency across the studies, and the four child cohort studies (excluding THNR) show very similar results. Maternal relationship was harmonised by dichotomising the variable to living with a partner or not; across the child cohort studies, the proportion living with a partner ranged from 79.7% to 87.8%.

Table 4 Descriptive statistics from the five Growing Up Healthy studies

Similarities were seen in relation to smoking in pregnancy (dichotomised to yes or no), with a range of 74.9–82.1%, and maternal long-standing illness/disability, with a range of 11.4–18.0% (THNR reported 27.2%). Greater variance was seen across maternal education and maternal self-reported health; in both cases variables were harmonised to a scale. However, scales were not harmonised to a similar number of points (for example, PIFS only had a 3-point scale for self-reported health), as only the lower end of the scales was needed to build the risk model.
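To illustrate why scales of different lengths can still feed the same risk indicator, the Python sketch below derives a fair/poor health flag from a five-point and a three-point scale. The response labels are hypothetical placeholders chosen for the example; the actual question wording and response options are those given in each study's documentation.

```python
from typing import Sequence

# Hypothetical response labels, ordered from best to worst health.
FIVE_POINT_SCALE = ("excellent", "very good", "good", "fair", "poor")
THREE_POINT_SCALE = ("good", "fair", "poor")

# Only the lower end of each scale defines the risk indicator.
RISK_RESPONSES = frozenset({"fair", "poor"})


def health_risk_flag(response: str, scale: Sequence[str]) -> bool:
    """Return True if a self-reported health response falls in the risk range."""
    if response not in scale:
        raise ValueError(f"{response!r} is not a valid response on this scale")
    return response in RISK_RESPONSES
```

Under these assumptions the upper categories of the longer scale never need to be aligned with the shorter one, because every response above the risk range maps to the same "no risk" value.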

At time-point two, results of the SDQ were compared (Table 4). The SDQ was not collected by THNR, and PIFS only collected the prosocial section of the measure (alternative questions are asked in relation to behavioural difficulties). The scale was harmonised according to the validated thresholds (Goodman 1997). The range across the SDQ close to average category between GUiNZ, GUS and GUI was 76.3–89.6%. Greater variance was seen across the prosocial scores (including PIFS): the proportion of children in the close to average range was 80.9–97.0% (including borderline for GUiNZ), whereas the range of potentially problematic prosocial scores was smaller, at 2.1–3.0%.

Figure 1 shows the relationship between the risk-factor model and SDQ outcomes, highlighting the percentage of children in each cumulative risk category with borderline or potentially problematic SDQ scores. The identified risk factors were: 1) not living with partner; 2) maternal education lower secondary or less; 3) maternal fair/poor health; 4) maternal long-standing disability; 5) smoking during pregnancy. A clear association is seen between cumulative risk factors and SDQ outcomes: 40.0–43.6% of children across GUiNZ, GUS and GUI with four or five of the identified risk factors also have borderline or potentially problematic SDQ scores. In contrast, only 8.9–19.6% of children with zero or one risk factor are in the borderline or potentially problematic SDQ score range. A chi-square test of independence was calculated and a significant association was found (GUI: χ²(2) = 298.02, p < .001; GUS: χ²(2) = 56.29, p < .001; GUiNZ: p < .001).
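For readers unfamiliar with the test, a chi-square test of independence on a table of three risk categories by two SDQ groups has (3 − 1) × (2 − 1) = 2 degrees of freedom, matching the χ²(2) reported above. The Python sketch below computes the statistic from a table of observed counts; it is a generic textbook calculation with hypothetical function names, not the code used by the studies, and it assumes every expected count is greater than zero.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a count table.

    table: list of rows, each a list of observed counts (for example, one row
    per risk category and one column per SDQ group).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


def p_value_two_df(statistic):
    """Upper-tail p-value for a chi-square statistic with exactly 2 df.

    With 2 degrees of freedom the chi-square distribution is exponential
    with mean 2, so the tail probability is exp(-x / 2).
    """
    return math.exp(-statistic / 2)
```

Because the closed-form p-value holds only for two degrees of freedom, tables of other sizes would need a general chi-square distribution function from a statistics library.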

Fig. 1 Percentage of children with borderline or potentially problematic SDQ scores by number of risk factors

4 Discussion

The primary aim of the Growing Up Healthy in Families Across the Globe project was to investigate the potential for performing ex post harmonised analysis across five longitudinal studies. In answer to the first research question, the studies have been compared in relation to study design, theoretical frameworks and purpose – similarities and differences have been found. The frameworks underpinning the individual studies were tailored to each cohort; however, the studies have a shared overarching objective, which is to increase knowledge about what shapes the development of children in the context of their families over time. This objective is unsurprising, and indeed would be shared by most, if not all, longitudinal studies of child development. However, it is the differences that are important to note. While all were designed as longitudinal studies, THNR differed significantly as it is a household panel study and the other four are child cohort studies. Combined with the relatively small number of child respondents within THNR, this means it will ultimately be difficult to produce significant results from THNR when conducting harmonised analysis alongside the other studies. The triad of Growing Up studies, GUiNZ, GUS and GUI, applied very similar approaches to study design and data collection, allowing comparison of findings across these studies. While PIFS and THNR applied a more tailored approach to their data collection, GUiNZ has the potential to act as a link study, as all three of the New Zealand studies incorporate into their designs an emphasis on the need to understand the cultural diversity within the country.

To address the second research question, the ex post data harmonisation of measures resulted in the identification of several closely aligned variables. Focussing on overlapping themes, key measures used across the five studies were identified that are either the same (standardised measures) or had the potential for post-hoc harmonisation. The process of post-hoc harmonisation does result in the simplification of variables (for example dichotomising a more nuanced scale such as marital status) and through this process inevitably some of the detail in each study is lost. Some variables have nation-specific scales, for example the construction of a harmonised ‘maternal education’ measure was complicated due to different education systems across nations. However, the initial results reported from the GUH study do show that there is a potential to harmonise variables across a range of domains by relative rankings in different contexts. Strong similarities were found across key risk factors in all studies, although not surprisingly given the above reasons there was less alignment to measures from THNR.

The similar trend presented in the results from the SDQ (total difficulties) measure across GUiNZ, GUS and GUI (Table 4) perhaps reflects the value of standardised and validated measures (Hofer and Piccinin 2009). Given the different countries and contexts where the data was collected, variation across the harmonised results is to be expected. Consistency was found in the relationship between a high number of risks and the percentage of children with borderline or potentially problematic SDQ scores. The comparable results may firstly reflect that SDQ is a robust measure, and secondly that the three studies were consistent in applying the measure to a standard that might allow further advanced analysis. Perhaps the similarities reflect in part the similarities across the nations, which were the foundation of the GUH project. Although the risk factors were harmonised post hoc, the study was collaboratively designed. Results from the risk model clearly show that similar vulnerabilities matter in Ireland, Scotland and New Zealand – with more risks there was a substantially higher likelihood of being in the borderline or potentially problematic SDQ categories in all three countries (although there was less distinction between two to three and four to five risks in GUiNZ).

At this stage, it is hard to determine whether the differences across the studies are due to study design or to genuine differences between the countries and the populations, which in turn may indicate the impact of policy on different cohorts of children. The risk model was constructed based on a previous study using the GUiNZ data (Morton et al. 2014); different risks might have a greater impact on the GUI and GUS cohorts. Further study will include variables that differ critically across countries, such as ethnicity and religion, to explore in greater detail how these differences impact on child development. Isolating the individual risk factors and conducting regression analysis to further understand the impact on child development will be the next stage of the GUH study.

International comparative research is extremely important in understanding child development, and the initial findings presented in this paper suggest that post-hoc harmonisation has been successful and very worthwhile. Further collaborative study applying harmonised analysis should be pursued. Potentially there are major policy implications that could emerge from the work. Whether it is consistently showing the effects of smoking during pregnancy across countries, or perhaps showing variation in the impact of maternal education, highlighting the similarities and the differences across the nations will add weight to the policy agendas in each country. Although it is unlikely that the data from THNR could be harmonised in detail, there is potential for more measures from PIFS to be aligned to the other three studies. Harmonisation would be particularly beneficial to the PIFS study in order to identify the risk factors specific to Pacific peoples, and therefore provide evidence for targeted interventions. Comparisons between PIFS and GUiNZ also offer the opportunity to compare similar numbers of Pasifika children over time within the NZ context a decade apart, when different policy settings and societal norms were operating.

4.1 Limitations

Contextual and temporal dimensions need to be considered when comparing the results of harmonised analysis. The GUH study identified overlapping time points across the studies where data was collected from children at the same age; however, due to the different starting points of the studies, the data was collected years apart in some cases. For example, data for the first overlapping time-point (ages nine to 12 months) was collected by PIFS in 2001, GUS in 2005, GUI in 2008 and GUiNZ in 2011. The different years of data collection do not make the comparisons weaker; however, the political and social environment at the time in the specific country needs to be considered when interpreting results. Through further examination of different policy interventions at different time points, greater understanding of the findings might emerge, although disentangling the effects of policy versus demographics remains a challenge for comparative social research.

As is common in longitudinal social surveys, not all questions were asked in each wave of data collection. After the identification of overlapping time-points, the variation in data collected by the studies resulted in further restrictions of the possible harmonisation of measures. In relation to using SDQ as an outcome measure, it was only available in three of the five studies in overlapping time-points. SDQ is a measure more commonly used as children reach school age. As the children age (particularly in relation to GUiNZ – the youngest cohort) and further waves of data are released, more opportunities for harmonisation will become available.

The analysis presented in this paper focussed on key variables that were categorised as having a strong degree of similarity across studies. This approach was adopted to test the potential for harmonisation but limited the number of variables selected. There is also potential value in comparing variables measured in different ways, and not necessarily deemed to have strong similarity, to see if there is a varying effect on a common outcome. Further work that examined a broader range of variables would potentially be able to include all five studies more fully.

Access to data is available to external users subject to the access requirements and controls of each study. All the studies are funded and managed in different ways and there are various ethical issues to be considered when releasing unpublished analysis or indeed full datasets. As the studies (excluding THNR) are active with on-going data collection, understandably data and unpublished findings cannot always be released to external users before they have first been analysed as part of the primary study. Regardless, data access is becoming easier, and through the collaborative approach to the GUH study the ability to conduct robust harmonised analysis has been demonstrated.

5 Conclusions

The process of cross-nation ex post harmonisation is not straightforward, and divergent factors (such as methodological differences) need to be considered. While ex-ante harmonisation may be considered preferable by some, one potential disadvantage of ex-ante harmonisation is that variables might be unnecessarily reduced to those which can be harmonised. Although further work on definitions relating to measures is necessary, the value of ex-post analysis has been illustrated by the GUH study.

The initial harmonised analysis carried out in the GUH study has demonstrated significant potential for cross-national comparison where overarching study objectives and conceptual frameworks align – even if individual measures are not exactly equivalent. It has been established that longitudinal analysis could be conducted over three identified time-points – perhaps limited to the three cross-sectional growing up studies (GUiNZ, GUS and GUI). No attempt to harmonise analysis across these studies has been previously carried out. The next stage of the study will involve further examination of harmonised measures and more detailed longitudinal analysis of individual factors affecting child well-being and development. Importantly, comparison of outcomes needs to account for any explicit differences in socio-political and demographic contexts, and future work will examine in more detail the impact of different policy settings on child outcomes.