Introduction

Childhood Maltreatment (CM) is a serious global public health concern and is associated with long-lasting psychological and physical adverse health outcomes1. The World Health Organisation (WHO) in collaboration with the International Society for the Prevention of Child Abuse and Neglect (ISPCAN) recommend the development of clear operational case definitions and common reliable methods of identification. They have defined CM as "all forms of physical and/or emotional ill-treatment, sexual abuse, neglect or negligent treatment or commercial or other exploitation, resulting in actual or potential harm to the child's health, survival, development or dignity in the context of a relationship of responsibility, trust or power"2.

Estimates suggest 20–25% of the adult population may have experienced some form of maltreatment during childhood3,4,5. Official statistics from child protection agencies and authorities underestimate the true incidence and prevalence in society because they omit unrecognised and unreported cases6. According to the Office for National Statistics (ONS), as at 31 March 2019, 52,260 children in England (43 cases per 10,000) were subject to a Child Protection Plan and a further 2,820 children in Wales (45 cases per 10,000) were on the Child Protection Register5. The Crime Survey for England and Wales (CSEW) conducted in 2019 estimated that 1 in 5 adults experienced child abuse before the age of 16 years7.

Health professionals, such as general practitioners (GPs), in regular and long-standing contact with children and their families are well placed to identify children ‘at risk’8. The National Institute for Health and Clinical Excellence (NICE) and the General Medical Council (GMC) highlight the importance of the role of health professionals in identifying and documenting concerns of ‘suspected’ CM9,10 as victims may be reluctant to confide or even be unaware that they are being maltreated.

Population-based routinely collected data, such as medical records from primary and secondary care, have the potential to be a rich resource for research on CM. Hospital data have been used for CM surveillance in the UK11 and abroad12,13,14,15. GP data can capture a wider range of maltreatment-related concerns owing to their more extensive coding nomenclature. While previous research with routinely collected data has explored CM, there is as yet no agreed, validated list of codes for its identification.

There have been several attempts to identify maltreatment and maltreatment-related concerns in primary care. These wider code lists were designed to capture clinical concern about suspected or possible maltreatment16 and include codes such as ‘referral to social services’. Such codes were selected to reflect a threshold that should trigger further action by health professionals17. While they may indicate potential vulnerability, they do not necessarily indicate maltreatment: a child may be referred to social services for a variety of other reasons (e.g., parental illness or disability). A validation exercise of these codes in three practices indicated high specificity for ‘considered maltreatment’; however, considerable under-reporting of CM was found compared with community studies17.

Other code lists for CM have been developed, including a categorization into incident and prevalent code lists (the latter including codes indicative of historical maltreatment18). However, these code lists did not include codes for maltreatment-related concerns and found a considerably lower incidence and prevalence than previous research using a broader list of codes19. A recent paper set out to identify and validate adverse childhood experiences (ACEs), including CM, in routinely collected healthcare data20. This project developed and statistically validated code lists using both maternal and child health records from two years before birth to five years after birth, aiming to identify indicators that reflected a continuum of clinically meaningful risk groups consistent with previous ACE definitions. This list is one of the most comprehensive in the current literature. However, incidence and prevalence based on this list have not yet been explored, and it has yet to be externally validated.

Standardised, reliable, and well-validated methods of identifying CM are needed to better understand the true extent and consequences of CM. In this study we aim to build on this work. Following the methodology of previous validation exercises21, we assess the ability to identify cases of CM in primary care and hospital admissions data, using as the gold standard an externally validated, secondary care-based cohort of children clinically assessed for CM as part of a child protection service. We explore the recording of CM in each healthcare dataset, discuss the strengths and limitations of coding in each setting, and explore the utility of an algorithm combining the two data sources. We also examine the reasons for missed and incorrectly identified cases in each healthcare setting and assess variation in CM rates over time.

Methods

Study design

This is a retrospective e-cohort study.

Ethical approval

Approval was granted on 4/12/2018 by the Swansea University Information Governance Review Panel (IGRP) (approval number 0809), an independent body consisting of a range of government, regulatory and professional agencies (British Medical Association (BMA), National Research Ethics Service (NRES), Involving People, NHS Wales Informatics Service and Public Health Wales (PHW) NHS Trust) and members of the public, which grants approval to studies conducted within the SAIL Databank. All methods were performed in accordance with the relevant guidelines and regulations and in line with the permissions granted under these ethical approvals. All data within the SAIL gateway are treated in accordance with the Data Protection Act 2018 and comply with the General Data Protection Regulation (GDPR).

Informed consent was not required as this study utilizes fully anonymised data in accordance with the GDPR.

Data source

We linked data at the individual level via the Adolescent Mental Health Data Platform (ADP), an international data platform that supports mental health research in children and young people (CYP). For this study, the ADP used datasets from the SAIL Databank, a repository of routinely collected health and education datasets for the population of Wales22,23. All data are treated in accordance with the Data Protection Act 2018. The following datasets were linked at patient level:

Welsh Demographic Service (WDS); Welsh Index of Multiple Deprivation, containing deprivation scores for all lower super output areas in Wales; GP database (GPD), containing information on all GP interactions and covering 79% of the Welsh population; and Patient Episode Database for Wales (PEDW), containing data on all NHS Wales hospital admissions.

Externally validated dataset

Cardiff and Vale University Health Board Minimum dataset for CM (CVCM) was imported into the SAIL Databank via the split-file method for anonymisation. The CVCM dataset comprises 3,622 clinical assessments of 3,123 children for suspected CM and includes the date of assessment, the type of abuse suspected (e.g., physical, sexual, neglect), the reason for suspicion, details, and confirmation of findings. Three quarters (75.8%, 2,747/3,622) of the assessments were conducted on the basis of suspected physical abuse, 17% suspected sexual abuse and around 4% suspected neglect. For the purpose of this study, CM was examined overall, without stratification by maltreatment type, because of the limited number of cases available within each sub-category. The outcomes of the clinical assessments were divided into three categories: confirmed maltreatment, possible maltreatment and no maltreatment. Of the 3,123 children, 388 (12.4%) had been seen on more than one occasion, and 2,889/3,123 (92.5%) were under 18, living in Wales and assessed between 2004 and 2018.

Study population

Individuals aged 0–17 years registered with a SAIL-supplying GP between 01.01.2004 and 10.10.2020 were selected as the baseline population. For newly registered individuals, data collection began six months after the GP registration date (to avoid misclassification due to retrospective recording at registration), except for children under 1 year, who were followed from the GP registration date or the study start date, whichever was later. Data collection ended on the date of GP de-registration, death, 18th birthday or study end, whichever was earliest. Individuals could contribute multiple periods of data.
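The start and end rules above can be expressed as a small date calculation. The sketch below is illustrative only and is not the study's code (analyses were run in SAIL and SPSS). It assumes that "under 1s" means children registered before their first birthday, that "six months" means six calendar months, and that a 29 February birthday falls on 1 March in non-leap years; all function names are hypothetical.

```python
import calendar
from datetime import date
from typing import Optional, Tuple

# Study period from the text; everything else here is an illustrative assumption.
STUDY_START = date(2004, 1, 1)
STUDY_END = date(2020, 10, 10)


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day to the last day of the target month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def add_years(d: date, years: int) -> date:
    """Anniversary date; 29 February maps to 1 March in non-leap years (assumption)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return date(d.year + years, 3, 1)


def follow_up_window(
    dob: date,
    gp_registration: date,
    gp_deregistration: Optional[date] = None,
    death: Optional[date] = None,
) -> Optional[Tuple[date, date]]:
    """Return (start, end) of data collection, or None if no eligible time remains."""
    if gp_registration < add_years(dob, 1):
        # Registered before the first birthday (our reading of 'under 1s'): no lag.
        start = max(gp_registration, STUDY_START)
    else:
        # Six-month lag to avoid retrospective recording at registration.
        start = max(add_months(gp_registration, 6), STUDY_START)
    end_candidates = [STUDY_END, add_years(dob, 18), gp_deregistration, death]
    end = min(d for d in end_candidates if d is not None)
    return (start, end) if start < end else None
```

Taking `max` with the study start means that children registered long before 2004 are followed from 01.01.2004, and the lag only bites for registrations close to or during the study period.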

Two study cohorts were created: the first for the validation exercise and the second for estimating incidence and prevalence (Supplementary File 1 Fig. S1).

Validation cohort

For validation purposes, the CVCM dataset was linked at the individual level with routinely collected data in the SAIL Databank, applying the criteria above.

We required one assessment per child from the CVCM dataset. Based on clinician assessment, cases were classified as confirmed CM or not confirmed CM (the latter encompassing the no maltreatment and possible maltreatment outcomes). For children seen on more than one occasion, a hierarchical rule was therefore adopted: for any child who had ever been assessed as ‘confirmed CM’, we used the most recent confirmed assessment date; for the remaining (not confirmed CM) children, we used the most recent assessment date.
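A minimal sketch of the hierarchical selection rule follows. It assumes that "the most recent assessment date" for ever-confirmed children refers to the most recent assessment with a confirmed outcome; the record layout (`Assessment`, with outcomes `"confirmed"`, `"possible"` and `"none"`) is hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Assessment:
    child_id: str
    assessed_on: date
    outcome: str  # "confirmed", "possible" or "none" (hypothetical coding)


def select_index_assessments(assessments: Iterable[Assessment]) -> Dict[str, Assessment]:
    """One assessment per child: latest confirmed if any, otherwise latest of any outcome."""
    by_child: Dict[str, List[Assessment]] = {}
    for a in assessments:
        by_child.setdefault(a.child_id, []).append(a)
    chosen = {}
    for child_id, rows in by_child.items():
        confirmed = [a for a in rows if a.outcome == "confirmed"]
        pool = confirmed if confirmed else rows  # confirmed outcomes take priority
        chosen[child_id] = max(pool, key=lambda a: a.assessed_on)
    return chosen


def reference_label(a: Assessment) -> str:
    """Collapse the three clinical outcomes into the binary reference standard."""
    return "confirmed CM" if a.outcome == "confirmed" else "not confirmed CM"
```

The date of the selected assessment then serves as the index date for the 12-month window used in the validation exercise.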

Only children in the CVCM dataset who were registered with a SAIL-supplying GP and contributed a minimum of 6 months of data, including the index date, were included in comparisons with the routine data.

Incidence and prevalence cohort

Individuals registered with a SAIL-supplying GP who met the inclusion criteria above (independent of linkage to the CVCM dataset) were included in the incidence and prevalence analysis. Data collection for each year began on 1 January or at the start of follow-up as defined above, whichever was later, and ended on 31 December or at the end of data collection, whichever was earlier. Person time was calculated between the start and end dates for each year.
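The yearly split of person time can be sketched as follows. This is an illustration rather than the study's implementation. It assumes that both the first and last day of a segment count as follow-up and that person-years are days divided by 365.25; neither convention is stated in the text.

```python
from datetime import date
from typing import List, Tuple

DAYS_PER_YEAR = 365.25  # assumption: person-years = days / 365.25


def split_by_calendar_year(start: date, end: date) -> List[Tuple[int, date, date, int, float]]:
    """Split one follow-up period into calendar-year segments.

    Each segment runs from the later of 1 January and `start` to the earlier of
    31 December and `end`; both end dates are counted as follow-up days (assumption).
    Returns (year, segment_start, segment_end, days, person_years).
    """
    segments = []
    for year in range(start.year, end.year + 1):
        seg_start = max(start, date(year, 1, 1))
        seg_end = min(end, date(year, 12, 31))
        days = (seg_end - seg_start).days + 1
        segments.append((year, seg_start, seg_end, days, days / DAYS_PER_YEAR))
    return segments
```

Summing the last element of each segment over all individuals in a given year gives the PYAR denominator for that year.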

Measures

Age and deprivation indices were recorded at the start of data collection for each year. Individuals were stratified by sex, age group (< 1 y, 1–4 y, 5–9 y, 10–14 y and 15–17 y, to align with data from the National Public Health Service for Wales) and quintile of deprivation.
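Assigning the age band at the start of each yearly period is a direct lookup. The sketch below uses the bands from the text. The function names are hypothetical, and deprivation quintiles, which would be attached separately from the deprivation index, are not shown.

```python
from datetime import date


def age_in_years(dob: date, on: date) -> int:
    """Completed years of age on a given date."""
    return on.year - dob.year - ((on.month, on.day) < (dob.month, dob.day))


# Bands from the text: <1, 1-4, 5-9, 10-14 and 15-17 years.
AGE_BANDS = [(0, 0, "<1 y"), (1, 4, "1-4 y"), (5, 9, "5-9 y"), (10, 14, "10-14 y"), (15, 17, "15-17 y")]


def age_group(dob: date, period_start: date) -> str:
    """Age band at the start of the yearly data-collection period."""
    age = age_in_years(dob, period_start)
    for low, high, label in AGE_BANDS:
        if low <= age <= high:
            return label
    raise ValueError(f"age {age} is outside the 0-17 study range")
```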

Maltreatment was identified from GP data using primary care Read codes and from hospital admissions data using ICD-10 codes. The development of the code lists is described below. Code lists for primary care and hospital admissions were developed to map onto one another as closely as possible. However, the coding nomenclature in primary care is broader and encompasses codes not available in hospital admissions data (e.g., child protection codes).

Incidence and prevalence of CM over time were examined in primary care data and admissions data. An incident event was defined as a record of maltreatment with no such record in the previous 12 months. Prevalence was defined as any record of maltreatment within a given year, independent of any previous events21,24.
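The two definitions differ only in whether earlier records are considered. In the sketch below, "12 months" is approximated as 365 days, which is an assumption. Any earlier record resets the look-back, whether or not that record was itself incident. In practice the record history passed in should include records from before the start of follow-up so that the look-back is complete.

```python
from datetime import date, timedelta
from typing import Iterable, List, Set

LOOKBACK = timedelta(days=365)  # assumption: "12 months" approximated as 365 days


def incident_events(record_dates: Iterable[date]) -> List[date]:
    """Records with no maltreatment record in the preceding 12 months."""
    incident = []
    previous = None
    for d in sorted(set(record_dates)):
        if previous is None or d - previous > LOOKBACK:
            incident.append(d)
        previous = d  # any earlier record, incident or not, resets the look-back
    return incident


def prevalent_years(record_dates: Iterable[date]) -> Set[int]:
    """Years with any record of maltreatment, regardless of earlier events."""
    return {d.year for d in record_dates}
```

For records on 1 January 2010, 1 June 2010, 1 August 2011 and 1 September 2011, only the first and third are incident, while both 2010 and 2011 are prevalent years.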

Development of code lists to identify CM

In order to fully explore the coding of CM in individual healthcare settings, two sets of code lists were developed: Read codes for use in GP data and ICD-10 codes for admissions data. While we mapped these two sets of code lists as closely to one another as possible, these distinct coding systems contain different levels of detail. GP data contain a broader coding nomenclature than admissions data, including communication from other healthcare settings and child protection agencies. The utility of each set of codes was assessed both separately and in combination.
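The text does not spell out the combination rule. A natural reading, sketched here as an assumption, is a logical OR: a child is flagged if either source contains a qualifying code. Under this rule, adding a second source can only add flagged children. Sensitivity therefore cannot fall, and specificity cannot rise. The same sets give the "GP only", "admissions only" and "both" breakdown.

```python
from typing import Dict, Iterable, Set


def combine_sources(gp_ids: Iterable[str], admission_ids: Iterable[str]) -> Dict[str, Set[str]]:
    """Flag a child if either data source has a qualifying code (logical OR)."""
    gp, adm = set(gp_ids), set(admission_ids)
    return {
        "combined": gp | adm,        # union used as the combined algorithm
        "gp_only": gp - adm,
        "admissions_only": adm - gp,
        "both": gp & adm,
    }
```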

Our code lists were developed from the existing literature15,16,19,20,25,26,27,28, from our own searches for codes that may indicate maltreatment, risk or cause for concern, and finally from clinician judgement. In keeping with previous research, we tiered our code list into codes strongly indicating CM/confirmed maltreatment, referred to here as ‘Confirmed CM’ codes, and codes that may indicate possible or suspected maltreatment or potential vulnerability16,20, referred to here as ‘Possible CM’ codes. The Confirmed CM codes for primary care were further divided into prevalent and incident code lists, with codes indicating historical maltreatment (e.g., ‘history of child abuse’) excluded from the incident list18.
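One way to hold the tiered lists is a single code table that carries the tier and a "historical" flag, from which the prevalent and incident lists are derived. The sketch uses Read codes quoted elsewhere in this paper, assigned to the tiers in which the paper reports them. `"HIST1"` is a hypothetical placeholder for a historical-maltreatment code, because no such code is quoted. The full lists are in the supplementary files.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class CodeEntry:
    term: str
    tier: str  # "confirmed" or "possible"
    historical: bool = False  # True for codes describing past maltreatment


# Illustrative subset only; "HIST1" is a hypothetical placeholder code.
READ_CODES: Dict[str, CodeEntry] = {
    "13IM.": CodeEntry("Child on protection register", "confirmed"),
    "13IO.": CodeEntry("Child removed from protection register", "confirmed"),
    "HIST1": CodeEntry("History of child abuse", "confirmed", historical=True),
    "13IB0": CodeEntry("Child in foster care", "possible"),
}


def code_list(tier: str, incident: bool = False) -> Set[str]:
    """Codes in a tier; the incident list drops codes for historical maltreatment."""
    return {
        code
        for code, entry in READ_CODES.items()
        if entry.tier == tier and not (incident and entry.historical)
    }
```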

Additional sensitivity analyses were conducted to refine previous lists. These tested the sensitivity and specificity of different subgroups of codes and used clinical input to determine whether codes were appropriate (for example, codes such as ‘parents on benefits’ and ‘self-neglect’ were excluded; supplementary analysis available on request).

Confirmed CM

The Confirmed CM category contained terms that unequivocally stated the existence of maltreatment. These included maltreatment syndromes, history/victim of abuse, prostitution, genital mutilation, criminal neglect/abandonment of a baby, and child protection categories. Child protection is a response to confirmed maltreatment that has already taken place. It is distinct from safeguarding, which refers to measures put in place to prevent harm. Child protection and the presence of maltreatment are determined by a case conference, and this information is communicated to GPs and as such should appear in primary care data. However, conversations with clinical colleagues indicated that child protection information may not always be present in the CVCM dataset and is rarely available in hospital admissions data.

Possible CM

Within the category of ‘Possible CM’ are codes that may indicate risk and vulnerability of children that may co-occur with maltreatment. However, they are not sufficient to indicate maltreatment in isolation. These codes fell into six categories:

  1. At risk/safeguarding codes: For example, ‘at risk of abuse’ or ‘safeguarding example’. Safeguarding was distinguished from child protection, whose codes are included in the Confirmed CM list (see above). A child or young person is a safeguarding concern when they are living in circumstances where there is a significant risk of abuse (physical, sexual, emotional or neglect). At-risk codes may not be specific enough to record an event of Confirmed CM. However, they are useful for identifying children at significant risk, which may have implications for long-term outcomes and for future research with vulnerable children. There is no equivalent of these codes in hospital admissions data, so they were searched in GP data only.

  2. Other social care: For example, ‘referred to social worker’ or ‘in foster care’. Social care codes not specifically related to child protection may also be useful for identifying potentially vulnerable children. However, a child may have contact with social care for many reasons unrelated to maltreatment (e.g., disability or mental or physical health problems of either the parent or the child). There is no equivalent of these codes in hospital admissions data, so they were searched in GP data only.

  3. Family circumstances: For example, ‘child abuse in family’ or ‘family member on protection register’. These codes are indicative of risk but do not necessarily indicate CM for the patient in question.

  4. Alleged/suspected maltreatment: For example, ‘suspected child abuse’ or ‘alleged abuse’.

  5. Rib/limb fractures: For example, ‘multiple rib fractures’.

  6. Assaults: This included general codes such as ‘Assault’ and more specific codes such as ‘physical assault at home’.
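The six categories, and the restriction of the first two to GP data, can be encoded so that each "Confirmed CM plus category" variant is assembled per data source. The sketch below is hypothetical throughout. The category keys are our own shorthand, and it assumes the remaining four categories are searched in both sources, since the text marks only the first two as GP-only.

```python
from typing import Dict, List, Set

# Which data sources each 'Possible CM' category is searched in. Per the text,
# at-risk/safeguarding and other social care codes have no ICD-10 equivalent.
POSSIBLE_CM_SOURCES: Dict[str, Set[str]] = {
    "at_risk_safeguarding": {"gp"},
    "other_social_care": {"gp"},
    "family_circumstances": {"gp", "admissions"},
    "alleged_suspected": {"gp", "admissions"},
    "rib_limb_fractures": {"gp", "admissions"},
    "assaults": {"gp", "admissions"},
}


def algorithm_variant(extra_categories: List[str], source: str) -> List[str]:
    """'Confirmed CM' plus the requested Possible CM categories available in `source`."""
    unknown = set(extra_categories) - set(POSSIBLE_CM_SOURCES)
    if unknown:
        raise KeyError(f"unknown categories: {sorted(unknown)}")
    return ["confirmed_cm"] + [c for c in extra_categories if source in POSSIBLE_CM_SOURCES[c]]
```

For example, the variant "Confirmed CM/other social care/assaults" reduces to "Confirmed CM/assaults" in admissions data.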

The full lists of codes can be found in Supplementary Files 2–5.

Statistical analysis

Validation exercise

We compared cases of CM identified in the routinely collected datasets with the clinical assessment outcomes recorded in the externally validated CVCM cohort to establish levels of agreement. We applied the ‘Confirmed CM’ and all ‘Possible CM’ code lists in two windows: the 12 months either side of the recorded maltreatment/suspected maltreatment event in the CVCM dataset, and any time during the follow-up period. Comparing the two windows indicates the algorithms’ ability to identify individual events as opposed to maltreated individuals. We further combined the ‘Confirmed CM’ list with individual categories of ‘Possible CM’ codes (e.g., ‘Confirmed CM/other social care’, ‘Confirmed CM/suspected or alleged’). We calculated sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) with 95% confidence intervals, and explored the reasons for false negatives and false positives. The prevalence of maltreatment is reported because it affects the PPV and NPV. However, this is the prevalence only within the narrowly defined study population, which was defined by hospital evaluation protocols; it is not the prevalence within the general population of the hospital or community29.
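The accuracy measures follow from a 2×2 table of code-based flags against the clinical reference standard. The sketch below is not the authors' code. The text does not say which binomial interval was used, so the Wilson score interval here is an assumption and may differ slightly from the published intervals. The last function uses Bayes' theorem to show why PPV depends on prevalence: with sensitivity and specificity both fixed at 0.9, PPV is 0.9 at a prevalence of 50% but only 0.5 at 10%.

```python
import math
from typing import Dict, Tuple


def wilson_ci(successes: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (interval method is an assumption)."""
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


def diagnostic_accuracy(tp: int, fp: int, fn: int, tn: int) -> Dict[str, Tuple[float, float, float]]:
    """Point estimate and 95% CI for each measure, from a 2x2 table against the gold standard."""
    def estimate(num: int, den: int) -> Tuple[float, float, float]:
        return (num / den, *wilson_ci(num, den))

    return {
        "sensitivity": estimate(tp, tp + fn),  # confirmed CM that the codes flag
        "specificity": estimate(tn, tn + fp),  # not-confirmed CM that the codes leave unflagged
        "ppv": estimate(tp, tp + fp),
        "npv": estimate(tn, tn + fn),
        "prevalence": estimate(tp + fn, tp + fp + fn + tn),
    }


def ppv_at_prevalence(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV implied by fixed sensitivity/specificity at another prevalence (Bayes' theorem)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)
```

For illustration, 299 flagged out of 553 confirmed cases gives a sensitivity of about 54.1% with a Wilson interval of roughly 49.9–58.2%.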

Trends over time

To assess variation over time, we calculated annual incidence, and its change over time, for Confirmed CM and Possible CM per 1000 person-years at risk (PYAR), with 95% CIs, between 01.01.2004 and 10.10.2020 for children aged under 18 years. Poisson regression was used to investigate the adjusted associations of incidence and prevalence of maltreatment with year of diagnosis, sex, age group and deprivation. The significance of the variables in the Poisson regression models was assessed using Wald tests. Confidence intervals (CIs) for rates were estimated using two-tailed mid-p exact CIs (assuming a Poisson distribution)30. Statistical analyses were conducted using SPSS statistical software (version 22).
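The mid-p exact interval for a Poisson count x can be found by solving P(X < x) + ½P(X = x) = 1 − α/2 for the lower limit and = α/2 for the upper limit. The left-hand side decreases as the Poisson mean grows, so simple bisection works. The sketch below is an independent, standard-library illustration, not the SPSS procedure used in the study, and the Poisson regression models are not shown. The helper names are hypothetical.

```python
import math
from typing import Tuple


def _poisson_pmf(k: int, lam: float) -> float:
    if lam == 0.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def _mid_p_tail(x: int, lam: float) -> float:
    """P(X < x) + 0.5 * P(X = x) for X ~ Poisson(lam); decreases as lam grows."""
    return sum(_poisson_pmf(k, lam) for k in range(x)) + 0.5 * _poisson_pmf(x, lam)


def _solve_for_mean(x: int, target: float) -> float:
    """Bisection for the Poisson mean at which the mid-p tail equals `target`."""
    lo, hi = 0.0, x + 10.0 * math.sqrt(x + 1) + 10.0  # hi is far above any 95% limit
    for _ in range(200):
        mid = (lo + hi) / 2
        if _mid_p_tail(x, mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def mid_p_ci(events: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Two-sided mid-p exact confidence interval for a Poisson count."""
    lower = 0.0 if events == 0 else _solve_for_mean(events, 1 - alpha / 2)
    upper = _solve_for_mean(events, alpha / 2)
    return lower, upper


def rate_per_1000(events: int, pyar: float, alpha: float = 0.05) -> Tuple[float, float, float]:
    """Rate per 1000 person-years at risk with its mid-p exact CI."""
    lower, upper = mid_p_ci(events, alpha)
    scale = 1000.0 / pyar
    return events * scale, lower * scale, upper * scale
```

For zero events the upper limit reduces to −ln(α), about 3.0 for a 95% interval. The summation is linear in the count, which is adequate for yearly counts but could be replaced by a library routine for very large counts.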

Results

Study population

The study population comprised 1,078,486 CYP (aged 0–17 years), registered to a SAIL-supplying general practice with at least six months’ worth of GP data between 01.01.2004 and 10.10.2020. They contributed 7,270,724 person years’ worth of data. The mean follow-up for each individual was six years.

Of the study population (n = 1,078,486), 2205 CYP had been clinically assessed for CM, and these comprised the maltreatment validation e-cohort. Of these, 553 (25.1%) had confirmed maltreatment. Maltreatment was not confirmed for the remaining 1652 (74.9%), a group encompassing both the possible and no maltreatment categories.

Validation exercise 1: primary care data

Validation results 1a: previously published lists

The sensitivity, specificity and positive predictive values of previously published code lists relating to primary care are shown in Table 1.

Table 1 Sensitivity (95% CI), Specificity (95% CI), PPV(95% CI) NPV(95% CI) of previously published lists when run through validation exercise against gold standard.

Previously published code lists from18 performed with excellent specificity (> 97%). However, sensitivity was low, ranging from 7.6% for incident code lists searched in the 12 months either side of the index date to 11.6% for prevalent code lists searched at any time. Thus, while specificity for these lists is high, around 90% of cases would be missed.

Sensitivity was higher for the code lists from16, ranging from 51.0 to 64.4%, but specificity was lower, ranging from 86.1 to 90.1%. These lists include a wider range of maltreatment-related codes, including codes related to child protection procedures. The gain in sensitivity comes at the cost of including codes that are only potentially indicative of maltreatment.

The highest-performing previously published code list was that published by20 (Table 1). To make this list comparable with the updated algorithm, we ran three versions of it through the validation exercise. Sensitivity ranged from 48.6 to 69.3% and specificity from 85.5 to 91.1%, depending on the time restriction and the subset of the list used. Including codes related to wider social care outside child protection increased sensitivity at the expense of specificity. Using codes categorised as maltreatment and suspected maltreatment resulted in the highest sensitivity (59.1–69.3%) and the lowest specificity (85.7–89.1%).

Validation results 1b: current code list

Results of the validation exercise with the current code list are shown in Table 2. This includes sensitivity analysis accounting for the impact of adding or removing various groups of codes.

Table 2 Sensitivity (95% CI), specificity (95% CI), PPV (95% CI) and NPV (95% CI) of code lists for child maltreatment, including sensitivity analyses adding and excluding categories of indicative/at-risk codes in primary care.

When ‘child protection’ codes are removed, sensitivity drops markedly, from 43.0% (95% CI 38.9–47.3) to 19.2% (95% CI 16.0–22.8) at 12 months and from 54.1% (95% CI 49.8–58.3) to 31.1% (95% CI 27.3–35.2) at any time. Specificity improves slightly, from 91.4% (95% CI 89.9–92.7) to 96.4% (95% CI 95.4–97.2) at 12 months and from 88.1% (95% CI 86.5–89.6) to 93.4% (95% CI 92.1–89.6) at any time.

Sensitivity is improved with each group of codes added with ‘Other Social Care’ codes having the biggest impact on sensitivity (sensitivity at 12 months either side 54.6% [95% CI 50.4–58.8] at any time 66.4%[95% CI 62.2–70.3]). However, these also have a detrimental impact on specificity (12 months 89.65% [95% CI 87.9–90.9] at any time 86.0% [95% CI 84.2–87.6]). Assault codes also have a large impact on the algorithm increasing sensitivity to 48.6% (95% CI 44.4–52.9) in the 12 months either side of an index date and to 58.8%(95% CI 54.5–62.9) at any time. Inclusion of these codes has a small impact on specificity (12 months 91.2% [95% CI 89.7–92.5]; at any time 87.4%[95% CI 85.7–89.0]). The most frequently used ‘assault’ code was ‘[X]Assault’. Refining these codes to only include ‘assaults occurring at home’ did not impact the algorithm as they were rarely used.

When all additional codes are added (‘at risk’, ‘family circumstances’, ‘other social care’, ‘suspected/alleged’, ‘rib/limb fractures’, ‘assaults’) to the original Confirmed CM list sensitivity is increased with a decrease in specificity (12 months around index date sensitivity, 61.1%[95% CI 56.9–65.2] specificity, 89.0%[95% CI 87.3–90.4]; present in records at any time sensitivity, 71.8%[95% CI 67.8–75.5] specificity, 85.0%[95% CI 83.2–86.7]).

False positives

Using the Confirmed CM code list (prevalence subset) we incorrectly identified 203 out of 1652 individuals (12.2%) who had been clinically assessed as not being maltreated. Of these 199 had codes relating to child protection and 29 had confirmed maltreatment codes (either current or historical e.g., ‘Physical abuse’). The most used codes were ‘13IM. Child on protection register’ (n = 142) and ‘13IO. Child removed from protection register’ (n = 90).

False negatives

A total of 254 of the 553 clinically confirmed cases were not identified using our algorithm. Of these 78 (30.7%) had at least one of the ‘Possible CM’ codes. The most frequently occurring of these codes was ‘U3… [x] assault’, followed by ‘13IB0 child in foster care’.

Among the remaining 175 false negatives, the most used codes were administrative codes (e.g., ‘notes summary on computer’) or routine codes such as inoculations or paracetamol prescriptions. Other frequently used codes included ‘letter from specialist’, ‘seen in paediatric clinic’ and codes related to chest infections (e.g., ‘chest infection NOS’).

Some false negatives had codes indicating hospital or emergency department attendance (e.g., ‘seen in hospital casualty’ or ‘emergency hospital admission’) without specific mention of maltreatment.

Validation exercise 2: Hospital admissions data

Results of the validation exercise in hospital admissions data are shown in Table 3. This includes sensitivity analysis accounting for the impact of adding or removing various groups of codes. Sensitivity of Confirmed CM codes was lower in hospital admissions data than in primary care, ranging from 9.4% (95% CI 7.2–12.2) to 27.8% (95% CI 24.2–31.8). Specificity was high, ranging from 96.4% (95% CI 95.3–97.2) to 99.4% (95% CI 98.9–99.7).

Table 3 Sensitivity (95% CI), specificity (95% CI), PPV (95% CI) and NPV (95% CI) of code lists for child maltreatment, including sensitivity analyses adding and excluding categories of indicative/at-risk codes in hospital admissions data.

False positives

Using the Confirmed CM code list, we incorrectly identified 10 out of 1652 individuals (0.6%) who had been clinically assessed as not being maltreated. Almost all of these (numbers masked for confidentiality) had a code for ‘maltreatment syndromes’ (T74) alongside codes for ‘other maltreatment’ (Y07) and ‘neglect and abandonment’ (Y06).

False negatives

A total of 501 of the 553 clinically confirmed cases were not identified using our algorithm. Of these 235 were admitted to hospital within the 12 months either side of the index date.

Of these, 61 (26.0%) had at least one ‘Possible CM’ code. The most common were codes for assaults (n = 61), followed by Z638 ‘Other specified problems related to primary support group’ (excl. maltreatment syndromes, negative life events in childhood and upbringing; n = 28) and fractures/dislocations/rib fractures (n = 14).

Among the remaining 174 false negatives, the most used codes were for ‘Injury, poisoning and other consequences of external causes’. The most used single codes were S00 ‘superficial injury of head’, X59 ‘Exposure to unspecified factor’ and K02 ‘dental caries’. Coding in hospital admissions appears to focus more on the injury requiring treatment than on recording the presence of maltreatment. Also of note is the absence of the child protection and social care codes that account for a large proportion of correctly identified cases in primary care.

Validation exercise 3: linking GP and admissions data

Combining GP and hospital admissions data to identify CM improved sensitivity slightly compared with using either dataset individually (12 months around index date: GP only 43.0 [95% CI 38.9–47.3], admissions only 9.4 [95% CI 7.2–12.2], GP and admissions 47.0 [95% CI 42.8–51.3]; record at any time: GP only 54.1 [95% CI 49.8–58.3], admissions only 10.3 [95% CI 8.0–13.2], GP and admissions 57.7 [95% CI 53.4–61.8]; Table 4). There was only a slight decrease in specificity. Similar results were seen for the Possible CM codes (Table 4).

Table 4 Sensitivity (95% CI), specificity (95% CI), PPV (95% CI) and NPV (95% CI) of code lists for child maltreatment, including sensitivity analyses adding and excluding categories of indicative/at-risk codes in hospital admission data.

For Confirmed CM, 82.1% of cases were identified in GP data only, 17.9% in admissions data only and 11.6% in both GP and admissions data (Fig. 1). For the Possible CM codes, the proportion identified in admissions data increased (GP only 62.4%; admissions only 34.4%; GP and admissions 37.6%).

Figure 1

Percentage of cases identified in each healthcare setting, stratified by Confirmed/Possible CM and follow-up period (a). (a) Time 1 refers to 12 months either side of the index date; Time 2 to any point during the follow-up period.

Incidence and prevalence over time

Incidence of Confirmed CM in both GP and admissions data was comparable between sexes. Incidence decreased with increasing age and was highest in those aged < 1 year (IRR GP 3.5 [95% CI 3.1–3.9]; admissions 5.6 [95% CI 4.2–7.5]; 15–17 years as the reference group). Incidence was highest in the most deprived quintile, with more than five times the risk in GP data and around six times the risk in admissions data compared with the least deprived quintile (IRR GP 5.4 [95% CI 5.0–5.9]; admissions 6.1 [95% CI 4.4–8.4]). Individuals with no deprivation data were also at increased risk (Tables 5 and 6).

Table 5 GP events, incidence(a) per 1000 PYAR (95% CI) and IRR(b) (95% CI)(c) of Confirmed CM and Possible CM by year, sex, age group and deprivation quintile.
Table 6 Hospital admissions, incidence(a) per 1000 PYAR (95% CI) and IRR(b) (95% CI)(c) of Confirmed CM and Possible CM by year, sex, age group and deprivation quintile.

For GP contacts relating to Possible CM, demographic patterns broadly mirrored those seen for Confirmed CM, with little difference between sexes and decreasing incidence with increasing age (IRR < 1 year 3.5 [95% CI 3.3–3.7]; 15–17 years as reference group). Incidence increased with increasing deprivation, although less steeply for Possible CM than for Confirmed CM, with over double the rate in the most deprived compared with the least deprived quintile (IRR 2.6 [95% CI 2.6–2.8]). Those with unknown deprivation were also at increased risk.

Admissions related to Possible CM showed demographic patterns different from those for Confirmed CM, with around twice as many admissions in males as in females and an increasing incidence with increasing age (Table 6). Further exploratory analysis revealed that this was attributable to assaults and fractures, the rates of which increase with age.

Incidence of GP events for Confirmed CM increased from 3.1 (95% CI 3.0–3.3) cases per 1000 PYAR in 2004 to 4.3(95% CI 4.2–4.5) in 2019 (IRR 1.3 [95% CI 1.1–1.5]) with a decrease seen in 2020 (Fig. 2; Table 5). Trends over time were similar for Possible CM although the overall rate over time was higher (IRR 2019 1.9 [95% CI 1.7–2.1]).

Figure 2

Incidence per 1000 PYAR of confirmed CM and possible CM by setting over time.

Incidence of Confirmed CM-related admissions remained low throughout the study period, with fewer than 80 admissions per year. Admissions for Possible CM initially increased from 2.7 (95% CI 2.6–2.9) cases per 1000 PYAR in 2004 to 3.1 (95% CI 2.9–3.2) in 2008 (IRR 2008 1.1 [95% CI 0.0–1.3]). Admissions then decreased from 2008 onwards, with a significant decrease in 2020 (IRR 2019 0.9 [95% CI 0.8–1.0]).

Similar trends for both GP contacts and admissions were seen for prevalence (Supplementary File 6 Tables S1 and S2).

Discussion

Main findings

This study demonstrates the creation and first external validation of codes and algorithms to identify cases of CM from routinely collected healthcare data. Sensitivity was higher than that identified by previously published CM code lists. We utilised the validated code lists and found an increase in the incidence and prevalence of both Confirmed CM and Possible CM over time in GP data with a low rate over time of hospital admissions related to Confirmed CM.

We linked a clinically assessed, hospital-based CM cohort to cases identified in GP and hospital admissions records to assess sensitivity, specificity, PPV and NPV. Using Confirmed CM codes, the algorithm performed with high specificity. This minimised the proportion of incorrectly identified cases, an important factor for most cohort and case-control studies. The results highlighted differences between datasets in the ability to identify maltreatment, with the majority of cases detected in primary care rather than in hospital admissions. Of note, the proportion of cases identified exclusively in admissions data was higher for Possible CM codes than for Confirmed CM codes. The true extent of CM is difficult to establish in routine data because of the complexity of attendance, recognition, recording and coding of maltreatment.

Sensitivity analyses were conducted using a broader range of codes that may indicate maltreatment or identify individuals who are vulnerable or at risk. This improved sensitivity with a small negative impact on specificity. However, the nature of these codes means that their use should be considered on a study-by-study basis. False negatives frequently had codes for ‘other social care’ (e.g., child in care) and codes for assaults. While these codes may indicate risk and potential maltreatment, this is not always the case. For example, children may be involved with social services for a number of reasons, including child or parental illness/disability unrelated to maltreatment. These individuals may have many co-occurring risk factors and appear similar to those with maltreatment codes in large databases of routinely collected data. However, the care, support and resources they need will depend on their circumstances, and grouping them together for research may not be appropriate. Similarly, codes for assault may not always indicate maltreatment. Age constraints could be applied to these codes depending on the study (e.g., counting assault codes only for those aged under 5 or under 10 years). The ‘Possible CM’ codes are comprehensive but cannot be reliably used for case ascertainment without further substantiating evidence. Further work using data-driven learning techniques may improve performance by combining codes, for instance terms such as ‘maternal concern’ together with ‘emergency admissions to hospital’.
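The suggested age constraint is simple to apply as a filter on coded events. The sketch below is hypothetical: the category name and the default limit of 5 years are illustrative choices drawn from the example in the text, not a validated rule. The limit would be set per study.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple

# Hypothetical age limits (years); the text only suggests e.g. under 5 or under 10.
AGE_LIMITS: Dict[str, int] = {"assaults": 5}


def age_in_years(dob: date, on: date) -> int:
    """Completed years of age on a given date."""
    return on.year - dob.year - ((on.month, on.day) < (dob.month, dob.day))


def apply_age_constraints(
    dob: date,
    events: List[Tuple[str, date]],
    age_limits: Optional[Dict[str, int]] = None,
) -> List[Tuple[str, date]]:
    """Keep (category, date) events unless the category has an age limit the child exceeds."""
    limits = AGE_LIMITS if age_limits is None else age_limits
    kept = []
    for category, when in events:
        limit = limits.get(category)
        if limit is None or age_in_years(dob, when) < limit:
            kept.append((category, when))
    return kept
```

Categories without a limit pass through unchanged, so the same filter can be run over all Possible CM events before case ascertainment.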

While some iterations of previously published code lists performed with high sensitivity, care must be taken not to include codes that do not indicate maltreatment (e.g., ‘parental benefits’ or ‘self-neglect’). Many of these codes may indicate a potentially vulnerable child with similar risk factors and co-morbidities; however, these represent distinct groups of individuals who require different types and levels of support. This underlines the importance of care in selecting and validating codes for maltreatment.

There was an increase in both the incidence and prevalence of Confirmed CM and Possible CM codes in primary care from 2004 to 2019, with the largest increase seen in Possible CM codes, which more than doubled over time. This could reflect a genuine increase in maltreatment, or an increase in GP recognition of vulnerable children and coding of concerns. Rates of both Confirmed CM and Possible CM codes were comparable between sexes, decreased with increasing age and were highest in the most deprived areas. The deprivation gradient was most notable for Confirmed CM codes, with more than five times the incidence in the most deprived compared with the least deprived areas.

Rates of hospital admissions for Confirmed CM remained low over time, whereas admissions for Possible CM initially increased and then decreased from 2009 onwards.

False positives

Most false positives in GP records were identified from child protection codes. However, without these codes sensitivity is poor, with only around one fifth of cases detected. Child protection is the social services response to harm to a child, and it seems reasonable to include these codes in any list examining maltreatment. These CYP had the same Confirmed CM codes recorded in their GP records as the confirmed cases; given this, it would be difficult to improve specificity further. These children may be at high risk of maltreatment, as possible CM is suggested within their medical records; however, insufficient evidence of CM was found on the day of assessment.

False negatives

Around half of the clinically confirmed cases were missed using the Confirmed CM code list in GP records. All children within the CVCM cohort had been assessed because they were considered ‘at risk’ of or a ‘victim of’ CM, so they would be more likely to have maltreatment-related codes recorded in their medical records than a random sample would be. We found that 30.7% of these cases had at least one Possible CM code recorded in their primary care records. Future application of machine learning techniques to the Possible CM code lists may identify combinations of codes that improve sensitivity with minimal detriment to specificity, helping to tease apart ‘confirmed’ from ‘possible’ cases and to optimise performance.

Around 90% of cases were missed using admissions data. Of those missed, around half were not admitted to hospital in the 12 months either side of their maltreatment assessment date and so could not be captured in these data. The narrower coding framework used in hospital admissions also limits the recording of maltreatment, as child protection and social care codes are absent. Coding in hospital admissions focuses on the injury being treated rather than on whether it resulted from maltreatment. While specificity was high, the under-reporting of CM in admissions data must be acknowledged in any future studies.
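
Whether a missed case could, in principle, have been captured by admissions data depends on whether any admission falls within the window either side of the assessment date. A minimal sketch of that check follows. It assumes records keyed by pseudonymised child IDs and approximates 12 months as 365 days; all names and dates are hypothetical, and this is not the linkage procedure used in the study.

```python
from datetime import date, timedelta

# Assumption: "12 months either side" approximated as 365 days.
WINDOW = timedelta(days=365)


def admitted_in_window(assessment: date, admissions: list[date], window: timedelta = WINDOW) -> bool:
    """True if any admission falls within `window` either side of the assessment date."""
    return any(abs(adm - assessment) <= window for adm in admissions)


def split_missed_cases(
    missed: dict[str, tuple[date, list[date]]],
) -> tuple[list[str], list[str]]:
    """Partition missed cases into those with and without an admission in the window.

    `missed` maps a hypothetical pseudonymised child ID to the assessment date
    and the list of that child's admission dates.
    """
    in_window: list[str] = []
    out_of_window: list[str] = []
    for child_id, (assessed, admissions) in missed.items():
        target = in_window if admitted_in_window(assessed, admissions) else out_of_window
        target.append(child_id)
    return in_window, out_of_window


# Hypothetical example: A was admitted within the window, B only outside it, C never.
missed = {
    "A": (date(2015, 6, 1), [date(2015, 3, 10)]),
    "B": (date(2015, 6, 1), [date(2017, 1, 1)]),
    "C": (date(2015, 6, 1), []),
}
print(split_missed_cases(missed))  # (['A'], ['B', 'C'])
```

Cases in the second list could not have been detected in admissions data whatever codes were used, which separates data coverage from coding performance.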

Comparison to the literature

Our incidence rates fall between those reported in two similar UK studies using routinely collected primary care data18,19. These differences are likely attributable to the choice of codes used to identify cases of maltreatment, which illustrates the need for standardised definitions and subsequent validation. Rates were higher than those reported by Chandan et al.18, most likely owing to the addition of an extensive list of child protection codes in our algorithm. Rates using the Confirmed CM list in the current study are lower than those reported by Woodman et al.19; however, they included a wider set of codes, such as ‘out of home care’ and ‘social care’ codes, which we excluded from our Confirmed CM code list but included in our Possible CM code lists (incidence using this list was comparable to that found by Woodman et al.19). All three studies agree that incidence rates recorded in primary care underestimate the true rates in the community, although the incidence rates for CM reported in the current study are in keeping with ONS data on the number of children in Wales subject to a child protection plan7.

The increase in CM over time as recorded in primary care is supported by previous research with routinely collected GP data18,19. The factors driving these increases in primary care recording are less clear. They may reflect raised awareness, real improvements in recognition and response, and changes in coding and reporting behaviour to record all concerns of CM19, following policies and practice guidance published by the UK National Institute for Health and Care Excellence (NICE: 2009, 2016, 20179), guidance notes from the General Medical Council (GMC, 2012, updated 2018)10, the National Society for the Prevention of Cruelty to Children (NSPCC collaboration, 20148) and Public Health Wales (PHW, 20154).

There have also been increases in child protection activity in recent years, but it is unclear whether this is because child protection services have become better at recognising and responding to maltreatment. An observational time-series study using official government agency and NSPCC data in England and Wales found that the incidence of crimes against children, child protection registrations and children entering care increased steeply between 2000 and 201631. It is difficult to know whether this reflects a trend of increased reporting rather than rising levels of maltreatment within society. Further time-series studies using national survey data may be needed to establish whether CM is becoming more common.

We found a decrease in recorded CM in primary care in 2020. This is in keeping with concerns that cases of abuse may have been missed owing to restricted access to protective services during lockdowns and disruption to usual safeguarding pathways32,33. There are reports of reduced contact with public sector organisations such as schools, hospitals and emergency services in the UK34, and of fewer children being added to the child protection register in 202035. Data from one county borough showed the largest decrease in referrals among the youngest children (aged < 3 years)35. This occurred alongside reports of increased contact with child abuse helplines36. There may therefore have been a disparity between incidence recorded by public services and incidence in the community.

Rates of CM in primary care were highest in infants aged under one year and lowest in older adolescents. Greater GP awareness of maltreatment in younger children, particularly through surveillance by health workers, and lower consultation rates among older children may be responsible for these differences19. Younger children, particularly those under one year old, are more likely to come to the attention of children’s services7. Fewer adolescents are placed on child protection registers than any other age group, and this age group may be at greater risk of maltreatment through a lack of identification and protection measures31. Admissions for Possible CM were highest in 15–17 year olds, largely attributable to higher rates of assaults and fractures/dislocations in older age groups. Further research is needed to explore the nature of these admissions in older age groups and whether such presentations represent an opportunity to identify and support adolescents at risk of maltreatment.

Incidence of Confirmed CM was more than five times as high in the most deprived compared with the least deprived communities in primary care, and more than six times as high in hospital admissions. Individuals with no deprivation data were also at increased risk compared with the least deprived quintile. Similar findings have been reported in other studies18,19, and the relationship between family poverty and the likelihood of a child experiencing maltreatment is already well established13,37,38.
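
Statements such as ‘more than five times as high’ are rate ratios relative to the least deprived quintile. The sketch below shows the arithmetic for crude incidence per 10,000 person-years and crude rate ratios. The stratum labels and counts are hypothetical, and the estimates reported in this study may have been derived differently (e.g., with adjustment for age and sex).

```python
def incidence_per_10k(cases: int, person_years: float) -> float:
    """Crude incidence rate per 10,000 person-years."""
    return 10_000 * cases / person_years


def rate_ratios(strata: dict[str, tuple[int, float]], reference: str) -> dict[str, float]:
    """Crude incidence rate ratio of each stratum relative to the `reference` stratum."""
    ref_rate = incidence_per_10k(*strata[reference])
    return {name: incidence_per_10k(*counts) / ref_rate for name, counts in strata.items()}


# Hypothetical (cases, person-years) per stratum; illustrative only, not study data.
strata = {
    "Q1 (least deprived)": (20, 100_000.0),
    "Q5 (most deprived)": (110, 100_000.0),
    "Unknown": (60, 100_000.0),
}
print(rate_ratios(strata, reference="Q1 (least deprived)"))
# {'Q1 (least deprived)': 1.0, 'Q5 (most deprived)': 5.5, 'Unknown': 3.0}
```

Keeping individuals with missing deprivation data as their own stratum, rather than dropping them, is what allows an elevated rate in that group to be observed at all.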

Strengths and limitations

This study used a large population-level database and created comprehensive code lists for CM. Extensive manual searching was conducted alongside analysis of missed cases. The codes identified appear to be exhaustive, and sensitivity could not be improved by adding further codes. This underscores that healthcare records underestimate the true incidence of maltreatment in the community.

This study further highlights the strengths and limitations of the individual healthcare datasets, their utility in detecting CM, and their use in combination for the study of CM. Differences in coding systems mean that comparison of rates between healthcare settings may not be appropriate. Further research is needed to assess comparability with settings outside the UK.

Future research may also use combinations of codes or machine learning to examine patterns of healthcare utilisation and better identify CM in healthcare data.

Routinely collected data have limitations for research purposes, and the quality and completeness of data vary across datasets. We attempted to minimise the impact of this by including only GP practices that meet data quality standards and by validating the study code lists. CM that does not result in presentation to services, or that is discussed but not recorded, will not be captured here. This is a common feature of all studies using routine data: these data reflect contacts with the healthcare system, not rates of CM in the community.

There is selection bias in the CVCM dataset used for external validation, as it represents a cohort of suspected victims of, or children at risk of, CM. All these CYP are therefore more likely to have a code suggestive of maltreatment recorded in their medical records. This makes improving the performance of the algorithm more challenging, as we are effectively attempting to distinguish between ‘confirmed’ and ‘possible’ maltreatment cases.

Implications

The validation of codes and the development of algorithms that identify cases in routinely collected datasets with high specificity are an important step in epidemiological research. These validated code lists will be applicable to other routinely collected datasets, and the choice of algorithm will vary with study design. Such standardisation is important for research purposes, to better understand the true effects and consequences of CM. Around half of the cases missed in admissions data were not admitted to hospital and so could not be picked up in these data. Records of maltreatment appeared much more frequently in GP data. This is likely a combination of the higher number of contacts with primary care, the recording of communications between hospital settings and GPs, and the extensive coding nomenclature in GP settings, in particular the presence of child protection and social care codes. This makes GPs better placed to record CM, and primary care data should be included where possible to identify cases. Where CM is explored in admissions data, the limitations must be recognised: around 90% of cases are likely to be missed, although specificity in this setting is high.

We add to a body of evidence that CM recorded in primary care data has been increasing, and further demonstrate a decrease in recorded CM in 2020. The long-term consequences of this drop in the recording of maltreatment during the pandemic, and of any disparity with community rates, are as yet unknown. This has significance for informing future policy on protective public services. Future research should seek to explore this, and additional support should be considered for vulnerable children who may not have been identified during the pandemic.

Individuals in more deprived areas were at markedly increased risk of maltreatment. Also of note is the increased risk among those whose deprivation data were unknown, which may indicate unstable living arrangements. Higher rates of maltreatment in these individuals may indicate the need for additional support or service provision. Further research is required to explore how best to support the most deprived communities and individuals whose living arrangements may be unstable.

Conclusions

Through the validation and assessment of CM-related codes in healthcare records, we create a platform for future epidemiological research. Time-series analyses of population-based epidemiological surveys of CM may be needed to establish whether increasing recognition of cases reflects rising trends within the community, improvements in recognition and response, or a combination of both. Additional support should be considered for individuals from deprived communities and those who may not have been identified as vulnerable during the pandemic.