Background

Despite substantial reductions in child mortality in Sub-Saharan Africa, many children under five are still developmentally at risk because of poverty and related risk factors such as malnutrition, poor health and unstimulating home environments [1]. The magnitude of developmental problems is, however, unknown because culturally relevant tools for assessing development are lacking. In the absence of such tools, it is also difficult to correctly determine the developmental effects of interventions targeting children at risk. In the rare studies conducted on children at developmental risk, researchers have used tools originally created for the technological societies of Europe and North America, either translating or adapting them with little validation [2–6]. Sometimes culture-specific test items were dropped entirely [7–11] or no adaptation was made [12–14]. One Western tool adapted and used worldwide is the American Denver Developmental Screening Test [15] or its revised version, the Denver II [16]. The Denver II is a revision of the Denver Developmental Screening Test, which was developed in 1967. It was standardized in 1989 on 2,096 American children and published in 1992. It is a screening tool used to identify children between birth and six years who have problems in the personal-social domain (self-help skills and socialization with others), the fine motor domain (eye-hand co-ordination and manipulation of small objects), the language domain (production of sounds, ability to recognize, understand and use language) and the gross motor domain (large muscle movements such as sitting, walking and jumping). The Denver II has been adapted and standardized for use in other countries such as Georgia, Singapore and Sri Lanka [17–19]. Though it is simple, quick and feasible to use in institutional and home settings to identify children at developmental risk [20], the Denver II has not been adapted and validated for use in many low-income countries of Africa, such as Ethiopia.
An indigenous tool similar to it in style was, however, created for children in Malawi [21]. Using the Denver II as a prototype, new test items that were more culturally relevant for Malawian children were created from the Denver Developmental Screening Test, the Denver II and the Griffiths Mental Development Scales.

The main objective of this research, therefore, was to adapt and standardize the Denver II on children between birth and six years of age in the low income context of Jimma Zone of Ethiopia for a more realistic assessment of their development.

Methods

Study setting

The study was conducted in Jimma Zone, South West Ethiopia. The population of the zone was estimated at 2.8 million. Jimma Town, the zonal capital, has about 149,166 inhabitants [22]. The town is home to more than nine ethnic and linguistic communities communicating mostly in a federal language, Amharic, and a regional language, Afan Oromo. With a mixture of urban and rural lifestyles, Jimma Town represents the diverse socio-economic, multicultural and multilingual Ethiopian society.

Adaptation process of the Denver II

The Denver II [16] comprises 125 test items grouped into four domains of child development: 25 personal-social (PS), 29 fine motor (FM), 39 language (LA) and 32 gross motor (GM). These test items are administered using a bell, glass bottle, set of 10 blocks, rattle, pencil, tennis ball, yarn, raisins, cup, white doll, white paper, and baby bottle. Adaptations involved identifying culture specific test items, test objects or materials and then modifying or replacing them to make them culturally relevant. In some cases, instructions for test item administration and criteria of passing were modified.

Classifying test items under ‘cross-cultural’ and ‘culture-specific’ categories

All test items were first categorized into culture-specific and cross-cultural items. The cross-cultural relevance of tasks in the test items was assessed using the International Classifications of Functions [23]. Culture-specific items related to movements (e.g. running, jumping, hopping) were identified using a taxonomy of movement skills [24]. Other specific movement skills related to sport, complex movement skills and functional movement skills such as activities of daily living, work and games are culture-specific. The cross-cultural relevance of items other than movements was assessed using cross-cultural psychology [25]. Within this process a local team (psychologists, a special educator and pediatricians) and a Belgian team (a child psychiatrist, a pediatrician/nutritionist, a neuroscientist, a physiotherapist and occupational therapists) worked together.

After translation into the dominant languages (Amharic and Afan Oromo), dialect appropriateness was checked.

Pilot studies and draft versions

The test items were then piloted on apparently healthy children of accessible parents who consented orally to participate in the study. Draft I emerged based on a survey conducted in 2009 on 19 households. Four urban and 15 rural families were interviewed about the items which were identified as culture-specific (see Fig. 1). Draft I was tried out on eight urban kindergarten children (26–60 months of age; mean = 42.9; SD = ±14.1). Three local study team members, trained in Denver II test item administration, did the testing and the problematic items were discussed at the multidisciplinary team meeting. Re-adaptations resulted in Draft II which was further explored in 2010 for feasibility and reliability on 24 urban kindergarten children (mean age = 51.4 months, SD ± 8.2 months). Testing was conducted by seven trained kindergarten teachers. Further adaptation resulted in Draft III. Figure 2 summarizes the adaptation process.

Fig. 1 Adaptation and standardization process of the Denver II to Denver II-Jimma

Fig. 2 Flow of activities in the adaptation and standardization of the Denver II to Denver II-Jimma

Large sample study and standardization

Sampling, inclusion and exclusion criteria

Trained nurses collected data using the third draft. Under-six children in Jimma town whose parents could afford to pay preschool education fees were targeted. Such children were assumed to belong to middle or higher socioeconomic level and thus in a context for optimal development. Quota sampling was used to include children in the following age categories (in months): 0–2, 3–8, 9–14, 15–20, 21–26, 27–32, 33–41, 42–53, and 54–65.

Before testing a target child, the mother was interviewed using a 10-point checklist of exclusion criteria. Children whose mothers reported any of the following potential developmental risks were excluded: premature birth; birth weight less than 2500 g; very tiny body at birth; instrumental delivery; delivery after 24 h of labor; twin or triplet birth; a chronic health problem present at birth; sickness during the first year after birth; observable impairments affecting sight, hearing or mobility; or a mother who was seriously sick during pregnancy. In addition, anthropometric measurements were made to assess nutritional status and exclude malnourished children. Weight was measured with a calibrated digital weighing scale and mid-upper-arm circumference (MUAC) with a MUAC tape. Anthropometric indices related to length/height were not used for fear of measurement inaccuracy, as some children were nervous while being positioned for measurement. Earlier studies have also used weight-for-age to determine a child’s nutritional status [7, 26] because weight-for-age is considered more comprehensive than height-for-age [27]. Assessment was done (if the child was well) in the following sequence: developmental assessment, weight measurement, MUAC measurement.

We dispatched a questionnaire and a study consent form to parents of children attending private kindergartens in Jimma town. The homes of parents who signed the consent form were visited. Of 3502 children, only 1682 (mean age = 31.2 months, SD = 17.75 months) who were eligible according to the inclusion criteria were tested. The age of the children ranged from four days to 73.3 months. Initially, 1552 children were tested at home from 11 January to 21 June 2011; later, 130 younger children (<10 months) were added. Two children of unknown nutritional status and eighty-three malnourished children were excluded during analysis, based on a weight-for-age Z-score (WAZ) ≤ −2, or a mid-upper-arm-circumference Z-score (MUACZ) ≤ −2 when WAZ was absent.
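
The nutritional exclusion rule applied during analysis can be illustrated with a short sketch. This is not the authors' code: the function name `nutritional_status`, the returned labels and the use of `None` for a missing z-score are assumptions made for illustration only.

```python
from typing import Optional


def nutritional_status(waz: Optional[float], muacz: Optional[float]) -> str:
    """Classify a child for the analysis sample (hypothetical helper).

    WAZ is used whenever it is available; MUACZ is consulted only
    when WAZ is absent, as described in the text.
    """
    z = waz if waz is not None else muacz
    if z is None:
        # Neither index available: nutritional status unknown, child excluded.
        return "unknown"
    if z <= -2:
        # Z-score at or below -2: treated as malnourished, child excluded.
        return "malnourished"
    return "included"
```

Under this sketch a child with WAZ = −1.0 stays in the sample regardless of MUACZ, because MUACZ is only a fallback when WAZ is missing.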

The study complied with the Helsinki Declaration [28] and was reviewed and approved by Ethical Clearance Board of Jimma University, Ethiopia, and Comite voor Medische Ethiek Universiteit Hasselt, Belgium. Written and oral consents of parents were obtained and children were always tested in the presence of caregivers.

Assessment of feasibility and reliability

Feasibility of each test item (meaningfulness of test items, their practicality and ease of administration) was documented during data collection and discussed at the final consensus meeting. Inter-rater and test-retest reliabilities were assessed for each test item. Ten female clinical nurses worked in pairs, alternating as tester and observer. Independent scores were generated for each child by a tester and an observer. Agreement between tester and observer scores was calculated as a percentage to determine the reliability of the test items. Inter-rater reliability was tested on 409 children. Within an average interval of 14 days, 147 of them were tested for test-retest reliability. Inter-rater reliability was not calculated during the retest.

Test item administration and scoring system

Test item administration and scoring are the same as in the Denver II manual [29]. Each test item on the Denver II is presented on a chart as a horizontal bar partitioned into the 25, 50, 75 and 90 percentile ages of passing the item. To test a child, his or her age is calculated and a vertical age-line is drawn on the Denver II chart. Testing starts from a test item completely to the left of the age-line. All test items passed by 75 % or more of children of the same age in the norming sample, and by younger children, are counted for a child as expected passes. If a child passes three consecutive test items arranged on the Denver II test chart, all items to the left are assumed to be passed because they are items achieved at a lower age. These items are called implied passes. If a child fails three consecutive test items, it is assumed that all other items arranged to the right on the Denver II chart are failed. These items are implied failures. Items passed by a child through testing are tested passes. Implied passes and tested passes are added up as actual passes. A child’s raw score on each test item is marked as tested pass, implied pass, tested failure, implied failure, refusal, or no opportunity. Categorical and numerical scores were derived for statistical analysis.

Categorical score:

For each test item, a binary outcome variable (pass/fail) was created: pass (tested pass) and fail (tested failure and refusal). “No opportunity” to perform the item, “implied passes” and “implied failures” were treated as missing values.
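
The recoding of raw item scores into the binary outcome can be sketched as below. The string labels and the function name `categorical_score` are assumptions for illustration; the text specifies only which raw scores count as pass, fail or missing.

```python
from typing import Optional

# Raw score labels are hypothetical spellings of the six categories in the text.
PASS_CODES = {"tested pass"}
FAIL_CODES = {"tested failure", "refusal"}
MISSING_CODES = {"no opportunity", "implied pass", "implied failure"}


def categorical_score(raw: str) -> Optional[int]:
    """Return 1 for pass, 0 for fail and None for a missing value."""
    if raw in PASS_CODES:
        return 1
    if raw in FAIL_CODES:
        return 0
    if raw in MISSING_CODES:
        return None
    # Anything else is a data-entry problem rather than a valid raw score.
    raise ValueError(f"unrecognized raw score: {raw!r}")
```

Treating implied passes and implied failures as missing means that only items actually administered to the child enter the item-level models.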

Numerical score:

The ratio of actual passes to the expected passes was calculated as a performance ratio score.
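
As a small worked illustration of this ratio, the sketch below adds tested and implied passes to obtain actual passes and divides by the expected passes. The function name and the example counts are hypothetical.

```python
def performance_ratio(tested_passes: int, implied_passes: int,
                      expected_passes: int) -> float:
    """Ratio of actual passes (tested + implied) to expected passes."""
    if expected_passes <= 0:
        raise ValueError("expected_passes must be positive")
    actual_passes = tested_passes + implied_passes
    return actual_passes / expected_passes


# Hypothetical child: 6 tested passes, 12 implied passes, 20 expected passes.
example_ratio = performance_ratio(6, 12, 20)  # 18 / 20
```

A ratio below 1 indicates that the child passed fewer items than expected for his or her age; a ratio above 1 indicates more.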

Standardization

The objective of the standardization was to determine the ages at which 25, 50, 75 and 90 % of the children pass each of the adapted test items using binary logistic regression.

Data management and statistical analysis

Data within the adaptation process (except for reliability) were analyzed qualitatively. Whether or not a test item was culture-specific or cross-cultural was analyzed using theoretical information and discussion among the research team. Data collected during drafting and re-drafting were discussed at interdisciplinary team meetings comprising local and western professionals. Standardization data were entered into EpiData 3.1, double checked, cleaned and exported to SAS 9.3 and STATA 12.1 for analysis. The WAZ and MUACZ scores were calculated as anthropometric indices using WHO Anthro and AnthroPlus and children’s nutritional status determined against WHO reference standard [30].

Predicted ages at which 25, 50, 75 and 90 % of the norming sample passed each test item were derived from the models and taken as milestone ages. Using the categorical score (pass/fail), a binary logistic regression model was fitted for each test item with child age in days as the single covariate. Predicted probabilities of passing were calculated from the alpha (intercept) and beta (slope) coefficients. Goodness of fit was assessed using the Hosmer–Lemeshow test statistic at the 5 % level of significance. Items with poor model fit (p-value < 0.05) were refitted using cubic splines [31].
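
For a simple logistic model with age as the only covariate, the milestone ages follow from inverting the model: if logit(p) = alpha + beta × age, then age = (ln(p / (1 − p)) − alpha) / beta. The sketch below shows this inversion. It is a minimal illustration rather than the authors' SAS or STATA code, the coefficients in the example are invented, and it does not apply to items refitted with cubic splines, which have no such closed form.

```python
import math
from typing import Dict, Sequence


def predicted_probability(age_days: float, alpha: float, beta: float) -> float:
    """Probability of passing an item at a given age under the logistic model."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * age_days)))


def milestone_age(p: float, alpha: float, beta: float) -> float:
    """Age (in days) at which a proportion p of children is predicted to pass."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if beta <= 0.0:
        raise ValueError("beta must be positive for passing to increase with age")
    return (math.log(p / (1.0 - p)) - alpha) / beta


def milestone_ages(alpha: float, beta: float,
                   percentiles: Sequence[float] = (0.25, 0.50, 0.75, 0.90)
                   ) -> Dict[float, float]:
    """Milestone ages for the four percentiles used on the Denver II chart."""
    return {p: milestone_age(p, alpha, beta) for p in percentiles}


# Invented coefficients for one hypothetical item (age measured in days).
ages = milestone_ages(alpha=-6.0, beta=0.02)
```

With these invented coefficients the 50 % age is −alpha / beta = 300 days, and the 25, 75 and 90 % ages lie symmetrically or further out on the logit scale around it.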

Ages of attaining milestones in the Denver II and Denver II-Jimma norming samples were compared at the 90th percentile age. A difference of more than 10 % was considered clinically significant.
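
The comparison rule can be sketched as follows. The text does not state which age serves as the denominator of the percentage difference; this sketch assumes the original Denver II age is the reference, and the function name and labels are hypothetical.

```python
def compare_milestone(jimma_age: float, denver_age: float,
                      threshold: float = 0.10) -> str:
    """Label a 90th percentile age as 'earlier', 'later' or 'similar'.

    Assumption: the difference is expressed relative to the Denver II age.
    """
    if denver_age <= 0:
        raise ValueError("denver_age must be positive")
    relative_diff = (jimma_age - denver_age) / denver_age
    if relative_diff > threshold:
        return "later"
    if relative_diff < -threshold:
        return "earlier"
    return "similar"
```

For example, an item attained at 12.0 months locally against 10.0 months on the Denver II differs by 20 % and would be labelled "later".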

Reliability was assessed at item and domain levels. Item reliability was calculated as the percentage of agreement between a tester score and an observer score (inter-rater), and between the first test and retest scores (test-retest) for the same child. Chance agreement was corrected for using Cohen’s kappa. The kappa bands of Landis and Koch [32] were used for interpretation: values below 0.20 as slight; between 0.21 and 0.40 as fair; between 0.41 and 0.60 as moderate; between 0.61 and 0.80 as substantial; and between 0.81 and 1.00 as excellent agreement. Where kappa could not be calculated, percentages of agreement were determined: 70 % or higher was considered acceptable.
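
The item-level agreement statistics can be sketched in a few lines. This is an illustrative implementation of percentage agreement and Cohen’s kappa, not the authors' analysis code; the function names are hypothetical, and the handling of values that fall between the stated bands (for example 0.205) is an assumption.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def percent_agreement(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Percentage of children for whom the two scores are identical."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("score lists must be non-empty and of equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> Optional[float]:
    """Chance-corrected agreement; None when kappa is undefined."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("score lists must be non-empty and of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Expected agreement by chance, from each rater's marginal frequencies.
    expected = sum(count_a[c] * count_b[c] for c in set(a) | set(b)) / (n * n)
    if expected == 1.0:
        # Both raters used a single category for every child.
        return None
    return (observed - expected) / (1.0 - expected)


def interpret_kappa(kappa: float) -> str:
    """Bands as stated in the text; gaps between bands go to the lower band."""
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "excellent"


# Invented pass/fail scores (1 = pass, 0 = fail) for ten children on one item.
tester = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
observer = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0]
```

When every child passes (or fails) an item for both raters, expected agreement equals 1 and kappa is undefined, which is the situation in which the percentage of agreement is reported instead.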

Domain reliability was evaluated using intra-class correlation coefficients. First, performance ratio scores were generated for each of the four domains separately. Then, the correlations between tester and observer performance ratio scores at two testing moments (test and retest) were computed for each domain as inter-rater and test-retest intra-class correlation.

Results

Outcome of the adaptation

Of the 125 Denver II test items, 55 (20 PS, 18 FM, 15 LA, 2 GM) were theoretically identified as culture-specific. These 55 items were piloted through an exploratory survey and discussed at a consensus meeting. Only 36 of them needed adaptation. The other 19 items were retained as in the original (Fig. 1). A tryout revealed difficulties with eight (6 LA and 2 PS) test items. Further fine-tuning resulted in Draft II (36 adapted, 1 newly added, 89 original Denver II items). Inter-rater reliability of Draft II was excellent (kappa > 0.83) for all tested items. For items with a skewed data distribution, kappa could not be computed. Their percentages of agreement, however, were all acceptable (71.4 to 95.2).

Some test items were found practically difficult to administer or still difficult for children to understand even after initial adaptation. Hence, to make sure that test items were feasible to administer, understandable for children and caregivers, further adaptations were made. One item from PS was adapted, and another re-adapted; and the adaptations of three LA items were dropped. This resulted in the Denver II-Jimma-Draft III, which comprises 36 adapted (18 personal social, 10 fine motor, 8 language), 1 newly added (toilet going), and 89 original Denver II items.

At the final consensus meeting following the standardization study, one gross motor item (walk up steps) was adapted, the newly added item was dropped, and the adaptation of one personal social item was dropped. This resulted in the final Denver II-Jimma having 36 adapted items (Table 1).

Table 1 Descriptions of adaptations made to the Denver II test items to make the Denver II-Jimma

Outcome of the standardization

Characteristics of the standardization sample

Nearly equal numbers of boys and girls participated in the study. About 95 % of the caregivers rated themselves as belonging to a middle or higher socio-economic level.

The Oromo, as the largest ethnic group, seem to have been fairly represented (45.1 %). Only 9.3 % of mothers of children enrolled in the study are illiterate (Table 2).

Table 2 Characteristics of study participants

The Denver II-Jimma Age Milestones

Of the 126 test items separately fitted with a logistic model, 66 items fitted well. Three items (PS1, LA1, GM1) could not be fitted because all tested children passed them. Fifty-seven items showed poor fit (13 PS, 18 FM, 11 LA, 15 GM). The model fit for 39 of these was improved by refitting using cubic splines. On many test items, the Denver II-Jimma differed from the Denver II in the 50, 75 and 90 % ages of attaining milestones (Table 3).

Table 3 The Denver II-Jimma with its age norms (in months) for 25, 50, 75 and 90 % of children passing the test items within the different domains

The 90 % age of milestones attainment on Denver II-Jimma significantly differed from Denver II on 42 (33.6 %) items (9 PS, 6 FM, 15 LA, and 12 GM). Fifteen test items were attained at an earlier age and 27 items at a later age than they are achieved on the Denver II. The remaining 83 (66.4 %) milestones were achieved at a similar age (Table 3).

Reliability of the Denver II-Jimma

Table 4 summarizes the results for the reliability of the Denver II-Jimma at individual test item and overall domain levels. Inter-rater reliability was excellent except for two test items which showed substantial agreement (PS5: “work for toy”, kappa = 0.74; FM5: “follow 180 degrees”, kappa = 0.78). The majority (above 90 %) of the test items had substantial to excellent test-retest reliability. Only one test item (FM8: “look for yarn”, kappa = 0.33) showed an unacceptable kappa value. The Denver II-Jimma also demonstrated very high intra-class correlations on all domains of development (Table 4).

Table 4 Reliability of the Denver II-Jimma at item level indicated by inter-rater and test-retest kappa values, and at domain level indicated by inter-rater and test-retest intraclass correlation coefficients

Final consensus on Denver II-Jimma

As bottle feeding is discouraged in line with the WHO’s recommendation, it was agreed that the test item “Feed doll” should be administered without a toy bottle. The local material “Callee”, initially suggested to replace the rattle for administering the item “work for toy”, was too risky for babies because it is small and could be swallowed. Hence, the adaptation was dropped. A newly added test item (“toilet going”) was found difficult to perform before the age of six years and was thus eliminated. The gross motor item “Walk up steps” could not be assessed in homes lacking steps. In such cases, caregivers were asked whether the child was able to walk up a steep slope or cross an elevated doorstep. Hence, the Denver II-Jimma finally evolved as a 125-item tool with 36 (28.8 %) adapted test items: 17 PS, 10 FM, 8 LA and one GM.

Discussion

To provide early intervention for children developmentally at risk, correct assessment of their developmental status is an essential first step. Since development is influenced by sociocultural context, instruments assessing child development should take culture into account. The tools should also be psychometrically valid. While child development tools created in Western cultural contexts are psychometrically valid, they may not be culturally relevant for use with African children. Many agree that culturally relevant developmental assessment tools should be either created [33] or adapted from tools developed in other cultures [5, 34]. Adapting an existing tool is less expensive and more suitable for maintaining the construct validity of a tool across different settings.

In this study the Denver II, created in the Western socio-cultural context, was adapted and standardized on Ethiopian children in Jimma town. The Denver II-Jimma evolved as a culturally relevant tool, ready to use for children from birth to six years in the multicultural and multilingual communities of Jimma Zone, south west Ethiopia. In the adaptation process, 36 of the 125 items in the Denver II were modified. No test item was dropped, which helps to maintain the objectives and content validity of the original tool. Content validation was conducted by the multidisciplinary research team, with knowledge of local and Western cultures, going through each test item at different meetings. First, the objective of each Denver II test item and the specific skill or competence assessed were discussed. Then the equivalence of the adapted version of the test item with the original one was examined in line with the objective, skill or competence assessed. This process was meant to maintain both content and construct validity.

Adaptation was predominantly in personal social test items. Only one gross motor item was adapted. This is consistent with other studies [19, 34]. Personal social skills seem to be more prone to socio-cultural influences than gross motor skills.

Feasibility and reliability of all test items were ensured during the adaptation process through piloting and fine-tuning. Good inter-rater and test-retest reliabilities were demonstrated during testing at schools by kindergarten teachers and at home by clinical nurses, indicating that the Denver II-Jimma is reliable for use in different settings by different professionals. A strong intra-class correlation across all the domains also shows good overall reliability. As with the Denver II [16], inter-rater reliability seems to be better than test-retest reliability.

Milestone attainment on the Denver II and the Denver II-Jimma was compared at the 90th percentile ages. Though there was no significant difference on the majority (66.4 %) of the test items, a clinically significant difference was observed on 42 items. Such a difference was also reported in earlier studies [17–19, 34–39]. The difference was found for both the culture-specific and the cross-cultural items. This finding of milestones being achieved at different ages seems to justify the need for separate normative standards for valid interpretation of test results from different socio-cultural contexts.

Differences are observed in the number of Denver II test items adapted in different settings. While 36 test items were adapted in the present study, only two items (personal social item “play-pat-a-cake” and language item “Baba or Mama, nonspecific”) were modified while standardizing and adapting the Denver test for children in Tbilisi, Georgia [40]. Only five test items (4 personal social and one language) were modified while adapting and standardizing the Denver II on Sri Lankan children [19]. In Singapore, 77 Denver II items (67 %) were shared with the adapted and standardized Singaporean version [17]. Such findings seem to show that the number of test items needing adaptation varies across socio-cultural contexts.

There are also differences in ages of attaining milestones in different settings. With a difference of more than 10 % in the 90th percentile passing age, the Singapore version differed from the original Denver II on more than 30 items (20.1 %), the Denver–Tbilisi on 25 items (24 %) and the Denver II-Jimma on 42 items (33.6 %). A comparison of the Sri Lankan norm with the Singapore and the Denver II norms also showed a difference of more than one month in ages of attaining milestones in more than 75 % of items in all domains [19]. The differences in ages of attaining milestones found in the present study are therefore expected and consistent with earlier studies.

Taking into account that the Denver II-Jimma should be an ‘ideal reference’ to detect children at developmental risk and to monitor the general recovery of the child during rehabilitation, much care was given to the standardization. Standardization was therefore done on a large sample of healthy children, excluding those with obvious disabilities and those at risk during the pre- and perinatal stages of development. Children from comparatively very low-income families were not included for fear that such children are at higher developmental risk related to malnutrition and developmentally non-stimulating home environments. Moreover, significantly malnourished children were also excluded from the analysis since malnutrition affects development.

An important aspect of the adaptation process is the involvement of an interdisciplinary team comprising academicians and practitioners from both the western and the local cultures. They were found instrumental in understanding both contexts while making relevant adaptations. Such a team composition was either not reported or considered in other similar studies.

The study is also not without limitations. First, though the Denver II is valid and still in use in the Western world, it was standardized 24 years ago. This study therefore compared data from two different time points. Second, though it is claimed that adaptation improves sensitivity [40], the Denver II-Jimma could still be subject to a limitation of the Denver II: weak specificity [41]. With adaptation of the traditional scoring and interpretation, however, the Denver II is regarded as more suitable for children with medically complex conditions [42], and as a valid tool, particularly in assessing the language and fine motor skills of children with neurodevelopmental risks [43].

Conclusion

This study demonstrated how a Western tool can be effectively adapted to a non-Western setting. With high inter-rater and test-retest reliability, the Denver II-Jimma quickly assesses the development of children under six, and is easy for first-line health workers and kindergarten teachers to use at home, at school or in health centers. The difference in ages of milestone achievement between the adapted tool and its originating Western tool shows that creating a local standard using the adapted tool is necessary for a valid interpretation of results. The study was conducted on children from diverse cultural, linguistic and ethnic communities. Hence, the results could be generalized to many other populations of Ethiopian children. However, some minor modifications may be needed in contexts which differ significantly from the present study setting. Future research should examine whether the tool can be used in other similar settings.

Abbreviations

FM, fine motor; GM, gross motor; LA, language; MUAC, mid-upper arm circumference; MUACZ, mid-upper arm circumference z score; PS, personal social; WAZ, weight-for-age z score.