Introduction

The benefits of physical activity (PA) have been well established [1]. In children 6–17 years of age, participation in PA is associated with improved cardiorespiratory and muscular fitness, bone health, cognitive functioning (e.g., executive function, memory and attention), and academic performance [1]. Children who participate in PA also have fewer cardiovascular risk factors and psychological impairments [1]. To gain the benefits associated with PA participation, current global recommendations suggest that children 5 to 17 years of age accumulate at least 60 min of moderate to vigorous PA daily [2]. However, in North America about a third of children 6–17 years of age do not meet current PA guidelines [3,4,5]. Furthermore, trends over time show that levels of PA in children have not historically changed and that participation in PA decreases in adolescence by as much as 60 to 75% from childhood [3, 6,7,8].

There are numerous social and environmental factors that can influence children’s participation in PA. In this study, we focus on the role of parents as a growing body of research highlights the role of parenting practices on children’s participation in PA [9, 10]. Parenting practices refer to the specific goal-directed strategies that parents use to influence their child’s behaviours, including their PA [11]. A recent review concluded that parental role modeling of PA and provision of logistic support for PA were more consistently associated with children’s level of PA across studies than other PA parenting practices (e.g., monitoring and encouraging PA) [10]. The evidence for role modeling and parental logistic support influencing children’s PA was stronger from qualitative studies (87.5 and 62.5%, respectively) than quantitative studies (23.3 and 53.3%, respectively) [10]. Other aspects of PA parenting practices, including PA monitoring and encouragement, are associated with children’s level of PA; however, the evidence remains unclear as few studies have focused on these constructs and in some cases the results are contradictory [10, 12]. For example, Arredondo et al. [13] found a positive association between PA monitoring and child PA in a sample of young children whereas Bradley et al. [14] found that 9–15-year-old boys with less parental monitoring but more encouragement had higher levels of PA. Comparing findings across studies or trying to assess the true impact of parenting practices on children’s PA behaviours remains challenging as there is considerable variation and little consistency in how PA parenting practices are operationalized [9, 12, 15].

A recent review identified more than 74 tools for measuring PA parenting practices of 5–12 year-old children [16]. These measures varied in their conceptual influences depending on the field in which they were developed (e.g., physical activity or psychology) and whether they were influenced by any theoretical perspectives [9]. There is a need to bring more rigour to PA parenting practice measures by integrating knowledge from the parenting literature in the operationalization of PA parenting practices constructs [9, 17]. Our team initiated a process to help standardize the measurement of PA parenting practices of 5–12 year-old children [16]. Specifically, concept mapping methodology was utilized to engage 24 experts from 6 countries in reconceptualising the constructs of PA parenting practices with the parenting literature as the underlying framework. This process identified 12 constructs covering three main domains of parenting namely structure, neglect/control, and autonomy promotion [16]. Following this process, an item bank of 100 PA parenting practice items covering 12 PA parenting concepts was developed.

The ultimate aim of this paper is to develop a repository of calibrated items to standardize the measurement of PA parenting practices among school-age children (5–12 year-old children). Using the NIH PROMIS initiative [18] procedures, the specific aims of this study were to: a) Assess the structural validity of the PA Parenting Practices (PAPP) item bank. Using Confirmatory Factor Analyses (CFA), we tested whether the PAPP item bank replicated the underlying factorial structure of the published PA conceptual framework, which includes 3 domains and 12 constructs to measure PAPP [16]. As many of the constructs within each parenting domain were expected to be highly correlated, we used advanced psychometric methodologies (confirmatory bi-factor item analyses with Item Response Modeling – IRM) to explore whether the hypothesized conceptual framework could be simplified to measure more general constructs; b) Determine whether the psychometric properties of the constructs or general constructs are invariant by parent sex, ethnicity of parent, and household income; and c) Determine the efficiency of the item bank, which involves assessing whether each construct or general construct can be assessed with fewer items.

This paper used both classical (CFA) and more advanced psychometric methods (confirmatory bi-factor item analyses) to develop more efficient ways of assessing PAPP – a methodological approach that has not been used in this field. Measuring the breadth of PAPP with fewer constructs and items is necessary to ensure uptake of these measures in practice.

Methods

Participants

Canadian parents of 5–12-year-old children were recruited from an internet research polling firm (InSight West, Canada) to complete the Physical Activity Parenting Practice (PAPP) questionnaire. Data were collected from November 2016 until January 2017. All participants previously consented to be part of the InSight West web-based panel. Participants were required to be a parent or guardian of a 5–12-year-old child. Participants were excluded if their child had a disability that limited their participation in any PA. A quota sampling approach was used to ensure representation by parent’s sex, ethnicity (Caucasian, East/Southeast Asian, South/West Asian and others), and income (using the 2015 median income of double and single income earners accordingly). Given the length of the PAPP questionnaire, parents completed the questionnaire in two waves about 2 weeks apart. Of the 945 panel members who completed the screener, 144 did not meet eligibility criteria, 158 dropped out, 16 opened the wave 1 questionnaire but did not respond to the parenting questions, and 626 completed the wave 1 questions (response rate = 66%). Of these, 479 completed the wave 2 questions (response rate = 51%). Demographic characteristics of the sample are shown in Table 1. Ethics approval for conducting this study was obtained from the first author’s Research Ethics Board before study commencement.

Table 1 Demographic characteristics of participants (N = 626)

Development of physical activity parenting practices (PAPP) item bank

The PAPP conceptual framework [16], developed from expert input by our team, guided the development of the PAPP item bank. Briefly, the PAPP conceptual framework identified 12 constructs that regrouped onto three main domains of parenting practices which have been widely used in developmental psychology [11, 19, 20] and recently adopted in the field of PA [9]. Specifically, the 12 constructs regroup into the following three domains of PAPP: a) neglect/control; b) autonomy promotion; and c) structure (for a full description see ref. [16]). The neglect/control domain of PA parenting practices includes two constructs that measure permissive, coercive, and pressuring parenting practices. By regrouping these two constructs, two parenting practice concepts believed to be ineffective are brought together. The coercive and pressuring components include criticizing, nagging, forcing, and punishing, while permissiveness is defined as not providing any parental guidance and allowing the child to make decisions about their PA. The autonomy promotion domain regroups four constructs (encouragement, guided choice, parental involvement in child PA, and praising/rewarding child) that assess how responsive parents are to their child’s PA needs. Encouragement includes verbal strategies that parents use to motivate child PA such as positive verbal reinforcement, reasoning, and highlighting behaviours of role models. Guided choice includes strategies that parents use to support their child’s independence and the strategies they use to involve their child in PA decisions such as choosing PA options and negotiating with the child. The parental involvement in child PA construct measures the extent to which parents are involved in their child’s PA/sport (e.g., talk about their sports, watch them participate, teach them skills to improve, and volunteer in their activities).
The praise/reward construct measures the strategies parents use to reinforce participation without coercing participation such as providing a small token of appreciation. Finally, the structure domain of parenting includes the strategies that parents employ to ensure their child participates in PA and regroups six constructs. These include a) co-participation - the extent to which parents are physically active with their child; b) expectations - whether parents have clear expectations for PA participation; c) facilitation - the tangible ways in which parents support PA participation such as enrolling, transporting to activities, taking children places to be active, and providing the financial support to be active; d) modeling - the extent to which parents are engaged in PA and model an active lifestyle; e) monitoring - tracking or being aware of their child’s level of PA; and f) restrictions for PA related to safety or academic concerns - the reasons why parents may limit involvement in PA.

Following the development of the PAPP conceptual framework, our team populated the item bank to cover the three domains and 12 constructs by selecting items from three different sources: 1) items from previous instruments (74 instruments and 608 items) which we identified from a review of the literature [16]; 2) items that had been developed to match the strategies parents reported using to encourage or discourage their child to be physically active, where 135 parents were asked 5 open-ended questions and the data were qualitatively analyzed [21]; and 3) new items developed for this purpose when few items were available from the first two sources or the content did not represent the breadth of the construct assessed. As many items were taken from various sources, all items and response formats were standardized. LCM and TMO took the lead in developing and standardizing the items and ensuring that the breadth of the content aligned with the operational definitions of each construct [16]. The content of the item bank (100 items) was iteratively reviewed by the larger team of investigators (MB, SOH, & TB). The items were then cognitively tested using both a think-aloud protocol and probing protocols with a total of 10 Canadian parents [22]. As the PAPP item bank included 100 items, the think-aloud protocol was primarily used to review newly developed items. This involved asking the parents to read each item aloud and: a) verbalize their understanding of the items; b) state the process they used to retrieve a relevant response to the item; and c) articulate how they went about selecting a given response.
As part of this process, parents were asked whether the response format was appropriate to determine whether certain items were best asked with a generic response format (using a 5-point response format ranging from never to very often) or a more specific response format (using a response format that quantified the number of times they used a given parenting practice ranging from never, 1–2 times per month, 3–4 times per month, etc. …). For the remaining items, parents were asked to complete 1 page of the survey at a time and asked to indicate whether some items were unclear, difficult to answer, or whether they could not answer a given item. Essentially, parents were asked to go through the think-aloud protocol on their own but to discuss with the interviewer any issues they encountered. As part of this process, the interviewer asked probing questions at the end of each page (e.g., whether they understood specific questions, struggled with mapping their response to the format used). This process continued until the full questionnaire was completed. This process was used to develop the items and prepare them for the psychometric evaluation. For the analyses, only the items that pertained to PA were analyzed (96 items) and the 4 items that pertained to screen time were dropped from the analyses.

Data collection

The participants completed the PAPP items in two waves as the length of the item bank made it impossible to complete in one administration; all administrations were completed online. At wave 1, the 626 web-based panel members completed the demographic questions and were administered 76 of the 96 PAPP items. The amount of missing data for these 76 PAPP items ranged from less than 1% to 5%. One to 2 weeks after completing the wave 1 questionnaire, the participants completed the remaining 20 PAPP items. A total of 479 (77%) participants completed the wave 2 questionnaire, and among those the amount of missing data for the 20 PAPP items ranged from less than 1% to 4%, except for one item (19% missing data). To determine whether the pattern of responses differed between those who completed only the wave 1 PAPP items (N = 626 − 479 = 147) and those who completed both waves of data collection (N = 479), chi-square tests and t-tests were conducted. The 76 PAPP items administered at wave 1 served to assess whether the pattern of responses was similar between wave 1 only respondents and respondents of both waves. For the 76 items, the pattern of responses was not significantly different between the two groups at the p < .01 level. In addition, the demographic characteristics (parental sex, age, ethnicity, education, and income as well as child sex and age) of the wave 1 only respondents and the respondents of both waves were not significantly different at the p < .01 level. This suggests that the data are likely missing at random. The results section provides the list of items that were administered and completed only by the wave 2 participants.

Structural validity of the PAPP item bank (aim 1)

Confirmatory factor analyses and confirmatory bi-factor item analyses

Initial Confirmatory Factor Analyses (CFAs) assessed whether the 12 PAPP constructs were structurally supported. As the CFAs were followed by confirmatory bi-factor item analyses, CFAs were conducted separately for each domain. This was done as we assumed that correlated constructs within each domain could be simplified into more general construct(s) that would be conceptually meaningful. Analyses tested the following hypotheses: a) the neglect/control domain included two constructs that measured permissive and pressuring parenting practices; b) the autonomy promotion domain included four constructs that measured encouragement, guided choice, parental involvement in child physical activities, and praising/rewarding child for being active; and c) the structure domain included six constructs that measured co-participation, expectations, facilitation, modeling, monitoring, and restrictions for PA related to safety [16]. Note that the original conceptual framework for restrictions also included restrictions for academic concerns. Restriction for academic concerns was not operationalized as the questions overlapped with the facilitation questions about enrollment, and so we opted not to assess the reasons for not enrolling in PA.

As many of the constructs within the PAPP domains were significantly correlated (r ≥ .70), a confirmatory bi-factor item analysis followed these initial CFAs to assess whether the constructs for a given domain of parenting practices can be collapsed into a simpler structure. Fig. 1 provides a schematic of these analyses. After a suitable bi-factor structure was identified, the analyses proceeded to select items that measured general constructs of PAPP but conformed to an “essentially unidimensional” structure which confirms that items measure a construct that is on the same continuum.

Fig. 1 Analytical steps to regroup correlated constructs into general constructs

Mplus version 8 was used for the CFA and confirmatory bi-factor item analyses [23]. To deal with the ordinal nature of the data and missing data, the weighted least squares means and variance-adjusted (WLSMV) estimation was used with a full information procedure (the approach used to deal with the missing data) [24]. As there are no agreed standards to assess model fit with CFA or the confirmatory bi-factor analyses, a number of indices were reviewed to assess overall model fit: Steiger’s Root Mean Square Error of Approximation (RMSEA), with an upper value of 0.08 indicative of a reasonable fit, although some suggest a cut-off of 0.10 might be indicative of reasonable fit with more complex models; the Comparative Fit Index (CFI) with values ≥0.95 suggestive of a good fit; and the weighted root mean square residual (WRMR) with a value closer to 1 indicative of a good fit [25,26,27,28]. As part of the confirmatory bi-factor item analyses, the I-ECV (Explained Common Variance for a Single Item) [29] index was computed to identify items that were strongly related to the general construct. The I-ECV was computed as the ratio of the item’s squared loading on the general construct to the sum of its squared loadings on the general construct and on the specific construct the item measured. Items that had an I-ECV ≥ .70 and/or a factor loading on the general construct approaching .50 were retained for the IRM analyses [29]. Items with lower I-ECV or factor loadings were retained only if it made conceptual sense, but this strategy was seldom used.
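
The I-ECV rule described above can be illustrated with a minimal sketch. The function names and the example loadings are hypothetical, and the loading cut-off of .50 is an assumption standing in for "approaching .50"; this is not the code used in the study.

```python
def item_ecv(general_loading: float, specific_loading: float) -> float:
    """I-ECV: squared general loading over the sum of squared
    general and specific loadings for one item."""
    g2 = general_loading ** 2
    s2 = specific_loading ** 2
    if g2 + s2 == 0:
        raise ValueError("both loadings are zero; I-ECV is undefined")
    return g2 / (g2 + s2)


def retain_item(general_loading: float, specific_loading: float,
                ecv_cut: float = 0.70, loading_cut: float = 0.50) -> bool:
    """Retention rule: I-ECV >= ecv_cut and/or general loading >= loading_cut.
    loading_cut = .50 is an assumed reading of 'approaching .50'."""
    ecv = item_ecv(general_loading, specific_loading)
    return ecv >= ecv_cut or abs(general_loading) >= loading_cut


# Hypothetical item loading .80 on the general and .40 on the specific factor
print(round(item_ecv(0.80, 0.40), 2))
```

With these illustrative loadings, the general factor accounts for most of the item's common variance, so the item would be a candidate for the IRM step.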

Item response modeling (IRM) analyses

After initial candidate items were selected, unidimensional IRM models were fitted for each construct identified using the mirt package in R [30]. A two-parameter graded response model was fitted to the data using the Expectation Maximization algorithm with missing data imputed. To determine whether local dependence among the items was sufficiently addressed, the residual correlations were evaluated. Any residual correlations greater than .25 were examined and items were further deleted to ensure that the analysis satisfied the IRM assumption of local independence. Overall model fit was assessed with the M2 chi-square statistic and the RMSEA, CFI, and SRMR using the criteria above and a value between .05 and .08 for the SRMR as indicative of good fit [31]. These analyses served to reassess the unidimensionality of the constructs. Note that the fit indices were available when the construct had 8 or more items, but local dependence could be assessed with fewer items. Most importantly, these analyses generated the item parameters (i.e., discrimination and difficulty parameters) and item information, which are needed to assess the efficiency of the item bank (aim 3) and identify the most informative items.
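
To make the role of the discrimination and difficulty parameters concrete, the following is a minimal sketch of the category probabilities under a two-parameter graded response model. It assumes the common discrimination/threshold form of the model; it is not the mirt implementation, and the function name and parameter values are hypothetical.

```python
import math


def grm_category_probs(theta: float, a: float, thresholds: list[float]) -> list[float]:
    """Category probabilities for one ordinal item.

    theta      : latent trait level of the respondent
    a          : item discrimination
    thresholds : ascending difficulty parameters b_1 < ... < b_{K-1}
    """
    if sorted(thresholds) != list(thresholds):
        raise ValueError("thresholds must be in ascending order")
    # Cumulative probabilities P(X >= k), padded with 1 (lowest) and 0 (beyond highest)
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Probability of each category is the difference of adjacent cumulative curves
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


# Hypothetical 4-category item evaluated at the mean of the latent trait
probs = grm_category_probs(theta=0.0, a=1.0, thresholds=[-1.0, 0.0, 1.0])
print([round(p, 3) for p in probs])
```

A larger discrimination value makes the cumulative curves steeper, which concentrates the item's information around its thresholds.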

Invariance properties of the item bank (aim 2)

To ensure the constructs measured by the PAPP item bank can be used to make valid group comparisons, differential item functioning (DIF) and differential response functioning (DRF) analyses were conducted using the mirt package in R [32]. The aim of these analyses was to assess whether sub-groups of parents who have similar scores on a given construct had similar patterns of responses. Similar response patterns indicate that the scores can be interpreted in the same way across groups. Potentially invariant items were first assessed with the DIF analyses, followed by the DRF analyses. Invariance was tested only for constructs that had more than three items, as DRF cannot be assessed with fewer items. DRF tests for two types of response bias: 1) signed-DRF, which tests whether bias is consistent across all scores in a given construct; and 2) unsigned-DRF, which tests whether there is an interaction in the bias across scores on a given construct. Items retained from the structural validity analyses were assessed for DRF by parents’ sex, income, and ethnicity. Items identified to have significant DRF (p < .01) between groups were removed from the item bank.
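
The distinction between signed and unsigned DRF can be illustrated with a small numerical sketch. It rests on several assumptions that are not taken from the study: expected-score curves from a two-parameter logistic item, a standard normal weighting of the latent trait, a reference-minus-focal sign convention, and hypothetical function names and parameter values. It does not reproduce the procedure in [32].

```python
import math


def expected_score(theta: float, a: float, b: float) -> float:
    """Expected score of a dichotomous two-parameter logistic item (illustrative)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def drf(ref_curve, foc_curve, lo: float = -6.0, hi: float = 6.0, n: int = 241):
    """Return (signed, unsigned) weighted average differences between two
    expected-score curves over a grid of latent trait values."""
    step = (hi - lo) / (n - 1)
    grid = [lo + i * step for i in range(n)]
    weights = [math.exp(-0.5 * t * t) for t in grid]  # unnormalized N(0, 1) density
    total = sum(weights)
    diffs = [ref_curve(t) - foc_curve(t) for t in grid]
    signed = sum(w * d for w, d in zip(weights, diffs)) / total
    unsigned = sum(w * abs(d) for w, d in zip(weights, diffs)) / total
    return signed, unsigned


# Uniform shift: the groups differ in the same direction at every trait level
print(drf(lambda t: expected_score(t, 1.0, 0.0), lambda t: expected_score(t, 1.0, 0.5)))
# Crossing curves: the differences cancel in the signed index but not the unsigned one
print(drf(lambda t: expected_score(t, 1.0, 0.0), lambda t: expected_score(t, 2.0, 0.0)))
```

In the crossing case the signed index is close to zero while the unsigned index is not, which is the interaction pattern that the unsigned test is meant to detect.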

Efficiency of the PAPP item bank and informative items (aim 3)

While the primary purpose of our work was to develop an item bank of PAPP, shorter measures may be needed for some applications. Therefore, for any constructs that had a reliability >.80, the FIRESTAR software was used to help determine how to shorten the scale [33]. Briefly, FIRESTAR uses Computerized Adaptive Testing (CAT) simulation to assess how many items, and which items, should be retained to preserve a reliability ≥ .80 for a given scale. The CAT simulations used the graded response model with the maximum posterior weighted information to select items. The maximum standard error for the estimate was set at .447, which corresponds to a reliability of .80. Inputs for these analyses were the item parameters estimated from the IRM analyses.
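
The link between the standard error cut-off and the target reliability can be sketched as follows, assuming a latent trait scaled to unit variance so that reliability = 1 − SE². The second function is a deliberately simplified, non-adaptive stand-in for a CAT stopping rule: it assumes item information values evaluated at a single trait level and a prior precision of 1, both of which are illustrative assumptions rather than the FIRESTAR algorithm.

```python
import math


def se_for_reliability(reliability: float) -> float:
    """Standard error implied by a target reliability (unit-variance trait assumed)."""
    return math.sqrt(1.0 - reliability)


def items_needed(item_information: list[float], max_se: float,
                 prior_precision: float = 1.0):
    """Count how many items, taken from most to least informative, are needed
    before the approximate standard error drops to max_se or below.
    Returns None if the cut-off is never reached."""
    precision = prior_precision
    for count, info in enumerate(sorted(item_information, reverse=True), start=1):
        precision += info
        if 1.0 / math.sqrt(precision) <= max_se:
            return count
    return None


cutoff = se_for_reliability(0.80)
print(round(cutoff, 3))
# Hypothetical information values for a four-item construct
print(items_needed([2.0, 1.5, 1.0, 0.5], cutoff))
```

In a real adaptive simulation the information of each item depends on the respondent's current trait estimate, so the number of items administered varies across simulated respondents.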

Reliability of constructs

Cronbach’s alpha was used to assess the reliability of the responses following the CFAs. In addition, the IRM empirical reliability was reported; this reliability is similar to Cronbach’s alpha except that it takes into account the ordinal nature of the responses (Likert-type scales).
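
For reference, Cronbach’s alpha can be computed from a respondents-by-items matrix as in the minimal sketch below. It assumes complete data (no missing responses) and uses hypothetical function and variable names; it is not the code used in the study.

```python
from statistics import variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items matrix with no missing data."""
    n_items = len(responses[0])
    if n_items < 2:
        raise ValueError("alpha requires at least two items")
    if any(len(row) != n_items for row in responses):
        raise ValueError("every respondent must answer every item")
    item_columns = list(zip(*responses))
    sum_item_variances = sum(variance(col) for col in item_columns)
    total_variance = variance([sum(row) for row in responses])
    return (n_items / (n_items - 1)) * (1.0 - sum_item_variances / total_variance)


# Hypothetical responses from three parents to two items
print(round(cronbach_alpha([[1, 2], [2, 1], [3, 3]]), 3))
```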

Results

Neglect/control domain of PAPP

Structural validity

As shown in Table 2, the hypothesized 2-factor structure for the neglect/control PAPP domain was supported by the initial CFA analyses. Given that the initial CFA found a high correlation between the permissive and pressure constructs (r = .85), it was not surprising that the results of the bi-factor item analysis found that all items loaded highly on the general dimension (coercive control), except for item 1 “Allow child to stay inside” (as the I-ECV for this item was less than .50 it was dropped) (see Tables 2 and 3). The IRM analyses further supported the unidimensionality of the 19 items as the model had an adequate fit and no local dependence was observed.

Table 2 Overview of the Confirmatory Factor Analyses (CFA), Bi-Factor item analyses (bi-factor), and Item Response Modeling analyses (IRM) for the Physical Activity (PA) parenting practices item bank
Table 3 Results from the Confirmatory Factor Analyses (CFA), bi-factor item analyses, and Item Response Modeling (IRM) analyses for the Physical Activity (PA) parenting practices item bank

Invariance properties

DIF and DRF were assessed on the remaining 19 items and none of the items exhibited any significant DIF or DRF by parents’ sex, income, or ethnicity.

Efficiency

In total, the coercive control construct includes 19 items and the overall IRM empirical reliability of the bank is .95. From the CAT simulations, it was estimated that 7 items are needed to maintain the reliability of the total scores at .80 with the correlation between the short and long form equal to .95. Table 4 shows the items retained in the short form.

Table 4 Physical activity (PA) parenting practices item bank – full list of items by domain and list of items included in the short form

Structure domain of PAPP

Structural validity

As shown in Table 2, the hypothesized 6-factor CFA for the structure domain did not fit the data. In the revised solution, seven items that had highly correlated errors or did not load as hypothesised were deleted, and the following changes were made: a) the co-participation, modeling, and monitoring constructs were combined into a new construct labelled nondirective support; b) the expectations items and some items from the facilitation construct were regrouped into supportive expectations; c) some of the remaining items from the facilitation construct remained on that construct; and d) the restriction construct was split into two constructs, namely restricting inside PA and allowing unsupervised outside PA. As shown in Table 2, this revised 5-factor solution had an adequate fit. As none of the constructs were highly correlated (all r’s were less than .70) (see Table 3), the analyses proceeded directly to the IRM. From the IRM analyses, three items were deleted as they had high local dependence, which suggested that their content was redundant with other items.

Invariance properties

All items for the structure constructs were invariant by parents’ sex, income, and ethnicity.

Efficiency

Efficiency was assessed for two constructs, namely nondirective support and supportive expectations, as the IRM empirical reliability was greater than .80 and these constructs included more than five items. From the CAT simulations, it was estimated that the reliability of the total score can remain at .80 if three items are retained for the nondirective support construct and three items are retained for the supportive expectations construct (see Table 4 for the list of items). The correlations between the long and short forms were .90 and .76 for the nondirective support and supportive expectations constructs, respectively.

Autonomy promotion domain of PAPP

Structural validity

As shown in Table 2, the hypothesized 4-factor solution for the autonomy promotion domain of PAPP did not fit the data. However, a 4-factor solution that eliminated 12 items and moved the praise items to the involvement and encouragement constructs, with the reward items forming a reward construct, had an adequate overall fit. Correlations among constructs ranged from .16 to .75. As the involvement and encouragement constructs were highly correlated (.75), a confirmatory bi-factor item analysis was run. The overall fit of the confirmatory bi-factor solution was acceptable and supported combining the encouragement and involvement constructs to measure a more general construct renamed autonomy support (see the solution in Table 3). The IRM analyses further assessed the unidimensionality of the resulting three constructs that assessed autonomy promotion. This process led to the deletion of more items that were redundant as they showed some local dependence with other items on the constructs.

Invariance properties

The DRF analyses found all items were invariant by parents’ sex, income, and ethnicity.

Efficiency

Efficiency was assessed for two constructs, namely autonomy support and guided choice, as both had IRM empirical reliability greater than .80 and included 5 or more items. From the CAT simulation, the reliability of the scores can be maintained at .80 with two items for the autonomy support construct and with 4 items for the guided choice construct (as indicated in Table 4). The correlation between the short and long forms was .86 and .95 for the autonomy support and guided choice constructs, respectively.

Overview of results and alignment with the conceptual framework

Figure 2 shows the extent to which the findings align with the expert-informed conceptual framework [16]. The CFA results directly aligned with the framework for the neglect/control domain of parenting, were quite similar for the autonomy promotion domain of parenting, and resulted in some changes for the structure domain of parenting. As the analyses progressed to identify whether some constructs could be collapsed to measure more general constructs, the results began to diverge from the guiding conceptual framework (see how the bi-factor and IRM results modified the structure). Fig. 2 also presents the operational definitions of the PAPP item bank constructs.

Fig. 2 Expert informed Physical Activity (PA) conceptual framework and its alignment with the results

Discussion

This is the first study that utilizes a conceptual framework developed by an international panel of experts to achieve parsimony and gain efficiency in measuring PA parenting practices of 5–12 year-old children. The advanced psychometric analyses were anchored to the expert panel conceptual model to inform the structural validity of scores derived from the PAPP item bank. The underlying conceptual framework of the PAPP item bank was mainly supported by the CFAs. Some changes were made to the underlying structure based on the CFAs including: a) combining smaller constructs into one general construct (for example, combining the modeling, co-participation, and monitoring constructs of the structure domain into a general construct assessing nondirective support); or b) splitting a construct into two smaller constructs (for example, splitting the restriction for safety reasons construct into a construct that assessed indoor PA restriction and another one that assessed allowance for outside PA). While the CFAs supported the structural validity of 11 constructs, the bi-factor item analyses and IRM analyses supported collapsing correlated constructs into more general constructs. These analyses further reduced the number of constructs measured by the PAPP item bank to nine constructs with internal consistency ranging from .79 to .94 and a total of 65 items. As seven of the PAPP constructs have reliability greater than .80, CAT simulations could further reduce the number of items for those constructs while maintaining high reliability. The CAT simulations found that the final nine constructs can be efficiently assessed with as few as 31 items (see Table 4; a full list of items is provided in Appendix A). Overall, the PAPP item bank has excellent psychometric properties and provides an efficient way to assess PA parenting practices.

The PAPP item bank addresses an identified need to align PA parenting practice measures with the parenting literature as a way to bring more rigour and consistency in operationalizing the PA parenting practices constructs [9, 17]. To achieve this, the conceptual framework which served as the foundation for the construct validity analyses: a) integrated knowledge of the parenting literature to provide assessment of the main domains of parenting (i.e., neglect/control, structure, and autonomy promotion) [16], b) consolidated knowledge from existing research, and c) incorporated inductive inquiry of parents to ensure the PAPP constructs included strategies used by parents [21]. The expert-informed conceptual framework provided a structure which we expected to validate with the CFAs. For the neglect/control and autonomy promotion domains of parenting, there was strong alignment between the CFAs and the expert-informed conceptual framework. In the original conceptual framework, praise and rewards were combined as one construct, but the CFAs supported combining the praise items with encouragement, which makes sense as praise is a form of encouragement. For the structure domain of parenting, there were more differences between the CFAs and the expert-informed conceptual framework than anticipated. The co-participation, modeling, and monitoring constructs were so highly correlated that they ended up being combined through the CFA analyses and formed a more general construct. The CFAs did not support analyzing these three constructs separately. It makes sense that modeling and co-participation would be highly correlated as co-participation is likely higher among those who are highly active parents. Also, it is likely that active parents who value PA tend to be more aware of their child’s level of PA and monitor whether they should engage in PA with their child.
Combining these separate constructs into a general construct that measures non-directive support is a departure from what has been previously done but was needed to account for the overlap in these concepts. Finally, the CFAs found that the content for the restriction for safety construct split into two constructs. Conceptually, one construct regrouped items that focused on unsupervised outdoor activities and the other regrouped items that focused on injuries associated with playing roughly indoors. Overall, the skeleton of the expert-informed conceptual framework can be mapped onto the CFAs: when a new construct emerged, it either combined elements of the original conceptual framework without losing any content or added nuance by splitting a construct into two.

Interestingly, the two constructs within the neglect/control domain of parenting, namely the permissive and pressure constructs, combined into one global construct. At the time the conceptual framework was developed, there were extensive discussions as to whether permissive should be listed under the control domain of parenting, and when we decided to do so we renamed this domain of parenting neglect/control. In the conceptual framework, pressure was defined as including the “coercive” component of control based on Baumrind’s definition and Grolnick and Pomerantz’s operationalization of control, and permissive was defined as “lack of willingness to act as a socializing agent”, which aligns with Darling and Steinberg’s operationalization of control [11, 19, 34]. While permissive and pressure have been considered distinct constructs in the PA literature, there is some support from the parenting literature for aligning permissive and pressure on a control continuum [11]. In fact, the bi-factor and IRM results suggest that, as operationalized in the PAPP item bank, the permissive and pressure constructs are on the same continuum and measure one dimension. This represents an important modification but one that aligns with the existing parenting literature.

Measurement of general constructs has not typically been done in the PA parenting field but has occurred in fields that have integrated advanced psychometric methods in instrument development, such as outcomes research [18, 35]. This process is more easily understood when the collapsing of constructs arises from the CFAs, meaning the structure needs to be simplified for the solution to fit the data. As discussed above, this was the case for the non-directive support construct, as the CFA regrouped three constructs that have generally been assessed independently (co-participation, modeling, and monitoring). The bi-factor and IRM analyses further supported collapsing some of the constructs found in the CFAs into more general constructs. While some researchers may opt to use the results of the CFAs, it is worth highlighting that the collapsing into more general constructs occurred because these constructs were found to be highly correlated in the CFAs. In addition, the bi-factor and IRM analyses tested whether these constructs are unidimensional, that is, whether their items measure a single continuum. The combining of constructs into more general constructs represents a departure from the expert-informed conceptual framework. However, it is important to note that the process used to develop the conceptual framework did not address how the PAPP constructs may be combined to measure more general constructs. Therefore, the conceptual framework was primarily used to guide the CFAs, and the analyses beyond the CFAs served to expand our conceptual understanding and move our thinking toward assessing more general PAPP constructs. This represents a departure for the PA literature but an important one to move the field of PA parenting forward. As the process to identify general constructs was analytically informed, further studies should cross-validate the stability of these newly identified constructs.

It is also worth noting that the analysis prioritized collapsing constructs into general constructs within each main domain of parenting to ensure the analyses aligned with the parenting literature and the conceptual framework that guided this work. The correlations among the final constructs ranged in absolute value from .01 to .66 and, as such, did not meet the rule for proceeding with a further bi-factor analysis, which requires sufficiently high correlations (>.70). This further highlights that the analytical process reduced the conceptual framework to the smallest number of constructs.
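
The screening rule described above can be illustrated with a minimal sketch. It assumes the rule is applied pairwise to the absolute correlations between constructs; the function name, the construct labels, and the correlation values are hypothetical and are not taken from the study data.

```python
from itertools import combinations


def pairs_above_threshold(names, corr, threshold=0.70):
    """Return construct pairs whose absolute correlation exceeds the threshold."""
    flagged = []
    for i, j in combinations(range(len(names)), 2):
        if abs(corr[i][j]) > threshold:
            flagged.append((names[i], names[j], corr[i][j]))
    return flagged


# Hypothetical correlation matrix for three illustrative constructs
names = ["construct_a", "construct_b", "construct_c"]
corr = [
    [1.00, 0.66, -0.01],
    [0.66, 1.00, 0.30],
    [-0.01, 0.30, 1.00],
]

# No pair exceeds .70, so no further bi-factor analysis would be triggered
candidates = pairs_above_threshold(names, corr)
print(candidates)
```

In this illustration no pair is flagged, which mirrors the situation reported above, where the largest absolute correlation (.66) fell below the threshold.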

To optimize the use of the calibrated item bank, the next steps are to take advantage of the features associated with having a calibrated item bank [35]. These include: a) allowing researchers to either select or supplement the constructs with items that are relevant for their study while ensuring that their results can be compared with other studies; b) administering the item bank with CAT, as this process would optimize the efficiency of the administration and the precision of the total scores; and c) using simulated CAT to develop short forms [35]. The first two features will need to be addressed in continued extension of this program of research. The last feature, in contrast, was addressed in this paper through the development of efficient short forms. The CAT simulations identified that the nine PAPP constructs can be efficiently assessed with 31 items from the existing item pool of 65 items, highlighting the utility of using advanced psychometric methods to develop measures of PAPP that decrease participant burden. In addition, future extension of this program of research needs to further assess the construct validity of the PAPP item bank to determine whether the PAPP constructs predict children’s PA behaviours. This can help identify which constructs are important to measure in future studies.

It is important to highlight that while short forms have some advantages, such as decreased participant burden, they should be used cautiously. The short forms do not capture the full breadth of the constructs, and their scores should be interpreted accordingly. As the short forms are highly correlated with the long forms (correlations ranging from .76 to .95), they are expected to yield very similar rankings to the long forms and as such can provide good indicators of parenting practices for the overall constructs. The short forms are likely better suited for correlational studies than for intervention studies, where capturing the full breadth of the content is important. In addition, in the intervention context it is important to use scales that are sensitive to change, and this may be less likely with the short forms.
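
A short-form versus long-form comparison of this kind can be sketched as follows. This is an illustration only: it assumes simple summed scores rather than the model-based scores produced by the calibrated item bank, and the responses and the choice of short-form items are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical responses: rows are parents, columns are items of one construct
responses = [
    [1, 2, 1, 2, 1, 1],
    [3, 3, 4, 3, 2, 3],
    [5, 4, 5, 5, 4, 5],
    [2, 3, 2, 1, 3, 2],
    [4, 4, 3, 4, 5, 4],
]
short_items = [0, 2, 5]  # hypothetical item positions retained in the short form

long_scores = [sum(row) for row in responses]
short_scores = [sum(row[i] for i in short_items) for row in responses]

r = pearson(long_scores, short_scores)
print(round(r, 2))
```

Because the short-form items are a subset of the long-form items, a correlation computed this way is expected to be high by construction, which is one more reason to interpret it as an indicator of similar rankings rather than of equivalent content coverage.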

The psychometric validation must be interpreted in light of the limitations of this study. First, the sample used for the item calibration was recruited from an English-speaking web-based panel company in Canada, and non-English-speaking parents were excluded as a study requirement. Second, as the calibration was conducted in a Canadian sample, the extent to which the properties found in this study are stable across other populations remains unknown. Third, while invariance of item parameters was examined across ethnicity, these analyses were limited to comparing Caucasians versus all other ethnicities. Fourth, our sample size precluded us from splitting our sample into two samples so that the findings could be cross-validated. Therefore, future studies will need to further assess the psychometric properties of the item bank in larger, more varied samples to obtain measures that are stable across various ethnic groups, with a particular focus on cross-validating the structure of the general constructs.

In conclusion, the psychometric validation supported a simplified version of the PAPP conceptual framework for 5–12 year-old children, which was initially endorsed by 24 experts from 6 countries. The PAPP item bank resulted in nine constructs that assessed three main domains of PA parenting practices, namely neglect/control, structure, and autonomy promotion. The CAT simulations demonstrated the potential of using calibrated item banking to develop short forms that have adequate reliability and resulted in the development of shorter measures to assess these nine constructs (31 items instead of 65 items).