1 Introduction

The concept of Active citizenship (AC) is related to civic engagement and building social capital and is defined as participation in civil society, community and political life, characterized by mutual respect and non-violence and in accordance with human rights and democracy [1].

It deals with issues relating to rights and responsibilities. The form and content of AC vary with social values and span cultural, political and environmental activities at local, regional, national and even international levels. Because AC is characterized by mutual respect and non-violence, it excludes participation in extremist groups promoting intolerance and violence.

In a democratic system, the quality of citizenship depends on the behavior of community members in terms of active participation in politics, society and community life, along with values. Improving the economic, social and political effectiveness of a country depends heavily on the active participation of its citizens, although the concept of good government differs across countries. Attempts have been made to assess the quality of citizenship and the degree of active participation [2] using different numbers of indicators and associated dimensions. Factors of AC at the individual level and at the national level were identified using a multilevel model [3], which found higher AC in countries with high GDP, equal distribution of income and a heterogeneous religious climate.

Measurement of a Participation Index (PI) means choosing a real-valued function f on the n-dimensional space corresponding to n indicators. The quality of PI depends on the properties of this function and on the measurement procedures adopted for the indicators. Setting aside the selection of dimensions and items, the paper gives two methods of aggregation, including conversion of discrete indicator scores to continuous scores following a normal distribution, so that PI can be used meaningfully and satisfies desired properties, including computation of PI for a group of countries. Problems in the construction of PI and remedial actions are addressed with the work of [4] as an example.

2 Construction of participation index

2.1 Framework on the concept of citizen participation

There is no agreement on an operational definition of a multidimensional participation index (PI). The Active citizenship composite index (ACCI) considered participation in four dimensions, viz. Political Life, Civil Society, Community Life and Values, with different numbers of sub-dimensions and indicators under each dimension [5]. A direct impact of participation in the plan-making process was found [6] on inclusion outcomes (reducing cultural barriers, mitigating poverty, increasing economic opportunities and promoting good governance), where the degree of relationship varied across indicators of social inclusion. Ref. [7] observed that the theoretical framework of active ageing failed to identify the relevant contributing factors and barriers and needs improved conceptual and analytical clarity. Moreover, empirical investigations of active ageing depend on health, education, good finances, etc. Thus, evaluation of active ageing for individuals may be an elusive goal for a large segment of older adults.

The European Charter on the Participation of Young People in Local and Regional Life (1992 and revised in 2003) is an international policy document approved by the Congress of Local and Regional Authorities of the Council of Europe (www.salto-youth.net/downloads/4-17-1510/Revised%20European%20Charter%20on%20the%20Participation%20of%20YP.pdf). The charter outlines the following 14 areas for involvement of young people:

  1. Sport, leisure and associative life

  2. Work and employment

  3. Housing and transport

  4. Education and training

  5. Mobility and intercultural exchanges

  6. Health

  7. Equality for women and men

  8. Young people in rural areas

  9. Access to culture

  10. Sustainable development and environment

  11. Violence and crime

  12. Anti-discrimination

  13. Love and sexuality

  14. Access to rights and law.

Similarly, the National Report on Youth Policy in Norway [8] summarizes a number of critical changes that affect the lives of young people and overlaps significantly with the Council of Europe charter.

Socio-economic inequalities may restrict participation in society and even undermine the responsiveness and representativeness of people with disadvantaged backgrounds [9, 10]. Other factors that influence participation positively are increases in age, family income, education, etc. [11]. However, methodological issues like scaling, selection of weights and aggregation of indicators limit meaningful inter-country/inter-regional/inter-sample comparisons, classification, assessment of progress, etc. Statistical testing of equality of mean participation across time and space can be done if the assumptions of the techniques, like normal distribution of the variables, are satisfied. The effect of active participation on wellbeing and quality of life has been addressed by [12,13,14].

2.2 Data

To operationalize the concept, 63 basic indicators distributed over the following four dimensions and a number of sub-dimensions were considered. Details are given in Table 1.

Table 1 Dimensions and corresponding indicators

It may be observed that indicators like money donated are related to per capita GDP, which is an indicator of the Human Development Index (HDI). Similarly, Women Participation in national parliament overlaps with the Global Gender Gap Index (GGGI); Democracy may coincide with trust and perception of inequality, which are components of the Social Cohesion Index (SCI). Thus, PI is likely to have positive correlations with HDI, GGGI, and SCI.

2.3 Natures of data on selected indicators

  • Membership of Political parties, Human rights organisations, Trade union organisations, and other organisations is in ratio scale, usually given in percentages. However, addition of X% = \(\frac{{N}_{X}}{{D}_{X}}\times 100\) and Y% = \(\frac{{N}_{Y}}{{D}_{Y}}\times 100\) is not meaningful when \({D}_{X}\ne K{D}_{Y}\). For example, \(\frac{8}{24}=33.33\%\) and \(\frac{5}{10}=50\%\). The combined percentage is \(\frac{13}{34}=38.24\%\), not the average of \(50\%\) and \(33.33\%\), i.e. \(41.67\%\).

  • Participation could be at different levels like very active, active, dormant, etc. Information about participation obtained by surveys using 3-point or 5-point items is ordinal and needs an appropriate method of aggregation, since the latent distance between “very active” and “active” is not the same as that between “active” and “dormant”. Thus, ordinal scores from 3-point or 5-point items are not equidistant and hence addition is not meaningful. A comparison such as \(\overline{X}>\overline{Y}\) or \(\overline{X}<\overline{Y}\) is meaningless for ordinal scales [15].

  • Data on money donated (in ratio scale) may not be reliable.

  • Voting Turnout in Parliament, Women Participation in national parliament are secondary data in percentages.

  • Working in an organisation or association is a binary indicator of Yes–No type. Signing a petition, taking part in lawful demonstrations, boycotting products, contacting a politician, etc., say in the last 12 months, are recorded as counts. However, concepts like lawful demonstrations and ethical consumption vary and need to be defined without ambiguity.

  • Non-organised help in the community may be understood differently by different subjects. For better understanding of community organizing, [16] found it useful to discuss what organizing is not. Functions of community organizers are different from those of community developers, social service professionals, lawyers, or others in fields of community engagement.

  • Indicators of the Values dimension need to be defined since they are subjective and also sensitive. Questionnaires for such indicators need to be developed carefully, avoiding leading items/questions.

  • Correlation between a pair of indicators may vary. Very high correlation between two indicators gives rise to multicollinearity and needs to be avoided.
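The point about adding percentages in the first bullet can be checked with a few lines of arithmetic. The following is a minimal sketch using the two fractions quoted above (8/24 and 5/10); the function names are illustrative and not part of the cited work.

```python
def pooled_percentage(parts):
    """Percentage obtained by pooling numerators and denominators.

    parts: list of (numerator, denominator) pairs.
    """
    total_num = sum(n for n, _ in parts)
    total_den = sum(d for _, d in parts)
    return 100 * total_num / total_den


def mean_of_percentages(parts):
    """Simple average of the individual percentages (not meaningful here)."""
    return sum(100 * n / d for n, d in parts) / len(parts)


groups = [(8, 24), (5, 10)]            # the two groups used in the text
pooled = pooled_percentage(groups)     # 13/34 as a percentage
naive = mean_of_percentages(groups)    # average of 33.33% and 50%

print(round(pooled, 2), round(naive, 2))
```

The two figures differ because the denominators are unequal; they would coincide only when both groups have the same denominator.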

2.4 Observations

The selected indicators may not be exhaustive since active citizenship is an evolving concept. IT-related interactions and participation need to be included. Increasing use of digital technology like the internet, civic/political websites, etc., offering diverse forms of interactive participation and information sharing, may reflect the relationship between citizen and state, and social networks could also influence citizens’ willingness to participate in social governance. Ref. [17] used online and offline surveys covering 15 provinces of China, using questionnaires consisting of 1037 items including 773 valid items from the online survey and 250 items from the offline survey. Here, the dependent variable, “the citizens’ willingness to participate in social governance”, was assessed by 4-point items with options like “not willing at all,” “willing to participate in all the available activities,” “willing to participate in the activities of easy access,” and “willing to participate in the activities of easy access, with self-interests involved”. Independent variables were social capital in three parts, viz. social networks, social trust, and social norms, and each was assessed by 4-point items like “how many neighbors do you greet frequently in your community” and “how many neighbors are considered to be your friends in your community”. To assess “social trust” and “social norms”, Likert scales were used, generating ordinal scores.

A scale to assess civic competence is multidimensional because it covers different dimensions. Principal Component Analysis (PCA) or Factor Analysis (FA) of indicators or dimensions may result in several independent factors. Thus, finding total scores of a multidimensional scale may not be appropriate. For example, the manual of the 36-Item Short Form Health Survey questionnaire (SF-36) provides no support for calculating a total/overall score of SF-36 for an individual or a country, because several independent factors are measured by the scale (http://www.webcitation.org/6cfeefPkf).

2.5 Arithmetic aggregation

Addition of variables in ratio scale and variables in ordinal scale is problematic and difficult to interpret. Even if each of X and Y is in ratio scale, \(X+Y=Z\) is meaningful if X and Y follow similar distributions (possibly with different parameters) and the probability density function (pdf) of Z can be derived as the convolution of the distributions of X and Y.

Summative scores of a scale, taken as the sum of item scores, assign equal importance to the items and dimensions despite different item-total or dimension-total correlations and different factor loadings for dimensions and items as observed from PCA. Summative scores also suffer from the substitution effect (a poor score in one dimension can be compensated by higher scores in other dimensions) and may give misleading results [18, 19].

A better approach could be to transform data on K-point items (K = 2, 3, 4, 5, …) to equidistant scores, which can be further transformed to follow a normal distribution with the same range of item scores [20]. Here, the sum of normally distributed item scores will also follow a normal distribution, facilitating estimation of the parameters from the data.

In addition, summative scores may result in a number of tied scores and reduce the discriminating power of the scale. The quality of a scale in terms of psychometric properties like reliability, validity and discriminating value needs to be reported.

2.6 Scaling

Raw scores of the indicators in different units and in different score ranges are usually scaled before aggregation. The Min–Max function was used in [4], where the i-th indicator \({X}_{i}\) was rescaled to a unitless \({Y}_{i}\) by \({Y}_{i}= \frac{{X}_{i}-Min({X}_{i})}{Max\left({X}_{i}\right)-Min({X}_{i})}\), where \(0\le {Y}_{i}\le 1\).

Such scaling indicates the relative performance and not the absolute performance of a country [21]. It depends significantly on \({X}_{Max}\) and \({X}_{Min}\), which may be unreliable outliers. If X is in percentage, \(Max\left({X}_{i}\right)-Min({X}_{i})\) may not be meaningful. The Human Poverty Index (HPI) considers the 3rd root and 4th root of the average of figures in percentage for HPI-1 and HPI-2 respectively [22]. The Min–Max transformation tends to overestimate the impact of indicators having small score ranges, changes the distribution of the transformed scores and may affect the PI. Ranks of two countries may be influenced by the performance of a third country [23]. A decrease in the performance of the worst performing country may increase the value of \({Y}_{i}\), even if \({X}_{i}\) for the i-th country remains unchanged. If \({X}_{Min}\) is changed, rankings and relative valuations may change due to a change in marginal rates of substitution [24]. For fixed \({X}_{Max}\) and \({X}_{Min}\), the gain in Y is proportional to the increase in X, but the gain per unit increase in X changes whenever \({X}_{Max}\) or \({X}_{Min}\) changes. For example, consider an indicator X taking the values 58, 70, 80, 85, 90, 96. Clearly, \({X}_{Max}=96, {X}_{Min}=58\). The gain in Y from increasing X from 80 to 90 is \(\frac{10}{38}=0.2632\). The gain from increasing X from 85 to 90 is \(\frac{5}{38}=0.1316\), which is less than 0.2632.
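The dependence of Min–Max scores on the extreme values can be illustrated numerically. The sketch below uses the six indicator values from the example above; the second data set, in which only the worst performer declines, is a hypothetical scenario added purely for illustration.

```python
def min_max(values):
    """Rescale values to [0, 1] using the sample minimum and maximum."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]


x = [58, 70, 80, 85, 90, 96]          # indicator values from the text
y = min_max(x)

# Gains in Y for two increases in X (the range is 96 - 58 = 38).
gain_80_to_90 = y[x.index(90)] - y[x.index(80)]   # equals 10/38
gain_85_to_90 = y[x.index(90)] - y[x.index(85)]   # equals 5/38

# Hypothetical change: only the worst performer falls, from 58 to 48.
x_changed = [48, 70, 80, 85, 90, 96]
y_changed = min_max(x_changed)
shift_for_70 = y_changed[1] - y[1]    # the country with X = 70 did not change

print(round(gain_80_to_90, 4), round(gain_85_to_90, 4), round(shift_for_70, 4))
```

The positive value of `shift_for_70` shows a country's scaled score rising although its own raw score is unchanged, which is the relative-performance effect described in the text.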

A better scaling that ensures unitless values is standardization by \(Z=\frac{X-Mean(X)}{SD(X)}\sim N(0, 1)\), where \(-\infty <{Z}_{i}<\infty \). Negative scores can be avoided by further transforming Z-scores to a desired score range, say 1 to 100. In PCA, original input variables are standardized to Z-scores.

Other illustrative transformations for scaling are:

  • \({Z}_{i}=\frac{{X}_{i}}{\overline{X} }\times 100\). It is less robust to the influence of outliers and is linearly related to Proportionate Normalization where \({Z}_{i}=\frac{{X}_{i}}{\sum {X}_{i}}\)

  • \({Z}_{i}=\frac{{X}_{i}}{{X}_{Max}}\times 100\). This depends on \({X}_{Max}\).

  • For longitudinal data, \({Y}_{i}^{t}=\frac{{X}_{i}^{t}-{X}_{i}^{t-1}}{{X}_{i}^{t}}\times 100\) where t denotes time period or ratio with \({X}_{i}^{0},\) the base period as \(\frac{{X}_{i}^{t}}{{X}_{i}^{0}}\times 100\) for each positive indicator/domain and \(\frac{{X}_{j}^{0}}{{X}_{j}^{t}}\times 100\) where j denotes a negative indicator/domain.

  • Logarithmic transformation of an indicator: \({Y}_{i}=\ln\left({X}_{i}\right)\) for \({X}_{i}>0\). For the Income component, HDI (2010) used \({Income}_{X}=\frac{\log_{e}X- \log_{e}{X}_{Min}}{\log_{e}{X}_{Max}- \log_{e}{X}_{Min}}\). Here, the rate of increase of \({Income}_{X}\) is different for different values of X, and \({Income}_{X}\) is not invariant under change of origin. Logarithmic transformation fails to satisfy desired properties like Translation Invariance and consistency in aggregation [25]. The index depends on the normalization methods applied to different indicators. Moreover, a non-linear logarithmic transformation may alter the structure of the data: \({r}_{Life\ expectancy,\ HDI}>{r}_{Life\ expectancy,\ GDP}\) [26], but the inequality was reversed on taking logarithmic transformations.

The scaling methods have advantages and disadvantages. There is no best method of scaling. Different normalization techniques resulted in different ranking lists [27]. Multi-criteria decision-making (MCDM) methods like Analytic Hierarchy Process (AHP), Data envelopment analysis (DEA), Benefit of the Doubt (BoD), etc. require no normalization.

2.7 Weighted sum

The weighted sum is a popular approach to aggregation, in which weights \({W}_{i}\) are assigned to the indicators \({X}_{i}\) and the composite index (CI) is obtained as \(CI= \sum {W}_{i}{X}_{i}\), where \(0<{W}_{i}<1\) and \(\sum {W}_{i}=1\) to satisfy the convex property. Ref. [4] used a weighted sum to get dimension scores and further weights on the dimensions to compute PI.

There are different methods of selecting weights. Approaches to find weights to the indicators could be normative (subjective weights), data‐driven (determined objectively) or hybrid [28]. Selected weights serve as ‘trade-offs’ and reflect relative importance. The amount of indicator 2 to be sacrificed to gain an extra unit of indicator 1 is reflected in the ratio \(\frac{{W}_{1}}{{W}_{2}}\). Such trade-offs may not be meaningful when the indicators relate to economic growth (say GDP) and improvement in non-monetary areas like values against discrimination.

Weights in DEA are obtained by satisfying the relevant constraints of a linear programme and deriving a single aggregate measure for each decision making unit (DMU). DEA results are sensitive to the choice of inputs and outputs; however, the best specification cannot be tested. Moreover, the number of efficient units tends to increase with the number of input and output variables [29]. AHP finds weights of criteria based on \(\frac{n(n-1)}{2}\) pair-wise comparisons for n alternatives using a 1 to 9 scale, and the number of comparisons grows with n. Inconsistency may arise since the transitivity rule does not hold for all elements of the pair-wise comparison matrix, and ambiguity in the judgement scale may not be avoided. AHP is of limited use for requirements prioritization [30].

The Best–Worst Method (BWM) finds weights based on Tchebychev distance, requiring fewer pair-wise comparisons than AHP [31]. However, major issues of BWM are the lack of a threshold for the consistency ratio, ordinal consistency, and a complex calculation process, especially for large n. There could be situations where there is no unique best and/or worst criterion (as required in BWM); situations with \(\ge 2\) best or worst criteria cannot be solved easily by BWM. Using a scale from 1 to 9 to determine the most important/preferred criterion (or the least important criterion) is subjective and sensitive to the sample composition. BoD weights can be used for linear aggregation but not for non-linear/non-compensatory aggregation. The weight vector of AHP as factor loadings of the principal eigenvector has significant drawbacks [32]. For the Economic Sentiment Indicator and the environmental sustainability index, PCA and FA failed [33]. A weight vector \({\varvec{W}}={({W}_{1}, {W}_{2}, \dots ,{W}_{n})}^{T}\) was proposed by [34] such that \(\sum_{i=1}^{n}{W}_{i}=1\) and the variance of \(Y=\sum_{i=1}^{n}{W}_{i}{X}_{i}\) is minimum. If, instead of the \({X}_{i}\), standardized scores \({Z}_{i}=\frac{{X}_{i}-\overline{{X}_{i}}}{{S}_{{X}_{i}}}\) are taken, then \({r}_{Y,{Z}_{i}}={r}_{Y,{Z}_{j}}=\frac{1}{\sqrt{{e}^{T}{R}^{-1}e}}\) for i ≠ j, where \(Y=\sum_{i=1}^{n}{W}_{i}{Z}_{i}\), R is the correlation matrix and e is the vector of ones. In other words, \(Y\) is equi-correlated with the \({Z}_{i}\).
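The equi-correlation property of the minimum-variance weights can be verified numerically. The sketch below assumes the standard closed form \(W = R^{-1}e / (e^{T}R^{-1}e)\) for minimising the variance of a weighted sum of standardized scores subject to weights summing to one, with e taken as the vector of ones; the 3 × 3 correlation matrix is a made-up example, and the small linear solver is included only to keep the block self-contained.

```python
import math


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


# Hypothetical correlation matrix of three standardized indicators.
R = [[1.0, 0.5, 0.3],
     [0.5, 1.0, 0.4],
     [0.3, 0.4, 1.0]]
k = len(R)
e = [1.0] * k

x = solve(R, e)                 # x = R^{-1} e
quad = sum(x)                   # e^T R^{-1} e
W = [v / quad for v in x]       # minimum-variance weights, summing to 1

RW = [sum(R[i][j] * W[j] for j in range(k)) for i in range(k)]  # Cov(Y, Z_i)
var_y = sum(W[i] * RW[i] for i in range(k))                     # Var(Y)
corr = [c / math.sqrt(var_y) for c in RW]                       # r(Y, Z_i)
target = 1 / math.sqrt(quad)

print([round(c, 6) for c in corr], round(target, 6))
```

All entries of `corr` coincide with `target`, which is the equality \({r}_{Y,{Z}_{i}}=1/\sqrt{{e}^{T}{R}^{-1}e}\) stated above.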

However, different methods of selecting weights may affect PI differently and may not ensure rank robustness. Studies using CI as a weighted sum do not discuss the variance of the weighted sum or the correlation of CI with the chosen indicators. Equal weights were applied to the dimensions and to the indicators within each sub-dimension to compute the Active Citizenship Composite Indicator [4]. Equal weighting with perfect substitutability may reverse country ranks [35]. No weight or equal weights are wrong and no weighting system is above criticism [36]. Just as there is no consensus on a perfect weighting scheme, there is no ideal aggregation scheme [37]. Thus, it is desirable to construct PI without a weighted sum.

3 Proposed methods

3.1 Arithmetic aggregation (Method-1)

Let \({X}_{ij}\) be the raw ordinal score of a country in the i-th item for choosing the j-th response-category, where higher score ⇔ higher participation. For a 5-point item, the weighted score (WS) = \(\sum \sum {W}_{ij}{X}_{ij}\), where the \({W}_{ij}\) are different for different levels of the item, satisfying \({W}_{ij}>0\) and \(\sum_{j=1}^{5}{W}_{ij}=1\).

Scores of the i-th item will be equidistant and monotonic if \({W}_{i1}\), \(2{W}_{i2}\), \(3{W}_{i3}\), \(4{W}_{i4}\) and \(5{W}_{i5}\) form an arithmetic progression with positive common difference \(\alpha \).

For the i-th item, find the maximum (\({f}_{i\,max}\)) and minimum (\({f}_{i\,min}\)) frequency of the levels. Find initial weights \({\omega }_{ij}=\frac{{f}_{ij}}{n}\).

Arrange the \({\omega }_{ij}\) so that \({\omega }_{i1}<{\omega }_{i2}<{\omega }_{i3}<{\omega }_{i4}<{\omega }_{i5}\), where \({\omega }_{i1}=\frac{{f}_{i\,min}}{n}\) and \({\omega }_{i5}=\frac{{f}_{i\,max}}{n}\).

Let the intermediate weight \({W}_{i1}={\omega }_{i1}\).

Compute \(\alpha =\frac{5{f}_{i\,max}-{f}_{i\,min}}{4n}\), since \({W}_{i1}+4\alpha =5{W}_{i5}\) with \({W}_{i5}={\omega }_{i5}\).

The other intermediate weights are \({W}_{i2}=\frac{{\omega }_{i1}+\alpha }{2}\), \({W}_{i3}=\frac{{\omega }_{i1}+2\alpha }{3}\), \({W}_{i4}=\frac{{\omega }_{i1}+3\alpha }{4}\) and \({W}_{i5}=\frac{{\omega }_{i1}+4\alpha }{5}\).

Get the final weights \({W}_{ij(Final)}=\frac{{W}_{ij}}{\sum_{j=1}^{5}{W}_{ij}}\), so that \(\sum_{j=1}^{5}{W}_{ij(Final)}=1\) and \(j\cdot {W}_{ij(Final)}-(j-1)\cdot {W}_{i(j-1)(Final)}\) is constant (different for different items).
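The weighting steps of Method-1 can be traced on a single item. The following is a minimal sketch under two assumptions that are not stated explicitly in the text: n is taken as the total number of responses to the item, and the 5-point formula for \(\alpha \) is generalised to k response categories as \((k\,f_{max}-f_{min})/((k-1)\,n)\). The frequency counts are hypothetical.

```python
def method1_weights(freqs):
    """Final weights for one item from the response counts of its levels.

    freqs: counts f_ij for levels j = 1..k (assumption: n = sum of counts).
    """
    k = len(freqs)
    n = sum(freqs)
    f_min, f_max = min(freqs), max(freqs)
    w1 = f_min / n                                    # W_i1 = omega_i1
    alpha = (k * f_max - f_min) / ((k - 1) * n)       # common difference
    # Intermediate weights: j * W_ij = omega_i1 + (j - 1) * alpha
    inter = [(w1 + (j - 1) * alpha) / j for j in range(1, k + 1)]
    total = sum(inter)
    return [w / total for w in inter]                 # normalise to sum 1


freqs = [10, 25, 40, 15, 10]          # hypothetical counts for a 5-point item
W_final = method1_weights(freqs)

# Transformed score of level j is j * W_j(Final); successive gaps are equal.
scores = [j * w for j, w in enumerate(W_final, start=1)]
gaps = [b - a for a, b in zip(scores, scores[1:])]

print([round(w, 4) for w in W_final])
print([round(g, 4) for g in gaps])
```

The equal entries in `gaps` confirm that the transformed level scores are equidistant and increasing, as required of the arithmetic progression.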

3.1.1 Observations

  i) \({W}_{j(Final)}\) are based on empirical probabilities.

  ii) If \({f}_{ij}=0\) for a particular level of an item, it can be taken as a zero value when scoring Likert items as a weighted sum.

  iii) Generated scores (E) as weighted sums are continuous, equidistant and monotonic in ratio scale.

  iv) Method-1 is applicable to items with different numbers of response-categories, including binary items.

Standardize E-scores of the i-th item by \({Z}_{ij}=\frac{{X}_{ij}- \overline{{X }_{i}}}{SD({X}_{i})}\sim N(0, 1)\).

Take linear transformation of \({Z}_{ij}\) to P-scores by:

$$P=99\times \left[ \frac{{Z}_{ij}- Min\left({Z}_{ij}\right)}{Max \left({Z}_{ij}\right)- Min\left({Z}_{ij}\right)}\right]+1$$
(1)

For the i-th item, \({P}_{i}\sim N ({\mu }_{i}, {\sigma }_{i}^{2})\) and \(1\le {P}_{i}\le 100\), where estimates of \({\mu }_{i}\) and \({\sigma }_{i}^{2}\) are obtained from the data. The P-score of an item as per Eq. (1) can be obtained irrespective of the length of the scale and the width of the items.

For indicators in ratio scale, raw scores can be standardized and transformed by Eq. (1) to follow a normal distribution.

The domain score of a country is taken as the sum of the normally distributed P-scores of the relevant items, which follows a normal distribution with mean \(\sum_{i}{\mu }_{i}\) and SD = \(\sqrt{\sum_{i} {\sigma }_{i}^{2}+ 2\sum_{i<j}Cov({P}_{i},{P}_{j})}\). Similarly, the scale score is the sum of domain scores, also following a normal distribution.
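The arithmetic of the P-score transformation and of the variance of a domain score can be sketched as follows. The item scores are invented for six hypothetical countries, and population (divide-by-N) variances and covariances are assumed; the sketch only illustrates the computations and does not test normality.

```python
import statistics as st


def p_scores(raw):
    """Standardize raw scores, then rescale linearly to the range 1 to 100."""
    mean, sd = st.fmean(raw), st.pstdev(raw)
    z = [(x - mean) / sd for x in raw]
    z_min, z_max = min(z), max(z)
    return [99 * (v - z_min) / (z_max - z_min) + 1 for v in z]


def pcov(a, b):
    """Population covariance of two equally long score lists."""
    ma, mb = st.fmean(a), st.fmean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


# Hypothetical scores of six countries on three items of one domain.
items = [
    [2.1, 3.4, 1.8, 4.0, 2.9, 3.6],
    [10, 14, 9, 16, 12, 15],
    [0.30, 0.55, 0.20, 0.60, 0.45, 0.50],
]
P = [p_scores(it) for it in items]
domain = [sum(col) for col in zip(*P)]     # domain score of each country

k = len(P)
var_from_parts = sum(pcov(p, p) for p in P) + 2 * sum(
    pcov(P[i], P[j]) for i in range(k) for j in range(i + 1, k)
)

print(round(st.pvariance(domain), 4), round(var_from_parts, 4))
```

The two printed values agree, showing that the variance of the domain score equals the sum of item variances plus twice the sum of covariances over distinct pairs.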

3.1.2 Properties

  1. Domain scores (\({D}_{i}\)) and scale scores (\({S}_{i}\)) of the i-th country are continuous and monotonically increasing, each following a normal distribution. Normality ensures meaningful computation of arithmetic average, SD, correlation, etc. and facilitates statistical analysis under a parametric set-up, including unbiased estimates of the population mean (\(\mu \)) and population variance (\({\sigma }^{2}\)), confidence intervals for \(\mu \), and testing of null hypotheses such as \({H}_{0}: {\mu }_{1}={\mu }_{2}\) or \({H}_{0}: {\sigma }_{1}^{2}={\sigma }_{2}^{2}\) across time and space.

  2. \(Var\left( {S_{i} } \right) > \sum Var\left( {D_{i} } \right) \Rightarrow\) positive correlations for most pairs of dimensions.

  3. Progress of the i-th country in successive time-periods can be quantified by \(\frac{{S_{i\left( t \right)} - S_{{i\left( {t - 1} \right)}} }}{{S_{{i\left( {t - 1} \right)}} }} \times 100\), which also quantifies the responsiveness of the scale and the effectiveness of an implemented policy measure. Deterioration is indicated if \(\frac{{S}_{i(t)}-{S}_{i(t-1)}}{{S}_{i(t-1)}}\times 100<0\).

Similarly, progress for a group of countries is reflected if \(\overline{{S }_{i(t)}}>\overline{{S }_{i(t-1)}}\). Normally distributed \({S}_{i}\) helps to test \({H}_{0}: {\mu }_{{S}_{t}}={\mu }_{{S}_{(t-1)}}\) and \({H}_{0}: {Progress}_{(t+1)\ \mathrm{over}\ t}=0\). Deterioration, if any, may be probed to find the extent of deterioration in the concerned domain(s) for possible corrective actions.

  4. The plot of progress/deterioration of a country at various time points can be used to compare PI from the base period.

  5. Policy makers are interested in the elasticity of each dimension, i.e. the change in S-score per unit change in a dimension score, \(\frac{\Delta S}{\Delta {D}_{i}}\). The dimensions can be ranked in terms of \(\frac{\Delta S}{\Delta {D}_{i}}\).

  6. Normally distributed scores satisfy the assumptions of PCA and allow factorial validity to be found as the ratio of the first eigenvalue to the sum of all eigenvalues, i.e. Factorial Validity = \(\frac{{\lambda }_{1}}{\sum {\lambda }_{i}}\), where \({\lambda }_{1}\) is the highest eigenvalue, reflecting the main factor for which the scale was developed [38], and accounts for \(\frac{{\lambda }_{1}}{\sum {\lambda }_{i}} \times 100\) percent of overall variability. Such factorial validity avoids the problems of construct validity and selection of a criterion scale. The Tracy–Widom (TW) statistic can be used to test the significance of the largest or other eigenvalues [39].

  7. Normality helps to estimate the variance of each item, sub-class and scale, and to estimate Cronbach alpha for a domain/sub-class at population level as

    $$\widehat{\alpha }=\left(\frac{n}{n-1}\right)\left(1-\frac{\text{Sum of estimates of variance of items in the sub-class}}{\text{Estimate of variance of the sub-class}}\right)$$
    (2)

Reliability of scale \(({r}_{tt})\) consisting of K-number of dimensions can be obtained as a function of dimension reliabilities by

$${r}_{tt}=\frac{\sum_{i=1}^{K}{r}_{tt(i)}{S}_{{X}_{i}}^{2}+ 2\sum_{i<j}COV({X}_{i},{X}_{j})}{\sum_{i=1}^{K}{S}_{{X}_{i}}^{2}+ 2\sum_{i<j}COV({X}_{i},{X}_{j})}$$
(3)

where \({r}_{tt(i)}\) and \({S}_{{X}_{i}}\) denote respectively the reliability and SD of the i-th dimension [34].

  8. The discriminating value of a scale indicates the ability of the scale to distinguish between countries that have different degrees of the underlying construct. The discriminating value of an item (\({Disc}_{i}\)) and of a test (\({Disc}_{Test}\)) can be computed by the coefficient of variation (CV), where \({Disc}_{i}=\frac{{SD}_{i}}{{Mean}_{i}}\) and \({Disc}_{Test}= \frac{{SD}_{Test}}{{Mean}_{Test}}\). Cronbach \(\alpha \) and \({Disc}_{Test}\) (with m items) are related by

    $$\alpha =\left(\frac{m}{m-1}\right)\left(1-\frac{\sum_{i=1}^{m}{\overline{{X }_{i}}}^{2}\cdot{Disc}_{i}^{2}}{{\overline{X} }^{2}\cdot{Disc}_{Test}^{2}}\right)$$
    (4)

since the variance of the i-th item is \({S}_{{X}_{i}}^{2}={\overline{{X }_{i}}}^{2}\cdot {Disc}_{i}^{2}\) \(\forall \) i = 1, 2, …, m \(\Rightarrow \sum_{i = 1}^{m} S_{{X_{i} }}^{2} = \sum_{i = 1}^{m} \overline{{X_{i} }}^{2} \cdot Disc_{i}^{2}\), and the test variance is \(S_{X}^{2} = \overline{X}^{2}\cdot {Disc}_{Test}^{2}\).

It can be proved that

$${({Disc}_{Test})}^{2}=\frac{{CV}_{True\ scores}^{2}}{{r}_{tt}},\quad \text{where } {r}_{tt}=\frac{{S}_{T}^{2}}{{S}_{X}^{2}}$$
(5)

Thus, test reliability and \({Disc}_{Test}\) are related by a negative non-linear relationship.
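The relation between Cronbach alpha and the discriminating values in Eq. (4) can be checked on a small data set. The sketch below computes alpha once from item and total variances and once from means and coefficients of variation; the item scores are hypothetical, and population (divide-by-N) variances are assumed throughout.

```python
import statistics as st


def cronbach_alpha(items):
    """Alpha from the sum of item variances and the variance of totals."""
    m = len(items)
    totals = [sum(col) for col in zip(*items)]
    item_var = sum(st.pvariance(it) for it in items)
    return (m / (m - 1)) * (1 - item_var / st.pvariance(totals))


def cronbach_alpha_from_cv(items):
    """Alpha written in terms of means and coefficients of variation."""
    m = len(items)
    totals = [sum(col) for col in zip(*items)]
    disc = [st.pstdev(it) / st.fmean(it) for it in items]     # Disc_i
    disc_test = st.pstdev(totals) / st.fmean(totals)          # Disc_Test
    num = sum(st.fmean(it) ** 2 * d ** 2 for it, d in zip(items, disc))
    den = st.fmean(totals) ** 2 * disc_test ** 2
    return (m / (m - 1)) * (1 - num / den)


# Hypothetical scores of six countries on three items.
items = [
    [2, 4, 3, 5, 4, 1],
    [3, 5, 3, 4, 5, 2],
    [1, 4, 2, 5, 3, 2],
]
alpha_var = cronbach_alpha(items)
alpha_cv = cronbach_alpha_from_cv(items)

print(round(alpha_var, 6), round(alpha_cv, 6))
```

Both routes give the same value because the squared mean times the squared CV of any score is its variance.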

9. Quartile clustering helps in classifying a group of countries into four mutually exclusive classes \({Q}_{1}, {Q}_{2}, {Q}_{3}, {Q}_{4}\). Quartile clustering of scale scores following a normal distribution may be adopted because it is simple and appealing, provides well-defined cut-off scores for the four mutually exclusive classes and assigns equal probability to each quartile/class:

$$\int_{0}^{Q_{1}} f\left( x \right)dx = \int_{Q_{1}}^{Q_{2}} f\left( x \right)dx = \int_{Q_{2}}^{Q_{3}} f\left( x \right)dx = \int_{Q_{3}}^{Q_{4}} f\left( x \right)dx$$
(6)

Similar approach may be used for classification to five classes (pentiles), ten classes (deciles), etc.
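A quartile classification of scale scores can be sketched as follows. The scores are hypothetical, the cut-offs are sample quartiles from Python's `statistics.quantiles` (one of several quartile conventions), and a score equal to a cut-off is placed in the lower class; none of these choices is prescribed by the text.

```python
import statistics as st
from bisect import bisect_left


def quartile_classes(scores):
    """Assign each score to class 1..4 using the three sample quartiles."""
    cuts = st.quantiles(scores, n=4)          # [Q1, Q2, Q3]
    # bisect_left counts the cut-offs strictly below the score.
    return [bisect_left(cuts, s) + 1 for s in scores]


# Hypothetical scale scores of eight countries.
scores = [12, 25, 31, 47, 52, 68, 74, 90]
classes = quartile_classes(scores)

for s, c in zip(scores, classes):
    print(s, "-> class", c)
```

Passing `n=5` or `n=10` to `statistics.quantiles` gives the corresponding cut-offs for pentiles or deciles.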

For normally distributed dimension scores, a given score \({X}_{0}\) in dimension-1 will be equivalent to a score of \({Y}_{0}\) in dimension-2 if

$$\int_{-\infty}^{X_{0}} f\left( x \right)dx = \int_{-\infty}^{Y_{0}} g\left( y \right)dy$$
(7)

where \(f(x)\) and \(g(y)\) denote the normal probability density functions (pdf) of the transformed scores of dimension-1 and dimension-2 respectively. Eq. (7) can be solved using the standard normal probability table. It helps to find all combinations of \(\{{X}_{0}, {Y}_{0}\}\), including cut-off scores of two scales.
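Instead of a probability table, Eq. (7) can be solved numerically by matching cumulative probabilities. The sketch below uses `statistics.NormalDist` from the Python standard library; the means and SDs of the two dimensions are hypothetical values chosen for illustration.

```python
from statistics import NormalDist


def equivalent_score(x0, dim1, dim2):
    """Score in dimension-2 with the same cumulative probability as x0."""
    return dim2.inv_cdf(dim1.cdf(x0))


# Hypothetical normal distributions of two dimension scores.
dim1 = NormalDist(mu=60, sigma=10)
dim2 = NormalDist(mu=45, sigma=8)

x0 = 70
y0 = equivalent_score(x0, dim1, dim2)

# For two normal distributions this reduces to matching the z-scores.
y0_closed_form = dim2.mean + dim2.stdev * (x0 - dim1.mean) / dim1.stdev

print(round(y0, 4), round(y0_closed_form, 4))
```

Because both distributions are normal, the equivalent score is the one with the same standardized value, so the numerical and closed-form answers agree.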

3.2 Geometric aggregation (Method-2)

Let \({X}_{i0}\) denote the value of the i-th indicator in the base period. The unit-free ratio \(\frac{{X_{it} }}{{X_{i0} }}\) indicates progress or decline of the country with respect to the i-th indicator in the t-th time period over the base period. PI for the c-th time period (current period) is defined as the geometric mean of these ratios for the n indicators, i.e.

$${PI}_{c0}=\sqrt[n]{\frac{{X}_{1c} {X}_{2c} \cdots {X}_{nc}}{{X}_{10} {X}_{20} \cdots {X}_{n0}}}\times 100$$
(8)

or, avoiding the n-th root (which preserves the ordering of the index values),

$${PI}_{c0}=\frac{{X}_{1c} {X}_{2c} \cdots {X}_{nc}}{{X}_{10} {X}_{20} \cdots {X}_{n0}}\times 100$$
(9)

Equation (9) can also be written in terms of dimensions scores as

$${DI}_{c0}=\frac{{D}_{1c}{D}_{2c}\cdots {D}_{kc}}{{D}_{10}{D}_{20}\cdots {D}_{k0}}\times 100$$
(10)
$$\text{Clearly, from Eq. (8), }\log \frac{{PI}_{c0}}{100}= \frac{1}{n}\sum_{i=1}^{n}\log \frac{{X}_{ic}}{{X}_{i0}}$$
(11)

Here, \(\frac{{X_{it} }}{{X_{i0} }} > 1\) indicates progress and \(\frac{{X}_{it}}{{X}_{i0}}<1\) indicates decline, assuming higher score ⇔ higher participation. The indicators for which \(\frac{{X}_{it}}{{X}_{i0}}<1\) are critical. Critical indicators relative to the previous period can likewise be identified as those for which \(\frac{{X}_{it}}{{X}_{i(t-1)}}<1\), for deciding an appropriate action plan. \(\frac{{PI}_{i(t)}}{{PI}_{i(t-1)}}>1\) implies overall improvement made by the i-th country in period t over the (t−1)-th period.

Note that each of (9) and (10) can be applied to all types of indicators, in ratio or ordinal scales, including those in percentages, and depicts overall improvement/decline in the current year over the base year by a continuous function that is symmetric in its arguments and increases monotonically. Replacing the base-period vector by the vector for the previous year gives the improvement of PI on a year-to-year basis.
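The two forms of the index in Eqs. (8) and (9) can be computed directly from base-period and current-period values. The sketch below uses three hypothetical indicators; the function names are illustrative.

```python
import math


def pi_geometric(current, base):
    """Eq. (8): geometric mean of the ratios X_ic / X_i0, times 100."""
    logs = [math.log(c / b) for c, b in zip(current, base)]
    return 100 * math.exp(sum(logs) / len(logs))


def pi_product(current, base):
    """Eq. (9): product of the ratios X_ic / X_i0, times 100."""
    return 100 * math.prod(c / b for c, b in zip(current, base))


# Hypothetical values of three indicators in the base and current periods.
base = [40.0, 25.0, 60.0]
current = [44.0, 24.0, 66.0]

pi_8 = pi_geometric(current, base)
pi_9 = pi_product(current, base)

# Indicators whose ratio to the base period is below 1 (critical areas).
critical = [i for i, (c, b) in enumerate(zip(current, base)) if c / b < 1]

print(round(pi_8, 2), round(pi_9, 2), critical)
```

Both values exceed 100 here, indicating overall improvement over the base period, while the second indicator (index 1) is flagged as critical because its ratio is below 1.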

3.2.1 Properties

Each of (9) and (10) satisfies the following desired properties:

  1. Independent of the order of the chosen indicators or dimensions and independent of change of scale.

  2. An increase of, say, 1% in \(\frac{{X_{it} }}{{X_{i0} }}\) \(\Rightarrow\) a 1% increase in the PI if all others are unchanged. Thus, the curve showing gain in \(\frac{{X_{it} }}{{X_{i0} }}\) against gain in PI is linear.

  3. Avoids scaling and selection of weights, and reduces the level of substitutability between component indicators.

  4. Not affected much by outliers and thus produces no bias for developed or under-developed countries.

  5. Satisfaction of the time-reversal test and formation of chain indices by Method-2 may help inter-country comparison over time by tracking the path of overall progress registered by a country.

  6. Facilitates construction of separate indices for each domain by focusing on indicators related to that domain, without further weights for domains.

  7. Easy to find the relative importance of each indicator. The indicators for which \(\frac{{X}_{it}}{{X}_{i0}}<1\) are the critical areas requiring attention.

  8. Reliability, factorial validity and discriminating value of PI or of a dimension can be found by taking logarithms on both sides of Eq. (10).

  9. Facilitates estimation of the population GM, the standard error of the GM and the confidence interval of the GM [40], since \(\log GM= \frac{1}{n}\sum_{i=1}^{n}\log {Y}_{i}\) where \({Y}_{i}=\frac{{X}_{it}}{{X}_{i0}}\). The geometric standard deviation (GSD) is given by \(\log {S}_{GM}= {\left[\frac{1}{n}\sum_{i=1}^{n}{(\log {Y}_{i}-\log GM)}^{2}\right]}^{\frac{1}{2}}\), where \({S}_{GM}\) denotes the GSD. This implies log (GSD of \({Y}_{1}, {Y}_{2}, \dots , {Y}_{n}\)) = usual SD of \(\log {Y}_{1}, \log {Y}_{2}, \dots , \log {Y}_{n}\).

For a large sample, the sample GM is the estimate of the population GM, and the estimated standard error of the GM is \(\text{sample } GM \cdot \frac{\log S_{GM}}{\sqrt{n-1}}\). The upper and lower limits of the \(100(1-\alpha)\%\) confidence interval of the GM are \(e^{U}\) and \(e^{L}\) respectively, where \(U=\log GM + S_{m}\, t_{(\frac{\alpha}{2},\, df)}\) and \(L=\log GM - S_{m}\, t_{(\frac{\alpha}{2},\, df)}\), with \(S_{m}\) taken as the standard error of \(\log GM\).

Thus, significance tests of hypotheses regarding equality of GMs can be performed across time and space using conventional t-tests on the logarithms of the observations.
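
The estimates described above can be sketched in a few lines. This is an illustration under stated assumptions, not a definitive implementation: \(S_{m}\) is taken to be \(\frac{\log S_{GM}}{\sqrt{n-1}}\), the critical value \(t_{(\frac{\alpha}{2},\, df)}\) is supplied by the caller (for example from a t-table), and the function name `gm_summary` and the ratios are hypothetical.

```python
import math

def gm_summary(ratios, t_crit):
    """Summarise ratios Y_i = X_it / X_i0 through their geometric mean.

    t_crit is the two-sided critical value t_(alpha/2, df), supplied by the
    caller; which df to use (n - 1 in the example below) is an assumption.
    """
    n = len(ratios)
    if n < 2:
        raise ValueError("at least two ratios are needed")
    if any(y <= 0 for y in ratios):
        raise ValueError("all ratios must be positive")
    logs = [math.log(y) for y in ratios]
    log_gm = sum(logs) / n                      # log GM = mean of log Y_i
    # log S_GM: SD of the log Y_i with divisor n, as in the formula above.
    log_gsd = math.sqrt(sum((v - log_gm) ** 2 for v in logs) / n)
    s_m = log_gsd / math.sqrt(n - 1)            # assumed SE of log GM
    gm = math.exp(log_gm)
    return {
        "gm": gm,
        "gsd": math.exp(log_gsd),
        "se_gm": gm * s_m,                      # sample GM * log S_GM / sqrt(n - 1)
        "ci": (math.exp(log_gm - t_crit * s_m),
               math.exp(log_gm + t_crit * s_m)),
    }

# Hypothetical ratios for five indicators; 2.776 is t_(0.025, 4).
summary = gm_summary([0.9, 1.1, 0.75, 1.2, 1.05], t_crit=2.776)
```

Because the interval is computed on the logarithmic scale and then exponentiated, its limits are always positive and are not symmetric about the GM.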

4 Benefits of Method-1 and Method-2

The proposed PI can be constructed even for skewed longitudinal data over long time periods, and also for only two time periods. Thus, it helps in meaningful comparison of a set of regions or countries with a reduced level of substitutability among the indicators or dimensions. The proposed index can also be applied, with pre-determined indicators, to compare sub-groups of a country, such as ethnic groups, religious groups, disadvantaged groups or elderly people. Method-2 offers a simple way to assess PI even for a single country relative to the base period, without resorting to group data. Both methods are well suited to cluster analysis and classification of countries. The graph of progress or decline of PI over time for a country helps to assess the impact of various socio-economic measures.

In addition to citizen participation, the multidimensional CI proposed in this article can also be applied to gender equality, quality of life and socioeconomic well-being [41, 42]. Other composite indicators used in this type of study, such as the synthetic distance indicator DP2 [43], merit consideration alongside the two methods proposed here, for instance in estimating gender equality. An empirical comparison of the two methods could be made and the more appropriate one selected for the estimation of citizen participation.

5 Limitations

  1. Introduction of a new indicator requires estimation of the value of that indicator in the base period and in subsequent periods.

  2. The method fails if any indicator takes a value \(\le 0\), since the ratio \(\frac{X_{it}}{X_{i0}}\) or its logarithm may then be undefined.
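
In practice, the second limitation suggests checking the data before any ratio or logarithm is computed. A minimal sketch of such a pre-check follows; the function name `nonpositive_indicators` and the indicator names are hypothetical.

```python
def nonpositive_indicators(names, values):
    """Return the names of indicators whose value is <= 0.

    Such indicators cannot be used as they stand in ratios X_it / X_i0
    or in logarithms of those ratios.
    """
    return [name for name, v in zip(names, values) if v <= 0]

# Hypothetical indicator values for one country in one period.
flagged = nonpositive_indicators(
    ["turnout", "net_migration", "membership"],
    [62.5, -1.2, 0.0],
)
```

Any flagged indicator would need to be redefined or excluded before the index is computed.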

6 Discussion

Focusing on the methodology of assessing PI, the paper describes two assumption-free methods that avoid scaling. Method-2 also avoids selection of weights and thus avoids the problems of rank robustness specific to selection of weights [35]. Normal distributions of two or more groups under Method-1 are likely to give rise to a lower value of the Gini coefficient, indicating equality. Discrete, ordinal scores of k-point items (\(X_{i}\)) are converted to continuous, monotonically increasing, equidistant scores (\(E_{i}\)) by a weighted sum, where the weights assigned to the response categories of the different items are on ratio scales. The E-scores are then linearly transformed to Z-scores and P-scores, which can be added to indicators measured on ratio scales.

Benefits of both methods include:

  • Meaningful arithmetic aggregation (Method-1). Multiplicative aggregation in Method-2 can be converted to additive model by taking logarithms.

  • Normally distributed dimension scores (\(D_{i}\)) and PI satisfy many desired properties, such as meaningful comparison, ranking and classification of a set of countries, estimation of population parameters and testing of statistical hypotheses. However, normality of PI under Method-2 needs to be tested, for example by the Anderson–Darling test.

  • Identify critical dimensions showing deterioration with time and thus draw the attention of policy makers to the need for corrective action.

  • Find the effect of a small change in the i-th dimension on PI through elasticity, taken as the change in PI due to a unit change in that dimension. Such elasticities can be used to rank the dimensions.

  • Assess progress or decline of PI at successive time periods to monitor the effect of policies and strategies. The significance of progress or decline of PI between two time periods can be tested statistically, for example by conventional t-tests on the normally distributed scores or, for Method-2, on the logarithms of the observations.

  • High reliability of the scale (\(r_{tt}\)) indicates rank robustness, which can be quantified by Spearman’s rank correlation (\(\rho\)).

Thus, the proposed measures, which satisfy the desired properties, are an improvement over the existing methods.
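
Two of the benefits listed above, identifying deteriorating indicators and gauging the effect of a unit change, can be sketched as follows. The sketch assumes an index in the form of a geometric mean of the ratios \(\frac{X_{it}}{X_{i0}}\), which may differ from the exact forms used in the paper; all function names, indicator names and values are hypothetical.

```python
import math

def ratios(current, base):
    """Ratios X_it / X_i0 for each indicator."""
    return [c / b for c, b in zip(current, base)]

def gm(values):
    """Geometric mean of positive values (assumed index form)."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def critical_indicators(names, current, base):
    """Names of indicators whose ratio is below 1, worst first."""
    pairs = [(r, n) for n, r in zip(names, ratios(current, base)) if r < 1]
    return [n for r, n in sorted(pairs)]

def unit_change_effects(current, base):
    """Change in the index when each current value rises by one unit,
    with all other indicators held fixed."""
    pi = gm(ratios(current, base))
    effects = []
    for i in range(len(current)):
        bumped = list(current)
        bumped[i] += 1.0
        effects.append(gm(ratios(bumped, base)) - pi)
    return effects

# Hypothetical indicators for one country.
names = ["voting", "volunteering", "membership"]
base = [50.0, 20.0, 40.0]
current = [45.0, 22.0, 30.0]

critical = critical_indicators(names, current, base)
effects = unit_change_effects(current, base)
```

Here the ratios are 0.9, 1.1 and 0.75, so membership and voting are flagged as critical, in that order. The unit-change effects give one simple basis for ranking the indicators.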

7 Conclusion

The paper contributes to improved scoring of PI by avoiding major limitations of ordinal scores and facilitating analysis under a parametric set-up for meaningful comparisons. Policy makers and researchers can take advantage of the proposed methods of arithmetic aggregation without weights or multiplicative aggregation without scaling or choice of weights. Both methods satisfy the desired properties. Method-2 offers a more general approach that satisfies the time-reversal test and permits formation of chain indices. However, a test of normality is required for this method, unlike Method-1, which ensures normally distributed scores. The proposed methodologies are innovative and contribute to the estimation of citizen participation in the respective territories or countries. Empirical studies may be undertaken to find correlations between the PIs obtained by each proposed method and to generalize the findings.