An Application of Partial Least Squares to the Construction of the Social Institutions and Gender Index (SIGI) and the Corruption Perception Index (CPI)

Yoon, Jisu; Klasen, Stephan

doi:10.1007/s11205-017-1655-8

Published: 25 May 2017

Volume 138, pages 61–88, (2018)

Social Indicators Research

Jisu Yoon¹ &
Stephan Klasen¹

Abstract

Composite indices used in social science research often rely on principal components analysis (PCA) to derive weights for their component variables; PCA emphasizes the largest variations in those variables. However, PCA may not work well when the informative variations account for only a small share of the variance in the variables. Moreover, the best weighting scheme may depend on the intended use of a particular composite index. We consider partial least squares (PLS) as an alternative weighting scheme, which takes advantage of the relationship between outcome variables of interest and the variables in a composite index. In this paper, the Social Institutions and Gender Index (SIGI), a composite index produced by the OECD, is re-constructed using weights generated by PCA and PLS. Using the revised SIGIs and female education, fertility, child mortality, and corruption as outcome variables, we investigate the relationship between social institutions related to gender inequality and these development outcomes, controlling for other relevant determinants. We find that gender inequality in social institutions has a significant correlation with fertility and corruption regardless of the weighting procedure, while for female education and child mortality only the SIGIs based on PLS show significant results. Additionally, PLS brings benefits in terms of prediction compared to PCA for female education and child mortality. In our analysis of corruption, we consider not only the Corruption Perception Index (CPI) as our measure of corruption, but also create new reweighted CPIs, again using PLS and PCA as weighting procedures. The CPI based on PCA shows a significant correlation with gender inequality, while the correlation is only marginally significant when the CPI is based on PLS.
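The contrast between the two weighting schemes can be illustrated with a minimal sketch. It is not the authors' implementation: the data are simulated, all names (x1, x2, x3, pca_first_weights, pls_first_weights) are hypothetical, and only the first component of each method is computed. Two variables share a large common variation that is unrelated to the outcome, while a third variable drives the outcome. The sketch uses only the Python standard library.

```python
import math
import random

def standardize(col):
    """Center a column and scale it to unit sample variance."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
    return [(v - mean) / sd for v in col]

def normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]

def pca_first_weights(cols, iterations=500):
    """First principal-component weights via power iteration on the covariance matrix."""
    p, n = len(cols), len(cols[0])
    cov = [[sum(cols[i][k] * cols[j][k] for k in range(n)) / (n - 1)
            for j in range(p)] for i in range(p)]
    w = normalize([1.0] * p)
    for _ in range(iterations):
        w = normalize([sum(cov[i][j] * w[j] for j in range(p)) for i in range(p)])
    return w

def pls_first_weights(cols, y):
    """First PLS weights: proportional to the covariance of each variable with y."""
    return normalize([sum(x * t for x, t in zip(col, y)) for col in cols])

# Simulated (hypothetical) data: x1 and x2 share a large common factor z that is
# unrelated to the outcome; x3 carries the signal s that drives the outcome y.
random.seed(0)
n = 500
z = [random.gauss(0, 1) for _ in range(n)]
s = [random.gauss(0, 1) for _ in range(n)]
x1 = standardize([v + 0.1 * random.gauss(0, 1) for v in z])
x2 = standardize([v + 0.1 * random.gauss(0, 1) for v in z])
x3 = standardize(s)
y = standardize([v + 0.5 * random.gauss(0, 1) for v in s])

cols = [x1, x2, x3]
pca_w = pca_first_weights(cols)     # loads mainly on x1 and x2
pls_w = pls_first_weights(cols, y)  # loads mainly on x3
print("PCA weights:", [round(v, 3) for v in pca_w])
print("PLS weights:", [round(v, 3) for v in pls_w])
```

In this setup the first principal component follows the shared variation in x1 and x2, whereas the first PLS weight vector concentrates on the variable that covaries with the outcome, which is the situation the abstract describes as problematic for PCA.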

Notes

We decided to use slightly different reference years than in Sect. 3, since a new standardization scheme was introduced in 2002 for the CPI data, which might undermine the comparability of the scalings.

Author information

Authors and Affiliations

Department of Economics, University of Goettingen, Goettingen, Germany
Jisu Yoon & Stephan Klasen

Corresponding author

Correspondence to Jisu Yoon.

Appendices

Appendix 1: Sample Selection of the Regressands

Figure 2 shows kernel density estimates of the kept and dropped observations of the regressands. The dropped and kept observations of child mortality and the CPI have nearly identical distributions, whereas female education and fertility show slight differences. We suspected that these differences could stem from the sampling variability of the non-parametric estimates, which is expected to be high given the small numbers of dropped observations (33, 27, 27 and 39). We therefore performed Welch's two-sample t-test, which did not reject the null hypothesis that the dropped and kept observations have the same mean. The p values are 0.43, 0.27, 0.82 and 0.64 for female education, fertility, child mortality and the CPI, respectively.
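The comparison of means can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the two samples are simulated, the kept-sample size of 100 is hypothetical (only the dropped-sample size of 33 comes from the text), and the p value is obtained by numerically integrating the Student t density rather than by calling a statistics library.

```python
import math
import random
from statistics import mean, variance

def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def t_two_sided_p(t, df, steps=20000):
    """Two-sided p value via Simpson integration of the t density on [0, |t|]."""
    if t == 0:
        return 1.0
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = abs(t) / steps  # steps is even, as Simpson's rule requires
    total = pdf(0.0) + pdf(abs(t))
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * area)

# Simulated (hypothetical) samples standing in for kept and dropped observations.
random.seed(1)
kept = [random.gauss(5.0, 2.0) for _ in range(100)]
dropped = [random.gauss(5.0, 2.5) for _ in range(33)]

t_stat, df = welch_t(kept, dropped)
p_value = t_two_sided_p(t_stat, df)
print(f"t = {t_stat:.3f}, df = {df:.1f}, p = {p_value:.3f}")
```

Welch's version of the test does not assume equal variances in the two groups, which suits a comparison between a large kept sample and a small dropped one.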

Appendix 2: Weights and Coefficients from the Fertility and CPI Regressions

See Tables 8 and 9.

Table 8 Weights and coefficients in terms of the variables building the SIGI for fertility

Table 9 Weights and coefficients in terms of the variables building the SIGI for the CPI

Appendix 3: Reference Years of the Variables

See Table 10.

Table 10 Reference years of the variables

Appendix 4: Sensitivity Analysis for the CPI Variable Selection

This section provides a sensitivity analysis in terms of the variables included in the CPI. Variables with more than 60% missing observations were dropped in the main analysis in Sect. 4. The threshold was selected based on the subjective judgement of the authors, which is not free from error. Therefore, we investigate the effects of different threshold choices on the analysis by dropping, in turn, variables with more than 20, 40, 60 and 80% missing observations.
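The variable selection rule can be sketched as below. The source names and values are hypothetical placeholders, not the actual CPI source variables, and missing observations are represented by None; a variable is kept when its share of missing observations does not exceed the threshold.

```python
def missing_share(values):
    """Share of missing observations (None) in a variable."""
    return sum(v is None for v in values) / len(values)

def select_variables(columns, threshold):
    """Keep variables whose share of missing observations is at most `threshold`."""
    return [name for name, values in columns.items()
            if missing_share(values) <= threshold]

# Hypothetical source variables with 0%, 30%, 50%, 70% and 90% missing values.
sources = {
    "source_a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    "source_b": [1.0, None, 3.0, None, 5.0, None, 7.0, 8.0, 9.0, 10.0],
    "source_c": [None, None, None, None, None, 6.0, 7.0, 8.0, 9.0, 10.0],
    "source_d": [None, None, None, None, None, None, None, 8.0, 9.0, 10.0],
    "source_e": [None, None, None, None, None, None, None, None, None, 10.0],
}

# Stricter (lower) thresholds keep fewer variables.
kept = {t: select_variables(sources, t) for t in (0.2, 0.4, 0.6, 0.8)}
for t, names in kept.items():
    print(f"threshold {t:.0%}: {names}")
```

With the placeholder data, each step up in the threshold admits one more variable, mirroring how the composition of the reweighted index changes with the threshold.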

Table 11 shows the results of the linear regressions of the CPIs with different thresholds on the SIGIs along with other covariates. More gender inequality is associated with more corruption in all regressions. The PCRs find a statistically significant association between gender inequality and corruption for all thresholds, but the PLSRs find only marginally significant associations for the 40 and 60% thresholds. Note that the coefficients, R-squared values and estimated MSEPs are not comparable across columns, since the outcome variables include different variables and have different weights. Table 12 shows the coefficients in terms of the variables in the SIGI. The coefficients of the PCRs are generally similar to each other, while those of the PLSRs differ considerably across columns. Some PLSR coefficients change sign across thresholds, for example Son preference 4 with the 40 and 60% thresholds. Statistical significance does not change much across columns for either the PLSRs or the PCRs. Table 13 shows the weights. The PCA weights do not change across columns, while the PLS weights differ considerably across thresholds, in analogy to the findings on the PLSR coefficients. The weights for the CPIs are reported in Table 14. Note that variables with more than 80% missing values are not reported in the table. The CPIs include fewer variables with stricter thresholds; the number of variables ranges from 9 to 18. The weights from the main analysis (60% threshold) differ only slightly from the weights under the other thresholds. However, variables with positive weights drop out more often under stricter thresholds, so that the majority of the PCA weights with the 20 and 40% thresholds and of the PLS weights with the 20% threshold are negative. Since CU1999, which has positive weights, dominates the other variables in these cases, the weights are not rotated. The negatively weighted variables, however, work against the interpretation of the CPI that a high value represents low corruption. This problem is not too serious with less strict thresholds, as shown in the main analysis.

Table 11 Linear regressions for the CPI with various thresholds

Table 12 Coefficients in terms of the variables building the SIGI for the CPI with various thresholds

Table 13 Weights of the SIGI for the CPI with various thresholds

Table 14 Weights of the CPI with various thresholds

In conclusion, using different thresholds does not result in major changes to the findings in Sect. 4. Higher gender inequality is associated with more corruption; the association is significant in the PCRs for all thresholds, and marginally significant or insignificant in the PLSRs. Nevertheless, the PLS weights, the PLSR coefficients in terms of the variables in the SIGI, the CPI weights and the variables included in the CPIs do change notably with different thresholds.

About this article

Cite this article

Yoon, J., Klasen, S. An Application of Partial Least Squares to the Construction of the Social Institutions and Gender Index (SIGI) and the Corruption Perception Index (CPI). Soc Indic Res 138, 61–88 (2018). https://doi.org/10.1007/s11205-017-1655-8

Accepted: 14 May 2017
Published: 25 May 2017
Issue Date: July 2018
DOI: https://doi.org/10.1007/s11205-017-1655-8
