Factor analysis for paired ranked data with application on parent–child value orientation preference data

  • Original Paper

Computational Statistics

Abstract

Ranking data appear in everyday life and arise in many fields of study, such as marketing, psychology and politics. Very often, the key objective of analyzing and modeling ranking data is to identify the underlying factors that affect individuals’ choice behavior. Factor analysis for ranking data is one of the most widely used methods for this purpose. Recently, Yu et al. [J R Stat Soc Ser A (Statistics in Society) 168:583–597, 2005] developed factor models for ranked data in which each individual is asked to rank a set of items. However, paired ranked data may arise when the same set of items is ranked by a pair of judges, such as a couple in a family. This paper extends the factor model to accommodate such paired ranked data. The Monte Carlo expectation-maximization algorithm was used for parameter estimation, with the E-step implemented via the Gibbs sampler. For model assessment and selection, a tailor-made method called the bootstrap predictive checks approach was proposed. Simulation studies were conducted to illustrate the proposed estimation and model selection methods. The proposed method was applied to parent–child partially ranked data collected from a value priorities survey carried out in the United States.
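As a rough illustration of the Gibbs-sampling E-step mentioned above, the following Python sketch redraws the latent utilities of one judge so that they stay consistent with an observed complete ranking. It is not the authors' implementation: the function names (`truncated_normal`, `initial_utilities`, `gibbs_sweep`) are hypothetical, a larger utility is assumed to mean a more preferred item, the error covariance \({\varvec{\varPsi }}\) of the Appendix is assumed diagonal so that each utility is univariate normal given the factor scores, and both the conditional draws of the factor scores and the handling of partial rankings are omitted.

```python
import math
import random
from statistics import NormalDist


def truncated_normal(mean, sd, lower, upper, rng):
    """Draw from N(mean, sd^2) truncated to [lower, upper] by inverse-CDF sampling."""
    dist = NormalDist(mean, sd)
    p_lo, p_hi = dist.cdf(lower), dist.cdf(upper)
    p = p_lo + rng.random() * (p_hi - p_lo)
    p = min(max(p, 1e-15), 1.0 - 1e-15)   # inv_cdf rejects p = 0 and p = 1
    x = dist.inv_cdf(p)
    return min(max(x, lower), upper)      # guard against round-off in the far tails


def initial_utilities(order):
    """Utilities consistent with `order` (item indices, most preferred first)."""
    u = [0.0] * len(order)
    for pos, item in enumerate(order):
        u[item] = float(len(order) - pos)
    return u


def gibbs_sweep(u, order, means, sds, rng):
    """One Gibbs sweep over a single judge's latent utilities, updated in place.

    Hypothetical sketch: given the factor scores, utility j is taken to be
    N(means[j], sds[j]^2), where means[j] stands for the j-th entry of
    mu_c + Lambda^T f_ic and sds[j]^2 for the j-th diagonal entry of Psi.
    Each utility is redrawn between its neighbours in the observed ranking.
    """
    for pos, item in enumerate(order):
        upper = u[order[pos - 1]] if pos > 0 else math.inf
        lower = u[order[pos + 1]] if pos + 1 < len(order) else -math.inf
        u[item] = truncated_normal(means[item], sds[item], lower, upper, rng)
    return u


rng = random.Random(0)
order = [2, 0, 3, 1]      # item 2 ranked first, item 1 ranked last
u = initial_utilities(order)
for _ in range(100):
    gibbs_sweep(u, order, means=[0.0, 0.5, -0.3, 1.0], sds=[1.0] * 4, rng=rng)
print([round(x, 3) for x in u])
```

Because each coordinate is drawn between the utilities of its neighbours in the ranking, every sweep leaves the observed order intact. Inverse-CDF sampling through `statistics.NormalDist` keeps the sketch dependency-free, at the cost of accuracy far in the tails.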

Fig. 1
Fig. 2

Notes

  1. The derivation applies the properties of the bivariate normal distribution. See Section 4.3.8 “Moments and Absolute Moments” in Hutchinson and Lai (1990).

  2. In our study, the value of \(h\) is monitored, and so far no violation of the condition \(h > 0\) has been detected.

  3. If \({\varvec{A}}\) and \({\varvec{D}}\) are symmetric, and \({\varvec{A}}\) and \({\varvec{E}}\) (defined below) are invertible, then

    $$\begin{aligned} \begin{pmatrix} {\varvec{A}}&\quad {\varvec{B}} \\ {\varvec{B}}^T&\quad {\varvec{D}} \\ \end{pmatrix}^{-1} = \begin{pmatrix} {\varvec{A}}^{-1}+{\varvec{F}}{\varvec{E}}^{-1}{\varvec{F}}^T&\quad -{\varvec{F}}{\varvec{E}}^{-1} \\ -{\varvec{E}}^{-1}{\varvec{F}}^T&\quad {\varvec{E}}^{-1} \\ \end{pmatrix}, \end{aligned}$$

    where \({\varvec{E}}={\varvec{D}}-{\varvec{B}}^T{\varvec{A}}^{-1}{\varvec{B}}\) and \({\varvec{F}}={\varvec{A}}^{-1}{\varvec{B}}\).
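The block-inverse identity in Note 3 can be checked numerically. The hypothetical helper `block_inverse` below applies the formula with NumPy and compares the result with a direct inverse of a random symmetric positive-definite matrix, for which \({\varvec{A}}\) and the Schur complement \({\varvec{E}}\) are guaranteed to be invertible.

```python
import numpy as np


def block_inverse(A, B, D):
    """Invert [[A, B], [B^T, D]] via the Schur complement E = D - B^T A^{-1} B.

    A and D are assumed symmetric, and A and E invertible, as in Note 3.
    """
    A_inv = np.linalg.inv(A)
    F = A_inv @ B                        # F = A^{-1} B
    E_inv = np.linalg.inv(D - B.T @ F)   # E^{-1}
    return np.block([
        [A_inv + F @ E_inv @ F.T, -F @ E_inv],
        [-E_inv @ F.T, E_inv],
    ])


# Compare with a direct inverse of a random symmetric positive-definite matrix.
rng = np.random.default_rng(1)
G = rng.standard_normal((5, 5))
M = G @ G.T + 5 * np.eye(5)
A, B, D = M[:2, :2], M[:2, 2:], M[2:, 2:]
print(np.allclose(block_inverse(A, B, D), np.linalg.inv(M)))  # True
```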

References

  • Barnes SH, Kaase M, Allerbeck KR, Farah BG, Heunks F, Inglehart R, Jennings MK, Klingemann HD, Marsh A, Rosenmayr L (1979) Political action: mass participation in five western democracies. Sage, Beverly Hills, CA

  • Barnes SH, Kaase M (1999) Political action: an eight nation study, 1973–1976 (Computer file). ICPSR version. Conducted by University of Michigan, Survey Research Center. ICPSR

  • Blackwell D (1947) Conditional expectation and unbiased sequential estimation. Ann Math Stat 18:105–110

  • Bock RD, Böckenholt U (2005) Nominal categories model. In: Kempf-Leonard K (ed) Encyclopedia of social measurement. Elsevier, Amsterdam

  • Bock RD, Gibbons RD (1996) High-dimensional multivariate probit analysis. Biometrics 52:1183–1194

  • Böckenholt U (1996) Analysing multiattribute ranking data: joint and conditional approaches. Br J Math Stat Psychol 49:57–78

  • Booth JG, Hobert JP (1999) Maximizing generalized linear mixed model likelihoods with an automated Monte Carlo EM algorithm. J R Stat Soc Ser B (Methodological) 61:265–285

  • Chan JSK, Kuk AYC (1997) Maximum likelihood estimation for probit-linear mixed models with correlated random effects. Biometrics 53:86–97

  • Cudeck R (1988) Multiplicative models and MTMM matrices. J Educ Stat 13:131–147

  • Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B (Methodological) 39:1–38

  • Dunham W (1990) Journey through genius: the great theorems of mathematics. Wiley, New York

  • Gelfand AE, Smith AFM (1990) Sampling-based approaches to calculating marginal densities. J Am Stat Assoc 85:398–409

  • Gelman A, Meng XL, Stern HS (1996) Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica 6:733–807

  • Geman S, Geman D (1984) Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans Pattern Anal Mach Intell 6:721–741

  • Geweke J (1991) Efficient simulation from the multivariate normal and student-t distributions subject to linear constraints. In: Computer science and statistics: proceedings of the twenty-third symposium on the interface pp 571–578

  • Hajivassiliou V, McFadden D (1990) The method of simulated scores for the estimation of LDV models with an application to external debt crises. Cowles Foundation Yale University discussion paper 967

  • Hutchinson TP, Lai CD (1990) Continuous bivariate distributions, emphasising applications. Rumsby Scientific, Adelaide

  • Inglehart R (1977) The silent revolution: changing values and political styles among western publics. Princeton University Press, Princeton

  • Keane MP (1994) A computationally practical simulation estimator for panel data. Econometrica 62:95–116

  • Louis TA (1982) Finding the observed information matrix when using the EM algorithm. J R Stat Soc Ser B (Methodological) 44:226–233

  • Maydeu-Olivares A, Böckenholt U (2005) Structural equation modeling of paired comparison and ranking data. Psychol Methods 10:285–304

  • McLachlan GJ, Krishnan T (1997) The EM algorithm and extensions. Wiley, New York

  • Meng XL, Schilling S (1996) Fitting full-information item factor models and an empirical investigation of bridge sampling. J Am Stat Assoc 91:1254–1267

  • Meng XL, Wong WH (1996) Simulating ratios of normalizing constants via a simple identity: a theoretical exploration. Statistica Sinica 6:831–860

  • Ogasawara H (2009) Asymptotic expansions in the singular value decomposition for cross covariance and correlation under nonnormality. Ann Inst Stat Math 61:995–1017

  • Rao CR (1965) Linear statistical inference and its applications. Wiley, London

  • Rubin DB (1984) Bayesianly justifiable and relevant frequency calculations for the applied statistician. Ann Stat 12:1151–1172

  • Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464

  • Thurstone LL (1947) Multiple factor analysis. University of Chicago Press, Chicago

  • Tsai RC, Yao G (2000) Testing Thurstonian case V ranking models using posterior predictive checks. Br J Math Stat Psychol 53:275–292

  • van Dyk D (2000) Nesting EM algorithms for computational efficiency. Statistica Sinica 10:203–225

  • Wegelin JA, Packer A, Richardson TS (2006) Latent models for cross-covariance. J Multivar Anal 97:79–102

  • Wei GCG, Tanner MA (1990) A Monte Carlo implementation of the EM algorithm and the poor man’s data augmentation algorithms. J Am Stat Assoc 85:699–704

  • Yao KG, Böckenholt U (1999) Bayesian estimation of Thurstonian ranking models based on the Gibbs sampler. Br J Math Stat Psychol 52:79–92

  • Yu PLH (2000) Bayesian analysis of order-statistics models for ranking data. Psychometrika 65:281–299

  • Yu PLH, Lam KF, Lo SM (2005) Factor analysis for ranked data with application to a job selection attitude survey. J R Stat Soc Ser A (Statistics in Society) 168:583–597

Author information

Corresponding author

Correspondence to Paul H. Lee.

Additional information

The research of Philip L. H. Yu was supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. HKU 7473/05H). We thank the two anonymous referees for their helpful suggestions for improving this article.

Appendix

The complete-data log-likelihood function is:

$$\begin{aligned}&= \! - \frac{n}{2} \; \left\{ {\sum \limits _{c=1}^{2}} \log \left| {\varvec{\varPsi }} \right| + \log \left| {\varvec{\varSigma }}_{f} \right| \right\} \\&\!- \frac{1}{2}{\sum \limits _{i=1}^{n}} \!\left\{ {\sum \limits _{c=1}^{2}} tr \!\left[ {\varvec{\varPsi }}^{-1} ({\varvec{U}}_{ic} \!\!-\!\! {\varvec{\mu }}_{c} \!\!-\!\! {\varvec{\varLambda }}^{T} {\varvec{f}}_{ic}) ({\varvec{U}}_{ic} \!-\! {\varvec{\mu }}_{c} \!-\! {\varvec{\varLambda }}^{T} {\varvec{f}}_{ic})^{T} \right] \!+\! tr \left[ {\varvec{\varSigma }}_{f}^{-1} {\varvec{f}}_{i} {\varvec{f}}_{i}^{T} \right] \right\} , \end{aligned}$$

where \({\varvec{f}}_{i}=({\varvec{f}}_{i1}^T,{\varvec{f}}_{i2}^T)^T\) and

$$\begin{aligned} {\varvec{\varSigma }}_{f}= \begin{pmatrix} {\varvec{I}}_d\quad&\quad {\varvec{\rho }}_{12}\\ {\varvec{\rho }}_{12}&\quad {\varvec{I}}_d\\ \end{pmatrix}, \end{aligned}$$

assuming that \({\varvec{\rho }}_{12}\) is a \(d \times d\) diagonal matrix whose (\(\ell ,\ell \))th element is \({\rho }_{12,\ell }\). Then \(\log |{\varvec{\varSigma }}_{f}| = \log \prod _{\ell =1}^d (1-\rho ^2_{12,\ell }) = \sum _{\ell =1}^d \log (1-\rho ^2_{12,\ell })\).
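To make the model behind this log-likelihood concrete, the sketch below simulates paired rankings from \({\varvec{U}}_{ic} = {\varvec{\mu }}_{c} + {\varvec{\varLambda }}^{T} {\varvec{f}}_{ic} + {\varvec{\epsilon }}_{ic}\), with \({\varvec{f}}_{i} \sim N({\varvec{0}}, {\varvec{\varSigma }}_{f})\) and \({\varvec{\epsilon }}_{ic} \sim N({\varvec{0}}, {\varvec{\varPsi }})\), as implied by the complete-data log-likelihood, and checks the determinant identity above. Beyond what is stated here, it assumes that \({\varvec{\varPsi }}\) is diagonal, that a larger utility means a higher rank, and that rankings are complete. The function names and parameter values are illustrative only, and no identification constraints are imposed.

```python
import numpy as np


def factor_covariance(rho):
    """Sigma_f = [[I_d, diag(rho)], [diag(rho), I_d]] for the paired factor scores."""
    d = len(rho)
    R = np.diag(rho)
    return np.block([[np.eye(d), R], [R, np.eye(d)]])


def simulate_paired_rankings(n, mu, Lam, psi, rho, rng):
    """Simulate n paired rankings from U_ic = mu_c + Lam^T f_ic + e_ic.

    mu: (2, J) item means for judges c = 1, 2; Lam: (d, J) loadings shared by
    both judges; psi: (J,) diagonal of Psi (assumed diagonal here); rho: (d,)
    factor correlations between paired judges. Returns ranks of shape
    (n, 2, J), where entry j is the rank of item j (1 = largest utility).
    """
    d, J = Lam.shape
    f = rng.multivariate_normal(np.zeros(2 * d), factor_covariance(rho), size=n)
    f = f.reshape(n, 2, d)                         # f[i, c] is f_ic
    eps = rng.standard_normal((n, 2, J)) * np.sqrt(psi)
    U = mu[None, :, :] + f @ Lam + eps             # (n, 2, J) latent utilities
    order = np.argsort(-U, axis=-1)                # items, most preferred first
    return np.argsort(order, axis=-1) + 1          # rank assigned to each item


rho = np.array([0.6, -0.3])                        # hypothetical factor correlations
sign, logdet = np.linalg.slogdet(factor_covariance(rho))
print(np.isclose(logdet, np.sum(np.log(1 - rho**2))))  # True

rng = np.random.default_rng(0)
Lam = rng.standard_normal((2, 4))                  # hypothetical loadings, d = 2, J = 4
ranks = simulate_paired_rankings(200, np.zeros((2, 4)), Lam, np.ones(4), rho, rng)
print(ranks[0])                                    # ranks given by the two judges of pair 1
```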

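Also, applying the block-inverse formula of Note 3 with \({\varvec{A}}={\varvec{D}}={\varvec{I}}_d\) and \({\varvec{B}}={\varvec{\rho }}_{12}\),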

$$\begin{aligned} {\varvec{\varSigma }}_{f}^{-1} = \begin{pmatrix} {\varvec{I}}_d+{\varvec{\rho }}_{12}({\varvec{I}}_d-{\varvec{\rho }}_{12}^2)^{-1}{\varvec{\rho }}_{12}&\quad -{\varvec{\rho }}_{12}({\varvec{I}}_d-{\varvec{\rho }}_{12}^2)^{-1} \\ -({\varvec{I}}_d-{\varvec{\rho }}_{12}^2)^{-1}{\varvec{\rho }}_{12}&\quad ({\varvec{I}}_d-{\varvec{\rho }}_{12}^2)^{-1} \\ \end{pmatrix}, \end{aligned}$$

therefore,

$$\begin{aligned} \sum _{i=1}^n tr \left[ {\varvec{\varSigma }}_{f}^{-1} {\varvec{f}}_{i} {\varvec{f}}_{i}^{T} \right]&= tr\left[({\varvec{I}}_d+{\varvec{\rho }}_{12}({\varvec{I}}_d-{\varvec{\rho }}_{12}^2)^{-1}{\varvec{\rho }}_{12})\sum _{i=1}^n {\varvec{f}}_{i1}{\varvec{f}}_{i1}^T \right]\\&-tr\left[{\varvec{\rho }}_{12}({\varvec{I}}_d-{\varvec{\rho }}_{12}^2)^{-1} \sum _{i=1}^n{\varvec{f}}_{i2}{\varvec{f}}_{i1}^T\right]\\&-tr\left[({\varvec{I}}_d-{\varvec{\rho }}_{12}^2)^{-1}{\varvec{\rho }}_{12} \sum _{i=1}^n{\varvec{f}}_{i1}{\varvec{f}}_{i2}^T\right]\\&+tr\left[({\varvec{I}}_d-{\varvec{\rho }}_{12}^2)^{-1} \sum _{i=1}^n{\varvec{f}}_{i2}{\varvec{f}}_{i2}^T\right]. \end{aligned}$$

Because \({\varvec{\rho }}_{12}\) is diagonal, this trace separates over the \(d\) factors. Keeping only the terms for the \(\ell \)th factor, which are the only ones that involve \({\rho }_{12,\ell }\), it becomes \(\left(1+\frac{{\rho }_{12,\ell }^2}{1-{\rho }_{12,\ell }^2}\right)\sum _{i=1}^n f_{i1\ell }^2 - \frac{2{\rho }_{12,\ell }}{1-{\rho }_{12,\ell }^2}\sum _{i=1}^n f_{i1\ell }f_{i2\ell } + \frac{1}{1-{\rho }_{12,\ell }^2}\sum _{i=1}^n f_{i2\ell }^2\).
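The closed form of \({\varvec{\varSigma }}_{f}^{-1}\) and the per-factor decomposition of the trace can both be verified numerically. In the sketch below (helper names hypothetical, factor scores drawn at random purely for the check), `factor_precision` builds \({\varvec{\varSigma }}_{f}^{-1}\) from the block formula and `trace_term_per_factor` evaluates the per-factor expression for each \(\ell \); their sum over \(\ell \) should match a direct computation of \(\sum _{i=1}^n tr[{\varvec{\varSigma }}_{f}^{-1} {\varvec{f}}_{i} {\varvec{f}}_{i}^{T}]\).

```python
import numpy as np


def factor_precision(rho):
    """Closed-form Sigma_f^{-1} when rho_12 = diag(rho)."""
    d = len(rho)
    R = np.diag(rho)
    K = np.diag(1.0 / (1.0 - rho**2))            # (I_d - rho_12^2)^{-1}
    return np.block([[np.eye(d) + R @ K @ R, -R @ K], [-K @ R, K]])


def trace_term_per_factor(f1, f2, rho):
    """Per-factor terms whose sum over l equals sum_i tr[Sigma_f^{-1} f_i f_i^T].

    f1, f2: (n, d) factor scores of the two judges; rho: (d,) diagonal of rho_12.
    """
    s11 = np.sum(f1**2, axis=0)                  # sum_i f_{i1l}^2
    s12 = np.sum(f1 * f2, axis=0)                # sum_i f_{i1l} f_{i2l}
    s22 = np.sum(f2**2, axis=0)                  # sum_i f_{i2l}^2
    k = 1.0 / (1.0 - rho**2)
    return (1.0 + rho**2 * k) * s11 - 2.0 * rho * k * s12 + k * s22


rng = np.random.default_rng(2)
n, rho = 50, np.array([0.5, -0.2, 0.7])
d = len(rho)
f1, f2 = rng.standard_normal((n, d)), rng.standard_normal((n, d))
sigma_f = np.block([[np.eye(d), np.diag(rho)], [np.diag(rho), np.eye(d)]])
f = np.hstack([f1, f2])                          # row i is f_i = (f_i1^T, f_i2^T)
direct = np.einsum("ij,jk,ik->", f, np.linalg.inv(sigma_f), f)
print(np.allclose(factor_precision(rho), np.linalg.inv(sigma_f)))    # True
print(np.isclose(direct, trace_term_per_factor(f1, f2, rho).sum()))  # True
```

Since \({\varvec{\rho }}_{12}\) is diagonal, the top-left block \({\varvec{I}}_d+{\varvec{\rho }}_{12}({\varvec{I}}_d-{\varvec{\rho }}_{12}^2)^{-1}{\varvec{\rho }}_{12}\) simplifies to \(({\varvec{I}}_d-{\varvec{\rho }}_{12}^2)^{-1}\), which is what the first block of `factor_precision` evaluates to numerically.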

Hence, the derivative of the complete-data log-likelihood with respect to \({\rho }_{12,\ell }\) is

$$\begin{aligned}&\frac{n\rho _{12,\ell }}{1-\rho _{12,\ell }^2} - \frac{1}{2}\Bigg (\frac{(1-\rho _{12,\ell }^2)\times 2\rho _{12,\ell }+\rho _{12,\ell }^2\times 2\rho _{12,\ell }}{(1-\rho _{12,\ell }^2)^2}\sum _{i=1}^n f_{i1\ell }^2\\&\qquad -\frac{2\left[(1-\rho _{12,\ell }^2)+2\rho _{12,\ell }^2\right]}{(1-\rho _{12,\ell }^2)^2} \sum _{i=1}^n f_{i1\ell }f_{i2\ell } + \frac{2\rho _{12,\ell }}{(1-\rho _{12,\ell }^2)^2} \sum _{i=1}^n f_{i2\ell }^2 \Bigg ) \\&\quad = -\frac{1}{(1-\rho _{12,\ell }^2)^2}\Bigg [n \rho _{12,\ell }^{3} - \left( \sum _{i=1}^{n} f_{i1\ell } f_{i2\ell } \right) \rho _{12,\ell }^{2} + \left( \sum _{i=1}^{n}\left[ f_{i1\ell }^{2} + f_{i2\ell }^{2} \right] - n \right) \rho _{12,\ell }\\&\qquad -\sum _{i=1}^{n} f_{i1\ell } f_{i2\ell } \Bigg ]. \end{aligned}$$

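The bracketed expression is a cubic polynomial in \(\rho _{12,\ell }\) with positive leading coefficient \(n\), and the derivative equals minus this cubic divided by the positive factor \((1-\rho _{12,\ell }^2)^2\). At \(\rho _{12,\ell }=-1\) the cubic equals \(-\sum _{i=1}^n (f_{i1\ell }+f_{i2\ell })^2 \le 0\), and at \(\rho _{12,\ell }=1\) it equals \(\sum _{i=1}^n (f_{i1\ell }-f_{i2\ell })^2 \ge 0\), so a root always lies in \([-1,1]\). Therefore, if the cubic has only one real root, the derivative is positive below that root and negative above it, and the root maximizes the complete-data log-likelihood.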
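A possible way to carry out this update numerically is sketched below. `rho_objective` collects the terms of the complete-data log-likelihood that involve \(\rho _{12,\ell }\), `rho_score` is the closed-form derivative above (checked against a central finite difference), and `update_rho` takes the real roots of the cubic with `numpy.roots` and keeps the one in \((-1,1)\) with the largest objective, which also handles the case of several real roots. In an MCEM M-step the three sums would presumably be replaced by their Monte Carlo averages over the Gibbs draws; that, the helper names and the simulated data are assumptions of this sketch.

```python
import numpy as np


def rho_objective(r, n, s11, s12, s22):
    """Terms of the complete-data log-likelihood that involve rho_{12,l} = r."""
    k = 1.0 / (1.0 - r**2)
    quad = (1.0 + r**2 * k) * s11 - 2.0 * r * k * s12 + k * s22
    return -0.5 * n * np.log(1.0 - r**2) - 0.5 * quad


def rho_score(r, n, s11, s12, s22):
    """Closed-form derivative of rho_objective with respect to r."""
    cubic = n * r**3 - s12 * r**2 + (s11 + s22 - n) * r - s12
    return -cubic / (1.0 - r**2) ** 2


def update_rho(n, s11, s12, s22):
    """Real root of the cubic in (-1, 1) with the largest objective value."""
    roots = np.roots([n, -s12, s11 + s22 - n, -s12])   # highest degree first
    real = roots[np.abs(roots.imag) < 1e-9].real
    inside = real[(real > -1.0) & (real < 1.0)]
    return max(inside, key=lambda r: rho_objective(r, n, s11, s12, s22))


# Hypothetical data: standardized factor scores of n pairs with correlation 0.6.
rng = np.random.default_rng(3)
n, true_rho = 5000, 0.6
z = rng.multivariate_normal([0.0, 0.0], [[1.0, true_rho], [true_rho, 1.0]], size=n)
s11, s12, s22 = np.sum(z[:, 0] ** 2), np.sum(z[:, 0] * z[:, 1]), np.sum(z[:, 1] ** 2)

# Central finite differences should agree with the closed-form derivative.
h = 1e-6
for r in (-0.5, 0.1, 0.8):
    numeric = (rho_objective(r + h, n, s11, s12, s22)
               - rho_objective(r - h, n, s11, s12, s22)) / (2 * h)
    print(r, numeric, rho_score(r, n, s11, s12, s22))

rho_hat = update_rho(n, s11, s12, s22)
print(rho_hat)  # should be near the simulated value 0.6
```

Comparing objective values rather than taking the first admissible root selects the global maximizer, because the objective tends to \(-\infty\) as \(\rho _{12,\ell }\to \pm 1\) whenever \(\sum _{i}(f_{i1\ell }\mp f_{i2\ell })^2 > 0\).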

About this article

Cite this article

Yu, P.L.H., Lee, P.H. & Wan, W.M. Factor analysis for paired ranked data with application on parent–child value orientation preference data. Comput Stat 28, 1915–1945 (2013). https://doi.org/10.1007/s00180-012-0387-0

  • DOI: https://doi.org/10.1007/s00180-012-0387-0
