Human ratings take time: A hierarchical facets model for the joint analysis of ratings and rating times

  • Original Manuscript
  • Behavior Research Methods

Abstract

Performance assessments increasingly utilize onscreen or internet-based technology to collect human ratings. One of the benefits of onscreen ratings is the automatic recording of rating times along with the ratings. Considering rating times as an additional data source can provide a more detailed picture of the rating process and improve the psychometric quality of the assessment outcomes. However, currently available models for analyzing performance assessments do not incorporate rating times. The present research aims to fill this gap and advance a joint modeling approach, the “hierarchical facets model for ratings and rating times” (HFM-RT). The model includes two examinee parameters (ability and time intensity) and three rater parameters (severity, centrality, and speed). The HFM-RT successfully recovered examinee and rater parameters in a simulation study and yielded superior reliability indices. A real-data analysis of English essay ratings collected in a high-stakes assessment context revealed that raters exhibited considerably different speed measures, spent more time on high-quality than low-quality essays, and tended to rate essays faster with increasing severity. However, due to the significant heterogeneity of examinees’ writing proficiency, the improvement in the assessment’s reliability using the HFM-RT was not salient in the real-data example. The discussion focuses on the advantages of accounting for rating times as a source of information in rating quality studies and highlights perspectives from the HFM-RT for future research on rater cognition.

Data availability

The JAGS codes for the HFM-RT and the FM-SC, along with a simulated dataset, are publicly accessible on the Open Science Framework (https://osf.io/nhprs/).

Notes

  1. These settings were partially informed by the real-data study results reported later.

  2. Thinning is inefficient in reducing autocorrelation (Link & Eaton, 2012).

  3. We also compared the parameter estimates before and after thinning the Markov chain (with a thinning interval of 10). The estimates proved highly concordant. Therefore, we opted for the Markov chain without thinning (the larger effective sample size leads to more accurate inference).

  4. The rating scale and its categories are not publicly available.

  5. The bivariate distributions between response time and rating scores are available at https://osf.io/nhprs/.

  6. Due to the enormous dataset size, each analysis took about 240 hours on a workstation with 2.4 GHz CPUs and 384 GB of RAM.

  7. Because true parameter values are unavailable in empirical studies, the assessment’s reliability was calculated as \(1-\overline{SE(\hat{\theta})^{2}}/\hat{\sigma}_{\theta}^{2}\). The high reliabilities from both models and the consequently small reliability difference can be explained by the pronounced heterogeneity of examinees’ English writing proficiency (see top left panel of Fig. 3).

  8. The expected rating time can be calculated as follows: \(\hat{T}_{nk}=\exp\left(\hat{\beta}_{n}-\hat{\zeta}_{k}+\hat{\sigma}_{\epsilon}^{2}/2\right)\).
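
The two formulas in Notes 7 and 8 can be illustrated with a minimal Python sketch. It is not the authors' code (their JAGS code is on the OSF page); the function names and all input values are hypothetical. The sketch assumes, as the formula in Note 8 implies, that the log rating time is normally distributed with mean \(\hat{\beta}_{n}-\hat{\zeta}_{k}\) and variance \(\hat{\sigma}_{\epsilon}^{2}\), and it treats \(\hat{\sigma}_{\theta}^{2}\) as the estimated variance of examinee ability.

```python
import math
from statistics import mean


def expected_rating_time(beta_n, zeta_k, sigma_eps):
    """Expected rating time for examinee n and rater k (Note 8).

    beta_n    : estimated time intensity of examinee n
    zeta_k    : estimated speed of rater k
    sigma_eps : estimated residual SD of the log rating times
    Returns exp(beta_n - zeta_k + sigma_eps^2 / 2), the mean of a
    lognormal variable with those log-scale parameters.
    """
    return math.exp(beta_n - zeta_k + sigma_eps ** 2 / 2)


def empirical_reliability(standard_errors, ability_variance):
    """Reliability as 1 - mean(SE^2) / ability variance (Note 7).

    standard_errors  : standard errors of the ability estimates
    ability_variance : estimated variance of examinee ability
    """
    mean_error_variance = mean(se ** 2 for se in standard_errors)
    return 1 - mean_error_variance / ability_variance


# Hypothetical values, chosen only to show how the functions are called.
t_hat = expected_rating_time(beta_n=5.0, zeta_k=0.4, sigma_eps=0.6)
rel = empirical_reliability([0.30, 0.35, 0.28], ability_variance=1.2)
print(t_hat, rel)
```

Because the residual variance enters through the term \(\hat{\sigma}_{\epsilon}^{2}/2\), the expected time is larger than \(\exp(\hat{\beta}_{n}-\hat{\zeta}_{k})\), which is the median of the same lognormal distribution.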

Author information

Corresponding author

Correspondence to Kuan-Yu Jin.

About this article

Cite this article

Jin, KY., Eckes, T. Human ratings take time: A hierarchical facets model for the joint analysis of ratings and rating times. Behav Res (2023). https://doi.org/10.3758/s13428-023-02259-2

  • DOI: https://doi.org/10.3758/s13428-023-02259-2
