Comparative Test Evaluation: Methods and Challenges

  • Original Paper
Journal of Gambling Studies

Abstract

The present paper has three objectives. First, methods for comparing alternative tests for the purpose of replacement of one test with a second presumably superior test are described. Second, problems in the interpretation of the relevance of different diagnostic thresholds (thresholds of positivity) that define who is and who is not a disordered gambler are examined and a potential solution offered in the form of a common quantitative measure of the risk of being a disordered gambler. Third, alternative methodologies are described as potential solutions to the lack of a gold or reference standard in the evaluation of new tests.

Fig. 1. Adapted from Lord et al. (2011)

Fig. 2

Notes

  1. In this notational scheme, * = multiplication and / = division. Sometimes / is used as a synonym for “per”.

  2. It may be argued that wanted help is not a true reference standard. Such arguments are subjective, however; reference standards by definition involve a degree of subjectivity, and some disagreement is to be expected. Wanted help appears to be a plausible and credible definition for purposes of large epidemiologic surveys, and in theory the question can be resolved through empirical analysis relative to other proposed reference standards to determine which reference standard is “best”. Moreover, if the same reference standard is applied, then differences in accuracy are empirical and observable.

  3. A test with high sensitivity (specificity) but poor specificity (sensitivity) may be supplemented by applying a second test with high specificity (sensitivity) to all positive (negative) outcomes on the first test, thus eliminating false positives (negatives) and increasing the overall specificity (sensitivity) of the testing process.

  4. Most presentations focus on the overlap in CI; however, the simpler ‘contained in’ language presented here is not uncommon and is easy to understand. Many readers may be unfamiliar with the expansive and sometimes complex discussions in the statistical literature on CI.

  5. The results presented here will differ from those of Williams and Volberg, since the reconstruction was based on the original thresholds for CPGI (≥ 8), NODS (≥ 5), and SOGS (≥ 5), whereas the authors employed a threshold of ≥ 3 for all tests.

  6. The authors note that the procedure likely attracted greater numbers of [disordered] gamblers and heavier gamblers to participate, hence an increase in average levels of severity among gamblers in the survey. Since heavier gamblers more closely resemble disordered gamblers, the false positive rate (fpr) will increase with severity.

  7. Most test evaluation studies treat false negatives and false positives as equally important. Although this is statistically convenient for purposes of evaluation, in most situations and settings one or the other error will be of greater importance. Hence, initial thresholds reported in evaluation studies are unlikely to be applicable to other settings and populations, and the need for replication is again emphasized.

  8. The expected decrease in LR+ reflects the greater number of true and false positive results that corresponds to an increase in P. The LR has been defined as the weight of the test evidence. The expected higher number of positive tests therefore carries less diagnostic information or weight, hence a decrease in LR+. Similar logic applies to LR− and negative test results.

  9. In the practice setting, the clinician acquires diagnostic information from a number of sources prior to applying a test for disordered gambling. Sources might include information from taking a clinical history (e.g., weekly gambling), information from a significant other, one or more supplemental tests (e.g., the CAGE alcohol screen), and so on. Each piece of information raises or lowers the initial pre-test probability, so that the estimate to which the diagnostic test is applied reflects different prior probabilities, e.g., 30%, for identifiable classes of gamblers.
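The quantities mentioned in Notes 3, 8, and 9 can be illustrated with a minimal Python sketch. It assumes the standard textbook definitions LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity, and the odds form of Bayes' theorem for moving from a pre-test to a post-test probability. The function names and the accuracy values are hypothetical, chosen only for illustration and not taken from the paper. The serial-testing function further assumes that the two tests err independently given true status, which may not hold in practice.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a binary test (standard definitions)."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Update a pre-test probability with a likelihood ratio via odds."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


def serial_confirmation(sens1, spec1, sens2, spec2):
    """Accuracy when test 2 is applied only to positives on test 1 and a
    person is classified positive only if both tests are positive.

    Assumption (not from the paper): the two tests are conditionally
    independent given true status.
    """
    sensitivity = sens1 * sens2
    specificity = spec1 + (1 - spec1) * spec2
    return sensitivity, specificity


# Hypothetical accuracy values, for illustration only.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.90, specificity=0.80)
print("LR+ and LR-:", lr_pos, lr_neg)

# Pre-test probability of 30%, as in the example in Note 9.
print("Post-test probability after a positive result:",
      post_test_probability(0.30, lr_pos))

# A sensitive first test followed by a specific second test (Note 3).
print("Serial (sensitivity, specificity):",
      serial_confirmation(0.90, 0.80, 0.70, 0.95))
```

Under these assumptions, confirming positives with a second, more specific test raises overall specificity at the cost of some sensitivity, which is the trade-off Note 3 describes.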

References

  • Aertgeerts, B., Buntinx, F., Ansoms, S., & Fevery, A. (2001). Screening properties of questionnaires and laboratory tests for the detection of alcohol abuse or dependence in a general practice population. British Journal of General Practice, 51, 206–217.

  • Alberg, A. J., Park, J. W., Hager, B. W., Brock, M. V., & Diener-West, M. (2004). The use of “overall accuracy” to evaluate the validity of screening or diagnostic tests. Journal of General Internal Medicine, 19, 460–465.

  • American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: American Psychiatric Association.

  • Austin, P. C., & Hux, J. E. (2002). A brief note on overlapping confidence intervals. Journal of Vascular Surgery, 36, 194–195.

  • Back, K., Williams, R. J., & Lee, C. (2015). Reliability and validity of three instruments (DSM-IV, CPGI, and PPGM) in the assessment of problem gambling in South Korea. Journal of Gambling Studies, 31, 775–786.

  • Bertens, L. C. M., Broekhuizen, B. D. L., Naaktgeboren, C. A., et al. (2013). Use of expert panels to define the reference standard in diagnostic research: A systematic review of published methods and reporting. PLoS Medicine, 10, e.1–e.17.

  • Biggerstaff, B. G. (2000). Comparing diagnostic tests: A simple graphic using likelihood ratios. Statistics in Medicine, 19, 649–663.

  • Bonett, D. G., & Price, R. M. (2012). Adjusted Wald confidence interval for a difference of binomial proportions based on paired data. Journal of Educational and Behavioral Statistics, 37, 479–488.

  • Bossuyt, P. M., Reitsma, J. B., Bruns, D. E., et al. (2003). Towards complete and accurate reporting of studies of diagnostic accuracy: The STARD initiative. BMJ, 326, 41–44.

  • Boudreau, B., & Poulin, C. (2007). The South Oaks Gambling Screen-Revised Adolescent (SOGS-RA) revisited: A cut-point analysis. Journal of Gambling Studies, 23, 299–308.

  • Boyko, E. J. (1994). Ruling out or ruling in disease with the most sensitive or specific diagnostic test. Medical Decision Making, 14, 175–179.

  • Braun, B., Ludwig, M., Sleczka, P., Buhringer, G., & Kraus, L. (2014). Gamblers seeking treatment: Who does and who doesn’t? Journal of Behavioral Addictions, 3, 189–198.

  • Browne, M., Langham, E., Rawat, V., Greer, N., Li, E., Rose, J., et al. (2016). Assessing gambling-related harm in Victoria: A public health perspective. Melbourne: Victorian Responsible Gambling Foundation.

  • Calado, F., Alexandre, J., & Griffiths, M. D. (2017). Prevalence of adolescent problem gambling: A systematic review of recent research. Journal of Gambling Studies, 33, 397–424.

  • Calado, F., & Griffiths, M. D. (2016). Problem gambling worldwide: An update and systematic review of empirical research (2000–2015). Journal of Behavioral Addictions, 5, 592–613.

  • Challet-Bouju, G., Perrot, B., Romo, et al. (2016). Harmonizing screening for gambling problems in epidemiological surveys—Development of the rapid screener for problem gambling (RSPG). Journal of Behavioral Addictions, 5, 239–250.

  • Chock, C., Irwig, L., Berry, G., & Glasziou, P. (1997). Comparing dichotomous screening tests when individuals negative on both tests are not verified. Journal of Clinical Epidemiology, 50, 1211–1217.

  • Cook, N. R. (2008). Statistical evaluation of prognostic versus diagnostic models: Beyond the ROC curve. Clinical Chemistry, 54, 17–23.

  • Cumming, G., & Fidler, F. (2005). Interval estimates for statistical communication: Problems and possible solutions. In IASE/ISI satellite (pp. 1–7).

  • Currie, S. K., Hodgins, D. C., Wang, J., el-Guebaly, N., Wynne, H., & Chen, S. (2006). Risk of harm among gamblers in the general population as a function of level of participation in gambling activities. Addiction, 101, 570–580.

  • Edgren, R., Castrén, S., Mäkelä, M., et al. (2016). Reliability of instruments measuring at-risk and problem gambling among young individuals: A systematic review covering years 2009–2015. Journal of Adolescent Health, 58, 600–615.

  • Enoe, C., Georgiadis, M. P., & Johnson, W. O. (2000). Estimation of sensitivity and specificity of diagnostic tests and disease prevalence when the true disease state is unknown. Preventive Veterinary Medicine, 45, 61–81.

  • Feinstein, A. R. (1985). Clinical epidemiology: The architecture of clinical research. Philadelphia: Saunders Press.

  • Ferris, J., & Wynne, H. (2001). The Canadian Problem Gambling Index: Final report. Ottawa: Canadian Centre on Substance Abuse.

  • Fleiss, J. L. (1981). Statistical methods for rates and proportions. New York: Wiley.

  • Gallagher, E. J. (1998). Clinical utility of likelihood ratios. Annals of Emergency Medicine, 31, 391–397.

  • Gambino, B. (1997). The correction for bias in prevalence estimation with screening tests. Journal of Gambling Studies, 13, 343–351.

  • Gambino, B. (2005). Interpreting prevalence estimates of pathological gambling: Implications for policy. Journal of Gambling Issues, 14. http://www.camh.net/egambling/issue14/jgi_14_gambino.html.

  • Gambino, B. (2006). Reflections on accuracy. Journal of Gambling Studies, 22, 393–404.

  • Gambino, B. (2012). The validation of screening tests: Meet the new screen same as the old screen? Journal of Gambling Studies, 28, 573–605.

  • Gambino, B. (2014). Setting criterion thresholds for estimating prevalence: What is being validated? Journal of Gambling Studies, 30, 577–607.

  • Gambino, B. (in press). Test performance variation between settings and populations. Journal of Gambling Studies. https://doi.org/10.1007/s10899-017-9728-9.

  • Gebauer, L., LaBrie, R., & Shaffer, H. J. (2010). Optimizing DSM-IV-TR classification accuracy: A brief biosocial screen for detecting current gambling disorders among gamblers in the general household population. Canadian Journal of Psychiatry, 55, 82–90.

  • Gerstein, D., Hoffman, J., Larison, C., Murphy, S., Palmer, A., Chuchro, L., et al. (1999). Gambling impact and behavior study. Report to the National Gambling Impact Study Commission. Chicago: NORC.

  • Goodie, A. S., MacKillop, J., Miller, J. D., et al. (2013). Evaluating the South Oaks Gambling Screen with DSM-IV and DSM-5 criteria: Results from a diverse community sample of gamblers. Assessment, 20, 523–531.

  • Gunther, C. C., Bakke, O., Lydersen, S., & Langaas, M. (2008). Comparison of predictive values from two diagnostic tests in large samples. Statistics No. 9. Department of Mathematical Sciences, Norwegian University of Science and Technology, Trondheim, Norway.

  • Hawkins, D. M., Garrett, J. A., & Stephenson, B. (2001). Some issues in resolution of diagnostic tests using an imperfect gold standard. Statistics in Medicine, 20, 1987–2001.

  • Hayen, A., Macaskill, P., Irwig, L., & Bossuyt, P. (2010). Appropriate statistical methods are required to assess diagnostic tests for replacement, add-on, and triage. Journal of Clinical Epidemiology, 63, 883–891.

  • Henderson, A. R. (1993). Assessing test accuracy and its clinical consequences: A primer for receiver operating characteristic curve analysis. Annals of Clinical Biochemistry, 30, 521–539.

  • Himelhoch, S. S., Miles-McLean, H., Medoff, D. R., et al. (2015). Evaluation of brief screens for gambling disorder in the substance use treatment setting. The American Journal on Addictions, 24, 460–466.

  • Hui, S. L., & Zhou, X. H. (1998). Evaluation of diagnostic tests without gold standards. Statistical Methods in Medical Research, 7, 354–370.

  • Jiménez-Murcia, S., Stinchfield, R., Alvarez-Moya, E., Jaurrieta, N., Bueno, B., Granero, R., et al. (2009). Reliability, validity, and classification accuracy of a Spanish translation of a measure of DSM-IV diagnostic criteria for pathological gambling. Journal of Gambling Studies, 25, 93–104.

  • Johansson, A., Grant, J. E., Kim, S. W., Odlaug, B. L., & Gotestam, G. (2009). Risk factors for problematic gambling: A critical literature review. Journal of Gambling Studies, 25, 67–92.

  • Kessler, R. C., Abelson, J., Demler, O., et al. (2004). Clinical calibration of DSM-IV diagnoses in the World Mental Health (WMH) version of the World Health Organization (WHO) Composite International Diagnostic Interview (WMH-CIDI). International Journal of Methods in Psychiatric Research, 13, 122–139.

  • Knottnerus, J. A., & Buntinx, F. (Eds.). (2009). The evidence base of clinical diagnosis. London: BMJ Publishing Group.

  • Knottnerus, J. A., & Muris, J. W. (2009). Assessment of the accuracy of diagnostic tests: The cross-sectional study. In J. A. Knottnerus & F. Buntinx (Eds.), The evidence base of clinical diagnosis (pp. 42–62). London: BMJ Publishing Group.

  • Kroenke, K., & Spitzer, R. L. (2002). The PHQ-9: A new depression diagnostic and severity measure. Psychiatric Annals, 32, 1–7.

  • Ladouceur, R. (2005). Controlled gambling for pathological gamblers. Journal of Gambling Studies, 21, 49–59.

  • Ladouceur, R., Lachance, S., & Fournier, P. M. (2009). Is control a viable goal in the treatment of pathological gambling? Behaviour Research and Therapy, 47, 189–197.

  • Lesieur, H. R., & Blume, S. B. (1987). South Oaks Gambling Screen (SOGS): A new instrument for the identification of pathological gamblers. American Journal of Psychiatry, 144, 1184–1188.

  • Lichtenstein, M., & Kiefe, C. (1990). Incorporating severity of illness into estimates of likelihood ratios. American Journal of the Medical Sciences, 299, 38–42.

  • Lijmer, J. G., & Bossuyt, P. M. M. (2009). Diagnostic testing and prognosis: The randomized controlled trial in test evaluation research. In J. A. Knottnerus & F. Buntinx (Eds.), The evidence base of clinical diagnosis (pp. 63–82). London: BMJ Publishing Group.

  • Lord, S. J., Staub, L. P., Bossuyt, P., & Irwig, L. M. (2011). Target practice: Choosing target conditions for test accuracy studies that are relevant to clinical practice. BMJ, 343, 1–5.

  • Mackinnon, A. (2000). A spreadsheet for the calculation of comprehensive statistics for the assessment of diagnostic tests and inter-rater agreement. Computers in Biology and Medicine, 30, 127–134.

  • McMillen, J., & Wenzel, M. (2006). Measuring problem gambling: Assessment of three prevalence screens. International Gambling Studies, 6, 147–174.

  • Mercaldo, N. D., Zhou, X., & Lau, K. F. (2005, December). Confidence intervals for predictive values using data from a case control study. UW Biostatistics Working Paper Series. Working Paper 271.

  • Merkouris, S. S., Thomas, S. A., Browning, C. J., & Dowling, N. A. (2016). Predictors of outcomes of psychological treatments for disordered gambling: A systematic review. Clinical Psychology Review, 48, 7–31.

  • Miettinen, O. S., & Nurminen, M. (1985). Comparative analysis of two rates. Statistics in Medicine, 4, 213–226.

  • Milosevic, A., & Ledgerwood, D. M. (2010). The subtyping of pathological gambling: A comprehensive review. Clinical Psychology Review, 30, 988–998.

  • Moskowitz, C. S., & Pepe, M. S. (2006). Comparing the predictive values of diagnostic tests: Sample size and analysis for paired study designs. Clinical Trials, 3, 272–279.

  • Naeger, D. M., Kohi, M. P., Webb, E. M., Phelps, A., Ordovas, K. G., & Newman, T. B. (2013). Correctly using sensitivity, specificity, and predictive values in clinical practice: How to avoid three common pitfalls. AJR, 200, W566–W570.

  • Nam, J. (2009). Efficient interval estimation of a ratio of marginal probabilities in matched-pair data: Non-iterative method. Statistics in Medicine, 15(28), 2929–2935.

  • Nestor, M. R. (1996). An applied statistician’s creed. Applied Statistics, 45, 401–410.

  • Newcombe, R. G. (1998). Two-sided confidence intervals for the single proportion: Comparison of seven methods. Statistics in Medicine, 17, 857–872.

  • Obuchowski, N. A., & Zhou, X. (2002). Prospective studies of diagnostic test accuracy when disease prevalence is low. Biostatistics, 3, 477–492.

  • Payton, M. E., Greenstone, M. H., & Schenker, N. (2003). Overlapping confidence intervals or standard error intervals: What do they mean in terms of statistical significance. Journal of Insect Science, 34, 1–5.

  • Pepe, M. S. (2003). The statistical evaluation of medical tests for classification and prediction. Oxford: Oxford University Press.

  • Pepe, M. S., Feng, Z., Huang, Y., et al. (2008). Integrating the predictiveness of a marker with its performance as a classifier. American Journal of Epidemiology, 167, 362–368.

  • Pepe, M. S., & Janes, H. (2007). Insights into latent class analysis of diagnostic test performance. Biostatistics, 8, 474–484.

  • Petry, N., Blanco, C., Stinchfield, R., & Volberg, R. (2013). An empirical evaluation of proposed changes for gambling diagnosis in the DSM-5. Addiction, 108, 575–581.

  • Petry, N. M., Weinstock, J., Ledgerwood, D. M., et al. (2008). A randomized trial of brief interventions for problem and pathological gamblers. Journal of Consulting and Clinical Psychology, 76, 31–38.

  • PGRTC, Problem Gambling Research and Treatment Centre. (2011). Guidelines for screening, assessment, and treatment in problem gambling. Clayton: Monash University.

  • Rector, T. S., Taylor, B. C., & Wilt, T. J. (2012). Systematic review of prognostic tests. Journal of General Internal Medicine, 27(Suppl 1), S94–S101.

  • Reitsma, J. B., Rutjes, A. W. S., Khan, K. S., Coomarasamy, A., & Bossuyt, P. M. (2009). A review of solutions for diagnostic accuracy studies with an imperfect or missing reference standard. Journal of Clinical Epidemiology, 62, 797–806.

  • Rutjes, A. W., Reitsma, J. B., Coomarasamy, A., Khan, K. S., & Bossuyt, P. M. (2007). Evaluation of diagnostic tests when there is no gold standard: A review of methods. Health Technology Assessment, 11(50).

  • Sacco, P., Torres, L. R., Cunningham-Williams, R. M., Woods, C., & Unick, G. J. (2011). Differential item functioning of pathological gambling criteria: An examination of gender, race/ethnicity, and age. Journal of Gambling Studies, 27, 317–330.

  • Sackett, D. L., Haynes, R. B., & Tugwell, P. (1985). Clinical epidemiology: A basic science for clinical medicine (2nd ed.). Boston: Little Brown.

  • Sassen, M., Kraus, L., & Buhringer, G. (2011). Differences in pathological gambling prevalence estimates. International Journal of Methods in Psychiatric Research, 20, e83–e99.

  • Schatzkin, A., Connor, R. J., Taylor, P. R., & Bunnag, B. (1987). Comparing new and old screening tests when a reference procedure cannot be performed on all screenees. American Journal of Epidemiology, 125, 672–678.

  • Schenker, N., & Gentleman, J. F. (2001). On judging the significance of differences by examining the overlap between confidence intervals. American Statistician, 55, 182–186.

  • Shaffer, H. J., & Hall, M. N. (1996). Estimating the prevalence of adolescent gambling disorders: A quantitative synthesis and guide toward standard gambling nomenclature. Journal of Gambling Studies, 12, 193–214.

  • Shaffer, H. J., Hall, M. N., & Vander Bilt, J. V. (1999). Estimating the prevalence of disordered gambling behaviour in the United States and Canada. American Journal of Public Health, 89, 1369–1376.

  • Simel, D. L., Samsa, G. P., & Matchar, D. B. (1991). Likelihood ratios with confidence: Sample size estimation for diagnostic test studies. Journal of Clinical Epidemiology, 44, 763–770.

  • Smith, D., Harvey, P., Humeniuk, R., Battersby, M., & Pols, R. (2015). Effects of affective and anxiety disorders on outcome in problem gamblers attending routine cognitive behavioural treatment in South Australia. Journal of Gambling Studies, 31, 1047–1068.

  • Sonis, J. (1999). How to use and interpret interval likelihood ratios. Family Medicine, 31, 432–437.

  • Steinberg, D. M., Fine, J., & Chappell, R. (2009). Sample size for positive and negative predictive value in diagnostic research using case–control designs. Biostatistics, 10, 94–105.

  • Steyerberg, E. W., Vickers, A. J., Cook, N. R., et al. (2010). Assessing the performance of prediction models: A framework for some traditional and novel measures. Epidemiology, 21, 128–138.

  • Stinchfield, R. (2003). Reliability, validity, and classification accuracy of a measure of DSM-IV diagnostic criteria for pathological gambling. American Journal of Psychiatry, 160, 180–182.

  • Stinchfield, R. (2010). A critical review of adolescent problem gambling assessment instruments. International Journal of Adolescent Medicine and Health, 22, 77–93.

  • Stinchfield, R. (2014). A review of problem gambling assessment instruments and brief screens. In D. C. S. Richard, A. Blaszczynski, & L. Nower (Eds.), Handbook of disordered gambling (pp. 165–203). New York: Wiley-Blackwell.

  • Stinchfield, R., Govoni, R., & Frisch, G. R. (2005). DSM-IV diagnostic criteria for pathological gambling: Reliability, validity, and classification accuracy. The American Journal on Addictions, 14, 73–82.

  • Stinchfield, R., McCready, J., & Govoni, R. (2012a). Cross-validation of the Windsor Ontario Gambling Problem Severity Item pool. Ontario: Healthy Horizons Consulting.

  • Stinchfield, R., McCready, J., & Turner, N. (2012b). A comprehensive review of problem gambling screens and scales for online self-assessment. Toronto, Ontario: Healthy Horizons Consulting.

  • Strong, D. R., & Kahler, C. W. (2007). Evaluation of the continuum of gambling problems using the DSM-IV. Addiction, 102, 713–721.

  • Strong, D. R., Lesieur, H. R., Breen, R. B., Stinchfield, R., & Lejuez, C. W. (2004). Using a Rasch model approach to examine the utility of the SOGS screen across pathological and nonpathological gamblers. Addictive Behaviors, 29, 465–481.

  • Stucki, S., & Rihs-Middel, M. (2007). Prevalence of problem and pathological gambling between 2000 and 2005: An update. Journal of Gambling Studies, 23, 245–257.

  • Tang, M. L. (2004). On simultaneous assessment of sensitivity and specificity while combining two diagnostic tests. Statistics in Medicine, 23, 3593–3605.

  • Tang, M., Li, H., & Tang, N. (2012). Confidence interval construction for proportion ratio in paired studies based on hybrid method. Statistical Methods in Medical Research, 21, 361–378.

  • Thomas, S., Jackson, A., & Blaszczynski, A. (2003). Measuring problem gambling: Evaluation of the Victorian Gambling Screen. Melbourne: Gambling Research Panel.

  • Toce-Gerstein, M., Gerstein, D. R., & Volberg, R. A. (2009). The NODS-CLIP: A rapid screen for adult pathological and problem gambling. Journal of Gambling Studies, 25, 541–555.

  • Tolchard, & Battersby, M. W. (2010). The Victorian Gambling Screen: Reliability and validation in a clinical population. Journal of Gambling Studies, 26, 623–638.

  • Trikalinos, T. A., & Balion, C. M. (2012). Chapter 9: Options for summarizing medical test performance in the absence of a “gold standard”. Journal of General Internal Medicine, 27(Suppl 1), S67–S75.

  • Usher-Smith, J. A., Sharp, S. J., & Griffin, S. J. (2016). The spectrum effect in tests for risk prediction, screening, and diagnosis. BMJ, 353, 1–5.

  • Volberg, R. A., Gupta, R., Griffiths, M. D., Olason, D., & Delfabbro, P. (2010). An international perspective on youth gambling prevalence studies. International Journal of Adolescent Medicine and Health, 22, 3–38.

  • Wenzel, M., McMillen, J., Marshall, D., & Ahmed, E. (2004). Validation of the Victorian Gambling Screen. Melbourne: Australian National University.

  • Williams, R. J., & Volberg, R. A. (2010). Best practices in the population assessment of problem gambling. Report prepared for the Ontario Problem Gambling Research Centre. Guelph, Ontario, Canada, March 31, 2010.

  • Williams, R. J., Volberg, R. A., & Stevens, R. M. G. (2012). The population prevalence of problem gambling: Methodological influences, standardized rates, jurisdictional differences, and worldwide trends. Report prepared for the Ontario Problem Gambling Research Centre and the Ontario Ministry of Health and Long Term Care, May 8, 2012.

  • Winters, K., Stinchfield, R., & Fulkerson, J. (1993). Toward the development of an adolescent gambling problem severity scale. Journal of Gambling Studies, 9, 63–84.

  • Zaane, B., Vergouwe, Y., Rogier, A. T., Donders, T., & Moons, K. (2012). Comparison of approaches to estimate confidence intervals of post-test probabilities of diagnostic test results in a nested case-control study. BMC Medical Research Methodology, 12, 166.

  • Zhou, X., Obuchowski, N. A., & McClish, D. K. (2002). Statistical methods in diagnostic medicine. New York: Wiley.

Author information

Corresponding author

Correspondence to Blase Gambino.

Ethics declarations

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

About this article

Cite this article

Gambino, B. Comparative Test Evaluation: Methods and Challenges. J Gambl Stud 34, 1109–1138 (2018). https://doi.org/10.1007/s10899-018-9745-3

  • DOI: https://doi.org/10.1007/s10899-018-9745-3
