Methods for Assessment of Interrater Reliability for Diagnosis and Intervention in Traditional Chinese Medicine Studies

  • Chapter
Evidence-based Research Methods for Chinese Medicine

Abstract

Traditional medicines have attracted increasing interest in theoretical, experimental, and clinical research since their recognition in the Alma-Ata Declaration. Chinese medicine (CM), in particular, was developed over the last 3000 years by a society that differs geographically, socially, and culturally from Western societies. The diagnostic process of CM has a unique feature: patterns, the counterpart of Western diseases, are identified through a process named pattern differentiation. An individual's clinical manifestations are collected using four examinations: inspection, auscultation-olfaction, inquiry, and palpation. As a corollary, CM diagnosis is considered subjective. Only the five senses are used to gather meaningful clinical data, which must then be interpreted by an expert, and no equipment or diagnostic test for collecting data for pattern differentiation was developed until recent decades. Like any other diagnostic system, pattern differentiation is subject to error, and this diagnostic variability may have consequences, because different patterns may lead to distinct treatment choices such as the selection of herbs or acupoints. In contrast with Western medicine, which has treatment protocols for various diseases, there are no defined acupoint protocols for patterns, owing to the personalized nature of CM's diagnostic process and the possibility of selecting acupoints by a variety of criteria. Therefore, to determine the validity of this traditional system in both clinical and research scenarios, it is important to assess simultaneously the agreement on CM diagnosis (mainly among different raters) and the diagnostic accuracy of pattern differentiation. In this sense, high interrater agreement (i.e., the degree to which raters achieve identical results when performing the same assessment under similar conditions) and diagnostic accuracy (i.e., the rate of correct diagnosis) are important characteristics of any model used for health classification.

Previous studies investigated the agreement for pattern differentiation and/or for acupuncture prescription, but they present important limitations from either the traditional or the scientific perspective: statistical measures of agreement were not calculated and reported, or the relationship between diagnosis and therapeutic prescription was not investigated. Moreover, these studies used real human patients, whose true pattern was unknown, so diagnostic accuracy could not be assessed against a gold standard. This chapter introduces advanced methods for assessing interrater reliability for diagnosis and intervention in CM. More specifically, it discusses the choice of study design and statistical methods for measuring interrater reliability and diagnostic accuracy in the context of pattern differentiation and acupuncture prescription. Sample size calculation and appropriate agreement coefficients for the multinomial, univariate, and multivariate scenarios are presented. The role of computational simulation as a gold-standard method is also addressed. Finally, computational methods for the statistical analysis of reliability and diagnostic performance are presented and discussed in the context of CM.
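The abstract pairs two quantities: agreement among raters, and accuracy against a known true pattern that only a simulation can supply. The Python sketch below illustrates that idea under loudly stated assumptions. The simulator is a toy model invented for this example: each rater reports the true pattern with a fixed probability `p_correct` and otherwise picks a different pattern uniformly at random. It is not the simulator described in the chapter. The pattern names in `PATTERNS` are hypothetical labels. Fleiss' kappa is shown as one common coefficient for several raters assigning nominal categories. It is not necessarily the coefficient the chapter recommends.

```python
import random
from collections import Counter

# Hypothetical pattern labels, used only for illustration.
PATTERNS = [
    "Liver Qi stagnation",
    "Spleen Qi deficiency",
    "Kidney Yang deficiency",
    "Heart Blood deficiency",
]


def simulate_ratings(n_patients, n_raters, p_correct, categories, seed=0):
    """Toy simulator (an assumption, not a clinical model).

    Each simulated patient gets a known true pattern that acts as the gold
    standard. Each rater reports that pattern with probability p_correct and
    otherwise picks one of the other patterns uniformly at random.
    """
    rng = random.Random(seed)
    truth, ratings = [], []
    for _ in range(n_patients):
        true_cat = rng.choice(categories)
        others = [c for c in categories if c != true_cat]
        row = [true_cat if rng.random() < p_correct else rng.choice(others)
               for _ in range(n_raters)]
        truth.append(true_cat)
        ratings.append(row)
    return truth, ratings


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for nominal ratings with the same number of raters per subject."""
    if not ratings:
        raise ValueError("need at least one subject")
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("need at least 2 raters and the same number for every subject")
    counts = [Counter(row) for row in ratings]
    # Observed agreement: proportion of agreeing rater pairs for each subject.
    p_i = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_obs = sum(p_i) / n_subjects
    # Chance agreement: sum of squared marginal proportions of each category.
    p_j = [sum(c[cat] for c in counts) / (n_subjects * n_raters) for cat in categories]
    p_exp = sum(p ** 2 for p in p_j)
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_obs - p_exp) / (1 - p_exp)


def rater_accuracy(truth, ratings, rater):
    """Proportion of patients for which one rater matched the simulated true pattern."""
    hits = sum(row[rater] == true_cat for true_cat, row in zip(truth, ratings))
    return hits / len(truth)


if __name__ == "__main__":
    truth, ratings = simulate_ratings(200, 3, p_correct=0.7, categories=PATTERNS, seed=42)
    print("Fleiss' kappa:", round(fleiss_kappa(ratings, PATTERNS), 3))
    for r in range(3):
        print(f"Rater {r + 1} accuracy:", round(rater_accuracy(truth, ratings, r), 3))
```

Every simulated patient has a known pattern, so accuracy can be computed directly. Studies of real patients cannot do this. Varying `p_correct` shows how agreement and accuracy change together in this toy model. In general, high agreement does not by itself imply high accuracy. Raters who share the same systematic error can agree with one another while all being wrong.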

Acknowledgments

This work was supported by a grant from the Fundação Carlos Chagas Filho de Amparo à Pesquisa do Estado do Rio de Janeiro.

Author information

Correspondence to Arthur Sá Ferreira.

Copyright information

© 2016 Springer Science+Business Media Singapore

About this chapter

Cite this chapter

Ferreira, A.S., Oliveira, I.J.A.S. (2016). Methods for Assessment of Interrater Reliability for Diagnosis and Intervention in Traditional Chinese Medicine Studies. In: Leung, Sw., Hu, H. (eds) Evidence-based Research Methods for Chinese Medicine. Springer, Singapore. https://doi.org/10.1007/978-981-10-2290-6_7

  • DOI: https://doi.org/10.1007/978-981-10-2290-6_7

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-2289-0

  • Online ISBN: 978-981-10-2290-6

  • eBook Packages: Medicine, Medicine (R0)
