Abstract
Traditional medicine has attracted increasing interest in theoretical, experimental, and clinical research since its recognition in the Alma-Ata Declaration. Chinese medicine (CM), in particular, was developed over the last 3000 years by a society geographically, socially, and culturally different from the Western community. The diagnostic process of CM has a unique feature: patterns, the counterpart of Western diseases, are identified through a process named pattern differentiation. The clinical manifestations of an individual are collected using four examinations: inspection, auscultation-olfaction, inquiry, and palpation. As a consequence, CM diagnosis is considered subjective, because meaningful clinical data are gathered using only the five senses and must be interpreted by an expert; no equipment or diagnostic exam for collecting data for pattern differentiation was developed until recent decades. Pattern differentiation is subject to error like any other diagnostic system, but this variability in diagnosis may have consequences: different patterns may lead to distinct treatment choices, such as the selection of herbs or acupoints. In contrast with Western medicine, which has treatment protocols for various diseases, there are no defined protocols of acupoints for patterns, because of the personalized nature of CM’s diagnostic process and the possibility of selecting acupoints using a variety of criteria. Therefore, it is important to assess simultaneously the amount of agreement for CM diagnosis (mainly among different raters) and the diagnostic accuracy of pattern differentiation, in order to determine the validity of this traditional system in both clinical and research scenarios. In this sense, high interrater agreement (i.e., the degree to which raters achieve identical results when performing the same assessment under similar conditions) and diagnostic accuracy (i.e., the rate of correct diagnosis) are important characteristics of any model used for health classification. Previous studies investigated agreement for pattern differentiation and/or acupuncture prescription, but they have important limitations from either the traditional or the scientific perspective: some did not calculate or report statistical measures of agreement, and others did not investigate the relationship between diagnosis and therapeutic prescription. Moreover, these studies used real human patients, for whom the true pattern was unknown, so diagnostic accuracy could not be assessed against a gold standard. This chapter introduces advanced methods for assessing interrater reliability for diagnosis and intervention in CM. More specifically, it discusses the choice of study design and statistical methods for measuring interrater reliability and diagnostic accuracy in the context of pattern differentiation and acupuncture prescription. Sample size calculation and appropriate agreement coefficients for the multinomial, univariate, and multivariate scenarios are presented. The role of computational simulation as a gold-standard method is also addressed. Finally, computational methods for the statistical analysis of reliability and diagnostic performance in CM are presented and discussed.
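The two quantities defined in the abstract can be illustrated with a minimal sketch. It assumes two raters, nominal pattern labels, and simulated cases whose true pattern is known. The pattern names, the data, and the function names `cohen_kappa` and `accuracy` are hypothetical, and Cohen's kappa is used here only as one common chance-corrected agreement coefficient, not necessarily the coefficient the chapter recommends.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on nominal labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of cases given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label for every case
    return (p_o - p_e) / (1 - p_e)


def accuracy(predicted, truth):
    """Rate of correct diagnosis against a known (e.g., simulated) pattern."""
    if len(predicted) != len(truth) or not truth:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


# Hypothetical simulated cases: the true pattern is known by construction.
truth = ["qi deficiency", "yin deficiency", "qi deficiency",
         "blood stasis", "yin deficiency", "qi deficiency"]
rater_a = ["qi deficiency", "yin deficiency", "qi deficiency",
           "blood stasis", "qi deficiency", "qi deficiency"]
rater_b = ["qi deficiency", "yin deficiency", "yin deficiency",
           "blood stasis", "qi deficiency", "qi deficiency"]

print(round(cohen_kappa(rater_a, rater_b), 3))  # 0.714
print(round(accuracy(rater_a, truth), 3))       # 0.833
print(round(accuracy(rater_b, truth), 3))       # 0.667
```

In this toy data the two raters agree with each other on five of six cases, yet both misdiagnose the fifth case in the same way. Agreement and accuracy therefore answer different questions, which is why the abstract argues for assessing them together.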
Notes
1. Available at http://www.unisuam.edu.br/index.php/downloads-cr.
2. Available at http://www.r-project.org.
3. Enhancing the QUAlity and Transparency Of health Research (EQUATOR network), available at http://www.equator-network.org.
4. Available at www.random.org.
Acknowledgments
This work was supported by a grant from the Fundação Carlos Chagas Filho de Amparo à Pesquisa do Estado do Rio de Janeiro.
Copyright information
© 2016 Springer Science+Business Media Singapore
About this chapter
Cite this chapter
Ferreira, A.S., Oliveira, I.J.A.S. (2016). Methods for Assessment of Interrater Reliability for Diagnosis and Intervention in Traditional Chinese Medicine Studies. In: Leung, Sw., Hu, H. (eds) Evidence-based Research Methods for Chinese Medicine. Springer, Singapore. https://doi.org/10.1007/978-981-10-2290-6_7
DOI: https://doi.org/10.1007/978-981-10-2290-6_7
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-2289-0
Online ISBN: 978-981-10-2290-6
eBook Packages: Medicine, Medicine (R0)