Methods for Assessment of Interrater Reliability for Diagnosis and Intervention in Traditional Chinese Medicine Studies

Ferreira, Arthur Sá; Oliveira, Ingrid Jardim Azeredo Souza

doi:10.1007/978-981-10-2290-6_7

Abstract

Traditional medicines have attracted increasing theoretical, experimental, and clinical research interest since their recognition in the Alma-Ata Declaration. In particular, Chinese medicine (CM) was developed over the last 3000 years by a society that is geographically, socially, and culturally different from the Western community. The diagnostic process of CM has a unique feature: patterns, the counterpart of Western diseases, are identified through a process called pattern differentiation. An individual's clinical manifestations are collected through four examinations: inspection, auscultation-olfaction, inquiry, and palpation. Consequently, CM diagnosis is considered subjective: meaningful clinical data are gathered using only the five senses and must be interpreted by an expert, and until recent decades no equipment or diagnostic exam had been developed to collect data for pattern differentiation. Like any other diagnostic system, pattern differentiation is subject to error, and this diagnostic variability may have consequences: different patterns may lead to distinct treatment choices, such as the selection of herbs or acupoints. Unlike Western medicine, which has treatment protocols for various diseases, CM has no defined acupoint protocols for patterns, because its diagnostic process is personalized and acupoints can be selected according to a variety of criteria.

Therefore, it is important to assess simultaneously the agreement on CM diagnosis (mainly among different raters) and the diagnostic accuracy of pattern differentiation, in order to determine the validity of this traditional system in both clinical and research scenarios. High interrater agreement (i.e., the degree to which raters achieve identical results when performing the same assessment under similar conditions) and high diagnostic accuracy (i.e., the rate of correct diagnosis) are important characteristics of any model used for health classification. Previous studies have investigated agreement on pattern differentiation and/or acupuncture prescription, but they have important limitations from either the traditional or the scientific perspective: statistical measures of agreement were not calculated or reported, or the relationship between diagnosis and therapeutic prescription was not investigated. In addition, these studies used real human patients whose true pattern was unknown, so diagnostic accuracy could not be assessed against a gold standard.

This chapter introduces advanced methods for assessing interrater reliability for diagnosis and intervention in CM. More specifically, it discusses the choice of study design and statistical methods for measuring interrater reliability and diagnostic accuracy in the context of pattern differentiation and acupuncture prescription. Sample size calculations and appropriate agreement coefficients for multinomial, univariate, and multivariate scenarios are presented. The role of computational simulation as a gold-standard method is also addressed. Finally, computational methods for the statistical analysis of reliability and diagnostic performance in CM are presented and discussed.
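The two properties the abstract singles out, interrater agreement and diagnostic accuracy, can be illustrated with a minimal Python sketch. It is not the chapter's method (the chapter's references point to R packages such as irr and kappaSize); it assumes Fleiss' kappa as the multi-rater coefficient for nominal pattern labels, and it uses a toy simulation in which each patient's true pattern is known, so that every rater's rate of correct diagnosis can be computed against that gold standard. The names `simulate` and `hit_rates`, the labels `pattern_A` to `pattern_C`, and the error model (a rater who misses picks one of the other patterns uniformly at random) are all hypothetical.

```python
import random
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for nominal labels, same number of raters per subject.

    ratings[i] holds every rater's label for subject i; categories lists all
    possible labels. kappa = (P_bar - P_e) / (1 - P_e).
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    tallies = [Counter(row) for row in ratings]
    # counts[i][j]: number of raters who assigned subject i to category j
    counts = [[t[c] for c in categories] for t in tallies]
    # Observed agreement for each subject, averaged over subjects (P_bar)
    p_i = [
        (sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects
    # Chance agreement from the overall category proportions (P_e)
    total = n_subjects * n_raters
    p_j = [sum(row[j] for row in counts) / total for j in range(len(categories))]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)  # undefined if every label is identical


def accuracy(predicted, truth):
    """Rate of correct diagnosis against the known (simulated) true pattern."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


def simulate(n_patients, patterns, hit_rates, seed=1):
    """Toy simulator (hypothetical): rater r names the true pattern with
    probability hit_rates[r]; otherwise picks another pattern at random."""
    rng = random.Random(seed)
    truth = [rng.choice(patterns) for _ in range(n_patients)]
    ratings = []
    for true_pattern in truth:
        wrong = [p for p in patterns if p != true_pattern]
        row = [true_pattern if rng.random() < hit else rng.choice(wrong)
               for hit in hit_rates]
        ratings.append(row)
    return truth, ratings


if __name__ == "__main__":
    patterns = ["pattern_A", "pattern_B", "pattern_C"]  # hypothetical labels
    truth, ratings = simulate(200, patterns, hit_rates=[0.9, 0.8, 0.7])
    print("Fleiss' kappa:", round(fleiss_kappa(ratings, patterns), 3))
    for r in range(len(ratings[0])):
        rater_labels = [row[r] for row in ratings]
        print(f"rater {r + 1} accuracy:", accuracy(rater_labels, truth))
```

Because the simulation fixes the true pattern, agreement and accuracy can be examined separately: raters may agree closely with one another and still be wrong, which an agreement coefficient alone cannot reveal.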

Notes

1. Available at http://www.unisuam.edu.br/index.php/downloads-cr.
2. Available at http://www.r-project.org.
3. Enhancing the QUAlity and Transparency Of health Research (EQUATOR network), available at http://www.equator-network.org.
4. Available at www.random.org.

References

World Health Organization. Primary health care: report of the International Conference on Primary Health Care, Alma-Ata, USSR, 6–12 September 1978. Geneva: World Health Organization; 1978.
T’ao L. Chinese medicine during the Chin (1127–1234) and Yuan (1234–1368) eras. Chin Med J. 1955;73(3):241–56.
Lee T, Cheng CF, Chang CS. Some early records of nervous and mental diseases in traditional Chinese medicine. Chin Med J. 1962;81:55–9.
O’Connor J, Bensky D. Acupuncture: a comprehensive text. Seattle: Eastland Press; 1987.
Ferreira AS, Lopes AJ. Chinese medicine pattern differentiation and its implications for clinical practice. Chin J Integr Med. 2011;17(11):818–23.
Weingart SN, Wilson RM, Gibberd RW, Harrison B. Epidemiology of medical error. Br Med J. 2000;320(18):774–7.
Kuhn GJ. Diagnostic errors. Acad Emerg Med. 2002;9(7):740–50.
Schiff GD, Hasan O, Kim S, Abrams R, Cosby R, Lambert BL, Elstein AS, Hasler S, Kabongo ML, Krosnjar N, Odwazny R, Wisniewski MF, McNutt RA. Diagnostic errors in medicine: analysis of 583 physician-reported errors. Arch Intern Med. 2009;169(20):1881–7.
Sung JJY, Leung WK, Ching JYL, Lao L, Zhang G, Wu JCY, Liang SM, Xie H, Ho YP, Chan LS, Berman B, Chan FKL. Agreements among traditional Chinese medicine practitioners in the diagnosis and treatment of irritable bowel syndrome. Aliment Pharmacol Ther. 2004;20(10):1205–10.
Zhang GG, Lee W, Bausell B, Lao L, Handwerger B, Berman B. Variability in the traditional Chinese medicine (TCM) diagnoses and herbal prescriptions provided by three TCM practitioners for 40 patients with rheumatoid arthritis. J Altern Complement Med. 2005;11(3):415–21.
Zhang GG, Singh B, Lee W, Handwerger B, Lao L, Berman B. Improvement of agreement in TCM diagnosis among TCM practitioners for persons with the conventional diagnosis of rheumatoid arthritis: effect of training. J Altern Complement Med. 2008;14(4):381–6.
Coeytaux RR, Chen W, Lindemuth CE, Tan Y, Reilly AC. Variability in the diagnosis and point selection for persons with frequent headache by traditional Chinese medicine acupuncturists. J Altern Complement Med. 2006;12(9):863–72.
Mist S, Ritenbaugh C, Aickin M. Effects of questionnaire-based diagnosis and training on inter-rater reliability among practitioners of traditional Chinese medicine. J Altern Complement Med. 2009;15(7):703–9.
O’Brien KA, Abbas E, Zhang J, Guo Z, Luo R, Bensoussan A, Komesaroff PA. An investigation into the reliability of Chinese medicine diagnosis according to the eight guiding principles and Zang-fu theory in Australians with hypercholesterolemia. J Altern Complement Med. 2009;15(3):259–66.
Grant SJ, Schnyer RN, Chang DH, Fahey P, Bensoussan A. Interrater reliability of Chinese medicine diagnosis in people with prediabetes. Evid Based Complement Alternat Med. 2013;2013:Article ID 710892.
Birkeflet O, Laake P, Vollestad NK. Poor multi-rater reliability in TCM pattern diagnoses and variation in the use of symptoms to obtain a diagnosis. Acupunct Med. 2014;32(4):325–32.
Xu ZX, Xu J, Yan JJ, Wang YQ, Guo R, Liu GP, Yan HX, Qian P, Hong YJ. Analysis of the diagnostic consistency of Chinese medicine specialists in cardiovascular disease cases and syndrome identification based on the relevant feature for each label learning method. Chin J Integr Med. 2014 Jul 30 (Epub ahead of print).
Ferreira AS. Advances in Chinese medicine diagnosis: from traditional methods to computational models. In: Kuang H, editor. Recent advances in Chinese medicine. Croatia: InTech; 2012.
Ferreira AS. Promoting the Integrative Medicine by the computerization of traditional Chinese medicine for scientific research and clinical practice: The SuiteTCM Project. J Integr Med. 2013;11(2):135–9.
Ferreira AS, Pacheco AG. SimTCM: a human patient simulator with application to diagnostic accuracy studies of Chinese medicine. J Integr Med. 2014 Dec (Epub ahead of print).
de Sá Ferreira A. Statistical validation of strategies for Zang-fu single pattern differentiation. J Chin Integr Med. 2008;6(11):1109–16.
Ferreira AS. Diagnostic accuracy of pattern differentiation algorithm based on traditional Chinese medicine theory: a stochastic simulation study. Chin Med. 2009;4:24.
Sá Ferreira A. Misdiagnosis and undiagnosis due to pattern similarity in Chinese medicine: a stochastic simulation study using pattern differentiation algorithm. Chin Med. 2011;6:13.
R Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.
Snow G. blockrand: Randomization for block random clinical trials. 2013. R package version 1.3.
Canty A, Ripley B. boot: Bootstrap R (S-Plus) functions. 2014. R package version 1.3-11.
Kuhn M. Contributions from Jed Wing, Steve Weston, Andre Williams, Chris Keefer, Allan Engelhardt, Tony Cooper, Zachary Mayer, Brenton Kenkel, the R Core Team and Michael Benesty. caret: Classification and Regression Training. 2014. R package version 6.0-37.
Gamer M, Lemon J, Singh IFP. irr: Various Coefficients of Interrater Reliability and Agreement. 2012. R package version 0.84.
Rotondi MA. kappaSize: Sample Size Estimation Functions for Studies of Interobserver Agreement. 2013. R package version 1.1.
Falissard B. psy: Various procedures used in psychometry. 2012. R package version 1.1.
Dragulescu AA. xlsx: Read, write, format Excel 2007 and Excel 97/2000/XP/2003 files. 2014. R package version 0.5.7.
Kottner J, Audigé L, Brorson S, Donner A, Gajewski BJ, Hróbjartsson A, Roberts C, Shoukri M, Streiner DL. Guidelines for Reporting Reliability and Agreement Studies (GRRAS) were proposed. J Clin Epidemiol. 2011;64(1):96–106.
MacPherson H, Altman DG, Hammerschlag R, Youping L, Taixing W, White A, Moher D, STRICTA Revision Group. Revised STandards for Reporting Interventions in Clinical Trials of Acupuncture (STRICTA): extending the CONSORT statement. PLoS Med. 2010;7(6):e1000261.
MacPherson H, White A, Cummings M, Jobst K, Rose K, Niemtzow R. Standards for reporting interventions in controlled trials of acupuncture: the STRICTA recommendations. STandards for Reporting Interventions in Controlled Trials of Acupuncture. Acupunct Med. 2002;20(1):22–5.
Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, Moher D, Rennie D, de Vet HCW, Lijmer JG. The STARD statement for reporting studies of diagnostic accuracy: explanation and elaboration. Ann Intern Med. 2003;138(1):W1–12.
Altaye M, Donner A, Eliasziw M. A general goodness-of-fit approach for inference procedures concerning the kappa statistic. Stat Med. 2001;20(16):2479–88.
Siemiatycki J, Campbell S. Nonresponse bias and early versus all responders in mail and telephone surveys. Am J Epidemiol. 1984;120(2):291–301.
Shin BC, Kim S, Cho YH. Syndrome pattern and its application in parallel randomized controlled trials. Chin J Integr Med. 2013;19(3):163–71.
Grimes DA, Schulz KF. Compared to what? Finding controls for case-control studies. Lancet. 2005;365:1429–33.
World Health Organization. Standard acupuncture nomenclature. Geneva: World Health Organization; 1993.
Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33(1):159–74.
World Health Organization. Health research methodology: a guide for training in research methods. Geneva: World Health Organization; 2001.
Light RJ. Measures of response agreement for qualitative data: some generalizations and alternatives. Psychol Bull. 1971;76(5):365–77.
Hallgren KA. Computing inter-rater reliability for observational data: an overview and tutorial. Tutor Quant Methods Psychol. 2012;8(1):23–34.
Janson H, Olsson U. A measure of agreement for interval or nominal multivariate observations. Educ Psychol Meas. 2001;61(2):277–89.
Altman DG, Bland JM. Diagnostic tests. 1: Sensitivity and specificity. BMJ. 1994;308(6943):1552.
Altman DG, Bland JM. Diagnostic tests 2: Predictive values. BMJ. 1994;309(6947):102.
Kanji G. 100 statistical tests. 3rd ed. London: Sage Publications; 2006.
Efron B, Tibshirani RJ. An introduction to the bootstrap. Florida: CRC Press; 1998.
North BV, Curtis D, Sham PC. A note on the calculation of empirical P-values from Monte Carlo procedures. Am J Hum Genet. 2002;71(2):439–41.

Acknowledgments

This work was supported by a grant from the Fundação Carlos Chagas Filho de Amparo à Pesquisa do Estado do Rio de Janeiro.

Author information

Authors and Affiliations

Laboratory of Computational Simulation and Modeling in Rehabilitation, Augusto Motta University Center, Rio de Janeiro, RJ, Brazil
Arthur Sá Ferreira & Ingrid Jardim Azeredo Souza Oliveira

Corresponding author

Correspondence to Arthur Sá Ferreira.

Editor information

Editors and Affiliations

Institute of Chinese Medical Sciences, University of Macau, Macau, Macao
Siu-wai Leung
Institute of Chinese Medical Sciences, University of Macau, Macau, Macao
Hao Hu

About this chapter

Cite this chapter

Ferreira, A.S., Oliveira, I.J.A.S. (2016). Methods for Assessment of Interrater Reliability for Diagnosis and Intervention in Traditional Chinese Medicine Studies. In: Leung, Sw., Hu, H. (eds) Evidence-based Research Methods for Chinese Medicine. Springer, Singapore. https://doi.org/10.1007/978-981-10-2290-6_7

DOI: https://doi.org/10.1007/978-981-10-2290-6_7
Published: 25 November 2016
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-2289-0
Online ISBN: 978-981-10-2290-6
eBook Packages: Medicine, Medicine (R0)
