Automated Identification of Surveillance Colonoscopy in Inflammatory Bowel Disease Using Natural Language Processing

  • Original Article
  • Digestive Diseases and Sciences

Abstract

Background

Differentiating surveillance from non-surveillance colonoscopy for colorectal cancer in patients with inflammatory bowel disease (IBD) using electronic medical records (EMR) is important for practice improvement and research purposes, but algorithms based on diagnosis codes are lacking. The automated retrieval console (ARC) is natural language processing (NLP)-based software that allows text-based document-level classification.

Aims

The purpose of this study was to test the feasibility and accuracy of ARC in identifying surveillance and non-surveillance colonoscopy in IBD using EMR.

Methods

We performed a split validation study of electronic colonoscopy pathology reports for patients with IBD from the Michael E. DeBakey VA Medical Center. A gastroenterologist manually classified each pathology report as derived from either surveillance or non-surveillance colonoscopy. Pathology reports were randomly split into two sets: 70% for algorithm derivation and 30% for validation. An ARC-generated classification model was applied to the validation set of pathology reports. The performance of the model was compared with manual classification for surveillance and non-surveillance colonoscopy.
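The random split described above can be sketched in a few lines of Python. This is an illustration only, not the authors' procedure: the report identifiers, the function name `split_reports`, and the fixed seed are hypothetical, and the training count of 400 is taken from the Results section (400 of 575 is close to, but not exactly, 70%).

```python
import random


def split_reports(report_ids, n_train, seed=0):
    """Randomly partition report IDs into derivation and validation sets."""
    if not 0 <= n_train <= len(report_ids):
        raise ValueError("n_train must be between 0 and the number of reports")
    shuffled = list(report_ids)            # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical identifiers standing in for the 575 pathology reports.
report_ids = [f"report-{i:03d}" for i in range(575)]
train_ids, test_ids = split_reports(report_ids, n_train=400)
print(len(train_ids), len(test_ids))  # prints: 400 175
```

Splitting at the report level, as here, mirrors the abstract's wording; whether reports from the same patient were kept in the same set is not stated in the abstract.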

Results

A total of 575 colonoscopy pathology reports were available for 195 patients with IBD, of which 400 reports were assigned to the training set and 175 to the testing set. Within the testing set, 69 pathology reports were classified as surveillance by manual review, whereas the ARC model classified 66 reports as surveillance, for a recall of 0.77, precision of 0.80, and specificity of 0.88.
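The reported metrics can be cross-checked against the reported counts. The abstract does not give the number of true positives, so the sketch below searches for the integer value that reproduces all three rounded metrics; the function names and the search itself are illustrative assumptions, not part of the study.

```python
def confusion_from_counts(n_total, n_actual_pos, n_predicted_pos, true_pos):
    """Fill in a 2x2 confusion matrix from marginal counts and a true-positive count."""
    false_neg = n_actual_pos - true_pos
    false_pos = n_predicted_pos - true_pos
    true_neg = n_total - n_actual_pos - false_pos
    return true_pos, false_pos, false_neg, true_neg


def metrics(tp, fp, fn, tn):
    """Recall, precision (PPV), and specificity from confusion-matrix cells."""
    return {
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "specificity": tn / (tn + fp),
    }


# Counts and metrics as reported in the abstract.
N_TEST, N_MANUAL_SURV, N_MODEL_SURV = 175, 69, 66
REPORTED = {"recall": 0.77, "precision": 0.80, "specificity": 0.88}

# Assumption: try every feasible true-positive count and keep those that
# match the reported values to two decimal places.
consistent = []
for tp in range(1, min(N_MANUAL_SURV, N_MODEL_SURV) + 1):
    m = metrics(*confusion_from_counts(N_TEST, N_MANUAL_SURV, N_MODEL_SURV, tp))
    if all(abs(m[k] - REPORTED[k]) < 0.005 for k in REPORTED):
        consistent.append(tp)

print(consistent)  # prints: [53]
```

Under this reading, 53 true positives, 13 false positives, 16 false negatives, and 93 true negatives give recall 53/69, precision 53/66, and specificity 93/106, which round to the reported 0.77, 0.80, and 0.88.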

Conclusions

ARC was able to identify surveillance colonoscopy for IBD without customized software programming. NLP-based document-level classification may be used to differentiate surveillance from non-surveillance colonoscopy in IBD.


Abbreviations

IBD: Inflammatory bowel disease
ARC: Automated retrieval console
NLP: Natural language processing
CRC: Colorectal cancer
VA: Veterans Affairs
ICD-9: International Classification of Diseases, 9th revision
EMR: Electronic medical records
CPRS: Computerized patient record system
cTAKES: Clinical text analysis and knowledge extraction system
UIMA: Unstructured information management architecture
CRF: Conditional random fields
MALLET: Machine learning for language toolkit
PPV: Positive predictive value
CD: Crohn’s disease
UC: Ulcerative colitis
IC: Indeterminate colitis
CI: Confidence interval
NPV: Negative predictive value


Acknowledgments

The research reported here was supported by the American College of Gastroenterology Junior Faculty Development Award to J.K. Hou; a pilot grant from the Houston VA HSR&D Center of Excellence (HFP90-020) to J.K. Hou; the Department of Veterans Affairs, Veterans Health Administration, Health Services Research and Development Service, grant MRP05-305 to J.R. Kramer; and the National Institutes of Health/National Institute of Diabetes and Digestive and Kidney Diseases Center grant P30 DK56338, K24 DK078154-05.

Conflict of interest

None.

Author information

Corresponding author

Correspondence to Jason K. Hou.


About this article

Cite this article

Hou, J.K., Chang, M., Nguyen, T. et al. Automated Identification of Surveillance Colonoscopy in Inflammatory Bowel Disease Using Natural Language Processing. Dig Dis Sci 58, 936–941 (2013). https://doi.org/10.1007/s10620-012-2433-8


  • DOI: https://doi.org/10.1007/s10620-012-2433-8
