Pattern-based information extraction from pathology reports for cancer registration

  • Original paper
  • Journal: Cancer Causes & Control

Abstract

Objective

To evaluate the precision and recall of automatic information extraction from free-text pathology reports, and to assess the impact that implementing pattern-based methods would have on cancer registration completeness.

Method

Over 300,000 electronic pathology reports were scanned for Gleason score, Clark level and Breslow depth by a set of Perl routines that were progressively refined by trial and error. An additional test set of 915 reports potentially containing a Gleason score was used for evaluation.
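As an illustration of the kind of pattern matching described in the Method, the following is a minimal sketch in Python rather than Perl. The regular expression, the name `GLEASON` and the function `extract_gleason` are hypothetical and are not the authors' routines; the sketch assumes that a Gleason score appears either as a sum of two grades (for example "3+4=7") or as a single total after the word "Gleason".

```python
import re

# Hypothetical pattern (an assumption, not the study's Perl code).
# It accepts forms such as "Gleason score 3+4=7", "Gleason 4 + 3"
# and "Gleason score: 7".
GLEASON = re.compile(
    r"gleason(?:'s)?\s*(?:score|grade|sum)?\s*[:=]?\s*"
    r"(?:(?P<primary>[1-5])\s*\+\s*(?P<secondary>[1-5])(?:\s*=\s*(?P<total>\d{1,2}))?"
    r"|(?P<only_total>10|[2-9]))",
    re.IGNORECASE,
)


def extract_gleason(text):
    """Return the Gleason sum found in text, or None if nothing matches."""
    match = GLEASON.search(text)
    if match is None:
        return None
    if match.group("primary"):
        # Recompute the sum from the two grades instead of trusting "=N".
        return int(match.group("primary")) + int(match.group("secondary"))
    return int(match.group("only_total"))
```

A pattern like this would typically be grown one variant at a time as unmatched reports are inspected, which is one way to read the trial-and-error refinement mentioned above.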

Results

Recall and precision exceeded 98% and 99%, respectively. A potential increase in cancer staging completeness of up to 32% was demonstrated.
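For readers unfamiliar with the two measures, the sketch below shows one common way to compute them against a manually reviewed gold standard. The function name `precision_recall` and the counting convention (a wrong extracted value counts as both a false positive and a false negative) are assumptions made for illustration; the abstract does not state the exact definitions used in the study.

```python
def precision_recall(extracted, gold):
    """Compute (precision, recall) for extracted values.

    Both arguments map a report id to a value; None means that nothing
    was extracted (or that nothing is present in the gold standard).
    """
    # Correct extractions: a value was produced and it equals the gold value.
    tp = sum(1 for rid, v in extracted.items()
             if v is not None and gold.get(rid) == v)
    # Spurious or wrong extractions.
    fp = sum(1 for rid, v in extracted.items()
             if v is not None and gold.get(rid) != v)
    # Gold values that were missed or extracted incorrectly.
    fn = sum(1 for rid, v in gold.items()
             if v is not None and extracted.get(rid) != v)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Precision is the share of extracted values that are correct, and recall is the share of values present in the reports that were found.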

Conclusions

In cancer registration, simple pattern matching applied to free-text documents can be effectively used to improve completeness and accuracy of pathology information.

Acknowledgments

At the time this study was completed, the Northern Ireland Cancer Registry was funded by the Department of Health, Social Services and Public Safety Northern Ireland (DHSSPSNI). It is now funded by the Public Health Agency. We also wish to thank Alejandra González Beltrán for her stimulating comments on this paper.

Financial support

At the time this study was completed, the Northern Ireland Cancer Registry was funded by the Department of Health, Social Services and Public Safety Northern Ireland (DHSSPSNI). It is now funded by the Public Health Agency.

Author information

Corresponding author

Correspondence to Giulio Napolitano.

About this article

Cite this article

Napolitano, G., Fox, C., Middleton, R. et al. Pattern-based information extraction from pathology reports for cancer registration. Cancer Causes Control 21, 1887–1894 (2010). https://doi.org/10.1007/s10552-010-9616-4

  • DOI: https://doi.org/10.1007/s10552-010-9616-4
