Machine Learning for Medical Examination Report Processing

Part of the book series: Annals of Information Systems (AOIS, volume 17)

Abstract

Medical documents contain rich knowledge that can support a broad range of medical research and clinical studies. One important application is the automatic categorization of medical documents into specific categories. However, these documents usually contain the names and identities of patients and doctors, which may not be disclosed because of patient privacy and regulations concerning medical data. In this article, we address two problems: automatic named entity detection and automatic classification of medical reports. We present a named entity recognition system, MD_NER_NCL, and a text document classification system, C_IME_RPT, for medical report processing and categorization. MD_NER_NCL combines an innovative segmentation algorithm, called HBE segmentation, which segments a medical text document into Heading, Body and Ending parts, with a statistical reasoning process that uses three entity lists: a people name prefix list, a people name suffix list, and a false positive prefix list. C_IME_RPT is built on Self Organizing Maps (SOM) and a machine learning process. Both systems were evaluated on Independent Medical Examination (IME) reports provided by medical professionals. MD_NER_NCL significantly improved on the well-known text analysis software OpenNLP for people name entity detection. C_IME_RPT attained a classification accuracy of 89.9%, a strong result for clinical record classification. We also present an in-depth empirical study of the parameters associated with the SOM learning process and text mining, and of their effects on classification results.
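To make the role of the three entity lists more concrete, the following is a minimal Python sketch. It is not the MD_NER_NCL implementation: the abstract does not give the list contents, the HBE segmentation rules, or the statistical reasoning process, so the list entries, the candidate pattern, the function name `detect_names`, and the simple evidence count used in place of statistical reasoning are all assumptions made for illustration.

```python
import re

# Hypothetical list contents; the chapter's actual lists are not given in the abstract.
NAME_PREFIXES = {"dr.", "mr.", "mrs.", "ms."}
NAME_SUFFIXES = {"m.d.", "jr.", "ph.d."}
FALSE_POSITIVE_PREFIXES = {"st.", "mount", "lake"}

# Assumed candidate pattern: a run of one to three capitalized words.
CANDIDATE = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+){0,2}\b")


def detect_names(text, threshold=1):
    """Return candidate spans whose list-based evidence count reaches the threshold.

    A plain evidence count stands in for the statistical reasoning process,
    whose details are not described in the abstract.
    """
    names = []
    for match in CANDIDATE.finditer(text):
        before = text[:match.start()].split()
        # Skip separators such as ", " between a name and a trailing title.
        after = text[match.end():].lstrip(" ,;").split()
        prev_tok = before[-1].lower() if before else ""
        next_tok = after[0].lower() if after else ""

        # A false positive prefix (e.g. "St." before "Mary Hospital") vetoes the candidate.
        if prev_tok in FALSE_POSITIVE_PREFIXES:
            continue

        score = 0
        if prev_tok in NAME_PREFIXES:
            score += 1
        if next_tok in NAME_SUFFIXES:
            score += 1
        if score >= threshold:
            names.append(match.group())
    return names


report = ("The patient was seen by Dr. Jane Doe at St. Mary Hospital. "
          "Report signed by John Smith, M.D. on Monday.")
print(detect_names(report))
```

In this sketch a false positive prefix acts as a hard veto, while prefix and suffix matches add evidence. A statistical approach would presumably weight these cues rather than count them, and could apply different rules to the Heading, Body and Ending parts produced by segmentation.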


Corresponding author

Correspondence to Yinghao Huang.

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Huang, Y., Murphey, Y., Seliya, N., Friedenthal, R. (2015). Machine Learning for Medical Examination Report Processing. In: Abou-Nasr, M., Lessmann, S., Stahlbock, R., Weiss, G. (eds) Real World Data Mining Applications. Annals of Information Systems, vol 17. Springer, Cham. https://doi.org/10.1007/978-3-319-07812-0_14
