Machine Learning for Medical Examination Report Processing

Huang, Yinghao; Murphey, Yi Lu; Seliya, Naeem; Friedenthal, Roy B.

doi:10.1007/978-3-319-07812-0_14

Chapter
First Online: 14 November 2014

Part of the book series: Annals of Information Systems (AOIS, volume 17)

Abstract

Vast numbers of medical documents contain rich knowledge that can support a broad range of medical research and clinical studies. One important application is automatically sorting medical documents into specific categories. However, these documents usually contain the names and identities of patients and doctors, which may not be disclosed because of patient privacy and the regulations governing medical data. In this article, we address two problems: automatic named entity detection and automatic classification of medical reports. We present a named entity recognition system, MD_NER_NCL, and a text document classification system, C_IME_RPT, for medical report processing and categorization. MD_NER_NCL contains an innovative segmentation algorithm, called HBE segmentation, that segments a medical text document into Heading, Body, and Ending parts, and a statistical reasoning process that draws on three entity lists: a people name prefix list, a people name suffix list, and a false positive prefix list. C_IME_RPT is built on Self-Organizing Maps (SOM) and a machine learning process. Both systems were evaluated on Independent Medical Examination (IME) reports provided by medical professionals. MD_NER_NCL significantly improved on the well-known text analysis software OpenNLP for people name detection. C_IME_RPT attained an 89.9% classification accuracy, which is very good for clinical record classification. We also present an in-depth empirical study of the parameters associated with the SOM learning process and text mining, and of their effects on classification results.
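To make the role of the three entity lists concrete, the following is a minimal Python sketch of list-driven people name detection. It is not the MD_NER_NCL algorithm: the abstract does not specify the statistical reasoning process or the HBE segmentation, so this sketch substitutes simple hard rules and skips segmentation entirely. The list contents, the function name find_name_candidates, and the tokenization rule are all assumptions made for illustration.

```python
import re

# Hypothetical example lists; the chapter's actual lists are not given
# in the abstract.
NAME_PREFIXES = {"dr", "mr", "mrs", "ms"}          # usually precede a person name
NAME_SUFFIXES = {"md", "jr", "phd"}                # usually follow a person name
FALSE_POSITIVE_PREFIXES = {"st", "mount", "lake"}  # precede places, not people

LIST_WORDS = NAME_PREFIXES | NAME_SUFFIXES | FALSE_POSITIVE_PREFIXES

# Words, optionally with internal and trailing periods ("M.D.", "Dr.").
TOKEN_RE = re.compile(r"[A-Za-z]+(?:\.[A-Za-z]+)*\.?")


def normalize(token):
    """Lowercase a token and drop periods so "M.D." matches "md"."""
    return token.replace(".", "").lower()


def find_name_candidates(text):
    """Return capitalized word runs that the lists mark as people names.

    A run is accepted when the token before it is a name prefix or the
    token after it is a name suffix, and rejected when the token before
    it is a false positive prefix. This is a rule-based simplification,
    not a statistical model.
    """
    tokens = TOKEN_RE.findall(text)
    names = []
    i = 0
    while i < len(tokens):
        if not tokens[i][0].isupper() or normalize(tokens[i]) in LIST_WORDS:
            i += 1
            continue
        # Extend the run over consecutive capitalized, non-list tokens.
        j = i
        while (j < len(tokens) and tokens[j][0].isupper()
               and normalize(tokens[j]) not in LIST_WORDS):
            j += 1
            if tokens[j - 1].endswith("."):
                break  # a trailing period likely ends the sentence
        before = normalize(tokens[i - 1]) if i > 0 else ""
        after = normalize(tokens[j]) if j < len(tokens) else ""
        if before not in FALSE_POSITIVE_PREFIXES and (
            before in NAME_PREFIXES or after in NAME_SUFFIXES
        ):
            names.append(" ".join(t.rstrip(".") for t in tokens[i:j]))
        i = j
    return names


if __name__ == "__main__":
    report = ("Dr. John Smith examined the patient at St. Mary Hospital. "
              "Report signed by Jane Doe, M.D.")
    print(find_name_candidates(report))
```

In this sketch the false positive prefix list acts as a veto, so "St. Mary Hospital" is not reported even though "Mary" is capitalized. A statistical approach like the one the abstract describes would presumably weigh such evidence rather than apply it as a hard rule.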

Author information

Authors and Affiliations

Computer and Information Science, University of Michigan—Dearborn, Dearborn, MI, 48128, USA
Yinghao Huang & Naeem Seliya
Electrical and Computer Engineering, University of Michigan—Dearborn, Dearborn, MI, 48128, USA
Yi Lu Murphey
Central Orthopedics, 820 S. White Horse Pike, Hammonton, NJ, 08037, USA
Roy B. Friedenthal

Corresponding author

Correspondence to Yinghao Huang.

Editor information

Editors and Affiliations

Research & Advanced Engineering, Ford Motor Company, Dearborn, Michigan, USA
Mahmoud Abou-Nasr
Universität Hamburg Inst. Wirtschaftsinformatik, Hamburg, Germany
Stefan Lessmann
Universität Hamburg Inst. Wirtschaftsinformatik, Hamburg, Germany
Robert Stahlbock
Department of Computer & Information Science, Fordham University, Bronx, New York, USA
Gary M. Weiss

About this chapter

Cite this chapter

Huang, Y., Murphey, Y., Seliya, N., Friedenthal, R. (2015). Machine Learning for Medical Examination Report Processing. In: Abou-Nasr, M., Lessmann, S., Stahlbock, R., Weiss, G. (eds) Real World Data Mining Applications. Annals of Information Systems, vol 17. Springer, Cham. https://doi.org/10.1007/978-3-319-07812-0_14

DOI: https://doi.org/10.1007/978-3-319-07812-0_14
Published: 14 November 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07811-3
Online ISBN: 978-3-319-07812-0
eBook Packages: Business and Economics; Business and Management (R0)
