Abstract
Vast collections of medical documents contain rich knowledge that can support a broad range of medical research and clinical studies. One important application is the automatic categorization of medical documents into specific categories. However, these documents usually contain the names and identities of patients and doctors, which must not be disclosed because of patient privacy and the regulations governing medical data. In this article, we address two problems: automatic named entity detection and automatic classification of medical reports. We present a named entity recognition system, MD_NER_NCL, and a text document classification system, C_IME_RPT, for processing and categorizing medical reports. MD_NER_NCL combines an innovative segmentation algorithm, called HBE segmentation, which splits a medical text document into Heading, Body and Ending parts, with a statistical reasoning process that draws on three entity lists: a people name prefix list, a people name suffix list, and a false positive prefix list. C_IME_RPT is built on Self Organizing Maps (SOM) and a machine learning process. Both systems were evaluated on Independent Medical Examination (IME) reports provided by medical professionals. For people name detection, MD_NER_NCL achieved a significant improvement over the well-known text analysis software OpenNLP. C_IME_RPT attained 89.9% classification accuracy, a strong result for clinical record classification. We also present an in-depth empirical study of how the parameters of the SOM learning process and of text mining affect classification results.
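The abstract says only that C_IME_RPT is built on Self Organizing Maps and a machine learning process. The sketch below shows one common way to turn a SOM into a classifier, under assumptions that are ours and not the authors': each report is already encoded as a fixed-length term-weight vector, every map node is labelled by majority vote of the training reports that map to it, and a new report takes the label of its nearest labelled node. The function names (`train_som`, `label_nodes`, `classify`, `best_matching_unit`), the parameters `rows`, `cols`, `epochs`, `lr0` and `radius0`, and the toy categories are all hypothetical stand-ins for the kind of SOM parameters the abstract says were studied empirically.

```python
import math
import random
from collections import Counter


def best_matching_unit(weights, x):
    """Return the grid coordinate whose weight vector is closest (squared Euclidean) to x."""
    return min(weights, key=lambda n: sum((w - v) ** 2 for w, v in zip(weights[n], x)))


def train_som(data, rows=2, cols=2, epochs=100, lr0=0.5, radius0=1.0, seed=0):
    """Train a rows x cols SOM (hypothetical defaults); returns {(row, col): weight_vector}."""
    rng = random.Random(seed)
    # Initialise every node with a copy of a randomly chosen training vector.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                    # learning rate decays linearly toward 0
            radius = max(radius0 * (1.0 - frac), 0.5)  # neighbourhood shrinks, floored at 0.5
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * radius ** 2))  # Gaussian neighbourhood on the grid
                for i, v in enumerate(x):
                    w[i] += lr * h * (v - w[i])  # pull the node toward the sample
            step += 1
    return weights


def label_nodes(weights, data, labels):
    """Assumed labelling scheme: each node takes the majority label of the samples it wins."""
    votes = {}
    for x, y in zip(data, labels):
        votes.setdefault(best_matching_unit(weights, x), Counter())[y] += 1
    return {node: counts.most_common(1)[0][0] for node, counts in votes.items()}


def classify(weights, node_labels, x):
    """Label x by its nearest labelled node, so nodes that won no training sample are skipped."""
    labelled = {n: weights[n] for n in node_labels}
    return node_labels[best_matching_unit(labelled, x)]


# Hypothetical toy data: three-term weight vectors for two made-up report categories.
train_x = [[0.9, 0.1, 0.0], [1.0, 0.0, 0.1], [0.8, 0.2, 0.0],
           [0.0, 0.1, 0.9], [0.1, 0.0, 1.0], [0.0, 0.2, 0.8]]
train_y = ["orthopedic"] * 3 + ["neurology"] * 3

weights = train_som(train_x)
node_labels = label_nodes(weights, train_x, train_y)
print(classify(weights, node_labels, [0.95, 0.05, 0.0]))
print(classify(weights, node_labels, [0.05, 0.1, 0.95]))
```

The Gaussian neighbourhood and the linear decay of the learning rate and radius are conventional textbook choices, not settings reported in the chapter; varying `rows`, `cols`, `lr0`, `radius0` and `epochs` is the kind of parameter study the abstract describes.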
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Huang, Y., Murphey, Y., Seliya, N., Friedenthal, R. (2015). Machine Learning for Medical Examination Report Processing. In: Abou-Nasr, M., Lessmann, S., Stahlbock, R., Weiss, G. (eds) Real World Data Mining Applications. Annals of Information Systems, vol 17. Springer, Cham. https://doi.org/10.1007/978-3-319-07812-0_14
DOI: https://doi.org/10.1007/978-3-319-07812-0_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07811-3
Online ISBN: 978-3-319-07812-0
eBook Packages: Business and Economics, Business and Management (R0)