Text Classification for Healthcare Information Support
Healthcare information support (HIS) is essential for managing, gathering, and disseminating information for healthcare decision support through the Internet. Text classification (TC) is a key component of HIS. Upon receiving a text describing a healthcare need (e.g. a symptom description from a patient) or healthcare information (e.g. information from medical literature and news), a text classifier may determine its corresponding categories (e.g. diseases), so that subsequent HIS tasks (e.g. online healthcare consultancy and information recommendation) may be conducted. The key challenge lies in high-quality TC, which aims to classify most texts into suitable categories (i.e. recall is very high) while avoiding misclassification of most texts (i.e. precision is very high). High-quality TC is particularly essential because healthcare is a domain where an error may incur high costs and/or serious problems. Unfortunately, high-quality TC was seldom achieved in previous studies. In this paper, we present a case study in which a high-quality classifier is built to support HIS for Chinese disease-related information, including the cause, symptoms, cure, side effects, and prevention of cancer. The results show that, without relying on domain knowledge or complicated processing, cancer information may be classified into suitable categories with a controlled number of confirmations.
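The abstract describes recall and precision only informally. As an illustration, the sketch below computes both for a single category using the standard definitions, which are assumed here rather than taken from the paper. The function name `precision_recall`, the document IDs, and the toy labels are all hypothetical; the category names merely echo those listed in the abstract.

```python
from typing import Dict, Set, Tuple

def precision_recall(
    gold: Dict[str, Set[str]],
    predicted: Dict[str, Set[str]],
    category: str,
) -> Tuple[float, float]:
    """Return (precision, recall) for one category.

    gold and predicted map a document ID to its set of category labels.
    Standard definitions are assumed:
      precision = TP / (TP + FP), recall = TP / (TP + FN).
    """
    tp = fp = fn = 0
    for doc_id, gold_cats in gold.items():
        pred_cats = predicted.get(doc_id, set())
        in_gold = category in gold_cats
        in_pred = category in pred_cats
        if in_gold and in_pred:
            tp += 1   # correctly assigned to the category
        elif in_pred:
            fp += 1   # misclassification: lowers precision
        elif in_gold:
            fn += 1   # missed text: lowers recall
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical toy data, not results from the paper.
gold = {
    "d1": {"symptom"},
    "d2": {"symptom", "cause"},
    "d3": {"prevention"},
    "d4": {"symptom"},
}
predicted = {
    "d1": {"symptom"},
    "d2": {"cause"},
    "d3": {"symptom"},
    "d4": {"symptom"},
}

symptom_precision, symptom_recall = precision_recall(gold, predicted, "symptom")
cause_precision, cause_recall = precision_recall(gold, predicted, "cause")
```

In this toy data, "symptom" has one misclassified text (d3) and one missed text (d2), so neither measure reaches 1.0. High-quality TC in the sense of the abstract would require both values to be close to 1.0 for every category.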