Text Classification for Healthcare Information Support

  • Rey-Long Liu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4570)

Abstract

Healthcare information support (HIS) is essential for managing, gathering, and disseminating information for healthcare decision support through the Internet. Text classification (TC) is a key component of HIS. Upon receiving a text describing a healthcare need (e.g. a symptom description from a patient) or healthcare information (e.g. information from medical literature and news), a text classifier may determine its corresponding categories (e.g. diseases), so that subsequent HIS tasks (e.g. online healthcare consultancy and information recommendation) can be conducted. The key challenge lies in high-quality TC, which aims to classify most texts into suitable categories (i.e. recall is very high) while avoiding misclassification of most texts (i.e. precision is very high). High-quality TC is particularly essential because healthcare is a domain in which an error may incur high costs and/or serious problems. Unfortunately, high-quality TC was seldom achieved in previous studies. In this paper, we present a case study in which a high-quality classifier is built to support HIS for Chinese disease-related information, including the cause, symptoms, treatment, side effects, and prevention of cancer. The results show that, without relying on domain knowledge or complicated processing, cancer information can be classified into suitable categories with a controlled number of confirmations.
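
The trade-off between precision, recall, and the number of confirmations can be illustrated with a small sketch. The abstract does not say how confirmations are triggered, so the example below assumes a simple score band: texts scoring at or above `high` are accepted automatically, texts scoring below `low` are rejected automatically, and texts in between are sent for confirmation, which is assumed to always give the correct answer. The function name `evaluate`, the thresholds, and the toy scores are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def evaluate(scores: List[float], truths: List[bool],
             low: float, high: float) -> Tuple[float, float, float]:
    """Return (precision, recall, confirmation_rate) for one category.

    Assumption: a text whose score falls in [low, high) is confirmed by a
    person, and that confirmation is always correct.
    """
    tp = fp = fn = confirmations = 0
    for score, truth in zip(scores, truths):
        if score >= high:
            accepted = True           # confident accept
        elif score < low:
            accepted = False          # confident reject
        else:
            confirmations += 1        # uncertain: ask for confirmation
            accepted = truth
        if accepted and truth:
            tp += 1
        elif accepted and not truth:
            fp += 1
        elif not accepted and truth:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall, confirmations / len(scores)


# Toy data: classifier scores for one category and the true membership.
scores = [0.95, 0.80, 0.55, 0.45, 0.20, 0.10]
truths = [True, True, False, True, False, False]

no_band = evaluate(scores, truths, low=0.5, high=0.5)    # single threshold
with_band = evaluate(scores, truths, low=0.3, high=0.7)  # confirmation band
print(no_band)
print(with_band)
```

On this toy data, the single threshold misclassifies the two borderline texts, while the band routes exactly those two texts to confirmation. Widening the band raises precision and recall at the cost of more confirmations, which is the quantity the abstract describes as controlled.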

Copyright information

© Springer Berlin Heidelberg 2007

Authors and Affiliations

  • Rey-Long Liu
    • Department of Medical Informatics, Tzu Chi University, Hualien, Taiwan, R.O.C.
