Using Domain-Specific Term Frequencies to Identify and Classify Health Queries
In this paper we propose a multilingual method to identify health-related queries and classify them into health categories. Our method uses a consumer health vocabulary and the semantic structure of the Unified Medical Language System to compute the degree of association between a query and medical concepts and categories. The method can be applied to other languages by using translated versions of the health vocabulary. To evaluate its efficacy and applicability in two languages, we used two manually classified sets of queries, each in a different language. Results are better for the English sample, where a distance of 0.38 to the ROC optimal point (0,1) was obtained. This suggests that translation has some influence on the method’s performance.
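The evaluation metric mentioned in the abstract, the distance to the ROC optimal point (0,1), can be illustrated with a minimal sketch. It assumes the usual Euclidean distance between a classifier's (false positive rate, true positive rate) point and (0,1); the abstract does not state which distance the paper uses. The function name `roc_distance` and the confusion-matrix counts are hypothetical and are not results from the paper.

```python
import math


def roc_distance(tp: int, fp: int, tn: int, fn: int) -> float:
    """Euclidean distance from the point (FPR, TPR) to the ROC optimum (0, 1).

    Assumption: Euclidean distance is used; smaller values are better,
    and 0 corresponds to a perfect classifier.
    """
    tpr = tp / (tp + fn)  # true positive rate (sensitivity)
    fpr = fp / (fp + tn)  # false positive rate (1 - specificity)
    return math.hypot(fpr, 1.0 - tpr)


# Hypothetical counts for a health / non-health query classifier.
distance = roc_distance(tp=80, fp=30, tn=70, fn=20)
print(f"distance to (0,1): {distance:.2f}")
```

Under this reading, a classifier that labels every query correctly has distance 0, and one that labels every query incorrectly has distance √2, so lower values indicate better performance.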
Keywords: Health Information Retrieval, Health Queries, Medical Vocabularies, Web Information Retrieval