Topic categorization and representation of health community generated data
The representation and categorization of data released by professional health providers have been well investigated and practically implemented, which has facilitated browsing, search, and high-order learning of health information. By contrast, there have been few corresponding studies on the representation and categorization of health community generated data. Such data are usually more complex, inconsistent, and ambiguous, and consequently raise challenges for data access and analytics. This paper explores various representations for health community generated data and categorizes these data in terms of health topics. In addition, this work uses pseudo-labeled data to train the supervised topic categorization models, which makes the whole categorization process unsupervised and extensible to large-scale data. Extensive experiments on two real-world datasets reveal which representation approaches are informative and which categorization models are effective for health community generated data.
Keywords: Health community generated data; Learning model; Semantic representation; Health topic categorization
The work presented in this paper is partially supported by the National Natural Science Foundation of China under Grant No. 61100133 and the Major Projects of National Social Science Foundation of China under Grant No. 11&ZD189.