Abstract
The representation and categorization of professional health provider released data have been well investigated and practically implemented. These have facilitated browsing, search and high-order learning of health information. On the other hand, there has been little corresponding studies on the representation and categorization of health community generated data. It is usually more complex, inconsistent and ambiguous, and consequently raises challenges for data access and analytics. This paper explores various representations for health community generated data and categorizes these data in terms of health topics. In addition, this work utilizes pseudo-labeled data to train the supervised topic categorization models, and this makes the whole categorization process unsupervised and extendable to handle large-scale data. The extensive experiments on two real-world datasets reveal our interesting findings of the informative representation approaches and effective categorization models for health community generated data.
Similar content being viewed by others
Notes
In this work, D2 is a general English Gigaword data of Linguistic Data Consortium (http://www.ldc.upenn.edu/)
References
Babashzadeh A, Huang J, Daoud M (2013) Exploiting semantics for improving clinical information retrieval. Proceedings of the International ACM SIGIR Conference 801–804
Blei D, Ng A, Jordan M, Lafferty J (2003) Latent dirichlet allocation. J Mach Learn Res 3:993–1022
Chan W, Yang W, Tang J, et al (2013) Community question topic categorization via hierarchical kernelized classification. Proceedings of the 22nd ACM International Conference on Information and Knowledge Management 959–968
Chang X, Yang Y, Xing E, Yu Y (2015) Complex event detection using semantic saliency and nearly-isotonic SVM. Proceedings of the 32nd International Conference on Machine Learning 1348–1357
Hersh W, Hickam D, Haynes R, Mckibbon K (1994) A performance and failure analysis of SAPHIRE with a MEDLINE test collection. J Am Med Inform Assoc 1(1):51–60
Hong R, Li G, Nie L, Tang J, Chua T (2010) Exploring large scale data for multimedia QA: an initial study. Proceedings of the ACM International Conference on Image and Video Retrieval 74–81
Kanavos A, Makris C, Theodoridis E (2015) Topic categorization of biomedical abstracts. Int J Artif Intell Tools. doi:10.1142/S0218213015400047
Kim M and Goebel R (2010) Detection and normalization of medical terms using domain-specific term frequency and adaptive ranking. IEEE International Conference on Information Technology and Applications in Biomedicine 1–5
Li J, Liu C, Liu B, Mao R, Wang Y, Chen S, Yang J, Pan H, Wang Q (2015) Diversity-aware retrieval of medical records. Comput Ind 69:81–91
Limsopatham N, Macdonald C and Ounis I (2013a) A task-specific query and document representation for medical records search. Proceedings of the European Conference on Advances in Information Retrieval 747–751
Limsopatham N, Macdonald C and Ounis I (2013b) Learning to combine representations for medical records search. Proceedings of the International ACM SIGIR Conference 833–836
Nie L, Wang M, Zha Z, Li G, and Chua T (2011) Multimedia answering: Enriching text QA with media information. Proceedings of the International ACM SIGIR Conference 695–704
Nie L, Wang M, Gao Y, Zha Z, Chua T (2013a) Beyond text QA: multimedia answer generation by harvesting web information. IEEE Trans Multimedia 15(2):426–441
Nie L, ZhaoY WX, Shen J, Chua T (2013b) Learning to recommend descriptive tags for questions in social forums. ACM Trans Inf Syst 32(1):5. doi:10.1145/2559157
Nie L, Wang M, Zhang L, et al. (2014a) Disease inference from health-related questions via sparse deep learning. IEEE Trans Knowl Data Eng 27(8):2107–2119
Nie L, Li T, Akbari M, Shen J, Chua T (2014b) WenZher: comprehensive vertical search for healthcare domain. Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval 1245–1246
Nie L, Akbari M, Li T, Chua T (2014c) A joint local-global approach for medical terminology assignment. In Medical Information Retrieval Workshop at SIGIR 2014, 24–27
Nie L, Zhao Y, Akbari M, Shen J, Chua T (2015) Bridging the vocabulary gap between health seekers and healthcare knowledge. IEEE Trans Knowl Data Eng 27(2):396–409
Qu B, Cong G, Li C, et al. (2012) An evaluation of classification models for question topic categorization. J Am Soc Inf Sci Technol 63(5):889–903
Srinivasan P (1996) Optimal document-indexing vocabulary for MEDLINE. Inform Process Manag 32:503–514
Trieschnigg D, Hiemstra D, de Jong F and Kraaij W (2010) A cross-lingual framework for monolingual biomedical information retrieval. Proceedings of the ACM Conference on Information and Knowledge Management 169–178
Velardi P, Missikoff M and Basili R (2001) Identification of relevant terms to support the construction of domain ontologies. Proceedings of the workshop on Human Language Technology and Knowledge Management, doi:10.3115/1118220.1118225.
Yan Y, Ricci E, Subramanian R, Lanz O, Sebe N (2013a) No matter where you are: Flexible graph-guided multi-task learning for multi-view head pose classification under target motion. Proceedings of 2013 I.E. International Conference on Computer Vision 1177–1184
Yan Y, Xu Z, Liu G, Ma Z, Sebe N (2013b) GLocal structural feature selection with sparsity for multimedia data understanding, Proceedings of the ACM International Conference on Multimedia 537–540
Yan Y, Shen H, Liu G, Ma Z, Gao C, Sebe N (2014) GLocal tells you more: coupling GLocal structural for feature selection with sparsity for image and video classification. Comput Vis Image Underst 124:99–109
Yan Y, Ricci E, Liu G, Sebe N (2015a) Egocentric daily activity recognition via multitask clustering. IEEE Trans Image Process 24(10):2984–2995
Yan Y, Yang Y, Meng D, Liu G, Tong W, Hauptmann A, Sebe N (2015b) Event oriented dictionary learning for complex event detection. IEEE Trans Image Process 24(6):1867–1878
Yang S, White R and Horvitz E (2013) Pursuing insights about healthcare utilization via geocoded search queries. Proceedings of the International ACM SIGIR Conference 993–996
Zhang W, Ming Z, Zhang Y, Nie L, Liu T, Chua T (2012) The use of dependency relation graph to enhance the term weighting in question retrieval. Proceedings of the 25th International Conference on Computational Linguistics 3105–3120
Zhang L, Han Y, Yang Y, Song M, Yan S, Tian Q (2013) Discovering discriminative graphlets for aerial image categories recognition. IEEE Trans Image Process 22(12):5071–5084
Zhang L, Yang Y, Gao Y, Yu Y, Wang C, Li X (2014a) A probabilistic associative model for segmenting weakly supervised images. IEEE Trans Image Process 23(9):4150–4159
Zhang L, Gao Y, Ji R, Xia Y, Dai Q, Li X (2014b) Actively learning human gaze shifting paths for semantics-aware photo cropping. IEEE Trans Image Process 23(5):2235–2245
Zhang L, Gao Y, Xia Y, Lu K, Shen J, Ji R (2014c) Representative discovery of structure cues for weakly-supervised image segmentation. IEEE Trans Multimedia 16(2):470–479
Zhang L, Gao Y, Xia Y, Dai Q, Li X (2015a) A fine-grained image categorization system by cellet-encoded spatial pyramid modeling. IEEE Trans Ind Electron 62(1):564–571
Zhang L, Xia Y, Mao K, Ma H, Shan Z (2015b) An effective video summarization framework toward handheld devices. IEEE Trans Ind Electron 62(2):1309–1316
Zhu D and Carterette B (2013) An adaptive evidence weighting method for medical record search. Proceedings of the International ACM SIGIR Conference 1025–1028
Acknowledgments
The work presented in this paper is partially supported by the National Natural Science Foundation of China under Grant No. 61100133 and the Major Projects of National Social Science Foundation of China under Grant No. 11&ZD189.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Liu, M., Zhang, H., Hu, H. et al. Topic categorization and representation of health community generated data. Multimed Tools Appl 76, 10541–10553 (2017). https://doi.org/10.1007/s11042-015-3094-3
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11042-015-3094-3