Multimedia Tools and Applications, Volume 76, Issue 8, pp 10541–10553

Topic categorization and representation of health community generated data

  • Maofu Liu
  • He Zhang
  • Huijun Hu
  • Wei Wei


The representation and categorization of data released by professional health providers have been well investigated and practically implemented, facilitating the browsing, search, and high-order learning of health information. By contrast, there have been few corresponding studies on the representation and categorization of health community generated data. Such data are usually more complex, inconsistent, and ambiguous, which raises challenges for data access and analytics. This paper explores various representations for health community generated data and categorizes these data by health topic. In addition, this work uses pseudo-labeled data to train the supervised topic categorization models, which makes the whole categorization process unsupervised and extendable to large-scale data. Extensive experiments on two real-world datasets identify which representation approaches are informative and which categorization models are effective for health community generated data.
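The idea of training a supervised categorizer on pseudo-labeled data can be illustrated with a minimal Python sketch. The abstract does not say how the pseudo labels are obtained, which representations are used, or which models are trained, so everything below is an assumption made for illustration: the seed-term dictionary `SEEDS`, the keyword-matching rule in `pseudo_label`, the bag-of-words representation, and the small multinomial Naive Bayes classifier are hypothetical stand-ins, not the authors' method.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical seed terms per health topic; these are NOT taken from the paper.
SEEDS = {
    "diabetes": {"insulin", "glucose", "diabetes"},
    "cardiology": {"heart", "chest", "cholesterol"},
}

def tokenize(text):
    """Lowercase bag-of-words tokens (an assumed, very simple representation)."""
    return re.findall(r"[a-z]+", text.lower())

def pseudo_label(text):
    """Return the topic whose seed terms occur most often, or None if no seed matches."""
    tokens = tokenize(text)
    hits = {topic: sum(tok in seeds for tok in tokens) for topic, seeds in SEEDS.items()}
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None

def train_naive_bayes(labeled):
    """Collect per-topic document and word counts from (text, topic) pairs."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, topic in labeled:
        doc_counts[topic] += 1
        for tok in tokenize(text):
            word_counts[topic][tok] += 1
            vocab.add(tok)
    return doc_counts, word_counts, vocab

def predict(model, text):
    """Pick the topic with the highest log posterior under add-one smoothing."""
    doc_counts, word_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for topic in doc_counts:
        total_words = sum(word_counts[topic].values())
        score = math.log(doc_counts[topic] / total_docs)
        for tok in tokenize(text):
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[topic][tok] + 1) / (total_words + len(vocab)))
        scores[topic] = score
    return max(scores, key=scores.get)

# Made-up community posts; no manual topic labels are supplied.
posts = [
    "My glucose is high even after insulin, should I change my diet?",
    "Does diabetes make you thirsty and tired all the time?",
    "I get chest pain and a racing heart when I climb stairs.",
    "High cholesterol runs in my family, is my heart at risk?",
    "What is a good way to sleep better?",  # no seed term, so it stays unlabeled
]

# Step 1: derive pseudo labels automatically and drop posts without one.
pseudo_labeled = [(p, pseudo_label(p)) for p in posts]
pseudo_labeled = [(p, t) for p, t in pseudo_labeled if t is not None]

# Step 2: train a supervised model on the pseudo-labeled posts.
model = train_naive_bayes(pseudo_labeled)

# Step 3: categorize a new post that contains no seed term at all.
print(predict(model, "I feel thirsty and tired after eating"))
```

The point of the sketch is the last step: the new post matches none of the seed terms, yet the classifier can still assign a topic because it has learned co-occurring vocabulary from the pseudo-labeled posts. No human annotation enters the pipeline, which is the sense in which such a process can be called unsupervised overall.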


Health community generated data; Learning model; Semantic representation; Health topic categorization



The work presented in this paper is partially supported by the National Natural Science Foundation of China under Grant No. 61100133 and the Major Projects of National Social Science Foundation of China under Grant No. 11&ZD189.



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  1. College of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, China
  2. Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System, Wuhan, China
  3. School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China
