Topic hierarchy construction from heterogeneous evidence

  • Research Article
Frontiers of Computer Science

Abstract

Existing studies on hierarchy construction mainly focus on text corpora and indiscriminately mix numerous topics, thus increasing the possibility of knowledge acquisition bottlenecks and misconceptions. To address these problems and provide a comprehensive and in-depth representation of domain-specific topics, we propose a novel topic hierarchy construction method with real-time update. This method combines heterogeneous evidence from multiple sources, including folksonomy and encyclopedia, separately in both initial topic hierarchy construction and topic hierarchy improvement. Results of comprehensive experiments indicate that the proposed method significantly outperforms state-of-the-art methods (t-test, p-value < 0.0001); recall in particular improves by 20.4% to 38.7%.

Author information

Corresponding author

Correspondence to Ting Liu.

Additional information

Han Xue, PhD candidate in computer science, is a student at Harbin Institute of Technology, China. Her interests include information extraction and social computing.

Bing Qin, PhD in computer science, is a professor at the Department of Computer Science of Harbin Institute of Technology, China. Her interests include text mining and natural language processing.

Ting Liu, PhD in computer science, is a professor at the Department of Computer Science and Technology of Harbin Institute of Technology, China. His interests include information retrieval and social computing.

Shen Liu, MS candidate in computer science, is a student at Harbin Institute of Technology, China. His interests include information extraction and natural language processing.

About this article

Cite this article

Xue, H., Qin, B., Liu, T. et al. Topic hierarchy construction from heterogeneous evidence. Front. Comput. Sci. 10, 136–146 (2016). https://doi.org/10.1007/s11704-015-4548-5

  • DOI: https://doi.org/10.1007/s11704-015-4548-5
