A hybrid-based method for Chinese domain lightweight ontology construction

  • Jing Qiu
  • Lin Qi
  • Jianliang Wang
  • Guanghua Zhang
Original Article


This paper proposes a framework for automatically constructing a lightweight ontology from a corpus of Chinese domain Web documents. A hybrid method is used for domain lightweight ontology learning: rule-based, statistics-based, and cluster-based methods are combined to complete two sub-tasks, concept extraction and taxonomic relationship extraction. First, multiword terms are identified using a set of rules together with a Named Entity Module, and three statistical methods are employed jointly to rank the domain concepts. Second, clustering and subsumption methods are combined to construct the taxonomy. Concepts are clustered into several groups, using three similarity measures defined to capture semantic, spatial, and co-occurrence information between concepts. Because taxonomic relations exist only between similar concepts, the subsumption method is then applied to build a taxonomic structure for each concept group. Third, definitions of the concepts extracted in the first step are collected from an online Chinese encyclopedia. Rule-based methods and a set of lexico-syntactic patterns are applied to this collection of definitions to extract taxonomic relationships and refine the taxonomy. Finally, the method is evaluated against a gold standard in the football domain and compared with several classical algorithms. The experimental results show the effectiveness of the method.
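To make the subsumption step concrete, the following is a minimal sketch of the document co-occurrence form of subsumption that is commonly used in taxonomy learning. It is not the authors' implementation: the abstract does not give their exact formulation, so the function name `subsumption_pairs`, the threshold of 0.8, the tie-breaking condition, and the toy English documents are all assumptions for illustration. Documents are represented as sets of already-segmented terms; for Chinese text a word segmenter would be needed first.

```python
from itertools import permutations


def subsumption_pairs(docs, concepts, threshold=0.8):
    """Return (parent, child) pairs where parent is taken to subsume child.

    Assumed rule (hypothetical, not taken from the paper):
    parent subsumes child when P(parent | child) >= threshold
    and P(child | parent) < P(parent | child).
    Probabilities are estimated from document co-occurrence counts.
    """
    # Map each concept to the set of document indices that mention it.
    doc_sets = {c: {i for i, d in enumerate(docs) if c in d} for c in concepts}

    pairs = []
    for parent, child in permutations(concepts, 2):
        p_docs, c_docs = doc_sets[parent], doc_sets[child]
        if not p_docs or not c_docs:
            continue  # a concept that never occurs cannot be compared
        both = len(p_docs & c_docs)
        p_parent_given_child = both / len(c_docs)
        p_child_given_parent = both / len(p_docs)
        if (p_parent_given_child >= threshold
                and p_child_given_parent < p_parent_given_child):
            pairs.append((parent, child))
    return pairs


# Toy concept group (illustrative data only): each document is a set of terms.
docs = [
    {"football", "goalkeeper"},
    {"football", "striker"},
    {"football", "goalkeeper", "striker"},
    {"football"},
]
pairs = subsumption_pairs(docs, ["football", "goalkeeper", "striker"])
print(pairs)
```

In this toy group, "football" appears in every document that mentions "goalkeeper" or "striker" but not the other way round, so it is placed above both. Running the check only within one cluster of similar concepts, as the abstract describes, keeps the number of candidate pairs small.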


Ontology learning; Concept extraction; Taxonomic relationships extraction; Hybrid-based method



This work is supported by the National Natural Science Foundation of China (61300120) and partially supported by the China Postdoctoral Science Foundation (2015M582622), the Colleges of Science and Technology Research Foundation in Hebei Province (YQ2013032, YQ2014036), the Science and Technology Department of Hebei Province of China (15210338), and the Open Project of the Beijing Key Laboratory of IOT Information Security Technology, Institute of Information Engineering. We thank the Harbin Institute of Technology Information Retrieval Laboratory for providing the LTP modules.



Copyright information

© Springer-Verlag Berlin Heidelberg 2017

Authors and Affiliations

  1. Department of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, China
