Soft Computing, Volume 22, Issue 9, pp 2907–2919

Discovering taxonomies in Wikipedia by means of grammatical evolution

  • Lourdes Araujo
  • Juan Martinez-Romo
  • Andrés Duque
Methodologies and Application


This work applies grammatical evolution to identify taxonomic hierarchies of concepts from Wikipedia. Each article in Wikipedia covers a topic and is cross-linked by hyperlinks that connect related topics. Hierarchical taxonomies and their generalization to ontologies are a highly useful resource for many applications since they enable semantic search and reasoning. Thus, the automatic identification of taxonomies composed of concepts associated with linked Wikipedia pages has attracted much attention. We have developed a system which arranges a set of Wikipedia concepts into a taxonomy. This technique is based on the relationships among a set of features extracted from the contents of the Wikipedia pages. We have used a grammatical evolution algorithm to discover the best way of combining the considered features in an explicit function. Candidate functions are evaluated by applying a genetic algorithm to approximate the optimal taxonomy that the function can provide for a number of training cases. The fitness is computed as an average of the precision obtained by comparing, for the set of training cases, the taxonomy provided by the evaluated function with the reference one. Experimental results show that the proposal is able to provide valuable functions to find high-quality taxonomies.
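The fitness computation described above can be illustrated with a minimal sketch. It assumes that a taxonomy is represented as a set of (parent, child) edges and that precision is the fraction of proposed edges that also occur in the reference taxonomy; neither assumption is stated in the abstract, and the names `precision`, `fitness`, and the toy concepts are hypothetical.

```python
from statistics import mean


def precision(predicted_edges, reference_edges):
    """Fraction of predicted (parent, child) edges found in the reference.

    Assumption: taxonomies are compared edge by edge; the paper may use
    a different definition of precision.
    """
    predicted = set(predicted_edges)
    if not predicted:
        return 0.0
    return len(predicted & set(reference_edges)) / len(predicted)


def fitness(predicted_taxonomies, reference_taxonomies):
    """Average precision over all training cases."""
    return mean(
        precision(pred, ref)
        for pred, ref in zip(predicted_taxonomies, reference_taxonomies)
    )


# Two toy training cases (hypothetical concepts, not taken from the paper).
reference = [
    {("Animal", "Mammal"), ("Mammal", "Dog"), ("Mammal", "Cat")},
    {("Vehicle", "Car"), ("Vehicle", "Bicycle")},
]
predicted = [
    {("Animal", "Mammal"), ("Mammal", "Dog"), ("Dog", "Cat")},  # one wrong edge
    {("Vehicle", "Car"), ("Vehicle", "Bicycle")},               # all correct
]

score = fitness(predicted, reference)
print(f"fitness = {score:.4f}")
```

In this toy setting the first case scores 2/3 and the second 1.0, so the candidate function would receive their mean as its fitness. In the system described, the predicted taxonomies would come from the genetic algorithm that approximates the best taxonomy each candidate function can produce.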


Grammatical evolution, Genetic algorithm, Wikipedia taxonomies, Information extraction



This work has been partially supported by the Spanish Ministry of Science and Innovation within the projects EXTRECM (TIN2013-46616-C2-2-R) and PROSA-MED (TIN2016-77820-C3-2-R), as well as by the Universidad Nacional de Educación a Distancia (UNED) through the FPI-UNED 2013 Grant. The authors would like to thank the referees for their valuable comments which led to improvements in the paper.

Compliance with ethical standards

Conflict of interest

Lourdes Araujo declares that she has no conflict of interest. Juan Martinez-Romo declares that he has no conflict of interest. Andrés Duque declares that he has no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.



Copyright information

© Springer-Verlag Berlin Heidelberg 2017

Authors and Affiliations

  1. Universidad Nacional de Educación a Distancia (UNED), Madrid, Spain
