Extracting Ontologies from Arabic Wikipedia: A Linguistic Approach

Abstract

Ontologies are a core component of the Semantic Web, and the need to build ontological models has grown with the variety of Semantic Web applications. Over the years, much research has investigated how to generate ontologies automatically from semi-structured knowledge sources such as Wikipedia. The techniques explored include NLP tools and pattern matching, infoboxes, and structured knowledge sources (Cyc and WordNet). The vast majority of these techniques did not consider the linguistic aspect of Wikipedia. In this article, we present an approach to extracting ontologies from Wikipedia that is based on the semantic field theory introduced by Jost Trier. Linguistic ontologies are significant in many applications for both linguists and Web researchers. We applied the proposed approach to the Arabic version of Wikipedia. Semantic relations were extracted from infoboxes, hyperlinks within infoboxes, and the lists of categories that articles belong to. Our system extracted approximately 760,000 triples from the Arabic Wikipedia. We conducted three experiments to evaluate the system output: a validation test, a crowd evaluation, and a domain experts' evaluation. The system output achieved an average precision of 65%.
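
As a rough illustration of the kind of extraction the abstract describes (relations drawn from infobox fields, hyperlinks inside those fields, and category lists), the following Python sketch turns one article's wikitext into subject-predicate-object triples. It is not the authors' system: the function name `extract_triples`, the regular expressions, the `category` predicate label, and the sample wikitext are all assumptions made for illustration. Real infobox markup (nested templates, multi-line values) would need a proper wikitext parser.

```python
import re

# All names below are hypothetical; this is a sketch, not the published system.

# One infobox field per line, e.g. "| country = [[Egypt]]". Empty fields are skipped.
INFOBOX_FIELD = re.compile(
    r"^\|[ \t]*([^=|\n]+?)[ \t]*=[ \t]*(\S.*?)[ \t]*$", re.MULTILINE
)
# Internal link target, ignoring any "#section" or "|display text" part.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")
# Category links; "تصنيف" is the category namespace name on Arabic Wikipedia.
CATEGORY = re.compile(r"\[\[(?:Category|تصنيف):([^\]|]+)(?:\|[^\]]*)?\]\]")


def extract_triples(title, wikitext):
    """Return (subject, predicate, object) triples for one article."""
    triples = []
    for field, value in INFOBOX_FIELD.findall(wikitext):
        links = [target.strip() for target in WIKILINK.findall(value)]
        # Prefer hyperlink targets as objects; fall back to the raw value.
        for obj in links or [value]:
            triples.append((title, field, obj))
    for category in CATEGORY.findall(wikitext):
        triples.append((title, "category", category.strip()))
    return triples


# Made-up sample article, used only to exercise the sketch.
SAMPLE = """{{Infobox settlement
| name = Cairo
| country = [[Egypt]]
| population =
| twin = [[Istanbul]], [[Amman|Amman city]]
}}
[[Category:Capitals in Africa]]
"""

if __name__ == "__main__":
    for triple in extract_triples("Cairo", SAMPLE):
        print(triple)
```

Using link targets rather than raw field text as objects is a design choice in this sketch: a target names another article, so the resulting triple connects two entities instead of an entity and a string.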

Author information

Corresponding author

Correspondence to Hend S. Al-Khalifa.

About this article

Cite this article

Al-Rajebah, N.I., Al-Khalifa, H.S. Extracting Ontologies from Arabic Wikipedia: A Linguistic Approach. Arab J Sci Eng 39, 2749–2771 (2014). https://doi.org/10.1007/s13369-013-0791-y

Keywords

  • Semantic field theory
  • Wikipedia
  • Linguistics
  • Ontologies