International Journal of Information Technology, Volume 10, Issue 3, pp 303–311

MwTExt: automatic extraction of multi-word terms to generate compound concepts within ontology

  • Pratik Thanawala
  • Jyoti Pareek
Original Research


Multiword expressions are an omnipresent element of natural language, and treating them as a linguistic resource is important for many applications. This paper presents MwTExt, an architecture for the automatic extraction of multi-word terms (MWTs) from such expressions in un-annotated English documents. Natural language processing techniques such as shallow parsing and syntactic structure analysis are used to extract MWTs, with a specific focus on the lexical patterns (Noun Preposition Noun), (Noun Preposition Noun + Noun) and (Noun Preposition Noun Preposition Noun). The extracted MWTs can then be used to form compound concepts within an ontology. The lexical descriptions of the MWTs are encoded in the Web Ontology Language (OWL/XML). MwTExt has been tested on Computer Science domain texts, and its results are compared with those of Text2Onto, an ontology learning tool, and of term extractors such as TermRaider and TerMine. The results indicate that MwTExt extracts accurate lexicalized MWTs more effectively, with an average precision of 97%.
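To make the three lexical patterns concrete, the following is a minimal sketch of pattern matching over part-of-speech tags. It is not the MwTExt implementation: the function names, the hand-tagged input, the use of Penn Treebank tags, and the reading of (Noun Preposition Noun + Noun) as a noun, a preposition and two consecutive nouns are all assumptions made for illustration.

```python
from typing import List, Tuple

Tagged = Tuple[str, str]  # (token, Penn Treebank POS tag); tagging is assumed done upstream

# The three patterns from the abstract, longest first so that longer terms win.
# "N" = noun, "P" = preposition. Reading "N P N + N" as N P N N is an assumption.
PATTERNS = [
    ("N", "P", "N", "P", "N"),
    ("N", "P", "N", "N"),
    ("N", "P", "N"),
]


def coarse(tag: str) -> str:
    """Collapse a Penn Treebank tag to N (noun), P (preposition) or O (other)."""
    if tag.startswith("NN"):
        return "N"
    if tag == "IN":  # IN also covers subordinating conjunctions; ignored in this sketch
        return "P"
    return "O"


def extract_mwts(tagged: List[Tagged]) -> List[str]:
    """Return non-overlapping token sequences that match one of PATTERNS."""
    shape = [coarse(tag) for _, tag in tagged]
    terms = []
    i = 0
    while i < len(tagged):
        for pattern in PATTERNS:
            n = len(pattern)
            if tuple(shape[i:i + n]) == pattern:
                terms.append(" ".join(tok for tok, _ in tagged[i:i + n]))
                i += n  # skip past the matched term
                break
        else:
            i += 1  # no pattern starts here; move on by one token
    return terms


# Hand-tagged example sentence (hypothetical input, not taken from the paper's corpus).
sentence = [
    ("The", "DT"), ("order", "NN"), ("of", "IN"), ("magnitude", "NN"),
    ("matters", "VBZ"), ("in", "IN"), ("analysis", "NN"), ("of", "IN"),
    ("algorithm", "NN"), ("complexity", "NN"), (".", "."),
]
print(extract_mwts(sentence))
```

Trying the longest pattern first is a design choice of this sketch, so that a five-token term is not reported as a shorter three-token one. A real pipeline would obtain the tags from a part-of-speech tagger and would likely filter candidates further before treating them as terms.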


Keywords

Multi-word terms, Compound concepts, Lexical pattern, Ontology

  • Multi-word terms
  • Multi-word expressions
  • Multi-word terms extraction
  • Natural language processing
  • Web ontology language
  • eXtensible markup language


Proceedings references

  1. Graliński F, Savary A, Czerepowicka M, Makowiecki F (2010) Computational lexicography of multi-word units: how efficient can it be? In: Proceedings of the workshop on multiword expressions: from theory to applications (MWE), pp 1–9
  2. Attia M, Toral A, Tounsi L, Pecina P, van Genabith J (2010) Automatic extraction of Arabic multiword expressions. In: Proceedings of the workshop on multiword expressions: from theory to applications (MWE), pp 18–26
  4. Cimiano P, Völker J (2005) Text2Onto: a framework for ontology learning and data-driven change discovery. In: Proceedings of the 10th international conference on applications of natural language to information systems (NLDB), vol 3513, pp 227–238
  11. Stanković R, Krstev C, Obradović I, Lazić B, Trtovac A (2016) Rule-based automatic multi-word term extraction and lemmatization. In: Tenth international conference on language resources and evaluation
  12. Liu Y, Shi M, Li C (2016) Domain ontology concept extraction method based on text. In: IEEE ICIS
  13. Riedl M, Biemann C (2015) A single word is not enough: ranking multiword expressions using distributional semantics. In: Proceedings of the 2015 conference on empirical methods in natural language processing, pp 2430–2440
  14. Ramisch C (2012) A generic framework for multiword expressions treatment: from acquisition to applications. In: Proceedings of the ACL 2012 student research workshop, Jeju, Republic of Korea
  15. Drymonas EG (2009) Ontology learning from text based on multi-word term concepts: the OntoGain method. Master of Science thesis, Technical University of Crete, Greece
  16. Frank E, Paynter GW, Witten IH, Gutwin C, Nevill-Manning CG (1999) Domain specific keyphrase extraction. In: Proceedings of the sixteenth international joint conference on artificial intelligence, Morgan Kaufmann Publishers, pp 668–673
  17. Bonin F, Dell’Orletta F, Venturi G, Montemagni S (2010) Contrastive filtering of domain-specific multi-word terms from different types of corpora. In: Proceedings of the workshop on multiword expressions: from theory to applications, pp 76–79
  18. Jiang X, Tan A-H (2005) Mining ontological knowledge from domain-specific text documents. In: Proceedings of the fifth IEEE international conference on data mining

Journal references

  3. Gruber T (1993) A translation approach to portable ontology specifications. Knowl Acquis 5(2):199–220
  5. Buitelaar P, Olejnik D, Sintek M (2004) A protégé plug-in for ontology extraction from text based on linguistic analysis. In: Davies J et al (eds) The semantic web: research and applications. ESWS 2004, LNCS 3053. Springer, Berlin
  6. Velardi P, Faralli S, Navigli R (2013) OntoLearn reloaded: a graph-based algorithm for taxonomy induction. Assoc Comput Linguist 39(3):665–707
  7. Wong W, Liu W, Bennamoun M (2012) Ontology learning from text: a look back and into the future. ACM Comput Surv (CSUR) 44(4):20
  8. Biemann C (2005) Ontology learning from text: a survey of methods. LDV Forum 20(2):75–93
  10. Frantzi K, Ananiadou S, Mima H (2000) Automatic recognition of multi-word terms. Int J Digit Librar 3(2):117–132 (TerMine)
  19. Meryem H, Ouatik SA, Lachkar A (2014) A novel method for Arabic multi-word term extraction. Int J Database Manag Syst (IJDMS) 6(3):53–67

Copyright information

© Bharati Vidyapeeth's Institute of Computer Applications and Management 2018

Authors and Affiliations

  1. School of Computer Studies, Ahmedabad University, Ahmedabad, India
  2. Department of Computer Science, Gujarat University, Ahmedabad, India
