Chapter

Computational Linguistics

Volume 458 of the series Studies in Computational Intelligence pp 109-129

An Approach to Efficient Processing of Multi-word Units

  • Cvetana Krstev, Faculty of Philology, University of Belgrade
  • Ivan Obradović, Faculty of Mining and Geology, University of Belgrade
  • Ranka Stanković, Faculty of Mining and Geology, University of Belgrade
  • Duško Vitas, Faculty of Mathematics, University of Belgrade

Abstract

Efficient processing of multi-word units (MWUs) during the development of morphological MWU dictionaries is hard to achieve, especially for languages with complex morphological structures, such as Serbian. Manual development of dictionaries of this type is a tedious and extremely slow process. To alleviate this problem we turned to our multipurpose software tool, dubbed LeXimir, in the production of lemmas for e-dictionaries of multi-word units. In addition, we developed a procedure aimed at making the production of MWU dictionary lemmas more efficient. This procedure, which relies heavily on our comprehensive e-dictionaries of Serbian simple words, was subsequently implemented as a new functionality of LeXimir. In this paper we present our approach and offer an evaluation of the performance of the new functionality of LeXimir, and hence of our procedure, obtained through two rounds of experiments on various types of data. The paper ends with a brief discussion of some further possible applications of both the procedure and LeXimir in various language processing tasks.