Chapter

Advances in Artificial Intelligence

Volume 7094 of the series Lecture Notes in Computer Science pp 394-406

Unsupervised Identification of Persian Compound Verbs

  • Mohammad Sadegh Rasooli, Department of Computer Engineering, Iran University of Science and Technology
  • Heshaam Faili, School of Electrical & Computer Engineering, Tehran University
  • Behrouz Minaei-Bidgoli, Department of Computer Engineering, Iran University of Science and Technology

Abstract

One of the main tasks related to multiword expressions (MWEs) is compound verb identification. Many studies have addressed the unsupervised identification of multiword verbs in various languages, but no notable work exists yet for Persian. Persian multiword verbs (known as compound verbs) are a kind of light verb construction (LVC) with considerable syntactic flexibility, such as unrestricted word distance between the light verb and the nonverbal element. Furthermore, the nonverbal element can be inflected. These characteristics make the task very difficult in Persian. In this paper, we propose two unsupervised methods for automatically detecting compound verbs in Persian. The first is a bootstrapping method that extends the pointwise mutual information (PMI) measure. The second uses the K-means clustering algorithm. Our experiments show that the proposed approaches outperform a baseline that uses PMI as its association metric.
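
For readers unfamiliar with the baseline association metric, the following is a minimal sketch of standard PMI, log2(P(x, y) / (P(x) P(y))), computed over (nonverbal element, light verb) pairs. It is not the authors' implementation or their bootstrapping extension. The function name `pmi_scores`, the transliterated word pairs, and all counts are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def pmi_scores(pairs):
    """Return the PMI of each distinct (nonverbal, light_verb) pair.

    `pairs` is a list of observed co-occurrences; probabilities are
    simple relative frequencies over that list (an assumption of this
    sketch, with no smoothing).
    """
    pair_counts = Counter(pairs)
    nonverbal_counts = Counter(n for n, _ in pairs)
    verb_counts = Counter(v for _, v in pairs)
    total = len(pairs)

    scores = {}
    for (n, v), count in pair_counts.items():
        p_xy = count / total
        p_x = nonverbal_counts[n] / total
        p_y = verb_counts[v] / total
        scores[(n, v)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Hypothetical toy co-occurrence data (made-up counts).
pairs = (
    [("harf", "zadan")] * 3
    + [("ketab", "zadan")]
    + [("ketab", "kharidan")] * 3
    + [("harf", "kharidan")]
)
scores = pmi_scores(pairs)
```

Pairs that co-occur more often than their individual frequencies would predict receive a positive score, and pairs that co-occur less often receive a negative one. A baseline of this kind would rank candidate pairs by score and treat the highest-ranked ones as compound verb candidates.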

Keywords

multiword expression, light verb constructions, unsupervised identification, bootstrapping, K-means, Persian