Unsupervised Identification of Persian Compound Verbs

  • Mohammad Sadegh Rasooli
  • Heshaam Faili
  • Behrouz Minaei-Bidgoli
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7094)

Abstract

Compound verb identification is one of the main tasks related to multiword expressions (MWEs). Much work has addressed the unsupervised identification of multiword verbs in many languages, but no notable work has yet addressed Persian. Persian multiword verbs (known as compound verbs) are a kind of light verb construction (LVC) with considerable syntactic flexibility, such as unrestricted word distance between the light verb and the nonverbal element. Furthermore, the nonverbal element can be inflected. These characteristics make the task very difficult in Persian. In this paper, two unsupervised methods are proposed to automatically detect compound verbs in Persian. The first method applies bootstrapping, extending the concept of the pointwise mutual information (PMI) measure. The second uses the K-means clustering algorithm. Our experiments show that the proposed approaches outperform a baseline that uses the PMI measure as its association metric.
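As a rough illustration of the PMI association metric named in the abstract, the sketch below scores candidate (nonverbal element, light verb) pairs by relative frequency. It is not the authors' implementation: the function name `pmi_scores`, the toy pair list, and the use of base-2 logarithms are all assumptions made for this example.

```python
import math
from collections import Counter


def pmi_scores(pairs):
    """Return PMI for each distinct (nonverbal element, light verb) pair.

    PMI(x, y) = log2(P(x, y) / (P(x) * P(y))), with every probability
    estimated as a relative frequency over the observed pairs.
    """
    pair_counts = Counter(pairs)
    x_counts = Counter(x for x, _ in pairs)
    y_counts = Counter(y for _, y in pairs)
    total = len(pairs)

    scores = {}
    for (x, y), count in pair_counts.items():
        p_xy = count / total
        p_x = x_counts[x] / total
        p_y = y_counts[y] / total
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Hypothetical toy data (transliterated), not taken from the paper's corpus.
toy_pairs = [
    ("harf", "zadan"), ("harf", "zadan"), ("harf", "zadan"),
    ("ketab", "zadan"),
    ("ketab", "kardan"),
    ("kar", "kardan"), ("kar", "kardan"), ("kar", "kardan"),
]

scores = pmi_scores(toy_pairs)
# Candidates ranked from strongest to weakest association.
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy data the pairs that always co-occur score higher than the pairs whose nonverbal element appears with both verbs, which is the behaviour a PMI baseline relies on. Handling word distance and inflection, as the abstract describes, would need additional preprocessing not shown here.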

Keywords

multiword expression, light verb constructions, unsupervised identification, bootstrapping, K-means, Persian

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Mohammad Sadegh Rasooli (1)
  • Heshaam Faili (2)
  • Behrouz Minaei-Bidgoli (1)
  1. Department of Computer Engineering, Iran University of Science and Technology, Iran
  2. School of Electrical & Computer Engineering, Tehran University, Iran
