Computational Linguistics and Intelligent Text Processing

Volume 7816 of the series Lecture Notes in Computer Science, pp. 13-24

Syntactic Dependency-Based N-grams: More Evidence of Usefulness in Classification

  • Grigori Sidorov, Center for Computing Research (CIC), Instituto Politécnico Nacional (IPN)
  • Francisco Velasquez, Center for Computing Research (CIC), Instituto Politécnico Nacional (IPN)
  • Efstathios Stamatatos, University of the Aegean
  • Alexander Gelbukh, Center for Computing Research (CIC), Instituto Politécnico Nacional (IPN)
  • Liliana Chanona-Hernández, ESIME, Instituto Politécnico Nacional (IPN)

Abstract

This paper introduces and discusses the concept of syntactic n-grams (sn-grams), which can be used in place of traditional n-grams in many NLP tasks. Sn-grams are constructed by following paths in syntactic trees, which lets them bring syntactic knowledge into machine learning methods. Their drawback is that the text must be parsed before they can be constructed. We applied sn-grams to the task of authorship attribution on corpora of three and seven authors, with very promising results.
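To make the idea of following paths in a syntactic tree concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function name `sn_grams`, the example sentence, and its hand-written dependency heads are all hypothetical, and the sketch only collects non-branching head-to-dependent paths of n words. A real pipeline would take the heads from a parser.

```python
from typing import Dict, List, Tuple


def sn_grams(words: List[str], heads: List[int], n: int) -> List[Tuple[str, ...]]:
    """Collect every head-to-dependent path of n words in a dependency tree.

    heads[i] is the index of the head of words[i]; the root has head -1.
    (Hypothetical helper written for this illustration.)
    """
    # Build the head -> dependents map from the flat list of heads.
    children: Dict[int, List[int]] = {i: [] for i in range(len(words))}
    for dep, head in enumerate(heads):
        if head >= 0:
            children[head].append(dep)

    def paths_from(node: int, length: int) -> List[List[int]]:
        # A path of one word is the node itself; longer paths descend
        # through each dependent in turn.
        if length == 1:
            return [[node]]
        return [
            [node] + rest
            for child in children[node]
            for rest in paths_from(child, length - 1)
        ]

    result: List[Tuple[str, ...]] = []
    for start in range(len(words)):
        for path in paths_from(start, n):
            result.append(tuple(words[i] for i in path))
    return result


# Assumed example: "the cat eats fresh fish", with hand-assigned heads
# (eats is the root; cat and fish depend on eats; the on cat; fresh on fish).
words = ["the", "cat", "eats", "fresh", "fish"]
heads = [1, 2, -1, 4, 2]

bigrams = sn_grams(words, heads, 2)
trigrams = sn_grams(words, heads, 3)
print(bigrams)
print(trigrams)
```

In this toy sentence the pair ("eats", "fish") appears as a syntactic bigram although the two words are not adjacent in the text, which is the kind of information a traditional n-gram over the surface word order would miss.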

Keywords

Syntactic n-grams, sn-grams, syntactic paths, authorship attribution task, SVM classifier