Syntactic Dependency-Based N-grams: More Evidence of Usefulness in Classification

  • Grigori Sidorov
  • Francisco Velasquez
  • Efstathios Stamatatos
  • Alexander Gelbukh
  • Liliana Chanona-Hernández
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7816)

Abstract

The paper introduces and discusses the concept of syntactic n-grams (sn-grams), which can be used instead of traditional n-grams in many NLP tasks. Sn-grams are constructed by following paths in syntactic trees, which allows syntactic knowledge to be brought into machine learning methods. Their construction, however, requires the text to be parsed beforehand. We applied sn-grams to the task of authorship attribution on corpora of three and seven authors, with very promising results.
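The construction described in the abstract can be sketched as follows. This is a minimal Python illustration, not the authors' implementation. It assumes a dependency parse is already available as hypothetical `(position, word, head)` triples, with head 0 marking the root, and it takes an sn-gram of size n to be a path of n words that descends from a head to its dependents. The helper names `sn_grams`, `linear_ngrams`, and `profile` are hypothetical, and the example parse was written by hand rather than produced by a parser.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical token format: (position, word, head_position); head 0 marks the root.
Token = Tuple[int, str, int]


def sn_grams(tokens: List[Token], n: int) -> List[Tuple[str, ...]]:
    """Collect every path of n words that descends head -> dependent -> ..."""
    words = {pos: word for pos, word, _ in tokens}
    children: Dict[int, List[int]] = {}
    for pos, _, head in tokens:
        children.setdefault(head, []).append(pos)

    result: List[Tuple[str, ...]] = []

    def extend(path: List[int]) -> None:
        # Emit the path once it has n words; otherwise follow each dependent.
        if len(path) == n:
            result.append(tuple(words[p] for p in path))
            return
        for child in children.get(path[-1], []):
            extend(path + [child])

    # Start a path at every word, in sentence order.
    for pos in sorted(words):
        extend([pos])
    return result


def linear_ngrams(tokens: List[Token], n: int) -> List[Tuple[str, ...]]:
    """Traditional n-grams: n consecutive words in surface order."""
    words = [word for _, word, _ in sorted(tokens)]
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def profile(sentences: List[List[Token]], n: int) -> Dict[Tuple[str, ...], float]:
    """Relative sn-gram frequencies over a set of parsed sentences."""
    counts: Counter = Counter()
    for sentence in sentences:
        counts.update(sn_grams(sentence, n))
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


# Hand-written dependency parse (an assumption, not parser output).
EXAMPLE: List[Token] = [
    (1, "Economic", 2), (2, "news", 3), (3, "have", 0), (4, "little", 5),
    (5, "effect", 3), (6, "on", 5), (7, "financial", 8), (8, "markets", 6),
]

if __name__ == "__main__":
    print(linear_ngrams(EXAMPLE, 2))
    print(sn_grams(EXAMPLE, 2))
    print(sn_grams(EXAMPLE, 3))
```

Comparing the two bigram lists for the example sentence shows the difference: "have little" is adjacent in the text and so appears as a traditional bigram, but it is not an sn-gram, because "little" depends on "effect" rather than on "have". The relative-frequency profile is one plausible way to turn sn-grams into classifier features; the keywords indicate that an SVM classifier was used, but that step is not shown here.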

Keywords

Syntactic n-grams; sn-grams; syntactic paths; authorship attribution task; SVM classifier

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Grigori Sidorov (1)
  • Francisco Velasquez (1)
  • Efstathios Stamatatos (2)
  • Alexander Gelbukh (1)
  • Liliana Chanona-Hernández (3)
  1. Center for Computing Research (CIC), Instituto Politécnico Nacional (IPN), Mexico City, Mexico
  2. University of the Aegean, Greece
  3. ESIME, Instituto Politécnico Nacional (IPN), Mexico City, Mexico
