
Rational Kernels for Arabic Text Classification

  • Conference paper
Statistical Language and Speech Processing (SLSP 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7978)


Abstract

Many stemming techniques are used in Arabic text classification. In this paper, we show the effect of stemming on classification systems. We introduce a new stemming technique, approximate stemming, based on Arabic patterns. These patterns are modeled using transducers, and stemming is done without relying on any dictionary. Because words are stemmed with transducers, documents are themselves transformed into finite-state transducers. This allows us to use rational kernels as a framework for Arabic text classification. Experiments show that, compared with other approaches, our approach is more effective, especially in terms of accuracy, recall, and F1.
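The abstract does not give the authors' pattern inventory or the exact kernel they use, so the following is only a rough illustration under stated assumptions, not the method of the paper. The sketch assumes a tiny hypothetical set of patterns written with the conventional root slots ف, ع, ل (first, second, third radical), strips the definite article ال, and reads the root off the first pattern a word instantiates. It then compares stemmed documents with an n-gram count kernel, K(x, y) = Σ_z c_x(z)·c_y(z), a standard example of a rational kernel. Here the kernel value is computed directly from counts rather than by composing weighted transducers (for instance with OpenFst), so it stands in for, and is not, a transducer-based pipeline. All names (`ROOT_SLOTS`, `PATTERNS`, `match_pattern`, `approx_stem`, `ngram_kernel`) are hypothetical.

```python
from collections import Counter
from typing import Callable, Optional

# Hypothetical convention: in a pattern, ف, ع, ل mark the three root slots;
# every other letter is a fixed affix letter that must appear verbatim.
ROOT_SLOTS = {"ف", "ع", "ل"}

# Tiny, hypothetical pattern inventory (illustration only, not the paper's).
PATTERNS = ["مفعول", "تفعيل", "فاعل", "فعيل", "فعال", "فعل"]


def match_pattern(word: str, pattern: str) -> Optional[str]:
    """Return the root letters if `word` instantiates `pattern`, else None."""
    if len(word) != len(pattern):
        return None
    root = []
    for w_ch, p_ch in zip(word, pattern):
        if p_ch in ROOT_SLOTS:
            root.append(w_ch)   # root slot: any letter, kept as a radical
        elif w_ch != p_ch:
            return None         # fixed letter must match exactly
    return "".join(root)


def approx_stem(word: str) -> str:
    """Strip the article ال, then return the root of the first matching pattern."""
    if word.startswith("ال") and len(word) - 2 >= 3:
        word = word[2:]
    # Longer (more specific) patterns first; sorted() is stable for ties.
    for pattern in sorted(PATTERNS, key=len, reverse=True):
        root = match_pattern(word, pattern)
        if root is not None:
            return root
    return word  # no pattern matched: keep the article-stripped word


def ngrams(tokens: list, n: int) -> Counter:
    """Count the n-grams (as tuples) of a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_kernel(doc_x: str, doc_y: str, n: int = 1,
                 stem: Callable[[str], str] = approx_stem) -> int:
    """n-gram count kernel over stemmed, whitespace-separated tokens."""
    cx = ngrams([stem(t) for t in doc_x.split()], n)
    cy = ngrams([stem(t) for t in doc_y.split()], n)
    # Counter returns 0 for missing keys, so unshared n-grams contribute 0.
    return sum(count * cy[z] for z, count in cx.items())


if __name__ == "__main__":
    print(approx_stem("الكاتب"), approx_stem("مكتوب"), approx_stem("كتاب"))
    print(ngram_kernel("الكاتب كتب", "مكتوب كتاب"))                   # 4
    print(ngram_kernel("الكاتب كتب", "مكتوب كتاب", stem=lambda t: t))  # 0
```

On the surface forms, the two toy documents share no unigram and the kernel is 0; after this approximate stemming, all four words map to the root كتب and the kernel is 4. This illustrates, on a toy scale, why stemming can matter for a similarity-based classifier.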

This work is supported by the MESRS - Algeria under Project 8/U03/7015.




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nehar, A., Ziadi, D., Cherroun, H. (2013). Rational Kernels for Arabic Text Classification. In: Dediu, AH., Martín-Vide, C., Mitkov, R., Truthe, B. (eds) Statistical Language and Speech Processing. SLSP 2013. Lecture Notes in Computer Science, vol 7978. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39593-2_16


  • DOI: https://doi.org/10.1007/978-3-642-39593-2_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-39592-5

  • Online ISBN: 978-3-642-39593-2

  • eBook Packages: Computer Science, Computer Science (R0)
