Effect of the Text Size on Stylometry—Application on Arabic Religious Texts

Ouamour, S.; Khennouf, S.; Bourib, S.; Hadjadj, H.; Sayoud, H.

doi:10.1007/978-3-319-38884-7_16

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 453)

Abstract

Stylometry raises two important technical questions: first, does text size affect authorship attribution performance, and second, how does the language affect that attribution? To answer these questions, we conducted several authorship attribution experiments on documents of varying size, from 100 to 3000 words per document. For that purpose, a dedicated Arabic dataset, the A4P corpus, was built. The corpus is available to the scientific community and is suitable for stylometry because the genre and theme of its texts are quite similar. Two types of features are investigated, character n-grams and words, in combination with several classifiers: SVM, MLP, linear regression, Stamatatos distance and Manhattan distance. Two scores are proposed for the experiments: the “Score of Good Attribution” and the “Robustness against Size Reduction” ratio. The results show that the minimum text size required for fair authorship attribution depends on the feature and the classification method employed. For evaluation, authorship attribution was applied to 7 religious books, with the main purpose of checking whether the Quran and the Hadith could have the same author. The results clearly show that those two books should have 2 different authors.
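
To make the combination of character n-gram features with a Manhattan-distance classifier more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the n-gram order, the normalization, or any preprocessing, so the trigram default, the relative-frequency profiles, and all function names (`char_ngram_profile`, `manhattan_distance`, `attribute`, `truncate_words`) are hypothetical choices made for this example.

```python
from collections import Counter


def char_ngram_profile(text: str, n: int = 3) -> dict[str, float]:
    """Relative frequency of each character n-gram in `text` (assumed normalization)."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams)
    if total == 0:
        return {}
    return {gram: count / total for gram, count in Counter(grams).items()}


def manhattan_distance(p: dict[str, float], q: dict[str, float]) -> float:
    """Sum of absolute frequency differences over the union of n-grams."""
    return sum(abs(p.get(g, 0.0) - q.get(g, 0.0)) for g in set(p) | set(q))


def attribute(unknown: str, references: dict[str, str], n: int = 3) -> str:
    """Return the candidate author whose reference text has the closest profile."""
    target = char_ngram_profile(unknown, n)
    return min(
        references,
        key=lambda author: manhattan_distance(
            target, char_ngram_profile(references[author], n)
        ),
    )


def truncate_words(text: str, max_words: int) -> str:
    """Keep only the first `max_words` whitespace-separated words of `text`."""
    return " ".join(text.split()[:max_words])
```

One way to probe the effect of text size with such a sketch would be to call `truncate_words` with sizes between 100 and 3000 words before calling `attribute`, then compare how often the predicted author matches the known one at each size.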

Acknowledgements

We warmly thank the research team of Dr. Juola and the Al-Waraq library.

Author information

Authors and Affiliations

Electronics & Computer Engineering Faculty, USTHB University, Bab Ezzouar, Algeria
S. Ouamour, S. Khennouf, S. Bourib, H. Hadjadj & H. Sayoud

Corresponding author

Correspondence to S. Ouamour.

Editor information

Editors and Affiliations

International Institute for Applied Systems Analysis (IIASA), Laxenburg, Austria
Thanh Binh Nguyen
Department of Networked Systems and Services, Budapest University of Technology and Economics, Budapest, Hungary
Tien van Do
Laboratory of Theoretical and Applied Computer Science (LITA), UFR MIM, University of Lorraine, Ile du Saulcy, Metz, France
Hoai An Le Thi
Institute of Informatics, Wrocław University of Technology, Wrocław, Poland
Ngoc Thanh Nguyen

About this paper

Cite this paper

Ouamour, S., Khennouf, S., Bourib, S., Hadjadj, H., Sayoud, H. (2016). Effect of the Text Size on Stylometry—Application on Arabic Religious Texts. In: Nguyen, T.B., van Do, T., An Le Thi, H., Nguyen, N.T. (eds) Advanced Computational Methods for Knowledge Engineering. Advances in Intelligent Systems and Computing, vol 453. Springer, Cham. https://doi.org/10.1007/978-3-319-38884-7_16

DOI: https://doi.org/10.1007/978-3-319-38884-7_16
Published: 23 April 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-38883-0
Online ISBN: 978-3-319-38884-7
eBook Packages: Engineering, Engineering (R0)
