Abstract
Authorship analysis is the process of finding the author of an unknown text when the history of the author's writing style is known. It can be viewed as a multi-class, single-label text classification task. Many authorship identification problems have been posed and solved with different methods. Character and word n-grams are the most commonly used methods of feature construction in authorship identification tasks. In this paper, we present a methodology for forming word n-grams. The approach does not depend on a constant value of n; instead, the value of n changes according to the occurrence of word sequences. The methodology is applied to a collection of texts drawn from varied time periods. We used a dataset of 13 authors whose texts were produced over a long span of time. A dynamic value of n is chosen to generate each word sequence. In terms of accuracy, the results show an improvement over a fixed value of n in word n-grams. We also explore the significance of the dynamic value of n for the occurrence of a specified set of consecutive words.
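The abstract does not spell out how the value of n is chosen, so the following Python sketch only illustrates the general idea of a word n-gram whose length depends on how often a word sequence occurs. It rests on assumptions that are not taken from the paper: a greedy left-to-right segmentation, a hypothetical frequency threshold `min_count`, a hypothetical upper bound `max_n`, and the function names `count_ngrams` and `dynamic_ngrams`.

```python
from collections import Counter


def count_ngrams(tokens, max_n):
    """Count every word n-gram of length 1..max_n in the token list."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def dynamic_ngrams(tokens, counts, max_n=4, min_count=2):
    """Split tokens into variable-length word sequences.

    Assumption (not from the paper): at each position the sequence is
    extended by one word as long as the extended sequence occurs at
    least min_count times, up to max_n words. A lone word is always
    accepted, whatever its frequency.
    """
    features = []
    i = 0
    while i < len(tokens):
        n = 1
        # Grow the sequence while the longer candidate is still frequent.
        while (n < max_n and i + n < len(tokens)
               and counts[tuple(tokens[i:i + n + 1])] >= min_count):
            n += 1
        features.append(tuple(tokens[i:i + n]))
        i += n
    return features


# Toy corpus: the phrase "the cat sat on" repeats, so it becomes one feature.
tokens = "the cat sat on the mat and the cat sat on the rug".split()
counts = count_ngrams(tokens, max_n=4)
features = dynamic_ngrams(tokens, counts, max_n=4, min_count=2)
```

On this toy input, the repeated phrase is kept as a single four-word feature while the remaining words fall back to unigrams. With a fixed n, the same phrase would be cut at arbitrary positions. The resulting tuples could then be counted per author and passed to any standard classifier.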
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Tamboli, M.S., Prasad, R.S. (2020). Authorship Identification with Multi Sequence Word Selection Method. In: Abraham, A., Cherukuri, A.K., Melin, P., Gandhi, N. (eds) Intelligent Systems Design and Applications. ISDA 2018. Advances in Intelligent Systems and Computing, vol 940. Springer, Cham. https://doi.org/10.1007/978-3-030-16657-1_61
DOI: https://doi.org/10.1007/978-3-030-16657-1_61
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-16656-4
Online ISBN: 978-3-030-16657-1
eBook Packages: Intelligent Technologies and Robotics (R0)