Authorship Identification with Multi Sequence Word Selection Method

  • Mubin Shoukat Tamboli
  • Rajesh S. Prasad
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 940)


Authorship analysis is the process of finding the author of an unknown text when the writing-style history of the candidate authors is known. It can be viewed as a multi-class, single-label text classification task. Many author identification problems have been posed and solved with different methods. Character and word n-grams are the features most commonly constructed for authorship identification tasks. In this paper, we focus on how word n-grams are formed. The approach described here does not depend on a constant value of n; instead, the value of n changes according to the occurrence of word sequences. The methodology is applied to a collection of texts drawn from varied time periods. We used a dataset of 13 authors whose texts were written over a long span of time. A dynamic value of n is chosen to generate each word sequence. In terms of accuracy, the results show an improvement over word n-grams with a fixed value of n. We also explore the significance of the dynamic value of n for the occurrence of a specified set of consecutive words.
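
The abstract does not state the exact rule used to choose n, so the following is only a minimal sketch of one possible reading: a word sequence is extended for as long as the longer sequence still occurs often enough in the corpus. The function names, the `min_count` threshold, and the `max_n` cap are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


def count_ngrams(docs, max_n):
    """Count every word n-gram of length 1..max_n across tokenized documents."""
    counts = Counter()
    for tokens in docs:
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


def variable_length_ngrams(tokens, counts, min_count=2, max_n=4):
    """Greedily split tokens into the longest sequences seen >= min_count times.

    Hypothetical selection rule: n is not fixed; it grows at each position
    while the extended word sequence remains frequent in the corpus.
    """
    features = []
    i = 0
    while i < len(tokens):
        n = 1
        # Grow the window while the longer sequence is still frequent enough.
        while (n < max_n and i + n < len(tokens)
               and counts[tuple(tokens[i:i + n + 1])] >= min_count):
            n += 1
        features.append(" ".join(tokens[i:i + n]))
        i += n
    return features


# Toy corpus (invented for illustration only).
docs = [
    "it was the best of times".split(),
    "it was the worst of times".split(),
]
counts = count_ngrams(docs, max_n=4)
features = variable_length_ngrams(docs[0], counts)
print(features)
```

In this toy corpus the repeated sequences "it was the" and "of times" are kept whole, while "best" stays a single word because no longer sequence containing it repeats. A fixed n = 2 would instead split the text into uniform bigrams regardless of how often each one occurs.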


Keywords: Author identification, Multiword gram, Variable length, Word n-gram



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Matoshri College of Engineering and Research Centre, Nashik, India
  2. Sinhgad Institute Technology and Science, Narhe, Pune, India
