
Authorship Identification with Multi Sequence Word Selection Method

  • Conference paper
Intelligent Systems Design and Applications (ISDA 2018)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 940)

Abstract

Authorship identification is the process of determining the author of an unknown text when the writing style of the candidate authors is known from earlier texts. It can be viewed as a multi-class, single-label text classification task. Many authorship identification problems have been posed and solved with different methods, and character and word n-grams are the most commonly used features for this task. In this paper, we present a methodology for forming word n-grams that does not depend on a constant value of n: the value of n changes according to how often word sequences occur. We apply the methodology to a collection of texts written across varied time periods, using a dataset of 13 authors whose texts span a long period of time. A dynamic value of n is chosen to generate word sequences. In terms of accuracy, the results show an improvement over word n-grams with a fixed value of n. We also explore the significance of the dynamic value of n for the occurrence of a specified set of consecutive words.
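The abstract does not specify the exact rule for choosing n. The following is a minimal Python sketch under one assumed rule: a word sequence is extended by one word as long as the longer sequence still occurs at least `min_count` times in an author's text, up to a cap `max_n`. The function names, the `min_count` and `max_n` parameters, the whitespace tokenizer, and the cosine nearest-profile classifier are all hypothetical illustrations, not the authors' method or classifier.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenization (a simplifying assumption)."""
    return text.lower().split()


def variable_length_ngrams(tokens, min_count=2, max_n=5):
    """Hypothetical rule: extend a word sequence while it keeps recurring.

    An n-gram is kept only if it occurs at least `min_count` times, and an
    (n+1)-gram is counted only at positions whose n-word prefix was kept,
    so n varies per sequence instead of being fixed in advance.
    """
    features = Counter()
    frontier = list(range(len(tokens)))  # start positions still being extended
    for n in range(1, max_n + 1):
        counts = Counter(
            tuple(tokens[i:i + n]) for i in frontier if i + n <= len(tokens)
        )
        kept = {g: c for g, c in counts.items() if c >= min_count}
        if not kept:
            break  # no sequence of this length recurs often enough
        features.update(kept)
        frontier = [
            i for i in frontier
            if i + n <= len(tokens) and tuple(tokens[i:i + n]) in kept
        ]
    return features


def count_sequences(tokens, vocab):
    """Count occurrences in `tokens` of every word sequence in `vocab`."""
    counts = Counter()
    for n in {len(g) for g in vocab}:
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            if gram in vocab:
                counts[gram] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def attribute(unknown_text, author_texts, min_count=2, max_n=5):
    """Stand-in nearest-profile classifier (not the paper's classifier)."""
    query = tokenize(unknown_text)
    scores = {}
    for author, text in author_texts.items():
        profile = variable_length_ngrams(tokenize(text), min_count, max_n)
        scores[author] = cosine(profile, count_sequences(query, profile))
    return max(scores, key=scores.get)


authors = {
    "A": "the cat sat on the mat the cat sat",
    "B": "a dog ran in the park a dog ran fast",
}
print(variable_length_ngrams(tokenize(authors["A"]))[("the", "cat", "sat")])  # 2
print(attribute("the cat sat by the door", authors))  # A
```

With `min_count=2`, author A's sample yields "the cat sat" as a three-word feature but stops before "the cat sat on", which occurs only once. A fixed-n baseline would instead emit every sequence of exactly n words, whether or not it recurs.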



Author information

Correspondence to Mubin Shoukat Tamboli.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Tamboli, M.S., Prasad, R.S. (2020). Authorship Identification with Multi Sequence Word Selection Method. In: Abraham, A., Cherukuri, A.K., Melin, P., Gandhi, N. (eds) Intelligent Systems Design and Applications. ISDA 2018. Advances in Intelligent Systems and Computing, vol 940. Springer, Cham. https://doi.org/10.1007/978-3-030-16657-1_61
