Authorship Identification Using Random Projections

Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 764)

Abstract

This paper reports experiments applying the Random Projection (RP) method to authorship identification of online texts. We propose using RP to reduce the feature space to a low-dimensional subspace, combined with probability density function (PDF) estimation to characterize the features of each author. In our experiments, we use a dataset of Lithuanian-language Internet comments posted on a news website, and we achieve 92% accuracy in author identification.
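
The pipeline outlined in the abstract (random projection followed by per-author density estimation) can be illustrated with a minimal sketch. The abstract does not specify the feature set, the target dimensionality, the PDF estimator, or the decision rule, so everything concrete here is an assumption: a Gaussian random matrix, a Gaussian-kernel (Parzen window) density estimate, and assignment to the author with the highest estimated density. All function names and the toy data are hypothetical and are not taken from the paper.

```python
import math
import random


def make_projection(n_features, n_components, seed=0):
    """Gaussian random matrix (assumed variant), scaled by 1/sqrt(n_components)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(n_components)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n_features)]
            for _ in range(n_components)]


def project(matrix, x):
    """Map a high-dimensional feature vector to the low-dimensional subspace."""
    return [sum(w * v for w, v in zip(row, x)) for row in matrix]


def log_density(point, samples, bandwidth):
    """Log of a Gaussian-kernel density estimate at `point` (assumed estimator)."""
    d = len(point)
    log_norm = -0.5 * d * math.log(2.0 * math.pi * bandwidth ** 2)
    exponents = [
        -sum((p - s) ** 2 for p, s in zip(point, sample)) / (2.0 * bandwidth ** 2)
        for sample in samples
    ]
    top = max(exponents)  # log-sum-exp trick for numerical stability
    log_sum = top + math.log(sum(math.exp(e - top) for e in exponents))
    return log_norm + log_sum - math.log(len(samples))


def identify_author(x, matrix, projected_by_author, bandwidth=1.0):
    """Return the author whose estimated density is highest at the projected x."""
    z = project(matrix, x)
    return max(projected_by_author,
               key=lambda a: log_density(z, projected_by_author[a], bandwidth))


# Toy data: two synthetic "authors" with different mean feature vectors.
rng = random.Random(42)
centers = {"author_a": [0.0] * 50, "author_b": [3.0] * 50}


def draw(center):
    """Sample one noisy 50-dimensional feature vector around `center`."""
    return [c + rng.gauss(0.0, 0.1) for c in center]


matrix = make_projection(n_features=50, n_components=5, seed=7)
projected = {name: [project(matrix, draw(c)) for _ in range(20)]
             for name, c in centers.items()}
print(identify_author(draw(centers["author_b"]), matrix, projected))
```

The projection matrix is generated once and reused for both training and query vectors, since densities estimated in one random subspace are not comparable with points projected into another. The bandwidth is a free parameter in this sketch and would need tuning on real data.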

Keywords

  • Author identification
  • Text mining
  • Digital text forensics
  • Random projections


Corresponding author

Correspondence to Robertas Damaševičius.

Copyright information

© 2019 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Damaševičius, R., Kapočiūtė-Dzikienė, J., Woźniak, M. (2019). Authorship Identification Using Random Projections. In: Silhavy, R. (eds) Artificial Intelligence and Algorithms in Intelligent Systems. CSOC2018 2018. Advances in Intelligent Systems and Computing, vol 764. Springer, Cham. https://doi.org/10.1007/978-3-319-91189-2_6
