Abstract
We present a supervised approach to style change detection, which aims to predict whether the writing style changes within a given text document and to find the exact positions where such changes occur. In particular, we combine a TF.IDF representation of the document with features specifically engineered for the task, and we make predictions via an ensemble of diverse classifiers including SVM, Random Forest, AdaBoost, MLP, and LightGBM. Whenever the model detects that a style change is present, we apply it recursively to locate the specific positions of the change. Our approach powered the winning system for the PAN@CLEF 2018 task on Style Change Detection.
D. Kopev, D. Zlatkova, K. Mitov and A. Atanasov—Equal Contribution.
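The recursive search mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the halving strategy, the function name `find_change_positions`, and the toy detector `toy_has_change` are all assumptions made for illustration. In the paper, the role of the detector would be played by the trained ensemble.

```python
from typing import Callable, List, Sequence

# A detector takes a run of sentences and says whether it contains a style change.
Detector = Callable[[Sequence[str]], bool]


def find_change_positions(
    sentences: Sequence[str], has_change: Detector, offset: int = 0
) -> List[int]:
    """Return indices of sentences at which a new style is assumed to begin.

    Hypothetical recursion (an assumption, not the paper's exact procedure):
    if the detector flags the segment, split it in half, search both halves,
    and separately test the two sentences that straddle the split point.
    """
    if len(sentences) < 2 or not has_change(sentences):
        return []
    mid = len(sentences) // 2
    left = find_change_positions(sentences[:mid], has_change, offset)
    right = find_change_positions(sentences[mid:], has_change, offset + mid)
    # Check the boundary between the two halves.
    boundary = [offset + mid] if has_change(sentences[mid - 1 : mid + 1]) else []
    found = left + boundary + right
    # If the segment was flagged but nothing was localized, fall back to the midpoint.
    return found if found else [offset + mid]


def toy_has_change(segment: Sequence[str]) -> bool:
    """Toy stand-in for a trained classifier: the 'style' is the final character."""
    return len({s[-1] for s in segment}) > 1


doc = ["One.", "Two.", "Three!", "Four!", "Five.", "Six."]
positions = find_change_positions(doc, toy_has_change)
print(positions)
```

With a real classifier, very short segments carry little stylistic signal, so a practical version would likely stop recursing below some minimum segment length rather than going down to pairs of sentences.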
Notes
In this dataset, style change also means switch of authorship.
Acknowledgements
This work was supported by the Bulgarian National Scientific Fund within the project no. DN 12/9, and by the Scientific Fund of the Sofia University within project no. 80-10-162/25.04.2018.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Kopev, D. et al. (2018). Recursive Style Breach Detection with Multifaceted Ensemble Learning. In: Agre, G., van Genabith, J., Declerck, T. (eds) Artificial Intelligence: Methodology, Systems, and Applications. AIMSA 2018. Lecture Notes in Computer Science, vol 11089. Springer, Cham. https://doi.org/10.1007/978-3-319-99344-7_12
DOI: https://doi.org/10.1007/978-3-319-99344-7_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-99343-0
Online ISBN: 978-3-319-99344-7
eBook Packages: Computer Science, Computer Science (R0)