Weighting Attributes and Decision Rules Through Rankings and Discretisation Parameters

  • Urszula Stańczyk
Chapter
Part of the Studies in Computational Intelligence book series (SCI, volume 801)

Abstract

The relevance of attributes can be estimated by ranking them: calculated weights put the variables in a specific order. A ranking of features can be exploited not only at the data pre-processing stage, but also in post-processing, to explore the properties of the solutions obtained. The chapter is dedicated to research on weighting condition attributes and decision rules inferred within the Classical Rough Set Approach (CRSA), based on a ranking and on the numbers of intervals found for features during supervised discretisation. The rule classifiers tested were employed in the stylometric analysis of texts, for the task of binary authorship attribution with balanced data.
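
The abstract does not give the weighting formulas, so the following is only a minimal sketch of how a ranking position and a number of intervals could be combined into attribute weights, and how those weights could then be carried over to decision rules. The attribute names, the rules, the weighting scheme (inverse rank position scaled by the relative number of intervals) and the product used for rule weights are all assumptions made for illustration, not the chapter's method.

```python
from math import prod

# Hypothetical inputs: rank position (1 = most relevant) and the number of
# intervals found for each attribute by supervised discretisation.
ranking = {"and": 1, "but": 2, "comma": 3, "that": 4}
intervals = {"and": 3, "but": 2, "comma": 2, "that": 1}


def attribute_weights(ranking, intervals):
    """Assumed scheme: weight = (1 / rank position) * (intervals / max intervals)."""
    max_bins = max(intervals.values())
    return {a: (1.0 / ranking[a]) * (intervals[a] / max_bins) for a in ranking}


def rule_weight(condition_attributes, weights):
    """Assumed scheme: a rule's weight is the product of its attributes' weights."""
    return prod(weights[a] for a in condition_attributes)


weights = attribute_weights(ranking, intervals)

# Hypothetical decision rules, each given as (condition attributes, decision class).
rules = [
    (("and", "but"), "author_A"),
    (("comma",), "author_B"),
    (("and", "that"), "author_A"),
]

# Order the rules from the highest to the lowest weight.
ranked_rules = sorted(rules, key=lambda r: rule_weight(r[0], weights), reverse=True)
```

Under these assumptions, a rule that refers to a low-ranked attribute with a single interval receives a small weight, so cutting the ordered list at a chosen threshold would act as a rule filter.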

Keywords

Condition attribute, Discretisation, Ranking, Decision rule, CRSA, Stylometry, Authorship attribution

Notes

Acknowledgements

The WEKA workbench [47] and the RSES software [46], developed at the Institute of Mathematics, Warsaw University (http://logic.mimuw.edu.pl/~rses/), were used in the research described in the chapter. The research was performed at the Silesian University of Technology, Gliwice, within the project BK/RAu2/2018.

References

  1. Peng, R., Hengartner, H.: Quantitative analysis of literary styles. Am. Stat. 56(3), 15–38 (2002)
  2. Jockers, M., Witten, D.: A comparative study of machine learning methods for authorship attribution. Lit. Linguist. Comput. 25(2), 215–223 (2010)
  3. Kotsiantis, S., Kanellopoulos, D.: Discretization techniques: a recent survey. GESTS Int. Trans. Comput. Sci. Eng. 32(1), 47–58 (2006)
  4. Guyon, I., Gunn, S., Nikravesh, M., Zadeh, L. (eds.): Feature Extraction: Foundations and Applications. Springer, Berlin, Heidelberg (2006)
  5. Stańczyk, U.: Ranking of characteristic features in combined wrapper approaches to selection. Neural Comput. Appl. 26(2), 329–344 (2015)
  6. Stańczyk, U.: Weighting of attributes in an embedded rough approach. In: Gruca, A., Czachórski, T., Kozielski, S. (eds.) Man-Machine Interactions 3. Advances in Intelligent and Soft Computing, vol. 242, pp. 475–483. Springer, Berlin (2013)
  7. Sikora, M.: Rule quality measures in creation and reduction of data rule models. In: Greco, S., Hata, Y., Hirano, S., Inuiguchi, M., Miyamoto, S., Nguyen, H., Słowiński, R. (eds.) Rough Sets and Current Trends in Computing. Lecture Notes in Computer Science, vol. 4259, pp. 716–725. Springer (2006)
  8. Pawlak, Z.: Rough sets and intelligent data analysis. Inf. Sci. 147, 1–12 (2002)
  9. Pawlak, Z.: Rough sets. Int. J. Comput. Inf. Sci. 11(5), 341–356 (1982)
  10. Fayyad, U., Irani, K.: Multi-interval discretization of continuous valued attributes for classification learning. In: Proceedings of the 13th International Joint Conference on Artificial Intelligence, Morgan Kaufmann, pp. 1022–1027 (1993)
  11. Argamon, S., Burns, K., Dubnov, S. (eds.): The Structure of Style: Algorithmic Approaches to Understanding Manner and Meaning. Springer, Berlin (2010)
  12. Burrows, J.: Textual analysis. In: Schreibman, S., Siemens, R., Unsworth, J. (eds.) A Companion to Digital Humanities. Blackwell, Oxford (2004)
  13. Craig, H.: Stylistic analysis and authorship studies. In: Schreibman, S., Siemens, R., Unsworth, J. (eds.) A Companion to Digital Humanities. Blackwell, Oxford (2004)
  14. Lynam, T., Clarke, C., Cormack, G.: Information extraction with term frequencies. In: Proceedings of the Human Language Technology Conference, San Diego, pp. 1–4 (2001)
  15. Baayen, H., van Halteren, H., Tweedie, F.: Outside the cave of shadows: using syntactic annotation to enhance authorship attribution. Lit. Linguist. Comput. 11(3), 121–132 (1996)
  16. Munro, R.: A queuing-theory model of word frequency distributions. In: Proceedings of the 1st Australasian Language Technology Workshop, Melbourne, pp. 1–8 (2003)
  17. Koppel, M., Schler, J., Argamon, S.: Computational methods in authorship attribution. J. Am. Soc. Inf. Sci. Technol. 60(1), 9–26 (2009)
  18. Stamatatos, E.: A survey of modern authorship attribution methods. J. Am. Soc. Inf. Sci. Technol. 60(3), 538–556 (2009)
  19. Stańczyk, U.: Application of DRSA-ANN classifier in computational stylistics. In: Kryszkiewicz, M., Rybiński, H., Skowron, A., Raś, Z. (eds.) Foundations of Intelligent Systems, ISMIS’11 Proceedings. Lecture Notes in Artificial Intelligence, vol. 6804, pp. 695–704. Springer (2011)
  20. Waugh, S., Adams, A., Tweedie, F.: Computational stylistics using artificial neural networks. Lit. Linguist. Comput. 15(2), 187–198 (2000)
  21. Grzymała-Busse, J., Stefanowski, J., Wilk, S.: A comparison of two approaches to data mining from imbalanced data. In: Negoita, M., Howlett, R., Jain, L. (eds.) Knowledge-Based Intelligent Information and Engineering Systems. Lecture Notes in Computer Science, vol. 3213, pp. 757–763. Springer (2004)
  22. Stańczyk, U.: The class imbalance problem in construction of training datasets for authorship attribution. In: Gruca, A., Brachman, A., Kozielski, S., Czachórski, T. (eds.) Man-Machine Interactions 4. Advances in Intelligent and Soft Computing, vol. 391, pp. 535–547. Springer, Berlin (2016)
  23. Baron, G.: Comparison of cross-validation and test sets approaches to evaluation of classifiers in authorship attribution domain. In: Czachórski, T., Gelenbe, E., Grochla, K., Lent, R. (eds.) Proceedings of the 31st International Symposium on Computer and Information Sciences. Communications in Computer and Information Science, vol. 659, pp. 81–89. Springer, Cracow (2016)
  24. Biesiada, J., Duch, W., Kachel, A., Pałucha, S.: Feature ranking methods based on information entropy with Parzen windows. In: Proceedings of International Conference on Research in Electrotechnology and Applied Informatics, Katowice, pp. 109–119 (2005)
  25. Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. J. Mach. Learn. Res. 3, 1157–1182 (2003)
  26. Jensen, R., Shen, Q.: Computational Intelligence and Feature Selection. Wiley, Hoboken, US (2008)
  27. Shannon, C.: A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423, 623–656 (1948)
  28. John, G., Kohavi, R., Pfleger, K.: Irrelevant features and the subset selection problem. In: Cohen, W., Hirsh, H. (eds.) Proceedings of the 11th International Conference on Machine Learning, pp. 121–129. Morgan Kaufmann Publishers (1994)
  29. Dougherty, J., Kohavi, R., Sahami, M.: Supervised and unsupervised discretization of continuous features. In: Machine Learning Proceedings 1995: Proceedings of the 12th International Conference on Machine Learning, pp. 194–202. Elsevier (1995)
  30. Baron, G.: On approaches to discretization of datasets used for evaluation of decision systems. In: Czarnowski, I., Caballero, A., Howlett, R., Jain, L. (eds.) Intelligent Decision Technologies 2016. Smart Innovation, Systems and Technologies, vol. 56, pp. 149–159. Springer (2016)
  31. Abraham, A., Falcón, R., Bello, R. (eds.): Rough Set Theory: A True Landmark in Data Analysis. Studies in Computational Intelligence, vol. 174. Springer, Berlin (2009)
  32. Düntsch, I., Gediga, G.: Rough Set Data Analysis: A Road to Noninvasive Knowledge Discovery. Methoδos Publishers, Bangor (2000)
  33. Pawlak, Z.: Computing, artificial intelligence and information technology: rough sets, decision algorithms and Bayes’ theorem. Eur. J. Oper. Res. 136, 181–189 (2002)
  34. Greco, S., Matarazzo, B., Słowiński, R.: Dominance-based rough set approach as a proper way of handling graduality in rough set theory. Trans. Rough Sets VII 4400, 36–52 (2007)
  35. Słowiński, R., Greco, S., Matarazzo, B.: Dominance-based rough set approach to reasoning about ordinal data. In: Kryszkiewicz, M., Peters, J., Rybiński, H., Skowron, A. (eds.) Rough Sets and Emerging Intelligent Systems Paradigms. Lecture Notes in Computer Science, vol. 4585, pp. 5–11. Springer, Berlin (2007)
  36. Bayardo Jr., R., Agrawal, R.: Mining the most interesting rules. In: Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 145–154 (1999)
  37. Michalak, M., Sikora, M., Wróbel, L.: Rule quality measures settings in a sequential covering rule induction algorithm–an empirical approach. In: Proceedings of the 2015 Federated Conference on Computer Science and Information Systems, ACSIS, vol. 5, pp. 109–118 (2015)
  38. Zielosko, B.: Optimization of decision rules relative to coverage–comparison of greedy and modified dynamic programming approaches. In: Gruca, A., Brachman, A., Kozielski, S., Czachórski, T. (eds.) Man-Machine Interactions 4. Advances in Intelligent and Soft Computing, vol. 391, pp. 639–650. Springer, Berlin (2016)
  39. Zielosko, B.: Application of dynamic programming approach to optimization of association rules relative to coverage and length. Fundam. Inf. 148(1–2), 87–105 (2016)
  40. Moshkov, M., Piliszczuk, M., Zielosko, B.: On partial covers, reducts and decision rules with weights. Trans. Rough Sets VI 4374, 211–246 (2006)
  41. Wróbel, L., Sikora, M., Michalak, M.: Rule quality measures settings in classification, regression and survival rule induction–an empirical approach. Fundam. Inf. 149, 419–449 (2016)
  42. Stańczyk, U.: Evaluating importance for numbers of bins in discretised learning and test sets. In: Czarnowski, I., Howlett, J.R., Jain, C.L. (eds.) Intelligent Decision Technologies 2017: Proceedings of the 9th KES International Conference on Intelligent Decision Technologies (KES-IDT 2017)–Part II. Smart Innovation, Systems and Technologies, vol. 72. Springer International Publishing, pp. 159–169 (2018)
  43. Stańczyk, U.: Filtering decision rules with continuous attributes governed by discretisation. In: Kryszkiewicz, M., Appice, A., Ślȩzak, D., Rybiński, H., Skowron, A., Raś, Z.W. (eds.) Foundations of Intelligent Systems. LNAI, vol. 10352, pp. 333–343. Springer, Cham, Switzerland (2017)
  44. Stańczyk, U., Zielosko, B.: On combining discretisation parameters and attribute ranking for selection of decision rules. In: Polkowski, L., Yao, Y., Artiemjew, P., Ciucci, D., Liu, D., Ślȩzak, D., Zielosko, B. (eds.) Rough Sets: International Joint Conference, IJCRS 2017, Olsztyn, Poland, July 3–7, 2017, Proceedings, Part I. Lecture Notes in Artificial Intelligence, vol. 10313, pp. 329–349. Springer, Cham, Switzerland (2017)
  45. Koppel, M., Argamon, S., Shimoni, A.: Automatically categorizing written texts by author gender. Lit. Linguist. Comput. 17(4), 401–412 (2002)
  46. Bazan, J., Szczuka, M.: The rough set exploration system. In: Peters, J.F., Skowron, A. (eds.) Transactions on Rough Sets III. Lecture Notes in Computer Science, vol. 3400, pp. 37–56. Springer, Berlin, Heidelberg (2005)
  47. Witten, I., Frank, E., Hall, M.: Data Mining. Practical Machine Learning Tools and Techniques, 3rd edn. Morgan Kaufmann (2011)

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Institute of Informatics, Silesian University of Technology, Gliwice, Poland
