Evaluating Importance for Numbers of Bins in Discretised Learning and Test Sets

  • Urszula Stańczyk
Conference paper
Part of the Smart Innovation, Systems and Technologies book series (SIST, volume 72)


The paper studies how the number of bins found for each attribute by supervised discretisation of the input sets influences classifier performance. First, the variables were divided into categories defined by their numbers of bins, and several decision systems were tested for these categories. Second, for features assigned a single bin, unsupervised discretisation was applied and the resulting performance was studied. The experiments show that characterising variables by their numbers of bins is useful, and that combining supervised with unsupervised discretisation improves the solutions in some cases.
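The two steps described above can be illustrated with a minimal Python sketch. It is not the procedure used in the paper: it assumes that cut points from some supervised discretiser are already available, and it uses equal-width binning as a stand-in for the unsupervised step, which the abstract does not specify. The attribute names, values, and all function names are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def group_by_bin_count(cut_points):
    """Group attribute names by the number of bins implied by their cut points."""
    groups = defaultdict(list)
    for attribute, cuts in cut_points.items():
        # k cut points split the value range into k + 1 bins.
        groups[len(cuts) + 1].append(attribute)
    return dict(groups)


def equal_width_cuts(values, n_bins):
    """Return the n_bins - 1 cut points of equal-width binning (assumed method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return []  # a constant attribute cannot be split
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]


def combine(supervised_cuts, learning_values, n_bins):
    """Keep supervised cuts; give single-bin attributes equal-width cuts instead."""
    combined = {}
    for attribute, cuts in supervised_cuts.items():
        if cuts:
            combined[attribute] = list(cuts)
        else:
            combined[attribute] = equal_width_cuts(learning_values[attribute], n_bins)
    return combined


def bin_index(value, cuts):
    """Return the zero-based index of the bin that value falls into."""
    return bisect_right(cuts, value)


# Hypothetical cut points, as a supervised discretiser might return them.
supervised_cuts = {"freq_and": [0.012], "freq_but": [], "freq_not": [0.004, 0.009]}
# Hypothetical learning-set values for the single-bin attribute.
learning_values = {"freq_but": [0.0, 1.0, 2.0, 4.0]}

groups = group_by_bin_count(supervised_cuts)
combined = combine(supervised_cuts, learning_values, n_bins=2)
# Test samples are coded with cut points derived from the learning set only.
test_codes = [bin_index(v, combined["freq_but"]) for v in [0.5, 2.0, 5.0]]
print(groups, combined["freq_but"], test_codes)
```

In this sketch the cut points for single-bin attributes are computed on the learning set and then reused for the test set. This is only one of several possible ways to treat the two sets.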


Supervised discretisation · Unsupervised discretisation · Attribute · Bin · Classification



The research used the RSES system, developed at the Institute of Mathematics, Warsaw University, and the WEKA workbench [10]. It was performed at the Silesian University of Technology, Gliwice, within the project BK/RAu2/2017.


  1. Argamon, S., Burns, K., Dubnov, S. (eds.): The Structure of Style: Algorithmic Approaches to Understanding Manner and Meaning. Springer, Berlin (2010)
  2. Baron, G.: Comparison of cross-validation and test sets approaches to evaluation of classifiers in authorship attribution domain. In: Czachórski, T., Gelenbe, E., Grochla, K., Lent, R. (eds.) Proceedings of the 31st International Symposium on Computer and Information Sciences, Communications in Computer and Information Science, vol. 659, pp. 81–89. Springer, Cracow (2016)
  3. Baron, G.: On approaches to discretization of datasets used for evaluation of decision systems. In: Czarnowski, I., Caballero, A., Howlett, R., Jain, L. (eds.) Intelligent Decision Technologies 2016, Smart Innovation, Systems and Technologies, vol. 56, pp. 149–159. Springer (2016)
  4. Bazan, J., Szczuka, M.: The rough set exploration system. In: Peters, J.F., Skowron, A. (eds.) Transactions on Rough Sets III. LNCS, vol. 3400, pp. 37–56. Springer, Heidelberg (2005)
  5. Burrows, J.: Textual analysis. In: Schreibman, S., Siemens, R., Unsworth, J. (eds.) A Companion to Digital Humanities. Blackwell, Oxford (2004)
  6. Craig, H.: Stylistic analysis and authorship studies. In: Schreibman, S., Siemens, R., Unsworth, J. (eds.) A Companion to Digital Humanities. Blackwell, Oxford (2004)
  7. Cyran, K., Stańczyk, U.: Indiscernibility relation for continuous attributes: application in image recognition. In: Kryszkiewicz, M., Peters, J., Rybiński, H., Skowron, A. (eds.) Rough Sets and Emerging Intelligent Systems Paradigms. LNAI, vol. 4585, pp. 726–735. Springer, Berlin (2007)
  8. Dougherty, J., Kohavi, R., Sahami, M.: Supervised and unsupervised discretization of continuous features. In: Machine Learning Proceedings 1995: Proceedings of the 12th International Conference on Machine Learning, pp. 194–202. Elsevier (1995)
  9. Fayyad, U., Irani, K.: Multi-interval discretization of continuous valued attributes for classification learning. In: Proceedings of the 13th International Joint Conference on Artificial Intelligence, vol. 2, pp. 1022–1027. Morgan Kaufmann Publishers (1993)
  10. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.: The WEKA data mining software: an update. SIGKDD Explor. 11(1), 10–18 (2009)
  11. Kotsiantis, S., Kanellopoulos, D.: Discretization techniques: a recent survey. GESTS Int. Trans. Comput. Sci. Eng. 32(1), 47–58 (2006)
  12. Pawlak, Z.: Rough sets and intelligent data analysis. Inf. Sci. 147, 1–12 (2002)
  13. Peng, R., Hengartner, H.: Quantitative analysis of literary styles. Am. Stat. 56(3), 15–38 (2002)
  14. Stańczyk, U.: Recognition of author gender for literary texts. In: Czachórski, T., Kozielski, S., Stańczyk, U. (eds.) Man-Machine Interactions 2, Advances in Intelligent and Soft Computing, vol. 103, pp. 229–238. Springer, Berlin (2011)
  15. Stańczyk, U.: Weighting of attributes in an embedded rough approach. In: Gruca, A., Czachórski, T., Kozielski, S. (eds.) Man-Machine Interactions 3, Advances in Intelligent and Soft Computing, vol. 242, pp. 475–483. Springer, Berlin (2013)
  16. Stańczyk, U.: Weighting of features by sequential selection. In: Stańczyk, U., Jain, L. (eds.) Feature Selection for Data and Pattern Recognition. SCI, vol. 584, pp. 71–90. Springer, Berlin (2015)

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. Institute of Informatics, Silesian University of Technology, Gliwice, Poland
