
On Approaches to Discretization of Datasets Used for Evaluation of Decision Systems

  • Grzegorz Baron
  • Katarzyna Harężlak
Conference paper
Part of the Smart Innovation, Systems and Technologies book series (SIST, volume 57)

Abstract

The paper investigates how datasets should be discretized when a separate test dataset is used to evaluate a classifier. Three approaches to processing the training and test datasets are presented: “independent”, where each set is discretized separately using the same algorithm parameters; “glued”, where both sets are concatenated, discretized together, and the result is split back into training and test sets; and “test on learn”, where the test dataset is discretized using the ranges obtained from the learning (training) data. All three approaches were investigated and tested in the authorship attribution domain using a Naive Bayes classifier.

Keywords

Discretization; Decision system; Classification; Naive Bayes classifier; Authorship attribution

Notes

Acknowledgments

The research described was performed at the Silesian University of Technology, Gliwice, Poland, within the framework of the project BK/RAu2/2016. All experiments were performed using the WEKA workbench [4].

References

  1. Baron, G.: Influence of data discretization on efficiency of Bayesian Classifier for authorship attribution. Procedia Comput. Sci. 35, 1112–1121 (2014)
  2. Dougherty, J., Kohavi, R., Sahami, M.: Supervised and unsupervised discretization of continuous features. In: Machine Learning: Proceedings of the 12th International Conference, pp. 194–202. Morgan Kaufmann (1995)
  3. Fayyad, U.M., Irani, K.B.: Multi-interval discretization of continuous-valued attributes for classification learning. In: Proceedings of the 13th International Joint Conference on Artificial Intelligence (IJCAI), pp. 1022–1029 (1993)
  4. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA data mining software: an update. SIGKDD Explor. 11(1), 10–18 (2009)
  5. Kim, S.B., Han, K.S., Rim, H.C., Myaeng, S.H.: Some effective techniques for Naive Bayes text classification. IEEE Trans. Knowl. Data Eng. 18(11), 1457–1466 (2006)
  6. Kononenko, I.: On biases in estimating multi-valued attributes. In: 14th International Joint Conference on Artificial Intelligence, pp. 1034–1040 (1995)
  7. Kotsiantis, S.B.: Supervised machine learning: a review of classification techniques. In: Proceedings of the 2007 Conference on Emerging Artificial Intelligence Applications in Computer Engineering: Real Word AI Systems with Applications in eHealth, HCI, Information Retrieval and Pervasive Technologies, pp. 3–24. IOS Press, Amsterdam, The Netherlands (2007)
  8. Kotsiantis, S., Kanellopoulos, D.: Discretization techniques: a recent survey. Int. Trans. Comput. Sci. Eng. 1(32), 47–58 (2006)
  9. McCallum, A., Nigam, K.: A comparison of event models for Naive Bayes text classification. In: AAAI-98 Workshop on Learning for Text Categorization, pp. 41–48. AAAI Press (1998)
  10. Schneider, K.M.: Techniques for improving the performance of Naive Bayes for text classification. In: Proceedings of 6th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing), pp. 682–693 (2005)
  11. Stańczyk, U.: Rule-based approach to computational stylistics. In: Bouvry, P., Kłopotek, M., Marciniak, M., Mykowiecka, A., Rybiński, H. (eds.) Security and Intelligent Information Systems, LNCS (LNAI), vol. 7053, pp. 168–179. Springer, Berlin (2012)
  12. Stańczyk, U.: Ranking of characteristic features in combined wrapper approaches to selection. Neural Comput. Appl. 26(2), 329–344 (2015)
  13. Youn, E., Jeong, M.K.: Class dependent feature scaling method using Naive Bayes classifier for text datamining. Pattern Recognit. Lett. 30(5), 477–485 (2009)

Copyright information

© Springer International Publishing Switzerland 2016

Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 2.5 International License (http://creativecommons.org/licenses/by-nc/2.5/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Authors and Affiliations

  1. Silesian University of Technology, Gliwice, Poland
