CD: A Coupled Discretization Algorithm

Wang, Can; Wang, Mingchun; She, Zhong; Cao, Longbing

doi:10.1007/978-3-642-30220-6_34


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7302)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining


Abstract

Discretization plays an important role in data mining and machine learning. Although numeric data predominates in the real world, many supervised learning algorithms are restricted to discrete variables. A large body of research has therefore addressed discretization, the process of converting continuous attribute values into a finite number of intervals. Recent work on entropy-based discretization, which has produced impressive results, introduces information attribute dependency to reduce the uncertainty level of a decision table, but it gives no attention to the increase in certainty degree from the perspective of the positive domain ratio. This paper proposes a discretization algorithm based on both the positive domain and its coupling with information entropy, which considers not only information attribute dependency but also deterministic feature relationships. Extensive experiments on UCI data sets provide evidence that the proposed coupled discretization algorithm generally outperforms seven existing methods, as well as the positive-domain-based algorithm also proposed in this paper, in terms of simplicity, stability, consistency, and accuracy.
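The abstract contrasts two ways of judging a set of cut points on a continuous attribute: the information-entropy view (how much uncertainty about the class remains once values are mapped to intervals) and the rough-set positive domain ratio (what fraction of objects fall into intervals that determine the class with certainty). The sketch below computes both quantities for one attribute under a given set of cuts. It assumes the standard rough-set definition (an object is in the positive region if its interval contains a single class) and Shannon conditional entropy in bits; the function names and the toy data are hypothetical, and this is not the authors' CD algorithm, whose exact coupling of the two measures is not described here.

```python
from bisect import bisect_right
from collections import Counter, defaultdict
from math import log2


def interval_index(value, cuts):
    """Return the index of the interval containing `value`.

    `cuts` is a sorted list of cut points; interval i is [cuts[i-1], cuts[i]),
    with the first interval unbounded below and the last unbounded above.
    """
    return bisect_right(cuts, value)


def blocks(values, labels, cuts):
    """Group class labels by interval: the equivalence classes induced
    by the discretized attribute."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[interval_index(v, cuts)].append(y)
    return list(groups.values())


def positive_region_ratio(values, labels, cuts):
    """Fraction of objects lying in class-pure intervals, i.e. the
    rough-set dependency degree of the class on the discretized attribute."""
    pure = sum(len(ys) for ys in blocks(values, labels, cuts) if len(set(ys)) == 1)
    return pure / len(labels)


def conditional_entropy(values, labels, cuts):
    """Shannon entropy of the class given the discretized attribute, in bits."""
    n = len(labels)
    h = 0.0
    for ys in blocks(values, labels, cuts):
        k = len(ys)
        # Entropy of the class distribution inside this interval.
        h_block = -sum((c / k) * log2(c / k) for c in Counter(ys).values())
        # Weight by the interval's share of all objects.
        h += (k / n) * h_block
    return h


if __name__ == "__main__":
    # Hypothetical single-attribute decision table (illustrative data only).
    x = [1.0, 1.5, 2.0, 3.0, 3.5, 4.0]
    y = ["a", "a", "a", "b", "b", "b"]
    for cuts in ([], [1.75], [2.5]):
        ratio = positive_region_ratio(x, y, cuts)
        h = conditional_entropy(x, y, cuts)
        print(cuts, round(ratio, 4), round(h, 4))
```

On this toy table both measures favour the cut at 2.5: every object then lies in a class-pure interval (ratio 1.0) and no class uncertainty remains (entropy 0.0). The two measures can disagree on other data, which is the gap the abstract points to: an entropy criterion rewards partially purer intervals, whereas the positive domain ratio only counts intervals that determine the class outright.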



Author information

Authors and Affiliations

Centre for Quantum Computation and Intelligent Systems, Advanced Analytics Institute, University of Technology, Sydney, Australia
Can Wang, Zhong She & Longbing Cao
School of Science, Tianjin University of Technology and Education, China
Mingchun Wang


Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Michigan State University, 428 S. Shaw Lane, 48824-1226, East Lansing, MI, USA
Pang-Ning Tan
School of Information Technologies, University of Sydney, 1 Cleveland St., 2006, Sydney, NSW, Australia
Sanjay Chawla
Faculty of Computing and Informatics, Jalan Multimedia, Multimedia University, 63100, Cyberjaya, Selangor, Malaysia
Chin Kuan Ho
Department of Computing and Information Systems, The University of Melbourne, 111 Barry Street, 3053, Melbourne, VIC, Australia
James Bailey


About this paper

Cite this paper

Wang, C., Wang, M., She, Z., Cao, L. (2012). CD: A Coupled Discretization Algorithm. In: Tan, PN., Chawla, S., Ho, C.K., Bailey, J. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2012. Lecture Notes in Computer Science, vol 7302. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30220-6_34


DOI: https://doi.org/10.1007/978-3-642-30220-6_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-30219-0
Online ISBN: 978-3-642-30220-6
eBook Packages: Computer Science, Computer Science (R0)
