Efficient Multisplitting Revisited: Optima-Preserving Elimination of Partition Candidates


Abstract

We consider multisplitting of numerical value ranges, a task that is encountered as a discretization step preceding induction and also embedded into learning algorithms. We are interested in finding the partition that optimizes the value of a given attribute evaluation function. For most commonly used evaluation functions this task takes quadratic time in the number of potential cut points in the numerical range. Hence, it is a potential bottleneck in data mining algorithms.
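
As an illustration (ours, not part of the article), one commonly used evaluation function of this kind is the size-weighted average class entropy of the partition; scoring every possible placement of the cut points with such a function is what makes naive optimization quadratic in the number of candidates. The sketch below uses our own hypothetical names.

    import math

    def entropy(class_counts):
        # Shannon entropy of a single interval's class-count vector.
        n = sum(class_counts)
        if n == 0:
            return 0.0
        return -sum((c / n) * math.log2(c / n) for c in class_counts if c > 0)

    def average_class_entropy(intervals):
        # Size-weighted average of the interval entropies; `intervals` is a list
        # of class-count vectors, one vector per interval of the partition.
        total = sum(sum(counts) for counts in intervals)
        return sum((sum(counts) / total) * entropy(counts) for counts in intervals)

    # A two-interval partition of eight examples from two classes.
    print(average_class_entropy([[4, 0], [1, 3]]))  # approx. 0.406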

We present two techniques that speed up the optimal multisplitting task. The first one aims at discarding cut point candidates in a quick linear-time preprocessing scan before embarking on the actual search. We generalize the definition of boundary points by Fayyad and Irani to allow us to merge adjacent example blocks that have the same relative class distribution. We prove for several commonly used evaluation functions that this processing removes only suboptimal cut points. Hence, the algorithm does not lose optimality.
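
A minimal sketch of this kind of preprocessing, under our own assumptions and with hypothetical names (the article states its algorithm in terms of generalized boundary points; the sketch only shows the block-merging idea): adjacent blocks of examples whose relative class distributions coincide are merged, so that the cut points between them drop out of the candidate set before the search starts.

    from fractions import Fraction

    def merge_equal_distribution_blocks(blocks):
        # `blocks` is a list of class-count vectors, one per distinct attribute
        # value, in sorted order of the value. Adjacent blocks with the same
        # relative class distribution are merged; candidate cut points survive
        # only between blocks whose distributions differ.
        merged = []
        for counts in blocks:
            total = sum(counts)
            dist = tuple(Fraction(c, total) for c in counts)
            if merged and merged[-1][0] == dist:
                # Same relative class distribution: absorb into the previous block.
                merged[-1] = (dist, [a + b for a, b in zip(merged[-1][1], counts)])
            else:
                merged.append((dist, list(counts)))
        return [counts for _, counts in merged]

    # Blocks 2 and 3 share the distribution (1/2, 1/2) and are merged, which
    # eliminates the cut point between them from further consideration.
    print(merge_equal_distribution_blocks([[3, 0], [1, 1], [2, 2], [0, 4]]))
    # -> [[3, 0], [3, 3], [0, 4]]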

Our second technique tackles the quadratic-time dynamic programming algorithm, which is the best schema for optimizing many well-known evaluation functions. We present a technique that dynamically—i.e., during the search—prunes partitions of prefixes of the sorted data from the search space of the algorithm. The method works for all convex and cumulative evaluation functions.
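
For concreteness, here is a compact sketch of the baseline quadratic-time dynamic program that the pruning technique operates on, written under our own assumptions (a cumulative evaluation function whose partition score is the sum of per-interval scores; names such as `impurity` are hypothetical) and without the pruning itself:

    def optimal_multisplit(blocks, k, impurity):
        # Baseline O(k * n^2) dynamic program for an optimal split into at most
        # k intervals. `blocks` holds class-count vectors in sorted order of the
        # attribute value; `impurity(counts)` scores one interval.
        n = len(blocks)
        # prefix[i] = class counts summed over blocks[0:i], for O(1) interval sums.
        prefix = [[0] * len(blocks[0])]
        for counts in blocks:
            prefix.append([a + b for a, b in zip(prefix[-1], counts)])
        interval = lambda i, j: [b - a for a, b in zip(prefix[i], prefix[j])]

        # best[j] = best score found so far for the first j blocks (one interval).
        best = [0.0] + [impurity(interval(0, j)) for j in range(1, n + 1)]
        for _ in range(2, k + 1):          # allow one more interval per round
            new = best[:]
            for j in range(2, n + 1):      # try every position i of the last cut
                new[j] = min(best[j],
                             min(best[i] + impurity(interval(i, j))
                                 for i in range(1, j)))
            best = new
        return best[n]

    # Toy impurity: number of minority-class examples in the interval.
    print(optimal_multisplit([[3, 0], [1, 1], [2, 2], [0, 4]], 3,
                             lambda c: sum(c) - max(c)))  # -> 3

The pruning described in the article removes prefix partitions from `best` during the search; the sketch keeps them all and therefore shows only the quadratic baseline being improved upon.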

Together, these two techniques speed up the multisplitting process considerably. Compared to the baseline dynamic programming algorithm, the speed-up is around 50 percent on average and up to 90 percent in some cases. We conclude that optimal multisplitting is fully feasible on all benchmark data sets we have encountered.


References

  • Ankerst, M., Ester, M., and Kriegel, H.-P. 2000. Toward an effective cooperation of the user and the computer for classification. In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, R. Ramakrishnan, S. Stolfo, R. Bayardo, and I. Parsa (Eds.), New York, NY: ACM Press, pp. 179–188.

  • Auer, P. 1997. Optimal splits of single attributes. Technical report, Institute for Theoretical Computer Science, Graz University of Technology. Unpublished manuscript.

  • Auer, P., Holte, R.C., and Maass, W. 1995. Theory and application of agnostic PAC-learning with small decision trees. In Proceedings of the Twelfth International Conference on Machine Learning, A. Prieditis and S. Russell (Eds.), San Francisco, CA: Morgan Kaufmann, pp. 21–29.

  • Birkendorf, A. 1997. On fast and simple algorithms for finding maximal subarrays and applications in learning theory. In Computational Learning Theory, Proceedings of the Third European Conference, S. Ben-David (Ed.), Vol. 1208 of Lecture Notes in Artificial Intelligence. Berlin, Heidelberg: Springer-Verlag, pp. 198–209.

  • Blake, C.L. and Merz, C.J. 1998. UCI repository of machine learning databases. University of California, Department of Information and Computer Science, Irvine, CA. http://www.ics.uci.edu/~mlearn/MLRepository.html.

  • Breiman, L. 1996. Some properties of splitting criteria. Machine Learning, 24(1):41–47.

  • Breiman, L., Friedman, J.H., Olshen, R.A., and Stone, C.J. 1984. Classification and Regression Trees. Pacific Grove, CA: Wadsworth.

  • Catlett, J. 1991. On changing continuous attributes into ordered discrete attributes. In Proceedings of the Fifth European Working Session on Learning, Y. Kodratoff (Ed.), Vol. 482 of Lecture Notes in Computer Science. Heidelberg: Springer-Verlag, pp. 164–178.

  • Cerquides, J. and López de Mántaras, R. 1997. Proposal and empirical comparison of a parallelizable distance-based discretization method. In Proceedings of the Third International Conference on Knowledge Discovery and Data Mining, D. Heckerman, H. Mannila, D. Pregibon, and R. Uthurusamy (Eds.), Menlo Park, CA: AAAI Press, pp. 139–142.

  • Ching, J., Wong, A., and Chan, K. 1995. Class-dependent discretization for inductive learning from continuous and mixed-mode data. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(7):641–651.

  • Codrington, C.W. and Brodley, C.E. 1997. On the qualitative behavior of impurity-based splitting rules I: The minima-free property. Technical Report 97-5, School of Electrical and Computer Engineering, Purdue University.

  • Coppersmith, D., Hong, S.J., and Hosking, J.R.M. 1999. Partitioning nominal attributes in decision trees. Data Mining and Knowledge Discovery, 3(2):197–217.

  • Cover, T.M. and Thomas, J.A. 1991. Elements of Information Theory. New York, NY: John Wiley & Sons.

  • Dougherty, J., Kohavi, R., and Sahami, M. 1995. Supervised and unsupervised discretization of continuous features. In Proceedings of the Twelfth International Conference on Machine Learning, A. Prieditis and S. Russell (Eds.), San Francisco, CA: Morgan Kaufmann, pp. 194–202.

  • Elomaa, T. and Rousu, J. 1999. General and efficient multisplitting of numerical attributes. Machine Learning, 36(3):201–244.

  • Elomaa, T. and Rousu, J. 2001. On the computational complexity of optimal multisplitting. Fundamenta Informaticae, 47(1/2):35–52.

  • Elomaa, T. and Rousu, J. 2003. Necessary and sufficient pre-processing in numerical range discretization. Knowledge and Information Systems, 5(2):162–182.

  • Fayyad, U.M. and Irani, K.B. 1992. On the handling of continuous-valued attributes in decision tree generation. Machine Learning, 8(1):87–102.

  • Fayyad, U.M. and Irani, K.B. 1993. Multi-interval discretization of continuous-valued attributes for classification learning. In Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence, San Francisco, CA: Morgan Kaufmann, pp. 1022–1027.

  • Fulton, T., Kasif, S., and Salzberg, S. 1995. Efficient algorithms for finding multi-way splits for decision trees. In Proceedings of the Twelfth International Conference on Machine Learning, A. Prieditis and S. Russell (Eds.), San Francisco, CA: Morgan Kaufmann, pp. 244–251.

  • Hardy, G.H., Littlewood, J.E., and Pólya, G. 1934. Inequalities. Cambridge, UK: Cambridge University Press.

  • Hickey, R.J. 1996. Noise modelling and evaluating learning from examples. Artificial Intelligence, 82(1/2):157–179.

  • Ho, K. and Scott, P. 1997. Zeta: A global method for discretization of continuous variables. In Proceedings of the Third International Conference on Knowledge Discovery and Data Mining, D. Heckerman, H. Mannila, D. Pregibon, and R. Uthurusamy (Eds.), Menlo Park, CA: AAAI Press, pp. 191–194.

  • Hong, S.J., 1997. Use of contextual information for feature ranking and discretization. IEEE Transactions on Knowledge and Data Engineering, 9(5):718–730.

  • Kohavi, R. and Sahami, M. 1996. Error-based and entropy-based discretization of continuous features. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, E. Simoudis, J. Han, and U. M. Fayyad (Eds.), Menlo Park, CA: AAAI Press, pp. 114–119.

  • Liu, H. and Setiono, R. 1997. Feature selection via discretization. IEEE Transactions on Knowledge and Data Engineering, 9(4):642–645.

  • López de Mántaras, R. 1991. A distance-based attribute selection measure for decision tree induction. Machine Learning, 6(1):81–92.

  • Maass, W. 1994. Efficient agnostic PAC-learning with simple hypotheses. In Proceedings of the Seventh Annual ACM Conference on Computational Learning Theory, New York, NY: ACM Press, pp. 67–75.

  • Meyer, B. 1984. Some inequalities for elementary mean values. Mathematics of Computation, 42(1):193–194.

  • Pfahringer, B. 1995. Compression-based discretization of continuous attributes. In Proceedings of the Twelfth International Conference on Machine Learning, A. Prieditis and S. Russell (Eds.), San Francisco, CA: Morgan Kaufmann, pp. 456–463.

  • Provost, F., Jensen, D., and Oates, T. 1999. Efficient progressive sampling. In Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, S. Chaudhuri and D. Madigan (Eds.), New York, NY: ACM Press, pp. 23–32.

  • Quinlan, J.R. 1986. Induction of decision trees. Machine Learning, 1(1):81–106.

  • Quinlan, J.R. 1988. Decision trees and multi-valued attributes. In Machine Intelligence 11: Logic and the Acquisition of Knowledge, J.E. Hayes, D. Michie, and J. Richards (Eds.), Oxford, UK: Oxford University Press, pp. 305–318.

  • Quinlan, J.R. 1993. C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann.

  • Rousu, J. 2001. Efficient range partitioning in classification learning. Ph.D. thesis, Department of Computer Science, University of Helsinki. Report A-2001-1.

  • Utgoff, P.E. 1989. Incremental induction of decision trees. Machine Learning, 4(2):161–186.

  • Wu, X. 1996. A Bayesian discretizer for real-valued attributes. Computer Journal, 39(8):688–694.

  • Zighed, D., Rakotomalala, R., and Feschet, F. 1997. Optimal multiple intervals discretization of continuous attributes for supervised learning. In Proceedings of the Third International Conference on Knowledge Discovery and Data Mining, D. Heckerman, H. Mannila, D. Pregibon, and R. Uthurusamy (Eds.), Menlo Park, CA: AAAI Press, pp. 295–298.

Cite this article

Elomaa, T., Rousu, J. Efficient Multisplitting Revisited: Optima-Preserving Elimination of Partition Candidates. Data Mining and Knowledge Discovery 8, 97–126 (2004). https://doi.org/10.1023/B:DAMI.0000015868.85039.e6
