Abstract
In this chapter we consider methods of constructing predictors for class labels or numeric target attributes. However, in contrast to Chap. 8, where we discussed methods for essentially the same purpose, the methods in this chapter yield models that do not help much to explain the data, or they even dispense with an explicit model altogether. Nevertheless, they can be useful, namely when the main goal is good prediction accuracy rather than an intuitive and interpretable model. Especially artificial neural networks and support vector machines, which we study in Sects. 9.2 and 9.3, are known to outperform other methods w.r.t. accuracy in many tasks. However, due to the abstract mathematical structure of the prediction procedure, which is usually difficult to map to the application domain, the models they yield are basically “black boxes” that are almost impossible to interpret in terms of the application domain. Hence they should be considered only if a comprehensible model that can easily be checked for plausibility is not required, and high accuracy is the main concern.
Notes
1. Outliers for the complete data set, on the other hand, do not affect nearest-neighbor predictors much, because they can only change the prediction for data points that should not occur or should occur only very rarely (provided that the rest of the data is representative).
2. The fold sizes may differ by one data point, to account for the fact that the total number of training examples may not be divisible by r, the number of folds (a minimal sketch of such a split is shown after these notes).
3. Note that this k is independent of, and not to be confused with, the k denoting the number of neighbors. This clash of notation is unfortunate, but it cannot be avoided without deviating from standard nomenclature.
4. Note that, for technical reasons, the threshold or bias value β of the neuron’s activation function is turned into a connection weight by adding a connection to a dummy neuron that emits a constant signal of 1, while the actual activation function then has a threshold of zero (see the sketch after these notes).
5. Note that all considerations of Sect. 7.2 regarding the choice of distance function carry over here, although the Euclidean distance is the most common choice.
6. A linear activation function also explains the term “basis” in “radial basis function”: it allows one to interpret the output as an approximation of the desired function in the vector space spanned by the radial functions computed by the hidden neurons; the connection weights from the hidden layer to the output neuron are the coordinates of the approximating function w.r.t. this vector space (see the sketch after these notes).
7. The other reason is that, without squaring the error, positive and negative errors could cancel out.
8. Because vanilla is the standard ice cream flavor.
9. KNIME Labs: http://labs.knime.org.
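
The fold-size rule from note 2 can be illustrated with a minimal sketch: n training examples are distributed over r cross-validation folds so that the fold sizes differ by at most one data point. The function name fold_sizes is purely illustrative and not taken from the book.

```python
def fold_sizes(n, r):
    """Sizes of r cross-validation folds for n examples, differing by at most 1."""
    base, remainder = divmod(n, r)
    # every fold gets at least 'base' examples; the first 'remainder' folds get one more
    return [base + 1 if i < remainder else base for i in range(r)]

print(fold_sizes(10, 3))  # -> [4, 3, 3]: 10 is not divisible by 3
```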
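Note 4 can be illustrated as follows. This is only a sketch under the common convention that the neuron fires when the weighted input sum reaches the threshold β; the function names are hypothetical and not the book’s code.

```python
import numpy as np

def neuron(weights, beta, x):
    """Threshold neuron: output 1 if the weighted input sum reaches beta."""
    return 1.0 if np.dot(weights, x) - beta >= 0.0 else 0.0

def neuron_zero_threshold(weights_aug, x):
    """Same neuron after the threshold has been absorbed into the weights."""
    x_aug = np.append(x, 1.0)        # dummy neuron emitting a constant signal of 1
    return 1.0 if np.dot(weights_aug, x_aug) >= 0.0 else 0.0

w, beta, x = np.array([0.5, -0.2]), 0.3, np.array([1.0, 2.0])
w_aug = np.append(w, -beta)          # the threshold becomes the extra weight -beta
assert neuron(w, beta, x) == neuron_zero_threshold(w_aug, x)
```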
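Note 6 reads the output of a radial basis function network with a linear output activation as a weighted sum of the radial functions computed by the hidden neurons. The following sketch, with arbitrary example values (Gaussian radial functions, hypothetical names), is meant only to illustrate this reading; it is not the book’s implementation.

```python
import numpy as np

def rbf_output(x, centers, sigmas, weights):
    """Output of an RBF network with Gaussian hidden neurons and a linear output neuron."""
    hidden = np.exp(-np.sum((x - centers) ** 2, axis=1) / (2.0 * sigmas ** 2))
    return np.dot(weights, hidden)   # linear combination of the radial basis functions

centers = np.array([[0.0, 0.0], [1.0, 1.0]])  # hidden neuron centers
sigmas  = np.array([0.5, 0.5])                # radii of the radial functions
weights = np.array([1.2, -0.7])               # coordinates w.r.t. this basis
print(rbf_output(np.array([0.5, 0.5]), centers, sigmas, weights))
```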