Abstract
In this chapter we consider methods of constructing predictors for class labels or numeric target attributes. However, in contrast to Chap. 8, where we discussed methods for essentially the same purpose, the methods in this chapter yield models that do not help much to explain the data, or they even dispense with an explicit model altogether. Nevertheless, they can be useful, namely when the main goal is good prediction accuracy rather than an intuitive and interpretable model. Especially artificial neural networks and support vector machines, which we study in Sects. 9.2 and 9.3, are known to outperform other methods w.r.t. accuracy in many tasks. However, due to the abstract mathematical structure of the prediction procedure, which is usually difficult to map to the application domain, the models they yield are basically “black boxes” that are almost impossible to interpret in terms of the application domain. Hence they should be considered only if a comprehensible model that can easily be checked for plausibility is not required, and high accuracy is the main concern.
Notes
1. Outliers for the complete data set, on the other hand, do not affect nearest-neighbor predictors much, because they can only change the prediction for data points that should not occur or should occur only very rarely (provided that the rest of the data is representative).
2. The fold sizes may differ by one data point, to account for the fact that the total number of training examples may not be divisible by r, the number of folds (a minimal sketch of such a split is shown after these notes).
3. Note that this k is independent of, and not to be confused with, the k denoting the number of neighbors. This clash of notation is unfortunate, but it cannot be avoided without deviating from standard nomenclature.
4. Note that, for technical reasons, the threshold or bias value β of the neuron’s activation function is turned into a connection weight by adding a connection to a dummy neuron that emits a constant signal of 1, while the actual activation function then has a threshold of zero (see the sketch after these notes).
5. Note that all considerations of Sect. 7.2 regarding the choice of distance function carry over here, although the Euclidean distance is the most common choice.
6. A linear activation function also explains the term “basis” in “radial basis function”: it allows one to interpret the output as an approximation of the desired function in the vector space spanned by the radial functions computed by the hidden neurons; the connection weights from the hidden layer to the output neuron are the coordinates of the approximating function w.r.t. this vector space (see the sketch after these notes).
7. The other reason is that, without squaring the error, positive and negative errors could cancel out.
8. Because vanilla is the standard ice cream flavor.
9. KNIME Labs: http://labs.knime.org.
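
The fold-size rule from note 2 can be illustrated with a minimal sketch: n training examples are distributed over r cross-validation folds so that the fold sizes differ by at most one data point. The function name fold_sizes is purely illustrative and not taken from the book.

```python
def fold_sizes(n, r):
    """Sizes of r cross-validation folds for n examples, differing by at most 1."""
    base, remainder = divmod(n, r)
    # every fold gets at least 'base' examples; the first 'remainder' folds get one more
    return [base + 1 if i < remainder else base for i in range(r)]

print(fold_sizes(10, 3))  # -> [4, 3, 3]: 10 is not divisible by 3
```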
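Note 4 can be illustrated as follows. This is only a sketch under the common convention that the neuron fires when the weighted input sum reaches the threshold β; the function names are hypothetical and not the book’s code.

```python
import numpy as np

def neuron(weights, beta, x):
    """Threshold neuron: output 1 if the weighted input sum reaches beta."""
    return 1.0 if np.dot(weights, x) - beta >= 0.0 else 0.0

def neuron_zero_threshold(weights_aug, x):
    """Same neuron after the threshold has been absorbed into the weights."""
    x_aug = np.append(x, 1.0)        # dummy neuron emitting a constant signal of 1
    return 1.0 if np.dot(weights_aug, x_aug) >= 0.0 else 0.0

w, beta, x = np.array([0.5, -0.2]), 0.3, np.array([1.0, 2.0])
w_aug = np.append(w, -beta)          # the threshold becomes the extra weight -beta
assert neuron(w, beta, x) == neuron_zero_threshold(w_aug, x)
```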
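Note 6 reads the output of a radial basis function network with a linear output activation as a weighted sum of the radial functions computed by the hidden neurons. The following sketch, with arbitrary example values (Gaussian radial functions, hypothetical names), is meant only to illustrate this reading; it is not the book’s implementation.

```python
import numpy as np

def rbf_output(x, centers, sigmas, weights):
    """Output of an RBF network with Gaussian hidden neurons and a linear output neuron."""
    hidden = np.exp(-np.sum((x - centers) ** 2, axis=1) / (2.0 * sigmas ** 2))
    return np.dot(weights, hidden)   # linear combination of the radial basis functions

centers = np.array([[0.0, 0.0], [1.0, 1.0]])  # hidden neuron centers
sigmas  = np.array([0.5, 0.5])                # radii of the radial functions
weights = np.array([1.2, -0.7])               # coordinates w.r.t. this basis
print(rbf_output(np.array([0.5, 0.5]), centers, sigmas, weights))
```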