
Part of the book series: Texts in Computer Science (TCS)


Abstract

In this chapter we consider methods for constructing predictors for class labels or numeric target attributes. In contrast to Chap. 8, where we discussed methods for essentially the same purpose, the methods in this chapter yield models that do little to explain the data, or they dispense with explicit models altogether. Nevertheless, they can be useful whenever the main goal is good prediction accuracy rather than an intuitive and interpretable model. Artificial neural networks and support vector machines in particular, which we study in Sects. 9.2 and 9.3, are known to outperform other methods in terms of accuracy on many tasks. However, because the mathematical structure of their prediction procedure is abstract and usually difficult to map to the application domain, the models they yield are essentially “black boxes” that are almost impossible to interpret in terms of the application domain. Hence they should be considered only if a comprehensible model that can easily be checked for plausibility is not required and high accuracy is the main concern.


Notes

  1. Outliers for the complete data set, on the other hand, do not affect nearest-neighbor predictors much, because they can only change the prediction for data points that should not occur or should occur only very rarely (provided that the rest of the data is representative).
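
     As a small illustration of this behavior (a hypothetical Python sketch with made-up data, not taken from the book): a nearest-neighbor classifier changes its prediction only for query points that lie closer to the outlier than to any regular training point, a region in which queries rarely occur if the rest of the data is representative.

        import numpy as np

        def knn_predict(query, X, y, k=1):
            # Majority vote among the k training points nearest to `query`
            # (Euclidean distance); all names here are illustrative.
            order = np.argsort(np.linalg.norm(X - query, axis=1))[:k]
            labels, counts = np.unique(y[order], return_counts=True)
            return labels[np.argmax(counts)]

        # Two compact classes plus one mislabeled, far-away outlier.
        X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1], [10.0, 10.0]])
        y = np.array([0, 0, 1, 1, 0])   # the point at (10, 10) is the outlier

        print(knn_predict(np.array([0.05, 0.1]), X, y))   # 0 -- unaffected by the outlier
        print(knn_predict(np.array([9.5, 9.5]), X, y))    # 0 -- only queries near the outlier are affected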

  2. The fold sizes may differ by one data point, to account for the fact that the total number of training examples may not be divisible by r, the number of folds.
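
     As a small illustration (a Python sketch under this convention, not the book's code): with n training examples and r folds, the first n mod r folds receive one extra data point.

        def fold_sizes(n, r):
            # Sizes of r folds for n examples; sizes differ by at most one point.
            base, extra = divmod(n, r)
            return [base + 1 if i < extra else base for i in range(r)]

        print(fold_sizes(10, 4))   # [3, 3, 2, 2]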

  3. Note that this k is independent of, and not to be confused with, the k denoting the number of neighbors. This clash of notation is an unfortunate accident, which, however, cannot be avoided without deviating from standard nomenclature.

  4. Note that, for technical reasons, the threshold or bias value β of a neuron's activation function is turned into a connection weight by adding a connection to a dummy neuron that emits a permanent signal of 1, so that the actual activation function can use a threshold of zero.
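
     A minimal sketch of this standard construction (illustrative Python with a logistic activation function assumed; not the book's code): the input is extended by a dummy component that is always 1, the bias becomes an ordinary weight on that component, and the activation function is then applied with threshold zero.

        import numpy as np

        def logistic(z):
            return 1.0 / (1.0 + np.exp(-z))

        def neuron(x, w, beta):
            # Neuron with an explicit threshold/bias value beta.
            return logistic(np.dot(w, x) - beta)

        def neuron_bias_as_weight(x, w, beta):
            # Same neuron: a dummy input that is permanently 1 carries the
            # weight -beta, so the activation function uses a threshold of zero.
            x_aug = np.append(x, 1.0)
            w_aug = np.append(w, -beta)
            return logistic(np.dot(w_aug, x_aug))

        x, w, beta = np.array([0.5, -1.2]), np.array([2.0, 0.7]), 0.3
        assert np.isclose(neuron(x, w, beta), neuron_bias_as_weight(x, w, beta))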

  5. Note that all considerations of Sect. 7.2 regarding the choice of distance function carry over to this setting, although the Euclidean distance is the most common choice.
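
     A small sketch of what this means in practice (illustrative Python; the points and the alternative distance are hypothetical choices): the nearest-neighbor search can be parameterized by any of the distance functions of Sect. 7.2, and the choice can change which point counts as "nearest."

        import numpy as np

        def euclidean(a, b):
            return np.sqrt(np.sum((a - b) ** 2))

        def manhattan(a, b):   # Minkowski distance with p = 1
            return np.sum(np.abs(a - b))

        def nearest_index(query, X, distance=euclidean):
            # Index of the training point closest to `query` under `distance`.
            return min(range(len(X)), key=lambda i: distance(X[i], query))

        X = np.array([[0.0, 3.0], [2.0, 2.0]])
        q = np.array([0.0, 0.0])
        print(nearest_index(q, X, euclidean))   # 1: distances 3.0 vs. 2.83
        print(nearest_index(q, X, manhattan))   # 0: distances 3.0 vs. 4.0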

  6. A linear activation function also explains the term “basis” in “radial basis function”: it allows one to interpret the output as an approximation of the desired function in the vector space spanned by the radial functions computed by the hidden neurons, with the connection weights from the hidden layer to the output serving as the coordinates of the approximating function w.r.t. this basis.
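
     A minimal numeric sketch of this interpretation (illustrative Python; Gaussian radial functions and all parameter values are assumptions, not taken from the book): the hidden neurons evaluate radial functions of the distance to their centers, and the output is the linear combination of these basis functions with the hidden-to-output weights as coordinates.

        import numpy as np

        def rbf_output(x, centers, widths, weights):
            # Distance of x to each hidden neuron's center ...
            dist = np.linalg.norm(centers - x, axis=1)
            # ... radial (Gaussian) activations of the hidden neurons ...
            phi = np.exp(-dist ** 2 / (2.0 * widths ** 2))
            # ... and the linear output: a weighted sum of the basis functions.
            return np.dot(weights, phi)

        centers = np.array([[0.0, 0.0], [1.0, 1.0]])   # assumed centers
        widths  = np.array([0.5, 0.5])                 # assumed radii
        weights = np.array([2.0, -1.0])                # hidden-to-output weights

        print(rbf_output(np.array([0.1, 0.0]), centers, widths, weights))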

  7. The other reason is that without squaring the error, positive and negative errors could cancel out.
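
     For instance, errors of +2 and −2 sum to 0, which would falsely suggest a perfect fit, whereas their squares sum to 8 and thus correctly signal a deviation.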

  8. Because vanilla is the standard ice cream flavor.

  9. KNIME Labs: http://labs.knime.org.


Author information


Correspondence to Michael R. Berthold.


Copyright information

© 2010 Springer-Verlag London Limited

About this chapter

Cite this chapter

Berthold, M.R., Borgelt, C., Höppner, F., Klawonn, F. (2010). Finding Predictors. In: Guide to Intelligent Data Analysis. Texts in Computer Science. Springer, London. https://doi.org/10.1007/978-1-84882-260-3_9


  • DOI: https://doi.org/10.1007/978-1-84882-260-3_9

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-84882-259-7

  • Online ISBN: 978-1-84882-260-3

  • eBook Packages: Computer Science, Computer Science (R0)
