Fundamentals of Machine Learning

Chapter in Neural Networks and Statistical Learning

Abstract

Learning is a fundamental capability of neural networks. Learning rules are algorithms for finding suitable weights W and/or other network parameters.
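
To make the idea concrete, here is a minimal sketch (not taken from the chapter) of one of the simplest learning rules: gradient descent on the mean squared error of a linear model, which iteratively adjusts the weight matrix W toward suitable values. The function name train_linear, the learning rate eta, and the synthetic data are hypothetical choices for illustration only.

    import numpy as np

    def train_linear(X, T, eta=0.01, epochs=1000):
        """Gradient-descent learning rule: find weights W so that X @ W approximates T."""
        n_inputs, n_outputs = X.shape[1], T.shape[1]
        W = np.zeros((n_inputs, n_outputs))
        for _ in range(epochs):
            Y = X @ W                      # forward pass of the linear model
            grad = X.T @ (Y - T) / len(X)  # gradient of the mean squared error w.r.t. W
            W -= eta * grad                # weight update (the learning rule)
        return W

    # Usage: recover the weights of a noisy linear mapping.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    T = X @ np.array([[2.0], [-1.0], [0.5]]) + 0.01 * rng.normal(size=(200, 1))
    print(train_linear(X, T).round(2))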

Author information

Corresponding author

Correspondence to Ke-Lin Du.

Copyright information

© 2014 Springer-Verlag London

About this chapter

Cite this chapter

Du, K.-L., & Swamy, M. N. S. (2014). Fundamentals of Machine Learning. In: Neural Networks and Statistical Learning. Springer, London. https://doi.org/10.1007/978-1-4471-5571-3_2

  • DOI: https://doi.org/10.1007/978-1-4471-5571-3_2

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-5570-6

  • Online ISBN: 978-1-4471-5571-3

  • eBook Packages: Engineering, Engineering (R0)
