Abstract
In this chapter, we first explained what classification problems are and what a decision boundary is. Then, we showed how to model a decision boundary using linear models. To build intuition, linear models were also studied from a geometrical perspective. A linear model must be trained on a training dataset, which requires a way to assess how well the model classifies the training samples. For this purpose, we thoroughly explained different loss functions, including the 0/1 loss, squared loss, hinge loss, and logistic loss. Then, methods for extending binary models to multiclass models, including one-versus-one and one-versus-rest, were reviewed. It is also possible to generalize a binary linear model directly into a multiclass model; this requires loss functions that can be applied to multiclass datasets. We showed how to extend the hinge loss and the logistic loss to multiclass datasets. The major issue with linear models is that they perform poorly on datasets in which the classes are not linearly separable. To overcome this problem, we introduced the idea of a feature transformation function and applied it to a toy example. Designing a feature transformation function by hand can be tedious, especially when it has to be applied to high-dimensional datasets. A better solution is to learn a feature transformation function directly from the training data and to train a linear classifier on top of it. We developed the idea of feature transformations from simple functions to compositional functions and explained how neural networks can be used to simultaneously learn a feature transformation function together with a linear classifier. Training a complex model such as a neural network requires computing the gradient of the loss function with respect to every parameter in the model. Computing gradients using the conventional chain rule might not be tractable.
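The binary loss functions named above can be sketched for a linear model with score \(\mathbf w^\top \mathbf x\) and label \(y \in \{-1, +1\}\). This is an illustrative sketch, not the chapter's own code; the function names and NumPy-based formulation are assumptions.

```python
import numpy as np

def zero_one_loss(w, x, y):
    # 0/1 loss: 1 if the sign of the score disagrees with the label, else 0.
    return 0.0 if y * np.dot(w, x) > 0 else 1.0

def hinge_loss(w, x, y):
    # Hinge loss: max(0, 1 - y * w.x); penalizes margins smaller than 1.
    return max(0.0, 1.0 - y * np.dot(w, x))

def logistic_loss(w, x, y):
    # Logistic loss: log(1 + exp(-y * w.x)); a smooth surrogate of 0/1 loss.
    return np.log1p(np.exp(-y * np.dot(w, x)))
```

Unlike the 0/1 loss, the hinge and logistic losses are continuous in \(\mathbf w\), which is what makes gradient-based training possible.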
We explained how to factorize the multivariate chain rule and reduce the number of arithmetic operations. Using this formulation, we explained the backpropagation algorithm for computing gradients on any computational graph. Next, we described different activation functions that can be used in designing neural networks and mentioned why ReLU activations are preferable to traditional activations such as the hyperbolic tangent. The role of bias in neural networks was also discussed in detail. Finally, we concluded the chapter by explaining how an image can be used as the input of a neural network.
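The preference for ReLU over the hyperbolic tangent can be illustrated by comparing their derivatives, since backpropagation multiplies these derivatives layer by layer. This is a minimal sketch under the standard definitions, not code from the chapter.

```python
import numpy as np

def relu(z):
    # ReLU: identity for positive inputs, zero otherwise.
    return np.maximum(0.0, z)

def relu_grad(z):
    # Derivative of ReLU: 1 for z > 0, 0 elsewhere. It never saturates on
    # the positive side, so gradients there are not shrunk during backprop.
    return np.where(z > 0, 1.0, 0.0)

def tanh_grad(z):
    # Derivative of tanh: 1 - tanh(z)^2, which approaches 0 for large |z|,
    # so gradients vanish as they flow backward through saturated units.
    return 1.0 - np.tanh(z) ** 2
```

For example, at \(z = 5\) the tanh derivative is already below \(10^{-3}\), while the ReLU derivative is still exactly 1.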
Notes
1. Implementations of the methods in this chapter are available at github.com/pcnn/.
2. You can read this formula as “\(N_K\) of \(\mathbf x _q\) given the dataset \(\mathscr {X}\)”.
Copyright information
© 2017 Springer International Publishing AG
Cite this chapter
Habibi Aghdam, H., Jahani Heravi, E. (2017). Pattern Classification. In: Guide to Convolutional Neural Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-57550-6_2
Print ISBN: 978-3-319-57549-0
Online ISBN: 978-3-319-57550-6