Abstract
Deep learning (DL) methods have gained considerable attention since 2014. In this chapter we briefly review the state of the art in DL and then give several examples of applications from diverse areas. We focus on convolutional neural networks (CNNs), which, since the seminal work of Krizhevsky et al. (2012), have revolutionized image classification and have even begun to surpass human performance on some benchmark data sets (Ciresan et al. 2012a; He et al. 2015a). While deep neural networks have become popular primarily for image classification tasks, they can also be applied successfully to other areas and problems with some local structure in the data. We first present a classical application of CNNs to image-like data, namely phenotype classification of cells based on their morphology, and then extend the task to clustering voices based on their spectrograms. Next, we describe DL applications to semantic segmentation of newspaper pages into their corresponding articles based on clues in the pixels, and to outlier detection in a predictive maintenance setting. We conclude with advice on how to work with DL under limited resources (e.g., scarce training data).
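To make the flavor of these CNN applications concrete, here is a minimal sketch of a small CNN classifier for image-like inputs such as cell images or voice spectrograms. It is written in Python with the Keras API of TensorFlow; the input shape, layer sizes, number of classes, and the function name build_cnn are illustrative assumptions, not values or code from the chapter.

    # Minimal sketch of a CNN classifier for image-like inputs
    # (e.g., cell images or spectrograms); all shapes and sizes
    # below are illustrative placeholders, not from the chapter.
    from tensorflow.keras import layers, models

    def build_cnn(input_shape=(64, 64, 1), n_classes=5):
        """Two convolution/pooling blocks followed by a dense softmax head."""
        model = models.Sequential([
            layers.Input(shape=input_shape),
            layers.Conv2D(32, 3, activation="relu", padding="same"),
            layers.MaxPooling2D(2),
            layers.Conv2D(64, 3, activation="relu", padding="same"),
            layers.MaxPooling2D(2),
            layers.Flatten(),
            layers.Dropout(0.5),  # regularization, cf. Srivastava et al. (2014)
            layers.Dense(n_classes, activation="softmax"),
        ])
        model.compile(optimizer="adam",  # cf. Kingma and Ba (2014)
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        return model

When labeled training data is scarce, as addressed in the chapter's concluding advice, one would typically not train such a network from scratch but instead reuse convolutional features pre-trained on ImageNet (cf. Pan and Yang 2010; Razavian et al. 2014).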
References
Aucouturier, J.-J., Defreville, B., & Pachet, F. (2007). The bag-of-frames approach to audio pattern recognition: A sufficient model for urban soundscapes but not for polyphonic music. The Journal of the Acoustical Society of America, 122(2), 881–891.
Beigi, H. (2011). Fundamentals of speaker recognition. Springer Science & Business Media.
Bersimis, S., Psarakis, S., & Panaretos, J. (2007). Multivariate statistical process control charts: An overview. Quality and Reliability Engineering International, 23, 517–543.
Bishop, C. M. (2006). Pattern recognition and machine learning. Springer.
Bouma, G. (2009). Normalized (pointwise) mutual information in collocation extraction. In: From form to meaning: Processing texts automatically, Proceedings of the Biennial GSCL Conference 2009 (pp. 31–40). https://svn.spraakdata.gu.se/repos/gerlof/pub/www/Docs/npmi-pfd.pdf
Chung, J. S., Senior, A. W., Vinyals, O., & Zisserman, A. (2016). Lip reading sentences in the wild. CoRR, Vol. 1611.05358. http://arxiv.org/abs/1611.05358
Ciresan, D., Meier, U., Masci, J., & Schmidhuber, J. (2012a). Multi-column deep neural network for traffic sign classification. http://people.idsia.ch/~juergen/nn2012traffic.pdf
Ciresan, D., Giusti, A., Gambardella, L. M., & Schmidhuber, J. (2012b). Deep neural networks segment neuronal membranes in electron microscopy images. Advances in Neural Information Processing Systems, 25, 2843–2851.
Dehak, N., Kenny, P., Dehak, R., Dumouchel, P., & Ouellet, P. (2011). Front-end factor analysis for speaker verification. IEEE Transactions on Audio, Speech, and Language Processing, 19(4), 788–798.
Dieleman, S., & Schrauwen, B. (2014). End-to-end learning for music audio. In Proceedings of ICASSP (pp. 6964–6968).
Domingos, P. (2012). A few useful things to know about machine learning. Communications of the ACM, 55(10), 78–87.
Dürr, O., Duval, F., Nichols, A., Lang, P., Brodte, A., Heyse, S., & Besson, D. (2007). Robust hit identification by quality assurance and multivariate data analysis of a high-content, cell-based assay. Journal of Biomolecular Screening, 12(8), 1042–1049.
Ester, M., Kriegel, H.-P., Sander, J., & Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96) (pp. 226–231). AAAI Press.
Fernández-Francos, D., Martínez-Rego, D., Fontenla-Romero, O., & Alonso-Betanzos, A. (2013). Automatic bearing fault diagnosis based on one-class ν-SVM. Computers & Industrial Engineering, 64(1), 357–365.
Fukushima, K. (1980). Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36(4), 193–202.
Ganchev, T., Fakotakis, N., & Kokkinakis, G. (2005). Comparative evaluation of various MFCC implementations on the speaker verification task. In Proceedings of SPECOM 2005 (Vol. 1, pp. 191–194).
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press. http://www.deeplearningbook.org.
Gustafsdottir, S. M., Ljosa, V., Sokolnicki, K. L., Wilson, J. A., Walpita, D., Kemp, M. M., Petri Seiler, K., Carrel, H. A., Golub, T. R., Schreiber, S. L., Clemons, P. A., Carpenter, A. E., & Shamji, A. F. (2013). Multiplex cytological profiling assay to measure diverse cellular states. PLoS One, 8(12), e80999.
Han, S., Mao, H., & Dally, W. J. (2015). Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. CoRR, Vol. 1510.00149. https://arxiv.org/abs/1510.00149
He, K., Zhang, X., Ren, S., & Sun, J. (2015a). Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. CoRR, Vol. 1502.01852. http://arxiv.org/abs/1502.01852
He, K., Zhang, X., Ren, S., & Sun, J. (2015b). Deep residual learning for image recognition. CoRR, Vol. 1512.03385. https://arxiv.org/abs/1512.03385
Hinton, G. E., Srivastava, N., & Swersky, K. (2012). Lecture 6a: Overview of mini-batch gradient descent. In Neural Networks for Machine Learning, University of Toronto. https://www.coursera.org/learn/neural-networks
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.
Hsu, Y.-C., & Kira, Z. (2015). Neural network-based clustering using pairwise constraints. CoRR, Vol. 1511.06321. https://arxiv.org/abs/1511.06321
Hubel, D. H., & Wiesel, T. N. (1959). Receptive fields of single neurones in the cat’s striate cortex. Journal of Physiology, 148, 574–591.
Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning (Vol. 37, pp. 448–456). https://arxiv.org/abs/1502.03167
Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. CoRR, Vol. 1412.6980. http://arxiv.org/abs/1412.6980
Kingma, D. P., & Welling, M. (2013). Auto-encoding variational Bayes. CoRR, Vol. 1312.6114. https://arxiv.org/abs/1312.6114
Kingma, D. P., Mohamed, S., Rezende, D. J., & Welling, M. (2014). Semi-supervised learning with deep generative models. In Advances in Neural Information Processing Systems (pp. 3581–3589). https://papers.nips.cc/paper/5352-semi-supervised-learning-with-deep-generative-models
Kotti, M., Moschou, V., & Kotropoulos, C. (2008). Speaker segmentation and clustering. Signal Processing, 88(5), 1091–1124.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25, 1097–1105.
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998a). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.
LeCun, Y., Bottou, L., Orr, G. B., & Mueller, K.-R. (1998b). Efficient BackProp. In G. B. Orr, & K.-R. Mueller (Eds.), Neural networks: Tricks of the trade, Lecture Notes in Computer Science (Vol. 1524, pp. 9–50).
LeCun, Y., Bengio, Y., & Hinton, G. E. (2015). Deep learning. Nature, 521, 436–444.
Lee, J., Qiu, H., Yu, G., & Lin, J. (2007). Bearing data set. IMS, University of Cincinnati, NASA Ames Prognostics Data Repository, Rexnord Technical Services. https://ti.arc.nasa.gov/tech/dash/pcoe/prognostic-data-repository/
Ljosa, V., Sokolnicki, K. L., & Carpenter, A. E. (2012). Annotated high-throughput microscopy image sets for validation. Nature Methods, 9(7), 637.
Long, J., Shelhamer, E., & Darrell, T. (2014). Fully convolutional networks for semantic segmentation. https://people.eecs.berkeley.edu/~jonlong/long_shelhamer_fcn.pdf
Lukic, Y. X., Vogt, C., Dürr, O., & Stadelmann, T. (2016). Speaker identification and clustering using convolutional neural networks. In Proceedings of IEEE MLSP 2016.
Lukic, Y. X., Vogt, C., Dürr, O., & Stadelmann, T. (2017). Learning embeddings for speaker clustering based on voice quality. In Proceedings of IEEE MLSP 2017.
MacQueen, J. B. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability (Vol. 1, pp. 281–297). Berkeley: University of California Press.
Meier, B., Stadelmann, T., Stampfli, J., Arnold, M., & Cieliebak, M. (2017). Fully convolutional neural networks for newspaper article segmentation. In Proceedings of ICDAR 2017.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems (pp. 3111–3119). https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf
Mitchell, T. M. (1980). The need for biases in learning generalizations. Technical Report, Rutgers University, New Brunswick, NJ. http://www.cs.nott.ac.uk/~pszbsl/G52HPA/articles/Mitchell:80a.pdf
Moravcik, M., Schmid, M., Burch, N., Lisy, V., Morrill, D., Bard, N., Davis, T., Waugh, K., Johanson, M., & Bowling, M. H. (2017). DeepStack: Expert-level artificial intelligence in no-limit poker. CoRR, Vol. 1701.01724. http://arxiv.org/abs/1701.01724
Mori, S., Nishida, H., & Yamada, H. (1999). Optical character recognition. New York, NY: Wiley. ISBN 0471308196.
Ng, A. (2016). Nuts and bolts of building AI applications using deep learning. NIPS Tutorial.
Ng, A. (2019, in press). Machine learning yearning. http://www.mlyearning.org/
Nielsen, M. A. (2015). Neural networks and deep learning. Determination Press. http://neuralnetworksanddeeplearning.com.
Nielsen, F. A. (2017). Status on human vs. machines, post on “Finn Årup Nielsen’s blog”. https://finnaarupnielsen.wordpress.com/2015/03/15/status-on-human-vs-machines/
Pan, S. J., & Yang, Q. (2010). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10), 1345–1359. http://ieeexplore.ieee.org/abstract/document/5288526/.
Pimentel, M. A. F., Clifton, D. A., Clifton, L., & Tarassenko, L. (2014). A review of novelty detection. Signal Processing, 99, 215–249.
Randall, R. B., & Antoni, J. (2011). Rolling element bearing diagnostics—A tutorial. Mechanical Systems and Signal Processing, 25(2), 485–520.
Razavian, A. S., Azizpour, H., Sullivan, J., & Carlsson, S. (2014). CNN features off-the-shelf: An astounding baseline for recognition. CVPR 2014 (pp. 806–813). https://arxiv.org/abs/1403.6382
Reynolds, D. A., & Rose, R. C. (1995). Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Transactions on Speech and Audio Processing, 3(1), 72–83.
Reynolds, D. A., Quatieri, T. F., & Dunn, R. B. (2000). Speaker verification using adapted Gaussian mixture models. Digital Signal Processing, 10(1), 19–41.
Romanov, A., & Rumshisky, A. (2017). Forced to learn: Discovering disentangled representations without exhaustive labels. ICLR 2017. https://openreview.net/pdf?id=SkCmfeSFg
Rosenblatt, F. (1957). The perceptron – A perceiving and recognizing automaton. Technical report 85-460-1, Cornell Aeronautical Laboratory.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1988). Learning representations by back-propagating errors. In Neurocomputing: Foundations of Research (pp. 696–699). MIT Press. http://www.cs.toronto.edu/~hinton/absps/naturebp.pdf
Schmidhuber, J. (2014). Deep learning in neural networks: An overview. https://arxiv.org/abs/1404.7828
Schölkopf, B., & Smola, A. J. (2002). Learning with kernels: Support vector machines, regularization, optimization, and beyond. Cambridge, MA: MIT Press.
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T., & Hassabis, D. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529, 484–489.
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. CoRR, vol. 1409.1556. https://arxiv.org/abs/1409.1556
Srivastava, N., Hinton, G. E., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1), 1929–1958.
Stadelmann, T., & Freisleben, B. (2009). Unfolding speaker clustering potential: A biomimetic approach. In Proceedings of the 17th ACM International Conference on Multimedia (pp. 185–194). ACM.
Stadelmann, T., Musy, T., Dürr, O., & Eyyi, G. (2016). Machine learning-style experimental evaluation of classic condition monitoring approaches on CWRU data. Technical report, ZHAW Datalab (unpublished).
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., & Rabinovich, A. (2014). Going deeper with convolutions. CoRR, Vol. 1409.4842. https://arxiv.org/abs/1409.4842
Szeliski, R. (2010). Computer vision: Algorithms and applications. Texts in Computer Science. New York: Springer. http://szeliski.org/Book/.
van der Maaten, L., & Hinton, G. E. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9(Nov), 2579–2605.
Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., & Manzagol, P.-A. (2010). Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research, 11, 3371–3408.
Weyand, T., Kostrikov, I., & Philbin, J. (2016). PlaNet – Photo geolocation with convolutional neural networks. CoRR, Vol. 1602.05314. http://arxiv.org/abs/1602.05314
Xiong, W., Droppo, J., Huang, X., Seide, F., Seltzer, M., Stolcke, A., Yu, D., & Zweig, G. (2016). Achieving human parity in conversational speech recognition. CoRR, Vol. 1610.05256. http://arxiv.org/abs/1610.05256
Zagoruyko, S., & Komodakis, N. (2016). Wide residual networks. In Wilson, R. C., Hancock, E. R., & Smith, W. A. P. (Eds.), Proceedings of the British Machine Vision Conference (BMVC) (pp. 87.1–87.12). BMVA Press.
Zeiler, M. D. (2012). ADADELTA: An adaptive learning rate method. CoRR, Vol. 1212.5701. http://arxiv.org/abs/1212.5701
Zheng, F., Zhang, G., & Song, Z. (2001). Comparison of different implementations of MFCC. Journal of Computer Science and Technology, 16(6), 582–589.
Acknowledgments
The authors are grateful for the support by CTI grants 17719.1 PFES-ES, 17729.1 PFES-ES, and 19139.1 PFES-ES.
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Stadelmann, T., Tolkachev, V., Sick, B., Stampfli, J., Dürr, O. (2019). Beyond ImageNet: Deep Learning in Industrial Practice. In: Braschler, M., Stadelmann, T., Stockinger, K. (eds) Applied Data Science. Springer, Cham. https://doi.org/10.1007/978-3-030-11821-1_12
DOI: https://doi.org/10.1007/978-3-030-11821-1_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-11820-4
Online ISBN: 978-3-030-11821-1
eBook Packages: Computer Science, Computer Science (R0)