Beyond ImageNet: Deep Learning in Industrial Practice

  • Chapter in: Applied Data Science

Abstract

Deep learning (DL) methods have gained considerable attention since 2014. In this chapter we briefly review the state of the art in DL and then give several examples of applications from diverse areas. We focus on convolutional neural networks (CNNs), which since the seminal work of Krizhevsky et al. (ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems 25, pp. 1097–1105, 2012) have revolutionized image classification and even begun to surpass human performance on some benchmark data sets (Ciresan et al., Multi-column deep neural network for traffic sign classification, 2012a; He et al., Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. CoRR, Vol. 1502.01852, 2015a). While deep neural networks have become popular primarily for image classification, they can also be applied successfully to other areas and problems with some local structure in the data. We first present a classical application of CNNs to image-like data, in particular phenotype classification of cells based on their morphology, and then extend the task to clustering voices based on their spectrograms. Next, we describe DL applications to semantic segmentation of newspaper pages into their corresponding articles based on clues in the pixels, and to outlier detection in a predictive maintenance setting. We conclude with advice on how to work with DL when resources (e.g., training data) are limited.
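The "local structure" the abstract refers to can be made concrete with the core CNN operation. The following is an illustrative sketch (not code from the chapter): a 2-D convolution followed by ReLU and 2×2 max pooling, in plain Python. Real applications would use a DL framework; this only shows that each output value depends on a small local patch of the input, which is why the same machinery works for images, cell micrographs, and spectrograms alike.

```python
def conv2d(image, kernel):
    """Valid (no padding) 2-D convolution of a matrix with a small kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Each output value depends only on a local kh x kw patch;
            # this locality plus weight sharing is what CNNs exploit.
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
            row.append(s)
        out.append(row)
    return out

def relu(feature_map):
    """Elementwise rectified linear unit."""
    return [[max(0.0, v) for v in row] for row in feature_map]

def max_pool2x2(feature_map):
    """Downsample by taking the maximum over non-overlapping 2x2 blocks."""
    return [[max(feature_map[i][j], feature_map[i][j + 1],
                 feature_map[i + 1][j], feature_map[i + 1][j + 1])
             for j in range(0, len(feature_map[0]) - 1, 2)]
            for i in range(0, len(feature_map) - 1, 2)]

# Toy example: a hand-crafted vertical-edge kernel applied to a tiny
# image whose left half is dark (0) and right half is bright (1).
image = [[0, 0, 1, 1]] * 4
kernel = [[-1, 1], [-1, 1]]  # responds where brightness increases left-to-right
feature = max_pool2x2(relu(conv2d(image, kernel)))
```

In a trained CNN the kernel weights are learned from data rather than hand-crafted, and many such convolution/nonlinearity/pooling stages are stacked to build up increasingly abstract features.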


References

  • Aucouturier, J.-J., Defreville, B., & Pachet, F. (2007). The bag-of-frames approach to audio pattern recognition: A sufficient model for urban soundscapes but not for polyphonic music. The Journal of the Acoustical Society of America, 122(2), 881–891.

  • Beigi, H. (2011). Fundamentals of speaker recognition. Springer Science & Business Media.

  • Bersimis, S., Psarakis, S., & Panaretos, J. (2007). Multivariate statistical process control charts: An overview. Quality and Reliability Engineering International, 23, 517–543.

  • Bishop, C. M. (2006). Pattern recognition and machine learning. Springer.

  • Bouma, G. (2009). Normalized (pointwise) mutual information in collocation extraction. In: From form to meaning: Processing texts automatically, Proceedings of the Biennial GSCL Conference 2009 (pp. 31–40). https://svn.spraakdata.gu.se/repos/gerlof/pub/www/Docs/npmi-pfd.pdf

  • Chung, J. S., Senior, A. W., Vinyals, O., & Zisserman, A. (2016). Lip reading sentences in the wild. CoRR, Vol. 1611.05358. http://arxiv.org/abs/1611.05358

  • Ciresan, D., Meier, U., Masci, J., & Schmidhuber, J. (2012a). Multi-column deep neural network for traffic sign classification. http://people.idsia.ch/~juergen/nn2012traffic.pdf

  • Ciresan, D., Giusti, A., Gambardella, L. M., & Schmidhuber, J. (2012b). Deep neural networks segment neuronal membranes in electron microscopy images. Advances in Neural Information Processing Systems, 25, 2843–2851.

  • Dehak, N., Kenny, P., Dehak, R., Dumouchel, P., & Ouellet, P. (2011). Front-end factor analysis for speaker verification. IEEE Transactions on Audio, Speech, and Language Processing, 19(4), 788–798.

  • Dieleman, S., & Schrauwen, B. (2014). End-to-end learning for music audio. In Proceedings of ICASSP (pp. 6964–6968).

  • Domingos, P. (2012). A few useful things to know about machine learning. Communications of the ACM, 55(10), 78–87.

  • Dürr, O., Duval, F., Nichols, A., Lang, P., Brodte, A., Heyse, S., & Besson, D. (2007). Robust hit identification by quality assurance and multivariate data analysis of a high-content, cell-based assay. Journal of Biomolecular Screening, 12(8), 1042–1049.

  • Ester, M., Kriegel, H.-P., Sander, J., & Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96) (pp. 226–231) AAAI Press.

  • Fernández-Francos, D., Martínez-Rego, D., Fontenla-Romero, O., & Alonso-Betanzos, A. (2013). Automatic bearing fault diagnosis based on one-class ν-SVM. Computers & Industrial Engineering, 64(1), 357–365.

  • Fukushima, K. (1980). Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36(4), 193–202.

  • Ganchev, T., Fakotakis, N., & Kokkinakis, G. (2005). Comparative evaluation of various MFCC implementations on the speaker verification task. In Proceedings of SPECOM 2005 (Vol. 1, pp. 191–194).

  • Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press. http://www.deeplearningbook.org.

  • Gustafsdottir, S. M., Ljosa, V., Sokolnicki, K. L., Wilson, J. A., Walpita, D., Kemp, M. M., Petri Seiler, K., Carrel, H. A., Golub, T. R., Schreiber, S. L., Clemons, P. A., Carpenter, A. E., & Shamji, A. F. (2013). Multiplex cytological profiling assay to measure diverse cellular states. PLoS One, 8(12), e80999.

  • Han, S., Mao, H., & Dally, W. J. (2015). Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. CoRR, Vol. 1510.00149. https://arxiv.org/abs/1510.00149

  • He, K., Zhang, X., Ren, S., & Sun, J. (2015a). Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. CoRR, Vol. 1502.01852. http://arxiv.org/abs/1502.01852

  • He, K., Zhang, X., Ren, S., & Sun, J. (2015b). Deep residual learning for image recognition. CoRR, Vol. 1512.03385. https://arxiv.org/abs/1512.03385

  • Hinton, G. E., Srivastava, N., & Swersky, K. (2012). Lecture 6a: Overview of mini-batch gradient descent. In Neural Networks for Machine Learning, University of Toronto. https://www.coursera.org/learn/neural-networks

  • Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.

  • Hsu, Y.-C., & Kira, Z. (2015). Neural network-based clustering using pairwise constraints. CoRR, Vol. 1511.06321. https://arxiv.org/abs/1511.06321

  • Hubel, D. H., & Wiesel, T. N. (1959). Receptive fields of single neurons in the cat’s striate cortex. Journal of Physiology, 148, 574–591.

  • Ioffe, S., & Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning (Vol. 37, pp. 448–456). https://arxiv.org/pdf/1502.03167.

  • Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. CoRR, Vol. 1412.6980. http://arxiv.org/abs/1412.6980

  • Kingma, D. P., & Welling, M. (2013). Auto-encoding variational Bayes. CoRR, Vol. 1312.6114. https://arxiv.org/abs/1312.6114

  • Kingma, D. P., Mohamed, S., Rezende, D. J., & Welling, M. (2014). Semi-supervised learning with deep generative models. In Advances in Neural Information Processing Systems (pp. 3581–3589). https://papers.nips.cc/paper/5352-semi-supervised-learning-with-deep-generative-models

  • Kotti, M., Moschou, V., & Kotropoulos, C. (2008). Speaker segmentation and clustering. Signal Processing, 88(5), 1091–1124.

  • Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25, 1097–1105.

  • LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998a). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.

  • LeCun, Y., Bottou, L., Orr, G. B., & Mueller, K.-R. (1998b). Efficient BackProp. In G. B. Orr, & K.-R. Mueller (Eds.), Neural networks: Tricks of the trade, Lecture Notes in Computer Science (Vol. 1524, pp. 9–50).

  • LeCun, Y., Bengio, Y., & Hinton, G. E. (2015). Deep learning. Nature, 521, 436–444.

  • Lee, J., Qiu, H., Yu, G., & Lin, J. (2007). Bearing data set. IMS, University of Cincinnati, NASA Ames Prognostics Data Repository, Rexnord Technical Services. https://ti.arc.nasa.gov/tech/dash/pcoe/prognostic-data-repository/

  • Ljosa, V., Sokolnicki, K. L., & Carpenter, A. E. (2009). Annotated high-throughput microscopy image sets for validation. Nature Methods, 9, 637.

  • Long, J., Shelhamer, E., & Darrell, T. (2014). Fully convolutional networks for semantic segmentation. https://people.eecs.berkeley.edu/~jonlong/long_shelhamer_fcn.pdf

  • Lukic, Y. X., Vogt, C., Dürr, O., & Stadelmann, T. (2016). Speaker identification and clustering using convolutional neural networks. In Proceedings of IEEE MLSP 2016.

  • Lukic, Y. X., Vogt, C., Dürr, O., & Stadelmann, T. (2017). Learning embeddings for speaker clustering based on voice quality. In Proceedings of IEEE MLSP 2017.

  • MacQueen, J. B. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability (Vol. 1, pp. 281–297). Berkeley: University of California Press.

  • Meier, B., Stadelmann, T., Stampfli, J., Arnold, M., & Cieliebak, M. (2017). Fully convolutional neural networks for newspaper article segmentation. In Proceedings of ICDAR 2017.

  • Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems (pp. 3111–3119). https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf

  • Mitchell, T. M. (1980). The need for biases in learning generalizations. Technical Report, Rutgers University, New Brunswick, NJ. http://www.cs.nott.ac.uk/~pszbsl/G52HPA/articles/Mitchell:80a.pdf

  • Moravcik, M., Schmid, M., Burch, N., Lisy, V., Morrill, D., Bard, N., Davis, T., Waugh, K., Johanson, M., & Bowling, M. H. (2017). DeepStack: Expert-level artificial intelligence in no-limit poker, CoRR, Vol. 1701.01724. http://arxiv.org/abs/1701.01724

  • Mori, S., Nishida, H., & Yamada, H. (1999). Optical character recognition. New York, NY: Wiley. ISBN 0471308196.

  • Ng, A. (2016). Nuts and bolts of building AI applications using deep learning. NIPS Tutorial.

  • Ng, A. (2019, in press). Machine learning yearning. http://www.mlyearning.org/

  • Nielsen, M. A. (2015). Neural networks and deep learning. Determination Press. http://neuralnetworksanddeeplearning.com.

  • Nielsen, F. A. (2017). Status on human vs. machines, post on “Finn Årup Nielsen’s blog”. https://finnaarupnielsen.wordpress.com/2015/03/15/status-on-human-vs-machines/

  • Pan, S. J., & Yang, Q. (2010). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10), 1345–1359. http://ieeexplore.ieee.org/abstract/document/5288526/.

  • Pimentel, M. A. F., Clifton, D. A., Clifton, L., & Tarassenko, L. (2014). A review of novelty detection. Signal Processing, 99, 215–249.

  • Randall, R. B., & Antoni, J. (2011). Rolling element bearing diagnostics—A tutorial. Mechanical Systems and Signal Processing, 25(2), 485–520.

  • Razavian, A. S., Azizpour, H., Sullivan, J., & Carlsson, S. (2014). CNN features off-the-shelf: An astounding baseline for recognition. CVPR 2014 (pp. 806–813). https://arxiv.org/abs/1403.6382

  • Reynolds, D. A., & Rose, R. C. (1995). Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Transactions on Speech and Audio Processing, 3(1), 72–83.

  • Reynolds, D. A., Quatieri, T. F., & Dunn, R. B. (2000). Speaker verification using adapted Gaussian mixture models. Digital Signal Processing, 10(1), 19–41.

  • Romanov, A., & Rumshisky, A. (2017). Forced to learn: Discovering disentangled representations without exhaustive labels. ICLR 2017. https://openreview.net/pdf?id=SkCmfeSFg

  • Rosenblatt, F. (1957). The perceptron – A perceiving and recognizing automaton. Technical report 85-460-1, Cornell Aeronautical Laboratory.

  • Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1988). Learning representations by back-propagating errors. In Neurocomputing: Foundations of Research (pp. 696–699). MIT Press. http://www.cs.toronto.edu/~hinton/absps/naturebp.pdf

  • Schmidhuber, J. (2014). Deep learning in neural networks: An overview. https://arxiv.org/abs/1404.7828

  • Schölkopf, B., & Smola, A. J. (2002). Learning with kernels: Support vector machines, regularization, optimization, and beyond. Cambridge, MA: MIT Press.

  • Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T., & Hassabis, D. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529, 484–489.

  • Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. CoRR, Vol. 1409.1556. https://arxiv.org/abs/1409.1556

  • Srivastava, N., Hinton, G. E., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1), 1929–1958.

  • Stadelmann, T., & Freisleben, B. (2009). Unfolding speaker clustering potential: A biomimetic approach. In Proceedings of the 17th ACM International Conference on Multimedia (pp. 185–194). ACM.

  • Stadelmann, T., Musy, T., Duerr, O., & Eyyi, G. (2016). Machine learning-style experimental evaluation of classic condition monitoring approaches on CWRU data. Technical report, ZHAW Datalab (unpublished).

  • Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., & Rabinovich, A. (2014). Going deeper with convolutions. CoRR, Vol. 1409.4842. https://arxiv.org/abs/1409.4842

  • Szeliski, R. (2010). Computer vision: Algorithms and applications. Texts in Computer Science. New York: Springer. http://szeliski.org/Book/.

  • van der Maaten, L., & Hinton, G. E. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9(Nov), 2579–2605.

  • Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., & Manzagol, P.-A. (2010). Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research, 11, 3371–3408.

  • Weyand, T., Kostrikov I., & Philbin, J. (2016). PlaNet – Photo geolocation with convolutional neural networks. CoRR, Vol. 1602.05314. http://arxiv.org/abs/1602.05314

  • Xiong, W., Droppo, J., Huang, X., Seide, F., Seltzer, M., Stolcke, A., Yu, D., & Zweig, G. (2016). Achieving human parity in conversational speech recognition. CoRR, Vol. 1610.05256. http://arxiv.org/abs/1610.05256

  • Zagoruyko, S., & Komodakis, N. (2016). Wide residual networks. In Wilson, R. C., Hancock, E. R., & Smith, W. A. P. (Eds.), Proceedings of the British Machine Vision Conference (BMVC) (pp. 87.1–87.12). BMVA Press.

  • Zeiler, M. D. (2012). ADADELTA: An adaptive learning rate method. CoRR, Vol. 1212.5701. http://arxiv.org/abs/1212.5701

  • Zheng, F., Zhang, G., & Song, Z. (2001). Comparison of different implementations of MFCC. Journal of Computer Science and Technology, 16(6), 582–589.

Acknowledgments

The authors are grateful for the support by CTI grants 17719.1 PFES-ES, 17729.1 PFES-ES, and 19139.1 PFES-ES.

Author information

Correspondence to Thilo Stadelmann.

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Stadelmann, T., Tolkachev, V., Sick, B., Stampfli, J., Dürr, O. (2019). Beyond ImageNet: Deep Learning in Industrial Practice. In: Braschler, M., Stadelmann, T., Stockinger, K. (eds) Applied Data Science. Springer, Cham. https://doi.org/10.1007/978-3-030-11821-1_12

  • DOI: https://doi.org/10.1007/978-3-030-11821-1_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-11820-4

  • Online ISBN: 978-3-030-11821-1

  • eBook Packages: Computer Science, Computer Science (R0)
