Deep Architectures in Visual Transfer Learning

Gomaa, Walid

doi:10.1007/978-3-030-75490-7_1

Walid Gomaa^5,6

Part of the book series: Intelligent Systems Reference Library ((ISRL,volume 207))

821 Accesses

Abstract

Deep learning is a recent form of machine learning that depends on structural compositional models that can represent mapping functions that otherwise require an exponentially larger size flat models. The most successful realization of such learning paradigm is deep neural networks that have in recent years achieved state-of-the-art performance in tasks related, in particular, to visual data, audio data, and natural language processing. The main drawback of such methods is that, albeit their superhuman performance, that performance is achieved solely on very specific individual tasks. Intelligence must however, be broader and more general. So the next big research program is to build machines that are smart in a more general sense over multiple tasks and domains. One way to achieve that is through transfer learning. Transfer learning refers to a broad set of techniques, all aimed towards the reuse of knowledge gained from solving some problem towards the solution of some other problem. In this paper we study the effectiveness of known deep architectures in transfer learning in visual tasks. We consider two of the VGG family, namely, VGG16 and VGG19, the Xception architecture, DenseNet121, and finally the ResNet50 architecture. They are already pretrained on the ImageNet dataset, we tune and test their transfer performance on the Caltech-256 image set. Four of these architectures have shown good training/validation performance on the latter dataset. In addition, DenseNet121 and Xception have shown exceptional superior performance over the VGG variants and ResNet50, though thay are an order of magnitude less in size. However, they are an order of magnitude deeper confirming the lasting conjecture that deep compositional architectures are exponentially more representative and expressive than flatter networks.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 149.00; Price excludes VAT (USA)

Softcover Book: USD 199.99; Price excludes VAT (USA)

Hardcover Book: USD 199.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Abdu-Aguye, M.G., Gomaa, W.: Novel approaches to activity recognition based on vector autoregression and wavelet transforms. In: 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 951–954. IEEE (2018)
Google Scholar
Abdu-Aguye, M.G., Gomaa, W.: Competitive feature extraction for activity recognition based on wavelet transforms and adaptive pooling. In: 2019 International Joint Conference on Neural Networks (IJCNN), pp. 1–8 (2019). https://doi.org/10.1109/IJCNN.2019.8852299
Abdu-Aguye, M.G., Gomaa, W.: Robust human activity recognition based on deep metric learning. In: Proceedings of the 16th International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO, pp. 656–663. INSTICC, SciTePress (2019). https://doi.org/10.5220/0007916806560663
Abdu-Aguye, M.G., Gomaa, W.: Versatl: versatile transfer learning for IMU-based activity recognition using convolutional neural networks. In: Proceedings of the 16th International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO, pp. 507–516. INSTICC, SciTePress (2019). https://doi.org/10.5220/0007916705070516
Abdu-Aguye, M.G., Gomaa, W., Makihara, Y., Yagi, Y.: On the feasibility of on-body roaming models in human activity recognition. In: Proceedings of the 16th International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO, pp. 680–690. INSTICC, SciTePress (2019). https://doi.org/10.5220/0007921606800690
Adel, O., Nafea, Y., Hesham, A., Goma, W.: Gait-based person identification using multiple inertial sensors. In: Proceedings of the 17th International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO, pp. 621–628. INSTICC, SciTePress (2020). https://doi.org/10.5220/0009791506210628
Agarwal, S., Roth, D.: Learning a sparse representation for object detection. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) Computer Vision - ECCV 2002, pp. 113–127. Springer, Heidelberg (2002)
Google Scholar
Ashry, S., Elbasiony, R., Gomaa, W.: An LSTM-based descriptor for human activities recognition using IMU sensors. In: Proceedings of the 15th International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO, pp. 494–501. INSTICC, SciTePress (2018). https://doi.org/10.5220/0006902404940501
Ashry, S., Gomaa, W.: Descriptors for human activity recognition. In: 2019 7th International Japan-Africa Conference on Electronics, Communications, and Computations, (JAC-ECC), pp. 116–119 (2019)
Google Scholar
Ashry, S., Gomaa, W., Abdu-Aguye, M.G., El-borae, N.: Improved IMU-based human activity recognition using hierarchical hmm dissimilarity. In: Proceedings of the 17th International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO, pp. 702–709. INSTICC, SciTePress (2020). https://doi.org/10.5220/0009886607020709
Ashry, S., Ogawa, T., Gomaa, W.: Charm-deep: continuous human activity recognition model based on deep neural network using IMU sensors of smartwatch. IEEE Sensors J. 20(15), 8757–8770 (2020)
Google Scholar
Baños, O., Damas, M., Pomares, H., Rojas, I., Tóth, M.A., Amft, O.: A benchmark dataset to evaluate sensor displacement in activity recognition. In: Proceedings of the 2012 ACM Conference on Ubiquitous Computing, pp. 1026–1035. ACM (2012)
Google Scholar
Baykal, E., Dogan, H., Ercin, M.E., Ersoz, S., Ekinci, M.: Transfer learning with pre-trained deep convolutional neural networks for serous cell classification. Multimed. Tools Appl. 79(21), 15593–15611 (2020). https://doi.org/10.1007/s11042-019-07821-9
Bengio, Y.: Deep learning of representations for unsupervised and transfer learning. In: Proceedings of ICML Workshop on Unsupervised and Transfer Learning, pp. 17–36 (2012)
Google Scholar
Chikhaoui, B., Gouineau, F., Sotir, M.: A CNN based transfer learning model for automatic activity recognition from accelerometer sensors. In: International Conference on Machine Learning and Data Mining in Pattern Recognition, pp. 302–315. Springer (2018)
Google Scholar
Chollet, F.: Xception: deep learning with depthwise separable convolutions. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1800–1807 (2017)
Google Scholar
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: CVPR09 (2009)
Google Scholar
Elkholy, A., Hussein, M., Gomaa, W., Damen, D., Saba, E.: Efficient and robust skeleton-based quality assessment and abnormality detection in human action performance. IEEE J. Biomed. Health Inform. 24(1), 280–291 (2019)
Google Scholar
Elkholy, A., Hussein, M.E., Gomaa, W., Damen, D., Saba, E.: A general descriptor for detecting abnormal action performance from skeletal data. In: Proceedings of the 39th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC 17). JeJu Island, S. Korea (2017)
Google Scholar
Elkholy, A., Makihara, Y., Gomaa, W., Ahad, M.A.R., Yagi, Y.: Unsupervised gei-based gait disorders detection from different views. In: 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 5423–5426 (2019). https://doi.org/10.1109/EMBC.2019.8856294
Olivas, E.S., Guerrero, J.D.M., Martinez-Sober, M., Magdalena-Benedito, J.R., Serrano, L.: Handbook of Research on Machine Learning Applications and Trends - Algorithms, Methods, and Techniques. Information Science Reference, Hershey, PA (2009)
Google Scholar
Fergus, R., Perona, P., Zisserman, A.: Object class recognition by unsupervised scale-invariant learning. In: 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings, vol. 2, pp.II (2003)
Google Scholar
Griffin, G., Holub, A., Perona, P.: Caltech-256 object category dataset. Technical Report 7694, Caltech (2007)
Google Scholar
Gomaa, W.: Probabilistic approach to human activity recognition from accelerometer data. In: 2019 7th International Japan-Africa Conference on Electronics, Communications, and Computations, (JAC-ECC), pp. 63–66 (2019)
Google Scholar
Gomaa, W.: Statistical and time series analysis of accelerometer signals for human activity recognition. In: 2019 14th International Conference on Computer Engineering and Systems (ICCES), pp. 351–356 (2019)
Google Scholar
Gomaa, W.: Statistical metric-theoretic approach to activity recognition based on accelerometer data. In: Hassanien, A.E., Shaalan, K., Tolba, M.F. (eds.) Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2019, pp. 537–546. Springer International Publishing, Cham (2020)
Google Scholar
Gomaa, W., Elbasiony, R., Ashry, S.: ADL classification based on autocorrelation function of inertial signals. In: 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 833–837 (2017). https://doi.org/10.1109/ICMLA.2017.00-53
Gopalakrishnan, K., Khaitan, S.K., Choudhary, A., Agrawal, A.: Deep convolutional neural networks with transfer learning for computer vision-based data-driven pavement distress detection. Construct. Build. Mater. 157, 322–330 (2017). https://doi.org/10.1016/j.conbuildmat.2017.09.110, http://www.sciencedirect.com/science/article/pii/S0950061817319335
Harley, A.W.: An interactive node-link visualization of convolutional neural networks. In: International Symposium on Visual Computing, pp. 867–877. Springer (2015)
Google Scholar
Hasan, M.K., Aleef, T.A.: Automatic mass detection in breast using deep convolutional neural network and SVM classifier. CoRR abs/1907.04424 (2019). http://arxiv.org/abs/1907.04424
Hassanein, A., Hussein, M., Gomaa, W.: Semantic analysis of crowded scenes based on non-parametric tracklet clustering. In: Proceedings of the 25th International Joint Conference on Artificial Intelligence IJCAI-16. New York City, USA (2016)
Google Scholar
Hassanein, A.S., Hussein, M.E., Gomaa, W., Makihara, Y., Yagi, Y.: Identifying motion pathways in highly crowded scenes: Aa non-parametric tracklet clustering approach. Comput. Vision Image Understand. 191 (2020). https://doi.org/10.1016/j.cviu.2018.08.004, http://www.sciencedirect.com/science/article/pii/S1077314218301887
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
Google Scholar
Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)
Google Scholar
Hu, D.H., Zheng, V.W., Yang, Q.: Cross-domain activity recognition via transfer learning. Pervasive Mobile Comput. 7(3), 344–358 (2011). https://doi.org/10.1016/j.pmcj.2010.11.005, http://www.sciencedirect.com/science/article/pii/S1574119210001227. Knowledge-Driven Activity Recognition in Intelligent Environments
Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2261–2269 (2017)
Google Scholar
Goodfellow, I., Bengio, Y., Courville, A., Bengio, Y.: Deep Learning. Illustrated edn. Adaptive Computation and Machine Learning series. The MIT Press (2016)
Google Scholar
ImageNet: http://www.image-net.org
Jordan, M.I., Mitchell, T.M.: Machine learning: trends, perspectives, and prospects. Science 349(6245), 255–260 (2015)
Google Scholar
Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., Fei-Fei, L.: Large-scale video classification with convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1725–1732 (2014)
Google Scholar
Khan, M.A.A.H., Roy, N.: Transact: transfer learning enabled activity recognition. In: 2017 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops), pp. 545–550 (2017). https://doi.org/10.1109/PERCOMW.2017.7917621
Khan, M.A.A.H., Roy, N.: Untran: recognizing unseen activities with unlabeled data using transfer learning. In: 2018 IEEE/ACM Third International Conference on Internet-of-Things Design and Implementation (IoTDI), pp. 37–47 (2018). https://doi.org/10.1109/IoTDI.2018.00014
Li Fei-Fei, Fergus, R., Perona, P.: Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. In: 2004 Conference on Computer Vision and Pattern Recognition Workshop, pp. 178–178 (2004)
Google Scholar
Lu, G., Hao, Q., Kong, K., Yan, J., Li, H., Li, X.: Deep convolutional neural networks with transfer learning for neonatal pain expression recognition. In: 2018 14th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD), pp. 251–256 (2018). https://doi.org/10.1109/FSKD.2018.8687129
Masrour, T., El Hassani, I., Bouchama, M.S.: Deep convolutional neural networks with transfer learning for old buildings pathologies automatic detection. In: Ezziyyani, M. (ed.) Advanced Intelligent Systems for Sustainable Development (AI2SD’2019), pp. 204–216. Springer, Cham (2020)
Google Scholar
Mostafa., A., Barghash., T.O., Assaf., A.A., Gomaa., W.: Multi-sensor gait analysis for gender recognition. In: Proceedings of the 17th International Conference on Informatics in Control, Automation and Robotics - Volume 1: ICINCO, pp. 629–636. INSTICC, SciTePress (2020). https://doi.org/10.5220/0009792006290636
Moustafa, A., Hussein, M., Gomaa, W.: Gate and common pathway detection in crowd scenes using motion units and meta-tracking. In: Proceedings of the International Conference on Digital Image Computing: Techniques and Applications (DICTA 2017), Sydney, Australia (2017)
Google Scholar
Moustafa, A.N., Gomaa, W.: Gate and common pathway detection in crowd scenes and anomaly detection using motion units and LSTM predictive models. Multimed. Tools Appl. (2020). https://doi.org/10.1007/s11042-020-08840-7
Oquab, M., Bottou, L., Laptev, I., Sivic, J.: Learning and transferring mid-level image representations using convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1717–1724 (2014)
Google Scholar
Pan, S.J., Yang, Q., et al.: A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22(10), 1345–1359 (2010). https://doi.org/10.1109/TKDE.2009.191
Ponce, J., Berg, T.L., Everingham, M., Forsyth, D.A., Hebert, M., Lazebnik, S., Marszalek, M., Schmid, C., Russell, B.C., Torralba, A., Williams, C.K.I., Zhang, J., Zisserman, A.: Dataset Issues in Object Recognition, pp. 29–48. Springer, Heidelberg (2006)
Google Scholar
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: Bengio, Y., LeCun, Y. (eds.) 3rd International Conference on Learning Representations, ICLR 2015, Conference Track Proceedings, San Diego, CA, USA, 7–9 May 2015 (2015). URL http://arxiv.org/abs/1409.1556
Su, Y., Chiu, T., Yeh, C., Huang, H., Hsu, W.H.: Transfer learning for video recognition with scarce training data. CoRR abs/1409.4127 (2014). http://arxiv.org/abs/1409.4127
Thrun, S., Pratt, L.: Learning to Learn: Introduction and Overview, pp. 3–17. Springer, Boston (1998)
Google Scholar
Yosinski, J., Clune, J., Bengio, Y., Lipson, H.: How transferable are features in deep neural networks? In: Ghahramani, Z., Welling,M., Cortes, C., Lawrence, N., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 27, pp. 3320–3328. Curran Associates, Inc. (2014). https://proceedings.neurips.cc/paper/2014/file/375c71349b295fbe2dcdca9206f20a06-Paper.pdf
Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: European Conference on Computer Vision, pp. 818–833. Springer (2014)
Google Scholar

Download references

Acknowledgements

This work is Funded by the Science and Technology Development Fund STDF (Egypt); Project id: 42519 - “Automatic Video Surveillance System for Crowd Scenes”.

Author information

Authors and Affiliations

Cyber Physical Systems Lab., Egypt Japan University of Science and Technology, New Borg El-Arab City, Alexandria, Egypt
Walid Gomaa
Faculty of Engineering, Alexandria University, Alexandria, Egypt
Walid Gomaa

Authors

Walid Gomaa
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Walid Gomaa .

Editor information

Editors and Affiliations

Professor, Department of Electrical and Electronic Engineering, University of Dhaka, Dhaka, Bangladesh
Md Atiqur Rahman Ahad
Solutions Architect for Greenfield and Startups, Amazon Web Services, USA, Visiting Professor, Graduate School of Regional Innovation, Mie University, Tsu, Japan
Atsushi Inoue

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Gomaa, W. (2021). Deep Architectures in Visual Transfer Learning. In: Ahad, M.A.R., Inoue, A. (eds) Vision, Sensing and Analytics: Integrative Approaches. Intelligent Systems Reference Library, vol 207. Springer, Cham. https://doi.org/10.1007/978-3-030-75490-7_1

Download citation

DOI: https://doi.org/10.1007/978-3-030-75490-7_1
Published: 06 June 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-75489-1
Online ISBN: 978-3-030-75490-7
eBook Packages: Intelligent Technologies and RoboticsIntelligent Technologies and Robotics (R0)

Publish with us

Policies and ethics