Abstract
We present a new landmark detection problem: locating landmarks on the upper body of a clothed person for tailoring purposes. This problem is unknown in the literature. It lies in the same domain as the ‘fashion’ landmark detection problem but differs from it, because fashion landmarks serve to classify clothing. An existing ‘attentive fashion network’ (AFN) was trained using 800,000 annotated images of the DeepFashion dataset, with a base network of VGG16 pre-trained on the ImageNet dataset to provide initial weights. Training a network for ‘body’ landmark detection in the same way would require a similarly sized dataset. We propose a deep neural network for body landmark detection in which the knowledge from an existing network is transferred and the network is trained with an extremely small dataset of just 99 images annotated with body landmarks. A baseline model that used only the fashion landmark branch, retrained for body landmarks, produced a testing error of 0.068 (normalised mean distance between the predicted landmarks and the ground truth). The error was significantly reduced by adopting the fashion landmark branch and the attention unit of AFN but substituting the classification branch with a new body landmark detection branch, giving the proposed Attention-based Fashion-to-Body landmark Network (AFBN). We tested 6 variants of the proposed AFBN model with different convolutional block designs and auto-encoders for enforcing landmark relations. The trained models had low testing errors, ranging from 0.022 to 0.028 across these variants. The variant with an increased number of channels and inception units with residual connections had the best overall performance. Although AFBN and its variants were trained with a limited dataset, their performance exceeds that of the state-of-the-art attentive fashion network AFN (0.0534).
The principle of transfer learning demonstrated here is relevant where labelled domain data are scarce, because it offers a low-cost solution: faster training of a deep neural network with a very small dataset.
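The testing errors quoted above are normalised mean distances between predicted and ground-truth landmarks. The sketch below illustrates one common way to compute such a score. It assumes that each coordinate is divided by the image width or height before the Euclidean distance is taken; the abstract does not state the normalisation actually used, and the function name and sample values are hypothetical.

```python
import math


def normalised_mean_distance(pred, truth, width, height):
    """Mean Euclidean distance between landmark pairs in normalised coordinates.

    Assumption: x is divided by the image width and y by the image height.
    The paper's exact normalisation may differ.
    """
    if len(pred) != len(truth) or not pred:
        raise ValueError("pred and truth must be non-empty and of equal length")
    total = 0.0
    for (px, py), (tx, ty) in zip(pred, truth):
        dx = (px - tx) / width   # horizontal offset as a fraction of image width
        dy = (py - ty) / height  # vertical offset as a fraction of image height
        total += math.hypot(dx, dy)
    return total / len(pred)


# Hypothetical example: two landmarks on a 200 x 200 pixel image.
pred = [(112.0, 60.0), (50.0, 100.0)]
truth = [(100.0, 60.0), (50.0, 110.0)]
error = normalised_mean_distance(pred, truth, width=200, height=200)
print(f"{error:.3f}")
```

Under this assumed normalisation, a score of 0.022 would correspond to an average landmark offset of about 2.2% of the image size.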
Acknowledgements
The authors acknowledge the support of the Malaysian Ministry of Education through the grant awarded under the Fundamental Research Grant Scheme (No. FRGS/1/2014/ICT07/UNIM/02/1). The authors are grateful to Daniel Chua of Saratix Sdn Bhd (Malaysia) for providing the dataset of human body images with annotated body landmarks, without which this work could not have been done. The contributions of the authors are as follows. The first author, as the Principal Investigator, conceptualised the idea for the transfer learning model(s), implemented the 7 AFBN attention models in PyTorch and conducted the literature review. The second author implemented the baseline model and assisted the first author with the training and validation of the models. The third author led the drafting of the paper, assisted by the first author.
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Statement and Declarations
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. The data that support the findings of this study are available from Saratix Sdn. Bhd., but restrictions apply to the availability of these data, which were used under licence for the current study, and so are not publicly available. Data are however available from the authors upon reasonable request and with permission of Saratix Sdn. Bhd.
About this article
Cite this article
Liao, I.Y., Hermawan, E.S. & Zaman, M. Body landmark detection with an extremely small dataset using transfer learning. Pattern Anal Applic 26, 163–199 (2023). https://doi.org/10.1007/s10044-022-01098-9
DOI: https://doi.org/10.1007/s10044-022-01098-9