Few-shot learning in realistic settings for text CAPTCHA recognition

Wang, Yao; Wei, Yuliang; Zhang, Yifan; Jin, Chuhao; Xin, Guodong; Wang, Bailing

doi:10.1007/s00521-023-08262-0

Original Article
Published: 14 February 2023

Volume 35, pages 10751–10764, (2023)
Neural Computing and Applications

Yao Wang^1,2,
Yuliang Wei^1,2,
Yifan Zhang^1,
Chuhao Jin^1,3,
Guodong Xin^1 (ORCID: orcid.org/0000-0001-6997-2447) &
Bailing Wang^2

Abstract

Text-based CAPTCHAs are widely used by commercial websites. Most existing CAPTCHA recognition methods rely on deep learning and large-scale labeled data. Recently, few-shot learning has proven effective in various visual classification tasks where data are insufficient. However, the performance of current few-shot learning methods deteriorates in realistic settings that involve class imbalance and cross-domain shift. In this paper, we propose a novel CAPTCHA solver based on prototypical networks and model-agnostic meta-learning. Two major improvements, multi-source domain data augmentation and an intra-class variance distance weighting method, are proposed to alleviate the performance degradation caused by cross-domain shift and class imbalance. Our approach achieves an average character accuracy of more than 90% in 5-shot and 10-shot tasks and an attack rate of 88% in one-shot tasks. The efficacy of this work may promote the application of few-shot learning in realistic settings.
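
For readers unfamiliar with prototypical networks, the following is a minimal sketch of the standard nearest-prototype classification rule from Snell et al. (2017), on which the solver is said to build. It is not the authors' implementation: the embeddings are hand-written toy vectors rather than the output of a trained network, the function names are hypothetical, and the intra-class variance distance weighting is not reproduced here because the abstract does not specify its form. The toy episode is deliberately imbalanced (three support samples for one class, one for the other) to mirror the setting the abstract describes.

```python
import math


def prototype(embeddings):
    """Return the mean of a class's support embeddings (its prototype)."""
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]


def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query, support):
    """Classify one query embedding against a labeled support set.

    `support` maps each label to a list of embedding vectors.
    Returns (predicted_label, class_probabilities).
    """
    protos = {label: prototype(vecs) for label, vecs in support.items()}
    # Logits are negative squared distances to each class prototype.
    logits = {label: -squared_euclidean(query, p) for label, p in protos.items()}
    # Numerically stable softmax over the logits.
    top = max(logits.values())
    exps = {label: math.exp(v - top) for label, v in logits.items()}
    total = sum(exps.values())
    probs = {label: v / total for label, v in exps.items()}
    return max(probs, key=probs.get), probs


# Toy 2-way episode with an imbalanced support set (assumed data).
support = {
    "A": [[0.9, 0.1], [1.1, -0.1], [1.0, 0.0]],
    "B": [[0.0, 1.0]],
}
label, probs = classify([0.8, 0.2], support)
print(label, probs)
```

In this sketch every class contributes one prototype regardless of how many support samples it has, which is one reason imbalance matters: a prototype averaged from a single sample is a noisier estimate than one averaged from many.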

Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgements

This work was supported by the National Key Research and Development Program of China (2021YFB2012400), the Fundamental Research Funds for the Central Universities (HIT.NSRIF.2020098), the Key Technology Research and Development Program of Shandong (2017CXGC0706), and the National Regional Innovation Center Science and Technology Special Project of China (2017QYCX14).

Author information

Authors and Affiliations

School of Computer Science and Technology, Harbin Institute of Technology, Weihai, 264209, China
Yao Wang, Yuliang Wei, Yifan Zhang, Chuhao Jin & Guodong Xin
Research Institute of Cyberspace Security, Harbin Institute of Technology, Harbin, 150001, China
Yao Wang, Yuliang Wei & Bailing Wang
Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, 100872, China
Chuhao Jin

Corresponding authors

Correspondence to Guodong Xin or Bailing Wang.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, Y., Wei, Y., Zhang, Y. et al. Few-shot learning in realistic settings for text CAPTCHA recognition. Neural Comput & Applic 35, 10751–10764 (2023). https://doi.org/10.1007/s00521-023-08262-0

Received: 08 May 2022
Accepted: 06 January 2023
Published: 14 February 2023
Issue Date: May 2023
DOI: https://doi.org/10.1007/s00521-023-08262-0
