Abstract
Although self-supervised pre-training of transformer models has revolutionized natural language processing (NLP) applications and achieved state-of-the-art results on various benchmarks, these models remain vulnerable to small, imperceptible perturbations of legitimate inputs. Intuitively, representations should stay close in the feature space under subtle input perturbations and should differ substantially only when the meaning changes. This motivates us to learn robust textual representations in a contrastive manner. However, obtaining semantically opposing instances for textual samples is non-trivial. In this study, we propose a disentangled contrastive learning method that separately optimizes the uniformity and alignment of representations without negative sampling. Specifically, we introduce the notion of momentum representation consistency to align features, and we leverage power normalization to preserve uniformity. Our experimental results on NLP benchmarks demonstrate that our approach outperforms the baselines and yields promising improvements under invariance tests and adversarial attacks. The code is available at https://github.com/zxlzr/DCL.
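The abstract does not spell out the implementation, but the alignment term it describes (matching an online encoder to a momentum-updated target encoder, with no negative samples) can be illustrated with a minimal PyTorch sketch. Everything below (the toy `OnlineEncoder`, the EMA coefficient, the noise-based "views") is our own illustrative assumption, not the authors' released code.

```python
# Hedged sketch: momentum-consistency alignment without negative samples.
# The encoder, hyperparameters, and view construction are illustrative assumptions.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class OnlineEncoder(nn.Module):
    """Toy sentence encoder standing in for a pre-trained transformer."""
    def __init__(self, dim_in=768, dim_out=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim_in, 512), nn.ReLU(), nn.Linear(512, dim_out))

    def forward(self, x):
        return self.net(x)

def update_momentum_encoder(online, target, m=0.999):
    # Exponential moving average of the online weights keeps the target encoder consistent.
    with torch.no_grad():
        for p_t, p_o in zip(target.parameters(), online.parameters()):
            p_t.data.mul_(m).add_(p_o.data, alpha=1.0 - m)

def alignment_loss(online, target, view1, view2):
    # Align the online representation of one view with the momentum (target)
    # representation of the other view; no negative samples are involved.
    z1 = F.normalize(online(view1), dim=-1)
    with torch.no_grad():
        z2 = F.normalize(target(view2), dim=-1)
    return (2 - 2 * (z1 * z2).sum(dim=-1)).mean()

if __name__ == "__main__":
    online = OnlineEncoder()
    target = copy.deepcopy(online)
    for p in target.parameters():
        p.requires_grad_(False)
    # Two "views" of the same batch of sentence features (e.g., lightly perturbed encodings).
    x = torch.randn(8, 768)
    view1, view2 = x + 0.01 * torch.randn_like(x), x + 0.01 * torch.randn_like(x)
    loss = alignment_loss(online, target, view1, view2)
    loss.backward()
    update_momentum_encoder(online, target)
    print(f"alignment loss: {loss.item():.4f}")
```

In the paper's formulation the uniformity of the representation space is handled separately (via power normalization) rather than by negative pairs, which is why the loss above contains only an alignment term.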
X. Chen and X. Xie—Equal contribution and shared co-first authorship.
Acknowledgments
We would like to thank the anonymous reviewers for their hard work and kind comments. This work is funded by NSFC grants 91846204 and U19B2027.
Cite this paper
Chen, X., et al. (2021). Disentangled Contrastive Learning for Learning Robust Textual Representations. In: Fang, L., Chen, Y., Zhai, G., Wang, J., Wang, R., Dong, W. (eds.) Artificial Intelligence. CICAI 2021. Lecture Notes in Computer Science, vol. 13070. Springer, Cham. https://doi.org/10.1007/978-3-030-93049-3_18