
Automatic bi-modal question title generation for Stack Overflow with prompt learning

Published in: Empirical Software Engineering

Abstract

When drafting question posts for Stack Overflow, developers may not accurately summarize the core problem in the question title, which can prevent these questions from receiving timely help. Improving the quality of question titles has therefore attracted wide attention from researchers. An initial study aimed to automatically generate titles by analyzing only the code snippets in the question body, ignoring the helpful information in the corresponding problem descriptions. We therefore propose SOTitle+, an approach that considers bi-modal information (i.e., the code snippets and the problem descriptions) in the question body. We formalize title generation for different programming languages as separate but related tasks and use multi-task learning to solve them jointly, fine-tuning the pre-trained language model CodeT5 to generate the titles. However, the inconsistency in inputs and optimization objectives between the pre-training task and our investigated task may prevent fine-tuning from fully exploiting the knowledge of the pre-trained model. To address this issue, SOTitle+ further prompt-tunes CodeT5 with hybrid prompts (i.e., a mixture of hard and soft prompts). To verify the effectiveness of SOTitle+, we construct a large-scale, high-quality corpus from recent data dumps shared by Stack Overflow, comprising 179,119 high-quality question posts across six popular programming languages. Experimental results show that SOTitle+ significantly outperforms four state-of-the-art baselines in both automatic and human evaluation. Our ablation studies further confirm the effectiveness of SOTitle+'s component settings (such as bi-modal information, prompt learning, hybrid prompts, and multi-task learning). Our work indicates that considering bi-modal information and prompt learning for Stack Overflow title generation is a promising research direction.
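To make the hybrid-prompt idea concrete, the following is a minimal sketch (not the authors' exact implementation) of how a bi-modal input might be assembled for a seq2seq model such as CodeT5: the problem description and code snippet are concatenated behind a hard natural-language prompt, with reserved slots where trainable soft-prompt embeddings would be injected. The template wording, the `<soft>` marker, and the number of soft tokens are illustrative assumptions only.

```python
# Illustrative sketch of hybrid-prompt input construction.
# Assumptions (not from the paper): the "<soft>" marker, the number of
# soft tokens, and the hard-prompt wording are placeholders chosen for
# demonstration; a real implementation would map "<soft>" slots to
# trainable embeddings inside the model.

N_SOFT_TOKENS = 4  # assumed count of trainable soft-prompt embeddings


def build_hybrid_prompt(description: str, code: str, language: str) -> str:
    """Combine bi-modal input (description + code) with a hybrid prompt:
    soft-token slots followed by a hard natural-language instruction."""
    soft_slots = " ".join("<soft>" for _ in range(N_SOFT_TOKENS))
    hard_prompt = f"Generate a {language} question title for the following post."
    return f"{soft_slots} {hard_prompt} Description: {description} Code: {code}"


example = build_hybrid_prompt(
    description="Sorting a list of dicts by a key fails with TypeError.",
    code="sorted(items, key=lambda d: d['age'])",
    language="Python",
)
print(example)
```

In an actual prompt-tuning setup (e.g., with a framework like OpenPrompt), the soft slots would be replaced by continuous embeddings optimized during training while the hard prompt stays fixed as text.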



Data Availability Statement

The datasets generated and analyzed during the current study are available in the GitHub repository (https://github.com/shaoyuyoung/SOTitlePlus).

Notes

  1. https://stackoverflow.com/help/how-to-ask

  2. https://www.youtube.com/watch?v=_KgUISAT74M

  3. https://github.com/shaoyuyoung/SOTitlePlus

  4. https://zenodo.org/records/10656359

  5. https://archive.org/download/stackexchange, downloaded in March 2023

  6. https://stackoverflow.com/tags?tab=popular, accessed in March 2023

  7. https://stackoverflow.com/questions/51560850

  8. https://lxml.de/

  9. An example used to clarify this process can be found in https://github.com/shaoyuyoung/SOTitlePlus/blob/main/embeddings.md

  10. https://stackoverflow.com/questions/1478248

  11. https://github.com/Maluuba/nlg-eval

  12. https://pytorch.org/

  13. https://github.com/huggingface/transformers

  14. https://github.com/thunlp/OpenPrompt

  15. https://stackoverflow.com/questions/66287470

  16. https://github.com/shaoyuyoung/SOTitlePlus/blob/main/Appendix.md

  17. https://platform.openai.com/docs/api-reference. The version of ChatGPT we use is GPT-3.5-turbo.

  18. https://stackoverflow.com/questions/51523765

  19. https://stackoverflow.com/questions/51579215

  20. https://stackoverflow.com/questions/59391560

  21. https://console.cloud.google.com/marketplace/details/github/github-repos


Acknowledgements

The authors would like to thank the editors and the anonymous reviewers for their insightful comments and suggestions, which substantially improved the quality of this work. Shaoyu Yang and Xiang Chen contributed equally to this work and are co-first authors. Xiang Chen is the corresponding author. This work is supported in part by the National Natural Science Foundation of China (Grant nos. 61202006 and 61702041) and the Innovation Training Program for College Students (Grant nos. 2023214 and 2023356).

Author information

Corresponding author

Correspondence to Xiang Chen.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Communicated by: Sebastian Baltes.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Yang, S., Chen, X., Liu, K. et al. Automatic bi-modal question title generation for Stack Overflow with prompt learning. Empir Software Eng 29, 63 (2024). https://doi.org/10.1007/s10664-024-10466-4

