Empirical Software Engineering, Volume 22, Issue 6, pp 3149–3185

What do developers search for on the web?

  • Xin Xia
  • Lingfeng Bao (email author)
  • David Lo
  • Pavneet Singh Kochhar
  • Ahmed E. Hassan
  • Zhenchang Xing


Developers commonly use a web search engine such as Google to locate online resources that improve their productivity. A better understanding of what developers search for could help us understand their behaviors and the problems that they encounter during the software development process. Unfortunately, we have a limited understanding of what developers frequently search for and of the search tasks that they often find challenging. To address this gap, we collected search queries from 60 developers and surveyed 235 software engineers from more than 21 countries across five continents. In particular, we asked our survey participants to rate the frequency and difficulty of 34 search tasks, which are grouped along the following seven dimensions: general search, debugging and bug fixing, programming, third-party code reuse, tools, database, and testing. We find that searching for explanations of unknown terminologies, explanations of exceptions/error messages (e.g., HTTP 404), reusable code snippets, solutions to common programming bugs, and suitable third-party libraries/services are the most frequent search tasks that developers perform. In contrast, developers consider the most difficult search tasks to be searching for solutions to performance bugs, solutions to multi-threading bugs, public datasets to test newly developed algorithms or systems, reusable code snippets, best industrial practices, database optimization solutions, solutions to security bugs, and solutions to software configuration bugs. Our study sheds light on why practitioners often perform some of these tasks and why they find some of them challenging. We also discuss the implications of our findings for future research in several areas, e.g., code search engines, domain-specific search engines, and automated generation and refinement of search queries.


Keywords: Search task, Understanding, Empirical study, Survey



The authors thank all the developers who participated in this study. This research is supported by the NSFC Program (No. 61602403) and the National Key Technology R&D Program of the Ministry of Science and Technology of China under grant 2015BAH17F01.



Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  • Xin Xia (1, 2)
  • Lingfeng Bao (1), email author
  • David Lo (3)
  • Pavneet Singh Kochhar (3)
  • Ahmed E. Hassan (4)
  • Zhenchang Xing (5)

  1. College of Computer Science and Technology, Zhejiang University, Hangzhou, China
  2. Department of Computer Science, University of British Columbia, Vancouver, Canada
  3. School of Information Systems, Singapore Management University, Singapore, Singapore
  4. School of Computing, Queen’s University, Kingston, Canada
  5. Research School of Computer Science, Australian National University, Canberra, Australia
