
Software Mining Studies: Goals, Approaches, Artifacts, and Replicability

Chapter in: Software Engineering (LASER 2013, LASER 2014)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 8987)

Abstract

Mining software archives has opened new ways to increase productivity in software development. Example research topics include:

- analyzing software quality
- mining project evolution
- investigating change patterns and evolution trends
- mining models of development processes
- developing methods for integrating mined data from various historical sources
- analyzing natural-language artifacts in software repositories

Software repositories hold a wide range of data. This ranges from source control systems and issue tracking systems, through artifact repositories such as requirements, design and architectural documentation, to archived communication between project members. Practitioners and researchers have recognized the potential of mining these sources for three purposes: to support software maintenance, to improve software design or architecture, and to empirically validate development techniques or processes.

We revisited software mining studies published in recent years at top software engineering venues such as ICSE, ESEC/FSE, and MSR. We examine these studies from four viewpoints: pursued goals, state-of-the-art approaches, mined artifacts, and study replicability. To identify the mined artifacts, we lexically analyzed research papers spanning more than a decade. For replicability, we reviewed existing mining approaches, tools, and platforms. We address issues of replicability and reproducibility to shed light on the challenges facing the large-scale mining studies that would enable stronger conclusion stability.
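The lexical analysis of papers described in the abstract can be pictured as a simple term-frequency pipeline. The sketch below is hypothetical, not the authors' implementation. It makes several assumptions:

- Each paper's full text has already been extracted from PDF. The authors report using Apache PDFBox for that step.
- The stop list is a tiny illustrative subset, not the full SMART English stop list.
- Stemming is omitted.
- The function names `term_frequencies` and `artifact_mentions`, the artifact terms, and the sample papers are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical, tiny stop list for illustration only; a real study would
# use a complete list such as the SMART English stop list.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "we",
              "for", "on", "is", "are", "with"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into purely alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(text: str, stop_words: set[str] = STOP_WORDS) -> Counter:
    """Count tokens of one paper's text, skipping stop words and very short tokens."""
    return Counter(t for t in tokenize(text)
                   if t not in stop_words and len(t) > 2)


def artifact_mentions(papers: dict[str, str], artifact_terms: set[str]) -> Counter:
    """Count in how many papers each artifact term occurs at least once.

    `papers` maps a (hypothetical) paper id to its extracted full text.
    """
    doc_freq: Counter = Counter()
    for text in papers.values():
        # Document frequency: each term counts at most once per paper.
        present = set(term_frequencies(text)) & artifact_terms
        doc_freq.update(present)
    return doc_freq


if __name__ == "__main__":
    # Invented sample data, not taken from the study.
    papers = {
        "p1": "We mine the issue tracker and the commit history of the project.",
        "p2": "Commit messages and mailing list archives are analyzed.",
        "p3": "We study bug reports in the issue tracker.",
    }
    print(artifact_mentions(papers, {"commit", "issue", "mailing", "bug"}))
```

Counting document frequency rather than raw term counts keeps one verbose paper from dominating the tally of which artifacts are mined. Without a stemmer, inflected forms such as "commits" are not folded into "commit". A stemmer such as Porter's suffix-stripping algorithm would handle that.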

Notes

  1. http://msrconf.org.

  2. http://www.ifi.uzh.ch/seal/events/msa2010.html.

  3. http://www.ifi.uzh.ch/seal/events/ASDS-2013.html.

  4. http://msrcanada.org/msrvision2020/.

  5. http://MSRconf.org.

  6. http://www.se-on.org/.

  7. http://findbugs.sourceforge.net/.

  8. http://openscience.us/repo/.

  9. We used the Java PDF library Apache PDFBox, https://pdfbox.apache.org/.

  10. http://jmlr.csail.mit.edu/papers/volume5/lewis04a/a11-smart-stop-list/english.stop.

  11. www.promisedata.org.

  12. www.krugle.org.

  13. www.openhub.net.

References

  1. Aggarwal, K., Hindle, A., Stroulia, E.: Co-evolution of project documentation and popularity within github. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 360–363. ACM, New York (2014)

    Google Scholar 

  2. Alipour, A., Hindle, A., Stroulia, E.: A contextual approach towards more accurate duplicate bug report detection. In: Proceedings of the 10th Working Conference on Mining Software Repositories, MSR 2013, pp. 183–192. IEEE Press, Piscataway (2013)

    Google Scholar 

  3. Anderson, J., Salem, S., Do, H.: Improving the effectiveness of test suite through mining historical data. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 142–151. ACM, New York (2014)

    Google Scholar 

  4. Bajaj, K., Pattabiraman, K., Mesbah, A.: Mining questions asked by web developers. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 112–121. ACM, New York (2014)

    Google Scholar 

  5. Baldassari, B., Preux, P.: Understanding software evolution: the maisqual ant data set. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 424–427. ACM, New York (2014)

    Google Scholar 

  6. Beller, M., Bacchelli, A., Zaidman, A., Juergens, E.: Modern code reviews in open-source projects: which problems do they fix? In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 202–211. ACM, New York (2014)

    Google Scholar 

  7. Bevan, J., Whitehead Jr., E.J., Kim, S., Godfrey, M.: Facilitating software evolution research with kenyon. In: Proceedings of the 10th European Software Engineering Conference Held Jointly with 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering, ESEC/FSE-13, pp. 177–186. ACM, New York (2005)

    Google Scholar 

  8. Bird, C., Bachmann, A., Aune, E., Duffy, J., Bernstein, A., Filkov, V., Devanbu, P.: Fair and balanced?: bias in bug-fix datasets. In: Proceedings of the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on The Foundations of Software Engineering, ESEC/FSE 2009, pp. 121–130. ACM, New York (2009)

    Google Scholar 

  9. Bloemen, R., Amrit, C., Kuhlmann, S., Matamoros, G.O.: Gentoo package dependencies over time. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 404–407. ACM, New York (2014)

    Google Scholar 

  10. Bruch, M., Monperrus, M., Mezini, M.: Learning from examples to improve code completion systems. In: Proceedings of the the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on The Foundations of Software Engineering, ESEC/FSE 2009, pp. 213–222. ACM, New York (2009)

    Google Scholar 

  11. Brunet, J., Murphy, G.C., Terra, R., Figueiredo, J., Serey, D.: Do developers discuss design? In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 340–343. ACM, New York (2014)

    Google Scholar 

  12. Campbell, J.C., Hindle, A., Amaral, J.N.: Syntax errors just aren’t natural: improving error reporting with language models. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 252–261. ACM, New York (2014)

    Google Scholar 

  13. Chen, N., Lin, J., Hoi, S.C.H., Xiao, X., Zhang, B.: Ar-miner: mining informative reviews for developers from mobile app marketplace. In: Proceedings of the 36th International Conference on Software Engineering, ICSE 2014, pp. 767–778. ACM, New York (2014)

    Google Scholar 

  14. Chen, T-H., Nagappan, M., Shihab, E., Hassan, A.E.: An empirical study of dormant bugs. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 82–91. ACM, New York (2014)

    Google Scholar 

  15. Davril, J-M., Delfosse, E., Hariri, N., Acher, M., Cleland-Huang, J., Heymans, P.: Feature model extraction from large collections of informal product descriptions. In: Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2013, pp. 290–300. ACM, New York (2013)

    Google Scholar 

  16. Demeyer, S., Murgia, A., Wyckmans, K., Lamkanfi, A.: Happy birthday! a trend analysis on past msr papers. In: Proceedings of the 10th Working Conference on Mining Software Repositories, MSR 2013, pp. 353–362. IEEE Press, Piscataway (2013)

    Google Scholar 

  17. Dyer, R., Rajan, H., Nguyen, H.A., Nguyen, T.N.: Mining billions of ast nodes to study actual and potential usage of java language features. In: Proceedings of the 36th International Conference on Software Engineering, ICSE 2014, pp. 779–790. ACM, New York (2014)

    Google Scholar 

  18. Ekanayake, J., Tappolet, J., Gall, H.C., Bernstein, A.: Time variance and defect prediction in software projects. Empirical Softw. Eng. 17(4–5), 348–389 (2012)

    Article  Google Scholar 

  19. Joorabchi, M.E., Mirzaaghaei, M., Mesbah, A.: Works for me! characterizing non-reproducible bug reports. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 62–71. ACM, New York (2014)

    Google Scholar 

  20. Eyolfson, J., Tan, L., Lam, P.: Do time of day and developer experience affect commit bugginess? In: Proceedings of the 8th Working Conference on Mining Software Repositories, MSR 2011, pp. 153–162. ACM, New York (2011)

    Google Scholar 

  21. Farah, G., Tejada, J.S., Correal, D.: Openhub: a scalable architecture for the analysis of software quality attributes. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 420–423. ACM, New York (2014)

    Google Scholar 

  22. Femmer, H., Ganesan, D., Lindvall, M., McComas, D.: Detecting inconsistencies in wrappers: a case study. In: Proceedings of the 2013 International Conference on Software Engineering, ICSE 2013, pp. 1022–1031. IEEE Press, Piscataway (2013)

    Google Scholar 

  23. Fujiwara, K., Hata, H., Makihara, E., Fujihara, Y., Nakayama, N., Iida, H., Matsumoto, K.: Kataribe: a hosting service of historage repositories. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 380–383. ACM, New York (2014)

    Google Scholar 

  24. Fukushima, T., Kamei, Y., McIntosh, S., Yamashita, K., Ubayashi, N.: An empirical study of just-in-time defect prediction using cross-project models. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 172–181. ACM, New York (2014)

    Google Scholar 

  25. Ghezzi, G., Gall, H.: Replicating mining studies with SOFAS. In: 10th Working Conference on Mining Software Repositories. IEEE Computer Society, Washington (2013)

    Google Scholar 

  26. Ghezzi, G., Gall, H.C.: SOFAS: a lightweight architecture for software analysis as a service. In: Working IEEE/IFIP Conference on Software Architecture (WICSA: 20–24 June 2011). IEEE Computer Society, Boulder (2011)

    Google Scholar 

  27. Ghezzi, G., Gall, H.C.: A framework for semi-automated software evolution analysis composition. Int. J. Autom. Softw. Eng. 20(3), 463–496 (2013)

    Article  Google Scholar 

  28. Giger, E., D’Ambros, M., Pinzger, M., Gall, H.C.: Method-level bug prediction. In: Proceedings of the ACM-IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM 2012, pp. 171–180. ACM, New York (2012)

    Google Scholar 

  29. Gobeille, R.: The fossology project. In: Proceedings of the 2008 International Working Conference on Mining Software Repositories, MSR 2008 (Co-located with ICSE), May 10–11, 2008, pp. 47–50, Leipzig, Germany (2008)

    Google Scholar 

  30. González-Barahona, J.M., Robles, G.: On the reproducibility of empirical software engineering studies based on data retrieved from development repositories. Empirical Softw. Eng. 17(1–2), 75–89 (2012)

    Article  Google Scholar 

  31. Gousios, G., Spinellis, D.: A platform for software engineering research. In: Proceedings of the 6th International Working Conference on Mining Software Repositories, MSR 2009 (Co-located with ICSE), May 16–17, 2009, pp. 31–40, Vancouver, BC, Canada (2009)

    Google Scholar 

  32. Gousios, G., Vasilescu, B., Serebrenik, A., Zaidman, A.: Lean ghtorrent: github data on demand. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 384–387. ACM, New York (2014)

    Google Scholar 

  33. Gousios, G., Zaidman, A.: A dataset for pull-based development research. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 368–371. ACM, New York (2014)

    Google Scholar 

  34. Guo, L., Lawall, J., Muller, G.: Oops! where did that code snippet come from? In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 52–61. ACM, New York (2014)

    Google Scholar 

  35. Gupta, M., Sureka, A., Padmanabhuni, S.: Process mining multiple repositories for software defect resolution from control and organizational perspective. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 122–131. ACM, New York (2014)

    Google Scholar 

  36. Guzman, E., Azócar, D., Li, Y.: Sentiment analysis of commit comments in github: an empirical study. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 352–355. ACM, New York (2014)

    Google Scholar 

  37. Guzman, E., Bruegge, B.: Towards emotional awareness in software development teams. In Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2013, pp. 671–674. ACM, New York (2013)

    Google Scholar 

  38. Hanam, Q., Tan, L., Holmes, R., Lam, P.: Finding patterns in static analysis alerts: improving actionable alert ranking. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 152–161. ACM, New York (2014)

    Google Scholar 

  39. Heffner, C.L.: AllPsych: Research Methods, Chapter 1.11 Replication. http://allpsych.com/researchmethods/replication.html

  40. Heinemann, L., Bauer, V., Herrmannsdoerfer, M., Hummel, B.: Identifier-based context-dependent api method recommendation. In: CSMR 2012, pp. 31–40 (2012)

    Google Scholar 

  41. Herzig, K., Just, S., Zeller, A.: It’s not a bug, it’s a feature: how misclassification impacts bug prediction. In: Proceedings of the 2013 International Conference on Software Engineering, ICSE 2013, pp. 392–401. IEEE Press, Piscataway (2013)

    Google Scholar 

  42. Hindle, A.: Green mining: a methodology of relating software change to power consumption. In: 2012 9th IEEE Working Conference on Mining Software Repositories (MSR), pp. 78–87, June 2012

    Google Scholar 

  43. Hindle, A., Wilson, A., Rasmussen, K., Barlow, E.J., Campbell, J.C., Romansky, S.: Greenminer: a hardware based mining software repositories software energy consumption framework. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 12–21. ACM, New York (2014)

    Google Scholar 

  44. Holmes, R., Murphy, G.C.: Using structural context to recommend source code examples. In: Proceedings of the 27th International Conference on Software Engineering, ICSE 2005, pp. 117–125. ACM, New York (2005)

    Google Scholar 

  45. Huang, C., Kamei, Y., Yamashita, K., Ubayashi, N.: Using alloy to support feature-based dsl construction for mining software repositories. In: Proceedings of the 17th International Software Product Line Conference Co-located Workshops, SPLC 2013 Workshops, pp. 86–89. ACM, New York (2013)

    Google Scholar 

  46. Inozemtseva, L., Holmes, R.: Coverage is not strongly correlated with test suite effectiveness. In: Proceedings of the 36th International Conference on Software Engineering, ICSE 2014, pp. 435–445. ACM, New York (2014)

    Google Scholar 

  47. Jing, X.-Y., Ying, S., Zhang, Z.-W., Wu, S.-S., Liu, J.: Dictionary learning based software defect prediction. In: Proceedings of the 36th International Conference on Software Engineering, ICSE 2014, pp. 414–423. ACM, New York (2014)

    Google Scholar 

  48. Johnson, B., Song, Y., Murphy-Hill, E., Bowdidge, R.: Why don’t software developers use static analysis tools to find bugs? In: Proceedings of the 2013 International Conference on Software Engineering, ICSE 2013, pp. 672–681. IEEE Press, Piscataway (2013)

    Google Scholar 

  49. Kagdi, H., Collard, M.L., Maletic, J.I.: A survey and taxonomy of approaches for mining software repositories in the context of software evolution. J. Softw. Maintenance Evol. 19, 77–131 (2007)

    Article  Google Scholar 

  50. Kalliamvakou, E., Gousios, G., Blincoe, K., Singer, L., German, D.M., Damian, D.: The promises and perils of mining github. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 92–101. ACM, New York (2014)

    Google Scholar 

  51. Kechagia, M., Spinellis, D.: Undocumented and unchecked: exceptions that spell trouble. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 312–315. ACM, New York (2014)

    Google Scholar 

  52. Keivanloo. I.: Online sharing and integration of results from mining software repositories. In: 2012 34th International Conference on Software Engineering (ICSE), pp. 1644–1646, June 2012

    Google Scholar 

  53. Keivanloo, I., Forbes, C., Hmood, A., Erfani, M., Neal, C., Peristerakis, G., Rilling, J.: A linked data platform for mining software repositories. In: 2012 9th IEEE Working Conference on Mining Software Repositories (MSR), pp. 32–35, June 2012

    Google Scholar 

  54. Keung, J., Kocaguneli, E., Menzies, T.: Finding conclusion stability for selecting the best effort predictor in software effort estimation. Autom. Softw. Engg. 20(4), 543–567 (2013)

    Article  Google Scholar 

  55. Kevic, K., Fritz, T.: A dictionary to translate change tasks to source code. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 320–323. ACM, New York (2014)

    Google Scholar 

  56. Kiefer, C., Bernstein, A., Tappolet, J.: Mining software repositories with isparol and a software evolution ontology. In: Proceedings of the Fourth International Workshop on Mining Software Repositories, MSR 2007, p. 10. IEEE Computer Society, Washington (2007)

    Google Scholar 

  57. Klein, N., Corley, C.S., Kraft, N.A.: New features for duplicate bug detection. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 324–327. ACM, New York (2014)

    Google Scholar 

  58. Kononenko, O., Baysal, O., Holmes, R., Godfrey, M.W.: Mining modern repositories with elasticsearch. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 328–331. ACM, New York (2014)

    Google Scholar 

  59. Krutz, D.E., Le, W.: A code clone oracle. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 388–391. ACM, New York (2014)

    Google Scholar 

  60. Lazar, A., Ritchey, S., Sharif, B.: Generating duplicate bug datasets. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 392–395. ACM, New York (2014)

    Google Scholar 

  61. Lazar, A., Ritchey, S., Sharif, B.: Improving the accuracy of duplicate bug report detection using textual similarity measures. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 308–311. ACM, New York (2014)

    Google Scholar 

  62. Lemos, O.A.L., de Paula, A.C., Zanichelli, F.C., Lopes, C.V.: Thesaurus-based automatic query expansion for interface-driven code search. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 212–221. ACM, New York (2014)

    Google Scholar 

  63. Lewis, C., Lin, Z., Sadowski, C., Zhu, X., Ou, R., Whitehead Jr., E.J.: Does bug prediction support human developers? findings from a google case study. In: Proceedings of the 2013 International Conference on Software Engineering, ICSE 2013, pp. 372–381. IEEE Press, Piscataway (2013)

    Google Scholar 

  64. Linares-Vásquez, M., Bavota, G., Bernal-Cárdenas, C., Penta, M.D., Oliveto, R., Poshyvanyk, D.: Api change and fault proneness: a threat to the success of android apps. In: Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2013, pp. 477–487. ACM, New York (2013)

    Google Scholar 

  65. Linares-Vásquez, M., Holtzhauer, A., Bernal-Cárdenas, C., Poshyvanyk, D.: Revisiting android reuse studies in the context of code obfuscation and library usages. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 242–251. ACM, New York (2014)

    Google Scholar 

  66. Matragkas, N., Williams, J.R., Kolovos, D.S., Paige, R.F.: Analysing the ‘biodiversity’ of open source ecosystems: the github case. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 356–359. ACM, New York (2014)

    Google Scholar 

  67. McIntosh, S., Kamei, Y., Adams, B., Hassan, A.E.: The impact of code review coverage and code review participation on software quality: a case study of the qt, vtk, and itk projects. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 192–201. ACM, New York (2014)

    Google Scholar 

  68. Menzies, T., Jalali, O., Hihn, J., Baker, D., Lum, K.: Stable rankings for different effort models. Autom. Softw. Eng. 17(4), 409–437 (2010)

    Article  Google Scholar 

  69. Menzies, T., Zimmermann, T.: Software analytics: so what? IEEE Softw. 30(4), 31–37 (2013)

    Article  Google Scholar 

  70. Merten, T., Mager, B., Bürsner, S., Paech, B.: Classifying unstructured data into natural language text and technical information. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 300–303. ACM, New York (2014)

    Google Scholar 

  71. Minku, L.L., Yao, X.: How to make best use of cross-company data in software effort estimation? In: Proceedings of the 36th International Conference on Software Engineering, ICSE 2014, pp. 446–456. ACM, New York (2014)

    Google Scholar 

  72. Mitropoulos, D., Karakoidas, V., Louridas, P., Gousios, G., Spinellis, D.: The bug catalog of the maven ecosystem. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 372–375. ACM, New York (2014)

    Google Scholar 

  73. Mockus, A.: Amassing and indexing a large sample of version control systems: towards the census of public source code history. In: Proceedings of the 2009 6th IEEE International Working Conference on Mining Software Repositories, MSR 2009, pp. 11–20. IEEE Computer Society, Washington (2009)

    Google Scholar 

  74. Mockus, A.: Is mining software repositories data science? (keynote). In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 1–1. ACM, New York (2014)

    Google Scholar 

  75. Mondal, M., Roy, C.K., Schneider, K.A.: Prediction and ranking of co-change candidates for clones. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 32–41. ACM, New York (2014)

    Google Scholar 

  76. Murakami, H., Higo, Y., Kusumoto, S.: A dataset of clone references with gaps. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 412–415. ACM, New York (2014)

    Google Scholar 

  77. Murgia, A., Tourani, P., Adams, B., Ortu, M.: Do developers feel emotions? an exploratory analysis of emotions in software artifacts. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 262–271. ACM, New York (2014)

    Google Scholar 

  78. Nagappan, M., Zimmermann, T., Bird, C.: Diversity in software engineering research. In: Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2013, pp. 466–476. ACM, New York (2013)

    Google Scholar 

  79. Nam, J., Pan, S.J., Kim, S.: Transfer defect learning. In: Proceedings of the 2013 International Conference on Software Engineering, ICSE 2013, pp. 382–391. IEEE Press, Piscataway (2013)

    Google Scholar 

  80. Negara, S., Codoban, M., Dig, D., Johnson, R.E.: Mining fine-grained code changes to detect unknown change patterns. In: Proceedings of the 36th International Conference on Software Engineering, ICSE 2014, pp. 803–813. ACM, New York (2014)

    Google Scholar 

  81. Nguyen, H.V., Nguyen, H.A., Nguyen, A.T., Nguyen, T.N.: Mining interprocedural, data-oriented usage patterns in javascript web applications. In: Proceedings of the 36th International Conference on Software Engineering, ICSE 2014, pp. 791–802. ACM, New York (2014)

    Google Scholar 

  82. Nguyen, T.H.D., Nagappan, M., Hassan, A.E., Nasser, M., Flora, P.: An industrial case study of automatically identifying performance regression-causes. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 232–241. ACM, New York (2014)

    Google Scholar 

  83. Nguyen, T.T., Nguyen, H.A., Pham, N.H., Al-Kofahi, J.M., Nguyen, T.N.: Graph-based mining of multiple object usage patterns. In: Proceedings of the the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE 2009, pp. 383–392. ACM, New York (2009)

    Google Scholar 

  84. Nussbaum, L., Zacchiroli, S.: The ultimate debian database: consolidating bazaar metadata for quality assurance and data mining. In: 2010 7th IEEE Working Conference on Mining Software Repositories (MSR), pp. 52–61, May 2010

    Google Scholar 

  85. Ossher, J., Bajracharya, S.K., Lopes, C.V.: Automated dependency resolution for open source software. In: Proceedings of the 7th International Working Conference on Mining Software Repositories, MSR 2010 (Co-located with ICSE), Cape Town, South Africa, 2–3 May, pp. 130–140 (2010)

    Google Scholar 

  86. Padhye, R., Mani, S., Sinha, V.S.: A study of external community contribution to open-source projects on github. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 332–335. ACM, New York (2014)

    Google Scholar 

  87. Passos, L., Czarnecki, K.: A dataset of feature additions and feature removals from the linux kernel. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 376–379. ACM, New York (2014)

    Google Scholar 

  88. Pinto, G., Castor, F., Liu, Y.D.: Mining questions about software energy consumption. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 22–31. ACM, New York (2014)

    Google Scholar 

  89. Pletea, D., Vasilescu, B., Serebrenik, A.: Security and emotion: sentiment analysis of security discussions on github. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 348–351. ACM, New York (2014)

    Google Scholar 

  90. Ponzanelli, L., Bavota, G., Penta, M.D., Oliveto, R., Lanza, M.: Mining stackoverflow to turn the ide into a self-confident programming prompter. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 102–111. ACM, New York (2014)

    Google Scholar 

  91. Porter, M.F.: An algorithm for suffix stripping. Program Electron. Libr. Inf. Syst. 14(3), 130–137 (1980)

    Article  Google Scholar 

  92. Proksch, S., Amann, S., Mezini, M.: Towards standardized evaluation of developer-assistance tools. In: Proceedings of the 4th International Workshop on Recommendation Systems for Software Engineering, RSSE 2014, pp. 14–18. ACM, New York (2014)

    Google Scholar 

  93. Qiu, D., Li, B., Su, Z.: An empirical analysis of the co-evolution of schema and code in database applications. In: Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2013, pp. 125–135. ACM, New York (2013)

    Google Scholar 

  94. Rahman, F., Khatri, S., Barr, E.T., Devanbu, P.: Comparing static bug finders and statistical prediction. In: Proceedings of the 36th International Conference on Software Engineering, ICSE, pp. 424–434. ACM, New York (2014)

    Google Scholar 

  95. Rahman, F., Posnett, D., Herraiz, I., Devanbu, P.: Sample size vs. bias in defect prediction. In: Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2013, pp. 147–157. ACM, New York (2013)

    Google Scholar 

  96. Rahman, M.S., Aryani, A., Roy, C.K., Perin, F.: On the relationships between domain-based coupling and code clones: an exploratory study. In: Proceedings of the 2013 International Conference on Software Engineering, ICSE 2013, pp. 1265–1268. IEEE Press, Piscataway (2013)

    Google Scholar 

  97. Rahman, M.M., Roy, C.K.: An insight into the pull requests of github. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 364–367. ACM, New York (2014)

    Google Scholar 

  98. Åkerblom, B., Stendahl, J., Tumlin, M., Wrigstad, T.: Tracing dynamic features in python programs. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 292–295. ACM, New York (2014)

    Google Scholar 

  99. Robillard, M.P., Walker, R.J., Zimmermann, T.: Recommendation systems for software engineering. IEEE Softw. 27(4), 80–86 (2010)

    Article  Google Scholar 

  100. Robles, G.,. González-Barahona, J.M., Cervigón, C., Capiluppi, A., Izquierdo-Cortázar, D.: Estimating development effort in free/open source software projects by mining software repositories: a case study of openstack. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 222–231. ACM, New York (2014)

    Google Scholar 

  101. Saha, R.K., Saha, A.K., Perry, D.E.: Toward understanding the causes of unanswered questions in software information sites: a case study of stack overflow. In: Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2013, pp. 663–666. ACM, New York (2013)

    Google Scholar 

  102. Saini, V., Sajnani, H., Ossher, J., Lopes, C.V.: A dataset for maven artifacts and bug patterns found in them. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 416–419. ACM, New York (2014)

    Google Scholar 

  103. Shirabad, J.S., Menzies, T.J.: The PROMISE Repository of Software Engineering Databases. School of Information Technology and Engineering, University of Ottawa, Canada (2005)

    Google Scholar 

  104. Schur, M., Roth, A., Zeller, A.: Mining behavior models from enterprise web applications. In: Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2013, pp. 422–432. ACM, New York (2013)

    Google Scholar 

  105. Shearer, C.: The crisp-dm model: the new blueprint for data mining. Data Warehouse. 5, 13–22 (2000)

    Google Scholar 

  106. Sheoran, J., Blincoe, K., Kalliamvakou, E., Damian, D., Ell, J.: Understanding “watchers” on github. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 336–339. ACM, New York (2014)

    Google Scholar 

  107. Shepperd, M., Kadoda, G.: Comparing software prediction techniques using simulation. IEEE Trans. Softw. Eng. 27(11), 1014–1022 (2001)

    Article  Google Scholar 

  108. Shi, A., Gyori, A., Gligoric, M., Zaytsev, A., Marinov, D.: Balancing trade-offs in test-suite reduction. In: Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE 2014, pp. 246–256. ACM, New York (2014)

    Google Scholar 

  109. Shull, F.J., Carver, J.C., Vegas, S., Juristo, N.: The role of replications in empirical software engineering. Empirical Softw. Eng. 13(2), 211–218 (2008)

    Article  Google Scholar 

  110. Sliwerski, J., Zimmermann, T., Zeller, A.: When do changes induce fixes? In: Proceedings of the 2005 International Workshop on Mining Software Repositories, MSR (2005)

    Google Scholar 

  111. Spacco, J., Strecker, J., Hovemeyer, D., Pugh, W.: Software repository mining with marmoset: an automated programming project snapshot and testing system. In: Proceedings of the 2005 International Workshop on Mining Software Repositories, MSR 2005, pp. 1–5. ACM, New York (2005)

    Google Scholar 

  112. Srinivasan, K., Fisher, D.: Machine learning approaches to estimating software development effort. Trans. Softw. Eng. 21(2), 126–137 (1995)

    Article  Google Scholar 

  113. Steidl, D., Hummel, B., Juergens, E.: Incremental origin analysis of source code files. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 42–51. ACM, New York (2014)

  114. Steinmacher, I., Wiese, I.S., Conte, T., Gerosa, M.A., Redmiles, D.: The hard life of open source software project newcomers. In: Proceedings of the 7th International Workshop on Cooperative and Human Aspects of Software Engineering, CHASE 2014, pp. 72–78. ACM, New York (2014)

  115. Subramanian, S., Inozemtseva, L., Holmes, R.: Live API documentation. In: Proceedings of the 36th International Conference on Software Engineering, ICSE 2014, pp. 643–652. ACM, New York (2014)

  116. Tiarks, R., Maalej, W.: How does a typical tutorial for mobile development look like? In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 272–281. ACM, New York (2014)

  117. Tulsian, V., Kanade, A., Kumar, R., Lal, A., Nori, A.V.: MUX: algorithm selection for software model checkers. In: Mining Software Repositories (MSR). ACM, May 2014

  118. Tymchuk, Y., Mocci, A., Lanza, M.: Collaboration in open-source projects: myth or reality? In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 304–307. ACM, New York (2014)

  119. Garcia, H.V., Shihab, E.: Characterizing and predicting blocking bugs in open source projects. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 72–81. ACM, New York (2014)

  120. Voinea, L., Telea, A.: Mining software repositories with CVSgrab. In: Proceedings of the 2006 International Workshop on Mining Software Repositories, MSR 2006, pp. 167–168. ACM, New York (2006)

  121. Williams, J.R., Ruscio, D.D., Matragkas, N., Rocco, J.D., Kolovos, D.S.: Models of OSS project meta-information: a dataset of three forges. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 408–411. ACM, New York (2014)

  122. Würsch, M., Ghezzi, G., Hert, M., Reif, G., Gall, H.: SEON: a pyramid of ontologies for software evolution and its applications. Computing 94(11), 857–885 (2012)

  123. Würsch, M., Ghezzi, G., Reif, G., Gall, H.C.: Supporting developers with natural language queries. In: Proceedings of the 32nd International Conference on Software Engineering. ACM, May 2010

  124. Würsch, M., Giger, E., Gall, H.: Evaluating a query framework for software evolution data. ACM Trans. Softw. Eng. Method. 22(4), 38–38 (2013)

  125. Yamashita, K.: Modular construction of an analysis tool for mining software repositories. In: Proceedings of the 12th Annual International Conference Companion on Aspect-oriented Software Development, AOSD 2013 Companion, pp. 37–38. ACM, New York (2013)

  126. Yamashita, K., McIntosh, S., Kamei, Y., Ubayashi, N.: Magnet or sticky? An OSS project-by-project typology. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 344–347. ACM, New York (2014)

  127. Zaidman, A., Van Rompaey, B., van Deursen, A., Demeyer, S.: Studying the co-evolution of production and test code in open source and industrial developer test processes through repository mining. Empirical Softw. Eng. 16(3), 325–364 (2011)

  128. Zanetti, M.S., Scholtes, I., Tessone, C.J., Schweitzer, F.: Categorizing bugs with social networks: a case study on four open source software communities. In: Proceedings of the 2013 International Conference on Software Engineering, ICSE 2013, pp. 1032–1041. IEEE Press, Piscataway (2013)

  129. Zanjani, M.B., Swartzendruber, G., Kagdi, H.: Impact analysis of change requests on source code based on interaction and commit histories. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 162–171. ACM, New York (2014)

  130. Zhang, C., Hindle, A.: A green miner’s dataset: mining the impact of software change on energy consumption. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 400–403. ACM, New York (2014)

  131. Zhang, F., Mockus, A., Keivanloo, I., Zou, Y.: Towards building a universal defect prediction model. In: Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pp. 182–191. ACM, New York (2014)

  132. Zhang, H., Gong, L., Versteeg, S.: Predicting bug-fixing time: an empirical study of commercial software projects. In: Proceedings of the 2013 International Conference on Software Engineering, ICSE 2013, pp. 1042–1051. IEEE Press, Piscataway (2013)

Acknowledgements

This work was partially funded by the German Federal Ministry of Education and Research (BMBF) within the Software Campus projects KaVE and Eko, both grant no. 01IS12054. The views and opinions expressed in this article are those of the authors and do not necessarily reflect the official policy or position of the funding agency.

Author information

Corresponding author

Correspondence to Sven Amann.

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Amann, S., Beyer, S., Kevic, K., Gall, H. (2015). Software Mining Studies: Goals, Approaches, Artifacts, and Replicability. In: Meyer, B., Nordio, M. (eds) Software Engineering. LASER 2013, LASER 2014. Lecture Notes in Computer Science, vol. 8987. Springer, Cham. https://doi.org/10.1007/978-3-319-28406-4_5

  • DOI: https://doi.org/10.1007/978-3-319-28406-4_5

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-28405-7

  • Online ISBN: 978-3-319-28406-4

  • eBook Packages: Computer Science, Computer Science (R0)
