A Qualitative Method for Mining Open Source Software Repositories

  • John Noll
  • Dominik Seichter
  • Sarah Beecham
Part of the IFIP Advances in Information and Communication Technology book series (IFIPAICT, volume 378)


The volume of data archived in open source software project repositories makes automated, quantitative techniques attractive for extracting and analyzing information from these archives. However, many kinds of archival data include blocks of natural language text that are difficult to analyze automatically.

This paper introduces a qualitative analysis method that is transparent and repeatable, yields objective findings from qualitative data, and is efficient enough to be applied to large archives.

The method was applied in a case study of developer and user forum discussions from an open source electronic medical record project. The study demonstrates that the qualitative repository mining method can derive useful results quickly yet accurately; such results would not be possible with a strictly automated approach.


Keywords: Open Source Software, Electronic Medical Record, Qualitative Research



Copyright information

© IFIP International Federation for Information Processing 2012

Authors and Affiliations

  • John Noll¹
  • Dominik Seichter¹
  • Sarah Beecham¹

  1. Lero, The Irish Software Engineering Centre, Department of Computer Science and Information Systems, University of Limerick, Limerick, Ireland
