Measurement and Interpretation of Productivity and Functional Correctness
Measuring developer productivity and functional correctness is central to evaluating software practices and techniques. Because researchers use a wide variety of measurement and reporting methods, experimental results are difficult to interpret, aggregate, and compare. These problems often reduce to how units of development work should be defined and how developer output should be quantified. For example, when is it appropriate to measure productivity in terms of elapsed problem-solving time versus output per unit time? Is the number of lines of source code an eternally damned output measure in every situation? How do we define task completion when measuring problem-solving time? What are the consequences of imposing a cut-off time? When is a minimum quality or usability criterion necessary, and how should such a criterion be defined? What are some good output metrics that serve as proxies for external functionality, and how can we measure them effectively? When is it acceptable to define functional correctness as a binary variable? What are the pros and cons of objective versus subjective measures?
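The cut-off question can be made concrete with a small sketch. It assumes a hypothetical study with a 90-minute cut-off and invented "true" solving times (in a real study, times above the cut-off are never observed; such observations are right-censored). The names `CUTOFF`, `true_times`, `observe`, and `cutoff_summaries` are hypothetical and used only for illustration. The sketch compares two common ways of handling participants who do not finish: dropping them, or recording them at the cut-off. It also reports a binary completion rate.

```python
from statistics import mean, median

# Hypothetical cut-off, in minutes (an illustrative assumption).
CUTOFF = 90

# Hypothetical "true" solving times in minutes. In a real cut-off study,
# times above CUTOFF are never observed.
true_times = [35, 48, 52, 61, 70, 84, 95, 110, 130]


def observe(times, cutoff):
    """Split times into those recorded as finished and a count of unfinished."""
    finished = [t for t in times if t <= cutoff]
    return finished, len(times) - len(finished)


def cutoff_summaries(times, cutoff):
    """Summarize solving time under two common ways of treating unfinished tasks."""
    finished, n_unfinished = observe(times, cutoff)
    # Option A: drop participants who did not finish before the cut-off.
    dropped_mean = mean(finished)
    # Option B: record unfinished participants at the cut-off time,
    # which is only a lower bound on their real solving time.
    capped = finished + [cutoff] * n_unfinished
    capped_mean = mean(capped)
    capped_median = median(capped)
    # Binary outcome: proportion of participants who finished at all.
    completion_rate = len(finished) / len(times)
    return {
        "dropped_mean": dropped_mean,
        "capped_mean": capped_mean,
        "capped_median": capped_median,
        "completion_rate": completion_rate,
    }


if __name__ == "__main__":
    print("true mean:", round(mean(true_times), 1))
    for name, value in cutoff_summaries(true_times, CUTOFF).items():
        print(f"{name}: {round(value, 2)}")
```

With these invented numbers, dropping unfinished participants gives a mean of about 58.3 minutes and capping gives about 68.9, while the true mean is about 76.1. Both treatments understate solving time, by different amounts. The median (70) is unchanged here only because fewer than half of the participants reached the cut-off. The completion rate (2/3) is the binary view of the same data.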